Exploring Kubernetes from within Kubernetes... with Python - PART 2

Part 1 is available here

Part 1 was a "set the scene" kind of post, and in this post I will explore the actual Python solution to gather some basic statistics from within the cluster.

Once again, keep in mind that this is not a "how to measure resources" blog post - there are far better tools for that. This is a way to understand the Python Kubernetes Client by means of a practical example.

The goal

The goal for this post is a Python application with a REST API that will print the available CPU and Memory per Node as well as the committed CPU and Memory per Node.

The output data will be in JSON with the following basic structure (example output, showing 1 node only):

{
    "Nodes": [
        {
            "NodeName": "node3",
            "CPU": {
                "Capacity": 2000,
                "Allocatable": 2000,
                "Requests": {
                    "InstrumentedValue": 750.0,
                    "Percent": 37.5
                },
                "Limits": {
                    "InstrumentedValue": 1500.0,
                    "Percent": 75.0
                }
            },
            "RAM": {
                "Capacity": 8344227840,
                "Allocatable": 8344227840,
                "Requests": {
                    "InstrumentedValue": 530579456.0,
                    "Percent": 6.358640561761075
                },
                "Limits": {
                    "InstrumentedValue": 1073741824.0,
                    "Percent": 12.868078923405813
                }
            }
        }
    ]
}
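
As a side note, a structure like this maps nicely onto typed models. The sketch below is purely illustrative - the class names Measurement, ResourceStats and NodeStats are my own and not the models used in the pykles repository - and shows one way to model and serialize the structure with plain Python dataclasses:

from dataclasses import dataclass, field, asdict
import json

@dataclass
class Measurement:
    InstrumentedValue: float = 0.0
    Percent: float = 0.0

@dataclass
class ResourceStats:
    Capacity: int = 0
    Allocatable: int = 0
    Requests: Measurement = field(default_factory=Measurement)
    Limits: Measurement = field(default_factory=Measurement)

@dataclass
class NodeStats:
    NodeName: str = ''
    CPU: ResourceStats = field(default_factory=ResourceStats)
    RAM: ResourceStats = field(default_factory=ResourceStats)

# Serialize a list of nodes into the JSON structure shown above
print(json.dumps({'Nodes': [asdict(NodeStats(NodeName='node3'))]}, indent=4))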

Project Implementation

The Python code can be found on GitHub, in the (probably) funnily named pykles repository.

The container image is also available on Docker Hub.

The focus of this blog post is on the Kubernetes Client implementation; perhaps I will do another blog post (or two) about the rest of the stack, like FastAPI.

In Part 1 I discussed the background, and I also referred to my K3s Kubernetes cluster as described in an even earlier blog post.

Deployment

To deploy the application, simply run the following:

kubectl create namespace pykles

kubectl apply -f https://raw.githubusercontent.com/nicc777/pykles/main/kubernetes_manifests/pykles.yaml -n pykles

Once deployed, your resources should look something like this:

NAME                                     READY   STATUS    RESTARTS   AGE
pod/pykles-deployment-7944474c57-phw2t   1/1     Running   0          50s

NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/pykles-app-service   ClusterIP   10.43.122.237   <none>        8080/TCP   50s

NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/pykles-deployment   1/1     1            1           50s

NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/pykles-deployment-7944474c57   1         1         1       50s

But in order to test, we need to connect to the REST API. To simplify this process, the following command illustrates how to use the kubectl port forwarder:

kubectl port-forward pod/pykles-deployment-7944474c57-phw2t -n pykles 7081:8080

Note: Ensure you refer to the correct POD!

Simple Testing of the REST end-point

We can now use curl to test:

curl http://localhost:7081/

Your output should look something like this:

{"Nodes":[{"NodeName":"node1","CPU":{"Capacity":2000,"Allocatable":2000,"Requests":{"InstrumentedValue":0.0,"Percent":0.0},"Limits":{"InstrumentedValue":0.0,"Percent":0.0}},"RAM":{"Capacity":8344227840,"Allocatable":8344227840,"Requests":{"InstrumentedValue":73400320.0,"Percent":0.8796538326546942},"Limits":{"InstrumentedValue":178257920.0,"Percent":2.136302165018543}}},{"NodeName":"node2","CPU":{"Capacity":2000,"Allocatable":2000,"Requests":{"InstrumentedValue":500.0,"Percent":25.0},"Limits":{"InstrumentedValue":1000.0,"Percent":50.0}},"RAM":{"Capacity":8344227840,"Allocatable":8344227840,"Requests":{"InstrumentedValue":268435456.0,"Percent":3.217019730851453},"Limits":{"InstrumentedValue":536870912.0,"Percent":6.434039461702906}}},{"NodeName":"node3","CPU":{"Capacity":2000,"Allocatable":2000,"Requests":{"InstrumentedValue":750.0,"Percent":37.5},"Limits":{"InstrumentedValue":1500.0,"Percent":75.0}},"RAM":{"Capacity":8344227840,"Allocatable":8344227840,"Requests":{"InstrumentedValue":530579456.0,"Percent":6.358640561761075},"Limits":{"InstrumentedValue":1073741824.0,"Percent":12.868078923405813}}}]}

If you followed the instructions from the previous blog posts, you should now have statistics of three nodes.

Diving into the Code

So now that we get some CPU and RAM statistics out of the cluster, let's see how that actually works.

All references to files will use the relative path to the files as they are organized in the project repository.

Basic Project Structure

The project is really small and the basic structure looks like this:

.
├── Dockerfile
├── kubernetes_manifests
│   └── pykles.yaml
├── LICENSE
├── MANIFEST.in
├── pyproject.toml
├── README.md
├── setup.cfg
└── src
    └── pykles
        ├── __init__.py
        ├── kubernetes_functions.py
        ├── models.py
        ├── pykles.py
        └── services.py

All of the Kubernetes-specific integration is in the file src/pykles/kubernetes_functions.py, and I will mainly refer to this file for the Kubernetes functions.

The Python Kubernetes Client

The Python Kubernetes client is available on GitHub.

Personally I find the documentation difficult to navigate, and it takes some getting used to.

However, I have added links to the relevant documentation pages on GitHub in the source code, so hopefully that will help.

The client can be installed via pip3 install kubernetes, should you want to do this manually.

Authentication and Authorization

The following function will retrieve the credentials from the environment and return a CoreV1Api client:

# Module-level imports and logger, shown here for completeness
import logging
import traceback
from kubernetes import client, config

logger = logging.getLogger(__name__)

def get_v1_client():
    try:
        config.load_incluster_config()   # Load the ServiceAccount credentials mounted into the Pod
        logger.debug('Kubernetes Config Loaded')
        return client.CoreV1Api()
    except Exception:
        logger.error('EXCEPTION: {}'.format(traceback.format_exc()))
    raise Exception('Failed to load Kubernetes Config')

Now, my initial statement and the code above may seem simple and straightforward... but it's actually a little more complicated than it appears. Let's look a little deeper:

What does it mean to "get credentials from the environment"?

This application integrates to the Kubernetes API and will run "in-cluster" - meaning it will be running as a Pod in the cluster.

All requests to the Kubernetes API must include a token. When you use your kubectl command, you also need to be authenticated. In the case of my lab environment, this token is in the ~/k3s.yaml file that was created during the cluster creation process.

The kubectl command will use the credentials and certificate-authority-data for the cluster to ensure that all commands are properly authenticated and authorized. If you use a more general configuration file, you may have several clusters, and you will need to switch context each time you need to work on a different cluster.

However, when you work "in-cluster", the authorization tokens need to be obtained from a ServiceAccount (Kubernetes documentation).

The ServiceAccount, ClusterRole and ClusterRoleBinding are all required to provide the application with the credentials it needs to interact with the Kubernetes API.

Important: In the demonstration project the permissions defined are equivalent to "super user", "root" or "administrator" permissions. I have done this in order to freely experiment with various calls in a lab environment. If you ever need to develop such an integration in a production cluster, you have to use much stricter permissions to limit your application to only the permissions it actually needs (least privilege principle).

Finally, note that the command config.load_incluster_config() specifically implements the logic for Pods running "in-cluster". You will not be able to use this application to connect to another cluster - for example, you cannot run the application natively on your development machine and connect to a remote cluster. To run the application outside the cluster, consider using config.load_kube_config(), which uses the Kubernetes configuration from your local environment.
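
For completeness, a minimal sketch (this is not how pykles does it, just a common pattern) of a client factory that works both in-cluster and from a workstation could look like this:

from kubernetes import client, config

def get_flexible_v1_client():
    # Try the in-cluster ServiceAccount credentials first...
    try:
        config.load_incluster_config()
    except config.ConfigException:
        # ...and fall back to the local kubeconfig (~/.kube/config or KUBECONFIG)
        config.load_kube_config()
    return client.CoreV1Api()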

The API Client

There is more than one API client, and it is therefore important to know which one you need. In fact, there are around 60 different client classes.

Depending on the integration you need, you have to pick the appropriate client. Each client, like the CoreV1Api client we will use, only implements a specific portion of the Kubernetes API. For example, CoreV1Api implements methods such as list_node, list_pod_for_all_namespaces, read_namespaced_pod and delete_namespaced_pod.

You should also start to see a pattern here... Each method implements a kind of CRUD operation on resources in Kubernetes: create, read (get or list), update (patch or replace) and delete.

These actions, in the background, will translate to one of the Kubernetes API request verbs.

In summary, you need to determine the correct client to use based on the operation you need to perform.

In our example, we need to call the following two methods: list_node and list_pod_for_all_namespaces.

When you search for these on the kubernetes client README, you will see both are part of the CoreV1Api client - hence we only need it.

Making an API call

Let's look at the first part of one of the functions in our src/pykles/kubernetes_functions.py file:

def get_all_pod_resource_utilization_stats(
    node_id: str,
    next_token: str=None
)->dict:
    # ... some lines omitted ...
    try:
        client = get_v1_client()
        if next_token is None:
            # First page - no continuation token to pass yet
            response = client.list_pod_for_all_namespaces(
                field_selector='spec.nodeName={},status.phase!=Failed,status.phase!=Succeeded'.format(node_id),
                limit=100
            )
        else:
            # Subsequent pages - pass the continuation token via _continue
            response = client.list_pod_for_all_namespaces(
                field_selector='spec.nodeName={},status.phase!=Failed,status.phase!=Succeeded'.format(node_id),
                _continue=next_token,
                limit=100
            )
    # ... processing of the results in response ...

Whenever you "read" data, you probably won't know how much data is actually coming back to you. Need a list of Pods? Well, how many Pods are there? The API provides a way to process data in smaller chunks, and the size can be more or less controlled by the limit keyword, which should be available for most "read" operations. In this example, we return a maximum of 100 Pods at a time. Should there be more than 100, the response will include a string value in the property response.metadata._continue (if no additional data is available, this value will be None). So later in the function, we have this:

# ...
        if response.metadata._continue is not None:
            results = get_all_pod_resource_utilization_stats(
                node_id=node_id,
                next_token=response.metadata._continue
            )
        # Here we get the response and ADD it to our existing result set
# ...
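
In pykles this is done recursively, as shown above. Purely as an illustration, the same pagination pattern can also be written as a loop - the sketch below reuses the get_v1_client() function from earlier and is not the actual pykles code:

def list_all_pods_on_node(node_id: str)->list:
    # Page through list_pod_for_all_namespaces until no _continue token remains
    v1 = get_v1_client()
    field_selector = 'spec.nodeName={},status.phase!=Failed,status.phase!=Succeeded'.format(node_id)
    pods = list()
    next_token = None
    while True:
        response = v1.list_pod_for_all_namespaces(
            field_selector=field_selector,
            limit=100,
            _continue=next_token
        )
        pods.extend(response.items)
        next_token = response.metadata._continue
        if next_token is None:
            break
    return pods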

Other parameters can of course also be set, but what are they? Here is where the documentation in the GitHub format becomes a little bit more tricky to read. At the time of writing, I tried to compile the HTML documents from source, but I got the same result as on https://kubernetes.readthedocs.io/en/latest/ where pages like this one (the one we are interested in) just contain headings without any content.

The actual content can be found here in the raw Markdown format. It appears that the documentation compilation as per the GitHub README does not compile these Markdown files - you have to do that yourself.

Therefore, if you clone the kubernetes Python client repository, you can generate your own HTML version of the required page with the following commands, assuming pandoc is installed on your system:

cd kubernetes/docs

pandoc --standalone CoreV1Api.md > CoreV1Api.html

You can now open the generated CoreV1Api.html in your favourite web browser.

The parts we are interested in are the list_node and list_pod_for_all_namespaces sections. In the documentation you will see an example implementation as well as a table of parameters with a short description of how each is used. In the call we make to the list_pod_for_all_namespaces method, we use the field_selector parameter, which has the following description:

string: A selector to restrict the list of returned objects by their fields. Defaults to everything.

Still not very helpful...

But there is a "cheat" available to see an example of how this works: use the kubectl command with additional debug logging enabled, which will show the actual API calls:

kubectl -v=8 describe node node2

There is a lot of additional output, including the following line:

I0416 14:42:19.171806  204656 round_trippers.go:432] GET https://10.0.50.170:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dnode2%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&limit=500

Important: This, by the way, is generally how you can reverse engineer any kubectl command to find the correct Python client call. In the example above, a call is made to the /api/v1/pods endpoint. If we search for this exact path in the repository README, we find that the endpoint belongs to the CoreV1Api class (client).

In fact, from the kubectl implementation we can see that you would typically list all nodes and then describe each node. Any additional API calls will also be shown, and you can therefore adopt a similar flow if you need to.

What is important here is that we got an example of the parameter value to use for field_selector: spec.nodeName%3Dnode2%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded. Keep in mind that this string is URL-encoded: %3D decodes to =, %2C decodes to , and %21 decodes to !.

Therefore, spec.nodeName%3Dnode2%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded will translate to spec.nodeName=node2,status.phase!=Failed,status.phase!=Succeeded - and that is our parameter value in Python.
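
If you prefer not to decode these strings by hand, the Python standard library can do it for you:

from urllib.parse import unquote

print(unquote('spec.nodeName%3Dnode2%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded'))
# Output: spec.nodeName=node2,status.phase!=Failed,status.phase!=Succeeded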

Interpreting the Results

Back to the CoreV1Api.html page, we see, for example, that the return type of the list_node call is V1NodeList. At the time of writing, the return types were listed on the GitHub repository README, and when you search for V1NodeList it will point to a web page on GitHub which is, thankfully, formatted. The documentation table lists the following fields:

- api_version (str, optional): APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
- items (list[V1Node]): List of nodes
- kind (str, optional): Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the kubernetes.client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
- metadata (V1ListMeta, optional)

Note: As you can see, there can be a little bit of a nested structure in some of these return types and you may end up referring to several pages.
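
To make this concrete for the list_node call, the sketch below (not the exact pykles implementation) walks the V1NodeList and reads the capacity and allocatable values per node. Note that capacity and allocatable are plain dictionaries of quantity strings, which is exactly why the unit conversion discussed further down is needed:

def get_node_capacity_sketch()->dict:
    v1 = get_v1_client()
    nodes = dict()
    response = v1.list_node()                   # V1NodeList
    for node in response.items:                 # V1Node
        node_status = node.status               # V1NodeStatus
        nodes[node.metadata.name] = {
            'cpu_capacity': node_status.capacity['cpu'],        # a quantity string, e.g. '2'
            'cpu_allocatable': node_status.allocatable['cpu'],
            'ram_capacity': node_status.capacity['memory'],     # a quantity string, e.g. '8148660Ki'
            'ram_allocatable': node_status.allocatable['memory'],
        }
    return nodes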

In the source code I tried to link to all the relevant documentation:

        # ... snipped ...
        for pod_data in response.items:                         # https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1PodList.md
            pod_metadata = pod_data.metadata                    # https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1ObjectMeta.md
            pod_spec = pod_data.spec                            # https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1PodSpec.md
            pod_status = pod_data.status                        # https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1PodStatus.md
            for container_data in pod_spec.containers:          # https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1Container.md
                container_resources = container_data.resources  # https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1ResourceRequirements.md
        # ... snipped ...

At least there is enough detail to allow you to extract the required information and build up your own data structures, for example (building on the previous code snippet):

                # ... snipped ...
                limits = container_resources.limits             # https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
                logger.debug('containerName={}    limits={}'.format(container_data.name, limits))
                if limits is not None:
                    if 'cpu' in limits:
                        cpu_commitment += kubernetes_unit_conversion(value=limits['cpu'])
                    if 'memory' in limits:
                        ram_commitment += kubernetes_unit_conversion(value=limits['memory'])
                # ... snipped ...
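
The requests side follows exactly the same pattern, just reading container_resources.requests instead of limits. The cpu_requests and ram_requests accumulator names below are illustrative and not necessarily those used in pykles:

                # ... sketch of the analogous requests handling ...
                requests = container_resources.requests
                if requests is not None:
                    if 'cpu' in requests:
                        cpu_requests += kubernetes_unit_conversion(value=requests['cpu'])
                    if 'memory' in requests:
                        ram_requests += kubernetes_unit_conversion(value=requests['memory'])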

However, this may not be the end of your woes... Note the additional URL reference to Resource Management for Pods and Containers, which is important for understanding how the data will be presented back.

Enter our helper function:

def kubernetes_unit_conversion(value: str)->float:
    """
        References:
            * https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/
            * https://physics.nist.gov/cuu/Units/binary.html
    """
    result = 0
    if value.endswith('m'):     # milli, e.g. CPU '750m'
        result = _extract_number(value=value) / 1000
    elif value.endswith('k'):   # kilo (decimal SI)
        result = _extract_number(value=value) * 1000
    elif value.endswith('M'):
        result = _extract_number(value=value) * 1000000
    elif value.endswith('G'):
        result = _extract_number(value=value) * 1000000000
    elif value.endswith('T'):
        result = _extract_number(value=value) * 1000000000000
    elif value.endswith('P'):
        result = _extract_number(value=value) * 1000000000000000
    elif value.endswith('Mi'):  # binary (IEC) suffixes
        result = _extract_number(value=value) * (1024*1024)
    elif value.endswith('Gi'):
        result = _extract_number(value=value) * (1024*1024*1024)
    elif value.endswith('Ti'):
        result = _extract_number(value=value) * (1024*1024*1024*1024)
    elif value.endswith('Pi'):
        result = _extract_number(value=value) * (1024*1024*1024*1024*1024)
    else:
        result = _extract_number(value=value)
    return float(result)
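
The _extract_number() helper is not shown in this post. A minimal guess at such a helper (my own sketch, not the repository's implementation) simply strips the unit suffix and parses the remaining number, which is enough to sanity-check the conversion function above:

def _extract_number(value: str)->float:
    # Keep only digits and the decimal point: '750m' -> 750.0, '128Mi' -> 128.0
    return float(''.join(c for c in value if c.isdigit() or c == '.'))

# With that in place:
#   kubernetes_unit_conversion(value='750m')  -> 0.75
#   kubernetes_unit_conversion(value='128Mi') -> 134217728.0
#   kubernetes_unit_conversion(value='1Gi')   -> 1073741824.0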

Finally, we have all the pieces together...

The rest is pretty much standard Python stuff that extracts the data and passes it back to the calling function, which ultimately returns the JSON values. Perhaps I will discuss some more of that in a later blog post, but for now, I'm going to stop here.

But what about integrating with third-party APIs (not part of the official Kubernetes distribution)?

The Kubernetes Python Client also allows you to integrate with API extensions (see Extend the Kubernetes API with CustomResourceDefinitions).

The Custom Objects API is a good place to start. Look at this Stackoverflow Thread for an example problem and solution.

The third party providing the CRD resources may or may not have better documentation - you will be at their mercy - but as long as you can interact with the CRD via kubectl, you will be able to reverse engineer the calls in Python.
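
As a hedged sketch (the group, version and plural values below are placeholders for whatever CRD you are working with, not a real example from this project), listing custom resources with the CustomObjectsApi typically looks like this:

from kubernetes import client, config

config.load_incluster_config()
custom_api = client.CustomObjectsApi()

# Placeholder CRD coordinates - replace with the group/version/plural of your CRD
response = custom_api.list_cluster_custom_object(
    group='example.com',
    version='v1',
    plural='widgets'
)
for item in response['items']:
    print(item['metadata']['name'])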

Conclusion and Final Thoughts

This was a rather lengthy look at how you can interact with the Kubernetes API from within the cluster using the official Kubernetes Python Client.

It works great, but the documentation is painful and you sometimes need "hacky" ways to find the correct use of certain parameters. At least all of this is possible, with the biggest challenge being knowing how to get to it all - this took me a fair amount of time, so I really hope that if you are reading this, I was able to save you some time!

Please remember that the code examples are just that: examples. None of the code I showed should be considered production ready, and the permissions in particular were deliberately left wide open to enable you to experiment in a lab environment. Security and RBAC is a topic on its own - perhaps for another day.