Exploring Kubernetes from within Kubernetes... with Python - PART 1¶
Part 2 is available here
So my blog posts will obviously not be daily, as I start to delve into deeper, more technical topics.
Today I'm looking at using Python and integrating with the Kubernetes API, in a two-part post. The first part will just introduce the basics - kind of setting the scene. Part two will be the actual Python solution.
To make this practical, I chose a topic for which there may be a number of better alternatives, but which is still useful for gaining an understanding of some of these mechanisms.
I want to know one simple thing: Do I have enough physical resources (CPU and RAM) to handle all my committed deployments?
For this post, I also assume the reader is familiar with the basics of Kubernetes. Feel free to refer to this blog entry, which shows how I implement my lab environments and which the commands in this blog post are also based on.
Context¶
Certain services, like AWS EKS, provide out-of-the-box solutions to automatically scale the number of nodes. The scaling metrics can be tweaked, but for the most part it all just happens automatically.
However... sometimes you run Kubernetes on physical tin or in a cloud environment with financial constraints, and you have to ensure that the deployed applications do not consume more than the available resources. This can be tricky, but one of the fundamental problems to solve upfront is to know how much capacity is available and then measure what is actually being committed. Actual usage may be lower than the total commitment, but for now we focus on what is committed, as this serves as an upper boundary or guarantee that our users expect to be honored when their applications run on our cluster.
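As a small taste of where part two is heading, here is a minimal sketch of reading the allocatable capacity per node with the official kubernetes Python client (this is just an illustration that assumes the client is installed and a kubeconfig is available; the actual solution follows in part two):

from kubernetes import client, config

# Load credentials from the local kubeconfig (inside the cluster we would use
# config.load_incluster_config() instead)
config.load_kube_config()

v1 = client.CoreV1Api()
for node in v1.list_node().items:
    allocatable = node.status.allocatable
    print(f"{node.metadata.name}: cpu={allocatable['cpu']}, memory={allocatable['memory']}")

The allocatable figures are what the scheduler works against, and they are slightly lower than the raw capacity of each node.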
Background on resources¶
I am going to assume anyone reading this will at least understand the concepts around CPU and RAM - particularly the physical constraints - and have at least a basic understanding of how applications use CPU and RAM resources.
Over and above this basic understanding, there are also some important principles to take note of:
| Principle | Description |
|---|---|
| Resources are finite | As mentioned, in our context of physical hardware, both CPU and RAM resources are finite - we cannot use more than what is available |
| Swap is not an option | There is much debate around using swap within the Kubernetes ecosystem, but for the purpose of this blog we assume no swap mechanism is being used (standard for the bulk of use cases) |
| Deployments must impose limits | This is a best practice principle. Refer to the Kubernetes documentation for the gory detail. |
If we look at our Python Demonstration/Reference Application, the following section of the Kubernetes manifest is important:
resources:
  requests:
    memory: "128Mi"
    cpu: "250m"
  limits:
    memory: "256Mi"
    cpu: "500m"
The specifics of what this means, and technically how it is implemented, can be read in this section of the Kubernetes documentation.
What is important is that although you could get away with over-subscribing the CPU (within reasonable limits), memory (RAM) is far less forgiving. Without any swap available, should all your pods start to consume too much of the available RAM, Kubernetes will start to evict pods. The technical detail of this subject and how it is implemented is vast, but feel free to refer to the Kubernetes documentation dealing with Scheduling, Preemption and Eviction.
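To make the commitment arithmetic concrete, here is a minimal sketch that converts the request values from the manifest above into plain numbers and totals them for a given replica count (the parsing helpers are my own simplification and only handle the m and Mi suffixes used here):

def parse_cpu(value: str) -> float:
    # Convert a Kubernetes CPU quantity like "250m" (millicores) to cores
    return float(value[:-1]) / 1000 if value.endswith("m") else float(value)

def parse_memory_mi(value: str) -> int:
    # Convert a memory quantity like "128Mi" to MiB (only the Mi suffix is handled)
    return int(value[:-2])

requests = {"cpu": "250m", "memory": "128Mi"}  # per pod, from the manifest above
replicas = 2

total_cpu = parse_cpu(requests["cpu"]) * replicas
total_mem = parse_memory_mi(requests["memory"]) * replicas
print(f"Committed by this deployment: {total_cpu} CPU cores and {total_mem}Mi of RAM")

With the two replicas of the demo app this works out to 0.5 CPU cores and 256Mi of RAM in requests, before any system pods are counted.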
Setting up our playground¶
I use a local lab environment to run all the tests. Please have a look at this blog entry for details.
Note: Since my last Kubernetes technical post I have updated the demo application to include a memory monster :-D
The sample application repository is still available on GitHub. An image of the latest build is on Docker Hub under the 0.0.2 tag.
My lab environment has the following IP addresses:
- 10.0.50.170
- 10.0.50.222
- 10.0.50.72
To deploy the demo app, create a test namespace and deploy the manifest:
$ kubectl create namespace test
$ kubectl apply -f https://raw.githubusercontent.com/nicc777/pyk8sdemo-app/main/kubernetes_manifests/troubleshooting-app.yaml -n test
$ kubectl get all -n test
NAME READY STATUS RESTARTS AGE
pod/svclb-flask-demo-app-service-xhg4t 1/1 Running 0 16s
pod/svclb-flask-demo-app-service-tvzvl 1/1 Running 0 16s
pod/svclb-flask-demo-app-service-fz8mm 1/1 Running 0 16s
pod/flask-demo-app-deployment-7b967d6bc8-xnhrr 1/1 Running 0 16s
pod/flask-demo-app-deployment-7b967d6bc8-c6427 1/1 Running 0 16s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/flask-demo-app-service LoadBalancer 10.43.164.224 10.0.50.170,10.0.50.222,10.0.50.72 8880:32670/TCP 16s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/svclb-flask-demo-app-service 3 3 3 3 3 <none> 16s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/flask-demo-app-deployment 2/2 2 2 16s
NAME DESIRED CURRENT READY AGE
replicaset.apps/flask-demo-app-deployment-7b967d6bc8 2 2 2 16s
Scaling up¶
To increase the number of pods we have running, issue the following command:
kubectl scale deployment.apps/flask-demo-app-deployment --replicas=6 -n test
You should now see a total of 6 demo app pods running.
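As a quick back-of-the-envelope check of what this scaling step commits (using the request and limit values from the manifest; the figures reported by the nodes will also include system pods):

replicas = 6
request_mi, limit_mi = 128, 256  # per pod memory values from the manifest

print(f"requests: {replicas * request_mi}Mi, limits: {replicas * limit_mi}Mi")
# -> requests: 768Mi, limits: 1536Mi, spread across the nodes by the scheduler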
Load Testing Basics¶
For testing, we are going to create a little locust test suite.
In the Python test application project, there is a directory called test/locust which contains our locust tests. For this test I opted not to use a load balancer, but rather to reference the node endpoints directly from locust. This is perhaps not very portable, but it is sufficient for our tests in this scenario.
In order to use these test files, export the node IP addresses as HOST1, HOST2 and HOST3, for example:
export HOST1=10.0.50.170
export HOST2=10.0.50.222
export HOST3=10.0.50.72
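The actual test files live in the repository, but to illustrate the approach, here is a minimal sketch of what such a locust test could look like (my own approximation, not the exact file from the repo): it reads the three node IPs from the environment and sends requests to a random node on the service port 8880:

import os
import random

from locust import HttpUser, task, between

# Node IP addresses exported as environment variables (see the export commands above)
HOSTS = [os.environ[name] for name in ("HOST1", "HOST2", "HOST3")]

class DemoAppUser(HttpUser):
    wait_time = between(1, 3)
    host = "http://placeholder"  # locust requires a host; we use absolute URLs below

    @task
    def home_page(self):
        node = random.choice(HOSTS)
        # Hit a random node directly; name= groups all requests under one stats entry
        self.client.get(f"http://{node}:8880/", name="/")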
You can now run the locust test with a moderate number of users (for example 12) to see that each of the nodes is actually used:
locust -f basic_home_page_test.py -u 12 -r 3 --autostart
When you visit http://localhost:8089/ you should see the test already running.
To watch how much memory is committed on each node of the cluster, we can use the following command:
kubectl describe node/node1 | grep memory | tail -1 && kubectl describe node/node2 | grep memory | tail -1 && kubectl describe node/node3 | grep memory | tail -1
The output should look something like this:
memory 396Mi (4%) 682Mi (8%)
memory 256Mi (3%) 512Mi (6%)
memory 256Mi (3%) 512Mi (6%)
This shows the requested memory and memory limits as per our deployments. We are still well below the physical limits of our system.
Similarly, we can easily get the actual memory stats directly from the nodes with the following command:
multipass exec node1 -- free -m | grep "Mem:" && multipass exec node2 -- free -m | grep "Mem:" && multipass exec node3 -- free -m | grep "Mem:"
And the output should now look something like this:
Mem: 7957 917 4178 1 2861 6742
Mem: 7957 506 5055 1 2395 7155
Mem: 7957 477 5222 1 2256 7188
The 4th and last columns are of interest: free and available RAM.
To watch this number over time, use the following command:
watch -n3 "multipass exec node1 -- free -m | grep \"Mem:\" && multipass exec node2 -- free -m | grep \"Mem:\" && multipass exec node3 -- free -m | grep \"Mem:\""
Now let's run a basic memory-intensive test:
locust -f basic_hunger_test.py -u 12 -r 3 --autostart
After a couple of minutes, my physical memory numbers look like this:
Mem: 7957 1091 3999 1 2866 6569
Mem: 7957 659 4897 1 2400 7001
Mem: 7957 637 5058 1 2262 7027
We can see that more memory is being consumed. However, the maximum commitment is still unchanged:
memory 396Mi (4%) 682Mi (8%)
memory 256Mi (3%) 512Mi (6%)
memory 256Mi (3%) 512Mi (6%)
After some time, I stopped the test with the following statistics:
Name # reqs # fails | Avg Min Max Median | req/s failures/s
----------------------------------------------------------------------------------------------------------------------------------------------------------------
GET / 714 0(0.00%) | 3 2 10 3 | 1.93 0.00
GET /hungry 746 0(0.00%) | 50 21 1007 33 | 2.02 0.00
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregated 1460 0(0.00%) | 27 2 1007 23 | 3.95 0.00
Response time percentiles (approximated)
Type Name 50% 66% 75% 80% 90% 95% 98% 99% 99.9% 99.99% 100% # reqs
--------|--------------------------------------------------------------------------------|---------|------|------|------|------|------|------|------|------|------|------|------|
GET / 3 3 3 3 4 4 7 8 11 11 11 714
GET /hungry 33 36 39 43 86 140 320 380 1000 1000 1000 746
--------|--------------------------------------------------------------------------------|---------|------|------|------|------|------|------|------|------|------|------|------|
None Aggregated 23 29 33 34 43 90 160 320 800 1000 1000 1460
Concluding part 1¶
In this first part I looked at some basics of Kubernetes resource management and how we can measure and test it using existing tools.
Part two, which will hopefully follow in a week or so, will look at how we create a Python based pod that monitors these statistics for us in a more summarized way across all our pods and namespaces.
Tags¶
kubernetes, python, resources, api, locust