🌐 Introduction
In the age of microservices and cloud-native design, the ability to scale efficiently is fundamental to application success. Kubernetes provides robust, built-in mechanisms to automatically adjust workloads based on demand.
This chapter focuses on two critical methods:
- Horizontal Pod Autoscaling (HPA): changing the number of pod replicas
- Vertical Pod Autoscaling (VPA): changing the CPU and memory allocated to each pod
We'll walk through when to use each, how to configure them, and common pitfalls, with real-world code and examples.
🧱 Section 1: Understanding the Difference

| Aspect | Horizontal Scaling | Vertical Scaling |
| --- | --- | --- |
| What it does | Adds/removes pods | Adjusts CPU/memory per pod |
| Resource adjusted | Pod count | Resource allocation (requests/limits) |
| Use case | Handling increased traffic | CPU/memory-heavy apps (e.g., ML inference) |
| Component used | HorizontalPodAutoscaler | VerticalPodAutoscaler |
| Downtime | No (rolling scale) | Yes (pod restarts needed) |
| Best for | Stateless apps | Memory-bound apps, legacy workloads |
🛠️ Section 2: Horizontal Pod Autoscaling (HPA)
HPA automatically adjusts the number of replicas in a Deployment or ReplicaSet based on CPU, memory, or custom metrics.
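Under the hood, the HPA controller periodically (every 15 seconds by default) compares the observed metric against the target and computes:

desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue)

For example, 2 replicas averaging 90% CPU utilization against a 50% target scale out to ceil(2 × 90 / 50) = 4 replicas.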
🔧 Prerequisites
- A running Kubernetes cluster (v1.23+ for the stable autoscaling/v2 API)
- The Metrics Server installed, so HPA can read CPU/memory usage
- CPU/memory requests set on the target pods (utilization is calculated as a percentage of the request)
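If your cluster doesn't already run the Metrics Server (many managed offerings ship it by default), the upstream release manifest is the usual install path:
```bash
# Install the Metrics Server from the kubernetes-sigs release manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify that metrics are flowing
kubectl top pods
```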
✅ Step-by-Step HPA Example (CPU-Based)
Step 1: Deploy a Sample Application
```bash
kubectl create deployment php-apache --image=k8s.gcr.io/hpa-example
kubectl expose deployment php-apache --port=80 --type=LoadBalancer
```
Step 2: Apply CPU Resource Requests
```bash
kubectl patch deployment php-apache \
  --patch '{"spec": {"template": {"spec": {"containers": [{"name": "php-apache", "resources": {"requests": {"cpu": "200m"}}}]}}}}'
```
Step 3: Create HPA Resource
```bash
kubectl autoscale deployment php-apache \
  --cpu-percent=50 \
  --min=1 \
  --max=10
```
Step 4: Load Test & Monitor
```bash
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh

# Inside the busybox shell:
while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done
```
In a second terminal, watch the autoscaler react:
```bash
kubectl get hpa
kubectl get pods -w
```
📊 HPA YAML Example
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```
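To use the manifest form instead of kubectl autoscale, save it to a file (the filename below is arbitrary) and apply it:
```bash
kubectl apply -f php-apache-hpa.yaml
kubectl describe hpa php-apache   # shows current vs. target metrics and recent scaling events
```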
🧪 Section 3: Custom Metrics for HPA
You can scale on custom metrics like requests per second, queue length, or business KPIs.
Tools Required:
- A metrics source such as Prometheus
- An adapter (e.g., prometheus-adapter) that exposes those metrics through the custom metrics API (custom.metrics.k8s.io)
Example Metric:
```yaml
- type: Pods
  pods:
    metric:
      name: queue_messages_ready
    target:
      type: AverageValue
      averageValue: 30
```
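The fragment above belongs in the metrics list of an autoscaling/v2 HorizontalPodAutoscaler. A minimal complete manifest might look like this (the worker Deployment and its names are illustrative):
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker          # hypothetical Deployment exposing the metric
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: queue_messages_ready
      target:
        type: AverageValue
        averageValue: 30        # target average messages per pod
```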
🧠 Section 4: Vertical Pod Autoscaling (VPA)
VPA automatically adjusts the CPU and memory requests of containers based on usage.
🔧 VPA Modes
Mode |
Description |
Off |
Only provides
recommendations (does not act) |
Initial |
Applies
recommendations only at pod creation |
Auto |
Continuously adjusts
resources (causes restarts) |
✅ Installing VPA
On GKE, VPA is built into the platform and can be enabled on an existing cluster:
```bash
gcloud container clusters update <cluster-name> --enable-vertical-pod-autoscaling
```
On other clusters, install it from the kubernetes/autoscaler GitHub repo, typically by cloning the repo and running the vertical-pod-autoscaler/hack/vpa-up.sh script, or via community Helm charts.
🛠️ VPA YAML Example
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
```
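After applying the manifest, the recommender's suggestions appear in the object's status; kubectl describe is the quickest way to inspect them (the filename is arbitrary):
```bash
kubectl apply -f my-app-vpa.yaml
kubectl describe vpa my-app-vpa   # look for the Recommendation section (lower bound, target, upper bound)
```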
🔄 Section 5: HPA + VPA Together?
While HPA and VPA can technically co-exist (e.g., HPA scaling on CPU while VPA manages memory), they can conflict if both react to the same signal. Best practices:
- Never let HPA and VPA act on the same metric
- Pair HPA on CPU or custom metrics with VPA in "Off" (recommendation-only) or "Initial" mode, as sketched below
- Treat VPA's "Auto" mode with care in production, since it restarts pods to apply new requests
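A common pattern runs VPA purely as a recommender alongside an HPA; a minimal sketch, with illustrative names:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-recommender   # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"            # recommend only; never evict or restart pods
```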
🧰 Section 6: Cluster Autoscaler
If your workloads can't schedule due to a lack of node resources, the Cluster Autoscaler helps by:
- Adding nodes when pods are pending and unschedulable
- Removing underutilized nodes once their pods can fit elsewhere
It works with cloud providers like:
- AWS (EKS / Auto Scaling groups)
- Google Cloud (GKE)
- Azure (AKS)
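As a concrete example, on GKE node autoscaling is enabled per node pool (cluster and pool names below are placeholders):
```bash
gcloud container clusters update <cluster-name> \
  --enable-autoscaling \
  --min-nodes=1 --max-nodes=5 \
  --node-pool=default-pool
```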
🧠 Best Practices Summary

| Tip | Applies To |
| --- | --- |
| Define CPU/memory requests for all pods | HPA/VPA |
| Use readinessProbe to protect scaling logic | HPA |
| Avoid HPA+VPA managing the same metric | Both |
| Load test your application to find thresholds | HPA |
| Use custom metrics for domain-specific autoscaling | HPA |
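On the readinessProbe tip: pods that are not ready are excluded from Service endpoints, so freshly scaled-out pods don't receive traffic before they can serve it. A minimal probe sketch (path, port, and timings are illustrative):
```yaml
containers:
- name: php-apache
  image: k8s.gcr.io/hpa-example
  readinessProbe:
    httpGet:
      path: /            # hypothetical health endpoint
      port: 80
    initialDelaySeconds: 5
    periodSeconds: 10
```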
✅ Summary
Scaling in Kubernetes isn't a one-size-fits-all solution. Horizontal and vertical autoscaling offer powerful tools to respond to changing load and optimize resource usage.
Key takeaways:
- Use HPA to add or remove pods for stateless, traffic-driven workloads
- Use VPA to right-size CPU/memory requests, especially for memory-bound apps
- Don't let HPA and VPA manage the same metric
- Combine pod autoscaling with the Cluster Autoscaler so new pods always have nodes to land on
❓ Frequently Asked Questions
Q: Why is Kubernetes a good fit for building scalable applications?
Answer:
Kubernetes automates deployment, scaling, and management of containerized
applications. It offers built-in features like horizontal pod autoscaling,
load balancing, and self-healing, allowing applications to handle
traffic spikes and system failures efficiently.
Q: How does Horizontal Pod Autoscaling (HPA) work?
Answer:
HPA monitors metrics like CPU or memory usage and automatically adjusts the
number of pods in a deployment to meet demand. It uses the Kubernetes Metrics
Server or custom metrics APIs.
Q: Can Kubernetes scale the cluster itself, not just the pods?
Answer:
Yes. The Cluster Autoscaler automatically adjusts the number of nodes in
a cluster based on resource needs, ensuring pods always have enough room to
run.
Q: What role does Ingress play in scaling traffic?
Answer:
Ingress manages external access to services within the cluster. It provides SSL
termination, routing rules, and load balancing, enabling
scalable and secure traffic management.
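A minimal Ingress routing one host to a backing Service might look like this (host, names, and port are placeholders):
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress          # hypothetical name
spec:
  rules:
  - host: app.example.com       # placeholder host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app        # hypothetical Service
            port:
              number: 80
```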
Q: How do you update applications without downtime?
Answer:
Use Kubernetes Deployments to perform rolling updates with zero
downtime. You can also perform canary or blue/green deployments
using tools like Argo Rollouts or Flagger.
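For instance, a rolling update can be triggered and tracked with standard kubectl commands (deployment, container, and image names are placeholders):
```bash
kubectl set image deployment/my-app my-app=registry.example.com/my-app:v2
kubectl rollout status deployment/my-app
kubectl rollout undo deployment/my-app   # roll back if the new version misbehaves
```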
Q: Does it matter whether an application is stateless or stateful?
Answer:
Yes. Stateless apps are easier to scale and deploy. For stateful apps,
Kubernetes provides StatefulSets, persistent volumes, and storage
classes to ensure data consistency across pod restarts or migrations.
Q: How do you monitor application health and scaling behavior?
Answer:
Use tools like Prometheus for metrics, Grafana for dashboards, ELK
stack or Loki for logs, and Kubernetes probes
(liveness/readiness) to track application health and scalability trends.
Q: Is Kubernetes tied to a single cloud provider?
Answer:
Yes. Kubernetes is cloud-agnostic. You can deploy apps on any provider (AWS,
Azure, GCP) or use multi-cloud/hybrid tools like Rancher, Anthos,
or KubeFed for federated scaling across environments.