Creating Scalable Applications with Kubernetes

📘 Chapter 2: Horizontal & Vertical Scaling in Kubernetes

🌐 Introduction

In the age of microservices and cloud-native design, the ability to scale efficiently is fundamental to application success. Kubernetes provides robust, built-in mechanisms to automatically adjust workloads based on demand.

This chapter focuses on two critical methods:

  • Horizontal Pod Autoscaling (HPA) – scaling out by adding/removing pods
  • Vertical Pod Autoscaling (VPA) – scaling up/down by changing pod resources

We'll walk through when to use each, how to configure them, and common pitfalls with real-world code and examples.


🧱 Section 1: Understanding the Difference

| Aspect | Horizontal Scaling | Vertical Scaling |
| --- | --- | --- |
| What it does | Adds/removes pods | Adjusts CPU/memory per pod |
| Resource adjusted | Pod count | Resource allocation (requests/limits) |
| Use case | Handling increased traffic | CPU/memory-heavy apps (e.g., ML inference) |
| Component used | HorizontalPodAutoscaler | VerticalPodAutoscaler |
| Downtime | No (rolling scale) | Yes (pod restarts needed) |
| Best for | Stateless apps | Memory-bound apps, legacy workloads |


🛠️ Section 2: Horizontal Pod Autoscaling (HPA)

HPA automatically adjusts the number of replicas in a Deployment, ReplicaSet, or StatefulSet based on CPU, memory, or custom metrics.

🔧 Prerequisites

  • Kubernetes ≥ 1.23 (for the stable autoscaling/v2 API used below)
  • Metrics Server installed in the cluster (install command below)
  • Pods must define resources.requests.cpu
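
If Metrics Server is not already running, it can be installed with the project's published manifest (a standard approach; check the metrics-server releases page for version compatibility with your cluster):

```bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```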

Step-by-Step HPA Example (CPU-Based)

Step 1: Deploy a Sample Application

```bash
kubectl create deployment php-apache --image=registry.k8s.io/hpa-example
kubectl expose deployment php-apache --port=80 --type=LoadBalancer
```

Step 2: Apply CPU Resource Requests

```bash
kubectl patch deployment php-apache \
  --patch '{"spec": {"template": {"spec": {"containers": [{"name": "php-apache", "resources": {"requests": {"cpu": "200m"}}}]}}}}'
```
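
Alternatively, the request can be declared directly in the Deployment manifest rather than patched in afterwards. A minimal excerpt (the surrounding Deployment fields are omitted):

```yaml
# Pod template excerpt from the php-apache Deployment
spec:
  template:
    spec:
      containers:
        - name: php-apache
          image: registry.k8s.io/hpa-example
          resources:
            requests:
              cpu: 200m        # HPA computes utilization against this request
```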

Step 3: Create HPA Resource

```bash
kubectl autoscale deployment php-apache \
  --cpu-percent=50 \
  --min=1 \
  --max=10
```

Step 4: Load Test & Monitor

```bash
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh

# Inside the busybox shell:
while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done
```

In another terminal, watch the autoscaler react:

```bash
kubectl get hpa
kubectl get pods -w
```


📊 HPA YAML Example

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```
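
Assuming the manifest is saved as hpa.yaml (the filename is arbitrary), apply it and confirm the autoscaler sees its target:

```bash
kubectl apply -f hpa.yaml
kubectl get hpa php-apache
```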


🧪 Section 3: Custom Metrics for HPA

You can use custom metrics like requests per second, queue length, or business KPIs.

Tools Required:

  • Prometheus Adapter
  • Custom Metrics API

Example Metric:

```yaml
- type: Pods
  pods:
    metric:
      name: queue_messages_ready
    target:
      type: AverageValue
      averageValue: "30"
```
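
For context, here is a minimal sketch of how that metric block fits into a complete manifest. The queue-worker Deployment name is hypothetical, and queue_messages_ready is assumed to be exposed through the Prometheus Adapter:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker            # hypothetical Deployment
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: queue_messages_ready   # must be served by the custom metrics API
        target:
          type: AverageValue
          averageValue: "30"           # target ~30 messages per pod
```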


🧠 Section 4: Vertical Pod Autoscaling (VPA)

VPA automatically adjusts the CPU and memory requests of containers based on usage.

🔧 VPA Modes

| Mode | Description |
| --- | --- |
| Off | Only provides recommendations (does not act) |
| Initial | Applies recommendations only at pod creation |
| Auto | Continuously adjusts resources (causes restarts) |


Installing VPA

On GKE, VPA is built into the platform and can be enabled per cluster (the cluster name is a placeholder):

```bash
gcloud container clusters update my-cluster --enable-vertical-pod-autoscaling
```

On other clusters, install the components from the vertical-pod-autoscaler directory of the kubernetes/autoscaler GitHub repo.
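
One documented path is to run the installer script from that repo (a sketch; check the repo's README for the current procedure):

```bash
# Install VPA components from the kubernetes/autoscaler repo
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
```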


🛠️ VPA YAML Example

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
```
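
Once recommendations have been collected you can inspect them with `kubectl describe vpa my-app-vpa`. You can also bound what the updater may set via a resourcePolicy; the limits below are illustrative:

```yaml
# Same spec as above, extended with illustrative bounds
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"     # applies to all containers in the pod
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```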


🔄 Section 5: HPA + VPA Together?

While HPA and VPA can technically coexist (for example, one acting on memory while the other acts on CPU), they conflict when both react to the same signal. Best practices (see the sketch after this list):

  • Use HPA for horizontal scaling based on CPU
  • Use VPA in recommendation or initial mode for startup sizing
  • Avoid both managing the same resource (e.g., CPU)
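
A minimal sketch of the safe combination: keep the CPU-based HPA from Section 2 in charge of replica count, and run VPA purely as a recommender:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: php-apache-vpa         # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  updatePolicy:
    updateMode: "Off"          # recommendations only; HPA keeps control of scaling
```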

🧰 Section 6: Cluster Autoscaler

If your pods can't be scheduled due to a lack of node resources, Cluster Autoscaler helps by:

  • Adding nodes when pending pods can't fit
  • Removing underutilized nodes

It works with cloud providers like:

  • AWS EKS (with ASG integration)
  • GCP GKE
  • Azure AKS
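
As an illustration, on GKE node-pool autoscaling is enabled with a gcloud flag (cluster and pool names are placeholders; EKS and AKS have analogous mechanisms):

```bash
gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --node-pool default-pool \
  --min-nodes 1 \
  --max-nodes 5
```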

🧠 Best Practices Summary

| Tip | Applies To |
| --- | --- |
| Define CPU/memory requests for all pods | HPA/VPA |
| Use a readinessProbe to protect scaling logic (sketch below) | HPA |
| Avoid HPA and VPA managing the same metric | Both |
| Load test your application to find thresholds | HPA |
| Use custom metrics for domain-specific autoscaling | HPA |
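
For the readinessProbe tip, a minimal sketch; the /healthz path and port are assumptions about your application:

```yaml
# Container excerpt: unready pods are removed from Service endpoints,
# so traffic isn't routed to replicas that are still starting up
containers:
  - name: my-app               # hypothetical container
    image: my-app:latest       # hypothetical image
    readinessProbe:
      httpGet:
        path: /healthz         # assumed health endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
```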


Summary

Scaling in Kubernetes isn’t a one-size-fits-all solution. Horizontal and vertical autoscaling offer powerful tools to respond to changing load and optimize resource usage.

Key takeaways:


  • HPA scales pods based on real-time metrics like CPU
  • VPA adjusts resource requests and can restart pods
  • Both autoscalers can enhance performance if used wisely
  • Custom metrics unlock smarter scaling logic
  • Cluster autoscaler ensures infrastructure matches workload demand


FAQs


❓1. What makes Kubernetes ideal for building scalable applications?

Answer:
Kubernetes automates deployment, scaling, and management of containerized applications. It offers built-in features like horizontal pod autoscaling, load balancing, and self-healing, allowing applications to handle traffic spikes and system failures efficiently.

❓2. What is the difference between horizontal and vertical scaling in Kubernetes?

Answer:

  • Horizontal scaling increases or decreases the number of pod replicas.
  • Vertical scaling adjusts the resources (CPU, memory) allocated to a pod.

Kubernetes primarily supports horizontal scaling through the Horizontal Pod Autoscaler (HPA).

❓3. How does the Horizontal Pod Autoscaler (HPA) work?

Answer:
HPA monitors metrics like CPU or memory usage and automatically adjusts the number of pods in a deployment to meet demand. It uses the Kubernetes Metrics Server or custom metrics APIs.

❓4. Can Kubernetes scale the number of nodes in a cluster?

Answer:
Yes. The Cluster Autoscaler automatically adjusts the number of nodes in a cluster based on resource needs, ensuring pods always have enough room to run.

❓5. What’s the role of Ingress in scalable applications?

Answer:
Ingress manages external access to services within the cluster. It provides SSL termination, routing rules, and load balancing, enabling scalable and secure traffic management.

❓6. How do I manage application rollouts during scaling?

Answer:
Use Kubernetes Deployments to perform rolling updates with zero downtime. You can also perform canary or blue/green deployments using tools like Argo Rollouts or Flagger.
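
For example, with a standard Deployment (my-app is a placeholder) you can watch a rollout and revert it if it misbehaves:

```bash
kubectl rollout status deployment/my-app   # follow the rolling update
kubectl rollout undo deployment/my-app     # roll back to the previous revision
```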

❓7. Is Kubernetes suitable for both stateless and stateful applications?

Answer:
Yes. Stateless apps are easier to scale and deploy. For stateful apps, Kubernetes provides StatefulSets, persistent volumes, and storage classes to ensure data consistency across pod restarts or migrations.

❓8. How can I monitor the scalability of my Kubernetes applications?

Answer:
Use tools like Prometheus for metrics, Grafana for dashboards, ELK stack or Loki for logs, and Kubernetes probes (liveness/readiness) to track application health and scalability trends.

❓9. Can I run scalable Kubernetes apps on multiple clouds?

Answer:
Yes. Kubernetes is cloud-agnostic. You can deploy apps on any provider (AWS, Azure, GCP) or use multi-cloud/hybrid tools like Rancher, Anthos, or KubeFed for federated scaling across environments.

❓10. What are some common mistakes when trying to scale apps with Kubernetes?

Answer:

  • Not setting proper resource limits and requests
  • Overlooking pod disruption budgets during scaling
  • Misconfiguring autoscalers or probes
  • Ignoring log/metrics aggregation for troubleshooting
  • Running all workloads in a single namespace without isolation