Creating Scalable Applications with Kubernetes

📘 Chapter 2: Horizontal & Vertical Scaling in Kubernetes

🌐 Introduction

In the age of microservices and cloud-native design, the ability to scale efficiently is fundamental to application success. Kubernetes provides robust, built-in mechanisms to automatically adjust workloads based on demand.

This chapter focuses on two critical methods:

  • Horizontal Pod Autoscaling (HPA) – scaling out by adding/removing pods
  • Vertical Pod Autoscaling (VPA) – scaling up/down by changing pod resources

We'll walk through when to use each, how to configure them, and common pitfalls with real-world code and examples.


🧱 Section 1: Understanding the Difference

| Aspect | Horizontal Scaling | Vertical Scaling |
| --- | --- | --- |
| What it does | Adds/removes pods | Adjusts CPU/memory per pod |
| Resource adjusted | Pod count | Resource allocation (requests/limits) |
| Use case | Handling increased traffic | CPU/memory-heavy apps (e.g., ML inference) |
| Component used | HorizontalPodAutoscaler | VerticalPodAutoscaler |
| Downtime | No (rolling scale) | Yes (pod restarts needed) |
| Best for | Stateless apps | Memory-bound apps, legacy workloads |


🛠️ Section 2: Horizontal Pod Autoscaling (HPA)

HPA automatically adjusts the number of replicas in a Deployment, ReplicaSet, or StatefulSet based on CPU, memory, or custom metrics.

🔧 Prerequisites

  • Kubernetes ≥ 1.23 (for the stable autoscaling/v2 API used below)
  • Metrics Server installed in the cluster (install command below)
  • Pods must define resources.requests.cpu
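
If Metrics Server is not already running, it can be installed with the project's published manifest (a standard approach; check the metrics-server releases page for version compatibility with your cluster):

```bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```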

Step-by-Step HPA Example (CPU-Based)

Step 1: Deploy a Sample Application

```bash
kubectl create deployment php-apache --image=registry.k8s.io/hpa-example
kubectl expose deployment php-apache --port=80 --type=LoadBalancer
```

Step 2: Apply CPU Resource Requests

```bash
kubectl patch deployment php-apache \
  --patch '{"spec": {"template": {"spec": {"containers": [{"name": "php-apache", "resources": {"requests": {"cpu": "200m"}}}]}}}}'
```
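
Alternatively, the request can be declared directly in the Deployment manifest rather than patched in afterwards. A minimal excerpt (the surrounding Deployment fields are omitted):

```yaml
# Pod template excerpt from the php-apache Deployment
spec:
  template:
    spec:
      containers:
        - name: php-apache
          image: registry.k8s.io/hpa-example
          resources:
            requests:
              cpu: 200m        # HPA computes utilization against this request
```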

Step 3: Create HPA Resource

```bash
kubectl autoscale deployment php-apache \
  --cpu-percent=50 \
  --min=1 \
  --max=10
```

Step 4: Load Test & Monitor

```bash
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh

# Inside the busybox shell:
while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done
```

In another terminal, watch the autoscaler react:

```bash
kubectl get hpa
kubectl get pods -w
```


📊 HPA YAML Example

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```
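
Assuming the manifest is saved as hpa.yaml (the filename is arbitrary), apply it and confirm the autoscaler sees its target:

```bash
kubectl apply -f hpa.yaml
kubectl get hpa php-apache
```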


🧪 Section 3: Custom Metrics for HPA

You can use custom metrics like requests per second, queue length, or business KPIs.

Tools Required:

  • Prometheus Adapter
  • Custom Metrics API

Example Metric:

```yaml
- type: Pods
  pods:
    metric:
      name: queue_messages_ready
    target:
      type: AverageValue
      averageValue: "30"
```
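
For context, here is a minimal sketch of how that metric block fits into a complete manifest. The queue-worker Deployment name is hypothetical, and queue_messages_ready is assumed to be exposed through the Prometheus Adapter:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker            # hypothetical Deployment
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: queue_messages_ready   # must be served by the custom metrics API
        target:
          type: AverageValue
          averageValue: "30"           # target ~30 messages per pod
```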


🧠 Section 4: Vertical Pod Autoscaling (VPA)

VPA automatically adjusts the CPU and memory requests of containers based on usage.

🔧 VPA Modes

| Mode | Description |
| --- | --- |
| Off | Only provides recommendations (does not act) |
| Initial | Applies recommendations only at pod creation |
| Auto | Continuously adjusts resources (causes restarts) |


Installing VPA

On GKE, VPA is built into the platform and can be enabled per cluster (the cluster name is a placeholder):

```bash
gcloud container clusters update my-cluster --enable-vertical-pod-autoscaling
```

On other clusters, install the components from the vertical-pod-autoscaler directory of the kubernetes/autoscaler GitHub repo.
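
One documented path is to run the installer script from that repo (a sketch; check the repo's README for the current procedure):

```bash
# Install VPA components from the kubernetes/autoscaler repo
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
```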


🛠️ VPA YAML Example

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
```
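
Once recommendations have been collected you can inspect them with `kubectl describe vpa my-app-vpa`. You can also bound what the updater may set via a resourcePolicy; the limits below are illustrative:

```yaml
# Same spec as above, extended with illustrative bounds
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"     # applies to all containers in the pod
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```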


🔄 Section 5: HPA + VPA Together?

While HPA and VPA can technically coexist (for example, one acting on memory while the other acts on CPU), they conflict when both react to the same signal. Best practices (see the sketch after this list):

  • Use HPA for horizontal scaling based on CPU
  • Use VPA in recommendation or initial mode for startup sizing
  • Avoid both managing the same resource (e.g., CPU)
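
A minimal sketch of the safe combination: keep the CPU-based HPA from Section 2 in charge of replica count, and run VPA purely as a recommender:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: php-apache-vpa         # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  updatePolicy:
    updateMode: "Off"          # recommendations only; HPA keeps control of scaling
```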

🧰 Section 6: Cluster Autoscaler

If your pods can't be scheduled due to a lack of node resources, Cluster Autoscaler helps by:

  • Adding nodes when pending pods can't fit
  • Removing underutilized nodes

It works with cloud providers like:

  • AWS EKS (with ASG integration)
  • GCP GKE
  • Azure AKS
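
As an illustration, on GKE node-pool autoscaling is enabled with a gcloud flag (cluster and pool names are placeholders; EKS and AKS have analogous mechanisms):

```bash
gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --node-pool default-pool \
  --min-nodes 1 \
  --max-nodes 5
```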

🧠 Best Practices Summary

| Tip | Applies To |
| --- | --- |
| Define CPU/memory requests for all pods | HPA/VPA |
| Use a readinessProbe to protect scaling logic (sketch below) | HPA |
| Avoid HPA and VPA managing the same metric | Both |
| Load test your application to find thresholds | HPA |
| Use custom metrics for domain-specific autoscaling | HPA |
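
For the readinessProbe tip, a minimal sketch; the /healthz path and port are assumptions about your application:

```yaml
# Container excerpt: unready pods are removed from Service endpoints,
# so traffic isn't routed to replicas that are still starting up
containers:
  - name: my-app               # hypothetical container
    image: my-app:latest       # hypothetical image
    readinessProbe:
      httpGet:
        path: /healthz         # assumed health endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
```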


Summary

Scaling in Kubernetes isn’t a one-size-fits-all solution. Horizontal and vertical autoscaling offer powerful tools to respond to changing load and optimize resource usage.

Key takeaways:


  • HPA scales pods based on real-time metrics like CPU
  • VPA adjusts resource requests and can restart pods
  • Both autoscalers can enhance performance if used wisely
  • Custom metrics unlock smarter scaling logic
  • Cluster autoscaler ensures infrastructure matches workload demand


FAQs


❓1. What makes Kubernetes ideal for building scalable applications?

Answer:
Kubernetes automates deployment, scaling, and management of containerized applications. It offers built-in features like horizontal pod autoscaling, load balancing, and self-healing, allowing applications to handle traffic spikes and system failures efficiently.

❓2. What is the difference between horizontal and vertical scaling in Kubernetes?

Answer:

  • Horizontal scaling increases or decreases the number of pod replicas.
  • Vertical scaling adjusts the resources (CPU, memory) allocated to a pod.

Kubernetes primarily supports horizontal scaling through the Horizontal Pod Autoscaler (HPA).

❓3. How does the Horizontal Pod Autoscaler (HPA) work?

Answer:
HPA monitors metrics like CPU or memory usage and automatically adjusts the number of pods in a deployment to meet demand. It uses the Kubernetes Metrics Server or custom metrics APIs.

❓4. Can Kubernetes scale the number of nodes in a cluster?

Answer:
Yes. The Cluster Autoscaler automatically adjusts the number of nodes in a cluster based on resource needs, ensuring pods always have enough room to run.

❓5. What’s the role of Ingress in scalable applications?

Answer:
Ingress manages external access to services within the cluster. It provides SSL termination, routing rules, and load balancing, enabling scalable and secure traffic management.

❓6. How do I manage application rollouts during scaling?

Answer:
Use Kubernetes Deployments to perform rolling updates with zero downtime. You can also perform canary or blue/green deployments using tools like Argo Rollouts or Flagger.
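
For example, with a standard Deployment (my-app is a placeholder) you can watch a rollout and revert it if it misbehaves:

```bash
kubectl rollout status deployment/my-app   # follow the rolling update
kubectl rollout undo deployment/my-app     # roll back to the previous revision
```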

❓7. Is Kubernetes suitable for both stateless and stateful applications?

Answer:
Yes. Stateless apps are easier to scale and deploy. For stateful apps, Kubernetes provides StatefulSets, persistent volumes, and storage classes to ensure data consistency across pod restarts or migrations.

❓8. How can I monitor the scalability of my Kubernetes applications?

Answer:
Use tools like Prometheus for metrics, Grafana for dashboards, ELK stack or Loki for logs, and Kubernetes probes (liveness/readiness) to track application health and scalability trends.

❓9. Can I run scalable Kubernetes apps on multiple clouds?

Answer:
Yes. Kubernetes is cloud-agnostic. You can deploy apps on any provider (AWS, Azure, GCP) or use multi-cloud/hybrid tools like Rancher, Anthos, or KubeFed for federated scaling across environments.

❓10. What are some common mistakes when trying to scale apps with Kubernetes?

Answer:

  • Not setting proper resource limits and requests
  • Overlooking pod disruption budgets during scaling
  • Misconfiguring autoscalers or probes
  • Ignoring log/metrics aggregation for troubleshooting
  • Running all workloads in a single namespace without isolation