Deploying Containers with Kubernetes


✅ Chapter 5: Monitoring, Troubleshooting, and Best Practices

🔍 Introduction

Kubernetes is powerful, but with great power comes complexity. Once your applications are deployed, monitoring their health and troubleshooting issues efficiently becomes critical. Additionally, following best practices ensures your clusters remain secure, stable, and scalable.

In this chapter, you’ll learn:

  • Key methods to monitor Kubernetes clusters and workloads
  • Essential troubleshooting techniques
  • Best practices for production-grade Kubernetes deployments

By mastering these skills, you’ll be able to proactively maintain and secure your Kubernetes environments.


📋 Part 1: Monitoring Kubernetes Clusters and Workloads

Monitoring is not just optional—it's essential. Kubernetes environments are dynamic, and things can change rapidly.


📊 What to Monitor in Kubernetes?

| Component | What to Monitor |
|---|---|
| Nodes | CPU, memory usage, disk space, network health |
| Pods | Restarts, CPU/memory usage, readiness, liveness |
| Deployments | Replica counts, rollout status |
| Services | Response times, error rates |
| Cluster Events | Warnings, failures |
| API Server | Request throughput, latencies |
| etcd | Storage usage, leader election health |


🔹 Key Monitoring Tools

| Tool | Purpose |
|---|---|
| kubectl top | Basic resource usage metrics |
| Metrics Server | Cluster-wide CPU/memory aggregation |
| Prometheus | In-depth metrics collection |
| Grafana | Visualization and alerting |
| ELK Stack | Centralized logging (Elasticsearch, Logstash, Kibana) |
| kube-state-metrics | Kubernetes object status metrics |


📋 Installing Metrics Server (for kubectl top)

```bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```

Check node and pod metrics:

```bash
kubectl top nodes
kubectl top pods
```


📈 Setting Up Prometheus and Grafana

  • Prometheus scrapes metrics from Kubernetes objects
  • Grafana visualizes them

Deploy using Helm:

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack
```

Access Grafana dashboard:

```bash
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80
```

Visit http://localhost:3000.

Default credentials:

  • Username: admin
  • Password: prom-operator

🧩 Important Metrics to Monitor

| Metric | Why It Matters |
|---|---|
| Pod CPU/memory usage | Detect resource bottlenecks |
| Node CPU/memory usage | Capacity planning |
| Pod restarts | Application instability |
| Deployment availability | Ensure service uptime |
| API Server latency | Cluster responsiveness |
| etcd database size | Avoid storage exhaustion |


🛠️ Part 2: Troubleshooting Kubernetes Applications

Things go wrong; here’s how to troubleshoot them methodically.


🔎 Checking Pod Status

```bash
kubectl get pods
```

Look for unusual statuses like CrashLoopBackOff, Error, or Pending.


📜 Inspect Pod Events

```bash
kubectl describe pod <pod-name>
```

  • Events section shows scheduling issues, image pull errors, OOM kills, etc.
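You can also query events directly, without describing each Pod. As a sketch (standard kubectl flags; the Pod name is a placeholder):

```bash
# List events in the current namespace, oldest first
kubectl get events --sort-by=.metadata.creationTimestamp

# Show only events for one Pod (replace my-app-pod with your Pod's name)
kubectl get events --field-selector involvedObject.name=my-app-pod
```

This is often the fastest way to spot cluster-wide scheduling or image-pull problems at a glance.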

🛠️ Viewing Pod Logs

```bash
kubectl logs <pod-name>
```

For multi-container Pods:

```bash
kubectl logs <pod-name> -c <container-name>
```

Follow (stream) logs in real time:

```bash
kubectl logs -f <pod-name>
```


🛡️ Executing Commands Inside a Pod

```bash
kubectl exec -it <pod-name> -- /bin/sh
```

Useful for debugging container filesystem, environment, or manual probes.


🚥 Troubleshooting Deployments

Check rollout status:

```bash
kubectl rollout status deployment <deployment-name>
```

Undo rollout:

```bash
kubectl rollout undo deployment <deployment-name>
```

View revision history:

```bash
kubectl rollout history deployment <deployment-name>
```
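A plain undo returns to the previous revision; to jump further back, you can target a specific revision from the history (the revision number below is an example):

```bash
# Inspect the details of one revision
kubectl rollout history deployment <deployment-name> --revision=2

# Roll back to that exact revision instead of just the previous one
kubectl rollout undo deployment <deployment-name> --to-revision=2
```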


🔥 Common Problems and How to Fix Them

| Issue | Potential Cause | Solution |
|---|---|---|
| CrashLoopBackOff | App crash, bad config, missing dependency | Check logs and environment vars |
| ImagePullBackOff | Wrong image name or missing image | Verify image name, pull manually |
| Pods stuck in Pending | Resource limits exceeded | Check node resources, scheduling constraints |
| Network unreachable | Service misconfiguration | Check Service, DNS, NetworkPolicy |
| Readiness probe failures | Application not ready | Adjust readiness probe settings |
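For the last row, probe tuning happens in the container spec. A minimal sketch, assuming a hypothetical app that serves a health endpoint at `/healthz` on port 8080 (adjust both to your application):

```yaml
containers:
  - name: my-app
    image: my-app:1.0
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 5    # give the app time to start before probing
      periodSeconds: 10         # probe every 10 seconds
      failureThreshold: 3       # mark unready after 3 consecutive failures
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 15   # liveness usually starts later than readiness
      periodSeconds: 20
```

If readiness probes fail because the app simply starts slowly, raising `initialDelaySeconds` or `failureThreshold` is usually safer than removing the probe.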


📦 Part 3: Kubernetes Best Practices for Stability, Security, and Scalability

Building on a strong foundation ensures long-term success.


General Best Practices

  • Use Namespaces to separate environments (e.g., dev, staging, prod).
  • Tag resources properly with labels and annotations.
  • Prefer Deployments over manual Pod creation.
  • Always set resource requests and limits.
  • Use rolling updates for zero downtime deployments.
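Several of these points can be combined in a single manifest. A sketch with placeholder names, labels, and values:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: staging            # Namespaces separate environments
  labels:
    app: my-app
    environment: staging        # consistent labels aid selection and auditing
  annotations:
    owner: platform-team        # annotations carry non-identifying metadata
spec:
  replicas: 3
  strategy:
    type: RollingUpdate         # zero-downtime rollouts
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:1.0
          resources:            # always set requests and limits
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "250m"
              memory: "256Mi"
```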

Application Best Practices

| Practice | Why It Matters |
|---|---|
| Readiness and liveness probes | Ensures only healthy Pods serve traffic |
| Environment variables and ConfigMaps | Externalize configuration |
| Use Secrets for credentials | Avoid storing passwords in plain YAML |
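As a sketch of externalized configuration, a Pod can pull values from a ConfigMap and a Secret via `valueFrom`. All names and keys below are placeholders, and `app-secrets` is assumed to exist:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: "info"
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: my-app
      image: my-app:1.0
      env:
        - name: LOG_LEVEL
          valueFrom:
            configMapKeyRef:     # non-sensitive config from the ConfigMap
              name: app-config
              key: LOG_LEVEL
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:        # sensitive value from a Secret, not plain YAML
              name: app-secrets
              key: db-password
```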


Security Best Practices

  • Use RBAC (Role-Based Access Control) for users and services.
  • Limit access to cluster components and APIs.
  • Use Pod Security Admission (PodSecurityPolicies were removed in Kubernetes 1.25) or OPA Gatekeeper for workload restrictions.
  • Scan images regularly for vulnerabilities (e.g., Trivy, Clair).
  • Encrypt Secrets at rest.
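A minimal RBAC sketch for the first bullet: a namespaced read-only role bound to a single user (the namespace and user name are examples):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
  - apiGroups: [""]              # "" = the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
  - kind: User
    name: jane                   # example user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

Grant the narrowest verbs and resources that still let the user or service do its job.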

Resource Management Best Practices

  • Always define requests and limits in Pod specs:

```yaml
resources:
  requests:
    cpu: "100m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
```

  • Use Horizontal Pod Autoscaler (HPA):

```bash
kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=5
```
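The same autoscaler can be declared as a manifest using the `autoscaling/v2` API, which is easier to version-control than the imperative command (deployment name and bounds match the example above):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # scale when average CPU exceeds 50%
```

Note that the HPA needs the Metrics Server (installed earlier in this chapter) to read CPU usage.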


Networking Best Practices

  • Use NetworkPolicies to restrict traffic between Pods.
  • Prefer Ingress Controllers over exposing NodePorts for public apps.
  • Regularly audit network traffic patterns.
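A NetworkPolicy sketch for the first bullet, assuming hypothetical `frontend` and `backend` labels and a backend port of 8080:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: backend               # policy applies to backend Pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend      # only frontend Pods may connect
      ports:
        - protocol: TCP
          port: 8080
```

NetworkPolicies only take effect when the cluster's network plugin (e.g., Calico, Cilium) enforces them.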

Observability Best Practices

  • Centralize logging with ELK or Loki stack.
  • Use Prometheus and Grafana for proactive monitoring.
  • Set up alerts for critical conditions like high CPU, node disk fill, etc.

🚀 Summary: What You Learned in Chapter 5

  • Kubernetes needs continuous monitoring at cluster, node, and Pod levels
  • Effective troubleshooting involves examining pod status, events, and logs
  • Production-ready clusters follow strict best practices on configuration, security, and resource management
  • Proactive monitoring and alerting systems are essential to maintain uptime and reliability
  • Kubernetes is not “set and forget”—it demands lifecycle management and operational excellence




FAQs


✅ 1. What is Kubernetes, and how does it differ from Docker?

Answer: Docker is used to build and run containers, while Kubernetes is a container orchestration platform that manages the deployment, scaling, and operation of multiple containers across a cluster of machines.

✅ 2. Do I need to learn Docker before learning Kubernetes?

Answer: Yes, a basic understanding of Docker is essential since Kubernetes is designed to manage and orchestrate Docker (or OCI-compatible) containers. You'll need to know how to build and run container images before deploying them with Kubernetes.

✅ 3. What is a Pod in Kubernetes?

Answer: A Pod is the smallest deployable unit in Kubernetes. It encapsulates one or more containers that share the same network, storage, and lifecycle. Pods are used to run containerized applications.

✅ 4. How do I expose my application to the internet using Kubernetes?

Answer: You can expose your application using a Service of type LoadBalancer or NodePort. For more advanced routing (e.g., domain-based routing), you can use an Ingress Controller.

✅ 5. What is a Deployment in Kubernetes?

Answer: A Deployment is a Kubernetes object that ensures a specified number of replicas (Pods) are running at all times. It handles rolling updates, rollback, and maintaining the desired state of the application.

✅ 6. Can Kubernetes run locally for learning and development?

Answer: Yes. Tools like Minikube, Kind, and Docker Desktop (with Kubernetes enabled) allow you to run a local Kubernetes cluster on your machine for development and testing.

✅ 7. What’s the difference between ConfigMap and Secret in Kubernetes?

Answer: Both are used to inject configuration data into Pods. ConfigMaps store non-sensitive data like environment variables, while Secrets are intended for sensitive data like passwords, API tokens, or keys. Note that Secrets are only base64-encoded by default; encryption at rest must be explicitly enabled on the cluster.

✅ 8. How does Kubernetes handle application failure or crashes?

Answer: Kubernetes automatically restarts failed containers, replaces them, reschedules Pods to healthy nodes, and ensures the desired state (like the number of replicas) is always maintained.

✅ 9. How do I monitor applications running in Kubernetes?

Answer: Kubernetes integrates well with monitoring tools like Prometheus, Grafana, Kube-state-metrics, and ELK stack (Elasticsearch, Logstash, Kibana). These tools help you track performance, health, and logs.

✅ 10. Is Kubernetes suitable for small projects or just large enterprises?

Answer: While Kubernetes shines in large, scalable environments, it can also be used for small projects—especially with tools like Minikube or cloud-managed clusters. However, simpler alternatives like Docker Compose may be better suited for truly small-scale applications.