Monitoring Applications with Prometheus and Grafana: Real-Time Insights for Smarter Operations


✅ Chapter 3: Collecting, Querying, and Visualizing Metrics

🔍 Introduction

Now that Prometheus and Grafana are installed and configured, it’s time to put them to work.

In this chapter, you’ll learn:

  • How applications expose metrics
  • Best practices for metric naming and labeling
  • How Prometheus collects and stores these metrics
  • Writing PromQL queries to extract meaningful insights
  • Building rich visualizations in Grafana
  • Examples of real-world metric collection and dashboard creation

By the end, you’ll be able to monitor any application or infrastructure with precision and clarity!


🛠️ Part 1: Collecting Metrics


Monitoring starts at the source: the applications or systems generating metrics.


🔹 How Applications Expose Metrics

Typically, applications expose a /metrics HTTP endpoint where Prometheus scrapes data.

Common libraries available:

  • Go: prometheus/client_golang
  • Python: prometheus_client
  • Java: prometheus/client_java
  • Node.js: prom-client
  • .NET: prometheus-net

If you use any of these libraries, your app can easily provide Prometheus-compatible metrics.
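To make the /metrics endpoint less abstract, here is a rough sketch of the plain-text format those libraries produce. The render_metric helper and the sample values are hypothetical and written only for illustration; it skips details such as escaping special characters in label values, which the real client libraries handle for you.

```python
def render_metric(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} {metric_type}",
    ]
    for labels, value in samples:
        if labels:
            # Labels are written as key="value" pairs inside braces
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical sample data for illustration
text = render_metric(
    "http_requests_total",
    "Total HTTP requests served.",
    "counter",
    [
        ({"method": "get", "status": "200"}, 1027),
        ({"method": "post", "status": "500"}, 3),
    ],
)
print(text)
```

Each scrape is simply an HTTP GET that returns text like this, which is why almost any language can expose metrics.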


📋 Example: Simple Metrics in Python

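python

import random
import time

from prometheus_client import start_http_server, Summary

# Summary that tracks how long each request takes to process
REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing request')

@REQUEST_TIME.time()
def process_request():
    # Simulate work; an empty body would make the loop below spin at full CPU
    time.sleep(random.random())

if __name__ == '__main__':
    # Serve the metrics over HTTP on port 8000
    start_http_server(8000)
    while True:
        process_request()

This exposes metrics on http://localhost:8000/metrics.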


🔹 Important Metric Types in Prometheus

| Type | Description |
|---|---|
| Counter | Monotonically increasing value (e.g., requests served) |
| Gauge | Value that can go up or down (e.g., memory usage) |
| Histogram | Samples observations and counts them in buckets (e.g., request durations) |
| Summary | Similar to a histogram, but reports quantiles calculated by the client |

Choosing the right type ensures meaningful aggregation and analysis.
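Histograms are the least intuitive of the four, so a small sketch may help. A Prometheus histogram exposes cumulative buckets (each labeled with an upper bound, le), plus a running sum and count. The bucket_observations function and the sample durations below are hypothetical and only mimic that bookkeeping; they are not the client library's implementation.

```python
def bucket_observations(values, upper_bounds):
    """Return cumulative bucket counts, plus the sum and count of observations."""
    # The +Inf bucket always exists and ends up holding every observation
    bounds = sorted(upper_bounds) + [float("inf")]
    counts = [0] * len(bounds)
    for v in values:
        for i, bound in enumerate(bounds):
            if v <= bound:
                # Cumulative: increment every bucket whose bound covers v
                counts[i] += 1
    return dict(zip(bounds, counts)), sum(values), len(values)


# Hypothetical request durations in seconds
durations = [0.05, 0.2, 0.3, 0.7, 1.5]
buckets, total, count = bucket_observations(durations, [0.1, 0.5, 1.0])
for bound, n in buckets.items():
    print(f'le="{bound}": {n}')
```

Because buckets are cumulative, the largest bucket always equals the total count, and bucket counts from several instances can be added together before estimating quantiles.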


🔹 Best Practices for Exposing Metrics

| Practice | Reason |
|---|---|
| Use consistent naming | Easier querying and dashboard building |
| Add labels thoughtfully | Don't over-label; avoid high cardinality |
| Include help strings | Make metrics self-documenting |
| Avoid exposing sensitive data | Protect security and privacy |
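These practices can be checked mechanically. The sketch below is one possible lint, assuming the traditional Prometheus naming pattern and the common convention that counter names end in _total. The check_metric and worst_case_series helpers are hypothetical names, not part of any Prometheus library.

```python
import re
from math import prod

# Traditional pattern for Prometheus metric names
METRIC_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")


def check_metric(name, metric_type, help_text):
    """Return a list of problems found in a metric definition."""
    problems = []
    if not METRIC_NAME_RE.match(name):
        problems.append("invalid characters in name")
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append("counter name should end in _total")
    if not help_text.strip():
        problems.append("missing help string")
    return problems


def worst_case_series(label_value_counts):
    """Upper bound on series count: product of distinct values per label."""
    return prod(label_value_counts.values())


print(check_metric("http_requests_total", "counter", "Total HTTP requests."))
print(check_metric("http-requests", "counter", ""))
# A user_id label multiplies the series count by the number of users
print(worst_case_series({"method": 5, "status": 10, "user_id": 100000}))
```

The cardinality estimate shows why a single unbounded label, such as a user ID, can turn a handful of series into millions.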


📚 Part 2: Querying Metrics with PromQL


PromQL (Prometheus Query Language) is a powerful, flexible language for querying time-series data.

Let’s break down its basics:


🔹 Basic PromQL Queries

| Query | What It Does |
|---|---|
| up | Shows if targets are reachable (1 = up, 0 = down) |
| http_requests_total | Raw counter of HTTP requests |
| rate(http_requests_total[5m]) | Per-second request rate, averaged over the last 5 minutes |
| avg_over_time(cpu_usage[1h]) | Average CPU usage over 1 hour |
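To see what rate() is doing with a counter, here is a simplified Python sketch. It assumes time-ordered samples with distinct timestamps and treats any drop in value as a counter reset. The real rate() also extrapolates to the edges of the window, which this hypothetical simple_rate function leaves out.

```python
def simple_rate(samples):
    """Approximate rate(): per-second increase of a counter over a window.

    samples is a time-ordered list of (timestamp_seconds, counter_value).
    """
    if len(samples) < 2:
        return None  # not enough data points in the window
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr < prev:
            # Counter reset (e.g., process restart): counting began again at zero
            increase += curr
        else:
            increase += curr - prev
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical samples scraped every 60 seconds, with a restart before t=180
window = [(0, 100), (60, 160), (120, 220), (180, 10), (240, 70)]
print(simple_rate(window))
```

This is why rate() should be applied to counters rather than gauges: it assumes the value only goes down when the process restarts.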


🔥 Example: Check Target Health

promql

up{job="myapp"}

Lists all targets under the "myapp" job and their up/down status.
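The same query can also be run from a script through the Prometheus HTTP API. The sketch below only builds the request URL and parses a response body, so it runs without a live server. The base URL assumes Prometheus on its default port 9090, and the instance names in the sample payload are made up.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_instant_vector(payload):
    """Map each series' instance label to its value as a float."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError("query failed")
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in body["data"]["result"]
    }


url = build_query_url("http://localhost:9090", 'up{job="myapp"}')
print(url)

# Hypothetical response body in the shape the API returns for instant queries
sample = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"__name__": "up", "job": "myapp", "instance": "app1:8000"},
         "value": [1700000000, "1"]},
        {"metric": {"__name__": "up", "job": "myapp", "instance": "app2:8000"},
         "value": [1700000000, "0"]},
    ]},
})
health = parse_instant_vector(sample)
print(health)
```

To query a real server, the URL could be fetched with urllib.request.urlopen and the response body passed to parse_instant_vector.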


🔥 Example: Error Rate

promql

rate(http_requests_total{status="500"}[5m])

Shows the per-second rate of HTTP 500 responses, averaged over the last 5 minutes.


🔥 Example: CPU Usage Above 80%

promql

100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 80

Returns only the instances whose CPU usage exceeds 80%, which makes the expression suitable as an alert condition.
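The arithmetic in this query is easier to follow with numbers. In the sketch below, each value is the idle rate of one CPU core, meaning the fraction of each second it spent idle. The cpu_usage_percent function and the four sample values are hypothetical.

```python
def cpu_usage_percent(idle_rates):
    """Convert per-core idle rates (0 to 1) into an overall usage percentage."""
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 - avg_idle * 100


# Hypothetical idle rates for a 4-core instance
usage = cpu_usage_percent([0.10, 0.20, 0.15, 0.15])
print(round(usage, 2), usage > 80)
```

Averaging the idle rate across cores and subtracting from 100 gives the busy percentage, which is then compared against the 80% threshold.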


📋 Useful Functions in PromQL

| Function | Purpose |
|---|---|
| rate() | Calculates per-second average rate |
| sum() | Aggregates across series |
| avg() | Computes average |
| max() | Finds maximum value |
| count() | Counts time-series elements |
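The aggregation functions are easier to reason about once you picture each series as a set of labels plus a value. The sketch below mimics sum, avg, max, and count, with optional grouping in the spirit of PromQL's by (...) clause. The aggregate helper and the pod names are hypothetical.

```python
from collections import defaultdict


def aggregate(samples, op, by=None):
    """Aggregate (labels, value) samples, optionally grouped by one label."""
    groups = defaultdict(list)
    for labels, value in samples:
        # Without a grouping label, every sample lands in a single group
        key = labels.get(by, "") if by else ""
        groups[key].append(value)
    ops = {
        "sum": sum,
        "avg": lambda vs: sum(vs) / len(vs),
        "max": max,
        "count": len,
    }
    return {key: ops[op](values) for key, values in groups.items()}


# Hypothetical per-container CPU rates
samples = [
    ({"pod": "web-1", "container": "app"}, 0.25),
    ({"pod": "web-1", "container": "sidecar"}, 0.5),
    ({"pod": "web-2", "container": "app"}, 1.0),
]
print(aggregate(samples, "sum", by="pod"))
print(aggregate(samples, "count"))
```

Grouping by pod collapses the container label, so each pod ends up with a single value.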


📈 Part 3: Visualizing Metrics with Grafana


With data being collected and queried, it’s time to visualize metrics meaningfully!


🔹 Creating a New Dashboard

  1. Click "+" → Dashboard → Add new panel in Grafana
  2. Choose Prometheus as the data source
  3. Enter your PromQL query (e.g., rate(http_requests_total[5m]))
  4. Select the panel type (Time series, Gauge, Stat, Bar gauge)
  5. Save the dashboard!
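Dashboards built through the UI are stored as JSON, so they can also be generated from code. The sketch below assembles a minimal dashboard as a Python dict. The field names reflect the Grafana dashboard JSON model as commonly exported, but required fields vary between Grafana versions, so treat this as an assumption to check against a dashboard exported from your own instance. The timeseries_panel helper is hypothetical.

```python
import json


def timeseries_panel(title, expr, panel_id, x=0, y=0):
    """Return a minimal time-series panel that runs one PromQL query."""
    return {
        "id": panel_id,
        "type": "timeseries",
        "title": title,
        # Position and size on the dashboard grid
        "gridPos": {"h": 8, "w": 12, "x": x, "y": y},
        # No datasource is set, so the default data source is assumed
        "targets": [{"refId": "A", "expr": expr}],
    }


dashboard = {
    "title": "Web Traffic",
    "panels": [
        timeseries_panel("Request Rate", "rate(http_requests_total[5m])", 1),
    ],
}
print(json.dumps(dashboard, indent=2))
```

Keeping dashboards in code like this makes them easy to review and version alongside the application.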

📋 Common Grafana Panel Types

| Panel Type | Best for |
|---|---|
| Time-Series | Trends over time (e.g., CPU usage, traffic volume) |
| Gauge | Health indicators (e.g., Memory Usage %) |
| Stat | Single-value metrics (e.g., current request count) |
| Table | Listing multiple metrics (e.g., server list, uptime) |


🔹 Example: Building a Web Traffic Dashboard

Panel 1 - Request Rate:

promql

rate(http_requests_total[1m])

Panel 2 - 5xx Error Rate:

promql

rate(http_requests_total{status=~"5.."}[5m])

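Panel 3 - Average Response Time:

promql

rate(request_duration_seconds_sum[5m]) / rate(request_duration_seconds_count[5m])

Both series are cumulative counters, so dividing their rates gives the mean request duration over the last 5 minutes.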
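An average response time over a window comes from two numbers: how much the total duration grew and how many requests arrived. The sketch below works through that with made-up readings taken five minutes apart. The mean_duration function is hypothetical.

```python
def mean_duration(sum_start, sum_end, count_start, count_end):
    """Mean request duration over a window, from _sum and _count deltas."""
    requests = count_end - count_start
    if requests == 0:
        return None  # no requests arrived in the window
    return (sum_end - sum_start) / requests


# Hypothetical readings of request_duration_seconds_sum and _count
avg = mean_duration(sum_start=120.0, sum_end=150.0, count_start=1000, count_end=1200)
print(avg)
```

Here 200 requests added 30 seconds of total duration, so the mean is 0.15 seconds per request.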


🔥 Tips for Great Dashboards

| Tip | Why Important |
|---|---|
| Use templating variables | Create dynamic dashboards for multiple services |
| Group panels by resource (CPU, Memory, Errors) | Improve readability |
| Add thresholds and coloring | Highlight abnormal behavior |
| Use annotations | Mark deployments/events to correlate spikes |
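Templating variables are essentially placeholders that Grafana fills in before sending the query to Prometheus. Python's string.Template happens to use the same $name syntax, so it works as a rough stand-in to show the idea. Grafana performs this substitution itself, and the job names below are hypothetical.

```python
from string import Template

# $job plays the role of a dashboard variable
query_template = Template('rate(http_requests_total{job="$job"}[5m])')

# Hypothetical services selectable from the dashboard's variable dropdown
queries = [query_template.substitute(job=job) for job in ["checkout", "search"]]
for q in queries:
    print(q)
```

One panel definition can then serve every service, with the variable selecting which one is shown.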


🧩 Part 4: Combining Queries and Visualizations for Full Observability


Real-world monitoring requires multi-dimensional dashboards.

Example:
For a Kubernetes app, you may track:

| Resource | PromQL Query |
|---|---|
| Pod CPU Usage | sum(rate(container_cpu_usage_seconds_total[5m])) by (pod) |
| Pod Memory Usage | sum(container_memory_usage_bytes) by (pod) |
| HTTP Request Rate | rate(http_requests_total[1m]) |
| Error Rate | rate(http_requests_total{status=~"5.."}[5m]) |

Then use Grafana to:

  • Visualize CPU/Memory with line graphs
  • Use gauges for error rates
  • Set alert thresholds for spikes or drops

This provides full observability into your system's health.


🚀 Conclusion


Collecting, querying, and visualizing metrics are the core pillars of effective monitoring.

  • Prometheus collects and stores fine-grained data points.
  • PromQL enables complex, actionable queries.
  • Grafana translates metrics into meaningful visuals for fast decisions.

Mastering these steps will transform your monitoring from basic graphs to operational intelligence — enabling proactive troubleshooting, performance tuning, and smarter scaling.

In the next chapter, we’ll build alerting systems to automatically notify teams when problems arise!


Real-time insights are just a dashboard away. 🚀


FAQs


❓1. What is Prometheus used for in application monitoring?

Answer:
Prometheus is used to collect, store, and query time-series metrics from applications, servers, databases, and services. It scrapes metrics endpoints at regular intervals, stores the data locally, and allows you to query and trigger alerts based on conditions like performance degradation or system failures.

❓2. How does Grafana complement Prometheus?

Answer:
Grafana is used to visualize and analyze the metrics collected by Prometheus. It allows users to build interactive, real-time dashboards and graphs, making it easier to monitor system health, detect anomalies, and troubleshoot issues effectively.

❓3. What is the typical data flow between Prometheus and Grafana?

Answer:
Prometheus scrapes and stores metrics → Grafana queries Prometheus via APIs → Grafana visualizes the metrics through dashboards and sends alerts if conditions are met.

❓4. What kind of applications can be monitored with Prometheus and Grafana?

Answer:
You can monitor web applications, microservices, databases, APIs, Kubernetes clusters, Docker containers, infrastructure resources (CPU, memory, disk), and virtually anything that exposes metrics in Prometheus format (/metrics endpoint).

❓5. How do Prometheus and Grafana handle alerting?

Answer:
Prometheus evaluates alerting rules and sends firing alerts to Alertmanager, a separate component that deduplicates similar alerts, groups them, and routes notifications (via email, Slack, PagerDuty, etc.). Grafana also supports alerting from dashboards when thresholds are crossed.

❓6. What is PromQL?

Answer:
PromQL (Prometheus Query Language) is a powerful query language used to retrieve and manipulate time-series data stored in Prometheus. It supports aggregation, filtering, math operations, and advanced slicing over time windows.

❓7. Can Prometheus store metrics data long-term?

Answer:
By default, Prometheus keeps data on local disk for 15 days; retention can be extended to weeks or months with the --storage.tsdb.retention.time flag. For long-term storage, it can integrate with systems like Thanos, Cortex, or other remote storage solutions to scale and retain historical data for years.

❓8. Is it possible to monitor Kubernetes clusters with Prometheus and Grafana?

Answer:
Yes! Prometheus and Grafana are commonly used together to monitor Kubernetes clusters, capturing node metrics, pod statuses, resource usage, networking, and service health. Tools like kube-prometheus-stack simplify this setup.

❓9. What types of visualizations can Grafana create?

Answer:
Grafana supports time-series graphs, gauges, bar charts, heatmaps, pie charts, histograms, and tables. It also allows users to create dynamic dashboards using variables and templating for richer interaction.

❓10. Are Prometheus and Grafana free to use?

Answer:
Yes, both Prometheus and Grafana are open-source and free to use. Grafana also offers paid enterprise editions with additional features like authentication integration (LDAP, SSO), enhanced security, and advanced reporting for larger organizations.