🔍 Introduction
Now that Prometheus and Grafana are installed and
configured, it’s time to put them to work.
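In this chapter, you’ll learn how to collect metrics from your applications, query them with PromQL, and visualize them in Grafana.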
By the end, you’ll be able to monitor any application or
infrastructure with precision and clarity!
🛠️ Part 1: Collecting Metrics
Monitoring starts at the source: the applications or
systems generating metrics.
🔹 How Applications Expose Metrics
Typically, applications expose a /metrics HTTP endpoint
where Prometheus scrapes data.
Client libraries are available for most popular languages, including Go, Java, Python, and Ruby.
✅ If you use one of these libraries, your app can easily provide Prometheus-compatible metrics.
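📋 Example: Simple Metrics in Python
python
import time

from prometheus_client import Summary, start_http_server

# Summary that tracks how long each request takes.
REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing request')

@REQUEST_TIME.time()  # record the duration of every call
def process_request():
    time.sleep(0.1)  # placeholder for real work

if __name__ == '__main__':
    start_http_server(8000)  # serve /metrics on port 8000
    while True:
        process_request()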
➡️ This exposes metrics at http://localhost:8000/metrics.
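To make the endpoint less of a black box, here is a minimal sketch of the plain-text format a /metrics endpoint returns. It uses only the standard library. `render_metric` is a hypothetical helper written for illustration, not part of prometheus_client, and the sample values are made up. Real client libraries also handle escaping and the more complex metric types.

```python
def render_metric(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples: list of (labels_dict, value) pairs.
    Simplification: label values are not escaped.
    """
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} {metric_type}",
    ]
    for labels, value in samples:
        if labels:
            # Labels are rendered as key="value" pairs inside braces.
            pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


text = render_metric(
    "http_requests_total",
    "Total HTTP requests served",
    "counter",
    [({"method": "GET", "status": "200"}, 1027),
     ({"method": "POST", "status": "500"}, 3)],
)
print(text)
```

Each scrape simply fetches text like this over HTTP, which is why any application that can serve a text response can be monitored.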
🔹 Important Metric Types in Prometheus
Type | Description
Counter | Monotonically increasing value (e.g., requests served)
Gauge | Value that can go up or down (e.g., memory usage)
Histogram | Samples observations and counts them in configurable buckets (e.g., request durations)
Summary | Similar to a histogram, but calculates quantiles on the client side
✅ Choosing the right type ensures meaningful aggregation and analysis.
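The histogram type is the least intuitive of the four, so a small sketch may help. The code below shows the cumulative bucketing idea in plain Python. `cumulative_buckets` is a hypothetical helper and the durations are made-up data; a real client library does this bookkeeping for you.

```python
import math


def cumulative_buckets(observations, upper_bounds):
    """Count observations into cumulative buckets, as a Prometheus histogram does.

    Each bucket counts every observation less than or equal to its upper
    bound ("le"), so counts never decrease from one bucket to the next.
    An implicit +Inf bucket always holds the total count.
    """
    bounds = sorted(upper_bounds) + [math.inf]
    return {bound: sum(1 for obs in observations if obs <= bound) for bound in bounds}


durations = [0.05, 0.2, 0.3, 0.7, 1.5]  # request durations in seconds (made-up data)
buckets = cumulative_buckets(durations, [0.1, 0.5, 1.0])
for bound, count in buckets.items():
    print(f'le="{bound}" -> {count}')
```

Because buckets are plain counters, they can be summed across instances, which is the main reason to prefer a histogram over a summary when you need aggregated quantiles.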
🔹 Best Practices for Exposing Metrics
Practice | Reason
Use consistent naming | Easier querying and dashboard building
Add labels thoughtfully | Don't over-label; avoid high cardinality
Include help strings | Make metrics self-documenting
Avoid exposing sensitive data | Protect security and privacy
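The cardinality warning is easier to appreciate with numbers. As a rough sketch, the worst-case number of time series for one metric is the product of the distinct values of each label. `series_count` and the label sets below are hypothetical, chosen only to illustrate the arithmetic.

```python
from math import prod


def series_count(label_values):
    """Upper bound on time series for one metric: the product of the number
    of distinct values each label can take."""
    return prod(len(values) for values in label_values.values())


bounded = {"method": {"GET", "POST", "PUT"}, "status": {"200", "404", "500"}}
print(series_count(bounded))  # 3 * 3 = 9

# Adding a per-user label multiplies the count by the number of users.
unbounded = dict(bounded, user_id={f"user-{i}" for i in range(10_000)})
print(series_count(unbounded))  # 9 * 10_000 = 90_000
```

Labels with a small, fixed set of values are usually safe. Identifiers such as user IDs or request IDs are the typical source of runaway series counts.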
📚 Part 2: Querying Metrics with PromQL
PromQL (Prometheus Query Language) is a powerful,
flexible language for querying time-series data.
Let’s break down its basics:
🔹 Basic PromQL Queries
Query | What It Does
up | Shows if targets are reachable (1 = up, 0 = down)
http_requests_total | Raw counter of HTTP requests
rate(http_requests_total[5m]) | Request rate per second, averaged over the last 5 minutes
avg_over_time(cpu_usage[1h]) | Average CPU usage over 1 hour
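Queries like these can also be run from code through Prometheus's HTTP API, which is what Grafana does behind the scenes. The sketch below uses only the standard library and the instant-query endpoint `/api/v1/query`. The server address, the job name "myapp", and the instance names in the sample response are assumptions for illustration. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"  # assumption: default local address


def build_query_url(expr, base_url=PROMETHEUS_URL):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def parse_up_result(payload):
    """Map each target's instance label to True (up) or False (down)."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    return {
        item["metric"].get("instance", "unknown"): item["value"][1] == "1"
        for item in payload["data"]["result"]
    }


def fetch_target_health(job):
    """Run up{job="..."} against a live server (requires network access)."""
    with urlopen(build_query_url(f'up{{job="{job}"}}'), timeout=5) as response:
        return parse_up_result(json.load(response))


# Offline demonstration with a hand-written response in the API's shape.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "myapp", "instance": "app-1:8000"},
             "value": [1700000000.0, "1"]},
            {"metric": {"__name__": "up", "job": "myapp", "instance": "app-2:8000"},
             "value": [1700000000.0, "0"]},
        ],
    },
}
print(parse_up_result(sample))
```

Note that sample values arrive as strings in the JSON response, so the code compares against "1" rather than the number 1.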
🔥 Example: Check Target Health
promql
up{job="myapp"}
✅ Lists all targets under "myapp" and their up/down status.
🔥 Example: Error Rate
promql
rate(http_requests_total{status="500"}[5m])
✅ Shows the per-second rate of HTTP 500 responses, averaged over the last 5 minutes.
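To see what rate() is doing with a counter, here is a simplified sketch in plain Python. `simple_rate` is a hypothetical helper and the samples are made up. Unlike Prometheus, it does not extrapolate to the edges of the time window, so treat it as an approximation of the idea rather than the exact algorithm.

```python
def simple_rate(samples):
    """Per-second average increase of a counter over a window.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    A drop in value is treated as a counter reset (process restart), so the
    post-reset value counts as new increase. Simplification: unlike
    Prometheus, this does not extrapolate to the window boundaries.
    """
    if len(samples) < 2:
        return None  # not enough data points in the window
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Counter scraped every 60 seconds, with a restart before the fourth sample.
window = [(0, 100), (60, 160), (120, 220), (180, 10), (240, 70)]
print(simple_rate(window))
```

The reset handling is why rate() should be applied to counters only: on a gauge, every decrease would be misread as a restart.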
🔥 Example: CPU Usage Above 80%
promql
100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 80
✅ Returns every instance whose CPU usage exceeds 80%, which makes it a natural condition for an alert.
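The arithmetic behind this query can be sketched in a few lines of Python. The idle rate of a core is the fraction of each second it spent idle, so usage is what remains after averaging the idle rates and subtracting from 100. The function names and the idle rates below are hypothetical.

```python
def cpu_usage_percent(idle_rates):
    """Convert per-core idle rates into a CPU usage percentage.

    idle_rates: per-second rate of node_cpu_seconds_total{mode="idle"} for each
    core of one instance. A rate of 1.0 means the core was idle the whole time.
    """
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 - avg_idle * 100


def over_threshold(instances, threshold=80):
    """Keep only instances above the threshold, like the `> 80` filter."""
    usage = {name: cpu_usage_percent(rates) for name, rates in instances.items()}
    return {name: pct for name, pct in usage.items() if pct > threshold}


instances = {  # made-up idle rates for two servers with two cores each
    "web-1": [0.25, 0.125],  # 18.75% idle on average
    "web-2": [0.75, 0.5],    # 62.5% idle on average
}
print(over_threshold(instances))
```

Like the PromQL comparison, the filter drops instances below the threshold instead of returning a true/false value for each one.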
📋 Useful Functions in PromQL
Function | Purpose
rate() | Calculates the per-second average rate of increase of a counter
sum() | Aggregates across series
avg() | Computes the average across series
max() | Finds the maximum value
count() | Counts the number of series
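Apart from rate(), these are aggregation operators: they collapse many series into fewer, optionally grouped by a label. A rough Python equivalent of grouping by one label is sketched below. `aggregate_by` and the sample series are hypothetical.

```python
from collections import defaultdict


def aggregate_by(series, label, func):
    """Group series by one label and apply func to each group's values.

    series: list of (labels_dict, value) pairs, one per time series.
    """
    groups = defaultdict(list)
    for labels, value in series:
        groups[labels[label]].append(value)
    return {key: func(values) for key, values in groups.items()}


series = [
    ({"instance": "a", "status": "200"}, 12.0),
    ({"instance": "a", "status": "500"}, 1.0),
    ({"instance": "b", "status": "200"}, 8.0),
]
print(aggregate_by(series, "instance", sum))  # like sum by (instance)
print(aggregate_by(series, "instance", max))  # like max by (instance)
print(aggregate_by(series, "instance", len))  # like count by (instance)
```

Every label that is not named in the grouping disappears from the result, which is also how `by (...)` behaves in PromQL.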
📈 Part 3: Visualizing Metrics with Grafana
With data being collected and queried, it’s time to visualize
metrics meaningfully!
🔹 Creating a New Dashboard
📋 Common Grafana Panel Types
Panel Type | Best for
Time-Series | Trends over time (e.g., CPU usage, traffic volume)
Gauge | Health indicators (e.g., Memory Usage %)
Stat | Single-value metrics (e.g., current request count)
Table | Listing multiple metrics (e.g., server list, uptime)
🔹 Example: Building a Web Traffic Dashboard
Panel 1 - Request Rate:
promql
rate(http_requests_total[1m])
Panel 2 - 5xx Error Rate:
promql
rate(http_requests_total{status=~"5.."}[5m])
Panel 3 - Average Response Time:
promql
rate(request_duration_seconds_sum[5m]) / rate(request_duration_seconds_count[5m])
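Dashboards can also be described as JSON and imported into Grafana instead of being built by hand. The sketch below generates a small subset of that JSON for the first two panels. The field names reflect a common shape of Grafana's dashboard model, but they are an assumption here and should be checked against your Grafana version. `make_panel` and the dashboard title are hypothetical, and the data source is left to Grafana's default.

```python
import json


def make_panel(panel_id, title, expr, y):
    """Build one time-series panel backed by a single PromQL query."""
    return {
        "id": panel_id,
        "type": "timeseries",
        "title": title,
        # Full-width panel, 8 grid units tall, stacked vertically.
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": y},
        "targets": [{"refId": "A", "expr": expr}],
    }


queries = [
    ("Request Rate", "rate(http_requests_total[1m])"),
    ("5xx Error Rate", 'rate(http_requests_total{status=~"5.."}[5m])'),
]
dashboard = {
    "title": "Web Traffic",
    "panels": [make_panel(i + 1, title, expr, y=i * 8)
               for i, (title, expr) in enumerate(queries)],
}
print(json.dumps(dashboard, indent=2))
```

Keeping dashboards as generated JSON makes them easy to review and version alongside the application code.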
🔥 Tips for Great Dashboards
Tip | Why Important
Use templating variables | Create dynamic dashboards for multiple services
Group panels by resource (CPU, Memory, Errors) | Improve readability
Add thresholds and coloring | Highlight abnormal behavior
Use annotations | Mark deployments/events to correlate spikes
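Templating variables boil down to substituting a selected value into a query before it is sent to Prometheus. The sketch below imitates that with Python's `string.Template`. The variable name `job` and the service names are hypothetical, and Grafana's own interpolation supports more than this (for example, multi-value variables).

```python
from string import Template

# One query template shared by every service on the dashboard.
QUERY = Template('rate(http_requests_total{job="$job"}[5m])')


def render_query(job):
    """Substitute the selected job into the query template."""
    return QUERY.substitute(job=job)


for job in ["checkout", "search"]:
    print(render_query(job))
```

One templated dashboard can then serve every service, instead of maintaining a near-identical copy per service.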
🧩 Part 4: Combining Queries and Visualizations for Full Observability
Real-world monitoring requires multi-dimensional
dashboards.
Example:
For a Kubernetes app, you may track:
Resource | PromQL Query
Pod CPU Usage | sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)
Pod Memory Usage | sum(container_memory_usage_bytes) by (pod)
HTTP Request Rate | rate(http_requests_total[1m])
Error Rate | rate(http_requests_total{status=~"5.."}[5m])
Then use Grafana to bring these queries together on a single dashboard.
✅ This gives you a consolidated view of your system's health.
🚀 Conclusion
Collecting, querying, and visualizing metrics are the
core pillars of effective monitoring.
Mastering these steps will transform your monitoring from
basic graphs to operational intelligence — enabling proactive
troubleshooting, performance tuning, and smarter scaling.
In the next chapter, we’ll build alerting systems to
automatically notify teams when problems arise!
Real-time insights are just a dashboard away. 🚀
Answer:
Prometheus is used to collect, store, and query time-series metrics from
applications, servers, databases, and services. It scrapes metrics endpoints at
regular intervals, stores the data locally, and allows you to query and trigger
alerts based on conditions like performance degradation or system failures.
Answer:
Grafana is used to visualize and analyze the metrics collected by
Prometheus. It allows users to build interactive, real-time dashboards
and graphs, making it easier to monitor system health, detect anomalies, and
troubleshoot issues effectively.
Answer:
Prometheus scrapes and stores metrics → Grafana queries Prometheus via APIs →
Grafana visualizes the metrics through dashboards and sends alerts if
conditions are met.
Answer:
You can monitor web applications, microservices, databases, APIs, Kubernetes
clusters, Docker containers, infrastructure resources (CPU, memory, disk),
and virtually anything that exposes metrics in Prometheus format (/metrics
endpoint).
Answer:
Prometheus evaluates alerting rules and sends firing alerts to Alertmanager, a
separate component that deduplicates similar alerts, groups them, and routes
notifications (via email, Slack, PagerDuty, etc.). Grafana also supports
alerting when thresholds are crossed.
Answer:
PromQL (Prometheus Query Language) is a powerful query language used to
retrieve and manipulate time-series data stored in Prometheus. It supports
aggregation, filtering, math operations, and advanced slicing over time
windows.
Answer:
By default, Prometheus is optimized for short-to-medium term storage
(weeks/months). For long-term storage, it can integrate with systems
like Thanos, Cortex, or remote storage solutions to scale and retain
historical data for years.
Answer:
Yes! Prometheus and Grafana are commonly used together to monitor Kubernetes
clusters, capturing node metrics, pod statuses, resource usage, networking,
and service health. Tools like kube-prometheus-stack simplify this
setup.
Answer:
Grafana supports time-series graphs, gauges, bar charts, heatmaps, pie
charts, histograms, and tables. It also allows users to create dynamic
dashboards using variables and templating for richer interaction.
Answer:
Yes, both Prometheus and Grafana are open-source and free to use.
Grafana also offers a paid Enterprise edition with additional features
like enhanced authentication integration (SAML, enhanced LDAP), enhanced
security, and advanced reporting for larger organizations.