// See everything. Know instantly.
GRAFANA GIVES YOU SUPERPOWERS.
In a world drowning in data, Grafana transforms raw metrics into actionable insights. It's not just a dashboard tool: it's the lens through which you view your entire infrastructure. When something breaks at 3 AM, Grafana tells you what, when, and why.
VISUALIZE ANYTHING.
From server CPU usage to business KPIs, from network traffic to application latency. Grafana connects to Prometheus, InfluxDB, Elasticsearch, PostgreSQL, and dozens of other data sources. You decide what to measure. Grafana makes it beautiful.
ALERT BEFORE DISASTER STRIKES.
Proactive monitoring means fixing problems before users notice. Grafana's alerting system notifies you via email, Slack, PagerDuty, or webhook when metrics cross thresholds. Sleep better at night knowing Grafana is watching your systems.
12 lessons. Complete Grafana control.
1. What is observability? Installing Grafana and understanding the interface. (Beginner)
2. Connecting to Prometheus, InfluxDB, Elasticsearch, and more. (Beginner)
3. Creating panels, queries, and organizing dashboards. (Beginner)
4. PromQL, InfluxQL, and other query languages for time series data. (Intermediate)
5. Graphs, heatmaps, tables, gauges, and advanced visualizations. (Intermediate)
6. Dynamic dashboards with variables and template queries. (Intermediate)
7. Creating alerts, notification policies, and alert rules. (Intermediate)
8. Email, Slack, PagerDuty, webhook integrations. (Intermediate)
9. Teams, permissions, and organization management. (Advanced)
10. Installing plugins, building custom panels, Grafana Loki. (Advanced)
11. Grafana API, provisioning, and infrastructure as code. (Advanced)
12. High availability, security, and scaling Grafana. (Advanced)

Grafana is an open-source platform for data visualization and monitoring. It connects to various data sources and transforms data into beautiful, interactive dashboards. Originally created in 2014, it has become the de facto standard for observability dashboards.
Grafana is used by companies of all sizes, from small startups to massive enterprises like Google, Netflix, and PayPal. It's the visualization layer for monitoring systems, providing the "what happened" and "why" behind your metrics.
Observability is the ability to measure the internal states of a system by examining its outputs. In IT operations, this means:
Grafana excels at metrics and can integrate with logging (Loki) and tracing (Tempo) systems for complete observability.
# Add Grafana repository
sudo apt-get install -y apt-transport-https software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list

# Install
sudo apt update
sudo apt install grafana

# Start
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
# Run Grafana
docker run -d \
  --name=grafana \
  -p 3000:3000 \
  -v grafana-data:/var/lib/grafana \
  grafana/grafana
Access Grafana at http://localhost:3000. Default credentials: admin/admin
The Grafana interface consists of several key areas:
Grafana supports 50+ data sources. Let's cover the most popular ones.
Prometheus is the most common pairing with Grafana: a powerful time-series database designed for metrics collection.
# Configure in Grafana:
# URL: http://localhost:9090

# If you don't have Prometheus, install it:
docker run -d \
  --name=prometheus \
  -p 9090:9090 \
  -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus

# prometheus.yml:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
InfluxDB is popular for custom application metrics.
# Run InfluxDB
docker run -d \
  --name=influxdb \
  -p 8086:8086 \
  influxdb:2

# Configure in Grafana:
# URL: http://localhost:8086
# InfluxDB 1.x (InfluxQL): Database: mydb, Username: admin, Password: yourpassword
# InfluxDB 2.x (Flux): use Organization, Token, and Default Bucket instead
Use SQL databases for metrics that live in your application database.
# Query example:
SELECT
  $__timeGroup(created_at, '5m') AS time,
  count(*) AS request_count,
  avg(response_time) AS avg_response_time
FROM http_requests
WHERE $__timeFilter(created_at)
GROUP BY 1
ORDER BY 1
For log and document storage visualization.
# Configure:
# URL: http://localhost:9200
# Index name: logs-*
# Time field: @timestamp
You can add multiple data sources and reference them in different panels:
# In a single dashboard:
# - Panel 1: Queries Prometheus (infrastructure metrics)
# - Panel 2: Queries InfluxDB (application metrics)
# - Panel 3: Queries PostgreSQL (business metrics)
Let's query Prometheus for CPU usage:
# Prometheus query:
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
This calculates the percentage of CPU that's NOT idle, giving you total CPU usage.
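To make the arithmetic concrete, here is the same calculation in plain Python. The idle fractions are hypothetical sample values standing in for what rate() over the idle counter would return per instance:

```python
# Hypothetical per-instance idle fractions, i.e. the result of
# avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
idle_fraction = {"host-a": 0.75, "host-b": 0.50}

# 100 - (idle fraction * 100) = total CPU usage percent, as in the PromQL above
cpu_usage = {inst: 100 - frac * 100 for inst, frac in idle_fraction.items()}
print(cpu_usage)  # {'host-a': 25.0, 'host-b': 50.0}
```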
The panel editor has several tabs:
Grafana offers multiple sharing options:
Prometheus Query Language (PromQL) is powerful for time-series data.
# Direct metric
node_cpu_seconds_total
# With label filter
node_cpu_seconds_total{mode="idle"}
# Rate - per-second rate of change
rate(node_cpu_seconds_total[5m])
# Irate - instant rate
irate(node_cpu_seconds_total[5m])
# Sum all values
sum(node_cpu_seconds_total)
# Average
avg(rate(http_requests_total[5m]))
# Max/Min
max(node_memory_MemAvailable_bytes)
# Count
count(node_cpu_seconds_total{mode="user"})
# By label
sum by (instance) (rate(node_cpu_seconds_total[5m]))
# Calculate CPU usage percentage per instance
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Moving average
avg_over_time(node_memory_MemAvailable_bytes[5m])
# Predict future
predict_linear(node_memory_MemAvailable_bytes[1h], 3600)
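predict_linear fits a least-squares line through the samples in the range and extrapolates the requested number of seconds ahead. A minimal Python sketch of the idea (a simplification: PromQL extrapolates from the evaluation time, this sketch from the last sample; the data points are hypothetical):

```python
# Least-squares linear fit over (timestamp, value) samples, then
# extrapolation t_ahead seconds past the last sample -- the same idea
# as PromQL's predict_linear().
def predict_linear(samples, t_ahead):
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return slope * (last_t + t_ahead) + intercept

# Available memory dropping 1 MB/s: predict the value one hour ahead
samples = [(0, 8_000_000_000), (60, 7_940_000_000), (120, 7_880_000_000)]
print(predict_linear(samples, 3600))  # 4280000000.0
```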
# Use $variable in queries
node_exporter_build_info{job="$job"}
# Multi-value variable
node_exporter_build_info{job=~"$job"}
# Time ranges
$__range_s
Grafana offers dozens of visualizations:
The classic time-series visualization:
# Visualization options:
# - Mode: Lines, Bars, Points
# - Line interpolation: Smooth, Step, Linear
# - Line width: 1-5px
# - Fill opacity: 0-100%
# - Gradient mode: Opacity, Hue, Saturation
Display a single value prominently:
Color-code values based on thresholds:
# Thresholds:
# Green: 0-70
# Yellow: 70-90
# Red: 90-100

# Base: green
# Threshold 1: 70 (yellow)
# Threshold 2: 90 (red)
Transform values for display:
# Map numeric codes to text:
# 0 -> OK
# 1 -> Warning
# 2 -> Critical

# Map boolean:
# true -> Active
# false -> Inactive
Variables make dashboards dynamic and reusable.
# Reference a variable in queries as $variableName

# Example variable definition:
# Query type: Query
# Data source: Prometheus
# Query: label_values(node_exporter_build_info, job)
# Name: job
# Multi-value: enabled
# Include All option: enabled
Make selections cascade:
# Variable 1: $environment
# Query: label_values(node_exporter_build_info, environment)
# Variable 2: $host
# Query: label_values(node_exporter_build_info{environment="$environment"}, instance)
# Result: Select environment, then available hosts in that environment
# Prometheus ad hoc variable (filters):
# Metric: node_network_receive_bytes_total
# Filters: {{label}}="{{value}}"
# Using regex:
# Query: label_values(up{job=~"$job.*"}, instance)
Grafana alerting monitors your metrics and notifies you when thresholds are crossed.
# Condition:
# WHEN: avg() OF query(A, 5m, now) IS ABOVE 80

# This triggers when the 5-minute average
# of query A exceeds 80
# Basic: IS ABOVE / IS BELOW / IS OUTSIDE RANGE / IS WITHIN RANGE
# Example queries:
# CPU usage above 80% (average idle fraction below 0.2)
avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) < 0.2
# Memory above 90%
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes > 0.9
# Request errors above 5%
rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) > 0.05
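A quick sanity check of the error-ratio alert, with hypothetical rate values plugged in:

```python
# Hypothetical 5-minute rates for the error-ratio alert above
error_rate = 12.0   # rate(http_requests_total{status=~"5.."}[5m])
total_rate = 200.0  # rate(http_requests_total[5m])

ratio = error_rate / total_rate
print(ratio, ratio > 0.05)  # 0.06 True -> the alert fires
```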
# How often to check:
# Evaluation interval: 5m (check every 5 minutes)

# How long the condition must be true:
# For: 5m (trigger after 5 minutes of violation)

# This prevents flapping alerts
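The "For" duration behaves like a small state machine: the alert sits in "pending" until the condition has held continuously for the configured time, then transitions to "firing". A Python sketch of that behavior (evaluation interval and values are hypothetical):

```python
# Sketch of the "For" behavior: the condition must hold continuously
# for `for_seconds` before the alert transitions from pending to firing.
def alert_states(samples, threshold, interval, for_seconds):
    """samples: metric value at each evaluation; returns a state per evaluation."""
    states, pending_since = [], None
    for i, value in enumerate(samples):
        now = i * interval
        if value > threshold:
            if pending_since is None:
                pending_since = now
            states.append("firing" if now - pending_since >= for_seconds else "pending")
        else:
            pending_since = None  # any dip below the threshold resets the timer
            states.append("ok")
    return states

# Evaluate every 60s, threshold 80, condition must hold 300s (5m) before firing
print(alert_states([70, 85, 90, 88, 92, 95, 99, 60], 80, 60, 300))
# ['ok', 'pending', 'pending', 'pending', 'pending', 'pending', 'firing', 'ok']
```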
Grafana supports many notification destinations:
[smtp]
enabled = true
host = smtp.example.com:587
user = grafana@example.com
password = yourpassword
from_address = grafana@example.com
from_name = Grafana Alert
# In Slack:
# 1. Create an Incoming Webhook
# 2. Copy the webhook URL

# In Grafana:
# 1. Add a Slack notification channel
# 2. Paste the webhook URL
# 3. Set the recipient (#alerts or @username)
# 4. Test the notification
# Webhook sends JSON:
{
"title": "[FIRING:1] CPU High (grafana)",
"message": "CPU usage above 80%",
"state": "alerting",
"evalMatches": [...],
"ruleUrl": "http://localhost:3000/alerting/..."
}
# Handle in your application to:
# - Create tickets
# - Page on-call
# - Run automation
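A minimal receiver sketch using only Python's standard library, assuming the legacy webhook payload shape shown above; route_alert is a hypothetical routing hook, not part of Grafana:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def route_alert(payload: dict) -> str:
    """Decide what to do with an incoming alert (hypothetical routing logic)."""
    if payload.get("state") == "alerting":
        return f"page on-call: {payload.get('title', 'unknown alert')}"
    return "no action"

class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON body Grafana POSTs to the webhook
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        self.send_response(200)
        self.end_headers()
        self.wfile.write(route_alert(payload).encode())

# To run the receiver:
# HTTPServer(("", 8080), AlertHandler).serve_forever()
```

Point the Grafana webhook channel at this endpoint and extend route_alert to create tickets or trigger automation.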
Route alerts based on labels:
# Default policy: All alerts go to email

# Custom policies:
# - IF severity=critical -> Slack #critical + PagerDuty
# - IF team=backend -> Slack #backend-alerts
# - IF service=api -> Email on-call@company.com
Grafana uses organizations for multi-tenancy:
# Dashboard and folder permission levels:
# - View: read only
# - Edit: can modify dashboards
# - Admin: can also manage permissions
# grafana.ini
[auth.ldap]
enabled = true
config_file = /etc/grafana/ldap.toml

# ldap.toml
[[servers]]
host = "ldap.example.com"
port = 636
use_ssl = true

[[servers.group_mappings]]
group_dn = "cn=admins,ou=groups,dc=example,dc=com"
org_role = "Admin"
Extend Grafana with plugins:
# Via grafana-cli
grafana-cli plugins install grafana-worldmap-panel
grafana-cli plugins install grafana-piechart-panel

# Via Docker (plugins are installed at container startup)
docker run -d -p 3000:3000 \
  -e GF_INSTALL_PLUGINS=grafana-worldmap-panel,grafana-piechart-panel \
  grafana/grafana

# Restart Grafana after installation
sudo systemctl restart grafana-server
Loki is Grafana's log aggregation system:
# Run Loki
docker run -d --name=loki -p 3100:3100 grafana/loki
# Configure Loki as data source:
# URL: http://localhost:3100
# LogQL queries:
{job="nginx"} |= "error"
{job="nginx"} | json | status_code >= 500
rate({job="app"}[5m])
Everything in Grafana can be automated via API.
# Get API key: Configuration > API Keys
# Base URL
curl -H "Authorization: Bearer $API_KEY" \
http://localhost:3000/api/dashboards/uid/my-dashboard
# List dashboards
curl -H "Authorization: Bearer $API_KEY" \
http://localhost:3000/api/search?type=dash-db
# Create dashboard
curl -H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-X POST http://localhost:3000/api/dashboards/db \
-d '{"dashboard": {...}}'
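The same calls work from any HTTP client. A Python sketch using only the standard library (the payload fields here are a minimal assumption; consult the dashboard API schema for the full set):

```python
import json
import urllib.request

def dashboard_payload(title: str) -> dict:
    # "id": None asks Grafana to create a new dashboard rather than update one
    return {"dashboard": {"id": None, "title": title, "panels": []},
            "overwrite": False}

def create_dashboard(base_url: str, api_key: str, title: str) -> dict:
    req = urllib.request.Request(
        f"{base_url}/api/dashboards/db",
        data=json.dumps(dashboard_payload(title)).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (requires a running Grafana and a valid API key):
# create_dashboard("http://localhost:3000", API_KEY, "My Dashboard")
```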
Declaratively define dashboards, data sources, and more:
# provisioning/datasources/datasources.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    type: loki
    url: http://loki:3100

# provisioning/dashboards/dashboards.yml
apiVersion: 1

providers:
  - name: 'Dashboards'
    orgId: 1
    folder: 'Monitoring'
    type: file
    options:
      path: /var/lib/grafana/dashboards
# Export dashboard JSON and version control it
curl -H "Authorization: Bearer $API_KEY" \
  http://localhost:3000/api/dashboards/uid/my-dashboard \
  > dashboards/my-dashboard.json

# Or use the grafonnet library to generate dashboards from templates
# https://github.com/grafana/grafonnet
For HA, use multiple Grafana instances:
# Use an external database (PostgreSQL or MySQL)

# grafana.ini:
[database]
type = postgres
host = dbserver:5432
name = grafana
user = grafana
password = yourpassword

# Use shared data sources:
# Configure data sources with the same URL across instances
# Or use a load balancer
# Back up:
# - Database (dashboards, users, settings)
# - Provisioning files
# - Plugins
# - Dashboard JSON exports

# Grafana stores data in:
# - Database (SQLite, PostgreSQL, MySQL)
# - /var/lib/grafana (dashboards, plugins, etc.)
You've completed the Grafana mastery guide. You now know how to:
Next steps: