Sysadmin.id · Monitoring · 9 min read
Server Monitoring with Prometheus, Grafana, and node_exporter
Build a real monitoring stack on Ubuntu 24.04 — Prometheus for metrics, node_exporter on every server, Grafana dashboards, and your first alert rules. No SaaS bill, no black boxes.

If a disk fills up at 3 AM and nothing tells you, your users will — at 9 AM, loudly. Monitoring is the difference between fixing problems and explaining outages.
Prometheus + Grafana is the standard self-hosted answer: Prometheus scrapes and stores metrics, node_exporter exposes them from each server, Grafana turns them into dashboards. All open source, all running on your own hardware, no per-host SaaS pricing.
This guide builds the full stack on Ubuntu 24.04: one monitoring server running Prometheus and Grafana, node_exporter on every machine you want to watch, a ready-made dashboard, and two alert rules that catch the most common failures — a host going dark and a disk filling up.
Prerequisites
Before you start, make sure you have:
- A monitoring server running Ubuntu 24.04 LTS — 2 GB RAM is plenty for a small fleet
- Root or sudo access on it and on every server you want to monitor
- Network reachability from the monitoring server to each target on TCP port 9100
Prometheus pulls; it doesn’t receive. The monitoring server connects out to each target’s node_exporter. Targets never push anything, which is why a dead host is so easy to detect: the scrape simply fails.
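To make the pull model concrete, here is a minimal Python sketch of what a single scrape boils down to: an HTTP GET against the target’s /metrics endpoint, mapped to 1 on success and 0 on any failure. That is the same idea behind the up metric Prometheus records for every scrape, but it is a simplification (real scrapes also parse the body and enforce a scrape timeout), and the scrape_up name and 5-second timeout are illustrative choices, not Prometheus internals.

```python
import http.client
import urllib.request

# Open URLs directly, ignoring any http_proxy environment settings,
# the way the monitoring server connects straight to each target.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def scrape_up(url: str, timeout: float = 5.0) -> int:
    """Return 1 if the scrape succeeded, 0 otherwise (the idea behind 'up')."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            resp.read()  # a real scraper would parse this exposition text
            return 1
    except (OSError, http.client.HTTPException):
        # Connection refused, DNS failure, timeout, HTTP 4xx/5xx, or a
        # non-HTTP answer on the port: the scrape failed.
        return 0
```

From the monitoring server, scrape_up("http://10.0.0.11:9100/metrics") returns 1 while web01’s exporter answers and 0 the moment it doesn’t. No cooperation from the dead host is needed, which is the whole point of pulling.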
Step 1: Install Prometheus
Ubuntu’s repositories carry an old Prometheus, so install the official binary. First create a system user for it — daemons shouldn’t run as root:
sudo useradd --system --no-create-home --shell /usr/sbin/nologin prometheus
Download and install the binaries (check the releases page for the current version and substitute it below):
PROM_VERSION=3.4.1
cd /tmp
curl -LO https://github.com/prometheus/prometheus/releases/download/v${PROM_VERSION}/prometheus-${PROM_VERSION}.linux-amd64.tar.gz
tar xzf prometheus-${PROM_VERSION}.linux-amd64.tar.gz
sudo mv prometheus-${PROM_VERSION}.linux-amd64/prometheus /usr/local/bin/
sudo mv prometheus-${PROM_VERSION}.linux-amd64/promtool /usr/local/bin/
Create the config and data directories:
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
Verify the binary runs:
prometheus --version
Step 2: Configure Prometheus
Create /etc/prometheus/prometheus.yml. This tells Prometheus how often to scrape and what to scrape:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - /etc/prometheus/rules.yml
scrape_configs:
  # Prometheus watches itself: free and useful
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
  # Every server you monitor goes in this list
  - job_name: node
    static_configs:
      - targets:
          - 'localhost:9100'   # the monitoring server itself
          - '10.0.0.11:9100'   # web01 (use your real IPs/hostnames)
          - '10.0.0.12:9100'   # db01
Two settings worth understanding:
- scrape_interval: 15s means Prometheus polls every target every 15 seconds. That’s fine for almost everyone; don’t go below 10s without a reason.
- static_configs is the simplest way to list targets. Once the fleet grows past a handful of servers, switch to file_sd_configs and generate the target list with Ansible instead of editing this file by hand.
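If you go the file_sd_configs route, Prometheus reads target groups from JSON or YAML files (a list of objects, each with a targets list and optional labels) and picks up changes without a restart. Ansible would normally write these files; to show what one looks like, here is a minimal Python sketch that writes a single JSON target file. The path /etc/prometheus/targets/nodes.json, the env label, and the host list are hypothetical. The node job would then point file_sd_configs at /etc/prometheus/targets/*.json instead of using static_configs.

```python
import json
import os
import tempfile
from typing import Optional


def write_file_sd(path: str, hosts: list[str], port: int = 9100,
                  labels: Optional[dict[str, str]] = None) -> None:
    """Atomically write one file_sd target group as JSON."""
    groups = [{
        "targets": [f"{host}:{port}" for host in hosts],
        "labels": labels or {},
    }]
    directory = os.path.dirname(path) or "."
    # Temp file in the same directory + rename: Prometheus never sees a
    # half-written file, and the .tmp suffix keeps it out of a *.json glob.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(groups, fh, indent=2)
            fh.write("\n")
        os.chmod(tmp, 0o644)  # mkstemp creates 0600; the prometheus user must read it
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

For example, write_file_sd("/etc/prometheus/targets/nodes.json", ["10.0.0.11", "10.0.0.12"], labels={"env": "prod"}) produces one group with both :9100 targets. The write-then-rename step matters because Prometheus watches the directory and may read the file at any moment.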
Create an empty rules file for now (we fill it in Step 7) and hand the config directory to the prometheus user:
sudo touch /etc/prometheus/rules.yml
sudo chown -R prometheus:prometheus /etc/prometheus
Step 3: Run Prometheus as a systemd Service
Create /etc/systemd/system/prometheus.service:
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus \
--storage.tsdb.retention.time=30d \
--web.listen-address=127.0.0.1:9090
Restart=on-failure
[Install]
WantedBy=multi-user.target
Two deliberate choices here:
- --web.listen-address=127.0.0.1:9090 binds Prometheus to localhost only. Its web UI has no authentication by default, so don’t expose it. Grafana runs on the same box and reaches it over localhost.
- --storage.tsdb.retention.time=30d keeps 30 days of metrics. Adjust to taste; disk usage is roughly 1–2 GB per month for a handful of servers.
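The Prometheus storage documentation gives a rough sizing formula for estimates like that: retention time in seconds × ingested samples per second × bytes per sample, with 1 to 2 bytes per sample on average. A small Python sketch of that arithmetic follows. The series count you feed it is the assumption that matters most, since node_exporter’s output grows with the number of CPUs, disks, and network interfaces; on a running server, count by (instance) ({job="node"}) shows the real figure per host.

```python
def estimate_tsdb_bytes(series: int, scrape_interval_s: float,
                        retention_days: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Rough TSDB size: retention_seconds * samples_per_second * bytes_per_sample.

    bytes_per_sample defaults to 2, the pessimistic end of the documented
    1-2 bytes/sample average. WAL and compaction headroom are not included.
    """
    samples_per_second = series / scrape_interval_s  # one sample per series per scrape
    retention_seconds = retention_days * 24 * 3600
    return retention_seconds * samples_per_second * bytes_per_sample
```

With a hypothetical 3,000 series (say three hosts at about 1,000 series each), a 15 s scrape interval, and 30 days of retention, estimate_tsdb_bytes(3000, 15, 30) returns 1,036,800,000 bytes, roughly 1 GB, in line with the rule of thumb above. Leave spare disk beyond that for the WAL and compaction.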
Start it:
sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
sudo systemctl status prometheus
To peek at the UI from your laptop, tunnel it over SSH rather than opening the port:
ssh -L 9090:localhost:9090 you@monitor-server
# then open http://localhost:9090 locally
Step 4: Install node_exporter on Every Server
node_exporter exposes the machine’s vitals — CPU, memory, disk, network, filesystem — as metrics on port 9100. It goes on every server you want to monitor, including the monitoring server itself.
sudo useradd --system --no-create-home --shell /usr/sbin/nologin node_exporter
NODE_VERSION=1.9.1
cd /tmp
curl -LO https://github.com/prometheus/node_exporter/releases/download/v${NODE_VERSION}/node_exporter-${NODE_VERSION}.linux-amd64.tar.gz
tar xzf node_exporter-${NODE_VERSION}.linux-amd64.tar.gz
sudo mv node_exporter-${NODE_VERSION}.linux-amd64/node_exporter /usr/local/bin/
Create /etc/systemd/system/node_exporter.service:
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target
Start it and check it answers:
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
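curl -s localhost:9100/metrics | head
You should see plaintext metrics scroll by. The first lines are usually the exporter’s own Go runtime metrics (go_*); to check the host metrics, run curl -s localhost:9100/metrics | grep node_cpu_seconds_total | head and look for lines like node_cpu_seconds_total{cpu="0",mode="idle"}.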
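To see what Prometheus actually reads from that endpoint, here is a deliberately simplified Python parser for the text exposition format: lines starting with # carry HELP and TYPE metadata, and every other line is a metric name, optional label pairs in braces, a value, and an optional timestamp. It assumes label values contain no escaped quotes or braces, so treat it as a learning aid rather than a production parser.

```python
import re
from typing import Iterator

# name{labels} value [timestamp]; label values must not contain '"' or '}'.
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)(?:\s+(?P<ts>-?\d+))?\s*$'
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text: str) -> Iterator[tuple[str, dict[str, str], float]]:
    """Yield (name, labels, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # HELP/TYPE metadata
            continue
        m = _SAMPLE.match(line)
        if m is None:
            continue  # something this simplified parser doesn't handle
        labels = dict(_LABEL.findall(m.group("labels") or ""))
        # float() also accepts the format's NaN, +Inf and -Inf values
        yield m.group("name"), labels, float(m.group("value"))
```

Feeding it the output of the curl command above yields tuples such as ("node_cpu_seconds_total", {"cpu": "0", "mode": "idle"}, <seconds>), which is exactly the name, label set, and value that Prometheus stores as one series.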
Doing this on ten servers by hand gets old fast. This exact sequence — user, binary, unit file, firewall rule — is a perfect first Ansible playbook. That’s a separate post.
Step 5: Lock Down the Exporter Port
node_exporter has no authentication either, and its metrics leak more about a box than you’d like (kernel version, mount points, NIC names). Only the monitoring server should reach port 9100. On each monitored host, with UFW:
sudo ufw allow from <MONITORING_SERVER_IP> to any port 9100 proto tcp
Don’t add a general allow 9100 rule. If your servers talk over a private network or WireGuard tunnel, point Prometheus at the private IPs and keep 9100 closed on the public interface entirely.
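To confirm the rule behaves as intended, test the port from both sides. In this minimal Python sketch (the port_open name and 3-second timeout are illustrative), running it against a monitored host from the monitoring server should return True, and running it from any other machine should return False. With UFW’s default deny policy the packets are silently dropped, so that False usually comes from a timeout rather than an immediate refusal.

```python
import socket


def port_open(host: str, port: int = 9100, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (filtered)
        return False
```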
Now restart Prometheus to pick up your target list, and confirm every target is green at Status → Targets in the Prometheus UI (through the SSH tunnel):
sudo systemctl restart prometheus
Every target should show state UP. If one is DOWN, the error message on that page tells you why; it’s almost always the firewall.
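The same information is available from the Prometheus HTTP API at /api/v1/targets, which is handy for scripting a check after every config change. A minimal Python sketch, assuming the SSH tunnel from Step 3 so the API is reachable at http://localhost:9090: it lists every active target whose health is not "up", together with its lastError. The function names are illustrative.

```python
import json
import urllib.request


def unhealthy_targets(payload: dict) -> list[tuple[str, str]]:
    """Return (scrapeUrl, lastError) for active targets that are not up."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl", "?"), t.get("lastError", ""))
        for t in targets
        if t.get("health") != "up"  # health is "up", "down" or "unknown"
    ]


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch the raw /api/v1/targets JSON document."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)
```

Usage: for url, err in unhealthy_targets(fetch_targets()): print(url, err). An empty result means every target is UP.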
Step 6: Install Grafana and Import a Dashboard
Grafana ships its own apt repository — install from there, on the monitoring server:
sudo apt install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana
sudo systemctl enable --now grafana-server
Open http://<monitoring-server>:3000 (allow the port for your IP first: sudo ufw allow from <YOUR_IP> to any port 3000 proto tcp). Log in with admin / admin; Grafana prompts you to set a new password on first login, so don’t skip it.
Then two quick setup steps:
- Add the data source: Connections → Data sources → Add data source → Prometheus. Set the URL to http://localhost:9090 and click Save & test.
- Import a dashboard: Dashboards → New → Import, enter dashboard ID 1860 (“Node Exporter Full”), pick your Prometheus data source, and click Import.
That’s the payoff moment: CPU, memory, disk I/O, network, filesystem usage — per host, live, with a host selector at the top. You didn’t build a single panel.
Serving Grafana properly: for anything beyond a lab, put it behind Nginx with a Let’s Encrypt certificate instead of exposing port 3000 — the same reverse-proxy setup from my earlier Nginx post applies verbatim.
Step 7: Add Alert Rules
Dashboards are for looking; alerts are for sleeping. The two rules that earn their keep on day one: a host stopped responding and a disk is filling up.
Edit /etc/prometheus/rules.yml:
groups:
  - name: node-alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: '{{ $labels.instance }} is not responding to scrapes'
      - alert: DiskAlmostFull
        expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) < 0.15
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: '{{ $labels.instance }} {{ $labels.mountpoint }} has under 15% free space'
How they work:
- up == 0 relies on up, a metric Prometheus records for every scrape: 1 if the target answered, 0 if not. The for: 2m means the expression must stay true for two full minutes before the alert fires, so a brief blip doesn’t page anyone.
- DiskAlmostFull divides available bytes by total bytes per filesystem and alerts when less than 15% is free. The fstype filter skips tmpfs and container overlay mounts, which are memory-backed or container layers rather than real disks and would only add noise.
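The for: clause is easiest to understand as a small state machine. Below is a toy Python model, not Prometheus’s code: at each rule evaluation (every 15 s here, matching evaluation_interval) the alert stays pending while the expression is true, is promoted to firing once it has been true for the full duration, and drops back to inactive the moment the expression is false. The AlertState class is a hypothetical illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    """Toy model of one alert series across rule evaluations."""
    for_seconds: float
    active_since: Optional[float] = None  # first evaluation where expr was true
    state: str = "inactive"               # inactive -> pending -> firing

    def evaluate(self, now: float, expr_true: bool) -> str:
        if not expr_true:
            # Condition cleared: back to inactive and the hold timer resets.
            self.active_since = None
            self.state = "inactive"
        else:
            if self.active_since is None:
                self.active_since = now
            held = now - self.active_since
            self.state = "firing" if held >= self.for_seconds else "pending"
        return self.state


# InstanceDown: for: 2m, evaluated every 15 s while up == 0 stays true.
alert = AlertState(for_seconds=120)
timeline = [(t, alert.evaluate(t, expr_true=True)) for t in range(0, 136, 15)]
# pending at t = 0..105 s, firing from t = 120 s
```

In a real deployment the page arrives a little later than 2 minutes after the host dies, because a scrape has to fail first and rule evaluation runs on its own 15 s schedule.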
Always validate before restarting — a syntax error in the rules file stops Prometheus from starting:
promtool check config /etc/prometheus/prometheus.yml
sudo systemctl restart prometheus
Firing alerts appear under Alerts in the Prometheus UI. To actually deliver them (email, Telegram, Slack), you add Alertmanager, which deserves its own post. In the meantime, Grafana can also alert on any panel if you need notifications today.
PromQL Starters
The dashboard covers most needs, but sooner or later you’ll want to query directly. Three expressions worth keeping:
| What | Query |
|---|---|
| CPU usage % per host | 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) |
| Memory available % | node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 |
| Disk free % per mount | node_filesystem_avail_bytes{fstype!="tmpfs",fstype!="overlay"} / node_filesystem_size_bytes * 100 |
Paste any of them into the Prometheus UI’s expression browser or a new Grafana panel.
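The CPU query is the least obvious of the three. node_cpu_seconds_total{mode="idle"} is a counter of idle seconds per CPU, so its per-second rate is the fraction of time that CPU sat idle; averaging across CPUs and subtracting from 100 gives busy percent. The Python below reproduces that arithmetic from two snapshots of the counters. It uses a plain delta divided by elapsed time and treats a decrease as a counter reset; PromQL’s rate() additionally extrapolates to the edges of the window, so its results differ slightly.

```python
def cpu_busy_percent(idle_before: dict[str, float],
                     idle_after: dict[str, float],
                     elapsed_s: float) -> float:
    """Busy CPU % for one host from two snapshots of per-CPU idle counters.

    Mirrors: 100 - avg(rate(node_cpu_seconds_total{mode="idle"}[...])) * 100
    """
    rates = []
    for cpu, before in idle_before.items():
        after = idle_after[cpu]
        # A counter only goes down when it resets (e.g. a reboot); then the
        # whole new value counts as the increase.
        increase = after - before if after >= before else after
        rates.append(increase / elapsed_s)  # idle seconds per second, 0..1
    return 100 - (sum(rates) / len(rates)) * 100
```

For example, if CPU 0 gained 45 idle seconds and CPU 1 gained 30 over 60 seconds, the idle rates are 0.75 and 0.5, their average is 0.625, and the host is 37.5% busy.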
Common Issues and Fixes
Target shows DOWN with “connection refused”
The exporter isn’t running or the firewall is blocking the monitoring server. Check systemctl status node_exporter on the target, then confirm the UFW rule from Step 5 names the monitoring server’s actual source IP — on a private network that’s the private IP, not the public one.
Target shows DOWN with “context deadline exceeded”
The TCP connection opens but the scrape times out — usually a routing or MTU problem on VPN links, or a box under extreme load. Try curl http://<target>:9100/metrics from the monitoring server to see the raw behavior.
Grafana dashboard is empty but targets are UP
Almost always the data source or the job label. Open the dashboard’s host dropdown — if it’s empty, the dashboard’s queries expect the default Prometheus data source; re-import the dashboard and pick yours explicitly.
Prometheus won’t start after a config change
Run promtool check config /etc/prometheus/prometheus.yml — it points at the exact line. YAML indentation errors in scrape_configs are the usual culprit.
Metrics stop after a reboot
The service isn’t enabled. The enable --now commands in the steps above take care of this, but services you started manually while testing won’t survive a reboot until you run systemctl enable on them.
Summary
Here’s what you built:
- Prometheus installed from official binaries, running as an unprivileged systemd service, bound to localhost
- node_exporter on every server, exposing system metrics on port 9100
- Firewall rules so only the monitoring server can scrape the exporters
- Grafana with the Node Exporter Full dashboard — full fleet visibility with zero panels built by hand
- Two alert rules that catch dead hosts and filling disks before your users do
The natural next steps: Alertmanager for real notifications, and Ansible to roll node_exporter across the fleet in one command — both are queued up as future posts.
Running servers without monitoring, or drowning in a noisy setup that pages you for nothing? Get in touch — monitoring is one of the most satisfying things to fix.
- prometheus
- grafana
- node-exporter
- monitoring
- linux
- ubuntu



