Linux Sysadmin & Server Management Freelancer from Indonesia

If a disk fills up at 3 AM and nothing tells you, your users will — at 9 AM, loudly. Monitoring is the difference between fixing problems and explaining outages.

Prometheus + Grafana is the standard self-hosted answer: Prometheus scrapes and stores metrics, node_exporter exposes them from each server, Grafana turns them into dashboards. All open source, all running on your own hardware, no per-host SaaS pricing.

This guide builds the full stack on Ubuntu 24.04: one monitoring server running Prometheus and Grafana, node_exporter on every machine you want to watch, a ready-made dashboard, and two alert rules that catch the most common failures — a host going dark and a disk filling up.

Prerequisites

Before you start, make sure you have:

A monitoring server running Ubuntu 24.04 LTS — 2 GB RAM is plenty for a small fleet
Root or sudo access on it and on every server you want to monitor
Network reachability from the monitoring server to each target on TCP port 9100

Prometheus pulls, it doesn’t receive. The monitoring server connects out to each target’s node_exporter. Targets never push anything, which is why a dead host is so easy to detect — the scrape just fails.

Step 1: Install Prometheus

Ubuntu’s repositories carry an old Prometheus, so install the official binary. First create a system user for it — daemons shouldn’t run as root:

sudo useradd --system --no-create-home --shell /usr/sbin/nologin prometheus

Download and install the binaries (check the releases page for the current version and substitute it below):

PROM_VERSION=3.4.1
cd /tmp
curl -LO https://github.com/prometheus/prometheus/releases/download/v${PROM_VERSION}/prometheus-${PROM_VERSION}.linux-amd64.tar.gz
tar xzf prometheus-${PROM_VERSION}.linux-amd64.tar.gz

sudo mv prometheus-${PROM_VERSION}.linux-amd64/prometheus /usr/local/bin/
sudo mv prometheus-${PROM_VERSION}.linux-amd64/promtool /usr/local/bin/
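Anything that ends up in /usr/local/bin is worth verifying first. A minimal sketch of a checksum check that fits between the curl and tar steps, assuming the release page publishes a sha256sums.txt asset next to the tarballs (confirm this on the page for your version). verify_download is a hypothetical helper name:

```shell
# verify_download FILE SUMS_FILE
# Succeeds only if FILE's SHA-256 matches its entry in SUMS_FILE.
# Assumes SUMS_FILE uses the "<hash>  <filename>" layout that sha256sum writes.
verify_download() {
  file=$1
  sums=$2
  # Pick the line for this file, then let sha256sum compare it with the file on disk.
  awk -v f="$(basename "$file")" '$2 == f' "$sums" |
    (cd "$(dirname "$file")" && sha256sum -c - >/dev/null 2>&1)
}
```

With the sums file downloaded into /tmp alongside the tarball, `verify_download prometheus-${PROM_VERSION}.linux-amd64.tar.gz sha256sums.txt && echo OK` should print OK before you extract anything.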

Create the config and data directories:

sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus

Verify the binary runs:

prometheus --version

Step 2: Configure Prometheus

Create /etc/prometheus/prometheus.yml. This tells Prometheus how often to scrape and what to scrape:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - /etc/prometheus/rules.yml

scrape_configs:
  # Prometheus watches itself — free and useful
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  # Every server you monitor goes in this list
  - job_name: node
    static_configs:
      - targets:
          - 'localhost:9100' # the monitoring server itself
          - '10.0.0.11:9100' # web01 — use your real IPs/hostnames
          - '10.0.0.12:9100' # db01

Two settings worth understanding:

scrape_interval: 15s — Prometheus polls every target every 15 seconds. Fine for almost everyone; don’t go below 10s without a reason.
static_configs — the simplest way to list targets. Once the fleet grows past a handful of servers, switch to file_sd_configs and generate the target list with Ansible instead of editing this file by hand.
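The switch to file_sd_configs can be sketched as follows. This assumes the node job reads target files from /etc/prometheus/targets/ (a hypothetical directory). make_targets is an illustrative helper, not part of Prometheus; in practice Ansible would template the same JSON:

```shell
# In prometheus.yml the node job would become (shown as a comment, since this block is shell):
#   - job_name: node
#     file_sd_configs:
#       - files: ['/etc/prometheus/targets/*.json']

# make_targets HOST...
# Prints one file_sd target group as JSON, with a "host:9100" entry per argument.
make_targets() {
  printf '[\n  {\n    "targets": ['
  sep=''
  for host in "$@"; do
    printf '%s"%s:9100"' "$sep" "$host"
    sep=', '
  done
  printf ']\n  }\n]\n'
}

make_targets 10.0.0.11 10.0.0.12
```

Redirect the output to a file such as /etc/prometheus/targets/node.json. Prometheus watches those files and applies changes without a restart.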

Create an empty rules file for now (we fill it in Step 7) and hand the config directory to the prometheus user:

sudo touch /etc/prometheus/rules.yml
sudo chown -R prometheus:prometheus /etc/prometheus

Step 3: Run Prometheus as a systemd Service

Create /etc/systemd/system/prometheus.service:

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --storage.tsdb.retention.time=30d \
  --web.listen-address=127.0.0.1:9090
Restart=on-failure

[Install]
WantedBy=multi-user.target

Two deliberate choices here:

--web.listen-address=127.0.0.1:9090 binds Prometheus to localhost only. Its web UI has no authentication — don’t expose it. Grafana runs on the same box and reaches it over localhost.
--storage.tsdb.retention.time=30d keeps 30 days of metrics. Adjust to taste; disk usage is roughly 1–2 GB per month for a handful of servers.
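The disk figure follows from simple arithmetic: samples ingested per second, times bytes per sample, times the retention period. A rough sketch, assuming about 1,000 series per node_exporter and 1 to 2 bytes per sample on disk (both are assumptions that vary with enabled collectors and compression). estimate_mb is a hypothetical helper:

```shell
# estimate_mb HOSTS SERIES_PER_HOST SCRAPE_INTERVAL_S RETENTION_DAYS BYTES_PER_SAMPLE
# Prints an approximate TSDB size in decimal megabytes (integer arithmetic).
estimate_mb() {
  echo $(( $1 * $2 * $4 * 86400 * $5 / $3 / 1000000 ))
}

# 5 hosts, ~1000 series each, 15s scrapes, 30 days, 2 bytes per sample
estimate_mb 5 1000 15 30 2   # prints 1728
```

Halving the bytes per sample gives 864, so a five-host fleet lands somewhere between roughly 0.9 and 1.7 GB for 30 days under these assumptions.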

Start it:

sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
sudo systemctl status prometheus

To peek at the UI from your laptop, tunnel it over SSH rather than opening the port:

ssh -L 9090:localhost:9090 you@monitor-server
# then open http://localhost:9090 locally

Step 4: Install node_exporter on Every Server

node_exporter exposes the machine’s vitals — CPU, memory, disk, network, filesystem — as metrics on port 9100. It goes on every server you want to monitor, including the monitoring server itself.

sudo useradd --system --no-create-home --shell /usr/sbin/nologin node_exporter

NODE_VERSION=1.9.1
cd /tmp
curl -LO https://github.com/prometheus/node_exporter/releases/download/v${NODE_VERSION}/node_exporter-${NODE_VERSION}.linux-amd64.tar.gz
tar xzf node_exporter-${NODE_VERSION}.linux-amd64.tar.gz
sudo mv node_exporter-${NODE_VERSION}.linux-amd64/node_exporter /usr/local/bin/

Create /etc/systemd/system/node_exporter.service:

[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target

Start it and check it answers:

sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter

curl -s localhost:9100/metrics | head

You should see plaintext metrics scroll by — lines like node_cpu_seconds_total{cpu="0",mode="idle"}.
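The format is simple enough to pick apart with standard tools: lines starting with # are comments, and every other line is a metric name, optional {labels}, and a value. A small sketch of that structure. metric_names is a hypothetical helper, and the sample lines are illustrative rather than real output:

```shell
# metric_names: read exposition-format text on stdin, print each distinct metric name once.
metric_names() {
  grep -v '^#' |          # drop "# HELP" and "# TYPE" comment lines
    sed 's/[{ ].*//' |    # cut at the first "{" (labels) or space (value)
    sort -u
}

# Illustrative input; on a real host use: curl -s localhost:9100/metrics | metric_names
metric_names <<'EOF'
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
node_cpu_seconds_total{cpu="0",mode="user"} 67.8
EOF
```

The two node_cpu_seconds_total lines collapse into one name: they are the same metric with different label sets, which is how Prometheus models per-CPU and per-mode values.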

Doing this on ten servers by hand gets old fast. This exact sequence — user, binary, unit file, firewall rule — is a perfect first Ansible playbook. That’s a separate post.

Step 5: Lock Down the Exporter Port

node_exporter has no authentication either, and its metrics leak more about a box than you’d like (kernel version, mount points, NIC names). Only the monitoring server should reach port 9100. On each monitored host, with UFW:

sudo ufw allow from <MONITORING_SERVER_IP> to any port 9100 proto tcp

Don’t add a general allow 9100 rule. If your servers talk over a private network or WireGuard tunnel, point Prometheus at the private IPs and keep 9100 closed on the public interface entirely.

Now restart Prometheus to pick up your target list, and confirm every target is green at Status → Targets in the Prometheus UI (through the SSH tunnel):

sudo systemctl restart prometheus

Every target should show state UP. If one is DOWN, the error message on that page tells you why — it’s almost always the firewall.
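A scrape is just an HTTP GET against /metrics, so a failing target can be reproduced by hand from the monitoring server. A minimal sketch, assuming curl is installed. probe is a hypothetical helper and the 3-second timeout is an arbitrary choice:

```shell
# probe HOST:PORT
# Prints UP if the exporter answers an HTTP GET on /metrics within 3 seconds, DOWN otherwise.
probe() {
  if curl -sf -o /dev/null --max-time 3 "http://$1/metrics"; then
    echo "$1 UP"
  else
    # $? here is curl's exit code from the condition above
    echo "$1 DOWN (curl exit code $?)"
  fi
}
```

Run it over the target list with `for t in localhost:9100 10.0.0.11:9100 10.0.0.12:9100; do probe "$t"; done`. A curl exit code of 7 means the connection failed outright, and 28 means it timed out, which usually points at packets being dropped along the way.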

Step 6: Install Grafana and Import a Dashboard

Grafana ships its own apt repository — install from there, on the monitoring server:

sudo apt install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list

sudo apt update
sudo apt install -y grafana
sudo systemctl enable --now grafana-server

Open http://<monitoring-server>:3000 (allow the port for your IP: sudo ufw allow from <YOUR_IP> to any port 3000 proto tcp). Log in with admin / admin. Grafana prompts for a new password immediately; set one rather than skipping the prompt.

Then two quick setup steps:

Add the data source — Connections → Data sources → Add data source → Prometheus. Set the URL to http://localhost:9090 and click Save & test.
Import a dashboard — Dashboards → New → Import, enter dashboard ID 1860 (“Node Exporter Full”), pick your Prometheus data source, and click Import.

That’s the payoff moment: CPU, memory, disk I/O, network, filesystem usage — per host, live, with a host selector at the top. You didn’t build a single panel.

Serving Grafana properly: for anything beyond a lab, put it behind Nginx with a Let’s Encrypt certificate instead of exposing port 3000 — the same reverse-proxy setup from my earlier Nginx post applies verbatim.

Step 7: Add Alert Rules

Dashboards are for looking; alerts are for sleeping. The two rules that earn their keep on day one: a host stopped responding and a disk is filling up.

Edit /etc/prometheus/rules.yml:

groups:
  - name: node-alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: '{{ $labels.instance }} is not responding to scrapes'

      - alert: DiskAlmostFull
        expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) < 0.15
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: '{{ $labels.instance }} {{ $labels.mountpoint }} has under 15% free space'

How they work:

up == 0 — up is a metric Prometheus generates for every scrape: 1 if the target answered, 0 if not. The for: 2m means the target must be down for two full minutes before the alert fires — no pages for a blip.
DiskAlmostFull divides available bytes by total bytes per filesystem and alerts under 15%. The fstype filter skips tmpfs mounts (RAM-backed) and container overlay mounts (which mirror the filesystem underneath them); neither says anything about real disk space, so both are noise.
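The effect of the for clause is easier to see as a small simulation. This is a simplified model of the pending-to-firing logic, assuming one rule evaluation per interval and ignoring real scrape timing. simulate_for is a hypothetical helper:

```shell
# simulate_for EVAL_INTERVAL_S FOR_S UP_VALUE...
# Each UP_VALUE is the value of `up` seen at one consecutive rule evaluation.
simulate_for() {
  interval=$1; hold=$2; shift 2
  t=0; active_since=''
  for up in "$@"; do
    if [ "$up" -eq 0 ]; then
      # The condition is true: remember when it first became true.
      [ -n "$active_since" ] || active_since=$t
      if [ $((t - active_since)) -ge "$hold" ]; then state=firing; else state=pending; fi
    else
      # One good scrape resets the clock.
      active_since=''; state=inactive
    fi
    echo "t=${t}s up=$up $state"
    t=$((t + interval))
  done
}

# A one-scrape blip followed by a real outage, with 15s evaluations and for: 2m
simulate_for 15 120 0 1 0 0 0 0 0 0 0 0 0
```

The blip at the start only ever reaches pending. The outage that follows stays pending until the condition has held for the full 120 seconds, and only then does the state change to firing.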

Always validate before restarting, because a syntax error in the rules file stops Prometheus from starting. Checking the main config also validates every file listed under rule_files:

promtool check config /etc/prometheus/prometheus.yml
sudo systemctl restart prometheus

Firing alerts appear under Alerts in the Prometheus UI. To actually deliver them — email, Telegram, Slack — you add Alertmanager, which deserves its own post. In the meantime, Grafana can also alert on any panel if you need notifications today.

PromQL Starters

The dashboard covers most needs, but sooner or later you’ll want to query directly. Three expressions worth keeping:

What	Query
CPU usage % per host	`100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)`
Memory available %	`node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100`
Disk free % per mount	`node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes * 100`

Paste any of them into the Prometheus UI’s expression browser or a new Grafana panel.
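The same expressions can also be run from a shell through the Prometheus HTTP query API, which is handy for scripts. A sketch, assuming Prometheus listens on localhost:9090 as configured in Step 3. promq and the PROM_URL variable are hypothetical names:

```shell
# promq 'EXPRESSION'
# Sends an instant query to the /api/v1/query endpoint and prints the raw JSON response.
promq() {
  curl -sG --max-time 5 "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}
```

On the monitoring server, `promq 'node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100'` returns one result per host. Letting curl URL-encode the expression avoids quoting problems with braces and operators.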

Common Issues and Fixes

Target shows DOWN with “connection refused”

The host answered, but nothing accepted the connection on port 9100: usually the exporter isn’t running, occasionally a firewall is set to reject rather than drop. Check systemctl status node_exporter on the target, then confirm the UFW rule from Step 5 names the monitoring server’s actual source IP; on a private network that’s the private IP, not the public one.

Target shows DOWN with “context deadline exceeded”

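The scrape hit its timeout. Either the connection never completed because packets are being silently dropped (UFW’s default deny policy drops rather than rejects, so a missing or wrong allow rule from Step 5 shows up here), or the connection opened and the response stalled, which usually means a routing or MTU problem on VPN links or a box under extreme load. Try curl http://<target>:9100/metrics from the monitoring server to see the raw behavior.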

Grafana dashboard is empty but targets are UP

Almost always the data source or the job label. Open the dashboard’s host dropdown — if it’s empty, the dashboard’s queries expect the default Prometheus data source; re-import the dashboard and pick yours explicitly.

Prometheus won’t start after a config change

Run promtool check config /etc/prometheus/prometheus.yml — it points at the exact line. YAML indentation errors in scrape_configs are the usual culprit.

Metrics stop after a reboot

The service was started but never enabled. The enable --now commands in the steps above cover it, but a service started by hand with systemctl start while testing won’t come back after a reboot until you enable it.
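A quick way to check before the next reboot finds out for you. A small sketch; check_enabled is a hypothetical helper built on systemctl is-enabled:

```shell
# check_enabled UNIT...
# Reports, for each unit, whether systemd will start it at boot.
check_enabled() {
  for unit in "$@"; do
    if systemctl is-enabled --quiet "$unit" 2>/dev/null; then
      echo "$unit: enabled"
    else
      echo "$unit: NOT enabled"
    fi
  done
}
```

On the monitoring server, run `check_enabled prometheus node_exporter grafana-server`; on monitored hosts, `check_enabled node_exporter` is enough.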

Summary

Here’s what you built:

Prometheus installed from official binaries, running as an unprivileged systemd service, bound to localhost
node_exporter on every server, exposing system metrics on port 9100
Firewall rules so only the monitoring server can scrape the exporters
Grafana with the Node Exporter Full dashboard — full fleet visibility with zero panels built by hand
Two alert rules that catch dead hosts and filling disks before your users do

The natural next steps: Alertmanager for real notifications, and Ansible to roll node_exporter across the fleet in one command — both are queued up as future posts.

Running servers without monitoring, or drowning in a noisy setup that pages you for nothing? Get in touch — monitoring is one of the most satisfying things to fix.