Effective monitoring shouldn’t require complex infrastructure. In this guide, Noveo Senior Developer Andrey walks through setting up Grafana’s monitoring stack with Docker—starting with Loki for centralized logs. Whether you’re debugging an application or looking for better system visibility, this practical approach balances simplicity and power, even for smaller deployments. Let’s break it down step by step.
Intro
The Grafana monitoring stack, consisting of Alloy, Loki, Prometheus, and Tempo, is a modern distributed monitoring system written in Go, intended for collecting monitoring data from backend applications running on one or more servers.
It is also usable for monitoring mobile and desktop applications (this depends on the level of OpenTelemetry support for your chosen language, at least if we go with OTLP; for example, OpenTelemetry for Java reports a Stable level for Traces, Metrics, and Logs, so full support is available for Android Java apps), and to some degree for client-side monitoring of web frontends too.
__________
Note
This series of articles is written with the assumption that we will use the OpenTelemetry protocol as the main one, at least for tracing-related activities (and press it into service for metrics and logs too where necessary). Do know, however, that Grafana Alloy, the agent that collects traces, logs, and metrics, supports plenty of alternative protocols, and your language's support for one of them may well be better than its OpenTelemetry support.
__________
We will walk through configuring this monitoring stack with a Docker-based approach, aimed at homelabs and at companies that keep their infrastructure simple.
The article aims to make this monitoring system accessible to a large number of people (for their homelabs and basic production setups), and for this reason we go with the Docker approach instead of the Kubernetes one.
If you run a serious high-load production system, it is better to run Grafana/Loki/Mimir (instead of Prometheus)/Tempo in Kubernetes, since its ecosystem of Helm charts already makes it easy to run them in a horizontally scalable way that can take on much larger workloads.
The article will dive into configuring the monitoring with Docker Compose and OpenTofu (Terraform).
When in doubt, check the Terraform-related code in the infra repo as the source of truth, since it is the version I run for my homelab.
It is worth configuring this distributed monitoring stack even if you have only one backend application running on your servers (or even only a mobile app). Well-configured monitoring makes debugging your application significantly easier, and a well-configured logging backend lets you filter data by any key/value in the log records. It is even possible to build graphical dashboards based on logging information alone to get an overview of what matters!
__________
Note
Grafana Loki became significantly more pleasant to use with the introduction in 2024 of the new Drilldown interface, which simplifies navigation considerably.
The old "Explore" interface still has some use cases that the new Drilldown interfaces do not cover yet, but the gap is closing quickly, and for the logging part I believe there is no longer much justification for opening the old "Explore" interface.
__________
Tip
I recommend investing properly in other forms of monitoring, such as metrics: they let you keep an eye on the healthy functioning of your application in a highly performant way and simplify investigating problems introduced by your next deployments.
Plenty of open-source solutions provide metrics out of the box for any type of infrastructure object.
Depending on your application's needs, it is also a good idea to invest in tracing for deeper transparency into its performance problems.
Configuration beyond Loki will be covered in separate follow-up articles to keep the current article at a reasonable length.
__________
Tip
We can build graphical dashboards based on logs alone!
It is not as efficient as using metrics, but it is possible, works as a last resort, and is good enough for low-load systems.
__________
Configuration
Getting a server
You need to get a Linux server for deployment somewhere (it can be your own bare-metal server, or a VPS rented from a cloud provider).
I can recommend a Hetzner server, since the provider is very minimalistic and high quality, with quite low prices.
Its Arm64 server prices look like a killer feature to me.
A CAX21 server (arm64, 4 vCPU, 8 GB RAM) should be more than enough, overkill even, for our homelab example purposes. You can squeeze things into a CAX11 (arm64, 2 vCPU, 4 GB RAM) if desired, but be mindful: preferably turn on swap just in case, as a fallback to handle the workload of everything we put in at the start. (Using swap is not recommended for production at all, but for a low-load homelab it should be fine.)
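If you do go with the smaller CAX11 and want swap as that safety net, a common way to add a 2 GB swap file on Ubuntu looks roughly like this (size and path are just an illustration):

# Run on the server as root: create, protect, format, and enable a 2 GB swap file.
fallocate -l 2G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# Persist it across reboots.
echo '/swapfile none swap sw 0 0' >> /etc/fstab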
OpenTofu (Terraform) code is provided to configure things as infrastructure as code.
See this link for up-to-date code in case the article becomes outdated.
module "node_darklab_cax21" {
source = "../modules/hetzner_server"
name = "darklab"
hardware = "cax21"
backups = true
ssh_key_id = module.ssh_key.id
datacenter = "hel1-dc2"
}
It utilizes code from this folder:
https://github.com/darklab8/infra/tree/master/tf/modules/hetzner_server
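Assuming you have OpenTofu installed and a Hetzner API token at hand (the hcloud provider can pick it up from the HCLOUD_TOKEN environment variable), applying such a module looks roughly like this; the directory name is only an assumption about the repo layout:

export HCLOUD_TOKEN="your-hetzner-api-token"   # placeholder
cd tf/production                               # assumed directory containing the module call above
tofu init                                      # download providers and modules
tofu plan                                      # review what will be created
tofu apply                                     # create the CAX21 server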
__________
Caution
I highly encourage you to attach Hetzner's firewall to the server, configured along the lines of this code:
https://github.com/darklab8/infra/blob/master/tf/modules/hetzner_server/firewall.tf
Allow only traffic on ports 80 and 443 over TCP and UDP (for our Caddy reverse proxy), port 22 over TCP (for SSH), and ICMP for ping.
The configured cloud-level firewall ensures that if you forget something about Docker security, you still have a nice fallback protecting your containers.
That is important with Docker, which by default binds applications to 0.0.0.0 when you expose them with -p 8000:8000, bypassing host-level firewalls like ufw (see the example after this caution).
A cloud-level firewall is your last safety net here in case of human error and misconfiguration.
__________
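To illustrate the 0.0.0.0 binding issue from the caution above, here is a quick sketch (the image name and port are arbitrary examples):

# Publishes port 8000 on ALL interfaces: reachable from the internet even if ufw
# "blocks" it, because Docker manages its own iptables rules that bypass ufw.
docker run -d -p 8000:8000 some-internal-service

# Publishes only on the loopback interface: reachable from the server itself
# (or through an SSH tunnel), but not from outside.
docker run -d -p 127.0.0.1:8000:8000 some-internal-service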
If you configure the server manually, please create an SSH key with the ssh-keygen command (usually available out of the box on Linux, at least as long as git is installed). You can make it available on Windows too by opening the Git Bash console that comes with the git installation.
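A minimal sketch, assuming you name the key the same way as in the config record below:

# Generate an ed25519 key pair; the file name just matches the example below.
ssh-keygen -t ed25519 -f ~/.ssh/id_rsa.darklab -C "homelab access key"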
Assuming you created everything correctly, you can make a record in ~/.ssh/config
Host homelab
    # replace with the IP address shown in the Hetzner interface
    HostName 65.109.15.108
    User root
    # replace with the name of your SSH key
    IdentityFile ~/.ssh/id_rsa.darklab
    IdentitiesOnly yes
And connect to it using the ssh homelab command. Once you connect and verify the host key, you will see the inside of the server and be ready for the next steps:
$ ssh homelab
The authenticity of host '65.109.15.108 (65.109.15.108)' can't be established.
ED25519 key fingerprint is SHA256:mQ5+B+9e/1xn3GmRvd0pBnINxtjiLazwT8CMNvI7YcU.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '65.109.15.108' (ED25519) to the list of known hosts.
Welcome to Ubuntu 24.04.1 LTS (GNU/Linux 6.8.0-52-generic aarch64)
# bla bla bla, other long text
root@homelab-example:~#
Configuring DNS
Buy a domain for your server, so that we can later open the website at a nice named address like https://homelab.dd84ai.com with TLS encryption. We can optionally use free DNS hosting from deSEC.
Create an A record pointing to the public IP of the server.
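To check that the record has propagated before moving on, you can query it directly (the domain here is just this article's example):

# Should print the server's public IPv4 address once the A record is live.
dig +short homelab.dd84ai.com A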
Raising Docker containers
Once we get the server, we can proceed to the next step of configuring our monitoring stack.
We assume it will be served behind Caddy, which handles Let’s Encrypt certificates and reverse proxying.
__________
Note
We assume you have installed Docker Engine and work from Linux.
The instructions may also work on WSL2 with Docker Engine or with Docker Desktop, but this is not guaranteed.
With Docker available locally, you will be able to apply the instructions from this tutorial without working on the server directly.
Instructions for Docker Engine installations can be found here:
https://docs.docker.com/engine/install/ubuntu
If you used a Docker app image from Hetzner, then Docker is already installed on the server.
As a last resort, you can just execute the tutorial instructions directly on the server; in that case, skip the DOCKER_HOST instruction mentioned later.
__________
We configure the stack with Docker Compose.
__________
Note
For the convenience of rotating service images from CI, some of the services run as Docker Swarm services; for that we use a Swarm (overlay) Docker network, which requires running docker swarm init on your server, as sketched below.
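A minimal sketch of that one-time step, run from your workstation over SSH (homelab is the SSH alias configured earlier):

export DOCKER_HOST=ssh://root@homelab
# If the server has several IPs, add --advertise-addr <public-ip>.
docker swarm init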
__________
Tip
You can additionally check the OpenTofu (Terraform) configuration at:
https://github.com/darklab8/infra/blob/master/tf/modules/docker_stack/monitoring.tf
__________
Important
We provide the docker-compose way of configuration as a demo example, because more devs are likely to be familiar and comfortable with docker-compose than with Terraform.
We ourselves use Terraform for this configuration and recommend using it instead of docker-compose if you can.
The book "Terraform: Up & Running" is an excellent start.
__________
docker-compose.yaml
version: "3.8"
services:
caddy:
image: lucaslorentz/caddy-docker-proxy:2.9.1
container_name: caddy
restart: always
networks:
- caddy
ports:
- "80:80"
- "443:443"
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- caddy_data:/data
logging:
driver: json-file # ensures logs from containers will not overfill server
options:
mode: non-blocking
max-buffer-size: 500m
grafana:
build:
dockerfile: ./Dockerfile.grafana
context: .
container_name: grafana
restart: always
environment:
- GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
- GF_SECURITY_ADMIN_USER=admin
- GF_FEATURE_TOGGLES_ENABLE=alertingSimplifiedRouting,alertingQueryAndExpressionsStepMode
- GF_INSTALL_PLUGINS=https://storage.googleapis.com/integration-artifacts/grafana-exploretraces-app/grafana-exploretraces-app-latest.zip;grafana-traces-app
networks:
- grafana
- caddy
volumes:
- grafana_data:/var/lib/grafana
logging:
driver: json-file
options:
mode: non-blocking
max-buffer-size: 500m
labels:
caddy_0: ${GRAFANA_DOMAIN}
caddy_0.reverse_proxy: "{{upstreams 3000}}"
loki:
build:
dockerfile: ./Dockerfile.loki
context: .
container_name: loki
restart: always
entrypoint: ["/usr/bin/loki"]
command: ["-config.file=/etc/loki/local-config.yaml"]
networks:
grafana:
aliases:
- loki
volumes:
- loki_data:/data
logging:
driver: json-file
options:
mode: non-blocking
max-buffer-size: 500m
mem_limit: 1000m
alloy-logs:
build:
dockerfile: ./Dockerfile.alloy.logs
context: .
container_name: alloy-logs
restart: always
networks:
grafana:
aliases:
- alloy-logs
entrypoint: ["/bin/alloy"]
command: ["run","/etc/alloy/config.alloy","--storage.path=/var/lib/alloy/data"]
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
logging:
driver: json-file
options:
mode: non-blocking
max-buffer-size: 500m
deploy:
resources:
limits:
memory: 1000M
networks:
grafana:
name: grafana
driver: overlay
attachable: true
caddy:
name: caddy
driver: overlay
attachable: true
volumes:
caddy_data:
name: "caddy_data"
grafana_data:
name: "grafana_data"
loki_data:
name: "loki_data"
Starting to use Grafana
If everything is working as expected, you can log in to Grafana using the username admin and the password you set in the GRAFANA_PASSWORD environment variable.
Now you can observe logs from all your running Docker containers.
Select the desired application and browse its logs easily by filtering specific log levels.
You can also use quick filtering options at the top of the panel—under the Labels and Log levels bars.
To filter by any text, use the “Search in log lines” menu and press Include to specify what you're looking for.
Important
Make sure your logs are emitted in JSON format!
Grafana’s logging interface will automatically recognize all JSON key-values as valid labels, making filtering much simpler.
However, for the Explore view and LogQL queries, you still need to apply the json parser (| json) explicitly for it to work correctly.
A bit further down, we deploy simple application examples that we’ll use in more advanced scenarios.
Once deployed, try experimenting with:
- Filtering logs by minimum duration
- Switching between different applications
- Filtering by specific URL patterns
In our case, we encountered a few errors in Caddy, and we filtered them using the error log level.
Dashboards with Loki
Dashboards that use Loki are not known for high performance. They can struggle in horizontally scaled environments with high log volume. So, Loki dashboards are more of a last-resort tool when you need insights that metrics alone can't provide—such as detailed values and their precision.
For high-load applications, it's better to configure Mimir/Prometheus with metrics and use Recording Rules to optimize performance.
For low-workload applications (e.g., single instances), Loki is typically sufficient performance-wise.
Sample Logging App
To demonstrate a web-like app emitting logs, we created a dummy app example.
export DOCKER_HOST=ssh://root@homelab
docker compose -f docker-compose.app-logs.yaml build
docker compose -f docker-compose.app-logs.yaml up -d
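The dashboard queries below rely on the sample app emitting structured JSON logs with duration, url_path, and url_pattern fields; the exact format is defined by the app in the infra repo. As an illustration only, here is a log line of roughly that shape, plus an ad-hoc way to run a LogQL filter against Loki's HTTP API (a throwaway curl container joins the attachable grafana overlay network, where Loki is reachable by its alias):

# Roughly the shape of a single JSON log line the sample app emits (illustrative values).
echo '{"level":"info","msg":"request served","url_path":"/items/42","url_pattern":"/items/:id","duration":0.042}'

# Ad-hoc LogQL over Loki's HTTP API: up to 10 recent app-logs entries slower than 0.5 s.
docker run --rm --network grafana curlimages/curl -G -s \
  "http://loki:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={service_name="app-logs"} | json | duration > 0.5' \
  --data-urlencode 'limit=10'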
Creating a Dashboard with Loki
Now let’s create a dashboard using Loki as a data source in flexible “code” mode.
We’ll start with a LogQL query from the Metric Queries page.
Notice how we used the unwrap function to select specific numeric values to be used in formulas.
Example: Max Duration by URL Pattern (over 2m)
max_over_time({service_name="app-logs"} | json | duration > 0 | url_path!="" | unwrap duration [2m]) by (url_pattern)
Note: The unwrap function is essential here—it extracts the numeric value we need for calculations.
Example: Count of Requests by URL Pattern (over 2m)
sum(count_over_time({service_name="app-logs"} | json | duration > 0 | url_path!="" [2m])) by (url_pattern)
If you’re logging other fields—like user IPs, user agents, request/response body size—you can create charts grouped by these parameters.
This lets you see which endpoints use the most network traffic.
Example: 90th Percentile Duration (over 10m)
quantile_over_time(0.90,{service_name="app-logs"} | json | duration > 0 | unwrap duration [10m]) by (url_pattern)
Average Duration
If you want to show average values, just use avg_over_time instead:
avg_over_time({service_name="app-logs"} | json | duration > 0 | unwrap duration [10m]) by (url_pattern)
Finishing the Dashboard
After assembling the graphs:
- Set proper titles
- Change units to seconds for duration-based charts
- Optionally use bar charts instead of line charts
- Enable a legend in table mode showing Last/Mean values
This results in a much easier-to-navigate dashboard compared to raw logs.
A final version of the dashboard is provided for optional import: dashboard_app_logs.json
What’s Next?
That’s it for the first part of setting up Grafana + Loki + Alloy.
In the next articles, we’ll cover:
- Metrics
- Traces
- Alerts
In the meantime, try playing around with the logging interface—filter logs in different ways and switch between services to get comfortable with it.
You’ll find updated versions of these articles and the next parts here.