- This project consists of 3 Docker images:
  - node: service that generates CPU/IO/memory/network load and exposes metrics over HTTP
  - prometheus: monitoring service that scrapes the node service and saves the results as time series
  - alertmanager: manages and sends alerts when one of the measured values meets some condition (per rules)
- To run this project you should have the following software installed on your computer:
  - Ansible
  - Vagrant
  - Docker Engine [optional, for local testing]
  - docker-compose [optional, for local testing]
- Clone this repo:
git clone git@github.com:nikoren/prom.git
cd prom
- Export the variables needed to send alerts from a Gmail account
export GMAIL_TO='some_destination_address@gmail.com'
export GMAIL_PASS='your_password' # Nice article if you need help to generate one: https://www.lifewire.com/get-a-password-to-access-gmail-by-pop-imap-2-1171882
export GMAIL_ACCOUNT='your_gmail_account_username@gmail.com'
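Since both the local and the Vagrant workflows depend on these three variables, a quick pre-flight check can catch a forgotten export early. The sketch below is not part of the repository; `check_gmail_env` is a hypothetical helper name, and it only verifies that the variables are non-empty, not that the credentials work.

```shell
# Hypothetical helper (not in the repo): report any Gmail variable
# that is unset or empty. Returns 0 when all three are present.
check_gmail_env() {
  missing=0
  for name in GMAIL_TO GMAIL_PASS GMAIL_ACCOUNT; do
    # Look up the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

One way to use it is to run `check_gmail_env || echo "export the Gmail variables first"` before generating the alertmanager config or running `vagrant up`.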
- Generate local alertmanager config
cd alertmanager
ansible 'localhost,' -m template -a "src=alertmanager.yml.j2 dest=./alertmanager.yml" -e "alertmanager_auth_username=${GMAIL_ACCOUNT}" -e "alertmanager_auth_pass='${GMAIL_PASS}'" -e "alertmanager_to=${GMAIL_TO}"
cd -
- Build the images
docker-compose build
- Start containers
docker-compose up
- Deployment is automated with Ansible and Vagrant
- Just make sure the same Gmail environment variables are exported in your environment
- The Vagrantfile is configured to pick up those variables and pass them to ansible-playbook as extra variables
git clone git@github.com:nikoren/prom.git
cd prom
export GMAIL_TO=take-home-test@league.pagerduty.com
export GMAIL_PASS='your_password' # Nice article if you need help to generate one: https://www.lifewire.com/get-a-password-to-access-gmail-by-pop-imap-2-1171882
export GMAIL_ACCOUNT='your_gmail_account_username@gmail.com'
vagrant up
# Zzz... you're done
- Vagrant is configured to forward the following ports to your local environment. Just be sure you are not running the local and Vagrant setups at the same time, to avoid port collisions and confusion
- node: 9100
- prometheus: 9090
- alertmanager: 9093
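To confirm that the three forwarded ports respond, a small smoke test along these lines may help. It is only a sketch: `service_urls` and `check_services` are hypothetical names, and it assumes the node service serves `/metrics` the way node_exporter does and that Prometheus and alertmanager expose their usual `/-/healthy` endpoints.

```shell
# Hypothetical helper: print one probe URL per service for the given host.
service_urls() {
  host="${1:-localhost}"
  echo "http://${host}:9100/metrics"    # node (assumed node_exporter-style endpoint)
  echo "http://${host}:9090/-/healthy"  # prometheus
  echo "http://${host}:9093/-/healthy"  # alertmanager
}

# Probe each URL and report whether it answered with a successful HTTP status.
check_services() {
  for url in $(service_urls "$@"); do
    if curl --silent --fail --output /dev/null --max-time 5 "${url}"; then
      echo "ok   ${url}"
    else
      echo "FAIL ${url}"
    fi
  done
}
```

Running `check_services` after `vagrant up` (or after `docker-compose up`) prints one line per service. If a port already answers before either setup has been started, something else is bound to it.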
- To access the metrics, Prometheus provides the PromQL query language
- The following metrics are available and can be viewed here
# HELP node_cpu_seconds_total Seconds the cpus spent in each mode.
# TYPE node_cpu_seconds_total counter
100 - (avg by (instance) (irate(node_cpu_seconds_total{job="node",mode="idle"}[3m])) * 100)
# HELP node_network_transmit_bytes_total Network device statistic transmit_bytes.
# TYPE node_network_transmit_bytes_total counter
rate(node_network_transmit_bytes_total{device="eth0"}[1m])
# HELP node_disk_io_time_seconds_total Total seconds spent doing I/Os.
# TYPE node_disk_io_time_seconds_total counter
irate(node_disk_io_time_seconds_total{device="sda"}[10s])
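Besides the web UI, the same PromQL expressions can be sent to the Prometheus HTTP API from the command line. The following is a minimal sketch, assuming Prometheus is reachable on the forwarded port 9090. `prom_query` and the `PROM_URL` override are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper: run an instant PromQL query against Prometheus.
# PROM_URL is an assumed override; it defaults to the forwarded local port.
prom_query() {
  # --get with --data-urlencode lets curl URL-encode braces, quotes and brackets.
  curl --silent --get \
    --data-urlencode "query=$1" \
    "${PROM_URL:-http://localhost:9090}/api/v1/query"
}
```

For example, `prom_query 'rate(node_network_transmit_bytes_total{device="eth0"}[1m])'` returns the result as JSON. Single quotes around the expression keep the shell from interpreting the braces and double quotes.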
- Prometheus provides an alerting mechanism through alertmanager
- A load alert is configured to fire when node has a load higher than 5 for 1m
- More rules can be configured here
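To see whether the load alert is pending or firing, and whether it has reached alertmanager, both services can be queried over HTTP. This is a sketch under assumptions: `list_alerts` is a hypothetical helper, the ports are the forwarded ones listed earlier, and the `/api/v2/alerts` path assumes a reasonably recent alertmanager release.

```shell
# Hypothetical helper: dump current alerts from Prometheus and alertmanager as JSON.
list_alerts() {
  host="${1:-localhost}"
  # Alerts as evaluated by Prometheus rules (pending or firing).
  curl --silent "http://${host}:9090/api/v1/alerts"
  echo
  # Alerts that Prometheus has already handed over to alertmanager
  # (assumes the v2 API is available in the deployed version).
  curl --silent "http://${host}:9093/api/v2/alerts"
  echo
}
```

Calling `list_alerts` while the node service is generating load should eventually show the alert in the first response and, after the 1m hold period, in the second.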