Healthchecks.io External Monitoring - Setup Guide¶
๐ฏ What This Adds¶
External monitoring via Healthchecks.io completes your observability stack:
- Self-healing: Fixes 95% of issues automatically โ (Already deployed)
- Healthchecks.io: Detects the other 5% โ (You're setting this up now)
- Combined: 100% coverage of ALL failure modes
๐ซ How It Works¶
MiracleMax Server
โ
Cron runs every 5 minutes
โ
Pings healthchecks.io: "I'm alive, here's my status"
โ
Healthchecks.io receives ping
โ
If ping STOPS (for ANY reason):
โ
Email alert to [email protected]
Dead Man's Switch: Expects regular pings. Silence = Problem = Alert.
๐ Setup Instructions (10 minutes)¶
Step 1: Sign Up for Healthchecks.io (2 minutes)¶
- Go to: https://healthchecks.io/accounts/signup/
- Sign up (FREE tier is perfect for your needs)
- Verify your email: [email protected]
Free tier includes: - โ 20 checks (you only need 1-4) - โ Unlimited email alerts - โ 5-minute ping intervals - โ All features you need
Step 2: Create a Check (3 minutes)¶
-
Click "+ Add Check" in the dashboard
-
Configure the check:
-
Save the check
-
Click on the check you just created
-
Copy the Ping URL - it looks like:
Step 3: Configure Ansible (2 minutes)¶
Edit the configuration file:
Update the healthcheck_ping_url line:
# Ping URL - Set this after creating your check
healthcheck_ping_url: "https://hc-ping.com/YOUR-UUID-HERE"
Replace YOUR-UUID-HERE with the actual UUID from Step 2.
Save and exit (:wq in vim)
Step 4: Deploy with Ansible (2 minutes)¶
cd ~/infrastructure/ansible
# Deploy healthchecks monitoring
ansible-playbook playbooks/deploy-healthchecks.yml
What this does: - โ Deploys heartbeat script to miraclemax - โ Creates cron job (runs every 5 minutes) - โ Tests the connection - โ Sends first heartbeat
Step 5: Verify It's Working (1 minute)¶
Check Healthchecks.io dashboard: - Go to: https://healthchecks.io/projects/ - You should see your check: MiracleMax Server - Status should show: โ UP (green) - Last ping: "Just now" or "< 5 minutes ago"
Check on server:
# View heartbeat log
ssh [email protected] "tail -20 /var/log/healthcheck-heartbeat.log"
# Manually trigger heartbeat
ssh [email protected] "sudo /usr/local/bin/miraclemax-heartbeat"
# View cron job
ssh [email protected] "crontab -l | grep heartbeat"
๐งช Test the Monitoring (Optional but Recommended)¶
Test 1: Simulate Server Down¶
On healthchecks.io: 1. Go to your check settings 2. Temporarily change "Period" to 1 minute 3. Save
On your server:
# Stop the heartbeat cron temporarily
ssh [email protected] "sudo crontab -r"
# Wait 2-3 minutes
Expected result: - โ You'll receive an email: "MiracleMax Server is DOWN" - โ Healthchecks.io dashboard shows check as DOWN (red)
Restore:
# Re-deploy to restore cron
cd ~/infrastructure/ansible
ansible-playbook playbooks/deploy-healthchecks.yml
# Change period back to 5 minutes on healthchecks.io
Test 2: Simulate Service Failure¶
# Stop a service
ssh [email protected] "sudo systemctl stop story-stages"
# Wait for next heartbeat (up to 5 minutes)
# Check the heartbeat log
ssh [email protected] "tail -f /var/log/healthcheck-heartbeat.log"
Expected result: - โ ๏ธ Heartbeat still sends (server is up) - โ ๏ธ But includes: "โ ๏ธ ISSUES story-stages:DOWN" - โ ๏ธ Healthchecks.io receives ping with "/fail" endpoint - โ You can see service status in ping data
Restore:
# Self-healing will auto-restart, or do it manually:
ssh [email protected] "sudo systemctl start story-stages"
๐ What You'll Monitor¶
With this setup, you'll get alerts for:
| Failure Type | Self-Healing Fixes? | Healthchecks Detects? |
|---|---|---|
| App crash | โ Yes (auto-fix) | โ Yes (in ping data) |
| Self-heal fails | โ No | โ Yes (heartbeat shows issues) |
| Systemd hangs | โ No | โ Yes (heartbeat stops) |
| Server crash | โ No | โ Yes (heartbeat stops) |
| Power outage | โ No | โ Yes (heartbeat stops) |
| Network down | โ No | โ Yes (heartbeat stops) |
Result: 100% coverage โ
๐ง Advanced Configuration¶
Monitor Multiple Servers¶
Create additional checks in Healthchecks.io:
# In ansible/roles/healthchecks_monitoring/defaults/main.yml
healthcheck_ping_url: "{{ healthcheck_urls[inventory_hostname] }}"
# In ansible/inventory/hosts.yml
all:
vars:
healthcheck_urls:
miraclemax.local: "https://hc-ping.com/uuid-for-miraclemax"
otherserver.local: "https://hc-ping.com/uuid-for-otherserver"
Change Heartbeat Interval¶
# In ansible/roles/healthchecks_monitoring/defaults/main.yml
healthcheck_interval: 300 # 5 minutes (recommended)
# or
healthcheck_interval: 180 # 3 minutes (more aggressive)
# or
healthcheck_interval: 600 # 10 minutes (less aggressive)
Then redeploy:
Don't forget to update the period in Healthchecks.io dashboard too!
Include More Data in Heartbeat¶
Edit: ansible/roles/healthchecks_monitoring/templates/miraclemax-heartbeat.sh.j2
Add custom checks to the get_health_summary() function.
Use Healthchecks.io API (Advanced)¶
Auto-create checks via API:
Then update tasks/main.yml to create checks programmatically.
(Not implemented yet, but easy to add if you want it)
๐ง Email Notifications¶
What You'll Receive¶
When heartbeat stops:
Subject: MiracleMax Server is DOWN
Your check "MiracleMax Server" is DOWN.
Last ping was 11 minutes ago.
Check URL: https://healthchecks.io/checks/...
When heartbeat resumes:
Subject: MiracleMax Server is now UP
Your check "MiracleMax Server" is now UP.
Check URL: https://healthchecks.io/checks/...
Configure Alert Channels¶
Healthchecks.io supports multiple notification channels:
- Go to: Integrations in dashboard
- Add integrations:
- Email (already configured)
- SMS (requires paid plan)
- Slack
- Discord
- PagerDuty
- Webhook
- And many more...
๐ฏ Maintenance¶
View Heartbeat Logs¶
# Real-time
ssh [email protected] "tail -f /var/log/healthcheck-heartbeat.log"
# Last 50 lines
ssh [email protected] "tail -50 /var/log/healthcheck-heartbeat.log"
# Search for errors
ssh [email protected] "grep ERROR /var/log/healthcheck-heartbeat.log"
Manually Trigger Heartbeat¶
ssh [email protected] "sudo /usr/local/bin/miraclemax-heartbeat"
Pause Monitoring (During Maintenance)¶
In Healthchecks.io dashboard: 1. Click on your check 2. Click "Pause" 3. Do your maintenance 4. Click "Resume"
Or use the API:
# Pause
curl -X POST https://healthchecks.io/api/v1/checks/YOUR-UUID/pause \
-H "X-Api-Key: YOUR-API-KEY"
# Resume
curl -X POST https://healthchecks.io/api/v1/checks/YOUR-UUID/resume \
-H "X-Api-Key: YOUR-API-KEY"
Disable Monitoring¶
Redeploy:
๐ Philosophy: Ansai Compliance¶
โ
Observable: External monitoring via Healthchecks.io
โ
Self-Healing: Combined with existing self-healing (95% auto-fix)
โ
Config-as-Code: All configuration in Ansible
โ
Always Log: Heartbeat logs every ping
โ
Declarative: Define config, Ansible handles deployment
โ
No Manual Work: Automated monitoring and alerts
๐ Reference¶
Healthchecks.io Docs: https://healthchecks.io/docs/
API Reference: https://healthchecks.io/docs/api/
Pricing: https://healthchecks.io/pricing/ (FREE tier is sufficient)
๐ Summary¶
After completing this setup:
- โ Self-healing fixes 95% of issues automatically
- โ Healthchecks.io detects the other 5%
- โ 100% coverage of all failure modes
- โ Email alerts for everything
- โ Config-as-code (Ansible)
- โ Observable and maintainable
Time investment: 10 minutes setup
Ongoing maintenance: Zero (fully automated)
Peace of mind: Priceless ๐คโจ
๐ Quick Start¶
Ready? Here's the TL;DR:
# 1. Sign up at https://healthchecks.io
# 2. Create check, copy ping URL
# 3. Edit config
vim ~/infrastructure/ansible/roles/healthchecks_monitoring/defaults/main.yml
# Set: healthcheck_ping_url: "https://hc-ping.com/YOUR-UUID"
# 4. Deploy
cd ~/infrastructure/ansible
ansible-playbook playbooks/deploy-healthchecks.yml
# 5. Verify
# Check healthchecks.io dashboard - should show UP โ
Done! ๐ฏ