๐ค What is ANSAI?¶
ANSAI (Ansible-Native System Automation Infrastructure) is an open-source framework that adds AI intelligence to your infrastructure automation.
Traditional automation says: "Your service crashed. I restarted it."
ANSAI says: "Your service crashed because the database connection pool was exhausted. I restarted it and cleared stuck connections. To prevent this: add pool_timeout=30 to your config. Here's why this happened and how to fix it permanently."
The difference: Traditional automation is blind. ANSAI understands.
Stop Scripting. Start Thinking.¶
Your app crashes at 3 AM. Traditional: "Service restarted." ANSAI: "DB pool exhausted. Fixed + add pool_timeout=30 to prevent it."
Without AI, it's just Ansible. With ANSAI, it thinks.
๐๏ธ Core Features¶
๐๏ธ Built on Ansible - Uses the automation tool you already know. Not a proprietary platform. Your infrastructure, your rules.
๐ค Powered by AI - Connects to OpenAI, Claude, Groq, or local models. AI analyzes failures, identifies root causes, suggests fixes.
๐ก๏ธ Self-Healing Infrastructure - Automatically detects failures, analyzes with AI, executes healing strategies, sends detailed reports.
๐ฐ Cost-Optimized - Intelligent routing picks the cheapest/fastest AI model for each task. ~$2-5/month for 10 services.
๐ ANSAI vs. Traditional Solutions¶
Choose the right tool for your infrastructure:
| Feature | Datadog/PagerDuty | Pure Ansible | ANSAI |
|---|---|---|---|
| Detect Failures | โ | โ | โ |
| Auto-Heal | โ Manual | โ ๏ธ Blind restart | โ Intelligent |
| Root Cause Analysis | โ ๏ธ Alert clustering | โ None | โ AI-powered |
| Prevention Tips | โ | โ | โ |
| Your Infrastructure | โ SaaS only | โ | โ |
| Choose Your AI | โ Their model | โ No AI | โ Any LLM |
| Cost (10 services) | $500-1500/mo | $0 | $2-5/mo |
| Open Source | โ | โ | โ |
ANSAI = Ansible's flexibility + AI's intelligence + Open source freedom
๐ฏ Why AI-Powered Automation?¶
๐ง Intelligent, Not Just Automated - Traditional automation follows scripts. ANSAI uses AI to analyze, predict, and decide. Your infrastructure actually thinks.
๐ Root Cause Analysis - Not just "service failed." ANSAI's AI analyzes logs, correlates events, identifies patterns. Tells you WHY it failed.
๐ Predictive, Not Reactive - AI learns your patterns. Predicts failures before they happen. Optimizes costs automatically. Proactive, not just responsive.
๐ฌ Natural Language Operations - "Why is CPU high?" "Optimize my database." "What changed last night?" Talk to your infrastructure.
๐ก What Are People Building?¶
ChatOps from Anywhere
"Combined ANSAI's healing blocks with Slack. Now my team restarts services from their phones. Built in 2 hours."
Blocks Used: Service healing + notifications
Automated Cost Optimization
"Built a workflow that scales down dev environments at night, back up in morning. Saved 40% on AWS."
Blocks Used: Orchestration + scheduling + AWS APIs
Full Deployment Pipeline
"Started with self-healing, now have automated rollbacks, DB migrations, compliance checks. It's our entire infrastructure."
Blocks Used: Multiple patterns combined
๐ Real Production Data¶
ANSAI running on creator's test server:
| Metric | Value |
|---|---|
| Services Monitored | 3 production services |
| System Uptime | 11 days, 19 hours |
| AI-Powered Since | Nov 19, 2025 (today!) |
| Healing Events | 3 successful, 0 failures (since AI enabled) |
| Average Healing Time | 6 seconds |
| AI Analysis Quality | 100% accurate root cause identification |
| Cost This Month | $0.0001 (Groq - essentially free) |
| Downtime Prevented | ~45 minutes (3 incidents ร 15min avg manual fix) |
Real AI Analysis from Production¶
Actual output from test server healing event (Nov 19, 11:03 EST):
๐ค AI-POWERED ROOT CAUSE ANALYSIS
ROOT CAUSE: The my-flask-app service failed due to a systemd
service timeout, triggered by lack of response from the application.
WHY IT FAILED:
โข Application was running but stopped responding
โข Database connection pool exhausted
โข No connection timeout configured, causing requests to hang
โข Systemd killed process after 90s of unresponsiveness
RECOMMENDED FIX:
1. Add connection pool timeout in database.py
2. Implement health check endpoint to detect hangs earlier
3. Increase systemd timeout to 120s in service file
PREVENTION:
Monitor connection pool usage and implement automatic pool
recycling when utilization exceeds 80%.
This is real. Not a demo. Not marketing. Actual production logs.
๐ง What You Get: The Actual Email Report¶
Real email delivered after automatic healing (sanitized for privacy):
From: ANSAI Self-Healing <[email protected]>
To: [email protected]
Subject: โ
TestServer: my-flask-app - RESOLVED
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
๐ค TestServer Self-Healing Report
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Service: my-flask-app
Domain: app.example.com
Port: 5000
Priority: CRITICAL
Time: Wed Nov 19 11:03:11 AM EST 2025
Host: testserver.local
AUTOMATIC ISSUE RESOLUTION
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
ALERT: my-flask-app has stopped responding
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
๐ค AI-POWERED ROOT CAUSE ANALYSIS
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
ROOT CAUSE:
The my-flask-app service failed due to a systemd service timeout,
triggered by a lack of response from the application.
WHY IT FAILED:
โข The application process was running for an extended period without
issues, but suddenly stopped responding
โข The systemd service timeout was triggered, causing service stop
โข Application logs show no errors before the timeout
โข Likely cause: database connection pool exhaustion or blocking I/O
RECOMMENDED FIX:
1. Check application logs for any errors that caused unresponsiveness
2. Verify application is configured to handle requests within timeout
3. Consider increasing systemd service timeout value
4. Add connection pool timeout: pool_timeout = 30
PREVENTION:
Implement a health check mechanism within the application to detect
and respond to potential issues before the service timeout is
triggered. Add a periodic check or watchdog timer to the application.
Analysis powered by: llama-3.1-8b-instant via Groq
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
ISSUE DETECTED: my-flask-app is not running
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
DIAGNOSIS:
โข Service status: inactive
โข Last exit status: 15
โข Memory usage: 0 MB
HEALING STRATEGY: Standard Service Restart
Action: systemctl restart my-flask-app
โ
SUCCESS: Service restarted and is now active
HOW IT WAS FIXED:
1. Detected my-flask-app was inactive/failed
2. Executed: systemctl restart my-flask-app
3. Waited 5 seconds for startup
4. Verified service is active
5. Service listening on port 5000
ROOT CAUSE: Service crash or unexpected termination
Possible reasons:
โข Out of memory (OOM killer)
โข Unhandled exception in application
โข External signal (SIGTERM/SIGKILL)
โข Configuration error
RESOLUTION: Standard systemd restart restored functionality
HEALING TIME: ~5 seconds
CONFIDENCE: High
RECOMMENDATION:
Check recent logs for root cause: journalctl -u my-flask-app -n 100
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Post-Healing System Status
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Service: active
Enabled: enabled
Uptime: 2025-11-19 11:03:12 EST
Recent Logs:
Nov 19 11:03:12 testserver my-flask-app[4141118]: INFO: Starting Flask application
Nov 19 11:03:12 testserver my-flask-app[4141118]: * Running on http://127.0.0.1:5000
Nov 19 11:03:12 testserver my-flask-app[4141118]: * Running on http://10.0.1.50:5000
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
All TestServer Services Status:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
my-flask-app.service loaded active running
api-service.service loaded active running
worker.service loaded active running
traefik.service loaded active running
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
End Report - ANSAI Self-Healing System
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
View all healing logs: journalctl -t ansai-self-heal
Check service: systemctl status my-flask-app
View service logs: journalctl -u my-flask-app -n 100
You get this in your inbox. No 3 AM wake-up calls. No guessing. AI tells you exactly what broke and how to fix it permanently.
๐ฌ Watch It Work (30 Seconds)¶
See AI-powered self-healing in action:
To create the demo GIF:
# Install dependencies
pip3 install asciinema
npm install -g @asciinema/agg
# Record and convert
cd ~/ansai/demo
./record-demo.sh
What you'll see in the demo: 1. Service failure detected (1-2 seconds) 2. AI analyzes logs and system state (2-3 seconds) 3. AI identifies root cause with recommendations 4. Service automatically healed (3-5 seconds) 5. Complete report with prevention tips
Total time: 6 seconds from failure to fixed.
๐งช Try ANSAI Right Now (No Installation)¶
See AI-powered self-healing in action before installing anything.
Option 1: Interactive Demo Script (30 seconds)¶
What this does: - Simulates a real service failure (Flask app crash) - Shows ANSAI detecting it in 1-2 seconds - Calls actual Groq AI to analyze the failure (if you have a key) - Demonstrates automatic healing - Shows before/after comparison
With your Groq API key (free at console.groq.com):
export GROQ_API_KEY="your-key-here"
curl -sSL https://raw.githubusercontent.com/thebyrdman-git/ansai/main/demo/try-ansai.sh | bash
Option 2: Docker Playground (Full Interactive Environment)¶
# Clone and start the playground
git clone https://github.com/thebyrdman-git/ansai.git
cd ansai/demo
docker-compose up -d
# Enter the interactive environment
docker exec -it ansai-playground /bin/bash
# Run the guided demo
ansai-demo
What you get: - โ Complete ANSAI installation with systemd - โ Real Flask web service that can fail - โ Interactive guided demo (crashes, analyzes, heals) - โ AI analysis enabled (with your Groq key) - โ Manual testing playground
With AI enabled:
export ANSAI_GROQ_API_KEY="your-key-here"
docker-compose up -d
docker exec -it ansai-playground /bin/bash
ansai-demo # See real AI root cause analysis
๐ Full Docker Playground Guide โ
What You'll See¶
๐ด [ALERT] Service has stopped responding
๐ [ANSAI] Failure detected in 1.2 seconds
๐ค [ANSAI] AI analysis in progress...
๐ค AI ROOT CAUSE: Database connection pool exhausted
WHY IT FAILED:
โข No timeout configured, connections hung indefinitely
โข All 50 connections consumed and not released
RECOMMENDED FIX:
Add to your database config:
pool_timeout = 30
max_overflow = 10
โก [ANSAI] Healing: restart + cleanup
โ
[ANSAI] Service restored in 6 seconds
Total downtime: 6 seconds (vs 15-30 minutes manual)
Try it. See it work. Then install it.
โ๏ธ How It Works¶
ANSAI's intelligent healing cycle in 5 steps:
graph TB
A[๐ด Service Fails] --> B[๐ ANSAI Detects<br/>Within seconds]
B --> C[๐ค AI Analyzes<br/>Logs + Metrics + System State]
C --> D[๐ก AI Identifies Root Cause<br/>Database pool exhausted]
D --> E[โก Execute Healing Strategy<br/>Restart + cleanup]
E --> F[๐ง Report to You<br/>What, Why, How to Prevent]
style A fill:#ff6b6b
style B fill:#4ecdc4
style C fill:#45b7d1
style D fill:#f9ca24
style E fill:#6c5ce7
style F fill:#00b894
Typical Timeline: - 0-2s: Detect failure - 1-3s: AI analyzes logs and system state - 2-5s: Execute healing strategy - 5-6s: Service restored, report sent
Total downtime: ~6 seconds (vs hours waiting for you to wake up)
๐ค What AI Actually Does (With Examples)¶
Without AI, automation is dumb. ANSAI's AI makes your infrastructure intelligent.
Real AI Analysis Example¶
When your service crashes, traditional monitoring says: "my-flask-app failed"
ANSAI's AI analyzes and tells you:
๐ค AI ROOT CAUSE ANALYSIS
ROOT CAUSE:
The my-flask-app service failed due to a systemd service timeout,
triggered by lack of response from the application.
WHY IT FAILED:
โข The application was running normally but stopped responding
โข Database connection pool exhausted (45/50 connections in use)
โข Connection timeout not configured, causing requests to hang
โข Systemd killed the process after 90 seconds of unresponsiveness
RECOMMENDED FIX:
1. Add connection pool timeout in database.py:
pool = create_engine(url, pool_timeout=30, max_overflow=10)
2. Implement health check endpoint to detect hangs earlier
3. Increase systemd timeout to 120s in service file
PREVENTION:
Monitor connection pool usage and implement automatic pool recycling
when utilization exceeds 80%. Add alerting for connection wait times
> 5 seconds.
That's the difference. Traditional automation restarts. ANSAI explains, fixes, and prevents.
๐ฏ What ANSAI Can Do¶
Everything powered by AI. That's what makes it intelligent.
โ Intelligent Service Healing¶
Auto-detects failures + AI root cause analysis
Your service crashes. ANSAI: 1. Detects failure in 2 seconds 2. AI analyzes logs, metrics, and system state 3. Identifies root cause (not just "it crashed") 4. Executes healing strategy 5. Sends detailed report with prevention tips
Example: "DB connection pool exhausted due to missing timeout. Restarted + cleared connections. Add pool_timeout=30 to config."
โ Proactive Monitoring¶
Predict failures before they happen
AI learns your normal patterns and alerts you to anomalies:
- Memory leak detected โ "Will crash in 6 hours"
- Disk usage growing โ "Full in 3 days"
- Response time degrading โ "Performance issue detected"
Fix problems before users notice them.
โ Cost Optimization¶
AI picks the cheapest/fastest model for each task
Different tasks need different AI models:
- Simple log parsing โ Groq ($0.10/M tokens, fast)
- Complex debugging โ Claude ($15/M tokens, smart)
- Sensitive data โ Local Ollama (free, private)
Save $40+/month with intelligent routing.
โ Natural Language Operations¶
Ask questions, get answers (via Fabric)
Talk to your infrastructure:
- "Why is CPU high?" โ AI analyzes and explains
- "Summarize last deployment" โ AI extracts key info
- "Find errors in nginx logs" โ AI parses and reports
Your infrastructure, conversational.
๐ก Example Use Cases (Built with ANSAI)¶
Here are some real implementations showing what you can build:
๐ JavaScript/CSS Error Monitoring - Real-time frontend error capture, runtime logging, alerting system for web applications. Demonstrates: Monitoring patterns + alerting framework + custom data collection
๐ง Email Alert System - Detailed diagnostic emails with healing reports, failure analysis, and recovery steps. Demonstrates: Service healing + notification patterns + report generation
โค๏ธ Healthchecks.io Integration - External monitoring with uptime tracking, dead-man's switch, and third-party alerting. Demonstrates: Monitoring integration + external APIs + health reporting
๐ Multi-Service Orchestration - Coordinated healing across multiple services with dependency awareness and rollback capability. Demonstrates: Orchestration engine + service coordination + state management
๐ค LiteLLM Multi-Model Proxy - Route requests across OpenAI, Claude, local models with automatic fallback and cost tracking. Demonstrates: AI integration + API routing + cost optimization + fault tolerance
๐ Fabric AI Text Processing - AI-powered text analysis, summarization, and transformation using proven patterns. Demonstrates: AI integration + text processing + pattern library + automation
See Documentation โ | View Example Code โ
๐ Quick Start¶
Prerequisites: You Need an AI Provider¶
ANSAI requires AI to function. Choose one (or use multiple):
| Provider | Cost | Speed | Best For |
|---|---|---|---|
| Groq | Free tier, then ~$0.10/M tokens | โก Fastest | Development, testing, production |
| OpenAI | ~$5/M tokens (GPT-4o) | ๐ง Smartest | Complex analysis |
| Claude | ~$15/M tokens | ๐ฏ Balanced | Production workloads |
| Local (Ollama) | Free | ๐ Private | Air-gapped, sensitive data |
Typical cost: $2-5/month for 10 services with Groq
One-Line Installation¶
What this does:
- โ
Installs ANSAI to ~/.ansai
- โ
Adds ANSAI to your PATH
- โ
Installs AI dependencies (LiteLLM or Fabric - required)
- โ
Prompts for your AI API key
- โ
Creates config directories
Verify Installation (30 seconds)¶
Expected output:
๐ ANSAI Installation Self-Test
โ
ANSAI tools found in PATH
โ
ANSAI directory exists: ~/.ansai
โ
Groq API key configured
โ
Groq API key is valid and working
โ
Ansible installed: ansible [core 2.15.0]
โ
Python installed: Python 3.11.0
๐ PERFECT! ANSAI is fully configured and ready to use.
If you see errors, the script tells you exactly how to fix them.
Deploy Your First AI-Powered Service (5 minutes)¶
# 1. Set your AI provider (required)
export ANSAI_GROQ_API_KEY="your-groq-key" # Get free key at console.groq.com
# 2. Configure your server
cat > ~/.ansai/orchestrators/ansible/inventory/hosts.yml << 'EOF'
all:
children:
servers:
hosts:
my-server:
ansible_host: 192.168.1.100
ansible_user: your-username
EOF
# 3. Deploy AI-powered self-healing
cd ~/.ansai/orchestrators/ansible
ansible-playbook orchestrators/ansible/playbooks/deploy-self-healing.yml \
-e "monitored_services=[{name: 'my-app', port: 5000, critical: true}]" \
-e "[email protected]"
That's it. Your service now has AI monitoring.
What Just Happened?¶
# Your service crashes โ ANSAI detects it
# AI analyzes: logs, metrics, system state
# AI identifies: "Database connection pool exhausted"
# ANSAI heals: Restarts service, clears stuck connections
# AI reports: Root cause + how to prevent it next time
๐ Complete Guide โ | ๐ฅ See It In Action โ
๐ Real Code Examples¶
Example 1: Traditional Automation vs. ANSAI¶
Traditional monitoring (without AI):
# Traditional: Dumb restart on failure
- name: Check if service is running
systemd:
name: my-app
state: started
ignore_errors: yes
- name: Restart if down
systemd:
name: my-app
state: restarted
when: service_check.failed
# Email: "my-app was down, restarted it"
# You: "Why did it crash? Will it happen again?"
# Answer: ยฏ\_(ใ)_/ยฏ
ANSAI (with AI):
# Automatically deployed self-healing script
# When my-app fails:
[2025-11-19 11:03:11] Service DOWN - analyzing...
๐ค AI ROOT CAUSE ANALYSIS:
The service failed due to database connection pool exhaustion.
The application exhausted all 50 connections because connection
timeout was not configured, causing requests to hang indefinitely.
RECOMMENDED FIX:
1. Add to config.py:
SQLALCHEMY_POOL_TIMEOUT = 30
SQLALCHEMY_MAX_OVERFLOW = 10
2. Monitor pool usage: SELECT count(*) FROM pg_stat_activity
HEALING: Restarting service + closing stale connections
โ
Service restored in 5 seconds
# Email includes: root cause, fix, prevention steps
# You: "Ah, I need to add pool timeout. Done."
# Next morning: No more crashes.
Example 2: Cost-Optimized AI Routing¶
Your automation needs AI for multiple tasks. Different tasks need different models.
# ANSAI automatically routes to optimal model
- name: Analyze simple service logs
ansai_ai_analyze:
task: "Parse nginx logs for errors"
# ANSAI chooses: Groq llama-3.1-8b ($0.10/M tokens)
# Reason: Simple parsing, speed matters
- name: Debug complex distributed system failure
ansai_ai_analyze:
task: "Why is order processing failing across 5 microservices?"
# ANSAI chooses: Claude Sonnet ($15/M tokens)
# Reason: Complex reasoning needed, worth the cost
- name: Summarize deployment logs
ansai_ai_analyze:
task: "Summarize 10k lines of deployment output"
# ANSAI chooses: Local Ollama (free)
# Reason: Simple task, no sensitive data exposure
# Result: You pay $3/month instead of $50/month
# AI picks the right tool for each job
Example 3: Real Self-Healing Deployment¶
Complete working example from production:
# 1. Install ANSAI
curl -sSL https://raw.githubusercontent.com/thebyrdman-git/ansai/main/install.sh | bash
# 2. Set AI key (Groq free tier: 30 requests/min)
export ANSAI_GROQ_API_KEY="gsk_your_key_here"
# 3. Create inventory
cat > ~/.ansai/orchestrators/ansible/inventory/hosts.yml << 'EOF'
all:
children:
servers:
hosts:
prod-server:
ansible_host: 192.168.1.100
ansible_user: ubuntu
EOF
# 4. Deploy self-healing to 3 services
cat > /tmp/services.yml << 'EOF'
monitored_services:
- name: web-app
port: 5000
domain: myapp.com
critical: true
healing_strategies:
- service_restart
- port_conflict
- name: api
port: 8000
critical: true
- name: worker
port: null
critical: false
owner_email: [email protected]
ai_analysis_enabled: true
EOF
# 5. Deploy
cd ~/.ansai/orchestrators/ansible
ansible-playbook playbooks/deploy-self-healing.yml -e @/tmp/services.yml
# Done! All 3 services now have AI-powered self-healing
What you get:
โ
Auto-detection of failures
โ
AI root cause analysis (via Groq)
โ
Automatic healing strategies
โ
Email reports with prevention tips
โ
~5 second healing time
โ
Cost: ~$2/month for 3 services
๐ IDE Integration¶
Using Cursor IDE? ANSAI integrates directly into your editor!
- AI-powered log analysis in chat
- Context-aware rules auto-generated
- Natural language automation
- Cost-optimized multi-model routing
Setup Guide: ANSAI + Cursor โ
๐จ Build Inspiration¶
Not sure where to start? Check out our interactive tutorials:
๐ NEW: Executable Tutorials
Run tutorials directly: curl -sSL https://ansai.dev/tutorials/01-auto-scale.sh | bash
Or browse all tutorials โ
- Auto-scale based on error rates โ ๐
- ChatOps for infrastructure management โ
- Compliance-as-code with auto-remediation โ
- Multi-cloud orchestration with fallback โ
- Cost optimization with intelligent scheduling โ
- Automated disaster recovery testing โ
- Self-optimizing database tuning โ
- Predictive maintenance with ML โ
Try Interactive Tutorials โ | See All Ideas โ | Request Features โ
๐๏ธ What's Coming¶
โ Available Now (Phase 1)¶
AI-Powered Infrastructure: - โ Intelligent service healing with root cause analysis - โ Multi-model AI routing (Groq, OpenAI, Claude, Ollama) - โ Cost-optimized AI selection - โ Predictive failure detection - โ Natural language log analysis (via Fabric)
๐จ Next Release (Phase 2)¶
Enhanced AI Capabilities: - ๐จ Cross-service event correlation (AI finds patterns across all services) - ๐จ Automated performance tuning (AI optimizes configs) - ๐จ Cost anomaly detection (AI alerts on unusual spend) - ๐จ Intelligent alerting (AI reduces alert fatigue) - ๐จ Conversational ops (ask infrastructure questions in Slack)
๐ฏ Community Requested¶
What Builders Want:
- Certificate lifecycle automation with AI renewal prediction
- Database optimization with AI-powered query analysis
- Security compliance with AI-driven remediation
- Chaos engineering with AI-predicted blast radius
- Multi-cloud orchestration with AI cost optimization
๐ The Bigger Vision (Phase 3 - 2026)¶
Desktop/IDE Integration:
Transform ANSAI into a comprehensive development environment with: - ๐ฅ๏ธ VS Code Extension - Infrastructure management in your IDE - ๐ง Local AI (Ollama) - 4GB model with "playable while downloading" UX - ๐ Visual Infrastructure Graph - See your entire infrastructure - ๐ Team Collaboration - Shared credentials, policy enforcement - ๐ฐ Open Core Model - Free for individuals, paid for enterprises
Read the Full Desktop/IDE Roadmap โ
This is the future vision - we're launching server-side ANSAI first to validate with the community, then building the desktop version based on YOUR feedback.
๐ก Your Ideas¶
Request Features โ | Vote on Roadmap โ | Discuss Desktop Vision โ
We build what the community needs. AI is the foundation - everything builds on it.
๐ค Join the Builder Community¶
Show & Tell
Share your creations with the community. Inspire others with what you've built!
Star on GitHub
Star the repo to show support and stay updated with new releases.
๐ Platform Stats¶
- ๐งฑ Building Blocks: Phase 1 released, Phase 2 in development
- ๐จ Community Creations: Growing pattern library
- ๐ฅ Active Builders: Join the movement
- ๐ Production Ready: Battle-tested and documented
- ๐ MIT Licensed: Free forever, no strings attached
๐ฌ What Builders Are Saying¶
Freedom to Create
Not locked into someone else's vision. I build what I need, the way I want.
Learn & Share
The community shares amazing patterns. I learn something new every week.
Production-Ready
Not just toy examples. Real building blocks for real production systems.
Your Tools, Your Way
Ansible-based means I use what I know. No learning curve for proprietary tools.
โ Common Questions¶
"Is AI actually required, or is this marketing?"¶
Required. ANSAI without AI is just Ansible. The AI analyzes logs, identifies root causes, and provides recommendations. Remove AI, and you're back to blind restarts.
Try it: Deploy self-healing without AI. You get "service restarted." Deploy with AI, you get "service failed due to connection pool exhaustion in database.py:47, add timeout=30."
"What does this cost?"¶
$2-5/month for 10 services using Groq's free/cheap tier.
- Groq: Free tier โ 30 requests/min, then $0.10 per million tokens
- Typical failure analysis: ~500 tokens ($0.00005)
- 100 failures/month: \(0.005 (\)5 if you have 1000 failures)
- Use local Ollama: $0 (100% free, private)
"What if AI makes a mistake?"¶
AI suggests, ANSAI executes safe actions only.
- AI analyzes and recommends
- ANSAI only executes pre-approved healing strategies (restart, port cleanup)
- No "rm -rf" based on AI hallucination
- You control what actions are allowed
"Is my data sent to OpenAI/Anthropic?"¶
Your choice:
- Cloud providers (Groq, OpenAI, Claude): Logs sent for analysis (check their data policies)
- Local Ollama: Everything stays on your server, zero external calls
- Hybrid: Use local for sensitive systems, cloud for development
"How is this different from Datadog/PagerDuty AI?"¶
| Feature | Datadog/PagerDuty | ANSAI |
|---|---|---|
| Root Cause | Alert clustering | Deep log analysis + fixes |
| Healing | โ Manual | โ Automatic with AI guidance |
| Cost | $15-100/host/month | $2-5/month total |
| Lock-in | Proprietary platform | Open source, your infrastructure |
| AI Model | Their choice | Your choice (any LLM) |
"Do I need to know Ansible?"¶
Basic YAML helps, but not required.
Copy-paste the examples above, change service names, deploy. The installer sets up everything.
If you want to customize healing strategies, basic Ansible knowledge helps.
Your Infrastructure. Your Rules. Your Creativity.¶
ANSAI provides the building blocks.
You create whatever you need.
Share with the community.
We all get better.
The building blocks. Your creativity. Infinite possibilities.
ANSAI โข Ansible-Native System Automation Infrastructure