A record of the AI-related infrastructure and tools I’ve been running on my own. The main goal is staying useful while keeping cloud costs reasonable.
Self-Hosted Stack (my-ai)
Running Open WebUI + Ollama + SearXNG on GCP.
GCP us-west1-a
├── open-webui-vm (e2-micro)
│   ├── Open WebUI (chat interface)
│   ├── SearXNG (private search engine)
│   └── Caddy (HTTPS reverse proxy)
└── ollama-gpu-server (n1-standard-4 + T4 GPU, spot VM)
    └── Ollama (local LLM inference)
Notable details:
- Open WebUI exposes an OpenAI-compatible API endpoint, so external tools can call it directly
- The GPU server runs as a spot VM — 60–70% cheaper, but can be preempted
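- DuckDNS + Caddy handle the domain name and HTTPS for free: DuckDNS provides a free subdomain, and Caddy obtains and renews the TLS certificate automatically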
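To make the OpenAI-compatible endpoint concrete, here is a minimal sketch of how an external tool might call it using only the standard library. The base URL, the API key, and the `/api/chat/completions` path are assumptions for illustration and should be checked against your own Open WebUI deployment; `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

# Assumptions (not from the setup above): the hostname, the API key,
# and the /api/chat/completions path of the Open WebUI deployment.
BASE_URL = "https://my-ai.example.duckdns.org"
API_KEY = "sk-placeholder"


def build_chat_request(prompt: str, model: str = "llama3.3:70b") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for Open WebUI."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(prompt: str, model: str = "llama3.3:70b", timeout: float = 120.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

Keeping request construction separate from the network call makes the payload easy to inspect before anything is sent.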
Models Running
| Model | Use case | VRAM |
|---|---|---|
| llama3.3:70b | General reasoning | ~48GB (4× T4 or CPU offload; a single T4 has 16GB) |
| deepseek-coder-v2 | Code completion | 16GB |
| nomic-embed-text | Embeddings (RAG) | Low |
Local models are most useful for processing things I don’t want to send to external APIs — summarizing sensitive documents, private notes, etc.
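The VRAM column can be sanity-checked with some rough arithmetic: weight size is approximately parameter count times bits per weight. The sketch below assumes 4-bit quantization and a 20% allowance for KV cache and runtime buffers; both numbers are illustrative assumptions, not measurements from this setup, and the function names are hypothetical.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


def fits_on_gpu(params_billion: float, bits_per_weight: float,
                vram_gb: float, overhead_fraction: float = 0.2) -> bool:
    """True if weights plus a rough overhead allowance fit in the given VRAM.

    overhead_fraction is an assumed allowance for KV cache and buffers.
    """
    needed = estimate_weight_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed <= vram_gb


if __name__ == "__main__":
    # A 70B model at 4 bits per weight against one T4 (16 GB) and four T4s (64 GB).
    print(estimate_weight_gb(70, 4))
    print(fits_on_gpu(70, 4, vram_gb=16))
    print(fits_on_gpu(70, 4, vram_gb=64))
```

Under these assumptions a 70B model needs about 35 GB for weights alone, which is why it cannot sit on a single T4 without offloading.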
Daily AI Digest (my-schedule)
An automated batch job that sends a morning email with AI-curated news summaries.
Flow:
EventBridge (6:50 AM)
→ Start EC2
→ cron triggers main.py
→ Open WebUI API × 4 topics
→ Send via AWS SES
→ EC2 self-stops
Topics: AI News, Economy & Portfolio, IT Industry, GrapheneOS
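The middle of that flow (summarize each topic, then send one email) could look roughly like the sketch below. This is not the actual `main.py`: `build_digest` and `send_digest` are hypothetical names, the `summarize` callable stands in for the Open WebUI API call, and the subject line and default region are assumptions. The SES call uses boto3's `send_email`.

```python
from typing import Callable

TOPICS = ["AI News", "Economy & Portfolio", "IT Industry", "GrapheneOS"]


def build_digest(topics: list[str], summarize: Callable[[str], str]) -> str:
    """Call summarize() once per topic and join the results into one email body."""
    sections = []
    for topic in topics:
        try:
            summary = summarize(topic)
        except Exception as exc:
            # One failed topic should not block the whole email.
            summary = f"(summary unavailable: {exc})"
        sections.append(f"## {topic}\n{summary}")
    return "\n\n".join(sections)


def send_digest(body: str, sender: str, recipient: str,
                region: str = "us-west-2") -> None:
    """Send the digest through AWS SES (region is an assumed default)."""
    import boto3  # imported lazily so build_digest works without AWS dependencies

    ses = boto3.client("ses", region_name=region)
    ses.send_email(
        Source=sender,
        Destination={"ToAddresses": [recipient]},
        Message={
            "Subject": {"Data": "Daily AI Digest"},
            "Body": {"Text": {"Data": body}},
        },
    )
```

Passing `summarize` in as a parameter keeps the digest logic testable without a running model server, and catching per-topic failures means a preempted GPU server degrades the email instead of cancelling it.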
Cost: EC2 t3.micro running ~15 min/day = ~$2–3/month. SES is free up to 1,000 emails/month.
Getting a curated briefing pushed to my inbox every morning cut out a lot of mindless feed-checking.
Claude API
Using claude-sonnet-4-6 as the main model for development and batch work.
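import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[
        {"role": "user", "content": "Summarize this: ..."}
    ],
)

print(response.content[0].text)  # reply text is in the first content block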
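Prompt caching is worth using: when the same system prompt is reused across requests, cache hits are billed at roughly 10% of the normal input price for the cached tokens, a ~90% saving on that part of the request (the first request, which writes the cache, costs slightly more). Useful for batch jobs and any workflow with a fixed prompt structure.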
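A minimal sketch of what enabling caching might look like: the system prompt is passed as a content block with a `cache_control` marker. The system prompt text, the helper names, and the price and multipliers in the cost function are assumptions for illustration; check current pricing and the minimum cacheable prompt length for your model before relying on the numbers.

```python
# Assumed system prompt; in practice it must exceed the model's minimum
# cacheable length for caching to take effect.
SYSTEM_PROMPT = "You are a summarizer. Follow the house style guide: ..."


def build_cached_request(user_text: str, model: str = "claude-sonnet-4-6") -> dict:
    """Keyword arguments for client.messages.create() with a cacheable system prompt."""
    return {
        "model": model,
        "max_tokens": 2048,
        "system": [
            {
                "type": "text",
                "text": SYSTEM_PROMPT,
                # Marks the end of the prefix that should be cached.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }


def cached_prefix_cost(prefix_tokens: int, requests: int, price_per_mtok: float,
                       write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Cost of the shared prefix: one cache write, then cache reads.

    The multipliers are assumptions, and every request after the first
    is assumed to hit the cache before it expires.
    """
    per_token = price_per_mtok / 1_000_000
    return prefix_tokens * per_token * (write_mult + read_mult * (requests - 1))


def summarize(user_text: str) -> str:
    """Send one request that reuses the cached system prompt."""
    import anthropic  # imported lazily so the helpers above run without the SDK

    client = anthropic.Anthropic()
    response = client.messages.create(**build_cached_request(user_text))
    return response.content[0].text
```

With a hypothetical 10,000-token prefix, 100 requests, and an input price of $3 per million tokens, the prefix costs about $0.33 with caching versus $3.00 without, which is where the ~90% figure comes from.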
n8n Automation (my-n8n)
Self-hosted n8n on GCP for workflow automation.
Current workflows:
- RSS feed collection → filter → push to Notion database
- Gmail label trigger → summarize → Slack notification
- Scheduled web scraping → update Google Sheets
n8n is no-code for simple flows, but complex logic and transformations can be written in JavaScript. It has an OpenAI node (Claude-compatible) so you can call LLMs inline in a workflow.
Monthly Costs
| Service | Purpose | Approx. cost |
|---|---|---|
| GCP (e2-micro × 2) | Open WebUI, n8n | $10–15 |
| GCP (T4 spot VM) | Ollama, on-demand | $5–20 (variable) |
| AWS EC2 t3.micro | my-schedule | $2–3 |
| Claude API | Dev + batch | $5–20 (variable) |
| Total | | ~$25–60 |
Switched from Claude Pro subscription to API-only to avoid a fixed monthly charge.
Takeaways
- Self-hosting has a real learning curve — but it builds genuine infrastructure understanding. Good hands-on practice with Docker, networking, and cloud VMs.
- Model selection matters — using strong API models (Claude, GPT-4) for quality-sensitive tasks and local Ollama models for private/bulk work is a practical split.
- Spot VMs need stateless design — they can go down anytime, so anything running on them needs to handle interruptions gracefully.
- Open WebUI is surprisingly complete — custom models, RAG, and web search are all built in.
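On the spot VM point, one way to handle interruptions gracefully is to checkpoint progress after every item so a restarted job skips what is already done. The sketch below is a generic pattern, not code from this stack: the function names are hypothetical, and it assumes the checkpoint file lives on storage that survives the VM being stopped.

```python
import json
import os
import tempfile
from typing import Callable, Iterable


def load_done(checkpoint_path: str) -> set[str]:
    """Return the set of item IDs already processed, or an empty set."""
    if not os.path.exists(checkpoint_path):
        return set()
    with open(checkpoint_path, encoding="utf-8") as f:
        return set(json.load(f))


def save_done(checkpoint_path: str, done: set[str]) -> None:
    """Write the checkpoint atomically so an interruption mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(checkpoint_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)
    os.replace(tmp_path, checkpoint_path)


def run_resumable(items: Iterable[str], process: Callable[[str], None],
                  checkpoint_path: str) -> int:
    """Process each item once across restarts; return how many ran this time."""
    done = load_done(checkpoint_path)
    ran = 0
    for item in items:
        if item in done:
            continue  # finished before the last interruption
        process(item)
        done.add(item)
        save_done(checkpoint_path, done)
        ran += 1
    return ran
```

An item interrupted partway through is retried on the next run, so `process` should be safe to repeat for the same input.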
Terraform configs and Docker Compose files for each project will be in separate posts.