What I Built with AI: Self-Hosting, Automation, and APIs

A record of the AI-related infrastructure and tools I’ve been running on my own. The main goal is to keep the setup genuinely useful while keeping cloud costs reasonable.

Self-Hosted Stack (my-ai)

Running Open WebUI + Ollama + SearXNG on GCP.

GCP us-west1-a
├── open-webui-vm (e2-micro)
│   ├── Open WebUI (chat interface)
│   ├── SearXNG (private search engine)
│   └── Caddy (HTTPS reverse proxy)
└── ollama-gpu-server (n1-standard-4 + T4 GPU, spot VM)
    └── Ollama (local LLM inference)

Notable details:

Open WebUI exposes an OpenAI-compatible API endpoint, so external tools can call it directly
The GPU server runs as a spot VM: 60–70% cheaper than on-demand, but it can be preempted at any time
DuckDNS + Caddy handle a free duckdns.org subdomain and automatic HTTPS
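
Since the endpoint follows the OpenAI chat format, calling it from a script takes only the standard library. This is a minimal sketch, not my exact client: the host name and the two environment variable names are placeholders, and the `/api/chat/completions` path with a Bearer API key is what I understand Open WebUI to expose, so check it against your version.

```python
import json
import os
import urllib.request

# Placeholder values: point these at your own instance and an API key
# generated in Open WebUI. The variable names are made up for this sketch.
BASE_URL = os.environ.get("OPENWEBUI_URL", "https://my-ai.example.org")
API_KEY = os.environ.get("OPENWEBUI_API_KEY", "")


def build_chat_request(prompt: str, model: str = "llama3.3:70b") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for Open WebUI."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/chat/completions",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(prompt: str, model: str = "llama3.3:70b", timeout: float = 120.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

Calling `chat("Give me three headlines about open-source LLMs")` returns the reply as a plain string. Keeping request building separate from sending makes the payload easy to inspect without touching the network.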

Models Running

Model	Use case	VRAM
llama3.3:70b	General reasoning	~48GB (more than a single 16GB T4; needs T4×4 or CPU offload)
deepseek-coder-v2	Code completion	16GB
nomic-embed-text	Embeddings (RAG)	Low

Local models are most useful for processing things I don’t want to send to external APIs — summarizing sensitive documents, private notes, etc.
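
For that kind of private summarization the request can go straight to Ollama, so the text never leaves the VM. A rough sketch under a few assumptions: Ollama is listening on its default port 11434, the `/api/chat` endpoint is used in non-streaming mode, and the system prompt is just an example.

```python
import json
import urllib.request

# Ollama's default address; change it if the GPU server is on another host.
OLLAMA_URL = "http://localhost:11434"


def build_summary_payload(text: str, model: str = "llama3.3:70b") -> dict:
    """Build the JSON body for a one-shot, non-streaming summary request."""
    return {
        "model": model,
        "stream": False,  # return one JSON object instead of a token stream
        "messages": [
            {"role": "system", "content": "Summarize the user's document in five bullet points."},
            {"role": "user", "content": text},
        ],
    }


def summarize(text: str, model: str = "llama3.3:70b", timeout: float = 300.0) -> str:
    """Send the document to the local Ollama server and return the summary."""
    request = urllib.request.Request(
        url=f"{OLLAMA_URL}/api/chat",
        data=json.dumps(build_summary_payload(text, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    # Non-streaming chat responses carry the reply in message.content
    return body["message"]["content"]
```

The long timeout is deliberate: a large model that is partly offloaded to CPU can take a while on a long document.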

Daily AI Digest (my-schedule)

An automated batch job that sends a morning email with AI-curated news summaries.

Flow:

EventBridge (6:50 AM)
  → Start EC2
  → cron triggers main.py
  → Open WebUI API × 4 topics
  → Send via AWS SES
  → EC2 self-stops

Topics: AI News, Economy & Portfolio, IT Industry, GrapheneOS
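
The flow above maps onto a fairly small script. What follows is a sketch of the shape of such a `main.py`, not the real file: the prompt wording, the function names and the `summarize` callable (anything that takes a prompt and returns text, for example a call to the Open WebUI API) are assumptions. `boto3` is imported lazily so the pure helpers can be tried without AWS credentials.

```python
from typing import Callable

TOPICS = ["AI News", "Economy & Portfolio", "IT Industry", "GrapheneOS"]


def build_prompt(topic: str) -> str:
    """Hypothetical prompt wording; adjust to taste."""
    return f"Summarize today's most important {topic} stories in five short bullet points."


def build_digest(summaries: dict) -> str:
    """Join per-topic summaries into one plain-text email body."""
    sections = [f"== {topic} ==\n{text.strip()}" for topic, text in summaries.items()]
    return "\n\n".join(sections)


def send_digest(body: str, sender: str, recipient: str) -> None:
    """Send the digest through SES (the sender must be a verified identity)."""
    import boto3  # third-party dependency, imported only when actually sending

    boto3.client("ses").send_email(
        Source=sender,
        Destination={"ToAddresses": [recipient]},
        Message={
            "Subject": {"Data": "Daily AI Digest"},
            "Body": {"Text": {"Data": body}},
        },
    )


def stop_self(instance_id: str) -> None:
    """Stop this EC2 instance so it only bills for the minutes it ran."""
    import boto3

    boto3.client("ec2").stop_instances(InstanceIds=[instance_id])


def run(summarize: Callable[[str], str], sender: str, recipient: str, instance_id: str) -> None:
    """One digest run: summarize each topic, email the result, then shut down."""
    try:
        summaries = {topic: summarize(build_prompt(topic)) for topic in TOPICS}
        send_digest(build_digest(summaries), sender, recipient)
    finally:
        # Stop the instance even if a summary or the email fails.
        stop_self(instance_id)
```

Putting the stop call in `finally` matters for cost: a failed run that leaves the instance up all day would cost more than the digest itself.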

Cost: EC2 t3.micro running ~15 min/day = ~$2–3/month. SES is free up to 1,000 emails/month.

Getting a curated briefing pushed to my inbox every morning cut out a lot of mindless feed-checking.

Claude API

Using claude-sonnet-4-6 as the main model for development and batch work.

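import anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,  # upper bound on the length of the reply
    messages=[
        {"role": "user", "content": "Summarize this: ..."}
    ]
)

# The reply is a list of content blocks; the first one holds the text.
print(response.content[0].text)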

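Prompt caching is worth using: when the same system prompt is reused across requests, cache hits cut the cost of the cached input tokens by ~90%. Output tokens and the uncached part of each request are billed as usual. Useful for batch jobs and any workflow with a fixed prompt structure.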
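
In the Messages API, caching is opt-in: the reusable part of the prompt is marked with a `cache_control` block. Below is a hedged sketch of how a request could be built, plus a back-of-the-envelope estimate of the input-cost effect. The multipliers (about 1.25× the base input price to write the cache, about 0.1× to read it) are assumptions taken from Anthropic's published pricing at the time of writing, and the function names are my own.

```python
def build_cached_request(system_prompt: str, user_text: str,
                         model: str = "claude-sonnet-4-6") -> dict:
    """Keyword arguments for client.messages.create() with a cacheable system prompt."""
    return {
        "model": model,
        "max_tokens": 2048,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Marks everything up to this block as cacheable
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }


def input_cost_ratio(cached_tokens: int, fresh_tokens: int, cache_hit: bool,
                     write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Input cost relative to sending the same tokens with no caching at all.

    The multipliers are assumed pricing factors, not values returned by the API.
    """
    mult = read_mult if cache_hit else write_mult
    return (cached_tokens * mult + fresh_tokens) / (cached_tokens + fresh_tokens)
```

It is used as `client.messages.create(**build_cached_request(system_prompt, document))`, and `response.usage.cache_read_input_tokens` shows whether a call actually hit the cache. With a 9,000-token system prompt and 1,000 tokens of fresh input, a cache hit brings input cost to roughly a fifth of the uncached price, while the first request, which writes the cache, costs somewhat more than normal. Very short prompts may fall under the minimum cacheable length and are then not cached.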

n8n Automation (my-n8n)

Self-hosted n8n on GCP for workflow automation.

Current workflows:

RSS feed collection → filter → push to Notion database
Gmail label trigger → summarize → Slack notification
Scheduled web scraping → update Google Sheets

n8n is no-code for simple flows, and more complex logic and transformations can be written in JavaScript. It has an OpenAI node, which can also be pointed at OpenAI-compatible endpoints for other models such as Claude, so you can call LLMs inline in a workflow.

Monthly Costs

Service	Purpose	Approx. cost
GCP (e2-micro × 2)	Open WebUI, n8n	$10–15
GCP (T4 spot VM)	Ollama, on-demand	$5–20 (variable)
AWS EC2 t3.micro	my-schedule	$2–3
Claude API	Dev + batch	$5–20 (variable)
Total		~$22–58
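
The total is the sum of the low and high ends of each row. A quick check, with the figures copied from the table and dictionary keys that are just my shorthand for the rows:

```python
# (low, high) monthly estimates in USD, copied from the table
MONTHLY_COSTS = {
    "GCP e2-micro x2": (10, 15),
    "GCP T4 spot VM": (5, 20),
    "AWS EC2 t3.micro": (2, 3),
    "Claude API": (5, 20),
}


def total_range(costs: dict) -> tuple:
    """Sum the low ends and the high ends separately."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


print(total_range(MONTHLY_COSTS))
```

This comes out at 22 to 58, matching the total in the table up to rounding. Almost all of the spread comes from the two variable rows, the spot GPU and the API usage.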

I switched from a Claude Pro subscription to API-only usage to avoid a fixed monthly charge.

Takeaways

Self-hosting has a real learning curve — but it builds genuine infrastructure understanding. Good hands-on practice with Docker, networking, and cloud VMs.
Model selection matters — using strong API models (Claude, GPT-4) for quality-sensitive tasks and local Ollama models for private/bulk work is a practical split.
Spot VMs need stateless design — they can go down anytime, so anything running on them needs to handle interruptions gracefully.
Open WebUI is surprisingly complete — custom models, RAG, and web search are all built in.
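
On the caller's side, handling interruptions gracefully mostly means not treating one failed request as fatal. A minimal sketch of retry with exponential backoff, assuming the call into the spot-hosted backend raises an `OSError` subclass (connection refused, timeout, `urllib` errors) while the VM is down; the helper name and defaults are illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], attempts: int = 4, base_delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on network-level errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except OSError:
            # Out of retries: let the caller see the original error.
            if attempt == attempts - 1:
                raise
            # Wait 2s, 4s, 8s, ... before trying again.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A batch job could wrap each model call as `call_with_retry(lambda: summarize(text))` and fall back to an API model if the error still propagates. The `sleep` parameter exists only so the delays can be replaced in tests.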

Terraform configs and Docker Compose files for each project will be in separate posts.