What I Built with AI: Self-Hosting, Automation, and APIs

A record of the AI-related infrastructure and tools I’ve been running on my own. The main goal is to keep the setup useful while keeping cloud costs reasonable.

Self-Hosted Stack (my-ai)

Running Open WebUI + Ollama + SearXNG on GCP.

GCP us-west1-a
├── open-webui-vm (e2-micro)
│   ├── Open WebUI (chat interface)
│   ├── SearXNG (private search engine)
│   └── Caddy (HTTPS reverse proxy)
└── ollama-gpu-server (n1-standard-4 + T4 GPU, spot VM)
    └── Ollama (local LLM inference)

Notable details:

Models Running

Model               Use case            VRAM
llama3.3:70b        General reasoning   48GB (T4×4 or offload)
deepseek-coder-v2   Code completion     16GB
nomic-embed-text    Embeddings (RAG)    Low

Local models are most useful for processing things I don’t want to send to external APIs — summarizing sensitive documents, private notes, etc.
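
As a rough illustration of the "keep it local" case, here is a minimal sketch that sends a document to Ollama's `/api/chat` endpoint and returns the summary. The URL (Ollama's default port on localhost), the system prompt wording, and the function names are assumptions for illustration, not the actual setup described here.

```python
import json
import urllib.request

# Assumption: Ollama is reachable on its default port from wherever this runs.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_summary_request(text: str, model: str = "llama3.3:70b") -> dict:
    """Build a non-streaming chat payload asking the model to summarize text."""
    return {
        "model": model,
        "stream": False,  # ask for one JSON object instead of a token stream
        "messages": [
            {
                "role": "system",
                "content": "Summarize the user's document in five bullet points.",
            },
            {"role": "user", "content": text},
        ],
    }


def summarize(text: str, url: str = OLLAMA_URL) -> str:
    """POST the payload to Ollama and return the assistant's reply text."""
    body = json.dumps(build_summary_request(text)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    # Large models can be slow to load, so allow a generous timeout.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["message"]["content"]
```

Calling `summarize(open("notes.txt").read())` would keep the document on the self-hosted GPU server; nothing leaves the machine except the request to Ollama itself.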

Daily AI Digest (my-schedule)

An automated batch job that sends a morning email with AI-curated news summaries.

Flow:

EventBridge (6:50 AM)
  → Start EC2
  → cron triggers main.py
  → Open WebUI API × 4 topics
  → Send via AWS SES
  → EC2 self-stops

Topics: AI News, Economy & Portfolio, IT Industry, GrapheneOS
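
The flow above can be sketched roughly as follows. This is not the real main.py: the environment variable names, the prompt text, and the helper names are hypothetical. It assumes Open WebUI's OpenAI-compatible `/api/chat/completions` endpoint with a Bearer API key, and that the EC2 instance role is allowed to call SES `send_email` and EC2 `stop_instances`.

```python
import json
import os
import urllib.request

TOPICS = ["AI News", "Economy & Portfolio", "IT Industry", "GrapheneOS"]


def ask_webui(topic: str, url: str, api_key: str, model: str) -> str:
    """Ask Open WebUI for a summary of one topic (OpenAI-style response assumed)."""
    payload = {
        "model": model,
        "messages": [
            {"role": "user", "content": f"Summarize today's key {topic} stories."}
        ],
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]


def build_digest(sections: dict) -> str:
    """Join per-topic summaries into one plain-text email body."""
    parts = [f"== {topic} ==\n{text.strip()}" for topic, text in sections.items()]
    return "\n\n".join(parts)


def send_and_stop(body: str, sender: str, recipient: str, instance_id: str) -> None:
    """Send the digest through SES, then stop this EC2 instance."""
    import boto3  # third-party; imported lazily so the helpers above work without it

    boto3.client("ses").send_email(
        Source=sender,
        Destination={"ToAddresses": [recipient]},
        Message={
            "Subject": {"Data": "Daily AI Digest"},
            "Body": {"Text": {"Data": body}},
        },
    )
    boto3.client("ec2").stop_instances(InstanceIds=[instance_id])


def main() -> None:
    # Hypothetical environment variable names; adjust to your own setup.
    url = os.environ["WEBUI_URL"]  # e.g. https://<host>/api/chat/completions
    key = os.environ["WEBUI_API_KEY"]
    model = os.environ["WEBUI_MODEL"]
    sections = {topic: ask_webui(topic, url, key, model) for topic in TOPICS}
    send_and_stop(
        build_digest(sections),
        os.environ["DIGEST_FROM"],
        os.environ["DIGEST_TO"],
        os.environ["INSTANCE_ID"],
    )
```

Cron would call `main()` once per boot. Stopping the instance as the last step is what keeps the daily runtime to a few minutes.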

Cost: EC2 t3.micro running ~15 min/day = ~$2–3/month. SES is free up to 1,000 emails/month.

Getting a curated briefing pushed to my inbox every morning cut out a lot of mindless feed-checking.

Claude API

Using claude-sonnet-4-6 as the main model for development and batch work.

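import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,  # upper bound on generated tokens
    messages=[
        {"role": "user", "content": "Summarize this: ..."}
    ],
)

# The reply is a list of content blocks; the first one holds the text.
print(response.content[0].text)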

Prompt caching is worth using: when the same system prompt is reused across requests, a cache hit bills the cached portion at about 10% of the normal input price, so those tokens cost ~90% less. Useful for batch jobs and any workflow with a fixed prompt structure.
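
A minimal sketch of what that looks like with the Python SDK: the shared system prompt is passed as a content block marked with `cache_control`, and the usage fields on each response show whether the cache was written or read. The function names and the batch loop are illustrative assumptions. Very short prompts fall below the model's minimum cacheable length and are not cached.

```python
def build_cached_request(system_prompt: str, user_text: str) -> dict:
    """Build messages.create() kwargs with the system prompt marked cacheable."""
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 2048,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Everything up to and including this block is eligible for caching.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }


def summarize_batch(system_prompt: str, documents: list) -> list:
    """Run the same system prompt over many documents, reusing the cache."""
    import anthropic  # third-party SDK; imported lazily

    client = anthropic.Anthropic()
    results = []
    for doc in documents:
        response = client.messages.create(**build_cached_request(system_prompt, doc))
        usage = response.usage
        # The first call typically reports a cache write; later calls report reads.
        print(
            "cache write tokens:", usage.cache_creation_input_tokens,
            "| cache read tokens:", usage.cache_read_input_tokens,
        )
        results.append(response.content[0].text)
    return results
```

Only the part of the prompt that is identical across requests benefits, so the fixed instructions go in the cached block and the per-document text stays in the user message.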

n8n Automation (my-n8n)

Self-hosted n8n on GCP for workflow automation.

Current workflows:

n8n is no-code for simple flows, while complex logic and transformations can be written in JavaScript. It has an OpenAI node that also works with Claude through an OpenAI-compatible endpoint, so you can call LLMs inline in a workflow.

Monthly Costs

Service              Purpose             Approx. cost
GCP (e2-micro × 2)   Open WebUI, n8n     $10–15
GCP (T4 spot VM)     Ollama, on-demand   $5–20 (variable)
AWS EC2 t3.micro     my-schedule         $2–3
Claude API           Dev + batch         $5–20 (variable)
Total                                    ~$25–60

Switched from Claude Pro subscription to API-only to avoid a fixed monthly charge.

Takeaways


Terraform configs and Docker Compose files for each project will be in separate posts.

