Blog posts

2026

Why Kubernetes doesn’t “just work” with GPUs

6 minute read

If you are running standard web applications on Kubernetes, the environment feels like a high-security facility. If you set a 1GB memory limit on a pod, the Linux kernel acts as a relentless enforcer; the moment that pod attempts to touch 1.1GB, it is terminated (OOMKilled). Similarly, CPU cycles are metered with surgical precision using Completely Fair Scheduler (CFS) quotas.

A Deep Dive into GPU Sharing Technologies

7 minute read

In Part 1, we explored how Kubernetes treats GPUs as a monolithic integer resource. For a massive training job, this may be a good fit. For a lightweight inference server using only 2GB of an 80GB A100, it is a staggering waste of capital.

Gas Town: The Industrial Revolution of Vibe Coding

5 minute read

In our previous deep dive, we explored the Ralph Wiggum Technique, a method defined by its beautiful simplicity. It was the software equivalent of the infinite monkey theorem: set up a bash loop, feed an error log to an AI, and let it fail its way to success.

2025