Welcome! 👋

I’m Davide Rutigliano, a Senior Platform Engineer building GPU-accelerated Kubernetes platforms for AI/HPC workloads. I focus on inference observability (vLLM, TTFT) and cluster lifecycle operations, and I contribute to open-source projects including Kubernetes, Kueue, and KubeAI.

What Is This Site?

This is my personal corner of the internet: a place to share my views, document what I learn, and connect with others in the field. Browse my portfolio for featured projects, read my blog for thoughts on platform engineering, or check out my notes for quick technical references.

What I Do

I specialize in building internal platforms and developer tools that scale. My work spans Kubernetes, virtualization, observability, and HPC/GPU infrastructure, with a focus on production readiness, efficiency and cost optimization.

Recent Highlights

🧠 vLLM & GenAI Observability: Engineered OpenTelemetry connectors to instrument vLLM inference (TTFT and other KPIs), enabling on-call triage for a multi-tenant GPU inference platform
⚡ High-Performance GPU Monitoring: Engineered a GPU observability solution for Kubernetes/KubeVirt (NVIDIA MIG/vGPU), unlocking a 40%+ gain in HPC efficiency
🚀 Achieved a 62% infrastructure cost reduction ($100K+ in annual savings) by architecting Kubernetes cluster autoscaling with Cluster API across AWS, GCP, and on-prem environments
🤖 Built the SUSE Observability MCP Server from idea to MVP, embedding LLM-driven analysis directly into the alerting pipeline (recognized by senior leadership for production hardening)
🔄 Designed VM migration orchestration with a Kubernetes operator, enabling the migration of 100+ VMs from KVM to Harvester
📊 Architected a federated observability migration to SUSE Observability (StackState), cutting troubleshooting time by 25%

🛠 Skills

AI & GPU Infra

NVIDIA MIG/vGPU, GPU Operator, LLM-Ops, vLLM, Kueue/Slurm, TensorFlow, PyTorch, Computer Vision

Observability

OpenTelemetry (OTel), Prometheus, Grafana, Alertmanager, StackState, Root Cause Analysis (RCA)

Reliability

SLIs/SLOs, alerting strategy, runbooks, incident response, postmortems, capacity planning

Cloud Native

Kubernetes, Helm, Docker, GitOps (ArgoCD, Flux), Terraform, GCP, AWS, Azure

Development

Go, Python, Java, Rust, K8s Operators, Event-Driven Architecture, Linux

Let’s Connect

I’m always interested in discussing platform engineering, cloud architecture, and innovative solutions. Check out my portfolio for featured projects, or view my full CV.