Distributed Systems Design

Languages: Python, Go, Bash
Cloud: AWS, GCP, Kubernetes
Databases and streaming: PostgreSQL, MySQL, Redis, Kafka
Tools: Prometheus, Grafana, Terraform, ArgoCD, Varnish

Insights on site reliability engineering and distributed systems from a Site Reliability Engineer with over 10 years of experience designing and implementing Kubernetes-native applications in multi-cloud environments.

About

Senior Site Reliability Engineer with a software engineering background and 11+ years of experience architecting, deploying, and scaling business-critical systems in hybrid-cloud environments spanning on-premises data centers and public clouds (GCP, AWS). I've spent 8+ years at Vimeo improving the reliability and performance of large-scale cloud-native applications. I was instrumental in migrating on-premises data center services to the cloud and adopting Kubernetes as the core orchestration platform, supporting business growth through high-availability system design, robust observability frameworks, and performance optimization at scale.

My professional work with Linux systems started 20 years ago, when I cross-compiled kernel modules for ARM-based embedded Linux, and the learning continues as I study how Istio would complement Cilium in Kubernetes Control Plane V2.

Expertise

Site Reliability Engineering

Kubernetes clusters at scale
Observability (metrics, logs, traces)
SLOs, SLIs, and error budgets
Incident response and retrospectives
Capacity planning and load testing
GitOps and CI/CD (continuous integration and continuous delivery)