TeleTrackr – Observability Pipeline on EKS
Overview
TeleTrackr is a production-grade observability pipeline built for a microservices e-commerce platform running on Amazon EKS. It provides distributed tracing, metrics collection, alerting, and real-time dashboards, with automated CI/CD deployment and rollback safety built in.
The Challenge
Microservices platforms generate traces and metrics across dozens of pods with no unified view. Pod failures go undetected for minutes, root-cause analysis requires digging through raw logs, and there's no automated way to catch regressions after deploys. Manual monitoring doesn't scale past a handful of services.
The Solution
All microservices are instrumented with OpenTelemetry SDKs for distributed tracing and metrics export. Prometheus scrapes metrics from all pods at a 15-second interval and evaluates alerting rules. Grafana visualizes the full trace-to-metric pipeline with per-service dashboards. Alertmanager routes pod failure alerts to Gmail in real time. Helm-based deployments are automated through a GitHub Actions CI/CD pipeline that rolls back automatically when health checks fail.
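To make the scrape-and-alert step concrete, here is a minimal Python simulation of how an alerting rule with a hold duration moves from inactive to pending to firing. It is a sketch, not the project's actual rule. Only the 15-second interval comes from the description above. The 30-second hold duration, the sample values, the function name `alert_states`, and the assumption that rules are evaluated at the same cadence as scrapes are all illustrative.

```python
from typing import List, Optional

SCRAPE_INTERVAL_S = 15  # from the pipeline description
FOR_SECONDS = 30        # hypothetical hold duration, not taken from the project


def alert_states(pod_up: List[int],
                 interval_s: int = SCRAPE_INTERVAL_S,
                 for_seconds: int = FOR_SECONDS) -> List[str]:
    """Return the alert state after each evaluation.

    pod_up holds one sample per evaluation: 1 if the pod was reachable, 0 if not.
    The alert is 'pending' while the failure has held for less than for_seconds
    and 'firing' once it has held for at least that long.
    """
    states: List[str] = []
    failing_since: Optional[int] = None  # time of the first consecutive failure
    for i, up in enumerate(pod_up):
        now = i * interval_s
        if up == 1:
            # A healthy sample resets the hold timer.
            failing_since = None
            states.append("inactive")
            continue
        if failing_since is None:
            failing_since = now
        held = now - failing_since
        states.append("firing" if held >= for_seconds else "pending")
    return states


# Assumed sample series: the pod is down for three consecutive evaluations.
samples = [1, 1, 0, 0, 0, 1]
print(alert_states(samples))
# → ['inactive', 'inactive', 'pending', 'pending', 'firing', 'inactive']
```

Under these assumptions the alert fires on the third failing evaluation, 30 seconds after the first one. A shorter hold duration fires sooner but is more likely to alert on a pod that is merely restarting.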
Tech Stack
Amazon EKS, OpenTelemetry, Prometheus, Grafana, Alertmanager, Helm, GitHub Actions
Outcomes
- Real-time Gmail alerts delivered within 30 seconds of pod failure detection
- Zero-downtime Helm rollbacks validated across 3 deployment scenarios
- Full trace-to-dashboard pipeline operational with sub-second metrics resolution
- CI/CD pipeline with automated health check gating prevents bad deploys from reaching production
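The health check gating and rollback outcomes above could be implemented in many ways. The following is one possible sketch of a post-deploy gate that a CI job might run, not the project's actual workflow. The release name `teletrackr`, the helper names, the retry counts, and the idea of polling an HTTP health endpoint are assumptions. The only external command used is `helm rollback`, which returns to the previous revision when no revision number is given.

```python
import subprocess
import time
import urllib.request
from typing import Callable, List


def is_healthy(url: str, timeout_s: float = 3.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, timeouts, and refused connections
        return False


def run_command(cmd: List[str]) -> None:
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


def gate_release(release: str,
                 check: Callable[[], bool],
                 run: Callable[[List[str]], None],
                 attempts: int = 5,
                 delay_s: float = 2.0) -> bool:
    """Poll check() up to `attempts` times; roll back if it never passes.

    Returns True if the deploy is kept, False if a rollback was triggered.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    # With no revision argument, `helm rollback` returns to the previous release.
    run(["helm", "rollback", release, "--wait"])
    return False


# Dry run: a check that always fails, and a recorder in place of a real Helm call.
commands: List[List[str]] = []
kept = gate_release("teletrackr", check=lambda: False, run=commands.append,
                    attempts=2, delay_s=0)
print(kept, commands)
# → False [['helm', 'rollback', 'teletrackr', '--wait']]
```

In a real pipeline the gate would be called with something like `check=lambda: is_healthy(health_url)` and `run=run_command`, where `health_url` is whatever endpoint the service exposes. Passing the check and the command runner as arguments keeps the gate testable without a cluster.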