Senior Kubernetes Engineer
OpsWerks is a technical consulting company specializing in operational services for the high-tech industry. We help platform and infrastructure teams operate multi-cloud environments, execute complex migrations, and enable seamless app deployments.
Your Role
As a Senior Kubernetes Engineer (EKS), you will be part of a managed services team supporting Kubernetes clusters running production workloads. You will handle incident response and troubleshooting, drive operational improvements, and mentor junior engineers—while coordinating closely with internal teams and client stakeholders.
Provide hands-on incident response and troubleshooting for Kubernetes/EKS production environments, including investigation, mitigation, and follow-up actions.
Act as a senior escalation point for complex cluster and application platform issues (networking, DNS, ingress, autoscaling, scheduling, node issues).
Support and maintain EKS platform operations such as cluster upgrades, add-on management, node group/launch template changes, security patching, and capacity planning, following client change management processes.
Improve platform reliability by enhancing observability (metrics/logs/traces), alert quality, and runbook maturity.
Identify recurring issues and implement preventative actions through automation, standardization, and documentation.
Create and maintain runbooks, troubleshooting guides, operational checklists, and platform standards.
Participate in post-incident reviews (RCA) and ensure corrective and preventive actions are tracked to completion.
Mentor and guide junior engineers through reviews, pair troubleshooting, knowledge sharing, and operational best practices.
Demonstrate leadership during incidents and projects by coordinating tasks, communicating clearly, and keeping teams aligned on priorities.
Participate in an on-call rotation as part of a 24/7 operations model, with proper handoffs and team support.
Your Qualifications
3+ years of hands-on Kubernetes experience supporting production environments.
Strong experience with Amazon EKS and common Kubernetes components (CoreDNS, kube-proxy, CNI, Ingress controllers, autoscaling).
Proven experience in incident response, troubleshooting, and production operations (debugging pods, networking/DNS issues, node problems, resource constraints, rollout failures).
Working knowledge of Kubernetes fundamentals: Deployments, Services, Ingress, ConfigMaps/Secrets, RBAC, namespaces, resource quotas/limits, PodDisruptionBudgets (PDBs), and upgrade readiness.
Familiarity with observability and troubleshooting tools (CloudWatch, Prometheus/Grafana, Splunk/ELK, kubectl debugging, etc.).
Basic scripting/automation ability (e.g., Python or Bash) to reduce repetitive operational tasks.
Solid Linux and networking fundamentals (TCP/IP basics, DNS, TLS, load balancing concepts).
Excellent written and oral communication skills, with the ability to write clear incident updates and documentation and to explain technical issues to stakeholders.
Leadership skills (formal or informal) are preferred, including the ability to guide others, lead discussions, and influence improvements.
Plus points if you have:
Kubernetes certifications (CKA/CKAD) or AWS certifications
Terraform or Crossplane
CI/CD (GitHub Actions, GitLab CI, Jenkins, etc.)
Networking (VPC design, routing, DNS, load balancers, troubleshooting)
Envoy/NGINX/proxy concepts (Ingress, service routing, L7 behavior, TLS)
Experience with service mesh (Istio/Linkerd) and advanced traffic management
Ready to start your awesome journey and be part of OpsWerks?