Data Platform Reliability Engineer

Senior Data Platform Reliability Engineer

Full-time · Manila and Cebu

OpsWerks is a technical consulting company specializing in operational services for the high-tech industry. We help platform and infrastructure teams operate multi-cloud environments, execute complex migrations, and enable seamless app deployments.

Your Role

As a Senior Data Platform Reliability Engineer, you will operate, maintain, and continuously improve the company’s data platforms running on Kubernetes (on-premises and/or on AWS/GCP), similar to the DoEKS (Data on EKS) / AIoEKS (AI on EKS) deployment frameworks.

  • Deploy new releases and configuration changes through GitOps/DevOps workflows

  • Monitor platform and service health using logs, metrics, and observability tools  

  • Participate in incident response, root cause analysis and 24x7 operational rotations 

  • Improve platform observability, operational tooling/automations, self-service capabilities and reliability practices to reduce recurring issues  

  • Investigate and troubleshoot user concerns by correlating them with system-related issues, broken integrations, and/or user-specific errors or misconfigurations, then recommend or execute resolutions

  • Provide technical mentorship to junior engineers 

  • Advocate for platform standards, security best practices, and operational excellence 

Your Qualifications

  • 3+ years of solid experience supporting production data workloads/platforms (Spark/Airflow/Jupyter) 

  • 5+ years of hands-on experience in ETL/ELT pipeline development and data transformations (Python/Java and SQL)

  • Practical proficiency in Kubernetes environments, including cloud-provider-managed Kubernetes services (AWS EKS/GCP GKE)

  • Comprehensive knowledge of Linux environments, microservice architectures, and service communication patterns

  • Strong troubleshooting fundamentals for issues such as application crashes, resource contention, service latency, and scaling behavior

  • Well-rounded competency in analyzing logs, metrics, monitoring systems, and service KPIs  

Plus points if you have:

  • Exposure to other data/AI platforms such as Flink, Trino, Druid, and Ray

  • Hands-on experience with automation or scripting (Bash, Python) 

  • Kubernetes or data certifications (e.g., CKAD, AWS Certified Data Engineer)

Ready to start your awesome journey and be part of OpsWerks?