Senior Site Reliability Engineer - Infrastructure

At OpsWerks, we're looking for a Senior Site Reliability Engineer – Infrastructure to join our team.

Full-time · Manila

Your Role

  • Serve as subject matter expert for infrastructure operations at scale by sharing knowledge among peers, documenting best practices, and performing root cause analysis of recurring or high-impact incidents.

  • Lead incident response, triaging, and customer communications during system outages and production issues.

  • Lead and run periodic sync-up meetings with our customers to discuss updates, get clarity, and solicit feedback.

  • Gather and analyze operations metrics regularly to make informed decisions in driving operational and process improvements.

  • Develop or contribute to existing tools and automated solutions to improve efficiency in operations.

  • Create comprehensive documentation for operational procedures and environment setup.

  • Eliminate operational toil through automation or process improvements.

  • Be a member of a 24x7 shift rotation.

Your Qualifications

  • Bachelor’s degree in any Information Technology or Engineering course

  • Demonstrated ability in supporting critical production services and improving operations through automation and process enhancements.

  • At least 5 years of experience working with any technologies in the following domains:

    • Linux Systems Administration: RHEL, CentOS, Ubuntu or other *Nix systems

    • Container Orchestration and Scheduling: Docker, Kubernetes or similar

    • Infrastructure as Code: Puppet, Ansible, Chef, Terraform or similar

    • Logging and monitoring: Prometheus, Grafana, Splunk, MRTG

    • Version Management and CI/CD: Git, Jenkins or similar

    • Networking: Core Networking Concepts, Load Balancing (NetScaler, F5 or similar), Reverse Proxy (Nginx), Software Defined Networks (SDN), DHCP, DNS

    • Security: SSL certificates, Firewall, ACLs

  • Strong experience managing data center lifecycle (build, operate, decommission) practices.

  • Experience in leading infrastructure-related projects to success.

  • Solid understanding of distributed computing principles, platform operations, and best practices in Data Engineering and DevOps workflows.

Plus Points

  • Relevant certification in key skill sets: Linux (RHCSA, RHCE, LPIC), Networking (CCNP), Kubernetes (CKA, CKAD), or Cloud Computing (AWS).

You'll be part of a driven and passionate Site Reliability Engineering Team that deploys some of the most diverse and massive Cloud platforms in the industry.

Our werk is known for excellent operational work. Our people take on data center and platform operations challenges head-on and are not afraid of doing the dirty work, while continuously looking for ways to make things better and more efficient.

Your Team

Our team is made up of individuals who are aligned with OpsWerks’ values. In the spirit of building a healthy community, which requires open and honest communication, here are our expectations for every one of us at OpsWerks:

  • To uphold OpsWerks’ Mission and Methods.

  • To know, believe, and execute each team’s mission plan.

  • To grow in the four awarenesses (self, others, surroundings, and situation).

  • To take ownership of your personal growth for the team’s well-being.

  • To never give up, to never give in… only giving your best.

Apply for the job!

Ready to start your awesome journey and be part of OpsWerks?