Senior Cloud Applications Engineer
Your Role
Serve as Subject Matter Expert (SME) for distributed applications on hybrid cloud platforms, documenting best practices and providing guidance to peers.
Champion continuous operational improvements informed by metrics analysis and customer feedback.
Lead incident management, troubleshooting, and response coordination, and conduct comprehensive post-incident reviews.
Clearly communicate complex technical issues to development teams, document root causes, and collaborate internally to create robust solutions.
Manage, deploy, and maintain enterprise applications and cloud-based systems using secure, scalable, and reliable frameworks.
Proactively monitor, troubleshoot, and optimize the health, performance, and reliability of applications and platforms.
Perform detailed log analysis and utilize stack traces to debug and resolve issues reported by partners and end-users.
Develop comprehensive documentation covering operational procedures, system configurations, and environment setups.
Continuously identify and implement automation opportunities to reduce manual tasks and operational overhead.
Train junior engineers in your areas of expertise.
Participate in a 24x7 shift rotation.
Your Qualifications
Bachelor’s Degree in Information Technology, Engineering, or a related technical discipline.
Minimum 5 years of experience supporting critical, high-availability production systems with a strong focus on automation, reliability, and operational excellence.
Strong verbal and written communication skills.
Required Technical Skills
Minimum of 5 years’ experience with at least one or two tools in each domain:
Linux Administration and Troubleshooting: RHEL, CentOS, Ubuntu, or similar Unix-based operating systems.
Distributed Applications: Microservices architectures and distributed application support.
Logging & Monitoring: Splunk, Grafana, Prometheus.
Incident Management: PagerDuty, ServiceNow.
Version Control: Git, GitHub, GitLab.
Plus Points
Certifications such as Certified Kubernetes Administrator (CKA), Certified Kubernetes Application Developer (CKAD), or relevant cloud certifications (AWS, Azure, GCP).
Extensive experience supporting and maintaining Platform-as-a-Service (PaaS) environments, Content Delivery Networks (CDNs), Messaging Queues, API Gateways, and Proxies as part of scalable and resilient system architectures.
Proven experience working in collaborative, cross-functional teams within structured processes that follow modern DevOps practices and workflows.
Proven ability to improve operational efficiency through the development of automation tools and scripts, leveraging languages such as Bash and Python to streamline workflows, reduce manual toil, and enhance system reliability.
You'll be part of a driven and passionate Site Reliability Engineering team that deploys some of the largest and most diverse data center and distributed computing platforms. Our team is known for excellent operational work. These are people who take on data center and platform operations challenges head-on, who are not afraid of doing the dirty work, and who continuously look for ways to make things better and more efficient.
Our team is made up of individuals who are aligned with OpsWerks’ values. In the spirit of building a healthy community, which requires open and honest communication, here are our expectations for every one of us at OpsWerks:
To uphold OpsWerks’ Mission and Methods.
To know, believe, and execute each team’s mission plan.
To grow in the 4 areas of awareness (self, others, surroundings, and situation).
To take ownership of your personal growth for the team’s well-being.
To never give up, to never give in… and to always give your best.
Apply for the job!
Ready to start your awesome journey and be part of OpsWerks?