Service Delivery and Incident Response Lead

Full-time · Manila

OpsWerks is a technical consulting company specializing in operational services for the high-tech industry. We help platform and infrastructure teams operate multi-cloud environments, execute complex migrations, and enable seamless app deployments.

Your Role

People & Team Leadership

Lead, coach, and mentor IT engineers to build strong technical and leadership capabilities.
Set clear performance goals aligned with our Beliefs, Vision, Mission, and Methods (BVMM).
Conduct 1:1s, performance reviews, and career growth discussions.
Foster a culture of ownership, collaboration, and continuous learning.
Maintain balanced workloads, shift coverage, and clear succession plans to sustain healthy 24×7 operations.

Service Operations & Reliability

Oversee daily service health, capacity, and reliability across all supported environments.
Ensure compliance with operational KPIs through proactive planning and improvement.
Balance demand vs. capacity and manage shift coverage to prevent burnout.
Partner with engineering teams to maintain runbooks, knowledge bases, and escalation paths.
Drive automation and workflow optimization to reduce manual overhead.
Use data insights to guide decisions and improvements.

Incident & Problem Management

Lead end-to-end incident response in real time, from triage and communication through resolution.
Act as Incident Commander for high-impact events across a global environment.
Track and improve incident metrics such as MTTD, MTTM, and MTTR.
Champion blameless Post-Incident Reviews (PIRs) and translate learnings into long-term system and process improvements.

Strategic & Cross-Functional Impact

Represent OpsWerks in customer reviews, operational syncs, and briefings.
Collaborate with SREs, product owners, and partner engineers to align priorities and reliability goals.
Contribute to operational frameworks and governance initiatives.
Lead service onboarding/off-boarding and strengthen operational readiness checkpoints.
Identify and close systemic operational gaps through process and tool improvements.

Your Qualifications

Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related discipline.
3+ years of experience in Service Delivery, Incident Response, or Operations Leadership within enterprise-scale, 24×7 environments.
Proven experience managing technical teams, driving performance, and leading through critical situations.
Strong grounding in ITSM / ITIL principles (Incident & Problem Management).
Familiarity with cloud, distributed systems, or enterprise infrastructure.
Skilled in monitoring, alerting, and ticketing tools (e.g., PagerDuty, Datadog, Grafana, Splunk, ServiceNow).

Core Competencies

People and Performance Leadership
Incident Command and Escalation Management
Analytical and Problem-Solving Skills
Communication and Decision-Making Under Pressure
Root Cause and Post-Incident Analysis
Operational Planning and Service Governance
Stakeholder and Partner Management
IT Service Management (Incident & Problem Management)
Observability, Monitoring, and Automation Tools
Passion for People Development, Operational Discipline, and Continuous Improvement

Plus points if you have:

ITIL v3 or ITIL 4 certification
AWS Certified SysOps Administrator
SRE Foundation or Crisis/Incident Management certifications
Background in SRE practices and operational frameworks that promote reliability and automation

Ready to start your awesome journey and be part of OpsWerks?