Senior Multi-Cloud Infrastructure Engineer
As a Senior Multi-Cloud Infrastructure Engineer, you will play a critical role in managing and optimizing multi-cloud environments, with a primary focus on AWS and/or Alibaba Cloud. You will be responsible for ensuring high availability, performance, security, and operational efficiency in mission-critical systems. You will also serve as a key escalation point for troubleshooting complex cloud infrastructure issues and implementing automation-driven solutions to enhance reliability.
This role requires deep hands-on expertise in AWS and Alibaba Cloud; experience with GCP and Azure is a plus. You will work in a dynamic, fast-paced environment, collaborating with DevOps, SRE, and engineering teams to improve cloud operations.
Multi-Cloud Support & Incident Management
Serve as a senior escalation point for complex infrastructure issues across AWS and Alibaba Cloud, ensuring timely resolution and minimal downtime.
Lead incident response, troubleshooting, and root cause analysis (RCA) for cloud service failures, performance degradation, and security incidents.
Develop and implement incident management playbooks to standardize troubleshooting and reduce resolution times.
Participate in a 24×7 on-call rotation, ensuring continuous monitoring and rapid response to critical incidents.
Cloud Infrastructure Operations & Optimization
Monitor, maintain, and optimize cloud resources across AWS and Alibaba Cloud, ensuring scalability, cost efficiency, and compliance with best practices.
Manage multi-cloud networking including VPCs, Transit Gateways, VPNs, Load Balancers (AWS ALB/NLB, Alibaba SLB), and Firewalls.
Perform patch management, system updates, and security hardening to maintain a robust and secure cloud environment.
Optimize resource utilization and cost management across cloud platforms, leveraging FinOps best practices.
Automation & Infrastructure as Code (IaC)
Automate routine cloud operations using Terraform, AWS CloudFormation, Alibaba ROS, Pulumi, and Ansible.
Develop and maintain automation scripts in Bash, Python, or Go to improve cloud reliability and efficiency.
Implement self-healing mechanisms, auto-scaling strategies, and proactive alerting systems to improve infrastructure resilience.
Security, Compliance & Best Practices
Ensure cloud security best practices by managing IAM/RAM policies, encryption, and security group configurations.
Collaborate with security teams to conduct cloud security audits, risk assessments, and vulnerability management.
Implement and maintain compliance with industry standards.
Collaboration & Technical Leadership
Work closely with DevOps, SRE, and development teams to resolve cloud infrastructure challenges and enhance performance.
Provide mentorship and technical leadership to internal teams on multi-cloud best practices, automation, and cost optimization.
Document operational procedures, troubleshooting guides, and automation frameworks for knowledge sharing.
Education:
Bachelor’s degree or higher in Computer Science, Information Technology, or a related field.
Experience:
Multi-Cloud Infrastructure:
Primary focus on AWS and Alibaba Cloud: deploying, managing, and optimizing workloads across both platforms, with at least 5 years of solid working experience in the AWS ecosystem.
Experience with multi-cloud architectures, including hybrid solutions with AWS + Alibaba Cloud, or AWS mixed with GCP/Azure.
Cloud Services & Technologies:
AWS: EC2, VPC, IAM, S3, RDS, Lambda, Route 53, CloudFormation, CloudFront, CloudWatch, Auto Scaling, ALB/NLB, and Transit Gateway.
Alibaba Cloud: ECS, VPC, RAM, OSS, RDS, SLB, Function Compute, DNS, CloudMonitor, Auto Scaling, and Resource Orchestration Service (ROS).
Strong understanding of cloud security best practices, compliance, and cost optimization strategies.
Linux Systems Administration:
Expertise in Red Hat Enterprise Linux (RHEL), Ubuntu, or other major Linux distributions.
Experience in hardening, troubleshooting, and performance tuning.
Networking:
Proficiency in networking across cloud environments (AWS VPC Peering, Transit Gateway, Alibaba Cloud CEN, VPN, Direct Connect).
In-depth knowledge of TCP/IP, routing, firewalls (Security Groups, iptables, ACLs), and load balancing (AWS ELB/ALB, Alibaba SLB, Citrix NetScaler, NGINX, Envoy Proxy).
Build & Release Management:
Hands-on experience with Git (GitHub/GitLab), Artifactory, and CI/CD tools such as Jenkins, AWS CodePipeline/CodeDeploy, GitHub Actions, and Spinnaker.
Experience with Java build tools (Maven, Gradle) and automated deployment strategies.
Containerization & Orchestration:
Kubernetes (EKS, ACK), Docker, HashiCorp Nomad, AWS ECS.
Experience in container security, networking, and scaling strategies.
Logging & Monitoring:
AWS CloudWatch, Alibaba CloudMonitor, Prometheus, Thanos, Grafana, Splunk, New Relic.
Experience designing strong observability and alerting strategies for multi-cloud environments.
Scripting & Automation:
Proficiency in Bash, Python, or Go for automation, cloud resource management, and infrastructure deployment.
Expertise in Infrastructure as Code (IaC): Terraform, Pulumi, AWS CloudFormation, Alibaba ROS, Ansible, Puppet, Chef.
SRE & DevOps Expertise:
Strong background in Site Reliability Engineering (SRE), DevOps culture, and automation-driven operations.
Experience in implementing highly available, scalable, and resilient cloud-native architectures.
Communication & Collaboration:
Excellent verbal and written communication skills, with experience in cross-functional collaboration.
Certifications (Preferred):
AWS Certified Solutions Architect / DevOps Engineer / Security
Alibaba Cloud Professional / Expert Certifications
Relevant Kubernetes (CKA/CKS), Terraform, or DevOps certifications are a plus.
Do you want to join our team as our new Senior Multi-Cloud Infrastructure Engineer? Then we'd love to hear from you!