No longer accepting applications

Site Reliability Engineer (SRE) – AWS / TypeScript

Acquire Intelligence • Metro Manila, Philippines

3 days ago

Job description

We’re an award-winning global outsourcer providing contact center and back office services on behalf of our global clients. Come work at a place where innovation and teamwork come together to support the most exciting missions in the world!

Acquire Intelligence exists to help businesses unlock smarter ways of working. We believe that by combining the best of people, process, and automation, companies can grow faster and operate with greater confidence. Our purpose is to remove complexity, improve performance, and drive intelligent transformation for organizations around the world.

As an Acquire Intelligence employee, your role is vital in achieving and exceeding individual and team targets that support company objectives, while building and maintaining stakeholder relationships. You’re also responsible for complying with and enforcing procedures aligned with our information security policies.

As a values-led organization, we expect all our team members to exemplify our four values: Curious and Clever, Entrepreneurial Energy, Fast with Intent, and Laugh and Learn.

A SNAPSHOT OF YOUR ROLE

Responsibilities of the Site Reliability Engineer will include but are not limited to:

Service Level Management & Reliability

Define, monitor, and enforce Service Level Objectives (SLOs) and error budgets across all production systems
Track error budget burn rates and make data-driven decisions to halt risky deployments when thresholds are exceeded
Implement comprehensive monitoring and alerting strategies using Prometheus, Grafana, and PagerDuty
Establish and maintain reliability standards that support business-critical uptime requirements
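
As a rough illustration of the error-budget reasoning above, the sketch below computes a burn rate from request counts and checks it against a halt threshold. The function names, the 99.9% SLO, and the threshold of 2 are illustrative assumptions, not values taken from this posting.

```typescript
// Minimal sketch of error-budget burn-rate arithmetic.
// All names and numbers here are hypothetical, chosen only for illustration.

/** Fraction of requests allowed to fail under an SLO (e.g. 0.999 allows roughly 0.001). */
function errorBudget(sloTarget: number): number {
  if (sloTarget <= 0 || sloTarget >= 1) {
    throw new RangeError("sloTarget must be strictly between 0 and 1");
  }
  return 1 - sloTarget;
}

/**
 * Burn rate = observed error ratio / allowed error ratio.
 * A value of 1 means the budget would be spent exactly over the SLO window;
 * values above 1 mean it is being spent faster than that.
 */
function burnRate(failed: number, total: number, sloTarget: number): number {
  if (total <= 0) {
    return 0; // no traffic observed, so nothing was burned
  }
  return failed / total / errorBudget(sloTarget);
}

/** Decide whether risky deployments should pause (the threshold is an assumed policy value). */
function shouldHaltDeployments(rate: number, threshold: number = 2): boolean {
  return rate > threshold;
}

// Usage: 30 failures out of 10,000 requests against an assumed 99.9% SLO.
const rate = burnRate(30, 10_000, 0.999);
console.log(`burn rate: ${rate.toFixed(2)}, halt: ${shouldHaltDeployments(rate)}`);
```

In practice the request counts would come from a monitoring system such as Prometheus, and the threshold would be set per service rather than hard-coded.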

Infrastructure Automation & Management

Design and implement Infrastructure as Code (IaC) solutions using Pulumi with TypeScript

Manage and optimize AWS services including EKS (Elastic Kubernetes Service), MSK (Managed Streaming for Apache Kafka), SingleStore, MongoDB, and S3

Automate operational processes to eliminate toil, targeting any task that consumes more than 2 engineer-days per quarter

Incident Response & Post-Mortem Leadership

Serve as incident commander during production outages and service degradations

Lead comprehensive post-mortem processes within 48 hours of incidents

Drive "never-again" corrective actions to completion, ensuring systemic improvements

Maintain and improve incident response procedures and runbooks

Security & Compliance

Implement and enforce least-privilege IAM policies across all AWS resources

Manage security patch pipelines and vulnerability remediation processes

Support compliance initiatives including SOC2 and ISO 27001 certification requirements

Ensure security best practices are embedded in all infrastructure and operational procedures

On-Call & Operational Excellence

Participate in a follow-the-sun on-call rotation, with a one-week primary / secondary commitment every five weeks

Provide 24×7 support coverage across AU / NZ, EU / ZA, and MX time zones

Maintain operational runbooks and knowledge
