QUALIFICATIONS : Non-Negotiable
Nice to have
SKILLS and RESPONSIBILITIES :
Operational Support Leadership :
Provide leadership and direction for operational support teams, ensuring timely resolution of incidents and effective communication with stakeholders.
Establish and maintain service level agreements (SLAs) for operational support activities, and continuously monitor and improve support processes.
Implement and optimize monitoring, logging, and alerting systems to facilitate proactive issue detection and resolution.
Lead and inspire the infrastructure team, providing guidance, support, and mentorship to ensure the successful execution of projects and initiatives.
Incident Management :
Establish incident management processes and procedures to ensure timely response and resolution of incidents impacting cloud services.
Define incident severity levels, escalation paths, and communication protocols to effectively manage incidents of varying impact and urgency.
Lead incident response efforts during major incidents, coordinating cross‑functional teams and stakeholders to mitigate impact and restore service.
Major Incident Response :
Develop and maintain a major incident response plan outlining roles, responsibilities, and procedures for responding to and resolving major incidents.
Conduct regular tabletop exercises and simulations to test the effectiveness of the major incident response plan and identify areas for improvement.
Lead post‑incident reviews and root cause analyses to identify systemic issues and implement corrective actions to prevent recurrence.
Monitoring and Observability :
Implement comprehensive monitoring and observability solutions to gain insights into the health, performance, and availability of cloud infrastructure and services.
Utilize monitoring tools and platforms (New Relic) to collect, analyze, and visualize metrics, logs, and traces.
Establish and maintain robust monitoring and alerting mechanisms to ensure the timely detection and resolution of issues in our hosting environment.
Performance Optimization :
Monitor and analyze cloud infrastructure performance metrics to identify bottlenecks and areas for optimization.
Implement performance tuning strategies to improve the efficiency and responsiveness of cloud‑based applications and services.
Work closely with development teams to optimize application performance and resource utilization in the cloud environment.
Trend Analysis and Reporting :
Conduct trend analysis on system performance, incidents, and operational metrics to identify patterns, anomalies, and areas for improvement.
Develop and maintain reports and dashboards to communicate key performance indicators (KPIs) and metrics related to processes and systems.
Collaborate with stakeholders to derive insights from data and support data‑driven decisions that optimize processes and enhance system reliability.
Documentation :
Maintain comprehensive documentation of infrastructure configurations and procedures.
Provide training and knowledge‑sharing sessions for team members to ensure proficiency in AWS technologies and best practices.
Team Management and Development :
Lead and mentor a team of cloud infrastructure engineers, providing guidance, support, and opportunities for professional growth.
Foster a culture of collaboration, innovation, and continuous learning within the team.
Develop and execute training programs to enhance the technical skills and expertise of team members.
We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form.
Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants.
Please read our Candidate Privacy Policy.
We are an equal opportunity employer: qualified applicants are considered for employment, and treated during employment, without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.
USA Job Seekers :
EEO Know Your Rights.
Lead System Engineer • Manila, Metro Manila, Philippines