We are seeking a Web Application Technical Lead with strong expertise in managing and supporting large-scale, customer-facing applications.
This role requires hands-on incident handling experience and deep knowledge of observability tools to ensure application stability, scalability, and performance.
· Lead technical operations and incident handling for high-traffic web applications.
· Troubleshoot and resolve critical production issues to minimize downtime.
· Conduct root cause analysis and drive preventive measures.
· Monitor system performance and reliability using Splunk, Prometheus, Grafana, and Datadog.
· Collaborate with engineering and product teams to implement enhancements and scalability improvements.
· Mentor junior engineers and establish best practices for observability and incident response.
· 10+ years of technical leadership experience in web applications, operations, or site reliability.
· Strong incident management and troubleshooting expertise.
· Hands-on experience with Splunk, Prometheus, Grafana, and Datadog.
· Proven ability to manage large-scale, customer-facing applications.
· Strong communication and cross-functional collaboration skills.
· Ability to thrive in a hybrid work environment.