Oracle Cloud Infrastructure (OCI) is seeking a Senior Site Reliability Engineer (SRE) to join the Messaging organization, which powers OCI Streaming—a mission-critical, large-scale, tier-0 customer-facing service.
As a Senior SRE, you will build tooling, maintain infrastructure, apply patches, and monitor services to minimize downtime.
You will play a key role in ensuring the performance, reliability, and scalability of our cloud services, contributing directly to minimizing downtime and improving service resilience for our global customer base.
This is a high-impact role on a globally distributed team responsible for detecting, triaging, and mitigating OCI-wide service-impacting events.
You will partner with engineering and operations teams to build automation tools, enhance observability, and develop mitigation strategies to respond to and prevent outages.
Career Level – Site Reliability Developer 3 (IC3)
Work with the Site Reliability Engineering (SRE) team on shared full-stack ownership of a collection of services and/or technology areas.
Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services.
Design and deliver the mission-critical stack, with a focus on security, resiliency, scale, and performance.
Serve as the authority for end-to-end performance and operability.
Partner with development teams in defining and implementing improvements in service architecture.
Articulate the technical characteristics of services and technology areas, and guide development teams to engineer and add premier capabilities to the Oracle Cloud service portfolio.
Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack.
Demonstrate clear understanding of automation and orchestration principles.
Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs).
Use a deep understanding of service topology and dependencies to troubleshoot issues and define mitigations.
Understand and explain the effect of product architecture decisions on distributed systems.
Professional curiosity and a desire to develop a deep understanding of services and technologies.
Career Level - IC3