Talent Space, Inc. is seeking a DevOps Engineer (Observability Engineer) for a contract-to-hire opportunity in Thousand Oaks, CA!
In this role, you will lead the design, implementation, and maintenance of our observability platform, with a strong emphasis on New Relic.
Your expertise in Infrastructure as Code (IaC) will be key to automating and managing our monitoring and alerting systems, ensuring high reliability, performance, and visibility across our infrastructure.
You’ll collaborate closely with Core Services, DevOps, Development, and Operations teams to promote a culture of proactive monitoring and data-driven decision-making.
Responsibilities:
- Design and Implement Observability Solutions: Architect, build, and scale end-to-end monitoring capabilities on the New Relic platform, including APM, Infrastructure, Logs, Synthetics, custom instrumentation, and NRQL queries.
- Automate with Infrastructure as Code (IaC): Use tools like Terraform / OpenTofu to define, manage, and maintain observability components such as alerts, dashboards, and synthetic checks, ensuring consistency and scalability.
- Create Dashboards and Alerts: Develop clear, actionable dashboards and alerting policies in New Relic to deliver real-time insights into system and application health.
- Champion Best Practices: Serve as an observability subject matter expert, advising teams on logging, metrics, and tracing standards to enhance system reliability and reduce Mean Time to Resolution (MTTR).
- Troubleshoot and Optimize Performance: Leverage telemetry data to identify issues, analyze performance trends, and drive continuous optimization across the infrastructure and application stack.
Experience:
- 7+ years in a Cloud Engineering role, such as Observability, DevOps, or Site Reliability Engineering (SRE).
- New Relic Expertise: 3+ years of hands-on experience with the New Relic platform, including deep proficiency in Dashboards, NRQL, APM, and building effective alerting strategies.
- Infrastructure as Code (IaC): 3+ years of experience managing infrastructure and observability configurations using tools like Terraform or OpenTofu (preferred), as well as AWS CDK, CloudFormation, Chef, or Ansible.
- Cloud Platforms: Extensive hands-on experience with a leading cloud provider: AWS (preferred), GCP, or Azure.
- Scripting & Automation: Proficiency in scripting languages such as Python, Go, or Bash for automation and tooling development.
- Systems & Architecture: Strong understanding of cloud infrastructure, networking, Linux/Windows server administration, containerization (Docker, Kubernetes), CI/CD pipelines, and microservices within a SaaS environment.
- Security & Compliance: Solid grasp of security best practices, particularly within cloud infrastructure and CI/CD workflows. Experience working in complex or highly regulated environments is a plus.
- Analytical & Troubleshooting Skills: Proven ability to analyze complex systems, identify root causes, and resolve issues efficiently across the tech stack.
- Cross-Functional Collaboration: Strong communication and interpersonal skills with a track record of working effectively with diverse teams across engineering, operations, and business functions.