
Manager – Network Observability Platform and Automation: job opening in Dallas at Digital Realty

Manager – Network Observability Platform and Automation



Job description

Your role

A Manager – Network Observability typically leads a team of engineers focused on maintaining and improving the reliability, performance, and availability of an organization's systems and infrastructure.

This role involves a mix of technical leadership, people management, and strategic planning, ensuring systems meet business and user needs.

In this role, you will be responsible for oversight of Digital Realty’s Observability stack.

The ideal candidate can demonstrate a unique blend of network engineering, network operations, and software understanding through the application of engineering principles.

You will focus on delivering operational discipline and embracing key operational principles, including automation, agile development, and scripting.

In this unique role, you will be part of the Observability team and build and maintain a global observability infrastructure.

Ideal candidates for this role will bring an understanding of carrier-class network infrastructure as well as experience working in a fast-paced development environment.

What you’ll do 

  • Team Leadership:

    Manage and mentor a team of SREs, fostering their growth and development.

    Set team goals, prioritize projects, and ensure alignment with organizational objectives.

    Conduct performance reviews and provide constructive feedback.

    Build a positive and collaborative team environment.

  • Technical Oversight:

    Oversee the design, implementation, and maintenance of reliable infrastructure and services.

    Collaborate with other teams to define requirements, standards, and best practices.

    Identify and address performance bottlenecks and ensure system stability.

    Implement and improve monitoring and observability frameworks.

  • Operational Excellence:

    Manage on-call rotations and incident response to minimize downtime and ensure swift resolution.

    Drive automation efforts to reduce manual tasks and improve efficiency.

    Implement structured engineering and operations processes.

    Analyze and evaluate existing processes to identify opportunities for improvement.

  • Strategic Planning:

    Develop and implement the long-term reliability strategy for the organization.

    Make decisions about build vs. buy for tools and technologies.

    Ensure alignment with business goals and customer expectations.

    Manage relationships with vendors and other stakeholders.

  • Communication and Collaboration:

    Act as a bridge between technical teams and other departments.

    Represent the SRE team to stakeholders and communicate effectively.

    Collaborate with other engineering teams to ensure efficient workflows.

    Foster a culture of blameless postmortems and continuous learning.

What you’ll need

Key Skills and Experience:

  • Strong technical background in distributed systems, cloud computing, and related technologies.
  • Proven experience in managing and mentoring technical teams.
  • Excellent problem-solving and communication skills.
  • Experience with monitoring, automation, and incident management.
  • Understanding of , , and .
  • Familiarity with and Agile practices.

Qualifications

  • 10+ years of operations and engineering experience
  • 5+ years of team building and management
  • 3+ years of network engineering in large-scale data center environments
  • Bachelor’s degree in computer science (or equivalent training) preferred
  • Expertise in Layer 3 routing (BGP, IS-IS, etc.) and Layer 2 switching (802.1Q, STP, etc.) protocols
  • Experience with virtual networking concepts such as EVPN, VXLAN, Open vSwitch
  • Experience working with automation tools (Ansible, Terraform, etc.)
  • Comfort with Python (or equivalent language)
  • Strong experience working with Linux systems and tools
  • Experience with virtual routing in Linux with FRR or similar software preferred
  • Experience with AWS preferred
  • A basic understanding of software development tools (GitHub, Jenkins, etc.) and software development practices
  • Ability to understand high-level network design and its impacts across the infrastructure
  • Ability to work independently on complex and unique enterprise engineering projects
  • Strong analytical and troubleshooting skills
  • Strong communication skills

Required Skill Profession

  • Computer Occupations


