Job description
Role name: Site Reliability Engineer
Location: Atlanta, GA (On Site)
Contract Role
Role and responsibilities:
• + years of experience in Site/System Reliability, DevOps, or related roles.
• Strong skills in Linux/Unix administration and shell scripting.
• Proficiency with cloud platforms (AWS, Azure, GCP), containers (Docker), and container orchestration (Kubernetes).
• Knowledge of networking fundamentals (TCP/IP, DNS, load balancing).
• Proficiency in Linux/Unix administration and scripting (Python, Bash, or similar).
• Experience with monitoring tools (Prometheus, Grafana, Datadog).
• Familiarity with containerization (Docker, Kubernetes) and cloud services.
• Experience with CI/CD systems (Jenkins, GitHub Actions, GitLab CI).
• Strong analytical and problem-solving skills.
• Knowledge of security practices (IAM, encryption, secrets management).
• Experience with incident management frameworks and SRE principles.
• Knowledge of performance tuning and capacity planning.
• Exposure to observability tools and log aggregation systems.
• Understanding of networking and security fundamentals.
• Design, implement, and maintain monitoring, logging, and alerting systems.
• Define and track Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs).
• Conduct post-incident reviews and implement preventive measures.
• Automate deployment, scaling, and operational tasks using Infrastructure-as-Code tools (Terraform, Ansible, CloudFormation).
• Implement CI/CD pipelines and release management processes.
• Optimize infrastructure for reliability, performance, and cost efficiency.
• Respond to production incidents, perform root cause analysis, and implement solutions.
• Collaborate with development teams to ensure system robustness.
• Maintain runbooks and operational documentation.
• Partner with software developers, QA, DevOps, and product teams to improve system reliability.
• Promote best practices in coding, testing, and deployment.
• Advocate for proactive measures to prevent outages and reduce operational toil.
• Ensure systems adhere to security, compliance, and governance standards.
• Participate in vulnerability assessments and remediation planning.
Required Skill Profession
Computer Occupations