Title: Site Reliability Engineer With Machine Learning
Location: Austin, TX (Onsite)
Type: Full Time
Solid SRE experience with Client Ops and Client Flows, plus strong scripting skills, is required.
Job Description:
The ideal candidate has experience with Kubernetes, machine learning workflows (preferably Amazon SageMaker), Python scripting, and Rubix. The candidate should also have experience supporting Jupyter Notebooks in an SRE capacity.
The successful candidate will have several years of experience supporting large enterprise systems that integrate with at least 10 different upstream and downstream systems, including identifying issues from Splunk logs.
Technically strong in AWS, Kubernetes, Python, and basic SQL. Client Ops knowledge (e.g., MLflow) is a plus.
Answer and resolve support issues for DatalaLab.
Implement and maintain Infrastructure as Code and build pipelines.
Take measures to minimize on-call incidents.
Conduct post-incident reviews.
Document issue resolutions and previously undocumented knowledge.
Work with dev teams to ensure that new features meet reliability and performance goals.
Ability to work with geographically distributed teams in India and SCV.
Excellent problem-solving and decision-making skills, including judging when to engage other team members.
Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence, and proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.