Job Description: HPC Infrastructure/Network Engineer
Location: Ashburn, VA - Onsite
Duration: 6 to 8 months
Role Overview:
This HPC (High-Performance Computing) role focuses on planning, implementing, and managing InfiniBand network configurations for high-performance computing in data centers.
The role emphasizes network troubleshooting, including physical network troubleshooting (e.g., NIC testing, Ixia-enabled testing), with a skill distribution of 60% network, 30% Linux + CI/CD, and 10% HPC.
Responsibilities include configuring switches, routers, and adapters, implementing security protocols, monitoring performance, troubleshooting, collaborating with vendors, and developing automation scripts.
Key Responsibilities:
 - Configure and manage InfiniBand networks, including switches, routers, and adapters, and tune performance (e.g., MTU, buffer sizes, and PFC/DCB for congestion management on RoCE/Ethernet fabrics).
 - Conduct physical network troubleshooting (e.g., NIC testing, Ixia-enabled testing for performance validation).
 - Develop automation scripts (Python, Shell) for network tasks, using libraries such as Netmiko, NAPALM, and Jinja; Ansible experience is a plus.
 - Monitor performance using tools like EPM/IPM; implement security protocols (MACsec, IPsec, access controls).
 - Collaborate with vendors for compatibility, POCs, and BOMs; support lab/pre-field testing.
 - Document configurations and processes via MOP/SOP.
Qualifications:
 - Bachelor's degree in Computer Science, IT, or a related field.
 - 5+ years of InfiniBand experience in enterprise/lab environments.
 - Expertise in InfiniBand architecture and protocols; RoCE experience is a plus.
 - Proficiency in Python and Shell scripting (junior developer level, 1–2 years) for network automation; Git experience preferred.
 - Strong network security (MACsec/IPsec), troubleshooting, and performance tuning skills.
 - Familiarity with RDMA applications and parallel computing frameworks (e.g., MPI, OpenMP).
 - Certifications (e.g., IBTA, CCNP) a plus; Linux/UNIX proficiency and CI/CD mindset required.
Skill Distribution (60/30/10):
 - 60% Network: Emphasis on InfiniBand troubleshooting, NIC testing, Ixia-enabled testing, and performance tuning (e.g., PFC/DCB, MTU).
 - 30% Linux + CI/CD: Linux/UNIX administration, Python/Shell scripting for automation, CI/CD familiarity (Git/Jenkins).
 - 10% HPC: Basic HPC cluster knowledge, RDMA applications, parallel computing (MPI/OpenMP).