About the Team
OpenAI’s Capacity Planning team ensures that our research and product teams have the compute, storage, and networking resources they need—when they need them.
We work across engineering, product, and research to forecast demand, track supply, and optimize utilization of compute.
Our goal is to develop data-driven, automated, and scalable planning systems that unlock the next generation of frontier AI models.
We are looking for a Capacity Tooling Engineer to design, build, and maintain the internal platforms, services, and dashboards that power OpenAI’s capacity planning and allocation processes.
You will create the tooling that helps us forecast usage, model scenarios, and make multi-billion-dollar infrastructure decisions.
Your work will directly impact how we allocate compute across research, product launches, and strategic initiatives.
This role is based in San Francisco, CA.
We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.
In this role you will:
Build and scale tooling for capacity planning that incorporates data pipelines, forecasting dashboards, allocation solvers, and scenario modeling tools.
Integrate data sources from infrastructure teams, data science, and multiple cloud providers to create a single source of truth for compute supply, demand, and costs.
Develop real-time reporting and alerting to surface supply gaps, utilization trends, and risks to leadership.
Design and implement automations to streamline workflows such as demand collection and supply allocation.
Design and implement optimization engines and solvers that recommend optimal allocation of compute.
Build interactive models that allow leadership to test “what-if” scenarios (e.g., varying levels of user growth, price changes, or new product launches).
You might thrive in this role if:
You are excited about building infrastructure at an incredible scale
You have depth and expertise in one or more of the following areas: GPU | CPU | Storage | Networking
You like to move fast, make decisions, and be held accountable
You can wear multiple hats and juggle technical, business and engineering considerations to make decisions
You have experience in AI/ML and/or cloud infrastructure
You are comfortable making complex decisions with significant engineering, commercial, product and research implications, often with many billions of dollars involved
You want to work on a lean team, are a self-starter, and can thrive in ambiguity