
Incident Response Engineer, Facility Operations Center
The role coordinates incident response, maintenance, vendor support, disaster recovery tests, and reporting for NVIDIA's datacenter portfolio. Responsibilities include developing reliability programs (problem and change control, health scoring, and RAM/RCM studies), driving automation and process improvements, conducting root cause analyses, and working with ML/AI teams on failure prediction. Candidates should have a bachelor's degree or equivalent, 5+ years of experience in datacenter operations or EHS, strong statistical and reporting skills, and proficiency with DCIM and asset databases and with office productivity tools.
