
Senior Deep Learning Software Engineer, Inference
NVIDIA is hiring a Senior Deep Learning Software Engineer (Inference) to design, build, and optimize GPU-accelerated software and inference pipelines for LLMs and generative/multimodal models. The role involves contributing to and optimizing frameworks (vLLM, SGLang, FlashInfer), working with tools such as CUDA, CUTLASS, Triton, and NCCL, and collaborating across teams. Requirements include a Master's or PhD (or equivalent experience) and 5+ years of relevant software development experience with strong C/C++ skills; Python and production inference experience are a plus.





