
Senior Software Engineer – TensorRT Edge-LLM
Develop and optimize a modern C++ inference framework that extends TensorRT for autoregressive LLM serving (speculative decoding, LoRA, MoE, KV cache). Work includes CUDA kernel and operator development, compiler and runtime optimizations, performance benchmarking, and collaboration with CUDA, compiler, and robotics teams. Requires 4+ years of experience and deep familiarity with transformer inference techniques and GPU/CUDA development.








