
Principal Data Scientist - Agent Builder
Lead evaluation and quality efforts for Elastic's Search Conversational Experiences team, defining metrics and owning offline/online evaluation, LLM-as-judge calibration, and A/B testing. Drive improvements across retrieval, ranking, vector search, RAG, and agent tooling, and partner with engineering to productionize evaluation pipelines and telemetry. The role requires 8+ years of applied DS/ML experience and hands-on experience with Python, PyTorch/Transformers, IR/NLP, and Elasticsearch.