About the job
About the Role
Grade Level (for internal use): 11
Lead AI Engineer (Agentic Systems)
Role Summary
As the Lead AI Engineer (Agentic Systems), you will help architect and build the organization’s next generation of autonomous AI workflows. This is a multidisciplinary technical role operating at the intersection of Software Engineering, Data Engineering, and Machine Learning Engineering. You will move beyond simple "chatbots" to design production-grade Agentic Systems: intelligent applications capable of reasoning, planning, and executing complex tasks autonomously.
Responsibilities
Agentic Systems Architecture & Core Engineering
• Architect & Build Multi-Agent Workflows: Lead the hands-on design and coding of stateful, production-grade agentic systems using Python and orchestration frameworks like LangGraph, CrewAI, or AutoGen.
• Agent-to-Agent (A2A) Communication: Design and implement robust A2A protocols enabling autonomous agents to collaborate, hand off sub-tasks, and negotiate execution paths dynamically within multi-agent environments.
• State Management & Orchestration: Engineer robust control flows for non-deterministic agents; implement complex message passing, memory persistence, and interruptible state handling to support long-running autonomous tasks.
• Tool Interface Design (MCP): Implement and standardize the Model Context Protocol (MCP) to create universal interfaces between agents, data sources, and operational tools, ensuring modularity and scalability.
• Model Integration & Optimization: Utilize proxy services (e.g., LiteLLM) to manage model routing and fallback strategies; optimize context windows and inference costs across proprietary and open-source models.
• Production Deployment: Containerize agentic workloads using Docker and orchestrate deployments on Kubernetes; leverage AWS AgentCore or similar cloud-native services for scalable infrastructure.
Data Engineering & Operational Real-Time Integration
• Build Agent Data Pipelines: Write and maintain high-throughput ingestion pipelines (using Databricks or Python-based ETL) that transform raw operational signals into structured context for agents.
• Real-Time Context Injection: Ensure agents have access to "operational real-time" data (seconds/minutes latency) by optimizing retrieval architectures and vector store performance.
Cross-Functional Engineering
• Act as the technical bridge between Data Engineering and AI teams; translate complex agent requirements into concrete data schemas and pipeline specifications, while stepping in hands-on to resolve bottlenecks in data availability.
Observability, Governance & Human-in-the-Loop
• LLMOps & Tracing: Implement comprehensive observability using tools like Langfuse to trace agent reasoning steps, monitor token usage, and debug latency issues in production.
• Safety & Control Frameworks: Design hybrid execution modes ranging from Human-in-the-Loop (HITL) for sensitive operations to fully autonomous execution; build "break-glass" mechanisms and guardrails for automated decision-making.
• Evaluation & Reliability: Establish technical standards for testing non-deterministic outputs; automate evaluation pipelines to measure agent accuracy, hallucination rates, and drift before deployment.
Technical Leadership & Strategy
• Technical Roadmap Definition: Partner with Product and Engineering leadership to scope feasibility for autonomous projects; define the "Agentic Architecture" roadmap.
• Mentorship & Standards: Define code quality standards, architectural patterns, and PR review processes for the AI engineering team; upskill team members on the latest agentic frameworks and methodologies.
• Innovation: Proactively prototype with emerging tools (e.g., new reasoning models, graph-based RAG) to solve high-value business problems, moving successful experiments into the production roadmap.
Qualifications
Required
• Experience: 7+ years of total technical experience in Software Engineering, Data Engineering, or Machine Learning.
• GenAI Specialization: 2+ years of specific experience building and deploying LLM-based applications or Agentic Systems in production.
• Database & Lakehouse Mastery: Experience architecting storage layers for AI, including Vector Databases (e.g., Pinecone, Weaviate, Qdrant), NoSQL/Relational Databases (PostgreSQL, DynamoDB), and modern Data Lakehouses (specifically Databricks or Snowflake).
• Cloud & Infrastructure: Expertise in cloud architecture (AWS, GCP, or Azure) and container orchestration using Kubernetes and Docker. You must be comfortable deploying and scaling your own applications.
• LLM Ecosystem: Familiarity with common LLM frameworks and orchestration libraries (e.g., LangGraph, LangChain, CrewAI, AutoGen). You understand the mechanics of RAG, embeddings, and context window management.
• Hybrid Engineering Skillset: A unique blend of Data Science (understanding model behavior, probability, and prompting) and Software Engineering (CI/CD, API design, asynchronous programming, and system reliability).
• Language Proficiency: Advanced proficiency in Python for systems engineering, capable of writing modular, testable, and maintainable production code.
Preferred
• Advanced Education: Master’s degree or PhD in Computer Science, Artificial Intelligence, or a related quantitative field.
• NLP Expertise: 5+ years of hands-on experience in Natural Language Processing (NLP), ranging from foundational techniques (e.g., text processing, embeddings, classification) to modern architectures.
• Graph Technologies: Experience with Knowledge Graphs (e.g., Neo4j, AWS Neptune), Graph Databases, and GraphML (Graph Machine Learning) to support complex reasoning and relationship modeling.
• Agentic Tooling: Specific experience with LangGraph, LiteLLM, Langfuse, AWS AgentCore, or implementing the Model Context Protocol (MCP).
• Advanced Architectures: Proven track record of implementing Agent-to-Agent (A2A) communication, swarm intelligence, or multi-modal agent workflows.
• Real-Time Operations: Experience working in environments requiring operational real-time processing (e.g., FinTech, Energy, Logistics).
Requirements
- Python
- Agentic Systems
- Cloud Architecture
- Data Engineering
Qualifications
- 7+ years of technical experience
- GenAI Specialization
Benefits
- Health & Wellness
- Flexible Downtime
- Continuous Learning
- Invest in Your Future
- Family Friendly Perks
- Beyond the Basics
About the company
S&P Global equips businesses and governments with trusted data, expertise, and technology so they can make decisions with conviction. The company is focused on creating benchmarks, data, and insights to support sustainable value creation.