About the job
• Role: AI Data Engineer
• Location: Rai Durg, Hyderabad
• Work mode: Hybrid (3 days per week from office)
• Experience: 5 - 8 years (minimum 5 years as an AI Data Engineer)
• Mandatory Skills: DVC (Data Version Control) and Airflow; Apache Spark, Flink, and Kafka; advanced Python for AI logic and Rust (or C++); vector database mastery, including configuration of HNSW indexes, scalar quantization, and metadata filtering strategies
• Budget: 18 - 32 LPA
• Qualification: Bachelor of Engineering / Bachelor of Technology (B.E./B.Tech.)
• Notice period: Immediate / early joiners (maximum 15-30 days)
• Interview Process: 2 - 3 technical rounds

Important Note
• We are currently prioritizing immediate / early joiners (maximum 15-30 days' notice; candidates with a notice period above 30 days will be automatically rejected).
• All mandatory technical skills must be clearly demonstrated within the project descriptions in your resume, not just listed in the Skills or Roles & Responsibilities sections.

Position Overview
We are seeking a hands-on AI Data Engineer to build the high-performance data infrastructure that powers autonomous AI agents. You won't just be moving data from A to B; you will be architecting dynamic context windows, managing real-time semantic indexes, and building self-cleaning data pipelines that feed our "Super Employee" agents.

Key Responsibilities
• Vector & Graph ETL: Design and maintain pipelines that transform unstructured data (PDFs, emails, logs, chats) into optimized embeddings for vector databases (Pinecone, Weaviate, Milvus).
• Semantic Data Modeling: Engineer data structures optimized for Retrieval-Augmented Generation (RAG), ensuring agents find the "needle in the haystack" in milliseconds.
• Knowledge Graph Construction: Build and scale knowledge graphs (Neo4j) to represent complex relationships in our trading and support data that standard vector search misses.
• Automated Data Labeling & Synthetic Data: Implement pipelines that use LLMs to auto-label datasets or generate synthetic edge cases for agent training and evaluation.
• Stream Processing for Agents: Build real-time data "listeners" (Kafka/Flink) that feed live context to agents, allowing them to react to market or support events as they happen (sketch below).
• Data Reliability & Drift Detection: Build monitoring for embedding drift, identifying when the statistical distribution of your data changes and the agent's "knowledge" becomes stale (sketch below).

Qualifications
• Vector Database Mastery: Expert-level configuration of HNSW indexes, scalar quantization, and metadata filtering strategies within Pinecone, Milvus, or Qdrant (sketch below).
• Advanced Python & Rust: Proficiency in Python for AI logic and in Rust (or C++) for high-performance data processing and custom embedding functions.
• Big Data Ecosystem: Hands-on experience with Apache Spark, Flink, and Kafka in a high-throughput environment (trading/FinTech preferred).
• LLM Data Tooling: Deep experience with Unstructured.io, LlamaIndex, or LangChain for document parsing and chunking-strategy optimization.
• MLOps & DataOps: Mastery of DVC (Data Version Control) and Airflow/Prefect for managing complex, non-linear AI data workflows (sketch below).
• Embedding Models: Understanding of how to fine-tune embedding models (e.g., BGE, Cohere, or OpenAI) to better represent domain-specific (trading) terminology.

Additional Qualifications
• Chunking Strategy Architect: You don't just split text; you implement semantic chunking and parent-child retrieval strategies to maximize LLM context relevance (sketch below).
• Cold/Warm/Hot Storage Strategy: Manage cost and latency by tiering data between vector databases (hot), SQL/NoSQL (warm), and S3/data lakes (cold).
• Privacy & Redaction Pipelines: Build automated PII (Personally Identifiable Information) redaction into the ingestion layer so agents never "see" or "leak" sensitive user data (sketch below).

Why Join?
• Opportunity to lead transformative initiatives, modernizing legacy systems and shaping the future of trading technology.
• Work with cutting-edge technologies in a dynamic, fast-paced environment.
• Competitive compensation, professional growth opportunities, and the chance to work with industry-leading experts.
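The vector-database requirement above calls out HNSW configuration, scalar quantization, and metadata filtering. As one illustration, here is a minimal sketch using the Qdrant Python client (qdrant-client); the collection name, vector size, and payload field are hypothetical, and exact parameter names may vary across client versions.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, HnswConfigDiff,
    ScalarQuantization, ScalarQuantizationConfig, ScalarType,
    Filter, FieldCondition, MatchValue,
)

client = QdrantClient(url="http://localhost:6333")  # assumed local instance

# Create a collection with an explicit HNSW graph and int8 scalar quantization.
client.create_collection(
    collection_name="agent_context",  # hypothetical collection
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(m=32, ef_construct=256),  # denser graph: slower build, better recall
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(type=ScalarType.INT8, always_ram=True)
    ),
)

# Metadata filtering: restrict the ANN search to one document source before ranking.
hits = client.search(
    collection_name="agent_context",
    query_vector=[0.0] * 768,  # placeholder query embedding
    query_filter=Filter(
        must=[FieldCondition(key="source", match=MatchValue(value="support_email"))]
    ),
    limit=5,
)
```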
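The chunking bullet mentions semantic chunking and parent-child retrieval. Below is a library-agnostic sketch of the parent-child idea: small child chunks are embedded and searched, but the larger parent chunk is what the LLM receives. Names and sizes are illustrative; LlamaIndex and LangChain ship ready-made versions of this pattern.

```python
import uuid
from dataclasses import dataclass

@dataclass
class Chunk:
    id: str
    text: str
    parent_id: str | None = None

def split(text: str, size: int) -> list[str]:
    """Naive fixed-size splitter; a real pipeline would split on semantic boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_parent_child_chunks(document: str,
                              parent_size: int = 2000,
                              child_size: int = 300) -> tuple[dict[str, Chunk], list[Chunk]]:
    """Return a parent store (id -> parent chunk) plus the child chunks to embed."""
    parents: dict[str, Chunk] = {}
    children: list[Chunk] = []
    for parent_text in split(document, parent_size):
        pid = str(uuid.uuid4())
        parents[pid] = Chunk(id=pid, text=parent_text)
        for child_text in split(parent_text, child_size):
            children.append(Chunk(id=str(uuid.uuid4()), text=child_text, parent_id=pid))
    return parents, children

def resolve_to_parent(matched_child: Chunk, parents: dict[str, Chunk]) -> str:
    """The vector index matches a child chunk, but the agent gets the parent for full context."""
    return parents[matched_child.parent_id].text
```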
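For the stream-processing responsibility, the sketch below shows a minimal Kafka "listener" loop using confluent-kafka that could push fresh market or support events into an agent's live context. The broker address, topic, and the downstream update_agent_context hook are hypothetical.

```python
import json
from confluent_kafka import Consumer

def update_agent_context(event: dict) -> None:
    """Hypothetical hook: re-embed and upsert the event into the agent's live context index."""
    print("refreshing agent context with", event)

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumed broker
    "group.id": "agent-context-feeder",
    "auto.offset.reset": "latest",          # agents only care about fresh events
})
consumer.subscribe(["market-events"])        # hypothetical topic

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            continue  # real code would log and handle the error
        event = json.loads(msg.value().decode("utf-8"))
        update_agent_context(event)
finally:
    consumer.close()
```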
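Embedding-drift monitoring can start as a comparison between the statistics of recent embeddings and a frozen reference sample. This is a minimal numpy sketch; the threshold and window sizes are illustrative assumptions, and production systems typically add richer tests (population stability index, MMD).

```python
import numpy as np

def centroid_cosine_drift(reference: np.ndarray, current: np.ndarray) -> float:
    """Drift score: cosine distance between the mean embedding of a frozen
    reference window and the mean embedding of the most recent window."""
    ref_centroid = reference.mean(axis=0)
    cur_centroid = current.mean(axis=0)
    cos = np.dot(ref_centroid, cur_centroid) / (
        np.linalg.norm(ref_centroid) * np.linalg.norm(cur_centroid)
    )
    return float(1.0 - cos)

# Example: alert when recent embeddings have moved too far from the baseline.
rng = np.random.default_rng(0)
baseline = rng.normal(size=(1000, 768))           # embeddings captured at index-build time
recent = rng.normal(loc=0.05, size=(1000, 768))   # embeddings from the last ingestion window

DRIFT_THRESHOLD = 0.15                            # illustrative threshold
if centroid_cosine_drift(baseline, recent) > DRIFT_THRESHOLD:
    print("Embedding drift detected: consider re-embedding and rebuilding the index")
```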
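One simple way to enforce the privacy requirement is to redact obvious PII patterns before documents are chunked and embedded. The regex rules below are an illustrative minimum (emails and likely phone numbers); a real ingestion layer would normally layer NER-based detection (e.g., Microsoft Presidio) on top.

```python
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{8,}\d"),
}

def redact_pii(text: str) -> str:
    """Replace matched PII spans with typed placeholders before embedding/ingestion."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}_REDACTED]", text)
    return text

print(redact_pii("Reach me at jane.doe@example.com or +91 98765 43210."))
# -> "Reach me at [EMAIL_REDACTED] or [PHONE_REDACTED]."
```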
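For the MLOps/DataOps requirement, here is a hedged sketch of how an Airflow DAG might orchestrate a DVC-versioned ingestion and embedding refresh. The DAG id, paths, schedule, and the "embed" stage name are assumptions; the BashOperator simply invokes standard dvc CLI commands.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="rag_ingestion_refresh",       # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    pull_versioned_data = BashOperator(
        task_id="dvc_pull",
        bash_command="cd /opt/pipelines/rag && dvc pull",          # fetch the tracked raw corpus
    )
    rebuild_embeddings = BashOperator(
        task_id="dvc_repro_embed",
        bash_command="cd /opt/pipelines/rag && dvc repro embed",    # re-run the (assumed) 'embed' stage if inputs changed
    )
    push_artifacts = BashOperator(
        task_id="dvc_push",
        bash_command="cd /opt/pipelines/rag && dvc push",           # version the refreshed embeddings
    )

    pull_versioned_data >> rebuild_embeddings >> push_artifacts
```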
Requirements
- DVC
- Airflow
- Apache Spark
- Flink
- Kafka
- Python
- Rust
- Vector Database Mastery
Qualifications
- Bachelor of Engineering
- Bachelor of Technology