Data Scientist / Data Engineer

Confidential

New Delhi • Not disclosed

Yesterday

On-Site

About the job

We're hiring a Data Scientist / Data Engineer to help us turn raw data into reliable datasets, insights, and models that drive real decisions. This role blends strong data engineering (pipelines, quality, orchestration) with hands-on data science (analysis, experimentation, forecasting, ML when needed). You'll work closely with product and engineering teams to build data products that are accurate, scalable, and actionable.

What you'll do

• Design and build end-to-end data pipelines (batch and, if applicable, streaming).
• Collect, clean, transform, and model data into well-structured datasets for analytics and ML.
• Develop and maintain a data warehouse / lake model (dimensional modeling, data marts, curated layers).
• Implement data quality checks, observability, lineage, and monitoring.
• Perform exploratory analysis and deliver insights via dashboards, notebooks, and stakeholder-ready summaries.
• Build and deploy ML models when needed (forecasting, churn / segmentation, anomaly detection, recommendations).
• Support experiments and A/B testing (metric definitions, evaluation, statistical validity).
• Collaborate with backend teams to define event schemas, tracking plans, and data contracts.
• Optimize performance and cost across storage, compute, and queries.

Must-have skills

• Strong SQL and solid programming skills (Python preferred).
• Experience building pipelines using tools like Airflow / Dagster / Prefect (or equivalent).
• Strong knowledge of data modeling (star schema, slowly changing dimensions, event modeling).
• Experience with at least one of: PostgreSQL / MySQL / BigQuery / Snowflake / Redshift.
• Proven ability to validate data correctness and implement data quality frameworks.
• Comfortable communicating insights and technical trade-offs to non-technical stakeholders.

Nice-to-have skills

• Streaming: Kafka / Kinesis / PubSub, real-time processing (Spark Streaming / Flink).
• Big data: Spark, distributed compute, partitioning strategies.
• Lakehouse: Iceberg / Delta / Hudi, object storage (S3 / GCS / Azure Blob).
• MLOps: MLflow, model monitoring, feature stores, deployment pipelines.
• BI: Superset / Power BI / Looker / Metabase, semantic layers.
• Cloud: AWS / Azure / GCP (IAM, networking basics, managed data services).
• Experience with privacy / security compliance (PII handling, retention policies, access controls).

What we value

• Ownership: you build reliable systems, not just one-off scripts.
• Curiosity: you ask the 'why' behind metrics and propose better approaches.
• Practicality: you can balance speed vs correctness and deliver iteratively.
• Collaboration: you work well with engineers, product, and leadership.