Role Summary

The Senior Data Engineer will build the data foundation for a large-scale AI platform. The role covers ingesting large datasets, building reliable ETL/ELT pipelines, designing feature-ready tables, managing data quality, and supporting both model training and product analytics. The right candidate understands that good AI systems depend on disciplined data engineering.

The role requires hands-on experience with distributed data processing, batch pipelines, schema evolution, performance tuning and production-grade data operations. The right person can work with messy source data and turn it into trusted datasets that engineers, data scientists and product teams can use.

What You Will Do

Design and build scalable data ingestion, transformation and validation pipelines for structured, semi-structured and time-series data.
Work with large datasets using Databricks, Spark, lakehouse patterns, object storage and other distributed processing frameworks.
Create curated tables, feature datasets, training labels, data quality checks and metadata needed for model development and product use cases.
Handle schema changes, late-arriving data, missing values, inconsistent identifiers, timestamp alignment and unit normalization.
Optimize data pipelines for cost, speed, reliability and repeatable backfills.
Collaborate with data scientists and ML engineers to ensure model training data is consistent, auditable and versioned.
Build monitoring and alerting for pipeline health, data freshness, data quality and downstream impact.
Document datasets, transformations, assumptions and lineage clearly.
Participate in architecture decisions around data warehouses, lakehouses, feature stores, orchestration and storage formats.
Support production incidents related to data quality, pipeline failures or performance bottlenecks.

Requirements and Skills

6+ years of data engineering experience with large-scale production datasets.
Strong SQL and Python experience for data transformation, validation and automation.
Hands-on experience with Apache Spark, Databricks or equivalent distributed processing environments.
Experience with ETL/ELT design, batch processing, incremental loads, partitioning and object storage, including file and table formats such as Parquet and Delta Lake.
Strong understanding of data modeling, schema evolution, data quality frameworks, lineage and reproducibility.
Experience with orchestration tools such as Airflow, Dagster, Databricks Workflows or similar systems.
Ability to tune data jobs for performance and cost, including memory, shuffle, partitioning and parallelism considerations.
Experience working with messy operational datasets and turning them into reliable analytical or model-ready tables.
Good software engineering practices, including Git, code review, tests, modular code and clear documentation.

Preferred Background

Experience building feature stores, model training datasets or ML data pipelines.
Experience with ClickHouse, PostgreSQL, Snowflake, BigQuery, Redshift, Kafka, Redis or similar data systems.
Experience with data contracts and with validation or transformation tools such as Great Expectations, Deequ, dbt or comparable alternatives.
Comfortable working with data scientists, backend engineers and product managers to translate business needs into data structures.

Startup Environment

This is a startup environment for people who want meaningful responsibility rather than narrowly defined corporate roles. Team members should expect exposure to multiple parts of the business, including product design, engineering decisions, customer problem solving, implementation planning and operational execution. The team will be small and highly technical, organized around talented builders who work directly with one another without unnecessary layers of hierarchy. We expect people to use modern development tools aggressively, including coding assistants, automation, testing tools and model tooling, backed by sufficient token budgets, wherever they improve speed and quality. The working style is flexible, but the expectations are high: clear ownership, written thinking, disciplined execution, frequent communication, clean handoffs and the ability to make progress without waiting for a complete corporate structure.

What Success Looks Like

Data pipelines are reliable, observable and easy to re-run or backfill.
Modeling and product teams trust the curated datasets and understand their definitions.
Large datasets are processed efficiently without uncontrolled cloud cost or fragile manual steps.

Job Category: Data Engineer

Job Location: Chennai

Senior Data Engineer – ETL, Databricks and Data Platforms
