Data Engineering JD

Full Time
Madurai
Posted 2 months ago

Website AuroiTech Digital Solutions

About the Role

We are seeking a skilled Data Engineer to design, build, and optimize scalable data pipelines and platforms supporting analytics, machine learning, and business intelligence initiatives. The ideal candidate will have strong experience across modern data engineering technologies, with expertise in data processing (batch and streaming), data governance, and healthcare data standards (HL7/FHIR).

Key Responsibilities

Design, develop, and maintain end-to-end data pipelines for batch and real-time processing using Apache Spark, Flink, and Kafka.
Integrate, clean, and transform data from multiple sources (structured, semi-structured, and unstructured) into a centralized data warehouse or data lake.
Implement data quality frameworks, metadata management, and data governance practices, ensuring consistency, accuracy, and compliance.
Develop and orchestrate ETL/ELT workflows using Apache Airflow and dbt.
Optimize query performance and data storage using Trino (Presto), Druid, and Hadoop ecosystem tools.
Collaborate with cross-functional teams (data scientists, analysts, and architects) to enable scalable data-driven solutions.
Work with HL7 and FHIR standards for healthcare data integration and interoperability.
Implement CI/CD practices for data pipelines using DevOps tools and shell scripting in Linux environments.
Deploy and manage data infrastructure in AWS, GCP, or Azure.
Monitor and troubleshoot data workflows, ensuring reliability and cost efficiency.

Required Skills and Experience

4-6 years of core experience in data engineering.
Programming: Strong in Python and SQL.
Data Engineering Frameworks: Apache Spark (batch + streaming), Flink, Kafka, Airflow, dbt, Hadoop, and Trino.
Databases: Experience with relational databases; NoSQL experience is optional.
Data Management: Expertise in data quality, data governance, and metadata management.
Healthcare Data Standards: Experience with HL7 and FHIR formats, data models, and APIs.
DevOps: Familiarity with Linux, shell scripting, and CI/CD.
Cloud Platforms: Experience with at least one of AWS (Glue, EMR, Redshift), GCP (Dataflow, BigQuery), or Azure (Synapse, Data Factory).
Monitoring and Observability: Experience with Prometheus, Grafana, and Loki for performance tracking and alerting of data systems.
Strong understanding of data warehousing concepts, ETL design patterns, and distributed data systems.

Nice to Have

Experience with containerization (Docker, Kubernetes).
Knowledge of data observability and data quality tools (Monte Carlo, Great Expectations, etc.).
Familiarity with security and compliance in healthcare (HIPAA, PHI).
Exposure to machine learning pipeline integration.

To apply for this job, email your details to renold@auroitech.org