About the position
- Design, develop, test, and maintain robust data pipelines and ETL/ELT processes on Databricks (Delta Lake, Spark, SQL, Python/Scala notebooks).
- Architect scalable data models and data vault/dimensional schemas to support reporting, BI, and advanced analytics.
- Implement data quality, lineage, and governance practices; monitor data quality metrics and resolve data issues proactively.
- Collaborate with Data Platform Engineers on cluster configuration, performance tuning, and cost management in cloud environments (Azure Databricks).
- Build and maintain data ingestion from multiple sources (RDBMS, SaaS apps, files, streaming queues) using modern data engineering patterns (CDC, event-driven pipelines, change streams, Lakeflow Declarative Pipelines).
- Ensure data security and compliance (encryption, access controls) in all data pipelines.
- Develop and maintain CI/CD pipelines for data workflows; implement versioning, testing, and automated deployments.
Minimum Requirements:
Education and experience:
- Degree in Computer Science, Software Engineering, or a related field.
- 3+ years of experience in Data Engineering.
Technical knowledge:
- Expertise with Apache Spark (PySpark), Databricks notebooks, Delta Lake, and SQL.
- Strong programming skills in Python for data processing.
- Experience with cloud data platforms (Azure) and their Databricks offerings; familiarity with object storage (ADLS).
- Proficiency in building and maintaining ETL/ELT pipelines, data modelling, and performance optimisation.
- Knowledge of data governance, data quality, and data lineage concepts.
- Experience with CI/CD for data pipelines and orchestration tooling (GitHub Actions, Databricks Asset Bundles, or Databricks Jobs).
- Strong problem-solving skills, attention to detail, and the ability to work in a collaborative, cross-functional team.
Desired Skills:
- Expertise with Apache Spark
- Python
- Cloud Data Platforms