About the position
Design, build and maintain scalable, secure and performant data pipelines and ETL processes to support analytics and ML use cases.
Develop data models, data warehouses and data lakes that support reporting, analytics and machine learning needs.
Implement and operate data processing using Spark / PySpark and other big data technologies (an illustrative sketch follows this list).
Build, train and validate machine learning and deep learning models and support their productionisation within MLOps frameworks.
Work with AWS services (S3, Glue, Lambda, RDS, VPC, IAM, etc.) to implement robust cloud-native data solutions.
Orchestrate data workflows, schedule jobs and manage dependencies to ensure timely, reliable data delivery.
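The responsibilities above centre on Spark-based pipelines over AWS storage. Purely as an illustration of the kind of work described (bucket names, paths and column names below are hypothetical placeholders, not part of this role's actual stack), a minimal PySpark ETL job might look like this:

```python
# Minimal PySpark ETL sketch: read raw events from S3, clean them,
# and write a partitioned Parquet table back to S3.
# All bucket names, paths and columns are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-events-etl").getOrCreate()

# Extract: raw JSON events landed by an upstream process.
raw = spark.read.json("s3://example-raw-bucket/events/2024-01-01/")

# Transform: drop malformed rows, normalise types, derive a date partition.
clean = (
    raw.dropna(subset=["event_id", "event_ts"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
       .dropDuplicates(["event_id"])
)

# Load: partitioned Parquet, ready for Glue/Athena or downstream ML feature work.
(clean.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3://example-curated-bucket/events/"))

spark.stop()
```

In practice a job of this shape would typically run on AWS Glue or EMR and write into a catalogued, partitioned layout consumed by reporting and ML workloads.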
Minimum Requirements:
Qualifications/Experience:
Bachelor’s degree in Data Science, Computer Science, Software Engineering, or equivalent relevant hands-on experience.
Minimum 4 years’ hands-on experience in data science and/or data engineering roles, including production deployments.
Demonstrated experience working with AWS data services and building scalable data platforms and production ML solutions.
Essential Skills Requirements:
Strong knowledge of Data Science Fundamentals including statistics, machine learning, and deep learning.
Proficiency in Python (Python 3.x) and PySpark for building data and ML pipelines.
Experience designing and implementing ETL processes and scalable data pipelines.
Solid experience with AWS data services (S3, Glue, Lambda, RDS), networking components (VPCs, subnets, security groups) and IAM.
Hands-on experience with databases (SQL and NoSQL) and data modelling for analytical systems.
Familiarity with big data frameworks such as Apache Spark (and awareness of Hadoop ecosystems).
Experience with data pipeline orchestration and workflow tools (scheduling, dependency management); see the orchestration sketch after this list.
Practical skills in performance tuning for data storage, query performance and processing optimisation.
Strong analytical skills for working with large and complex datasets and data validation.
Excellent collaboration and communication skills to translate technical concepts to non-technical stakeholders.
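To illustrate the orchestration point above: the sketch below uses Apache Airflow purely as an example of a workflow tool (the role does not prescribe a specific orchestrator), with stub tasks standing in for real extract, transform and load logic, an explicit dependency chain and a daily schedule:

```python
# Illustrative Airflow (2.4+) DAG: scheduled, dependency-managed pipeline steps.
# Task bodies are stubs; in a real pipeline they would call out to Spark jobs,
# Glue jobs or warehouse loads.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull source data")

def transform():
    print("clean and model data")

def load():
    print("publish to warehouse / lake")

with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # run once per day
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3  # explicit dependency chain: extract -> transform -> load
```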
Advantageous Skills Requirements:
Experience with streaming technologies and real-time data processing (e.g., Kafka, Kinesis).
Familiarity with containerisation and orchestration (Docker, Kubernetes) and cloud deployment patterns.
Experience with BI tools and data preparation for visualisation platforms (e.g., Tableau).
Knowledge of MLOps practices: model versioning, CI/CD for models, monitoring and model lifecycle management.
Familiarity with infrastructure-as-code and DevOps tooling (Terraform, CloudFormation, GitOps).
Experience with advanced data governance, security practices and compliance in cloud environments.
Experience using AI productivity and assistive tools, combined with a strong ability to validate and optimise their outputs.
Prior exposure to Extreme Programming (XP) practices within Agile teams (pair programming, test-first).
Experience with scripting (Bash / Shell, PowerShell) for automation and operational tasks.
Experience in technical data modelling and schema design (not drag-and-drop approaches).
Coaching and training colleagues and users when required.
Problem-solving capabilities.
Strong presentation skills.
Desired Skills:
- Data Science Fundamentals
- Python (Python 3.x)
- PySpark