About the position
Key Responsibilities:
Qualifications and Experience Requirements:
Desired Skills:
- Ability to lead architectural discussions
- Ability to influence design decisions
Desired Work Experience:
- 5 to 10 years
Desired Qualification Level:
- Degree
About The Employer:
Knowledge:
- In-depth understanding of systems engineering principles, including performance optimisation, fault tolerance, and resource scheduling in Linux-based environments.
- Strong knowledge of containerised environments (Docker, Podman), orchestration platforms (Kubernetes, Helm), and runtime architectures (containerd, CRI).
- Expertise in infrastructure-as-code, continuous integration/deployment (CI/CD), and configuration management tools (e.g., GitLab CI, Ansible, Terraform, ArgoCD).
- Advanced understanding of distributed computing and storage architectures, including Ceph, S3, NFS, and local/clustered file systems.
- Operational and architectural fluency in relational and NoSQL database systems (e.g., PostgreSQL, MySQL, MongoDB), including replication, backups, and performance tuning.
- Working knowledge of networking fundamentals, security protocols, and systems-level observability (e.g., Prometheus, Grafana, ELK/EFK stack).
- Familiarity with the HPC ecosystem (e.g., SLURM and other job schedulers) is beneficial for environments supporting scientific or research computing.
Competencies (Essential):
- Demonstrated technical leadership (3+ years): Proven ability to lead cross-functional initiatives across systems, storage, and database infrastructure, driving technical decisions from architecture through to implementation.
- Systems engineering expertise: Strong background in Linux administration, infrastructure automation, service orchestration, and performance optimisation across diverse environments.
- Distributed systems architecture: Extensive experience in designing and deploying scalable, resilient services using microservices, event-driven, and cloud-native design patterns.
- Containerisation and orchestration: Proficient in operating production-grade environments with Kubernetes, Docker, and Helm for both system and application deployments.
- Infrastructure automation and CI/CD: Hands-on experience with tools such as GitLab CI, ArgoCD, FluxCD, Jenkins, or GitHub Actions to enhance and secure platform operations.
- DevOps and SRE practices: Solid understanding of infrastructure-as-code, configuration management, and release automation (DevOps), alongside incident response, monitoring, SLIs/SLOs, and site reliability engineering (SRE).
- Advanced Linux expertise: Skilled in troubleshooting, kernel tuning, systemd orchestration, and large-scale system optimisation.
- Technical delivery and planning: Experience in backlog management, cross-team collaboration, and Agile sprint execution.
- Database administration: Practical experience managing both relational and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB), including high availability, backups, replication, and performance tuning.
- Strong diagnostic and problem-solving skills: Ability to adopt a root-cause-first approach, with a strong sense of ownership, accountability, and focus on long-term operational stability.
Skills:
- Technical leadership: Ability to lead architectural discussions, influence design decisions, and mentor junior engineers across infrastructure streams.
- Resource management and leadership: Demonstrates leadership that fosters innovation and supports the development of emerging skills. Builds trust through consistency, integrity, understanding, and patience, while effectively planning, allocating, and monitoring resources to achieve desired outcomes.
- Problem-solving and analytical skills: Strong capability in root cause analysis, systems troubleshooting, and resolving performance bottlenecks.
- Communication and collaboration: Ability to clearly articulate technical recommendations, engage with cross-functional stakeholders, and effectively incorporate feedback.
- Planning and delivery: Proficient in backlog grooming, sprint planning, and delivering technical solutions within Agile and DevOps environments.
- Continuous learning: Committed to staying up to date with evolving technologies, particularly in containerisation, cloud-native systems, observability, and systems automation.
- Documentation and knowledge sharing: Skilled in producing high-quality technical documentation and effectively sharing knowledge across engineering teams.