About the position
Position Summary:
The SKA-Mid Senior Computer Systems Engineer will lead the compute and storage systems team for SKA-Mid and will report to the SKA-Mid Site Reliability Engineering (SRE) Manager within SKA-Mid Computing & Software. The role provides hands-on technical leadership in the design, implementation, and long-term operation and maintenance of secure, reliable, and high-performance computer systems infrastructure for the telescopes hosted by SARAO. While contributing to computing systems enablement, the role also focuses on shaping operational practices, supporting local delivery partnerships, and helping build the team that will manage computing systems operations as the telescopes transition from construction to steady-state operations. It involves guiding infrastructure development, mentoring team members, and ensuring systems align with SRE principles. Responsibilities include deploying and optimising systems, managing faults, contributing to long-term infrastructure planning, and ensuring scalable, maintainable operations. The position plays a key role in cross-team collaboration, driving innovation while supporting sustainable and resilient computing environments.
Key Responsibilities:
Minimum Qualification:
Minimum Experience:
Experience:
Knowledge:
Additional Notes:
Competency – Essential:
- Demonstrated technical leadership (3+ years), leading cross-functional efforts across systems, storage, and database infrastructure, driving technical decisions from architecture through [URL Removed]
- engineering expertise, with a focus on Linux administration, infrastructure automation, service orchestration, and performance optimisation across diverse [URL Removed]
- in distributed systems architecture, including the design and deployment of scalable, resilient services using microservices, event-driven, and cloud-native design [URL Removed]
- and orchestration fluency, including production-grade usage of Kubernetes, Docker, and Helm for system and application-level [URL Removed]
- automation and CI/CD, using tools such as GitLab CI, ArgoCD, FluxCD, Jenkins, or GitHub Actions to streamline and secure platform [URL Removed]
- DevOps and SRE practices, blending infrastructure-as-code, configuration management, and release automation (DevOps) with incident response, monitoring, SLIs/SLOs, and system reliability engineering (SRE).
- Linux expertise, including advanced troubleshooting, kernel tuning, systemd orchestration, and optimisation at [URL Removed]
- delivery and planning capabilities, including backlog scoping, cross-team collaboration, and Agile sprint [URL Removed]
- administration skills, with operational experience in administering relational and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB), including high availability, backups, replication, and performance [URL Removed]
- skills, with a root-cause-first approach, and a strong bias for ownership, accountability, and long-term operational [URL Removed]
- leadership: Ability to lead architectural discussions, influence design decisions, and mentor junior engineers across infrastructure [URL Removed]
- Management/Leadership: Provides leadership that fosters an environment that encourages new ideas and provides support for the development of emerging skills. Creates trust by displaying consistency, understanding, integrity and patience. Plans, seeks, allocates and monitors resources to achieve outcomes.
- Problem solving and analysis: Skilled in root cause analysis, systems troubleshooting, and performance bottleneck [URL Removed]
- and Collaboration: Clear articulation of technical recommendations, cross-functional stakeholder engagement, feedback [URL Removed]
- and delivery: Proficient in backlog grooming, sprint planning, and technical delivery in Agile/DevOps [URL Removed]
- learning: Commitment to staying current with evolving technologies in containerisation, cloud-native systems, observability, and systems [URL Removed]
- and knowledge sharing: Ability to produce high-quality technical documentation and share knowledge across engineering [URL Removed]
- Collaborate within your team and with cross-functional teams with our partners.
- Service Level Agreements (SLAs): Ability to interpret, monitor, and manage SLAs, warranties, and related contractual obligations, and an understanding of operational frameworks such as Site Reliability Engineering (SRE), ITIL, and [URL Removed]
Proficiency (This is not an exhaustive list, and additional relevant experience or skills will be viewed favourably):
- Containerisation & Orchestration: Kubernetes, Docker, Podman, Helm, containerd
- Resource Management: SLURM (or other schedulers)
- Hardware & Infrastructure Acceleration: GPU & FPGA drivers
- Automation & Configuration Management: Ansible, Terraform, Bash, Python, Systemd, Packer
- CI/CD and Release Management: GitLab CI, GitHub Actions, Jenkins, Ansible Tower, ArgoCD/FluxCD (for infra), cron/at/systemd timers
- Cloud, Virtualisation, and Bare-Metal Platforms: OpenStack, VMware vSphere/ESXi, Proxmox, KVM, AWS EC2/Storage, Terraform
- Storage & Filesystem Tools: Ceph, NFS, iSCSI, ZFS, Lustre, or related
- Database Operations (Operational DBA Tools): PostgreSQL CLI tools, MySQL, MongoDB, TimescaleDB, cron-based backups, or related
- Monitoring & Observability: Prometheus, Grafana, Zabbix, ELK stack, or [URL Removed]
Notes:
Organisational Values:
The SKA-Mid Senior Compute Systems Engineer will be expected to demonstrate SARAO’s and SKAO’s values, and to work actively to instil those behaviours in all SKA-Mid staff in South [URL Removed]
values are:
1. Diversity and Inclusion
2. Excellence
3. Collaboration
4. Creativity and Innovation
5. Sustainability
SARAO’s values are:
1. Passion for Excellence
2. World-class service
3. People-centered
4. Respect
5. Integrity and Ethics
6. Accountability
Both SARAO and SKAO value and respect difference and are committed to building an inclusive culture by creating an environment where you can balance a successful career with your commitments and interests outside of work. We believe that you will do your best at work if you have a work-life balance. Some roles lend themselves to flexible options more than others, so if this is important to you, please raise this during your interview, as we are open to discussing flexible working opportunities during the hiring [URL Removed]
NRF website provides more details on the initiatives and activities.
Applicants should submit a comprehensive CV by registering and applying online through the NRF Recruitment and Selection Portal. Applications should be accompanied by a letter of motivation indicating the applicant’s suitability for the position. The names and contact details of at least three referees should be provided.
Desired Skills:
- Skilled in applied field of position
- Knowledge to be relevant
- Responsible in performing duties
About The Employer:
The National Research Foundation (NRF) (www.nrf.ac.za) supports and promotes research and human capital development through funding, the provision of National Research Facilities and science outreach platforms and programmes to the broader community in all fields of science and technology, including natural sciences, engineering, social sciences and humanities. The South African Radio Astronomy Observatory (SARAO) (www.sarao.ac.za) spearheads South Africa's activities in the Square Kilometre Array Radio Telescope, commonly known as the SKA, in engineering, science and construction. SARAO is a National Facility managed by the National Research Foundation and incorporates radio astronomy instruments and programmes such as the MeerKAT in the Karoo, the Hartebeesthoek Radio Astronomy Observatory (HartRAO) in Gauteng, the African Very Long Baseline Interferometry Network (AVN) programme in nine African countries, as well as the associated human capital development and commercialisation endeavours. The Square Kilometre Array Observatory (SKAO) (www.skao.int) is a next-generation global radio-astronomy facility that will revolutionise our understanding of the Universe and the laws of fundamental physics. It is one observatory with two telescopes – SKA-Mid in South Africa and SKA-Low in Western Australia. South Africa is a co-host member of the SKAO, an intergovernmental organisation headquartered at Jodrell Bank (near Manchester in the United Kingdom) responsible for SKAO construction and operations globally.