This job is expired
National Research Foundation

SKA Mid - Senior Compute Systems Engineer at NRF National Research Foundation

  • R Undisclosed
  • Permanent Specialist position
  • Observatory
  • Posted 02 Oct 2025 by National Research Foundation
  • Expires in 11 days
  • Job 2623672 - Ref 1127

About the position

Position Summary:
The SKA-Mid Senior Compute Systems Engineer will lead the compute and storage systems team for SKA-Mid and will report to the SKA-Mid Site Reliability Engineering (SRE) Manager within SKA-Mid Computing & Software. The role provides hands-on technical leadership in the design, implementation, and long-term operation and maintenance of secure, reliable, and high-performance computer systems infrastructure for the telescopes hosted by SARAO. While contributing to computing systems enablement, this role also focuses on shaping operational practices, supporting local delivery partnerships, and helping build the team that will manage computing systems operations as the telescopes transition from construction to steady-state operations. This role involves guiding infrastructure development, mentoring team members, and ensuring systems align with SRE principles. Responsibilities include deploying and optimising systems, managing faults, contributing to long-term infrastructure planning, and ensuring scalable, maintainable operations. The position plays a key role in cross-team collaboration, driving innovation while supporting sustainable and resilient computing environments.

Key Responsibilities:

  • Contribute to the global design and implementation of scalable and fault-tolerant infrastructure systems that support engineering and operational needs
  • Contribute to the deployment, configuration, and maintenance of distributed storage and database systems
  • Analyse system failures, performance issues, and misconfigurations across hardware, software, and network layers
  • Lead and mentor the computer systems engineers and contribute to strategic technical planning

Minimum Qualification:
  • Bachelor's Degree / Advanced Diploma / NQF 7

Minimum Experience:
  • 5-13 years
  • BTech in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 13 years’ experience, OR
  • BEng/MTech in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 9 years’ experience, OR
  • MEng in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 7 years’ experience, OR
  • PhD in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 5 years’ experience

Experience:
  • 3+ years in a technical leadership or software/system architectural role with direct responsibility for large-/platform-scale distributed systems
  • Demonstrated hands-on experience in infrastructure design and automation, distributed systems, observability, CI/CD, container orchestration (e.g. Kubernetes), DevOps/SRE practices and cloud-native technologies
  • Experience leading teams or initiatives that intersect with data platforms, storage, networking, and systems engineering domains

Knowledge:
  • In-depth understanding of systems engineering principles, including performance optimisation, fault tolerance, and resource scheduling in Linux-based environments
  • Strong knowledge of containerised environments (Docker, Podman), orchestration platforms (Kubernetes, Helm), and runtime architectures (containerd, CRI)
  • Expertise in infrastructure-as-code, continuous integration/deployment (CI/CD), and configuration management tools (e.g., GitLab CI, Ansible, Terraform, ArgoCD)
  • Advanced understanding of distributed computing and storage architectures, including Ceph, S3, NFS, and local/clustered file systems
  • Operational and architectural fluency in relational and NoSQL database systems (e.g., PostgreSQL, MySQL, MongoDB), including replication, backups, and performance tuning
  • Working knowledge of networking fundamentals, security protocols, and systems-level observability (e.g., Prometheus, Grafana, ELK/EFK stack)
  • Familiarity with the HPC ecosystem (e.g., SLURM, job schedulers) is beneficial for environments supporting scientific or research computing

Additional Notes:
Competency – Essential:
  • Demonstrated technical leadership (3+ years), leading cross-functional efforts across systems, storage, and database infrastructure, driving technical decisions from architecture through [URL Removed]
  • engineering expertise, with a focus on Linux administration, infrastructure automation, service orchestration, and performance optimisation across diverse [URL Removed]
  • in distributed systems architecture, including the design and deployment of scalable, resilient services using microservices, event-driven, and cloud-native design [URL Removed]
  • and orchestration fluency, including production-grade usage of Kubernetes, Docker, and Helm for system and application-level [URL Removed]
  • automation and CI/CD, using tools such as GitLab CI, ArgoCD, FluxCD, Jenkins, or GitHub Actions to streamline and secure platform [URL Removed]
  • DevOps and SRE practices, blending infrastructure-as-code, configuration management, and release automation (DevOps) with incident response, monitoring, SLIs/SLOs, and system reliability engineering (SRE).
  • Linux expertise, including advanced troubleshooting, kernel tuning, systemd orchestration, and optimisation at [URL Removed]
  • delivery and planning capabilities, including backlog scoping, cross-team collaboration, and Agile sprint [URL Removed]
  • administration skills, with operational experience in administering relational and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB), including high availability, backups, replication, and performance [URL Removed]
  • skills, with a root-cause-first approach, and a strong bias for ownership, accountability, and long-term operational [URL Removed]
  • leadership: Ability to lead architectural discussions, influence design decisions, and mentor junior engineers across infrastructure [URL Removed]
  • Management/Leadership: Provides leadership that fosters an environment that encourages new ideas and provides support for the development of emerging skills. Creates trust by displaying consistency, understanding, integrity and patience. Plans, seeks, allocates and monitors resources to achieve outcomes.
  • Problem solving and analysis: Skilled in root cause analysis, systems troubleshooting, and performance bottleneck [URL Removed]
  • and Collaboration: Clear articulation of technical recommendations, cross-functional stakeholder engagement, feedback [URL Removed]
  • and delivery: Proficient in backlog grooming, sprint planning, and technical delivery in Agile/DevOps [URL Removed]
  • learning: Commitment to staying current with evolving technologies in containerisation, cloud-native systems, observability, and systems [URL Removed]
  • and knowledge sharing: Ability to produce high-quality technical documentation and share knowledge across engineering [URL Removed]
  • Collaborate within your team and with cross functional teams with our partners.
  • Service Level Agreements (SLAs): Ability to interpret, monitor, and manage SLAs, warranties, and related contractual obligations and an understanding of operational frameworks such as Site Reliability Engineering (SRE), ITIL, and [URL Removed]

Proficiency (This is not an exhaustive list, and additional relevant experience or skills will be viewed favourably):
  • Containerisation & Orchestration: Kubernetes, Docker, Podman, Helm, containerd
  • Resource Management: SLURM (or other schedulers)
  • Hardware & Infrastructure Acceleration: GPU & FPGA drivers
  • Automation & Configuration Management: Ansible, Terraform, Bash, Python, Systemd, Packer
  • CI/CD and Release Management: GitLab CI, GitHub Actions, Jenkins, Ansible Tower, ArgoCD/FluxCD (for infra), cron/at/systemd timers
  • Cloud, Virtualisation, and Bare-Metal Platforms: OpenStack, VMware vSphere/ESXi, Proxmox, KVM, AWS EC2/Storage, Terraform
  • Storage & Filesystem Tools: Ceph, NFS, iSCSI, ZFS, Lustre, or related
  • Database Operations (Operational DBA Tools): PostgreSQL CLI tools, MySQL, MongoDB, TimescaleDB, cron-based backups, or related
  • Monitoring & Observability: Prometheus, Grafana, Zabbix, ELK stack, or [URL Removed]

Notes: Organisational Values:
The SKA-Mid Senior Compute Systems Engineer will be expected to demonstrate the SARAO and SKAO’s values, and to work actively to instil those behaviours in all SKA-Mid staff in South Africa.

SKAO’s values are:
  1. Diversity and Inclusion
  2. Excellence
  3. Collaboration
  4. Creativity and Innovation
  5. Sustainability

SARAO’s values are:
  1. Passion for Excellence
  2. World-class service
  3. People-centered
  4. Respect
  5. Integrity and Ethics
  6. Accountability

Both SARAO and SKAO value and respect difference and are committed to building an inclusive culture by creating an environment where you can balance a successful career with your commitments and interests outside of work. We believe that you will do your best at work if you have a work / life balance. Some roles lend themselves to flexible options more than others, so if this is important to you, please raise this during your interview, as we are open to discussing flexible working opportunities during the hiring [URL Removed]

NRF website provides more details on the initiatives and activities.

Applicants should submit a comprehensive CV by registering and applying online through the NRF Recruitment and Selection Portal. Applications should be accompanied by a letter of motivation indicating the applicant's suitability for the position. The names and contact details of at least three referees should be provided.

Desired Skills:

  • Skilled in applied field of position
  • Knowledge to be relevant
  • Responsible in performing duties

About The Employer:

The National Research Foundation (NRF) (www.nrf.ac.za) supports and promotes research and human capital development through funding, the provision of National Research Facilities and science outreach platforms and programmes to the broader community in all fields of science and technology, including natural sciences, engineering, social sciences and humanities.

The South African Radio Astronomy Observatory (SARAO) (www.sarao.ac.za) spearheads South Africa's activities in the Square Kilometre Array Radio Telescope, commonly known as the SKA, in engineering, science and construction. SARAO is a National Facility managed by the National Research Foundation and incorporates radio astronomy instruments and programmes such as the MeerKAT in the Karoo, the Hartebeesthoek Radio Astronomy Observatory (HartRAO) in Gauteng, the African Very Long Baseline Interferometry (AVN) programme in nine African countries as well as the associated human capital development and commercialisation endeavours.

The Square Kilometre Array Observatory (SKAO) (www.skao.int) is a next-generation global radio-astronomy facility that will revolutionise our understanding of the Universe and the laws of fundamental physics. It is one observatory with two telescopes – SKA-Mid in South Africa and SKA-Low in Western Australia. South Africa is a co-host member of the SKAO, an intergovernmental organisation headquartered at Jodrell Bank (near Manchester in the United Kingdom) responsible for SKAO construction and operations globally.
