About the position
A well-established business is seeking to appoint a Senior Compute Systems Engineer.
The Senior Computer Systems Engineer will lead the compute and storage systems team and report to the Site Reliability Engineering (SRE) Manager within Computing & Software. The role provides hands-on technical leadership in the design, implementation, and long-term operation and maintenance of secure, reliable, and high-performance computer systems infrastructure for the telescopes hosted by the company.
Qualifications:
- BTech in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 13 years’ experience,
- BEng/MTech in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 9 years’ experience,
- MEng in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 7 years’ experience,
- PhD in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 5 years’ experience.
Experience:
- 3+ years in a technical leadership or software/system architectural role with direct responsibility for large-/platform-scale distributed systems.
- Demonstrated hands-on experience in infrastructure design and automation, distributed systems, observability, CI/CD, container orchestration (e.g. Kubernetes), DevOps/SRE practices and cloud-native technologies.
- Experience leading teams or initiatives that intersect with data platforms, storage, networking, and systems engineering domains.
Knowledge:
- In-depth understanding of systems engineering principles, including performance optimization, fault tolerance, and resource scheduling in Linux-based environments.
- Strong knowledge of containerized environments (Docker, Podman), orchestration platforms (Kubernetes, Helm), and runtime architectures (containerd, CRI).
- Expertise in infrastructure-as-code, continuous integration/deployment (CI/CD), and configuration management tools (e.g., GitLab CI, Ansible, Terraform, ArgoCD).
- Advanced understanding of distributed computing and storage architectures, including Ceph, S3, NFS, and local/clustered file systems.
- Operational and architectural fluency in relational and NoSQL database systems (e.g., PostgreSQL, MySQL, MongoDB), including replication, backups, and performance tuning.
- Working knowledge of networking fundamentals, security protocols, and system-level observability (e.g., Prometheus, Grafana, ELK/EFK stack).
- Familiarity with the HPC ecosystem (e.g., SLURM, job schedulers) is beneficial for environments supporting scientific or research computing.
Please note that if you have not received a response within 14 days of submitting your application, your application was unsuccessful.
However, please keep a lookout on our website, [URL Removed], for available positions that may be in line with your career aspirations.
Desired Skills:
- Infrastructure design and automation
- Systems engineering principles
- CI/CD
- Configuration