About the position
Our client is looking for a Senior-Level DevOps Engineer to join their engineering team. This role is suited to a highly experienced, hands-on, and technically strong DevOps professional with deep cloud infrastructure expertise and a passion for building and maintaining scalable, high-availability production environments.
The successful candidate will take ownership of complex multi-cloud infrastructure, lead deployment and monitoring strategies, support mission-critical production systems, and collaborate closely with development, QA, and engineering teams to ensure reliable, secure, and efficient platform operations across global environments.
Key Responsibilities:
- Design, implement, maintain, and optimise highly available multi-cloud infrastructure environments across AWS and supporting cloud platforms
- Manage and scale production workloads across multiple AWS regions with a strong focus on uptime, reliability, and security
- Build, maintain, and improve Infrastructure-as-Code using Terraform across Development, Testing, and Production environments
- Design and maintain CI/CD pipelines using Jenkins and deployment orchestration tools such as Spinnaker, ArgoCD, or Harness
- Implement and manage Blue/Green and Red/Black deployment strategies, including rollback and artifact promotion processes
- Administer and optimise AWS RDS/Aurora MySQL environments, including upgrades, migrations, backups, restores, and performance tuning
- Manage and monitor messaging systems such as RabbitMQ, including scaling consumers and load balancing using HAProxy and Nginx
- Monitor infrastructure health using Prometheus, Grafana, ELK Stack, and related monitoring tools
- Troubleshoot complex production issues, conduct root cause analysis, and lead post-mortem investigations to reduce MTTR
- Perform advanced Linux administration, Bash scripting, networking troubleshooting, and performance optimisation
- Support platform deployments and debugging across PHP, Python, and JavaScript-based services
- Collaborate with software engineers and product teams to ensure smooth deployments and operational excellence
- Contribute to infrastructure architecture, technical strategy, scalability planning, and cost optimisation initiatives
- Participate in on-call rotations and act as an escalation point during production incidents
- Maintain and improve AI/ML infrastructure pipelines, GPU workloads, and distributed processing environments where applicable
Requirements:
- 5+ years’ hands-on DevOps and cloud infrastructure experience
- Advanced AWS experience managing production systems across multiple regions
- Strong expertise with:
  - EC2
  - RDS/Aurora (MySQL)
  - VPC design, routing, peering, and ACLs
  - IAM roles and policies
  - S3 and CloudFront
  - Security Groups
- Extensive Terraform experience, including:
  - Modular infrastructure design
  - Remote state management
  - Environment separation
  - Infrastructure code reviews and refactoring
- Strong Jenkins pipeline creation and CI/CD automation experience
- Experience with deployment orchestration tools such as Spinnaker, ArgoCD, or Harness
- Experience implementing Blue/Green or Red/Black deployment methodologies
- Strong MySQL database administration experience, including:
  - Production upgrades and migrations
  - Backup and restore procedures
  - Performance tuning
- Proven RabbitMQ production support experience
- Experience with HAProxy and Nginx load balancing
- Strong monitoring and logging experience using Prometheus, Grafana, ELK Stack, or equivalent
- Proven production incident response and on-call support experience
- Advanced Linux administration skills (Ubuntu CLI)
- Strong Bash scripting and troubleshooting capabilities
- Solid networking fundamentals
- Comfortable supporting and debugging:
  - PHP applications
  - Python automation and AI integrations
  - JavaScript-based deployment environments
- Experience with Docker or containerised environments
- Exposure to multi-cloud infrastructure environments (AWS and GCP preferred)
- Experience operating high-availability systems with 24/7 uptime requirements
- Exposure to AI/ML infrastructure, GPU workloads, or video/media processing systems (advantageous)
- AWS certifications (advantageous)
Technical & Professional Skills:
- Advanced AWS cloud infrastructure management
- Strong Terraform and Infrastructure-as-Code expertise
- CI/CD pipeline architecture and deployment automation
- Database administration and performance optimisation
- Monitoring, observability, and incident response management
- Linux systems administration and troubleshooting
- Messaging systems and distributed architecture support
- Infrastructure scalability and cost optimisation
- Strong networking and load balancing knowledge
- Experience supporting AI/ML and high-throughput environments
- Multi-cloud platform exposure and operational support
Preferred Qualifications:
- Tertiary qualification in Computer Science, Information Technology, Engineering, or a related field
- Relevant AWS, DevOps, or cloud certifications
- Experience working in fast-paced Agile or product-based environments
- Experience operating large-scale, customer-facing production systems
Key Competencies:
- Strong analytical and troubleshooting abilities
- High attention to detail and operational excellence
- Strong communication and collaboration skills
- Ability to work effectively under pressure in high-availability environments
- Strong ownership mentality and accountability
- Proactive, solution-driven mindset
- Ability to lead during critical production incidents
- Passion for automation, scalability, and continuous improvement
- Strong mentoring and knowledge-sharing approach
For more exciting IT vacancies, visit:
Network Recruitment International IT Jobs
Please note: If you have not received feedback within two weeks, please consider your application unsuccessful. Your profile will remain in our database for future opportunities.
For more information, contact:
Reinie Du Preez
Senior Specialist Recruitment Consultant
Desired Skills:
- devops
- engineer
- aws
- docker
- kubernetes