About the position
Job Description:
The IT Operations Engineer is responsible for delivering reliable infrastructure and exceptional user support through 24x7 monitoring, proactive management, and white-glove service. This role is essential to maintaining our mission-critical financial services environment, where uptime is paramount.
The engineer manages the complete lifecycle of both end-user systems and production infrastructure, from initial setup and onboarding through daily operations, maintenance, and eventual offboarding.
This position demands strong technical expertise, independent decision-making capabilities, and the ability to exercise sound judgment when responding to critical incidents in a fast-paced, high-availability environment where every minute of downtime has significant business impact.
Responsibilities:
Production Infrastructure Monitoring and Incident Response
- Monitor critical production systems 24x7 to ensure optimal performance and availability
- Exercise independent judgment in assessing incident severity and determining appropriate response strategies
- Respond to infrastructure alerts and incidents with urgency and precision, making real-time decisions on escalation paths and resolution approaches
- Perform root cause analysis and implement corrective actions to prevent recurring issues
- Evaluate system behavior patterns and make independent determinations on necessary interventions
- Coordinate with development teams during critical incidents and outages, serving as the technical authority for infrastructure decisions
- Document incident response procedures and maintain incident management records
- Participate in an on-call rotation to provide round-the-clock infrastructure support
Infrastructure Maintenance and Security Patching
- Design and execute planned maintenance windows for servers, network equipment, and applications
- Evaluate security patch criticality and make independent decisions on deployment timing and prioritization
- Apply security patches and updates to maintain system currency and compliance
- Perform routine system health checks and preventive maintenance tasks
- Manage backup systems and validate backup integrity regularly
- Coordinate with the security team to implement security controls and remediation efforts
- Maintain accurate configuration management and documentation
- Assess infrastructure requirements and recommend improvements to system architecture
Service Request Fulfillment
- Process and fulfill IT service requests through the ITSM platform with attention to SLA compliance
- Coordinate software installations, license management, and application access requests
- Support the complete lifecycle of hardware and software assets including procurement, deployment, configuration, and decommissioning
- Manage vendor relationships for equipment repairs and service delivery
- Evaluate and determine appropriate solutions for complex user requests requiring technical analysis
- Create and maintain knowledge base articles and user documentation
- Provide training and guidance to end users on IT tools and best practices
- Design and implement onboarding and offboarding processes for both end-user systems and production infrastructure
Core Requirements
- 3+ years of experience in IT operations or a similar technical support role
- Demonstrated ability to exercise independent judgment in high-pressure situations and make critical decisions affecting system availability
- Experience with Windows and Linux server environments and virtualization technologies
- Strong understanding of network protocols (TCP/IP, DNS, DHCP) and VPN technologies
- Hands-on experience with monitoring tools (e.g., Nagios, Datadog, PRTG, or similar)
- Proficiency with ITSM platforms (ServiceNow, Jira Service Management, or similar)
- Experience with Active Directory, Office 365, and enterprise security tools
- Knowledge of backup and disaster recovery procedures
- Proven analytical and problem-solving abilities, with the capacity to assess complex technical issues and determine optimal solutions
- Excellent communication skills and customer service orientation
- Ability to work flexible hours, including on-call rotation
Preferred Qualifications
- Experience in financial services or other high-availability environments
- Knowledge of cloud platforms (AWS, Azure) and hybrid infrastructure
- CompTIA A+, Network+, or Security+ certification, or equivalent
- Experience with automation and scripting (PowerShell, Python, Bash)
- Familiarity with the ITIL framework and best practices
- Understanding of foreign exchange markets and trading platform requirements
Desired Skills:
- Systems Analysis
- Complex Problem Solving
- Programming/Configuration
- Critical Thinking
- Time Management