About the position
Own the end-to-end infrastructure landscape for conversational AI platforms (e.g., ConvAIS).
Design, implement, and operate secure, scalable, and cost-efficient AI runtime environments.
Act as the technical backbone between product, development, data science, and operations.
Define infrastructure standards, reference architectures, and guardrails for conversational AI use cases.
Ensure platform reliability, performance, and availability, including incident support and root cause analysis.
Enable fast but compliant onboarding of new AI use cases and teams.
Drive an automation-first approach across CI/CD, environment provisioning, monitoring, and recovery.
Contribute to platform roadmap discussions with a strong infrastructure and operational perspective.
Act as a force multiplier by mentoring engineers and raising infrastructure and DevOps maturity across teams.
Minimum Requirements:
Qualifications/Experience:
Degree in Computer Science, Information Technology, Engineering, or comparable practical experience.
5+ years of professional experience in infrastructure, platform engineering, or DevOps roles.
Proven track record operating mission-critical platforms in an enterprise environment.
Hands-on experience with cloud-native architectures and Kubernetes in production.
Practical exposure to AI/ML platform support is required; conversational AI experience is a strong plus.
Experience working in cross-functional, international teams.
Strong problem-solving mindset with the ability to balance speed, stability, and compliance.
Essential Skills Requirements:
Cloud infrastructure engineering (Azure preferred) with a focus on highly available, scalable AI platforms (Kubernetes, container orchestration, networking, IAM).
Strong hands-on experience with Kubernetes (AKS), Helm, and platform-level CI/CD pipelines.
Solid understanding of conversational AI architectures (LLM-based services, APIs, grounding layers, vector stores).
Infrastructure-as-Code expertise (Terraform, ARM/Bicep) for reproducible and compliant environments.
Security-by-design mindset: identity, secrets management, network isolation, and secure service communication.
Observability fundamentals: logging, metrics, and tracing for AI workloads (latency, token usage, cost drivers).
Strong collaboration skills with Dev, Data Science, and Product to translate functional requirements into resilient infrastructure.
Advantageous Skills Requirements:
Experience operating enterprise-grade AI platforms under regulatory, data protection, and compliance constraints.
Knowledge of cost optimisation for AI workloads (GPU/CPU trade-offs, scaling strategies, usage-based charging).
Exposure to MLOps / LLMOps concepts (model deployment, versioning, prompt lifecycle, evaluation).
Familiarity with event-driven architectures (Kafka, Azure Event Hubs) in AI-driven systems.
Experience with cross-region / multi-environment setups (DEV, INT, PREPROD, PROD).
Ability to coach engineers and shape platform engineering best practices.
Desired Skills:
- Cloud infrastructure engineering
- Kubernetes
- AI architectures