Senior Data & AI Engineer – Data & AI Platform
We are looking for a Senior Data & AI Engineer to design, build, and operationalize scalable AI/ML solutions within our Data and AI Platform. This role focuses on enabling enterprise-wide AI adoption by developing robust data pipelines, deploying ML models, and building reusable AI services aligned with platform standards.
You will work at the intersection of Data Engineering, Machine Learning, and Platform Architecture, ensuring AI solutions are production-ready, governed, observable, secure, and cost-efficient.
Key Responsibilities
1. AI/ML Solution Development
- Design and implement scalable Machine Learning and Generative AI solutions on Databricks and Azure/AWS
- Build and optimize feature engineering pipelines and training workflows
- Develop reusable AI components, APIs, and enterprise AI services
2. Platform & MLOps Enablement
- Build and maintain end-to-end ML pipelines including training, validation, deployment, and monitoring
- Implement CI/CD pipelines for ML models and AI applications
- Integrate AI workloads with tools such as Databricks, MLflow, and Unity Catalog
3. Data Engineering & Integration
- Collaborate with Data Engineers to ensure high-quality and governed datasets
- Work with both structured and unstructured data across batch and streaming pipelines
- Optimize data pipelines for performance, scalability, and cost efficiency
4. AI Governance & Observability
- Implement model monitoring, drift detection, and performance tracking
- Ensure compliance with AI governance, security, and regulatory standards
- Enable auditability, lineage tracking, and observability for AI models
Required Skills & Qualifications
Advanced Data Engineering – Databricks Ecosystem
Databricks Platform Expertise
Strong hands-on experience with the Databricks ecosystem, including:
- Delta Lake
- Delta Live Tables
- Unity Catalog
- Databricks Workflows
- Serverless Compute
- Mosaic AI
- Model Serving Endpoints
Data Pipeline Engineering
Experience building scalable batch and streaming pipelines using:
- Apache Spark
- Structured Streaming
- Apache Kafka
Additional expertise required in:
- Medallion Architecture (Bronze / Silver / Gold)
- CDC implementation and incremental processing strategies
- Schema evolution and data contract enforcement
Performance Optimization
- Query tuning and execution plan optimization
- Partitioning, clustering, and Z-ordering
- Cost optimization for Databricks workloads
- Autoscaling and workload sizing strategies
Foundation Models & Generative AI Engineering
Foundation Model Experience
Hands-on enterprise experience with:
- OpenAI models
- Anthropic models
- Meta Llama family
- Google Gemini
- Mistral AI models
- Open-source deployment patterns
Model Engineering
- Fine-tuning and instruction tuning
- LoRA / QLoRA optimization
- Quantization strategies
- Model serving and inference optimization
LLMOps
- Prompt lifecycle management
- Evaluation pipelines for LLM outputs
- Hallucination mitigation
- Guardrails and output validation
Retrieval-Augmented Generation (RAG) & Knowledge Systems
RAG Architecture
- Design and implementation of production-grade RAG systems
- Hybrid retrieval strategies
- Context window optimization
- Chunking and embedding strategies
Vector Databases
Hands-on experience with:
- FAISS
- Pinecone
- Milvus
- Databricks Vector Search
Embedding Systems
- Embedding model selection and benchmarking
- Semantic search optimization
- Knowledge indexing pipelines
AI Agent & Multi-Agent Systems
Agentic AI Development
- Design of autonomous and semi-autonomous AI agents
- Tool-calling orchestration
- Workflow-based reasoning systems
- Memory management strategies
Multi-Agent Systems (MAS)
- Agent coordination patterns
- Task decomposition frameworks
- Agent collaboration protocols
- Conflict resolution strategies
Experience with:
- LangChain
- LangGraph
- AutoGen
AI Communication Protocols & Integration Standards
Model Context Protocol (MCP)
- Building and deploying MCP-compliant servers
- Tool exposure and orchestration
- Secure model-tool interaction design
Agent-to-Agent (A2A) Protocols
- Understanding of inter-agent communication patterns
- Event-driven agent coordination
- Distributed AI workflow orchestration
API & Service Mesh Integration
- REST / gRPC service integration
- Event-driven AI services
- Enterprise service bus integration
Modern AI Platform Engineering
Serving Infrastructure
- Real-time inference serving
- Batch inference pipelines
- Autoscaling inference endpoints
Containerization & Orchestration
- Docker
- Kubernetes
- Helm deployment patterns
Emerging AI Platform Standards
- Experience with MCP server development
- Exposure to A2A communication protocols
- AI agent orchestration platforms
- Enterprise-grade tool-use governance
Databricks AI Platform
Experience with:
- Mosaic AI Agent Framework
- Databricks Vector Search
- AI Gateway
- Lakehouse Monitoring
Nice-to-Have Use Case Exposure
- AI copilots for data engineering workflows
- Autonomous data quality remediation
- Metadata-aware AI systems
- Semantic data discovery
- Agentic ETL orchestration
- Natural-language-to-SQL systems
- Enterprise knowledge assistants
- LLM-powered observability platforms
