- Design, develop, and maintain scalable batch and streaming data pipelines using Python (strong OOP design), Apache Spark (PySpark, Spark SQL), and Azure Databricks.
- Design and implement real-time and near-real-time streaming platforms using Apache Kafka or Apache Flink, with stream processing powered by Apache Flink or Spark Structured Streaming, supporting event-time processing, windowing, stateful transformations, checkpointing, and exactly-once semantics.
- Build and manage Medallion architecture (Bronze, Silver, Gold) on Azure Data Lake Storage (ADLS Gen2) using Delta Lake.
- Implement data governance, access control, and lineage using Databricks Unity Catalog.
- Develop and integrate data services and streaming consumers/producers within a microservices-based architecture.
- Deploy and operate data and streaming workloads on Azure Kubernetes Service (AKS) using Dockerized applications.
- Optimize Spark and streaming workloads through partitioning strategies, Z-Ordering, caching, broadcast joins, state tuning, and query optimization.
- Implement data quality, validation, deduplication, late-arriving data handling, and schema evolution for both batch and streaming pipelines.
- Collaborate with data analysts, data scientists, backend engineers, and DevOps teams to deliver end-to-end, production-grade data platforms.
- Design secure data solutions using Azure IAM, RBAC, Key Vault, private endpoints, and network isolation.
- Build and maintain CI/CD pipelines using GitLab for Databricks notebooks, Spark jobs, streaming applications, microservices, and infrastructure deployments.
- Automate infrastructure and platform provisioning using Infrastructure as Code (Terraform and Pulumi).
- Implement monitoring, logging, and alerting using Azure Monitor, Log Analytics, Databricks metrics, Kubernetes metrics, and streaming platform monitoring.
- Deliver curated, analytics-ready datasets to support reporting and dashboards in Power BI.
- Participate in Agile/Scrum ceremonies, perform code reviews, and mentor junior data engineers.
- Ensure adherence to coding standards, performance best practices, data governance, and cloud-native architecture principles.
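The event-time processing, windowing, and late-arriving-data responsibilities above hinge on watermark semantics. As a minimal plain-Python sketch (not production code, and not the Spark API itself), the following loosely mimics what Spark Structured Streaming's `withWatermark` plus a tumbling-window aggregation does: the watermark trails the maximum observed event time, and events for windows already closed under the watermark are dropped. The function name and signature are illustrative assumptions.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size, allowed_lateness):
    """Illustrative event-time tumbling-window counter with a watermark.

    `events` is an iterable of (event_time, key) pairs in arrival order.
    The watermark trails the max event time seen by `allowed_lateness`;
    an event whose window has already closed under the watermark is
    dropped rather than counted.
    """
    windows = defaultdict(int)   # (window_start, key) -> count
    max_event_time = float("-inf")
    dropped = []                 # too-late events, discarded

    for event_time, key in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - allowed_lateness
        window_start = (event_time // window_size) * window_size
        # A window [start, start + size) closes once the watermark
        # passes its end; later arrivals for it are dropped.
        if window_start + window_size <= watermark:
            dropped.append((event_time, key))
            continue
        windows[(window_start, key)] += 1

    return dict(windows), dropped
```

Note how the fourth event below (event time 2, arriving after time 12) is still counted because the watermark has only reached 7, while the same event arriving after time 18 (watermark 13) is dropped; this is the trade-off between completeness and state size that watermark tuning controls.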
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 3–8 years of experience in Data Engineering, with hands-on Azure experience.
- Strong proficiency in Python with solid OOP concepts.
- Extensive experience with Apache Spark (PySpark, Spark SQL) and Azure Databricks.
- Hands-on experience with streaming platforms, such as:
  - Apache Kafka or Apache Flink
  - Spark Structured Streaming
- Familiarity with microservices-based architectures, including service communication patterns and event-driven designs.
- Hands-on or working experience with Azure Kubernetes Service (AKS) for deploying and managing containerized workloads.
- Deep understanding of Delta Lake (ACID transactions, schema enforcement, time travel).
- Strong SQL skills for analytics and performance tuning.
- Experience with Azure services:
  - ADLS Gen2, Azure Databricks, Azure Functions, Service Bus, Key Vault
- Strong experience with Git and GitLab CI/CD.
- Experience with Infrastructure as Code using Terraform and Pulumi.
- Knowledge of data modeling techniques (Star Schema, Data Vault, SCD Type 1/2).
- Solid understanding of distributed systems, fault tolerance, and streaming guarantees.
- Experience with both Kafka and Flink in production environments.
- Exposure to Kafka Schema Registry, event versioning, or CDC pipelines.
- Hands-on experience with Databricks Unity Catalog.
- Exposure to DevSecOps practices and secure data platform design.
- Experience or exposure to Generative AI / LLM-based data applications (RAG, embeddings, vector stores, Azure OpenAI).
- Experience with large-scale enterprise or regulatory data platforms.
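Among the data modeling techniques listed above, SCD Type 2 is the one most often probed hands-on. The sketch below shows the core merge logic in plain Python (in practice this would be a Delta Lake `MERGE`); row shape, column names, and the helper name are assumptions for illustration.

```python
from datetime import date

def apply_scd2(dimension, incoming, key, tracked, today):
    """Minimal SCD Type 2 merge sketch (plain Python, no Spark).

    `dimension` rows are dicts carrying `key`, the `tracked` attributes,
    plus 'valid_from', 'valid_to' (None = still open), and 'is_current'.
    If a current row exists and a tracked attribute changed, the old row
    is closed and a new version opened; unseen keys are inserted.
    """
    current = {row[key]: row for row in dimension if row["is_current"]}
    for rec in incoming:
        old = current.get(rec[key])
        if old is not None and all(old[c] == rec[c] for c in tracked):
            continue  # no change: keep the existing current row
        if old is not None:
            old["valid_to"] = today      # expire the superseded version
            old["is_current"] = False
        new_row = {key: rec[key], **{c: rec[c] for c in tracked},
                   "valid_from": today, "valid_to": None, "is_current": True}
        dimension.append(new_row)
        current[rec[key]] = new_row
    return dimension
```

The same compare-close-insert pattern maps directly onto a Delta `MERGE` with `whenMatchedUpdate` and `whenNotMatchedInsert` clauses, which is where schema enforcement and ACID guarantees from Delta Lake come in.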
Key Responsibilities
· Design and build multi-page, interactive Power BI dashboards featuring drill-throughs, bookmarks, custom visuals, and role-based filters to deliver tailored, actionable insights for diverse business users.
· Develop complex DAX measures, calculated tables, and time-intelligence functions to support dynamic KPI reporting and scenario modeling directly within the Power BI data model.
· Leverage Power Query (M-Query) to perform advanced data transformations—such as unpivoting, merging, conditional logic, and parameterization—ensuring source data is cleansed and shaped for optimal dashboard performance.
· Configure and maintain on-premises or cloud Power BI Gateways, automate incremental and full refresh schedules, and troubleshoot connectivity issues to guarantee up-to-date reporting.
· Optimize report and data model performance through query folding, aggregation tables, star-schema design patterns, and best-practice visuals to minimize load times and enhance user experience.
· Partner closely with stakeholders to elicit requirements, design storyboard prototypes, and iterate on layouts and visuals based on user feedback and adoption metrics.
· Architect and implement end-to-end ETL pipelines in Azure Data Factory, orchestrating data ingestion, transformation, delivery, error handling, and monitoring for reliable data flows.
· Develop and maintain PySpark notebooks on Azure Databricks to execute complex KPI logic, data cleansing, and aggregation workflows, supplemented by SQL scripts to validate ETL outputs and reconcile key metrics.
· Provision, secure, and govern data in Azure Data Lake Storage Gen2 via Unity Catalog, managing access controls, data lineage, schema evolution, and audit trails in compliance with organizational policies.
· Operate within an Agile/Scrum framework—participating in sprint planning, daily stand-ups, backlog grooming, and retrospectives—and perform root cause analyses to identify data quality and performance issues, implementing preventive measures.
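Of the Power Query transformations named above, unpivoting is the one that most affects model shape. As a plain-Python sketch of what `Table.UnpivotOtherColumns` does in M (column names here are hypothetical), it turns wide rows, one column per period, into the long attribute/value shape that star-schema fact tables and most visuals prefer:

```python
def unpivot(rows, id_cols, var_name="attribute", value_name="value"):
    """Plain-Python sketch of an 'unpivot other columns' step.

    Every column not in `id_cols` becomes one output row holding the
    repeated id columns plus an (attribute, value) pair.
    """
    out = []
    for row in rows:
        for col, val in row.items():
            if col in id_cols:
                continue
            rec = {c: row[c] for c in id_cols}  # carry the id columns
            rec[var_name] = col
            rec[value_name] = val
            out.append(rec)
    return out
```

Unpivoting at the Power Query stage (rather than in DAX) keeps the model narrow, which also helps query folding and aggregation-table design mentioned in the performance responsibilities.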
Required Qualifications
Technical Skills required:
- Power BI Desktop and Service, DAX, Power Query (M)
- Python, PySpark (Apache Spark), SQL (T-SQL, ANSI)
- Azure Data Factory, Databricks, ADLS Gen2, Unity Catalog, CI/CD tools (Git, Azure DevOps), data modeling, performance tuning
Soft Skills required:
- Strong problem-solving skills and attention to detail.
- Excellent communication and collaboration in cross-functional teams.
- Adaptability to changing priorities.
- Time management and a continuous-learning attitude.
