We are looking for a highly skilled and experienced L2 Data Engineer to join our growing Data & Analytics team. In this role, you will lead the design, development, optimization, and maintenance of scalable enterprise data platforms and cloud-native data solutions. You will work closely with architects, analysts, and business stakeholders to build high-performance data pipelines and modern lakehouse solutions that support advanced analytics, reporting, and data-driven decision-making.
This opportunity is ideal for a senior data professional with strong hands-on expertise in Databricks and the Microsoft Azure ecosystem, who is passionate about building reliable, scalable, and optimized data platforms in enterprise environments.
KEY RESPONSIBILITIES
• Design, develop, and optimize enterprise-scale data pipelines and ETL/ELT workflows using Azure and Databricks technologies.
• Architect and implement scalable data ingestion, transformation, and orchestration processes using Azure Data Factory, Databricks, and Azure Synapse Analytics.
• Develop high-performance data transformation frameworks using PySpark, Python, and Spark SQL for large-scale distributed data processing.
• Optimize SQL queries, Spark jobs, and data workflows to improve performance, scalability, and cost efficiency.
• Lead data migration initiatives, including SQL Server migrations and modernization of legacy data platforms.
• Implement and maintain Delta Lake architecture, incremental data loading strategies, and enterprise data lake best practices.
• Collaborate with architects and cross-functional teams to design robust and scalable data models aligned with business and governance standards.
• Monitor and troubleshoot production pipelines, perform root-cause analysis, and implement preventive measures for recurring issues.
• Support CI/CD implementation and infrastructure automation for data engineering workflows.
• Mentor junior engineers and contribute to engineering standards, reusable frameworks, and technical best practices.
• Create and maintain technical documentation, including architecture diagrams, pipeline documentation, and operational runbooks.
• Evaluate and recommend modern data engineering tools, frameworks, and optimization strategies.
REQUIREMENTS
• 5+ years of professional experience in Data Engineering or related roles.
• Strong expertise in Python for enterprise data processing, transformation, and automation.
• Advanced hands-on experience with Pandas for data manipulation, and with PySpark and Spark SQL for large-scale distributed processing.
• Strong experience with Databricks, including cluster management, notebook development, workflow orchestration, Delta Lake, and performance optimization.
• Extensive experience building and managing enterprise data pipelines using Azure Data Factory.
• Strong working knowledge of Azure Synapse Analytics, particularly Spark pool integration and enterprise data warehousing concepts.
• Advanced SQL skills, including query optimization, performance tuning, indexing strategies, and troubleshooting.
• Strong understanding of data lake architecture, Delta Lake, incremental processing, partitioning, and lakehouse concepts.
• Experience implementing data governance, security, access controls, and monitoring within cloud data platforms.
• Experience handling production support, troubleshooting, and optimization of enterprise data platforms.
NICE TO HAVE
• Experience with Terraform for Azure infrastructure provisioning and Infrastructure-as-Code (IaC).
• Experience implementing CI/CD pipelines for data engineering deployments.
• Exposure to Lakehouse Federation, Delta Sharing, and modern data sharing architectures.
• Experience with streaming and near real-time data processing solutions.
• Knowledge of DevOps practices and cloud cost optimization strategies.
CERTIFICATION REQUIREMENT
Candidates are expected to hold or be actively working toward the Databricks Certified Data Engineer Professional certification. This certification validates advanced expertise across the following domains:
• Advanced ETL and ELT development using Spark SQL and PySpark
• Enterprise-grade pipeline orchestration and optimization
• Data modeling and scalable lakehouse architecture
• Performance tuning and distributed data processing optimization
• Advanced data governance and security implementation
• Production-grade data engineering practices within the Databricks ecosystem

