Data Engineer
- Designed and implemented distributed ETL pipelines using PySpark and Scala for high-throughput, fault-tolerant processing.
- Developed proactive monitoring and alerting using Airflow, CloudWatch, and Python scripts to keep data workflows observable and recoverable.
- Orchestrated complex data workflows in Apache Airflow, improving scheduling, dependency handling, and operational visibility.
- Implemented data validation and reconciliation frameworks that improved accuracy and trust in downstream analytics platforms.
- Introduced incremental processing and partitioning strategies that improved efficiency and reduced compute costs.
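The validation-and-reconciliation bullet above could be backed by logic along these lines: a minimal, illustrative check that diffs keyed records between a source and a target system. The `reconcile` function and the dict-of-records shape are assumptions for the sketch, not details taken from the resume.

```python
def reconcile(source: dict, target: dict) -> dict:
    """Compare keyed source and target records and report discrepancies.

    Returns keys that are missing from the target, unexpected keys that
    appear only in the target, and keys whose values disagree.
    """
    missing = set(source) - set(target)       # loaded nowhere downstream
    unexpected = set(target) - set(source)    # present downstream with no source row
    mismatched = {
        k for k in set(source) & set(target) if source[k] != target[k]
    }
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}
```

In a real pipeline the two dicts would typically be built from row counts or checksums per key pulled from the source database and the warehouse, and a non-empty result would feed the alerting path.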
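The incremental-processing bullet can be sketched as a watermark-based batch selector: only records newer than the last processed timestamp are picked up, and the watermark advances with each run. The `incremental_batch` helper, the `updated_at` field, and the integer timestamps are illustrative assumptions; a Spark job would apply the same predicate as a partition or filter pushdown.

```python
def incremental_batch(records: list, last_watermark):
    """Return records newer than the watermark, plus the advanced watermark.

    If no new records exist, the watermark is returned unchanged so the
    next run starts from the same point.
    """
    fresh = [r for r in records if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in fresh), default=last_watermark)
    return fresh, new_watermark
```

Persisting `new_watermark` between runs (e.g. in a metadata table or Airflow Variable) is what keeps each run touching only the delta instead of reprocessing the full history.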