Designed a PySpark-based data preparation pipeline with advanced data quality checks to generate the final MRD for segmentation analysis.
Migrated a legacy .NET identity resolution pipeline to Apache Spark, processing 800M records, then expanded it to AWS and Azure (Databricks), achieving full multi-cloud support within 3 months.
Engineered a batch ingestion pipeline orchestrating data quality and identity microservices, reducing operational costs by 25% and improving scalability by 20%.
Developed and productized near real-time microservices for data cleaning, address standardization, and PII-based identity unification on Azure and GCP.
Built modern single-page applications using AngularJS and Node.js with a modularised, maintainable front-end architecture.
Developed an application enabling data flow through ERP systems to identify customers and generate appropriate output files.
Built a configurable automation pipeline processing 400GB+ files daily, reducing processing time from 18–20 days to 2–3 days.
Created SSIS packages to populate data from multiple data feeds into databases using advanced transformations.
Re-engineered a legacy monolith into near real-time microservices, cutting bulk processing time for 1M records from 60+ hours to 2 minutes.
Built a web-based sales platform for a major financial institution to sell products to customers, with configurable multi-environment deployment.
Key Highlights