Senior Data Engineer

Building Scalable Data Solutions | ETL Specialist | Big Data Expert

4.5+ Years Experience
About

Transforming Complex Data Challenges into Efficient Solutions

Senior Data Engineer with extensive experience in designing and implementing robust data architectures. Specialized in building scalable ETL pipelines and optimizing data workflows for large-scale systems. Strong background in distributed computing and big data technologies.

🚀

Performance Optimization

Reduced ETL processing time by 90% through advanced optimization techniques

💡

Innovation

Developed custom data quality framework reducing manual testing effort by 80%

📈

Scale

Managed data pipelines processing over 10GB of data daily with 99.9% uptime

$ whoami
Senior Data Engineer & Solution Architect
$ cat expertise.txt
• ETL Pipeline Design & Optimization • Data Warehouse Architecture • Real-time Data Processing • Data Quality & Governance • Cloud Infrastructure (Azure) • Big Data Technologies
Experience

Software Engineer ETL

R1 RCM

Sep 2022 - Present

  • Designed and maintained ETL pipelines (SQL, Python, SSIS, Azure Databricks, PySpark) for data transformation, ensuring data accuracy, reliability, and accessibility for BI.
  • Analyzed complex datasets to identify patterns, trends, and insights, informing business decisions and optimizing data processing/analysis.
  • Collaborated with cross-functional teams to maintain data consistency throughout the ETL pipeline, meeting business requirements.
  • Proactively identified and resolved data quality issues using various tools and techniques, delivering high-quality data for stakeholders.
  • Migrated ETL solutions, cutting the required workload from 3 FTE to 0.5 FTE while maintaining quality through a 90% gain in workflow efficiency.
Azure Databricks PySpark Python SQL SSIS

Quality Engineer ETL

Assimilate Solutions, A SitusAMC Company

Jan 2021 - Sep 2022

  • Owned data validation for data service scrum team, ensuring data integrity using SQL queries and automated Python scripts on Snowflake.
  • Validated Informatica mappings and workflows for accurate data processing and loading.
  • Documented test cases and results for efficient data quality tracking and monitoring.
  • Collaborated with onshore teams and stakeholders to meet data quality requirements.
Python SQL Snowflake Informatica

Associate Analyst

GlobalLogic

Oct 2019 - Jan 2020

  • Leveraged crowd-sourced data to train Google Lens models for richer shopping experiences.
  • Improved team efficiency and quality through data analysis and reporting.
Data Analysis Machine Learning Google Lens
Skills

Programming Languages

Python PySpark Scala SQL T-SQL

Databases

Snowflake PostgreSQL SQL Server MySQL

System Design

Database Design ETL Pipeline Integration Design

ETL Tools

SSIS Informatica Azure Databricks Azure Data Factory

Reporting

Tableau Power BI Excel

File Handling

TXT XML TAB CSV Excel JSON Parquet

Python Libraries

Pandas SQLAlchemy pyodbc NumPy scikit-learn TensorFlow Dask google-cloud-pubsub datacompy pytest Pandera PySnooper Black Flake8

Tools & Technologies

Bitbucket GitHub Jira Azure Agile Scrum Postman API ChatGPT AI Prompt Microsoft Office Kafka Zookeeper Debezium Docker
Certifications
🐍

Data Science with Python

Data Analysis Specialization

❄️

Snowflake

The Complete Masterclass

🚀

Python Bootcamp

From Zero to Hero in Python

Scala 3

Complete Development Masterclass

📊

Power BI

Essential Training

📈

Tableau

Essential Training

Projects

ETL - Batch & Historical

Enterprise-scale ETL pipeline for efficient data transfer between Delta tables and an on-premises SQL Server.

  • Created ETL pipeline leveraging multiple technologies for data transfer
  • Optimized load time using Python concurrency and parallel Databricks workflows
  • Implemented comprehensive data validation and quality checks
  • Added webhooks for notifications and dashboard monitoring
PySpark Azure Databricks SQL Server Unity Catalog Git
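The load-time optimization above can be sketched with Python's standard-library concurrency. This is a minimal illustration, not the production pipeline: `transfer_table` is a hypothetical stand-in for the real Delta-table-to-SQL-Server copy, and the table names are made up.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def transfer_table(table: str) -> int:
    """Hypothetical stand-in for a Delta-to-SQL-Server copy.

    The real job would read the Delta table and bulk-insert into the
    on-premises SQL Server; here it just returns a fake row count.
    """
    return len(table)  # placeholder "rows transferred"

def transfer_all(tables: list[str], max_workers: int = 4) -> dict[str, int]:
    """Run per-table transfers concurrently instead of sequentially."""
    results: dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(transfer_table, t): t for t in tables}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```

Independent table copies are I/O-bound, so a thread pool lets several transfers overlap instead of serializing the total load time.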

ETL - DataStreaming

Real-time data streaming pipeline using Google Pub/Sub for efficient data distribution.

  • Built real-time streaming pipeline with Google Pub/Sub integration
  • Implemented data transformation and multi-server distribution
  • Added parallel processing and retry mechanisms
  • Orchestrated via Windows service with auto-recovery
Python SQL Server Google Pub/Sub Pandas SQLAlchemy
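The retry mechanism mentioned above can be sketched as exponential backoff around a per-server push. The `deliver` callable is a hypothetical placeholder for the real delivery step, not the actual Pub/Sub client code.

```python
import time

def deliver_with_retry(deliver, message, attempts: int = 3, base_delay: float = 0.01):
    """Retry a flaky delivery callable with exponential backoff.

    `deliver` stands in for the real per-server push and is expected
    to raise on transient failure; the last failure is re-raised.
    """
    for attempt in range(attempts):
        try:
            return deliver(message)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1x, 2x, 4x ... backoff
```

Backing off between attempts keeps transient network or server hiccups from dropping messages without hammering a struggling endpoint.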

ML - Stock Prediction Model

Machine learning pipeline for stock market analysis and prediction using multiple models.

  • Developed ML pipeline for Yahoo Finance data extraction
  • Implemented data cleansing and technical indicator generation
  • Trained and maintained multiple prediction models
  • Created action recommendation system based on model predictions
Python Scikit-learn Keras XGBoost yfinance
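Technical-indicator generation of the kind described above can be illustrated with a simple moving average. This is a generic sketch over made-up prices, not the project's actual feature pipeline or real Yahoo Finance data.

```python
def simple_moving_average(prices: list[float], window: int) -> list[float]:
    """Compute the SMA over a trailing window.

    Emits one value per full window, so the output is shorter than
    the input by window - 1 points.
    """
    if window <= 0 or window > len(prices):
        return []
    return [sum(prices[i - window:i]) / window
            for i in range(window, len(prices) + 1)]
```

Indicators like this become model features: each SMA point summarizes the recent trend at that position in the price series.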

Testing Tool - ETL

Python GUI application for automated ETL validation and testing.

  • Developed automated ETL validation GUI tool
  • Implemented smoke testing and standard checks
  • Added support for heterogeneous data source comparison
  • Automated validation summary reporting
Python Pandas Dask SQLAlchemy Snowflake
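The heterogeneous-source comparison above boils down to diffing two row sets on a key. A minimal sketch, assuming rows arrive as lists of dicts keyed by a primary-key column (the function and field names here are illustrative, not the tool's API):

```python
def compare_sources(source_a, source_b, key: str) -> dict[str, list]:
    """Diff two row sets (lists of dicts) from heterogeneous sources.

    Reports keys missing from either side and keys whose rows differ,
    the core of a smoke-test style ETL validation.
    """
    a = {row[key]: row for row in source_a}
    b = {row[key]: row for row in source_b}
    return {
        "missing_in_b": sorted(a.keys() - b.keys()),
        "missing_in_a": sorted(b.keys() - a.keys()),
        "mismatched": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Because both sides are normalized to plain dicts first, the same check works whether the rows came from Snowflake, SQL Server, or a flat file.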
Contact

Let's Build Something Amazing

I'm always interested in hearing about new projects and opportunities in data engineering and analytics.