Highly motivated Data Engineer with 4+ years of experience and a strong Computer Science background. Proven ability to apply analytical skills and technical expertise to design and implement data solutions. Collaborative team player with a passion for knowledge sharing.
Designed and maintained ETL pipelines for data transformation.
Analyzed complex datasets to inform business decisions.
Collaborated with cross-functional teams to ensure data consistency.
Proactively resolved data quality issues for stakeholders.
Migrated ETL solutions, reducing workload and improving efficiency.
Collaborated with clients to architect scalable data solutions.
Assimilate Solutions, a SitusAMC Company - Quality Engineer, ETL | Jan 2021 - Sep 2022
Ensured data integrity through SQL queries and Python scripts.
Validated Informatica mappings and workflows.
Documented test cases and results for data quality tracking.
Collaborated with teams to meet data quality requirements.
GlobalLogic - Associate Analyst | Oct 2019 - Jan 2020
Leveraged crowd-sourced data to train Google Lens models for richer shopping experiences.
Improved team efficiency and quality through data analysis and reporting.
Expertise
Data Pipelines
Building the data highway. I automate the flow of information, seamlessly extracting data from various sources, transforming it for clarity, and loading it for efficient analysis.
Data Modeling
Designing the blueprint for insights. I organize and connect data elements, creating a structured model that empowers clear reporting, fuels powerful analysis, and ultimately informs data-driven decisions.
Data Validation
Ensuring data accuracy for reliable analysis. I safeguard data quality by checking for errors, inconsistencies, and missing values.
Data Visualization
Transforming data into insights with impactful visuals. I create clear and engaging charts and graphs to reveal trends, patterns, and guide better decision-making.
This Python pipeline (in Docker containers) captures real-time changes (inserts, updates, deletes) from SQL Server using Debezium. Debezium feeds these changes as messages to Kafka, a streaming platform. A Python consumer reads and processes these messages for further analysis or action. This leverages Docker for portability and Kafka for scalable, real-time data flow.
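The consumer side of that flow could look roughly like the sketch below, assuming Debezium's default JSON envelope (`op`, `before`, `after` fields). The topic and broker names are illustrative placeholders, not the project's actual configuration.

```python
import json

def handle_change(raw_message: bytes) -> str:
    """Classify a Debezium change event as an insert, update, or delete."""
    event = json.loads(raw_message)
    # Debezium may wrap the event in a schema envelope with a "payload" key
    payload = event.get("payload", event)
    op = payload["op"]  # "c" = create, "u" = update, "d" = delete
    if op == "c":
        return f"insert: {payload['after']}"
    if op == "u":
        return f"update: {payload['before']} -> {payload['after']}"
    if op == "d":
        return f"delete: {payload['before']}"
    return "snapshot/other"

# In the real pipeline a kafka-python KafkaConsumer would feed handle_change,
# e.g. (hypothetical topic/broker names):
# for msg in KafkaConsumer("sqlserver.dbo.orders", bootstrap_servers="kafka:9092"):
#     print(handle_change(msg.value))
```

Keeping the parsing logic in a pure function like this makes the consumer easy to unit-test without a running Kafka broker.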
ETL-Pipeline
Leveraged multithreading for parallel processing, with robust error handling and retry logic. Implemented comprehensive logging for efficient troubleshooting, plus automated alerts for success/failure notifications.
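The multithreading-with-retries pattern described above could be sketched as follows. This is a minimal illustration under assumed defaults (3 attempts, short backoff), not the project's actual code.

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")

def with_retry(task, attempts=3, delay=0.1):
    """Run task(), retrying on failure with a short pause between attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise  # exhausted retries: surface the error for alerting
            time.sleep(delay)

def run_parallel(tasks, workers=4):
    """Execute independent extract/transform tasks in parallel threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(with_retry, t) for t in tasks]
        return [f.result() for f in futures]
```

Wrapping each task in `with_retry` before submitting it means transient failures (network blips, lock timeouts) are absorbed per task, while persistent failures still propagate and can trigger a failure alert.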
Developed a stock prediction model using historical Yahoo Finance data. Employed technical indicators and probability-based signal generation. Implemented feature selection and trained/evaluated multiple ML models. Optimized model performance through retraining and used saved models for real-time prediction.
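The signal-generation step could be illustrated with a simple moving-average crossover on closing prices. This is a hedged sketch: the real project pulled prices via Yahoo Finance and used richer indicators and ML models; the window sizes here are illustrative.

```python
def sma(prices, window):
    """Simple moving average; result is shorter than the input by window - 1."""
    return [sum(prices[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(prices))]

def crossover_signals(prices, fast=3, slow=5):
    """Emit 'buy' when the fast SMA crosses above the slow SMA, 'sell' on the reverse."""
    f, s = sma(prices, fast), sma(prices, slow)
    offset = slow - fast  # align the two series on the same trading days
    signals, prev = [], None
    for i in range(len(s)):
        state = "fast_above" if f[i + offset] > s[i] else "fast_below"
        if prev == "fast_below" and state == "fast_above":
            signals.append(("buy", i))
        elif prev == "fast_above" and state == "fast_below":
            signals.append(("sell", i))
        prev = state
    return signals
```

In the full pipeline, signals like these become features alongside other indicators, feeding the feature-selection and model-training stages.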