Hi, I'm Om Nankar

Production-focused Data Scientist with 2+ years of experience engineering scalable ETL pipelines and deployment-ready ML systems. Specialized in automating complex workflows using Agentic AI and distributed computing.

Experience

Where I've worked and the impact I've made

Data Science Intern

Chista

Aug 2025 – Present

Key Achievements:

  • Engineered an autonomous ingestion pipeline using CrewAI (Agentic LLMs) to parse unstructured healthcare data, reducing client onboarding time by 40%
  • Built a "Golden Layer" validation framework in PostgreSQL, implementing rigorous data quality checks that eliminated consistency errors across complex formulary datasets
  • Deployed a Traceability Audit Tool, built with FastAPI and Streamlit, that lets stakeholders visualize model lineage and source data
Python CrewAI PostgreSQL FastAPI Streamlit

Data Scientist

Wolters Kluwer

Jan 2023 – Aug 2024

Key Achievements:

  • Developed an automated pricing engine in Python, replacing a legacy 7-day manual process with a 1-click execution pipeline that reduced operational overhead by 70%
  • Designed scalable ETL workflows to aggregate attrition and revenue metrics, feeding real-time Power BI dashboards used by the Finance Centre of Excellence
  • Optimized SQL queries for high-volume financial reporting, restructuring data models to improve report generation speed for 10+ critical business metrics
Python SQL Power BI ETL

Research Intern (Computer Vision)

Symbiosis Centre for Applied AI

Jun 2021 – Aug 2023

Key Achievements:

  • Developed deep learning pipelines (PyTorch) for medical imaging (MRI) and public safety object detection, resulting in 7 peer-reviewed publications
  • Benchmarked detection architectures, optimizing model inference for edge cases in diverse datasets (dementia, Lyme disease)
Python PyTorch Computer Vision

Projects

Featured work and side projects

Distributed Big Data Recommendation Engine

Engineered a scalable ETL pipeline handling massive unstructured datasets using PySpark on a Hadoop cluster. Built a Hybrid Recommendation System combining collaborative filtering and content-based approaches for high-precision user preference prediction.

PySpark Hadoop Python

Real-Time Anomaly Detection System

Architected a streaming data pipeline using Apache Kafka for ingestion and Flask for serving, enabling sub-second latency anomaly detection. Deployed unsupervised learning models (Isolation Forests) via REST APIs, containerized with Docker.

Kafka Flask Docker REST APIs

CO₂ Forecasting Engine

Developed an end-to-end CO₂ emissions forecasting pipeline by integrating unsupervised learning and predictive modeling techniques. Applied K-Means and Agglomerative Clustering to segment regions based on historical emission patterns and climate indicators. Built and compared time-series and machine learning models, including Holt-Winters, ARIMA, Linear Regression, SVR, and Random Forest, evaluating performance using MAE and RMSE to identify the most robust approach and generate cluster-specific forecasts for climate trend analysis.

Python Scikit-learn ARIMA Clustering
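
The MAE/RMSE comparison step described for the CO₂ Forecasting Engine can be illustrated with a minimal sketch. This is not the project's actual code: the model names, the numbers, and the `rank_models` helper are hypothetical, and choosing RMSE as the ranking metric is an assumption made only for illustration.

```python
import math

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE does."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def rank_models(actual, forecasts):
    """Score every candidate forecast and sort by RMSE, best first.

    `forecasts` maps a model name to its predictions for the same
    held-out period as `actual`. (Hypothetical helper.)
    """
    scores = {
        name: {"mae": mae(actual, preds), "rmse": rmse(actual, preds)}
        for name, preds in forecasts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1]["rmse"])

# Made-up held-out values and forecasts, for illustration only.
actual = [10.0, 12.0, 14.0, 16.0]
forecasts = {
    "model_a": [11.0, 12.0, 13.0, 16.0],  # small, evenly spread errors
    "model_b": [10.0, 12.0, 14.0, 20.0],  # one large miss
}

ranking = rank_models(actual, forecasts)
for name, score in ranking:
    print(f"{name}: MAE={score['mae']:.3f} RMSE={score['rmse']:.3f}")
```

Reporting both metrics is useful because they can disagree about how bad a model is: a single large miss inflates RMSE far more than MAE, so a forecast that is usually accurate but occasionally far off stands out in the comparison.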

Skills & Technologies

Tools and technologies I work with daily

Languages
Python
SQL
R
Data Engineering
PySpark
Apache Kafka
Airflow
dbt
PostgreSQL
Neo4j
Excel
Machine Learning & AI
TensorFlow
PyTorch
LangChain
CrewAI
Scikit-learn
Cloud, DevOps & Visualization
AWS
Docker
GitHub
Databricks
FastAPI
Power BI
Tableau
Streamlit
Looker
Methodologies & Expertise
Machine Learning
A/B Testing
Causal Reasoning
Statistical Methods
Business Acumen

Education

My academic journey

M.S. in Data Science

University at Buffalo

2024 – 2026 (Expected)

B.Tech

Symbiosis Institute of Technology

2019 – 2023

Achievements

Awards, certifications & recognition

🏆 Academic Excellence Award

Recognized for outstanding academic performance and research contributions in Data Science.

📝 7 Peer-Reviewed Publications

Published research on deep learning pipelines for medical imaging (MRI) and public safety object detection.

Let's Connect

I'm always open to interesting conversations and opportunities

Let’s Grab a Virtual Coffee

Happy to chat about data science, AI, or research opportunities, or simply to have a good conversation. Pick a time that works for you!

Schedule a Meeting