Complete Roadmap

Your Data Science Learning Journey

A structured path from Python fundamentals to production ML: 18 modules across 5 phases, designed to build skills progressively with real-world projects at every step.

18 Modules
2,840+ Notebook Cells
1,000+ Code Examples
5 Learning Phases

Visual Learning Path (follow left to right, top to bottom)

Phase 1: 🐍 Python Basics · 🔒 NumPy · 🌐 APIs · 🔀 Git · 🗄️ SQL
→ Phase 2: 🐼 Pandas · ⚑ Polars · 🧹 Feature Eng. · 📐 Statistics · 🎲 Bayesian
→ Phase 3: 📊 Matplotlib · 🎨 Seaborn · ✨ Plotly
→ Phase 4: 🤖 Scikit-learn · 📈 Time Series · 💬 NLP
→ Phase 5: 🧠 Deep Learning · 🚀 Streamlit · ⚙️ MLOps
Phase 1 · Foundations

The essential building blocks every data scientist needs before anything else

🐍
Module 00 · Start Here
Python Basics

Master Python from scratch: variables, control flow, functions, OOP, error handling, generators, async/await, and file I/O.

Variables & Types · Functions & OOP · List Comprehensions · Decorators · Async/Await
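Three of the topics listed above (decorators, generators, and async/await) can be sketched with the standard library alone. This is a minimal illustration, not code from the module's notebook: `count_calls`, `chunks`, `fetch`, and `main` are hypothetical names, and `fetch` only simulates I/O with `asyncio.sleep(0)`.

```python
import asyncio
import functools
from typing import Iterator


def count_calls(func):
    """Decorator: wrap `func` and record how many times it has been called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls
def square(x: int) -> int:
    return x * x


def chunks(items: list, size: int) -> Iterator[list]:
    """Generator: lazily yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def fetch(n: int) -> int:
    """Hypothetical I/O-bound task; asyncio.sleep(0) stands in for a network call."""
    await asyncio.sleep(0)
    return square(n)


async def main() -> list:
    # Run the coroutines concurrently; gather returns results in submission order.
    return await asyncio.gather(*(fetch(n) for n in range(4)))


if __name__ == "__main__":
    print(list(chunks(list(range(7)), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
    print(asyncio.run(main()))              # [0, 1, 4, 9]
    print(square.calls)                     # 4
```

`functools.wraps` keeps the decorated function's name and docstring, which matters once decorators are stacked or used in tracebacks.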
🔒
Module 01 · After Python
NumPy

Numerical computing with N-dimensional arrays: vectorized operations, broadcasting, linear algebra, FFT, and random number generation.

Needs: Python Basics
ndarray · Broadcasting · Linear Algebra · Vectorization
🌐
Module 02 · After Python
APIs & Data Collection

Consume and build APIs: HTTP fundamentals, requests, JSON parsing, authentication, pagination, rate limiting, async calls, FastAPI, and web scraping.

Needs: Python Basics
requests · REST & JSON · Pagination · FastAPI
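Pagination and rate limiting are easy to get subtly wrong, so here is a minimal sketch of cursor-based pagination with a simple client-side delay between calls. It assumes a hypothetical API whose pages look like `{"items": [...], "next_cursor": ...}`; real APIs vary (page numbers, offsets, `Link` headers). In practice `fetch_page` might wrap an HTTP call such as `requests.get(url, params={"cursor": cursor}).json()`; here an in-memory fake stands in so the example runs offline.

```python
import time
from typing import Callable, Iterator, Optional


def paginate(fetch_page: Callable[[Optional[str]], dict],
             min_interval: float = 0.0) -> Iterator:
    """Yield every item across pages, following "next_cursor" until it is None.

    Assumes a hypothetical page shape: {"items": [...], "next_cursor": str or None}.
    `min_interval` is a simple client-side rate limit: minimum seconds between calls.
    """
    cursor: Optional[str] = None
    last_call = float("-inf")
    while True:
        # Sleep just long enough to respect the minimum spacing between requests.
        wait = min_interval - (time.monotonic() - last_call)
        if wait > 0:
            time.sleep(wait)
        last_call = time.monotonic()
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if cursor is None:
            break


# Hypothetical in-memory "API" so the sketch runs without network access.
FAKE_PAGES = {
    None: {"items": [1, 2], "next_cursor": "p2"},
    "p2": {"items": [3], "next_cursor": None},
}

if __name__ == "__main__":
    print(list(paginate(lambda cursor: FAKE_PAGES[cursor], min_interval=0.1)))  # [1, 2, 3]
```

Writing the paginator as a generator means callers can stop early without fetching pages they never use.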
🗄️
Module 03 · Parallel Track
SQL

Query relational databases: SELECT, JOINs, CTEs, window functions, subqueries, indexes, views, triggers, and SQLite with Python.

Needs: Python Basics
SELECT & JOINs · Window Fns · CTEs · Query Optimization
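To make "CTEs" and "window functions" concrete, this sketch uses Python's built-in `sqlite3` module with a made-up `sales` table. It assumes the bundled SQLite is version 3.25 or newer, the release that added window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, month INTEGER, amount REAL);
INSERT INTO sales VALUES
    ('east', 1, 100), ('east', 2, 150), ('east', 3, 120),
    ('west', 1, 80),  ('west', 2, 90),  ('west', 3, 200);
""")

# CTE + two window functions: a running total per region, and a rank by amount.
query = """
WITH ranked AS (
    SELECT region, month, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
    FROM sales
)
SELECT region, month, amount, running_total
FROM ranked
WHERE amount_rank = 1
ORDER BY region;
"""

rows = conn.execute(query).fetchall()
print(rows)  # [('east', 2, 150.0, 250.0), ('west', 3, 200.0, 370.0)]
```

The query returns each region's best month together with the running total up to that month. Unlike GROUP BY, window functions keep every input row, which is why the filter on `amount_rank` has to happen in an outer query over the CTE.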
🔀
Module 04 · Early Foundation
Git & Version Control

Master Git from basics to advanced: branching, merging, pull requests, rebasing, hooks, LFS for large files, and GitHub Actions CI/CD for data science teams.

Needs: Python Basics
Branching · Pull Requests · .gitignore · GitHub Actions
Phase 2 · Data Analysis

Load, clean, transform, and explore real-world datasets at scale

🐼
Module 03 · Core Tool
Pandas

The industry-standard data analysis library: DataFrames, groupby, merge/join, datetime handling, string operations, pivot tables, and cleaning pipelines.

Needs: Python, NumPy
DataFrame · GroupBy · Merge & Pivot · Time Series
⚑
Module 04 · Modern Alternative
Polars

A next-generation DataFrame library: multi-threaded execution, a LazyFrame query optimizer, Arrow-native memory, and frequently 10–100× faster than pandas on large data, depending on the workload.

Needs: Python, Pandas (helpful)
Lazy API · Expressions · Parquet & Arrow · Streaming
📐
Module 05 · Theory Foundation
Statistics & Probability

The mathematical backbone of ML: distributions, hypothesis tests, Bayesian inference, bootstrap methods, multiple testing correction, and survival analysis.

Needs: NumPy, Pandas
Hypothesis Tests · Distributions · Bootstrap · Bayesian
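The bootstrap methods listed above fit in a few lines of standard-library code. This is a minimal percentile-bootstrap sketch, not the module's implementation: resample the data with replacement many times, recompute the statistic each time, and read the interval off the sorted estimates. The function name and defaults (5,000 resamples, a 95% interval, a fixed seed) are illustrative assumptions.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat(data)`.

    Resample `data` with replacement, recompute the statistic each time,
    and take the alpha/2 and 1 - alpha/2 quantiles of the sorted estimates.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo = estimates[round(alpha / 2 * (n_resamples - 1))]
    hi = estimates[round((1 - alpha / 2) * (n_resamples - 1))]
    return lo, hi


if __name__ == "__main__":
    sample = [12, 15, 9, 20, 17, 14, 11, 18, 16, 13]  # invented data
    print(bootstrap_ci(sample))  # 95% interval around the sample mean of 14.5
```

Because `stat` is a parameter, the same function works for the median or any other statistic without a closed-form standard error.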
🧹
Module 06 · Critical Skill
Data Cleaning & Feature Engineering

Handle missing data, outliers, inconsistencies, encoding, and scaling, and create powerful features: the skill that often improves model performance more than switching algorithms does.

Needs: Pandas, Statistics
Missing Data · Encoding · Scaling · Pipelines
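The core idea behind leak-free preprocessing pipelines is to learn imputation and scaling parameters from training data only and then reuse them on new data. The hypothetical class below sketches that for a single numeric column using only the standard library; it is roughly what chaining scikit-learn's `SimpleImputer(strategy="median")` and `StandardScaler` does, but it is not their implementation.

```python
import statistics
from typing import Optional


class MedianImputeStandardScale:
    """Hypothetical one-column transformer: median imputation, then z-score scaling.

    Parameters are learned in fit() from training data only and reused in
    transform(), so no information from validation/test data leaks in.
    """

    def fit(self, values: list[Optional[float]]) -> "MedianImputeStandardScale":
        observed = [v for v in values if v is not None]
        self.median_ = statistics.median(observed)
        filled = self._impute(values)
        self.mean_ = statistics.fmean(filled)
        # Population std (ddof=0); fall back to 1.0 for a constant column.
        self.std_ = statistics.pstdev(filled) or 1.0
        return self

    def _impute(self, values: list[Optional[float]]) -> list[float]:
        return [self.median_ if v is None else v for v in values]

    def transform(self, values: list[Optional[float]]) -> list[float]:
        return [(v - self.mean_) / self.std_ for v in self._impute(values)]


if __name__ == "__main__":
    scaler = MedianImputeStandardScale().fit([1.0, None, 3.0, 5.0])
    print(scaler.median_)                 # 3.0
    print(scaler.transform([None, 3.0]))  # [0.0, 0.0]
```

Splitting `fit` from `transform` is the design choice that matters: calling `fit` on the full dataset before a train/test split would quietly leak test statistics into training.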
🎲
Module 07 · Deeper Thinking
Probability & Bayesian Thinking

Think probabilistically: Bayes' theorem, priors and posteriors, Bayesian A/B testing, Monte Carlo simulation, MCMC, and Bayesian optimization.

Needs: Statistics, NumPy
Bayes' Theorem · A/B Testing · Monte Carlo · MCMC
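Bayesian A/B testing and Monte Carlo simulation combine naturally. A common setup, assumed here, puts a Beta(1, 1) (uniform) prior on each variant's conversion rate; with binomial data the posterior is again a Beta distribution, so P(B beats A) can be estimated by sampling both posteriors with `random.betavariate`. The function name and the counts in the example are invented.

```python
import random


def prob_b_beats_a(conv_a: int, n_a: int, conv_b: int, n_b: int,
                   draws: int = 20_000, seed: int = 42) -> float:
    """Monte Carlo estimate of P(rate_B > rate_A) under independent Beta(1, 1) priors.

    With a Beta(1, 1) prior and binomial data, each posterior is
    Beta(1 + conversions, 1 + non-conversions), so we can sample it directly.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


if __name__ == "__main__":
    # Invented counts: 120/1000 conversions for A, 150/1000 for B.
    print(prob_b_beats_a(120, 1000, 150, 1000))  # well above 0.9
```

Unlike a p-value, the result reads directly as "the probability that B has the higher rate, given the data and the prior".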
Phase 3 · Data Visualization

Communicate insights through static, statistical, and interactive charts

📊
Module 06 · Static Charts
Matplotlib

Full control over every chart element: subplots, twin axes, custom colormaps, publication-quality figures, and GIF animations.

Needs: NumPy, Pandas
Subplots · Heatmaps · Animations · Colormaps
🎨
Module 07 · Statistical Viz
Seaborn

Beautiful statistical graphics built on top of Matplotlib: violin plots, pair grids, FacetGrid multi-panel figures, regression plots, and custom themes.

Needs: Matplotlib, Pandas
FacetGrid · Pair Plots · Regression Viz · Themes
✨
Module 08 · Interactive
Plotly

Web-ready interactive charts: hover tooltips, zoom, animated scatter plots, geographic maps, 3D surfaces, and Dash for full web dashboards.

Needs: Pandas, Matplotlib (helpful)
Interactive · 3D Plots · Maps · Dash
Phase 4 · Machine Learning

Build, evaluate, and interpret predictive models on structured and unstructured data

🤖
Module 09 · Core ML
Scikit-learn

Complete ML toolkit: regression, classification, clustering, pipelines, hyperparameter tuning, stacking ensembles, SHAP explainability (via the separate shap library), and time-series CV.

Needs: NumPy, Pandas, Statistics
Classification · Pipelines · Ensembles · SHAP
📈
Module 10 · Temporal Data
Time Series Analysis

Forecasting and temporal pattern analysis: ARIMA, Prophet, decomposition, wavelet analysis, VAR models, walk-forward CV, and anomaly detection.

Needs: Pandas, Statistics, Sklearn
ARIMA · Prophet · Wavelets · VAR
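Walk-forward (expanding-window) cross-validation keeps every test fold strictly after the data the model was trained on, which ordinary shuffled k-fold does not. Below is a minimal index-generating sketch; the function name and parameters are hypothetical, and scikit-learn's `TimeSeriesSplit` provides a ready-made alternative.

```python
from typing import Iterator


def walk_forward_splits(n_samples: int, n_splits: int, test_size: int,
                        min_train: int = 1) -> Iterator[tuple[range, range]]:
    """Expanding-window splits over time-ordered indices 0..n_samples-1.

    The last n_splits * test_size points are carved into consecutive test folds;
    each fold trains on everything before it, so no future data leaks backward.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start < min_train:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        yield range(0, test_start), range(test_start, test_start + test_size)


if __name__ == "__main__":
    for train, test in walk_forward_splits(10, n_splits=3, test_size=2):
        print(list(train), list(test))
    # [0, 1, 2, 3] [4, 5]
    # [0, 1, 2, 3, 4, 5] [6, 7]
    # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

Returning `range` objects keeps the splits cheap to generate even for long series; they can be used directly to slice lists or to index arrays and DataFrames.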
💬
Module 11 · Text & Language
NLP & Text Processing

From tokenization to transformers: NER, sentiment analysis, TF-IDF, topic modeling, BERT fine-tuning, zero-shot classification, and text summarization.

Needs: Python, Sklearn
spaCy · Transformers · BERT · Zero-Shot
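TF-IDF weights a term by how often it appears in a document, discounted by how many documents contain it. This sketch uses naive whitespace tokenization and the smoothed IDF, ln((1 + N) / (1 + df)) + 1, which matches scikit-learn's `TfidfVectorizer` default (`smooth_idf=True`). Unlike `TfidfVectorizer`, it skips the final L2 row normalization and uses a different tokenizer, so treat it as an illustration rather than a drop-in replacement.

```python
import math
from collections import Counter


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Raw term count times smoothed IDF, idf(t) = ln((1 + N) / (1 + df(t))) + 1.

    Tokenization is naive lowercase whitespace splitting, and rows are not
    L2-normalized (scikit-learn's TfidfVectorizer normalizes by default).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + d)) + 1 for term, d in df.items()}
    return [{term: count * idf[term] for term, count in Counter(tokens).items()}
            for tokens in tokenized]


if __name__ == "__main__":
    weights = tfidf(["the cat sat", "the dog sat", "the cat ran"])
    print(weights[1])  # "dog" (in 1 doc) outweighs "sat" (2 docs) and "the" (all 3)
```

A term that appears in every document, like "the" here, gets the minimum IDF of 1, so it contributes only its raw count.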
Phase 5 · Deep Learning & Deployment

Neural networks, generative models, web apps, and production ML systems

🧠
Module 12 · Neural Networks
Deep Learning

PyTorch from scratch: autograd, CNNs with skip connections, Transformer attention, LSTMs, VAEs, GANs, and transfer learning with frozen backbones.

Needs: NumPy, Sklearn, Statistics
PyTorch · CNNs · Transformers · VAE & GAN
🚀
Module 13 · Web Apps
Streamlit

Turn Python scripts into shareable web apps: dashboards, ML predictors, live streaming, batch predictions via file upload, Plotly charts, and session state.

Needs: Python, Pandas, Sklearn (helpful)
Widgets · Caching · File Upload · Live Streaming
⚙️
Module 14 · Production
MLOps & Deployment

Ship models to production: FastAPI serving, Docker, MLflow tracking, data drift detection, feature stores, A/B testing, canary deployments, and multi-armed bandits.

Needs: Sklearn, Statistics, Deep Learning (helpful)
FastAPI · Docker · MLflow · Drift Detection
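Data drift detection can take many forms. One widely used metric, chosen here as an example and not necessarily the module's method, is the Population Stability Index (PSI), which compares binned distributions of a feature between a reference sample and live data. A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but thresholds are worth tuning per feature. The function and the synthetic data below are hypothetical.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10,
        eps: float = 1e-6) -> float:
    """Population Stability Index: sum over bins of (a - e) * ln(a / e).

    Equal-width bin edges are taken from the reference (`expected`) sample;
    live values outside that range are clipped into the first or last bin,
    and `eps` keeps empty bins from producing log(0).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    reference = [i / 100 for i in range(1000)]  # synthetic, roughly uniform on [0, 10)
    shifted = [x + 5 for x in reference]        # same shape, mean moved by 5
    print(psi(reference, reference))            # 0.0
    print(psi(reference, shifted) > 0.25)       # True
```

Fixing the bin edges from the reference sample is deliberate: the live data is always judged against the distribution the model was trained on.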
★

Career Paths After Completion

What you can build and where these skills lead

📊 Data Analyst

Extract insights from business data, build dashboards, and drive decisions with statistics and visualization.

Python + SQL (query & automate)
Pandas + Polars (analyze)
Statistics (test & validate)
Matplotlib + Plotly (report)
Streamlit (share dashboards)

🤖 ML Engineer

Design, train, and deploy machine learning models that power data-driven products at scale.

NumPy + Statistics (foundations)
Pandas (feature engineering)
Scikit-learn (model building)
Deep Learning (neural nets)
MLOps (deployment & monitoring)

🧠 AI / DL Researcher

Push the boundaries of AI: develop new architectures, train large models, and publish research.

NumPy + Statistics (math)
PyTorch: CNNs, Transformers
NLP: BERT, zero-shot, RAG
VAE & GANs (generative models)
Time Series + Anomaly Detection

💬 NLP Engineer

Build language-powered systems: chatbots, search engines, document extraction, and LLM applications.

Python (scripting & APIs)
NLP: spaCy, transformers
Sklearn (text classification)
Deep Learning (BERT fine-tuning)
MLOps (model serving)

📈 Quant / Forecaster

Model financial markets, forecast demand, and predict supply chain needs with temporal ML.

Python + NumPy (computation)
Pandas / Polars (time data)
Statistics (significance testing)
Time Series: ARIMA, VAR, Prophet
Sklearn (ML-based forecasting)

⚙️ MLOps / Platform Engineer

Build the infrastructure that keeps models reliable, monitored, and continuously improving in production.

Python (automation & APIs)
SQL (data pipelines)
Sklearn / Deep Learning (models)
MLOps: Docker, FastAPI, MLflow
Statistics (A/B testing, drift)
💡

Study Tips

Get the most out of each guide

📓 Use the Notebooks

Every module ships with a study_guide.ipynb. Open it in Jupyter or VS Code and run each cell: muscle memory matters more than reading.

🔨 Do the Practice Problems

Each section ends with a real-world practice exercise. Try to solve it before reading the starter code; struggling productively is how you actually learn.

🗺️ Follow the Phase Order

  • Complete Phase 1 before jumping ahead
  • Learn Pandas before Visualization, since most plotting starts from a DataFrame
  • Learn Statistics before Scikit-learn to avoid gaps in understanding
  • Save Deep Learning for last: it builds on everything else

🚀 Build a Project Per Phase

  • Phase 1–2: EDA on a Kaggle dataset
  • Phase 3: Dashboard with Plotly/Streamlit
  • Phase 4: End-to-end ML pipeline
  • Phase 5: Deploy a model with FastAPI