Julien Beaulieu Data Scientist

Comprehensive Project Based Data Science Curriculum

*updated 25 Oct 2020: curriculum v3.0

Summary

This curriculum offers a mix of best-in-class resources and a suggested path for using them to become a data scientist. It is intended to be a complete education in data science using online materials and is an alternative to getting a master’s degree. I have heavily researched and used all of these resources myself in my journey to becoming a Data Scientist & a Deep Learning practitioner.

Why I wrote this curriculum

There is a lack of curated online resources that organize the material found online into a long-term learning plan covering all aspects of data science. Most curriculums only suggest content from their own platform, or suggest so many options that it’s hard to choose which to use.

Who is this for?

The following curriculum is intended for anyone who is currently in a field completely unrelated to data science who wants to fully transition careers and be hired as a data scientist (I myself used to be in digital marketing). It therefore assumes no prior knowledge of data science or programming, and only a basic knowledge of high school math.

If you already have experience with machine learning but are looking to refine your skills, you can start directly in the data science or deep learning module and hand-pick what you find interesting throughout.

This curriculum does, however, assume that you are extremely eager to learn, self-driven, and motivated, because a lot of the resources are self-paced. Completing the curriculum end to end will easily take over a year. That said, my programme goes much deeper than a bootcamp and will give you more hands-on experience than most master’s degrees.

This curriculum is inspired by OSSU’s amazing self-taught, open source education in Computer Science.

Why choose a self-taught education?

  • Abundance of high-quality resources: This curriculum includes many courses from top universities (MIT, Stanford, University of San Francisco) as well as MOOCs (massive open online courses) that have been taken by millions of students and have received outstanding reviews (Deeplearning.ai, Fastai, Le Wagon).
  • Focus on state-of-the-art techniques: There aren’t many university courses or bootcamps that teach the latest techniques such as those found in Fastai or in Stanford’s CS224n: Natural Language Processing with Deep Learning.
  • Flexibility: Easy to pursue or continue part time if you find a job during the process.
  • Passion: You can study and work on the projects you find the most interesting instead of being bound to a strict curriculum for 2 years.
  • Self-starter: A central requirement is that you have a proven ability to be a self-starter. If that is you, you’re in the right place.

Objectives with this coursework

  • Work on real world practical projects that you are passionate about.
  • Develop a strong foundation in math - this includes linear algebra, calculus, statistics and probability.
  • Become a good developer with solid software engineering and computer science abilities.
  • Be able to read scientific papers and apply them or redo the experiments on your own.
  • Deploy models with elegant and reusable code.
  • Get hired as a data scientist, data analyst, or machine learning engineer.

Visual Overview

[Infographic: visual overview of the curriculum]

Curriculum

Programming Primer & Learning How to Learn

Knowing how to program is essential for data science. I highly suggest learning Python basics before anything else. This way, you’ll make the most out of your bootcamp/course and can remain focused on the actual data science instead of playing catch-up. Make sure you are up to speed with the following material before going further. Also, since you are about to engage in a lifetime of learning new things, I highly recommend having a look at the resources related to becoming a better learner.

Note: ❤️s represent material I particularly enjoyed and recommend.

Topics covered: Introduction to AI, Introduction to Python, Learning how to learn

Resources | Source | Format
Fundamentals of AI | DataCamp | Videos and coding environment exercises
Learn Python | Codecademy | Coding environment exercises
Learning How to Learn ❤️ | Coursera | Videos and quizzes
A Mind for Numbers | Barbara Oakley | Book
Pragmatic Thinking and Learning ❤️ | Andy Hunt | Book

Overview

  • Get an overview of what machine learning is with “Fundamentals of AI” to make sure it is a good route for you to pursue.
  • Learn programming basics with a great platform: Codecademy.
  • Since you’re about to start an epic learning journey, make sure you know how to apply the best techniques to learn efficiently. Taking “Learning How to Learn” on Coursera is a must! The book on which the course is based, “A Mind for Numbers”, and the brilliant “Pragmatic Thinking and Learning” are good complementary options too. For a quick summary of all three resources, refer to this blog post.

Core Data Science

This is where you’ll improve your coding abilities and mathematical understanding, and start working on real data science problems. For this, I recommend taking what I consider the best data science bootcamp out there: Le Wagon**. With a heavy emphasis on practical exercises and a final project in which you get to deploy your own machine learning model, this intensive bootcamp will give you the big picture of data science end to end: math theory, data wrangling, data visualization, programming inside an IDE, Git, machine learning, deep learning, and data engineering.

Once you understand and have worked on the most important aspects of data science, you’ll have a better idea of what you enjoy, what your strengths and weaknesses are, and where to head next. This is also an opportunity to build your new professional network.

Next, check out Fastai’s Introduction to Machine Learning for Coders. I recommend only watching the first 6 lectures, which focus on tree-based models. The rest of the videos focus on deep learning, which is better covered in their more recent course, below. Although this content is 3 years old, don’t let that discourage you from watching it: it is taught by one of the most respected data scientists in the world - Jeremy Howard - and is full of gems.

Pair the above with Andriy Burkov’s famous The Hundred-Page Machine Learning Book to learn from another perspective, and solidify your understanding of concepts covered so far.

Topics covered: Data wrangling, Data collection with an API, SQL, Statistical tests & experiments, Data visualization, Machine Learning, Deep Learning, Random Forests, Model interpretation techniques

Resources | Source | Format
Data Science Bootcamp ❤️ | Le Wagon | In person / remote lectures - 9 weeks
Introduction to Machine Learning for Coders - Fastai ❤️ | U of San Francisco | Online videos and projects
The Hundred-Page Machine Learning Book ❤️ | Andriy Burkov | Book
Fastai Book | Jeremy Howard, Sylvain Gugger | Book

Overview

  • Gain experience in the most important data science tasks by taking Le Wagon’s bootcamp. For a cheaper alternative, look at Udacity’s Data Analyst Nanodegree combined with Coursera’s Machine Learning Course.
  • Get a practical approach to machine learning with tree-based models and model interpretation with Fastai.
  • Complement your learning with the very well-written and concise The Hundred-Page Machine Learning Book.

Core Programming

The following resources will help you become a good programmer, understand software engineering and give you the tools to pass the technical tests that most employers send you during recruitment. I suggest reviewing this material early in your education because being a good programmer will pay off very fast.

You don’t need to go through all of this material in a linear way. Review it on an as-needed basis, but make sure you come back to it regularly.

Resources | Source | Format
Python with Corey Schafer ❤️ | YouTube | Videos
Python for Data Analysis, 2nd Edition | Wes McKinney | Book
Fluent Python ❤️ | Luciano Ramalho | Book
Coding Exercises | HackerRank | Coding exercises
Intro to Data Structures and Algorithms | Udacity | Self-paced videos and coding environment
Missing Semester | MIT | Self-paced videos and exercises

Overview

  • If you’re struggling with any programming concept, make sure you search for videos of Corey Schafer explaining the subject. His videos are always well-built, clear and enlightening.
  • Python for Data Analysis is a practical, modern introduction to manipulating, processing, cleaning, and crunching datasets in Python. It is ideal for beginners and is a great way to get better at pandas, NumPy, and IPython.
  • Familiarize yourself with common data structures and algorithms in Python with practice exercises.
  • Complete HackerRank exercises to refine your Python skills with interview-style questions.
  • Once you’ve nailed the basics of Python, read Fluent Python to push things further. This book will walk you through Python’s core language features and libraries, and show you how to make your code shorter, faster, and more readable at the same time. Note: keep an eye out for the updated edition of the book, which is coming soon.
  • If you still aren’t comfortable with the shell, version control (Git) and debugging, watch the lectures from MIT’s Missing Semester and do the exercises. Seriously, don’t neglect the exercises!

Core Math

Machine learning is, fundamentally, math. Some say that it’s not strictly necessary to go deep into mathematical theory and that it’s better to focus on coding. While there is some truth to this, if your end goal is to read, write, implement papers, and to be a true expert in data science, then do not neglect math. The best way to achieve this is to go through the materials and also do the exercises.

The following list of resources will help you get started if you’re a beginner, and go deep down the math rabbit hole if you’re advanced.

Topics covered: Linear algebra, Statistics, Vector calculus, Probability, and more

Resources | Source | Format
Essence of Linear Algebra ❤️ | YouTube | Videos
StatQuest - Machine Learning ❤️ | YouTube | Videos
Linear Algebra | Khan Academy | Videos and exercises
Linear Algebra 18.06 with Gilbert Strang ❤️ | MIT | Videos and homework
Calculus 1 & 2 | Khan Academy | Videos and math exercises
Mathematics for Machine Learning | Marc Peter Deisenroth et al. | Book

Overview

  • Get a great intuition for linear algebra with the fantastic resource: Essence of Linear Algebra by 3Blue1Brown.
  • Learn all things statistics and machine learning with StatQuest. Josh Starmer has a gift for breaking down complex ideas into some of the simplest and best explanations on the Web.
  • Delve deep into linear algebra with Prof. Gilbert Strang’s amazing lectures. Complement them with the exercises in his book (which includes solutions). For a less in-depth alternative, refer to Khan Academy.
  • Learn all the math required for machine learning with the book by Marc Peter Deisenroth and co-authors (advanced).
  • Don’t forget to actually do the exercises and work on the assignments. This is the only way you’ll become good at math.

Deep Learning

After completing the courses in Core Data Science, and with more solid foundations in programming and machine learning theory, you can move on to deep learning.

Topics covered: Loss functions and optimization, Convolutional neural networks, Recurrent neural networks, Deep learning hardware and software, Deep learning for tabular data, NLP, Computer vision, Generative models

Resources | Source | Format
Practical Deep Learning for Coders - Part 1 ❤️ | U of San Francisco | Online videos and recommended projects
Deep Learning Specialization ❤️ | Coursera - Andrew Ng | Online videos and assignments
CS224n: Natural Language Processing with Deep Learning ❤️ | Stanford - Chris Manning | Online videos, assignments & final project
EECS 498-007 / 598-005 - Deep Learning for Computer Vision ❤️ | U of Michigan - Justin Johnson | Online videos, assignments & final project

Overview

  • Learn how to create state-of-the-art models using the Fastai library with Part 1 of their course. I suggest taking the Fastai course and the Deep Learning Specialization together, since one is more focused on coding while the other is more focused on the theory and math behind it.
  • Both Chris Manning’s and Justin Johnson’s (he used to teach the very popular CS231n at Stanford) courses are world class and will give you deep insights into the worlds of Natural Language Processing (NLP) and computer vision. Be sure to do the assignments, since they have you code algorithms from scratch and give you a solid foundation to progress further.

Data Engineering

Data Engineering is the foundation for the new world of Big Data. Any good data scientist should know about data engineering and how to deploy models, at least at a basic level. Very early in your training, you should start deploying models from your projects online. These days, employers are looking for the whole package, and you’ll have a better shot at landing the job if you can take a project all the way from concept to a deployed, real-world application.

Topics covered:

Resources | Source | Format
Full Stack Deep Learning ❤️ | Udacity | Online videos and Project
ML in Production - Deployment Series | MLinProduction | Blogpost series
Machine Learning Engineering ❤️ | Andriy Burkov | Book
  • Learn all about built-in experiment management, unit tests, labelling, linting scripts, continuous integration/continuous development with CircleCI, model versioning, Docker and deployment with Full Stack Deep Learning, a course that truly should be way more popular than it currently is.
  • Complement this course with Andriy’s amazing Machine Learning Engineering book that will teach you about the whole life cycle of a machine learning project.
  • Read this multi-part blog series on deploying machine learning models in an automated, reproducible, and auditable manner.
  • Go back to some of the models you have built for your projects and deploy them!

Optional Courses

The following courses should be taken depending on the outcome you want to achieve as a data scientist.

Resources | Source | Format
Practical Deep Learning for Coders - Part 2 ❤️ | U of San Francisco | Online videos and projects
CS229 - Machine Learning | Stanford - Andrew Ng | Online videos and assignments
SQL | Mode Analytics | Coding environment exercises
  • Learn to rebuild some PyTorch modules as well as part of the Fastai library from scratch with Part 2 of the course. This is also a great lesson in API design and software engineering.
  • If you wish to specialize in machine learning more than in deep learning, look no further than Andrew Ng’s famous machine learning course at Stanford.
  • If SQL is important for your projects and current/future job, become an expert with this SQL tutorial.

Extras

In addition to all of the above, I suggest doing the following:

  • Subscribe to these newsletters: Andriy Burkov, Deeplearning.ai’s The Batch, DataScienceWeekly.
  • Regularly explore Meetup.com to see if there are meetups on topics you are interested in. Since more meetups are currently happening online, you have access to meetups across the entire world.
  • Attend conferences. One I highly suggest going to is PyCon, even if that means spending a bit of money to attend and travelling to a host city. The value you’ll get from it will be worth it.
  • Participate in hackathons. Keep an eye out for these events happening in your city, or look on Meetup.com to find them.

Final Notes

While I update the resources in this curriculum quite regularly, some will inevitably become outdated. As a rule of thumb, you can trust the quality of material from the following sources if you come across it:

  • All new and old courses from Deeplearning.ai
  • All computer science / machine learning courses at Stanford Online
  • All courses from Fastai and Jeremy Howard specifically
  • Andrew Ng for machine learning
  • Justin Johnson for computer vision
  • Chris Manning for NLP
  • All of Andriy Burkov’s content
  • StatQuest for statistics/ML explanations
  • 3Blue1Brown for math

Please feel free to send me any resources, materials, courses that I have not included that you particularly enjoyed, or to send me a message if you want to chat about my experience learning this material.

**Disclaimer: I am a freelance teacher at Le Wagon’s data science bootcamp. That said, they are not paying me to be included here. I decided to add the bootcamp to the curriculum because of how valuable I think it is.


