Duration: Ten weeks
12 April 2021
18 June 2021
Location: Online course (Live teaching)
9.00 – 18.30 (full-time)
Fees: £9,950 (10% alumni discount available)
Data science tools such as programming, data modelling and visualisation are becoming common across many sectors as a way of generating business insights and informing strategy. Whether you're a student fresh out of university or an industry professional, being able to use these tools, backed by a solid grounding in statistics and data analysis, will be essential as demand for data scientists grows.
That's why we're excited to present our Imperial Data Science Intensive Course, a partnership between Imperial College London, one of the world's top universities, and Le Wagon, the educational innovators behind the leading Coding Bootcamp that has helped nearly 10,000 students build their technical skills. The venture also has the support of Imperial Projects.
With modules on everything from Pandas to deep learning, in just ten weeks this course will teach you the fundamental skills to begin a career as a data scientist. This full-time, online and immersive experience equips participants with the skills to explore, clean, and transform data into actionable insights and to implement machine learning models from start to finish.
The course will culminate in a Data Spark project, pioneered by Imperial College London. Coached by data scientists from Imperial’s Data Science Institute, participants will apply the skills they've learned in class to capstone projects.
This is an entirely online course, adapted from the world-class, in-person Le Wagon Coding Bootcamp. This means you can get the teaching support and information you need to give your career a boost, wherever you are in the world!
Our course is designed to help you learn data science step by step, starting with the basic Python data toolkit and the required mathematics, and progressing to the complete implementation and deployment cycle of machine learning algorithms.
The course will be delivered live and you will get the opportunity to interact with the teaching team in real time, ask questions and get support as you progress.
Click through the tabs below to explore some of the modules you can study as part of this course:
Start the course prepared!
As the Imperial Data Science Intensive Course is very intense, our students must complete some online preparation work before starting. This work takes around 40 hours and covers the basics of Python, the prerequisite language of the course, and some mathematical topics used every day by data scientists.
Python for data science
Learn programming in Python, how to work with Jupyter Notebook, and how to use powerful Python libraries like Pandas and NumPy to explore and analyse big datasets. Collect data from various sources, including CSV files, SQL queries on relational databases, Google BigQuery, APIs and web scraping.
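As a small taste of what this looks like in practice, here's an illustrative Pandas sketch (the dataset and its values are made up, not course material):

```python
import pandas as pd

# Build a small illustrative dataset (hypothetical values)
df = pd.DataFrame({
    "city": ["London", "Paris", "London", "Berlin"],
    "sales": [120, 80, 150, 95],
})

# Typical first steps: inspect the shape, then compute group-level aggregates
print(df.shape)                      # (4, 2)
mean_by_city = df.groupby("city")["sales"].mean()
print(mean_by_city["London"])        # 135.0
```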
Relational database and SQL
Learn how to formulate a good question and how to answer it by building the right SQL query. This module covers schema architecture and then dives deep into advanced manipulation of SELECT to extract useful information from a stand-alone database, whether directly or through SQL client software such as DBeaver.
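For instance, an aggregating SELECT of the kind covered here might look like this (a minimal sketch using Python's built-in sqlite3 module and a made-up orders table):

```python
import sqlite3

# In-memory database with a hypothetical orders table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "alice", 30.0), (2, "bob", 45.0), (3, "alice", 25.0)])

# A SELECT with aggregation: total spend per customer, highest first
rows = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('alice', 55.0), ('bob', 45.0)]
```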
Statistics, probability, linear algebra
Understand the underlying maths behind all the libraries and models used in the course. Become comfortable with the basic concepts of statistics and probability (including mean, variance, random variables, Bayes' theorem, etc.) and with matrix computation, which is at the core of numerical operations in libraries like Pandas and NumPy.
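A quick illustration of these building blocks with NumPy (the numbers are arbitrary):

```python
import numpy as np

# Basic descriptive statistics
x = np.array([2.0, 4.0, 6.0, 8.0])
print(x.mean())        # 5.0
print(x.var())         # population variance: 5.0

# Matrix computation underlies most numerical library operations
A = np.array([[1, 2], [3, 4]])
v = np.array([1, 1])
print(A @ v)           # [3 7]
```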
You'll learn how to structure a Python repository with object-oriented programming so your code stays clean and reusable, how to survive the data preparation phase of a vast dataset, and how to find and interpret meaningful statistical results based on multivariate regression models.
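To give a flavour, here is a minimal sketch of an object-oriented wrapper around a multivariate regression; the class name, its cleaning step and the data are all hypothetical, with scikit-learn standing in for the course tooling:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

class SalesModel:
    """A hypothetical class bundling data cleaning and a multivariate regression."""

    def __init__(self):
        self.model = LinearRegression()

    def clean(self, X):
        # Illustrative cleaning step: replace NaNs with column means
        X = np.asarray(X, dtype=float)
        col_means = np.nanmean(X, axis=0)
        idx = np.where(np.isnan(X))
        X[idx] = np.take(col_means, idx[1])
        return X

    def fit(self, X, y):
        self.model.fit(self.clean(X), y)
        return self

# Two explanatory variables with an exact relationship y = 2*x1 + 3*x2
X = [[1, 0], [0, 1], [1, 1], [2, 1]]
y = [2, 3, 5, 7]
m = SalesModel().fit(X, y)
print(np.round(m.model.coef_, 2))  # approximately [2. 3.]
```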
Data analysts need to communicate their findings to non-technical audiences: you'll learn how to create impact by explaining your technical insights and turning them into business decisions using cost/benefit analysis. You'll be able to share your progress, present your results and compare them with your teammates'.
Preprocessing and supervised learning
Learn how to explore, clean, and prepare your dataset through pre-processing techniques like vectorisation. Get familiar with the classic models of supervised learning – linear and logistic regressions. Learn how to solve prediction and classification tasks with the Python library scikit-learn using learning algorithms like KNN (k-nearest neighbors).
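For example, a k-nearest-neighbours classifier can be fitted in a few lines of scikit-learn (the data here is a toy set, purely illustrative):

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy task: classify points as below (0) or above (1) the value 5
X = [[1], [2], [3], [7], [8], [9]]
y = [0, 0, 0, 1, 1, 1]

# KNN predicts by majority vote among the 3 nearest training points
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
print(knn.predict([[2.5], [8.5]]))  # [0 1]
```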
Generalisation and overfitting
Implement training and testing phases to make sure your model can be generalised to unseen data and deployed in production with predictable accuracy. Learn how to prevent overfitting using regularisation methods and how to choose the right loss function to improve your model's accuracy.
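A minimal sketch of this train/test discipline, using Ridge regression as the regularised model (synthetic data, illustrative only):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge

# Synthetic data with a known linear signal plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 2.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=100)

# Hold out a test set so performance is measured on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Ridge adds an L2 penalty that shrinks coefficients and limits overfitting
model = Ridge(alpha=1.0).fit(X_train, y_train)
print(round(model.score(X_test, y_test), 2))  # R^2 close to 1 on this easy task
```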
Evaluate your model's performance by defining what to optimise and the right error metrics in order to assess your business impact. Improve your model's performance with validation methods such as cross validation or hyperparameter tuning. Finally, discover a powerful supervised learning method called SVM (Support Vector Machines).
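For instance, 5-fold cross-validation of an SVM takes one call in scikit-learn (a synthetic dataset stands in for real data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic binary classification problem
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# 5-fold cross-validation: fit on 4 folds, score on the held-out fold, repeat
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5)
print(len(scores))               # 5, one accuracy score per fold
print(round(scores.mean(), 2))   # the averaged estimate of generalisation
```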
Unsupervised learning and advanced methods
Move on to unsupervised learning and implement methods like PCA for dimensionality reduction or clustering for discovering groups in a dataset. Complete your toolbelt with ensemble methods that combine other models to improve performance, such as Random Forest or Gradient Boosting.
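A brief sketch of both ideas with scikit-learn (random data, illustrative only):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))
y = (X[:, 0] > 0).astype(int)

# PCA: project 10 dimensions down to 2 while keeping most of the variance
X2 = PCA(n_components=2).fit_transform(X)
print(X2.shape)  # (50, 2)

# Random Forest: an ensemble of decision trees whose votes are combined
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(rf.score(X, y))  # training accuracy; typically 1.0 on data it has seen
```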
Managing images and text data
Get comfortable with managing high-dimensional variables and transforming them into manageable input. Learn classic pre-processing techniques for images, like normalisation, standardisation and whitening. Apply the right type of encoding to prepare your text data for different Natural Language Processing (NLP) tasks.
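For example, both ideas in miniature (a tiny made-up image and two made-up documents):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Images: scale pixel values into [0, 1], then standardise to zero mean
img = np.array([[0, 128], [255, 64]], dtype=float)
img_norm = img / 255.0
img_std = (img_norm - img_norm.mean()) / img_norm.std()
print(abs(img_std.mean()) < 1e-9)  # True: the standardised image has zero mean

# Text: a bag-of-words encoding turns documents into count vectors
docs = ["data science is fun", "science is hard"]
X = CountVectorizer().fit_transform(docs)
print(X.shape)  # (2, 5): two documents, five vocabulary words
```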
Understand the architecture of neural networks (neurons, layers, stacks) and their parameters (activation functions, loss function, optimiser). Learn to build your own networks independently, such as Convolutional Neural Networks (for images), Recurrent Neural Networks (for time series) and Natural Language Processing networks (for text).
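To make the vocabulary concrete, here is a two-layer forward pass written from scratch in NumPy; the weights are arbitrary, so this is a sketch of the mechanics, not a trained model:

```python
import numpy as np

def relu(z):
    """Activation for the hidden layer: keeps positives, zeroes negatives."""
    return np.maximum(0, z)

def sigmoid(z):
    """Activation for the output neuron: squashes any value into (0, 1)."""
    return 1 / (1 + np.exp(-z))

x = np.array([1.0, 2.0])                     # input with 2 features
W1 = np.array([[0.5, -0.5], [0.25, 0.75]])   # hidden layer: 2 neurons
b1 = np.zeros(2)
W2 = np.array([1.0, -1.0])                   # single output neuron
b2 = 0.0

hidden = relu(W1 @ x + b1)         # layer 1: linear step, then activation
output = sigmoid(W2 @ hidden + b2) # layer 2: a probability-like prediction
print(0.0 < output < 1.0)          # True: the sigmoid bounds the output
```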
Go further into computer vision with deep learning by building networks for object detection and recognition. Implement advanced techniques like data augmentation, which expands your training set by computing image perturbations (random crops, intensity changes, etc.) in order to improve your model's generalisation.
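As an illustration, a random-crop augmentation can be written in a few lines of NumPy (the image and crop size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(image, size):
    """Return a random size x size crop of a 2-D image, a common augmentation."""
    h, w = image.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return image[top:top + size, left:left + size]

image = rng.normal(size=(32, 32))
crops = [random_crop(image, 28) for _ in range(4)]  # four augmented views
print(crops[0].shape)  # (28, 28)
```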
Machine learning pipeline
Move from Jupyter Notebook to a code editor and learn how to set up a machine learning project properly so you can iterate quickly and confidently. Learn how to wrap a machine learning model in a robust and scalable scikit-learn pipeline using encoders and transformers.
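A minimal scikit-learn pipeline looks like this (synthetic data; the two steps are illustrative):

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# A pipeline chains a transformer and a model so both are fitted together,
# keeping preprocessing consistent between training and prediction
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
pipe.fit(X, y)
print(pipe.predict(X[:3]).shape)  # (3,)
```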
Machine learning workflow with MLflow
Building a machine learning model from start to finish requires a lot of data preparation, experimentation, iteration and tuning. We'll teach you how to do your feature engineering and hyperparameter tuning in order to build the best model. For this, we will leverage a library called MLflow.
Deploying to production with Google Cloud Platform
Finally, we'll show you how to deploy your code and model to production. Using Google Cloud AI Platform, you'll be able to train your model at scale, package it and make it available to the world. As the cherry on top, you'll use a Docker environment to deploy your own RESTful Flask API, which can be plugged into any front-end interface.
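As a rough sketch of what such an API looks like, here is a minimal Flask endpoint exercised with Flask's test client; the /predict route, its hard-coded "model" and the payload are all hypothetical:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for a model; a real deployment would load one from storage
COEF = 2.0

@app.route("/predict", methods=["POST"])
def predict():
    # Read the JSON payload and return a prediction as JSON
    features = request.get_json()["features"]
    return jsonify({"prediction": COEF * sum(features)})

# Exercise the endpoint without starting a server, via Flask's test client
client = app.test_client()
resp = client.post("/predict", json={"features": [1.0, 2.0]})
print(resp.get_json())  # {'prediction': 6.0}
```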
You'll spend the last two weeks of the course working on a group project that explores an exciting data science problem you want to solve! As a team, you'll learn how to collaborate efficiently on a real data science project through a common Python repository and the Git flow. You will use a mix of your own datasets (if you have any from your company / non-profit organisation) and open-data repositories (Government initiatives, Kaggle, etc.). It will be a great way to practise using all the tools, techniques and methodologies covered in the Imperial Data Science Intensive Course and will make you realise how autonomous you have become.
This course is intense and will jump into advanced topics from the very first week. The course is designed for people with basic coding skills in Python and maths.
To get up to speed, you'll be given some preparation work to complete before the Imperial Data Science Intensive Course starts. We have a Python Technical Test as part of the admissions process to make sure you are at the right level to make the course a success.
We will be hosting lots of taster events open to students and anyone interested in the topic.
List of upcoming events:
Intro to Python
Workshop 2 of 4 in the taster series for the Imperial Data Science Intensive Course. Let's learn how to explore the "insides" of websites and extract information from them.
Data Science Intensive launch event
Come to Imperial's Data Science Intensive launch event! More info to come.
Intro to SQL
Workshop 3 of 4 in the taster series for our co-branded Bootcamp. Immerse yourself in the life of a data analyst through concrete business cases using datasets from the real world.
Bootcamp info session
Come to Imperial's Data Science Intensive info session! Le Wagon will be hosting and will answer all your questions.
Intro to Data Analysis (2)
Workshop 4 of 4 in the taster series for our co-branded Bootcamp. Let's discover the basics of programming with Python and how to manage big data.
From morning lectures to evening recaps, every day of the course is action-packed.
Grab a coffee and start every morning with an engaging and interactive lecture, before putting what you’ve learnt into practice.
Pair up with your buddy for the day, and work on a series of programming challenges with the help of our teaching staff.
Review the day’s challenges and get an overview of upcoming lessons during live code sessions.
Since graduating with an MSc in Computational Biology in 2014, Blair has spent the last six years building his skill set as a data scientist. He has now joined Le Wagon, and is also a meetup organiser, open-source contributor and musician.
Ben studied and worked in IT for 11 years, working closely with developers. In 2017, he realised he wanted to become one and never looked back! He joined the team full-time as a teacher and engineer, and now leads the Data Science programme in the UK.
Mark uses methods of data science to study the emergence of new categories around innovations in organising. Recently, he has been using natural language processing (NLP) to study the emergence of commercial applications of artificial intelligence (AI) and machine learning to new approaches to organisations and work. He has a PhD and an MBA from Northwestern University and its Kellogg School, and an AB degree from Stanford in Philosophy and the Logic of Formal Systems.
Susan enjoys translating complex tech problems to a wider audience. She currently leads Imperial’s Data Spark programme and is a lecturer in Data Analytics at Ada National College for Digital Skills. She has an MBA from INSEAD in France and a BSc in Mechanical Engineering from Purdue University in the USA.
Sadia’s research is centred on implementing and developing statistical machine learning models to understand the progression of allergic diseases from childhood to adulthood using longitudinal data from five major birth cohorts. She obtained her PhD in Statistics from the London School of Economics. She was awarded the ‘Star Mentor’ award in 2020 for her mentoring on Imperial’s Data Spark programme.
Aras's research focuses on models and algorithms for decision-making under uncertainty. His interests include robust optimisation, non-convex optimisation and theoretical computer science. Aras is an academic mentor on Imperial's Data Spark programme. His past projects have used data analytics to improve operational efficiency and commercial performance for a global energy company.