Data Science
Data scientists are building machine learning models, driving business strategy, and commanding some of the highest salaries in tech. This is where you build the expertise to join them.
Duration
22 weeks at 20 hours/week
Level
Intermediate
Start Date
Jul 20, 2026
Format
Learn at your own pace.
People are landing roles at leading companies after completing this programme. You could be next.
Data and AI Literacy Foundations
Equip learners with the foundational mindset and technical literacy needed to solve complex problems using structured reasoning, programmatic logic, and the EGAD framework.
Outcomes
Develop the analytical mindset and technical foundation to solve complex problems using structured reasoning, programmatic logic, and the EGAD framework.
Data Analytics with Spreadsheets
Enables learners to transform raw data into reliable business insights by mastering data cleaning, governance, and statistical reasoning within a spreadsheet environment. Learners will develop the technical proficiency to source and prepare datasets, apply descriptive analytics, and use AI-powered tools to visualize patterns and validate assumptions for data-driven decision-making.
Outcomes
Transform raw data into reliable business insights by mastering data cleaning, statistical analysis, and AI-powered visualization within a spreadsheet environment.
SQL for Data
A comprehensive foundation in relational database management, focusing on the ability to design, query, and optimize complex data structures using SQL. Learners will master everything from basic data retrieval to advanced analytical functions, window functions, and database normalization, all while applying best practices within production-grade notebook environments.
Outcomes
Design, query, and optimize relational databases using SQL—from basic data retrieval to advanced analytical functions and database normalization.
Power BI for Data Analytics
This course focuses on transforming complex datasets into impactful visual stories by mastering data modeling, Power Query transformations, and DAX expressions within Power BI. Learners will develop the skills to design interactive dashboards and accessible reports that effectively communicate insights to both technical and non-technical stakeholders.
Outcomes
Build interactive Power BI dashboards and reports that turn complex data into compelling visual stories using data modeling, Power Query, and DAX.
Python I: Foundations & Control Flow
Master Python fundamentals from data structures and control flow to modular functions while integrating Git/GitHub and AI-assisted coding (Copilot/Claude) into a professional development workflow.
Outcomes
Master Python fundamentals (data structures, control flow, and modular functions) while building a professional workflow with Git, GitHub, and AI coding tools.
Python II: Python Programming & Algorithmic Thinking
Explore software architecture by mastering Object-Oriented Programming (OOP), algorithmic complexity (Big O), and advanced functional techniques. Learn to build professional-grade, scalable codebases and use NumPy and pandas for high-performance data manipulation, with AI helping you optimize class hierarchies and complex data transformations.
Outcomes
Write scalable, professional-grade Python code using Object-Oriented Programming, algorithmic complexity analysis, and advanced functional techniques, apply NumPy and pandas for high-performance data manipulation, and use AI tools to optimize your code.
Supervised Learning I – Regression Foundations
Transition from data analysis to predictive modeling by mastering the supervised machine learning workflow. Build, diagnose, and optimize Linear Regression models (Simple and Multiple) using scikit-learn, handle high-dimensional data with Ridge and LASSO regularization, and deploy your models via object serialization (pickling). Use AI to scaffold ML pipelines and translate complex error metrics (RMSE/MAE) into actionable business insights.
Outcomes
Build, evaluate, and deploy predictive supervised machine learning models using scikit-learn, applying regularization techniques and translating model performance into actionable business insights.
Supervised Learning II – Classification & Model Selection
Master the art of predicting categories and managing complex model trade-offs. Build non-linear models like Decision Trees and Random Forests, solve classification problems with Logistic Regression, and learn to handle “needle-in-a-haystack” scenarios using SMOTE for imbalanced data. Use AI to trace complex decision logic and translate technical metrics like Precision and Recall into high-stakes business strategies.
Outcomes
Predict categories and navigate model trade-offs using Decision Trees, Random Forests, and Logistic Regression—including strategies for handling imbalanced datasets in high-stakes business scenarios.
Supervised Learning III – Advanced Classification & Model Selection
Use the most powerful tools in the classifier’s arsenal, moving beyond basics to Support Vector Machines (SVMs), K-Nearest Neighbors (KNN), and Naive Bayes. Bridge the gap to deep learning by architecting Neural Networks with TensorFlow/Keras and learn the rigorous science of Model Selection—using cross-validation and automated Grid Search to systematically prove which algorithm is truly the “best fit” for your data.
Outcomes
Apply advanced classifiers (SVMs, KNN, Naive Bayes, and Neural Networks) and use cross-validation and Grid Search to rigorously identify the best model for your data.
Unsupervised Learning: Clustering & Dimensionality Reduction
Uncover the hidden architecture of unlabeled data by mastering Clustering and Dimensionality Reduction. Simplify massive datasets using Principal Component Analysis (PCA), visualize high-dimensional structures with t-SNE and UMAP, and segment populations using K-Means and Hierarchical Clustering. Use AI to detect anomalies and transform abstract cluster centroids into vivid, actionable business personas.
Outcomes
Uncover hidden patterns in unlabeled data using clustering and dimensionality reduction techniques (PCA, t-SNE, K-Means, and more) and translate the results into actionable business personas.
Recommendation Systems
Explore the science of personalization and spatial analysis by implementing Gaussian Mixture Models (GMMs) for probabilistic “soft clustering” and building advanced Recommendation Engines. Process geographic data using GeoPandas, implement both Content-Based and Collaborative Filtering, and leverage AI zero-shot classification to solve the “Cold-Start” problem for new items.
Outcomes
Build recommendation engines and analyze geographic data using probabilistic clustering, content-based and collaborative filtering, and AI-powered zero-shot classification to solve cold-start challenges.
Natural Language Processing
Transform messy, unstructured text into actionable intelligence by mastering Natural Language Processing (NLP). Build robust text-cleaning pipelines covering Regex, tokenization, and lemmatization, and convert language into numerical features using TF-IDF and N-Grams. Step into modern AI by deploying pretrained Hugging Face Transformers for high-accuracy text classification without training a model from scratch.
Outcomes
Transform unstructured text into actionable intelligence using NLP pipelines, TF-IDF, and modern Hugging Face Transformers for high-accuracy text classification.
What is data science?
Data science is the discipline of using data, statistical methods, and machine learning to build models that predict outcomes, classify information, and uncover patterns that are not visible through standard analysis. Data scientists work at the intersection of mathematics, programming, and business strategy, turning data into decisions at scale.
How is data science different from data analytics?
Data analytics focuses on examining existing data to understand what has happened and why. Data science goes further. It builds predictive models and machine learning systems to forecast what will happen, automate decisions, and find structure in large, complex datasets. Data science requires deeper programming and mathematical skills, and the programme reflects that.
What will I learn in this programme?
The programme takes you from data literacy and spreadsheet analysis through Python programming, exploratory data analysis, statistical reasoning, supervised and unsupervised machine learning, natural language processing, and recommendation systems. Every stage is built around real-world projects, including work on agricultural productivity, public health risk prediction, humanitarian aid allocation, and disaster relief.
What kinds of jobs do data scientists do?
Data science skills open the door to roles such as Data Scientist, Machine Learning Engineer, AI Engineer, Research Scientist, and Senior Data Analyst. Data scientists are among the most in-demand professionals globally, with applications across healthcare, finance, agriculture, logistics, and technology.
Do I need programming experience to start?
You do not need prior programming experience. The programme builds Python from the foundations up, teaching data structures, control flow, and modular functions before moving into data manipulation and machine learning. What you need is persistence. This is one of the more demanding programmes in the portfolio, and the projects reflect that.
How long does Data Science take to complete?
The first intake launches on 20 July 2026. At the recommended pace of 20 hours per week, the programme takes 22 weeks to complete, and it covers significantly more ground than most programmes in the portfolio. You can also move faster if your background allows.
Is data science relevant to African industries?
Deeply. The programme’s projects are deliberately built around African contexts, including water access in rural communities, food security for agricultural organisations, humanitarian aid allocation across African countries, and environmental monitoring. The skills are global. The application is grounded.
Do I need to complete Professional Foundations to do this programme?
When you complete your first Data Science short course, you are automatically enrolled in Professional Foundations. From that point you complete both in parallel. Professional Foundations is required for your Data Science Programme Certificate, but you do not need to finish it before you can continue your Data Science short courses. If you have already completed Professional Foundations through a previous programme, the system will recognise that and you will not be asked to repeat it.