In May 2018, I received my Microsoft Professional Program Data Science Certificate. I thoroughly enjoyed going through the different units and now look forward to putting my new-found skills as a data scientist into practice. Allow me to share my experience, especially with those who want to follow in my footsteps.
In early 2017, I started my company blog hoping that it would be propelled to the top of the search rankings. That didn’t happen, but in October the blog refocused on data visualization. That’s how my interest in data science started. I felt certification would help me stand out from the crowd, so I enrolled in the Data Science Track of the Microsoft Professional Program (MPP).
In consultation with the industry, Microsoft identified the 8 most important skills that needed to be taught. The curriculum is built around these skills and delivered as massive open online courses (MOOCs) on the edX learning platform. That worked just fine for me since I could take the lessons from the comfort of my home office. Without a regular job, I spent a few hours every day learning something new and practicing what I had learned. I accelerated my learning in the second quarter and completed the program in 7 months with the submission of my capstone project.
The Data Science Certification program is made up of 3 units and a final project, taught over 10 courses. The MPP Data Science Certificate is awarded when you achieve the 70% pass mark and obtain a verified certificate for all 10 courses. You can first enroll in the free audit track and upgrade later, since a verified certificate costs $99 per course. In total, the MPP Data Science Certificate will cost you $990.
Some courses give a choice between different technologies. For instance, analyzing and visualizing data is taught with either Excel or Power BI. For the programming courses, one has a choice between R and Python, but nothing prevents you from learning both. The choice between R and Python will impact your learning experience, so you need to put some thought into it.
Review of Courses
Course 1: Data Science Orientation
This course gets started with an explanation of the curriculum and an encounter with a variety of data scientists. The following modules teach data science fundamentals and provide a basic introduction to statistics. For many of us, this will be a refresher of what we already know. The course can be completed in a single day and is a good primer for anyone with an interest in data science. I really enjoyed this course, and it became a motivator for the remainder of the track.
Course 2: Querying Data with Transact-SQL
The content of this course is rather intricate and geared towards database professionals. Various aspects of Transact-SQL are taught in 11 modules, so be prepared for a long ride. The combination of lectures, demonstrations and hands-on lab exercises kept me engaged, but at times I wondered whether the presenters knew I was watching. The course taught me the essentials of SQL, but I’m not sure how much of it I will use. An added benefit of this course is that it taught me how to create an Azure SQL Server Database.
Course 3b: Analyzing and Visualizing Data with Power BI
I learned this skill with Power BI since I was keen to increase my understanding and proficiency. The first modules teach the fundamental BI workflow of data transformation, modeling, visualization, and sharing. The other modules cover various topics and ensure that you obtain an all-round view of the Power BI product. Will Thomson and his team delivered the course with infectious enthusiasm, and I viewed the lessons repeatedly. Read my article Interactive Reports with Power BI to see how I put what I learned into practice.
Course 4: Essential Statistics for Data Analysis using Excel
The modules in this course help you gain a good understanding of descriptive statistics, basic probability, random variables, sampling and confidence intervals, and hypothesis testing. The lectures are excellent, and the demonstrations use lots of real-world examples. I enjoyed the lessons and even felt like a statistician, but the formulas fade quickly when you don’t use them! Overall, this is an excellent foundation course for data scientists that equips them with essential skills in statistics.
Course 5b: Introduction to Python for Data Science
Programming courses in the data science track offer a choice between R and Python. I had some knowledge of Python and prefer its natural language, so I chose it, also because of its increasing popularity within and beyond data science. The modules cover basic Python and the NumPy, Matplotlib, and Pandas packages, and don’t require any previous knowledge. The videos are concise and to the point, and the exercises are easy, but the final exam is timed, and I struggled a bit with the manipulation of Pandas data frames. The exercises and final exam introduced me to DataCamp’s learning platform, so I know where I can get more practice.
Course 6: Data Science Essentials
This course starts with an introduction to data science, recaps statistics and data visualization, and ends with an introduction to data munging and machine learning. The demonstrations and lab exercises are conducted with Azure ML and Jupyter Notebooks. Both tools are user-friendly and provide an excellent learning environment for machine learning and programming. I enjoyed this course because it reinforced earlier learning and made me appreciate the entire data science workflow. The lab exercises are easy, and you should be able to attain a morale-boosting score.
Course 7: Principles of Machine Learning
This course builds on the previous one and offers a more in-depth overview of classification, regression, and clustering models. There’s a module that focuses on model improvement, while other modules cover tree and ensemble methods and optimization-based methods like neural networks and support vector machines. The lectures are excellent and could act as a future reference, but the calculus can be challenging. The demonstrations and exercises use practical real-world examples, and doing the exercises was both great fun and an excellent way to learn. I recommend that you do this course well since it prepared me for successful completion of the capstone project.
Course 8b: Programming with Python for Data Science
This course is delivered by Coding Dojo, the industry’s premier coding bootcamp, and is taught by Authman Apatira, one of their lead instructors. The lectures are excellent, but you will spend most of your time writing Python code and learning packages like scikit-learn. The course is extensive and covers topics like data preparation, feature engineering, dimensionality reduction, data modeling, and evaluation. Even though I was closely engaged with this course for a month, I almost freaked out during the final exam. Applying freshly acquired programming and problem-solving skills under the pressure of time proved a challenge, but that’s what you sign up for as a data scientist.
Course 9a: Implementing Predictive Analytics with Spark in Azure HDInsight
I had intended to take Applied Machine Learning to learn more about the use of location data and satellite imagery, but that course was removed as an option by the end of March 2018. The course Implementing Predictive Analytics with Spark in Azure HDInsight feels like a let-down after the intensive learning in the previous course. One learns to provision an HDInsight Spark cluster in Microsoft Azure and gets some exposure to Spark Python, but apart from that there’s limited new learning. Still, a break between storms does no harm, and I can add the use of Spark and Spark Python to my CV.
Course 10: Capstone Project
The Capstone Project runs for 6 weeks until the first month of every quarter, and our cohort had to build a model that could predict earthquake damage in Nepal. The project assumes that you have completed the other nine courses since you need to apply what you’ve learned. The 3-part challenge of my project consisted of data exploration, a data model competition, and the submission of a final report that was reviewed by fellow students. The model competition provided the fun part, where one had to develop a multiclass classifier from an imbalanced dataset. I struggled to improve on a multiclass logistic regression algorithm with default parameters, but feature selection and tuning of the hyperparameters got the job done. Limited time and visiting relatives added pressure to the assignment, but competing and working with other students was stimulating.
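For readers who want to see what that modeling step can look like, here is a minimal sketch. It is not my capstone code: the data is synthetic (generated with make_classification), and the use of scikit-learn, SelectKBest for feature selection, GridSearchCV for tuning, the parameter grid, and macro F1 as the metric are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the capstone data: 3 classes, deliberately imbalanced.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6, n_redundant=2,
    n_classes=3, weights=[0.1, 0.6, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline: multiclass logistic regression with default parameters.
baseline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
baseline.fit(X_train, y_train)
baseline_f1 = f1_score(y_test, baseline.predict(X_test), average="macro")

# Tuned: add univariate feature selection, then search over the number of
# features kept, the regularization strength C, and class weighting.
tuned = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {
    "select__k": [5, 10, 20],
    "clf__C": [0.1, 1.0, 10.0],
    "clf__class_weight": [None, "balanced"],
}
search = GridSearchCV(tuned, param_grid, scoring="f1_macro", cv=5)
search.fit(X_train, y_train)
tuned_f1 = f1_score(y_test, search.predict(X_test), average="macro")

print("baseline macro F1:", round(baseline_f1, 3))
print("tuned macro F1:   ", round(tuned_f1, 3))
print("best parameters:  ", search.best_params_)
```

Whether the tuned pipeline beats the baseline depends on the data, so compare the two scores rather than assume an improvement. The "balanced" class weight is one common way to handle an imbalanced target; resampling is another.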
What I Learned
The Data Science program claims to teach 8 fundamental skills that a data scientist needs. I completed the curriculum and obtained my data science certificate, but what did I really learn?
- Query relational data – I had worked with relational databases and SQL for many years, but my proficiency in the use of SQL has certainly improved.
- Analyze and visualize data – I had used pie and bar charts extensively, but learned to use other chart types and obtained new skills in data modeling and creating dashboards.
- Understand statistics – I had learned statistics, but this course made me understand statistics and why it is so important when you work with data.
- Explore data with code – I knew a bit of Python, but this course taught the basics thoroughly and introduced me to relevant packages like Pandas, Matplotlib and scikit-learn.
- Understand core data science concepts – data science is an evolving discipline, but I am now familiar with the core concepts and know how to apply them.
- Understand machine learning – I knew little or nothing about machine learning, but I now have a firm grasp of the principles and methods and know some of the key applications.
- Use code to manipulate and model data – I learned to manipulate data and apply machine learning with Python, but there’s more to be learned and constant practice is needed.
- Develop intelligent solutions – I learned to deploy and use an intelligent machine learning solution with Spark on Azure HDInsight, but not how to build it.
It’s debatable whether all these learned skills qualify me as a data scientist, and some even argue that you need to be in practice. I know a few people who practice badly, so I don’t hesitate to call myself a data scientist. I am ready to work hard and open to new learning.
Learning functional data science skills also enhanced my technical skills in tools like Excel, Power BI, and Azure ML. A qualified data scientist should not be constrained to one technology, but skills in these products form an adequate starting point. In addition, they form a good reference for exploring and using other products and technologies.
Completing the MPP Data Science track and earning my certificate has been a wonderful experience and I have no regrets about the time and money spent. I got particularly excited about machine learning and have started thinking about its application in various industries. Even so, I obtained all-round data science skills that can add value to almost any organization.
The MPP Data Science track is a good choice if you have the time, the money, good internet, and are ready to work with Microsoft products. There are many alternatives for learning data science, so you need to find out what’s on the market. Consider also your career stage and objectives. I had a long career in business and the geospatial industry, so for me this was a good choice. But there could be better options if you are looking for a first job as a Python programmer.
What struck me in this course is that there is an easy and a hard way to do data science. Azure ML offers Machine Learning as a Service (MLaaS) and I see this becoming a self-service tool for business executives and professionals. Python programming, on the other hand, gives greater control, but it’s tedious and a lot harder. Let’s see who wins this battle, or will they continue to live happily ever after?