Strategies and Techniques for Data Cleaning in Excel and Python

Strategies and Techniques for Data Cleaning in Excel and Python

Master essential data cleaning techniques and turn messy data into a valuable resource with effective cleaning techniques in Excel and Python.

⏰ 4 h 37 min | 11 lessons
Published: September 2023, (updated 1 year ago)
Access our growing course library
$ 20 / month (billed annually)
After completing this course, you will...
Know how to clean and analyse data in Excel more efficiently using Tidy Data Principles.
Understand how to use common techniques in Python to speed up your data-cleaning operations.
Understand how to clean and manipulate data using Pandas, one of the most popular data science libraries.
Be comfortable working through a complete data-cleaning pipeline from import to investigation, cleaning and export.
COURSE OVERVIEW

Clean data is data that does not require any transformation or updates prior to commencing analysis. If your data requires cleansing work prior to use, then you have messy data.

If you primarily work with Excel, working with messy data means you regularly find yourself carrying out tasks such as:

  1. Removing non printable characters such as line breaks.
  2. Removing leading and trailing spaces or extra spaces.
  3. Using the text to columns wizard to split data into separate columns.
  4. Populating blank cells or removing blank rows or columns.
  5. Identifying duplicate values or duplicate data.
  6. Using conditional formatting to highlight errors.
  7. Correcting the capitalisation of text.
  8. Using paste special to clear formats.

This list is far from exhaustive and messy data is commonplace. Every messy dataset encountered is messy in its own unique way and requires its own unique processes to clean. However, once all such issues are solved and we have clean data, we can progress to the interesting part of the job, the analysis.

In this course we'll work through the complete data-cleaning pipeline, from importing data, to cleaning and manipulating it, to exporting it for analysis.

We'll do this using both Excel and Python, and we'll see how Python can be used to speed up our data cleaning operations.

Basic data exploration in Python - digiLab Academy

Figure 1. Basic data exploration in Python

We'll start by focusing on the Tidy Data Principles. These principles are a set of rules that help us to structure our data in a way that makes it easy to clean, manipulate and analyse.

We'll then move on to look at some of the most common data cleaning techniques in Excel, and we'll see how we can use Python to speed up our data cleaning operations.

We'll then move on to look at how we can use Python to clean and manipulate data using Pandas, one of the most popular data science libraries.

When you complete this course, you'll have the ability to take real-world data and clean it for further analysis in your own data science projects.

πŸ€– Use AI to help you learn!

All digiLab Academy subscribers have access to an embedded AI tutor! This is great for...

Helping to clarify concepts and ideas that you don't fully understand after completing a lesson.
Explaining the code and algorithms covered during a lesson in more detail.
Generating additional examples of whatever is covered in a lesson.
Getting immediate feedback and support around the clock...when your course tutor is asleep!
Section 1
1. What Does Clean Data Look Like in Microsoft Excel? πŸ“‚
11:13 (Preview)
2. Supercharge Your Microsoft Excel Data Cleaning with Python πŸ“‚
20:15 (Preview)
3. How to Clean Data in Excel Using Python: The Basics πŸ“‚
31:26 (Preview)
4. Extracting Values From Free Text: Introduction to REGEX πŸ“‚
33:23
5. Manipulating and working with Pandas dataframes πŸ“‚
44:42
6. Managing Missing Values Part 1 πŸ“‚
21:25
7. Managing Missing Values Part 2 πŸ“‚
23:32
8. Obtaining Data Part 1: Working with Files πŸ“‚
21:34
9. Obtaining Data Part 2: Working with Web Data πŸ“‚
30:22
10. Case Study: Bringing it All Together πŸ“‚
39:25
11. Conclusion, Certificate, and What Next?
getting-started
Richard Warburton
Guest Contributor
Richard is a data specialist and communicator who helps audiences understand complex subjects and processes. He holds masters degrees in both Mathematics and Data Science and has 18 years of professional experience gained in the Telecoms and Finance sectors. In his spare time, he enjoys cycling, walking, traveling, cooking, and is currently learning the piano.
TRY THIS NEXT
Getting Started in Machine Learning
Getting Started in Machine Learning
Learn key machine learning concepts and build a strong foundation for future learning.