3 Data Cleaning Methods In Excel
There are a lot of different ways to clean your data in Excel. 🧹
As you know, clean data isn’t just about not being sloppy.
Your unclean data could go to support a million-dollar decision.
And without transparent data cleaning, you risk not being able to reproduce your work.
📣 So let’s talk about 3 data cleaning methods in Excel:
👉 The Classic - In Classic data cleaning, you apply an Excel filter to a table of data. Then, as you notice errors, you apply filters to each column. Perhaps you use Find / Replace to change some data. After enough changes, you’ll feel like you’ve hit the stopping point of a particular phase. So you copy that worksheet tab over to another and begin a new process of cleaning. This usually involves a lot of copying and pasting.
This method is unfortunately the most common cleaning method. It works great for small tasks and not mission-critical data; for instance, “can you find the average sales of this product real quick for me?” But for larger products, this method is highly error prone. And someone has to write down every step to ensure it’s done correctly each time.
👉 Formula Driven - The Formula driven approach is similar to the previous style, but it employs Excel formulas to drive much of the cleaning. If you use the new Excel formulas, you’ll get a nice degree of transparency. For instance, it’s rather obvious what TEXTJOIN() and UNIQUE() do. But functions like MID() and SUBSTITUTE() (which I always confuse with REPLACE()) are less clear. This method of cleaning will work best for small requests to intermediate / advanced Excel users.
👉 Power Query - This is by far the best data cleaning tool in Excel’s library. Power Query solves a lot of the problems mentioned previously—it keeps a series of transparent steps and its results are easily reproducible. If you do not know any Power Query, I would definitely recommend making it a learning goal. If you have LinkedIn Learning, check out my course: Data Wrangling in Excel with Power Query.
With Power Query, you can employ all manners of data cleaning.
👉 VBA Data Cleaning (BONUS!) - I don’t recommend this method. Lines of VBA code were preferred in the days before Power Query. But today, there is very little justification to use VBA for data cleaning. A clear exception might be if you're competing in the Financial Modeling World Cup and you have a clear vision on how to solve the problem with VBA. But outside of a time sensitive requirement, you should focus all your efforts on Power Query.
The truth is, we don’t discuss data cleaning enough.
Often it’s left to the analyst to use whatever skills they have at their disposal.
For best practices, let’s turn towards Power Query for data cleaning.