What is data wrangling?
Sorting, cleaning, and structuring raw data so that it becomes reliable and usable for analysis.
Why is data wrangling important?
Because decisions based on unreliable data lead to mistakes that can cost a company millions or even billions of dollars.
What are the six steps of data wrangling?
Discovery, Structuring, Cleaning, Enriching, Verifying, Publishing.
What is the Discovery stage?
Familiarizing yourself with the data to get an idea of its content and understand how it can be organized effectively.
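As a rough illustration of discovery, the sketch below profiles a small in-memory CSV: it lists the columns, counts the rows, and counts blank values per column. The sample data and the `profile` helper are hypothetical, invented for this example.

```python
import csv
import io

# Hypothetical sample data, invented for illustration.
RAW = """name,age,city
Ana,34,Lisbon
Ben,,Oslo
Cy,29,
"""

def profile(text):
    """Return column names, row count, and blank-value counts per column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys()) if rows else []
    blanks = {col: sum(1 for r in rows if not r[col].strip()) for col in columns}
    return {"columns": columns, "row_count": len(rows), "blanks": blanks}

summary = profile(RAW)
print(summary)
```

A summary like this is only a starting point; it shows where values are missing before any decision is made about how to organize the data.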
What is the Structuring stage?
Transforming data into a usable form, for example converting files to the same format so they are compatible with each other.
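One way structuring might look in practice: records arrive as both CSV and JSON, and each row is coerced into the same shape. The inputs, field names, and the `to_record` helper are assumptions made up for this sketch.

```python
import csv
import io
import json

# Hypothetical inputs: the same kind of record arriving in two formats.
CSV_TEXT = "id,amount\n1,10.50\n2,3.00\n"
JSON_TEXT = '[{"id": 3, "amount": "7.25"}]'

def to_record(raw):
    """Coerce one raw row into a common shape: int id, float amount."""
    return {"id": int(raw["id"]), "amount": float(raw["amount"])}

csv_rows = [to_record(r) for r in csv.DictReader(io.StringIO(CSV_TEXT))]
json_rows = [to_record(r) for r in json.loads(JSON_TEXT)]

# Both sources now share one structure and can be combined.
records = csv_rows + json_rows
print(records)
```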
What is the Cleaning stage?
Removing errors and irrelevant data so that the dataset is accurate.
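A minimal sketch of cleaning, assuming rows with an `email` and an `age` field (both hypothetical): whitespace is trimmed, rows with a missing email or non-numeric age are dropped, and duplicates are removed. Which rows count as errors depends on the dataset; the rules here are only examples.

```python
# Hypothetical raw rows, invented for illustration.
rows = [
    {"email": " ana@example.com ", "age": "34"},
    {"email": "ana@example.com", "age": "34"},   # duplicate after trimming
    {"email": "", "age": "29"},                  # missing email
    {"email": "ben@example.com", "age": "abc"},  # age is not a number
]

def clean(rows):
    """Trim whitespace, drop invalid rows, and drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        email = row["email"].strip().lower()
        age = row["age"].strip()
        if not email or not age.isdigit():
            continue  # drop rows with a missing email or non-numeric age
        key = (email, age)
        if key in seen:
            continue  # drop rows already seen
        seen.add(key)
        cleaned.append({"email": email, "age": int(age)})
    return cleaned

cleaned = clean(rows)
print(cleaned)
```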
What is the Enriching stage?
Filling in or supplementing weak or incomplete pieces of data, often from additional sources, to make the dataset stronger and more valuable.
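Enrichment could, for instance, mean filling a blank field from a second source. In the sketch below a lookup table supplies a missing country when the city is known. The table, the records, and the `enrich` helper are all hypothetical.

```python
# Hypothetical lookup table and records, invented for illustration.
CITY_TO_COUNTRY = {"Lisbon": "Portugal", "Oslo": "Norway"}

records = [
    {"name": "Ana", "city": "Lisbon", "country": ""},
    {"name": "Ben", "city": "Oslo", "country": "Norway"},
    {"name": "Cy", "city": "Quito", "country": ""},
]

def enrich(record):
    """Fill in a blank country from the lookup table when the city is known."""
    filled = dict(record)  # copy so the original record is left untouched
    if not filled["country"]:
        filled["country"] = CITY_TO_COUNTRY.get(filled["city"], "")
    return filled

enriched = [enrich(r) for r in records]
print(enriched)
```

Records whose city is not in the lookup table keep a blank country rather than receiving a guessed value.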
What is the Verifying stage?
Checking that the data is accurate and consistent, since incorrect data can lead to costly mistakes.
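Verification can be sketched as a set of explicit checks that report problems instead of silently passing bad data along. The two rules below (unique ids, ages between 0 and 120) and the `verify` helper are assumptions chosen for illustration; real checks depend on the dataset.

```python
def verify(records):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    ids = [r["id"] for r in records]
    if len(ids) != len(set(ids)):
        problems.append("duplicate ids")
    for r in records:
        if not 0 <= r["age"] <= 120:
            problems.append(f"id {r['id']}: age {r['age']} out of range")
    return problems

# Hypothetical records, invented for illustration.
good = [{"id": 1, "age": 34}, {"id": 2, "age": 29}]
bad = [{"id": 1, "age": 34}, {"id": 1, "age": 250}]

print(verify(good))
print(verify(bad))
```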
What is the Publishing stage?
Finalizing the wrangled data so it is ready to be used or shared effectively.
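Publishing often comes down to writing the wrangled records in a format others can consume. The sketch below writes CSV to a file-like object; the `publish_csv` helper and the sample records are hypothetical, and an in-memory buffer stands in for a real output file.

```python
import csv
import io

def publish_csv(records, out):
    """Write records to the file-like object `out` as CSV with a header row."""
    fieldnames = list(records[0].keys())
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)

# Hypothetical wrangled records, invented for illustration.
records = [{"id": 1, "amount": 10.5}, {"id": 2, "amount": 3.0}]

# The buffer stands in for a real file opened with open(path, "w", newline="").
buffer = io.StringIO()
publish_csv(records, buffer)
print(buffer.getvalue())
```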