3.4 Cleaning Data
Always prefer to clean the data programmatically, because manual cleaning is more error-prone and harder to reproduce.
These are the steps of data cleaning:
- Define: write down a data cleaning plan describing what must be fixed;
- Code: convert the data cleaning plan into code;
- Test: evaluate the output of the code to confirm the fix worked.
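The three steps above can be sketched in pandas as follows. This is only a minimal illustration: the data frame, its column names, and the "cm" suffix issue are hypothetical and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical raw data: 'height' stores numbers as text with a unit suffix.
raw = pd.DataFrame({"name": ["Ana", "Bob"], "height": ["170cm", "182cm"]})

# Define: strip the "cm" suffix from 'height' and convert the column to integers.

# Code: work on a copy so the raw data stays untouched.
clean = raw.copy()
clean["height"] = clean["height"].str.replace("cm", "", regex=False).astype(int)

# Test: check that the result matches what the plan describes.
assert clean["height"].tolist() == [170, 182]
assert pd.api.types.is_integer_dtype(clean["height"])
```

Working on a copy keeps the original data available, so the test step can always compare the cleaned result against the raw input.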
3.4.1 Tidiness
Tidiness refers to the tidy data standard advocated by Hadley Wickham.
Usually, tidiness issues are the first to be solved.
3.4.2 Quality
After the tidiness issues are fixed, the quality issues can be addressed.
3.4.3 Methods
3.4.3.1 .melt()
Converts a data frame from wide format to long format. It is the equivalent of the gather function from the tidyr R package; spread performs the opposite conversion (long to wide).
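A short sketch of the wide-to-long conversion is shown here. The data frame and the names "country", "year", and "value" are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical wide table: one column per year.
wide = pd.DataFrame({
    "country": ["Brazil", "Japan"],
    "2019": [10, 20],
    "2020": [11, 21],
})

# Keep 'country' as the identifier; every other column becomes a
# (year, value) pair, producing one row per country and year.
long = wide.melt(id_vars="country", var_name="year", value_name="value")

print(long)
```

Here id_vars names the columns that stay as they are, while var_name and value_name label the two new columns that hold the former column headers and their values.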
A work by AH Uyekita
anderson.uyekita[at]gmail.com