3.4 Cleaning Data

Always prefer to clean the data programmatically, because manual cleaning is more error-prone.

These are the steps of data cleaning:

  • Define: Define a data cleaning plan (usually in writing);
  • Code: Convert the data cleaning plan into code;
  • Test: Evaluate the output of the code.
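
The three steps above can be sketched as follows. This is only a minimal illustration, assuming pandas is used for the cleaning; the DataFrame, its column names, and its values are hypothetical and not taken from any real dataset.

```python
import pandas as pd

# Hypothetical raw data (names and values are made up for illustration).
raw = pd.DataFrame({
    "name": ["Ana", "Bob", "Cid"],
    "height": ["170cm", "182cm", "165cm"],
})

# Define: "height" is stored as text with a "cm" suffix.
#         Plan: strip the suffix and convert the column to integers.

# Code: work on a copy so the raw data stays untouched.
clean = raw.copy()
clean["height"] = clean["height"].str.replace("cm", "", regex=False).astype(int)

# Test: check that the output matches the plan.
assert pd.api.types.is_integer_dtype(clean["height"])
assert clean["height"].tolist() == [170, 182, 165]
```

Working on a copy keeps the original data available, so the Test step can always compare the cleaned result against the raw input.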

3.4.1 Tidiness

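Tidiness follows the tidy data standard advocated by Hadley Wickham: each variable forms a column, each observation forms a row, and each type of observational unit forms a table.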

Usually, the tidiness issues are the first to be solved.

3.4.2 Quality

After the tidiness issues are fixed, the quality issues can be addressed.

3.4.3 Methods

3.4.3.1 .melt()

Converts data from a wide format to a long format. It is the counterpart of the gather function from the tidyr R package (spread does the reverse, from long to wide).
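
A small example may make the wide-to-long conversion clearer. This is a sketch using pandas `DataFrame.melt`; the table, the column names ("city", "year", "sales"), and the values are hypothetical.

```python
import pandas as pd

# Hypothetical wide table: one row per city, one column per year.
wide = pd.DataFrame({
    "city": ["Lima", "Oslo"],
    "2019": [10, 20],
    "2020": [11, 21],
})

# Melt the year columns into two columns: "year" (the old column
# name) and "sales" (the old cell value). "city" is kept as the
# identifier column.
long = wide.melt(
    id_vars="city",
    value_vars=["2019", "2020"],
    var_name="year",
    value_name="sales",
)

print(long)
```

The result has one row per city and year pair (four rows here) with the columns city, year, and sales. The reverse operation, from long to wide, can be done with `DataFrame.pivot`.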

Good Video - Explaining the melt

 

A work by AH Uyekita

anderson.uyekita[at]gmail.com