Chapter 1 eq_clean_data

This function has two behaviours:

  1. When you pass the path of a file to load, and;
# Loading the 'signif.txt' file.
eq_clean_data(file_name = system.file("extdata", "signif.txt", package = "msdr"))
  2. When you pipe in a dataset that is already loaded.
# Piping a dataset that is already loaded.
readr::read_delim(system.file("extdata", "signif.txt", package = "msdr"),
                  delim = "\t") %>% eq_clean_data()
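The two behaviours can be illustrated with a small base R sketch. The function clean_sketch() below is hypothetical and is not the msdr implementation; it only shows one way a single function can accept either a file_name or a data frame as its first argument, which is what allows %>% to pipe data into it.

```r
# Hypothetical sketch, NOT msdr's eq_clean_data(): it only illustrates how one
# function can support both a file path and a piped-in data frame.
clean_sketch <- function(data = NULL, file_name = NULL) {
  if (!is.null(file_name)) {
    # Behaviour 1: a file path was given, so read the tab-delimited file here.
    data <- utils::read.delim(file_name, stringsAsFactors = FALSE)
  }
  if (!is.data.frame(data)) {
    stop("Provide either a data frame or a file_name.")
  }
  # The real cleaning steps would go here; the sketch returns the data as is.
  data
}

# Toy tab-delimited file standing in for signif.txt.
tmp <- tempfile(fileext = ".txt")
writeLines(c("YEAR\tMONTH\tDAY", "2000\t1\t15", "2001\t2\t20"), tmp)

from_file <- clean_sketch(file_name = tmp)          # behaviour 1
from_data <- clean_sketch(utils::read.delim(tmp))   # behaviour 2
```

Because data is the first argument, x %>% clean_sketch() passes x to data, exactly like the second call.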

1.1 Loading the data

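This function also loads the NOAA earthquake database (signif.txt, shipped in the package's extdata folder). Below, the same file is read directly with readr so the raw data can be inspected before cleaning.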

# Path to the raw data.
raw_data_path <- system.file("extdata", "signif.txt", package = "msdr")

# Loading the earthquake dataset.
df <- readr::read_delim(file = raw_data_path,      
                        delim = '\t',              
                        col_names = TRUE,          
                        progress = FALSE,           
                        col_types = readr::cols())

# Printing the first 6 rows (select() comes from dplyr, kable() from knitr).
library(dplyr)
head(df) %>%
  select(I_D, YEAR, LOCATION_NAME, EQ_PRIMARY, TOTAL_DEATHS) %>%
  knitr::kable()
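| I_D  | YEAR  | LOCATION_NAME                    | EQ_PRIMARY | TOTAL_DEATHS |
|-----:|------:|:---------------------------------|-----------:|-------------:|
| 1    | -2150 | JORDAN: BAB-A-DARAA,AL-KARAK     | 7.3        | NA           |
| 3    | -2000 | TURKMENISTAN: W                  | 7.1        | 1            |
| 2    | -2000 | SYRIA: UGARIT                    | NA         | NA           |
| 5877 | -1610 | GREECE: THERA ISLAND (SANTORINI) | NA         | NA           |
| 8    | -1566 | ISRAEL: ARIHA (JERICHO)          | NA         | NA           |
| 11   | -1450 | ITALY: LACUS CIMINI              | NA         | NA           |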

As you can see, several observations have NA values, for example in EQ_PRIMARY and TOTAL_DEATHS.
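To put a number on "several", the missing values can be counted per column, e.g. colSums(is.na(df)) on the full dataset. The base R sketch below applies the same idea to the EQ_PRIMARY and TOTAL_DEATHS values of the six rows printed above, typed in by hand.

```r
# EQ_PRIMARY and TOTAL_DEATHS for the six rows shown in the table above.
first_rows <- data.frame(
  EQ_PRIMARY   = c(7.3, 7.1, NA, NA, NA, NA),
  TOTAL_DEATHS = c(NA, 1, NA, NA, NA, NA)
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column.
na_counts <- colSums(is.na(first_rows))
print(na_counts)
```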

1.2 Creating new features

eq_clean_data creates the DATE variable by combining the YEAR, MONTH, and DAY columns with the lubridate package.

# Creating a new feature.
df <- df %>%
       mutate(DATE = lubridate::ymd(paste(YEAR,      # YEAR column
                                          MONTH,     # MONTH column
                                          DAY,       # DAY column
                                          sep = "/")))  # YYYY/MM/DD
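Pasting the three columns together means that any missing part produces a string that cannot be parsed as a date. The sketch below uses base R as.Date() with an explicit format as a dependency-free stand-in for lubridate::ymd() (the two parsers are not identical, so treat it as illustrative only). It uses toy values: a complete date, a row with missing MONTH and DAY, and a BC year written as a negative number, like the YEAR values in the table above. The last two come back as NA, which is the kind of row that ends up without a DATE.

```r
# Toy YEAR/MONTH/DAY values: complete, missing month/day, and a BC year.
years  <- c(2004, 2000, -2150)
months <- c(12, NA, 1)
days   <- c(26, NA, 1)

# Same pasting step as above: "2004/12/26", "2000/NA/NA", "-2150/1/1".
date_strings <- paste(years, months, days, sep = "/")

# %Y only accepts years 0-9999 on input, so the negative year fails as well.
dates <- as.Date(date_strings, format = "%Y/%m/%d")
print(dates)
```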

1.3 Conversion Process

I have converted some features:

  • TOTAL_DEATHS to numeric;
  • EQ_PRIMARY to numeric, and;
  • all NA values of TOTAL_DEATHS to zeros.
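The three conversions listed above could be written in base R as follows. This is a sketch, not the msdr source: it assumes both columns arrive as character strings, and it reuses values from the table in section 1.1.

```r
# Assumed input: both columns read as character strings.
raw <- data.frame(
  EQ_PRIMARY   = c("7.3", "7.1", NA),
  TOTAL_DEATHS = c(NA, "1", NA),
  stringsAsFactors = FALSE
)

converted <- raw
# TOTAL_DEATHS and EQ_PRIMARY to numeric.
converted$EQ_PRIMARY   <- as.numeric(converted$EQ_PRIMARY)
converted$TOTAL_DEATHS <- as.numeric(converted$TOTAL_DEATHS)
# Replace the NA values of TOTAL_DEATHS (and only that column) with 0.
converted$TOTAL_DEATHS[is.na(converted$TOTAL_DEATHS)] <- 0

print(converted)
```

Note that EQ_PRIMARY keeps its NA values; only TOTAL_DEATHS is zero-filled, so an unknown death toll is counted as zero deaths.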

1.4 Cleaning Process

I have removed:

  • All observations flagged as Tsunami, and;
  • All observations with a missing DATE.
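Both removal rules can be expressed as a single logical index in base R. The column name FLAG_TSUNAMI, the convention that a non-missing value marks a tsunami event, and the toy data are all assumptions made for this sketch; the check on DATE mirrors the second rule.

```r
# Toy data; FLAG_TSUNAMI is an assumed column name (non-NA = tsunami event).
quakes <- data.frame(
  ID           = 1:4,
  FLAG_TSUNAMI = c(NA, "Tsu", NA, NA),
  DATE         = as.Date(c("2000-01-01", "2001-06-15", NA, "2002-03-10")),
  stringsAsFactors = FALSE
)

# Keep rows that are not flagged as tsunamis AND have a known DATE.
keep <- is.na(quakes$FLAG_TSUNAMI) & !is.na(quakes$DATE)
cleaned <- quakes[keep, ]
print(cleaned)
```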

1.5 Example 1

How to load a txt file.

# Load the package
library(msdr)

# Pass the path of the txt file as file_name.
df <- eq_clean_data(file_name = raw_data_path)

# Dimensions of the loaded dataframe.
dim(df)
#> [1] 2840   49

1.6 Example 2

Piping a dataset into eq_clean_data.

# Load the package
library(msdr)

# Piping the output of readr::read_delim() into eq_clean_data().
df <- readr::read_delim(raw_data_path, delim = "\t") %>%
  eq_clean_data()

# Dimensions of the loaded dataframe.
dim(df)
#> [1] 2840   49