After you have acquired the data, you should diagnose its quality, explore it, and derive or transform variables as needed. The dlookr package makes these steps fast and easy.
dlookr increases synergy with dplyr. Particularly in data exploration and data wrangling, it increases the efficiency of the tidyverse package group.
Data diagnosis supports the following data structures.

| Tasks | Descriptions | Functions | Support DBI |
|---|---|---|---|
| describe overview of data | inquire basic information to understand the data in general | `overview()` | |
| summary overview object | summarize the described overview of data | `summary.overview()` | |
| plot overview object | plot the described overview of data | `plot.overview()` | |
| diagnose data quality of variables | information on missing values and unique values | `diagnose()` | x |
| diagnose data quality of categorical variables | frequency, ratio, and rank by levels of each variable | `diagnose_category()` | x |
| diagnose data quality of numerical variables | descriptive statistics; number of zeros, negative values, and outliers | `diagnose_numeric()` | x |
| diagnose data quality for outliers | number of outliers, ratio, mean of outliers, mean with outliers, mean without outliers | `diagnose_outlier()` | x |
| plot outlier information of numerical data | box plot and histogram with and without outliers | `plot_outlier.data.frame()` | x |
| plot outlier information of numerical data by target variable | box plot and density plot with and without outliers | `plot_outlier.target_df()` | x |
| diagnose combination of categorical variables | check for sparse cases of level combinations of categorical variables | `diagnose_sparese()` | |
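
As a minimal sketch of the diagnosis functions listed above, the calls below run on `airquality`, a dataset that ships with base R and contains missing values. The choice of dataset and columns is an assumption made for illustration only.

```r
library(dlookr)

# One row per variable: type, missing count/percent, unique count/rate
diagnose(airquality)

# Restrict the diagnosis to selected columns
diagnose(airquality, Ozone, Solar.R)

# Descriptive statistics plus counts of zeros, negative values, and outliers
diagnose_numeric(airquality)

# Outlier counts and means with and without outliers
diagnose_outlier(airquality)
```

Each call returns a table, so the results can be filtered or sorted further with dplyr verbs.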

| Tasks | Descriptions | Functions | Support DBI |
|---|---|---|---|
| Pareto chart for missing values | visualize a Pareto chart of the variables that have missing values | `plot_na_pareto()` | |
| combination chart for missing values | visualize the distribution of missing values by combining variables | `plot_na_hclust()` | |
| plot the combinations of variables that include missing values | visualize the combinations of missing values across cases | `plot_na_intersect()` | |

| Types | Descriptions | Functions | Support DBI |
|---|---|---|---|
| report the information of data diagnosis into a PDF file | report the information for diagnosing the quality of the data | `diagnose_report()` | x |
| report the information of data diagnosis into an HTML file | report the information for diagnosing the quality of the data | `diagnose_report()` | x |
| report the information of data diagnosis into an HTML file | dynamic report of the information for diagnosing the quality of the data | `diagnose_web_report()` | x |
| report the information of data diagnosis into PDF and HTML files | paged report of the information for diagnosing the quality of the data | `diagnose_paged_report()` | x |

| Types | Tasks | Descriptions | Functions | Support DBI |
|---|---|---|---|---|
| categorical | summaries | frequency tables | `univar_category()` | |
| categorical | summaries | chi-squared test | `summary.univar_category()` | |
| categorical | visualize | bar charts | `plot.univar_category()` | |
| categorical | visualize | bar charts | `plot_bar_category()` | |
| numerical | summaries | descriptive statistics | `describe()` | x |
| numerical | summaries | descriptive statistics | `univar_numeric()` | |
| numerical | summaries | descriptive statistics of standardized variable | `summary.univar_numeric()` | |
| numerical | visualize | histogram, box plot | `plot.univar_numeric()` | |
| numerical | visualize | Q-Q plots | `plot_qq_numeric()` | |
| numerical | visualize | box plot | `plot_box_numeric()` | |
| numerical | visualize | histogram | `plot_hist_numeric()` | |
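
The numerical summaries in the table above can be sketched as follows. The `mtcars` dataset and the chosen columns are assumptions for illustration; the grouped call relies on dplyr's `group_by()`.

```r
library(dlookr)
library(dplyr)

# Descriptive statistics for selected numerical variables
describe(mtcars, mpg, hp, wt)

# The same statistics computed within each group
mtcars %>%
  group_by(cyl) %>%
  describe(mpg)
```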

| Types | Tasks | Descriptions | Functions | Support DBI |
|---|---|---|---|---|
| categorical | summaries | frequency tables across cases | `compare_category()` | |
| categorical | summaries | contingency tables, chi-squared test | `summary.compare_category()` | |
| categorical | visualize | mosaic plot | `plot.compare_category()` | |
| numerical | summaries | correlation coefficient, linear model summaries | `compare_numeric()` | |
| numerical | summaries | correlation coefficient, linear model summaries with threshold | `summary.compare_numeric()` | |
| numerical | visualize | scatter plot with marginal box plot | `plot.compare_numeric()` | |
| numerical | correlate | correlation coefficient | `correlate()` | x |
| numerical | correlate | summaries with correlation matrix | `summary.correlate()` | x |
| numerical | correlate | visualization of a correlation matrix | `plot.correlate()` | x |
| both | PPS | PPS (Predictive Power Score) | `pps()` | x |
| both | PPS | summaries with PPS | `summary.pps()` | x |
| both | PPS | visualization of a PPS matrix | `plot.pps()` | x |

| Types | Tasks | Descriptions | Functions | Support DBI |
|---|---|---|---|---|
| numerical | summaries | Shapiro-Wilk normality test | `normality()` | x |
| numerical | summaries | normality diagnosis plot (histogram, Q-Q plots) | `plot_normality()` | x |
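
A short sketch of the normality functions above, again using `mtcars` as an assumed example dataset:

```r
library(dlookr)

# Shapiro-Wilk test statistic and p-value for each selected variable
normality(mtcars, mpg, hp)

# Histogram and Q-Q plot of the raw and transformed variable
plot_normality(mtcars, mpg)
```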

| Target Variable | Predictor | Descriptions | Functions | Support DBI |
|---|---|---|---|---|
| categorical | categorical | contingency tables | `relate()` | x |
| categorical | categorical | mosaic plot | `plot.relate()` | x |
| categorical | numerical | descriptive statistics for each level and for all observations | `relate()` | x |
| categorical | numerical | density plot | `plot.relate()` | x |
| categorical | categorical | bar charts | `plot_bar_category()` | |
| numerical | categorical | ANOVA test | `relate()` | x |
| numerical | categorical | box plot | `plot.relate()` | x |
| numerical | numerical | simple linear model | `relate()` | x |
| numerical | numerical | scatter plot | `plot.relate()` | x |
| categorical | numerical | Q-Q plots | `plot_qq_numeric()` | |
| categorical | numerical | box plot | `plot_box_numeric()` | |
| categorical | numerical | histogram | `plot_hist_numeric()` | |
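
The target-based functions above can be sketched as below. This example assumes the target variable is first declared with `target_by()`, a dlookr function that is not listed in the table, and uses the base R `iris` dataset purely for illustration.

```r
library(dlookr)

# Declare Species as the (categorical) target variable
categ <- target_by(iris, Species)

# Relate the target to a numerical predictor:
# descriptive statistics for each level and for all observations
cat_num <- relate(categ, Sepal.Length)
cat_num

# Density plot of Sepal.Length by Species
plot(cat_num)
```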

| Types | Descriptions | Functions | Support DBI |
|---|---|---|---|
| report the information of EDA into a PDF file | report the information of EDA | `eda_report()` | x |
| report the information of EDA into an HTML file | report the information of EDA | `eda_report()` | x |
| report the information of EDA into an HTML file | dynamic report of the information of EDA | `eda_web_report()` | x |
| report the information of EDA into PDF and HTML files | paged report of the information of EDA | `eda_paged_report()` | x |

| Types | Descriptions | Functions | Support DBI |
|---|---|---|---|
| missing values | find the variables that contain missing values in an object that inherits from data.frame | `find_na()` | |
| outliers | find the numerical variables that contain outliers in an object that inherits from data.frame | `find_outliers()` | |
| skewed variable | find the numerical variables that are skewed in an object that inherits from data.frame | `find_skewness()` | |
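
A minimal sketch of the finder functions above. The `airquality` dataset is an assumed example; the `index` and `rate` arguments are used as I understand them from the package documentation.

```r
library(dlookr)

# Positions of the columns that contain missing values
find_na(airquality)

# Column names instead of positions
find_na(airquality, index = FALSE)

# Percentage of missing values for every column
find_na(airquality, rate = TRUE)

# Names of numerical columns with outliers or with skewed distributions
find_outliers(airquality, index = FALSE)
find_skewness(airquality, index = FALSE)
```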

| Types | Descriptions | Functions | Support DBI |
|---|---|---|---|
| missing values | missing values are imputed with representative values or statistical methods | `imputate_na()` | |
| outliers | outliers are imputed with representative values or statistical methods | `imputate_outlier()` | |
| summaries | calculate descriptive statistics of the original and imputed values | `summary.imputation()` | |
| visualize | the imputation of a numerical variable is shown as a density plot, and the imputation of a categorical variable as a bar plot | `plot.imputation()` | |
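
The imputation workflow above might look like the following sketch. The dataset, the columns, and the chosen methods are assumptions; `Temp` is passed as the target variable argument even though the median method should not need it.

```r
library(dlookr)

# Replace missing Ozone values with the median of the observed values
ozone_imp <- imputate_na(airquality, Ozone, Temp, method = "median")

# Compare the distribution before and after imputation
summary(ozone_imp)
plot(ozone_imp)

# Cap outliers in Wind instead of removing them
wind_imp <- imputate_outlier(airquality, Wind, method = "capping")
summary(wind_imp)
```

The returned objects are vectors, so they can be assigned back to a column, for example inside `dplyr::mutate()`.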

| Types | Descriptions | Functions | Support DBI |
|---|---|---|---|
| binning | converts a numeric variable to a categorical variable | `binning()` | |
| summaries | calculates the frequency and relative frequency for each level (bin) | `summary.bins()` | |
| visualize | visualizes two plots on a single screen. The plot at the top is a histogram representing the frequency of the level. The plot at the bottom is a bar chart representing the frequency of the level. | `plot.bins()` | |
| optimal binning | categorizes a numeric characteristic into bins for later use in scoring modeling | `binning_by()` | |
| summaries | summary metrics to evaluate the performance of a binomial classification model | `summary.optimal_bins()` | |
| visualize | generates plots to understand the distribution, bad rate, and weight of evidence after running `binning_by()` | `plot.optimal_bins()` | |
| infogain binning | categorizes a numeric characteristic into bins for multiclass variables using recursive information gain ratio maximization | `binning_rgr()` | |
| visualize | generates plots to understand the distribution and the distribution by target variable after running `binning_rgr()` | `plot.infogain_bins()` | |
| evaluate | calculates metrics to evaluate the performance of a binned variable for a binomial classification model | `performance_bin()` | |
| summaries | summary metrics to evaluate the performance of a binomial classification model after `performance_bin()` | `summary.performance_bin()` | |
| visualize | generates plots to understand frequency and WoE by bins using `performance_bin()` after running `binning_by()` | `plot.performance_bin()` | |
| extract | extracts bins from "bins" and "optimal_bins" objects | `extract.bins()` | |
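
As a small sketch of plain (unsupervised) binning from the table above, assuming `mtcars$mpg` as example input and the default binning type:

```r
library(dlookr)

# Split mpg into four bins
bin <- binning(mtcars$mpg, nbins = 4)

# Frequency and relative frequency per bin
summary(bin)

# Histogram on top, bar chart of bin frequencies below
plot(bin)
```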

| Types | Descriptions | Functions | Support DBI |
|---|---|---|---|
| diagnosis | calculates metrics to evaluate the performance of a binned variable for a binomial classification model | `performance_bin()` | |
| summaries | summary method for "performance_bin"; summary metrics to evaluate the performance of the binomial classification model | `summary.performance_bin()` | |
| visualize | visualizes frequency and WoE by bins using `performance_bin()` | `plot.performance_bin()` | |

| Types | Descriptions | Functions | Support DBI |
|---|---|---|---|
| transformation | performs variable transformation for standardization and for resolving skewness of numerical variables | `transform()` | |
| summaries | compares the distribution of data before and after data transformation | `summary.transform()` | |
| visualize | visualizes two kinds of plots by attribute of the "transform" class; the transformation of a numerical variable is shown as a density plot | `plot.transform()` | |
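
A brief sketch of the transformation functions above. The function is called with the `dlookr::` prefix because base R also provides a function named `transform()`; the dataset and the `"log"` method are assumptions for illustration.

```r
library(dlookr)

# Log-transform a right-skewed variable
hp_log <- dlookr::transform(mtcars$hp, method = "log")

# Compare the distribution before and after the transformation
summary(hp_log)
plot(hp_log)
```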

| Types | Descriptions | Functions | Support DBI |
|---|---|---|---|
| report the information of transformation into a PDF file | report the information of transformation | `transformation_report()` | |
| report the information of transformation into an HTML file | report the information of transformation | `transformation_report()` | |
| report the information of transformation into an HTML file | dynamic report of the transformation information | `transformation_web_report()` | |
| report the information of transformation into PDF and HTML files | paged report of the information of transformation | `transformation_paged_report()` | |

| Types | Descriptions | Functions | Support DBI |
|---|---|---|---|
| statistics | calculate the entropy | `entropy()` | |
| statistics | calculate the skewness of the data | `skewness()` | |
| statistics | calculate the kurtosis of the data | `kurtosis()` | |
| statistics | calculate the Jensen-Shannon divergence between two probability distributions | `jsd()` | |
| statistics | calculate the Kullback-Leibler divergence between two probability distributions | `kld()` | |
| statistics | calculate the Cramer's V statistic between two categorical (discrete) variables | `cramer()` | |
| statistics | calculate the Theil's U statistic between two categorical (discrete) variables | `theil()` | |
| statistics | find the percentile of a numerical variable | `get_percentile()` | |
| statistics | transform a numeric vector using one of several methods: `"log"`, `"sqrt"`, `"log+1"`, `"log+a"`, `"1/x"`, `"x^2"`, `"x^3"`, `"Box-Cox"`, `"Yeo-Johnson"` | `get_transform()` | |
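
The statistics helpers above can be tried on small inputs. In this sketch the probability vectors `p` and `q` are made-up illustrative values, and `mtcars$hp` is an assumed example variable.

```r
library(dlookr)

# Shape of a distribution
skewness(mtcars$hp)
kurtosis(mtcars$hp)

# Two discrete probability distributions over the same four outcomes
p <- c(0.1, 0.2, 0.3, 0.4)
q <- rep(0.25, 4)

entropy(p)   # entropy of p
kld(p, q)    # Kullback-Leibler divergence of p from q
jsd(p, q)    # Jensen-Shannon divergence between p and q
```

The Kullback-Leibler divergence is not symmetric, so `kld(p, q)` and `kld(q, p)` generally differ, whereas the Jensen-Shannon divergence is symmetric.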

| Types | Descriptions | Functions | Support DBI |
|---|---|---|---|
| programming | extracts the variables of a certain class from an object that inherits from data.frame | `find_class()` | |
| programming | gets the class of the variables in a data.frame or tbl_df | `get_class()` | |
| programming | retrieves the column information of a DBMS table through the tbl_dbi object of dplyr | `get_column_info()` | |
| programming | finds the OS of the user's machine | `get_os()` | |
| programming | imports Google fonts | `import_google_font()` | |