Basic Summarization Operation
Introduction
The summarize operation collapses the rows of a dataframe into a single row of summary values, one value per summary column. The syntax for summarize is:
summarize(<dataframe>, <variable_name> = <function_applied_to_column_to_summarize>)
The summarize operation does not modify its input. Instead, it returns a new dataframe that holds the result of the operation.
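As a minimal sketch of this behaviour, the snippet below summarizes a small dataframe and then inspects the input afterwards. The dataframe `scores`, its column `points`, and the summary name `total_points` are hypothetical names chosen only for illustration.

```r
library(tibble)
library(dplyr)

# Hypothetical example data, used only for this illustration
scores <- tibble(points = c(2, 4, 6))

# summarize() returns a new dataframe with one row
total <- summarize(scores, total_points = sum(points))

# The input is unchanged: it still has 3 rows and the single column `points`
nrow(scores)        # 3
names(scores)       # "points"

# The result is a separate one-row dataframe
nrow(total)         # 1
total$total_points  # 12
```

To keep the summary, assign the return value to a name, as done with `total` here. Calling summarize without assignment only prints the result.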
Procedure
We will be working with a custom dataframe.
# package for creating dataframe
library(tibble)
# tibble or dataframe
df <- tibble(col1 = as.integer(c(1, 2, 3, 4, 5)),
             col2 = c(11, 12, 13, 14, 15))
View(df)
The dataframe contains the following rows:
col1  col2
1     11
2     12
3     13
4     14
5     15
We will use the summarize operation to:
- find the mean of col1
- find the median of col2
Code
# see the Procedure section for the definition of df
library(dplyr)
result <- dplyr::summarize(df, mean_value = mean(col1), median_value = median(col2))
View(result)
The output of the above code is:
mean_value  median_value
3           13
Here, 3 is the mean of col1 and 13 is the median of col2.
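One way to check these values is to compare them with the base R functions applied directly to the columns. The following is a sketch that rebuilds the same df so it can run on its own, and uses stopifnot as an assumed way of expressing the check:

```r
library(tibble)
library(dplyr)

# Same dataframe as in the Procedure section
df <- tibble(col1 = as.integer(c(1, 2, 3, 4, 5)),
             col2 = c(11, 12, 13, 14, 15))

result <- summarize(df, mean_value = mean(col1), median_value = median(col2))

# Each summary column should match the base R function applied to the raw column
stopifnot(result$mean_value == mean(df$col1))        # (1 + 2 + 3 + 4 + 5) / 5 = 3
stopifnot(result$median_value == median(df$col2))    # middle of 11..15 = 13
```

If both checks pass, the script finishes silently. A mismatch would stop with an error that names the failing comparison.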
Conclusion
Thus, we have successfully used the basic summarize function from the tidyverse.
References
- https://r4ds.had.co.nz/