📦Boxplot with R base graphics

Boxplot

A boxplot is a simple way to visualize the distribution of a continuous variable. It is often used to quickly summarize and compare several groups of data. The box covers the middle 50% of the data, and the line inside the box marks the median. Whiskers extend from the box to the smallest and largest values that are not outliers; by default, R treats points lying more than 1.5 times the box length beyond the box as outliers and plots them as individual points outside the whiskers. Together, these elements show the center, spread, and skewness of the data.
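As a rough illustration of the elements described above, the sketch below extracts the numbers that a boxplot is drawn from. It assumes the built-in iris data set and uses boxplot.stats() from base R; the variable names x, s, and box_width are only illustrative.

```r
# Numbers behind a boxplot of one variable, using the built-in iris data
x <- iris$Sepal.Length

# boxplot.stats() returns the values that boxplot() draws
s <- boxplot.stats(x)

s$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out    # points beyond the whiskers, drawn individually as outliers

# Length of the box, i.e. the spread of the middle 50% of the data
box_width <- s$stats[4] - s$stats[2]
```

Comparing the distance from the median to each hinge gives a quick hint about skewness: a median closer to the lower hinge suggests a longer upper tail.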

Structure of a boxplot (source)
Skewness in boxplot (source)

We use a boxplot to see how a continuous variable varies across the levels of a categorical variable.

Create a basic boxplot

# Load the built-in iris data set
data(iris)

# Create a basic boxplot
boxplot(iris$Sepal.Length~iris$Species)

The code above creates a boxplot with R's boxplot() function. Its first argument is a formula, iris$Sepal.Length ~ iris$Species. The left-hand side, iris$Sepal.Length, is the continuous variable to plot; the right-hand side, iris$Species, is the factor (grouping variable) that splits the data into categories. The resulting plot shows the distribution of Sepal.Length for each of the three iris species, which are the levels of the grouping variable.

Output:

A basic boxplot
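The same plot can also be written with the data argument of boxplot(), so that the columns are referenced by name inside the formula. The sketch below is one possible variant, not the only way to do it; the names bp and med are illustrative.

```r
# Same plot, with column names resolved inside the iris data frame
bp <- boxplot(Sepal.Length ~ Species, data = iris)

# boxplot() invisibly returns the statistics it has drawn
bp$names  # group labels, one per box
bp$stats  # one column per group: whisker, hinge, median, hinge, whisker

# Median per species, matching the line inside each box
med <- tapply(iris$Sepal.Length, iris$Species, median)
```

Storing the return value is optional; it is useful when the plotted numbers are needed later, for example to annotate the plot.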

Change labels and colors of the boxplot:

boxplot(iris$Sepal.Length~iris$Species, xlab = "Species", ylab = "Sepal Length",
        main = "Sepal Length of Different Species", col = "darkorchid3")

Output:

A boxplot with labels and color
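The col argument also accepts a vector, which gives each box its own fill color. The sketch below assumes three colors chosen only for illustration (any names listed by colors() would work); species_cols is a hypothetical variable name.

```r
# One fill color per species; colors are assigned to the boxes in order
species_cols <- c("darkorchid3", "goldenrod2", "seagreen3")

boxplot(Sepal.Length ~ Species, data = iris,
        xlab = "Species", ylab = "Sepal Length",
        main = "Sepal Length of Different Species",
        col = species_cols)
```

Distinct colors can make the groups easier to tell apart when the plot is reused next to other figures of the same species.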

