🏀Scatter Plot by R base graph
Scatter plot:
Let's create a scatter plot:
# Data
x <- mtcars$wt
y <- mtcars$mpg
# Create the plot
plot(x, y, main = "Main title",
     xlab = "X axis title", ylab = "Y axis title",
     pch = 19, frame = FALSE)
Code breakdown:
The code above creates a scatter plot in R using data from the built-in mtcars data set. The variables x and y are created by extracting the wt and mpg columns, respectively, from mtcars.
The plot() function then draws the scatter plot. Its first two arguments, x and y, give the x and y coordinates of the points.
The main argument sets the main title of the plot. The xlab and ylab arguments set the titles of the x and y axes, respectively.
The pch argument sets the plotting character used for the points. Here, pch = 19 draws filled circles.
Finally, frame = FALSE removes the default box around the plot.

pch values for point types:
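A quick way to see the available symbols is to draw them. The following is a minimal sketch that plots the pch values 0 to 25 on a grid with their numbers underneath; the variable names and the grid layout are illustrative choices.

```r
# The standard symbol codes in base R graphics
pch_values <- 0:25

# Arrange the symbols on a grid with 7 columns, filling from the top row
grid_x <- pch_values %% 7
grid_y <- 3 - pch_values %/% 7

# bg only affects the fillable symbols 21 to 25
plot(grid_x, grid_y, pch = pch_values, cex = 2, bg = "grey",
     axes = FALSE, xlab = "", ylab = "",
     xlim = c(-0.5, 6.5), ylim = c(-0.7, 3.5),
     main = "pch values 0 to 25")

# Label each symbol with its pch number
text(grid_x, grid_y, labels = pch_values, pos = 1, offset = 0.8, cex = 0.8)
```

Symbols 21 to 25 take a separate fill color through bg, which is why they appear grey here while the others do not.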
Add regression line to the plot
Let's add a regression line to the above plot:
# Create the plot
plot(x, y, main = "Main title",
xlab = "X axis title", ylab = "Y axis title",
pch = 19, frame = FALSE)
# Add regression line
abline(lm(y ~ x), col = "red")
Here,
The second statement, abline(lm(y ~ x), col = "red"), adds a regression line to the plot. The lm() function fits a linear model, and the formula y ~ x names y as the response variable and x as the predictor. Because x and y are standalone vectors created earlier, not columns of mtcars, lm() finds them in the workspace and no data argument is needed. The abline() function then draws the fitted line, and the col argument colors it red.
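The same line can be fitted by naming the mtcars columns in the formula, so that the data argument supplies the variables. A minimal sketch, in which the name fit and the axis labels are illustrative:

```r
# Fit the model on the mtcars columns by name
fit <- lm(mpg ~ wt, data = mtcars)

# Intercept and slope of the fitted line
coef(fit)

# Draw the scatter plot with the formula interface, then add the fitted line
plot(mpg ~ wt, data = mtcars, pch = 19, frame.plot = FALSE,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit, col = "red")
```

Keeping the fitted model in a variable also makes it available for summary(fit) or predict(fit) later.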
Add LOWESS fit to the plot:
Let's add a LOWESS fit to the main scatter plot:
# Create the plot
plot(x, y, main = "Main title",
xlab = "X axis title", ylab = "Y axis title",
pch = 19, frame = FALSE)
# Add loess fit
lines(lowess(x, y), col = "blue")
Here,
The lowess() function computes a locally weighted regression fit (LOWESS), a smooth curve that follows the underlying trend in the data. For each point, it fits a simple regression to the neighboring points, giving nearby points more weight than distant ones. The resulting curve is drawn over the scatter plot with lines(), and col = "blue" sets its color. Note that lowess() is a different function from the newer loess(), although both perform local regression.
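How closely the curve follows the points is controlled by the smoother span, the f argument of lowess(). The sketch below compares the default span with a smaller one; the variable names and the value 1/3 are illustrative choices.

```r
x <- mtcars$wt
y <- mtcars$mpg

# lowess() returns a list with sorted x values and smoothed y values
smooth_default <- lowess(x, y)            # default span, f = 2/3
smooth_narrow  <- lowess(x, y, f = 1/3)   # smaller span, follows the data more closely

plot(x, y, pch = 19, frame.plot = FALSE)
lines(smooth_default, col = "blue")
lines(smooth_narrow, col = "darkgreen", lty = 2)
legend("topright", legend = c("f = 2/3", "f = 1/3"),
       col = c("blue", "darkgreen"), lty = c(1, 2), bty = "n")
```

A smaller span uses fewer neighbors for each local fit, so the curve is wigglier; a larger span gives a smoother curve.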

Scatter Plot Matrices
Here, we'll go over how to create a matrix of scatter plots, which is helpful for visualizing the relationships between variables in smaller data sets. The pairs() function from base R can be used for this purpose.
We will use R's built-in iris data set.
# load the data
data(iris)
The pairs() function creates a matrix of scatter plots that shows the relationship between each pair of variables in a data set. It belongs to the graphics package, which ships with R and is loaded by default, so no additional packages are needed.
Here's an example of using the pairs() function in R:
# Create a basic plot
pairs(iris[,1:4], pch = 19)
The code generates a matrix of scatter plots for the variables in the iris data set. In this call, pairs() receives two arguments: the first is a matrix or data frame of the variables to be plotted, and pch = 19 sets the plotting character to filled circles. Only the first four columns of iris, the numeric measurements, are passed in. The resulting matrix shows one scatter plot for each pair of variables.

To show only the upper panel:
pairs(iris[,1:4], pch = 19, lower.panel = NULL)
Color points by groups
Let's color the points by species:
# Specify the color in a vector
my_cols <- c("#ff8000", "#0080ff", "#ff0080")
# Create the plot
pairs(iris[,1:4], pch = 19, cex = 0.5,
      col = my_cols[iris$Species],
      lower.panel = NULL)
Here,
The cex argument sets the size of the plotting characters; a value of 0.5 draws them at half their default size.
The col argument sets the color of the points. The value my_cols[iris$Species] picks a color for each flower according to its species: my_cols is a vector of three color codes, and iris$Species is a factor column of the iris data frame whose three levels index into that vector.
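The indexing step can be inspected on its own. A minimal sketch, where point_cols and color_key are illustrative names:

```r
my_cols <- c("#ff8000", "#0080ff", "#ff0080")

# A factor is stored as integer codes, so it can index a vector directly:
# level 1 -> my_cols[1], level 2 -> my_cols[2], level 3 -> my_cols[3]
levels(iris$Species)
point_cols <- my_cols[iris$Species]

# One row per species, showing the color it was given
color_key <- unique(data.frame(Species = iris$Species, color = point_cols))
color_key
```

The colors follow the order of levels(iris$Species), so reordering the factor levels would change which species gets which color.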

Add correlations on the scatter plots:
# Customize the upper panel
upper.panel <- function(x, y){
  points(x, y, pch = 19, cex = 0.5, col = c("red", "green3", "blue")[iris$Species])
  r <- round(cor(x, y), digits = 2)
  txt <- paste0("R = ", r)
  usr <- par("usr"); on.exit(par(usr = usr))
  par(usr = c(0, 1, 0, 1))
  text(0.5, 0.9, txt)
}
pairs(iris[,1:4], lower.panel = NULL,
      upper.panel = upper.panel)
Code breakdown:
The code creates a matrix of scatter plots with the pairs() function, which takes the first four columns of the iris data set as input. The upper panels are customized by a user-defined function, upper.panel. Inside it, points() draws the points; the pch, cex, and col arguments set the plotting character, size, and color, with the color chosen according to the species of each sample.
The cor() function calculates Pearson's correlation coefficient between the two variables in the panel, and round() rounds it to 2 decimal places. The rounded value is pasted onto the string "R = " to create a label. Before the label is drawn, par(usr = c(0, 1, 0, 1)) rescales the panel's coordinates to run from 0 to 1 on both axes, so that text() can place the label at (0.5, 0.9), near the top center of the panel. The on.exit() call restores the original coordinates when upper.panel finishes.
Finally, pairs() is called with lower.panel = NULL, which suppresses the lower panels, and upper.panel = upper.panel, which applies the custom function to the upper panels.
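The values printed in the panels can be checked against the full correlation matrix. A minimal sketch, where cor_mat is an illustrative name:

```r
# Pearson correlations between the four numeric iris columns,
# rounded to 2 decimal places as in the panel labels
cor_mat <- round(cor(iris[, 1:4]), digits = 2)
cor_mat

# Each off-diagonal entry corresponds to one panel, for example:
cor_mat["Sepal.Length", "Petal.Length"]
```

The matrix is symmetric, which is why showing only the upper panels loses no information.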

Add correlations on the lower panels:
First, we have to define a function that will create a correlation panel. In this case, the size of the text is proportional to the strength of the correlation.
# Correlation panel
panel.cor <- function(x, y){
  usr <- par("usr"); on.exit(par(usr = usr))
  par(usr = c(0, 1, 0, 1))
  r <- round(cor(x, y), digits = 2)
  txt <- paste0("R = ", r)
  cex.cor <- 0.8/strwidth(txt)
  # abs() keeps the text size positive for negative correlations
  text(0.5, 0.5, txt, cex = cex.cor * abs(r))
}
Here,
The code defines a custom panel function named panel.cor, which fills a panel with the correlation coefficient instead of points. It takes two arguments, x and y, the two variables whose correlation is calculated.
The first line saves the current user coordinates with par("usr") and uses on.exit() to restore them after the function has finished executing.
The second line calls par(usr = c(0, 1, 0, 1)), which rescales the panel's coordinate system so that it runs from 0 to 1 on both axes.
The third line calculates the Pearson correlation coefficient with cor() and rounds it to two decimal places with round().
The fourth line concatenates "R = " with the value stored in r and saves the result in txt.
The fifth line calculates the scaling factor cex.cor, which is inversely proportional to the width of the text string.
Finally, the last line draws the label with text(), centered in the panel (x = 0.5, y = 0.5), with a size proportional to the absolute value of the correlation (cex = cex.cor * abs(r)), so that negative correlations are also sized by their strength.
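To see what resetting usr does on its own, the following minimal sketch rescales an ordinary plot to the unit square, writes a label at its center, and then puts the original coordinates back. The names old_usr and unit_usr are illustrative only.

```r
plot(mtcars$wt, mtcars$mpg, pch = 19)

# Remember the data-based coordinate limits: c(xmin, xmax, ymin, ymax)
old_usr <- par("usr")

# Switch to unit-square coordinates so (0.5, 0.5) is the middle of the plot
par(usr = c(0, 1, 0, 1))
unit_usr <- par("usr")
text(0.5, 0.5, "center", cex = 2, col = "red")

# Restore the original coordinates so later drawing uses data units again
par(usr = old_usr)
```

Inside a panel function the same idea lets a label be positioned without knowing the range of the variables in that panel.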
# Specify the color in a vector
my_cols <- c("#ff8000", "#0080ff", "#ff0080")
# Customize the upper panel
upper.panel <- function(x, y){
  points(x, y, pch = 19, col = my_cols[iris$Species])
}
The code defines a custom function named upper.panel that customizes the upper panels of the scatter plot matrix. It takes two arguments, x and y, the data for the x and y axes of each panel.
Within the function, points() plots the data points at the given x and y values.
The pch argument is set to 19, which selects a solid dot as the plotting symbol.
The col argument is set to my_cols[iris$Species], which colors each point according to its species.
No cex argument is passed, so the points keep their default size.
# Create the plots
pairs(iris[,1:4],
      lower.panel = panel.cor,
      upper.panel = upper.panel)
The code creates a scatter plot matrix using the pairs() function from base R. A scatter plot matrix shows scatter plots of all pairwise combinations of the variables in a data set.
pairs() takes the data frame iris[, 1:4], the first four columns of the iris data set, as its first argument.
The second argument, lower.panel = panel.cor, specifies a custom panel function for the lower triangle of the matrix. Here it is the panel.cor function defined earlier, which computes and displays the Pearson correlation coefficient of each pair of variables.
The third argument, upper.panel = upper.panel, specifies another custom panel function for the upper triangle. The upper.panel function defined earlier adds the points to each plot, drawn as filled circles (pch = 19) and colored according to the Species column of the iris data set.
By specifying custom panel functions for the lower and upper triangles, the code produces a customized visualization of the relationships between the variables in the iris data set.

Scatter plot by psych package
The psych package in R provides a function called pairs.panels() that generates a scatter plot matrix. This plot displays bivariate scatter plots below the diagonal, histograms along the diagonal, and the Pearson correlation coefficient above the diagonal.
# Load the library
library(psych)
# Create the plot
pairs.panels(iris[,-5],
method = "pearson", # correlation method
hist.col = "#e066ff",
density = TRUE, # show density plots
ellipses = TRUE # show correlation ellipses
)
