⚽DataFrame
View the Data
To view the data:
View(DataFrameName)
View the Head and Tail of the DataFrame
# View Head
head(DataFrameName)
# View Tail
tail(DataFrameName)
View the dimension or structure of the DataFrame
# View the number of rows and columns
dim(DataFrameName)
# View the overall structure
str(DataFrameName)
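As a small illustration of these calls, the sketch below builds a toy data frame and inspects its shape. The name scores and its columns are made up for this example and are not part of any real dataset.

```r
# Hypothetical toy data frame used only for illustration
scores <- data.frame(
  player = c("Ana", "Ben", "Cal"),
  goals  = c(3, 0, 5),
  stringsAsFactors = FALSE
)

shape  <- dim(scores)   # integer vector: c(number of rows, number of columns)
n_rows <- nrow(scores)  # same value as shape[1]
n_cols <- ncol(scores)  # same value as shape[2]

str(scores)             # prints each column's type and its first values
```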
Summarize the data frame
summary(DataFrameName)

Get the content of a specific column
DataFrameName$ColumnName
Extract unique data and their number in a column
Suppose the data frame contains a factor column and you want to see its distinct values and how many of them there are.
# View unique content
unique(DataFrameName$ColumnName)
# Get the number of unique content
length(unique(DataFrameName$ColumnName))
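A minimal sketch of the same idea on made-up data follows. The data frame teams and its position column are assumptions for illustration. The table() call is an addition to the two calls above: it counts how many rows carry each distinct value, which is often what "their number" means in practice.

```r
# Hypothetical factor column used only for illustration
teams <- data.frame(
  position = factor(c("GK", "DF", "DF", "FW", "GK", "DF"))
)

unique_positions <- unique(teams$position)  # distinct values, in order of first appearance
n_unique <- length(unique_positions)        # how many distinct values there are
counts <- table(teams$position)             # how many rows carry each value
```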
Get the observation of a specific cell
To get the value of a specific cell you have to specify the Row Number and Column Number of that cell.
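The general form is DataFrameName[RowNumber, ColumnNumber]. A minimal sketch on a made-up data frame (the name scores and its columns are assumptions for illustration):

```r
# Hypothetical toy data frame used only for illustration
scores <- data.frame(
  player = c("Ana", "Ben", "Cal"),
  goals  = c(3, 0, 5),
  stringsAsFactors = FALSE
)

cell_by_position <- scores[2, 2]        # row 2, column 2
cell_by_name     <- scores[2, "goals"]  # same cell, with the column given by name
```

Referring to the column by name is usually more robust than a column number, because it keeps working if columns are reordered.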


Get all the observations from a row
DataFrameName[6, ]
Get a specific observation by its value
Subsetting with brackets by row and column number becomes tedious on a large dataset when you don't know where the observations you want are located. It is also fragile: if you hard-code a row number in your script and rows are added later, the same number may no longer select the same observations. Logical conditions avoid both problems by selecting the rows that match a specification.
# Let's access the values for number 603
DataFrameName[DataFrameName$ColumnName == 603, ]
Subset observation based on one condition
We can use logical operators to denote our conditions in a column and subset the observations that meet the condition.
[Subset means extracting observations from a bigger dataset.]
# Subset all the observations greater than 10
DataFrameName[DataFrameName$ColumnName > 10, ]
# This code gives the same result using the not (!) operator:
# "not less than or equal to 10" is the same as "greater than 10"
DataFrameName[!(DataFrameName$ColumnName <= 10), ]
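One detail worth knowing when subsetting by a condition is how missing values behave. The sketch below uses a made-up data frame (scores and its columns are assumptions for illustration) to show that a condition evaluating to NA produces an all-NA row, and that wrapping the condition in which() is one way to drop such rows.

```r
# Hypothetical toy data frame with one missing value
scores <- data.frame(
  player = c("Ana", "Ben", "Cal", "Dee"),
  goals  = c(12, 10, 7, NA),
  stringsAsFactors = FALSE
)

# The comparison is NA for Dee, so the result contains an extra all-NA row
above_10 <- scores[scores$goals > 10, ]

# which() keeps only the positions where the condition is TRUE
above_10_clean <- scores[which(scores$goals > 10), ]
```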
Subset observations based on two conditions
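# Rows where Column_01 is 2 or 5
DataFrameName[DataFrameName$Column_01 == 2 | DataFrameName$Column_01 == 5, ]
# Rows where Column_01 is 7 and Column_02 is a whole number from 100 to 200
DataFrameName[DataFrameName$Column_01 == 7 & DataFrameName$Column_02 %in% 100:200, ]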
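The sketch below runs both patterns on made-up data (the data frame matches and its columns are assumptions for illustration). It also shows two equivalent spellings: %in% as a shorter form of several == tests joined by |, and an explicit range check, which also matches non-integer values, unlike %in% 100:200.

```r
# Hypothetical toy data frame used only for illustration
matches <- data.frame(
  round      = c(2, 5, 7, 7, 3),
  attendance = c(150, 90, 120, 250, 180)
)

# "or": rows where round is 2 or 5
round_2_or_5 <- matches[matches$round == 2 | matches$round == 5, ]

# Same selection written with %in%
round_2_or_5_in <- matches[matches$round %in% c(2, 5), ]

# "and": round 7 with attendance between 100 and 200 (inclusive)
round_7_mid <- matches[matches$round == 7 &
                         matches$attendance >= 100 &
                         matches$attendance <= 200, ]
```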
Change the name of the column in the Dataframe
Option 1: Use column index
colnames(df)[col_indx] <- "new_col_name"
Option 2: Use column name
colnames(df)[colnames(df) == "Age"] <- "Years"
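Options 1 and 2 can be tried together on a small made-up data frame. The name people and its columns are assumptions for illustration.

```r
# Hypothetical toy data frame used only for illustration
people <- data.frame(
  Name = c("Ana", "Ben"),
  Age  = c(31, 27),
  stringsAsFactors = FALSE
)

# Option 1: rename by column index
colnames(people)[1] <- "FirstName"

# Option 2: rename by matching the current column name
colnames(people)[colnames(people) == "Age"] <- "Years"
```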
Option 3: Use the rename() function from dplyr
The rename() function from the dplyr package is a more concise way to rename columns in a data frame. Each argument is written as new_name = old_name: the new column name goes on the left and the existing column name on the right. For example, to rename the column old_column_name to new_column_name, you would use the following code:
library(dplyr)
new_df <- df %>% rename(new_column_name = old_column_name)
# To change multiple column names
new_df <- df %>% rename(new_column_name_1 = old_column_name_1, new_column_name_2 = old_column_name_2)
Option 4: Change the names of all columns
The setNames() function in base R returns a copy of an object (such as a vector, list, or data frame) with new names assigned. The original data frame is not modified, so the result has to be assigned back to it. Supply one name per column, in column order.
df <- setNames(df, c("new_name_1", "new_name_2"))
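A minimal sketch on made-up data (people and its columns are assumptions for illustration) shows that base R's setNames() returns a renamed copy and leaves the original untouched until the result is assigned back.

```r
# Hypothetical toy data frame used only for illustration
people <- data.frame(
  Name = c("Ana", "Ben"),
  Age  = c(31, 27),
  stringsAsFactors = FALSE
)

# setNames() returns a copy with the new column names
renamed <- setNames(people, c("first_name", "years"))

# people itself still has its old names at this point;
# to keep the change, assign the result back: people <- renamed
```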
Replace specific values in a column in R DataFrame
Option 1: By using row and column number
df[row_number, column_number] <- new_value
Option 2: Using the logical condition
# Set column_name1 to x in every row where column_name2 equals y
dataframe_name$column_name1[dataframe_name$column_name2 == y] <- x
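Both options are sketched below on a made-up data frame. The name scores, its columns, and the replacement values are assumptions for illustration.

```r
# Hypothetical toy data frame used only for illustration
scores <- data.frame(
  player = c("Ana", "Ben", "Cal"),
  goals  = c(3, 0, 5),
  status = c("ok", "ok", "ok"),
  stringsAsFactors = FALSE
)

# Option 1: overwrite the cell in row 2, column 2
scores[2, 2] <- 1

# Option 2: overwrite status wherever goals is at least 5
scores$status[scores$goals >= 5] <- "top"
```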
Filter Observations of a dataframe based on observations from another dataframe
Suppose we have a main data frame with many observations and a second, smaller data frame that lists the observations we are interested in. We want to use the small data frame to filter rows from the main one, and then keep only some specific columns of the result.
# Load the main dataframe and the filtering csv file
main_df <- read.csv("main_df.csv")
filter_df <- read.csv("filter_df.csv")
# Extract the column from the filtering csv file that will be used for filtering
filter_col <- filter_df[, 1]
# Filter the main dataframe based on the values in filter_col
filtered_df <- main_df[main_df$column_to_filter %in% filter_col, ]
# Select the specific columns you want from the filtered dataframe
selected_df <- filtered_df[, c("column_1", "column_2", "column_3")]
# Write the selected data to a new csv file
write.csv(selected_df, "filtered_and_selected.csv")
Code Breakdown: In this example, main_df.csv holds your main dataframe, and filter_df.csv is the csv file with the observations used to filter it. column_to_filter is the column in main_df that is matched against the values in filter_df. The specific columns you want to keep from the filtered dataframe are specified in the c() function in the line selected_df <- filtered_df[, c("column_1", "column_2", "column_3")]. Finally, the selected data is written to a new csv file, filtered_and_selected.csv.
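To try the same steps without any csv files, the sketch below builds both data frames in memory. All names and values here (id, name, value, and the ids in filter_df) are made up for illustration.

```r
# Hypothetical main data frame
main_df <- data.frame(
  id    = c(1, 2, 3, 4, 5),
  name  = c("a", "b", "c", "d", "e"),
  value = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# Hypothetical small data frame listing the ids of interest
filter_df <- data.frame(id = c(2, 4, 9))

# Take the first column of the filter data frame as a plain vector
filter_col <- filter_df[, 1]

# Keep the rows of main_df whose id appears in filter_col
filtered_df <- main_df[main_df$id %in% filter_col, ]

# Keep only the columns of interest
selected_df <- filtered_df[, c("id", "value")]
```

The id 9 in filter_df has no match in main_df, so it is simply ignored: %in% only tests which rows of the main data frame appear in the filter values.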