⚽DataFrame
View the Data
To view the data:
View(DataFrameName)
View the Head and Tail of the DataFrame
# View Head
head(DataFrameName)
# View Tail
tail(DataFrameName)
View the dimension or structure of the DataFrame
# View the number of rows and columns
dim(DataFrameName)
# View the overall structure
str(DataFrameName)
Summarize the data frame
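A minimal sketch of summarizing a data frame with base R's summary(). The data frame df and its columns are made up for illustration.

```r
# Hypothetical example data (not from the original text)
df <- data.frame(
  height = c(150, 162, 171, 180),
  group  = factor(c("a", "b", "a", "b"))
)

# Per-column summary: min, quartiles, mean and max for numeric columns,
# counts per level for factor columns
summary(df)

# summary() also works on a single column
height_summary <- summary(df$height)
```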

Get the content of a specific column
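One way to pull out a single column, sketched with a made-up data frame df. The column names name and age are assumptions for illustration only.

```r
# Hypothetical example data
df <- data.frame(name = c("Ann", "Bob", "Cy"), age = c(31, 45, 27))

# By name with $ : returns the column as a vector
ages <- df$age

# By name with [[ ]] : same result, handy when the name is stored in a variable
col <- "age"
ages_again <- df[[col]]

# By position: the second column as a vector
ages_by_index <- df[, 2]

# Single brackets with a name keep the result as a one-column data frame
age_df <- df["age"]
```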
Extract unique data and their number in a column
Let's say the dataframe contains a column of factor type, and you want to see its unique values (levels) and how many of them there are.
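A small sketch of how this could be done in base R. The data frame df and its species column are hypothetical.

```r
# Hypothetical example data with a factor column
df <- data.frame(species = factor(c("cat", "dog", "cat", "bird", "dog", "cat")))

unique_values <- unique(df$species)     # the distinct values that occur
level_names   <- levels(df$species)     # the factor's levels, in sorted order
n_unique      <- length(unique_values)  # how many distinct values there are
counts        <- table(df$species)      # how many observations fall in each level
```

Here unique() and length() answer "which values and how many different ones", while table() answers "how often does each value occur".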
Get the observation of a specific cell
To get the value of a specific cell you have to specify the Row Number and Column Number of that cell.
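As a sketch, bracket indexing takes the row first and the column second. The data frame df below is made up for illustration.

```r
# Hypothetical example data
df <- data.frame(name = c("Ann", "Bob", "Cy"), age = c(31, 45, 27))

# Row 2, column 2
cell <- df[2, 2]

# The column can also be given by name instead of by number
same_cell <- df[2, "age"]
```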


Get all the observations from a row
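Leaving the column position empty selects every column of the chosen row. A minimal sketch with a hypothetical data frame df:

```r
# Hypothetical example data
df <- data.frame(name = c("Ann", "Bob", "Cy"), age = c(31, 45, 27))

# All columns of row 2 (the result is a one-row data frame)
row_2 <- df[2, ]

# Several rows at once: rows 1 and 3
rows_1_3 <- df[c(1, 3), ]
```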
Get a specific observation by its value
Subsetting with brackets using row and column numbers can be quite tedious if you have a large dataset and you don’t know where the observations you’re looking for are located. It is also fragile: if you hard-code a row number in your script and rows are added later, the same number may no longer select the same observations. That’s why we use logical operations to access the parts of the data that match our specifications.
Subset observation based on one condition
We can use logical operators to denote our conditions in a column and subset the observations that meet the condition.
[Subset means extracting observations from a bigger dataset.]
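A minimal sketch of subsetting on one condition, using a made-up data frame df with an age column:

```r
# Hypothetical example data
df <- data.frame(name = c("Ann", "Bob", "Cy", "Di"), age = c(31, 45, 27, 45))

# The condition yields TRUE/FALSE per row; only the TRUE rows are kept
older <- df[df$age > 30, ]

# subset() expresses the same thing without repeating df$
older_subset <- subset(df, age > 30)
```

If the column contains NA values, the bracket form returns rows filled with NA for them, whereas subset() drops those rows.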
Subset observations based on two conditions
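Two conditions can be combined with & (both must hold) or | (at least one must hold). The data frame df and its columns are hypothetical.

```r
# Hypothetical example data
df <- data.frame(
  name = c("Ann", "Bob", "Cy", "Di"),
  age  = c(31, 45, 27, 45),
  city = c("Oslo", "Rome", "Rome", "Oslo")
)

# Rows where BOTH conditions are true
both <- df[df$age > 30 & df$city == "Oslo", ]

# Rows where AT LEAST ONE condition is true
either <- df[df$age > 30 | df$city == "Oslo", ]
```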
Change the name of the column in the Dataframe
Option 1: Use column index
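A sketch of renaming by position, assuming a made-up data frame df whose first column should become id:

```r
# Hypothetical example data
df <- data.frame(a = 1:3, b = 4:6)

# Replace the name of the first column
colnames(df)[1] <- "id"
```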
Option 2: Use column name
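A sketch of renaming by the current name, which keeps working even if the column order changes. The names b and score are illustrative.

```r
# Hypothetical example data
df <- data.frame(a = 1:3, b = 4:6)

# Find the column currently called "b" and give it a new name
colnames(df)[colnames(df) == "b"] <- "score"
```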
Option 3: Using the rename() Function from dplyr:
The rename() function from the dplyr package is a concise way to rename columns in a dataframe. You pass each rename as a name-value pair of the form new_name = old_name: the name on the left is the new column name and the value on the right is the existing column name. For example, to rename the first column of a dataframe to new_column_name, you would use the following code:
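A minimal sketch, assuming the dplyr package is installed and that the first column of a made-up data frame df is currently called a:

```r
library(dplyr)

# Hypothetical example data; the first column is named "a"
df <- data.frame(a = 1:3, b = 4:6)

# rename() takes new_name = old_name and returns a new data frame,
# so the result is assigned back to df
df <- rename(df, new_column_name = a)
```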
Option 4: Change the names of all columns
setNames() can also be used to assign new names to the elements of a vector or list, which includes the columns of a data frame. It returns a renamed copy rather than modifying the original, so the result has to be assigned back to the data frame for the change to be kept.
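A sketch of renaming every column at once with setNames(). The data frame and the new names are made up for illustration.

```r
# Hypothetical example data
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# Supply one new name per column, in column order, and save the result back
df <- setNames(df, c("id", "score", "rank"))

# An equivalent in-place alternative would be:
# colnames(df) <- c("id", "score", "rank")
```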
Replace specific values in a column in R DataFrame
Option 1: By using row and column number
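A minimal sketch of overwriting one cell by position, using a hypothetical data frame df:

```r
# Hypothetical example data
df <- data.frame(name = c("Ann", "Bob", "Cy"), age = c(31, 45, 27),
                 stringsAsFactors = FALSE)

# Assign a new value to row 2, column 2
df[2, 2] <- 46
```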
Option 2: Using the logical condition
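A sketch of replacing every value that matches a condition, so no row numbers need to be hard-coded. The data and the replacement values are illustrative.

```r
# Hypothetical example data
df <- data.frame(
  name = c("Ann", "Bob", "Cy", "Di"),
  age  = c(31, 45, 27, 45),
  city = c("Oslo", "Rome", "Rome", "Oslo"),
  stringsAsFactors = FALSE
)

# Replace every age of 45 with 46
df$age[df$age == 45] <- 46

# The same pattern works for text columns
df$city[df$city == "Rome"] <- "Roma"
```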
Filter Observations of a dataframe based on observations from another dataframe
Let's say we have a main dataframe with many observations and a second, smaller dataframe that holds some observations of interest. We want to use the small dataframe to filter the main dataframe, and we also want to keep only some specific columns from the main dataframe.
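A base R sketch of this task, reconstructed under assumptions: the file, object and column names follow the breakdown that comes next, the sample data is invented, and the files are written to a temporary directory only so the example runs as-is. It also assumes the filter file uses the same column name, column_to_filter, as the main file.

```r
# Hypothetical sample files, created here so the sketch is runnable
dir <- tempdir()
main_path   <- file.path(dir, "main_df.csv")
filter_path <- file.path(dir, "filter_df.csv")
write.csv(data.frame(
  column_to_filter = c("a", "b", "c", "d"),
  column_1 = 1:4, column_2 = 5:8, column_3 = 9:12, column_4 = 13:16
), main_path, row.names = FALSE)
write.csv(data.frame(column_to_filter = c("b", "d")), filter_path, row.names = FALSE)

# Load both dataframes
main_df   <- read.csv(main_path)
filter_df <- read.csv(filter_path)

# Keep the rows of main_df whose column_to_filter value appears in filter_df
filtered_df <- main_df[main_df$column_to_filter %in% filter_df$column_to_filter, ]

# Keep only the columns of interest
selected_df <- filtered_df[, c("column_1", "column_2", "column_3")]

# Write the result to a new csv file
out_path <- file.path(dir, "filtered_and_selected.csv")
write.csv(selected_df, out_path, row.names = FALSE)
```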
Code Breakdown: In this example, main_df.csv is the csv file containing your main dataframe, and filter_df.csv is the csv file with the observations used to filter it. column_to_filter is the column in main_df that is matched against the values in filter_df. The columns you want to keep from the filtered dataframe are listed in the c() call in the line selected_df <- filtered_df[, c("column_1", "column_2", "column_3")]. Finally, the selected data is written to a new csv file, filtered_and_selected.csv.
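A dplyr-based variant of the same task, again only a sketch under assumptions: dplyr is installed, the names follow the breakdown that comes next, and the sample files are invented and written to a temporary directory so the example runs as-is.

```r
library(dplyr)

# Hypothetical sample files, created here so the sketch is runnable
dir <- tempdir()
write.csv(data.frame(
  column_1 = c("a", "b", "c", "d"),
  column_2 = 1:4, column_3 = 5:8, column_4 = 9:12
), file.path(dir, "main_data.csv"), row.names = FALSE)
write.csv(data.frame(column_1 = c("a", "c")),
          file.path(dir, "filter_data.csv"), row.names = FALSE)

# Load both dataframes
main_df     <- read.csv(file.path(dir, "main_data.csv"))
filter_data <- read.csv(file.path(dir, "filter_data.csv"))

# Keep rows whose column_1 value occurs in filter_data, then keep two columns
result <- main_df %>%
  filter(column_1 %in% filter_data$column_1) %>%
  select(column_2, column_3)
```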
Code Breakdown: In this example, main_df is loaded from a .csv file named main_data.csv. The filter_data dataframe is loaded from another .csv file named filter_data.csv. The dplyr filter function is used to keep only those observations in main_df where the value in column_1 is found in the column_1 column of filter_data. The dplyr select function is used to only keep the column_2 and column_3 columns in the filtered data.