DataFrame
View the Data
To view the data:
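A minimal sketch of viewing a dataframe, using a small hypothetical dataframe df. The column names and values are made up for illustration:

```r
# A small example dataframe (hypothetical data for illustration)
df <- data.frame(
  id    = 1:8,
  group = factor(c("a", "b", "a", "c", "b", "a", "c", "b")),
  score = c(603, 150, 120, 88, 175, 603, 200, 95)
)

# Print the whole dataframe to the console
print(df)

# In RStudio, View() opens the dataframe in a spreadsheet-style viewer
# View(df)
```

View() needs a graphical front end such as RStudio, so it is commented out here.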
View the Head and Tail of the DataFrame
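head() and tail() return the first and last rows of a dataframe. Both take an optional n argument, which defaults to 6. Here is a sketch with a hypothetical 10-row dataframe:

```r
# Hypothetical 10-row dataframe
df <- data.frame(id = 1:10, score = seq(10, 100, by = 10))

first_rows <- head(df)        # first 6 rows (the default)
last_rows  <- tail(df, n = 3) # last 3 rows

print(first_rows)
print(last_rows)
```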
View the dimensions or structure of the DataFrame
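dim() returns the number of rows and columns. nrow() and ncol() return each count separately. str() prints each column's type along with its first few values. Here is a sketch on a hypothetical dataframe:

```r
# Hypothetical dataframe with integer, character and numeric columns
df <- data.frame(
  id    = 1:5,
  name  = c("A", "B", "C", "D", "E"),
  score = c(1.5, 2, 3.5, 4, 5)
)

dims   <- dim(df)  # c(number of rows, number of columns)
n_rows <- nrow(df)
n_cols <- ncol(df)

str(df)            # compact overview of the structure
```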
Summarize the data frame
Code Breakdown:
Here, if a column of the dataframe is a factor, the summary() function shows the number of observations in each level. If the column is numeric, summary() shows basic statistics for that column (minimum, first quartile, median, mean, third quartile, and maximum).
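The following sketch runs summary() on a hypothetical dataframe with one factor column and one numeric column. Since R 4.0.0, data.frame() no longer converts strings to factors by default, so the factor column is created explicitly with factor():

```r
# Hypothetical dataframe: one factor column, one numeric column
df <- data.frame(
  group = factor(c("a", "b", "a", "c", "a")),
  score = c(10, 20, 30, 40, 50)
)

summary(df)  # summary of every column

group_summary <- summary(df$group)  # count of each factor level
score_summary <- summary(df$score)  # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
```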
Get the content of a specific column
Code Breakdown:
First, write the dataframe name, then a $ sign, and then the column name (DataFrameName$ColumnName).
This returns the contents of that column as a vector, not a list.
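Here is a sketch of column access with $ on a hypothetical dataframe. The example also shows df[["age"]], an equivalent form that is handy when the column name is stored in a variable:

```r
# Hypothetical dataframe
df <- data.frame(name = c("Ann", "Bob", "Cy"), age = c(28, 35, 42))

ages     <- df$age      # numeric vector of the age column
ages_too <- df[["age"]] # same result, column name given as a string
```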
Extract unique data and their number in a column
Let's say the dataframe contains a factor column, and you want to view its unique values (levels) and their counts.
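The sketch below shows one way to do this on a hypothetical factor column. unique() returns the distinct values, length() of that result gives how many there are, and table() counts the observations in each level:

```r
# Hypothetical factor column
df <- data.frame(fruit = factor(c("apple", "pear", "apple", "fig", "pear", "apple")))

distinct_values <- unique(df$fruit)         # distinct values, in order of first appearance
n_unique        <- length(unique(df$fruit)) # how many distinct values there are
counts          <- table(df$fruit)          # number of observations per level

print(counts)
```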
Get the value of a specific cell
To get the value of a specific cell, specify its row number and column number.
You can also give the column name instead of the column number to get the value from a specific row.
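Here is a sketch of single-cell access on a hypothetical dataframe. It reads the same cell by position, by column name, and via $:

```r
# Hypothetical dataframe
df <- data.frame(name = c("Ann", "Bob", "Cy"), age = c(28, 35, 42))

v1 <- df[2, 2]     # row 2, column 2
v2 <- df[2, "age"] # row 2, column named "age"
v3 <- df$age[2]    # 2nd element of the age column
```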
Get all the observations from a row
Code Breakdown:
The code above extracts all the values from the 6th row of the dataframe.
You only specify the row number and leave the column position after the comma empty.
Similarly, you can leave the row position empty and give a column number to get all the rows of that column:
DataFrameName[ , 5]
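Here is a runnable sketch of both forms on a hypothetical five-column dataframe:

```r
# Hypothetical dataframe with 8 rows and 5 columns
df <- data.frame(a = 1:8, b = letters[1:8], c = 11:18, d = 21:28, e = 31:38)

row_six  <- df[6, ]  # every column of row 6 (a one-row dataframe)
col_five <- df[ , 5] # every row of column 5 (simplified to a vector)
```

df[ , 5] simplifies a single column to a vector. To keep a one-column dataframe instead, add drop = FALSE (df[ , 5, drop = FALSE]).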
Get a specific observation by its value
Subsetting with brackets using row and column numbers can be tedious on a large dataset when you don't know where the observations you're looking for are located. It is also fragile. If you hard-code a row number in your script and rows are later added or reordered, the same number may no longer select the same observation. That's why we use logical operations to access the parts of the data that match our specifications.
Code Breakdown:
Let's say a column contains the value 603 and you want to access that cell. Previously we used row and column numbers, which is not always a good idea.
Here, we use the column name and the observation value instead.
== is a comparison operator. It returns TRUE for every row where the column equals 603, and only those rows are selected.
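A minimal sketch, assuming a hypothetical column named score that contains 603 in two rows:

```r
# Hypothetical dataframe
df <- data.frame(id = 1:5, score = c(120, 603, 88, 603, 200))

rows_603 <- df[df$score == 603, ]  # all rows where score equals 603
ids_603  <- df$id[df$score == 603] # only the id values of those rows
```

If the column contains NA, df[df$score == 603, ] returns a row of NAs at each NA position. Wrapping the condition in which() avoids this.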
Subset observation based on one condition
We can use comparison and logical operators to express a condition on a column and subset the observations that meet it.
(Subsetting means extracting a set of observations from a larger dataset.)
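Here is a sketch of subsetting on a single condition with a hypothetical score column. It shows both bracket notation and base R's subset() function:

```r
# Hypothetical dataframe
df <- data.frame(id = 1:6, score = c(45, 80, 62, 91, 30, 77))

high  <- df[df$score > 60, ]     # bracket form
high2 <- subset(df, score > 60)  # subset() form
```

subset() evaluates the condition inside the dataframe, so you can name the column without df$. It also treats NA results of the condition as FALSE.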
Subset observations based on two conditions
Code Breakdown:
The code above subsets all the observations where Column_01 is 2 or (|) 5.
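A minimal sketch of the OR condition, assuming a small hypothetical dataframe with Column_01 and Column_02. The %in% operator is shown as an equivalent shortcut when there are several values to match:

```r
# Hypothetical dataframe
df <- data.frame(
  Column_01 = c(1, 2, 5, 7, 2, 9),
  Column_02 = c(150, 90, 120, 180, 300, 110)
)

either  <- df[df$Column_01 == 2 | df$Column_01 == 5, ] # 2 OR 5
either2 <- df[df$Column_01 %in% c(2, 5), ]             # same rows via %in%
```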
Code Breakdown:
This code extracts all the observations that have the value 7 in Column_01 and a value between 100 and 200 in Column_02.
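Here is a sketch of combining conditions with & on a hypothetical dataframe. It treats "between 100 and 200" as inclusive (>= and <=). Use > and < if the endpoints should be excluded:

```r
# Hypothetical dataframe
df <- data.frame(
  Column_01 = c(7, 7, 3, 7, 7),
  Column_02 = c(150, 250, 120, 100, 199)
)

# Column_01 is 7 AND Column_02 lies in [100, 200]
both <- df[df$Column_01 == 7 & df$Column_02 >= 100 & df$Column_02 <= 200, ]
```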
Change the name of a column in the DataFrame
Option 1: Use column index
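A minimal sketch of renaming by position. It indexes into colnames() and assigns the new name. The dataframe and names are hypothetical:

```r
# Hypothetical dataframe
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

colnames(df)[2] <- "second"  # rename only the 2nd column
```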
Option 2: Use column name
We can use names() instead of the colnames() function, but I prefer colnames().
names(): functions to get or set the names of an R object.
colnames(): retrieve or set the column names of a matrix-like object (e.g., a dataframe).
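Here is a sketch of renaming by name. It matches the old name inside colnames(), or inside names(), which behaves the same way for a dataframe. The column names are hypothetical:

```r
# Hypothetical dataframe
df <- data.frame(old_name = 1:3, other = 4:6)

# Replace the name wherever it equals "old_name"
colnames(df)[colnames(df) == "old_name"] <- "new_name"

# names() works the same way on a dataframe
names(df)[names(df) == "other"] <- "renamed"
```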
Option 3: Use the rename() function from dplyr
The rename() function from the dplyr package is a concise way to rename columns by name. Give each change as a new_name = old_name pair. The new column name goes on the left of the = sign and the existing column name goes on the right. For example, rename(df, new_column_name = old_column_name) renames the column old_column_name to new_column_name. rename() returns a modified copy, so assign the result back (df <- rename(...)) to keep the change.
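A minimal sketch, assuming the dplyr package is installed. old_column_name and new_column_name are placeholder names. rename() also accepts a column position on the right-hand side:

```r
library(dplyr)

# Hypothetical dataframe
df <- data.frame(old_column_name = 1:3, score = c(10, 20, 30))

df <- rename(df, new_column_name = old_column_name) # new name = old name
df <- rename(df, points = 2)                        # rename the 2nd column by position
```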
Option 4: Change the names of all columns
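The setNames() function (from the stats package, which R loads by default) can also assign new names to all the columns at once. It works on any object with names, such as a vector, list, or data frame. It returns a renamed copy rather than modifying the object in place, so you must assign the result back to the original data frame to keep the change.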
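The following sketch shows that setNames() returns a copy, using a hypothetical three-column dataframe:

```r
# Hypothetical dataframe
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)

setNames(df, c("x", "y", "z"))        # prints a renamed copy; df itself is unchanged
unchanged <- names(df)                # still "a", "b", "c"

df <- setNames(df, c("x", "y", "z"))  # assign back to keep the new names
```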
Replace specific values in a column of an R DataFrame
Option 1: By using row and column number
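A minimal sketch of replacing a value at a known position, on a hypothetical dataframe:

```r
# Hypothetical dataframe
df <- data.frame(name = c("Ann", "Bob", "Cy"), age = c(28, 35, 42))

df[2, 2] <- 36     # by row number and column number
df[3, "age"] <- 43 # by row number and column name
```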
Option 2: Using the logical condition
Code Breakdown:
y: the value used to locate the cells to change (the cells in the column that currently equal y)
x: the new value that replaces y in those cells
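Here is a sketch of the logical-condition form. It assumes the general pattern dataframe[dataframe$column == y, "column"] <- x. The city column and its values are hypothetical:

```r
# Hypothetical dataframe
df <- data.frame(city = c("NYC", "LA", "NYC", "SF"), visits = c(3, 5, 2, 4))

y <- "NYC"      # value used to locate the cells
x <- "New York" # replacement value

# Overwrite every cell in "city" that currently equals y
df[df$city == y, "city"] <- x
```

If the column is a factor, x must already be one of its levels. Otherwise R inserts NA and issues a warning. Converting the column with as.character() first avoids this.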
Filter Observations of a dataframe based on observations from another dataframe
Let's say we have a main dataframe with many observations and a smaller dataframe holding some observations of interest. We want to use the small dataframe to filter the main dataframe, and we want to keep only some specific columns from the main dataframe.
Code Breakdown: In this example, main_df.csv is your main dataframe, and filter_df.csv is the csv file with the observations used to filter the main dataframe. column_to_filter is the column in main_df that you want to filter based on the values in filter_df. The columns you want to keep from the filtered dataframe are listed in the c() function in the line selected_df <- filtered_df[, c("column_1", "column_2", "column_3")]. Finally, the selected data is written to a new csv file, filtered_and_selected.csv.
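Here is a runnable sketch of this workflow. It assumes the filter column has the same name (column_to_filter) in both files and uses %in% to keep the matching rows. To stay self-contained, it first writes small hypothetical versions of main_df.csv and filter_df.csv to a temporary directory. With real data, you would pass your own file paths to read.csv():

```r
# Write two small hypothetical CSV files to a temporary directory
tmp_dir     <- tempdir()
main_path   <- file.path(tmp_dir, "main_df.csv")
filter_path <- file.path(tmp_dir, "filter_df.csv")
out_path    <- file.path(tmp_dir, "filtered_and_selected.csv")

write.csv(data.frame(
  column_to_filter = c("A", "B", "C", "D"),
  column_1 = 1:4,
  column_2 = c(10, 20, 30, 40),
  column_3 = c("w", "x", "y", "z"),
  extra    = c(TRUE, FALSE, TRUE, FALSE)
), main_path, row.names = FALSE)
write.csv(data.frame(column_to_filter = c("B", "D")), filter_path, row.names = FALSE)

# Read the main data and the filter values
main_df   <- read.csv(main_path)
filter_df <- read.csv(filter_path)

# Keep rows of main_df whose key appears in filter_df
filtered_df <- main_df[main_df$column_to_filter %in% filter_df$column_to_filter, ]

# Keep only the columns of interest
selected_df <- filtered_df[, c("column_1", "column_2", "column_3")]

# Save the result without the row-name column
write.csv(selected_df, out_path, row.names = FALSE)
```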