# Load Data

## Load CSV Files in R

```r
ObjectName <- read.csv("path-to-file/filename.csv", header = TRUE)
```

{% hint style="info" %}
**Code Breakdown:**

* `read.csv` is used to load **csv** files.
* The `header = TRUE` argument treats the first row as the header, that is, the column names.
  {% endhint %}
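
As a minimal, self-contained sketch of the call above, the following writes a small CSV to a temporary file and reads it back. The file contents and the names `csv_path` and `students` are invented for illustration.

```r
# Write a tiny CSV to a temporary file (illustrative data, not a real dataset)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name", "1,Ana", "2,Ben"), csv_path)

# header = TRUE treats the first row as column names
students <- read.csv(csv_path, header = TRUE)

names(students)  # column names taken from the first row
nrow(students)   # number of data rows
```
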

## Get Data from the URL

**-> Option 1: Directly save as an object.**

Suppose the data is in **csv** format. The `read.csv` function accepts a URL in place of a file path, so it can parse the remote file and return a data frame directly.

{% tabs %}
{% tab title="Code" %}

<pre class="language-r" data-overflow="wrap"><code class="lang-r"><strong>polls &#x3C;- read.csv("https://raw.githubusercontent.com/ds4stats/r-tutorials/master/tidying-data/data/rcp-polls.csv",
</strong>na.strings = "--", as.is = TRUE)
</code></pre>

{% hint style="info" %}
**Code Breakdown:**

* The URL goes inside double quotes as the first argument to `read.csv`, just like a local file path.
* <mark style="color:red;">`na.strings = "--"`</mark> : This dataset denotes missing data as <mark style="color:red;">`--`</mark>, which R does not recognize as missing. This argument converts every <mark style="color:red;">`--`</mark> into <mark style="color:red;">`NA`</mark>.
* <mark style="color:red;">`as.is = TRUE`</mark> : R versions before 4.0.0 convert character columns into factors by default. This argument keeps them as character vectors. From R 4.0.0 onward, that is already the default behavior.
  {% endhint %}
  {% endtab %}
  {% endtabs %}
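
To see what `na.strings` changes, here is a small sketch that uses the `text` argument of `read.csv` with inline data instead of a URL. The column names and values are invented for illustration and are not taken from the polls dataset.

```r
# Inline CSV text standing in for a remote file (illustrative values)
csv_text <- "poll,candidate_a,candidate_b
A,45,--
B,--,41"

# Without na.strings, "--" forces both number columns to be read as text
raw <- read.csv(text = csv_text)
class(raw$candidate_a)

# With na.strings = "--", the marker becomes NA and the columns stay numeric
clean <- read.csv(text = csv_text, na.strings = "--", as.is = TRUE)
class(clean$candidate_a)
is.na(clean$candidate_b)
```

Comparing the two `class()` results shows why the missing-value marker matters: a single unrecognized `--` is enough to stop R from treating a column as numbers.
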

\
**-> Option 2: First save as a file and then load as an object.**

{% tabs %}
{% tab title="Code" %}
{% code overflow="wrap" %}

```r
# Set the URL as an object
url <- "https://raw.githubusercontent.com/ds4stats/r-tutorials/master/tidying-data/data/rcp-polls.csv"

# Download the file
download.file(url, "poll_dataset.csv")

# Load as an object
polls <- read.csv("poll_dataset.csv", header = TRUE, na.strings = "--",
as.is = TRUE)

```

{% endcode %}

{% hint style="info" %}
**Code Breakdown:**

* First, the URL is stored in an object named <mark style="color:red;">`url`</mark>.
* Second, the <mark style="color:red;">`download.file`</mark> function downloads the dataset and saves it as `poll_dataset.csv` in the working directory.
* Third, `read.csv` loads the saved **csv** file as an object.
  {% endhint %}
  {% endtab %}
  {% endtabs %}

## Load TSV Files in R

To load a **TSV** file in R, we can use either the <mark style="color:red;">`read.delim()`</mark> function or the <mark style="color:red;">`read_tsv()`</mark> function from the <mark style="color:red;">`readr`</mark> package.

### Using the <mark style="color:red;">`read.delim()`</mark> function

The <mark style="color:red;">`read.delim()`</mark> function reads delimited text files and uses a tab (<mark style="color:red;">`"\t"`</mark>) as its default separator, so it can read **TSV** files directly. Passing `sep = "\t"` is optional, but it makes the delimiter explicit.

```r
# Load the TSV file
tsv_data <- read.delim("path/to/tsv_file.tsv", sep = "\t")

# Print the head of the data frame
head(tsv_data)

```
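
The sketch below checks that `read.delim()` gives the same result with and without an explicit `sep = "\t"`. It writes a small TSV to a temporary file first, so it runs without any external data; the rows are invented for illustration.

```r
# Write a small tab-separated file to a temporary location (illustrative data)
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("Student_Id\tStudent_Name", "1\tAna", "2\tBen"), tsv_path)

# read.delim() assumes tab-separated input with a header row by default
with_default <- read.delim(tsv_path)

# Passing sep = "\t" explicitly gives the same result
with_explicit <- read.delim(tsv_path, sep = "\t")

identical(with_default, with_explicit)
str(with_default)
```
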

### Using the `read_tsv()` function

The <mark style="color:red;">`read_tsv()`</mark> function is designed specifically for **TSV** files. It is typically faster than <mark style="color:red;">`read.delim()`</mark>, especially on large files.

To use the <mark style="color:red;">`read_tsv()`</mark> function, you need to install the `readr` package first.

```r
# Install the readr package
install.packages("readr")

# Load the readr package
library(readr)

# Load the TSV file
tsv_data <- read_tsv("path/to/tsv_file.tsv")

# Print the head of the data frame
head(tsv_data)

```

{% hint style="info" %}
The two methods differ mainly in what they return. The <mark style="color:red;">`read_tsv()`</mark> function returns a tibble and reports the type it inferred for each column (for example, `Student_Id`: double, `Student_Name`: character), whereas <mark style="color:red;">`read.delim()`</mark> returns a base data frame without reporting column types.

For large TSV files, <mark style="color:red;">`read_tsv()`</mark> is also typically the faster of the two.
{% endhint %}
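
The difference can be checked on a small file. The following sketch assumes `readr` is installed (version 2.0.0 or later for the `show_col_types` argument) and uses invented rows written to a temporary file.

```r
library(readr)

# Small tab-separated file in a temporary location (illustrative data)
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("Student_Id\tStudent_Name", "1\tAna", "2\tBen"), tsv_path)

base_df <- read.delim(tsv_path)
readr_df <- read_tsv(tsv_path, show_col_types = FALSE)

class(base_df)   # a plain data.frame
class(readr_df)  # a tibble, which also inherits from data.frame

# Column types that readr inferred for each column
spec(readr_df)
```
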

## Load Excel Files in R

To load an <mark style="color:red;">**`.xlsx`**</mark> file in R, you can use the <mark style="color:red;">`read_excel()`</mark> function from the <mark style="color:red;">`readxl`</mark> package.

```r
# Load the readxl package
library("readxl")

# Read the XLSX file
xlsx_data <- read_excel("data.xlsx")

# Print the head of the data frame to see the first few rows of data
head(xlsx_data)
```

**Loading a specific sheet from an XLSX file**

If you want to load a specific sheet from an XLSX file, you can use the <mark style="color:red;">`sheet`</mark> argument to the <mark style="color:red;">`read_excel()`</mark> function.

For example, to load the sheet named "<mark style="color:red;">`Sheet1`</mark>" from the XLSX file <mark style="color:red;">`data.xlsx`</mark>, you would use the following code:

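```r
# Load one sheet by name
xlsx_data <- read_excel("data.xlsx", sheet = "Sheet1")

# read_excel() reads one sheet per call, so loop over the names to load several
sheet_names <- c("Sheet1", "Sheet2")
xlsx_list <- lapply(sheet_names, function(s) read_excel("data.xlsx", sheet = s))
names(xlsx_list) <- sheet_names
```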
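
When the sheet names are not known in advance, `excel_sheets()` from `readxl` can list them first. The sketch below is one possible approach, not the only one. As an assumption for runnability, it uses the `datasets.xlsx` workbook bundled with `readxl` (located with `readxl_example()`) rather than `data.xlsx`.

```r
library(readxl)

# Assumption: use a workbook bundled with readxl instead of "data.xlsx"
path <- readxl_example("datasets.xlsx")

# List the sheet names in the workbook
sheet_names <- excel_sheets(path)

# Read every sheet into a named list of data frames
all_sheets <- lapply(sheet_names, function(s) read_excel(path, sheet = s))
names(all_sheets) <- sheet_names

names(all_sheets)
```

Each sheet is then available by name, for example `all_sheets[[sheet_names[1]]]`.
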

