Data can be loaded into R using read.csv(), which reads a comma-separated values (CSV) file into a data frame. The file name must be given as a quoted string.
DataFrameName <- read.csv("MyDataFile.csv")
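As a minimal, self-contained sketch of the call above, the following writes a small made-up CSV to a temporary file and reads it back. The file contents and the names tmp_file and Routes are illustrative assumptions, not part of the original data.

```r
# Write a tiny, made-up CSV to a temporary file so the example is self-contained.
tmp_file <- tempfile(fileext = ".csv")
writeLines(c("Route,Month,Journeys",
             "X1,April,120",
             "X2,May,95"),
           tmp_file)

# The file name must be a quoted string (or a variable holding one).
Routes <- read.csv(tmp_file)

# str() summarises the column names and types that read.csv() inferred.
str(Routes)
```

By default read.csv() treats the first line of the file as the column headers.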
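The tidyverse collection of packages provides many useful functions for manipulating data. The functions below are handy for inspecting a data frame; head() and tail() are part of base R, while the lowercase view() comes with the tidyverse (the base R equivalent is View()).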
head(DataFrameName) – Shows the first 6 rows of the data
tail(DataFrameName) – Shows the last 6 rows of the data
view(DataFrameName) – Shows all rows in tabular format
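The inspection functions above can be tried on a small data frame. This is only a sketch: the data frame Example is made up for illustration, and view() is left out because it opens a viewer window and so is best run interactively.

```r
# A made-up data frame with 10 rows, used only for illustration.
Example <- data.frame(id = 1:10, value = (1:10)^2)

head(Example)         # first 6 rows (the default)
tail(Example)         # last 6 rows (the default)
head(Example, n = 3)  # the n argument changes how many rows are shown
```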
Elements of a data frame can be addressed using square brackets or the dollar sign.
DataFrameName[1 , 3] – Shows the value in the third column of the first row
DataFrameName[ , 3] – Shows the values in the third column for all rows
DataFrameName$ColumnName – Shows the values in the column ColumnName for all rows
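A short sketch of the three addressing forms, using a made-up data frame (the name Example and its values are assumptions for illustration only):

```r
# Made-up data, used only to demonstrate indexing.
Example <- data.frame(
  Route    = c("X1", "X2", "X3"),
  Month    = c("April", "May", "June"),
  Journeys = c(120, 95, 143)
)

Example[1, 3]      # single value: row 1, column 3
Example[, 3]       # every row of column 3, returned as a vector
Example$Journeys   # the same column, addressed by name
```

Square brackets take the row first and the column second; leaving either position empty selects all rows or all columns.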
With the tidyverse loaded, R supports UNIX-style piping, using %>% to represent the pipe. The code below installs the tidyverse package (that line is commented out because it only needs to be run once per installation of R, after which the package is available to all scripts), loads tidyverse into the session using require(), and then loads the data from a CSV file into a data frame called BusData using read.csv(). It then pipes the data through the select() function to keep just 4 columns:
Financial.year – The financial year being reported on (April to March in this case)
Month – The month within that financial year being reported on
Total.Bus.Patronage.per.month..Concessionary...Non.Concessionary. – The total number of bus journeys taken in that month (both fare-paying and free concessionary)
Free.Concessionary.per.month – The number of journeys where a free concessionary bus pass was used
The reduced dataset is then filtered to only return those rows for the Financial Year 2008 (April 2008 to March 2009).
# install.packages("tidyverse")
require("tidyverse")
BusData <- read.csv("rawBusPatronage.csv")
BusData %>%
  select(Financial.year,
         Month,
         Total.Bus.Patronage.per.month..Concessionary...Non.Concessionary.,
         Free.Concessionary.per.month) %>%
  filter(Financial.year == 2008)
This is the equivalent of the SQL statement (some column names changed to be legal in common RDBMSes):
select FinancialYear
     , Month
     , TotalMonthlyBusJournies
     , TotalMonthlyConcessionaryJournies
from BusData
where FinancialYear = 2008;
The arrange() function could be used to sort the output.
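A minimal sketch of sorting with arrange(), assuming the dplyr package (part of the tidyverse) is installed. The data frame BusSample and its figures are made up for illustration and are not taken from the real bus data; only the column names follow the ones used above.

```r
library(dplyr)  # provides filter(), arrange(), desc() and the %>% pipe

# Made-up figures purely for illustration; not the real patronage data.
BusSample <- data.frame(
  Financial.year = c(2007, 2008, 2008, 2008),
  Month = c("March", "April", "May", "June"),
  Free.Concessionary.per.month = c(410, 455, 430, 470)
)

Sorted <- BusSample %>%
  filter(Financial.year == 2008) %>%
  # desc() sorts highest first; omit it for ascending order.
  arrange(desc(Free.Concessionary.per.month))

Sorted
```

In SQL terms this roughly corresponds to adding an ORDER BY clause to the end of the statement.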