Data can be loaded into R using read.csv(), which reads a comma-separated values (CSV) file into a data frame. The file name must be given as a quoted string.
DataFrameName <- read.csv("MyDataFile.csv")
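As a minimal sketch of what read.csv() does, the snippet below writes a tiny, made-up CSV to a temporary file and reads it back. The file contents and the names csv_path and Example are assumptions for illustration only, not part of the bus data set.

```r
# Write a small, made-up CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Financial year,Month,Journeys",
             "2008,April,100",
             "2008,May,120"), csv_path)

# The file name is passed as a string (here held in a variable)
Example <- read.csv(csv_path)

# By default read.csv() makes column names syntactically valid,
# so the space in "Financial year" becomes a dot
names(Example)
str(Example)
```

This renaming is why the column names used later in R code contain dots where the original spreadsheet headers had spaces or punctuation.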
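The tidyverse package provides a lot of useful functions for manipulating data. For a quick look at a data frame, head() and tail() are part of base R, while view() comes with tidyverse (base R has the equivalent View()).
head(DataFrameName)– Shows the first 6 rows of the data (the default)
tail(DataFrameName)– Shows the last 6 rows of the data (the default)
view(DataFrameName)– Shows all rows in a tabular viewer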
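The inspection functions above can be tried on mtcars, a data set that ships with R (used here only as a stand-in, since it needs no file). The n argument overrides the default of 6 rows.

```r
# mtcars is a built-in example data frame, used here purely for illustration
FirstRows <- head(mtcars)          # default: first 6 rows
FirstThree <- head(mtcars, n = 3)  # first 3 rows only
LastTwo <- tail(mtcars, n = 2)     # last 2 rows only

print(FirstThree)
print(LastTwo)

# View(mtcars) would open the full table in a viewer window;
# it is commented out because it needs an interactive session
# View(mtcars)
```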
Elements of a data frame can be addressed using square brackets, in the form [row, column], or the dollar sign.
DataFrameName[1 , 3]– Shows the value in the third column of the first row
DataFrameName[ , 3]– Shows the values in the third column for all rows
DataFrameName$ColumnName– Shows the values in the column ColumnName for all rows
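The addressing forms above can be sketched on a small data frame. Scores and its columns are hypothetical names invented for this example.

```r
# A small, made-up data frame for illustration
Scores <- data.frame(
  Name  = c("Ann", "Bob", "Cy"),
  Age   = c(31, 45, 27),
  Score = c(88, 92, 79)
)

Scores[1, 3]        # row 1, column 3: the single value 88
Scores[ , 3]        # the whole third column, returned as a vector
Scores$Score        # the same column, addressed by name
Scores[ , "Score"]  # column names also work inside the brackets
Scores[2, ]         # leaving the column blank returns the whole second row
```

Leaving the row or column position blank means "all rows" or "all columns" respectively.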
R supports UNIX-style piping, using %>% to represent the pipe (the operator comes from the magrittr package, which tidyverse loads). The code below installs the tidyverse package (that line is commented out, as it only needs to be run once per installation of R and is then available to all scripts), loads tidyverse into the session using require(), and then loads the data from a CSV file into a data frame called BusData using read.csv(). It then pipes the data through the select() function to keep just 4 columns:
Financial.year– The financial year being reported on (April to March in this case)
Month– The month within that financial year being reported on
Total.Bus.Patronage.per.month..Concessionary...Non.Concessionary.– The total number of bus journeys taken in that month (both fare paying and free concessionary)
Free.Concessionary.per.month– The number of journeys where a free concessionary bus pass was used
The reduced dataset is then filtered to only return those rows for the Financial Year 2008 (April 2008 to March 2009).
# install.packages("tidyverse")
require("tidyverse")
BusData <- read.csv("rawBusPatronage.csv")
BusData %>%
  select(Financial.year, Month, Total.Bus.Patronage.per.month..Concessionary...Non.Concessionary., Free.Concessionary.per.month) %>%
  filter(Financial.year == 2008)
This is the equivalent of the SQL statement (some column names changed to be legal in common RDBMSes):
select FinancialYear, Month, TotalMonthlyBusJournies, TotalMonthlyConcessionaryJournies
from BusData
where FinancialYear = 2008;
The data set used here is based on an open data set downloaded from Birmingham Data Factory and then manipulated in Excel.
The arrange() function could be added to the end of the pipeline to sort the output by one or more columns.
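A minimal sketch of sorting with arrange() is shown below. It assumes the dplyr package (part of tidyverse) is installed, and uses a made-up data frame, BusDemo, with illustrative values rather than the real bus data.

```r
library(dplyr)  # part of tidyverse; provides select(), filter(), arrange() and %>%

# Made-up stand-in for the bus data (values are illustrative only)
BusDemo <- data.frame(
  Financial.year = c(2007, 2008, 2008, 2008),
  Month          = c("March", "June", "April", "May"),
  Journeys       = c(90, 130, 100, 120)
)

Sorted <- BusDemo %>%
  select(Financial.year, Month, Journeys) %>%
  filter(Financial.year == 2008) %>%
  arrange(desc(Journeys))  # largest first; drop desc() for ascending order

print(Sorted)
```

In SQL terms, arrange() plays roughly the role of an ORDER BY clause, with desc() corresponding to DESC.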