Data can be loaded into R using read.csv(), which reads a comma-separated values (CSV) file into a dataframe. The file name must be given as a quoted string.
DataFrameName <- read.csv("MyDataFile.csv")
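As a minimal sketch of that round trip, the snippet below writes a small made-up table to a temporary CSV file and reads it back. The column names and values are purely illustrative and are not taken from the bus data set used later.

```r
# Write a tiny, made-up table to a temporary CSV file (illustrative data only)
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(Month = c("April", "May"), Journeys = c(120, 135)),
  csv_path,
  row.names = FALSE
)

# read.csv() takes the file path as a quoted string and returns a data frame
DataFrameName <- read.csv(csv_path)
str(DataFrameName)
```

Using tempfile() keeps the example from leaving files in the working directory; with a real file you would pass its path directly.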
The tidyverse package provides many useful functions for manipulating data.
head(DataFrameName)
– Shows the first 6 rows of the data
tail(DataFrameName)
– Shows the last 6 rows of the data
view(DataFrameName)
– Shows all rows in tabular format (view() is available once tidyverse is loaded; base R spells it View())
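A quick way to try these out without a file is to use mtcars, a data set that ships with R. This is only a sketch; the second argument shown for tail() is optional and overrides the default of 6 rows.

```r
# mtcars is bundled with R, so no CSV file is needed for this sketch
first_rows <- head(mtcars)      # first 6 rows by default
last_rows  <- tail(mtcars, 3)   # the row count can be set explicitly

nrow(first_rows)
nrow(last_rows)
```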
Elements of a dataframe can be addressed using square brackets or the dollar sign.
DataFrameName[1 , 3]
– Shows the value in the third column of the first row
DataFrameName[ , 3]
– Shows the values in the third column for all rows
DataFrameName$ColumnName
– Shows the values in the column ColumnName for all rows
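The three addressing forms can be compared side by side on a small dataframe. The dataframe df and its columns below are made up for illustration.

```r
# Small made-up dataframe (values are illustrative only)
df <- data.frame(
  Year     = c(2008, 2008, 2009),
  Month    = c("April", "May", "April"),
  Journeys = c(120, 135, 128)
)

df[1, 3]      # single value: row 1, column 3
df[, 3]       # the whole third column, returned as a vector
df$Journeys   # the same column, addressed by name
```

Addressing by name with $ tends to be more robust than a column number, since it keeps working if the column order changes.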
R supports UNIX-style piping, using %>% to represent the pipe (the %>% operator becomes available once tidyverse is loaded). The code below installs the tidyverse package (that line is commented out, as it only needs to be run once per installation of R, after which the package is available to all scripts), loads tidyverse into the session using require(), and then loads the data from a CSV file into a dataframe called BusData using read.csv(). It then pipes the data through the select() function to keep just 4 columns:
Financial.year
– The financial year being reported on (April to March in this case)
Month
– The month within that financial year being reported on
Total.Bus.Patronage.per.month..Concessionary...Non.Concessionary.
– The total number of bus journeys taken in that month (both fare paying and free concessionary)
Free.Concessionary.per.month
– The number of journeys where a free concessionary bus pass was used
The reduced dataset is then filtered to only return those rows for the Financial Year 2008 (April 2008 to March 2009).
# install.packages("tidyverse")
require("tidyverse")
BusData <- read.csv("rawBusPatronage.csv")
BusData %>%
  select(Financial.year,          # keep only the 4 columns of interest
         Month,
         Total.Bus.Patronage.per.month..Concessionary...Non.Concessionary.,
         Free.Concessionary.per.month) %>%
  filter(Financial.year == 2008)  # keep only rows for financial year 2008
This is equivalent to the following SQL statement (some column names have been changed to be legal in common RDBMSes):
select FinancialYear
, Month
, TotalMonthlyBusJourneys
, TotalMonthlyConcessionaryJourneys
from BusData
where FinancialYear = 2008;
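To make the piping itself more concrete, the sketch below runs the same select-then-filter steps twice, once with %>% and once as ordinary nested function calls, on a made-up dataframe. BusSample and its Journeys column are hypothetical stand-ins, not the real bus data, and dplyr is loaded directly here as the tidyverse component that provides select() and filter().

```r
library(dplyr)

# Made-up stand-in for the bus data (figures are illustrative only)
BusSample <- data.frame(
  Financial.year = c(2008, 2008, 2009),
  Month          = c("April", "May", "April"),
  Journeys       = c(120, 135, 128)
)

# Piped form: each step receives the result of the previous one
piped <- BusSample %>%
  select(Financial.year, Month) %>%
  filter(Financial.year == 2008)

# Nested form: the same steps, read from the inside out
nested <- filter(select(BusSample, Financial.year, Month), Financial.year == 2008)

all.equal(piped, nested)
```

The piped version reads top to bottom in the order the steps happen, which is the main reason to prefer it once a pipeline grows beyond two or three steps.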
The data set used here is based on an open data set downloaded from Birmingham Data Factory and then manipulated in Excel.
The arrange() function could be used to sort the output.
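As a rough sketch of how that might look, the pipeline below adds an arrange() step to sort the filtered rows. The dataframe and its Journeys column are made up for illustration (the real column names are much longer), and desc() is the dplyr helper for descending order.

```r
library(dplyr)

# Made-up stand-in for BusData; the figures are illustrative, not real
BusData <- data.frame(
  Financial.year = c(2008, 2008, 2008, 2009),
  Month          = c("April", "May", "June", "April"),
  Journeys       = c(120, 135, 110, 128)
)

SortedData <- BusData %>%
  filter(Financial.year == 2008) %>%
  arrange(desc(Journeys))   # largest monthly total first

SortedData
```

Dropping desc() sorts in ascending order, which mirrors ORDER BY in the SQL version.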