R You Ready

R – An Introduction
R is a programming language and software environment aimed at data scientists as a tool for computational statistics and visualisation. It has become a popular language and data science tool for finance and data analytics companies. R is part of the open source revolution and has been created and supported entirely by developers and experts worldwide. R has a number of advantages, including: a wide range of data analysis techniques that are free to download, cutting-edge community-reviewed methods, stunning data visualisations, faster results from a manageable programming language, and expert resources.

http://www.revolutionanalytics.com/what-r

R Code School

The best way to get to grips with R is to take the online tutorial ‘Try R Code School’. Though basic, it runs through the primary topics and gets you acquainted with the R programming language. The tutorial is pirate themed, which made the sections enjoyable, and the pirate in-jokes kept me entertained throughout. The seven sections in the tutorial were:
1. Using R
2. Vectors
3. Matrices
4. Summary Statistics
5. Factors
6. Data Frames
7. Real World Data
After completing each section, I was rewarded with a badge and each topic covered the basics to get me started with real world data sets.
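As a rough flavour of what those sections cover, here is a minimal sketch of the core data structures (vectors, matrices, factors and data frames). The values and variable names are made up for illustration and are not taken from the tutorial.

```r
# A vector holds values of a single type
prices <- c(2.10, 2.25, 2.05, 2.40)
mean(prices)

# A matrix arranges values in rows and columns
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)

# A factor stores categorical data with a fixed set of levels
types <- factor(c("gold", "silver", "gold"))
levels(types)

# A data frame combines columns of different types, like a spreadsheet
df <- data.frame(day = 1:4, price = prices)
summary(df$price)
```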

Try R Code School Badges

Analysing the Data
Having previously worked in finance, I have an inherent interest (and experience) in financial analysis and reporting. I decided to use R to take financial data from the Irish Stock Exchange (ISEQ), focusing on Aer Lingus shares over a ten year timeframe.
The first part of my research consisted of finding the most powerful R packages to analyse my data. The most popular packages for extracting financial time series data from internet sources were Quantmod and Quandl. These packages work in a similar vein to a Bloomberg terminal but at no cost. As I was focusing on historic data, I used Quantmod to extract the data. Quandl would be the preferred package when looking at futures.
http://www.r-bloggers.com/quantitative-finance-applications-in-r/
I installed the Quantmod package from the ‘Packages’ dropdown in R and then tested searching for data using ticker symbols related to Aer Lingus shares – AERL.L.
This command queries Google Finance for the ticker symbol ‘AERL.L’ and retrieves any data since 01/Jan/2004. The data is returned as daily records of the Open price, High price, Low price, Close price and Volume traded.

R commands to pull AERL.L data

Now that we have the data set, it is time to analyse it and draw out some interesting information. The first chart I created was a time series showing the share price and the volume traded. It illustrates the shares following an almost U-shaped curve between 2006 and 2015.

Time Series of AERL.L data

We have a large data set giving daily prices of Aer Lingus shares over approximately a ten year period. Many modelling functions expect the data as an xts object, which makes it easy to extract subsets from the date range. This is widely used when extracting, say, monthly or quarterly data for additional analysis or reporting. This functionality is an example of how R can be used more effectively than older analytical tools.
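The kind of date-based subsetting described here can be sketched as follows. This is only an illustration on a made-up price series (the `prices` object and its values are hypothetical, not the Aer Lingus data), and it assumes the xts package is installed.

```r
library(xts)

# Hypothetical daily closing prices, made up for illustration
dates  <- seq(as.Date("2015-01-01"), as.Date("2015-03-31"), by = "day")
prices <- xts(seq(2.00, by = 0.01, length.out = length(dates)), order.by = dates)
colnames(prices) <- "Close"

# Extract a single month using a date string
feb <- prices["2015-02"]

# Extract a range of dates with the "from/to" form
jan_feb <- prices["2015-01/2015-02"]

# Keep only the last observation of each month
# (endpoints() returns a leading 0, which is dropped here)
month_end <- prices[endpoints(prices, on = "months")[-1]]
month_end
```

The same pattern extends to quarterly data by passing `on = "quarters"` to `endpoints()`.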

R commands to create XTS file and view data sets

Using the capabilities of the data set, I want to plot a graph showing the closing price of the shares. This graph is exactly what would be used to present to management and is an excellent representation of the data set.

R Graph to visualise closing prices of Aer Lingus Shares

An interesting analysis is to plot the daily log return of the closing prices. The resulting time series graph shows the visual impact of volatility in the share price. We can see that during the financial crisis (2008-2009) the share price was in flux, as was true of many traded shares at the time. Since 2010, the share price has continued to fluctuate (though at a lesser rate), which would indicate some instability in the company. Based on background research, the likely causes are the recovery in the business since 2010 and the recent speculation of a takeover by International Airlines Group (IAG).
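For readers unfamiliar with log returns, the calculation itself is short. The sketch below uses a made-up vector of closing prices (`close` is hypothetical) and base R only.

```r
# Hypothetical closing prices, made up for illustration
close <- c(2.00, 2.10, 2.05, 2.20, 2.15)

# The log return for day t is log(P_t / P_{t-1}),
# which is the first difference of the log prices
log_ret <- diff(log(close))

# Equivalent explicit form, as a cross-check
log_ret_check <- log(close[-1] / close[-length(close)])
all.equal(log_ret, log_ret_check)

# Log returns are additive: their sum is the log return over the whole period
sum(log_ret)
log(close[length(close)] / close[1])
```

Large positive or negative values in `log_ret` correspond to the spikes that show up as volatility on the chart.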

Closing daily prices (daily log return)

 

Summary statistics of Closing Prices
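The statistics in question are the mean, standard deviation (volatility), skewness and kurtosis of the returns. As a sketch of what those numbers mean, the function below computes them in base R from the central moments. The helper name `return_stats` and the sample returns are hypothetical; I believe the moment-based definitions match what the moments package reports, but treat that as an assumption.

```r
# Moment-based summary statistics for a numeric vector of returns
return_stats <- function(x) {
  n  <- length(x)
  mu <- mean(x)
  m2 <- sum((x - mu)^2) / n   # second central moment
  m3 <- sum((x - mu)^3) / n   # third central moment
  m4 <- sum((x - mu)^4) / n   # fourth central moment
  c(mean     = mu,
    std_dev  = sd(x),          # sample standard deviation (volatility)
    skewness = m3 / m2^1.5,    # 0 for a symmetric distribution
    kurtosis = m4 / m2^2)      # about 3 for a normal distribution
}

# Hypothetical daily log returns, made up for illustration
ret <- c(-0.021, 0.013, 0.004, -0.008, 0.017, -0.002, 0.009)
round(return_stats(ret), 4)
```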

http://www.r-bloggers.com/quantitative-finance-applications-in-r-2/
http://www.r-bloggers.com/quantitative-finance-applications-in-r/

Concepts – If I had more time
It would be extremely difficult to propose a new effective financial model in such a short timeframe. In the above example we are not using indicators, just taking data to determine market direction or trends. The example has shown the power of R at modelling data and presenting it in an excellent visual format. The data is current and can be easily updated from internet sources.
My analysis is limited in the sense that I have taken past data from only one company. An excellent way to enhance the analysis would be to take competitor data and plot the companies against each other. This analysis over time would give an insight into market factors.
Quandl is another package, which looks at futures based on financial data. It would be an excellent tool for creating models and predicting future prices based on this information. Another way to analyse the data trends would be to analyse internet trends and keywords to see if there is a correlation with market movement. R would be able to analyse large data sets over time, and the results could be plotted against the share price chart.
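One simple way to plot companies against each other is to rebase each price series to 100 on the first day, so that shares trading at very different price levels become comparable. The sketch below uses two made-up series (`airline_a` and `airline_b` are hypothetical, not real competitor data).

```r
# Hypothetical closing prices for two airlines, made up for illustration
prices <- data.frame(
  date      = as.Date("2015-01-05") + 0:4,
  airline_a = c(2.00, 2.10, 2.05, 2.20, 2.15),
  airline_b = c(8.00, 8.20, 8.40, 8.10, 8.80)
)

# Rebase a price series so that the first observation equals 100
rebase <- function(p) 100 * p / p[1]

indexed <- data.frame(
  date      = prices$date,
  airline_a = rebase(prices$airline_a),
  airline_b = rebase(prices$airline_b)
)

# Plot both rebased series on the same axes
plot(indexed$date, indexed$airline_a, type = "l", col = "red",
     ylim = range(indexed$airline_a, indexed$airline_b),
     xlab = "Date", ylab = "Price (first day = 100)")
lines(indexed$date, indexed$airline_b, col = "blue")
legend("topleft", legend = c("Airline A", "Airline B"),
       col = c("red", "blue"), lty = 1)
```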
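A first check for such a relationship could be a plain correlation between a weekly search-interest index and weekly returns. Both vectors below are made up for illustration; how the search data would actually be collected is not covered here.

```r
# Hypothetical weekly search-interest index and weekly returns, made up for illustration
search_interest <- c(40, 42, 55, 70, 65, 50, 45, 80)
weekly_return   <- c(0.001, 0.004, 0.012, 0.020, 0.015, 0.006, 0.002, 0.025)

# Pearson correlation between the two series
r <- cor(search_interest, weekly_return)
r

# Significance test for the correlation
test <- cor.test(search_interest, weekly_return)
test$p.value
```

A high correlation on its own would only suggest the two series move together; it would not show that one predicts the other.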

References:

  • Cookbook for R, http://www.cookbook-r.com/ (Accessed: 01 August 2015)
  • Irish Stock Exchange (2015) ‘Market Data’ http://www.ise.ie/Market-Data-Announcements/Companies/Company-data/ (Accessed: 01 August 2015)
  • Playing Financial Data Series(1), Chenangen, (2014) http://www.r-bloggers.com/playing-financial-data-series1/ (Accessed: 03 August 2015)
  • Quantitative Finance Applications in R (Internet Sources), Joseph Rickert (2013) http://www.r-bloggers.com/quantitative-finance-applications-in-r/ (Accessed: 03 August 2015)
  • Quantitative Finance Applications in R (XTS), Joseph Rickert (2014) http://www.r-bloggers.com/quantitative-finance-applications-in-r-2/ (Accessed: 03 August 2015)
  • Revolution Analytics (2015) ‘R is Hot’ http://www.revolutionanalytics.com/whitepaper/r-hot (Accessed: 03 August 2015)
  • Revolution Analytics (2015) ‘What R’ http://www.revolutionanalytics.com/what-r (Accessed: 03 August 2015)
  • Try R Code School (2015) http://tryr.codeschool.com/ (Accessed: 28 July 2015)

R Editor Commands
# Load quantmod, xts and moments
library(quantmod)
library(xts)
library(moments) # to get skewness & kurtosis

# Pull AERL.L data from Google Finance from 01/08/2005 to 14/08/2015
# Note: quantmod has since dropped Google Finance as a data source
getSymbols("AERL.L", src = "google", from = "2005-08-01", to = "2015-08-14")

# Plot a time series chart with Bollinger Bands
Sys.setlocale("LC_TIME", "english")
dev.new()
barChart(AERL.L, theme = "white")
addBBands()

# Check that the ISEQ data is an xts object
is.xts(AERL.L) # returns TRUE

# View the dataset
head(AERL.L)
tail(AERL.L)

# Extract the closing prices as their own xts object
AERL.L.Close <- AERL.L[, "AERL.L.Close"]
is.xts(AERL.L.Close) # returns TRUE
head(AERL.L.Close)

# Plot a graphic profile of the data
plot(AERL.L.Close, main = "Closing Daily Prices for Aer Lingus Shares (AERL.L)",
     col = "red", xlab = "Date", ylab = "Price", major.ticks = "years",
     minor.ticks = FALSE)

# Compute the daily log returns of the closing price and plot the data
AERL.L.ret <- diff(log(AERL.L.Close), lag = 1)
AERL.L.ret <- AERL.L.ret[-1] # drop the leading NA

plot(AERL.L.ret, main = "Closing Daily Prices for Aer Lingus (AERL.L)",
     col = "red", xlab = "Date", ylab = "Return", major.ticks = "years",
     minor.ticks = FALSE)

# Find the Mean, Std Dev (volatility), Skewness & Kurtosis of the returns
statNames <- c("mean", "std dev", "skewness", "kurtosis")
AERL.L.stats <- c(mean(AERL.L.ret), sd(AERL.L.ret),
                  skewness(AERL.L.ret), kurtosis(AERL.L.ret))
names(AERL.L.stats) <- statNames
AERL.L.stats

Google Fusion Table

Irish Population per County (Source: CSO Census Data)

Google Fusion Tables – Visualise your data

Okay so you have gathered some awesome data and you want to impress your boss with some useful information. Now while bar charts have their place, here is a way to make data visually alive. Thankfully there is a useful application which will do the hard work for you, and impress your boss at the same time.

“Google Fusion Tables is an experimental data visualization web application to gather, visualize, and share data tables.”

https://support.google.com/fusiontables/answer/2571232?hl=en

Google Fusion Tables is a web application tool used to create a visual interpretation of data sets. Data tables can be gathered from public data or imported from your own data. The data is then visualised and can be published and shared on the web. There is a real collaborative feel to the application and the information can be communicated to your target audience with ease.
To start using Google Fusion Tables, create a Google account and sign into My Drive. Simply connect Fusion Tables as a new application, for free, and you are ready to begin.

Designing an Irish Population Heat Map
To create a Heat Map of the Irish population by county we needed two specific data tables, namely:
• Population figures by county (.csv file)
• Counties of Ireland data map (.kml file)
There are various ways these can be created, but for this Heat Map the population figures were taken from the most recent CSO census, which was taken in 2011.
http://www.cso.ie/en/statistics/population/populationofeachprovincecountyandcity2011/
The map data was derived from a KML data file containing geometry data on all the counties in the Republic of Ireland. This data was used to plot the county boundaries in Google Maps.
http://www.independent.ie/editorial/test/map_lead.kml

The next step was to cleanse the data, which is important in any data exercise. The data from the CSO population table was converted into an Excel document, and it was noticed that some of the entries were subsets or aggregates which needed to be amended. The ‘State’ and ‘Provinces’ rows were removed and the data for Tipperary North and South was combined into one county. This left 26 counties with a corresponding population figure for each.
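The cleansing was done by hand in Excel, but the same steps could be scripted. The sketch below shows one way to do it in R on a tiny made-up extract; the population figures and the exact row labels are hypothetical and will differ from the real CSO table.

```r
# Hypothetical extract of the census table; the figures are made up for illustration
cso <- data.frame(
  area       = c("State", "Leinster", "Carlow", "Dublin",
                 "Munster", "Tipperary North", "Tipperary South"),
  population = c(1000, 450, 50, 400, 150, 70, 80),
  stringsAsFactors = FALSE
)

# Drop the aggregate rows (State and Provinces), which are not counties
aggregates <- c("State", "Leinster", "Munster", "Connacht", "Ulster")
counties   <- cso[!cso$area %in% aggregates, ]

# Give both Tipperary rows the same county name, then sum per county
tipperary <- counties$area %in% c("Tipperary North", "Tipperary South")
counties$area[tipperary] <- "Tipperary"
clean <- aggregate(population ~ area, data = counties, FUN = sum)
clean
```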

The KML file was imported into Fusion Tables and contained 99 rows in total. This was the geometry data for the counties. This step is very important as the data from the two tables must be compatible or the files will not merge correctly.
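The compatibility requirement comes down to the county names matching exactly in both tables. As an illustration of the idea outside Fusion Tables, the sketch below checks the keys and then merges two small made-up tables in R. The table contents are hypothetical, and the geometry column is a placeholder string rather than real KML.

```r
# Hypothetical population table, made up for illustration
population <- data.frame(
  county     = c("Carlow", "Dublin", "Tipperary"),
  population = c(50, 400, 150),
  stringsAsFactors = FALSE
)

# Hypothetical boundary table: a county can have more than one geometry row
boundaries <- data.frame(
  county   = c("Carlow ", "Dublin", "Tipperary", "Tipperary"),
  geometry = c("polygon-1", "polygon-2", "polygon-3", "polygon-4"),
  stringsAsFactors = FALSE
)

# Normalise the key column so stray whitespace does not break the match
population$county <- trimws(population$county)
boundaries$county <- trimws(boundaries$county)

# Any county names present in one table but not the other would fail to merge
unmatched <- setdiff(boundaries$county, population$county)
unmatched

# Merge on the shared county name; each geometry row picks up its population
merged <- merge(boundaries, population, by = "county")
merged
```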
These tables were uploaded into Google Fusion Tables ready to be ‘Merged’. This is where the power of Fusion Tables comes into its own. The Map file was opened and from File

A new tab was created with the merged data, giving a visual representation of the population of Ireland by county for 2011. At this point, the map needs to be edited to give the Heat Map some visual meaning. It was decided to distribute the counties into six buckets based on population. The buckets were: 0 – 75,000 (6 counties), 75,000 – 100,000 (4), 100,000 – 125,000 (4), 125,000 – 180,000 (6), 180,000 – 250,000 (3) and 250,000 – 1,273,070 (3). Though the counties were not evenly distributed across the buckets, they were easier to distinguish and the map had a clearer visual impact. The counties could instead have been evenly distributed by breaking the Population table into equal-sized sets. Each bucket was given a colour which became incrementally darker as the population increased. A legend was added to give the Heat Map more context when distinguishing the counties’ population numbers.
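The bucketing was configured in the Fusion Tables interface, but the logic can be sketched in R with `cut()`. The bucket boundaries below are the ones listed above; the six population values are made up for illustration, apart from the maximum, which is the upper bound quoted above.

```r
# Bucket boundaries as used for the Heat Map
breaks <- c(0, 75000, 100000, 125000, 180000, 250000, 1273070)

# Hypothetical county populations, one per bucket, made up for illustration
pop <- c(54000, 80000, 110000, 150000, 200000, 1273070)

# Assign each county to a bucket; include.lowest keeps a value of 0 in the first bucket
bucket <- cut(pop, breaks = breaks, include.lowest = TRUE, dig.lab = 7)
table(bucket)

# Alternative: equal-sized buckets, with boundaries taken from the data's quantiles
even_breaks <- quantile(pop, probs = seq(0, 1, length.out = 7))
even <- cut(pop, breaks = even_breaks, include.lowest = TRUE, dig.lab = 7)
table(even)
```

The fixed boundaries give readable round numbers in the legend, while the quantile version guarantees the same number of counties in each colour.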

I have made my data public, which is an important feature of Google Fusion Tables. Anyone can now take my data and use it to carry out further research on population in Ireland.

Irish Population Data in action
The Heat Map of Irish population could be used in a number of interesting ways depending on the data gathered. The CSO website has a number of detailed databases with well-presented data sets on topics including housing, health, education, the labour market, tourism and transport. These could be used at a macro level by the government to decide on future spending requirements in certain areas. The country is experiencing a housing shortage and the government is expected to deliver social housing projects. To identify the areas where social housing is most needed, the government could combine the population data with the number of social housing applicants by area. Plotting these two data sets would give a nationwide Heat Map and identify the areas of greatest need on a more local scale. The KML data would need to be more granular to target specific areas within counties. A well-presented Heat Map would give an excellent representation of specific area shortages and therefore of where funding is most needed.
Taking data from the 2011 CSO Census, another Heat Map was created showing the vacancy rates of housing per county. The Heat Map below shows properties that are left vacant per county. This is another example of using CSO data to present a visual Heat Map when reporting on social issues.

Further Practical uses for Fusion Tables
Google Fusion Tables has a variety of functions for making a visual interpretation of your data. Scene perception studies suggest that people show an increased understanding of pictures based on colour. The most recognisable example is the weather map. News and weather reports present predicted weather patterns and forecasts, and this visual information is used for sea crossings, flood warnings, farming, heatwaves, icy roads and journey planning.
Heat Maps have also been put to excellent use in research on global warming patterns. Predictive maps are powerful when publishing outcomes. Seeing the global warming patterns visually helps with understanding and gives meaning to often complicated data sets.

Conclusion
Google Fusion Tables is an excellent application to present data in a clear visual format. The application is extremely useful for taking geometry data in a KML file and creating a Heat Map using Google Maps. This visual representation, when shared, is an interactive way to present your data to a wider audience. The collaboration element gives the opportunity to enhance data and findings based on original data sources. The application has the potential to give a greater understanding of data sets, in a user friendly visual format.

Bibliography