DBA > Articles

Analyzing a MySQL database with R

By:
To read more DBA articles, visit http://dba.fyicenter.com/article/

In this article, we analyze a MySQL database which contains soccer transfer data, using the R environment, on OS X. We show in a few simple steps how you can link the two together.

Prerequisites
Make sure you have a recent version of MySQL, and the latest version of R installed. As our graphical user interface, we use RStudio. From RStudio you can install the MySQL package: install.packages("RMySQL"). If this doesn’t work for you, please refer to the installation instructions. For plotting, install the ggplot2 library by issueing a install.packages("ggplot2") in RStudio. Alternatively, you can use the graphical interface: in the bottom right panel, choose ‘Packages’ and click ‘Install Packages’. This presents you with a dialog to choose a CRAN mirror, and a prompt where you can enter the package name.

Analysis
Our MySQL database contains a table transactions, which contains transaction data. The transactions table has a column transfer_value of type float, which is what we are interested in. First, we generate a big list of all the transactions: SELECT transfer_value FROM transactions, which generates a MySQL table with a single column that contains the transaction data.

First, let’s make sure the MySQL library is imported:
> library(RMySQL)
Next, we connect to our database, which is named soccer:
> con <- dbConnect(MySQL(), dbname="soccer")
Now we can start to issue queries:
> transfer_values <- dbGetQuery(con, "select transfer_value from transactions")
> summary(transfer_values)
transfer_value
Min. : 5900
1st Qu.: 500000
Median : 1700000
Mean : 3688301
3rd Qu.: 4500000
Max. :94000000

This shows us a summary of the transfer values. The maximum is 94 million, which was Cristiano Ronaldo’s transfer to Real Madrid. To look at how the numbers are distributed, we can plot them in a graph, using the qplot function from the ggplot2 library, which is a convenience function that quickly generates ggplot plot objects.

> library(ggplot2)
> qplot(transfer_values$transfer_value)

This gives us a nice picture, which looks like a Power Law:
If we zoom in on the transfers up to 10 million, we get a more detailed picture, showing peaks at regular intervals: apparently, the soccer negotiators like nice round numbers too.

Full article...


Other Related Articles

... to read more DBA articles, visit http://dba.fyicenter.com/article/