Data mining, predictive analysis, and statistical techniques generally do not make headlines. However, in the last year, the New York Times cited Google’s Chief Economist Hal Varian as predicting a significant demand for statisticians in the next ten years. They also posted an article about the growing popularity of the R language as a tool for data analysis.
R is an open source alternative to statistical software such as SPSS and SAS. It is similar to the S language (developed by John Chambers and others at Bell Laboratories) and is also influenced by Scheme (a major Lisp dialect). It can be used to perform data analysis and visualization through an interactive programming environment.
The Oracle Database includes extensive statistical and analytical functionality. Many features, including Oracle SQL analytic functions and the DBMS_STAT_FUNCS package, are available in standard installations. Oracle Database also has optional packages that complement these capabilities, including Oracle OLAP.
Perhaps the most powerful of all advanced analytical capabilities is the Oracle Data Mining (ODM) option, which provides 12 data mining algorithms for discovering patterns and relationships and building and applying descriptive and predictive data mining models inside the database.
This article will demonstrate the use of Oracle Database 11g Enterprise Edition with the ODM option accessed by an R interface (RODM). ODM allows for analysis of data within the database. This eliminates the time, expense, and resources required to transfer data. The data remains secure in the database resulting in fewer problems due to information latency.
Oracle Database also has a graphical user interface to ODM called Oracle Data Miner “Classic” and a totally new Oracle Data Miner 11g Release 2 workflow GUI, an extension to SQL Developer, that will be available to early adopters soon. And now, for data analysts who are already familiar with the open source R language, there is now another solution: the RODM package.
The RODM package allows R users to interact with the Oracle Database and ODM functionality. Users can analyze and manipulate data without the use of SQL or PL/SQL. Written by Pablo Tamayo and Ari Mozes, it is available for download from the Comprehensive R Archive Network (CRAN). The package depends upon the RODBC package to make Oracle Database connections and do basic data manipulation.
RODM allows R users to access the power of the ODM in-database functions using the familiar R syntax. RODM provides a powerful environment for prototyping data analysis and data mining methodologies. RODM is especially useful for:
* Quick prototyping of vertical or domain-based applications where the Oracle Database supports the application
* Scripting of "production" data mining methodologies
* Customizing graphics of ODM data mining results (examples: classification, regression, anomaly detection)
The RODM interface allows R users to mine data using ODM from the R programming environment. It consists of a set of function wrappers written in source R language that pass data and parameters from the R environment to Oracle Database 11g Enterprise Edition as standard user PL/SQL queries via an Open Database Connectivity (ODBC) interface. The RODM interface code is a thin layer of logic and SQL that calls through an ODBC interface. RODM does not use or expose any Oracle product code because it is completely an external interface and not part of any Oracle product. RODM is similar to the example scripts (for example, the PL/SQL demo code) that illustrate the use of ODM, for example, how to create data mining models, pass arguments, retrieve results, and so on.
If you have not done so already, install the R language on your Microsoft Windows workstation. You can then install any R packages that you would like, including the RODM package.
Please note that you will need to access (or install) an Oracle 11g database with the ODM option to use RODM. If you don't have an installed Oracle database in place and you need to install one from scratch, we strongly recommend that you follow the guidelines in the Oracle Data Mining Administrator's Guide. Oracle Database Express Edition (Oracle Database XE) does not include this option. You will also need a database user with appropriate security and privileges as well as an ODBC connection to the database. This database user should have privileges to connect to the database, create tables, create views, and create mining models.
The following is a script to create such a user. This script (modified for your environment) would need to be run by a DBA or other person with administrative privileges.
CREATE USER dm IDENTIFIED BY
DEFAULT TABLESPACE users
TEMPORARY TABLESPACE temp
QUOTA 20M ON users;
GRANT create mining model TO dm;
GRANT create table TO dm;
GRANT create view TO dm;
GRANT create session TO dm;
ODBC Connection Configuration
The RODBC package is used to connect to Oracle Database. To configure an ODBC Data Source in Microsoft Windows, click the Start Menu and choose Control Panel, Administrative Tools, and Data Sources (ODBC).