Python Data Persistence with Oracle Database

By: Yuli Vasiliev
To read more DBA articles, visit http://dba.fyicenter.com/article/

Learn how to build your own Python app backed by Oracle Database, utilizing Oracle XML DB, Python scripts, and PL/SQL stored procedures.

While Python is quickly gaining popularity among developers, Oracle Database has been No. 1 among enterprise-level databases for a long time. Coupling these two in an efficient way is an interesting topic for discussion, while this actually can be a real challenge, because both have a lot to offer.

Be warned, though; this article is not an overview of the most outstanding Python and Oracle Database features, providing a set of detached examples. Instead, with the help of a single sample, it tries to give you a taste of how these two technologies can be used together in a complementary way. In particular, the article walks you through the creation of an Oracle-backed Python application that implements the business logic in both Python and inside the database, utilizing stored PL/SQL procedures whose invocations are orchestrated from within a Python script.

As you’ll learn in this article, even the lightweight Oracle Database 10g Express Edition (XE) can be efficiently utilized as the database back end for your data-driven Web application whose front-end tier is built with Python. In particular, Oracle Database XE supports Oracle XML DB, a set of Oracle Database XML technologies that are often needed when it comes to building Web applications.

Sample Application
Collecting information about what the user does while using your application is becoming a popular mechanism for receiving user feedback. Often, a click-tracking tool incorporated in your online application can give you much more ideas about users’ preferences than any survey asking users to explicitly express a preference.

Taking a simple example, suppose you want to pick up the headlines of the three latest Oracle Technology Network (OTN) articles from the OTN – New Articles RSS page and place the links on your site. Then you want to collect information about how many times users have followed each of those links on your site. That’s what our sample is going to do. Now let’s try to figure out how all this functionality might be implemented. To start with, you have to decide how business logic will be distributed among application layers. In fact, deciding how to distribute business logic among application tiers is likely the most challenging part of planning a database-driven application. While there is usually more than one way to implement business logic, your job is to find the most effective one. As a general rule of thumb, when planning a database-driven application, you should seriously consider implementing key data-processing logic inside the database. This approach can help you cut down network overhead associated with sending data between the Web server and the database, as well as reduce the burden on the Web server.

Projecting all this theory onto our sample, you might, for example, put the burden of obtaining the article details being inserted into the database, on the stored procedures created in the database, thus saving the Web server the trouble of processing the task related to maintaining the data integrity. What this means in practice is that you won’t need to write the Python code responsible for keeping track of whether a record related to an article whose link is being clicked already exists in your database or not, and if not, inserting that record, obtaining all the required details from the OTN - New Articles RSS page. By letting the database itself keep track of such things, you get a much more scalable and less error-prone solution. In this case, the Python code will be responsible for just obtaining the article links from the RSS page and sending a message to the database when a user clicks an article link.

The rest of this article discusses how to implement the sample app. For a concise description of how to set up and launch the sample, you can refer to the readme.txt file in the root directory of the sample code.

Preparing Your Working Environment
To build the sample discussed here, you need to have the following software components installed (see Downloads portlet) and working properly in your system:
* Apache HTTP Server 2.x
* Oracle Database 10g Express Edition
* Python 2.5 or later
* mod_python module
* cx_Oracle module

For a detailed description of how to install the above components, you can refer to another OTN article, “Build a Rapid Web Development Environment for Python Server Pages and Oracle,” by Przemyslaw Piotrowski.

Designing the Underlying Database
It is generally a good idea to start with designing the underlying database. Provided that you have a user schema created and granted all the privileges needed to create and manipulate schema objects, your first step is to create underlying tables. In this particular case, you will need the only table called otn_articles_rss, created as follows:

CREATE TABLE otn_articles_rss (
guid VARCHAR2(100) PRIMARY KEY,
title VARCHAR2(200),
pubDate VARCHAR2(32),
link VARCHAR2(200),
clicks INTEGER
);

The next step is to design a stored procedure called count_clicks that will be invoked from Python code, updating data in the otn_articles_rss table. Before you can proceed to the count_clicks procedure, though, you have to answer the following question: What will happen when count_clicks tries to update the clicks field of the article record that has not yet been inserted into the otn_articles_rss table? Let’s say that a new item was just added to the RSS page and then a link to that item appeared on your site. When someone clicks the link, the count_clicks PL/SQL procedure is invoked from within the Python code responsible for handling clicks performed on the links to OTN articles. Obviously upon processing the first click, the UPDATE statement being issued within the count_clicks procedure will fail, because there is no row to update yet.

To accommodate such cases, you can implement an IF block in the count_clicks procedure, working if the SQL%NOTFOUND attribute is set to TRUE due to the UPDATE’s failure to find the specified record. Within that IF block, you first can insert a new row into the otn_articles_rss table, providing only the guid and the number of clicks specified. After that, you should commit the changes so that becomes immediately available to the other user sessions that may be already in need of updating the clicks field of the newly inserted article record. Finally, you should update this record, setting its title, pubDate, and link fields. This logic may be implemented as a separate procedure, say, add_article_details, which you can create as follows:

CREATE OR REPLACE PROCEDURE add_article_details (gid VARCHAR2, clks NUMBER) AS
item XMLType;
heading VARCHAR2(200);
published VARCHAR2(32);
url VARCHAR2(200);
BEGIN
SELECT extract(httpuritype.createuri(
'http://feeds.delicious.com/v2/rss/OracleTechnologyNetwork/otntecharticle').getXML(),
'//item[contains(guid, "'||gid||'")>0]')
INTO item FROM DUAL;
SELECT extractValue(item, '//title'),
extractValue(item, '//pubDate'),
extractValue(item, '//link')
INTO heading, published, url FROM DUAL;
UPDATE otn_articles_rss SET
title = heading,
pubDate = published,
link = url,
clicks = clicks + clks
WHERE guid = gid;
END;
/

As you can see, the procedure accepts two arguments. gid is the guid of the article whose link is being clicked. clks is the amount by which to increase the number of total article’s views. In the procedure body, you obtain the required portion of the RSS document as an XMLType instance, and then extract the information that is then immediately used to fill up the otn_articles_rss’ record associated with the RSS item being processed.

With add_article_details in place, you can move on and create the count_clicks procedure as follows:
CREATE OR REPLACE PROCEDURE count_clicks (gid VARCHAR2, clks NUMBER) AS
BEGIN
UPDATE otn_articles_rss SET
clicks = clicks + clks
WHERE guid = gid;
IF SQL%NOTFOUND THEN
INSERT INTO otn_articles_rss(guid, clicks) VALUES(gid, 0);
COMMIT;
add_article_details (gid, clks);
END IF;
COMMIT;
END;
/

Transaction Considerations
In the count_clicks stored procedure shown in the above listing, note the use of COMMIT that follows immediately after the INSERT statement. What’s really important, though, is that it is followed by a call to add_article_details, whose execution may take much time. By committing at this stage, you make the newly inserted article record immediately available for other possible updates, which would otherwise be waiting for the completion of add_article_details.

Consider the following example. Suppose that the RSS page was just updated and a fresh new article link has become available. Next, two different users load your page and click this new link at almost the same time. As a result, two simultaneous calls to count_clicks will be made. In this case, the call that happens first will insert a new record into the otn_articles_rss table, and then it will call add_article_details. While the add_article_details execution is in progress, the other call to count_clicks could successfully perform the update operation, increasing the total click counts. However, if COMMIT were omitted here, the second call would fail to find a row for updating and, therefore, try to perform another insert. In fact, that would not lead to predictable results. What it would lead to, though, is a unique constraint-violated error and the loss of an update to be made by the second count_clicks call.

The most interesting part here is performing another COMMIT operation in the end of the count_clicks procedure body. As you might guess, committing at this stage is required to remove a lock from the record being updated, making it immediately available for updates performed by other sessions. Some may argue that this approach reduces flexibility, depriving the client of the ability to commit or roll back the transaction at its discretion. In this particular case, however, this is not a big issue because the transaction started upon a call to count_clicks should be committed immediately anyway. This is because count_clicks is always called when a user clicks an article link, thus leaving your page.

Building the Front-End Tier
Now that you have the stored procedures created and ready to be used in the application, you have to figure out how to orchestrate the entire flow of operations performed by all those pieces of application logic implemented inside the database, from within the front-end tier. This is where Python comes into play.

Let’s start with a simple implementation. In order to begin, you need to write some Python code that will be responsible for obtaining data from the OTN – New Articles RSS page. Then, you will need to develop code that will process clicks performed on the OTN articles’ links placed on your Web page. Finally, you will need to build that Web page itself. For that, you might use one of Python’s server-side technologies, say, Python Server Pages (PSP), which makes it possible to embed Python code into HTML.

To write Python code, you can use your favorite text editor, such as vi or Notepad. Create a file called oraclepersist.py and then insert the following code there, saving the file where the Python interpreter can find it:

Full article...