More DBA job interview questions and answers at
(Continued from previous question...)
MySQL is Portability
Because all SQL servers implement different parts of SQL, it takes work to write portable SQL applications. For very simple
selects/inserts it is very easy, but the more you need the harder it gets. If you want an application that is fast with many
databases it becomes even harder!
To make a complex application portable you need to choose a number of SQL servers that it should work with.
You can use the MySQL crash-me program/web-page http://www.mysql.com/information/crash-me.php to find functions, types, and
limits you can use with a selection of database servers. Crash-me now tests far from everything possible, but it is still
comprehensive with about 450 things tested.
For example, you shouldn't have column names longer than 18 characters if you want to be able to use Informix or DB2.
Both the MySQL benchmarks and crash-me programs are very database-independent. By taking a look at how we have handled this,
you can get a feeling for what you have to do to write your application database-independent. The benchmarks themselves can
be found in the `sql-bench' directory in the MySQL source distribution. They are written in Perl with DBI database interface
(which solves the access part of the problem).
See http://www.mysql.com/information/benchmarks.html for the results from this benchmark.
As you can see in these results, all databases have some weak points. That is, they have different design compromises that
lead to different behavior.
If you strive for database independence, you need to get a good feeling for each SQL server's bottlenecks. MySQL is VERY
fast in retrieving and updating things, but will have a problem in mixing slow readers/writers on the same table. Oracle, on
the other hand, has a big problem when you try to access rows that you have recently updated (until they are flushed to
disk). Transaction databases in general are not very good at generating summary tables from log tables, as in this case row
locking is almost useless.
To get your application really database-independent, you need to define an easy extendable interface through which you
manipulate your data. As C++ is available on most systems, it makes sense to use a C++ classes interface to the databases.
If you use some specific feature for some database (like the REPLACE command in MySQL), you should code a method for the
other SQL servers to implement the same feature (but slower). With MySQL you can use the /*! */ syntax to add MySQL-specific
keywords to a query. The code inside /**/ will be treated as a comment (ignored) by most other SQL servers.
If REAL high performance is more important than exactness, as in some Web applications, a possibility is to create an
application layer that caches all results to give you even higher performance. By letting old results 'expire' after a
while, you can keep the cache reasonably fresh. This is quite nice in case of extremely high load, in which case you can
dynamically increase the cache and set the expire timeout higher until things get back to normal.
In this case the table creation information should contain information of the initial size of the cache and how often the
table should normally be refreshed.
(Continued on next question...)