MySQL and SQL

Part: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45

(Continued from previous part...)

MySQL - Using Your Own Benchmarks

You should definately benchmark your application and database to find out where the bottlenecks are. By fixing it (or by replacing the bottleneck with a 'dummy module') you can then easily identify the next bottleneck (and so on). Even if the overall performance for your application is sufficient, you should at least make a plan for each bottleneck, and decide how to solve it if someday you really need the extra performance.

For an example of portable benchmark programs, look at the MySQL benchmark suite.
You can take any program from this suite and modify it for your needs. By doing this, you can try different solutions to your problem and test which is really the fastest solution for you.

It is very common that some problems only occur when the system is very heavily loaded. We have had many customers who contact us when they have a (tested) system in production and have encountered load problems. In every one of these cases so far, it has been problems with basic design (table scans are NOT good at high load) or OS/Library issues. Most of this would be a LOT easier to fix if the systems were not already in production.

To avoid problems like this, you should put some effort into benchmarking your whole application under the worst possible load! You can use Sasha's recent hack for this - super-smack. As the name suggests, it can bring your system down to its knees if you ask it, so make sure to use it only on your developement systems.

MySQL - Design Choices

MySQL keeps row data and index data in separate files. Many (almost all) other databases mix row and index data in the same file. We believe that the MySQL choice is better for a very wide range of modern systems.

Another way to store the row data is to keep the information for each column in a separate area (examples are SDBM and Focus). This will cause a performance hit for every query that accesses more than one column. Because this degenerates so quickly when more than one column is accessed, we believe that this model is not good for general purpose databases.

The more common case is that the index and data are stored together (like in Oracle/Sybase et al). In this case you will find the row information at the leaf page of the index. The good thing with this layout is that it, in many cases, depending on how well the index is cached, saves a disk read. The bad things with this layout are:

Table scanning is much slower because you have to read through the indexes to get at the data.
You can't use only the index table to retrieve data for a query.
You lose a lot of space, as you must duplicate indexes from the nodes (as you can't store the row in the nodes).
Deletes will degenerate the table over time (as indexes in nodes are usually not updated on delete).
It's harder to cache ONLY the index data.

MySQL Design Limitations/Tradeoffs

Because MySQL uses extremely fast table locking (multiple readers / single writers) the biggest remaining problem is a mix of a steady stream of inserts and slow selects on the same table.

We believe that for a huge number of systems the extremely fast performance in other cases make this choice a win. This case is usually also possible to solve by having multiple copies of the table, but it takes more effort and hardware.

We are also working on some extensions to solve this problem for some common application niches.

(Continued on next part...)

Part: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45