Getting started with DTracing MySQL

By: Amit Saha
To read more DBA articles, visit http://dba.fyicenter.com/article/

DTrace is a dynamic tracing facility built into the Solaris and Open Solaris operating systems and can be used by system administrators and developers alike to observe the runtime behavior of user level programs and of the operating system itself. On one hand, DTrace can be used to identify potential bottlenecks in the running processes on a production system and on the other DTrace can help you understand the runtime behavior of an external program such as MySQL better

Getting started with DTracing MySQL

Overview of DTrace
DTrace is a dynamic tracing facility built into the Solaris and Open Solaris operating systems and can be used by system administrators and developers alike to observe the runtime behavior of user level programs and of the operating system itself. On one hand, DTrace can be used to identify potential bottlenecks in the running processes on a production system and on the other DTrace can help you understand the runtime behavior of an external program such as MySQL better.

Originally available on Solaris, DTrace has now been ported to Mac OSX, FreeBSD and a experimental Linux port is also available. In this article, I shall use OpenSolaris 2008.11 release to demonstrate the working.

Concepts
The DTrace architecture takes a good look at the various components of the DTrace framework.

The following graphic (reproduced from the DTrace how-to) illustrates the DTrace framework and its various components:

Probes, about which we learn very shortly (and not shown in the figure) can be best visualised as sensors available to be probed by the DTrace consumers in user space.

We shall now learn the basic DTrace concepts which will help us in our quest of some serious playing around with DTrace and MySQL.

Probes, Providers and Consumers
DTrace dynamically modifies the operating system kernel and user processes to record data at locations of interest (or instrumentation points), called probes. The probe is user specified and its specificity and description determine the benefit derived from DTrace. A probe is a location or activity to which the DTrace framework can bind a request to perform actions, such as logging the system calls, the function calls in user level processes, recording a stack trace and so on. A probe is said to fire when the activity, as specified by the probe description takes place. When a probe is fired, the requested action will take place. A probe has the following attributes which identifies it uniquely:
* It has a name and a unique integer identifier
* It is made available by a provider
* It identifies the module and the function that it instruments

The current probes available on your system can be displayed by pfexec dtrace -l . By using various switches, it is also possible to display only the probes belonging to, say a particular module. For eg. pfexec dtrace -l m:mysql* will list all the probes available via the mysql* provider. (Note the * in mysql* denotes all modules with names starting with mysql).

DTrace probes are implemented by kernel modules called providers, each of which performs a particular kind of instrumentation to create probes. Providers can thus described as publishers of probes that can be consumed by DTrace consumers (see below). Providers can be used for instrumenting kernel and user-level code. For user-level code, there are two ways in which probes can be defined- User-Level Statically Defined Tracing (USDT) or PID provider.

In USDT, custom probe points are inserted into application code according to well defined guidelines and practices. Once the custom probe points are integrated and the application code is compiled and the binary is run, the probes become available for consumption by DTrace user level consumers. However, unless they are used, the probes have a zero impact on the performance of the application or the system as a whole.

Does this mean that DTrace cannot be used with applications with no USDT probes defined? No.

The PID provider can be used to probe any user level process, whether USDT probes was defined for it or not. Using the PID provider is a very generic and easy way to play around with DTrace. Code a simple application in your favorite programming language and have fun with DTrace by observing the function call flow, stack trace and lots more. In the later parts of this article, we shall use both of the above for DTracing a running MySQL server.
* Probe descriptions*: A DTrace probe as mentioned above is uniquely specified by a 4-tuple, and usually takes the form:

provider:module:function:name

If one or more of the fields are missing, the specified fields are interpreted in a right to left fashion. That is, if a probe description is given as foo:bar, the probe description matches all probes with the function foo and the name bar, regardless of the provider or the module. To obtain desired results, it is suggested to specify all the required fields. You may also want to match all the probes published by any given provider, for which you would use the probe description like: fbt:::, which matches all the probes of the fbt provider.

A DTrace consumer is any process that interacts with the DTrace framework. The consumer specifies the instrumentation points by specifying probe descriptions. dtrace is the primary consumer of the DTrace framework. (Now, you see the difference between DTrace and dtrace?)

DTrace Scripts or D-scripts
Programs or scripts to interact with the DTrace framework are written in the D programming language. A D program source file consists of one or more probe clauses that describe the points of instrumentation to be enabled by DTrace. Each probe clause has the following form:

probe descriptions
/ predicate /
{
action statements

}

A D program can consist of one or more of such probe clauses. The predicate and the list of action statements are optional and may not be required in some scenarios. D programs are described in detail here. A D program can be executed by specifying it via the '-s' switch to dtrace or making it an executable (like a shell script) and setting 'dtrace' as the interpreter, by #!/usr/sbin/dtrace in the script. Using DTrace with MySQL

As mentioned earlier, there are two ways in which DTrace can be used with any user level process- USDT and PID provider. We shall see demonstrations of both these mechanisms as we start using DTrace with the MySQL server, or specifically 'mysqld'. One thing to note here is that, DTrace one-liners can be used to demonstrate a lot of what we will be doing. But, to make the learning curve easier, we will use D-scripts, however small they may be. Familiarity with the MySQL source code is required to derive the maximum advantage of the rest of the article.

Before I start off with writing D scripts, here are some common points worth noting:

* Like a shell script, you can make a D script executable by a chmod +x and specifying the location of the script interpreter using a #!, like: #!/usr/sbin/dtrace.
* You can specify various switches to dtrace. For eg. you are specifying a D script to dtrace, you will use the -s switch
* The parameters of the function being traced are available to a D script using built in variables: arg0, arg1, arg2... Other built in variables like timestamp, walltimestamp are described here.timestamp gives the current value of a nanosecond counter, which increments from an arbitrary time in the past and is useful for relative time calculation. walltimestamp, the current number of nanoseconds since UTC, is more suited when Date/Time value is required.
* copyinstr is used to copy a the value of a char* type parameter to a variable in your D script. By default, strings upto a maximum size of 256 can be stored. You can change it using #pragma D option strsize=1024
* When monitoring probes for a multi-threaded application, such as mysqld, it is essential that each thread (and its variables) is treated as such. Thread-local variables, denoted using a "self->": makes it possible to prevent corruption of the variables of one thread by another
* DTrace allows usage of clause-local variables. To declare a variable as clause-local, specify it as this->, such as this->bar. As suggested by its name, the scope of a clause-local variable is limited to the probe clause or predicate, in which it is used. For more information on thread-local and clause-local variables, please refer here
* A number of macro variables are available in DTrace. A very commonly used is "$target": which is used in scripts using the PID provider.

Using the PID provider
To use the PID provider, you need to have a mysqld instance running on a (Open)Solaris system. (You won't need any special build of MySQL for this). Please note that, the function names are garbled in a binary. Hence any command, for eg. mysql_parse will not be exactly the same, but will have other text at the starting and the ending. We can use "nm" to see the garbled names:

nm mysqld | grep mysql_parse
"2":1134 | 136472368| 640|FUNC |GLOB |0 |13 |__1cLmysql_parse6FpnDTHD_pkcIp3_v_

Hence, we shall simply use the regex '*' at the beginning and end of the function name in our D scripts.
* Watching queries

#!/usr/sbin/dtrace -q

pid$target::*mysql_parse*:entry /* This probe is fired when the execution enters mysql_parse */
{
printf("Query: %s\n", copyinstr(arg1));
}

Save this script to a file, say watch.d. A D script is specified to dtrace with the -s switch. The Process ID (PID) specified via the switch -p is automatically made available to the $target macro in the D script.

Now, run the D script, watch.d :

$ pfexec dtrace -s watch.d -p `pgrep -x mysqld`

Fire up a MySQL client and run some queries. The D script should display the queries that you executed from the client:

Query: show databases
Query: show variables
Query: show engines
Query: SELECT DATABASE()
Query: show databases
Query: show tables
Query: show tables
Query: select * from foo

* Follow the query execution
A SQL query before execution passes through various other stages, the first of which is the query parsing. The query parsing is taken care of by the mysql_parse function in sql/sql_parse.cc. Since all the other stages, such as query optimisation and finally execution follows from there, using the following script, we can track all the functions that calls and returns from after mysql_parse:

#!/usr/sbin/dtrace
#pragma D option flowindent

pid$target::*mysql_parse*:entry
{
self->start = vtimestamp;

}
pid$target:::entry
/self->start/
{
trace(timestamp);

}

pid$target:::return
/self->start/
{
trace(timestamp);
}
pid$target::*mysql_parse*:return
/self->start/
{

self->start = 0;

}

You will observe an output similar to:


CPU FUNCTION         
                        
 0  -> __1cLmysql_parse6FpnDTHD_pkcIp3_v_           1791629230 
  
 0    -> __1cJlex_start6FpnDTHD__v_                 1791654796 
  
 0      -> __1cSst_select_lex_unitKinit_query6M_v_        1791682536 
  
 0        -> __1cSst_select_lex_nodeKinit_query6M_v_        1791710909 
  
 0        <- __1cSst_select_lex_nodeKinit_query6M_v_        1791731630 
  
 0      <- __1cSst_select_lex_unitKinit_query6M_v_        1791751932
 0      -> __1cSst_select_lex_nodeLinit_select6M_v_        1791776112
 0      <- __1cSst_select_lex_nodeLinit_select6M_v_        1791796307
 0      -> __1cNst_select_lexKinit_query6M_v_        1791821839
 0        -> __1cSst_select_lex_nodeKinit_query6M_v_        1791850872
 0        <- __1cSst_select_lex_nodeKinit_query6M_v_        1791871148
 0        -> __1cJsql_alloc6FI_pv_                  1791900080
 0          -> pthread_getspecific                  1791921125
 .
 .
 .
 0          <- __1cEItemHcleanup6M_v_           27789538249088
 0        <- __1cKItem_identHcleanup6M_v_       27789538281270
 0      <- __1cKItem_fieldHcleanup6M_v_         27789538313437
 0      -> __SLIP.DELETER__Q                    27789538347654
 0      <- __SLIP.DELETER__Q                    27789538382588
 0    <- __1cLQdDuery_arenaKfree_items6M_v_     27789538415179
 0  <- __1cDTHDTcleanup_after_query6M_v_        27789538450969
 0  -> __1cKYacc_state2T5B6M_v_                 27789538486642
 0  <- __1cKYacc_state2T5B6M_v_                 27789538521089
 0  -> __1cQLex_input_stream2T5B6M_v_           27789538556597
 0  <- __1cQLex_input_stream2T5B6M_v_           27789538600602
 0  <- __1cLmysql_parse6FpnDTHD_pkcIp3_v_       27789538637701

For digging into MySQL internals, as above, please refer this guide. You are also advised to refer to Understanding MySQL Internals
* Logging Queries

So, we have watched queries go by, how about capturing them into a file so as to use it for our own file logging purposes? We shall use a DTrace destructive function freopen which redirects all that is written to a standard output into the specified file. (Discussion on the why it is called a destructive function). We are going to snoop on the "dispatch_command": (in sql_parse.cc) function:

#!/usr/sbin/dtrace -qw
#pragma D option strsize=1024
dtrace:::BEGIN
{
freopen("/tmp/sqls");

}
pid$target::*dispatch_command*:entry

{
printf("%Y-> %s \n",walltimestamp,copyinstr(arg2));
}

which gives logs like:

2009 Feb 5 08:13:43-> create table fo_bawr (i INTEGER)
2009 Feb 5 08:13:56-> create table foo_bar (is INTEGER)

Using the embedded static Probes
The PID provider helps us get upto speed really fast when we are learning to DTrace any user level application. It also doesn't need a specially built application binary. However, we need to know the source code of the application really well. A basic level of that knowledge will enable us to write D scripts which are only as good. DTrace static probes in an application partially reduces the need to know the code end-to-end to write useful probes. The reason being, the embedded probes can be easily the highest level of abstraction to the important functionalities that are useful and likely to be monitored for performance considerations. As noted earlier, as long as the static probes are not used, no performance hit is experienced.

Static probes are being gradually integrated into MySQL. As of MySQL 6.0.9, there are around 55 static probes. The probes are defined and documented in the file: sql/probes.d which is a good place to have a look at the currently available probes and understand how to use the probes in your D scripts. The currently available probes are also well described in the MySQL reference manual here.

To enable the static probes, you will have to supply an extra option to the configure script, --enable-dtrace. After the build is over, start mysqld. Now open a terminal, and type $pfexec dtrace -l | grep mysql. You should see something like this:

135 mysql23509 mysqld __1cQdispatch_command6FnTenum_server_command_pnDTHD_pcI_b_ command-done
136 mysql23509 mysqld __1cQdispatch_command6FnTenum_server_command_pnDTHD_pcI_b_ command-start
137 mysql23509 mysqld __1cQclose_connection6FpnDTHD_Ib_v_ connection-done
138 mysql23509 mysqld handle_one_connection connection-start
139 mysql23509 mysqld __1cMmysql_delete6FpnDTHD_pnKTABLE_LIST_pnEItem_pnLst_sql_list_LXb_b_ delete-done
140 mysql23509 mysqld __1cHhandlerNha_delete_row6MpkC_i_ delete-row-done
.
.

As expected, the function names are mangled. You can view the demangled names with 'c++filt':

pfexec dtrace -l -m:mysql* | c++filt

which you will show the demangled functions:

ID PROVIDER MODULE FUNCTION NAME
56209 mysql706 mysqld bool dispatch_command(enum_server_command,THD*,char*,unsigned) command-done
56210 mysql706 mysqld bool dispatch_command(enum_server_command,THD*,char*,unsigned) command-start
56211 mysql706 mysqld void close_connection(THD*,unsigned,bool) connection-done
56212 mysql706 mysqld handle_one_connection connection-start
56213 mysql706 mysqld bool mysql_delete(THD*,TABLE_LIST*,Item*,st_sql_list*,unsigned long,unsigned long long,bool) delete-done
56214 mysql706 mysqld int handler::ha_delete_row(const unsigned char*) delete-row-done
56215 mysql706 mysqld int handler::ha_delete_row(const unsigned char*) delete-row-start
56216 mysql706 mysqld int mysql_execute_command(THD*) delete-start
56217 mysql706 mysqld unsigned long filesort(THD*,TABLE*,st_sort_field*,unsigned,SQL_SELECT*,unsigned long,bool,unsigned long*)filesort-done
56218 mysql706 mysqld unsigned long filesort(THD*,TABLE*,st_sort_field*,unsigned,SQL_SELECT*,unsigned long,bool,unsigned long*) filesort-start
56219 mysql706 mysqld int handler::ha_external_lock(THD*,int) handler-rdlock-done
56220 mysql706 mysqld int handler::ha_external_lock(THD*,int) handler-rdlock-start
56221 mysql706 mysqld int handler::ha_external_lock(THD*,int) handler-unlock-done
56222 mysql706 mysqld int handler::ha_external_lock(THD*,int) handler-unlock-start
56223 mysql706 mysqld int handler::ha_external_lock(THD*,int) handler-wrlock-done
56224 mysql706 mysqld int handler::ha_external_lock(THD*,int) handler-wrlock-start
56225 mysql706 mysqld bool mysql_insert(THD*,TABLE_LIST*,List&,List >&,List&,List&,enum_duplicates,bool) insert-done
56226 mysql706 mysqld int handler::ha_write_row(unsigned char*) insert-row-done
56227 mysql706 mysqld int handler::ha_write_row(unsigned char*) insert-row-start
56228 mysql706 mysqld void select_insert::abort() insert-select-done
56229 mysql706 mysqld bool select_insert::send_eof() insert-select-done
56230 mysql706 mysqld int mysql_execute_command(THD*) insert-select-start
56231 mysql706 mysqld int mysql_execute_command(THD*) insert-start
56232 mysql706 mysqld bool multi_delete::send_data(List&) multi-delete-done
56233 mysql706 mysqld int mysql_execute_command(THD*) multi-delete-start
56234 mysql706 mysqld bool multi_update::send_data(List&) multi-update-done
.
.
.

A better Query Logger

Using one of the static probes, we shall now write a better query logger, containing information such as username and connection id.

#!/usr/sbin/dtrace -qws
#pragma D option strsize=1024

dtrace:::BEGIN
{
printf("Logging to file..Hit Ctrl-C to end.\n");
freopen("/tmp/mysql.log");

}
mysql*:::query-start /* using the mysql provider */
{

self->query = copyinstr(arg0); /* Get the query */
self->connid = arg1; /* Get the connection ID */
self->db = copyinstr(arg2); /* Get the DB name */
self->who = strjoin(copyinstr(arg3),strjoin("@",copyinstr(arg4))); /* Get the username */

printf("%Y\t %20s\t Connection ID: %d \t Database: %s \t Query: %s\n", walltimestamp, self->who ,self->connid, self->db, self->query);
}

which stores logs into a file, /tmp/mysql.log:

2009 Feb 23 15:42:38 root@localhost Connection ID: 3 Database: Query: select user()
2009 Feb 23 15:42:41 root@localhost Connection ID: 3 Database: Query: show engines
2009 Feb 23 15:42:51 root@localhost Connection ID: 3 Database: Query: create table fo_bawr(is INTEGER)
2009 Feb 23 15:42:55 root@localhost Connection ID: 3 Database: Query: SELECT DATABASE()
2009 Feb 23 15:42:55 root@localhost Connection ID: 3 Database: test Query: show databases
2009 Feb 23 15:42:55 root@localhost Connection ID: 3 Database: test Query: show tables
2009 Feb 23 15:42:56 root@localhost Connection ID: 3 Database: test Query: create table fo_bawr(is INTEGER)
2009 Feb 23 15:43:04 root@localhost Connection ID: 3 Database: test Query: create table fo_bawr(i INTEGER)
2009 Feb 23 15:43:08 root@localhost Connection ID: 3 Database: test Query: create table fo_bawr2(i INTEGER)

Counting the bytes per client connection

In the last example, we are going to use the following static probes to write a D script which will give use the number of bytes transferred on a per client connection basis:

* connection__start(unsigned long conn_id, char *user, char *host);: This probe is fired when a new client connects to the server
* command__done(int status): This probe is fired when the client disconnects
* These probes can be used to measure the number of bytes transferred in read and write operations:

probe net__read__done(int status, unsigned long bytes);
probe net__write__start(unsigned long bytes);

The script is as follows:

#!/usr/sbin/dtrace
#pragma D option quiet

dtrace:::BEGIN
{
printf("Tracking the bytes.. Hit Ctrl-C to end.\n");

}

mysql*:::connection-start
{

self->bytes_read = 0;
self->bytes_write = 0;
self->conn_id = arg0;
self->who = strjoin(copyinstr(arg1),strjoin("@",copyinstr(arg2))); /* Get the username */
printf("Got a client connection at %Y from %20s with ID %u\n", walltimestamp, self->who, self->conn_id);
self->client_connect_start = timestamp;

}

mysql*:::net-read-done/* using the mysql provider */
{

self->bytes_read = self->bytes_read + arg1;

}

mysql*:::net-write-start/* using the mysql provider */
{

self->start_w = timestamp;
self->bytes_write= self->bytes_write + arg1;

}

mysql*:::connection-done
{

printf ("Connection with ID: %u closed.\nTotal Bytes transferred: %u \nTotal connection time (ms): %-9d\n\n",self>conn_id, self->bytes_read + self->bytes_write,(timestamp-self->client_connect_start)/1000000 );

}

The script above, reports the data transfer activity as follows:

Tracking the bytes.. Hit Ctrl-C to end.
Got a client connection at 2009 Feb 23 20:07:24 from root@localhost with ID 50
Got a client connection at 2009 Feb 23 20:07:27 from amit@localhost with ID 51
Connection with ID: 50 closed.
Total Bytes transferred: 1081204027
Total connection time (ms): 17650

Connection with ID: 51 closed.
Total Bytes transferred: 3039908614
Total connection time (ms): 23787

Where can DTrace help the MySQL community?

The niche area of the MySQL community which DTrace can breathe life into are the Database Administrators (DBAs) who strive to keep the database in good health at all points. Its easy to identify the performance bottlenecks that might have crept into the server over a period of time. With intelligent probe descriptions, its easy to get relevant statistics of a running MySQL server.

Besides the DBAs, DTrace is a great tool to understand how the control flow occurs in the MySQL server, when it receives a client request, till it serves the request. This makes it very easy to understand all the different sub-components of the MySQL server architecture.

Full article...