Deadlocks – Share the Road?

This past summer, once the vast majority of big application enhancements were in after our massive re-platforming project (SQL Server backend), we were able to get some of the Dev resources focused on day-to-day performance issues, including some frequent occurrences of deadlocks (a few more than I’d like to admit!) and some long running transactions.  While the application had the logic to retry the transaction and processing would continue, this is not something you want to have as a regular occurrence.

This has been an ongoing effort between dev & DBA, and we’ve used several different tools & methods to improve overall system performance:

  • Adding indexes
  • Adding indexes to temp tables within stored procedures
  • Converting large Entity Framework queries to stored procedures
  • Implementing a caching solution (Redis) for a portion of the application
  • Rewriting Stored Procedure & View logic to reflect usage patterns in the application.

With all the tuning we’ve done, we still have one stored procedure which is occasionally involved in deadlocks.  This stored procedure itself has had a lot of care and feeding: I was able to leverage indexes on temp tables for performance, and an application logic rewrite resulted in a 95% decrease in average time for this stored procedure.  Deadlocks were reduced further, but they still occur when we hit a particular race condition.  This has become my white whale.

I was again inspired by Kendra Little’s Troubleshooting Blocking and Deadlocks for Beginners course on https://sqlworkbooks.com/ (even if you are not a beginner, I think she has a great way of looking at concepts that gives me a different perspective on approaching problems).  An example she gives is that a deadlock is like two trucks heading towards each other on a road too narrow for them to pass.  In her example, she created an alternative route for a read query to traverse that was being blocked by an update.

The trick on my deadlock was that it involved the primary key and the clustered key – which on this table happened to be two different keys.  My victim was consistently using the Primary Key for the lookup but then needed to update the Clustered Key.  My killer was using the Clustered Key for both the lookup & the update.  So my thought was: can I tell my stored procedure to use the same path as the killer, based on my knowledge of my data setup?  Share the road and drive in the same order?  (Maybe it’s more like the circular driveway at a hotel, where everyone should enter & exit the same way.)

To illustrate, here is an example table, “OrderItems”.  The yellow column, ItemId, is my Primary Key.  The red columns, OrderID & ItemNumber, are my Clustered Key, on which the table is organized.  In SQL Server, clustered indexes sort and store the data rows in the table based on their key values (see MSDN Clustered and Nonclustered Indexes Described for more details).  Generally the Clustered Index and Primary Key would be the same, but in our application we are referring to the Items in the context of the Orders.

[Screenshot: the OrderItems example table, with ItemId highlighted as the Primary Key and OrderID & ItemNumber highlighted as the Clustered Key]
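
For reference, here is a minimal sketch of how a table like this might be defined.  The column list and data types are assumptions pieced together from the example queries below, not our actual schema:

CREATE TABLE dbo.OrderItems (
    ItemId     INT IDENTITY(1,1) NOT NULL,
    OrderID    INT NOT NULL,
    ItemNumber INT NOT NULL,
    Cost       DECIMAL(10, 2) NULL,
    Taxable    BIT NULL,
    Quantity   INT NULL,
    -- the Primary Key is nonclustered, on ItemId
    CONSTRAINT PK_OrderItems PRIMARY KEY NONCLUSTERED (ItemId)
);

-- the table itself is organized on the Clustered Key (OrderID, ItemNumber)
CREATE CLUSTERED INDEX CIX_OrderItems ON dbo.OrderItems (OrderID, ItemNumber);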

My “victim” example:

UPDATE OrderItems
SET cost = peritemcost
FROM orderItems oi JOIN #temprollup tr on oi.itemid = tr.itemid;

My “killer” example:

UPDATE OrderItems
 SET ItemNumber = @ItemNumber, Cost=@Cost, Taxable= @Taxable, Quantity=@Quantity, ...
WHERE ItemID = @ItemID;

Why is my “killer” using the Clustered Key index vs. the Primary Key for the lookup?  In this case, every column is potentially updated, including the Clustered Key column ItemNumber, so the update goes directly against the clustered index.  My victim uses the Primary Key for its lookup.

In testing, adding the leading column of the Clustered Key (OrderID) to the WHERE clause prompted my stored procedure to use the Clustered Key for the lookup & update.

My “victim” update (one option; another would have been to keep the OrderID column in the temp table and include it in the join, but this was the quickest to illustrate):

UPDATE OrderItems
SET cost = peritemcost
FROM orderItems oi JOIN #temprollup tr on oi.itemid = tr.itemid
WHERE OrderID = @OrderID;

While I need to do further testing & load testing, this looked to be a promising start!  (It also looked like my stored procedure change reduced that step from ~13 ms to ~5 ms, removed a work table and reduced my logical reads from 634 to 588 on my “OrderItems” table on my test set).
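
For reference, per-statement timings and logical reads like the numbers above can be seen with the STATISTICS session settings while testing; a minimal sketch:

-- report logical reads and elapsed time for each statement in this session
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- ... run the statement being tuned here (e.g. the updated "victim" UPDATE above) ...

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;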

Capturing Blocking

Databases take locks in order to maintain consistency among multiple concurrent transactions and to indicate transaction intent.  A potential side effect of these locks is blocking (and in some cases deadlocks!), where transactions wait for other transactions to release a lock resource.  Often we see this show up as slower performance for users.  (Some of this is affected by transaction isolation levels, which would be a whole different post.)
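
As a quick illustration using the OrderItems example from earlier (the values are made up), a minimal sketch of blocking across two sessions:

-- Session 1: take an exclusive lock and leave the transaction open
BEGIN TRANSACTION;
UPDATE OrderItems SET Cost = 9.99 WHERE OrderID = 1 AND ItemNumber = 1;
-- (no COMMIT yet)

-- Session 2: this statement now waits on Session 1's lock (it is blocked)
SELECT Cost FROM OrderItems WHERE OrderID = 1 AND ItemNumber = 1;

-- Session 1: committing releases the lock and lets Session 2 finish
COMMIT TRANSACTION;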

Similar to capturing deadlocks & foreign key errors, blocking can be captured.  I think SQL Server has the leg up on tools here.

MySQL

On the MySQL side, I look for long running transactions, not necessarily only blocking transactions.  Here I use the Percona Monitoring Plugins for Nagios, specifically pmp-check-mysql-innodb, checking all three options: idle_blocker_duration, waiter_count and max_duration.
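
For a quick hand-run check of the same kind of thing, a query against the InnoDB transaction view will show long running transactions; a minimal sketch (the 60-second threshold is just an example):

-- transactions that have been open longer than 60 seconds
SELECT trx_id,
       trx_mysql_thread_id,
       trx_started,
       TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS trx_age_seconds
FROM information_schema.innodb_trx
WHERE TIMESTAMPDIFF(SECOND, trx_started, NOW()) > 60
ORDER BY trx_started;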

SQL Server

I’d like to thank Kendra at sqlworkbooks.com for explaining the (native!) tools in SQL Server to capture blocking in her online training class Troubleshooting Blocking and Deadlocks for Beginners (free through Dec 2017, recommended!).

Unlike the xml_deadlock_report, the blocked_process_report does not appear to be captured by default in the system_health extended event session.  Like deadlocks, a custom Extended Events session can be created & enabled to capture blocked processes, choosing “blocked_process_report” from the event library under Select Events to capture.  Once a blocker is captured, you can double-click to open the report and view information on the resources affected & the (partial) queries.
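
Scripted out, a minimal sketch of that session could look like the below (the session and file names are placeholders).  One note: SQL Server only generates blocked_process_report events once the ‘blocked process threshold’ option is set to a number of seconds.

-- emit a report for any block lasting longer than 20 seconds (the value is an example)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'blocked process threshold', 20;
RECONFIGURE;

-- custom Extended Events session to keep a history of blocked process reports
CREATE EVENT SESSION [Capture_Blocked_Processes] ON SERVER
ADD EVENT sqlserver.blocked_process_report
ADD TARGET package0.event_file (SET filename = N'Capture_Blocked_Processes');

ALTER EVENT SESSION [Capture_Blocked_Processes] ON SERVER STATE = START;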

Additionally, there is a handy open source SQL Server Blocked Process Report Viewer that creates a stored procedure which builds the lead & blocker trees, for viewing the blocked processes in an easier-to-read tree format based on the blockers captured in the Extended Events session.

The extended event captures blocking sessions, but doesn’t notify on occurrence.  A SQL Server Agent alert condition can be set up with @performance_condition=N'General Statistics|Processes blocked||>|0' to notify when a blocking event has occurred.
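
A sketch of what that alert could look like, assuming an operator named ‘DBA Team’ already exists (the alert name is a placeholder, and depending on the instance the counter object may need a SQLServer: or MSSQL$InstanceName: prefix):

-- alert whenever the Processes blocked counter rises above 0
EXEC msdb.dbo.sp_add_alert
    @name = N'Blocked Process Alert',
    @performance_condition = N'General Statistics|Processes blocked||>|0',
    @enabled = 1;

-- send the alert to an existing operator by e-mail
EXEC msdb.dbo.sp_add_notification
    @alert_name = N'Blocked Process Alert',
    @operator_name = N'DBA Team',
    @notification_method = 1;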

 

Capturing Foreign Key Errors

Looking further into capturing certain error conditions, following yesterday’s Capturing Deadlocks post, next up was looking for foreign key errors.

MySQL

As with capturing deadlocks in MySQL, the most recent foreign key error can be viewed in SHOW ENGINE INNODB STATUS.  Likewise, the Percona Toolkit can be used to capture & log foreign key errors using pt-fk-error-logger.  The setup again is similar to capturing deadlocks.

  • Install the Percona toolkit using your standard process
  • For the initial setup, I create a new tools database & tools user for this setup
  • The below block will create the percona.foreign_key_errlog table if it does not exist, and run the logger in daemon mode, checking every 60 seconds.
      pt-fk-error-logger h=<hostname>,u=perconaadm,P=3306 --ask-pass --dest D=percona,t=foreign_key_errlog --create-dest-table --daemonize --interval 60s
    • Additional options notes
      • --ask-pass will cause a prompt for the password to be entered twice; the password will not be visible in the process list.
      • --run-time 5m will run the tool in daemon mode for 5 minutes, then exit.
      • --create-dest-table is optional (if you’ve already created the table!)
  • I have scripted this out to check that the daemon is running.

The logging of the error contains the timestamp of the error and the error message including the constraints violated and the failing statement.
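
Once it’s running, the destination table can be queried directly.  A quick sketch, assuming the default two-column (ts, error) layout that --create-dest-table builds; verify the column names against your version of the toolkit:

-- most recent foreign key errors captured by pt-fk-error-logger
SELECT ts, error
FROM percona.foreign_key_errlog
ORDER BY ts DESC
LIMIT 5;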

SQL Server

I was not able to find a corresponding automatic capture for foreign key errors, so we rely on the application to catch foreign key errors and log them for further review.

Capturing Deadlocks

The last few months I’ve been taking a number of online & in-person SQL Server training classes & meetups; living in New England, there are several SQL Saturdays within a couple of hours’ drive.

Going over the training and some recent patching of my MySQL environment has reaffirmed that there is overlap between SQL Server & MySQL, even if the tools may vary.  Capturing deadlocks, for example!

MySQL

On the MySQL side, the most recent deadlock can be viewed using SHOW ENGINE INNODB STATUS.  The downside is that it only shows the latest deadlock, so depending on the volume of your deadlocks, that gives you no history.  To address this, I use a combination of tools from Percona:

  1. The Deadlock Logger (pt-deadlock-logger) from the Percona Toolkit to capture deadlocks on a regular basis and log them to a table within my database (a sketch for querying that table back out follows this list).
    • Install the toolkit using your standard process
    • For the initial setup, I create a new tools database & tools user for this setup
    • The below block will create the percona.deadlocks table if it does not exist, and run the logger in daemon mode, checking every 60 seconds.
      pt-deadlock-logger h=<hostname>,u=perconaadm --ask-pass --dest D=percona,t=deadlocks  --create-dest-table --daemonize --interval 60s
      • Additional options notes
        • --ask-pass will cause a prompt for the password to be entered twice; the password will not be visible in the process list.
        • --run-time 5m will run the tool in daemon mode for 5 minutes, then exit.
        • --create-dest-table is optional (if you’ve already created the table!)
    • I have scripted this out to check the daemon is running.
  2. The pmp-check-mysql-deadlocks Nagios plugin from Percona Monitoring Plugins to send the team a Nagios alert if a deadlock occurs.
    • Install the plugins using your standard process
    • We have an additional user created with permissions to read the deadlocks table
    • Configure check in Nagios
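
As mentioned above, once pt-deadlock-logger is running, the captured history can be read back out of the destination table.  A quick sketch; the column names below are from the default table that --create-dest-table builds, so verify them against your version of the toolkit:

-- most recent deadlocks captured by pt-deadlock-logger
SELECT ts, `user`, db, tbl, idx, lock_type, lock_mode, victim, query
FROM percona.deadlocks
ORDER BY ts DESC
LIMIT 10;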

SQL Server

On the SQL Server side, deadlocks (xml_deadlock_report) may be captured in the Extended Events system_health session (2012+), but depending on the volume of events, these could be overwritten.  A dedicated Extended Events session capturing xml_deadlock_report can be created to keep a longer history (a sketch follows).
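
A minimal sketch of such a dedicated session (the session and file names are placeholders; the size and rollover values are examples):

-- dedicated session so deadlock graphs aren't lost when system_health rolls over
CREATE EVENT SESSION [Capture_Deadlocks] ON SERVER
ADD EVENT sqlserver.xml_deadlock_report
ADD TARGET package0.event_file (SET filename = N'Capture_Deadlocks', max_file_size = (50), max_rollover_files = (4));

ALTER EVENT SESSION [Capture_Deadlocks] ON SERVER STATE = START;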

View Deadlocks under system_health:

  1. In SQL Server Management Studio, expand Management -> Extended Events -> Sessions -> system_health.
  2. Right click on package0.event_file, choose “View Target Data…”
  3. Click “Filters” on the Extended Events toolbar, and filter for:
    1. Field: Name, Value: xml_deadlock_report
    2. Click OK to apply filter
  4. The deadlock graph is viewable under the Deadlock tab.
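
If you prefer a query to clicking through the GUI, a rough sketch of pulling the deadlock graphs straight out of the system_health files with T-SQL (the file pattern assumes the default event_file target):

-- extract deadlock graphs captured by system_health
SELECT CAST(event_data AS XML).query('(event/data/value/deadlock)[1]') AS deadlock_graph
FROM sys.fn_xe_file_target_read_file('system_health*.xel', NULL, NULL, NULL)
WHERE event_data LIKE '%xml_deadlock_report%';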

A SQL Server Agent alert could be considered to enable alerts when deadlock thresholds are met (@performance_condition=N'Locks|Number of Deadlocks/sec|Database|>|0'), but we use an outside monitoring tool for our notifications.