At Universal Yums we generate ~120,000+ renewal orders on the 1st of every month. Before updating to HPOS we were able to process ~12-15k orders per hour without any action scheduler failures.
After switching to WooCommerce HPOS, ~25% of our renewals failed during our first renewal cycle. The action scheduler logged this error “WordPress database error Deadlock found when trying to get lock; try restarting transaction for query INSERT INTO wc_order_addresses”.
Eventually the action would fail with this error: “Call to a member function get_date_created() on bool”, which happened because the renewal order was never created due to the address table lock.
To complete renewals for that cycle, we wrote a custom script that updated all subscriptions with a failed renewal action back to “active” status (since they had been put “on-hold” at the beginning of renewal), and then rescheduled the renewal action. This worked, but not something we wanted to do every month.
For the next renewal cycle, we decided to reduce the amount concurrent action scheduler batches. Historically, we’ve set this number at 20, which is what our server has been capable of running without too many deadlocks in the actionscheduler_action table. But since the issue was with database writes to the address table, we figured slowing down the process could help. (The action_scheduler_queue_runner_concurrent_batches
filters is what controls this.)
As you might expect, less concurrent batches led to slower order processing and less failures. Here’s some rough benchmarks:
Concurrent batches | Renewals per hour | Failures |
20 | 12,000 | 25% |
10 | 10,000 | 10% |
5 | 7,000 | 5% |
It’s possible if we just ran just a single action scheduler process we wouldn’t have any deadlocks- but that is not viable since it would take several days to run all our renewals. To work around the issue of failures in an automated fashion, we decided to automatically reschedule any actions that failed due to an address lock.
Here’s some example code for how we’re doing this (I am not including all the custom methods, but you can get this gist):
/**
* Handle WP CLI message when the action has failed.
*
* If there was a deadlock on the addresses table let's try to capture that and reschedule the action for later.
* These deadlocks all seemed fail with "Call to a member function get_date_created() on bool" as the error message.
*
* @param int $action_id
* @param Exception $exception
*/
function wp_cli_renewal_action_failed( $action_id, $exception ) {
// Skip if the exception message does not contain "Deadlock" or "get_date_created".
if ( ! str_contains( $exception->getMessage(), 'Deadlock' ) && ! str_contains( $exception->getMessage(), 'Call to a member function get_date_created() on bool' ) ) {
return;
}
// Is there anything in $wpdb->last_error? Let's log it if so, just in case.
// No need to generally log otherwise, as it will still show up in the failed scheduled action logfile.
$this->maybe_log_mysql_error( $action_id, $exception->getMessage() );
$failed_action = $this->get_failed_action_details( $action_id );
if ( ! $failed_action ) {
return;
}
$hook = $failed_action['hook'];
$args = json_decode( $failed_action['args'], true );
// If the hook is not one of the renewal hooks, skip.
if ( ! in_array( $hook, [ 'woocommerce_scheduled_subscription_payment', 'yums_monthly_order_for_annual_subscription' ], true ) ) {
return;
}
// If there's already a scheduled action, skip.
if ( as_has_scheduled_action( $hook, $args ) ) {
return;
}
// Get subscription ID so we can check a few other things.
$subscription_id = $args['subscription_id'] ?? '';
$subscription = wcs_get_subscription( $subscription_id );
if ( ! $subscription ) {
return;
}
// If the subscription has a renewal order for this month already, skip.
if ( SubscriptionHelpers::subscription_has_current_renewal( $subscription ) ) {
return;
}
// If the subscription is on hold, it needs to be set to active again to try to renew.
if ( $subscription->has_status( 'on-hold' ) ) {
$subscription->update_status( 'active', 'Updated subscription from on-hold to active to retry failed renewal action.' );
}
// Reschedule for 1 minute from now.
as_schedule_single_action( time() + MINUTE_IN_SECONDS, $hook, $args );
}
// Try to capture failed action specific to renewals deadlock, and log and reschedule it.
add_action( 'action_scheduler_failed_execution', 'wp_cli_renewal_action_failed', 11, 2 );
This doesn’t feel like an ideal long term solution, but as a medium term workaround it should be okay.
Additional Technical Notes
The core issue is the INSERT INTO ... ON DUPLICATE KEY UPDATE
on the wc_order_addresses
table when there are multiple scheduled actions running. It’s the equivalent of checking if a record with a unique key present in the values exists in the table, and if so, UPDATE the existing record instead of INSERT a new record. Even if there is no conflicting data in the table, we believe the query engine still does a check if there is a duplicate key first.
This ON DUPLICATE KEY
check is what appears to be locking the table. The size of our address table (23+ million rows) in addition to the large amount of order records we’re attempting to at once is perhaps unique to our site, but we’ve been communication with the WooCommerce team are hoping for a better solution at some point in the future.