Saturday 9 February 2013

Tuple Mover Mergeout, Replay Delete & Locks

Tuple Mover operation Mergeout is the process of consolidating ROS containers and purging deleted records. These ROS containers (called ROSs are actually files that are subset of Vertica's Read Optimized Store) are created by bulk loads and DML operations to the data stored within a projection. The Tuple Mover periodically merges ROS containers to maximize performance.

Mergeout works in two phases. First phase takes a U (Usage) lock that doesn't conflict with any other lock type request. Second phase takes a T (Tuple Mover) lock, which conflicts with transactions that require an X (Exclusive lock needed for delete operations) or O (Owner lock needed for Drop/Truncate) lock.
X & O locks are not allowed if the object already has a T lock (as a matter of fact O lock is not compatible with any lock and X lock is compatible only with U lock)

In such scenario, X & O lock requests would fail with an error like below:

ERROR: Locking failure: Timed out X locking Table:<<TABLE NAME>>. 
T held by [user <user name> (Mergeout: (Table: <<SCHEMA NAME.TABLE NAME>>) 
(Projection: <<PROJECTION NAME>>))]. Your current transaction isolation level is SERIALIZABLE

One common reason for an extended T lock duration is when Mergeout is doing the Replay Delete operation. Replay delete is the process of reconciling deleted records to determine if they should or shouldn't be included in to the new ROS containers (resulting from merging). This reconciliation is based on configuration parameter "HistoryRetentionTime", and it determines how long deleted data is saved (in seconds) as a historical reference. TM purges the records deleted prior to HistoryRetentionTime and records post this time are reorganized in new ROS containers. Replay delete is a rather slow operation in Vertica, since Vertica primarily is optimized for query intensive workloads.

But if a TM Mergeout is stuck in replay delete, you either have the option to wait for it to complete, or you will have to stop it to allow other operations to complete. To cancel a mergeout, we need to take the following steps:

1. Query table TUPLE_MOVER_OPERATIONS to get the associated SESSION_ID:

vsql=> select SESSION_ID from TUPLE_MOVER_OPERATIONS where IS_EXECUTING = True and PLAN_TYPE = 'Replay Delete';

2. Once we have the session id, we can use function CLOSE_SESSION to close the session and hence mergeout. e.g.

vsql=> select CLOSE_SESSION('myhost-12990:0x51c31');

(PS: Long running Replay delete can be avoided if we advance the AHM (function MAKE_AHM_NOW()) after any bulk deletes, this will make sure that Tuple Mover purges the deleted records rather than having to reorganize it.)

No comments:

Post a Comment