HBase row locks

A couple of days ago, I was thinking about the problem of distinguishing between create and update of rows in HBase, which is something we would like to do in Lily. By coincidence, there were at the same time two threads on the HBase mailing list discussing this problem and a similar one.

Besides distinguishing between create and update, you are sometimes in a situation where you want to read a record, check what is in it, and update the record depending on what you found. The problem is that between the time you read the row and the time you perform the update, someone else might have updated the row, so your update might be based on outdated information.

The Bigtable paper, which served as inspiration for HBase, also mentions this ability:

Bigtable supports single-row transactions, which can be used to perform atomic read-modify-write sequences on data stored under a single row key.

HBase offers the ability to take a lock on a row, which is all you need for single-row transactions (updates to a single row are always atomic), though using row locks is not without danger. But let us first return to my original problem.

Distinguishing between create and update

HBase makes no distinction between creating a row and updating a row: both are done with the HTable.put() method. Along the same lines, retrieving a row consists of retrieving all the key-value pairs associated with its row key. If there are none, you get an empty Result object back, rather than null or a not-found exception. This also implies that it is impossible to insert a row in HBase that contains no values. A row exists only by virtue of its key-value pairs.

So to distinguish between create and update of a row, we can do the following: get the row and check whether the result is empty. If it is not empty, the row already exists (throw an exception); if it is empty, we can go on to create the row.
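To see why this check-then-create sequence needs protection, here is a small in-memory sketch in plain Java (no HBase involved; the map is a hypothetical stand-in for a table). It forces two clients to interleave badly: both read the row, both see it empty, and both go on to "create" it. The CyclicBarrier exists only to make that unlucky interleaving deterministic.

```java
import java.util.Map;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class CheckThenCreateRace {
    // Hypothetical stand-in for an HBase table: row key -> value.
    static final Map<String, String> table = new ConcurrentHashMap<>();

    /** Runs two "clients" through get-check-put and returns how many believed they created the row. */
    static int countCreators() throws Exception {
        table.clear();
        CyclicBarrier bothHaveRead = new CyclicBarrier(2);
        AtomicInteger creators = new AtomicInteger();
        ExecutorService clients = Executors.newFixedThreadPool(2);
        try {
            Callable<Void> client = () -> {
                boolean exists = table.containsKey("fookey");   // step 1: get the row
                bothHaveRead.await();                           // wait until the other client has read too
                if (!exists) {                                  // step 2: the row looked empty
                    table.put("fookey", Thread.currentThread().getName()); // step 3: "create" it
                    creators.incrementAndGet();
                }
                return null;
            };
            Future<Void> a = clients.submit(client);
            Future<Void> b = clients.submit(client);
            a.get();
            b.get();
        } finally {
            clients.shutdown();
        }
        return creators.get();
    }

    public static void main(String[] args) throws Exception {
        // prints "clients that think they created the row: 2"
        System.out.println("clients that think they created the row: " + countCreators());
    }
}
```

Both clients pass the existence check, so the second put silently overwrites the first, which is exactly the kind of lost create that a lock around the get and put is meant to prevent.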

As already mentioned, this solution is not safe when multiple clients work concurrently. A solution is to lock the row. But the row might not exist yet! No problem: HBase allows you to take a lock on any row key, regardless of whether the row exists (that is, whether there are key-value pairs stored for it). In fact, internally, HBase always takes a lock when doing a get or put operation. For completeness, I should mention that these locks are exclusive, so they block both read and write operations on the row.

The code below illustrates how this is done. It uses HBaseTestingUtility, an easy way to launch a real HDFS and HBase setup as part of a test case.

import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class CreateTest {
    private static final HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();

    @BeforeClass
    public static void setUpBeforeClass() throws Exception {
        // Start an in-process mini cluster (HDFS and HBase) for the test
        TEST_UTIL.startMiniCluster();
    }

    @AfterClass
    public static void tearDownAfterClass() throws Exception {
        TEST_UTIL.shutdownMiniCluster();
    }

    @Test
    public void test() throws Exception {
        // Create a table with a single column family
        HBaseAdmin hbaseAdmin = new HBaseAdmin(TEST_UTIL.getConfiguration());
        HTableDescriptor table = new HTableDescriptor("CreateTestTable");
        HColumnDescriptor family = new HColumnDescriptor("Family");
        table.addFamily(family);
        hbaseAdmin.createTable(table);

        HTable htable = new HTable(TEST_UTIL.getConfiguration(), "CreateTestTable");

        byte[] rowkey = Bytes.toBytes("fookey");

        // Take a lock, no one else can do something with this row while we have the lock
        RowLock rowLock = htable.lockRow(rowkey);

        try {
            // Check if the record exists
            Get get = new Get(rowkey, rowLock);
            Result result = htable.get(get);
            if (!result.isEmpty()) {
                throw new Exception("Row with this key already exists.");
            }

            // Row does not exist yet, create it (the put passes the same lock)
            Put put = new Put(rowkey, rowLock);
            put.add(Bytes.toBytes("Family"), Bytes.toBytes("Qualifier"), Bytes.toBytes("Value"));
            htable.put(put);
        } finally {
            // Always release the lock; otherwise it is only freed when it expires
            htable.unlockRow(rowLock);
        }
    }
}

Problems with row locks

Here is the response from Ryan R. to the above solution:

I would strongly discourage people from building on top of
lockRow/unlockRow.  The problem is if a row is not available, lockRow
will hold a responder thread and you can end up with a deadlock
because the lock holder won’t be able to unlock.  Sure the expiry
system kicks in, but 60 seconds is kind of infinity in database terms

I would probably go with either ICV or CAS to build the tools you
want.  With CAS you can accomplish a lot of things locking
accomplishes, but more efficiently.

This message is a bit dense. The unlock problem Ryan is referring to is a kind of distributed deadlock. Imagine the following: an HBase region server can serve up to, say, 50 requests concurrently, because it has a fixed pool of handler threads. A client connects to the region server and takes a lock on a row. After this, 50 other clients decide they also want to take a lock on this row (or on another locked row). Since the row is already locked, these calls block until the lock is released (or until it expires), and all 50 handlers are now occupied. When the original client that obtained the lock wants to unlock the row, its request cannot be served, because no handler is free!
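To make this scenario concrete, the sketch below simulates it with plain Java concurrency primitives rather than HBase: a fixed thread pool plays the region server's request handlers, a Semaphore plays the row lock, and a timeout plays the lock lease (60 seconds according to Ryan's message). The pool size of 2 and the 500 ms wait are arbitrary, hypothetical values chosen to keep the run short.

```java
import java.util.concurrent.*;

public class HandlerStarvation {
    /**
     * Simulates a region server with a tiny, fixed pool of request handlers and
     * returns how long (ms) the lock holder's unlock request waited for a free handler.
     */
    static long unlockDelayMillis() throws Exception {
        final int handlerCount = 2;        // hypothetical; real region servers have more handlers
        final long lockWaitMillis = 500;   // stands in for the (much longer) lock expiry
        ExecutorService handlers = Executors.newFixedThreadPool(handlerCount);
        // HBase identifies a row lock by a lock id, not by a thread, so a Semaphore
        // (which any thread may release) models it better than a ReentrantLock.
        Semaphore rowLock = new Semaphore(1);
        try {
            // Client A: lockRow() succeeds right away, and its handler becomes free again.
            Callable<Void> lockRow = () -> { rowLock.acquire(); return null; };
            handlers.submit(lockRow).get();

            // Other clients: lockRow() on the same row blocks, each one occupying a handler.
            Callable<Boolean> blockedLockRow =
                    () -> rowLock.tryAcquire(lockWaitMillis, TimeUnit.MILLISECONDS);
            for (int i = 0; i < handlerCount; i++) {
                handlers.submit(blockedLockRow);
            }

            // Client A now wants to unlock, but its request queues behind the blocked ones.
            long submitted = System.nanoTime();
            Callable<Long> unlockRow = () -> { rowLock.release(); return System.nanoTime(); };
            long ranAt = handlers.submit(unlockRow).get();
            return TimeUnit.NANOSECONDS.toMillis(ranAt - submitted);
        } finally {
            handlers.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("unlock waited ~" + unlockDelayMillis() + " ms for a free handler");
    }
}
```

The unlock itself is trivial, yet it only runs once one of the blocked lock requests gives up; with a 60-second lease instead of 500 ms, the lock holder would be stuck for a full minute.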

As alternatives, he drops a couple of TLAs (three-letter acronyms): ICV and CAS. What are these?

ICV refers to the HTable.incrementColumnValue() method. As the name implies, it is a way of atomically incrementing a column value without having to take a lock. If the row or column does not exist, the operation does not fail but behaves as if the value was 0. See this message for some creative thinking on how this can be used to distinguish create from update, though it is not foolproof, as mentioned later in that thread.
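The basic idea can be modelled without a cluster. The sketch below uses a hypothetical in-memory class (not the HBase API) that mimics the semantics described above: increments are atomic and a missing cell counts as 0. A client increments a per-row "created" counter, and only the client that gets 1 back treats itself as the creator.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class IncrementCreateCheck {
    // Hypothetical stand-in for counter cells, keyed by "row/column".
    private final ConcurrentHashMap<String, Long> cells = new ConcurrentHashMap<>();

    /** Atomically adds amount to a cell, treating a missing cell as 0; returns the new value. */
    long increment(String row, String column, long amount) {
        return cells.merge(row + "/" + column, amount, Long::sum);
    }

    /** True only for the single caller whose increment took the counter from 0 to 1. */
    boolean tryClaimCreate(String row) {
        return increment(row, "created", 1) == 1;
    }

    /** Lets n concurrent clients race to create the same row; returns how many "won". */
    static int countCreators(int n) throws InterruptedException {
        IncrementCreateCheck table = new IncrementCreateCheck();
        AtomicInteger creators = new AtomicInteger();
        ExecutorService clients = Executors.newFixedThreadPool(n);
        for (int i = 0; i < n; i++) {
            clients.submit(() -> {
                if (table.tryClaimCreate("fookey")) {
                    creators.incrementAndGet();
                }
            });
        }
        clients.shutdown();
        clients.awaitTermination(10, TimeUnit.SECONDS);
        return creators.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("creators: " + countCreators(8)); // prints "creators: 1"
    }
}
```

Note that in this sketch a claimed row stays claimed even if the claiming client fails before writing its actual data, so the counter only records who incremented first, not whether the row was fully written.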

CAS refers to the HTable.checkAndPut() method, the HBase variant of check-and-set (compare-and-swap). It allows you to conditionally update a row, the condition being that a certain column of the row has a certain value. If the row does not exist, the put will also go through. This makes it possible to implement read-modify-write scenarios using optimistic concurrency control (OCC), e.g. by checking whether a counter that you increment on every update still has the same value as when you read the record. But when the check fails, you will have to try again (and again…).
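A read-modify-write loop built on such a conditional put could look like the sketch below. The in-memory single-row "table" and its checkAndPut-style method are hypothetical stand-ins for HBase (the real call is HTable.checkAndPut()); what the sketch shows is the OCC pattern itself: read, compute, write only if the version counter is unchanged, and otherwise retry.

```java
import java.util.concurrent.*;

public class OptimisticUpdate {
    /** Immutable snapshot of a row: a version counter plus one data value. */
    static final class Row {
        final long version;
        final long value;
        Row(long version, long value) { this.version = version; this.value = value; }
    }

    // Hypothetical single-row "table"; the synchronized methods play the server-side atomic ops.
    private Row row = new Row(0, 0);

    synchronized Row get() { return row; }

    /** Stores newRow only if the stored version still equals expectedVersion. */
    synchronized boolean checkAndPut(long expectedVersion, Row newRow) {
        if (row.version != expectedVersion) {
            return false;
        }
        row = newRow;
        return true;
    }

    /** Read-modify-write with OCC: retries until the conditional write wins; returns attempts used. */
    int addOptimistically(long delta) {
        int attempts = 0;
        while (true) {
            attempts++;
            Row current = get();                                                // read
            Row updated = new Row(current.version + 1, current.value + delta);  // modify, bump version
            if (checkAndPut(current.version, updated)) {                        // write if unchanged
                return attempts;
            }
            // Someone else updated the row in between: try again.
        }
    }

    static long concurrentAdds(int clients, int addsPerClient) throws InterruptedException {
        OptimisticUpdate table = new OptimisticUpdate();
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        for (int c = 0; c < clients; c++) {
            pool.submit(() -> {
                for (int i = 0; i < addsPerClient; i++) {
                    table.addOptimistically(1);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return table.get().value;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(concurrentAdds(4, 1000)); // prints 4000: no update is lost
    }
}
```

The version field plays the role of the counter incremented on every update; under contention, addOptimistically() simply takes more attempts rather than blocking anyone.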

Internally, the ICV and CAS operations are also implemented with locks. In this situation, the lock is no problem, since this code runs locally in the region server (as mentioned before, every get or put also locks the row internally). So the problem with locks really arises when they are used in a distributed way, with the lock held across client round trips.
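To illustrate why a lock is harmless when it never leaves the server, here is a simplified sketch of how a conditional put can be implemented server-side: the per-row lock is taken and released within a single local method call, in a try/finally, so a slow or disconnected client can never leave it dangling. This is a hypothetical in-memory model with one value per row, not HBase's actual region code; the "null expected value means the row must be empty" convention is an assumption of the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class LocalRowLocks {
    // Hypothetical region: row key -> value (a single column, to keep things short).
    private final Map<String, byte[]> rows = new ConcurrentHashMap<>();
    // One lock per row key, created on demand, also for rows that do not exist yet.
    // (Simplified: locks are never cleaned up.)
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String row) {
        return locks.computeIfAbsent(row, k -> new ReentrantLock());
    }

    /** Stores newValue if the current value equals expected (null expected: row must be empty). */
    boolean checkAndPut(String row, byte[] expected, byte[] newValue) {
        ReentrantLock lock = lockFor(row);
        lock.lock();
        try {
            byte[] current = rows.get(row);
            boolean matches = (expected == null) ? current == null : Arrays.equals(expected, current);
            if (matches) {
                rows.put(row, newValue);
            }
            return matches;
        } finally {
            lock.unlock();   // released before the result is returned to the caller
        }
    }

    byte[] get(String row) {
        return rows.get(row);
    }

    public static void main(String[] args) {
        LocalRowLocks region = new LocalRowLocks();
        byte[] v1 = "v1".getBytes(StandardCharsets.UTF_8);
        byte[] v2 = "v2".getBytes(StandardCharsets.UTF_8);
        System.out.println(region.checkAndPut("fookey", null, v1)); // true: row created
        System.out.println(region.checkAndPut("fookey", null, v2)); // false: row already exists
    }
}
```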

From this, it follows that another solution would be to move your code from the client to the region server, somewhat like stored procedures run inside a SQL database. This could be done today by subclassing the region server, but in the future coprocessors (HBASE-2000, presentation) should make this much more practical.

Finally, to return to the original problem of distinguishing between creates and updates: using row locks in our situation seems like an okay solution, as the row keys are in many cases GUIDs, in which case the chance of anyone else taking a lock on such a row is practically zero.
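For reference, a random (version 4) UUID carries 122 random bits, which is why two clients independently picking the same key is so unlikely. A minimal sketch of turning a java.util.UUID into a compact 16-byte row key follows; the helper names are hypothetical, and this is not necessarily how Lily encodes its keys.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidRowKeys {
    /** Encodes a UUID as a 16-byte row key (most significant 64 bits first, big-endian). */
    static byte[] toRowKey(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    /** Decodes a 16-byte row key back into a UUID. */
    static UUID fromRowKey(byte[] key) {
        ByteBuffer buf = ByteBuffer.wrap(key);
        return new UUID(buf.getLong(), buf.getLong());
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        byte[] key = toRowKey(id);
        // prints "16 bytes, round trip ok: true"
        System.out.println(key.length + " bytes, round trip ok: " + id.equals(fromRowKey(key)));
    }
}
```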

Update: row locks do not survive region moves and splits

Since I wrote this post, I have learned about another problem with row locks: they do not survive region splits or moves. For example, suppose the following sequence of events: you take a lock on a row, then HBase decides to split the region, and after that you do a put request that uses the lock. In that case, HBase will throw an exception saying that the lock is unknown. Since region splits and moves are not exceptional, you can be certain to run into this situation, so be prepared for it (or better, don’t use HBase row locks).
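If you do use row locks, one way to be prepared is to treat a lost lock like a failed optimistic update and redo the whole lock/read/put sequence from scratch, since anything read under the old lock can no longer be trusted. The sketch below shows such a retry wrapper. LockLostException is a hypothetical placeholder for whatever exception your client surfaces when the lock is unknown, and the simulated operation fails once on purpose.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryOnLostLock {
    /** Hypothetical placeholder for the "unknown lock" error raised after a split or move. */
    static class LockLostException extends Exception {
        LockLostException(String msg) { super(msg); }
    }

    /**
     * Runs the complete lock/check/put/unlock sequence, restarting it from the
     * beginning (including the read) whenever the lock turns out to be lost.
     */
    static <T> T withFreshLock(Callable<T> lockReadPutUnlock, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        LockLostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return lockReadPutUnlock.call();
            } catch (LockLostException e) {
                last = e;   // region was split or moved: the lock is gone, start over
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        String result = withFreshLock(() -> {
            // Simulated operation: the first attempt loses its lock, the second succeeds.
            if (calls.incrementAndGet() == 1) {
                throw new LockLostException("lock unknown after region split");
            }
            return "created";
        }, 3);
        System.out.println(result + " after " + calls.get() + " attempts"); // prints "created after 2 attempts"
    }
}
```

Retrying is only safe because each attempt re-reads the row under its new lock; blindly re-sending the old put would reintroduce the stale-read problem that the lock was supposed to solve.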