Stale data in today's Invalidation technique

One may not implement strong consistency by simply deleting the key-value pairs impacted by an update to the RDBMS. This is because race conditions between concurrent read and write sessions may cause the system to produce stale data. Facebook recognized this in its NSDI 2013 publication and introduced the concept of leases to prevent certain race conditions. However, this does not remove stale data all together due to the use of snapshot isolation, see below. In our experiments using the BG benchmark (bgbenchmark.org), we observe the following amount of unpredictable data with different social graphs and mix of write actions.

System Load Invalidate Refresh Incremental Update
1 session 0% 0% 0%
Low, 10 concurrent sessions 0.5% 0% 0.01%
Moderate, 100 concurrent sessions 1.1% 1.4% 0.2%
High, 200 concurrent sessions 1.3% 1.8% 2.9%
Table 1. Percentage of unpredictable data with Twemcache. These percentages are reduced to zero with the IQ framework.

The following figure shows how snapshot isolation causes the system to produce stale data. In this figure, a write session updates the RDMBS and deletes its impacted key from the cache. The read session observes a cache miss, and queries the database to compute a value that it inserts in the cache. This query references data that is in the process of being updated by the write session. Snapshot isolation provides the query with a version of the database that is prior to the execution of the update command by the write session. This causes the read session to compute a stale value that is inserts into the cache. Once the write session commits, the value in the cache no longer reflects the state of the tabular data in the RDBMS. This causes subsequent references for the key to observe a stale value.

inv-stale-data
Figure 1. Race condition causes stale data due to snapshot isolation
IQ framework

The IQ framework provides strong consistency by using two leases: inhibit (I) and quarantine (Q). The I lease is granted to a read session when it observes a get miss. A write session must (1) acquire Q leases on all keys that it intends to update prior to committing its RDBMS transactions, (2) either invalidate or refresh the impacted keys after its RDBMS transaction commits, and (3) release its Q leases explicitly. The QaReg and QaRead commands of the IQ-Twemcached implement Step 1 with invalidate and refresh, respectively. The DaR and SaR commands of the IQ-Twemcached implement Steps 2 and 3 for invalidate and refresh, respectively.

A read session must issue IQGet(key) to retrieve the value of a key from the cache. If a value exists then the session is provided with this value. Otherwise, the session has observed a cache miss. In this case, IQ-Twemcached grants an I lease to the read session (assuming there are no pending I and Q leases on the referenced key) by generating a unique token that it provided to the read session. This token is maintained by the IQ-Client and is transparent to the application. The read session may proceed to query the RDBMS to compute a value for the key and insert it using IQSet(key,value). The IQ-Client processed this command by extending it with the token that identifies the lease and invoking IQSet(key,value,token) command of IQTwemcached. IQTwemcached will insert the value as long as the token identifies a valid lease.

A write session must quarantine all keys that it intends to update prior to commiting its RDBMS transaction. It does so using the QaReg command with invalidate and the QaRead command with refresh. Once its RDBMS transaction commits, it may proceed to either delete the impacted keys (with invalidate) using the DaR command or modify them (with refresh) using the SaR command. Note that both DaR and SaR release the obtained Q lease.

The following figure shows the impact of the I and Q leases on the race condition described in the previous section. Note that the IQGet command of the read session conflicts with the Q lease of the write session, forcing it to back off and try again. This back off and repeated attempt is supported by the IQ-Client and is transparent to the application. The IQGet succeeds once the write session issues its DaR command that releases its Q lease. At that point, the IQGet command of the read session succeeds in acquiring the I lease to report a cache miss. This serializes the read session to occur after the write session, forcing it to observe the RDBMS updates of the write session when computing a value for its key. The resulting value is consistent with the state of the database and its insertion in IQ-Twemcached will not result in stale data for subsequent IQGet commands that reference the same key.

images/iq-framework-prevents-race-condition.png
Figure 2. IQ framework prevents race conditions that cause stale data due to snapshot isolation