Here’s a surprising behavior of PostgreSQL you might not know about: under READ
COMMITTED isolation, PostgreSQL can sometimes miss rows when performing
UPDATE, DELETE, SELECT FOR UPDATE, or SELECT FOR SHARE statements. This is due
to the EvalPlanQual recheck PostgreSQL adds to these statements to prevent
lost-update anomalies.
For CockroachDB’s new implementation of READ COMMITTED isolation, we considered
building our own version of EvalPlanQual, but decided instead to use a different
technique, one that doesn’t miss rows. By not missing rows, CockroachDB
alleviates the need for application-level retries.
What is EvalPlanQual?
Under READ COMMITTED isolation, PostgreSQL adds a special recheck step to
UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE statements. This
recheck is known as EvalPlanQual, and consists of a re-evaluation
of part of the statement after all qualifying rows are locked. PostgreSQL uses
EvalPlanQual to prevent lost updates under READ COMMITTED isolation.
To make this concrete, let’s walk through an example.
An example of conflicting updates in PostgreSQL and CockroachDB
Suppose we’re using SQL to organize a 3-on-3 basketball league. One of the
tables in our schema holds information about every player in the league,
including (a) their ID, (b) their name, (c) their skill level, and (d) their
team assignment if they have one.
CREATE TABLE player (
id integer NOT NULL,
name text NOT NULL,
level text NOT NULL,
team text,
PRIMARY KEY (id)
);
INSERT INTO player VALUES
(1, 'Gray', 'A', 'Dolphins'),
(2, 'Mohan', 'A', 'Dolphins'),
(3, 'Stonebreaker', 'A', 'Dolphins'),
(4, 'Lamport', 'A', 'Gophers'),
(5, 'Ullman', 'A', 'Gophers'),
(6, 'Lynch', 'A', 'Gophers'),
(7, 'Bernstein', 'AA', 'Elephants'),
(8, 'Liskov', 'AA', 'Elephants'),
(9, 'Codd', 'AA', 'Elephants');
When modifying this table, we’ll try to maintain two invariants:
teams must have exactly 3 players
all players on a team must have the same skill level
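As a quick sanity check, a query along these lines (a hypothetical helper, not part of the original post) returns any team that violates either invariant, and should return no rows whenever the table is consistent:
SELECT team,
       count(*) AS players,
       count(DISTINCT level) AS levels
FROM player
WHERE team IS NOT NULL
GROUP BY team
HAVING count(*) <> 3 OR count(DISTINCT level) > 1;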
After a wildly successful season, we decide to move all of the Gophers up to
skill level AA. We use a single UPDATE statement in an implicit transaction,
which should atomically modify all three players and thus maintain our
invariants.
UPDATE player SET level = 'AA' WHERE team = 'Gophers';
(But to make the conflict easier to demonstrate, let’s sleep for 5 seconds at
the beginning of the update.)
WITH sleep AS (SELECT pg_sleep(5))
UPDATE player SET level = 'AA' FROM sleep WHERE team = 'Gophers';
While that update is running, the Gophers and Dolphins decide to exchange
players 3 and 4 in a trade. Again, we use a single UPDATE statement in an
implicit transaction. This should atomically swap the skill level and team of
the two players and thus, again, maintain our invariants.
UPDATE player p1 SET (level, team) =
(SELECT level, team FROM player p2
WHERE p2.id IN (3, 4) AND p2.id != p1.id)
WHERE p1.id IN (3, 4);
The second update succeeds immediately. What happens to the first update? Let’s
consider the result by both database and isolation level.
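One way to reproduce the race (a sketch, assuming two interactive SQL sessions connected to the same database, with READ COMMITTED available and set as the session default) is to start the sleeping update in the first session and issue the trade from the second session during the 5-second window:
-- Session 1: start the slow update first.
SET default_transaction_isolation = 'read committed';
WITH sleep AS (SELECT pg_sleep(5))
UPDATE player SET level = 'AA' FROM sleep WHERE team = 'Gophers';

-- Session 2: run the trade while session 1 is still sleeping.
SET default_transaction_isolation = 'read committed';
UPDATE player p1 SET (level, team) =
(SELECT level, team FROM player p2
WHERE p2.id IN (3, 4) AND p2.id != p1.id)
WHERE p1.id IN (3, 4);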
PostgreSQL’s result
In PostgreSQL, under SERIALIZABLE isolation, the first update fails with a
“could not serialize” error. During execution, PostgreSQL detects that another
transaction has concurrently modified players 3 and 4, and aborts.
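The failure surfaces as PostgreSQL’s usual serialization error, roughly like this (exact wording may vary by version):
ERROR:  could not serialize access due to concurrent update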
Under READ COMMITTED isolation, the first update succeeds, but the result is
anomalous: player 3 is now on the Gophers but is still at level A. Only two
of the Gophers were moved up to level AA.
michae2=# SELECT * FROM player ORDER BY id;
 id |     name     | level |   team
----+--------------+-------+-----------
  1 | Gray         | A     | Dolphins
  2 | Mohan        | A     | Dolphins
  3 | Stonebreaker | A     | Gophers
  4 | Lamport      | A     | Dolphins
  5 | Ullman       | AA    | Gophers
  6 | Lynch        | AA    | Gophers
  7 | Bernstein    | AA    | Elephants
  8 | Liskov       | AA    | Elephants
  9 | Codd         | AA    | Elephants
(9 rows)
CockroachDB’s result
In CockroachDB, under both SERIALIZABLE and READ COMMITTED isolation, the first
update succeeds and our invariants hold: all three Gophers, including player 3,
are moved up to level AA.
demo@127.0.0.1:26257/demoapp/defaultdb> SELECT * FROM player ORDER BY id;
  id |     name     | level |   team
-----+--------------+-------+------------
   1 | Gray         | A     | Dolphins
   2 | Mohan        | A     | Dolphins
   3 | Stonebreaker | AA    | Gophers
   4 | Lamport      | A     | Dolphins
   5 | Ullman       | AA    | Gophers
   6 | Lynch        | AA    | Gophers
   7 | Bernstein    | AA    | Elephants
   8 | Liskov       | AA    | Elephants
   9 | Codd         | AA    | Elephants
(9 rows)
A closer look at PostgreSQL’s UPDATE under READ COMMITTED isolation
First, let’s look at EXPLAIN in PostgreSQL. The plan for the UPDATE statement
seems quite simple: a scan of the table followed by a filter.
michae2=# EXPLAIN UPDATE player SET level = 'AA' WHERE team = 'Gophers';
                          QUERY PLAN
--------------------------------------------------------------
 Update on player
   ->  Seq Scan on player  (cost=0.00..1.11 rows=3 width=38)
         Filter: (team = 'Gophers'::text)
But that Update on player operation hides a lot of complexity. We can think
of PostgreSQL executing the update under READ COMMITTED isolation in roughly
seven steps.
1. Establish a read snapshot for the statement. This snapshot is before the
second update begins.
2. Scan player at the read snapshot. (Seq Scan in the plan.)
3. Filter on team = 'Gophers'. (Filter in the plan.) Players 4, 5, and 6
qualify since the read snapshot is before the second update.
4. Lock all qualifying rows. (Part of Update in the plan.)
5. Re-read the latest committed version of all locked rows. (Seq Scan in
the plan again.) This picks up the change to player 4.
6. Re-run the filter on the latest committed versions. (Filter in the plan
again.) Now only players 5 and 6 qualify.
7. Write a new version of each qualifying row. (Update in the plan.)
Steps 5 and 6 are PostgreSQL’s strategy for preventing lost updates under
READ COMMITTED isolation. After locking all qualifying rows, PostgreSQL
re-evaluates the query steps a second time on the latest version of each
locked row. This is the EvalPlanQual recheck. By doing this recheck,
PostgreSQL picks up any changes made to those rows between reading them and
locking them, which in this case correctly prevents modifying the skill level
of player 4. Unfortunately, because PostgreSQL only rechecks the locked rows,
it misses the change to player 3. We were expecting at least one of those
two players to move up to skill level AA, but neither did.
This behavior is documented, but it can still cause
surprises.
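To make the violation explicit, grouping the anomalous table by team (a hypothetical check, not from the original post) shows the Gophers spanning two skill levels:
SELECT team, count(DISTINCT level) AS levels
FROM player
WHERE team = 'Gophers'
GROUP BY team;
-- With the READ COMMITTED result above this returns levels = 2:
-- player 3 is still at level A while players 5 and 6 are at AA.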
A closer look at CockroachDB’s UPDATE under READ COMMITTED isolation
Next, let’s look at EXPLAIN in CockroachDB. Again, the plan for the UPDATE
statement seems quite simple: a scan of the table followed by a filter.
demo@127.0.0.1:26257/demoapp/defaultdb> EXPLAIN UPDATE player SET level = 'AA' WHERE team = 'Gophers';
info
-------------------------------------------------------------------------------------------
distribution: local
vectorized: true
• update
│ table: player
│ set: level
│ auto commit
│
└── • render
│
└── • filter
│ estimated row count: 3
│ filter: team = 'Gophers'
│
└── • scan
estimated row count: 9 (100% of the table; stats collected 8 seconds ago)
table: player@player_pkey
spans: FULL SCAN
Again, that update operation hides a lot of complexity. We can think of
CockroachDB executing the update under READ COMMITTED isolation in roughly six
steps.
1. Create a savepoint in the current transaction.
2. Establish a read snapshot for the statement. This snapshot is before the
second update begins.
3. Scan player at the read snapshot. (scan in the plan.)
4. Filter on team = 'Gophers'. (filter in the plan.) Players 4, 5, and 6
qualify since the read snapshot is before the second update.
5. Write an intent for each qualifying row.1 (update in the plan.)
6. While writing intents, if the latest committed version of a row is newer
than our read snapshot, roll back to the savepoint and go back to step
1. This gives us a new read snapshot which is after the second update.
Step 6 is CockroachDB’s strategy for preventing lost updates under READ
COMMITTED isolation. CockroachDB retries the entire UPDATE
statement with a new read snapshot if it encounters a newer
version of a row while writing an intent.2 By executing the entire statement
again with a new read snapshot, CockroachDB can pick up any changes made
after the previous read snapshot, which in this case includes changes to both
players 3 and 4. Crucially, CockroachDB holds onto locks and intents across
retries, which helps it make progress even during periods of heavy contention.
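CockroachDB’s per-statement savepoint and retry happen inside the engine, but the shape of the mechanism can be sketched with ordinary SQL savepoints (a hypothetical illustration of the idea, not what CockroachDB literally executes, and omitting the lock retention mentioned above):
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SAVEPOINT before_stmt;                 -- step 1: mark a restart point
UPDATE player SET level = 'AA' WHERE team = 'Gophers';
-- If a row turns out to be newer than the statement's read snapshot, the
-- engine effectively does the equivalent of:
--   ROLLBACK TO SAVEPOINT before_stmt;   -- discard the statement's writes
--   UPDATE player SET level = 'AA' WHERE team = 'Gophers';  -- retry with a fresh snapshot
COMMIT;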
How to prevent lost updates: a tradeoff?
At first glance these two different techniques for preventing lost updates seem
to embody a tradeoff: on the one hand, PostgreSQL avoids retries, on the other
hand, CockroachDB avoids anomalies. But if we consider things from the
application’s point of view, this is less of a tradeoff than it first appears.
What would an application need to do to be sure that all Gophers were updated
to skill level AA?
When using PostgreSQL, the application itself would need to retry the first
UPDATE statement until all Gophers were at level AA. Application-level
retries are more expensive than database-level retries, due to network
latency, so this is strictly worse than retrying within the database. There’s
no way to avoid retrying the first UPDATE statement in this scenario, but
CockroachDB is able to hide the retry from the application.
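Concretely, the application-side loop against PostgreSQL would have to look something like the following sketch (the extra level <> 'AA' predicate is added here, not in the original statement, so the loop has a stopping condition):
-- Issued by the application, each round trip as its own implicit transaction,
-- repeated until the UPDATE reports that zero rows were modified:
UPDATE player SET level = 'AA' WHERE team = 'Gophers' AND level <> 'AA';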
Conclusion
PostgreSQL adds EvalPlanQual rechecks to UPDATE, DELETE, SELECT FOR UPDATE,
and SELECT FOR SHARE statements under READ COMMITTED isolation to prevent
lost-update anomalies. EvalPlanQual lets PostgreSQL avoid internal retries of
these statements, but it can cause PostgreSQL to miss rows, which will in
turn require application-level retries to handle.3
In CockroachDB’s new implementation of READ COMMITTED isolation, instead of
EvalPlanQual we built an internal statement-retry mechanism that does not
miss rows and thus alleviates the need for application-level retries.
1 Intents are new versions of a row that also act as exclusive
locks.
2 Under SERIALIZABLE isolation, CockroachDB retries entire transactions
when a serializable history cannot be achieved through other means. In this
case we’re using a single-statement implicit transaction, so CockroachDB
appears to have the same behavior under both SERIALIZABLE and READ COMMITTED
isolation. But note that if the scenario were slightly different, say a move
of a player to the Gophers rather than a swap, the two isolation levels would
show different behavior.
3 This article focuses on PostgreSQL, but it’s worth noting that MySQL
(using InnoDB) can also miss rows with some tweaks to the scenario. Of the
databases we’ve studied, so far Oracle has the most similar UPDATE behavior
under READ COMMITTED isolation to CockroachDB.