postgres_fdw: Provide better emulation of READ COMMITTED behavior
Hi,
postgres_fdw uses REPEATABLE READ isolation level for the remote
transaction when the local transaction has READ COMMITTED isolation
level, for the reason described in the comments for
begin_remote_xact() in connection.c:
/*
* Start remote transaction or subtransaction, if needed.
*
* Note that we always use at least REPEATABLE READ in the remote session.
* This is so that, if a query initiates multiple scans of the same or
* different foreign tables, we will get snapshot-consistent results from
* those scans. A disadvantage is that we can't provide sane emulation of
* READ COMMITTED behavior --- it would be nice if we had some other way to
* control which remote queries share a snapshot.
*/
But as mentioned above, this causes unexpected behavior like this:
S1: CREATE TABLE t1 (c1 TEXT);
S1: CREATE FOREIGN TABLE ft1 (c1 TEXT) SERVER loopback OPTIONS
(table_name 't1');
S1: INSERT INTO ft1 VALUES ('foo');
S2: START TRANSACTION ISOLATION LEVEL READ COMMITTED;
S2: SELECT * FROM ft1;
c1
-----
foo
(1 row)
Looks good, but:
S1: INSERT INTO ft1 VALUES ('bar');
S2: SELECT * FROM ft1;
c1
-----
foo
(1 row)
The SELECT query would be expected to retrieve not only the row
(‘foo’) but the row (‘bar’) inserted by session S1 just before, but
retrieves only the row (‘foo’).
I would like to propose a simple solution for this issue. The idea
for the solution is that while postgres_fdw uses REPEATABLE READ for
the remote transaction as before, it refreshes the snapshot for the
transaction so that the same snapshot is shared by remote queries from
within a top-level query (or a set of queries expanded from it by
rewrite rules) received via simple/extended query protocol. To do
this, changes I would like to make to the core and postgres_fdw code
is:
* Add IDs for iterations of the PostgresMain() loop for extensions
like postgres_fdw to know which top-level query is currently being
processed.
* Add a function pg_refresh_snapshot() to refresh the snapshot for a
REPEATABLE READ transaction. (This function would cause phantom read,
which is not allowed to occur at REPEATABLE READ isolation level in
Postgres, but is allowed by the SQL standard.)
* Modify postgres_fdw so that when encountering a new top-level query,
it refreshes the snapshot for the remote transaction by using
pg_refresh_snapshot() before sending the first query from within the
top-level query if it is safe to do so.
Attached is a WIP patch for that. With the patch we have:
S1: INSERT INTO ft1 VALUES ('foo');
S2: START TRANSACTION ISOLATION LEVEL READ COMMITTED;
S2: SELECT * FROM ft1;
c1
-----
foo
(1 row)
S1: INSERT INTO ft1 VALUES ('bar');
S2: SELECT * FROM ft1;
c1
-----
foo
bar
(2 rows)
The SELECT query retrieves both rows!
Comments welcome! Maybe I am missing something, though.
Best regards,
Etsuro Fujita
Attachments:
postgres_fdw-emulate-RC-behavior-WIP.patchapplication/octet-stream; name=postgres_fdw-emulate-RC-behavior-WIP.patchDownload+340-15
On Thu, Dec 5, 2024 at 4:41 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
Comments welcome! Maybe I am missing something, though.
I have a hard time seeing how this would work if cursors are in use on
the main server. Say I do this:
DECLARE foo CURSOR FOR SELECT * FROM ft1 UNION ALL SELECT * FROM ft2;
...fetch some rows from cursor foo but few enough that we only scan ft1...
...do something that causes a snapshot refresh like issue another query...
...fetch more rows from cursor foo until we start scanning ft2...
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Dec 5, 2024 at 4:41 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
Comments welcome! Maybe I am missing something, though.
I have a hard time seeing how this would work if cursors are in use on
the main server. Say I do this:DECLARE foo CURSOR FOR SELECT * FROM ft1 UNION ALL SELECT * FROM ft2;
...fetch some rows from cursor foo but few enough that we only scan ft1...
...do something that causes a snapshot refresh like issue another query...
...fetch more rows from cursor foo until we start scanning ft2...
Apart from the above issue, what do you think about that we are using a
'SELECT pg_catalog.pg_refresh_snapshot()' to let the remote do the
refresh_snapshot VS 'a new message type for this'? There are lots of
things happen in the 'SELECT' way like 'a extra network communication',
'a complete parser-planner-executor workflow.' With a new message type
for this, we can send the message character with the next query
together. if so, can the two overheads removed?
--
Best Regards
Andy Fan
On Fri, Dec 6, 2024 at 2:37 AM Robert Haas <robertmhaas@gmail.com> wrote:
I have a hard time seeing how this would work if cursors are in use on
the main server. Say I do this:DECLARE foo CURSOR FOR SELECT * FROM ft1 UNION ALL SELECT * FROM ft2;
...fetch some rows from cursor foo but few enough that we only scan ft1...
...do something that causes a snapshot refresh like issue another query...
...fetch more rows from cursor foo until we start scanning ft2...
Good point! Maybe my explanation was not enough, but the proposed
patch does not allow any query to do a snapshot refresh if such open
cursors exist on the main server, so cursor foo is guaranteed to scan
ft2 using the same snapshot that was used to scan ft1.
Thank you for taking the time for this proposal!
Best regards,
Etsuro Fujita
On Fri, Dec 6, 2024 at 7:50 PM Andy Fan <zhihuifan1213@163.com> wrote:
Apart from the above issue, what do you think about that we are using a
'SELECT pg_catalog.pg_refresh_snapshot()' to let the remote do the
refresh_snapshot VS 'a new message type for this'? There are lots of
things happen in the 'SELECT' way like 'a extra network communication',
'a complete parser-planner-executor workflow.' With a new message type
for this, we can send the message character with the next query
together. if so, can the two overheads removed?
I think it might be a good idea to extend simple/extend query
protocols that way, but even if so, I would like to leave that for
future work, because even without that, I think this is still an
improvement, and I do not want to set the goal for the first cut too
high.
Having said that, if the next query uses simple query protocol, we can
avoid the extra communication by sending the two queries in a single
function call. I will do that in the next version.
Thanks for the comment!
Best regards,
Etsuro Fujita
Etsuro Fujita <etsuro.fujita@gmail.com> writes:
On Fri, Dec 6, 2024 at 7:50 PM Andy Fan <zhihuifan1213@163.com> wrote:
Apart from the above issue, what do you think about that we are using a
'SELECT pg_catalog.pg_refresh_snapshot()' to let the remote do the
refresh_snapshot VS 'a new message type for this'? There are lots of
things happen in the 'SELECT' way like 'a extra network communication',
'a complete parser-planner-executor workflow.' With a new message type
for this, we can send the message character with the next query
together. if so, can the two overheads removed?I think it might be a good idea to extend simple/extend query
protocols that way, but even if so, I would like to leave that for
future work, because even without that, I think this is still an
improvement, and I do not want to set the goal for the first cut too
high.
That's reasonable.
Having said that, if the next query uses simple query protocol, we can
avoid the extra communication by sending the two queries in a single
function call. I will do that in the next version.
Good to know that.
After reading the patch, the changes looks good to me except the name of
ThereAre[No]OldLivePortals(), multiple negations can be somewhat confusing
at times. Opinions may vary, However. I point this out just in case this
is done by mistake when you were refactoring the code. If you think the
current one is better, I'm totoally OK with it.
--
Best Regards
Andy Fan
On Sat, Dec 7, 2024 at 4:03 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
On Fri, Dec 6, 2024 at 2:37 AM Robert Haas <robertmhaas@gmail.com> wrote:
I have a hard time seeing how this would work if cursors are in use on
the main server. Say I do this:DECLARE foo CURSOR FOR SELECT * FROM ft1 UNION ALL SELECT * FROM ft2;
...fetch some rows from cursor foo but few enough that we only scan ft1...
...do something that causes a snapshot refresh like issue another query...
...fetch more rows from cursor foo until we start scanning ft2...Good point! Maybe my explanation was not enough, but the proposed
patch does not allow any query to do a snapshot refresh if such open
cursors exist on the main server, so cursor foo is guaranteed to scan
ft2 using the same snapshot that was used to scan ft1.
OK, I see. That does prevent the hazard I mentioned, but it also means
that the results returned by a query are dependent on whether there's
an unrelated cursor open, which seems unfortunate.
--
Robert Haas
EDB: http://www.enterprisedb.com