Confusing results with lateral references

Started by Ashutosh Bapatabout 10 years ago7 messages

ashutosh.bapat@enterprisedb.com

about 10 years ago

Hi,

I am seeing different results with two queries which AFAIU have same
semantics and hence are expected to give same results.

There's no data in the table to start with.

postgres=# insert into t1 values (1, 1);
postgres=# insert into t2 values (1, 1), (2, 2);

Session 1
postgres=# begin;
BEGIN
postgres=# update t1 set val = 2 where val2 = 1;
UPDATE 1

Session 2
postgres=# select * from t1, (select distinct val, val2 from t2) t2 where
t1.val = t2.val for update of t1;

query waits here because of FOR UPDATE clause

Session 1
postgres=# commit;
COMMIT

Session 2 gives no rows
postgres=# select * from t1, (select distinct val, val2 from t2) t2 where
t1.val = t2.val for update of t1;
val | val2 | val | val2
-----+------+-----+------
(0 rows)

Reset values of t1
postgres=# update t1 set val = 1 where val2 = 1;
UPDATE 1

Session 1
postgres=# begin;
BEGIN
postgres=# update t1 set val = 2 where val2 = 1;
UPDATE 1

Session 2
postgres=# select * from t1, lateral (select distinct val, val2 from t2
where t2.val = t1.val) t2 for update of t1;

query waits here

Session 1
postgres=# commit;
COMMIT

Session 2 gives results of the query
postgres=# select * from t1, lateral (select distinct val, val2 from t2
where t2.val = t1.val) t2 for update of t1;
val | val2 | val | val2
-----+------+-----+------
2 | 1 | 1 | 1
(1 row)

AFAIU, both the queries

select * from t1, (select distinct val, val2 from t2) t2 where t1.val =
t2.val for update of t1;

AND

select * from t1, lateral (select distinct val, val2 from t2 where t2.val =
t1.val) t2 for update of t1;

have same semantic and should give same results.

Is seeing different results expected behaviour?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 10 years ago

In reply to: Ashutosh Bapat (#1)

Re: Confusing results with lateral references

There's another seemingly wrong result, not with lateral, but with FOR
UPDATE.

postgres=# select * from t1;
val | val2
-----+------
1 | 1
(1 row)

postgres=# select * from t2;
val | val2
-----+------
1 | 1
2 | 2
1 | 1
(3 rows)

Session 1
postgres=# begin;
BEGIN
postgres=# update t1 set val = 2 where val2 = 1;
UPDATE 1

Session 2
postgres=# select * from t1 left join t2 on (t1.val = t2.val) for update of
t1;

query waits

Session 1
postgres=# commit;
COMMIT

Session 2 query returns two rows
select * from t1 left join t2 on (t1.val = t2.val) for update of t1;
val | val2 | val | val2
-----+------+-----+------
2 | 1 | |
2 | 1 | |
(2 rows)

It's confusing to see two rows from left join result when the table really
has only a single row. Is this behaviour expected?

On Thu, Dec 3, 2015 at 3:49 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

Hi,

I am seeing different results with two queries which AFAIU have same
semantics and hence are expected to give same results.

postgres=# \d t1
Table "public.t1"
Column | Type | Modifiers
--------+---------+-----------
val | integer |
val2 | integer |

postgres=# \d t2
Table "public.t2"
Column | Type | Modifiers
--------+---------+-----------
val | integer |
val2 | integer |

There's no data in the table to start with.

postgres=# insert into t1 values (1, 1);
postgres=# insert into t2 values (1, 1), (2, 2);

Session 1
postgres=# begin;
BEGIN
postgres=# update t1 set val = 2 where val2 = 1;
UPDATE 1

Session 2
postgres=# select * from t1, (select distinct val, val2 from t2) t2 where
t1.val = t2.val for update of t1;

query waits here because of FOR UPDATE clause

Session 1
postgres=# commit;
COMMIT

Session 2 gives no rows
postgres=# select * from t1, (select distinct val, val2 from t2) t2 where
t1.val = t2.val for update of t1;
val | val2 | val | val2
-----+------+-----+------
(0 rows)

Reset values of t1
postgres=# update t1 set val = 1 where val2 = 1;
UPDATE 1

Session 1
postgres=# begin;
BEGIN
postgres=# update t1 set val = 2 where val2 = 1;
UPDATE 1

Session 2
postgres=# select * from t1, lateral (select distinct val, val2 from t2
where t2.val = t1.val) t2 for update of t1;

query waits here

Session 1
postgres=# commit;
COMMIT

Session 2 gives results of the query
postgres=# select * from t1, lateral (select distinct val, val2 from t2
where t2.val = t1.val) t2 for update of t1;
val | val2 | val | val2
-----+------+-----+------
2 | 1 | 1 | 1
(1 row)

AFAIU, both the queries

select * from t1, (select distinct val, val2 from t2) t2 where t1.val =
t2.val for update of t1;

AND

select * from t1, lateral (select distinct val, val2 from t2 where t2.val
= t1.val) t2 for update of t1;

have same semantic and should give same results.

Is seeing different results expected behaviour?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

about 10 years ago

In reply to: Ashutosh Bapat (#2)

Re: Confusing results with lateral references

On 2015/12/03 21:26, Ashutosh Bapat wrote:

Session 1
postgres=# begin;
BEGIN
postgres=# update t1 set val = 2 where val2 = 1;
UPDATE 1

Session 2
postgres=# select * from t1 left join t2 on (t1.val = t2.val) for update of
t1;

query waits

Session 1
postgres=# commit;
COMMIT

Session 2 query returns two rows
select * from t1 left join t2 on (t1.val = t2.val) for update of t1;
val | val2 | val | val2
-----+------+-----+------
2 | 1 | |
2 | 1 | |
(2 rows)

It's confusing to see two rows from left join result when the table really
has only a single row. Is this behaviour expected?

Maybe it is. Because the other table still has two (1, 1) rows, LockRows's
subplan would still produce two rows in result, no?

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 10 years ago

In reply to: Amit Langote (#3)

Re: Confusing results with lateral references

On Fri, Dec 4, 2015 at 10:58 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp

wrote:

On 2015/12/03 21:26, Ashutosh Bapat wrote:

Session 1
postgres=# begin;
BEGIN
postgres=# update t1 set val = 2 where val2 = 1;
UPDATE 1

Session 2
postgres=# select * from t1 left join t2 on (t1.val = t2.val) for update

of

t1;

query waits

Session 1
postgres=# commit;
COMMIT

Session 2 query returns two rows
select * from t1 left join t2 on (t1.val = t2.val) for update of t1;
val | val2 | val | val2
-----+------+-----+------
2 | 1 | |
2 | 1 | |
(2 rows)

It's confusing to see two rows from left join result when the table

really

has only a single row. Is this behaviour expected?

Maybe it is. Because the other table still has two (1, 1) rows, LockRows's
subplan would still produce two rows in result, no?

Documentation at
http://www.postgresql.org/docs/9.4/static/queries-table-expressions.html
says
(T1) LEFT OUTER JOIN (T2)

First, an inner join is performed. Then, *for each row in T1* that does not
satisfy the join condition with any row in T2, *a joined row is added* with
null values in columns of T2. Thus, the joined table always has at least
one row for each row in T1.
So there should be only one row for each row of outer table that didn't
join with the inner table. IOW a join with no joining rows should have same
number of rows as outer table.

Thanks,
Amit

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Tom Lane

tgl@sss.pgh.pa.us

about 10 years ago

In reply to: Ashutosh Bapat (#1)

Re: Confusing results with lateral references

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

I am seeing different results with two queries which AFAIU have same
semantics and hence are expected to give same results.

postgres=# select * from t1, (select distinct val, val2 from t2) t2ss where t1.val = t2ss.val for update of t1;

postgres=# select * from t1, lateral (select distinct val, val2 from t2 where t2.val = t1.val) t2ss for update of t1;

(I renamed your inline sub-selects to avoid confusion between them and the
table t2.)

I'm skeptical that those should be claimed to have identical semantics.

In the first example, after we've found the join row (1,1,1,1), we block
to see if the pending update on t1 will commit. After it does, we recheck
the join condition using the updated row from t1 (and the original row
from t2ss). The condition fails, so the updated row is not output.

The same thing happens in the second example, ie, we consider the updated
row from t1 and the non-updated row from t2ss (NOT t2). There are no join
conditions to recheck (in the outer query level), so the row passes, and
we output it.

If you'd allowed the FOR UPDATE to propagate into the sub-select, then the
sub-select's conditions would be considered as needing rechecks ... of
course, that would require removing the DISTINCT.

This example does show that a lateral reference to a FOR UPDATE table from
a non-FOR-UPDATE subselect has confusing behavior. Maybe we ought to
forbid that.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

about 10 years ago

In reply to: Ashutosh Bapat (#2)

Re: Confusing results with lateral references

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

There's another seemingly wrong result, not with lateral, but with FOR
UPDATE.

[ shrug... ] You're getting the post-update images of the two join
rows that would have been reported without FOR UPDATE. This one is
definitely not a bug.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

about 10 years ago

In reply to: Tom Lane (#5)

Re: Confusing results with lateral references

On Fri, Dec 4, 2015 at 10:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

I am seeing different results with two queries which AFAIU have same
semantics and hence are expected to give same results.

postgres=# select * from t1, (select distinct val, val2 from t2) t2ss where t1.val = t2ss.val for update of t1;

postgres=# select * from t1, lateral (select distinct val, val2 from t2 where t2.val = t1.val) t2ss for update of t1;

(I renamed your inline sub-selects to avoid confusion between them and the
table t2.)

I'm skeptical that those should be claimed to have identical semantics.

In the first example, after we've found the join row (1,1,1,1), we block
to see if the pending update on t1 will commit. After it does, we recheck
the join condition using the updated row from t1 (and the original row
from t2ss). The condition fails, so the updated row is not output.

Check.

The same thing happens in the second example, ie, we consider the updated
row from t1 and the non-updated row from t2ss (NOT t2). There are no join
conditions to recheck (in the outer query level), so the row passes, and
we output it.

What's surprising is that t2.val = t1.val isn't rechecked here. I
think that's not really possible, because of the DISTINCT operation,
which prevents us from identifying a single row from t2 that accounts
for the subquery's output row. Not sure whether it would work without
the DISTINCT.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers