rules regression test diff (was Re: [HACKERS] Last call?)

Started by Tom Laneabout 27 years ago3 messages
#1Tom Lane
tgl@sss.pgh.pa.us

jwieck@debis.com (Jan Wieck) writes:

These two queries will produce join plans. Since there are no
indices on any of the tables, they should produce tuples in
exactly the order they where entered into the table.

I modified the rules.sql test to perform an EXPLAIN of the query
that is generating the unexpected result, and it says:

QUERY: explain update rtest_person set pname = 'jwieck' where pdesc = 'Jan Wieck';
NOTICE: QUERY PLAN:

Merge Join (cost=0.00 size=1 width=42)
-> Seq Scan (cost=0.00 size=0 width=0)
-> Sort (cost=0.00 size=0 width=0)
-> Seq Scan on rtest_admin (cost=0.00 size=0 width=30)
-> Seq Scan (cost=0.00 size=0 width=0)
-> Sort (cost=0.00 size=0 width=0)
-> Seq Scan on rtest_person (cost=0.00 size=0 width=12)

NOTICE: QUERY PLAN:

Seq Scan on rtest_person (cost=0.00 size=0 width=18)

The thing that jumps out at me here is the "sort" step (which I assume
is on pname for rtest_admin and pdesc for rtest_person, those being the
fields to be joined). Since the prior state of rtest_admin is

QUERY: select * from rtest_admin;
pname|sysname
-----+-------
jw |orion
jw |notjw
bm |pluto
(3 rows)

the two rows that will be updated have equal sort keys and therefore the
sort could legally return them in either order. Does Postgres contain
its own sort algorithm for this kind of operation, or does it depend on
the system qsort? System library qsorts don't normally guarantee
stability for equal keys. It could be that we're looking at a byproduct
of some minor implementation difference between the qsorts on my machine
and yours. If that's it, though, I'm surprised that I'm the only one
reporting a difference in the test's output.

BTW, I get the same query plan if I EXPLAIN the query that you say the
rule should expand to,

UPDATE rtest_admin SET pname = 'jwieck'
WHERE rtest_person.pdesc = 'Jan Wieck'
AND rtest_admin.pname = rtest_person.pdesc;

so there doesn't seem to be anything wrong with the rule rewriter...

regards, tom lane

#2Noname
jwieck@debis.com
In reply to: Tom Lane (#1)
Re: rules regression test diff (was Re: [HACKERS] Last call?)

Tom Lane wrote:

I modified the rules.sql test to perform an EXPLAIN of the query
that is generating the unexpected result, and it says:

QUERY: explain update rtest_person set pname = 'jwieck' where pdesc = 'Jan Wieck';
NOTICE: QUERY PLAN:

Merge Join (cost=0.00 size=1 width=42)
-> Seq Scan (cost=0.00 size=0 width=0)
-> Sort (cost=0.00 size=0 width=0)
-> Seq Scan on rtest_admin (cost=0.00 size=0 width=30)
-> Seq Scan (cost=0.00 size=0 width=0)
-> Sort (cost=0.00 size=0 width=0)
-> Seq Scan on rtest_person (cost=0.00 size=0 width=12)

NOTICE: QUERY PLAN:

Seq Scan on rtest_person (cost=0.00 size=0 width=18)

Isn't it nice to have EXPLAIN doing the rewrite step?

[...]

the two rows that will be updated have equal sort keys and therefore the
sort could legally return them in either order. Does Postgres contain
its own sort algorithm for this kind of operation, or does it depend on
the system qsort? System library qsorts don't normally guarantee
stability for equal keys. It could be that we're looking at a byproduct
of some minor implementation difference between the qsorts on my machine
and yours. If that's it, though, I'm surprised that I'm the only one
reporting a difference in the test's output.

Could be the reason. createfirstrun() in psort.c is using
qsort as a first try. Maybe we should add ORDER BY pname,
sysname to the queries to avoid it.

BTW, I get the same query plan if I EXPLAIN the query that you say the
rule should expand to,

UPDATE rtest_admin SET pname = 'jwieck'
WHERE rtest_person.pdesc = 'Jan Wieck'
AND rtest_admin.pname = rtest_person.pdesc;

so there doesn't seem to be anything wrong with the rule rewriter...

Sure. The parsetree generated by the rule system is exactly
that what the parser outputs for this query. I hope some
people understand my new documentation of the rule system.
Sometimes the lonesome rule-rider needs a partner too.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#2)
Re: rules regression test diff (was Re: [HACKERS] Last call?)

jwieck@debis.com (Jan Wieck) writes:

Tom Lane wrote:

the two rows that will be updated have equal sort keys and therefore the
sort could legally return them in either order. Does Postgres contain
its own sort algorithm for this kind of operation, or does it depend on
the system qsort? System library qsorts don't normally guarantee
stability for equal keys. It could be that we're looking at a byproduct
of some minor implementation difference between the qsorts on my machine
and yours. If that's it, though, I'm surprised that I'm the only one
reporting a difference in the test's output.

Could be the reason. createfirstrun() in psort.c is using
qsort as a first try. Maybe we should add ORDER BY pname,
sysname to the queries to avoid it.

I think this is the answer then. Some poking around in HP's patch
documentation shows that they modified their version of qsort a while
back:

PHCO_6780:
qsort performs very badly on sorted blocks of data
- customer found that qsort on a file with 100,000
randomly sorted records took seconds, whereas a file
of 100,000 records containing large sorted blocks
took over an hour to sort.
qsort needed to pick an alternate pivot point when
detecting sorted or partially sorted data in order
to improve poor performance.

So I guess it's not so surprising if HP's qsort has slightly different
behavior on equal-keyed data than everyone else's.

Does anyone object to modifying this regression test to sort the
tuples for display? It seems that only the one query needs to be
changed...

regards, tom lane