bug #7499 additional comments

Started by Denis Kolesnikover 13 years ago4 messagesbugs
Jump to latest
#1Denis Kolesnik
lirex.software@gmail.com

My arguments are:

is that even
select id, str_last_name from tbl_owners_individual where id in
(83,175,111,1) order by id;

id | str_last_name
-----+----------------------
1 | Kolesnik
83 | GXXXXXXXXX
111 | Kolesnik
175 | GXXXXXXXXX
(4 строки)

select id, str_last_name from tbl_owners_individual where id in
(83,175,111,1) order by str_last_name;

id | str_last_name
-----+----------------------
83 | GXXXXXXXXX
175 | GXXXXXXXXX
1 | Kolesnik
111 | Kolesnik
(4 строки)

Compare this 2 results and you see, that even if the records with the
same last names do not come directly one after other then "id 1"
always closer to the top, then "id 111"
and "id 83" always clother to the top then "id 175". It proves, that
the sorting by id remains always even if only among records for the
same lastname.

Suppose a person who has basic SQL knowledges would learn on praxis
how would result a query if a person adds the clause "limit 1" to it and

if a person sees results for this query:
select id, str_last_name from tbl_owners_individual order by
str_last_name offset 26 limit 1;

id | str_last_name
----+----------------------
83 | GXXXXXXXXX
(1 строка)

and compares result to the query

select id, str_last_name from tbl_owners_individual order by
str_last_name offset 26;

id | str_last_name
-----+----------------------
83 | GXXXXXXXXX
175 | GXXXXXXXXX
...

then one makes conclusion, that a sorting by id always remain in both
cases, but if one replaces this queries so:

select id, str_last_name from tbl_owners_individual order by
str_last_name limit 1 offset 53;

id | str_last_name
-----+----------------------
111 | Kolesnik
(1 строка)

select id, str_last_name from tbl_owners_individual order by
str_last_name offset 53;

id | str_last_name
-----+----------------------
1 | Kolesnik
111 | Kolesnik
...

Then a person comes to misunderstanding.

You would sugguest, that one should read documentation.

in the (where with ... replaced a directory in which the PostgreSQL installed)
...PostgreSQL\9.1\doc\postgresql\html\queries-limit.html

"...When using LIMIT, it is important to use an ORDER BY clause that
constrains the result rows into a unique order. .."

here asked to use "ORDER BY" which is done in every query above.

"...The query optimizer takes LIMIT into account when generating query
plans, so you are very likely to get different plans (yielding
different row orders) depending on what you give for LIMIT and OFFSET.
Thus, using different LIMIT/OFFSET values to select different subsets
of a query result will give inconsistent results unless you enforce a
predictable result ordering with ORDER BY. This is not a bug; ..."

the values of "ORDER BY" for LIMIT/OFFSET are not different as you
see. All requirements are filled.

this part "...The query optimizer takes LIMIT into account when
generating query plans, so you are very likely to get different plans
(yielding different row orders) depending on what you give for LIMIT
and OFFSET." would explain, that adding "LIMIT" will result in some
unxplained data sorting, but

this "...Thus, using different LIMIT/OFFSET values to select different
subsets of a query result will give inconsistent results unless you
enforce a predictable result ordering with ORDER BY."

then the query with the results as you see:
select id, str_last_name from tbl_owners_individual where
str_last_name='Kolesnik' order by str_last_name limit 2 offset 2;

id | str_last_name
-----+----------------------
111 | Kolesnik
144 | Kolesnik
(2 строки)

inconsistent results unless you enforce a predictable result ordering
with ORDER BY.

order by is here predictable, exists, but:

select id, str_last_name from tbl_owners_individual order by str_last_name;
...
49 | Kolesnik
224 | Kolesnik
144 | Kolesnik
1 | Kolesnik
111 | Kolesnik
...

as you see: offset 2 should return "144 | Kolesnik "

and "...inconsistent results..." nowhere in this page of documentation
(as I read it, if I do wrong) stated that
"...inconsistent results..." not applies to the following 2 queries:

select id, str_last_name from tbl_owners_individual order by str_last_name;
select id, str_last_name from tbl_owners_individual order by
str_last_name limit 2 offset 2;

I and not only I by reading this page of documentation will conclude
not without a reason that the queries different on presense or absense
"...limit 2 offset 2..." should return results which are consistent.

Basing on this I conclude, that it is a bug.

With respect,
Denis Kolesnik.

Show quoted text

On 8/22/12, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

Denis Kolesnik <lirex.software@gmail.com> wrote:

I have now VERY strong argument to consider it is as a bug:

No, you appear to have very strong feelings about it, but you are
not making an argument that holds water.

if there a understandable for SQL language sequence which sorts
in other fashion when adding "LIMIT".

Each query is evaluated in terms of satisfying what is requested in
that query, and the fastest plan which returns those results is
chosen. If you want results to be generated in a specific order, it
is incumbent on you to specify that in the query -- there is no
"natural order" to rows which is used as a tie-breaker. There are
even optimizations to have one query which is going to scan a table
start at the point that another table scan, already in progress is
at, to prevent extra reads -- so exactly the same query run at about
the same time, with no concurrent database changes can easily return
rows in different orders. That's a feature, not a bug. If you want
them in a particular order, say so, and appropriate index usage or
sorts will be added to the query execution to provide them the way
you ask, even though that is slower than it would be if you didn't
care about the order.

and even sorting by id:
select id, str_last_name from tbl_owners_individual where id in
(83,175,111,1) order by str_last_name;

id | str_last_name
-----+----------------------
83 | GXXXXXXXXX
175 | GXXXXXXXXX
1 | Kolesnik
111 | Kolesnik
(4 ******)

No, it didn't go out of its way to sort that way, it just happened
to fall out that way that time; don't count on it always being that
way, even if it happens many times in a row.

test=# create table tbl_owners_individual
test-# (id int not null primary key, str_last_name text not null);
CREATE TABLE
test=# insert into tbl_owners_individual values
test-# (1, 'Kolesnik'),
test-# (83, 'GXXXXXXXXX'),
test-# (111, 'Kolesnik'),
test-# (175, 'GXXXXXXXXX');
INSERT 0 4
test=# select id, str_last_name from tbl_owners_individual where id
in
test-# (83,175,111,1) order by str_last_name;
id | str_last_name
-----+---------------
83 | GXXXXXXXXX
175 | GXXXXXXXXX
1 | Kolesnik
111 | Kolesnik
(4 rows)

-Kevin

#2Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Denis Kolesnik (#1)
Re: bug #7499 additional comments

Denis Kolesnik <lirex.software@gmail.com> wrote:

My arguments are:

is that even
select id, str_last_name from tbl_owners_individual where id in
(83,175,111,1) order by id;

id | str_last_name
-----+----------------------
1 | Kolesnik
83 | GXXXXXXXXX
111 | Kolesnik
175 | GXXXXXXXXX
(4 ******)

select id, str_last_name from tbl_owners_individual where id in
(83,175,111,1) order by str_last_name;

id | str_last_name
-----+----------------------
83 | GXXXXXXXXX
175 | GXXXXXXXXX
1 | Kolesnik
111 | Kolesnik
(4 ******)

Compare this 2 results and you see, that even if the records with
the same last names do not come directly one after other then "id
1" always closer to the top, then "id 111" and "id 83" always
clother to the top then "id 175". It proves, that the sorting by
id remains always even if only among records for the same
lastname.

[sigh]

It proves no such thing. It happened to pick a path *that time*
which happened to generate them in that order. There are no
guarantees that it always will. Do you imagine that a column named
"id" has any special properties compared to a column by any other
name? It doesn't. What do you imagine would happen if you had
columns with integers in different sequences?

Please run this script on your system before your next post:

drop if exists table tbl_test;
create table tbl_test
(id int not null primary key,
str_last_name text not null,
misc text);
insert into tbl_test values
(1, 'Kolesnik'),
(83, 'GXXXXXXXXX'),
(111, 'Kolesnik'),
(175, 'GXXXXXXXXX');
select id, str_last_name from tbl_test
where id in (83,175,111,1) order by str_last_name;
update tbl_test set misc = 'x' where id = 1;
select id, str_last_name from tbl_test
where id in (83,175,111,1) order by str_last_name;
analyze tbl_test;
select id, str_last_name from tbl_test
where id in (83,175,111,1) order by str_last_name;

You would sugguest, that one should read documentation.

in the (where with ... replaced a directory in which the
PostgreSQL installed)
...PostgreSQL\9.1\doc\postgresql\html\queries-limit.html

"...When using LIMIT, it is important to use an ORDER BY clause
that constrains the result rows into a unique order. .."

here asked to use "ORDER BY" which is done in every query above.

No, it asked to specify ORDER BY such that it "constrains the result
rows into a unique order" -- which you are not doing in your
examples. That is exactly what you *should* do to get the results
you want.

"...The query optimizer takes LIMIT into account when generating
query plans, so you are very likely to get different plans
(yielding different row orders) depending on what you give for
LIMIT and OFFSET. Thus, using different LIMIT/OFFSET values to
select different subsets of a query result will give inconsistent
results unless you enforce a predictable result ordering with
ORDER BY. This is not a bug; ..."

the values of "ORDER BY" for LIMIT/OFFSET are not different as you
see. All requirements are filled.

Not even close. Read it again. Your ORDER BY clause is not
guaranteeing predictable results. It would if you added the primary
key columns to the ORDER BY clause (assuming there is a primary
key).

You would be well served to fix your query and move on. There's no
way we're going to make it behave the way you are requesting. The
current behavior conforms to the SQL standard, performs better than
it could if it worked the way you suggest, and allows you to get the
results you want by fully specifying the ORDER BY clause.

-Kevin

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Denis Kolesnik (#1)
Re: bug #7499 additional comments

Denis Kolesnik <lirex.software@gmail.com> writes:

You would sugguest, that one should read documentation.

Indeed.

in the (where with ... replaced a directory in which the PostgreSQL installed)
...PostgreSQL\9.1\doc\postgresql\html\queries-limit.html

"...When using LIMIT, it is important to use an ORDER BY clause that
constrains the result rows into a unique order. .."

here asked to use "ORDER BY" which is done in every query above.

What you are stubbornly refusing to grasp is that your ORDER BY is not
sufficient to ensure a unique order. The above-quoted sentence does not
promise that any old ORDER BY will do. If you simply ORDER BY a
non-unique column, the query's output ordering is only constrained up to
the point that rows with the same ordering value will be adjacent.
If you want more than that, you need to add additional columns to the
ORDER BY to resolve cases where that column is equal.

As long as you haven't done that, you are wrong, and you are wasting
both your time and ours arguing about it. You've wasted quite enough
of my time already; don't expect to see any further responses on this
subject.

regards, tom lane

#4Craig Ringer
craig@2ndquadrant.com
In reply to: Denis Kolesnik (#1)
Re: bug #7499 additional comments

On 08/23/2012 04:12 AM, Denis Kolesnik wrote:

Suppose a person who has basic SQL knowledges would learn on praxis
how would result a query if a person adds the clause "limit 1" to it

Then they just got bitten by not learning enough and not testing their
code well enough; they were probably programming by recipe and
copy-and-paste, not by learning the platform they're working with.

http://www.postgresql.org/docs/9.1/static/sql-select.html#SQL-ORDERBY

"The ORDER BY clause causes the result rows to be sorted according to
the specified expression(s). If two rows are equal according to the
leftmost expression, they are compared according to the next expression
and so on. If they are equal according to all specified expressions,
they are returned in an implementation-dependent order."

It'd be really nice if every programming language and tool could be
completely safe and easy, with no undefined, implementation-defined or
inconsistent behaviour. Unfortunately, in the real world that doesn't
happen because perfectly specified platforms are (a) really hard to
actually write and (b) usually hard to optimise and thus slow.

Suppose a person with basic C knowledge wrote this (utterly wrong and
dangerous, do not use for anything) program:

#include <stdio.h>
#include <string.h>
#include <malloc.h>
int main() {
char * blah = (char*)malloc(10);
strcpy(blah,"1234567890");
printf("%s\n", blah);
}

This program has *at* *least* one bug that'll cause it to run most of
the time, but fail unpredictably, especially when used as part of a
larger program rather than standalone. Failure will depend on platform,
C library, kernel, compiler settings, and the contents of uninitialized
memory.

Is the platform responsible for the user shooting themselves in the foot
because they didn't learn about null termination of strings, buffer
over-runs, the dangers of using strcpy(), etc? To me it's a bug in the
user's code, not the platform.

Sure, the platform could be easier to use. It could add lots of bounds
checks, prohibit raw memory access, use garbage collection instead of
explicit pointer-based memory management, etc. Then you'd have a new
platform called Java, which is very useful - but not really something
you can use to write tiny programs that take microseconds to run, or
high-performance operating system kernels.

Even Java has plenty of traps and confusing characteristics. Anything to
do with threads. finalize() methods. try {} catch {} finally {}
constructs. Double-checked locking. Plenty more. That's in a language
that was designed to be an easier and safer alternative to C.

Everything is a compromise, including the SQL language and
implementations of it. If Pg made underspecified sorts an error then
lots of other people would scream "bug!" because pretty much every other
database system lets you do this so it'd be a portability problem - and
because it's a really useful behaviour for some purposes. If Pg's query
planner always ensured that sorts were stable and always did the same
sorts, people wouldn't use Pg because it'd be too slow.

More importantly, PostgreSQL has no way of *knowing* for sure that the
sort is underspecified. It can't know that the column you've specified
isn't unique, or at least unique within the subset of data you're
working with. It trusts you to know what you want.

The trick is to read the documentation, learn, and test your code well.

That's true of every language, even those that try to protect the
programmer from their mistakes as much as possible.

--
Craig Ringer