MAX Query length

Started by Ansley, Michael over 26 years ago, 27 messages
#1Ansley, Michael
Michael.Ansley@intec.co.za

Trawling through the code last night I noticed that:
#define MAX_QUERY_SIZE (BLCKSZ * 2)

Is there any conceivable reason why the query length would be dependent on
the block size? Or do I just have old source code?

MikeA

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ansley, Michael (#1)
Re: [HACKERS] MAX Query length

"Ansley, Michael" <Michael.Ansley@intec.co.za> writes:

Trawling through the code last night I noticed that:
#define MAX_QUERY_SIZE (BLCKSZ * 2)

Is there any conceivable reason why the query length would be dependent on
the block size?

Sure: you want to be able to INSERT a tuple of maximum size. In the
absence of dynamically sized text buffers, a reasonable estimate of
the longest INSERT command of interest is going to depend on BLCKSZ.

I don't know how long that particular constant has been defined like
that, though. I had the idea that it was the same as BLCKSZ, not 2x.
You may well find that frontend libpq is using a different value for
its buffer sizes than the backend is :-(

regards, tom lane
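
The define in question, shown with the default block size of the time; the comment is an illustrative gloss on Tom's reasoning, not a quote from the source:

#define BLCKSZ          8192                /* default disk block size at the time */
#define MAX_QUERY_SIZE  (BLCKSZ * 2)        /* 16384 bytes */
/* Sketch of the rationale: a tuple can fill most of a block, and its quoted
 * ASCII form in an INSERT is larger (embedded quotes become '', binary fields
 * widen in text form), so 2 * BLCKSZ is a rough estimate rather than a strict
 * bound -- the factor is questioned later in this thread. */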

#3Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Ansley, Michael (#1)
Re: [HACKERS] MAX Query length

Trawling through the code last night I noticed that:
#define MAX_QUERY_SIZE (BLCKSZ * 2)

Is there any conceivable reason why the query length would be dependent on
the block size? Or do I just have old source code?

No great reason, but it seems like a good maximum. This controls the
buffer size on the client and server. Do you need it larger?

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#4Ansley, Michael
Michael.Ansley@intec.co.za
In reply to: Bruce Momjian (#3)
RE: [HACKERS] MAX Query length

Thanks for all the answers, everybody. Bruce, I had thought to start work
adjusting this so that the size wasn't limited at all. I'm just busy
gathering as much info as I can about the subject area, and hopefully in a
couple of days, if not earlier, I'll be in a position to start working on
the code.
I seem to remember there being a hackers guide somewhere. If I remember
right, it dealt with issues like where to check out the latest source from
cvs, rough standards, and other basic advice. Can anybody point me to it?

Thanks

MikeA

#5Ansley, Michael
Michael.Ansley@intec.co.za
In reply to: Ansley, Michael (#4)
RE: [HACKERS] MAX Query length

I was just thinking of removing the limit completely. The query would fail
only when it could no longer allocate memory for the query string.

MikeA

#6Bernard Frankpitt
frankpit@pop.dn.net
In reply to: Tom Lane (#2)
Re: [HACKERS] MAX Query length

Tom Lane wrote:

"Ansley, Michael" <Michael.Ansley@intec.co.za> writes:

Trawling through the code last night I noticed that:
#define MAX_QUERY_SIZE (BLCKSZ * 2)

Sure: you want to be able to INSERT a tuple of maximum size. In the
absence of dynamically sized text buffers, a reasonable estimate of
the longest INSERT command of interest is going to depend on BLCKSZ.

...

regards, tom lane

While I agree that it is reasonable for the query size to depend on the
block size, there is an assumption here that the type_in() and type_out()
routines do not expand the ASCII representation of the tuple data in the
query string to more than twice its size in its internal disk
representation. An important exception to this assumption would be large
arrays of floating point data that are stored with limited precision. A
(single-precision) float takes 4 bytes of space in a disk block, yet the
ASCII representation of the same data before conversion could easily take
in excess of 16 bytes if it comes from a piece of code like

double x;
int buf_pos;
....
....
/* append the ASCII form of x to the query string */
buf_pos +=
    snprintf(&query_buf[buf_pos], (l_buf - buf_pos), "%e", x);

somewhere in a front end. Perhaps it would be a good idea to increase
the multiplier in

#define MAX_QUERY_SIZE (BLCKSZ * 2)

to something larger than 2.

Bernie Frankpitt
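
To make the arithmetic concrete, here is a small standalone C illustration (not PostgreSQL code) of how far %e output can exceed the 4 bytes a float occupies on disk:

#include <stdio.h>

int main(void)
{
    float x = -1.2345678e-30f;
    char  buf[64];
    int   n = snprintf(buf, sizeof(buf), "%e", x);

    /* typically prints: 4 bytes on disk -> 13 ASCII chars (-1.234568e-30) */
    printf("%d bytes on disk -> %d ASCII chars (%s)\n", (int) sizeof(x), n, buf);
    return 0;
}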

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bernard Frankpitt (#6)
Re: [HACKERS] MAX Query length

Bernard Frankpitt <frankpit@pop.dn.net> writes:

Tom Lane wrote:

Sure: you want to be able to INSERT a tuple of maximum size. In the
absence of dynamically sized text buffers, a reasonable estimate of
the longest INSERT command of interest is going to depend on BLCKSZ.

Perhaps it would be a good idea to increase
the multiplier in
#define MAX_QUERY_SIZE (BLCKSZ * 2)
to something larger than 2.

This entire chain of logic will fall to the ground anyway once we support
tuples larger than a disk block, which I believe is going to happen
before too much longer. So, rather than argue about what the multiplier
ought to be, I think it's more productive to just press on with making
the query buffers dynamically resizable...

regards, tom lane
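
What "dynamically resizable" means in practice is sketched below: a minimal grow-on-demand string buffer with hypothetical names, offered as an illustration of the idea rather than the code that eventually went into PostgreSQL (the StringInfo type the backend later uses for query text follows essentially this append-and-double pattern).

#include <stdlib.h>
#include <string.h>

typedef struct
{
    char   *data;               /* query text, always null-terminated */
    size_t  len;                /* bytes used, not counting the '\0' */
    size_t  cap;                /* bytes currently allocated */
} QueryBuf;

static int
qbuf_init(QueryBuf *buf)
{
    buf->cap = 1024;
    buf->len = 0;
    buf->data = malloc(buf->cap);
    if (buf->data == NULL)
        return -1;
    buf->data[0] = '\0';
    return 0;
}

/* Append text, doubling the allocation as needed; the query only fails when
 * the system really cannot supply more memory, which is what Michael
 * proposes above. */
static int
qbuf_append(QueryBuf *buf, const char *s)
{
    size_t  slen = strlen(s);
    size_t  need = buf->len + slen + 1;

    if (need > buf->cap)
    {
        size_t  newcap = buf->cap;
        char   *p;

        while (newcap < need)
            newcap *= 2;
        p = realloc(buf->data, newcap);
        if (p == NULL)
            return -1;          /* out of memory: caller reports the error */
        buf->data = p;
        buf->cap = newcap;
    }
    memcpy(buf->data + buf->len, s, slen + 1);
    buf->len += slen;
    return 0;
}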

#8Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Ansley, Michael (#4)
Re: [HACKERS] MAX Query length

Thanks for all the answers, everybody. Bruce, I had thought to start work
adjusting this so that the size wasn't limited at all. I'm just busy
gathering as much info as I can about the subject area, and hopefully in a
couple of days, if not earlier, I'll be in a position to start working on
the code.
I seem to remember there being a hackers guide somewhere. If I remember
right, it dealt with issues like where to check out the latest source from
cvs, rough standards, and other basic advice. Can anybody point me to it?

Info Central/Documentation, see the Developers section.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#9Ansley, Michael
Michael.Ansley@intec.co.za
In reply to: Bruce Momjian (#8)
RE: [HACKERS] MAX Query length

Nice parody.

Info Central/Documentation, see the Developers section.

Bruce Momjian | http://www.op.net/~candle

#10Jan Wieck
wieck@debis.com
In reply to: Tom Lane (#7)
Re: [HACKERS] MAX Query length

Tom Lane wrote:

Bernard Frankpitt <frankpit@pop.dn.net> writes:

Tom Lane wrote:

Sure: you want to be able to INSERT a tuple of maximum size. In the
absence of dynamically sized text buffers, a reasonable estimate of
the longest INSERT command of interest is going to depend on BLCKSZ.

Perhaps it would be a good idea to increase
the multiplier in
#define MAX_QUERY_SIZE (BLCKSZ * 2)
to something larger than 2.

This entire chain of logic will fall to the ground anyway once we support
tuples larger than a disk block, which I believe is going to happen
before too much longer. So, rather than argue about what the multiplier
ought to be, I think it's more productive to just press on with making
the query buffers dynamically resizable...

Yes, even if we choose some other tuple-size limit (like the ~64K
Vadim suggested), a query operating on such tuples could be much
bigger. I have already made some progress with a data type that uses
a simple, byte-oriented LZ compression buffer as its internal
representation.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #

#11Ansley, Michael
Michael.Ansley@intec.co.za
In reply to: Jan Wieck (#10)
RE: [HACKERS] MAX Query length

Well, I'm starting on this, so hopefully in a couple of weeks the length
limit of the query buffer will fade into insignificance.
Is somebody actively working on removing the tuple-length dependence on the
block size?

At present, disk blocks are set to 8k. Is it as easy as just adjusting the
constant to enlarge this? Testing queries larger than 16k with only an 8k
tuple size could be challenging.

MikeA

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ansley, Michael (#11)
Re: [HACKERS] MAX Query length

"Ansley, Michael" <Michael.Ansley@intec.co.za> writes:

At present, disk blocks are set to 8k. Is it as easy as just adjusting the
constant to enlarge this? Testing queries larger than 16k with only an 8k
tuple size could be challenging.

As of 6.5, it's just a matter of adjusting BLCKSZ in include/config.h,
rebuilding, and re-initdb-ing. The workable sizes are 8k, 16k, and 32k;
bigger than 32k fails for reasons I don't recall exactly (offsets
stored in signed shorts somewhere, no doubt).

Is somebody actively working on removing the tuple-length dependence on the
block size?

There was considerable discussion about it a few weeks ago, but I didn't
hear anyone actually committing to do the work :-(. Maybe when you've
made some progress on the text-length issues, someone will get excited
about the tuple-length issue...

regards, tom lane

#13Ansley, Michael
Michael.Ansley@intec.co.za
In reply to: Tom Lane (#12)
RE: [HACKERS] MAX Query length

Once I have recompiled with a new block size, how do I update the databases
that I already have? If I understand right, once the block size has been
updated, my current dbs will not work. Do I just pg_dump before make
install and then recreate the dbs and load the dumps afterwards?

As of 6.5, it's just a matter of adjusting BLCKSZ in include/config.h,
rebuilding, and re-initdb-ing. The workable sizes are 8k, 16k, and 32k;
bigger than 32k fails for reasons I don't recall exactly (offsets
stored in signed shorts somewhere, no doubt).

MikeA

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ansley, Michael (#13)
Re: [HACKERS] MAX Query length

I wrote:

"Ansley, Michael" <Michael.Ansley@intec.co.za> writes:

At present, disk blocks are set to 8k. Is it as easy as just adjusting the
constant to enlarge this? Testing queries larger than 16k with only an 8k
tuple size could be challenging.

As of 6.5, it's just a matter of adjusting BLCKSZ in include/config.h,
rebuilding, and re-initdb-ing.

A further thought on this: if you increase BLCKSZ then at least some of
the fixed-size text buffers will get bigger, so it's not clear that you
will be stressing things all that hard if you take that route. Might be
easier to leave BLCKSZ alone and test with queries that are long and
complicated but don't actually require a large tuple size. Some
examples:

1. SELECT a,a,a,a,... FROM table;

2. SELECT a FROM table WHERE x = 1 OR x = 2 OR x = 3 OR ...;

3. Hugely complex CREATE TABLE commands (lots of constraints and
defaults and indexes, which don't enlarge the size of an actual
tuple of the table).

regards, tom lane
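
A throwaway generator for such test queries might look like the sketch below ("a" and "table" are the placeholder names from Tom's first example; the program itself is hypothetical). Pipe its output into psql:

#include <stdio.h>
#include <stdlib.h>

/* usage (hypothetical): ./genquery 20000 | psql mydb */
int main(int argc, char **argv)
{
    long    repeats = (argc > 1) ? atol(argv[1]) : 10000;
    long    i;

    printf("SELECT a");
    for (i = 1; i < repeats; i++)
        printf(",a");           /* grows the query text without needing big tuples */
    printf(" FROM table;\n");
    return 0;
}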

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#14)
Re: [HACKERS] MAX Query length

"Ansley, Michael" <Michael.Ansley@intec.co.za> writes:

Once I have recompiled with a new block size, how do I update the databases
that I already have? If I understand right, once the block size has been
updated, my current dbs will not work. Do I just pg_dump before make
install and then recreate the dbs and load the dumps afterwards?

Right, the real sequence when you are changing disk layout details is
pg_dumpall with old pg_dump and backend.
stop postmaster
rm -rf installation
make install
initdb
start postmaster
psql <pgdumpscript.

You may want to do your development work in a "playpen" installation
instead of risking breaking your "production" installation with these
sorts of shenanigans. I do that all the time here; for one thing I
don't have to bother saving and restoring any data when I blow away
a playpen installation.

The easiest kind of playpen setup is a separate server machine, but if
you only have one machine available then you do something like this to
build a playpen:

configure --with-pgport=5440 --prefix=/users/postgres/testversion

(Adjust playpen's port and install location to taste; make more than one
if you want...) BTW, if you are messing with the backend then your
playpen should also be built with --enable-cassert.

As I commented a moment ago, it's probably not really necessary for you
to change BLCKSZ for your testing, but the above tips are worth
repeating every so often for the benefit of new hackers.

regards, tom lane

#16Ole Gjerde
gjerde@icebox.org
In reply to: Tom Lane (#15)
Interesting index/LIKE/join slowness problems

Hi,
I've posted 3 messages to pgsql-general about a weird index problem I'm
having. I've found a very simple case that exhibits this problem.
This time I'm using a different database and a different table than in the
first 3 messages (it's the same pg install, however).

The index called mcrl1_partnumber_index is an index on the 'reference'
field. The table was just vacuumed (with and without analyze).
The pg install is from CVS last night around 7pm Central time.

The problem seems to be rooted in 'OR' combined with 'LIKE'. If I remove
the % in the string, explain shows the same (high) cost. If I also remove
the 'LIKE', the cost basically goes to nothing. The cost is indeed
correct: either of the first 2 cases takes ~5 minutes, while the last one
(no LIKE) finishes instantly.

The weird thing is, why is the cost being calculated as that high when
it's actually using the index on that field? And is there a reason why
explain shows the index name twice?

I ran the same exact query on a MS SQL server with the same data, and
that took in comparison about 2 seconds to finish.
Both Postgres and MS SQL are on Pentium 100 servers (yes, very pathetic),
running Linux 2.2.6 and NT 4.0 respectively.

Thanks,
Ole Gjerde

Here's the SQL:
---------------------
select * from mcrl1 where reference = 'AN914' OR reference LIKE 'AN914-%';

Here's the explain:
-----------------
mcrl=> explain select * from mcrl1 where reference = 'AN914' OR reference
LIKE 'AN914-%';
NOTICE: QUERY PLAN:

Index Scan using mcrl1_reference_index, mcrl1_reference_index on mcrl1
(cost=418431.81 rows=1 width=120)

EXPLAIN

Here's the table layout: 
------------
Table    = mcrl1
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                | Length|
+----------------------------------+----------------------------------+-------+
| reference                        | varchar()                        |     32|
| cage_num                         | char()                           |      5|
| fsc                              | char()                           |      4|
| niin                             | char()                           |      9|
| isc                              | char()                           |      1|
| rnvc                             | char()                           |      1|
| rncc                             | char()                           |      1|
| sadc                             | char()                           |      1|
| da                               | char()                           |      1|
| description                      | varchar()                        |     32|
+----------------------------------+----------------------------------+-------+
Index:    mcrl1_partnumber_index
#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ole Gjerde (#16)
Re: [HACKERS] Interesting index/LIKE/join slowness problems

Ole Gjerde <gjerde@icebox.org> writes:

The pg install is from CVS last night around 7pm Central time.

Do you have USE_LOCALE defined?

The problem seems to be rooted in 'OR' combined with 'LIKE'. If I remove
the % in the string, explain shows the same (high) cost. If I also remove
the 'LIKE', the cost basically goes to nothing. The cost is indeed
correct: either of the first 2 cases takes ~5 minutes, while the last one
(no LIKE) finishes instantly.

When you have just "where reference = 'AN914'", the system knows it can
use the index to scan just the tuples with keys between AN914 and AN914
(duh). Very few tuples actually get fetched.

As soon as you use LIKE with a %, more tuples have to be scanned. It's
particularly bad if you have USE_LOCALE; with the current code, that
basically means that LIKE 'AN914-%' will cause all tuples beginning with
key AN914- and running to the end of the table to be scanned.

See the extensive thread on this topic from about a month or two back
in the pgsql-hackers mail list archives; I don't feel like repeating the
info now.

When you throw in the OR, the indexqual logic basically breaks down
completely; I think you end up scanning the entire table. (This could
be made smarter, perhaps, but right now I don't believe the system is
able to figure out the union of indexqual conditions.) I would say it
is an optimizer bug that it is not reverting to sequential scan here
... that would be a good bit faster, I bet.

regards, tom lane

#18Ole Gjerde
gjerde@icebox.org
In reply to: Tom Lane (#17)
Re: [HACKERS] Interesting index/LIKE/join slowness problems

On Thu, 15 Jul 1999, Tom Lane wrote:

Do you have USE_LOCALE defined?

Nope.. Not unless it defaults to on... I did a
./configure --prefix=/home/postgres ; make ; make install as usual

As soon as you use LIKE with a %, more tuples have to be scanned. It's
particularly bad if you have USE_LOCALE; with the current code, that
basically means that LIKE 'AN914-%' will cause all tuples beginning with
key AN914- and running to the end of the table to be scanned.

Ok.. I get that.. But why does LIKE 'AN914' have the same problem? The %
doesn't have to be there as long as it's either LIKE or ~*(or ~ etc)
query. And that still doesn't explain why it happens with USE_LOCALE
off..
Also, since the ='s work using OR, why wouldn't LIKE also? Both methods
would use the indexes, and the LIKE doesn't take that much longer to run..
Doesn't make sense, especially concerning what you mention below..

See the extensive thread on this topic from about a month or two back
in the pgsql-hackers mail list archives; I don't feel like repeating the
info now.

I haven't been able to find a discussion on this topic from the last few
months. I found a discussion about something similar in March, but that
didn't explain it very well.. I'll just have to look some more :)

When you throw in the OR, the indexqual logic basically breaks down
completely; I think you end up scanning the entire table. (This could
be made smarter, perhaps, but right now I don't believe the system is
able to figure out the union of indexqual conditions.) I would say it
is an optimizer bug that it is not reverting to sequential scan here
... that would be a good bit faster, I bet.

Ok.. I can believe that.. This is a pretty nasty problem tho.. I don't
believe using OR with LIKE is all that rare.. Maybe it's rare on a 17
mill row table, but still..
What would be the outlook on fixing the problem and not the symptom? :)

As far as sequential scan being faster.. Unfortunately, this table has
about 17 million rows, so any kind of seq scan is gonna be really slow.

Thanks,
Ole Gjerde

#19Ansley, Michael
Michael.Ansley@intec.co.za
In reply to: Tom Lane (#15)
RE: [HACKERS] MAX Query length

Thanks, Tom.

Right, the real sequence when you are changing disk layout details is
pg_dumpall with old pg_dump and backend.
stop postmaster
rm -rf installation
make install
initdb
start postmaster
psql <pgdumpscript.

You may want to do your development work in a "playpen" installation
instead of risking breaking your "production" installation with these

etc., etc.

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ole Gjerde (#18)
Re: [HACKERS] Interesting index/LIKE/join slowness problems

Ole Gjerde <gjerde@icebox.org> writes:

Ok.. I get that.. But why does LIKE 'AN914' have the same problem? The %
doesn't have to be there as long as it's either LIKE or ~*(or ~ etc)
query.

A pure "where field LIKE constant" doesn't have the problem; it's the
OR that does it. More specifically it's an OR of ANDs that doesn't work
very well.

By the time the parser gets done with it, your query looks like

select * from mcrl1 where
reference = 'AN914' OR
(reference LIKE 'AN914-%'
AND reference >= 'AN914-'
AND reference <= 'AN914-\377');

(ugly, ain't it?) Those comparison clauses are what need to be pulled
out and fed to the indexscan mechanism, so that only part of the table
gets scanned, not the whole table. Indexscan doesn't know anything
about LIKE, but it does grok >= and <=.

Unfortunately the current optimizer doesn't do it right. I looked into
a very similar bug report from Hiroshi Inoue (see his message of 3/19/99
and my response of 4/3 in the hackers archives), and what I found was
that the cause is a fairly fundamental optimizer design choice. The
ANDed conditions get split into separate top-level clauses and there's
no easy way to put them back together. The optimizer ends up passing
only one of them to the indexscan executor. That's better than nothing,
but on average you still end up scanning half the table rather than
just a small range of it.

I haven't been able to find a discussion on this topic from the last few
months. I found a discussion about something similar in March, but that
didn't explain it very well.. I'll just have to look some more :)

I was referring to the discussion around 4/15/99 about why LIKE needs a
smarter way to generate the upper comparison clause. That's not
directly your problem, but it is causing the same kind of slowdown for
everyone who does use LOCALE...

When you throw in the OR, the indexqual logic basically breaks down
completely; I think you end up scanning the entire table. (This could
be made smarter, perhaps, but right now I don't believe the system is
able to figure out the union of indexqual conditions.)

I was wrong about that --- the executor *does* handle OR'd indexqual
conditions, basically by performing a new indexscan for each OR'd
condition. (That's why EXPLAIN is listing the index multiple times.)
The trouble with OR-of-ANDs is entirely the optimizer's fault; the
executor would do them fine if the optimizer would only hand them over
in that form.

What would be the outlook on fixing the problem and not the symptom? :)

I plan to look into fixing this for 6.6, but don't hold your breath
waiting...

regards, tom lane
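
The transformation Tom describes, deriving index-usable range bounds from the literal prefix of a LIKE pattern, can be illustrated with a small standalone sketch (a hypothetical helper, not the parser's makeIndexable code; the '\377' upper bound is the same crude trick whose locale problems are discussed above):

#include <stdio.h>
#include <string.h>

/* Copy the literal prefix of a LIKE pattern (the part before the first
 * wildcard) into prefix[] and return its length. */
static size_t
like_prefix(const char *pattern, char *prefix, size_t buflen)
{
    size_t  n = strcspn(pattern, "%_");     /* stop at the first wildcard */

    if (n >= buflen)
        n = buflen - 1;
    memcpy(prefix, pattern, n);
    prefix[n] = '\0';
    return n;
}

int main(void)
{
    char    prefix[64];

    if (like_prefix("AN914-%", prefix, sizeof(prefix)) > 0)
        printf("reference >= '%s' AND reference <= '%s\\377'\n", prefix, prefix);
    return 0;
}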

#21Ole Gjerde
gjerde@icebox.org
In reply to: Tom Lane (#20)
Re: [HACKERS] Interesting index/LIKE/join slowness problems

On Fri, 16 Jul 1999, Tom Lane wrote:

I was wrong about that --- the executor *does* handle OR'd indexqual
conditions, basically by performing a new indexscan for each OR'd
condition. (That's why EXPLAIN is listing the index multiple times.)
The trouble with OR-of-ANDs is entirely the optimizer's fault; the
executor would do them fine if the optimizer would only hand them over
in that form.

What would be the outlook on fixing the problem and not the symptom? :)

I plan to look into fixing this for 6.6, but don't hold your breath
waiting...

Thanks for giving the very detailed explanation!

Since we really need to have this work, or go with a different database,
we would be willing to pay someone to fix this problem. Would anybody be
interested in doing this, how soon and how much? It would be preferable
that this would be a patch that would be accepted back into postgres for
6.6.

Thanks,
Ole Gjerde
Avsupport Inc.

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ole Gjerde (#21)
Re: [HACKERS] Interesting index/LIKE/join slowness problems

Ole Gjerde <gjerde@icebox.org> writes:

On Fri, 16 Jul 1999, Tom Lane wrote:

The trouble with OR-of-ANDs is entirely the optimizer's fault; the
executor would do them fine if the optimizer would only hand them over
in that form.

Since we really need to have this work, or go with a different database,
we would be willing to pay someone to fix this problem. Would anybody be
interested in doing this, how soon and how much? It would be preferable
that this would be a patch that would be accepted back into postgres for
6.6.

Fixing the general OR-of-ANDs problem is going to be quite ticklish,
I think, because it would be easy to make other cases worse if we're
not careful about how we rewrite the qual condition.

However, I had an idea yesterday about a narrow, localized fix for LIKE
(and the other ops processed by makeIndexable), which I think would meet
your needs if the particular cases you are concerned about are just ORs
of LIKEs and simple comparisons.

It goes like this: while we want LIKE to generate indexable comparisons
if possible, having the parser insert them into the parsetree is a
really crude hack. The extra clauses are just a waste of cycles under
many scenarios (no index on the field being looked at, LIKE not in the
WHERE clause or buried too deeply to be an indexqual, etc etc).
What's worse, the parser doesn't know for sure that what it's
manipulating really *is* a LIKE --- it's making an unwarranted
assumption on the basis of the operator name, before the actual operator
has been looked up! So I've wanted to replace that method of optimizing
LIKE since the moment I saw it ;-)

What would be better would be to teach the indexqual extractor in the
optimizer that it can make indexqual conditions from a LIKE operator.
Then, the LIKE just passes through the cnfify() step without getting
rewritten, so we don't have the OR-of-ANDs problem. Plus we don't pay
any overhead if the LIKE can't be used as an indexqual condition for any
reason. And by the time the optimizer is acting, we really know whether
we have a LIKE or not, because type resolution and operator lookup have
been done.

I don't know how soon the general OR-of-ANDs problem can be solved,
but I am planning to try to make this LIKE fix for 6.6. If you want
to send some $$ my way, all the better...

regards, tom lane

#23Ole Gjerde
gjerde@icebox.org
In reply to: Tom Lane (#22)
Index not used on simple select

Hey,
I've been having this bizarre problem with some index on this one table.
The table has in the past had more than 9 indexes, but today I redid the
table and it still has the same problem.
I just did a dump of the schema, COPY'd the data out. Deleted all
postgres files, and installed 6.5.1.
The table has 3,969,935 rows in it.

Any ideas?

Here are the explain reports after both vacuum and vacuum analyze on the
table:
---------------------------------------------
parts=> explain select * from av_parts where partnumber = '123456';
NOTICE: QUERY PLAN:

Index Scan using av_parts_partnumber_index on av_parts (cost=3.55 rows=32
width=124)

EXPLAIN
parts=> explain select * from av_parts where nsn = '123456';
NOTICE: QUERY PLAN:

Seq Scan on av_parts (cost=194841.86 rows=3206927 width=124)

EXPLAIN
-------------------------------------------------

This is how I create the 2 indexes:
-------------------------------------------------
CREATE INDEX "av_parts_partnumber_index" on "av_parts" using btree
( "partnumber" "varchar_ops" );
CREATE INDEX "av_parts_nsn_index" on "av_parts" using btree
( "nsn" "varchar_ops" );
-------------------------------------------------

Table    = av_parts
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                | Length|
+----------------------------------+----------------------------------+-------+
| itemid                           | int4 not null default nextval (  |      4|
| vendorid                         | int4                             |      4|
| partnumber                       | varchar()                        |     25|
| alternatepartnumber              | varchar()                        |     25|
| nsn                              | varchar()                        |     15|
| description                      | varchar()                        |     50|
| condition                        | varchar()                        |     10|
| quantity                         | int4                             |      4|
| rawpartnumber                    | varchar()                        |     25|
| rawalternatenumber               | varchar()                        |     25|
| rawnsnnumber                     | varchar()                        |     15|
| date                             | int4                             |      4|
| cagecode                         | varchar()                        |     10|
+----------------------------------+----------------------------------+-------+
Indices:  av_parts_itemid_key
          av_parts_nsn_index
          av_parts_partnumber_index

Thanks,
Ole Gjerde

#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ole Gjerde (#23)
Re: [HACKERS] Index not used on simple select

Ole Gjerde <gjerde@icebox.org> writes:

parts=> explain select * from av_parts where nsn = '123456';
Seq Scan on av_parts (cost=194841.86 rows=3206927 width=124)
[ why isn't it using the index on nsn? ]

That is darn peculiar. You're probably managing to trigger some nitty
little bug in the optimizer, but I haven't the foggiest what it might
be.

Indices: av_parts_itemid_key
av_parts_nsn_index
av_parts_partnumber_index

One bit of info you didn't provide is how that third index is defined.

Shipping your 4-million-row database around is obviously out of the
question, but I think a reproducible test case is needed; it's going to
take burrowing into the code with a debugger to find this one. Can
you make a small test case that behaves the same way? (One thing
to try right away is loading the same table and index definitions into
an empty database, but *not* loading any data and not doing vacuum.
If that setup doesn't show the bug, try adding a couple thousand
representative rows from your real data, vacuum analyzing, and then
seeing if it happens.)

regards, tom lane

#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#24)
Re: [HACKERS] Index not used on simple select

(Note to hackers: Ole sent me a 1000-row test case off list.)

oletest=> explain select * from av_parts where partnumber = '123456';
NOTICE: QUERY PLAN:

Index Scan using av_parts_partnumber_index on av_parts (cost=2.04 rows=1
width=124)

EXPLAIN
oletest=> explain select * from av_parts where nsn = '123456';
NOTICE: QUERY PLAN:

Seq Scan on av_parts (cost=48.00 rows=995 width=124)

OK, I confirm seeing this behavior. I don't have time to dig into
the code right now, but will do so when I get a chance.

It looks like the highly skewed distribution of nsn values (what you
sent me had 997 '' entries, only 3 non-empty strings) is confusing the
selectivity estimation code somehow, such that the system thinks that
the query is going to match most of the rows. Notice it is estimating
995 returned rows for the nsn select! Under these circumstances it will
prefer a sequential scan, since the more-expensive-per-tuple index scan
doesn't look like it will be able to avoid reading most of the table.
That logic is OK, it's the 0.995 selectivity estimate that's wrong...

Exactly why the selectivity estimate is so ludicrous remains to
be seen, but I know that there are some bogosities in that code
(search the pghackers archives for "selectivity" for more info).
I am hoping to do some extensive revisions of the selectivity code
for 6.6 or 6.7. This particular problem might be easily fixable,
or it might have to wait for the rewrite.

Thanks for the test case!

regards, tom lane

#26Ole Gjerde
gjerde@icebox.org
In reply to: Tom Lane (#25)
Re: [HACKERS] Index not used on simple select

On Fri, 23 Jul 1999, Tom Lane wrote:

It looks like the highly skewed distribution of nsn values (what you
sent me had 997 '' entries, only 3 non-empty strings) is confusing the

As a note, it doesn't seem to matter whether the field has '' or NULL.
Even after I do an update to set all rows with '' to NULL, it still
does the same thing.

Also, my full set of data is not quite so skewed. The nsn field has about
450,000 non-empty rows in it.

Thanks,
Ole Gjerde

#27Ole Gjerde
gjerde@icebox.org
In reply to: Ole Gjerde (#26)
UPDATE memory exhaustion

Hey,
I was having problems with UPDATE, so I looked through the archives. Back
around the 20th of May, there was a thread about update using all memory
(thread: strange behavior of UPDATE).

It now looks like I am having that same problem on pg 6.5.1.
Basically I tried running a simple query:
update av_parts set nsn = 'xxxxx' where nsn = '';

And postgres started chugging along. After a while (not sure how long) it
was using all memory on the computer.

The box has 82MB of memory and 128 MB of swap.
The query is trying to update 3.5 million rows.

I would try to attach gdb to the process and see where it's spending its time,
unfortunately that box is pretty much dead until I reboot it. I'll try to
do it again later with a ulimit so I can actually log into the box :)

Thanks,
Ole Gjerde
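
A hedged workaround until the leak itself is tracked down is to split the statement into slices so that each UPDATE touches a bounded number of rows. The libpq sketch below assumes itemid (from the table definition shown earlier) as the slicing key; the batch size, connection string, and upper bound are made-up example values, not from the thread:

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=parts");     /* example connection string */
    long    lo;
    long    step = 250000;                          /* assumed batch size */
    long    max_itemid = 4000000;                   /* assumed upper bound */
    char    sql[256];

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }
    for (lo = 1; lo <= max_itemid; lo += step)
    {
        PGresult *res;

        snprintf(sql, sizeof(sql),
                 "UPDATE av_parts SET nsn = 'xxxxx' "
                 "WHERE nsn = '' AND itemid >= %ld AND itemid < %ld",
                 lo, lo + step);
        res = PQexec(conn, sql);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "batch starting at %ld failed: %s",
                    lo, PQerrorMessage(conn));
        PQclear(res);
    }
    PQfinish(conn);
    return 0;
}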