335 times faster (!)

Started by Mikael Carneholmabout 23 years ago16 messagesgeneral
Jump to latest
#1Mikael Carneholm
carniz@spray.se

I discovered a strange thing when doing a simple search on a (comparably) large table with ~900K rows today:

When searching for a specific row on the primary key (type: bigint), the search took about 6,5 seconds. The column has a default btree index as created by the primary key constraint. However, when searching for the same row on one of it's columns (type: text) which has a functional index on lower(column name), the same row was retrieved in 19ms! That's ~335 times faster!

My idea is thus that one could create tables with a text type column holding the value of the identifier without using the 'primary key' clause, and then afterwards add a functional index on lower(column name). If performance is the main issue, this must be an interesting solution. The downside is of course that the text data type may result in invalid integer values being inserted as keys.

Anyone tried this before?

- Mikael

_____________________________________________________________
Här börjar internet!
Skaffa gratis e-mail och gratis internet på http://www.spray.se

Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/

#2Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Mikael Carneholm (#1)
Re: 335 times faster (!)

On Mon, 3 Feb 2003, Mikael Carneholm wrote:

When searching for a specific row on the primary key (type: bigint),
the search took about 6,5 seconds. The column has a default btree
index as created by the primary key constraint. However, when
searching for the same row on one of it's columns (type: text) which
has a functional index on lower(column name), the same row was
retrieved in 19ms! That's ~335 times faster!

Did you remember to cast the constant into bigint? If not, it probably
ignored the bigint index and did a table scan.

#3Nigel J. Andrews
nandrews@investsystems.co.uk
In reply to: Mikael Carneholm (#1)
Re: 335 times faster (!)

On Mon, 3 Feb 2003, Mikael Carneholm wrote:

I discovered a strange thing when doing a simple search on a (comparably) large table with ~900K rows today:

When searching for a specific row on the primary key (type: bigint), the search took about 6,5 seconds. The column has a default btree index as created by the primary key constraint. However, when searching for the same row on one of it's columns (type: text) which has a functional index on lower(column name), the same row was retrieved in 19ms! That's ~335 times faster!

My idea is thus that one could create tables with a text type column holding the value of the identifier without using the 'primary key' clause, and then afterwards add a functional index on lower(column name). If performance is the main issue, this must be an interesting solution. The downside is of course that the text data type may result in invalid integer values being inserted as keys.

Anyone tried this before?

Could it be that you've got a reasonably powerful machine and that your table
isn't very wide?

Are you sure your search using the primary key was actually using the primary
key index, i.e. did you just do:

SELECT * FROM mytable WHERE pkcol = 45

or did you quote the number or cast to bigint? Perhaps this has changed in 7.3
I don't know.

Also, did you perhaps do your search on the text type column just after doing
the first SELECT? You might find there's some caching issue.

Not sure about anyone else but I think we'd want to see the plans used for your
queries, in addition to the queries, before accepting this.

--
Nigel J. Andrews

#4Bruno Wolff III
bruno@wolff.to
In reply to: Mikael Carneholm (#1)
Re: 335 times faster (!)

On Mon, Feb 03, 2003 at 18:42:31 +0100,
Mikael Carneholm <carniz@spray.se> wrote:

I discovered a strange thing when doing a simple search on a (comparably) large table with ~900K rows today:

When searching for a specific row on the primary key (type: bigint), the search took about 6,5 seconds. The column has a default btree index as created by the primary key constraint. However, when searching for the same row on one of it's columns (type: text) which has a functional index on lower(column name), the same row was retrieved in 19ms! That's ~335 times faster!

This is probably a type coersion issue. You can probably get the first search
to run much faster by including an explicit cast to bigint.

#5Mikael Carneholm
carniz@spray.se
In reply to: Bruno Wolff III (#4)
Re: 335 times faster (!)

------- Ursprungligt meddelande -------

Från: Nigel J. Andrews <nandrews@investsystems.co.uk>
Datum: Mon, 3 Feb 2003 17:59:12 +0000 (GMT)

Could it be that you've got a reasonably powerful machine and that your table
isn't very wide?

Machine: Intel P3 650 laptop /w 256 RAM
Table: 10 columns

Are you sure your search using the primary key was actually using the primary
key index, i.e. did you just do:

SELECT * FROM mytable WHERE pkcol = 45

Yep:
select * from enheter where enhetsid = xxxxxxxxx;

Also, did you perhaps do your search on the text type column just after doing
the first SELECT? You might find there's some caching issue.

I tested this (after you pointed it out) by alterating between the same two queries back and forth, but they still differ by the same amount.

Not sure about anyone else but I think we'd want to see the plans used for your
queries, in addition to the queries, before accepting this.

explain select * from enheter where enhetsid = 200178146;
QUERY PLAN
------------------------------------------------------------
Seq Scan on enheter (cost=0.00..15678.50 rows=1 width=91) Filter: (enhetsid = 200178146)
(2 rows)

explain select * from enheter where lower(enhetsnamn1) = 'donalds foto ab';
QUERY PLAN ---------------------------------------------------------------------------------------------
Index Scan using idx_enheter_enhetsnamn1 on enheter (cost=0.00..1342.82 rows=337 width=91) Index Cond: (lower(enhetsnamn1) = 'donalds foto ab'::text)
(2 rows)

- Mikael

_____________________________________________________________
Här börjar internet!
Skaffa gratis e-mail och gratis internet på http://www.spray.se

Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/

#6Mikael Carneholm
carniz@spray.se
In reply to: Mikael Carneholm (#5)
Re: 335 times faster (!)

------- Ursprungligt meddelande -------

Från: Mario Weilguni <mweilguni@sime.com>
Datum: Mon, 3 Feb 2003 20:05:48 +0100

try:
explain select * from enheter where enhetsid = '200178146';
or
explain select * from enheter where enhetsid = 200178146::bigint

explain select * from enheter where enhetsid = '200178146';
QUERY PLAN ---------------------------------------------------------------------------
Index Scan using pk_enheter on enheter (cost=0.00..4.05 rows=1 width=91)
Index Cond: (enhetsid = 200178146::bigint)
(2 rows)

Strange...using:
200178146::bigint
or
'200178146'
..the query is lightning fast. Since the PK column is of integer type, I don't think it's logical to pass a string-type argument...or am I different than most people on this point? :)

What about third party frameworks (such as Hibernate, eg) - I'm sure they will look at the column datatype and think: 'Oh, it's an integer...I'll pass an integer argument then', which will result in unnecessary poor performance.

What's the cause of this behaviour? Why isn't psql (or the backend) converting integer type arguments into char/string types, if there's this much to gain?

(Sorry for being quite a Pg newbie..have never used Pg for such "heavy" load before)

- Mikael

_____________________________________________________________
Här börjar internet!
Skaffa gratis e-mail och gratis internet på http://www.spray.se

Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/

#7Dennis Gearon
gearond@cvc.net
In reply to: Mikael Carneholm (#6)
Re: 335 times faster (!)

My surrogate primary keys will all be INT4's. Seems like it would change something passed in
to an INT4 to match the primary key if that's what the key is.

2/3/2003 11:23:28 AM, Mikael Carneholm <carniz@spray.se> wrote:
<snip>

Strange...using:
200178146::bigint
or
'200178146'
..the query is lightning fast. Since the PK column is of integer type, I don't think it's

logical to pass a string-type argument...or am I different than most people on this point? :)

What about third party frameworks (such as Hibernate, eg) - I'm sure they will look at the

column datatype and think: 'Oh, it's an integer...I'll pass an integer argument then', which
will result in unnecessary poor performance.

What's the cause of this behaviour? Why isn't psql (or the backend) converting integer type

arguments into char/string types, if there's this much to gain?

Show quoted text

(Sorry for being quite a Pg newbie..have never used Pg for such "heavy" load before)

- Mikael

_____________________________________________________________
H�r b�rjar internet!
Skaffa gratis e-mail och gratis internet p� http://www.spray.se

Tr�ffa folk fr�n hela Sverige p� ett och samma st�lle - http://chat.spray.se/

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

#8Nigel J. Andrews
nandrews@investsystems.co.uk
In reply to: Mikael Carneholm (#6)
Re: 335 times faster (!)

On Mon, 3 Feb 2003, Mikael Carneholm wrote:

------- Ursprungligt meddelande -------

Från: Mario Weilguni <mweilguni@sime.com>
Datum: Mon, 3 Feb 2003 20:05:48 +0100

try:
explain select * from enheter where enhetsid = '200178146';
or
explain select * from enheter where enhetsid = 200178146::bigint

explain select * from enheter where enhetsid = '200178146';
QUERY PLAN ---------------------------------------------------------------------------
Index Scan using pk_enheter on enheter (cost=0.00..4.05 rows=1 width=91)
Index Cond: (enhetsid = 200178146::bigint)
(2 rows)

Strange...using:
200178146::bigint
or
'200178146'
..the query is lightning fast. Since the PK column is of integer type, I don't think it's logical to pass a string-type argument...or am I different than most people on this point? :)

No, I think a lot of us have been caught by this in the past.

What about third party frameworks (such as Hibernate, eg) - I'm sure they will look at the column datatype and think: 'Oh, it's an integer...I'll pass an integer argument then', which will result in unnecessary poor performance.

What's the cause of this behaviour? Why isn't psql (or the backend) converting integer type arguments into char/string types, if there's this much to gain?

Someone more familiar with the backend can answer this a lot better than I
could so I won't give a half assed comment.

(Sorry for being quite a Pg newbie..have never used Pg for such "heavy" load before)

I feel like that most days.

--
Nigel J. Andrews

#9Mikael Carneholm
carniz@spray.se
In reply to: Nigel J. Andrews (#8)
Re: 335 times faster (!)

------- Ursprungligt meddelande -------

Från: Nigel J. Andrews <nandrews@investsystems.co.uk>
Datum: Mon, 3 Feb 2003 20:00:32 +0000 (GMT)

No, I think a lot of us have been caught by this in the past.

Then I suggest that this is 'fixed' in the next release (i.e., the query optimizer should automatically translate integer/bigint type arguments into char/string/text type arguments, or something like that)

- Mikael

_____________________________________________________________
Här börjar internet!
Skaffa gratis e-mail och gratis internet på http://www.spray.se

Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/

#10Jan Wieck
JanWieck@Yahoo.com
In reply to: Nigel J. Andrews (#8)
Re: 335 times faster (!)

"Nigel J. Andrews" wrote:

On Mon, 3 Feb 2003, Mikael Carneholm wrote:

What's the cause of this behaviour? Why isn't psql (or the backend) converting integer type arguments into char/string types, if there's this much to gain?

Someone more familiar with the backend can answer this a lot better than I
could so I won't give a half assed comment.

It is the other way round. The backend converts a non-quoted sequence of
digits too early into an int4 and cannot recover from that "mistake".

A single quoted literal value is treated as a constant of unknown
datatype and get's parsed into what fits best much later.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#11Mikael Carneholm
carniz@spray.se
In reply to: Jan Wieck (#10)
Re: 335 times faster (!)

------- Ursprungligt meddelande -------

Från: Jan Wieck <JanWieck@Yahoo.com>
Datum: Mon, 03 Feb 2003 16:11:53 -0500

It is the other way round. The backend converts a non-quoted sequence of
digits too early into an int4 and cannot recover from that "mistake".

Is this on the TODO list for 7.4? The current docs (http://developer.postgresql.org/docs/postgres/datatype.html#DATATYPE-INT) state that "PostgreSQL currently cannot use an index when two different data types are involved" - which makes it sound like this is a current-only drawback that will be fixed later on. T/F?

- Mikael

_____________________________________________________________
Här börjar internet!
Skaffa gratis e-mail och gratis internet på http://www.spray.se

Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/

#12Patric Bechtel
bechtel@ipcon.de
In reply to: Mikael Carneholm (#6)
Re: 335 times faster (!) [Viruschecked]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, 03 Feb 2003 20:23:28 +0100, Mikael Carneholm wrote:

Hallo Mikael,

it's been around for about a month or so:
I've submitted a patch on the patches list, which fixes
this (kind of, at least for Castor/OJB/Hibernate) when
the URL-parameter useExplicitTyping=true is set.
Just try it, there are several other very nice
features; but nobody seems to had an eye on it till now... :-(

If there's interest, I've improved the patch even
further, and keep it in sync with the current CVS;
since there's no interest so far, I'm not submitting
them; except someone tries and applies them.

tia
Patric

------- Ursprungligt meddelande -------

Från: Mario Weilguni <mweilguni@sime.com>
Datum: Mon, 3 Feb 2003 20:05:48 +0100

try:
explain select * from enheter where enhetsid = '200178146';
or
explain select * from enheter where enhetsid = 200178146::bigint

explain select * from enheter where enhetsid = '200178146';
QUERY PLAN

- ---------------------------------------------------------------------------

Index Scan using pk_enheter on enheter (cost=0.00..4.05 rows=1 width=91)
Index Cond: (enhetsid = 200178146::bigint)
(2 rows)

Strange...using:
200178146::bigint
or
'200178146'
..the query is lightning fast. Since the PK column is of integer type, I don't think it's logical to pass a string-type argument...or am I different than most

people on this point? :)

What about third party frameworks (such as Hibernate, eg) - I'm sure they will look at the column datatype and think: 'Oh, it's an integer...I'll pass an

integer argument then', which will result in unnecessary poor performance.

What's the cause of this behaviour? Why isn't psql (or the backend) converting integer type arguments into char/string types, if there's this much to gain?

(Sorry for being quite a Pg newbie..have never used Pg for such "heavy" load before)

- Mikael

_____________________________________________________________
Här börjar internet!
Skaffa gratis e-mail och gratis internet på http://www.spray.se

Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

PGP Public Key Fingerprint: 2636 F26E F523 7D62 4377 D206 7C68 06BB

-----BEGIN PGP SIGNATURE-----
Version: PGPsdk version 1.7.1 (C) 1997-1999 Network Associates, Inc. and its affiliated companies.

iQA/AwUBPj7VW3xoBrvMu8qQEQLfAwCgl0Dd0K4QLtt4E8Seqr4ArRm8Kv0An3Vd
mmL6pYa9PZDb4osUxw7q5xSZ
=NZGE
-----END PGP SIGNATURE-----

#13Patric Bechtel
bechtel@ipcon.de
In reply to: Patric Bechtel (#12)
Re: 335 times faster (!) [Viruschecked] [Viruschecked]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, 03 Feb 2003 22:47:22 +0100, Patric Bechtel wrote:

Hello Mikael,

sorry for bothering, but I oversaw that you wrote on the
GENERAL list... my patch is only for the JDBC drive. It
probably isn't the "right" way to fix it, but digging into
that backend issue is clearly beyond my skills... :-)

Patric

On Mon, 03 Feb 2003 20:23:28 +0100, Mikael Carneholm wrote:

Hallo Mikael,

it's been around for about a month or so:
I've submitted a patch on the patches list, which fixes
this (kind of, at least for Castor/OJB/Hibernate) when
the URL-parameter useExplicitTyping=true is set.
Just try it, there are several other very nice
features; but nobody seems to had an eye on it till now... :-(

If there's interest, I've improved the patch even
further, and keep it in sync with the current CVS;
since there's no interest so far, I'm not submitting
them; except someone tries and applies them.

tia
Patric

------- Ursprungligt meddelande -------

Från: Mario Weilguni <mweilguni@sime.com>
Datum: Mon, 3 Feb 2003 20:05:48 +0100

try:
explain select * from enheter where enhetsid = '200178146';
or
explain select * from enheter where enhetsid = 200178146::bigint

explain select * from enheter where enhetsid = '200178146';
QUERY PLAN

---------------------------------------------------------------------------

Index Scan using pk_enheter on enheter (cost=0.00..4.05 rows=1 width=91)
Index Cond: (enhetsid = 200178146::bigint)
(2 rows)

Strange...using:
200178146::bigint
or
'200178146'
..the query is lightning fast. Since the PK column is of integer type, I don't think it's logical to pass a string-type argument...or am I different than most

people on this point? :)

What about third party frameworks (such as Hibernate, eg) - I'm sure they will look at the column datatype and think: 'Oh, it's an integer...I'll pass an

integer argument then', which will result in unnecessary poor performance.

What's the cause of this behaviour? Why isn't psql (or the backend) converting integer type arguments into char/string types, if there's this much to gain?

(Sorry for being quite a Pg newbie..have never used Pg for such "heavy" load before)

- Mikael

_____________________________________________________________
Här börjar internet!
Skaffa gratis e-mail och gratis internet på http://www.spray.se

Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

PGP Public Key Fingerprint: 2636 F26E F523 7D62 4377 D206 7C68 06BB

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

PGP Public Key Fingerprint: 2636 F26E F523 7D62 4377 D206 7C68 06BB

-----BEGIN PGP SIGNATURE-----
Version: PGPsdk version 1.7.1 (C) 1997-1999 Network Associates, Inc. and its affiliated companies.

iQA/AwUBPj7XNnxoBrvMu8qQEQLphgCdFp4zC7Rgn44jqOGJGttnmf0vBuIAn2P9
8YhRpfuzfvaU/ErQ1VOH0NL1
=wwrt
-----END PGP SIGNATURE-----

#14Neil Conway
neilc@samurai.com
In reply to: Mikael Carneholm (#9)
Re: 335 times faster (!)

On Mon, 2003-02-03 at 15:13, Mikael Carneholm wrote:

Then I suggest that this is 'fixed' in the next release

Please read the list archives -- this has been discussed before (many
times).

Short answer: the fix is non-trivial. It will be fixed eventually, but
it's difficult to do without causing other undesirable changes.

Cheers,

Neil
--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC

#15Mikael Carneholm
carniz@spray.se
In reply to: Neil Conway (#14)
Re: 335 times faster (!) [Viruschecked]

------- Ursprungligt meddelande -------

Från: Patric Bechtel <bechtel@ipcon.de>
Datum: Mon, 03 Feb 2003 22:47:22 +0100

Hallo Mikael,

it's been around for about a month or so:
I've submitted a patch on the patches list, which fixes
this (kind of, at least for Castor/OJB/Hibernate) when
the URL-parameter useExplicitTyping=true is set.
Just try it, there are several other very nice
features; but nobody seems to had an eye on it till now... :-(

If there's interest, I've improved the patch even
further, and keep it in sync with the current CVS;
since there's no interest so far, I'm not submitting
them; except someone tries and applies them.

tia
Patric

Super! I will try it as soon as possible! I'm sure there's interest for this, thinking of the many users that have discovered the fine combo Eclipse-JBOSS-[Hibernate/Castor/OJB]-PostgreSQL.

Regards,
- Mikael.

_____________________________________________________________
Här börjar internet!
Skaffa gratis e-mail och gratis internet på http://www.spray.se

Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/

#16Jan Wieck
JanWieck@Yahoo.com
In reply to: Mikael Carneholm (#11)
Re: 335 times faster (!)

Mikael Carneholm wrote:

------- Ursprungligt meddelande -------

Från: Jan Wieck <JanWieck@Yahoo.com>
Datum: Mon, 03 Feb 2003 16:11:53 -0500

It is the other way round. The backend converts a non-quoted sequence of
digits too early into an int4 and cannot recover from that "mistake".

Is this on the TODO list for 7.4? The current docs (http://developer.postgresql.org/docs/postgres/datatype.html#DATATYPE-INT) state that "PostgreSQL currently cannot use an index when two different data types are involved" - which makes it sound like this is a current-only drawback that will be fixed later on. T/F?

Exactly ... only I cannot define "later on" very precise. Assuming the
time-line going forward, it's in the future, somewhere.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #