Selecting a constant question

Started by Dann Corbitalmost 19 years ago59 messageshackers
Jump to latest
#1Dann Corbit
DCorbit@connx.com

SELECT 1 FROM test.dbo.a_003

gets about 60,000 records per second

SELECT '1' FROM test.dbo.a_003

gets about 600 records per second.

The cause is that postgres describes the return column as "unknown"
length 65534 in the 2nd case.

Since the value is a constant, it seems rather odd to make the length
65534 characters. Why not make it char(1) or some other appropriate and
less costly data type? After all, it isn't going to grow during the
query.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dann Corbit (#1)
Re: Selecting a constant question

"Dann Corbit" <DCorbit@connx.com> writes:

SELECT 1 FROM test.dbo.a_003
gets about 60,000 records per second
SELECT '1' FROM test.dbo.a_003
gets about 600 records per second.

The cause is that postgres describes the return column as "unknown"
length 65534 in the 2nd case.

Postgres describes it in no such fashion --- unknown will always have a
typmod of -1 which means "unspecified". Possibly you have some client
code that knows much less than it thinks it does about the meanings of
typmod values?

The actual volume of data transmitted is going to be just about the same
either way, so I'm not sure you've diagnosed the cause of slowdown
correctly. Trying the example in psql seems to be about the same speed
both ways, with if anything a slight advantage to select '1'.

regards, tom lane

#3Bruce Momjian
bruce@momjian.us
In reply to: Dann Corbit (#1)
Re: Selecting a constant question

"Dann Corbit" <DCorbit@connx.com> writes:

SELECT 1 FROM test.dbo.a_003

gets about 60,000 records per second

SELECT '1' FROM test.dbo.a_003

gets about 600 records per second.

The cause is that postgres describes the return column as "unknown"
length 65534 in the 2nd case.

Wait, back up. How does this cause it to go slower?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#4Dann Corbit
DCorbit@connx.com
In reply to: Bruce Momjian (#3)
Re: Selecting a constant question

-----Original Message-----
From: Gregory Stark [mailto:stark@enterprisedb.com]
Sent: Monday, June 11, 2007 12:48 PM
To: Dann Corbit
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Selecting a constant question

"Dann Corbit" <DCorbit@connx.com> writes:

SELECT 1 FROM test.dbo.a_003

gets about 60,000 records per second

SELECT '1' FROM test.dbo.a_003

gets about 600 records per second.

The cause is that postgres describes the return column as "unknown"
length 65534 in the 2nd case.

Wait, back up. How does this cause it to go slower?

The issue is this:

Postgres describes the column with a typmod of -1 (unknown) and a length
of 65534.

This means that any client application must reserve 65534 bytes of
spaces for every row of data (like a grid control for example), which
postgres should know (and report) that the maximum length of the column
is 1.

This is not a PSQL issue, it's an issue with other products relying on
the accuracy of the reported postgres metadata for a given SQL
statement.

#5Martijn van Oosterhout
kleptog@svana.org
In reply to: Dann Corbit (#4)
Re: Selecting a constant question

On Mon, Jun 11, 2007 at 12:55:55PM -0700, Dann Corbit wrote:

The issue is this:

Postgres describes the column with a typmod of -1 (unknown) and a length
of 65534.

Postgres does no such thing. How can it possibly know the maximum size
of a column before executing the query?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dann Corbit (#4)
Re: Selecting a constant question

"Dann Corbit" <DCorbit@connx.com> writes:

The issue is this:
Postgres describes the column with a typmod of -1 (unknown) and a length
of 65534.

Oh, you're looking at typlen not typmod. Please observe the comments in
pg_type.h:

/*
* For a fixed-size type, typlen is the number of bytes we use to
* represent a value of this type, e.g. 4 for an int4. But for a
* variable-length type, typlen is negative. We use -1 to indicate a
* "varlena" type (one that has a length word), -2 to indicate a
* null-terminated C string.
*/
int2 typlen;

You should be treating typlen as signed not unsigned, and not assuming a
fixed width for any negative value.

Since the width refers to the server internal representation, and not to
what comes down the wire, I find it pretty strange for an application to
be using typlen for anything at all actually.

regards, tom lane

#7Dann Corbit
DCorbit@connx.com
In reply to: Tom Lane (#6)
Re: Selecting a constant question

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, June 11, 2007 1:32 PM
To: Dann Corbit
Cc: Gregory Stark; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Selecting a constant question

"Dann Corbit" <DCorbit@connx.com> writes:

The issue is this:
Postgres describes the column with a typmod of -1 (unknown) and a

length

of 65534.

Oh, you're looking at typlen not typmod. Please observe the comments

in

pg_type.h:

/*
* For a fixed-size type, typlen is the number of bytes we use

to

* represent a value of this type, e.g. 4 for an int4. But for
a
* variable-length type, typlen is negative. We use -1 to

indicate

a
* "varlena" type (one that has a length word), -2 to indicate a
* null-terminated C string.
*/
int2 typlen;

You should be treating typlen as signed not unsigned, and not assuming

a

fixed width for any negative value.

Since the width refers to the server internal representation, and not

to

what comes down the wire, I find it pretty strange for an application

to

be using typlen for anything at all actually.

Thanks for the response.

Since libpq function PQfsize returns -2 for all constant character
strings in SQL statements ... What is the proper procedure to determine
the length of a constant character column after query execution but
before fetching the first row of data?

#8Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#2)
Re: Selecting a constant question

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

Trying the example in psql seems to be about the same speed both ways, with
if anything a slight advantage to select '1'.

Fwiw I see a slight advantage for '1' as well. I wonder why.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#9Dann Corbit
DCorbit@connx.com
In reply to: Dann Corbit (#1)
Re: Selecting a constant question

-----Original Message-----
From: Martijn van Oosterhout [mailto:kleptog@svana.org]
Sent: Monday, June 11, 2007 1:46 PM
To: Dann Corbit
Subject: Re: [HACKERS] Selecting a constant question

On Mon, Jun 11, 2007 at 01:29:37PM -0700, Dann Corbit wrote:

Our application is using the libPQ interface to access postgres.

The query is "select '123' from <tablename>" .. the table is not
important.

After executing the query, we interrogate the metadata of the result

set

using the PQfsize, PQfmod and PQftype functions.

Did you read the documentation of the PQfsize function?

PQfsize returns the space allocated for this column in a database row,
in other words the size of the server's internal representation of the
data type. (Accordingly, it is not really very useful to clients.) A
negative value indicates the data type is variable-length.

http://www.postgresql.org/docs/8.2/interactive/libpq-exec.html#LIBPQ-EXE
C-

SELECT-INFO

The size of the column is returned as 65534 (or -2 if you consider

this

a signed short value)

It's variable length, you can't say anything more.

So what you are saying is that the constant '1' is variable length, and
there is no way to find out the maximum length from the database.

#10Dann Corbit
DCorbit@connx.com
In reply to: Dann Corbit (#9)
Re: Selecting a constant question

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
owner@postgresql.org] On Behalf Of Dann Corbit
Sent: Monday, June 11, 2007 1:52 PM
To: Martijn van Oosterhout
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Selecting a constant question

-----Original Message-----
From: Martijn van Oosterhout [mailto:kleptog@svana.org]
Sent: Monday, June 11, 2007 1:46 PM
To: Dann Corbit
Subject: Re: [HACKERS] Selecting a constant question

On Mon, Jun 11, 2007 at 01:29:37PM -0700, Dann Corbit wrote:

Our application is using the libPQ interface to access postgres.

The query is "select '123' from <tablename>" .. the table is not
important.

After executing the query, we interrogate the metadata of the

result

set

using the PQfsize, PQfmod and PQftype functions.

Did you read the documentation of the PQfsize function?

PQfsize returns the space allocated for this column in a database

row,

in other words the size of the server's internal representation of

the

data type. (Accordingly, it is not really very useful to clients.) A
negative value indicates the data type is variable-length.

http://www.postgresql.org/docs/8.2/interactive/libpq-exec.html#LIBPQ-EXE

C-

SELECT-INFO

The size of the column is returned as 65534 (or -2 if you

consider

this

a signed short value)

It's variable length, you can't say anything more.

So what you are saying is that the constant '1' is variable length,

and

there is no way to find out the maximum length from the database.

I have a PostgreSQL feature request:

Report the maximum size of a variable length string from the server.

Surely, we cannot be the only people who will need this information. If
(for example) someone wants to bind to a grid, then the maximum size has
to be known in advance.

#11Bruce Momjian
bruce@momjian.us
In reply to: Dann Corbit (#10)
Re: Selecting a constant question

"Dann Corbit" <DCorbit@connx.com> writes:

Surely, we cannot be the only people who will need this information. If
(for example) someone wants to bind to a grid, then the maximum size has
to be known in advance.

In fact psql needs it and implements this. It has to skim through the entire
result set to calculate the column widths. It's quite a lot of work but the
server is in no better position to do it than psql.

On the contrary the server is missing quite a bit of information of how you
intend to display the information. Do you need the number of bytes or
characters? Are all the characters the same width in your display system? What
about currency symbols? Do you intend to reverse any quoting or just display
backslashes?

Even knowing how many characters and assuming fixed character widths that
wouldn't even be enough to set your grid control widths. Usually people like
numeric quantities decimal aligned and so two records "1.00" and "0.01" will
take much more width than two records with "1.00" and "2.00".

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#12Dann Corbit
DCorbit@connx.com
In reply to: Bruce Momjian (#11)
Re: Selecting a constant question

-----Original Message-----
From: Gregory Stark [mailto:stark@enterprisedb.com]
Sent: Monday, June 11, 2007 2:41 PM
To: Dann Corbit
Cc: Martijn van Oosterhout; pgsql-hackers@postgresql.org; Larry McGhaw
Subject: Re: [HACKERS] Selecting a constant question

"Dann Corbit" <DCorbit@connx.com> writes:

Surely, we cannot be the only people who will need this information.

If

(for example) someone wants to bind to a grid, then the maximum size

has

to be known in advance.

In fact psql needs it and implements this. It has to skim through the
entire
result set to calculate the column widths. It's quite a lot of work

but

the
server is in no better position to do it than psql.

Reading the data twice sounds a little painful. What if there are 30
million rows?

On the contrary the server is missing quite a bit of information of

how

you
intend to display the information. Do you need the number of bytes or
characters? Are all the characters the same width in your display

system?

What
about currency symbols? Do you intend to reverse any quoting or just
display
backslashes?

Giving me the information about the data type will be enough. As an
example, in this case we have varchar data. If the server should be so
kind as to report varchar(1) for '1' or varchar(3) for '123' then I
would not have any difficulty binding the data to a grid.

Even knowing how many characters and assuming fixed character widths

that

wouldn't even be enough to set your grid control widths. Usually

people

like
numeric quantities decimal aligned and so two records "1.00" and

"0.01"

will
take much more width than two records with "1.00" and "2.00".

SQL*Server, Oracle, Ingres, DB/2 and other database systems somehow
manage to do it, so I guess it is not technically intractable.

I suspect that your own ODBC/JDBC and other drivers suffer from this
same effect.

Now, I do recognize that sometimes nobody is going to know how big
something is, including the server. But with a query using a constant
it seems like it ought to be well defined to me. Perhaps the
difficulties are escaping me because I am not familiar with the low
level guts of this problem. But I suspect that lots of people besides
me would benefit if sizes of things were known when it is possible to
know them.

As I said before, I see that it cannot be known right now. So I am
putting it in as a feature request.

If you could be so kind as to point out the right spot to look in the
server code, I imagine we could fix it and check in the patch ourselves.

#13Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dann Corbit (#12)
Re: Selecting a constant question

Dann Corbit wrote:

"Dann Corbit" <DCorbit@connx.com> writes:

In fact psql needs it and implements this. It has to skim through the
entire
result set to calculate the column widths. It's quite a lot of work

but

the
server is in no better position to do it than psql.

Reading the data twice sounds a little painful. What if there are 30
million rows?

You get an "out of memory" error.

On the contrary the server is missing quite a bit of information of
how you intend to display the information. Do you need the number of
bytes or characters? Are all the characters the same width in your
display system? What about currency symbols? Do you intend to
reverse any quoting or just display backslashes?

Giving me the information about the data type will be enough. As an
example, in this case we have varchar data. If the server should be so
kind as to report varchar(1) for '1' or varchar(3) for '123' then I
would not have any difficulty binding the data to a grid.

Oh, you have the length information for each datum all right. It's on
the first four bytes of it.

--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
"The West won the world not by the superiority of its ideas or values
or religion but rather by its superiority in applying organized violence.
Westerners often forget this fact, non-Westerners never do."
(Samuel P. Huntington)

#14Dann Corbit
DCorbit@connx.com
In reply to: Alvaro Herrera (#13)
Re: Selecting a constant question

-----Original Message-----
From: Alvaro Herrera [mailto:alvherre@commandprompt.com]
Sent: Monday, June 11, 2007 3:16 PM
To: Dann Corbit
Cc: Gregory Stark; Martijn van Oosterhout;

pgsql-hackers@postgresql.org;

Larry McGhaw
Subject: Re: [HACKERS] Selecting a constant question

Dann Corbit wrote:

"Dann Corbit" <DCorbit@connx.com> writes:

In fact psql needs it and implements this. It has to skim through

the

entire
result set to calculate the column widths. It's quite a lot of

work

but

the
server is in no better position to do it than psql.

Reading the data twice sounds a little painful. What if there are

30

million rows?

You get an "out of memory" error.

On the contrary the server is missing quite a bit of information

of

how you intend to display the information. Do you need the number

of

bytes or characters? Are all the characters the same width in your
display system? What about currency symbols? Do you intend to
reverse any quoting or just display backslashes?

Giving me the information about the data type will be enough. As an
example, in this case we have varchar data. If the server should be

so

kind as to report varchar(1) for '1' or varchar(3) for '123' then I
would not have any difficulty binding the data to a grid.

Oh, you have the length information for each datum all right. It's on
the first four bytes of it.

Sure, but when I bind to a grid, I need to know a-priori how big the
biggest returned instance can be. Reading the entire data set twice to
learn the size of a constant seems rather conceptually odd to me.

#15Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dann Corbit (#14)
Re: Selecting a constant question

Dann Corbit wrote:

Oh, you have the length information for each datum all right. It's on
the first four bytes of it.

Sure, but when I bind to a grid, I need to know a-priori how big the
biggest returned instance can be. Reading the entire data set twice to
learn the size of a constant seems rather conceptually odd to me.

Did you read up on typmod already? I think that's part of the info sent
down in the query response.

--
Alvaro Herrera http://www.flickr.com/photos/alvherre/
"No single strategy is always right (Unless the boss says so)"
(Larry Wall)

#16Martijn van Oosterhout
kleptog@svana.org
In reply to: Dann Corbit (#14)
Re: Selecting a constant question

On Mon, Jun 11, 2007 at 03:18:33PM -0700, Dann Corbit wrote:

Sure, but when I bind to a grid, I need to know a-priori how big the
biggest returned instance can be. Reading the entire data set twice to
learn the size of a constant seems rather conceptually odd to me.

To be honest, the concept that a widget requires a constant that can't
be changed later is also a bit odd. There are many times you won't know
beforehand how big the data is, surely the framework should be smart
enough to handle these cases?

Start the width at 100, if it turns out to be too small, make it
bigger...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

#17Andrew Dunstan
andrew@dunslane.net
In reply to: Dann Corbit (#10)
Re: Selecting a constant question

Dann Corbit wrote:

I have a PostgreSQL feature request:

Report the maximum size of a variable length string from the server.

Surely, we cannot be the only people who will need this information. If
(for example) someone wants to bind to a grid, then the maximum size has
to be known in advance.

Does PQfmod not tell you what you need if the field is varchar(n) ?

cheers

andrew

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dann Corbit (#12)
Re: Selecting a constant question

"Dann Corbit" <DCorbit@connx.com> writes:

Giving me the information about the data type will be enough. As an
example, in this case we have varchar data. If the server should be so
kind as to report varchar(1) for '1' or varchar(3) for '123' then I
would not have any difficulty binding the data to a grid.

This seems merest fantasy. Reflect on multibyte character sets for a
bit --- even if it's known that the column is varchar(3) there is no
guarantee that the value will fit in 3 bytes.

regards, tom lane

#19Dann Corbit
DCorbit@connx.com
In reply to: Tom Lane (#18)
Re: Selecting a constant question

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, June 11, 2007 3:35 PM
To: Dann Corbit
Cc: Gregory Stark; Martijn van Oosterhout;

pgsql-hackers@postgresql.org;

Larry McGhaw
Subject: Re: [HACKERS] Selecting a constant question

"Dann Corbit" <DCorbit@connx.com> writes:

Giving me the information about the data type will be enough. As an
example, in this case we have varchar data. If the server should be

so

kind as to report varchar(1) for '1' or varchar(3) for '123' then I
would not have any difficulty binding the data to a grid.

This seems merest fantasy. Reflect on multibyte character sets for a
bit --- even if it's known that the column is varchar(3) there is no
guarantee that the value will fit in 3 bytes.

If the server bound the data as UNICODE, then it will tell me
UNICODE(3). I know how big this will be.

In the worst case scenario it will fit in 3*4 = 12 bytes.

If the server is built without UNICODE enabled, then it will definitely
fit in 3 bytes.

#20Dann Corbit
DCorbit@connx.com
In reply to: Martijn van Oosterhout (#16)
Re: Selecting a constant question

-----Original Message-----
From: Martijn van Oosterhout [mailto:kleptog@svana.org]
Sent: Monday, June 11, 2007 3:29 PM
To: Dann Corbit
Cc: Alvaro Herrera; Gregory Stark; pgsql-hackers@postgresql.org; Larry
McGhaw
Subject: Re: [HACKERS] Selecting a constant question

On Mon, Jun 11, 2007 at 03:18:33PM -0700, Dann Corbit wrote:

Sure, but when I bind to a grid, I need to know a-priori how big the
biggest returned instance can be. Reading the entire data set twice

to

learn the size of a constant seems rather conceptually odd to me.

To be honest, the concept that a widget requires a constant that can't
be changed later is also a bit odd.

Not when the data itself is a constant that cannot be changed.

There are many times you won't know
beforehand how big the data is, surely the framework should be smart
enough to handle these cases?

If it were impossible to know the size of a string constant supplied in
the query, then I think I would agree with you here. However, it seems
to me that the maximum possible size of such a known, constant-width
string is not hard to determine.

Start the width at 100, if it turns out to be too small, make it
bigger...

If that were a good idea, then why report data sizes at all? Just let
it always be a surprise when it comes streaming down the pipe.

Honestly, I cannot fathom this answer.

#21Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dann Corbit (#19)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dann Corbit (#20)
#23Dann Corbit
DCorbit@connx.com
In reply to: Alvaro Herrera (#21)
#24Dann Corbit
DCorbit@connx.com
In reply to: Tom Lane (#22)
#25Larry McGhaw
lmcghaw@connx.com
In reply to: Alvaro Herrera (#21)
#26Larry McGhaw
lmcghaw@connx.com
In reply to: Bruce Momjian (#11)
#27Kris Jurka
books@ejurka.com
In reply to: Larry McGhaw (#26)
#28Dann Corbit
DCorbit@connx.com
In reply to: Kris Jurka (#27)
#29Andrew Dunstan
andrew@dunslane.net
In reply to: Larry McGhaw (#26)
#30Dann Corbit
DCorbit@connx.com
In reply to: Andrew Dunstan (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Larry McGhaw (#25)
#32Dann Corbit
DCorbit@connx.com
In reply to: Tom Lane (#31)
#33Hannu Krosing
hannu@tm.ee
In reply to: Dann Corbit (#7)
#34Larry McGhaw
lmcghaw@connx.com
In reply to: Bruce Momjian (#11)
#35Hannu Krosing
hannu@tm.ee
In reply to: Larry McGhaw (#34)
#36Dann Corbit
DCorbit@connx.com
In reply to: Hannu Krosing (#33)
#37Dann Corbit
DCorbit@connx.com
In reply to: Hannu Krosing (#35)
#38Hannu Krosing
hannu@tm.ee
In reply to: Dann Corbit (#37)
#39Larry McGhaw
lmcghaw@connx.com
In reply to: Bruce Momjian (#11)
#40Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Larry McGhaw (#34)
#41Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Heikki Linnakangas (#40)
#42Bruce Momjian
bruce@momjian.us
In reply to: Larry McGhaw (#26)
#43Florian Pflug
fgp@phlo.org
In reply to: Dann Corbit (#36)
#44Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Florian Pflug (#43)
#45Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Zeugswetter Andreas SB SD (#44)
#46Dave Page
dpage@pgadmin.org
In reply to: Heikki Linnakangas (#45)
#47Martijn van Oosterhout
kleptog@svana.org
In reply to: Zeugswetter Andreas SB SD (#44)
#48Andrew Dunstan
andrew@dunslane.net
In reply to: Larry McGhaw (#39)
#49Larry McGhaw
lmcghaw@connx.com
In reply to: Andrew Dunstan (#48)
#50Martijn van Oosterhout
kleptog@svana.org
In reply to: Larry McGhaw (#49)
#51Brian Hurt
bhurt@janestcapital.com
In reply to: Larry McGhaw (#49)
#52Larry McGhaw
lmcghaw@connx.com
In reply to: Brian Hurt (#51)
#53Andrew Dunstan
andrew@dunslane.net
In reply to: Larry McGhaw (#52)
#54Larry McGhaw
lmcghaw@connx.com
In reply to: Andrew Dunstan (#53)
#55Bruce Momjian
bruce@momjian.us
In reply to: Larry McGhaw (#52)
#56Larry McGhaw
lmcghaw@connx.com
In reply to: Martijn van Oosterhout (#50)
#57Dann Corbit
DCorbit@connx.com
In reply to: Larry McGhaw (#56)
#58Hannu Krosing
hannu@tm.ee
In reply to: Larry McGhaw (#56)
#59Joshua D. Drake
jd@commandprompt.com
In reply to: Hannu Krosing (#58)