varchar() vs char16 performance

Started by Thomas G. Lockhartalmost 28 years ago24 messages
#1Thomas G. Lockhart
lockhart@alumni.caltech.edu

I ran some timing tests to check the performance of varchar() vs char16.
The results of the test indicate that there is no difference in
performance (within the timing scatter of the tests):

char16 vc(16)
0.99s 1.05s 1 row (this measures startup time, not types)
39.29s 39.28s ~65000 rows

The char2,4,8,16 types seem to have no value-added over the
better-supported char(), varchar(), text types; I am considering
removing them from the backend, and instead have the parser
transparently translate the types into varchar() (or char() - I'm not
certain which is a better match for the types) for v6.4. Applications
would not have to be changed.

Comments?

- Tom

The test is included below:

-- create table c16 (c char16);
create table c16 (c varchar(16));

copy c16 from 'c16.copy';

select count(*) from c16 where c = 'hi there';

select count(*) from c16 where c = 'test string';

select count(*) from c16 where c != 'hi there';

select count(*) from c16 where c != 'test string';

delete from c16;

drop table c16;

#2Noname
darrenk@insightdist.com
In reply to: Thomas G. Lockhart (#1)
Re: [HACKERS] varchar() vs char16 performance

The char2,4,8,16 types seem to have no value-added over the
better-supported char(), varchar(), text types; I am considering
removing them from the backend, and instead have the parser
transparently translate the types into varchar() (or char() - I'm not
certain which is a better match for the types) for v6.4. Applications
would not have to be changed.

Comments?

Wouldn't bother me. I've got this mega-patch sitting here that would
remove them and put them into a loadable module under contrib. Just
have to tidy up the section that creates the index ops.

Will there be a warning about using a "depreciated type" in 6.4 or are
we going to have this gunking up the grammer forever? :)

darrenk

#3Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#1)
Re: [HACKERS] varchar() vs char16 performance

I ran some timing tests to check the performance of varchar() vs char16.
The results of the test indicate that there is no difference in
performance (within the timing scatter of the tests):

char16 vc(16)
0.99s 1.05s 1 row (this measures startup time, not types)
39.29s 39.28s ~65000 rows

The char2,4,8,16 types seem to have no value-added over the
better-supported char(), varchar(), text types; I am considering
removing them from the backend, and instead have the parser
transparently translate the types into varchar() (or char() - I'm not
certain which is a better match for the types) for v6.4. Applications
would not have to be changed.

Fine, remove them. You may want to keep 'char' because of the reduced
overhead compared to char(1), but the others certainly can be removed.
Seems like you have not mentioned char, so you may no intension of
removing it.

I would map to char(). They are fixed size, and the old types were
fixed size too. I know char16 may be better for varchar(), but that is
not as clean a translation from the old type.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#4Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Noname (#2)
Re: [HACKERS] varchar() vs char16 performance

Will there be a warning about using a "depreciated type" in 6.4 or are
we going to have this gunking up the grammer forever? :)

Good idea. Then we can pull it out of the grammar sometime later. Now,
if these types are in a loadable module, then we can't actually do
anything in the parser anyway, since the loadable module would never
work. Are these character types worth keeping at all? Less support and
no performance benefit leaves me thinking not...

- Tom

#5Zeugswetter Andreas
andreas.zeugswetter@telecom.at
In reply to: Bruce Momjian (#3)
AW: [HACKERS] varchar() vs char16 performance

I ran some timing tests to check the performance of varchar() vs char16.
The results of the test indicate that there is no difference in
performance (within the timing scatter of the tests):

char16 vc(16)
0.99s 1.05s 1 row (this measures startup time, not types)
39.29s 39.28s ~65000 rows

The char2,4,8,16 types seem to have no value-added over the
better-supported char(), varchar(), text types; I am considering
removing them from the backend, and instead have the parser
transparently translate the types into varchar() (or char() - I'm not
certain which is a better match for the types) for v6.4. Applications
would not have to be changed.

Comments?

From the logic char(16) would be the correct replacement.

I also support the idea to remove char-char16. The problems I see are:
1. char2-16 does not have the 4 byte var header (this would be good for the char() type too)
2. per definition 'a ' = 'a' for the char type but 'a ' <> 'a' for varchar

Note the correct behavior of postgresql:
test=> select 'a '::char(4) = 'a'::char(4);
?column?
--------
t
(1 row)

test=> select 'a '::varchar(4) = 'a'::varchar(4);
?column?
--------
f
(1 row)

but note the "incorrect" (char-char16 beeing a fixed length type) behavior of the char-char16 code
test=> select 'a '::char4 = 'a'::char4;
?column?
--------
f
(1 row)

so the current char2-16 code behaves more like varchar. I don't know if anybody does
rely on this behavior. If not, I would vote to change the char-char16 with the char()
type and remove the varhdr from char(). I like getting a simple char * into my C function.
(see point 1)

Andreas

#6Noname
darrenk@insightdist.com
In reply to: Zeugswetter Andreas (#5)
Re: [HACKERS] varchar() vs char16 performance

Will there be a warning about using a "depreciated type" in 6.4 or are
we going to have this gunking up the grammer forever? :)

Good idea. Then we can pull it out of the grammar sometime later. Now,
if these types are in a loadable module, then we can't actually do
anything in the parser anyway, since the loadable module would never
work. Are these character types worth keeping at all? Less support and
no performance benefit leaves me thinking not...

IMHO, not worth keeping if the performance benefit is gone and the only
real benefit though was the few bytes of header space they saved per field.

They should work as loadable if removed from the parser since they would
be created thru a series of CREATE TYPE, FUNCTION and OPERATOR statements.
Should go thru the parser as just some user-created t[r]ype.

But any benefit from including them in the distribution as loadable would
be negligible. Could be, but no real point, so I vote to depreciate them.

darrenk

#7Hal Snyder
hal@vailsys.com
In reply to: Thomas G. Lockhart (#1)
Re: [HACKERS] varchar() vs char16 performance

Date: Wed, 11 Mar 1998 14:39:23 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>

...

The char2,4,8,16 types seem to have no value-added over the
better-supported char(), varchar(), text types; I am considering
removing them from the backend, and instead have the parser
transparently translate the types into varchar() (or char() - I'm not
certain which is a better match for the types) for v6.4. Applications
would not have to be changed.

Comments?

I'm not up on the details of PostgreSQL's differing character types,
but wonder - would the proposed change break any apps where trailing
(or leading?) whitespace is significant? Not that I'm running any
...

#8Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Zeugswetter Andreas (#5)
Re: AW: [HACKERS] varchar() vs char16 performance

so the current char2-16 code behaves more like varchar. I don't know if anybody does
rely on this behavior. If not, I would vote to change the char-char16 with the char()
type and remove the varhdr from char(). I like getting a simple char * into my C function.
(see point 1)

Removing the header from char() types is possible now that we have
atttypmod, but I doubt atttypmod is available in all places that the
length of the type is needed. varlena is supported all over the place
in the backend.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#9Brett McCormick
brett@work.chicken.org
In reply to: Hal Snyder (#7)
Re: [HACKERS] varchar() vs char16 performance

n Wed, 11 March 1998, at 11:15:34, Hal Snyder wrote:

The char2,4,8,16 types seem to have no value-added over the
better-supported char(), varchar(), text types; I am considering
removing them from the backend, and instead have the parser
transparently translate the types into varchar() (or char() - I'm not
certain which is a better match for the types) for v6.4. Applications
would not have to be changed.

Comments?

I'm not up on the details of PostgreSQL's differing character types,
but wonder - would the proposed change break any apps where trailing
(or leading?) whitespace is significant? Not that I'm running any
...

Heh.. migrating to 6.3. was a surprise for me.. I certainly wasn't
expecting whitespace pads, and there are some cases where it makes a
big difference!

#10Andrew Martin
martin@biochemistry.ucl.ac.uk
In reply to: Brett McCormick (#9)
Re: [HACKERS] varchar() vs char16 performance

Will there be a warning about using a "depreciated type" in 6.4 or are
we going to have this gunking up the grammer forever? :)

Good idea. Then we can pull it out of the grammar sometime later. Now,
if these types are in a loadable module, then we can't actually do
anything in the parser anyway, since the loadable module would never
work. Are these character types worth keeping at all? Less support and
no performance benefit leaves me thinking not...

IMHO, not worth keeping if the performance benefit is gone and the only
real benefit though was the few bytes of header space they saved per field.

If char2 et al are going completely from 6.4, I think it would be sensible
for pg_dump to filter these types and change them to char(2) et al when
it writes the CREATE statments.

Best wishes,

Andrew

----------------------------------------------------------------------------
Dr. Andrew C.R. Martin University College London
EMAIL: (Work) martin@biochem.ucl.ac.uk (Home) andrew@stagleys.demon.co.uk
URL: http://www.biochem.ucl.ac.uk/~martin
Tel: (Work) +44(0)171 419 3890 (Home) +44(0)1372 275775

#11Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Andrew Martin (#10)
Re: [HACKERS] varchar() vs char16 performance

If char2 et al are going completely from 6.4, I think it would be sensible
for pg_dump to filter these types and change them to char(2) et al when
it writes the CREATE statments.

That is an interesting idea, but what about applications that use them?
I think we can have the parser change them for one release, then drop
them completely.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#12Andrew Martin
martin@biochemistry.ucl.ac.uk
In reply to: Bruce Momjian (#11)
Re: [HACKERS] varchar() vs char16 performance

If char2 et al are going completely from 6.4, I think it would be sensible
for pg_dump to filter these types and change them to char(2) et al when
it writes the CREATE statments.

That is an interesting idea, but what about applications that use them?
I think we can have the parser change them for one release, then drop
them completely.

Not just an interesting idea, but essential!

There are two separate issues here:
1. Migrating data between PostgreSQL versions
2. Migrating applications between PostgreSQL versions

My suggestion for modifying pg_dump addresses the first of these. Without
this fix, it will become impossible to migrate data without editing the
dump file. The problem of column names mentioned before is not so
serious as one can change the column name in the old version of the
database before dumping (providing you know this is necessary! A list
of depracated column names in the upgrade procedure would be useful,
or better pg_dump would tell you...). With changes of type, there is
no option but to edit the dump file. If that happens to be bigger than
your total VM you are stuffed! (I guess you could mess around with
head and tail to split the file up or write a Perl script to modify
the schemae, but we shouldn't be expecting users to do this...)

An alternative workaround would be for pg_dumpall to have an option
where it makes use of the pg_dump's ability to dump only the data
or only the schema. That way one could edit just the schemae which
will be small and not have to touch the actual data dumps. However,
this really isn't necessary if pg_dump does the filtering.

Issue 2 really isn't a problem. Providing there are adequate warnings
in the upgrade information, developers can quite simply alter the
applications they write which use CREATE statements. These warnings
should *stay in the upgrade information* for evermore... We must
not assume that people who are happy with a release they have of
PostgreSQL will upgrade everytime a new version comes out.

Andrew

----------------------------------------------------------------------------
Dr. Andrew C.R. Martin University College London
EMAIL: (Work) martin@biochem.ucl.ac.uk (Home) andrew@stagleys.demon.co.uk
URL: http://www.biochem.ucl.ac.uk/~martin
Tel: (Work) +44(0)171 419 3890 (Home) +44(0)1372 275775

#13Noname
t-ishii@sra.co.jp
In reply to: Andrew Martin (#12)
Re: [HACKERS] varchar() vs char16 performance

I ran some timing tests to check the performance of varchar() vs char16.
The results of the test indicate that there is no difference in
performance (within the timing scatter of the tests):

char16 vc(16)
0.99s 1.05s 1 row (this measures startup time, not types)
39.29s 39.28s ~65000 rows

The char2,4,8,16 types seem to have no value-added over the
better-supported char(), varchar(), text types; I am considering
removing them from the backend, and instead have the parser
transparently translate the types into varchar() (or char() - I'm not
certain which is a better match for the types) for v6.4. Applications
would not have to be changed.

Comments?

Please do not remove char2! Some users uses it for making an array of
char.

create table c(c char2[]);

Seems strange? Yes. Actually what he wanted to do was:

test=> create table c(c char[]);
ERROR: parser: parse error at or near "["
--
Tatsuo Ishii
t-ishii@sra.co.jp

#14Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Noname (#13)
Re: [HACKERS] varchar() vs char16 performance

I ran some timing tests to check the performance of varchar() vs char16.
The results of the test indicate that there is no difference in
performance (within the timing scatter of the tests):

char16 vc(16)
0.99s 1.05s 1 row (this measures startup time, not types)
39.29s 39.28s ~65000 rows

The char2,4,8,16 types seem to have no value-added over the
better-supported char(), varchar(), text types; I am considering
removing them from the backend, and instead have the parser
transparently translate the types into varchar() (or char() - I'm not
certain which is a better match for the types) for v6.4. Applications
would not have to be changed.

Comments?

Please do not remove char2! Some users uses it for making an array of
char.

create table c(c char2[]);

Seems strange? Yes. Actually what he wanted to do was:

test=> create table c(c char[]);
ERROR: parser: parse error at or near "["

Maybe we just need to fix char[].

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#15Brett McCormick
brett@work.chicken.org
In reply to: Noname (#13)
Re: [HACKERS] varchar() vs char16 performance

hmm.. well until the grammar gets fixed, create table c(c _char)
should work, as _typename is the typename from an array of that type.
Although I don't see what advantages a character array has over text?

On Mon, 16 March 1998, at 15:20:36, t-ishii@sra.co.jp wrote:

Show quoted text

Please do not remove char2! Some users uses it for making an array of
char.

create table c(c char2[]);

Seems strange? Yes. Actually what he wanted to do was:

test=> create table c(c char[]);
ERROR: parser: parse error at or near "["
--
Tatsuo Ishii
t-ishii@sra.co.jp

#16Noname
t-ishii@sra.co.jp
In reply to: Brett McCormick (#15)
Re: [HACKERS] varchar() vs char16 performance

hmm.. well until the grammar gets fixed, create table c(c _char)
should work, as _typename is the typename from an array of that type.
Although I don't see what advantages a character array has over text?

Good point. I asked him the same question. He needed a chararcter
vector in that each char represents an individual value. Rather than
using substr(), an array might be more intuitive, he said.

On Mon, 16 March 1998, at 15:20:36, t-ishii@sra.co.jp wrote:

Please do not remove char2! Some users uses it for making an array of
char.

create table c(c char2[]);

Seems strange? Yes. Actually what he wanted to do was:

test=> create table c(c char[]);
ERROR: parser: parse error at or near "["

--
Tatsuo Ishii
t-ishii@sra.co.jp

#17Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#14)
Re: [HACKERS] varchar() vs char16 performance

The char2,4,8,16 types seem to have no value-added over the
better-supported char(), varchar(), text types; I am considering
removing them from the backend, and instead have the parser
transparently translate the types into varchar() or char()

Maybe we just need to fix char[].

There are notes in the source code from Jolly wondering whether arrays
of char/varchar would work, and specifically disallowing it "for now".

imho, even though there is one user forcing arrays of single characters
by invoking char2, rather than using text and substrings, that is not
sufficient to keep this obsolete capability in the backend forever.

There are two ways to smoothly transition to a system in which char2-16
is not a native built-in type:

1) modify the parser to automatically translate char2-16 into
char(2-16). This will not allow arrays of char2.

2) put char2-16 into a user-loadable module, which could be configured
into the template1 database if the installer chooses. This would retain
all current char2-16 capabilities.

These options are mutually exclusive, since implementing (1) would mean
the parser would not allow user-defined types for (2).

I had thought that char2-16 add _no_ functionality over the char() and
varchar() types; Tatsuo points out at least one capability which they
have. Are there any others?

- Tom

#18Noname
t-ishii@sra.co.jp
In reply to: Thomas G. Lockhart (#17)
Re: [HACKERS] varchar() vs char16 performance

imho, even though there is one user forcing arrays of single characters
by invoking char2, rather than using text and substrings, that is not
sufficient to keep this obsolete capability in the backend forever.

I agree with you. With the suggestion from Brett, the user now have
better solution than using char2, I think.
--
Tatsuo Ishii
t-ishii@sra.co.jp

#19Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Zeugswetter Andreas (#21)
Re: AW: [HACKERS] varchar() vs char16 performance

I had thought that char2-16 add _no_ functionality over the char() and
varchar() types; Tatsuo points out at least one capability which they
have. Are there any others?

They give and take a char * pointer to a C function like
create function upper(char16)
returning char16 as '/u/my/upper.so' language 'sql';
whereas char() gives a varlena pointer.

I don't really see this as a big deal since, for example, only 16 bytes
are allocated for a char16, so it is not guaranteed to be zero delimited
and you have to make a working copy to use libc functions anyway. Also,
that is really an implementation detail or annoyance rather than a
user-visible feature.

With the macros that are provided for the varlena structure
manipulations, things are pretty convenient. Are there more macros which
could be helpful here??

- Tom

#20Noname
darrenk@insightdist.com
In reply to: Zeugswetter Andreas (#21)
Re: AW: [HACKERS] varchar() vs char16 performance

I had thought that char2-16 add _no_ functionality over the char() and
varchar() types; Tatsuo points out at least one capability which they
have. Are there any others?

They give and take a char * pointer to a C function like
create function upper(char16)
returning char16 as '/u/my/upper.so' language 'sql';
whereas char() gives a varlena pointer.

The char[248] types rip out just fine.

But that char16 is a whole new beast. It's tentacles are everywhere...

From comments in various files, the char16 was the original name type

and was then replaced with 'name'. But there are still a few places
of inconsistency in the code, namely (*bad pun*) in the cache code.

There is this eqproc array in catcache.c that is a hack that has to
match the oids of the types from oid 16 to 30, except that F_CHAR16EQ
is still where F_NAMEEQ should be. Tried renaming it last night, but
initdb would blowup the first insert, so there is some other effect in
the caching code.

Still other incomplete conversions. In pg_proc.h the arg types for
char16eq, lt, le, gt, ge & ne are names (oid 19) when they should be
char16 (oid 20)! But char16eq is correctly defined to take char16
in pg_operator.h.

I'll work on this some more tonite.

darrenk

#21Zeugswetter Andreas
andreas.zeugswetter@telecom.at
In reply to: Noname (#18)
AW: [HACKERS] varchar() vs char16 performance

I had thought that char2-16 add _no_ functionality over the char() and
varchar() types; Tatsuo points out at least one capability which they
have. Are there any others?

They give and take a char * pointer to a C function like
create function upper(char16)
returning char16 as '/u/my/upper.so' language 'sql';
whereas char() gives a varlena pointer.

Andreas

#22Zeugswetter Andreas
andreas.zeugswetter@telecom.at
In reply to: Zeugswetter Andreas (#21)
Re: [HACKERS] varchar() vs char16 performance

I had thought that char2-16 add _no_ functionality over the char() and
varchar() types; Tatsuo points out at least one capability which they
have. Are there any others?

They give and take a char * pointer to a C function like
create function upper(char16)
returning char16 as '/u/my/upper.so' language 'sql';
whereas char() gives a varlena pointer.

I don't really see this as a big deal since, for example, only 16 bytes
are allocated for a char16, so it is not guaranteed to be zero delimited
and you have to make a working copy to use libc functions anyway. Also,
that is really an implementation detail or annoyance rather than a
user-visible feature.

I thought almost all postgresql users write their C extensions ;-)
it is ***the*** feature of postgresql. I am not saying that this is a real problem.
It is just one thing that will be different, but ok for me :-)

Show quoted text

With the macros that are provided for the varlena structure
manipulations, things are pretty convenient. Are there more macros which
could be helpful here??

#23Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#19)
Re: AW: [HACKERS] varchar() vs char16 performance

I don't really see this as a big deal since, for example, only 16 bytes
are allocated for a char16, so it is not guaranteed to be zero delimited
and you have to make a working copy to use libc functions anyway. Also,
that is really an implementation detail or annoyance rather than a
user-visible feature.

Agreed.

With the macros that are provided for the varlena structure
manipulations, things are pretty convenient. Are there more macros which
could be helpful here??

From postgres.h:

#define VARSIZE(PTR) (((struct varlena *)(PTR))->vl_len)
#define VARDATA(PTR) (((struct varlena *)(PTR))->vl_dat)
#define VARHDRSZ sizeof(int32)

I have gone throught the code before 6.3, and changed many hard-coded
4's to use these macros.
-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#24Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Noname (#20)
Re: AW: [HACKERS] varchar() vs char16 performance

I had thought that char2-16 add _no_ functionality over the char() and
varchar() types; Tatsuo points out at least one capability which they
have. Are there any others?

They give and take a char * pointer to a C function like
create function upper(char16)
returning char16 as '/u/my/upper.so' language 'sql';
whereas char() gives a varlena pointer.

The char[248] types rip out just fine.

But that char16 is a whole new beast. It's tentacles are everywhere...

From comments in various files, the char16 was the original name type

and was then replaced with 'name'. But there are still a few places
of inconsistency in the code, namely (*bad pun*) in the cache code.

Yes, you are correct. That was the original name length.

I thought I fixed all the cache name16 references before 6.3. That is
why we can now have index names over 16 characters. Can you confirm
this is still a problem in 6.3.

There is this eqproc array in catcache.c that is a hack that has to
match the oids of the types from oid 16 to 30, except that F_CHAR16EQ
is still where F_NAMEEQ should be. Tried renaming it last night, but
initdb would blowup the first insert, so there is some other effect in
the caching code.

Still other incomplete conversions. In pg_proc.h the arg types for
char16eq, lt, le, gt, ge & ne are names (oid 19) when they should be
char16 (oid 20)! But char16eq is correctly defined to take char16
in pg_operator.h.

I'll work on this some more tonite.

Great.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)