Datatypes and performance
A couple of questions on data types and performance:
1. varchar vs varchar(n)
I created a database schema, and based on the test data I had to work with
I couldn't really determine the max size of a lot of the string fields, so I
just left them as plain varchar.
A comment from a coworker was that this would cause a performance problem.
Based on the docs, there's no performance difference between
using varchar(n) and text.
So will leaving my fields as unlimited varchar be a performance issue? Or
should I try to narrow them down?
My coworker has more experience with Oracle and MS-SQL than with PostgreSQL.
2. varchar and int
I was using a varchar for one field that's part of an index. Looking at the
data, I realized I could just use an int instead. My assumption was
that an int would be faster to search on than a varchar, so I converted
the field to int. Is this a valid assumption?
Thanks
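As a rough sketch of the conversion described in question 2 (all table, column, and index names here are hypothetical; PostgreSQL of that era had no ALTER TABLE ... ALTER COLUMN TYPE, hence the add/copy/drop/rename dance):

-- Hypothetical example: converting a numeric-looking varchar key
-- to int, so the index compares fixed-size integers instead of
-- variable-length strings.
ALTER TABLE orders ADD COLUMN customer_id_int integer;
UPDATE orders SET customer_id_int = CAST(customer_id AS integer);
-- (very old releases may need CAST(CAST(customer_id AS text) AS integer))
ALTER TABLE orders DROP COLUMN customer_id;
ALTER TABLE orders RENAME COLUMN customer_id_int TO customer_id;
CREATE INDEX orders_customer_id_idx ON orders (customer_id);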
On Fri, Jul 04, 2003 at 09:10:41AM -0700, Jay O'Connor wrote:
Based on the docs, there's no performance difference between
using varchar(n) and text.
So will leaving my fields as unlimited varchar be a performance issue? Or
should I try to narrow them down?
In fact, there is a performance penalty for limiting their length,
because the length has to be checked on each insert.
My coworker has more experience with Oracle and MS-SQL than with PostgreSQL.
You may want to tell your coworker to read the docs ;-)
A
--
----
Andrew Sullivan 204-4141 Yonge Street
Liberty RMS Toronto, Ontario Canada
<andrew@libertyrms.info> M2P 2A8
+1 416 646 3304 x110
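As a quick illustration of the length check Andrew mentions (the table is hypothetical; the exact error wording varies by version):

CREATE TABLE people (name varchar(10));
INSERT INTO people VALUES ('Christopher Walken');
-- ERROR:  value too long for type character varying(10)
-- Plain varchar and text accept the same row, with no length check run.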
How does Postgres internally store text fields? In a separate table?
On Fri, Jul 04, 2003 at 01:14:52PM -0700, Maksim Likharev wrote:
How does Postgres internally store text fields? In a separate table?
I believe it gets stored in a separate table only if it's too
long (read the docs on TOAST if you want more about this). But
normally, no. Here's what the docs have to say about it:
---cut here---
The storage requirement for data of these types is 4 bytes plus the
actual string, and in case of character plus the padding. Long
strings are compressed by the system automatically, so the physical
requirement on disk may be less. Long values are also stored in
background tables so they don't interfere with rapid access to the
shorter column values. In any case, the longest possible character
string that can be stored is about 1 GB. (The maximum value that will
be allowed for n in the data type declaration is less than that. It
wouldn't be very useful to change this because with multibyte
character encodings the number of characters and bytes can be quite
different anyway. If you desire to store long strings with no
specific upper limit, use text or character varying without a length
specifier, rather than making up an arbitrary length limit.)
---cut here---
A
--
----
Andrew Sullivan 204-4141 Yonge Street
Liberty RMS Toronto, Ontario Canada
<andrew@libertyrms.info> M2P 2A8
+1 416 646 3304 x110
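A query along these lines shows whether a given table has such a background ("TOAST") table (the table name is hypothetical):

-- reltoastrelid holds the OID of the table's TOAST table,
-- or 0 if the table doesn't have one.
SELECT reltoastrelid::regclass AS toast_table
FROM pg_class
WHERE relname = 'mytable';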
OK, what I see here tells me that text is slower than fixed-length varchar,
due to being stored in a separate table (but how else could you store long
fields?), so Postgres has to read another page (or pages) in order to get
the long value.
The story about boundary checks for varchar just doesn't count;
it's nothing compared with disk reads/writes.
That's "long values" that are stored elsewhere. I believe the length has to be
a good portion of a page size, normally 4KB, before it's considered for placing
in the toast table. I'm not sure if the details are in the documentation but
I'm sure they'll appear in a search of the archive. Of course, someone like
Tom, Bruce etc. while no doubt pop up with the specifics.
--
Nigel J. Andrews
On Fri, Jul 04, 2003 at 02:22:39PM -0700, Maksim Likharev wrote:
OK, what I see here tells me that text is slower than fixed-length varchar,
due to being stored in a separate table (but how else could you store long
fields?), so Postgres has to read another page (or pages) in order to get
the long value.
That's regardless of the datatype: a varchar longer than 2 KiB, IIRC, will
be stored in a separate table, just as a text longer than 2 KiB will.
There's no difference _at all_ between those two datatypes _except_ that the
former is checked for maximum length. If you store 256 chars in a TEXT
field, it will sit in the main table as if it were a varchar(256).
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Nunca confiar� en un traidor. Ni siquiera si el traidor lo he creado yo"
(Bar�n Vladimir Harkonnen)
OK, then if I store 4K minus 1 byte in a varchar, and after that 4K plus 1
byte, and again and again, you're telling me that my text will be jerking
around between two tables?
So 'no performance degradation/benefits for varchar vs text' should really
be read as 'varchar is as slow as text', or 'keep your varchar under 4K
if you want to read it fast'.
Pretty useful detail, thank you.
On Fri, Jul 04, 2003 at 06:57:04PM -0700, Maksim Likharev wrote:
OK, then if I store 4K minus 1 byte in a varchar, and after that 4K plus 1
byte, and again and again, you're telling me that my text will be jerking
around between two tables?
In any case the tuple will probably "jerk around" different pages (I
don't think it matters whether the pages are from two tables or from
the same one). I fail to understand how that is a different performance
problem than for any other datatype. I'm sure you've done some
benchmark to determine that the varchar handling is as slow as you
appear to be saying?
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Postgres is bloatware by design: it was built to house
PhD theses." (Joey Hellerstein, SIGMOD annual conference 2002)
On Fri, 4 Jul 2003, Maksim Likharev wrote:
OK, then if I store 4K minus 1 byte in a varchar, and after that 4K plus 1
byte, and again and again, you're telling me that my text will be jerking
around between two tables?
Updates in PostgreSQL are not done in place. An update just adds the new
data, and the old is kept around so that other concurrent transactions can
still run and use the old data. This gives a lot of speed when you have
many concurrent users.
VACUUM cleans out the old, unused data.
--
/Dennis
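In practice, Dennis's point translates into routine maintenance, something like (the table name is hypothetical):

-- Reclaim the space held by dead row versions and refresh
-- the planner's statistics in one pass.
VACUUM ANALYZE orders;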
There wasn't any sarcasm in my words;
as I said, I am really glad to see real numbers behind the story about
there being no performance difference between varchar and text.
I do not have performance metrics for varchars,
but I have benchmarks and my own observations about performance in general.
I just want to say that I haven't seen many DBMS systems other than
Microsoft SQL Server and Postgres 7.3.x, so my performance observations
are based on those systems.
Selects are no problem in general: equal to, or slightly slower
than, MS SQL.
Inserts/updates are sometimes slow, sometimes a nightmare;
in general, painfully slow.
I guess as long as Postgres retrieves data fast enough I'm satisfied,
but for dynamic DBs (lots of inserts/updates)...
P.S.
I do not want to offend anybody or anything; I'm just saying what I see.
On Fri, Jul 04, 2003 at 10:50:17PM -0700, Maksim Likharev wrote:
There wasn't any sarcasm in my words;
Well, I had really thought so. I'm not a native English speaker, so I
frequently misunderstand people.
but I have benchmarks and my own observations about performance in general.
I just want to say that I haven't seen many DBMS systems other than
Microsoft SQL Server and Postgres 7.3.x, so my performance observations
are based on those systems. Selects are no problem in general:
equal to, or slightly slower than, MS SQL.
Inserts/updates are sometimes slow, sometimes a nightmare;
in general, painfully slow.
This is probably unrelated to the technique I was describing (it's
called TOAST, by the way). Try another approach: for example, if
you have lots of inserts, wrap them in a transaction like
BEGIN
INSERT ...
INSERT ...
...
COMMIT
Or use a COPY statement. Both of those approaches are much faster than
using hundreds of standalone inserts.
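A sketch of the COPY approach (table and file names are hypothetical; the server backend must be able to read the file):

-- Load many rows in a single statement instead of one INSERT each.
COPY orders FROM '/tmp/orders.dat';
-- Or with an explicit field delimiter:
COPY orders FROM '/tmp/orders.csv' WITH DELIMITER ',';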
Be sure to tune your server: for example, bump up the shared_buffers
setting, VACUUM frequently, and ANALYZE whenever there's a significant
statistical change in your data. There are lots of documents on how to
do this (I'm sure somebody will post appropriate URLs for those).
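For example, in postgresql.conf (the value below is only a placeholder; the right number depends on available RAM and workload):

# shared_buffers is measured in 8 KB pages, so 4096 is about 32 MB;
# the default of that era was a very small 64.
shared_buffers = 4096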
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Por suerte hoy explot� el califont porque si no me habr�a muerto
de aburrido" (Papelucho)
First of all, disable FSYNC... that will speed things up a lot!
/M
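For reference, the setting in question lives in postgresql.conf, and its current value can be checked from psql; the follow-ups below explain why turning it off is risky:

# postgresql.conf
fsync = true      # flush WAL to disk at commit; 'false' trades
                  # crash safety for speed

SHOW fsync;       -- from psql: report the current setting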
On 7 Jul 2003 at 12:59, Mattias Kregert wrote:
First of all, disable FSYNC... that will speed things up a lot!
That shouldn't be suggested so casually. Though it can give a performance
boost, it degrades the reliability of your data: it can cause data
corruption in the case of a power or disk failure.
Bye
Shridhar
--
Why use Windows, since there is a door?(By fachat@galileo.rhein-neckar.de,
Andre Fachat)
In the case of a disk failure, the files will probably be damaged anyway, and then I'll have to install new hardware, or format and check bad blocks, and then restore from the backup. I can't see how fsync would help in the case of a disk crash. Without reliable RAID or something, I think this would be a catastrophic failure => get new hardware (disk/RAM/computer/building, whatever) and go find the (remote) backup.
In the case of a power failure, you *might* have to restore from backup unless you use ReiserFS or some other journalling filesystem. I use ReiserFS. I also have a UPS installed, just to be sure...
The journalling FS will fix the FS problems, so the files are OK.
The PG journal will fix the PG problems, so the tables will be OK.
From: "Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>
Show quoted text
On 7 Jul 2003 at 12:59, Mattias Kregert wrote:
First of all, disable FSYNC... that will speed things up a lot!
That shouldn't have been done so casually. Though can cause performance boost
and it degrades reliability of data as it can cause data corruption in case of
power/disk failure.Bye
Shridhar
On Mon, Jul 07, 2003 at 01:30:09PM +0200, Mattias Kregert wrote:
In the case of a disk failure, the files will probably be damaged anyway,
and then I'll have to install new hardware, or format and check bad blocks,
and then restore from the backup. I can't see how fsync would help in the
case of a disk crash. Without reliable RAID or something, I think this
would be a catastrophic failure => get new hardware (disk/RAM/computer/
building, whatever) and go find the (remote) backup.
True, it doesn't help with one disk, but RAID does work.
In the case of a power failure, you *might* have to restore from backup
unless you use ReiserFS or some other journalling filesystem. I use
ReiserFS. I also have a UPS installed, just to be sure...
The journalling FS will fix the FS problems, so the files are OK.
The PG journal will fix the PG problems, so the tables will be OK.
Firstly, journalling filesystems ensure the integrity of the *filesystem*,
not the files on it. So your files can still be corrupted. You could enable
full data journalling; I would imagine that would cost you more than just
enabling fsync.
Secondly, IIRC the fsync applies to the PG journals, so turning off fsync
will kill the tables in a crash.
Basically, do you care about your data?
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
"the West won the world not by the superiority of its ideas or values or
religion but rather by its superiority in applying organized violence.
Westerners often forget this fact, non-Westerners never do."
- Samuel P. Huntington
"Mattias Kregert" <mattias@kregert.se> writes:
[ misguided analysis ]
The journalling FS will fix the FS problems, so the files are OK.
The PG journal will fix the PG problems, so the tables will be OK.
Only if the journal is all down to disk before the crash.
The fundamental problem with fsync off is that it's likely to violate
the WAL principle (write journal entry before data entry it describes).
If you then crash, you have data entries that correspond to transactions
that should not have been committed (because WAL replay won't guarantee
recovering all of the transaction's effects). In other words, corrupt
data.
If we had a less costly way of guaranteeing write order than fsync, we'd
use it, but there is no other portable method.
regards, tom lane
Tom Lane wrote:
The fundamental problem with fsync off is that it's likely to violate
the WAL principle (write journal entry before data entry it describes).
[...]
Uh oh... I thought the journal was always synced, and that the fsync option only affected table writes... :(
If I turn fsync on and then pull the power cord while a number of clients are doing lots of inserts/updates and stuff, will fsync then guarantee that no data will be lost or corrupted?
/* m */
If I turn fsync on and then pull the power cord while a
number of clients are doing lots of inserts/updates and stuff,
will fsync then guarantee that no data will be lost or
corrupted?
You are surely kidding, aren't you?
Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
If I turn fsync on and then pull the power cord while a
number of clients are doing lots of inserts/updates and stuff,
will fsync then guarantee that no data will be lost or
corrupted?
You are surely kidding, aren't you?
Karsten
No. No kidding.
Just to clarify, what I mean is: with fsync enabled, after a power failure, after "pg_ctl start" and replay of the xact log etc., are COMMITTED transactions guaranteed to be intact, and are UNCOMMITTED transactions guaranteed not to appear in the tables?
If the answer is "yes", then I understand the use of fsync.
If the answer is "no", then I don't see the point in using fsync at all.