Disk block size issues.

Started by Noname, about 28 years ago. 12 messages.
#1 Noname
darrenk@insightdist.com

A few things that I have noticed will be affected by allowing the
disk block size to be other than 8k. (4k, 8k, 16k or 32k)

1. Rules

The rule system currently stores plans as tuples in pg_rewrite.
Making the block size smaller will accordingly reduce the size of
the rules you can create.

But, the converse is also true...bigger blocks -> bigger rules.

Are the rules ever going to become large objects? Is this something
to put on the TODO to investigate now that Peter has fixed them?

2. Attribute limits

Should the size limits of the varchar/char be driven by the chosen
block size?

Since the current max len is 4k, should I for now advise that the
block size not be made smaller than the current 8k? Or could the
limit be dropped from 4096 to 4000 to allow 4k blocks?

Oracle has a limit of 2000 on their varchar since they allow blocks
of as little as 2k.

Seems there would be an inconsistency in there with telling the user
that the text/varchar/char limit is 4096 and then not letting them
store a value of that size because of the tuple/block size limit.

Perhaps mention this as a caveat also if using 4k blocks? Are 4k
blocks something that would be beneficial to someone, or only 16k/32k?

On the flip side of this, upping the max text size will run
into the 8k packet size.
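(As a rough illustration only: the limit could be derived from the block
size instead of hard-coding 4096. The macro name below is hypothetical,
not actual source.)

    /* Sketch: size the attribute cap from the compile-time block size, so
     * 8k blocks keep today's 4096 limit and 4k blocks shrink it to 2048. */
    #define MaxAttrSize    (BLCKSZ / 2)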

I've run thru the regression tests a few times with 4k blocks and
they seem to pass with the same differences. Today I will try with
16k and 32k. If those work, I'll submit the patch for perusal.

Comments welcome...

darrenk@insightdist.com

#2 Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Noname (#1)
Re: [HACKERS] Disk block size issues.

A few things that I have noticed will be affected by allowing the
disk block size to be other than 8k. (4k, 8k, 16k or 32k)

1. Rules

The rule system currently stores plans as tuples in pg_rewrite.
Making the block size smaller will accordingly reduce the size of
the rules you can create.

I say make it match the given block size at compile time.

But, the converse is also true...bigger blocks -> bigger rules.

Are the rules ever going to become large objects? Is this something
to put on the TODO to investigate now that Peter has fixed them?

2. Attribute limits

Should the size limits of the varchar/char be driven by the chosen
block size?

Yes, they should be calculated based on the compile block size.

Since the current max len is 4k, should I for now advise that the
block size not be made smaller than the current 8k? Or could the
limit be dropped from 4096 to 4000 to allow 4k blocks?

Oracle has a limit of 2000 on their varchar since they allow blocks
of as little as 2k.

Seems there would be an inconsistency in there with telling the user
that the text/varchar/char limit is 4096 and then not letting them
store a value of that size because of the tuple/block size limit.

Perhaps mention this as a caveat also if using 4k blocks? Are 4k
blocks something that would be beneficial to someone, or only 16k/32k?

Just make the max size based on the block size.

On the flip side of this, upping the max text size will run
into the 8k packet size.

This is an interesting point. While we can compute most of the changes
at compile time, we will have to communicate with clients that were
compiled with different max limits.

I recommend we increase the max client buffer size to what we believe is
the largest block size anyone would ever reasonably choose. That way,
all can communicate. I recommend you contact Peter Mount for JDBC,
Openlink for ODBC, and all the other client maintainers and let them
know the changes will be in 6.3 so they can be ready with new versions
when 6.3 starts beta on February 1.

I've run thru the regression tests a few times with 4k blocks and
they seem to pass with the same differences. Today I will try with
16k and 32k. If those work, I'll submit the patch for perusal.

Great.

--
Bruce Momjian
maillist@candle.pha.pa.us

#3 Vadim B. Mikheev
vadim@sable.krasnoyarsk.su
In reply to: Noname (#1)
Re: [HACKERS] Disk block size issues.

Darren King wrote:

A few things that I have noticed will be affected by allowing the
disk block size to be other than 8k. (4k, 8k, 16k or 32k)

1. Rules

The rule system currently stores plans as tuples in pg_rewrite.
Making the block size smaller will accordingly reduce the size of
the rules you can create.

But, the converse is also true...bigger blocks -> bigger rules.

Are the rules ever going to become large objects? Is this something
to put on the TODO to investigate now that Peter has fixed them?

It's better to implement a multi-representation feature for all varlena
types. We could use on-disk vl_len < 0 to flag that data of size ABS(vl_len)
are in a large object specified in vl_data. It seems very easy to do.

This will also resolve item 2 below.
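A bare-bones sketch of that convention (the struct follows the existing
varlena header, where the field is spelled vl_dat; the helper functions are
invented for illustration):

    struct varlena
    {
        int  vl_len;        /* if < 0, ABS(vl_len) bytes live in a large object */
        char vl_dat[1];     /* inline data, or the Oid of that large object     */
    };

    /* Hypothetical helpers a reader of such values might use. */
    static int
    varlena_is_external(const struct varlena *v)
    {
        return v->vl_len < 0;
    }

    static int
    varlena_data_size(const struct varlena *v)
    {
        return v->vl_len < 0 ? -v->vl_len : v->vl_len;
    }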

Vadim


2. Attribute limits

Should the size limits of the varchar/char be driven by the chosen
block size?

Since the current max len is 4k, should I for now advise that the
block size not be made smaller than the current 8k? Or could the
limit be dropped from 4096 to 4000 to allow 4k blocks?

Oracle has a limit of 2000 on their varchar since they allow blocks
of as little as 2k.

Seems there would be an inconsistency in there with telling the user
that the text/varchar/char limit is 4096 and then not letting them
store a value of that size because of the tuple/block size limit.

Perhaps mention this as a caveat also if using 4k blocks? Are 4k
blocks something that would be beneficial to someone, or only 16k/32k?

On the flip side of this, upping the max text size will run
into the 8k packet size.

I've run thru the regression tests a few times with 4k blocks and
they seem to pass with the same differences. Today I will try with
16k and 32k. If those work, I'll submit the patch for perusal.

Comments welcome...

darrenk@insightdist.com

#4 Noname
darrenk@insightdist.com
In reply to: Vadim B. Mikheev (#3)
Re: [HACKERS] Disk block size issues.

A few things that I have noticed will be affected by allowing the
disk block size to be other than 8k. (4k, 8k, 16k or 32k)

1. Rules

The rule system currently stores plans as tuples in pg_rewrite.
Making the block size smaller will accordingly reduce the size of
the rules you can create.

I say make it match the given block size at compile time.

For now it does. There's a comment in rewriteDefine.c though that
indicates the original pg coders thought about putting the stored
plans into large objects if 8k was too limiting.

Could be nice to have the type limits stored in a system table so
the user or a program could query the limits of the current db.

2. Attribute limits

Should the size limits of the varchar/char be driven by the chosen
block size?

Yes, they should be calculated based on the compile block size.
...
Just make the max size based on the block size.
...
This is an interesting point. While we can compute most of the changes
at compile time, we will have to communicate with clients that were
compiled with different max limits.

I recommend we increase the max client buffer size to what we believe is
the largest block size anyone would ever reasonably choose. That way,
all can communicate. I recommend you contact Peter Mount for JDBC,
Openlink for ODBC, and all the other client maintainers and let them
know the changes will be in 6.3 so they can be ready with new versions
when 6.3 starts beta on February 1.

So the buffer size will be defined in one place that they should all
reference when compiling or running? In include/config.h I assume?

This could be difficult for the ODBC and JDBC drivers to determine
automagically since they are usually compiled on different systems than
the postgres src.

Other stuff...

Could the block size be made into a command line option, like "-k 8192"?

Would only require that the BLCKSZ define become a variable and that it
be passed to the backends too. Much easier than having to recompile/install
postgres to change the block size. Could have multiple postmasters running
different block-sized databases without having to have a binary around for
each size.
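The option handling itself would be small; a sketch for the postmaster's
getopt switch (the names and the check are illustrative only, nothing that
exists today):

    case 'k':
        /* hypothetical: take the disk block size from the command line
         * instead of the compile-time BLCKSZ constant */
        DiskBlockSize = atoi(optarg);
        if (DiskBlockSize != 4096 && DiskBlockSize != 8192 &&
            DiskBlockSize != 16384 && DiskBlockSize != 32768)
        {
            fprintf(stderr, "postmaster: invalid block size %s\n", optarg);
            exit(1);
        }
        break;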

Renaming BLCKSZ...

How about PG_BLOCK_SIZE? Or if it's made a variable, DiskBlockSize, keeping
it in the tradition of SortMem, ShowStats, etc.

darrenk

#5 Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Noname (#4)
Re: [HACKERS] Disk block size issues.

A few things that I have noticed will be affected by allowing the
disk block size to be other than 8k. (4k, 8k, 16k or 32k)

1. Rules

The rule system currently stores plans as tuples in pg_rewrite.
Making the block size smaller will accordingly reduce the size of
the rules you can create.

I say make it match the given block size at compile time.

For now it does. There's a comment in rewriteDefine.c though that
indicates the original pg coders thought about putting the stored
plans into large objects if 8k was too limiting.

Yep, I saw that too.

Could be nice to have the type limits stored in a system table so
the user or a program could query the limits of the current db.

Someday.

2. Attribute limits

Should the size limits of the varchar/char be driven by the chosen
block size?

Yes, they should be calculated based on the compile block size.
...
Just make the max size based on the block size.
...
This is an interesting point. While we can compute most of the changes
at compile time, we will have to communicate with clients that were
compiled with different max limits.

I recommend we increase the max client buffer size to what we believe is
the largest block size anyone would ever reasonably choose. That way,
all can communicate. I recommend you contact Peter Mount for JDBC,
Openlink for ODBC, and all the other client maintainers and let them
know the changes will be in 6.3 so they can be ready with new versions
when 6.3 starts beta on February 1.

So the buffer size will be defined in one place that they should all
reference when compiling or running? In include/config.h I assume?

Yes, in config.h, and let's call it PG... so it is clear, and everything
can key off of that.

This could be difficult for the ODBC and JDBC drivers to determine
automagically since they are usually compiled on different systems than
the postgres src.

I think they will need to handle the maximum size someone could ever
choose. Let's face it, 32k or 64k is not too much to ask for a buffer.
I just hope there are not too many of them. I only see it in one place
in libpq. The others are malloc'ed based on how big the result is when
it comes back from the socket.

I recommend we add a test in config.h to make sure they do not set the
max size greater than some predefined limit, and mention why we test
there (for clients). The interface/* files will not use the backend
block size, but will use another config.h define called PGMAXBLCKSZ, or
something like that, so they can interoperate with all backends.
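A few lines in include/config.h would cover it (PGMAXBLCKSZ as named above;
the numbers are only examples):

    /* Backend disk block size, chosen at compile time. */
    #define BLCKSZ        8192

    /* Largest BLCKSZ any backend may be built with; the interface code
     * sizes its buffers from this so it can talk to any backend. */
    #define PGMAXBLCKSZ   32768

    #if BLCKSZ > PGMAXBLCKSZ
    #error BLCKSZ is larger than PGMAXBLCKSZ
    #endif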

Other stuff...

Could the block size be made into a command line option, like "-k 8192"?

Too scary for me.

Would only require that the BLCKSZ define become a variable and that it
be passed to the backends too. Much easier than having to recompile/install
postgres to change the block size. Could have multiple postmasters running
different block-sized databases without having to have a binary around for
each size.

Yes, we could do that, but if they ever start the postmaster with a
different value, he is lost. I thought that because of the bit fields and
the cases where BLCKSZ is used in macros to define sized arrays, we
can't make it variable.

I think we should make it a config.h constant for now, but I am not firm
on this.

Renaming BLCKSZ...

How about PG_BLOCK_SIZE? Or if it's made a variable, DiskBlockSize, keeping
it in the tradition of SortMem, ShowStats, etc.

I like that new name.

--
Bruce Momjian
maillist@candle.pha.pa.us

#6 The Hermit Hacker
scrappy@hub.org
In reply to: Noname (#4)
Re: [HACKERS] Disk block size issues.

On Fri, 9 Jan 1998, Darren King wrote:

How about PG_BLOCK_SIZE? Or if it's made a variable, DiskBlockSize, keeping
it in the tradition of SortMem, ShowStats, etc.

I know of one site that builds their Virtual Websites into
chroot()'d environments...something like this would be perfect for them,
as it would prevent them having to recompile for each individual size...

But...initdb would have to have an appropriate option...and we'd
have to have a mechanism in place that checks that the -k parameter is
actually appropriate.

Would it not make a little more sense to have a pg_block_size file
created in the data directory that postmaster reads at startup?

#7 Bruce Momjian
maillist@candle.pha.pa.us
In reply to: The Hermit Hacker (#6)
Re: [HACKERS] Disk block size issues.

On Fri, 9 Jan 1998, Darren King wrote:

How about PG_BLOCK_SIZE? Or if it's made a variable, DiskBlockSize, keeping
it in the tradition of SortMem, ShowStats, etc.

I know of one site that builds their Virtual Websites into
chroot()'d environments...something like this would be perfect for them,
as it would prevent them having to recompile for each individual size...

But...initdb would have to have an appropriate option...and we'd
have to have a mechanism in place that checks that the -k parameter is
actually appropriate.

Would it not make a little more sense to have a pg_block_size file
created in the data directory that postmaster reads at startup?

I like that, but the postmaster and each backend would have to read that
file before starting, or the postmaster can pass it down into the
postgres backend via a command-line option.
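A rough sketch of that startup read (the file name is from the suggestion
above; everything else is invented for illustration):

    #include <stdio.h>

    /* Hypothetical: return the block size initdb recorded in the data
     * directory, or the compiled-in default if the file is missing. */
    static int
    ReadBlockSizeFile(const char *datadir)
    {
        char  path[512];
        FILE *fp;
        int   blcksz;

        sprintf(path, "%s/pg_block_size", datadir);
        if ((fp = fopen(path, "r")) == NULL)
            return BLCKSZ;
        if (fscanf(fp, "%d", &blcksz) != 1)
            blcksz = BLCKSZ;
        fclose(fp);
        return blcksz;
    }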

--
Bruce Momjian
maillist@candle.pha.pa.us

#8 The Hermit Hacker
scrappy@hub.org
In reply to: Bruce Momjian (#5)
Re: [HACKERS] Disk block size issues.

On Fri, 9 Jan 1998, Bruce Momjian wrote:

Other stuff...

Could the block size be made into a command line option, like "-k 8192"?

Too scary for me.

I kinda like this one...if it can be implemented relatively easily. The main
reason I like it is that, like -B and -S, it means that someone could deal
with "tweaking" a system without having to recompile from scratch...

That said, I'd much rather the -k option be something that is only
available when *creating* the database (i.e. initdb), with a pg_blocksize
file being created and checked when the postmaster starts up.

Essentially, make '-k 8192' an option only available to the postgres
process, not the postmaster process. And not settable by the -O option to
postmaster...

Yes, we could do that, but if they ever start the postmaster with a
different value, he is lost.

See above...it should only be something that is settable at initdb time,
not accessible via 'postmaster' itself...

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#9 Noname
darrenk@insightdist.com
In reply to: The Hermit Hacker (#8)
Re: [HACKERS] Disk block size issues.

On Fri, 9 Jan 1998, Bruce Momjian wrote:

Could the block size be made into a command line option, like "-k 8192"?

Too scary for me.

I kinda like this one...if it can be implemented relatively easily. The main
reason I like it is that, like -B and -S, it means that someone could deal
with "tweaking" a system without having to recompile from scratch...

That said, I'd much rather the -k option be something that is only
available when *creating* the database (i.e. initdb), with a pg_blocksize
file being created and checked when the postmaster starts up.

Essentially, make '-k 8192' an option only available to the postgres
process, not the postmaster process. And not settable by the -O option to
postmaster...

Yes, we could do that, but if they ever start the postmaster with a
different value, he is lost.

See above...it should only be something that is settable at initdb time,
not accessible via 'postmaster' itself...

This is a pretty reasonable restriction, but...

The major change would be, as Bruce stated earlier, that the variables
declared with the #define value would have to be made into pointers and
palloc'd/pfree'd as necessary. Could get pretty ugly in files like nbtsort.c
with double-dereferenced pointers and all.

I'll make a list of these variables this weekend and come back with a more
definite opinion on the subject.

darrenk

#10 Shiby Thomas
sthomas@cise.ufl.edu
In reply to: The Hermit Hacker (#8)
Re: [HACKERS] Disk block size issues.

=> I kinda like this one...if it can be implemented relatively easily. The main
=> reason I like it is that, like -B and -S, it means that someone could deal
=> with "tweaking" a system without having to recompile from scratch...
=>
The -S flag for the postmaster seems to be setting the silentflag. But the
FAQ says it can be used to set the sort memory. The following is the 6.2.1
code from src/backend/postmaster/postmaster.c:
case 'S':

    /*
     * Start in 'S'ilent mode (disassociate from controlling
     * tty). You may also think of this as 'S'ysV mode since
     * it's most badly needed on SysV-derived systems like
     * SVR4 and HP-UX.
     */
    silentflag = 1;
    break;

Am I looking at the wrong file? Can someone please tell me how to increase
the sort memory size.

Thanks
--shiby

#11 Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Shiby Thomas (#10)
Re: [HACKERS] Disk block size issues.

Bug in FAQ, fixed now. The -S in postmaster is silent, the -S in
postgres is sort. The FAQ had it as postmaster when it should have been
postgres.
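In other words (the silent flag is from the code quoted below; the sort
amount should be in kilobytes):

    postmaster -S           (silent mode: detach from the controlling tty)
    postgres -S 4096        (sort memory: allow up to 4096 kB per sort)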

=> I kinda like this one...if it can be implemented relatively easily. The main
=> reason I like it is that, like -B and -S, it means that someone could deal
=> with "tweaking" a system without having to recompile from scratch...
=>
The -S flag for the postmaster seems to be setting the silentflag. But the
FAQ says it can be used to set the sort memory. The following is the 6.2.1
code from src/backend/postmaster/postmaster.c:
case 'S':

    /*
     * Start in 'S'ilent mode (disassociate from controlling
     * tty). You may also think of this as 'S'ysV mode since
     * it's most badly needed on SysV-derived systems like
     * SVR4 and HP-UX.
     */
    silentflag = 1;
    break;

Am I looking at the wrong file? Can someone please tell me how to increase
the sort memory size.

Thanks
--shiby

--
Bruce Momjian
maillist@candle.pha.pa.us

#12 Peter T Mount
psqlhack@maidast.demon.co.uk
In reply to: Bruce Momjian (#5)
Re: [HACKERS] Disk block size issues.

On Fri, 9 Jan 1998, Bruce Momjian wrote:

This is an interesting point. While we can compute most of the changes
at compile time, we will have to communicate with clients that were
compiled with different max limits.

I recommend we increase the max client buffer size to what we believe is
the largest block size anyone would ever reasonably choose. That way,
all can communicate. I recommend you contact Peter Mount for JDBC,
Openlink for ODBC, and all the other client maintainers and let them
know the changes will be in 6.3 so they can be ready with new versions
when 6.3 starts beta on February 1.

I'll be ready :-)

So the buffer size will be defined in one place that they should all
reference when compiling or running? In include/config.h I assume?

Yes, in config.h, and let's call it PG... so it is clear, and everything
can key off of that.

This could be difficult for the ODBC and JDBC drivers to determine
automagically since they are usually compiled on different systems than
the postgres src.

Not necessarily for JDBC. Because of its nature, there is no real reason
why we can't even include it precompiled with the source - the same jar
file runs on any platform.

In fact, this does bring up the same problem we were discussing
earlier, where we were thinking about changing the protocol on startup. If
that change occurs, then this value is an ideal candidate to add to the
startup packet.

I think they will need to handle the maximum size someone could ever
choose. Let's face it, 32k or 64k is not too much to ask for a buffer.
I just hope there are not too many of them. I only see it in one place
in libpq. The others are malloc'ed based on how big the result is when
it comes back from the socket.

I recommend we add a test in config.h to make sure they do not set the
max size greater than some predefined limit, and mention why we test
there (for clients). The interface/* files will not use the backend
block size, but will use another config.h define called PGMAXBLCKSZ, or
something like that, so they can interoperate with all backends.

Slight problem with JDBC (or Java in general), in that we don't use .h
files, so settings in config.h are useless to us. So far, certain
constants have been duplicated in the source.

I was thinking of possibly adding a couple of functions to the backend, to
allow us to get certain details about the backend, which is needed for
certain DatabaseMetaData methods. Perhaps adding PGMAXBLCKSZ to that may
get round the problem.
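Something this small on the backend side would be enough for the driver to
query at connect time (purely hypothetical; no such built-in exists yet):

    /* Hypothetical built-in: report the block size the backend was
     * compiled with, e.g. for the JDBC DatabaseMetaData methods. */
    int32
    pg_blocksize(void)
    {
        return BLCKSZ;
    }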

--
Peter T Mount petermount@earthling.net or pmount@maidast.demon.co.uk
Main Homepage: http://www.demon.co.uk/finder
Work Homepage: http://www.maidstone.gov.uk Work EMail: peter@maidstone.gov.uk