tables >2GB
Now that we know the storage manager code that splits tables over 2GB
into separate files doesn't work (Irix), can we rip out that code and
just use the OS code to access >2GB files as normal files? Most
OSes can now support 64-bit files and file sizes.
Because it is isolated in the storage manager, it should be easy.
--
Bruce Momjian
maillist@candle.pha.pa.us
Bruce Momjian wrote:
| Now that we know the storage manager code that splits tables over 2GB
| into separate files doesn't work (Irix), can we rip out that code and
| just use the OS code to access >2GB files as normal files? [...]
Someday we'll get TABLESPACEs, and fixed multi-chunk code could
allow storing chunks in different TABLESPACEs created on _different
disks_ - imho, the ability to store a table on > 1 disk is a good thing.
And so, I would suggest just adding elog(ERROR) to mdextend() for now,
with a recommendation to increase RELSEG_SIZE...
Vadim
Can someone knowledgeable make a patch for this for our mega-patch?
--
Bruce Momjian | 830 Blythe Avenue
maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026
+ If your life is a hard drive, | (610) 353-9879(w)
+ Christ can be your backup. | (610) 853-3000(h)
On Thu, 19 Mar 1998, Bruce Momjian wrote:
| [...]
| Can someone knowledgeable make a patch for this for our mega-patch?
*Only* if it's in before 9am AST (or is it ADT?) on Friday
morning...:)
| [...] can we rip out that code and just use the OS code to access
| >2GB files as normal files. [...]
| Can someone knowledgeable make a patch for this for our mega-patch?
There are still quite a few OS's out there that do not support >2GB files
yet. Even my beloved Linux (x86)...
So, how about we fix the storage manager instead?
A neat thing that Illustra does is allow you to stripe a table across
multiple directories. You get big tables, easy storage management, and
a nice performance boost.
create stripedir('stripe1', '/disk1/data/stripe1');
create stripedir('stripe2', '/disk2/data/stripe2');
create table giant_table (...) with (stripes 4, 'stripe1', 'stripe2');
-- the '4' is the number of pages to interleave.
Then the smgr just distributes the blocks alternately across the stripes.
read_block(blockno, ...stripeinfo)
{
    ...
    stripe = (blockno / stripe_interleave) % number_of_stripes;
    stripe_block = blockno / number_of_stripes;
    fd = stripe_info->fd[stripe];
    lseek(fd, stripe_block * BLOCKSIZE, SEEK_SET);
    ...
}
All vastly oversimplified of course....
-dg
David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
- I realize now that irony has no place in business communications.
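The read_block() outline above computes stripe_block as blockno / number_of_stripes, which only lines up when the interleave is 1. Below is a minimal sketch of one way to fill in the arithmetic, assuming every stripe file holds whole runs of `interleave` blocks. StripeLocation and map_block are hypothetical names for illustration and are not taken from Illustra or PostgreSQL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: where a logical block lives on disk. */
typedef struct {
    uint32_t stripe;        /* index of the stripe file holding the block */
    uint32_t stripe_block;  /* block number inside that stripe file */
} StripeLocation;

/*
 * Map a logical block number to (stripe, block-within-stripe).
 * Blocks are grouped into chunks of `interleave` consecutive blocks,
 * and the chunks are dealt out to the stripes round-robin.
 */
static StripeLocation map_block(uint32_t blockno, uint32_t interleave,
                                uint32_t nstripes)
{
    assert(interleave > 0 && nstripes > 0);

    uint32_t chunk  = blockno / interleave;  /* which chunk the block is in */
    uint32_t within = blockno % interleave;  /* position inside that chunk */

    StripeLocation loc;
    loc.stripe       = chunk % nstripes;
    /* chunks already placed on this stripe, times chunk size, plus offset */
    loc.stripe_block = (chunk / nstripes) * interleave + within;
    return loc;
}
```

With an interleave of 4 and two stripes, blocks 0-3 land in stripe 0 at positions 0-3, blocks 4-7 in stripe 1 at positions 0-3, and block 8 returns to stripe 0 at position 4. The byte offset for lseek() would then be stripe_block times the block size, computed in a 64-bit type.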
Does this act as a notice of volunteering to work on this aspect of the
code? *Grin*
On Thu, 19 Mar 1998, David Gould wrote:
| [...]
So, how about we fix the storage manager instead?
I don't recall, what exactly breaks when going over 2 gig? I don't have the
disk space available, otherwise I'd debug this. I can still try if I knew
what the problem was...this code isn't all that complex.
I agree.
OK...here is a patch that will cause the magnetic disk storage manager to
not try to split files in 2 gig chunks. It will just try to get another
block.
If applied, everything is just as before. But if LET_OS_MANAGE_FILESIZE
is defined, the chaining disappears and the file just keeps on going,
and going, and going, til the OS barfs.
Are there #defines in the system includes that could be used to determine
a max file size? If so, then I'd think that this would be something
to add to configure. If files over 2 gig are not allowed, then the old
code would compile.
Anyway, if the patch looks ok to the powers-that-be or if there is
something else to be changed, let me know and I'll resubmit it to PATCHES.
Compiled and regressed ok with and without LET_OS_MANAGE_FILESIZE, but
then again there aren't any regression tables over 2 gig. :)
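To make the effect of the switch concrete, here is a rough sketch of the two addressing schemes the patch chooses between. The BLCKSZ and RELSEG_SIZE values, the BlockLocation struct and all three function names are assumptions for illustration. This is not the code in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ      8192u                              /* assumed block size in bytes */
#define RELSEG_SIZE (1024u * 1024u * 1024u / BLCKSZ)   /* assumed: 1 GB of blocks per file */

/* Hypothetical result type: which file of the chain, and where inside it. */
typedef struct {
    uint32_t segno;   /* 0 = first file of the relation */
    int64_t  offset;  /* byte offset inside that file */
} BlockLocation;

/* Chained scheme: start a new file every RELSEG_SIZE blocks. */
static BlockLocation locate_segmented(uint32_t blockno)
{
    assert(RELSEG_SIZE > 0);  /* guard against a misconfigured segment size */
    BlockLocation loc;
    loc.segno  = blockno / RELSEG_SIZE;
    loc.offset = (int64_t) (blockno % RELSEG_SIZE) * BLCKSZ;
    return loc;
}

/* Single-file scheme: one file that grows until the OS refuses. */
static BlockLocation locate_single_file(uint32_t blockno)
{
    BlockLocation loc;
    loc.segno  = 0;
    loc.offset = (int64_t) blockno * BLCKSZ;  /* may exceed 2^31, needs 64 bits */
    return loc;
}

/* Compile-time choice, mirroring the idea of the #define in the patch. */
static BlockLocation locate_block(uint32_t blockno)
{
#ifdef LET_OS_MANAGE_FILESIZE
    return locate_single_file(blockno);
#else
    return locate_segmented(blockno);
#endif
}
```

In the single-file case the byte offset is only usable if the seek call and every variable that stores the position are 64 bits wide.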
Well, BSD/OS goes over 2gig, but the postgreSQL code uses lseek, which
returns long, so even though I can handle larger files, the lseek()
can't because long is 32-bits. Looks like only Alpha can handle those
files based on our current code.
Thanks to our success, this looks like something we will have to deal
with.
--
Bruce Momjian
Applied.
--
Bruce Momjian
Bruce Momjian wrote:
| [...] can we rip out that code and just use the OS code to access
| >2GB files as normal files. [...]
| Can someone knowledgeable make a patch for this for our mega-patch?
But, could it not be useful to be able to use multiple files per
table? Suppose someone wants to spread them out on different
disks to increase access performance?
And what about tables over 2^64 bytes size? There will never be
disks of that size? Now, remember what people said about 2^32 byte
files, and years after 1999, and 64k RAM, and about all inventions
already being invented, and... :)
/* m */
* Bruce Momjian
|
| Well, BSD/OS goes over 2gig, but the postgreSQL code uses lseek, which
| returns long, so even though I can handle larger files, the lseek()
| can't because long is 32-bits.
Are you sure? In NetBSD, lseek() is declared to return an off_t,
which again is defined to be a 64bit quantity. I would assume that
BSD/OS did it the same way -- in fact, I'd be surprised if not.
-tih
--
Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"
| Are you sure? In NetBSD, lseek() is declared to return an off_t,
| which again is defined to be a 64bit quantity. I would assume that
| BSD/OS did it the same way -- in fact, I'd be surprised if not.
Oops, you are right:
typedef quad_t off_t;
I thought they added fgetpos() only for 64-bit quantities, and did not
change the return value of lseek. However:
sys/types.h:76: typedef quad_t off_t; /* file offset*/
so you are right, but our code:
fd.c:110: long seekPos;
fd.c:263: fileP->seekPos = (long) lseek(fileP->fd, 0L, SEEK_CUR);
so it still will not work because the code is not defining seekPos as
off_t. We need to get this code cleaned up/fixed.
How could they make such a mistake and assume it is a long, unless this
thing gets passed around in the backend, and they don't want to
reference off_t all over the place? That code needs cleanup.
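A small sketch of the cleanup being asked for, assuming a POSIX system: keep the saved position in an off_t instead of a long. VfdSketch and both helper names are hypothetical stand-ins and are not the actual fd.c definitions.

```c
#define _FILE_OFFSET_BITS 64   /* on 32-bit glibc builds, request a 64-bit off_t */
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the per-file bookkeeping entry. */
typedef struct {
    int   fd;
    off_t seekPos;   /* off_t rather than long, so large offsets survive */
} VfdSketch;

/*
 * Record the current file position without narrowing it.
 * Returns the position, or (off_t) -1 if lseek() fails.
 */
static off_t vfd_remember_pos(VfdSketch *v)
{
    assert(v != NULL);
    v->seekPos = lseek(v->fd, (off_t) 0, SEEK_CUR);
    return v->seekPos;
}

/* True if an offset would fit in a 32-bit long without loss. */
static int fits_in_32bit_long(off_t pos)
{
    return pos >= INT32_MIN && pos <= INT32_MAX;
}
```

Any offset past 2 GB fails the fits_in_32bit_long() check, which is the case the (long) cast at fd.c:263 would silently mangle on a platform where long is 32 bits.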
--
Bruce Momjian
* Bruce Momjian
|
| How could they make such a mistake and assume it is a long, [...]
"All the world's a VAX!" Getting rid of (and tidying up after)
explicit "long lseek();" declarations is a major part of porting
old software to new UNIXes.
-tih
| "All the world's a VAX!" Getting rid of (and tidying up after)
| explicit "long lseek();" declarations is a major part of porting
| old software to new UNIXes.
It's on our TODO list now.
--
Bruce Momjian