pg_dump and large files - is this a problem?
Is it my imagination, or is there a problem with the way pg_dump uses off_t
etc. My understanding is that off_t may be 64 bits on systems with 32 bit
ints. But it looks like pg_dump writes them as 4 byte values in all cases.
It also reads them as 4 byte values. Does this seem like a problem to
anybody else?
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes:
Is it my imagination, or is there a problem with the way pg_dump uses off_t
etc. My understanding is that off_t may be 64 bits on systems with 32 bit
ints. But it looks like pg_dump writes them as 4 byte values in all cases.
It also reads them as 4 byte values. Does this seem like a problem to
anybody else?
Yes, it does --- the implication is that the custom format, at least,
can't support dumps > 4Gb. What exactly is pg_dump writing off_t's
into files for; maybe there's not really a problem?
If there is a problem, seems like we'd better fix it. Perhaps there
needs to be something in the header to tell the reader the sizeof
off_t.
regards, tom lane
Tom Lane wrote:
Philip Warner <pjw@rhyme.com.au> writes:
Is it my imagination, or is there a problem with the way pg_dump uses off_t
etc. My understanding is that off_t may be 64 bits on systems with 32 bit
ints. But it looks like pg_dump writes them as 4 byte values in all cases.
It also reads them as 4 byte values. Does this seem like a problem to
anybody else?
Yes, it does --- the implication is that the custom format, at least,
can't support dumps > 4Gb. What exactly is pg_dump writing off_t's
into files for; maybe there's not really a problem?
If there is a problem, seems like we'd better fix it. Perhaps there
needs to be something in the header to tell the reader the sizeof
off_t.
BSD/OS has 64-bit off_t's so it does support large files. Is there
something I can test?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Tom Lane writes:
Yes, it does --- the implication is that the custom format, at least,
can't support dumps > 4Gb. What exactly is pg_dump writing off_t's
into files for; maybe there's not really a problem?
That's kind of what I was wondering, too.
Not that it's an excuse, but I think that large file access through zlib
won't work anyway. Zlib uses the integer types in fairly random ways.
--
Peter Eisentraut peter_e@gmx.net
At 09:59 AM 1/10/2002 -0400, Tom Lane wrote:
If there is a problem, seems like we'd better fix it. Perhaps there
needs to be something in the header to tell the reader the sizeof
off_t.
Yes, and do the peripheral stuff to support old archives etc. We also need
to be careful about the places where we do file-position-arithmetic - if
there are any, I can't recall.
I am not sure we need to worry about whether zlib supports large files
since I am pretty sure we don't use zlib for file IO - we just pass it
in-memory blocks; so it should work no matter how much data is in the stream.
At 11:20 AM 1/10/2002 -0400, Bruce Momjian wrote:
BSD/OS has 64-bit off_t's so it does support large files. Is there
something I can test?
Not really; since it saves only the first 32 bits of the 64 bit positions,
it will do no worse than a version that supports 32 bits only. It might even
do slightly better. When this is sorted out, we need to verify that:
- large dump files are restorable
- dump files with 32 bit off_t restore properly on systems with 64 bit off_t
- dump files with 64 bit off_t restore properly on systems with 32 bit off_t,
AS LONG AS the offsets are less than 32 bits.
- old dump files restore properly.
- new dump files have a new version number so that old pg_restore will not
try to restore them.
We probably need to add Read/WriteOffset to pg_backup_archiver.c to read
the appropriate sized value from a dump file, in the same way that
Read/WriteInt works now.
At 09:42 AM 2/10/2002 +1000, Philip Warner wrote:
Yes, and do the peripheral stuff to support old archives etc.
Does silence mean people agree? Does it also mean someone is doing this
(eg. whoever did the off_t support)? Or does it mean somebody else needs to
do it?
Philip Warner <pjw@rhyme.com.au> writes:
At 09:42 AM 2/10/2002 +1000, Philip Warner wrote:
Yes, and do the peripheral stuff to support old archives etc.
Does silence mean people agree? Does it also mean someone is doing this
(eg. whoever did the off_t support)? Or does it mean somebody else needs to
do it?
It needs to get done; AFAIK no one has stepped up to do it. Do you want
to?
regards, tom lane
Philip Warner wrote:
At 09:42 AM 2/10/2002 +1000, Philip Warner wrote:
Yes, and do the peripheral stuff to support old archives etc.
Does silence mean people agree? Does it also mean someone is doing this
(eg. whoever did the off_t support)? Or does it mean somebody else needs to
do it?
Added to open items:
Fix pg_dump to handle 64-bit off_t offsets for custom format
At 11:06 AM 2/10/2002 -0400, Tom Lane wrote:
It needs to get done; AFAIK no one has stepped up to do it. Do you want
to?
I'll have a look; my main concern at the moment is that off_t and size_t
are totally non-committal as to structure; in particular I can probably
safely assume that they are unsigned, but can I assume that they have the
same endianness as int etc?
If so, then will it be valid to just read/write each byte in endian order?
How likely is it that the 64 bit value will actually be implemented as a
structure like:
struct { int lo; int hi; }
which effectively ignores endian-ness at the 32 bit scale?
At 11:06 AM 2/10/2002 -0400, Tom Lane wrote:
It needs to get done; AFAIK no one has stepped up to do it. Do you want
to?
My limited reading of off_t stuff now suggests that it would be brave to
assume it is even a simple 64 bit number (or even 3 32 bit numbers). One
alternative, which I am not terribly fond of, is to have pg_dump write
multiple files - when we get to 1 or 2GB, we just open another file, and
record our file positions as a (file number, file position) pair. Low tech,
but at least we know it would work.
Unless anyone knows of a documented way to get 64 bit uint/int file
offsets, I don't see we have much choice.
My limited reading of off_t stuff now suggests that it would be brave to
assume it is even a simple 64 bit number (or even 3 32 bit numbers). One
alternative, which I am not terribly fond of, is to have pg_dump write
multiple files - when we get to 1 or 2GB, we just open another file, and
record our file positions as a (file number, file position) pair. Low tech,
but at least we know it would work.
Unless anyone knows of a documented way to get 64 bit uint/int file
offsets, I don't see we have much choice.
How common is fgetpos64? Linux supports it, but I don't know about other
systems.
http://hpc.uky.edu/cgi-bin/man.cgi?section=all&topic=fgetpos64
Regards,
Mario Weilguni
Philip Warner writes:
My limited reading of off_t stuff now suggests that it would be brave to
assume it is even a simple 64 bit number (or even 3 32 bit numbers).
What are you reading?? If you find a platform with 64 bit file
offsets that doesn't support 64 bit integral types I will not just be
surprised but amazed.
One alternative, which I am not terribly fond of, is to have pg_dump
write multiple files - when we get to 1 or 2GB, we just open another
file, and record our file positions as a (file number, file
position) pair. Low tech, but at least we know it would work.
That does avoid the issue completely, of course, and also avoids
problems where a platform might have large file support but a
particular filesystem might or might not.
Unless anyone knows of a documented way to get 64 bit uint/int file
offsets, I don't see we have mush choice.
If you're on a platform that supports large files it will either have
a straightforward 64 bit off_t or else will support the "large files
API" that is common on Unix-like operating systems.
What are you trying to do, exactly?
Regards,
Giles
At 07:15 AM 4/10/2002 +1000, Giles Lean wrote:
My limited reading of off_t stuff now suggests that it would be brave to
assume it is even a simple 64 bit number (or even 3 32 bit numbers).
What are you reading?? If you find a platform with 64 bit file
offsets that doesn't support 64 bit integral types I will not just be
surprised but amazed.
Yes, but there is no guarantee that off_t is implemented as such, nor would
we be wise to assume so (most docs say explicitly not to do so).
Unless anyone knows of a documented way to get 64 bit uint/int file
offsets, I don't see we have much choice.
If you're on a platform that supports large files it will either have
a straightforward 64 bit off_t or else will support the "large files
API" that is common on Unix-like operating systems.
What are you trying to do, exactly?
Again yes, but the problem is the same: we need a way of making the *value*
of an off_t portable (not just assuming it's a int64). In general that
involves knowing how to turn it into a more universal data type (eg. int64,
or even a string). Does the large file API have functions for representing
the off_t values that is portable across architectures? And is the API also
portable?
Philip Warner writes:
Yes, but there is no guarantee that off_t is implemented as such, nor would
we be wise to assume so (most docs say explicitly not to do so).
I suspect you're reading old documents, which is why I asked what you
were referring to. In the '80s what you are saying would have been
best practice, no question: 64 bit type support was not common.
When talking of near-current systems with 64 bit off_t you are not
going to find one without support for 64 bit integral types.
Again yes, but the problem is the same: we need a way of making the *value*
of an off_t portable (not just assuming it's a int64). In general that
involves knowing how to turn it into a more universal data type (eg. int64,
or even a string).
So you need to know the size of off_t, which will be 32 bit or 64 bit,
and then you need routines to convert that to a portable representation.
The canonical solution is XDR, but I'm not sure that you want to bother
with it or if it has been extended universally to support 64 bit types.
If you limit the file sizes to 1GB (your less preferred option, I
know;-) then like the rest of the PostgreSQL code you can safely
assume that off_t fits into 32 bits and have a choice of functions
(XDR or ntohl() etc) to deal with them and ignore 64 bit off_t
issues altogether.
If you intend pg_dump files to be portable avoiding the use of large
files will be best. It also avoids issues on platforms such as HP-UX
where large file support is available, but it has to be enabled on a
per-filesystem basis. :-(
Does the large file API have functions for representing
the off_t values that is portable across architectures? And is the API also
portable?
The large files API is a way to access large files from 32 bit
processes. It is reasonably portable, but is a red herring for
what you are wanting to do. (I'm not convinced I am understanding
what you're trying to do, but I have 'flu which is not helping. :-)
Regards,
Giles
Giles Lean <giles@nemeton.com.au> writes:
When talking of near-current systems with 64 bit off_t you are not
going to find one without support for 64 bit integral types.
I tend to agree with Giles on this point. A non-integral representation
of off_t is theoretically possible but I don't believe it exists in
practice. Before going far out of our way to allow it, we should first
require some evidence that it's needed on a supported or
likely-to-be-supported platform.
time_t isn't guaranteed to be an integral type either if you read the
oldest docs about it ... but no one believes that in practice ...
regards, tom lane
Tom Lane wrote:
Giles Lean <giles@nemeton.com.au> writes:
When talking of near-current systems with 64 bit off_t you are not
going to find one without support for 64 bit integral types.
I tend to agree with Giles on this point. A non-integral representation
of off_t is theoretically possible but I don't believe it exists in
practice. Before going far out of our way to allow it, we should first
require some evidence that it's needed on a supported or
likely-to-be-supported platform.
time_t isn't guaranteed to be an integral type either if you read the
oldest docs about it ... but no one believes that in practice ...
I think fpos_t is the non-integral one. I thought off_t almost always
was integral.
At 11:07 PM 3/10/2002 -0400, Tom Lane wrote:
A non-integral representation
of off_t is theoretically possible but I don't believe it exists in
practice.
Excellent. So I can just read/write the bytes in an appropriate order and
expect whatever size it is to be a single intXX.
Fine with me, unless anybody voices another opinion in the next day, I will
proceed. I just have this vague recollection of seeing a header file with a
more complex structure for off_t. I'm probably dreaming.
I have made the changes to pg_dump and verified that (a) it reads old
files, (b) it handles 8 byte offsets, and (c) it dumps & seems to restore
(at least to /dev/null).
I don't have a lot of options for testing it - should I just apply the
changes and wait for the problems, or can someone offer a big-endian machine
and/or a 4 byte off_t machine?
Philip Warner <pjw@rhyme.com.au> writes:
I don't have a lot of options for testing it - should I just apply the
changes and wait for the problems, or can someone offer a bigendian machine
and/or a 4 byte off_t machine?
My HP is big-endian; send in the patch and I'll check it here...
regards, tom lane