pg_dump and large files - is this a problem?

Started by Philip Warner over 23 years ago, 110 messages
#1Philip Warner
pjw@rhyme.com.au

Is it my imagination, or is there a problem with the way pg_dump uses off_t
etc. My understanding is that off_t may be 64 bits on systems with 32 bit
ints. But it looks like pg_dump writes them as 4 byte values in all cases.
It also reads them as 4 byte values. Does this seem like a problem to
anybody else?

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#1)
Re: pg_dump and large files - is this a problem?

Philip Warner <pjw@rhyme.com.au> writes:

Is it my imagination, or is there a problem with the way pg_dump uses off_t
etc. My understanding is that off_t may be 64 bits on systems with 32 bit
ints. But it looks like pg_dump writes them as 4 byte values in all cases.
It also reads them as 4 byte values. Does this seem like a problem to
anybody else?

Yes, it does --- the implication is that the custom format, at least,
can't support dumps > 4Gb. What exactly is pg_dump writing off_t's
into files for; maybe there's not really a problem?

If there is a problem, seems like we'd better fix it. Perhaps there
needs to be something in the header to tell the reader the sizeof
off_t.

regards, tom lane

#3Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#2)
Re: pg_dump and large files - is this a problem?

Tom Lane wrote:

Philip Warner <pjw@rhyme.com.au> writes:

Is it my imagination, or is there a problem with the way pg_dump uses off_t
etc. My understanding is that off_t may be 64 bits on systems with 32 bit
ints. But it looks like pg_dump writes them as 4 byte values in all cases.
It also reads them as 4 byte values. Does this seem like a problem to
anybody else?

Yes, it does --- the implication is that the custom format, at least,
can't support dumps > 4Gb. What exactly is pg_dump writing off_t's
into files for; maybe there's not really a problem?

If there is a problem, seems like we'd better fix it. Perhaps there
needs to be something in the header to tell the reader the sizeof
off_t.

BSD/OS has 64-bit off_t's so it does support large files. Is there
something I can test?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#4Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#2)
Re: pg_dump and large files - is this a problem?

Tom Lane writes:

Yes, it does --- the implication is that the custom format, at least,
can't support dumps > 4Gb. What exactly is pg_dump writing off_t's
into files for; maybe there's not really a problem?

That's kind of what I was wondering, too.

Not that it's an excuse, but I think that large file access through zlib
won't work anyway. Zlib uses the integer types in fairly random ways.

--
Peter Eisentraut peter_e@gmx.net

#5Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#2)
Re: pg_dump and large files - is this a problem?

At 09:59 AM 1/10/2002 -0400, Tom Lane wrote:

If there is a problem, seems like we'd better fix it. Perhaps there
needs to be something in the header to tell the reader the sizeof
off_t.

Yes, and do the peripheral stuff to support old archives etc. We also need
to be careful about the places where we do file-position-arithmetic - if
there are any, I can't recall.

I am not sure we need to worry about whether zlib supports large files
since I am pretty sure we don't use zlib for file IO - we just pass it
in-memory blocks; so it should work no matter how much data is in the stream.


#6Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#3)
Re: pg_dump and large files - is this a problem?

At 11:20 AM 1/10/2002 -0400, Bruce Momjian wrote:

BSD/OS has 64-bit off_t's so it does support large files. Is there
something I can test?

Not really; since it saves only the first 32 bits of the 64 bit positions, it
will do no worse than a version that supports 32 bits only. It might even
do slightly better. When this is sorted out, we need to verify that:

- large dump files are restorable

- dump files with 32 bit off_t restore properly on systems with 64 bit off_t

- dump files with 64 bit off_t restore properly on systems with 32 bit off_t AS
LONG AS the offsets are less than 32 bits.

- old dump files restore properly.

- new dump files have a new version number so that old pg_restore will not
try to restore them.

We probably need to add Read/WriteOffset to pg_backup_archiver.c to read
the appropriate sized value from a dump file, in the same way that
Read/WriteInt works now.


#7Philip Warner
pjw@rhyme.com.au
In reply to: Philip Warner (#5)
Re: pg_dump and large files - is this a problem?

At 09:42 AM 2/10/2002 +1000, Philip Warner wrote:

Yes, and do the peripheral stuff to support old archives etc.

Does silence mean people agree? Does it also mean someone is doing this
(eg. whoever did the off_t support)? Or does it mean somebody else needs to
do it?


#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#7)
Re: pg_dump and large files - is this a problem?

Philip Warner <pjw@rhyme.com.au> writes:

At 09:42 AM 2/10/2002 +1000, Philip Warner wrote:

Yes, and do the peripheral stuff to support old archives etc.

Does silence mean people agree? Does it also mean someone is doing this
(eg. whoever did the off_t support)? Or does it mean somebody else needs to
do it?

It needs to get done; AFAIK no one has stepped up to do it. Do you want
to?

regards, tom lane

#9Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#7)
1 attachment(s)
Re: pg_dump and large files - is this a problem?

Philip Warner wrote:

At 09:42 AM 2/10/2002 +1000, Philip Warner wrote:

Yes, and do the peripheral stuff to support old archives etc.

Does silence mean people agree? Does it also mean someone is doing this
(eg. whoever did the off_t support)? Or does it mean somebody else needs to
do it?

Added to open items:

Fix pg_dump to handle 64-bit off_t offsets for custom format


Attachments:

/root/open_items (text/plain)
#10Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#8)
Re: pg_dump and large files - is this a problem?

At 11:06 AM 2/10/2002 -0400, Tom Lane wrote:

It needs to get done; AFAIK no one has stepped up to do it. Do you want
to?

I'll have a look; my main concern at the moment is that off_t and size_t
are totally non-committal as to structure; in particular I can probably
safely assume that they are unsigned, but can I assume that they have the
same endianness as int etc.?

If so, then will it be valid to just read/write each byte in endian order?
How likely is it that the 64 bit value will actually be implemented as a
structure like:

off_t { int lo; int hi; }

which effectively ignores endian-ness at the 32 bit scale?


#11Philip Warner
pjw@rhyme.com.au
In reply to: Philip Warner (#10)
Re: pg_dump and large files - is this a problem?

At 11:06 AM 2/10/2002 -0400, Tom Lane wrote:

It needs to get done; AFAIK no one has stepped up to do it. Do you want
to?

My limited reading of off_t stuff now suggests that it would be brave to
assume it is even a simple 64 bit number (or even 3 32 bit numbers). One
alternative, which I am not terribly fond of, is to have pg_dump write
multiple files - when we get to 1 or 2GB, we just open another file, and
record our file positions as a (file number, file position) pair. Low tech,
but at least we know it would work.

Unless anyone knows of a documented way to get 64 bit uint/int file
offsets, I don't see we have much choice.


#12Mario Weilguni
mario.weilguni@icomedias.com
In reply to: Philip Warner (#11)
Re: pg_dump and large files - is this a problem?

My limited reading of off_t stuff now suggests that it would be brave to
assume it is even a simple 64 bit number (or even 3 32 bit numbers). One
alternative, which I am not terribly fond of, is to have pg_dump write
multiple files - when we get to 1 or 2GB, we just open another file, and
record our file positions as a (file number, file position) pair. Low tech,
but at least we know it would work.

Unless anyone knows of a documented way to get 64 bit uint/int file
offsets, I don't see we have much choice.

How common is fgetpos64? Linux supports it, but I don't know about other
systems.

http://hpc.uky.edu/cgi-bin/man.cgi?section=all&topic=fgetpos64

Regards,
Mario Weilguni

#13Giles Lean
giles@nemeton.com.au
In reply to: Philip Warner (#11)
Re: pg_dump and large files - is this a problem?

Philip Warner writes:

My limited reading of off_t stuff now suggests that it would be brave to
assume it is even a simple 64 bit number (or even 3 32 bit numbers).

What are you reading?? If you find a platform with 64 bit file
offsets that doesn't support 64 bit integral types I will not just be
surprised but amazed.

One alternative, which I am not terribly fond of, is to have pg_dump
write multiple files - when we get to 1 or 2GB, we just open another
file, and record our file positions as a (file number, file
position) pair. Low tech, but at least we know it would work.

That does avoid the issue completely, of course, and also avoids
problems where a platform might have large file support but a
particular filesystem might or might not.

Unless anyone knows of a documented way to get 64 bit uint/int file
offsets, I don't see we have much choice.

If you're on a platform that supports large files it will either have
a straightforward 64 bit off_t or else will support the "large files
API" that is common on Unix-like operating systems.

What are you trying to do, exactly?

Regards,

Giles

#14Philip Warner
pjw@rhyme.com.au
In reply to: Giles Lean (#13)
Re: pg_dump and large files - is this a problem?

At 07:15 AM 4/10/2002 +1000, Giles Lean wrote:

My limited reading of off_t stuff now suggests that it would be brave to
assume it is even a simple 64 bit number (or even 3 32 bit numbers).

What are you reading?? If you find a platform with 64 bit file
offsets that doesn't support 64 bit integral types I will not just be
surprised but amazed.

Yes, but there is no guarantee that off_t is implemented as such, nor would
we be wise to assume so (most docs say explicitly not to do so).

Unless anyone knows of a documented way to get 64 bit uint/int file
offsets, I don't see we have much choice.

If you're on a platform that supports large files it will either have
a straightforward 64 bit off_t or else will support the "large files
API" that is common on Unix-like operating systems.

What are you trying to do, exactly?

Again yes, but the problem is the same: we need a way of making the *value*
of an off_t portable (not just assuming it's a int64). In general that
involves knowing how to turn it into a more universal data type (eg. int64,
or even a string). Does the large file API have functions for representing
the off_t values that is portable across architectures? And is the API also
portable?


#15Giles Lean
giles@nemeton.com.au
In reply to: Philip Warner (#14)
Re: pg_dump and large files - is this a problem?

Philip Warner writes:

Yes, but there is no guarantee that off_t is implemented as such, nor would
we be wise to assume so (most docs say explicitly not to do so).

I suspect you're reading old documents, which is why I asked what you
were referring to. In the '80s what you are saying would have been
best practice, no question: 64 bit type support was not common.

When talking of near-current systems with 64 bit off_t you are not
going to find one without support for 64 bit integral types.

Again yes, but the problem is the same: we need a way of making the *value*
of an off_t portable (not just assuming it's a int64). In general that
involves knowing how to turn it into a more universal data type (eg. int64,
or even a string).

So you need to know the size of off_t, which will be 32 bit or 64 bit,
and then you need routines to convert that to a portable representation.
The canonical solution is XDR, but I'm not sure that you want to bother
with it or if it has been extended universally to support 64 bit types.

If you limit the file sizes to 1GB (your less preferred option, I
know;-) then like the rest of the PostgreSQL code you can safely
assume that off_t fits into 32 bits and have a choice of functions
(XDR or ntohl() etc) to deal with them and ignore 64 bit off_t
issues altogether.

If you intend pg_dump files to be portable avoiding the use of large
files will be best. It also avoids issues on platforms such as HP-UX
where large file support is available, but it has to be enabled on a
per-filesystem basis. :-(

Does the large file API have functions for representing
the off_t values that is portable across architectures? And is the API also
portable?

The large files API is a way to access large files from 32 bit
processes. It is reasonably portable, but is a red herring for
what you are wanting to do. (I'm not convinced I am understanding
what you're trying to do, but I have 'flu which is not helping. :-)

Regards,

Giles

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Giles Lean (#15)
Re: pg_dump and large files - is this a problem?

Giles Lean <giles@nemeton.com.au> writes:

When talking of near-current systems with 64 bit off_t you are not
going to find one without support for 64 bit integral types.

I tend to agree with Giles on this point. A non-integral representation
of off_t is theoretically possible but I don't believe it exists in
practice. Before going far out of our way to allow it, we should first
require some evidence that it's needed on a supported or
likely-to-be-supported platform.

time_t isn't guaranteed to be an integral type either if you read the
oldest docs about it ... but no one believes that in practice ...

regards, tom lane

#17Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#16)
Re: pg_dump and large files - is this a problem?

Tom Lane wrote:

Giles Lean <giles@nemeton.com.au> writes:

When talking of near-current systems with 64 bit off_t you are not
going to find one without support for 64 bit integral types.

I tend to agree with Giles on this point. A non-integral representation
of off_t is theoretically possible but I don't believe it exists in
practice. Before going far out of our way to allow it, we should first
require some evidence that it's needed on a supported or
likely-to-be-supported platform.

time_t isn't guaranteed to be an integral type either if you read the
oldest docs about it ... but no one believes that in practice ...

I think fpos_t is the non-integral one. I thought off_t almost always
was integral.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#18Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#16)
Re: pg_dump and large files - is this a problem?

At 11:07 PM 3/10/2002 -0400, Tom Lane wrote:

A non-integral representation
of off_t is theoretically possible but I don't believe it exists in
practice.

Excellent. So I can just read/write the bytes in an appropriate order and
expect whatever size it is to be a single intXX.

Fine with me, unless anybody voices another opinion in the next day, I will
proceed. I just have this vague recollection of seeing a header file with a
more complex structure for off_t. I'm probably dreaming.


#19Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#17)
Re: pg_dump and large files - is this a problem?

I have made the changes to pg_dump and verified that (a) it reads old
files, (b) it handles 8 byte offsets, and (c) it dumps & seems to restore
(at least to /dev/null).

I don't have a lot of options for testing it - should I just apply the
changes and wait for the problems, or can someone offer a bigendian machine
and/or a 4 byte off_t machine?


#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#19)
Re: pg_dump and large files - is this a problem?

Philip Warner <pjw@rhyme.com.au> writes:

I don't have a lot of options for testing it - should I just apply the
changes and wait for the problems, or can someone offer a bigendian machine
and/or a 4 byte off_t machine?

My HP is big-endian; send in the patch and I'll check it here...

regards, tom lane

#21Peter Eisentraut
peter_e@gmx.net
In reply to: Philip Warner (#19)
Re: pg_dump and large files - is this a problem?

Philip Warner writes:

I have made the changes to pg_dump and verified that (a) it reads old
files, (b) it handles 8 byte offsets, and (c) it dumps & seems to restore
(at least to /dev/null).

I don't have a lot of options for testing it - should I just apply the
changes and wait for the problems, or can someone offer a bigendian machine
and/or a 4 byte off_t machine?

Any old machine has a 4-byte off_t if you configure with
--disable-largefile. This could be a neat way to test: Make two
installations configured different ways and move data back and forth
between them until it changes. ;-)

--
Peter Eisentraut peter_e@gmx.net

#22Philip Warner
pjw@rhyme.com.au
In reply to: Peter Eisentraut (#21)
Re: pg_dump and large files - is this a problem?

At 12:07 AM 19/10/2002 +0200, Peter Eisentraut wrote:

Any old machine has a 4-byte off_t if you configure with
--disable-largefile.

Thanks - done. I just dumped to a custom backup file, then dumped it to
SQL, and compared each version (V7.2.1, 8 byte & 4 byte offsets), and they
all looked OK. Also, the 4 byte version reads the 8 byte offset version
correctly - although I have not checked reading > 4GB files with 4 byte
offsets, since that's not a priority for obvious reasons.

So once Giles gets back to me (Monday), I'll commit the changes.


#23Philip Warner
pjw@rhyme.com.au
In reply to: Philip Warner (#22)
Re: pg_dump and large files - is this a problem?

I have put the latest patch at:

http://downloads.rhyme.com.au/postgresql/pg_dump/

along with two dump files of the regression DB, one with 4 byte
and the other with 8 byte offsets. I can read/restore each from
the other, so it looks pretty good. Once the endianness is tested,
we should be OK.

Known problems:

- will not cope with > 4GB files and size_t not 64 bit.
- when printing data position, it is assumed that off_t is UINT64
(we could remove this entirely - it's just for display)
- if seek is not supported, then an intXX is assigned to off_t
when file offsets are needed. This *should* not cause a problem
since without seek, the offsets will not be written to the file.

Changes from Prior Version:

- No longer stores or outputs data length
- Assumes result of ftello is correct if it disagrees with internally
kept tally.
- 'pg_restore -l' now shows sizes of int and offset.


#24Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#23)
Re: pg_dump and large files - is this a problem?

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

---------------------------------------------------------------------------

Philip Warner wrote:

I have put the latest patch at:

http://downloads.rhyme.com.au/postgresql/pg_dump/

along with two dump files of the regression DB, one with 4 byte
and the other with 8 byte offsets. I can read/restore each from
the other, so it looks pretty good. Once the endianness is tested,
we should be OK.

Known problems:

- will not cope with > 4GB files and size_t not 64 bit.
- when printing data position, it is assumed that off_t is UINT64
(we could remove this entirely - it's just for display)
- if seek is not supported, then an intXX is assigned to off_t
when file offsets are needed. This *should* not cause a problem
since without seek, the offsets will not be written to the file.

Changes from Prior Version:

- No longer stores or outputs data length
- Assumes result of ftello is correct if it disagrees with internally
kept tally.
- 'pg_restore -l' now shows sizes of int and offset.



#25Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#24)
Re: pg_dump and large files - is this a problem?

At 09:18 PM 20/10/2002 -0400, Bruce Momjian wrote:

I will try to apply it within the next 48 hours.

I'm happy to apply it when necessary; but I wouldn't do it until we've heard
from someone with a big-endian machine...


#26Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#25)
Re: pg_dump and large files - is this a problem?

Philip Warner wrote:

At 09:18 PM 20/10/2002 -0400, Bruce Momjian wrote:

I will try to apply it within the next 48 hours.

I'm happy to apply it when necessary; but I wouldn't do it until we've heard
from someone with a big-endian machine...

Well, I think Tom was going to try it on his HPUX machine. However, it
is on the open items list, so we are going to need to get it in there
soon anyway, or yank it all out. If no big endian people want to test
it, we will have to ship and then I am sure some big-endian testing will
happen. ;-)

#27Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#26)
Re: pg_dump and large files - is this a problem?

At 09:50 PM 20/10/2002 -0400, Bruce Momjian wrote:

Well, I think Tom was going to try it on his HPUX machine.

It might be good if someone who knows a little more than me about
endianness etc has a look at the patch - specifically this bit of code:

#if __BYTE_ORDER == __LITTLE_ENDIAN
    for (off = 0; off < sizeof(off_t); off++) {
#else
    for (off = sizeof(off_t) - 1; off >= 0; off--) {
#endif
        i = *(char *) (ptr + off);
        (*AH->WriteBytePtr) (AH, i);
    }

It is *intended* to write the data such that the least significant byte
is written first to the file, but the dump Giles put on his FTP site
is not correct - it's written msb->lsb.

There seem to be two possibilities (a) I am an idiot and there is something
wrong with the code above that I can not see, or (b) the test:

#if __BYTE_ORDER == __LITTLE_ENDIAN

is not the right thing to do. Any insights would be appreciated.


#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#27)
Re: pg_dump and large files - is this a problem?

Philip Warner <pjw@rhyme.com.au> writes:

It might be good if someone who knows a little more than me about
endianness etc has a look at the patch - specifically this bit of code:

#if __BYTE_ORDER == __LITTLE_ENDIAN

Well, the main problem with that is there's no such symbol as
__BYTE_ORDER ...

I'd prefer not to introduce one, either, if we can possibly avoid it.
I know that we have BYTE_ORDER defined in the port header files, but
I think it's quite untrustworthy, since there is no other place in the
main distribution that uses it anymore (AFAICS only contrib/pgcrypto
uses it at all).

The easiest way to write and reassemble an arithmetic value in a
platform-independent order is via shifting. For instance,

// write, LSB first
for (i = 0; i < sizeof(off_t); i++)
{
    writebyte(val & 0xFF);
    val >>= 8;
}

// read, LSB first
val = 0;
shift = 0;
for (i = 0; i < sizeof(off_t); i++)
{
    /* cast before shifting so shifts of 32 or more don't overflow int */
    val |= ((off_t) readbyte() << shift);
    shift += 8;
}

(This assumes readbyte delivers an unsigned byte, else you might need to
mask it with 0xFF before shifting.)

regards, tom lane

#29Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#28)
Re: pg_dump and large files - is this a problem?

At 09:47 AM 21/10/2002 -0400, Tom Lane wrote:

Well, the main problem with that is there's no such symbol as
__BYTE_ORDER ...

What about just:

int i = 256;

then checking the first byte? This should give me the endianness, and makes
a non-destructive write (not sure it it's important). Currently the
commonly used code does not rely on off_t arithmetic, so if possible I'd
like to avoid shift. Does that sound reasonable? Or overly cautious?


#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#29)
Re: pg_dump and large files - is this a problem?

Philip Warner <pjw@rhyme.com.au> writes:

then checking the first byte? This should give me the endianness, and makes
a non-destructive write (not sure if it's important). Currently the
commonly used code does not rely on off_t arithmetic, so if possible I'd
like to avoid shift. Does that sound reasonable? Or overly cautious?

I think it's pointless. Let's assume off_t is not an arithmetic type
but some weird struct dreamed up by a crazed kernel hacker. What are
the odds that dumping the bytes in it, in either order, will produce
something that's compatible with any other platform? There could be
padding, or the fields might be in an order that doesn't match the
byte order within the fields, or something else.

The shift method requires *no* directly endian-dependent code,
and I think it will work on any platform where you have any hope of
portability anyway.

regards, tom lane

#31Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#30)
Re: pg_dump and large files - is this a problem?

Here is a modified version of Philip's patch that has the changes Tom
suggested; treating off_t as an integral type. I did light testing on
my BSD/OS machine that has 8-byte off_t but I don't have 4 gigs of free
space to test larger files.

ftp://candle.pha.pa.us/pub/postgresql/mypatches/pg_dump

Can others test?

---------------------------------------------------------------------------

Tom Lane wrote:

Philip Warner <pjw@rhyme.com.au> writes:

then checking the first byte? This should give me the endianness, and makes
a non-destructive write (not sure if it's important). Currently the
commonly used code does not rely on off_t arithmetic, so if possible I'd
like to avoid shift. Does that sound reasonable? Or overly cautious?

I think it's pointless. Let's assume off_t is not an arithmetic type
but some weird struct dreamed up by a crazed kernel hacker. What are
the odds that dumping the bytes in it, in either order, will produce
something that's compatible with any other platform? There could be
padding, or the fields might be in an order that doesn't match the
byte order within the fields, or something else.

The shift method requires *no* directly endian-dependent code,
and I think it will work on any platform where you have any hope of
portability anyway.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#32Larry Rosenman
ler@lerctr.org
In reply to: Bruce Momjian (#31)
Re: pg_dump and large files - is this a problem?

On Mon, 2002-10-21 at 20:47, Bruce Momjian wrote:

Here is a modified version of Philip's patch that has the changes Tom
suggested; treating off_t as an integral type. I did light testing on
my BSD/OS machine that has 8-byte off_t but I don't have 4 gigs of free
space to test larger files.

I can make an account for anyone that wants to play on UnixWare 7.1.3.

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

#33Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Larry Rosenman (#32)
Re: pg_dump and large files - is this a problem?

Larry Rosenman wrote:

On Mon, 2002-10-21 at 20:47, Bruce Momjian wrote:

Here is a modified version of Philip's patch that has the changes Tom
suggested; treating off_t as an integral type. I did light testing on
my BSD/OS machine that has 8-byte off_t but I don't have 4 gigs of free
space to test larger files.

I can make an account for anyone that wants to play on UnixWare 7.1.3.

If you have 7.3, you can just test this way:

1) apply the patch
2) run the regression tests
3) pg_dump -Fc regression >/tmp/x
4) pg_restore -Fc </tmp/x

That's all I did and it worked.

#34Larry Rosenman
ler@lerctr.org
In reply to: Bruce Momjian (#33)
Re: pg_dump and large files - is this a problem?

On Mon, 2002-10-21 at 20:52, Bruce Momjian wrote:

Larry Rosenman wrote:

On Mon, 2002-10-21 at 20:47, Bruce Momjian wrote:

Here is a modified version of Philip's patch that has the changes Tom
suggested; treating off_t as an integral type. I did light testing on
my BSD/OS machine that has 8-byte off_t but I don't have 4 gigs of free
space to test larger files.

I can make an account for anyone that wants to play on UnixWare 7.1.3.

If you have 7.3, you can just test this way:

I haven't had the time to play with 7.3 (busy on a NUMBER of other
things).

I'm more than willing to supply resources, just my time is short right
now.

1) apply the patch
2) run the regression tests
3) pg_dump -Fc regression >/tmp/x
4) pg_restore -Fc </tmp/x

That's all I did and it worked.



#35Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#33)
Re: pg_dump and large files - is this a problem?

At 09:52 PM 21/10/2002 -0400, Bruce Momjian wrote:

4) pg_restore -Fc </tmp/x

pg_restore /tmp/x

is enough; it will determine the file type, and by avoiding the pipe, you
allow it to do seeks, which are not much use here but are useful when you
restore only one table from a very large backup.


#36Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#30)
Re: pg_dump and large files - is this a problem?

At 10:16 AM 21/10/2002 -0400, Tom Lane wrote:

What are
the odds that dumping the bytes in it, in either order, will produce
something that's compatible with any other platform?

None, but it will be compatible with itself (the most we can hope for), and
will work even if shifting is not supported for off_t (how likely is
that?). I agree shift is definitely the way to go if it works on arbitrary
data - ie. it does not rely on off_t being an integer. Can I shift a struct?


#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#36)
Re: pg_dump and large files - is this a problem?

Philip Warner <pjw@rhyme.com.au> writes:

None, but it will be compatible with itself (the most we can hope for), and
will work even if shifting is not supported for off_t (how likely is
that?). I agree shift is definitely the way to go if it works on arbitrary
data - ie. it does not rely on off_t being an integer. Can I shift a struct?

You can't. If there are any platforms where in fact off_t isn't an
arithmetic type, then shifting code would break there. I am not sure
there are any; can anyone provide a counterexample?

It would be simple enough to add a configure test to see whether off_t
is arithmetic (just try to compile "off_t x; x <<= 8;"). How about
#ifdef OFF_T_IS_ARITHMETIC_TYPE
// cross-platform compatible
use shifting method
#else
// not cross-platform compatible
read or write bytes of struct in storage order
#endif

regards, tom lane
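
The probe Tom sketches can be reduced to compiling a single shift
expression. As an illustration (off_t_is_shiftable is a hypothetical name),
this translation unit compiles only where off_t is an arithmetic type:

```c
#include <sys/types.h>

/* Compiles only if off_t supports shifting, i.e. is an integral type. */
int
off_t_is_shiftable(void)
{
	off_t		x = 1;

	x <<= 8;
	return x == 256;
}
```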

#38Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#37)
Re: pg_dump and large files - is this a problem?

Tom Lane wrote:

Philip Warner <pjw@rhyme.com.au> writes:

None, but it will be compatible with itself (the most we can hope for), and
will work even if shifting is not supported for off_t (how likely is
that?). I agree shift is definitely the way to go if it works on arbitrary
data - ie. it does not rely on off_t being an integer. Can I shift a struct?

You can't. If there are any platforms where in fact off_t isn't an
arithmetic type, then shifting code would break there. I am not sure
there are any; can anyone provide a counterexample?

It would be simple enough to add a configure test to see whether off_t
is arithmetic (just try to compile "off_t x; x <<= 8;"). How about
#ifdef OFF_T_IS_ARITHMETIC_TYPE
// cross-platform compatible
use shifting method
#else
// not cross-platform compatible
read or write bytes of struct in storage order
#endif

It is my understanding that off_t is an integral type and fpos_t is
perhaps a struct. My fgetpos manual page says:

The fgetpos() and fsetpos() functions are alternate interfaces equivalent
to ftell() and fseek() (with whence set to SEEK_SET), setting and storing
the current value of the file offset into or from the object referenced
by pos. On some (non-UNIX) systems an ``fpos_t'' object may be a complex
object and these routines may be the only way to portably reposition a
text stream.

I poked around and found this Usenet posting:

http://groups.google.com/groups?q=C+off_t+standard+integral&hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=E958tG.8tH%40root.co.uk&rnum=1

stating that while off_t must be arithmetic, it doesn't have to be
integral, meaning it could be float or double, which can't be shifted.

However, since we don't know if we support any non-integral off_t
platforms, and because a configure test would require us to have two
code paths for with/without integral off_t, I suggest we apply my
version of Philip's patch and let's see if everyone can compile it
cleanly. It does have the advantage of being more portable on systems
that do have integral off_t, which I think is most/all of our supported
platforms.

#39Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#38)
Re: pg_dump and large files - is this a problem?

At 12:00 PM 22/10/2002 -0400, Bruce Momjian wrote:

It does have the advantage of being more portable on systems
that do have integral off_t

I suspect it is no more portable than determining storage order by using
'int i = 256', then writing in storage order, and has the disadvantage that
it may break as discussed.

AFAICT, using storage order will not break under any circumstances within
one OS/architecture (unlike using shift), and will not break any more often
than using shift in cases where off_t is integral.
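
A sketch of the storage-order alternative as described (an illustration
only, under the assumption that off_t has no padding; all function names
here are hypothetical): probe the host's byte order with 'int i = 256',
then dump the off_t's raw bytes. The result is self-consistent on one
platform but tied to that platform's byte order:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* 256 is 0x00000100: byte 1 holds 0x01 only on a little-endian host. */
static int
host_is_little_endian(void)
{
	int			i = 256;
	unsigned char b[sizeof(int)];

	memcpy(b, &i, sizeof(int));
	return b[1] == 1;
}

/* Dump the off_t exactly as stored in memory (storage order). */
static void
write_raw_offset(FILE *fp, off_t offset)
{
	fwrite(&offset, sizeof(off_t), 1, fp);
}

/* Read back an off_t written by write_raw_offset on the same platform. */
static off_t
read_raw_offset(FILE *fp)
{
	off_t		offset = 0;

	if (fread(&offset, sizeof(off_t), 1, fp) != 1)
		return (off_t) -1;
	return offset;
}
```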


#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#38)
Re: pg_dump and large files - is this a problem?

Bruce Momjian <pgman@candle.pha.pa.us> writes:

However, since we don't know if we support any non-integral off_t
platforms, and because a configure test would require us to have two
code paths for with/without integral off_t, I suggest we apply my
version of Philip's patch and let's see if everyone can compile it
cleanly.

Actually, it looks to me like configure will spit up if off_t is not
an integral type:

/* Check that off_t can represent 2**63 - 1 correctly.
   We can't simply define LARGE_OFF_T to be 9223372036854775807,
   since some C++ compilers masquerading as C compilers
   incorrectly reject 9223372036854775807. */
#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
                    && LARGE_OFF_T % 2147483647 == 1)
                   ? 1 : -1];

So I think we're wasting our time to debate whether we need to support
non-integral off_t ... let's just apply Bruce's version and wait to
see if anyone has a problem before doing more work.

regards, tom lane

#41Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#39)
Re: pg_dump and large files - is this a problem?

Philip Warner wrote:

At 12:00 PM 22/10/2002 -0400, Bruce Momjian wrote:

It does have the advantage of being more portable on systems
that do have integral off_t

I suspect it is no more portable than determining storage order by using
'int i = 256', then writing in storage order, and has the disadvantage that
it may break as discussed.

AFAICT, using storage order will not break under any circumstances within
one OS/architecture (unlike using shift), and will not break any more often
than using shift in cases where off_t is integral.

Your version will break more often because we are assuming we can
determine the endianness of the OS, _and_ for quad off_t types,
assuming we know how those are stored too. While we know the byte
order for ints, I have no idea if quads are always stored the same.
By accessing off_t as an integral type, we make certain it is output
the same way every time on every OS.

#42Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#40)
Re: pg_dump and large files - is this a problem?

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

However, since we don't know if we support any non-integral off_t
platforms, and because a configure test would require us to have two
code paths for with/without integral off_t, I suggest we apply my
version of Philip's patch and let's see if everyone can compile it
cleanly.

Actually, it looks to me like configure will spit up if off_t is not
an integral type:

/* Check that off_t can represent 2**63 - 1 correctly.
   We can't simply define LARGE_OFF_T to be 9223372036854775807,
   since some C++ compilers masquerading as C compilers
   incorrectly reject 9223372036854775807. */
#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
                    && LARGE_OFF_T % 2147483647 == 1)
                   ? 1 : -1];

So I think we're wasting our time to debate whether we need to support
non-integral off_t ... let's just apply Bruce's version and wait to
see if anyone has a problem before doing more work.

I am concerned about one more thing. On BSD/OS, we have off_t of quad
(8 byte), but we don't have fseeko, so this call looks questionable:

if (fseeko(AH->FH, tctx->dataPos, SEEK_SET) != 0)

In this case, dataPos is off_t (8 bytes), while fseek only accepts long
in that parameter (4 bytes). When this code is hit, a file > 4 gigs
will seek to the wrong offset, I am afraid. Also, I don't understand
why the compiler doesn't produce a warning.

I wonder if I should add a conditional test so this code is hit only if
HAVE_FSEEKO is defined. There is alternative code for all the non-zero
fseeks.

Comments?
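
One way to add the check described above (a sketch; safe_seek_set is a
hypothetical name, and HAVE_FSEEKO is assumed to come from configure):
without fseeko, refuse any offset that does not survive the round trip
through a long, rather than silently seeking to the wrong place:

```c
#include <stdio.h>
#include <sys/types.h>

static int
safe_seek_set(FILE *fp, off_t pos)
{
#ifdef HAVE_FSEEKO
	return fseeko(fp, pos, SEEK_SET);
#else
	/* fseek takes a long; reject offsets that would be truncated. */
	if (pos != (off_t) (long) pos)
		return -1;
	return fseek(fp, (long) pos, SEEK_SET);
#endif
}
```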

#43Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#41)
Re: pg_dump and large files - is this a problem?

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Your version will break more often because we are assuming we can
determine the endianness of the OS, _and_ for quad off_t types,
assuming we know how those are stored too. While we know the byte
order for ints, I have no idea if quads are always stored the same.

There is precedent for problems of that ilk, too, cf PDP_ENDIAN: years
ago someone made double-word-integer software routines and did not
think twice about which word should appear first in storage, with the
consequence that the storage order was neither little-endian nor
big-endian. (We have exactly the same issue with our CRC routines for
compilers without int64: the two-int32 struct is defined in a way that's
compatible with little-endian storage, and on a big-endian machine it'll
produce a funny storage order.)

Unless someone can point to a supported (or potentially interesting)
platform on which off_t is indeed not integral, I think the shift-based
code is our safest bet. (The precedent of the off_t checking code in
configure makes me really doubt that there are any platforms with
non-integral off_t.)

regards, tom lane

#44Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#23)
Re: pg_dump and large files - is this a problem?

Patch applied with shift <</>> changes by me. Thanks.

---------------------------------------------------------------------------

Philip Warner wrote:

I have put the latest patch at:

http://downloads.rhyme.com.au/postgresql/pg_dump/

along with two dump files of the regression DB, one with 4 byte
and the other with 8 byte offsets. I can read/restore each from
the other, so it looks pretty good. Once the endianness is tested,
we should be OK.

Known problems:

- will not cope with > 4GB files and size_t not 64 bit.
- when printing data position, it is assumed that off_t is UINT64
(we could remove this entirely - it's just for display)
- if seek is not supported, then an intXX is assigned to off_t
when file offsets are needed. This *should* not cause a problem
since without seek, the offsets will not be written to the file.

Changes from Prior Version:

- No longer stores or outputs data length
- Assumes result of ftello is correct if it disagrees with internally
kept tally.
- 'pg_restore -l' now shows sizes of int and offset.



#45Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Bruce Momjian (#42)
1 attachment(s)
Re: pg_dump and large files - is this a problem?

Bruce Momjian wrote:

So I think we're wasting our time to debate whether we need to support
non-integral off_t ... let's just apply Bruce's version and wait to
see if anyone has a problem before doing more work.

I am concerned about one more thing. On BSD/OS, we have off_t of quad
(8 byte), but we don't have fseeko, so this call looks questionable:

if (fseeko(AH->FH, tctx->dataPos, SEEK_SET) != 0)

In this case, dataPos is off_t (8 bytes), while fseek only accepts long
in that parameter (4 bytes). When this code is hit, a file > 4 gigs
will seek to the wrong offset, I am afraid. Also, I don't understand
why the compiler doesn't produce a warning.

I wonder if I should add a conditional test so this code is hit only if
HAVE_FSEEKO is defined. There is alternative code for all the non-zero
fseeks.

Here is a patch that I think fixes the problem I outlined above. If
there is no fseeko(), it will not call fseek with a non-zero offset
unless sizeof(off_t) <= sizeof(long).


Attachments:

/pgpatches/pg_dump (text/plain)
Index: src/bin/pg_dump/pg_backup_custom.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/pg_backup_custom.c,v
retrieving revision 1.22
diff -c -c -r1.22 pg_backup_custom.c
*** src/bin/pg_dump/pg_backup_custom.c	22 Oct 2002 19:15:23 -0000	1.22
--- src/bin/pg_dump/pg_backup_custom.c	22 Oct 2002 21:36:30 -0000
***************
*** 431,437 ****
  	if (tctx->dataState == K_OFFSET_NO_DATA)
  		return;
  
! 	if (!ctx->hasSeek || tctx->dataState == K_OFFSET_POS_NOT_SET)
  	{
  		/* Skip over unnecessary blocks until we get the one we want. */
  
--- 431,441 ----
  	if (tctx->dataState == K_OFFSET_NO_DATA)
  		return;
  
! 	if (!ctx->hasSeek || tctx->dataState == K_OFFSET_POS_NOT_SET
! #if !defined(HAVE_FSEEKO)
! 		|| sizeof(off_t) > sizeof(long)
! #endif
! 		)
  	{
  		/* Skip over unnecessary blocks until we get the one we want. */
  
***************
*** 809,815 ****
  		 * be ok to just use the existing self-consistent block
  		 * formatting.
  		 */
! 		if (ctx->hasSeek)
  		{
  			fseeko(AH->FH, tpos, SEEK_SET);
  			WriteToc(AH);
--- 813,823 ----
  		 * be ok to just use the existing self-consistent block
  		 * formatting.
  		 */
! 		if (ctx->hasSeek
! #if !defined(HAVE_FSEEKO)
! 			&& sizeof(off_t) <= sizeof(long)
! #endif
! 			)
  		{
  			fseeko(AH->FH, tpos, SEEK_SET);
  			WriteToc(AH);
#46Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#42)
Re: pg_dump and large files - is this a problem?

Bruce Momjian writes:

I am concerned about one more thing. On BSD/OS, we have off_t of quad
(8 byte), but we don't have fseeko, so this call looks questionable:

if (fseeko(AH->FH, tctx->dataPos, SEEK_SET) != 0)

Maybe you want to ask your OS provider how the heck this is supposed to
work. I mean, it's great to have wide types, but what's the point if the
API can't handle them?

--
Peter Eisentraut peter_e@gmx.net

#47Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Peter Eisentraut (#46)
Re: pg_dump and large files - is this a problem?

Peter Eisentraut wrote:

Bruce Momjian writes:

I am concerned about one more thing. On BSD/OS, we have off_t of quad
(8 byte), but we don't have fseeko, so this call looks questionable:

if (fseeko(AH->FH, tctx->dataPos, SEEK_SET) != 0)

Maybe you want to ask your OS provider how the heck this is supposed to
work. I mean, it's great to have wide types, but what's the point if the
API can't handle them?

Excellent question. They do have fsetpos/fgetpos, and I think you are
supposed to use those. However, they don't seek from the current
position, and they don't take an off_t, so I am confused myself.

I did ask on the mailing list and everyone kind of agreed it was a
missing feature. However, because of the way we call fseeko not knowing
if it is a quad or a long, I think we have to add the checks to prevent
such wild seeks from happening.

#48Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#45)
Re: pg_dump and large files - is this a problem?

At 05:37 PM 22/10/2002 -0400, Bruce Momjian wrote:

! if (ctx->hasSeek
! #if !defined(HAVE_FSEEKO)
! && sizeof(off_t) <= sizeof(long)
! #endif
! )

Just to clarify my understanding:

- HAVE_FSEEKO is tested & defined in configure
- If it is not defined, then all calls to fseeko will magically be
translated to fseek calls, and use the 'long' parameter type.

Is that right?

If so, why don't we:

#if defined(HAVE_FSEEKO)
#define FILE_OFFSET off_t
#define FSEEK fseeko
#else
#define FILE_OFFSET long
#define FSEEK fseek
#endif

then replace all refs to off_t with FILE_OFFSET, and fseeko with FSEEK.

Existing checks etc. will then refuse to load file offsets with significant
bytes beyond the 4th byte, and we will still use fseek on OSes with broken
implementations of off_t.


#49Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#48)
Re: pg_dump and large files - is this a problem?

Philip Warner wrote:

At 05:37 PM 22/10/2002 -0400, Bruce Momjian wrote:

! if (ctx->hasSeek
! #if !defined(HAVE_FSEEKO)
! && sizeof(off_t) <= sizeof(long)
! #endif
! )

Just to clarify my understanding:

- HAVE_FSEEKO is tested & defined in configure
- If it is not defined, then all calls to fseeko will magically be
translated to fseek calls, and use the 'long' parameter type.

Is that right?

If so, why don't we:

#if defined(HAVE_FSEEKO)
#define FILE_OFFSET off_t
#define FSEEK fseeko
#else
#define FILE_OFFSET long
#define FSEEK fseek
#endif

then replace all refs to off_t with FILE_OFFSET, and fseeko with FSEEK.

Existing checks etc. will then refuse to load file offsets with significant
bytes beyond the 4th byte, and we will still use fseek on OSes with broken
implementations of off_t.

Uh, not exactly. I have off_t as a quad, and I don't have fseeko, so
the above conditional doesn't work. I want to use off_t, but can't use
fseek(). As it turns out, the code already has options to handle no
fseek, so it seems to work anyway. I think what you miss may be the
table of contents in the archive, if I am reading the code correctly.

#50Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#49)
Re: pg_dump and large files - is this a problem?

At 10:46 PM 22/10/2002 -0400, Bruce Momjian wrote:

Uh, not exactly. I have off_t as a quad, and I don't have fseeko, so
the above conditional doesn't work. I want to use off_t, but can't use
fseek().

Then when you create dumps, they will be invalid since I assume that ftello
is also broken in the same way. You need to fix _getFilePos as well. And
any other place that uses an off_t needs to be looked at very carefully.
The code was written assuming that if 'hasSeek' was set, then we could
trust it.

Given that you say you do have support for some kind of 64 bit offset, I
would be a lot happier with these changes if you did something akin to my
original suggestion:

#if defined(HAVE_FSEEKO)
#define FILE_OFFSET off_t
#define FSEEK fseeko
#elif defined(HAVE_SOME_OTHER_FSEEK)
#define FILE_OFFSET some_other_offset
#define FSEEK some_other_fseek
#else
#define FILE_OFFSET long
#define FSEEK fseek
#endif

...assuming you have a non-broken 64 bit fseek/tell pair, then this will
work in all cases, and make the code a lot less ugly (assuming of course
the non-broken version can be shifted).


#51Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#50)
Re: pg_dump and large files - is this a problem?

Sounds messy. Let me see if I can code up an fseeko/ftello for BSD/OS
and add that to /port. No reason to hold up beta for that, though.

I wonder if any other platforms have this limitation. I think we need
to add some type of test for no-fseeko()/ftello() and sizeof(off_t) >
sizeof(long). This fseeko/ftello/off_t is just too fluid, and the
failure modes too serious.

---------------------------------------------------------------------------

Philip Warner wrote:

At 10:46 PM 22/10/2002 -0400, Bruce Momjian wrote:

Uh, not exactly. I have off_t as a quad, and I don't have fseeko, so
the above conditional doesn't work. I want to use off_t, but can't use
fseek().

Then when you create dumps, they will be invalid since I assume that ftello
is also broken in the same way. You need to fix _getFilePos as well. And
any other place that uses an off_t needs to be looked at very carefully.
The code was written assuming that if 'hasSeek' was set, then we could
trust it.

Given that you say you do have support for some kind of 64 bit offset, I
would be a lot happier with these changes if you did something akin to my
original suggestion:

#if defined(HAVE_FSEEKO)
#define FILE_OFFSET off_t
#define FSEEK fseeko
#elif defined(HAVE_SOME_OTHER_FSEEK)
#define FILE_OFFSET some_other_offset
#define FSEEK some_other_fseek
#else
#define FILE_OFFSET long
#define FSEEK fseek
#endif

...assuming you have a non-broken 64 bit fseek/tell pair, then this will
work in all cases, and make the code a lot less ugly (assuming of course
the non-broken version can be shifted).



#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#51)
Re: pg_dump and large files - is this a problem?

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I wonder if any other platforms have this limitation. I think we need
to add some type of test for no-fseeko()/ftello() and sizeof(off_t) >
sizeof(long). This fseeko/ftello/off_t is just too fluid, and the
failure modes too serious.

I am wondering why pg_dump has to depend on either fseek or ftell.

regards, tom lane

#53Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#52)
Re: pg_dump and large files - is this a problem?

At 12:32 AM 23/10/2002 -0400, Tom Lane wrote:

I am wondering why pg_dump has to depend on either fseek or ftell.

It doesn't - it just works better and has more features if they are
available, much like zlib etc.


#54Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#51)
Re: pg_dump and large files - is this a problem?

At 12:29 AM 23/10/2002 -0400, Bruce Momjian wrote:

This fseeko/ftello/off_t is just too fluid, and the
failure modes too serious.

I agree. Can you think of a better solution than the one I suggested???


#55Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#50)
Re: pg_dump and large files - is this a problem?

OK, you are saying that if we don't have fseeko(), there is no reason to
use off_t, and we may as well use long. What limitations does that impose,
and are the limitations clear to the user?

What has me confused is that I only see two places that use a non-zero
fseeko, and in those cases, there is a non-fseeko code path that does
the same thing, or the call isn't actually required. Both cases are in
pg_dump/pg_backup_custom.c. It appears seeking in the file is an
optimization that prevents all the blocks from being read. That is
fine, but we shouldn't introduce failure cases to do that.

If BSD/OS is the only problem OS, I can deal with that, but I have no
idea if other OS's have the same limitation, and because of the way our
code exists now, we are not even checking to see if there is a problem.

I did some poking around, and on BSD/OS, fgetpos/fsetpos use fpos_t,
which is actually off_t, and interestingly, lseek() uses off_t too.
Seems only fseek/ftell is limited to long. I can easily implement
fseeko/ftello using fgetpos/fsetpos, but that is only one OS.

One idea would be to patch up BSD/OS in backend/port/bsdi and add a
configure tests that actually fails if fseeko doesn't exist _and_
sizeof(off_t) > sizeof(long). That would at least catch OS's before
they make >2gig backups that can't be restored.

---------------------------------------------------------------------------

Philip Warner wrote:

At 10:46 PM 22/10/2002 -0400, Bruce Momjian wrote:

Uh, not exactly. I have off_t as a quad, and I don't have fseeko, so
the above conditional doesn't work. I want to use off_t, but can't use
fseek().

Then when you create dumps, they will be invalid since I assume that ftello
is also broken in the same way. You need to fix _getFilePos as well. And
any other place that uses an off_t needs to be looked at very carefully.
The code was written assuming that if 'hasSeek' was set, then we could
trust it.

Given that you say you do have support for some kind of 64 bit offset, I
would be a lot happier with these changes if you did something akin to my
original suggestion:

#if defined(HAVE_FSEEKO)
#define FILE_OFFSET off_t
#define FSEEK fseeko
#elif defined(HAVE_SOME_OTHER_FSEEK)
#define FILE_OFFSET some_other_offset
#define FSEEK some_other_fseek
#else
#define FILE_OFFSET long
#define FSEEK fseek
#endif

...assuming you have a non-broken 64 bit fseek/tell pair, then this will
work in all cases, and make the code a lot less ugly (assuming of course
the non-broken version can be shifted).



-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#56Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#55)
Re: pg_dump and large files - is this a problem?

At 01:02 AM 23/10/2002 -0400, Bruce Momjian wrote:

OK, you are saying if we don't have fseeko(), there is no reason to use
off_t, and we may as well use long. What limitations does that impose,
and are the limitations clear to the user.

What I'm saying is that if we have not got fseeko then we should use any
'seek-class' function that returns a 64 bit value. We have already made the
assumption that off_t is an integer; the same logic that came to that
conclusion, applies just as validly to the other seek functions.

Secondly, if there is no 64 bit 'seek-class' function, then we should
probably use a size_t, but a long would probably be fine too. I am not
particularly attached to this part; long, int etc etc. Whatever is most
likely to return an integer and work with whatever function we choose.

As to implications: assuming they are all integers (which as you know I
don't like), we should have no problems.

If a system does not have any function to access 64 bit file offsets, then
I'd say they are pretty unlikely to have files > 2GB.


#57Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#56)
Re: pg_dump and large files - is this a problem?

Philip Warner wrote:

At 01:02 AM 23/10/2002 -0400, Bruce Momjian wrote:

OK, you are saying if we don't have fseeko(), there is no reason to use
off_t, and we may as well use long. What limitations does that impose,
and are the limitations clear to the user.

What I'm saying is that if we have not got fseeko then we should use any
'seek-class' function that returns a 64 bit value. We have already made the
assumption that off_t is an integer; the same logic that came to that
conclusion, applies just as validly to the other seek functions.

Oh, I see, so try to use fsetpos/fgetpos? I can write wrappers for
those to look like fseeko/ftello and put it in /port.

Secondly, if there is no 64 bit 'seek-class' function, then we should
probably use a size_t, but a long would probably be fine too. I am not
particularly attached to this part; long, int etc etc. Whatever is most
likely to return an integer and work with whatever function we choose.

As to implications: assuming they are all integers (which as you know I
don't like), we should have no problems.

If a system does not have any function to access 64 bit file offsets, then
I'd say they are pretty unlikely to have files > 2GB.

OK, my OS can handle 64-bit files, but has only fgetpos/fsetpos, so I
could get that working. The bigger question is what about OS's that
have 64-bit off_t/files but don't have any seek-type functions. I did
research to find mine, but what about others that may have other
variants?

I think you are right that we have to not use off_t and use long if we
can't find a proper 64-bit seek function, but what are the failure modes
of doing this? Exactly what happens for larger files?
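
The failure mode with plain fseek() is silent truncation of the off_t
offset to a long. A minimal sketch of a guarded fallback (the function
name is hypothetical, not the actual pg_dump code): refuse offsets that
do not fit, instead of seeking to the wrong place.

```c
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Fall back to plain fseek() when no 64-bit seek call exists, but fail
 * cleanly when the off_t offset cannot be represented in a long, rather
 * than silently truncating it and seeking to the wrong position.
 */
int
fallback_fseeko(FILE *stream, off_t offset, int whence)
{
	if (offset > (off_t) LONG_MAX || offset < (off_t) LONG_MIN)
		return -1;			/* a >2GB offset is unreachable via fseek() */
	return fseek(stream, (long) offset, whence);
}
```

With this, a restore of a too-large dump fails with a detectable error
at the seek, which is at least diagnosable, instead of reading garbage.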

#58Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#56)
Re: pg_dump and large files - is this a problem?

Philip Warner wrote:

At 01:02 AM 23/10/2002 -0400, Bruce Momjian wrote:

OK, you are saying if we don't have fseeko(), there is no reason to use
off_t, and we may as well use long. What limitations does that impose,
and are the limitations clear to the user.

What I'm saying is that if we have not got fseeko then we should use any
'seek-class' function that returns a 64 bit value. We have already made the
assumption that off_t is an integer; the same logic that came to that
conclusion, applies just as validly to the other seek functions.

Secondly, if there is no 64 bit 'seek-class' function, then we should
probably use a size_t, but a long would probably be fine too. I am not
particularly attached to this part; long, int etc etc. Whatever is most
likely to return an integer and work with whatever function we choose.

As to implications: assuming they are all integers (which as you know I
don't like), we should have no problems.

If a system does not have any function to access 64 bit file offsets, then
I'd say they are pretty unlikely to have files > 2GB.

Let me see if I can be clearer. With shifting off_t, if that fails, we
will find out right away, at compile time. I think that is acceptable.

What I am concerned about are cases that fail at runtime, specifically
during a restore of a >2gig file. In my reading of the code, those
failures will be silent or will produce unusual error messages. I don't
think we can ship code that has strange failure modes for data restore.

Now, if someone knows those failure cases, I would love to hear about
it. If not, I will dig into the code today and find out where they are.

#59Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#57)
Re: pg_dump and large files - is this a problem?

Bruce Momjian writes:

I think you are right that we have to not use off_t and use long if we
can't find a proper 64-bit seek function, but what are the failure modes
of doing this? Exactly what happens for larger files?

First we need to decide what we want to happen and after that think about
how to implement it. Given sizeof(off_t) > sizeof(long) and no fseeko(),
we have the following options:

1. Disable access to large files.

2. Seek in some other way.

What's it gonna be?

--
Peter Eisentraut peter_e@gmx.net

#60Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Peter Eisentraut (#59)
Re: pg_dump and large files - is this a problem?

Peter Eisentraut wrote:

Bruce Momjian writes:

I think you are right that we have to not use off_t and use long if we
can't find a proper 64-bit seek function, but what are the failure modes
of doing this? Exactly what happens for larger files?

First we need to decide what we want to happen and after that think about
how to implement it. Given sizeof(off_t) > sizeof(long) and no fseeko(),
we have the following options:

1. Disable access to large files.

2. Seek in some other way.

What's it gonna be?

OK, well BSD/OS now works, but I wonder if there are any other quad
off_t OS's out there without fseeko.

How would we disable access to large files? Do we fstat the file and
see if it is too large? I suppose we are looking for cases where the
file system has large files, but fseeko doesn't allow us to access them.
Should we leave this issue alone and wait to find another OS with this
problem, and we can then rejigger fseeko.c to handle that OS too?

Looking at the pg_dump code, it seems the fseeks are optional in there
anyway because it already has code to read the file sequentially rather
than use fseek, and the TOC case in pg_backup_custom.c says that is
optional too.

#61Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#59)
Re: pg_dump and large files - is this a problem?

Peter Eisentraut <peter_e@gmx.net> writes:

First we need to decide what we want to happen and after that think about
how to implement it. Given sizeof(off_t) > sizeof(long) and no fseeko(),
we have the following options:

It seems obvious to me that there are no platforms that offer
sizeof(off_t) > sizeof(long) but have no API for doing seeks with off_t.
That would be just plain silly. IMHO it's acceptable for us to fail at
configure time if we can't figure out how to seek.

The question is *which* seek APIs we need to support. Are there any
besides fseeko() and fgetpos()?

regards, tom lane

#62Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#60)
Re: pg_dump and large files - is this a problem?

Bruce Momjian <pgman@candle.pha.pa.us> writes:

How would we disable access to large files?

I think configure should fail if it can't find a way to seek.
Workaround for anyone in that situation is configure --disable-largefile.

regards, tom lane
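
The configure-time failure Tom describes might be sketched as follows
(a hypothetical configure.in fragment in the autoconf style of the era;
the exact macros and wording are assumptions, not the committed glue):

```
AC_CHECK_FUNCS(fseeko)
dnl  If off_t is wider than long and there is no fseeko(), off_t seeks
dnl  are impossible: fail now, rather than during a >2GB restore.
if test "$ac_cv_func_fseeko" != yes; then
  AC_TRY_RUN(
    [#include <sys/types.h>
     int main() { return sizeof(off_t) > sizeof(long) ? 1 : 0; }],
    [],
    [AC_MSG_ERROR([no way to seek in large files; try --disable-largefile])])
fi
```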

#63Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#61)
Re: pg_dump and large files - is this a problem?

Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

First we need to decide what we want to happen and after that think about
how to implement it. Given sizeof(off_t) > sizeof(long) and no fseeko(),
we have the following options:

It seems obvious to me that there are no platforms that offer
sizeof(off_t) > sizeof(long) but have no API for doing seeks with off_t.
That would be just plain silly. IMHO it's acceptable for us to fail at
configure time if we can't figure out how to seek.

I would certainly be happy failing at configure time, so we know at the
start what is broken, rather than failures during restore.

The question is *which* seek APIs we need to support. Are there any
besides fseeko() and fgetpos()?

What I have added is BSD/OS specific because only on BSD/OS do I know
fpos_t and off_t are the same type. If we come up with other platforms,
we will have to deal with it then.

#64Giles Lean
giles@nemeton.com.au
In reply to: Bruce Momjian (#60)
Re: pg_dump and large files - is this a problem?

Bruce Momjian <pgman@candle.pha.pa.us> writes:

OK, well BSD/OS now works, but I wonder if there are any other quad
off_t OS's out there without fseeko.

NetBSD prior to 1.6, released September 14, 2002. (Source: CVS logs.)

OpenBSD prior to 2.7, released June 15, 2000. (Source: release notes.)

FreeBSD has had fseeko() for some time, but I'm not sure which release
introduced it -- perhaps 3.2.0, released May, 1999. (Source: CVS logs.)

Regards,

Giles

#65Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#60)
Re: pg_dump and large files - is this a problem?

At 05:50 PM 23/10/2002 -0400, Bruce Momjian wrote:

Looking at the pg_dump code, it seems the fseeks are optional in there
anyway because it already has code to read the file sequentially rather

But there are features that are not available if it can't seek: eg. it will
not restore in a different order to that in which it was written; it will
not dump data offsets in the TOC so dump files can not be restored in
alternate orders; restore times will be large for a single table (it has to
read the entire file potentially).


#66Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#65)
Re: pg_dump and large files - is this a problem?

Philip Warner wrote:

At 05:50 PM 23/10/2002 -0400, Bruce Momjian wrote:

Looking at the pg_dump code, it seems the fseeks are optional in there
anyway because it already has code to read the file sequentially rather

But there are features that are not available if it can't seek: eg. it will
not restore in a different order to that in which it was written; it will
not dump data offsets in the TOC so dump files can not be restored in
alternate orders; restore times will be large for a single table (it has to
read the entire file potentially).

OK, that helps. We just got a list of 2 other OS's without fseeko and
with large file support. Any NetBSD before September 2002 has that
problem. We are going to need to either get fseeko workarounds for
those, or disable those features in a meaningful way.

#67Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Giles Lean (#64)
Re: pg_dump and large files - is this a problem?

Giles Lean wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

OK, well BSD/OS now works, but I wonder if there are any other quad
off_t OS's out there without fseeko.

NetBSD prior to 1.6, released September 14, 2002. (Source: CVS logs.)

OK, does pre-1.6 NetBSD have fgetpos/fsetpos that is off_t/quad?

#68Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#58)
Re: pg_dump and large files - is this a problem?

At 10:42 AM 23/10/2002 -0400, Bruce Momjian wrote:

What I am concerned about are cases that fail at runtime, specifically
during a restore of a >2gig file.

Please give an example that would still apply assuming we get a working
seek/tell pair that works with whatever we use as an offset?

If you are concerned about reading a dump file with 8 byte offsets on a
machine with 4 byte off_t, that case and its permutations are already covered.


#69Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#68)
Re: pg_dump and large files - is this a problem?

Philip Warner wrote:

At 10:42 AM 23/10/2002 -0400, Bruce Momjian wrote:

What I am concerned about are cases that fail at runtime, specifically
during a restore of a >2gig file.

Please give an example that would still apply assuming we get a working
seek/tell pair that works with whatever we use as an offset?

If we get this, everything is fine. I have done that for BSD/OS today.
I may need to do the same for NetBSD/OpenBSD too.

If you are concerned about reading a dump file with 8 byte offsets on a
machine with 4 byte off_t, that case and its permutations are already covered.

No, I know that is covered because it will report a proper error message
on the restore on the 4-byte off_t machine.

#70Philip Warner
pjw@rhyme.com.au
In reply to: Peter Eisentraut (#59)
Re: pg_dump and large files - is this a problem?

At 11:50 PM 23/10/2002 +0200, Peter Eisentraut wrote:

1. Disable access to large files.

2. Seek in some other way.

This gets my vote, but I would like to see a clean implementation (not huge
quantities of ifdefs every time we call fseek); either we write our own
fseek as Bruce seems to be suggesting, or we have a single header file that
defines the FSEEK/FTELL/OFF_T to point to the 'right' functions, where
'right' is defined as 'most likely to generate an integer and which makes
use of the largest number of bytes'.

The way the code is currently written it does not matter if this is a 16 or
3 byte value - so long as it is an integer.


#71Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#70)
Re: pg_dump and large files - is this a problem?

Philip Warner wrote:

At 11:50 PM 23/10/2002 +0200, Peter Eisentraut wrote:

1. Disable access to large files.

2. Seek in some other way.

This gets my vote, but I would like to see a clean implementation (not huge
quantities of ifdefs every time we call fseek); either we write our own
fseek as Bruce seems to be suggesting, or we have a single header file that
defines the FSEEK/FTELL/OFF_T to point to the 'right' functions, where
'right' is defined as 'most likely to generate an integer and which makes
use of the largest number of bytes'.

We have to write another function because fsetpos doesn't do SEEK_CUR so
you have to implement it with more complex code. It isn't a drop-in
replacement.

The way the code is currently written it does not matter if this is a 16 or
3 byte value - so long as it is an integer.

Right. What we are assuming now is that off_t can be seeked using
whatever we defined for fseeko, which is incorrect in one, and now I
hear more than one OS.

#72Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#66)
Re: pg_dump and large files - is this a problem?

At 09:36 PM 23/10/2002 -0400, Bruce Momjian wrote:

We are going to need to either get fseeko workarounds for
those, or disable those features in a meaningful way.

????? if we have not got a 64 bit seek function of any kind, then use a 32
bit seek - the features don't need to be disabled. AFAICT, this is a
non-issue: no 64 bit seek means no large files.

I'm not sure we should even worry about it, but if you are genuinely
concerned that we have no 64 bit seek call, but we do have files > 4GB,
then if you really want to disable seek, just modify the code that sets
'hasSeek' - don't screw around with every seek call. But only clear
it if the file is > 4GB.
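
Clearing 'hasSeek' for oversized files could be sketched like this
(the function name is invented, and ctx->hasSeek / AH->FH are pg_dump
internals, so this is illustrative only): fstat the archive and report
whether every byte is reachable with a long offset.

```c
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Return 1 only if fh is a regular file whose every byte is reachable
 * with a long fseek() offset.  Pipes and oversized files get 0, so the
 * caller can clear hasSeek and fall back to sequential reading.
 */
int
seekable_with_long(FILE *fh)
{
	struct stat st;

	if (fstat(fileno(fh), &st) != 0 || !S_ISREG(st.st_mode))
		return 0;			/* e.g. a pipe: not seekable at all */
	return st.st_size <= (off_t) LONG_MAX;
}
```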


#73Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#69)
Re: pg_dump and large files - is this a problem?

At 09:41 PM 23/10/2002 -0400, Bruce Momjian wrote:

If we get this, everything is fine. I have done that for BSD/OS today.
I may need to do the same for NetBSD/OpenBSD too.

What did you do to achieve this?


#74Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#73)
Re: pg_dump and large files - is this a problem?

Philip Warner wrote:

At 09:41 PM 23/10/2002 -0400, Bruce Momjian wrote:

If we get this, everything is fine. I have done that for BSD/OS today.
I may need to do the same for NetBSD/OpenBSD too.

What did you do to achieve this?

See src/port/fseeko.c in current CVS, with some configure.in glue.

#75Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#71)
Re: pg_dump and large files - is this a problem?

At 09:45 PM 23/10/2002 -0400, Bruce Momjian wrote:

We have to write another function because fsetpos doesn't do SEEK_CUR so
you have to implement it with more complex code. It isn't a drop-in
replacement.

The only code that uses SEEK_CUR is the code to check if seek is available
- I am very happy to change that to SEEK_SET - I can't even recall why I
used SEEK_CUR. The code that does the real seeks uses SEEK_SET.


#76Philip Warner
pjw@rhyme.com.au
In reply to: Philip Warner (#75)
Re: pg_dump and large files - is this a problem?

At 11:55 AM 24/10/2002 +1000, Philip Warner wrote:

The only code that uses SEEK_CUR is the code to check if seek is available
- I am very happy to change that to SEEK_SET - I can't even recall why I
used SEEK_CUR. The code that does the real seeks uses SEEK_SET.

Come to think of it:

ctx->hasSeek = (fseeko(AH->FH, 0, SEEK_CUR) == 0);

should be replaced by:

#ifdef HAS_FSEEK[O]
ctx->hasSeek = TRUE;
#else
ctx->hasSeek = FALSE;
#endif

Since we're now checking for it in configure, we should remove the checks
from the pg_dump code.


#77Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#75)
Re: pg_dump and large files - is this a problem?

Philip Warner wrote:

At 09:45 PM 23/10/2002 -0400, Bruce Momjian wrote:

We have to write another function because fsetpos doesn't do SEEK_CUR so
you have to implement it with more complex code. It isn't a drop-in
replacement.

The only code that uses SEEK_CUR is the code to check if seek is available
- I am very happy to change that to SEEK_SET - I can't even recall why I
used SEEK_CUR. The code that does the real seeks uses SEEK_SET.

There are other problems. fgetpos() expects a pointer to an fpos_t,
while ftello just returns off_t, so you need a local variable in the
function to pass to fgetpos(), and then return that from the function.

It is much cleaner to just duplicate the entire API so you don't have
any limitations or failure cases.
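
The wrapper Bruce describes might look like this (a sketch, not the
actual src/port/fseeko.c; it ASSUMES the plain file offset occupies the
leading bytes of fpos_t, which holds on BSD/OS where fpos_t simply is
off_t, but is not portable in general):

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Emulate ftello() with fgetpos().  fgetpos() insists on filling in an
 * fpos_t through a pointer -- hence the local variable -- while ftello()
 * simply returns an off_t.  ASSUMPTION: the platform stores the raw
 * file offset at the start of fpos_t (true where fpos_t is off_t).
 */
off_t
my_ftello(FILE *stream)
{
	fpos_t		pos;		/* the local that ftello() itself never needs */
	off_t		result;

	if (fgetpos(stream, &pos) != 0)
		return (off_t) -1;
	memcpy(&result, &pos, sizeof(off_t));	/* assumes offset leads fpos_t */
	return result;
}
```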

#78Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#76)
Re: pg_dump and large files - is this a problem?

Well, that certainly changes the functionality of the code. I thought
that fseeko test was done so that things that couldn't be seeked on were
detected. Not sure what isn't seek-able, maybe named pipes. I thought
it was testing that so I didn't touch that variable.

This was my original thought, that we have non-fseeko code in place.
Can we just trigger the non-fseeko code on HAVE_FSEEKO? The code would
be something like:

	if (sizeof(long) >= sizeof(off_t))
		ctx->hasSeek = TRUE;
	else
#ifdef HAVE_FSEEKO
		ctx->hasSeek = TRUE;
#else
		ctx->hasSeek = FALSE;
#endif

---------------------------------------------------------------------------

Philip Warner wrote:

At 11:55 AM 24/10/2002 +1000, Philip Warner wrote:

The only code that uses SEEK_CUR is the code to check if seek is available
- I am very happy to change that to SEEK_SET - I can't even recall why I
used SEEK_CUR. The code that does the real seeks uses SEEK_SET.

Come to think of it:

ctx->hasSeek = (fseeko(AH->FH, 0, SEEK_CUR) == 0);

should be replaced by:

#ifdef HAS_FSEEK[O]
ctx->hasSeek = TRUE;
#else
ctx->hasSeek = FALSE;
#endif

Since we're now checking for it in configure, we should remove the checks
from the pg_dump code.



#79Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#78)
Re: pg_dump and large files - is this a problem?

At 10:08 PM 23/10/2002 -0400, Bruce Momjian wrote:

Well, that certainly changes the functionality of the code. I thought
that fseeko test was done so that things that couldn't be seeked on were
detected.

You are quite correct. It should read:

#ifdef HAVE_FSEEKO
ctx->hasSeek = (fseeko(..., SEEK_SET) == 0);
#else
ctx->hasSeek = FALSE;
#endif

pipes are the main case for which we are checking.


#80Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#77)
Re: pg_dump and large files - is this a problem?

At 10:03 PM 23/10/2002 -0400, Bruce Momjian wrote:

It is much cleaner to just duplicate the entire API so you don't have
any limitations or failure cases.

We may still end up using macros in pg_dump to cope with cases where off_t
& fseeko are not defined - if there are any. I presume we would then just
revert to calling fseek/ftell etc.


#81Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#80)
Re: pg_dump and large files - is this a problem?

Philip Warner wrote:

At 10:03 PM 23/10/2002 -0400, Bruce Momjian wrote:

It is much cleaner to just duplicate the entire API so you don't have
any limitations or failure cases.

We may still end up using macros in pg_dump to cope with cases where off_t
& fseeko are not defined - if there are any. I presume we would then just
revert to calling fseek/ftell etc.

Well, we have fseeko falling back to fseek already, so that is working
fine. I don't think we will find any OS's without off_t. We just need
a little smarts. Let me see if I can work on it now.

#82Giles Lean
giles@nemeton.com.au
In reply to: Bruce Momjian (#67)
Re: pg_dump and large files - is this a problem?

OK, does pre-1.6 NetBSD have fgetpos/fsetpos that is off_t/quad?

Yes:

int
fgetpos(FILE *stream, fpos_t *pos);

int
fsetpos(FILE *stream, const fpos_t *pos);

Per comments in <stdio.h> fpos_t is the same format as off_t, and
off_t and fpos_t have been 64 bit since 1994.

http://cvsweb.netbsd.org/bsdweb.cgi/basesrc/include/stdio.h

Regards,

Giles

#83Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Giles Lean (#82)
Re: pg_dump and large files - is this a problem?

Looks like I have some more work to do. Thanks.

---------------------------------------------------------------------------

Giles Lean wrote:

OK, does pre-1.6 NetBSD have fgetpos/fsetpos that is off_t/quad?

Yes:

int
fgetpos(FILE *stream, fpos_t *pos);

int
fsetpos(FILE *stream, const fpos_t *pos);

Per comments in <stdio.h> fpos_t is the same format as off_t, and
off_t and fpos_t have been 64 bit since 1994.

http://cvsweb.netbsd.org/bsdweb.cgi/basesrc/include/stdio.h

Regards,

Giles

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#84Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Giles Lean (#82)
Re: pg_dump and large files - is this a problem?

OK, NetBSD added.

Any other OS's need this? Is it safe for me to code something that
assumes fpos_t and off_t are identical? I can't think of a good way to
test if two data types are identical. I don't think sizeof is enough.

---------------------------------------------------------------------------

Giles Lean wrote:

OK, does pre-1.6 NetBSD have fgetpos/fsetpos that is off_t/quad?

Yes:

int
fgetpos(FILE *stream, fpos_t *pos);

int
fsetpos(FILE *stream, const fpos_t *pos);

Per comments in <stdio.h> fpos_t is the same format as off_t, and
off_t and fpos_t have been 64 bit since 1994.

http://cvsweb.netbsd.org/bsdweb.cgi/basesrc/include/stdio.h

Regards,

Giles

#85Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Bruce Momjian (#84)
Re: pg_dump and large files - is this a problem?

The question is *which* seek APIs we need to support. Are there any
besides fseeko() and fgetpos()?

On AIX we have
int fseeko64 (FILE* Stream, off64_t Offset, int Whence);
which is intended for large file access for programs that do NOT
#define _LARGE_FILES

It is functionality that is available if _LARGE_FILE_API is defined,
which is the default if _LARGE_FILES is not defined.

That would have been my preferred way of handling large files on AIX
in the two/three? places that need it (pg_dump/restore, psql and backend COPY).
This would have had the advantage that off_t is not 64 bit in all other places
where it is actually not needed, no ?

Andreas

#86Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#84)
Re: pg_dump and large files - is this a problem?

Bruce Momjian writes:

OK, NetBSD added.

Any other OS's need this? Is it safe for me to code something that
assumes fpos_t and off_t are identical? I can't think of a good way to
test if two data types are identical. I don't think sizeof is enough.

No, you can't assume that fpos_t and off_t are identical.

But you can simulate a long fseeko() by calling fseek() multiple times, so
it should be possible to write a replacement that works on all systems.
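
Such a replacement might look like the following sketch. This is not PostgreSQL code: the helper name is hypothetical, and it only handles SEEK_SET, since (as Bruce notes in his reply) there is no way to emulate ftello and hence SEEK_CUR.

```c
#include <stdio.h>
#include <limits.h>

/* Hypothetical sketch: emulate a wide absolute seek using repeated
 * long-sized fseek() calls, per the suggestion above.  Only SEEK_SET
 * is handled; without ftello, SEEK_CUR cannot be emulated reliably. */
static int
fseek_wide(FILE *fp, long long offset, int whence)
{
	if (whence != SEEK_SET || offset < 0)
		return -1;

	/* start from the beginning, then advance in LONG_MAX-sized steps */
	if (fseek(fp, 0L, SEEK_SET) != 0)
		return -1;
	while (offset > 0)
	{
		long	step = (offset > LONG_MAX) ? LONG_MAX : (long) offset;

		if (fseek(fp, step, SEEK_CUR) != 0)
			return -1;
		offset -= step;
	}
	return 0;
}
```

Each intermediate fseek stays within the range of a long, so the total offset can exceed 32 bits even where off_t does not.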

--
Peter Eisentraut peter_e@gmx.net

#87Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Peter Eisentraut (#86)
Re: pg_dump and large files - is this a problem?

Peter Eisentraut wrote:

Bruce Momjian writes:

OK, NetBSD added.

Any other OS's need this? Is it safe for me to code something that
assumes fpos_t and off_t are identical? I can't think of a good way to
test if two data types are identical. I don't think sizeof is enough.

No, you can't assume that fpos_t and off_t are identical.

I was wondering: if fpos_t and off_t have identical sizeof, and fpos_t
can do shifts (<< or >>), that means fpos_t is also integral like off_t.
Can I then assume they are the same?

But you can simulate a long fseeko() by calling fseek() multiple times, so
it should be possible to write a replacement that works on all systems.

Yes, but I can't simulate ftello, so I then can't do SEEK_CUR. And if I
can't duplicate the entire API, I don't want to try.

#88Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#79)
1 attachment(s)
Re: pg_dump and large files - is this a problem?

Philip Warner wrote:

At 10:08 PM 23/10/2002 -0400, Bruce Momjian wrote:

Well, that certainly changes the functionality of the code. I thought
that fseeko test was done so that things that couldn't be seeked on were
detected.

You are quite correct. It should read:

#ifdef HAVE_FSEEKO
ctx->hasSeek = fseeko(...,SEEK_SET);
#else
ctx->hasSeek = FALSE;
#endif

pipes are the main case for which we are checking.

OK, I have applied the following patch to set hasSeek only if
fseek/fseeko is reliable. This takes care of the random failure case
for large files. Now I need to see if I can get the custom fseeko
working for more platforms.


Attachments:

/bjm/diff (text/plain)
Index: src/bin/pg_dump/common.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/common.c,v
retrieving revision 1.71
diff -c -c -r1.71 common.c
*** src/bin/pg_dump/common.c	9 Oct 2002 16:20:25 -0000	1.71
--- src/bin/pg_dump/common.c	25 Oct 2002 01:30:51 -0000
***************
*** 290,296 ****
  		 * attr with the same name, then only dump it if:
  		 *
  		 * - it is NOT NULL and zero parents are NOT NULL
! 		 *   OR 
  		 * - it has a default value AND the default value does not match
  		 *   all parent default values, or no parents specify a default.
  		 *
--- 290,296 ----
  		 * attr with the same name, then only dump it if:
  		 *
  		 * - it is NOT NULL and zero parents are NOT NULL
! 		 *   OR
  		 * - it has a default value AND the default value does not match
  		 *   all parent default values, or no parents specify a default.
  		 *
Index: src/bin/pg_dump/pg_backup_archiver.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/pg_backup_archiver.c,v
retrieving revision 1.59
diff -c -c -r1.59 pg_backup_archiver.c
*** src/bin/pg_dump/pg_backup_archiver.c	22 Oct 2002 19:15:23 -0000	1.59
--- src/bin/pg_dump/pg_backup_archiver.c	25 Oct 2002 01:30:57 -0000
***************
*** 2338,2343 ****
--- 2338,2369 ----
  }
  
  
+ /*
+  * checkSeek
+  *	  check to see if fseek can be performed.
+  */
+ 
+ bool
+ checkSeek(FILE *fp)
+ {
+ 
+ 	if (fseek(fp, 0, SEEK_CUR) != 0)
+ 		return false;
+ 	else if (sizeof(off_t) > sizeof(long))
+ 	/*
+ 	 *	At this point, off_t is too large for long, so we return
+ 	 *	based on whether an off_t version of fseek is available.
+ 	 */
+ #ifdef HAVE_FSEEKO
+ 		return true;
+ #else
+ 		return false;
+ #endif
+ 	else
+ 		return true;
+ }
+ 
+ 
  static void
  _SortToc(ArchiveHandle *AH, TocSortCompareFn fn)
  {
Index: src/bin/pg_dump/pg_backup_archiver.h
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/pg_backup_archiver.h,v
retrieving revision 1.48
diff -c -c -r1.48 pg_backup_archiver.h
*** src/bin/pg_dump/pg_backup_archiver.h	22 Oct 2002 19:15:23 -0000	1.48
--- src/bin/pg_dump/pg_backup_archiver.h	25 Oct 2002 01:30:58 -0000
***************
*** 27,32 ****
--- 27,33 ----
  
  #include "postgres_fe.h"
  
+ #include <stdio.h>
  #include <time.h>
  #include <errno.h>
  
***************
*** 284,289 ****
--- 285,291 ----
  extern void WriteDataChunks(ArchiveHandle *AH);
  
  extern int	TocIDRequired(ArchiveHandle *AH, int id, RestoreOptions *ropt);
+ extern bool checkSeek(FILE *fp);
  
  /*
   * Mandatory routines for each supported format
Index: src/bin/pg_dump/pg_backup_custom.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/pg_backup_custom.c,v
retrieving revision 1.22
diff -c -c -r1.22 pg_backup_custom.c
*** src/bin/pg_dump/pg_backup_custom.c	22 Oct 2002 19:15:23 -0000	1.22
--- src/bin/pg_dump/pg_backup_custom.c	25 Oct 2002 01:31:01 -0000
***************
*** 179,185 ****
  		if (!AH->FH)
  			die_horribly(AH, modulename, "could not open archive file %s: %s\n", AH->fSpec, strerror(errno));
  
! 		ctx->hasSeek = (fseeko(AH->FH, 0, SEEK_CUR) == 0);
  	}
  	else
  	{
--- 179,185 ----
  		if (!AH->FH)
  			die_horribly(AH, modulename, "could not open archive file %s: %s\n", AH->fSpec, strerror(errno));
  
! 		ctx->hasSeek = checkSeek(AH->FH);
  	}
  	else
  	{
***************
*** 190,196 ****
  		if (!AH->FH)
  			die_horribly(AH, modulename, "could not open archive file %s: %s\n", AH->fSpec, strerror(errno));
  
! 		ctx->hasSeek = (fseeko(AH->FH, 0, SEEK_CUR) == 0);
  
  		ReadHead(AH);
  		ReadToc(AH);
--- 190,196 ----
  		if (!AH->FH)
  			die_horribly(AH, modulename, "could not open archive file %s: %s\n", AH->fSpec, strerror(errno));
  
! 		ctx->hasSeek = checkSeek(AH->FH);
  
  		ReadHead(AH);
  		ReadToc(AH);
Index: src/bin/pg_dump/pg_backup_files.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/pg_backup_files.c,v
retrieving revision 1.20
diff -c -c -r1.20 pg_backup_files.c
*** src/bin/pg_dump/pg_backup_files.c	22 Oct 2002 19:15:23 -0000	1.20
--- src/bin/pg_dump/pg_backup_files.c	25 Oct 2002 01:31:01 -0000
***************
*** 129,135 ****
  		if (AH->FH == NULL)
  			die_horribly(NULL, modulename, "could not open output file: %s\n", strerror(errno));
  
! 		ctx->hasSeek = (fseeko(AH->FH, 0, SEEK_CUR) == 0);
  
  		if (AH->compression < 0 || AH->compression > 9)
  			AH->compression = Z_DEFAULT_COMPRESSION;
--- 129,135 ----
  		if (AH->FH == NULL)
  			die_horribly(NULL, modulename, "could not open output file: %s\n", strerror(errno));
  
! 		ctx->hasSeek = checkSeek(AH->FH);
  
  		if (AH->compression < 0 || AH->compression > 9)
  			AH->compression = Z_DEFAULT_COMPRESSION;
***************
*** 147,153 ****
  		if (AH->FH == NULL)
  			die_horribly(NULL, modulename, "could not open input file: %s\n", strerror(errno));
  
! 		ctx->hasSeek = (fseeko(AH->FH, 0, SEEK_CUR) == 0);
  
  		ReadHead(AH);
  		ReadToc(AH);
--- 147,153 ----
  		if (AH->FH == NULL)
  			die_horribly(NULL, modulename, "could not open input file: %s\n", strerror(errno));
  
! 		ctx->hasSeek = checkSeek(AH->FH);
  
  		ReadHead(AH);
  		ReadToc(AH);
Index: src/bin/pg_dump/pg_backup_tar.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/pg_backup_tar.c,v
retrieving revision 1.31
diff -c -c -r1.31 pg_backup_tar.c
*** src/bin/pg_dump/pg_backup_tar.c	22 Oct 2002 19:15:23 -0000	1.31
--- src/bin/pg_dump/pg_backup_tar.c	25 Oct 2002 01:31:04 -0000
***************
*** 190,196 ****
  		 */
  		/* setvbuf(ctx->tarFH, NULL, _IONBF, 0); */
  
! 		ctx->hasSeek = (fseeko(ctx->tarFH, 0, SEEK_CUR) == 0);
  
  		if (AH->compression < 0 || AH->compression > 9)
  			AH->compression = Z_DEFAULT_COMPRESSION;
--- 190,196 ----
  		 */
  		/* setvbuf(ctx->tarFH, NULL, _IONBF, 0); */
  
! 		ctx->hasSeek = checkSeek(ctx->tarFH);
  
  		if (AH->compression < 0 || AH->compression > 9)
  			AH->compression = Z_DEFAULT_COMPRESSION;
***************
*** 227,233 ****
  
  		ctx->tarFHpos = 0;
  
! 		ctx->hasSeek = (fseeko(ctx->tarFH, 0, SEEK_CUR) == 0);
  
  		/*
  		 * Forcibly unmark the header as read since we use the lookahead
--- 227,233 ----
  
  		ctx->tarFHpos = 0;
  
! 		ctx->hasSeek = checkSeek(ctx->tarFH);
  
  		/*
  		 * Forcibly unmark the header as read since we use the lookahead
#89Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Zeugswetter Andreas SB SD (#85)
Re: pg_dump and large files - is this a problem?

Zeugswetter Andreas SB SD wrote:

The question is *which* seek APIs we need to support. Are there any
besides fseeko() and fgetpos()?

On AIX we have
int fseeko64 (FILE* Stream, off64_t Offset, int Whence);
which is intended for large file access for programs that do NOT
#define _LARGE_FILES

It is functionality that is available if _LARGE_FILE_API is defined,
which is the default if _LARGE_FILES is not defined.

That would have been my preferred way of handling large files on AIX
in the two/three? places that need it (pg_dump/restore, psql and backend COPY).
This would have had the advantage that off_t is not 64 bit in all other places
where it is actually not needed, no ?

OK, I am focusing on AIX now. I don't think we can go down the road of
saying where large file support is needed or not needed. I think for
each platform either we support large files or we don't. Is there a way
to have off_t be 64 bits everywhere, and if it is, why wouldn't we just
enable that rather than poke around figuring out where it is needed?

Also, I have the open item:

Fix AIX + Large File + Flex problem

Is there an AIX problem with Flex?

#90Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#88)
Re: pg_dump and large files - is this a problem?

The patch will not work. Please reread my quoted email.

At 09:32 PM 24/10/2002 -0400, Bruce Momjian wrote:

Philip Warner wrote:

You are quite correct. It should read:

#ifdef HAVE_FSEEKO
ctx->hasSeek = fseeko(...,SEEK_SET);
#else
ctx->hasSeek = FALSE;
#endif

pipes are the main case for which we are checking.

OK, I have applied the following patch to set hasSeek only if
fseek/fseeko is reliable.


#91Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#90)
Re: pg_dump and large files - is this a problem?

You are going to have to be more specific than that.

---------------------------------------------------------------------------

Philip Warner wrote:

The patch will not work. Please reread my quoted email.

At 09:32 PM 24/10/2002 -0400, Bruce Momjian wrote:

Philip Warner wrote:

You are quite correct. It should read:

#ifdef HAVE_FSEEKO
ctx->hasSeek = fseeko(...,SEEK_SET);
#else
ctx->hasSeek = FALSE;
#endif

pipes are the main case for which we are checking.

OK, I have applied the following patch to set hasSeek only if
fseek/fseeko is reliable.


#92Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#91)
Re: pg_dump and large files - is this a problem?

At 09:56 PM 24/10/2002 -0400, Bruce Momjian wrote:

You are quite correct. It should read:

#ifdef HAVE_FSEEKO
ctx->hasSeek = fseeko(...,SEEK_SET);

^^^^^^^^^^^^^^^^^^^^^^

#else
ctx->hasSeek = FALSE;
#endif


#93Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#89)
Re: pg_dump and large files - is this a problem?

At 09:38 PM 24/10/2002 -0400, Bruce Momjian wrote:

OK, I am focusing on AIX now. I don't think we can go down the road of
saying where large file support is needed or not needed. I think for
each platform either we support large files or we don't.

Rather than having a different patch file for each platform and refusing to
code fseek/tell because we can't do SEEK_CUR, why not check for FSEEKO64
and revert to a simple solution:

#ifdef HAVE_FSEEKO64
#define FSEEK fseeko64
#define FTELL ftello64
#define FILE_OFFSET off64_t
#else
#ifdef HAVE_FSEEKO
#define FSEEK fseeko
#define FTELL ftello
#define FILE_OFFSET off_t
#else
#if HAVE_FSEEK_BETTER_THAN_32_BIT
#define FSEEK FSEEK_BETTER_THAN_32_BIT
#define FTELL FTELL_BETTER_THAN_32_BIT
#define FILE_OFFSET FILE_OFFSET_BETTER_THAN_32_BIT
#else
#if sizeof(off_t) > sizeof(long)
#define IGNORE_FSEEK
#else
#define FSEEK fseek
#define FTELL ftell
#define FILE_OFFSET long
#end if...

Then use a correct checkSeek which also checks IGNORE_FSEEK.

AFAICT, this *will* do the job on all systems discussed. And we can
certainly skip the HAVE_FSEEK_BETTER_THAN_32_BIT bit, but coding a trivial
seek/tell pair for fsetpos/fgetpos is easy, even in a macro.
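
Dropping the BETTER_THAN_32_BIT branch and the sizeof test (which, as noted elsewhere in the thread, cpp cannot evaluate), the cascade compiles down to something like this sketch. The FSEEK/FTELL/pg_file_offset names are illustrative, not actual pg_dump identifiers:

```c
#include <stdio.h>

/* Illustrative sketch of the proposed cascade.  HAVE_FSEEKO64 and
 * HAVE_FSEEKO would come from configure; the final fallback is plain
 * fseek/ftell with a long offset. */
#if defined(HAVE_FSEEKO64)
#define FSEEK(fp, off, whence)	fseeko64((fp), (off), (whence))
#define FTELL(fp)				ftello64(fp)
typedef off64_t pg_file_offset;
#elif defined(HAVE_FSEEKO)
#define FSEEK(fp, off, whence)	fseeko((fp), (off), (whence))
#define FTELL(fp)				ftello(fp)
typedef off_t pg_file_offset;
#else
#define FSEEK(fp, off, whence)	fseek((fp), (off), (whence))
#define FTELL(fp)				ftell(fp)
typedef long pg_file_offset;
#endif
```

Callers then use FSEEK/FTELL/pg_file_offset throughout and never name the underlying function or offset type directly.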


#94Philip Warner
pjw@rhyme.com.au
In reply to: Philip Warner (#93)
Re: pg_dump and large files - is this a problem?

I just reread the patch; is it valid to assume fseek and fseeko have the
same failure modes? Or does the call to 'fseek' actually call fseeko?


#95Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#92)
Re: pg_dump and large files - is this a problem?

OK, finally figured it out. I had used fseek instead of fseeko.

---------------------------------------------------------------------------

Philip Warner wrote:

At 09:56 PM 24/10/2002 -0400, Bruce Momjian wrote:

You are quite correct. It should read:

#ifdef HAVE_FSEEKO
ctx->hasSeek = fseeko(...,SEEK_SET);

^^^^^^^^^^^^^^^^^^^^^^

#else
ctx->hasSeek = FALSE;
#endif


#96Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#94)
Re: pg_dump and large files - is this a problem?

Philip Warner wrote:

I just reread the patch; is it valid to assume fseek and fseeko have the
same failure modes? Or does the call to 'fseek' actually call fseeko?

The fseek was a typo. It should have been fseeko as you suggested.
CVS updated.

Your idea of using SEEK_SET is good, except I was concerned that the
checkSeek call will move the file pointer. Is that OK? It doesn't seem
appropriate.

#97Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#93)
Re: pg_dump and large files - is this a problem?

Philip Warner wrote:

Rather than having a different patch file for each platform and refusing to
code fseek/tell because we can't do SEEK_CUR, why not check for FSEEKO64
and revert to a simple solution:

#ifdef HAVE_FSEEKO64
#define FSEEK fseeko64
#define FTELL ftello64
#define FILE_OFFSET off64_t

We can do this, but there is the problem of making the code pretty ugly.
Also, it is not immediately clear when off_t is something to be used by
fseek and when it is being used in file offsets that will never be
seeked. I am concerned about perhaps making things worse than they are
now.

#else
#ifdef HAVE_FSEEKO
#define FSEEK fseeko
#define FTELL ftello
#define FILE_OFFSET off_t
#else
#if HAVE_FSEEK_BETTER_THAN_32_BIT
#define FSEEK FSEEK_BETTER_THAN_32_BIT
#define FTELL FTELL_BETTER_THAN_32_BIT
#define FILE_OFFSET FILE_OFFSET_BETTER_THAN_32_BIT
#else
#if sizeof(off_t) > sizeof(long)

Can't do sizeof() tests in cpp, which is where the #if is processed.

#define IGNORE_FSEEK
#else
#define FSEEK fseek
#define FTELL ftell
#define FILE_OFFSET long
#end if...

Then use a correct checkSeek which also checks IGNORE_FSEEK.

AFAICT, this *will* do the job on all systems discussed. And we can
certainly skip the HAVE_FSEEK_BETTER_THAN_32_BIT bit, but coding a trivial
seek/tell pair for fsetpos/fgetpos is easy, even in a macro.

I don't think we can assume that off_t can be passed to fsetpos/fgetpos
unless we know the platform supports it, unless people think fpos_t
being integral and the same size as off_t is enough.

Also, I don't think these can be done as macros; perhaps
fseeko(...,SEEK_SET) can, but not the others, and not ftello. See
port/fseeko.c for the reason.

#98Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#97)
Re: pg_dump and large files - is this a problem?

At 12:07 AM 25/10/2002 -0400, Bruce Momjian wrote:

I don't think we can assume that off_t can be passed to fset/getpos
unless we know the platform supports it, unless people think fpos_t
being integral and the same size as fpos_t is enough.

We don't need to. We would #define FILE_OFFSET as fpos_t in that case.

Also, I don't think these can be done a macro, perhaps
fseeko(...,SEEK_SET), but not the others, and not ftello. See
port/fseeko.c for the reason.

My understanding was that you could define a block and declare variables in
a macro; just use a local for the temp storage of the in/out args. If this
is the only thing stopping you adopting this approach, then I am very happy
to try to code the macros properly.
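
For instance, a do/while(0) block gives the macro its own scope for an fpos_t temporary. This sketch (hypothetical macro name) rests on a loud assumption: the byte layout of fpos_t must begin with an off_t-style integer offset, as Giles reported for NetBSD, and is undefined ground elsewhere:

```c
#include <stdio.h>
#include <string.h>

/* Sketch of the macro technique: the do/while(0) block declares a
 * local fpos_t for temporary storage of the out argument.
 * ASSUMPTION: the leading bytes of fpos_t hold the file offset in
 * off_t format (true where "fpos_t is the same format as off_t");
 * on other platforms this macro must not be used. */
#define FTELL_VIA_FGETPOS(fp, offset_var) \
	do { \
		fpos_t	_tmp_pos; \
		if (fgetpos((fp), &_tmp_pos) == 0) \
			memcpy(&(offset_var), &_tmp_pos, sizeof(offset_var)); \
		else \
			(offset_var) = -1; \
	} while (0)
```

The local _tmp_pos is exactly the "temp storage of the in/out args" described above; the caller's variable never has to be an fpos_t itself.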

However, I do get the impression that there is more resistance to the idea
than just this.


#99Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#96)
Re: pg_dump and large files - is this a problem?

At 11:51 PM 24/10/2002 -0400, Bruce Momjian wrote:

Your idea of using SEEK_SET is good, except I was concerned that the
checkSeek call will move the file pointer. Is that OK? It doesn't seem
appropriate.

The call is made just after the file is opened (or it should be!), so
SEEK_SET, 0 will not be a problem.


#100Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Philip Warner (#99)
Re: pg_dump and large files - is this a problem?

The question is *which* seek APIs we need to support. Are there any
besides fseeko() and fgetpos()?

On AIX we have
int fseeko64 (FILE* Stream, off64_t Offset, int Whence);
which is intended for large file access for programs that do NOT
#define _LARGE_FILES

It is functionality that is available if _LARGE_FILE_API is defined,
which is the default if _LARGE_FILES is not defined.

That would have been my preferred way of handling large files on AIX
in the two/three? places that need it (pg_dump/restore, psql and backend COPY).
This would have had the advantage that off_t is not 64 bit in all other places
where it is actually not needed, no ?

OK, I am focusing on AIX now. I don't think we can go down the road of
saying where large file support is needed or not needed. I think for
each platform either we support large files or we don't. Is there a way
to have off_t be 64 bits everywhere, and if it is, why wouldn't we just
enable that rather than poke around figuring out where it is needed?

If _LARGE_FILES is defined, off_t is 64 bits on AIX (and fseeko works).
The problem with flex is that the generated C file does #include <unistd.h>
before we #include "postgres.h".
In this situation _LARGE_FILES is not defined for unistd.h, and unistd.h
chooses to define _LARGE_FILE_API; those two are not compatible.

If a general off_t of 64 bits is no performance problem, we should focus
on fixing the #include <unistd.h> issue, and forget what I wanted/hinted.
Peter E. has a patch for this in his pipeline. I can give it a second try
tomorrow.

Sorry for the late answer, I am very pressed currently :-(
Andreas

#101Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#100)
Re: pg_dump and large files - is this a problem?

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

The problem with flex is, that the generated c file does #include <unistd.h>
before we #include "postgres.h".
In this situation _LARGE_FILES is not defined for unistd.h and unistd.h
chooses to define _LARGE_FILE_API, those two are not compatible.

Yeah. AFAICS the only way around this is to avoid doing any I/O
operations in the flex-generated files. Fortunately, that's not much
of a restriction.

regards, tom lane

#102Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#101)
Re: pg_dump and large files - is this a problem?

The problem with flex is, that the generated c file does #include <unistd.h>
before we #include "postgres.h".
In this situation _LARGE_FILES is not defined for unistd.h and unistd.h
chooses to define _LARGE_FILE_API, those two are not compatible.

Yeah. AFAICS the only way around this is to avoid doing any I/O
operations in the flex-generated files. Fortunately, that's not much
of a restriction.

Unfortunately I do not think that is sufficient, since the problem is already
at the #include level. The compiler barfs on the second #include <unistd.h>
from postgres.h.

Andreas

#103Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#102)
Re: pg_dump and large files - is this a problem?

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

Yeah. AFAICS the only way around this is to avoid doing any I/O
operations in the flex-generated files. Fortunately, that's not much
of a restriction.

Unfortunately I do not think that is sufficient, since the problem is already
at the #include level. The compiler barfs on the second #include <unistd.h>
from postgres.h

AIX is too stupid to wrap unistd.h in an "#ifndef" to protect against
double inclusion? I suppose we could do that for them...

regards, tom lane

#104Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#103)
Re: pg_dump and large files - is this a problem?

Yeah. AFAICS the only way around this is to avoid doing any I/O
operations in the flex-generated files. Fortunately,

that's not much

of a restriction.

Unfortunately I do not think that is sufficient, since the problem is already
at the #include level. The compiler barfs on the second #include <unistd.h>
from postgres.h

AIX is too stupid to wrap unistd.h in an "#ifndef" to protect against
double inclusion? I suppose we could do that for them...

I guess that is exactly not wanted, since that would hide the actual
problem, namely that _LARGE_FILE_API gets defined (off_t --> 32bit).
Thus I think IBM did not protect unistd.h on purpose.

Andreas

#105Peter Eisentraut
peter_e@gmx.net
In reply to: Zeugswetter Andreas SB SD (#104)
Re: pg_dump and large files - is this a problem?

Zeugswetter Andreas SB SD writes:

AIX is too stupid to wrap unistd.h in an "#ifndef" to protect against
double inclusion? I suppose we could do that for them...

I guess that is exactly not wanted, since that would hide the actual
problem, namely that _LARGE_FILE_API gets defined (off_t --> 32bit).
Thus I think IBM did not protect unistd.h on purpose.

I think the problem is more accurately described thus: Flex generated
files include <stdio.h> before "postgres.h" due to the way it lays out the
code in the output. stdio.h does something which prevents switching to
the large file model later on in postgres.h. (This manifests itself in
unistd.h, but unistd.h itself is not the problem per se.)

The proposed fix was to include the flex output in some other file (such
as the corresponding grammar file) rather than to compile it separately.
The patch just needs to be tried out.
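
The ordering constraint Peter describes can be demonstrated portably with glibc's analogous knob, _FILE_OFFSET_BITS, used here as a stand-in for AIX's _LARGE_FILES; the point is that the macro is only honored if it precedes the first system header:

```c
/* The feature-test macro must precede the FIRST system header include;
 * flex output that pulls in <stdio.h> up front forecloses the choice.
 * _FILE_OFFSET_BITS is a glibc stand-in for AIX's _LARGE_FILES. */
#define _FILE_OFFSET_BITS 64	/* must come before any #include */
#include <stdio.h>
#include <sys/types.h>

size_t
off_t_width(void)
{
	return sizeof(off_t);		/* 8 once the large-file model is in effect */
}
```

Had stdio.h already been included, the #define would arrive too late, which is exactly the failure mode of compiling the flex output standalone.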


#106Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#105)
Re: pg_dump and large files - is this a problem?

Peter Eisentraut <peter_e@gmx.net> writes:

The proposed fix was to include the flex output in some other file (such
as the corresponding grammar file) rather than to compile it separately.

Seems like a reasonable solution. Can you make that happen in the next
day or two? If not, I'll take a whack at it ...

regards, tom lane

#107Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#105)
Re: pg_dump and large files - is this a problem?

Peter Eisentraut <peter_e@gmx.net> writes:

I think the problem is more accurately described thus: Flex generated
files include <stdio.h> before "postgres.h" due to the way it lays out the
code in the output. stdio.h does something which prevents switching to
the large file model later on in postgres.h. (This manifests itself in
unistd.h, but unistd.h itself is not the problem per se.)

The proposed fix was to include the flex output in some other file (such
as the corresponding grammar file) rather than to compile it separately.

I have made this change. CVS tip should compile cleanly now on machines
where this is an issue.

regards, tom lane

#108Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#107)
1 attachment(s)
Re: pg_dump and large files - is this a problem?

Tom Lane writes:

I think the problem is more accurately described thus: Flex generated
files include <stdio.h> before "postgres.h" due to the way it lays out the
code in the output. stdio.h does something which prevents switching to
the large file model later on in postgres.h. (This manifests itself in
unistd.h, but unistd.h itself is not the problem per se.)

The proposed fix was to include the flex output in some other file (such
as the corresponding grammar file) rather than to compile it separately.

I have made this change. CVS tip should compile cleanly now on machines
where this is an issue.

Hmm, sorry for the late response, but I was away for the (long) weekend :-(
I think your patch might be the source of Christopher's build problem
(Compile problem on FreeBSD/Alpha).

Peter already had a patch, which I tested, modified a little, and sent back
to him for inclusion in CVS.

I will attach his patch with my small fixes for cross reference.
The issue is that you need to remove the #include "bootstrap_tokens.h"
line from the lex file.

Andreas

Attachments:

flex-patch2.gz (application/x-gzip)
#109Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#108)
Re: pg_dump and large files - is this a problem?

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

The issue is, that you need to remove the #include "bootstrap_tokens.h"
line from the lex file.

Good point; I'm surprised gcc doesn't spit up on that. I've made that
mod and also added the inclusion-order-correction in pqsignal.c.

regards, tom lane

#110Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#109)
Re: pg_dump and large files - is this a problem?

Does this resolve our AIX compile problem?

---------------------------------------------------------------------------

Tom Lane wrote:

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

The issue is, that you need to remove the #include "bootstrap_tokens.h"
line from the lex file.

Good point; I'm surprised gcc doesn't spit up on that. I've made that
mod and also added the inclusion-order-correction in pqsignal.c.

regards, tom lane

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073