Large file support available

Started by Peter Eisentraut over 23 years ago, 18 messages
#1Peter Eisentraut
peter_e@gmx.net

Large file support is now compiled by default if available. (Use
--disable-largefile to turn it off. That's what Autoconf gives us.)

But:

The zlib library uses unsigned ints and unsigned longs for file positions
and offsets. Depending on how that is used in detail and depending on how
zlib itself is compiled, this may or may not work.

The tar file format (POSIX and traditional) has an inherent limitation on
the size of the member files of 2^33 bytes (pg_dump currently only handles
2^30). The result in that case continues to be a broken archive. The GNU
tar format has an extension that would handle 2^89 bytes. This may be
something interesting to work on.

--
Peter Eisentraut peter_e@gmx.net

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#1)
Re: Large file support available

Peter Eisentraut <peter_e@gmx.net> writes:

Large file support is now compiled by default if available.

I am now getting (on HPUX 10.20)

/usr/include/sys/resource.h: In function `getrlimit':
/usr/include/sys/resource.h:168: warning: implicit declaration of function `__getrlimit64'
/usr/include/sys/resource.h: In function `setrlimit':
/usr/include/sys/resource.h:170: warning: implicit declaration of function `__setrlimit64'

for essentially every file in the system. A little digging shows that
this is happening because _FILE64 is defined and _LARGEFILE64_SOURCE
is not; this is evidently a Bad Idea on HPUX.

Further digging shows that noplace in the standard headers is
_LARGEFILE64_SOURCE #define'd, so evidently one is supposed to supply it
from user headers. Ugh. Please add this to the list of
platform-specific symbols that had better be turned on to support large
files.

regards, tom lane

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#1)
Re: Large file support available

Also, even with configure --disable-largefile, I find that pg_config.h
still contains

/* Define to 1 to make fseeko visible on some hosts. */
#define _LARGEFILE_SOURCE 1

/* Define to 1 if fseeko (and presumably ftello) exists and is declared. */
#define HAVE_FSEEKO 1

This strikes me as probably Not a Good Thing, although I haven't dug to
see what the implications are.

regards, tom lane

#4Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Peter Eisentraut (#1)
Re: Large file support available

Large file support is now compiled by default if available. (Use
--disable-largefile to turn it off. That's what Autoconf gives us.)

Are you sure that the backend performs better with large files than
with 1GB segmented files (I mean large file support with
LET_OS_MANAGE_FILESIZE turned on)? I have not tried it myself, but a
Linux kernel hacker I know raised this question some time ago.
--
Tatsuo Ishii

#5Peter Eisentraut
peter_e@gmx.net
In reply to: Tatsuo Ishii (#4)
Re: Large file support available

Tatsuo Ishii writes:

Are you sure that the backend performs better with large files than
with 1GB segmented files (I mean large file support with
LET_OS_MANAGE_FILESIZE turned on)?

No idea. My change only enables access to large files, it doesn't change
the segmentation logic in the backend. The main use at this point is for
pg_dump-related activities.

In fact, while the large file support API can handle 64-bit offsets, its
availability and use don't guarantee that the file system will support any
particular file size. So the segmentation logic in the backend isn't
going anywhere, as far as I'm concerned.

--
Peter Eisentraut peter_e@gmx.net

#6Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#3)
Re: Large file support available

Tom Lane writes:

Also, even with configure --disable-largefile, I find that pg_config.h
still contains

/* Define to 1 to make fseeko visible on some hosts. */
#define _LARGEFILE_SOURCE 1

/* Define to 1 if fseeko (and presumably ftello) exists and is declared. */
#define HAVE_FSEEKO 1

This strikes me as probably Not a Good Thing, although I haven't dug to
see what the implications are.

This is harmless (until proven otherwise). fseeko() is identical to
fseek() except that the offset argument uses off_t, and _LARGEFILE_SOURCE
makes fseeko() and friends visible in the headers. That's all. No large
files involved.

--
Peter Eisentraut peter_e@gmx.net

#7Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#2)
Re: Large file support available

Tom Lane writes:

/usr/include/sys/resource.h: In function `getrlimit':
/usr/include/sys/resource.h:168: warning: implicit declaration of function `__getrlimit64'
/usr/include/sys/resource.h: In function `setrlimit':
/usr/include/sys/resource.h:170: warning: implicit declaration of function `__setrlimit64'

for essentially every file in the system. A little digging shows that
this is happening because _FILE64 is defined and _LARGEFILE64_SOURCE
is not; this is evidently a Bad Idea on HPUX.

You're supposed to define _LARGEFILE64_SOURCE if you want to use functions
like open64(), fseeko64(), getrlimit64(), etc. in your source. We don't
want those, obviously.

What is happening here is that evidently the system headers effectively
redefine getrlimit() to point to getrlimit64() if _FILE_OFFSET_BITS=64,
which is the usual strategy for all the I/O functions. But you're not
supposed to have to define _LARGEFILE64_SOURCE for this, because the
change is supposed to be transparent.

If the {s|g}etrlimit warnings are indeed the only ones (i.e., none about
open, fseek, write, read, etc.) then this is either a bug or there's
something wrong in the include file order or something like that. Which
way is sys/resource.h included anyway?

If there's no way to fix it then we can add a definition of
_LARGEFILE64_SOURCE to hpux.h and consider further action.

--
Peter Eisentraut peter_e@gmx.net

#8Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Peter Eisentraut (#6)
Re: Large file support available

Peter Eisentraut wrote:

Tom Lane writes:

Also, even with configure --disable-largefile, I find that pg_config.h
still contains

/* Define to 1 to make fseeko visible on some hosts. */
#define _LARGEFILE_SOURCE 1

/* Define to 1 if fseeko (and presumably ftello) exists and is declared. */
#define HAVE_FSEEKO 1

This strikes me as probably Not a Good Thing, although I haven't dug to
see what the implications are.

This is harmless (until proven otherwise). fseeko() is identical to
fseek() except that the offset argument uses off_t, and _LARGEFILE_SOURCE
makes fseeko() and friends visible in the headers. That's all. No large
files involved.

I am confused. fseeko() doesn't look standard to me. I thought
fgetpos/fsetpos() were the standard interfaces for large file support;
from BSD/OS:

The fgetpos(), fsetpos(), fseek(), ftell(), and rewind() functions
conform to ANSI C X3.159-1989 (``ANSI C'').

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#7)
Re: Large file support available

Peter Eisentraut <peter_e@gmx.net> writes:

If the {s|g}etrlimit warnings are indeed the only ones (i.e., none about
open, fseek, write, read, etc.) then this is either a bug or there's
something wrong in the include file order or something like that.

No such luck. Here's a more complete excerpt of one typical failure:

gcc -O1 -Wall -Wmissing-prototypes -Wmissing-declarations -g -I../../../../src/include -c -o tuptoaster.o tuptoaster.c
In file included from /usr/include/sys/wait.h:83,
from /usr/local/lib/gcc-lib/hppa2.0-hp-hpux10.20/2.95.3/include/stdlib.h:231,
from ../../../../src/include/c.h:56,
from ../../../../src/include/postgres.h:47,
from tuptoaster.c:25:
/usr/include/sys/resource.h: In function `getrlimit':
/usr/include/sys/resource.h:168: warning: implicit declaration of function `__getrlimit64'
/usr/include/sys/resource.h: In function `setrlimit':
/usr/include/sys/resource.h:170: warning: implicit declaration of function `__setrlimit64'
In file included from /usr/include/unistd.h:11,
from tuptoaster.c:27:
/usr/include/sys/unistd.h: In function `truncate':
/usr/include/sys/unistd.h:539: warning: implicit declaration of function `__truncate64'
/usr/include/sys/unistd.h: In function `prealloc':
/usr/include/sys/unistd.h:543: warning: implicit declaration of function `__prealloc64'
/usr/include/sys/unistd.h: In function `lockf':
/usr/include/sys/unistd.h:544: warning: implicit declaration of function `__lockf64'
/usr/include/sys/unistd.h: In function `ftruncate':
/usr/include/sys/unistd.h:545: warning: implicit declaration of function `__ftruncate64'
In file included from /usr/include/fcntl.h:9,
from tuptoaster.c:28:
/usr/include/sys/fcntl.h: In function `open':
/usr/include/sys/fcntl.h:216: warning: implicit declaration of function `__open64'
/usr/include/sys/fcntl.h: In function `creat':
/usr/include/sys/fcntl.h:217: warning: implicit declaration of function `__creat64'

AFAICT a *lot* of HPUX headers expect you to #define _LARGEFILE64_SOURCE
if you want this stuff to work.

regards, tom lane

#10Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Bruce Momjian (#8)
Re: Large file support available

Peter, I have received no reply to this question.

---------------------------------------------------------------------------

Bruce Momjian wrote:

Peter Eisentraut wrote:

Tom Lane writes:

Also, even with configure --disable-largefile, I find that pg_config.h
still contains

/* Define to 1 to make fseeko visible on some hosts. */
#define _LARGEFILE_SOURCE 1

/* Define to 1 if fseeko (and presumably ftello) exists and is declared. */
#define HAVE_FSEEKO 1

This strikes me as probably Not a Good Thing, although I haven't dug to
see what the implications are.

This is harmless (until proven otherwise). fseeko() is identical to
fseek() except that the offset argument uses off_t, and _LARGEFILE_SOURCE
makes fseeko() and friends visible in the headers. That's all. No large
files involved.

I am confused. fseeko() doesn't look standard to me. I thought
fgetpos/fsetpos() were the standard interfaces for large file support;
from BSD/OS:

The fgetpos(), fsetpos(), fseek(), ftell(), and rewind() functions
conform to ANSI C X3.159-1989 (``ANSI C'').

#11Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Bruce Momjian (#10)
Re: Large file support available

OK, with no one replying to this, I will take it upon myself to resolve
this. According to the Mac OSX fseek() manual page:

The fgetpos(), fsetpos(), fseek(), ftell(), and rewind() functions
conform to ANSI X3.159-1989 (``ANSI C'').

The fseeko() and ftello() functions conform to Version 2 of the
Single UNIX Specification (``SUSv2'').

which basically says that we should be using fseek or preferably
fsetpos, not fseeko. I realize that the advantage of fseeko is that it
has the same API as fseek, but if we are going to fix this, we may as
well do it right and use fgetpos if we have it.

Is there anyone who has fseeko() but _not_ fsetpos()?

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#11)
Re: Large file support available

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Is there anyone who has fseeko() but _not_ fsetpos()?

AFAICT this is completely irrelevant to large files.

regards, tom lane

#13Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#12)
Re: Large file support available

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Is there anyone who has fseeko() but _not_ fsetpos()?

AFAICT this is completely irrelevant to large files.

Please explain. At:

http://man.dnswatch.com/cgi-bin/htmlman?fseek+3

I see:

The fseeko() function is identical to fseek(), except it takes an off_t
argument instead of a long. Likewise, the ftello() function is identical
to ftell(), except it returns an off_t.

while fsetpos() is:

fsetpos(FILE *stream, const fpos_t *pos);

and presumably fpos_t handles long files too.

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#13)
Re: Large file support available

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I see:
The fseeko() function is identical to fseek(), except it takes an off_t
argument instead of a long. Likewise, the ftello() function is identical
to ftell(), except it returns an off_t.

Indeed. Notice the complete lack of any commitment about the size of
off_t ...

while fsetpos() is:
fsetpos(FILE *stream, const fpos_t *pos);

... or the size of fpos_t.

You might find it illuminating to read this random extract from the
HPUX 10.20 man pages:

NAME
fgetpos64(), fopen64(), freopen64(), fseeko64(), fsetpos64(),
fstatvfsdev64(), ftello64(), ftw64(), nftw64(), statvfsdev64(),
tmpfile64() - non-POSIX standard API interfaces to support large
files.

DESCRIPTION
New API's to support large files. These API interfaces are not a part
of the POSIX standard and may be removed in the future.

fgetpos64() The fgetpos64() function is identical to
fgetpos() except that fgetpos64() returns the
position in a fpos64_t instead of a fpos_t. All
other functional behaviors, returns, and errors
are identical.

... etc ...

I don't see any reason to believe that fgetpos buys us anything but
notational inconvenience. It certainly doesn't buy large file support,
at least not without the same behind-the-scenes redefinitions needed for
fseek/fseeko and friends...

regards, tom lane

#15Mark Kirkwood
markir@slingshot.co.nz
In reply to: Bruce Momjian (#11)
Re: Large file support available

Bruce Momjian wrote:

OK, with no one replying to this, I will take it upon myself to resolve
this. According to the Mac OSX fseek() manual page:

The fgetpos(), fsetpos(), fseek(), ftell(), and rewind() functions
conform to ANSI X3.159-1989 (``ANSI C'').

The fseeko() and ftello() functions conform to Version 2 of the
Single UNIX Specification (``SUSv2'').

I might be veering *slightly* off the topic here, but since I got bitten
by this recently I thought I would mention it:

On Linux, I found that I needed

#include <asm/fcntl.h>
instead of
#include <fcntl.h>

when using lseek. I had expected defining _FILE_OFFSET_BITS=64 to sort
this out (which it did not).

I think that this will only be an issue if folk want relation files to
be chunked at > 2G (or want to define LET_OS_MANAGE_FILESIZE).

best wishes

Mark

#16Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#14)
Re: Large file support available

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I see:
The fseeko() function is identical to fseek(), except it takes an off_t
argument instead of a long. Likewise, the ftello() function is identical
to ftell(), except it returns an off_t.

Indeed. Notice the complete lack of any commitment about the size of
off_t ...

while fsetpos() is:
fsetpos(FILE *stream, const fpos_t *pos);

... or the size of fpos_t.

You might find it illuminating to read this random extract from the
HPUX 10.20 man pages:

...

I don't see any reason to believe that fgetpos buys us anything but
notational inconvenience. It certainly doesn't buy large file support,
at least not without the same behind-the-scenes redefinitions needed for
fseek/fseeko and friends...

Clearly there is the issue that fseek uses long, which isn't enough for
large file support. On BSD/OS, we have fsetpos, which is the way we do
large file support:

int
fseek(FILE *stream, long offset, int whence);

int
fsetpos(FILE *stream, const fpos_t *pos);

My point is that it seems fsetpos is the approved way of accessing large
files, rather than fseeko. In fact, I don't have fseeko here but I do
have fsetpos, and it does handle large files because my includes have
this:

typedef off_t fpos_t;
typedef quad_t off_t;

#17Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#16)
Re: Large file support available

Bruce Momjian writes:

My point is that it seems fsetpos is the approved way of accessing large
files, rather than fseeko. In fact, I don't have fseeko here but I do
have fsetpos, and it does handle large files because my includes have
this:

typedef off_t fpos_t;
typedef quad_t off_t;

Interesting. In general, you can't rely on fpos_t being an integral type,
which indeed on my machine it isn't. But for pg_dump we need an integral
type because we do offset arithmetic.

--
Peter Eisentraut peter_e@gmx.net

#18Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Peter Eisentraut (#17)
Re: Large file support available

Peter Eisentraut wrote:

Bruce Momjian writes:

My point is that it seems fsetpos is the approved way of accessing large
files, rather than fseeko. In fact, I don't have fseeko here but I do
have fsetpos, and it does handle large files because my includes have
this:

typedef off_t fpos_t;
typedef quad_t off_t;

Interesting. In general, you can't rely on fpos_t being an integral type,
which indeed on my machine it isn't. But for pg_dump we need an integral
type because we do offset arithmetic.

Oh, is that why fsetpos is always SEEK_SET and not SEEK_CUR or offset
stuff? Strange that I don't have fseeko but do have large file support.
BSD/OS has had it for years. I guess they just do off_t arithmetic, but
that isn't portable.
