64-bit API for large object

Started by Tatsuo Ishii over 13 years ago · 49 messages
#1 Tatsuo Ishii
ishii@postgresql.org

Hi,

I found this in the TODO list:

Add API for 64-bit large object access

If this is still a valid TODO item and nobody is working on it, I
would like to work on it.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#2 Peter Eisentraut
peter_e@gmx.net
In reply to: Tatsuo Ishii (#1)
Re: 64-bit API for large object

On Wed, 2012-08-22 at 07:27 +0900, Tatsuo Ishii wrote:

I found this in the TODO list:

Add API for 64-bit large object access

If this is still a valid TODO item and nobody is working on it, I
would like to work on it.

Large objects are limited to 2 GB in size, so a 64-bit API doesn't sound
very useful to me at the moment.

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#2)
Re: 64-bit API for large object

Peter Eisentraut <peter_e@gmx.net> writes:

On Wed, 2012-08-22 at 07:27 +0900, Tatsuo Ishii wrote:

I found this in the TODO list:
Add API for 64-bit large object access
If this is still a valid TODO item and nobody is working on it, I
would like to work on it.

Large objects are limited to 2 GB in size, so a 64-bit API doesn't sound
very useful to me at the moment.

Not entirely. pg_largeobject.pageno is int32, but that's still 2G pages
not bytes, so there's three or so orders of magnitude that could be
gotten by expanding the client-side API before we'd have to change the
server's on-disk representation.

There might well be some local variables in the server's largeobject
code that would need to be widened, but that's the easiest part of the
job.

regards, tom lane

#4 Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#3)
Re: 64-bit API for large object

Large objects are limited to 2 GB in size, so a 64-bit API doesn't sound
very useful to me at the moment.

Not entirely. pg_largeobject.pageno is int32, but that's still 2G pages
not bytes, so there's three or so orders of magnitude that could be
gotten by expanding the client-side API before we'd have to change the
server's on-disk representation.

Right. You have already explained that in this:
http://archives.postgresql.org/pgsql-hackers/2010-09/msg01888.php

There might well be some local variables in the server's largeobject
code that would need to be widened, but that's the easiest part of the
job.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#5 Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#3)
Re: 64-bit API for large object

On Wed, 2012-08-22 at 01:14 -0400, Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

On Wed, 2012-08-22 at 07:27 +0900, Tatsuo Ishii wrote:

I found this in the TODO list:
Add API for 64-bit large object access
If this is still a valid TODO item and nobody is working on it, I
would like to work on it.

Large objects are limited to 2 GB in size, so a 64-bit API doesn't sound
very useful to me at the moment.

Not entirely. pg_largeobject.pageno is int32, but that's still 2G pages
not bytes, so there's three or so orders of magnitude that could be
gotten by expanding the client-side API before we'd have to change the
server's on-disk representation.

Well then a 64-bit API would be very useful. Go for it. :-)

#6 Tatsuo Ishii
ishii@postgresql.org
In reply to: Peter Eisentraut (#5)
Re: 64-bit API for large object

On Wed, 2012-08-22 at 01:14 -0400, Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

On Wed, 2012-08-22 at 07:27 +0900, Tatsuo Ishii wrote:

I found this in the TODO list:
Add API for 64-bit large object access
If this is still a valid TODO item and nobody is working on it, I
would like to work on it.

Large objects are limited to 2 GB in size, so a 64-bit API doesn't sound
very useful to me at the moment.

Not entirely. pg_largeobject.pageno is int32, but that's still 2G pages
not bytes, so there's three or so orders of magnitude that could be
gotten by expanding the client-side API before we'd have to change the
server's on-disk representation.

Well then a 64-bit API would be very useful. Go for it. :-)

Ok, I will do it.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#7 Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#1)
Re: 64-bit API for large object

Hi,

I found this in the TODO list:

Add API for 64-bit large object access

If this is still a valid TODO item and nobody is working on it, I
would like to work on it.

Here is the list of functions I think we need to change.

1) Frontend lo_* libpq functions (fe-lobj.c)

lo_initialize() needs to get the OIDs of the backend's 64-bit large
object handling functions, namely lo_lseek64, lo_tell64, lo_truncate64,
loread64 and lowrite64 (explained later). If they are not available,
use the older 32-bit backend functions.

BTW, currently lo_initialize() throws an error if one of the OIDs is not
available. I doubt we should do the same for the 64-bit functions, since
this will make 9.3 libpq unable to access large objects stored in pre-9.2
PostgreSQL servers.

2) Backend lo_* functions (be-fsstubs.c)

Add lo_lseek64, lo_tell64, lo_truncate64, loread64 and lowrite64 so
that they can handle 64-bit seek positions and data lengths.

3) Backend inv_api.c functions

No need to add new functions. Just extend them to handle 64-bit data.

BTW, what will happen if an older 32-bit libpq accesses large objects
over 2GB?

lo_read and lo_write: they can read or write lobjs using the 32-bit API
as long as the requested read/write data length is smaller than 2GB. So
I think we can safely allow them to access lobjs over 2GB.

lo_lseek: again, as long as the requested offset is smaller than 2GB,
there would be no problem.

lo_tell: if the current seek position is beyond 2GB, it returns an error.

Comments, suggestions?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#7)
Re: 64-bit API for large object

Tatsuo Ishii <ishii@postgresql.org> writes:

Here are the list of functions think we need to change.

1) Frontend lo_* libpq functions(fe-lobj.c)

lo_initialize() need to get backend 64-bit large object handling
function's oid, namely lo_lseek64, lo_tell64, lo_truncate64, loread64,
lowrite64(explained later). If they are not available, use older
32-bit backend functions.

I don't particularly see a need for loread64 or lowrite64. Who's going
to be reading or writing more than 2GB at once? If someone tries,
they'd be well advised to reconsider their code design anyway.

regards, tom lane

#9 Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#8)
Re: 64-bit API for large object

1) Frontend lo_* libpq functions(fe-lobj.c)

lo_initialize() need to get backend 64-bit large object handling
function's oid, namely lo_lseek64, lo_tell64, lo_truncate64, loread64,
lowrite64(explained later). If they are not available, use older
32-bit backend functions.

I don't particularly see a need for loread64 or lowrite64. Who's going
to be reading or writing more than 2GB at once? If someone tries,
they'd be well advised to reconsider their code design anyway.

Ok, loread64 and lowrite64 will not be added.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#10 Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#9)
Re: 64-bit API for large object

Correct me if I am wrong.

After expanding the large object API to 64-bit, the max size of a large
object will be 8TB (assuming the default 8KB BLCKSZ).

large object max size = pageno(int32) * LOBLKSIZE
                      = (2^32-1) * (BLCKSZ / 4)
                      = (2^32-1) * (8192 / 4)
                      = 8TB

I just want to confirm my calculation is correct.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#10)
Re: 64-bit API for large object

Tatsuo Ishii <ishii@postgresql.org> writes:

Correct me if I am wrong.
After expanding large object API to 64-bit, the max size of a large
object will be 8TB(assuming 8KB default BLKSZ).

large object max size = pageno(int32) * LOBLKSIZE
= (2^32-1) * (BLCKSZ / 4)
= (2^32-1) * (8192/4)
= 8TB

I just want to confirm my calculation is correct.

pg_largeobject.pageno is a signed int, so I don't think we can let it go
past 2^31-1, so half that.
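
Spelled out with the default 8KB BLCKSZ, that works out to:

large object max size = (2^31 - 1) * LOBLKSIZE
                      = (2^31 - 1) * (BLCKSZ / 4)
                      = (2^31 - 1) * 2048
                      ≈ 4TB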

We could buy back the other bit if we redefined the column as oid
instead of int4 (to make it unsigned), but I think that would create
fairly considerable risk of confusion between the loid and pageno
columns (loid already being oid). I'd just as soon not go there,
at least not till we start seeing actual field complaints about
4TB being paltry ;-)

regards, tom lane

#12 Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#11)
Re: 64-bit API for large object

pg_largeobject.pageno is a signed int, so I don't think we can let it go
past 2^31-1, so half that.

We could buy back the other bit if we redefined the column as oid
instead of int4 (to make it unsigned), but I think that would create
fairly considerable risk of confusion between the loid and pageno
columns (loid already being oid). I'd just as soon not go there,
at least not till we start seeing actual field complaints about
4TB being paltry ;-)

Agreed. 4TB should be enough.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#13 Robert Haas
robertmhaas@gmail.com
In reply to: Tatsuo Ishii (#12)
Re: 64-bit API for large object

On Tue, Aug 28, 2012 at 10:51 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:

pg_largeobject.pageno is a signed int, so I don't think we can let it go
past 2^31-1, so half that.

We could buy back the other bit if we redefined the column as oid
instead of int4 (to make it unsigned), but I think that would create
fairly considerable risk of confusion between the loid and pageno
columns (loid already being oid). I'd just as soon not go there,
at least not till we start seeing actual field complaints about
4TB being paltry ;-)

Agreed. 4TB should be enough.

...for anybody!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14 Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#9)
1 attachment(s)
Re: 64-bit API for large object

Ok, here is the patch to implement the 64-bit API for large objects,
allowing large objects of up to 4TB (or 16TB if BLCKSZ is changed to
32KB). The patch is based on Jeremy Drake's patch posted on September
23, 2005
(http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php)
and reasonably updated/edited to fit PostgreSQL 9.3 by Nozomi Anzai
for the backend part and Yugo Nagata for the rest (including the
documentation patch).

Here are the changes made in the patch:

1) Frontend lo_* libpq functions (fe-lobj.c) (Yugo Nagata)

lo_initialize() gathers the OIDs of the backend's 64-bit large object
handling functions, namely lo_lseek64, lo_tell64, lo_truncate64.

If a client calls the lo_*64 functions and the backend does not support
them, the lo_*64 functions return an error to the caller. One could argue
that such calls should automatically be redirected to the older 32-bit
API; I don't know whether this is worth the trouble though.

Currently lo_initialize() throws an error if one of the OIDs is not
available. I doubt we should do the same for the 64-bit functions, since
this will make 9.3 libpq unable to access large objects stored in pre-9.2
PostgreSQL servers.

To pass a 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr
is a pointer to the 64-bit integer and the actual data is placed somewhere
else. There might be another way: add a new member to union u to store a
64-bit integer:

typedef struct
{
	int			len;
	int			isint;
	union
	{
		int		   *ptr;		/* can't use void (dec compiler barfs) */
		int			integer;
		int64		bigint;		/* 64-bit integer */
	} u;
} PQArgBlock;

I'm a little bit worried about this way because PQArgBlock is a public
interface.

Also we add a new type, "pg_int64":

#ifndef NO_PG_INT64
#define HAVE_PG_INT64 1
typedef long long int pg_int64;
#endif

in postgres_ext.h per suggestion from Tom Lane:
http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php
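
As a side note, client code could key off the new define at compile time;
a minimal sketch (lo_offset_t is only an illustrative name, not part of
the patch):

#include "postgres_ext.h"

#ifdef HAVE_PG_INT64
typedef pg_int64 lo_offset_t;	/* libpq with the 64-bit large object API */
#else
typedef int lo_offset_t;		/* older libpq: offsets must stay below 2GB */
#endif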

2) Backend lo_* functions (be-fsstubs.c) (Nozomi Anzai)

Add lo_lseek64, lo_tell64, lo_truncate64 so that they can handle 64-bit
seek positions and data lengths. loread64 and lowrite64 are not added
because if a program tries to read/write more than 2GB at once, it would
be a sign that the program needs to be re-designed anyway.

3) Backend inv_api.c functions (Nozomi Anzai)

No need to add new functions. Just extend them to handle 64-bit data.

BTW, what will happen if an older 32-bit libpq accesses large objects
over 2GB?

lo_read and lo_write: they can read or write lobjs using the 32-bit API
as long as the requested read/write data length is smaller than 2GB. So
I think we can safely allow them to access lobjs over 2GB.

lo_lseek: again, as long as the requested offset is smaller than 2GB,
there would be no problem.

lo_tell: if the current seek position is beyond 2GB, it returns an error.

4) src/test/examples/testlo64.c added for 64-bit API example (Yugo Nagata)
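
For illustration, the kind of client code this example is meant to
exercise might look like the following (a sketch only, not testlo64.c
itself; it assumes a libpq and server that provide the new 64-bit entry
points and omits most error handling):

#include <stdio.h>
#include "libpq-fe.h"
#include "libpq/libpq-fs.h"

static void
demo_lo64(PGconn *conn, Oid loid)
{
	int			fd;
	pg_int64	pos;

	fd = lo_open(conn, loid, INV_READ | INV_WRITE);
	if (fd < 0)
		return;

	/* seek 3GB into the object; the cast keeps the arithmetic in 64 bits */
	if (lo_lseek64(conn, fd, (pg_int64) 3 * 1024 * 1024 * 1024, SEEK_SET) < 0)
		fprintf(stderr, "lo_lseek64 failed: %s", PQerrorMessage(conn));

	pos = lo_tell64(conn, fd);
	printf("current offset: %lld\n", (long long) pos);

	lo_close(conn, fd);
}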

Comments and suggestions are welcome.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Attachments:

lobj64.patch.gz (application/octet-stream)
#15 Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Tatsuo Ishii (#14)
Re: 64-bit API for large object

I checked this patch. It can be applied onto the latest master branch
without any problems. My comments are below.

2012/9/11 Tatsuo Ishii <ishii@postgresql.org>:

Ok, here is the patch to implement 64-bit API for large object, to
allow to use up to 4TB large objects(or 16TB if BLCKSZ changed to
32KB). The patch is based on Jeremy Drake's patch posted on September
23, 2005
(http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php)
and reasonably updated/edited to adopt PostgreSQL 9.3 by Nozomi Anzai
for the backend part and Yugo Nagata for the rest(including
documentation patch).

Here are changes made in the patch:

1) Frontend lo_* libpq functions(fe-lobj.c)(Yugo Nagata)

lo_initialize() gathers backend 64-bit large object handling
function's oid, namely lo_lseek64, lo_tell64, lo_truncate64.

If client calls lo_*64 functions and backend does not support them,
lo_*64 functions return error to caller. There might be an argument
since calls to lo_*64 functions can automatically be redirected to
32-bit older API. I don't know this is worth the trouble though.

I think it should definitely return an error code when the user tries to
use lo_*64 functions against a v9.2 or older backend, because falling
back to the 32-bit API can raise unexpected errors if the application
intends to seek beyond 2GB.

Currently lo_initialize() throws an error if one of oids are not
available. I doubt we do the same way for 64-bit functions since this
will make 9.3 libpq unable to access large objects stored in pre-9.2
PostgreSQL servers.

It seems to me this is a case for splitting the pre-9.2 and post-9.3
behavior using a condition of "conn->sversion >= 90300".

To pass 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr
is a pointer to 64-bit integer and actual data is placed somewhere
else. There might be other way: add new member to union u to store
64-bit integer:

typedef struct
{
int len;
int isint;
union
{
int *ptr; /* can't use void (dec compiler barfs) */
int integer;
int64 bigint; /* 64-bit integer */
} u;
} PQArgBlock;

I'm a little bit worried about this way because PQArgBlock is a public
interface.

I'm inclined to add a new field to the union; that seems to me the
straightforward approach.
For example, the manner in lo_lseek64() seems confusing to me.
It sets 1 on the "isint" field even though a pointer is actually delivered.

+       argv[1].isint = 1;
+       argv[1].len = 8;
+       argv[1].u.ptr = (int *) &len;

Also we add new type "pg_int64":

#ifndef NO_PG_INT64
#define HAVE_PG_INT64 1
typedef long long int pg_int64;
#endif

in postgres_ext.h per suggestion from Tom Lane:
http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php

I'm uncertain about the context of this discussion.

Would it cause a problem if we included <stdint.h> and used int64_t
instead of the self-defined data type?

2) Backend lo_* functions (be-fsstubs.c)(Nozomi Anzai)

Add lo_lseek64, lo_tell64, lo_truncate64 so that they can handle
64-bit seek position and data length. loread64 and lowrite64 are not
added because if a program tries to read/write more than 2GB at once,
it would be a sign that the program need to be re-designed anyway.

I think it is reasonable.

3) Backend inv_api.c functions(Nozomi Anzai)

No need to add new functions. Just extend them to handle 64-bit data.

BTW , what will happen if older 32-bit libpq accesses large objects
over 2GB?

lo_read and lo_write: they can read or write lobjs using 32-bit API as
long as requested read/write data length is smaller than 2GB. So I
think we can safely allow them to access over 2GB lobjs.

lo_lseek: again as long as requested offset is smaller than 2GB, there
would be no problem.

lo_tell:if current seek position is beyond 2GB, returns an error.

Even though iteration of lo_lseek() may move the offset up to 4TB, that
also makes it impossible to use lo_tell() to obtain the current offset,
so I think it is reasonable behavior.

However, the error code is not an appropriate one.

+       if (INT_MAX < offset)
+       {
+               ereport(ERROR,
+                               (errcode(ERRCODE_UNDEFINED_OBJECT),
+                                errmsg("invalid large-object descriptor: %d", fd)));
+               PG_RETURN_INT32(-1);
+       }

According to the manpage of lseek(2)
EOVERFLOW
The resulting file offset cannot be represented in an off_t.

Please add a new error code such as ERRCODE_BLOB_OFFSET_OVERFLOW.
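
Something along these lines, for instance (only a sketch of the suggested
fix, assuming the new error code is added to errcodes.txt; the message
text is illustrative):

+       if (INT_MAX < offset)
+       {
+               ereport(ERROR,
+                               (errcode(ERRCODE_BLOB_OFFSET_OVERFLOW),
+                                errmsg("offset of large object descriptor %d exceeds the 32-bit result range", fd)));
+               PG_RETURN_INT32(-1);
+       }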

4) src/test/examples/testlo64.c added for 64-bit API example(Yugo Nagata)

Comments and suggestions are welcome.

Miscellaneous comments are below.

A regression test would be helpful. Even though there is no need to try to
create a 4TB large object, it is helpful to write some chunks around the
design boundary. Could you add some test cases that write some chunks
around the 4TB offset?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#16 Nozomi Anzai
anzai@sraoss.co.jp
In reply to: Kohei KaiGai (#15)
Re: 64-bit API for large object

3) Backend inv_api.c functions(Nozomi Anzai)

No need to add new functions. Just extend them to handle 64-bit data.

BTW , what will happen if older 32-bit libpq accesses large objects
over 2GB?

lo_read and lo_write: they can read or write lobjs using 32-bit API as
long as requested read/write data length is smaller than 2GB. So I
think we can safely allow them to access over 2GB lobjs.

lo_lseek: again as long as requested offset is smaller than 2GB, there
would be no problem.

lo_tell:if current seek position is beyond 2GB, returns an error.

Even though iteration of lo_lseek() may move the offset to 4TB, it also
makes unavailable to use lo_tell() to obtain the current offset, so I think
it is reasonable behavior.

However, error code is not an appropriate one.

+       if (INT_MAX < offset)
+       {
+               ereport(ERROR,
+                               (errcode(ERRCODE_UNDEFINED_OBJECT),
+                                errmsg("invalid large-object
descriptor: %d", fd)));
+               PG_RETURN_INT32(-1);
+       }

According to the manpage of lseek(2)
EOVERFLOW
The resulting file offset cannot be represented in an off_t.

Please add a new error code such as ERRCODE_BLOB_OFFSET_OVERFLOW.

Agreed.

--
Nozomi Anzai
SRA OSS, Inc. Japan

#17 Tatsuo Ishii
ishii@postgresql.org
In reply to: Kohei KaiGai (#15)
Re: 64-bit API for large object

To pass 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr
is a pointer to 64-bit integer and actual data is placed somewhere
else. There might be other way: add new member to union u to store
64-bit integer:

typedef struct
{
int len;
int isint;
union
{
int *ptr; /* can't use void (dec compiler barfs) */
int integer;
int64 bigint; /* 64-bit integer */
} u;
} PQArgBlock;

I'm a little bit worried about this way because PQArgBlock is a public
interface.

I'm inclined to add a new field for the union; that seems to me straight
forward approach.
For example, the manner in lo_seek64() seems to me confusable.
It set 1 on "isint" field even though pointer is delivered actually.

+       argv[1].isint = 1;
+       argv[1].len = 8;
+       argv[1].u.ptr = (int *) &len;

I have to admit that this is confusing. However, I'm worried about
changing sizeof(PQArgBlock) from a compatibility point of view. Maybe
I'm just being paranoid though.

Also we add new type "pg_int64":

#ifndef NO_PG_INT64
#define HAVE_PG_INT64 1
typedef long long int pg_int64;
#endif

in postgres_ext.h per suggestion from Tom Lane:
http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php

I'm uncertain about context of this discussion.

Does it make matter if we include <stdint.h> and use int64_t instead
of the self defined data type?

I think Tom's point is that there are tons of applications which define
their own "int64_t" (at least in 2005).
Also pg_config.h has:

#define HAVE_STDINT_H 1

and this suggests that PostgreSQL adapts to platforms which do not
have stdint.h. If so, we need to take care of such platforms anyway.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#18 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#17)
Re: 64-bit API for large object

Tatsuo Ishii <ishii@postgresql.org> writes:

To pass 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr
is a pointer to 64-bit integer and actual data is placed somewhere
else.

Yeah, I think we have to do it like that. Changing the size of
PQArgBlock would be a libpq ABI break, which IMO is sufficiently painful
to kill this whole proposal. Much better a little localized ugliness
in fe-lobj.c.

regards, tom lane

#19 Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Tom Lane (#18)
Re: 64-bit API for large object

I think Tom's point is, there are tons of applications which define
their own "int64_t" (at least in 2005).
Also pg_config.h has:

#define HAVE_STDINT_H 1

and this suggests that PostgreSQL adopts to platforms which does not
have stdint.h. If so, we need to take care of such platforms anyway.

OK, that makes it clear to me. It might be helpful to leave a source code
comment explaining why we use a self-defined datatype here.

2012/9/21 Tom Lane <tgl@sss.pgh.pa.us>:

Tatsuo Ishii <ishii@postgresql.org> writes:

To pass 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr
is a pointer to 64-bit integer and actual data is placed somewhere
else.

Yeah, I think we have to do it like that. Changing the size of
PQArgBlock would be a libpq ABI break, which IMO is sufficiently painful
to kill this whole proposal. Much better a little localized ugliness
in fe-lobj.c.

Hmm, I see. Please deliver the 64-bit integer argument by reference,
and don't forget the endianness translation here.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#20 Yugo Nagata
nagata@sraoss.co.jp
In reply to: Kohei KaiGai (#15)
Re: 64-bit API for large object

Currently lo_initialize() throws an error if one of oids are not
available. I doubt we do the same way for 64-bit functions since this
will make 9.3 libpq unable to access large objects stored in pre-9.2
PostgreSQL servers.

It seems to me the situation to split the case of pre-9.2 and post-9.3
using a condition of "conn->sversion >= 90300".

Agreed. I'll fix it like that.
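
A rough sketch of that check (a hypothetical helper with made-up names,
not the actual lo_initialize() code):

#include "postgres_ext.h"

/*
 * Sketch only: missing 64-bit function OIDs are an error only when the
 * reported server version says they should exist (9.3 or later).
 */
static int
check_lo64_oids(int server_version, Oid lseek64_oid, Oid tell64_oid,
				Oid truncate64_oid)
{
	if (server_version < 90300)
		return 0;				/* pre-9.3 server: the 64-bit entry points simply do not exist */

	if (lseek64_oid == InvalidOid ||
		tell64_oid == InvalidOid ||
		truncate64_oid == InvalidOid)
		return -1;				/* a 9.3 or later server should have provided all of them */

	return 0;
}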

4) src/test/examples/testlo64.c added for 64-bit API example(Yugo Nagata)

Comments and suggestions are welcome.

miscellaneous comments are below.

Regression test is helpful. Even though no need to try to create 4TB large
object, it is helpful to write some chunks around the design boundary.
Could you add some test cases that writes some chunks around 4TB offset.

Agreed. I'll do that.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
Yugo Nagata <nagata@sraoss.co.jp>

#21 Tatsuo Ishii
ishii@postgresql.org
In reply to: Kohei KaiGai (#19)
Re: 64-bit API for large object

I think Tom's point is, there are tons of applications which define
their own "int64_t" (at least in 2005).
Also pg_config.h has:

#define HAVE_STDINT_H 1

and this suggests that PostgreSQL adopts to platforms which does not
have stdint.h. If so, we need to take care of such platforms anyway.

OK, it makes me clear. It might be helpful a source code comment
to remain why we used self defined datatype here.

Ok.

2012/9/21 Tom Lane <tgl@sss.pgh.pa.us>:

Tatsuo Ishii <ishii@postgresql.org> writes:

To pass 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr
is a pointer to 64-bit integer and actual data is placed somewhere
else.

Yeah, I think we have to do it like that. Changing the size of
PQArgBlock would be a libpq ABI break, which IMO is sufficiently painful
to kill this whole proposal. Much better a little localized ugliness
in fe-lobj.c.

Hmm, I see. Please deliver the 64bit integer argument as reference,
and don't forget endian translations here.

I thought pqPutInt64() takes care of endianness. No?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#22 Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Tatsuo Ishii (#21)
Re: 64-bit API for large object

2012/9/21 Tatsuo Ishii <ishii@postgresql.org>:

I think Tom's point is, there are tons of applications which define
their own "int64_t" (at least in 2005).
Also pg_config.h has:

#define HAVE_STDINT_H 1

and this suggests that PostgreSQL adopts to platforms which does not
have stdint.h. If so, we need to take care of such platforms anyway.

OK, it makes me clear. It might be helpful a source code comment
to remain why we used self defined datatype here.

Ok.

2012/9/21 Tom Lane <tgl@sss.pgh.pa.us>:

Tatsuo Ishii <ishii@postgresql.org> writes:

To pass 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr
is a pointer to 64-bit integer and actual data is placed somewhere
else.

Yeah, I think we have to do it like that. Changing the size of
PQArgBlock would be a libpq ABI break, which IMO is sufficiently painful
to kill this whole proposal. Much better a little localized ugliness
in fe-lobj.c.

Hmm, I see. Please deliver the 64bit integer argument as reference,
and don't forget endian translations here.

I thought pgPutInt64() takes care of endianness. No?

It works inside of PQfn(), when isint = 1 is used with a pointer data
type. In my sense, it is a bit of a problem-specific solution.

So, I'd like to hear other people's opinions here.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#23 Tatsuo Ishii
ishii@postgresql.org
In reply to: Kohei KaiGai (#22)
Re: 64-bit API for large object

Hmm, I see. Please deliver the 64bit integer argument as reference,
and don't forget endian translations here.

I thought pgPutInt64() takes care of endianness. No?

It works inside of the PGfn(), when isint = 1 towards pointer data type.
In my sense, it is a bit problem specific solution.

So, I'd like to see other person's opinion here.

I think we cannot change this because we want to keep the counterpart
backend-side function pq_getmsgint64() as it is (the function is not
part of the patch).
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#24 Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Tatsuo Ishii (#23)
Re: 64-bit API for large object

2012/9/21 Tatsuo Ishii <ishii@postgresql.org>:

Hmm, I see. Please deliver the 64bit integer argument as reference,
and don't forget endian translations here.

I thought pgPutInt64() takes care of endianness. No?

It works inside of the PGfn(), when isint = 1 towards pointer data type.
In my sense, it is a bit problem specific solution.

So, I'd like to see other person's opinion here.

I think we cannot change this because we want to keep the counter part
backend side function pq_getmsgint64() as it is (the function is not
part of the patch).

My opinion is that lo_lseek64() and lo_tell64() should handle the endianness
translation before and after the PQfn() invocation, to avoid int64-specific
case handling inside of PQfn(), which can be called by other applications.

Am I missing something?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#25 Tatsuo Ishii
ishii@postgresql.org
In reply to: Kohei KaiGai (#24)
Re: 64-bit API for large object

I thought pgPutInt64() takes care of endianness. No?

It works inside of the PGfn(), when isint = 1 towards pointer data type.
In my sense, it is a bit problem specific solution.

So, I'd like to see other person's opinion here.

I think we cannot change this because we want to keep the counter part
backend side function pq_getmsgint64() as it is (the function is not
part of the patch).

My opinion is lo_lseek64() and lo_tell64() should handle endian translation
prior and next to PQfn() invocation; to avoid the int64 specific case-handling
inside of PQfn() that can be called by other applications.

Am I missing something?

So what do you want to do with pq_getmsgint64()? It does exactly the
same thing as pqPutInt64(), just in the opposite direction. Do you want
to change pq_getmsgint64()? Or add a new function in the backend?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#26 Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Tatsuo Ishii (#25)
Re: 64-bit API for large object

2012/9/21 Tatsuo Ishii <ishii@postgresql.org>:

I thought pgPutInt64() takes care of endianness. No?

It works inside of the PGfn(), when isint = 1 towards pointer data type.
In my sense, it is a bit problem specific solution.

So, I'd like to see other person's opinion here.

I think we cannot change this because we want to keep the counter part
backend side function pq_getmsgint64() as it is (the function is not
part of the patch).

My opinion is lo_lseek64() and lo_tell64() should handle endian translation
prior and next to PQfn() invocation; to avoid the int64 specific case-handling
inside of PQfn() that can be called by other applications.

Am I missing something?

So what do you want to do with pq_getmsgint64()? It exactly does the
same thing as pqPutInt64(), just in opposit direction. Do you want to
change pq_getmsgint64()? Or add new function in backend?

My preference is that nothing is changed in either pq_getmsgint64() in the
backend or the routines under PQfn() in libpq. Couldn't we deliver the int64
value "after" the endianness translation on the caller side?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#27 Tatsuo Ishii
ishii@postgresql.org
In reply to: Kohei KaiGai (#26)
Re: 64-bit API for large object

2012/9/21 Tatsuo Ishii <ishii@postgresql.org>:

I thought pgPutInt64() takes care of endianness. No?

It works inside of the PGfn(), when isint = 1 towards pointer data type.
In my sense, it is a bit problem specific solution.

So, I'd like to see other person's opinion here.

I think we cannot change this because we want to keep the counter part
backend side function pq_getmsgint64() as it is (the function is not
part of the patch).

My opinion is lo_lseek64() and lo_tell64() should handle endian translation
prior and next to PQfn() invocation; to avoid the int64 specific case-handling
inside of PQfn() that can be called by other applications.

Am I missing something?

So what do you want to do with pq_getmsgint64()? It exactly does the
same thing as pqPutInt64(), just in opposit direction. Do you want to
change pq_getmsgint64()? Or add new function in backend?

My preference is nothing are changed both pg_getmsgint64() of the backend
and routines under PQfn() of the libpq. Isn't it unavailable to deliver int64-
value "after" the endian translation on the caller side?

I am confused.

My opinion is lo_lseek64() and lo_tell64() should handle endian translation
prior and next to PQfn() invocation; to avoid the int64 specific case-handling
inside of PQfn() that can be called by other applications.

Why do we need this? If PQArgBlock.isint != 0, it treats the input data
as an integer anyway. So I don't see any use case other than "int64
specific case-handling" if isint != 0 and len == 8. If you have another
use case for isint != 0 and len == 8, please show it.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#28 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kohei KaiGai (#26)
Re: 64-bit API for large object

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

My preference is nothing are changed both pg_getmsgint64() of the backend
and routines under PQfn() of the libpq. Isn't it unavailable to deliver int64-
value "after" the endian translation on the caller side?

Right. If we had to change anything on the backend side, it would mean
we had a wire protocol change, which is even less acceptable than a
libpq ABI change.

regards, tom lane

#29 Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#28)
Re: 64-bit API for large object

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

My preference is nothing are changed both pg_getmsgint64() of the backend
and routines under PQfn() of the libpq. Isn't it unavailable to deliver int64-
value "after" the endian translation on the caller side?

Right. If we had to change anything on the backend side, it would mean
we had a wire protocol change, which is even less acceptable than a
libpq ABI change.

The patch does not touch pq_getmsgint64() and I don't think we are
going to have a wire protocol change.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#30 Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Tatsuo Ishii (#29)
Re: 64-bit API for large object

2012/9/21 Tatsuo Ishii <ishii@postgresql.org>:

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

My preference is nothing are changed both pg_getmsgint64() of the backend
and routines under PQfn() of the libpq. Isn't it unavailable to deliver int64-
value "after" the endian translation on the caller side?

Right. If we had to change anything on the backend side, it would mean
we had a wire protocol change, which is even less acceptable than a
libpq ABI change.

The patch does not touch pg_getmsgint64() and I don't think we are not
going have a wire protocol change.

It's also uncertain which part Tom said "right" to...

What I pointed out is that this patch adds special-case handling in libpq's
pqFunctionCall3 to fetch the 64-bit integer from PQArgBlock->u.ptr and to
adjust the byte order. It was never about the backend side.

It is not a technical problem, but the coding style feels a bit strange to me.
So, I don't want to argue against it too strongly.

Tom, could you give us a suggestion as to which approach is better: should
PQfn have responsibility for the endianness translation of the 64-bit
integer, or the callers (lo_tell64 or lo_lseek64)?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#31 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kohei KaiGai (#30)
Re: 64-bit API for large object

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

Tom, could you give us a suggestion which manner is better approach; whether
the PQfn should have responsibility for endian translation of 64bit-integer, or
callers (lo_tell64 or lo_seek64)?

Adding anything inside pqFunctionCall is useless, unless we were to add
an int64 variant to PQArgBlock, which isn't a good idea because it will
be an ABI break. The functions in fe-lobj.c have to set up the int64
value as if it were pass-by-reference, which means dealing with
endianness concerns there.

regards, tom lane

#32 Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#31)
Re: 64-bit API for large object

Tom, Kaigai,

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

Tom, could you give us a suggestion which manner is better approach; whether
the PQfn should have responsibility for endian translation of 64bit-integer, or
callers (lo_tell64 or lo_seek64)?

Adding anything inside pqFunctionCall is useless, unless we were to add
an int64 variant to PQArgBlock, which isn't a good idea because it will
be an ABI break. The functions in fe-lobj.c have to set up the int64
value as if it were pass-by-reference, which means dealing with
endianness concerns there.

I just want to make sure I understand your point.

We do not modify pqFunctionCall. That means PQfn does not accept the
PQArgBlock.isint != 0 and PQArgBlock.len == 8 case. If a PQfn caller
wants to send a 64-bit integer, it should set PQArgBlock.isint = 0 and
PQArgBlock.len = 8 and pass the data by reference. Endianness should
be taken care of by the PQfn caller. Also we do not modify fe-misc.c
because there's no point in adding pqPutint64/pqGetint64 (they are only
called from pqFunctionCall in the patch).
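
For example, a caller in fe-lobj.c could set up the argument like this
(just a sketch of the idea with a made-up helper name, not the patch
itself; pg_int64 is the proposed type):

#include "libpq-fe.h"

/*
 * Sketch only: store "value" into *buf in network byte order and describe
 * it to PQfn as a pass-by-reference 8-byte argument (isint = 0, len = 8).
 */
static void
pack_int64_arg(PQArgBlock *arg, pg_int64 value, pg_int64 *buf)
{
	unsigned char *p = (unsigned char *) buf;
	unsigned long long v = (unsigned long long) value;
	int			i;

	/* most significant byte first, independent of host byte order */
	for (i = 7; i >= 0; i--)
	{
		p[i] = (unsigned char) (v & 0xff);
		v >>= 8;
	}

	arg->isint = 0;				/* pass-by-reference */
	arg->len = 8;
	arg->u.ptr = (int *) buf;
}
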
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#33 Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#32)
Re: 64-bit API for large object

Tom, Kaigai,

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

Tom, could you give us a suggestion which manner is better approach; whether
the PQfn should have responsibility for endian translation of 64bit-integer, or
callers (lo_tell64 or lo_seek64)?

Adding anything inside pqFunctionCall is useless, unless we were to add
an int64 variant to PQArgBlock, which isn't a good idea because it will
be an ABI break. The functions in fe-lobj.c have to set up the int64
value as if it were pass-by-reference, which means dealing with
endianness concerns there.

I just want to make sure you guy's point.

We do not modify pqFunctionCall. That means PQfn does not accept
PQArgBlock.isint != 0 and PQArgBlock.len == 8 case. If a PQfn caller
wants to send 64-bit integer, it should set PQArgBlock.isint = 0 and
PQArgBlock.len = 8 and set data pass-by-reference. Endianness should
be taken care by the PQfn caller. Also we do not modify fe-misc.c
because there's no point to add pqPutint64/pqGetint64(they are called
from pqFunctionCall in the patch).

Oops. There's no such function as pqGetint64 in the patch. The 64-bit
int case is taken care of inside pqGetint.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#34 Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Tatsuo Ishii (#32)
Re: 64-bit API for large object

2012/9/22 Tatsuo Ishii <ishii@postgresql.org>:

Tom, Kaigai,

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

Tom, could you give us a suggestion which manner is better approach; whether
the PQfn should have responsibility for endian translation of 64bit-integer, or
callers (lo_tell64 or lo_seek64)?

Adding anything inside pqFunctionCall is useless, unless we were to add
an int64 variant to PQArgBlock, which isn't a good idea because it will
be an ABI break. The functions in fe-lobj.c have to set up the int64
value as if it were pass-by-reference, which means dealing with
endianness concerns there.

I just want to make sure you guy's point.

We do not modify pqFunctionCall. That means PQfn does not accept
PQArgBlock.isint != 0 and PQArgBlock.len == 8 case. If a PQfn caller
wants to send 64-bit integer, it should set PQArgBlock.isint = 0 and
PQArgBlock.len = 8 and set data pass-by-reference. Endianness should
be taken care by the PQfn caller. Also we do not modify fe-misc.c
because there's no point to add pqPutint64/pqGetint64(they are called
from pqFunctionCall in the patch).

Yes, it is exactly what I suggested.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#35 Tatsuo Ishii
ishii@postgresql.org
In reply to: Kohei KaiGai (#34)
Re: 64-bit API for large object

2012/9/22 Tatsuo Ishii <ishii@postgresql.org>:

Tom, Kaigai,

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

Tom, could you give us a suggestion which manner is better approach; whether
the PQfn should have responsibility for endian translation of 64bit-integer, or
callers (lo_tell64 or lo_seek64)?

Adding anything inside pqFunctionCall is useless, unless we were to add
an int64 variant to PQArgBlock, which isn't a good idea because it will
be an ABI break. The functions in fe-lobj.c have to set up the int64
value as if it were pass-by-reference, which means dealing with
endianness concerns there.

I just want to make sure you guy's point.

We do not modify pqFunctionCall. That means PQfn does not accept
PQArgBlock.isint != 0 and PQArgBlock.len == 8 case. If a PQfn caller
wants to send 64-bit integer, it should set PQArgBlock.isint = 0 and
PQArgBlock.len = 8 and set data pass-by-reference. Endianness should
be taken care by the PQfn caller. Also we do not modify fe-misc.c
because there's no point to add pqPutint64/pqGetint64(they are called
from pqFunctionCall in the patch).

Yes, it is exactly what I suggested.

Thanks for the confirmation!
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#36 Nozomi Anzai
anzai@sraoss.co.jp
In reply to: Kohei KaiGai (#15)
1 attachment(s)
Re: 64-bit API for large object

Here is version 2 of the 64-bit API for large object patch.

I checked this patch. It can be applied onto the latest master branch
without any problems. My comments are below.

2012/9/11 Tatsuo Ishii <ishii@postgresql.org>:

Ok, here is the patch to implement 64-bit API for large object, to
allow to use up to 4TB large objects(or 16TB if BLCKSZ changed to
32KB). The patch is based on Jeremy Drake's patch posted on September
23, 2005
(http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php)
and reasonably updated/edited to adopt PostgreSQL 9.3 by Nozomi Anzai
for the backend part and Yugo Nagata for the rest(including
documentation patch).

Here are changes made in the patch:

1) Frontend lo_* libpq functions(fe-lobj.c)(Yugo Nagata)

lo_initialize() gathers backend 64-bit large object handling
function's oid, namely lo_lseek64, lo_tell64, lo_truncate64.

If client calls lo_*64 functions and backend does not support them,
lo_*64 functions return error to caller. There might be an argument
since calls to lo_*64 functions can automatically be redirected to
32-bit older API. I don't know this is worth the trouble though.

I think it should definitely return an error code when user tries to
use lo_*64 functions towards the backend v9.2 or older, because
fallback to 32bit API can raise unexpected errors if application
intends to seek the area over than 2GB.

Currently lo_initialize() throws an error if one of oids are not
available. I doubt we do the same way for 64-bit functions since this
will make 9.3 libpq unable to access large objects stored in pre-9.2
PostgreSQL servers.

It seems to me the situation to split the case of pre-9.2 and post-9.3
using a condition of "conn->sversion >= 90300".

Fixed as suggested, and tested it by deleting the lo_tell64 row from pg_proc.

To pass 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr
is a pointer to 64-bit integer and actual data is placed somewhere
else. There might be other way: add new member to union u to store
64-bit integer:

typedef struct
{
int len;
int isint;
union
{
int *ptr; /* can't use void (dec compiler barfs) */
int integer;
int64 bigint; /* 64-bit integer */
} u;
} PQArgBlock;

I'm a little bit worried about this way because PQArgBlock is a public
interface.

I'm inclined to add a new field for the union; that seems to me straight
forward approach.
For example, the manner in lo_seek64() seems to me confusable.
It set 1 on "isint" field even though pointer is delivered actually.

+       argv[1].isint = 1;
+       argv[1].len = 8;
+       argv[1].u.ptr = (int *) &len;

Your proposal was not adopted per discussion.

Also we add new type "pg_int64":

#ifndef NO_PG_INT64
#define HAVE_PG_INT64 1
typedef long long int pg_int64;
#endif

in postgres_ext.h per suggestion from Tom Lane:
http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php

I'm uncertain about context of this discussion.

Does it make matter if we include <stdint.h> and use int64_t instead
of the self defined data type?

Your proposal was not adopted per discussion.
Per discussion, the endianness translation was moved to fe-lobj.c.

2) Backend lo_* functions (be-fsstubs.c)(Nozomi Anzai)

Add lo_lseek64, lo_tell64, lo_truncate64 so that they can handle
64-bit seek position and data length. loread64 and lowrite64 are not
added because if a program tries to read/write more than 2GB at once,
it would be a sign that the program need to be re-designed anyway.

I think it is a reasonable.

3) Backend inv_api.c functions(Nozomi Anzai)

No need to add new functions. Just extend them to handle 64-bit data.

BTW , what will happen if older 32-bit libpq accesses large objects
over 2GB?

lo_read and lo_write: they can read or write lobjs using 32-bit API as
long as requested read/write data length is smaller than 2GB. So I
think we can safely allow them to access over 2GB lobjs.

lo_lseek: again as long as requested offset is smaller than 2GB, there
would be no problem.

lo_tell:if current seek position is beyond 2GB, returns an error.

Even though iteration of lo_lseek() may move the offset to 4TB, it also
makes unavailable to use lo_tell() to obtain the current offset, so I think
it is reasonable behavior.

However, error code is not an appropriate one.

+       if (INT_MAX < offset)
+       {
+               ereport(ERROR,
+                               (errcode(ERRCODE_UNDEFINED_OBJECT),
+                                errmsg("invalid large-object
descriptor: %d", fd)));
+               PG_RETURN_INT32(-1);
+       }

According to the manpage of lseek(2)
EOVERFLOW
The resulting file offset cannot be represented in an off_t.

Please add a new error code such as ERRCODE_BLOB_OFFSET_OVERFLOW.

Changed the error code and error message. We added a new error code
"ERRCODE_BLOB_OFFSET_OVERFLOW (22P07)".

4) src/test/examples/testlo64.c added for 64-bit API example(Yugo Nagata)

Comments and suggestions are welcome.

miscellaneous comments are below.

Regression test is helpful. Even though no need to try to create 4TB large
object, it is helpful to write some chunks around the design boundary.
Could you add some test cases that writes some chunks around 4TB offset.

Added 64-bit lobj test items to the regression test and confirmed they
work correctly.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
Nozomi Anzai
SRA OSS, Inc. Japan

Attachments:

lobj64-v2.patch (application/octet-stream)
diff --git a/doc/src/sgml/lobj.sgml b/doc/src/sgml/lobj.sgml
index 291409f..cd8eb44 100644
--- a/doc/src/sgml/lobj.sgml
+++ b/doc/src/sgml/lobj.sgml
@@ -41,7 +41,7 @@
     larger than a single database page into a secondary storage area per table.
     This makes the large object facility partially obsolete.  One
     remaining advantage of the large object facility is that it allows values
-    up to 2 GB in size, whereas <acronym>TOAST</acronym>ed fields can be at
+    up to 4 TB in size, whereas <acronym>TOAST</acronym>ed fields can be at
     most 1 GB.  Also, large objects can be randomly modified using a read/write
     API that is more efficient than performing such operations using
     <acronym>TOAST</acronym>.
@@ -312,6 +312,7 @@ int lo_read(PGconn *conn, int fd, char *buf, size_t len);
      large object descriptor, call
 <synopsis>
 int lo_lseek(PGconn *conn, int fd, int offset, int whence);
+pg_int64 lo_lseek64(PGconn *conn, int fd, pg_int64 offset, int whence);
 </synopsis>
      <indexterm><primary>lo_lseek</></> This function moves the
      current location pointer for the large object descriptor identified by
@@ -321,6 +322,9 @@ int lo_lseek(PGconn *conn, int fd, int offset, int whence);
      <symbol>SEEK_CUR</> (seek from current position), and
      <symbol>SEEK_END</> (seek from object end).  The return value is
      the new location pointer, or -1 on error.
+     <indexterm><primary>lo_lseek64</></> <function>lo_lseek64</function>
+     is a function for large objects larger than 2GB. <symbol>pg_int64</>
+     is defined as 8-byte integer type.
 </para>
 </sect2>
 
@@ -332,9 +336,12 @@ int lo_lseek(PGconn *conn, int fd, int offset, int whence);
      call
 <synopsis>
 int lo_tell(PGconn *conn, int fd);
+pg_int64 lo_tell64(PGconn *conn, int fd);
 </synopsis>
      <indexterm><primary>lo_tell</></> If there is an error, the
      return value is negative.
+     <indexterm><primary>lo_tell64</></> <function>lo_tell64</function> is
+     a function for large objects larger than 2GB.
 </para>
 </sect2>
 
@@ -345,6 +352,7 @@ int lo_tell(PGconn *conn, int fd);
      To truncate a large object to a given length, call
 <synopsis>
 int lo_truncate(PGcon *conn, int fd, size_t len);
+int lo_truncate64(PGcon *conn, int fd, pg_int64 len);
 </synopsis>
      <indexterm><primary>lo_truncate</></> truncates the large object
      descriptor <parameter>fd</> to length <parameter>len</>.  The
@@ -352,6 +360,8 @@ int lo_truncate(PGcon *conn, int fd, size_t len);
      previous <function>lo_open</function>.  If <parameter>len</> is
      greater than the current large object length, the large object
      is extended with null bytes ('\0').
+     <indexterm><primary>lo_truncate64</></> <function>lo_truncate64</function>
+     is a function for large objects larger than 2GB.
 </para>
 
 <para>
diff --git a/src/backend/libpq/be-fsstubs.c b/src/backend/libpq/be-fsstubs.c
index 6f7e474..4bc81ba 100644
--- a/src/backend/libpq/be-fsstubs.c
+++ b/src/backend/libpq/be-fsstubs.c
@@ -39,6 +39,7 @@
 #include "postgres.h"
 
 #include <fcntl.h>
+#include <limits.h>
 #include <sys/stat.h>
 #include <unistd.h>
 
@@ -216,7 +217,7 @@ lo_lseek(PG_FUNCTION_ARGS)
 	int32		fd = PG_GETARG_INT32(0);
 	int32		offset = PG_GETARG_INT32(1);
 	int32		whence = PG_GETARG_INT32(2);
-	int			status;
+	int64		status;
 
 	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
 		ereport(ERROR,
@@ -225,9 +226,45 @@ lo_lseek(PG_FUNCTION_ARGS)
 
 	status = inv_seek(cookies[fd], offset, whence);
 
+	if (INT_MAX < status)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_BLOB_OFFSET_OVERFLOW),
+				 errmsg("offset overflow: %d", fd)));
+		PG_RETURN_INT32(-1);
+	}
+
 	PG_RETURN_INT32(status);
 }
 
+
+Datum
+lo_lseek64(PG_FUNCTION_ARGS)
+{
+	int32		fd = PG_GETARG_INT32(0);
+	int64		offset = PG_GETARG_INT64(1);
+	int32		whence = PG_GETARG_INT32(2);
+	MemoryContext currentContext;
+	int64			status;
+
+	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("invalid large-object descriptor: %d", fd)));
+		PG_RETURN_INT64(-1);
+	}
+
+	Assert(fscxt != NULL);
+	currentContext = MemoryContextSwitchTo(fscxt);
+
+	status = inv_seek(cookies[fd], offset, whence);
+
+	MemoryContextSwitchTo(currentContext);
+
+	PG_RETURN_INT64(status);
+}
+
 Datum
 lo_creat(PG_FUNCTION_ARGS)
 {
@@ -264,13 +301,46 @@ Datum
 lo_tell(PG_FUNCTION_ARGS)
 {
 	int32		fd = PG_GETARG_INT32(0);
+	int64		offset = 0;
+
+	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("invalid large-object descriptor: %d", fd)));
+
+	offset = inv_tell(cookies[fd]);
+
+	if (INT_MAX < offset)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_BLOB_OFFSET_OVERFLOW),
+				 errmsg("offset overflow: %d", fd)));
+		PG_RETURN_INT32(-1);
+	}
+
+	PG_RETURN_INT32(offset);
+}
+
+
+Datum
+lo_tell64(PG_FUNCTION_ARGS)
+{
+	int32		fd = PG_GETARG_INT32(0);
 
 	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
+	{
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_OBJECT),
 				 errmsg("invalid large-object descriptor: %d", fd)));
+		PG_RETURN_INT64(-1);
+	}
 
-	PG_RETURN_INT32(inv_tell(cookies[fd]));
+	/*
+	 * We assume we do not need to switch contexts for inv_tell. That is
+	 * true for now, but is probably more than this module ought to
+	 * assume...
+	 */
+	PG_RETURN_INT64(inv_tell(cookies[fd]));
 }
 
 Datum
@@ -533,6 +603,33 @@ lo_truncate(PG_FUNCTION_ARGS)
 	PG_RETURN_INT32(0);
 }
 
+Datum
+lo_truncate64(PG_FUNCTION_ARGS)
+{
+	int32		fd = PG_GETARG_INT32(0);
+	int64		len = PG_GETARG_INT64(1);
+
+	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("invalid large-object descriptor: %d", fd)));
+
+	/* Permission checks */
+	if (!lo_compat_privileges &&
+		pg_largeobject_aclcheck_snapshot(cookies[fd]->id,
+										 GetUserId(),
+										 ACL_UPDATE,
+									   cookies[fd]->snapshot) != ACLCHECK_OK)
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied for large object %u",
+						cookies[fd]->id)));
+
+	inv_truncate(cookies[fd], len);
+
+	PG_RETURN_INT32(0);
+}
+
 /*
  * AtEOXact_LargeObject -
  *		 prepares large objects for transaction commit
diff --git a/src/backend/storage/large_object/inv_api.c b/src/backend/storage/large_object/inv_api.c
index 3adfb15..6a51664 100644
--- a/src/backend/storage/large_object/inv_api.c
+++ b/src/backend/storage/large_object/inv_api.c
@@ -324,10 +324,10 @@ inv_drop(Oid lobjId)
  * NOTE: LOs can contain gaps, just like Unix files.  We actually return
  * the offset of the last byte + 1.
  */
-static uint32
+static uint64
 inv_getsize(LargeObjectDesc *obj_desc)
 {
-	uint32		lastbyte = 0;
+	uint64		lastbyte = 0;
 	ScanKeyData skey[1];
 	SysScanDesc sd;
 	HeapTuple	tuple;
@@ -368,7 +368,7 @@ inv_getsize(LargeObjectDesc *obj_desc)
 				heap_tuple_untoast_attr((struct varlena *) datafield);
 			pfreeit = true;
 		}
-		lastbyte = data->pageno * LOBLKSIZE + getbytealen(datafield);
+		lastbyte = (uint64) data->pageno * LOBLKSIZE + getbytealen(datafield);
 		if (pfreeit)
 			pfree(datafield);
 	}
@@ -378,8 +378,8 @@ inv_getsize(LargeObjectDesc *obj_desc)
 	return lastbyte;
 }
 
-int
-inv_seek(LargeObjectDesc *obj_desc, int offset, int whence)
+int64
+inv_seek(LargeObjectDesc *obj_desc, int64 offset, int whence)
 {
 	Assert(PointerIsValid(obj_desc));
 
@@ -387,20 +387,20 @@ inv_seek(LargeObjectDesc *obj_desc, int offset, int whence)
 	{
 		case SEEK_SET:
 			if (offset < 0)
-				elog(ERROR, "invalid seek offset: %d", offset);
+				elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
 			obj_desc->offset = offset;
 			break;
 		case SEEK_CUR:
-			if (offset < 0 && obj_desc->offset < ((uint32) (-offset)))
-				elog(ERROR, "invalid seek offset: %d", offset);
+			if (offset < 0 && obj_desc->offset < ((uint64) (-offset)))
+				elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
 			obj_desc->offset += offset;
 			break;
 		case SEEK_END:
 			{
-				uint32		size = inv_getsize(obj_desc);
+				uint64		size = inv_getsize(obj_desc);
 
-				if (offset < 0 && size < ((uint32) (-offset)))
-					elog(ERROR, "invalid seek offset: %d", offset);
+				if (offset < 0 && size < ((uint64) (-offset)))
+					elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
 				obj_desc->offset = size + offset;
 			}
 			break;
@@ -410,7 +410,7 @@ inv_seek(LargeObjectDesc *obj_desc, int offset, int whence)
 	return obj_desc->offset;
 }
 
-int
+int64
 inv_tell(LargeObjectDesc *obj_desc)
 {
 	Assert(PointerIsValid(obj_desc));
@@ -422,11 +422,11 @@ int
 inv_read(LargeObjectDesc *obj_desc, char *buf, int nbytes)
 {
 	int			nread = 0;
-	int			n;
-	int			off;
+	int64		n;
+	int64		off;
 	int			len;
 	int32		pageno = (int32) (obj_desc->offset / LOBLKSIZE);
-	uint32		pageoff;
+	uint64		pageoff;
 	ScanKeyData skey[2];
 	SysScanDesc sd;
 	HeapTuple	tuple;
@@ -467,7 +467,7 @@ inv_read(LargeObjectDesc *obj_desc, char *buf, int nbytes)
 		 * there may be missing pages if the LO contains unwritten "holes". We
 		 * want missing sections to read out as zeroes.
 		 */
-		pageoff = ((uint32) data->pageno) * LOBLKSIZE;
+		pageoff = ((uint64) data->pageno) * LOBLKSIZE;
 		if (pageoff > obj_desc->offset)
 		{
 			n = pageoff - obj_desc->offset;
@@ -718,10 +718,10 @@ inv_write(LargeObjectDesc *obj_desc, const char *buf, int nbytes)
 }
 
 void
-inv_truncate(LargeObjectDesc *obj_desc, int len)
+inv_truncate(LargeObjectDesc *obj_desc, int64 len)
 {
 	int32		pageno = (int32) (len / LOBLKSIZE);
-	int			off;
+	int64		off;
 	ScanKeyData skey[2];
 	SysScanDesc sd;
 	HeapTuple	oldtuple;
diff --git a/src/backend/utils/errcodes.txt b/src/backend/utils/errcodes.txt
index 3e04164..db8ab53 100644
--- a/src/backend/utils/errcodes.txt
+++ b/src/backend/utils/errcodes.txt
@@ -199,6 +199,7 @@ Section: Class 22 - Data Exception
 2200N    E    ERRCODE_INVALID_XML_CONTENT                                    invalid_xml_content
 2200S    E    ERRCODE_INVALID_XML_COMMENT                                    invalid_xml_comment
 2200T    E    ERRCODE_INVALID_XML_PROCESSING_INSTRUCTION                     invalid_xml_processing_instruction
+22P07    E    ERRCODE_BLOB_OFFSET_OVERFLOW                                   blob_offset_overflow
 
 Section: Class 23 - Integrity Constraint Violation
 
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 77a3b41..a2da836 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -1040,14 +1040,20 @@ DATA(insert OID = 955 (  lowrite		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23
 DESCR("large object write");
 DATA(insert OID = 956 (  lo_lseek		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 23 "23 23 23" _null_ _null_ _null_ _null_	lo_lseek _null_ _null_ _null_ ));
 DESCR("large object seek");
+DATA(insert OID = 3170 (  lo_lseek64		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 20 "23 20 23" _null_ _null_ _null_ _null_	lo_lseek64 _null_ _null_ _null_ ));
+DESCR("large object seek (64 bit)");
 DATA(insert OID = 957 (  lo_creat		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 26 "23" _null_ _null_ _null_ _null_ lo_creat _null_ _null_ _null_ ));
 DESCR("large object create");
 DATA(insert OID = 715 (  lo_create		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 26 "26" _null_ _null_ _null_ _null_ lo_create _null_ _null_ _null_ ));
 DESCR("large object create");
 DATA(insert OID = 958 (  lo_tell		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 23 "23" _null_ _null_ _null_ _null_ lo_tell _null_ _null_ _null_ ));
 DESCR("large object position");
+DATA(insert OID = 3171 (  lo_tell64		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 20 "23" _null_ _null_ _null_ _null_ lo_tell64 _null_ _null_ _null_ ));
+DESCR("large object position (64 bit)");
 DATA(insert OID = 1004 (  lo_truncate	   PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "23 23" _null_ _null_ _null_ _null_ lo_truncate _null_ _null_ _null_ ));
 DESCR("truncate large object");
+DATA(insert OID = 3172 (  lo_truncate64	   PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "23 20" _null_ _null_ _null_ _null_ lo_truncate64 _null_ _null_ _null_ ));
+DESCR("truncate large object (64 bit)");
 
 DATA(insert OID = 959 (  on_pl			   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "600 628" _null_ _null_ _null_ _null_	on_pl _null_ _null_ _null_ ));
 DATA(insert OID = 960 (  on_sl			   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "601 628" _null_ _null_ _null_ _null_	on_sl _null_ _null_ _null_ ));
diff --git a/src/include/libpq/be-fsstubs.h b/src/include/libpq/be-fsstubs.h
index 0c832da..d74ea0e 100644
--- a/src/include/libpq/be-fsstubs.h
+++ b/src/include/libpq/be-fsstubs.h
@@ -34,8 +34,11 @@ extern Datum lowrite(PG_FUNCTION_ARGS);
 
 extern Datum lo_lseek(PG_FUNCTION_ARGS);
 extern Datum lo_tell(PG_FUNCTION_ARGS);
+extern Datum lo_lseek64(PG_FUNCTION_ARGS);
+extern Datum lo_tell64(PG_FUNCTION_ARGS);
 extern Datum lo_unlink(PG_FUNCTION_ARGS);
 extern Datum lo_truncate(PG_FUNCTION_ARGS);
+extern Datum lo_truncate64(PG_FUNCTION_ARGS);
 
 /*
  * compatibility option for access control
diff --git a/src/include/postgres_ext.h b/src/include/postgres_ext.h
index b6ebb7a..76502de 100644
--- a/src/include/postgres_ext.h
+++ b/src/include/postgres_ext.h
@@ -56,4 +56,9 @@ typedef unsigned int Oid;
 #define PG_DIAG_SOURCE_LINE		'L'
 #define PG_DIAG_SOURCE_FUNCTION 'R'
 
+#ifndef NO_PG_INT64
+#define HAVE_PG_INT64 1
+typedef long long int pg_int64;
+#endif
+
 #endif
diff --git a/src/include/storage/large_object.h b/src/include/storage/large_object.h
index 1fe07ee..79646c9 100644
--- a/src/include/storage/large_object.h
+++ b/src/include/storage/large_object.h
@@ -37,7 +37,7 @@ typedef struct LargeObjectDesc
 	Oid			id;				/* LO's identifier */
 	Snapshot	snapshot;		/* snapshot to use */
 	SubTransactionId subid;		/* owning subtransaction ID */
-	uint32		offset;			/* current seek pointer */
+	uint64		offset;			/* current seek pointer */
 	int			flags;			/* locking info, etc */
 
 /* flag bits: */
@@ -74,10 +74,10 @@ extern Oid	inv_create(Oid lobjId);
 extern LargeObjectDesc *inv_open(Oid lobjId, int flags, MemoryContext mcxt);
 extern void inv_close(LargeObjectDesc *obj_desc);
 extern int	inv_drop(Oid lobjId);
-extern int	inv_seek(LargeObjectDesc *obj_desc, int offset, int whence);
-extern int	inv_tell(LargeObjectDesc *obj_desc);
+extern int64	inv_seek(LargeObjectDesc *obj_desc, int64 offset, int whence);
+extern int64	inv_tell(LargeObjectDesc *obj_desc);
 extern int	inv_read(LargeObjectDesc *obj_desc, char *buf, int nbytes);
 extern int	inv_write(LargeObjectDesc *obj_desc, const char *buf, int nbytes);
-extern void inv_truncate(LargeObjectDesc *obj_desc, int len);
+extern void inv_truncate(LargeObjectDesc *obj_desc, int64 len);
 
 #endif   /* LARGE_OBJECT_H */
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 9d95e26..56d0bb8 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -161,3 +161,6 @@ PQping                    158
 PQpingParams              159
 PQlibVersion              160
 PQsetSingleRowMode        161
+lo_lseek64                162
+lo_tell64                 163
+lo_truncate64             164
diff --git a/src/interfaces/libpq/fe-lobj.c b/src/interfaces/libpq/fe-lobj.c
index f3a6d03..a2466e1 100644
--- a/src/interfaces/libpq/fe-lobj.c
+++ b/src/interfaces/libpq/fe-lobj.c
@@ -37,10 +37,16 @@
 #include "libpq-int.h"
 #include "libpq/libpq-fs.h"		/* must come after sys/stat.h */
 
+/* for ntohl/htonl */
+#include <netinet/in.h>
+#include <arpa/inet.h>
+
 #define LO_BUFSIZE		  8192
 
 static int	lo_initialize(PGconn *conn);
 static Oid	lo_import_internal(PGconn *conn, const char *filename, Oid oid);
+static pg_int64	lo_hton64(pg_int64 host64);
+static pg_int64	lo_ntoh64(pg_int64 net64);
 
 /*
  * lo_open
@@ -174,6 +180,59 @@ lo_truncate(PGconn *conn, int fd, size_t len)
 	}
 }
 
+/*
+ * lo_truncate64
+ *	  truncates an existing large object to the given size
+ *
+ * returns 0 upon success
+ * returns -1 upon failure
+ */
+#ifdef HAVE_PG_INT64
+int
+lo_truncate64(PGconn *conn, int fd, pg_int64 len)
+{
+	PQArgBlock	argv[2];
+	PGresult   *res;
+	int			retval;
+	int			result_len;
+
+	if (conn == NULL || conn->lobjfuncs == NULL)
+	{
+		if (lo_initialize(conn) < 0)
+			return -1;
+	}
+
+	if (conn->lobjfuncs->fn_lo_truncate64 == 0)
+	{
+		printfPQExpBuffer(&conn->errorMessage,
+			libpq_gettext("cannot determine OID of function lo_truncate64\n"));
+		return -1;
+	}
+
+	argv[0].isint = 1;
+	argv[0].len = 4;
+	argv[0].u.integer = fd;
+
+	len = lo_hton64(len);
+	argv[1].isint = 0;
+	argv[1].len = 8;
+	argv[1].u.ptr = (int *) &len;
+
+	res = PQfn(conn, conn->lobjfuncs->fn_lo_truncate64,
+			   &retval, &result_len, 1, argv, 2);
+
+	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+	{
+		PQclear(res);
+		return retval;
+	}
+	else
+	{
+		PQclear(res);
+		return -1;
+	}
+}
+#endif
 
 /*
  * lo_read
@@ -311,6 +370,63 @@ lo_lseek(PGconn *conn, int fd, int offset, int whence)
 }
 
 /*
+ * lo_lseek64
+ *	  change the current read or write location on a large object
+ * currently, only L_SET is a legal value for whence
+ *
+ */
+
+#ifdef HAVE_PG_INT64
+pg_int64
+lo_lseek64(PGconn *conn, int fd, pg_int64 offset, int whence)
+{
+	PQArgBlock	argv[3];
+	PGresult   *res;
+	pg_int64		retval;
+	int			result_len;
+
+	if (conn == NULL || conn->lobjfuncs == NULL)
+	{
+		if (lo_initialize(conn) < 0)
+			return -1;
+	}
+
+	if (conn->lobjfuncs->fn_lo_lseek64 == 0)
+	{
+		printfPQExpBuffer(&conn->errorMessage,
+			libpq_gettext("cannot determine OID of function lo_lseek64\n"));
+		return -1;
+	}
+
+	argv[0].isint = 1;
+	argv[0].len = 4;
+	argv[0].u.integer = fd;
+
+	offset = lo_hton64(offset);
+	argv[1].isint = 0;
+	argv[1].len = 8;
+	argv[1].u.ptr = (int *) &offset;
+
+	argv[2].isint = 1;
+	argv[2].len = 4;
+	argv[2].u.integer = whence;
+
+	res = PQfn(conn, conn->lobjfuncs->fn_lo_lseek64,
+			   (int *)&retval, &result_len, 0, argv, 3);
+	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+	{
+		PQclear(res);
+		return lo_ntoh64((pg_int64)retval);
+	}
+	else
+	{
+		PQclear(res);
+		return -1;
+	}
+}
+#endif
+
+/*
  * lo_creat
  *	  create a new large object
  * the mode is ignored (once upon a time it had a use)
@@ -436,6 +552,52 @@ lo_tell(PGconn *conn, int fd)
 }
 
 /*
+ * lo_tell64
+ *	  returns the current seek location of the large object
+ *
+ */
+#ifdef HAVE_PG_INT64
+pg_int64
+lo_tell64(PGconn *conn, int fd)
+{
+	pg_int64	retval;
+	PQArgBlock	argv[1];
+	PGresult   *res;
+	int			result_len;
+
+	if (conn == NULL || conn->lobjfuncs == NULL)
+	{
+		if (lo_initialize(conn) < 0)
+			return -1;
+	}
+
+	if (conn->lobjfuncs->fn_lo_tell64 == 0)
+	{
+		printfPQExpBuffer(&conn->errorMessage,
+			libpq_gettext("cannot determine OID of function lo_tell64\n"));
+		return -1;
+	}
+
+	argv[0].isint = 1;
+	argv[0].len = 4;
+	argv[0].u.integer = fd;
+
+	res = PQfn(conn, conn->lobjfuncs->fn_lo_tell64,
+			   (int *) &retval, &result_len, 0, argv, 1);
+	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+	{
+		PQclear(res);
+		return lo_ntoh64((pg_int64) retval);
+	}
+	else
+	{
+		PQclear(res);
+		return -1;
+	}
+}
+#endif
+
+/*
  * lo_unlink
  *	  delete a file
  *
@@ -713,8 +875,11 @@ lo_initialize(PGconn *conn)
 			"'lo_create', "
 			"'lo_unlink', "
 			"'lo_lseek', "
+			"'lo_lseek64', "
 			"'lo_tell', "
+			"'lo_tell64', "
 			"'lo_truncate', "
+			"'lo_truncate64', "
 			"'loread', "
 			"'lowrite') "
 			"and pronamespace = (select oid from pg_catalog.pg_namespace "
@@ -765,10 +930,16 @@ lo_initialize(PGconn *conn)
 			lobjfuncs->fn_lo_unlink = foid;
 		else if (strcmp(fname, "lo_lseek") == 0)
 			lobjfuncs->fn_lo_lseek = foid;
+		else if (strcmp(fname, "lo_lseek64") == 0)
+			lobjfuncs->fn_lo_lseek64 = foid;
 		else if (strcmp(fname, "lo_tell") == 0)
 			lobjfuncs->fn_lo_tell = foid;
+		else if (strcmp(fname, "lo_tell64") == 0)
+			lobjfuncs->fn_lo_tell64 = foid;
 		else if (strcmp(fname, "lo_truncate") == 0)
 			lobjfuncs->fn_lo_truncate = foid;
+		else if (strcmp(fname, "lo_truncate64") == 0)
+			lobjfuncs->fn_lo_truncate64 = foid;
 		else if (strcmp(fname, "loread") == 0)
 			lobjfuncs->fn_lo_read = foid;
 		else if (strcmp(fname, "lowrite") == 0)
@@ -836,10 +1007,86 @@ lo_initialize(PGconn *conn)
 		free(lobjfuncs);
 		return -1;
 	}
-
+	if (conn->sversion >= 90300)
+	{
+		if (lobjfuncs->fn_lo_lseek64 == 0)
+		{
+			printfPQExpBuffer(&conn->errorMessage,
+					libpq_gettext("cannot determine OID of function lo_lseek64\n"));
+			free(lobjfuncs);
+			return -1;
+		}
+		if (lobjfuncs->fn_lo_tell64 == 0)
+		{
+			printfPQExpBuffer(&conn->errorMessage,
+					libpq_gettext("cannot determine OID of function lo_tell64\n"));
+			free(lobjfuncs);
+			return -1;
+		}
+		if (lobjfuncs->fn_lo_truncate64 == 0)
+		{
+			printfPQExpBuffer(&conn->errorMessage,
+					libpq_gettext("cannot determine OID of function lo_truncate64\n"));
+			free(lobjfuncs);
+			return -1;
+		}
+	}
 	/*
 	 * Put the structure into the connection control
 	 */
 	conn->lobjfuncs = lobjfuncs;
 	return 0;
 }
+
+/*
+ * lo_hton64
+ *	  converts a 64-bit integer from host byte order to network byte order
+ */
+static pg_int64
+lo_hton64(pg_int64 host64)
+{
+	pg_int64 	result;
+	uint32_t	h32, l32;
+
+	/* High order half first, since we're doing MSB-first */
+#ifdef INT64_IS_BUSTED
+	/* don't try a right shift of 32 on a 32-bit word */
+	h32 = (host64 < 0) ? -1 : 0;
+#else
+	h32 = (uint32_t) (host64 >> 32);
+#endif
+
+	/* Now the low order half */
+	l32 = (uint32_t) (host64 & 0xffffffff);
+
+	result = htonl(l32);
+	result <<= 32;
+	result |= htonl(h32);
+
+	return result;
+}
+
+/*
+ * lo_ntoh64
+ *	  converts a 64-bit integer from network byte order to host byte order
+ */
+static pg_int64
+lo_ntoh64(pg_int64 net64)
+{
+	pg_int64 	result;
+	uint32_t	h32, l32;
+
+	l32 = (uint32_t) (net64 >> 32);
+	h32 = (uint32_t) (net64 & 0xffffffff);
+
+#ifdef INT64_IS_BUSTED
+	/* just lose the high half */
+	result = (pg_int64) ntohl(l32);
+#else
+	result = ntohl(h32);
+	result <<= 32;
+	result |= ntohl(l32);
+#endif
+
+	return result;
+}
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index 9d05dd2..73568ca 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -548,6 +548,12 @@ extern Oid	lo_import(PGconn *conn, const char *filename);
 extern Oid	lo_import_with_oid(PGconn *conn, const char *filename, Oid lobjId);
 extern int	lo_export(PGconn *conn, Oid lobjId, const char *filename);
 
+#ifdef HAVE_PG_INT64
+extern pg_int64	lo_lseek64(PGconn *conn, int fd, pg_int64 offset, int whence);
+extern pg_int64	lo_tell64(PGconn *conn, int fd);
+extern int	lo_truncate64(PGconn *conn, int fd, pg_int64 len);
+#endif
+
 /* === in fe-misc.c === */
 
 /* Get the version of the libpq library in use */
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 4a6c8fe..375821e 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -271,8 +271,11 @@ typedef struct pgLobjfuncs
 	Oid			fn_lo_create;	/* OID of backend function lo_create	*/
 	Oid			fn_lo_unlink;	/* OID of backend function lo_unlink	*/
 	Oid			fn_lo_lseek;	/* OID of backend function lo_lseek		*/
+	Oid			fn_lo_lseek64;	/* OID of backend function lo_lseek64		*/
 	Oid			fn_lo_tell;		/* OID of backend function lo_tell		*/
+	Oid			fn_lo_tell64;		/* OID of backend function lo_tell64		*/
 	Oid			fn_lo_truncate; /* OID of backend function lo_truncate	*/
+	Oid			fn_lo_truncate64; /* OID of backend function lo_truncate64	*/
 	Oid			fn_lo_read;		/* OID of backend function LOread		*/
 	Oid			fn_lo_write;	/* OID of backend function LOwrite		*/
 } PGlobjfuncs;
diff --git a/src/test/examples/Makefile b/src/test/examples/Makefile
index bbc6ee1..aee5c04 100644
--- a/src/test/examples/Makefile
+++ b/src/test/examples/Makefile
@@ -14,7 +14,7 @@ override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 override LDLIBS := $(libpq_pgport) $(LDLIBS)
 
 
-PROGS = testlibpq testlibpq2 testlibpq3 testlibpq4 testlo
+PROGS = testlibpq testlibpq2 testlibpq3 testlibpq4 testlo testlo64
 
 all: $(PROGS)
 
diff --git a/src/test/examples/testlo64.c b/src/test/examples/testlo64.c
new file mode 100644
index 0000000..e8faaa9
--- /dev/null
+++ b/src/test/examples/testlo64.c
@@ -0,0 +1,320 @@
+/*-------------------------------------------------------------------------
+ *
+ * testlo64.c
+ *	  test using large objects with libpq
+ *
+ * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  $PostgreSQL: pgsql/src/test/examples/testlo.c,v 1.25 2004/12/31 22:03:58 pgsql Exp $
+ *
+ *-------------------------------------------------------------------------
+ */
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#include "libpq-fe.h"
+#include "libpq/libpq-fs.h"
+
+#define BUFSIZE			1024
+
+/*
+ * importFile -
+ *	  import file "in_filename" into database as large object "lobjOid"
+ *
+ */
+static Oid
+importFile(PGconn *conn, char *filename)
+{
+	Oid			lobjId;
+	int			lobj_fd;
+	char		buf[BUFSIZE];
+	int			nbytes,
+				tmp;
+	int			fd;
+
+	/*
+	 * open the file to be read in
+	 */
+	fd = open(filename, O_RDONLY, 0666);
+	if (fd < 0)
+	{							/* error */
+		fprintf(stderr, "can't open unix file\"%s\"\n", filename);
+	}
+
+	/*
+	 * create the large object
+	 */
+	lobjId = lo_creat(conn, INV_READ | INV_WRITE);
+	if (lobjId == 0)
+		fprintf(stderr, "can't create large object");
+
+	lobj_fd = lo_open(conn, lobjId, INV_WRITE);
+
+	/*
+	 * read in from the Unix file and write to the inversion file
+	 */
+	while ((nbytes = read(fd, buf, BUFSIZE)) > 0)
+	{
+		tmp = lo_write(conn, lobj_fd, buf, nbytes);
+		if (tmp < nbytes)
+			fprintf(stderr, "error while reading \"%s\"", filename);
+	}
+
+	close(fd);
+	lo_close(conn, lobj_fd);
+
+	return lobjId;
+}
+
+static void
+pickout(PGconn *conn, Oid lobjId, pg_int64 start, int len)
+{
+	int			lobj_fd;
+	char	   *buf;
+	int			nbytes;
+	int			nread;
+	pg_int64		pos;
+
+	lobj_fd = lo_open(conn, lobjId, INV_READ);
+	if (lobj_fd < 0)
+		fprintf(stderr, "can't open large object %u", lobjId);
+
+	if (lo_tell64(conn, lobj_fd) < 0)
+	{
+		fprintf(stderr, "error lo_tell64: %s\n", PQerrorMessage(conn));
+	}
+
+	if ((pos=lo_lseek64(conn, lobj_fd, start, SEEK_SET)) < 0)
+	{
+		fprintf(stderr, "error lo_lseek64: %s\n", PQerrorMessage(conn));
+		return;
+	}
+
+	fprintf(stderr, "before read: retval of lo_lseek64 : %lld\n", (long long int) pos);
+
+	buf = malloc(len + 1);
+
+	nread = 0;
+	while (len - nread > 0)
+	{
+		nbytes = lo_read(conn, lobj_fd, buf, len - nread);
+		buf[nbytes] = '\0';
+		fprintf(stderr, ">>> %s", buf);
+		nread += nbytes;
+		if (nbytes <= 0)
+			break;				/* no more data? */
+	}
+	free(buf);
+	fprintf(stderr, "\n");
+
+	pos = lo_tell64(conn, lobj_fd);
+	fprintf(stderr, "after read: retval of lo_tell64 : %lld\n\n", (long long int) pos);
+
+	lo_close(conn, lobj_fd);
+}
+
+static void
+overwrite(PGconn *conn, Oid lobjId, pg_int64 start, int len)
+{
+	int			lobj_fd;
+	char	   *buf;
+	int			nbytes;
+	int			nwritten;
+	int			i;
+	pg_int64		pos;
+
+	lobj_fd = lo_open(conn, lobjId, INV_READ | INV_WRITE);
+	if (lobj_fd < 0)
+		fprintf(stderr, "can't open large object %u", lobjId);
+
+	if ((pos=lo_lseek64(conn, lobj_fd, start, SEEK_SET)) < 0)
+	{
+		fprintf(stderr, "error lo_lseek64: %s\n", PQerrorMessage(conn));
+		return;
+	}
+	fprintf(stderr, "before write: retval of lo_lseek64 : %lld\n", (long long int) pos);
+
+	buf = malloc(len + 1);
+
+	for (i = 0; i < len; i++)
+		buf[i] = 'X';
+	buf[i] = '\0';
+
+	nwritten = 0;
+	while (len - nwritten > 0)
+	{
+		nbytes = lo_write(conn, lobj_fd, buf + nwritten, len - nwritten);
+		nwritten += nbytes;
+		if (nbytes <= 0)
+		{
+			fprintf(stderr, "\nWRITE FAILED!\n");
+			break;
+		}
+	}
+	free(buf);
+
+	pos = lo_tell64(conn, lobj_fd);
+	fprintf(stderr, "after write: retval of lo_tell64 : %lld\n\n", (long long int) pos);
+
+	lo_close(conn, lobj_fd);
+}
+
+static void
+my_truncate(PGconn *conn, Oid lobjId, size_t len)
+{
+	int			lobj_fd;
+
+	lobj_fd = lo_open(conn, lobjId, INV_READ | INV_WRITE);
+	if (lobj_fd < 0)
+		fprintf(stderr, "can't open large object %u", lobjId);
+
+	if (lo_truncate64(conn, lobj_fd, len) < 0)
+	{
+		fprintf(stderr, "error lo_truncate64: %s\n", PQerrorMessage(conn));
+		return;
+	}
+
+
+	fprintf(stderr, "\n");
+	lo_close(conn, lobj_fd);
+}
+
+
+/*
+ * exportFile -
+ *	  export large object "lobjOid" to file "out_filename"
+ *
+ */
+static void
+exportFile(PGconn *conn, Oid lobjId, char *filename)
+{
+	int			lobj_fd;
+	char		buf[BUFSIZE];
+	int			nbytes,
+				tmp;
+	int			fd;
+
+	/*
+	 * create an inversion "object"
+	 */
+	lobj_fd = lo_open(conn, lobjId, INV_READ);
+	if (lobj_fd < 0)
+		fprintf(stderr, "can't open large object %u", lobjId);
+
+	/*
+	 * open the file to be written to
+	 */
+	fd = open(filename, O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (fd < 0)
+	{							/* error */
+		fprintf(stderr, "can't open unix file\"%s\"",
+				filename);
+	}
+
+	/*
+	 * read in from the Unix file and write to the inversion file
+	 */
+	while ((nbytes = lo_read(conn, lobj_fd, buf, BUFSIZE)) > 0)
+	{
+		tmp = write(fd, buf, nbytes);
+		if (tmp < nbytes)
+		{
+			fprintf(stderr, "error while writing \"%s\"",
+					filename);
+		}
+	}
+
+	lo_close(conn, lobj_fd);
+	close(fd);
+
+	return;
+}
+
+static void
+exit_nicely(PGconn *conn)
+{
+	PQfinish(conn);
+	exit(1);
+}
+
+int
+main(int argc, char **argv)
+{
+	char	   *in_filename,
+			   *out_filename,
+			   *out_filename2;
+	char	   *database;
+	Oid			lobjOid;
+	PGconn	   *conn;
+	PGresult   *res;
+
+	if (argc != 5)
+	{
+		fprintf(stderr, "Usage: %s database_name in_filename out_filename out_filename2\n",
+				argv[0]);
+		exit(1);
+	}
+
+	database = argv[1];
+	in_filename = argv[2];
+	out_filename = argv[3];
+	out_filename2 = argv[4];
+
+	/*
+	 * set up the connection
+	 */
+	conn = PQsetdb(NULL, NULL, NULL, NULL, database);
+
+	/* check to see that the backend connection was successfully made */
+	if (PQstatus(conn) != CONNECTION_OK)
+	{
+		fprintf(stderr, "Connection to database failed: %s",
+				PQerrorMessage(conn));
+		exit_nicely(conn);
+	}
+
+	res = PQexec(conn, "begin");
+	PQclear(res);
+	printf("importing file \"%s\" ...\n", in_filename);
+/*	lobjOid = importFile(conn, in_filename); */
+	lobjOid = lo_import(conn, in_filename);
+	if (lobjOid == 0)
+		fprintf(stderr, "%s\n", PQerrorMessage(conn));
+	else
+	{
+		printf("\tas large object %u.\n", lobjOid);
+
+		printf("picking out bytes 4294967000-4294968000 of the large object\n");
+		pickout(conn, lobjOid, 4294967000ULL, 1000);
+
+		printf("overwriting bytes 4294967000-4294968000 of the large object with X's\n");
+		overwrite(conn, lobjOid, 4294967000ULL, 1000);
+
+
+		printf("exporting large object to file \"%s\" ...\n", out_filename);
+/*		exportFile(conn, lobjOid, out_filename); */
+		if (!lo_export(conn, lobjOid, out_filename))
+			fprintf(stderr, "%s\n", PQerrorMessage(conn));
+
+		printf("truncating to 3294968000 byte\n");
+		my_truncate(conn, lobjOid, 3294968000ULL);
+
+		printf("exporting truncated large object to file \"%s\" ...\n", out_filename2);
+		if (!lo_export(conn, lobjOid, out_filename2))
+			fprintf(stderr, "%s\n", PQerrorMessage(conn));
+
+	}
+
+	res = PQexec(conn, "end");
+	PQclear(res);
+	PQfinish(conn);
+	return 0;
+}
diff --git a/src/test/regress/input/largeobject.source b/src/test/regress/input/largeobject.source
index 40f40f8..4984d78 100644
--- a/src/test/regress/input/largeobject.source
+++ b/src/test/regress/input/largeobject.source
@@ -125,6 +125,29 @@ SELECT lo_tell(fd) FROM lotest_stash_values;
 SELECT lo_close(fd) FROM lotest_stash_values;
 END;
 
+-- Test 64-bit large object functions.
+BEGIN;
+UPDATE lotest_stash_values SET fd = lo_open(loid, CAST(x'20000' | x'40000' AS integer));
+
+SELECT lo_lseek64(fd, 4294967296, 0) FROM lotest_stash_values;
+SELECT lowrite(fd, 'offset:4GB') FROM lotest_stash_values;
+SELECT lo_tell64(fd) FROM lotest_stash_values;
+
+SELECT lo_lseek64(fd, -10, 1) FROM lotest_stash_values;
+SELECT lo_tell64(fd) FROM lotest_stash_values;
+SELECT loread(fd, 10) FROM lotest_stash_values;
+
+SELECT lo_truncate64(fd, 5000000000) FROM lotest_stash_values;
+SELECT lo_lseek64(fd, 0, 2) FROM lotest_stash_values;
+SELECT lo_tell64(fd) FROM lotest_stash_values;
+
+SELECT lo_truncate64(fd, 3000000000) FROM lotest_stash_values;
+SELECT lo_lseek64(fd, 0, 2) FROM lotest_stash_values;
+SELECT lo_tell64(fd) FROM lotest_stash_values;
+
+SELECT lo_close(fd) FROM lotest_stash_values;
+END;
+
 -- lo_unlink(lobjId oid) returns integer
 -- return value appears to always be 1
 SELECT lo_unlink(loid) from lotest_stash_values;
diff --git a/src/test/regress/output/largeobject.source b/src/test/regress/output/largeobject.source
index 55aaf8f..74c4772 100644
--- a/src/test/regress/output/largeobject.source
+++ b/src/test/regress/output/largeobject.source
@@ -210,6 +210,88 @@ SELECT lo_close(fd) FROM lotest_stash_values;
 (1 row)
 
 END;
+-- Test 64-bit large object functions.
+BEGIN;
+UPDATE lotest_stash_values SET fd = lo_open(loid, CAST(x'20000' | x'40000' AS integer));
+SELECT lo_lseek64(fd, 4294967296, 0) FROM lotest_stash_values;
+ lo_lseek64 
+------------
+ 4294967296
+(1 row)
+
+SELECT lowrite(fd, 'offset:4GB') FROM lotest_stash_values;
+ lowrite 
+---------
+      10
+(1 row)
+
+SELECT lo_tell64(fd) FROM lotest_stash_values;
+ lo_tell64  
+------------
+ 4294967306
+(1 row)
+
+SELECT lo_lseek64(fd, -10, 1) FROM lotest_stash_values;
+ lo_lseek64 
+------------
+ 4294967296
+(1 row)
+
+SELECT lo_tell64(fd) FROM lotest_stash_values;
+ lo_tell64  
+------------
+ 4294967296
+(1 row)
+
+SELECT loread(fd, 10) FROM lotest_stash_values;
+   loread   
+------------
+ offset:4GB
+(1 row)
+
+SELECT lo_truncate64(fd, 5000000000) FROM lotest_stash_values;
+ lo_truncate64 
+---------------
+             0
+(1 row)
+
+SELECT lo_lseek64(fd, 0, 2) FROM lotest_stash_values;
+ lo_lseek64 
+------------
+ 5000000000
+(1 row)
+
+SELECT lo_tell64(fd) FROM lotest_stash_values;
+ lo_tell64  
+------------
+ 5000000000
+(1 row)
+
+SELECT lo_truncate64(fd, 3000000000) FROM lotest_stash_values;
+ lo_truncate64 
+---------------
+             0
+(1 row)
+
+SELECT lo_lseek64(fd, 0, 2) FROM lotest_stash_values;
+ lo_lseek64 
+------------
+ 3000000000
+(1 row)
+
+SELECT lo_tell64(fd) FROM lotest_stash_values;
+ lo_tell64  
+------------
+ 3000000000
+(1 row)
+
+SELECT lo_close(fd) FROM lotest_stash_values;
+ lo_close 
+----------
+        0
+(1 row)
+
+END;
 -- lo_unlink(lobjId oid) returns integer
 -- return value appears to always be 1
 SELECT lo_unlink(loid) from lotest_stash_values;
#37Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Nozomi Anzai (#36)
Re: 64-bit API for large object

I checked this patch. It looks good, but here are still some points to be
discussed.

* I have a question. What is the meaning of INT64_IS_BUSTED?
It seems to me a marker to indicate a platform without 64bit support.
However, the commit 901be0fad4034c9cf8a3588fd6cf2ece82e4b8ce
says as follows:
| Remove all the special-case code for INT64_IS_BUSTED, per decision that
| we're not going to support that anymore.

* At inv_seek(), it seems to me the offset validity check is done the wrong
way, as follows:
|     case SEEK_SET:
|         if (offset < 0)
|             elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
|         obj_desc->offset = offset;
|         break;
That would be right if the large object size were restricted to 2GB, but the
largest positive int64 is larger than the intended limitation.
So, it seems to me the offset should be compared with (INT_MAX * PAGE_SIZE)
instead.

* At inv_write(), it definitely needs a check to prevent writes past the 4TB
boundary. If obj_desc->offset is a bit below 4TB, an additional 1GB write
will corrupt the head of the large object because of "pageno" overflow
(see the sketch after these review points).

* Please also add checks in inv_read() to prevent LargeObjectDesc->offset
from unexpectedly overflowing the 4TB boundary.

* At inv_truncate(), the variable "off" is redefined as int64. Is that change
really needed? Its only use is to store the result of "len % LOBLKSIZE".
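
To illustrate the inv_write() point above, a sketch for discussion only (the
LOBLKSIZE value and the int32 page-number computation mirror inv_api.c; the
helper function itself is hypothetical):

#include <limits.h>
#include <stdint.h>

#define LOBLKSIZE 2048			/* default: BLCKSZ/4 with 8kB blocks */

/*
 * pageno is an int32, so the largest addressable byte offset is
 * INT_MAX * LOBLKSIZE = 2147483647 * 2048 bytes, roughly 4TB.  If the
 * current offset sits just below that boundary and another 1GB is
 * written, offset / LOBLKSIZE exceeds INT_MAX and the cast wraps, so
 * the new data lands on low-numbered pages and overwrites the head of
 * the large object.
 */
static int32_t
lo_pageno(int64_t offset)
{
	return (int32_t) (offset / LOBLKSIZE);
}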

Thanks,

2012/9/24 Nozomi Anzai <anzai@sraoss.co.jp>:

Here is 64-bit API for large object version 2 patch.

I checked this patch. It can be applied onto the latest master branch
without any problems. My comments are below.

2012/9/11 Tatsuo Ishii <ishii@postgresql.org>:

Ok, here is the patch to implement the 64-bit API for large objects, to
allow use of up to 4TB large objects (or 16TB if BLCKSZ is changed to
32KB). The patch is based on Jeremy Drake's patch posted on September
23, 2005
(http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php)
and reasonably updated/edited to adapt it to PostgreSQL 9.3 by Nozomi Anzai
for the backend part and Yugo Nagata for the rest (including the
documentation patch).

Here are changes made in the patch:

1) Frontend lo_* libpq functions (fe-lobj.c) (Yugo Nagata)

lo_initialize() gathers the OIDs of the backend's 64-bit large object
handling functions, namely lo_lseek64, lo_tell64 and lo_truncate64.

If a client calls the lo_*64 functions and the backend does not support
them, the lo_*64 functions return an error to the caller. There might be an
argument that calls to the lo_*64 functions could automatically be redirected
to the older 32-bit API. I don't know whether that is worth the trouble,
though.

I think it should definitely return an error code when the user tries to
use the lo_*64 functions against a v9.2 or older backend, because falling
back to the 32-bit API can raise unexpected errors if the application
intends to seek into the area beyond 2GB.

Currently lo_initialize() throws an error if one of the OIDs is not
available. I doubt we should do the same for the 64-bit functions, since this
would make the 9.3 libpq unable to access large objects stored in pre-9.2
PostgreSQL servers.

It seems to me this is the situation to split the pre-9.2 and post-9.3 cases
using a condition of "conn->sversion >= 90300".

Fixed it that way, and tested it by deleting the lo_tell64 row from pg_proc.

To pass a 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr
is a pointer to the 64-bit integer and the actual data is placed somewhere
else. There might be another way: add a new member to union u to store a
64-bit integer:

typedef struct
{
	int			len;
	int			isint;
	union
	{
		int		   *ptr;		/* can't use void (dec compiler barfs) */
		int			integer;
		int64		bigint;		/* 64-bit integer */
	}			u;
} PQArgBlock;

I'm a little bit worried about this way because PQArgBlock is a public
interface.

I'm inclined to add a new field to the union; that seems to me the
straightforward approach.
For example, the manner of lo_lseek64() seems confusing to me.
It sets 1 in the "isint" field even though a pointer is actually delivered.

+       argv[1].isint = 1;
+       argv[1].len = 8;
+       argv[1].u.ptr = (int *) &len;

Your proposal was not adopted per discussion.

Also we add new type "pg_int64":

#ifndef NO_PG_INT64
#define HAVE_PG_INT64 1
typedef long long int pg_int64;
#endif

in postgres_ext.h per suggestion from Tom Lane:
http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php

I'm uncertain about the context of this discussion.

Does it matter if we include <stdint.h> and use int64_t instead
of the self-defined data type?

Your proposal was not adopted per discussion.
Per discussion, endianness translation was moved to fe-lobj.c.

2) Backend lo_* functions (be-fsstubs.c) (Nozomi Anzai)

Add lo_lseek64, lo_tell64, lo_truncate64 so that they can handle 64-bit
seek positions and data lengths. loread64 and lowrite64 are not added
because if a program tries to read/write more than 2GB at once, it would
be a sign that the program needs to be re-designed anyway.

I think that is reasonable.

3) Backend inv_api.c functions (Nozomi Anzai)

No need to add new functions. Just extend them to handle 64-bit data.

BTW, what will happen if an older 32-bit libpq accesses large objects
over 2GB?

lo_read and lo_write: they can read or write lobjs using the 32-bit API as
long as the requested read/write data length is smaller than 2GB. So I
think we can safely allow them to access lobjs over 2GB.

lo_lseek: again, as long as the requested offset is smaller than 2GB, there
would be no problem.

lo_tell: if the current seek position is beyond 2GB, it returns an error.

Even though iterating lo_lseek() may move the offset up to 4TB, that also
makes it impossible to use lo_tell() to obtain the current offset, so I think
it is reasonable behavior.

However, the error code is not an appropriate one.

+       if (INT_MAX < offset)
+       {
+               ereport(ERROR,
+                               (errcode(ERRCODE_UNDEFINED_OBJECT),
+                                errmsg("invalid large-object descriptor: %d", fd)));
+               PG_RETURN_INT32(-1);
+       }

According to the manpage of lseek(2):
    EOVERFLOW      The resulting file offset cannot be represented in an off_t.

Please add a new error code such as ERRCODE_BLOB_OFFSET_OVERFLOW.

Changed the error code and error message. We added a new error code,
"ERRCODE_BLOB_OFFSET_OVERFLOW" (22P07).

4) src/test/examples/testlo64.c added as a 64-bit API example (Yugo Nagata)

Comments and suggestions are welcome.

Miscellaneous comments are below.

A regression test is helpful. Even though there is no need to try to create a
4TB large object, it is helpful to write some chunks around the design
boundary. Could you add some test cases that write some chunks around the 4TB
offset?

Added 64-bit lobj test items to the regression test and confirmed they worked
correctly.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


--
Nozomi Anzai
SRA OSS, Inc. Japan


--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#38Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Kohei KaiGai (#37)
Re: 64-bit API for large object

Excerpts from Kohei KaiGai's message of jue sep 27 01:01:18 -0300 2012:

* I have a question. What is the meaning of INT64_IS_BUSTED?
It seems to me a marker to indicate a platform without 64bit support.
However, the commit 901be0fad4034c9cf8a3588fd6cf2ece82e4b8ce
says as follows:
| Remove all the special-case code for INT64_IS_BUSTED, per decision that
| we're not going to support that anymore.

Yeah, I think we should just get rid of those bits. I don't remember
seeing *any* complaint when INT64_IS_BUSTED was removed, which means
nobody was using that code anyway.

Now there is one more problem in this area which is that the patch
defined a new type pg_int64 for frontend code (postgres_ext.h). This
seems a bad idea to me. We already have int64 defined in c.h. Should
we expose int64 to postgres_ext.h somehow? Should we use standard-
mandated int64_t instead? One way would be to have a new configure
check for int64_t, and if that type doesn't exist, then just don't
provide the 64 bit functionality to frontend.
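
For illustration only, a minimal sketch of what postgres_ext.h could look like
with such a check (HAVE_INT64_T is a hypothetical symbol assumed to be set by
configure, not an existing one):

/* Hypothetical sketch -- not the actual patch or configure machinery. */
#ifdef HAVE_INT64_T
#include <stdint.h>

#define HAVE_PG_INT64 1
typedef int64_t pg_int64;
#else
/*
 * No 64-bit integer type is known: leave HAVE_PG_INT64 undefined so that
 * libpq-fe.h compiles out the lo_lseek64/lo_tell64/lo_truncate64
 * declarations, which the patch already guards with #ifdef HAVE_PG_INT64.
 */
#endif

That would keep the 64-bit entry points out of the frontend API on platforms
without a suitable type, at the cost of clients having to test HAVE_PG_INT64
at compile time.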

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#39Tatsuo Ishii
ishii@postgresql.org
In reply to: Alvaro Herrera (#38)
Re: 64-bit API for large object

Excerpts from Kohei KaiGai's message of jue sep 27 01:01:18 -0300 2012:

* I have a question. What is the meaning of INT64_IS_BUSTED?
It seems to me a marker to indicate a platform without 64bit support.
However, the commit 901be0fad4034c9cf8a3588fd6cf2ece82e4b8ce
says as follows:
| Remove all the special-case code for INT64_IS_BUSTED, per decision that
| we're not going to support that anymore.

Yeah, I think we should just get rid of those bits. I don't remember
seeing *any* complaint when INT64_IS_BUSTED was removed, which means
nobody was using that code anyway.

Ok.

Now there is one more problem in this area which is that the patch
defined a new type pg_int64 for frontend code (postgres_ext.h). This
seems a bad idea to me. We already have int64 defined in c.h. Should
we expose int64 to postgres_ext.h somehow? Should we use standard-
mandated int64_t instead? One way would be to have a new configure
check for int64_t, and if that type doesn't exist, then just don't
provide the 64 bit functionality to frontend.

This has already been explained upthread:
http://archives.postgresql.org/pgsql-hackers/2012-09/msg00447.php
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#40Tatsuo Ishii
ishii@postgresql.org
In reply to: Kohei KaiGai (#37)
Re: 64-bit API for large object

KaiGai-san,

Thank you for review.

I checked this patch. It looks good, but here are still some points to be
discussed.

* I have a question. What is the meaning of INT64_IS_BUSTED?
It seems to me a marker to indicate a platform without 64bit support.
However, the commit 901be0fad4034c9cf8a3588fd6cf2ece82e4b8ce
says as follows:
| Remove all the special-case code for INT64_IS_BUSTED, per decision that
| we're not going to support that anymore.

Agreed.

* At inv_seek(), it seems to me the offset validity check is done the wrong
way, as follows:
|     case SEEK_SET:
|         if (offset < 0)
|             elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
|         obj_desc->offset = offset;
|         break;
That would be right if the large object size were restricted to 2GB, but the
largest positive int64 is larger than the intended limitation.
So, it seems to me the offset should be compared with (INT_MAX * PAGE_SIZE)
instead.

Point taken. However, checking offset < 0 still seems valid because it is
possible to pass a negative offset to inv_seek(), no? Also, I think the upper
limit for the seek position should be defined as (INT_MAX * LOBLKSIZE) rather
than (INT_MAX * PAGE_SIZE). (INT_MAX * LOBLKSIZE) should probably be defined
in pg_largeobject.h as:

/*
 * Maximum byte length for each large object
 */
#define MAX_LARGE_OBJECT_SIZE	INT64CONST(INT_MAX * LOBLKSIZE)

Then the offset check in inv_seek() will be:

	case SEEK_SET:
		if (offset < 0 || offset >= MAX_LARGE_OBJECT_SIZE)
			elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
		obj_desc->offset = offset;
		break;
	case SEEK_CUR:
		if ((offset + obj_desc->offset) < 0 ||
			(offset + obj_desc->offset) >= MAX_LARGE_OBJECT_SIZE)
			elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
		obj_desc->offset += offset;
		break;
	case SEEK_END:
		{
			int64		pos = inv_getsize(obj_desc) + offset;

			if (pos < 0 || pos >= MAX_LARGE_OBJECT_SIZE)
				elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
			obj_desc->offset = pos;
		}

What do you think?

* At inv_write(), it definitely needs a check to prevent writes past the 4TB
boundary. If obj_desc->offset is a bit below 4TB, an additional 1GB write
will corrupt the head of the large object because of "pageno" overflow.

Ok. I will add this check:

	if ((nbytes + obj_desc->offset) > MAX_LARGE_OBJECT_SIZE)
		elog(ERROR, "invalid write request size: %d", nbytes);

* Please also add checks in inv_read() to prevent LargeObjectDesc->offset
from unexpectedly overflowing the 4TB boundary.

Ok. I will add this check:

	if ((nbytes + obj_desc->offset) > MAX_LARGE_OBJECT_SIZE)
		elog(ERROR, "invalid read request size: %d", nbytes);

* At inv_truncate(), the variable "off" is redefined as int64. Is that change
really needed? Its only use is to store the result of "len % LOBLKSIZE".

Your point is correct. Back to int32.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


#41Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Tatsuo Ishii (#40)
Re: 64-bit API for large object

2012/9/30 Tatsuo Ishii <ishii@postgresql.org>:

* At inv_seek(), it seems to me the offset validity check is done the wrong
way, as follows:
|     case SEEK_SET:
|         if (offset < 0)
|             elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
|         obj_desc->offset = offset;
|         break;
That would be right if the large object size were restricted to 2GB, but the
largest positive int64 is larger than the intended limitation.
So, it seems to me the offset should be compared with (INT_MAX * PAGE_SIZE)
instead.

Point taken. However, checking offset < 0 still seems valid because it is
possible to pass a negative offset to inv_seek(), no? Also, I think the upper
limit for the seek position should be defined as (INT_MAX * LOBLKSIZE) rather
than (INT_MAX * PAGE_SIZE). (INT_MAX * LOBLKSIZE) should probably be defined
in pg_largeobject.h as:

/*
 * Maximum byte length for each large object
 */
#define MAX_LARGE_OBJECT_SIZE	INT64CONST(INT_MAX * LOBLKSIZE)

Then the offset check in inv_seek() will be:

	case SEEK_SET:
		if (offset < 0 || offset >= MAX_LARGE_OBJECT_SIZE)
			elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
		obj_desc->offset = offset;
		break;
	case SEEK_CUR:
		if ((offset + obj_desc->offset) < 0 ||
			(offset + obj_desc->offset) >= MAX_LARGE_OBJECT_SIZE)
			elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
		obj_desc->offset += offset;
		break;
	case SEEK_END:
		{
			int64		pos = inv_getsize(obj_desc) + offset;

			if (pos < 0 || pos >= MAX_LARGE_OBJECT_SIZE)
				elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
			obj_desc->offset = pos;
		}

What do you think?

Yes, that is exactly what I expected. Indeed, a check is still needed here to
prevent a negative offset.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#42Nozomi Anzai
anzai@sraoss.co.jp
In reply to: Kohei KaiGai (#37)
1 attachment(s)
Re: 64-bit API for large object

Here is 64-bit API for large object version 3 patch.

I checked this patch. It looks good, but here are still some points to be
discussed.

* I have a question. What is the meaning of INT64_IS_BUSTED?
It seems to me a marker to indicate a platform without 64bit support.
However, the commit 901be0fad4034c9cf8a3588fd6cf2ece82e4b8ce
says as follows:
| Remove all the special-case code for INT64_IS_BUSTED, per decision that
| we're not going to support that anymore.

Removed INT64_IS_BUSTED.

* At inv_seek(), it seems to me the offset validity check is done the wrong
way, as follows:
|     case SEEK_SET:
|         if (offset < 0)
|             elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
|         obj_desc->offset = offset;
|         break;
That would be right if the large object size were restricted to 2GB, but the
largest positive int64 is larger than the intended limitation.
So, it seems to me the offset should be compared with (INT_MAX * PAGE_SIZE)
instead.

Fixed.

* At inv_write(), it definitely needs a check to prevent writes past the 4TB
boundary. If obj_desc->offset is a bit below 4TB, an additional 1GB write
will corrupt the head of the large object because of "pageno" overflow.

Added such a check.

* Please also add checks in inv_read() to prevent LargeObjectDesc->offset
from unexpectedly overflowing the 4TB boundary.

Added such a check.

* At inv_truncate(), the variable "off" is redefined as int64. Is that change
really needed? Its only use is to store the result of "len % LOBLKSIZE".

Fixed and back to int32.


--
Nozomi Anzai
SRA OSS, Inc. Japan

Attachments:

lobj64-v3.patch (application/octet-stream)
diff --git a/doc/src/sgml/lobj.sgml b/doc/src/sgml/lobj.sgml
index 291409f..cd8eb44 100644
--- a/doc/src/sgml/lobj.sgml
+++ b/doc/src/sgml/lobj.sgml
@@ -41,7 +41,7 @@
     larger than a single database page into a secondary storage area per table.
     This makes the large object facility partially obsolete.  One
     remaining advantage of the large object facility is that it allows values
-    up to 2 GB in size, whereas <acronym>TOAST</acronym>ed fields can be at
+    up to 4 TB in size, whereas <acronym>TOAST</acronym>ed fields can be at
     most 1 GB.  Also, large objects can be randomly modified using a read/write
     API that is more efficient than performing such operations using
     <acronym>TOAST</acronym>.
@@ -312,6 +312,7 @@ int lo_read(PGconn *conn, int fd, char *buf, size_t len);
      large object descriptor, call
 <synopsis>
 int lo_lseek(PGconn *conn, int fd, int offset, int whence);
+pg_int64 lo_lseek64(PGconn *conn, int fd, pg_int64 offset, int whence);
 </synopsis>
      <indexterm><primary>lo_lseek</></> This function moves the
      current location pointer for the large object descriptor identified by
@@ -321,6 +322,9 @@ int lo_lseek(PGconn *conn, int fd, int offset, int whence);
      <symbol>SEEK_CUR</> (seek from current position), and
      <symbol>SEEK_END</> (seek from object end).  The return value is
      the new location pointer, or -1 on error.
+     <indexterm><primary>lo_lseek64</></> <function>lo_lseek64</function>
+     is a function for large objects larger than 2GB. <symbol>pg_int64</>
+     is defined as 8-byte integer type.
 </para>
 </sect2>
 
@@ -332,9 +336,12 @@ int lo_lseek(PGconn *conn, int fd, int offset, int whence);
      call
 <synopsis>
 int lo_tell(PGconn *conn, int fd);
+pg_int64 lo_tell64(PGconn *conn, int fd);
 </synopsis>
      <indexterm><primary>lo_tell</></> If there is an error, the
      return value is negative.
+     <indexterm><primary>lo_tell64</></> <function>lo_tell64</function> is
+     a function for large objects larger than 2GB.
 </para>
 </sect2>
 
@@ -345,6 +352,7 @@ int lo_tell(PGconn *conn, int fd);
      To truncate a large object to a given length, call
 <synopsis>
 int lo_truncate(PGcon *conn, int fd, size_t len);
+int lo_truncate64(PGcon *conn, int fd, pg_int64 len);
 </synopsis>
      <indexterm><primary>lo_truncate</></> truncates the large object
      descriptor <parameter>fd</> to length <parameter>len</>.  The
@@ -352,6 +360,8 @@ int lo_truncate(PGcon *conn, int fd, size_t len);
      previous <function>lo_open</function>.  If <parameter>len</> is
      greater than the current large object length, the large object
      is extended with null bytes ('\0').
+     <indexterm><primary>lo_truncate64</></> <function>lo_truncate64</function>
+     is a function for large objects larger than 2GB.
 </para>
 
 <para>
diff --git a/src/backend/libpq/be-fsstubs.c b/src/backend/libpq/be-fsstubs.c
index 6f7e474..4bc81ba 100644
--- a/src/backend/libpq/be-fsstubs.c
+++ b/src/backend/libpq/be-fsstubs.c
@@ -39,6 +39,7 @@
 #include "postgres.h"
 
 #include <fcntl.h>
+#include <limits.h>
 #include <sys/stat.h>
 #include <unistd.h>
 
@@ -216,7 +217,7 @@ lo_lseek(PG_FUNCTION_ARGS)
 	int32		fd = PG_GETARG_INT32(0);
 	int32		offset = PG_GETARG_INT32(1);
 	int32		whence = PG_GETARG_INT32(2);
-	int			status;
+	int64		status;
 
 	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
 		ereport(ERROR,
@@ -225,9 +226,45 @@ lo_lseek(PG_FUNCTION_ARGS)
 
 	status = inv_seek(cookies[fd], offset, whence);
 
+	if (INT_MAX < status)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_BLOB_OFFSET_OVERFLOW),
+				 errmsg("offset overflow: %d", fd)));
+		PG_RETURN_INT32(-1);
+	}
+
 	PG_RETURN_INT32(status);
 }
 
+
+Datum
+lo_lseek64(PG_FUNCTION_ARGS)
+{
+	int32		fd = PG_GETARG_INT32(0);
+	int64		offset = PG_GETARG_INT64(1);
+	int32		whence = PG_GETARG_INT32(2);
+	MemoryContext currentContext;
+	int64			status;
+
+	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("invalid large-object descriptor: %d", fd)));
+		PG_RETURN_INT64(-1);
+	}
+
+	Assert(fscxt != NULL);
+	currentContext = MemoryContextSwitchTo(fscxt);
+
+	status = inv_seek(cookies[fd], offset, whence);
+
+	MemoryContextSwitchTo(currentContext);
+
+	PG_RETURN_INT64(status);
+}
+
 Datum
 lo_creat(PG_FUNCTION_ARGS)
 {
@@ -264,13 +301,46 @@ Datum
 lo_tell(PG_FUNCTION_ARGS)
 {
 	int32		fd = PG_GETARG_INT32(0);
+	int64		offset = 0;
+
+	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("invalid large-object descriptor: %d", fd)));
+
+	offset = inv_tell(cookies[fd]);
+
+	if (INT_MAX < offset)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_BLOB_OFFSET_OVERFLOW),
+				 errmsg("offset overflow: %d", fd)));
+		PG_RETURN_INT32(-1);
+	}
+
+	PG_RETURN_INT32(offset);
+}
+
+
+Datum
+lo_tell64(PG_FUNCTION_ARGS)
+{
+	int32		fd = PG_GETARG_INT32(0);
 
 	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
+	{
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_OBJECT),
 				 errmsg("invalid large-object descriptor: %d", fd)));
+		PG_RETURN_INT64(-1);
+	}
 
-	PG_RETURN_INT32(inv_tell(cookies[fd]));
+	/*
+	 * We assume we do not need to switch contexts for inv_tell. That is
+	 * true for now, but is probably more than this module ought to
+	 * assume...
+	 */
+	PG_RETURN_INT64(inv_tell(cookies[fd]));
 }
 
 Datum
@@ -533,6 +603,33 @@ lo_truncate(PG_FUNCTION_ARGS)
 	PG_RETURN_INT32(0);
 }
 
+Datum
+lo_truncate64(PG_FUNCTION_ARGS)
+{
+	int32		fd = PG_GETARG_INT32(0);
+	int64		len = PG_GETARG_INT64(1);
+
+	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("invalid large-object descriptor: %d", fd)));
+
+	/* Permission checks */
+	if (!lo_compat_privileges &&
+		pg_largeobject_aclcheck_snapshot(cookies[fd]->id,
+										 GetUserId(),
+										 ACL_UPDATE,
+									   cookies[fd]->snapshot) != ACLCHECK_OK)
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied for large object %u",
+						cookies[fd]->id)));
+
+	inv_truncate(cookies[fd], len);
+
+	PG_RETURN_INT32(0);
+}
+
 /*
  * AtEOXact_LargeObject -
  *		 prepares large objects for transaction commit
diff --git a/src/backend/storage/large_object/inv_api.c b/src/backend/storage/large_object/inv_api.c
index 3adfb15..3f5688b 100644
--- a/src/backend/storage/large_object/inv_api.c
+++ b/src/backend/storage/large_object/inv_api.c
@@ -324,10 +324,10 @@ inv_drop(Oid lobjId)
  * NOTE: LOs can contain gaps, just like Unix files.  We actually return
  * the offset of the last byte + 1.
  */
-static uint32
+static uint64
 inv_getsize(LargeObjectDesc *obj_desc)
 {
-	uint32		lastbyte = 0;
+	uint64		lastbyte = 0;
 	ScanKeyData skey[1];
 	SysScanDesc sd;
 	HeapTuple	tuple;
@@ -368,7 +368,7 @@ inv_getsize(LargeObjectDesc *obj_desc)
 				heap_tuple_untoast_attr((struct varlena *) datafield);
 			pfreeit = true;
 		}
-		lastbyte = data->pageno * LOBLKSIZE + getbytealen(datafield);
+		lastbyte = (uint64) data->pageno * LOBLKSIZE + getbytealen(datafield);
 		if (pfreeit)
 			pfree(datafield);
 	}
@@ -378,30 +378,31 @@ inv_getsize(LargeObjectDesc *obj_desc)
 	return lastbyte;
 }
 
-int
-inv_seek(LargeObjectDesc *obj_desc, int offset, int whence)
+int64
+inv_seek(LargeObjectDesc *obj_desc, int64 offset, int whence)
 {
 	Assert(PointerIsValid(obj_desc));
 
 	switch (whence)
 	{
 		case SEEK_SET:
-			if (offset < 0)
-				elog(ERROR, "invalid seek offset: %d", offset);
+			if (offset < 0 || offset >= MAX_LARGE_OBJECT_SIZE)
+				elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
 			obj_desc->offset = offset;
 			break;
 		case SEEK_CUR:
-			if (offset < 0 && obj_desc->offset < ((uint32) (-offset)))
-				elog(ERROR, "invalid seek offset: %d", offset);
+			if ((offset + obj_desc->offset) < 0 ||
+			   (offset + obj_desc->offset) >= MAX_LARGE_OBJECT_SIZE)
+				elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
 			obj_desc->offset += offset;
 			break;
 		case SEEK_END:
 			{
-				uint32		size = inv_getsize(obj_desc);
+				int64		pos = inv_getsize(obj_desc) + offset;
 
-				if (offset < 0 && size < ((uint32) (-offset)))
-					elog(ERROR, "invalid seek offset: %d", offset);
-				obj_desc->offset = size + offset;
+				if (pos < 0 || pos >= MAX_LARGE_OBJECT_SIZE)
+					elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
+				obj_desc->offset = pos;
 			}
 			break;
 		default:
@@ -410,7 +411,7 @@ inv_seek(LargeObjectDesc *obj_desc, int offset, int whence)
 	return obj_desc->offset;
 }
 
-int
+int64
 inv_tell(LargeObjectDesc *obj_desc)
 {
 	Assert(PointerIsValid(obj_desc));
@@ -422,11 +423,11 @@ int
 inv_read(LargeObjectDesc *obj_desc, char *buf, int nbytes)
 {
 	int			nread = 0;
-	int			n;
-	int			off;
+	int64		n;
+	int64		off;
 	int			len;
 	int32		pageno = (int32) (obj_desc->offset / LOBLKSIZE);
-	uint32		pageoff;
+	uint64		pageoff;
 	ScanKeyData skey[2];
 	SysScanDesc sd;
 	HeapTuple	tuple;
@@ -437,6 +438,9 @@ inv_read(LargeObjectDesc *obj_desc, char *buf, int nbytes)
 	if (nbytes <= 0)
 		return 0;
 
+	if ((nbytes + obj_desc->offset) > MAX_LARGE_OBJECT_SIZE)
+		elog(ERROR, "invalid read request size: %d", nbytes);
+
 	open_lo_relation();
 
 	ScanKeyInit(&skey[0],
@@ -467,7 +471,7 @@ inv_read(LargeObjectDesc *obj_desc, char *buf, int nbytes)
 		 * there may be missing pages if the LO contains unwritten "holes". We
 		 * want missing sections to read out as zeroes.
 		 */
-		pageoff = ((uint32) data->pageno) * LOBLKSIZE;
+		pageoff = ((uint64) data->pageno) * LOBLKSIZE;
 		if (pageoff > obj_desc->offset)
 		{
 			n = pageoff - obj_desc->offset;
@@ -560,6 +564,9 @@ inv_write(LargeObjectDesc *obj_desc, const char *buf, int nbytes)
 	if (nbytes <= 0)
 		return 0;
 
+	if ((nbytes + obj_desc->offset) > MAX_LARGE_OBJECT_SIZE)
+		elog(ERROR, "invalid write request size: %d", nbytes);
+
 	open_lo_relation();
 
 	indstate = CatalogOpenIndexes(lo_heap_r);
@@ -718,10 +725,10 @@ inv_write(LargeObjectDesc *obj_desc, const char *buf, int nbytes)
 }
 
 void
-inv_truncate(LargeObjectDesc *obj_desc, int len)
+inv_truncate(LargeObjectDesc *obj_desc, int64 len)
 {
 	int32		pageno = (int32) (len / LOBLKSIZE);
-	int			off;
+	int32		off;
 	ScanKeyData skey[2];
 	SysScanDesc sd;
 	HeapTuple	oldtuple;
diff --git a/src/backend/utils/errcodes.txt b/src/backend/utils/errcodes.txt
index 3e04164..db8ab53 100644
--- a/src/backend/utils/errcodes.txt
+++ b/src/backend/utils/errcodes.txt
@@ -199,6 +199,7 @@ Section: Class 22 - Data Exception
 2200N    E    ERRCODE_INVALID_XML_CONTENT                                    invalid_xml_content
 2200S    E    ERRCODE_INVALID_XML_COMMENT                                    invalid_xml_comment
 2200T    E    ERRCODE_INVALID_XML_PROCESSING_INSTRUCTION                     invalid_xml_processing_instruction
+22P07    E    ERRCODE_BLOB_OFFSET_OVERFLOW                                   blob_offset_overflow
 
 Section: Class 23 - Integrity Constraint Violation
 
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 77a3b41..a2da836 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -1040,14 +1040,20 @@ DATA(insert OID = 955 (  lowrite		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23
 DESCR("large object write");
 DATA(insert OID = 956 (  lo_lseek		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 23 "23 23 23" _null_ _null_ _null_ _null_	lo_lseek _null_ _null_ _null_ ));
 DESCR("large object seek");
+DATA(insert OID = 3170 (  lo_lseek64		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 20 "23 20 23" _null_ _null_ _null_ _null_	lo_lseek64 _null_ _null_ _null_ ));
+DESCR("large object seek (64 bit)");
 DATA(insert OID = 957 (  lo_creat		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 26 "23" _null_ _null_ _null_ _null_ lo_creat _null_ _null_ _null_ ));
 DESCR("large object create");
 DATA(insert OID = 715 (  lo_create		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 26 "26" _null_ _null_ _null_ _null_ lo_create _null_ _null_ _null_ ));
 DESCR("large object create");
 DATA(insert OID = 958 (  lo_tell		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 23 "23" _null_ _null_ _null_ _null_ lo_tell _null_ _null_ _null_ ));
 DESCR("large object position");
+DATA(insert OID = 3171 (  lo_tell64		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 20 "23" _null_ _null_ _null_ _null_ lo_tell64 _null_ _null_ _null_ ));
+DESCR("large object position (64 bit)");
 DATA(insert OID = 1004 (  lo_truncate	   PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "23 23" _null_ _null_ _null_ _null_ lo_truncate _null_ _null_ _null_ ));
 DESCR("truncate large object");
+DATA(insert OID = 3172 (  lo_truncate64	   PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "23 20" _null_ _null_ _null_ _null_ lo_truncate64 _null_ _null_ _null_ ));
+DESCR("truncate large object (64 bit)");
 
 DATA(insert OID = 959 (  on_pl			   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "600 628" _null_ _null_ _null_ _null_	on_pl _null_ _null_ _null_ ));
 DATA(insert OID = 960 (  on_sl			   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "601 628" _null_ _null_ _null_ _null_	on_sl _null_ _null_ _null_ ));
diff --git a/src/include/libpq/be-fsstubs.h b/src/include/libpq/be-fsstubs.h
index 0c832da..d74ea0e 100644
--- a/src/include/libpq/be-fsstubs.h
+++ b/src/include/libpq/be-fsstubs.h
@@ -34,8 +34,11 @@ extern Datum lowrite(PG_FUNCTION_ARGS);
 
 extern Datum lo_lseek(PG_FUNCTION_ARGS);
 extern Datum lo_tell(PG_FUNCTION_ARGS);
+extern Datum lo_lseek64(PG_FUNCTION_ARGS);
+extern Datum lo_tell64(PG_FUNCTION_ARGS);
 extern Datum lo_unlink(PG_FUNCTION_ARGS);
 extern Datum lo_truncate(PG_FUNCTION_ARGS);
+extern Datum lo_truncate64(PG_FUNCTION_ARGS);
 
 /*
  * compatibility option for access control
diff --git a/src/include/postgres_ext.h b/src/include/postgres_ext.h
index b6ebb7a..76502de 100644
--- a/src/include/postgres_ext.h
+++ b/src/include/postgres_ext.h
@@ -56,4 +56,9 @@ typedef unsigned int Oid;
 #define PG_DIAG_SOURCE_LINE		'L'
 #define PG_DIAG_SOURCE_FUNCTION 'R'
 
+#ifndef NO_PG_INT64
+#define HAVE_PG_INT64 1
+typedef long long int pg_int64;
+#endif
+
 #endif
diff --git a/src/include/storage/large_object.h b/src/include/storage/large_object.h
index 1fe07ee..52f01c6 100644
--- a/src/include/storage/large_object.h
+++ b/src/include/storage/large_object.h
@@ -37,7 +37,7 @@ typedef struct LargeObjectDesc
 	Oid			id;				/* LO's identifier */
 	Snapshot	snapshot;		/* snapshot to use */
 	SubTransactionId subid;		/* owning subtransaction ID */
-	uint32		offset;			/* current seek pointer */
+	uint64		offset;			/* current seek pointer */
 	int			flags;			/* locking info, etc */
 
 /* flag bits: */
@@ -62,7 +62,10 @@ typedef struct LargeObjectDesc
  * This avoids unnecessary tuple updates caused by partial-page writes.
  */
 #define LOBLKSIZE		(BLCKSZ / 4)
-
+/*
+ * Maximum byte length for each large object
+*/
+#define MAX_LARGE_OBJECT_SIZE	INT64CONST(INT_MAX * LOBLKSIZE)
 
 /*
  * Function definitions...
@@ -74,10 +77,10 @@ extern Oid	inv_create(Oid lobjId);
 extern LargeObjectDesc *inv_open(Oid lobjId, int flags, MemoryContext mcxt);
 extern void inv_close(LargeObjectDesc *obj_desc);
 extern int	inv_drop(Oid lobjId);
-extern int	inv_seek(LargeObjectDesc *obj_desc, int offset, int whence);
-extern int	inv_tell(LargeObjectDesc *obj_desc);
+extern int64	inv_seek(LargeObjectDesc *obj_desc, int64 offset, int whence);
+extern int64	inv_tell(LargeObjectDesc *obj_desc);
 extern int	inv_read(LargeObjectDesc *obj_desc, char *buf, int nbytes);
 extern int	inv_write(LargeObjectDesc *obj_desc, const char *buf, int nbytes);
-extern void inv_truncate(LargeObjectDesc *obj_desc, int len);
+extern void inv_truncate(LargeObjectDesc *obj_desc, int64 len);
 
 #endif   /* LARGE_OBJECT_H */
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 9d95e26..56d0bb8 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -161,3 +161,6 @@ PQping                    158
 PQpingParams              159
 PQlibVersion              160
 PQsetSingleRowMode        161
+lo_lseek64                162
+lo_tell64                 163
+lo_truncate64             164
diff --git a/src/interfaces/libpq/fe-lobj.c b/src/interfaces/libpq/fe-lobj.c
index f3a6d03..fb17ac8 100644
--- a/src/interfaces/libpq/fe-lobj.c
+++ b/src/interfaces/libpq/fe-lobj.c
@@ -37,10 +37,16 @@
 #include "libpq-int.h"
 #include "libpq/libpq-fs.h"		/* must come after sys/stat.h */
 
+/* for ntohl/htonl */
+#include <netinet/in.h>
+#include <arpa/inet.h>
+
 #define LO_BUFSIZE		  8192
 
 static int	lo_initialize(PGconn *conn);
 static Oid	lo_import_internal(PGconn *conn, const char *filename, Oid oid);
+static pg_int64	lo_hton64(pg_int64 host64);
+static pg_int64	lo_ntoh64(pg_int64 net64);
 
 /*
  * lo_open
@@ -174,6 +180,59 @@ lo_truncate(PGconn *conn, int fd, size_t len)
 	}
 }
 
+/*
+ * lo_truncate64
+ *	  truncates an existing large object to the given size
+ *
+ * returns 0 upon success
+ * returns -1 upon failure
+ */
+#ifdef HAVE_PG_INT64
+int
+lo_truncate64(PGconn *conn, int fd, pg_int64 len)
+{
+	PQArgBlock	argv[2];
+	PGresult   *res;
+	int			retval;
+	int			result_len;
+
+	if (conn == NULL || conn->lobjfuncs == NULL)
+	{
+		if (lo_initialize(conn) < 0)
+			return -1;
+	}
+
+	if (conn->lobjfuncs->fn_lo_truncate64 == 0)
+	{
+		printfPQExpBuffer(&conn->errorMessage,
+			libpq_gettext("cannot determine OID of function lo_truncate64\n"));
+		return -1;
+	}
+
+	argv[0].isint = 1;
+	argv[0].len = 4;
+	argv[0].u.integer = fd;
+
+	len = lo_hton64(len);
+	argv[1].isint = 0;
+	argv[1].len = 8;
+	argv[1].u.ptr = (int *) &len;
+
+	res = PQfn(conn, conn->lobjfuncs->fn_lo_truncate64,
+			   &retval, &result_len, 1, argv, 2);
+
+	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+	{
+		PQclear(res);
+		return retval;
+	}
+	else
+	{
+		PQclear(res);
+		return -1;
+	}
+}
+#endif
 
 /*
  * lo_read
@@ -311,6 +370,63 @@ lo_lseek(PGconn *conn, int fd, int offset, int whence)
 }
 
 /*
+ * lo_lseek64
+ *	  change the current read or write location on a large object
+ * currently, only L_SET is a legal value for whence
+ *
+ */
+
+#ifdef HAVE_PG_INT64
+pg_int64
+lo_lseek64(PGconn *conn, int fd, pg_int64 offset, int whence)
+{
+	PQArgBlock	argv[3];
+	PGresult   *res;
+	pg_int64		retval;
+	int			result_len;
+
+	if (conn == NULL || conn->lobjfuncs == NULL)
+	{
+		if (lo_initialize(conn) < 0)
+			return -1;
+	}
+
+	if (conn->lobjfuncs->fn_lo_lseek64 == 0)
+	{
+		printfPQExpBuffer(&conn->errorMessage,
+			libpq_gettext("cannot determine OID of function lo_lseek64\n"));
+		return -1;
+	}
+
+	argv[0].isint = 1;
+	argv[0].len = 4;
+	argv[0].u.integer = fd;
+
+	offset = lo_hton64(offset);
+	argv[1].isint = 0;
+	argv[1].len = 8;
+	argv[1].u.ptr = (int *) &offset;
+
+	argv[2].isint = 1;
+	argv[2].len = 4;
+	argv[2].u.integer = whence;
+
+	res = PQfn(conn, conn->lobjfuncs->fn_lo_lseek64,
+			   (int *)&retval, &result_len, 0, argv, 3);
+	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+	{
+		PQclear(res);
+		return lo_ntoh64((pg_int64)retval);
+	}
+	else
+	{
+		PQclear(res);
+		return -1;
+	}
+}
+#endif
+
+/*
  * lo_creat
  *	  create a new large object
  * the mode is ignored (once upon a time it had a use)
@@ -436,6 +552,52 @@ lo_tell(PGconn *conn, int fd)
 }
 
 /*
+ * lo_tell64
+ *	  returns the current seek location of the large object
+ *
+ */
+#ifdef HAVE_PG_INT64
+pg_int64
+lo_tell64(PGconn *conn, int fd)
+{
+	pg_int64	retval;
+	PQArgBlock	argv[1];
+	PGresult   *res;
+	int			result_len;
+
+	if (conn == NULL || conn->lobjfuncs == NULL)
+	{
+		if (lo_initialize(conn) < 0)
+			return -1;
+	}
+
+	if (conn->lobjfuncs->fn_lo_tell64 == 0)
+	{
+		printfPQExpBuffer(&conn->errorMessage,
+			libpq_gettext("cannot determine OID of function lo_tell64\n"));
+		return -1;
+	}
+
+	argv[0].isint = 1;
+	argv[0].len = 4;
+	argv[0].u.integer = fd;
+
+	res = PQfn(conn, conn->lobjfuncs->fn_lo_tell64,
+			   (int *) &retval, &result_len, 0, argv, 1);
+	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+	{
+		PQclear(res);
+		return lo_ntoh64((pg_int64) retval);
+	}
+	else
+	{
+		PQclear(res);
+		return -1;
+	}
+}
+#endif
+
+/*
  * lo_unlink
  *	  delete a file
  *
@@ -713,8 +875,11 @@ lo_initialize(PGconn *conn)
 			"'lo_create', "
 			"'lo_unlink', "
 			"'lo_lseek', "
+			"'lo_lseek64', "
 			"'lo_tell', "
+			"'lo_tell64', "
 			"'lo_truncate', "
+			"'lo_truncate64', "
 			"'loread', "
 			"'lowrite') "
 			"and pronamespace = (select oid from pg_catalog.pg_namespace "
@@ -765,10 +930,16 @@ lo_initialize(PGconn *conn)
 			lobjfuncs->fn_lo_unlink = foid;
 		else if (strcmp(fname, "lo_lseek") == 0)
 			lobjfuncs->fn_lo_lseek = foid;
+		else if (strcmp(fname, "lo_lseek64") == 0)
+			lobjfuncs->fn_lo_lseek64 = foid;
 		else if (strcmp(fname, "lo_tell") == 0)
 			lobjfuncs->fn_lo_tell = foid;
+		else if (strcmp(fname, "lo_tell64") == 0)
+			lobjfuncs->fn_lo_tell64 = foid;
 		else if (strcmp(fname, "lo_truncate") == 0)
 			lobjfuncs->fn_lo_truncate = foid;
+		else if (strcmp(fname, "lo_truncate64") == 0)
+			lobjfuncs->fn_lo_truncate64 = foid;
 		else if (strcmp(fname, "loread") == 0)
 			lobjfuncs->fn_lo_read = foid;
 		else if (strcmp(fname, "lowrite") == 0)
@@ -836,10 +1007,76 @@ lo_initialize(PGconn *conn)
 		free(lobjfuncs);
 		return -1;
 	}
-
+	if (conn->sversion >= 90300)
+	{
+		if (lobjfuncs->fn_lo_lseek64 == 0)
+		{
+			printfPQExpBuffer(&conn->errorMessage,
+					libpq_gettext("cannot determine OID of function lo_lseek64\n"));
+			free(lobjfuncs);
+			return -1;
+		}
+		if (lobjfuncs->fn_lo_tell64 == 0)
+		{
+			printfPQExpBuffer(&conn->errorMessage,
+					libpq_gettext("cannot determine OID of function lo_tell64\n"));
+			free(lobjfuncs);
+			return -1;
+		}
+		if (lobjfuncs->fn_lo_truncate64 == 0)
+		{
+			printfPQExpBuffer(&conn->errorMessage,
+					libpq_gettext("cannot determine OID of function lo_truncate64\n"));
+			free(lobjfuncs);
+			return -1;
+		}
+	}
 	/*
 	 * Put the structure into the connection control
 	 */
 	conn->lobjfuncs = lobjfuncs;
 	return 0;
 }
+
+/*
+ * lo_hton64
+ *	  converts a 64-bit integer from host byte order to network byte order
+ */
+static pg_int64
+lo_hton64(pg_int64 host64)
+{
+	pg_int64 	result;
+	uint32_t	h32, l32;
+
+	/* High order half first, since we're doing MSB-first */
+	h32 = (uint32_t) (host64 >> 32);
+
+	/* Now the low order half */
+	l32 = (uint32_t) (host64 & 0xffffffff);
+
+	result = htonl(l32);
+	result <<= 32;
+	result |= htonl(h32);
+
+	return result;
+}
+
+/*
+ * lo_ntoh64
+ *	  converts a 64-bit integer from network byte order to host byte order
+ */
+static pg_int64
+lo_ntoh64(pg_int64 net64)
+{
+	pg_int64 	result;
+	uint32_t	h32, l32;
+
+	l32 = (uint32_t) (net64 >> 32);
+	h32 = (uint32_t) (net64 & 0xffffffff);
+
+	result = ntohl(h32);
+	result <<= 32;
+	result |= ntohl(l32);
+
+	return result;
+}
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index 9d05dd2..73568ca 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -548,6 +548,12 @@ extern Oid	lo_import(PGconn *conn, const char *filename);
 extern Oid	lo_import_with_oid(PGconn *conn, const char *filename, Oid lobjId);
 extern int	lo_export(PGconn *conn, Oid lobjId, const char *filename);
 
+#ifdef HAVE_PG_INT64
+extern pg_int64	lo_lseek64(PGconn *conn, int fd, pg_int64 offset, int whence);
+extern pg_int64	lo_tell64(PGconn *conn, int fd);
+extern int	lo_truncate64(PGconn *conn, int fd, pg_int64 len);
+#endif
+
 /* === in fe-misc.c === */
 
 /* Get the version of the libpq library in use */
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 4a6c8fe..375821e 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -271,8 +271,11 @@ typedef struct pgLobjfuncs
 	Oid			fn_lo_create;	/* OID of backend function lo_create	*/
 	Oid			fn_lo_unlink;	/* OID of backend function lo_unlink	*/
 	Oid			fn_lo_lseek;	/* OID of backend function lo_lseek		*/
+	Oid			fn_lo_lseek64;	/* OID of backend function lo_lseek64		*/
 	Oid			fn_lo_tell;		/* OID of backend function lo_tell		*/
+	Oid			fn_lo_tell64;		/* OID of backend function lo_tell64		*/
 	Oid			fn_lo_truncate; /* OID of backend function lo_truncate	*/
+	Oid			fn_lo_truncate64; /* OID of backend function lo_truncate64	*/
 	Oid			fn_lo_read;		/* OID of backend function LOread		*/
 	Oid			fn_lo_write;	/* OID of backend function LOwrite		*/
 } PGlobjfuncs;
diff --git a/src/test/examples/Makefile b/src/test/examples/Makefile
index bbc6ee1..aee5c04 100644
--- a/src/test/examples/Makefile
+++ b/src/test/examples/Makefile
@@ -14,7 +14,7 @@ override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 override LDLIBS := $(libpq_pgport) $(LDLIBS)
 
 
-PROGS = testlibpq testlibpq2 testlibpq3 testlibpq4 testlo
+PROGS = testlibpq testlibpq2 testlibpq3 testlibpq4 testlo testlo64
 
 all: $(PROGS)
 
diff --git a/src/test/examples/testlo64.c b/src/test/examples/testlo64.c
new file mode 100644
index 0000000..e8faaa9
--- /dev/null
+++ b/src/test/examples/testlo64.c
@@ -0,0 +1,320 @@
+/*-------------------------------------------------------------------------
+ *
+ * testlo64.c
+ *	  test using large objects with libpq
+ *
+ * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  $PostgreSQL: pgsql/src/test/examples/testlo.c,v 1.25 2004/12/31 22:03:58 pgsql Exp $
+ *
+ *-------------------------------------------------------------------------
+ */
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#include "libpq-fe.h"
+#include "libpq/libpq-fs.h"
+
+#define BUFSIZE			1024
+
+/*
+ * importFile -
+ *	  import file "in_filename" into database as large object "lobjOid"
+ *
+ */
+static Oid
+importFile(PGconn *conn, char *filename)
+{
+	Oid			lobjId;
+	int			lobj_fd;
+	char		buf[BUFSIZE];
+	int			nbytes,
+				tmp;
+	int			fd;
+
+	/*
+	 * open the file to be read in
+	 */
+	fd = open(filename, O_RDONLY, 0666);
+	if (fd < 0)
+	{							/* error */
+		fprintf(stderr, "can't open unix file\"%s\"\n", filename);
+	}
+
+	/*
+	 * create the large object
+	 */
+	lobjId = lo_creat(conn, INV_READ | INV_WRITE);
+	if (lobjId == 0)
+		fprintf(stderr, "can't create large object");
+
+	lobj_fd = lo_open(conn, lobjId, INV_WRITE);
+
+	/*
+	 * read in from the Unix file and write to the inversion file
+	 */
+	while ((nbytes = read(fd, buf, BUFSIZE)) > 0)
+	{
+		tmp = lo_write(conn, lobj_fd, buf, nbytes);
+		if (tmp < nbytes)
+			fprintf(stderr, "error while reading \"%s\"", filename);
+	}
+
+	close(fd);
+	lo_close(conn, lobj_fd);
+
+	return lobjId;
+}
+
+static void
+pickout(PGconn *conn, Oid lobjId, pg_int64 start, int len)
+{
+	int			lobj_fd;
+	char	   *buf;
+	int			nbytes;
+	int			nread;
+	pg_int64		pos;
+
+	lobj_fd = lo_open(conn, lobjId, INV_READ);
+	if (lobj_fd < 0)
+		fprintf(stderr, "can't open large object %u", lobjId);
+
+	if (lo_tell64(conn, lobj_fd) < 0)
+	{
+		fprintf(stderr, "error lo_tell64: %s\n", PQerrorMessage(conn));
+	}
+
+	if ((pos=lo_lseek64(conn, lobj_fd, start, SEEK_SET)) < 0)
+	{
+		fprintf(stderr, "error lo_lseek64: %s\n", PQerrorMessage(conn));
+		return;
+	}
+
+	fprintf(stderr, "before read: retval of lo_lseek64 : %lld\n", (long long int) pos);
+
+	buf = malloc(len + 1);
+
+	nread = 0;
+	while (len - nread > 0)
+	{
+		nbytes = lo_read(conn, lobj_fd, buf, len - nread);
+		buf[nbytes] = '\0';
+		fprintf(stderr, ">>> %s", buf);
+		nread += nbytes;
+		if (nbytes <= 0)
+			break;				/* no more data? */
+	}
+	free(buf);
+	fprintf(stderr, "\n");
+
+	pos = lo_tell64(conn, lobj_fd);
+	fprintf(stderr, "after read: retval of lo_tell64 : %lld\n\n", (long long int) pos);
+
+	lo_close(conn, lobj_fd);
+}
+
+static void
+overwrite(PGconn *conn, Oid lobjId, pg_int64 start, int len)
+{
+	int			lobj_fd;
+	char	   *buf;
+	int			nbytes;
+	int			nwritten;
+	int			i;
+	pg_int64		pos;
+
+	lobj_fd = lo_open(conn, lobjId, INV_READ | INV_WRITE);
+	if (lobj_fd < 0)
+		fprintf(stderr, "can't open large object %u", lobjId);
+
+	if ((pos=lo_lseek64(conn, lobj_fd, start, SEEK_SET)) < 0)
+	{
+		fprintf(stderr, "error lo_lseek64: %s\n", PQerrorMessage(conn));
+		return;
+	}
+	fprintf(stderr, "before write: retval of lo_lseek64 : %lld\n", (long long int) pos);
+
+	buf = malloc(len + 1);
+
+	for (i = 0; i < len; i++)
+		buf[i] = 'X';
+	buf[i] = '\0';
+
+	nwritten = 0;
+	while (len - nwritten > 0)
+	{
+		nbytes = lo_write(conn, lobj_fd, buf + nwritten, len - nwritten);
+		nwritten += nbytes;
+		if (nbytes <= 0)
+		{
+			fprintf(stderr, "\nWRITE FAILED!\n");
+			break;
+		}
+	}
+	free(buf);
+
+	pos = lo_tell64(conn, lobj_fd);
+	fprintf(stderr, "after write: retval of lo_tell64 : %lld\n\n", (long long int) pos);
+
+	lo_close(conn, lobj_fd);
+}
+
+static void
+my_truncate(PGconn *conn, Oid lobjId, size_t len)
+{
+	int			lobj_fd;
+
+	lobj_fd = lo_open(conn, lobjId, INV_READ | INV_WRITE);
+	if (lobj_fd < 0)
+		fprintf(stderr, "can't open large object %u", lobjId);
+
+	if (lo_truncate64(conn, lobj_fd, len) < 0)
+	{
+		fprintf(stderr, "error lo_truncate64: %s\n", PQerrorMessage(conn));
+		return;
+	}
+
+
+	fprintf(stderr, "\n");
+	lo_close(conn, lobj_fd);
+}
+
+
+/*
+ * exportFile -
+ *	  export large object "lobjOid" to file "out_filename"
+ *
+ */
+static void
+exportFile(PGconn *conn, Oid lobjId, char *filename)
+{
+	int			lobj_fd;
+	char		buf[BUFSIZE];
+	int			nbytes,
+				tmp;
+	int			fd;
+
+	/*
+	 * create an inversion "object"
+	 */
+	lobj_fd = lo_open(conn, lobjId, INV_READ);
+	if (lobj_fd < 0)
+		fprintf(stderr, "can't open large object %u", lobjId);
+
+	/*
+	 * open the file to be written to
+	 */
+	fd = open(filename, O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (fd < 0)
+	{							/* error */
+		fprintf(stderr, "can't open unix file\"%s\"",
+				filename);
+	}
+
+	/*
+	 * read in from the Unix file and write to the inversion file
+	 */
+	while ((nbytes = lo_read(conn, lobj_fd, buf, BUFSIZE)) > 0)
+	{
+		tmp = write(fd, buf, nbytes);
+		if (tmp < nbytes)
+		{
+			fprintf(stderr, "error while writing \"%s\"",
+					filename);
+		}
+	}
+
+	lo_close(conn, lobj_fd);
+	close(fd);
+
+	return;
+}
+
+static void
+exit_nicely(PGconn *conn)
+{
+	PQfinish(conn);
+	exit(1);
+}
+
+int
+main(int argc, char **argv)
+{
+	char	   *in_filename,
+			   *out_filename,
+			   *out_filename2;
+	char	   *database;
+	Oid			lobjOid;
+	PGconn	   *conn;
+	PGresult   *res;
+
+	if (argc != 5)
+	{
+		fprintf(stderr, "Usage: %s database_name in_filename out_filename out_filename2\n",
+				argv[0]);
+		exit(1);
+	}
+
+	database = argv[1];
+	in_filename = argv[2];
+	out_filename = argv[3];
+	out_filename2 = argv[4];
+
+	/*
+	 * set up the connection
+	 */
+	conn = PQsetdb(NULL, NULL, NULL, NULL, database);
+
+	/* check to see that the backend connection was successfully made */
+	if (PQstatus(conn) != CONNECTION_OK)
+	{
+		fprintf(stderr, "Connection to database failed: %s",
+				PQerrorMessage(conn));
+		exit_nicely(conn);
+	}
+
+	res = PQexec(conn, "begin");
+	PQclear(res);
+	printf("importing file \"%s\" ...\n", in_filename);
+/*	lobjOid = importFile(conn, in_filename); */
+	lobjOid = lo_import(conn, in_filename);
+	if (lobjOid == 0)
+		fprintf(stderr, "%s\n", PQerrorMessage(conn));
+	else
+	{
+		printf("\tas large object %u.\n", lobjOid);
+
+		printf("picking out bytes 4294967000-4294968000 of the large object\n");
+		pickout(conn, lobjOid, 4294967000ULL, 1000);
+
+		printf("overwriting bytes 4294967000-4294968000 of the large object with X's\n");
+		overwrite(conn, lobjOid, 4294967000ULL, 1000);
+
+
+		printf("exporting large object to file \"%s\" ...\n", out_filename);
+/*		exportFile(conn, lobjOid, out_filename); */
+		if (!lo_export(conn, lobjOid, out_filename))
+			fprintf(stderr, "%s\n", PQerrorMessage(conn));
+
+		printf("truncating to 3294968000 byte\n");
+		my_truncate(conn, lobjOid, 3294968000ULL);
+
+		printf("exporting truncated large object to file \"%s\" ...\n", out_filename2);
+		if (!lo_export(conn, lobjOid, out_filename2))
+			fprintf(stderr, "%s\n", PQerrorMessage(conn));
+
+	}
+
+	res = PQexec(conn, "end");
+	PQclear(res);
+	PQfinish(conn);
+	return 0;
+}
diff --git a/src/test/regress/input/largeobject.source b/src/test/regress/input/largeobject.source
index 40f40f8..4984d78 100644
--- a/src/test/regress/input/largeobject.source
+++ b/src/test/regress/input/largeobject.source
@@ -125,6 +125,29 @@ SELECT lo_tell(fd) FROM lotest_stash_values;
 SELECT lo_close(fd) FROM lotest_stash_values;
 END;
 
+-- Test 64-bit large object functions.
+BEGIN;
+UPDATE lotest_stash_values SET fd = lo_open(loid, CAST(x'20000' | x'40000' AS integer));
+
+SELECT lo_lseek64(fd, 4294967296, 0) FROM lotest_stash_values;
+SELECT lowrite(fd, 'offset:4GB') FROM lotest_stash_values;
+SELECT lo_tell64(fd) FROM lotest_stash_values;
+
+SELECT lo_lseek64(fd, -10, 1) FROM lotest_stash_values;
+SELECT lo_tell64(fd) FROM lotest_stash_values;
+SELECT loread(fd, 10) FROM lotest_stash_values;
+
+SELECT lo_truncate64(fd, 5000000000) FROM lotest_stash_values;
+SELECT lo_lseek64(fd, 0, 2) FROM lotest_stash_values;
+SELECT lo_tell64(fd) FROM lotest_stash_values;
+
+SELECT lo_truncate64(fd, 3000000000) FROM lotest_stash_values;
+SELECT lo_lseek64(fd, 0, 2) FROM lotest_stash_values;
+SELECT lo_tell64(fd) FROM lotest_stash_values;
+
+SELECT lo_close(fd) FROM lotest_stash_values;
+END;
+
 -- lo_unlink(lobjId oid) returns integer
 -- return value appears to always be 1
 SELECT lo_unlink(loid) from lotest_stash_values;
diff --git a/src/test/regress/output/largeobject.source b/src/test/regress/output/largeobject.source
index 55aaf8f..74c4772 100644
--- a/src/test/regress/output/largeobject.source
+++ b/src/test/regress/output/largeobject.source
@@ -210,6 +210,88 @@ SELECT lo_close(fd) FROM lotest_stash_values;
 (1 row)
 
 END;
+-- Test 64-bit large object functions.
+BEGIN;
+UPDATE lotest_stash_values SET fd = lo_open(loid, CAST(x'20000' | x'40000' AS integer));
+SELECT lo_lseek64(fd, 4294967296, 0) FROM lotest_stash_values;
+ lo_lseek64 
+------------
+ 4294967296
+(1 row)
+
+SELECT lowrite(fd, 'offset:4GB') FROM lotest_stash_values;
+ lowrite 
+---------
+      10
+(1 row)
+
+SELECT lo_tell64(fd) FROM lotest_stash_values;
+ lo_tell64  
+------------
+ 4294967306
+(1 row)
+
+SELECT lo_lseek64(fd, -10, 1) FROM lotest_stash_values;
+ lo_lseek64 
+------------
+ 4294967296
+(1 row)
+
+SELECT lo_tell64(fd) FROM lotest_stash_values;
+ lo_tell64  
+------------
+ 4294967296
+(1 row)
+
+SELECT loread(fd, 10) FROM lotest_stash_values;
+   loread   
+------------
+ offset:4GB
+(1 row)
+
+SELECT lo_truncate64(fd, 5000000000) FROM lotest_stash_values;
+ lo_truncate64 
+---------------
+             0
+(1 row)
+
+SELECT lo_lseek64(fd, 0, 2) FROM lotest_stash_values;
+ lo_lseek64 
+------------
+ 5000000000
+(1 row)
+
+SELECT lo_tell64(fd) FROM lotest_stash_values;
+ lo_tell64  
+------------
+ 5000000000
+(1 row)
+
+SELECT lo_truncate64(fd, 3000000000) FROM lotest_stash_values;
+ lo_truncate64 
+---------------
+             0
+(1 row)
+
+SELECT lo_lseek64(fd, 0, 2) FROM lotest_stash_values;
+ lo_lseek64 
+------------
+ 3000000000
+(1 row)
+
+SELECT lo_tell64(fd) FROM lotest_stash_values;
+ lo_tell64  
+------------
+ 3000000000
+(1 row)
+
+SELECT lo_close(fd) FROM lotest_stash_values;
+ lo_close 
+----------
+        0
+(1 row)
+
+END;
 -- lo_unlink(lobjId oid) returns integer
 -- return value appears to always be 1
 SELECT lo_unlink(loid) from lotest_stash_values;
#43Peter Eisentraut
peter_e@gmx.net
In reply to: Alvaro Herrera (#38)
Re: 64-bit API for large object

On 9/28/12 10:35 AM, Alvaro Herrera wrote:

Now there is one more problem in this area which is that the patch
defined a new type pg_int64 for frontend code (postgres_ext.h). This
seems a bad idea to me. We already have int64 defined in c.h. Should
we expose int64 to postgres_ext.h somehow? Should we use standard-
mandated int64_t instead? One way would be to have a new configure
check for int64_t, and if that type doesn't exist, then just don't
provide the 64 bit functionality to frontend.

Or create a new type like pg_lo_off_t.
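
(A rough sketch of the configure-check idea, purely for illustration: the
HAVE_INT64_T macro below is assumed, not an existing configure symbol.
postgres_ext.h would define pg_int64 and HAVE_PG_INT64 only when the probe
succeeds, and the #ifdef HAVE_PG_INT64 guards already present in the patch's
libpq-fe.h would then hide the 64-bit declarations on platforms without the
type.)

#if defined(HAVE_INT64_T)	/* hypothetical macro, set by a configure probe */
#include <stdint.h>
typedef int64_t pg_int64;
#define HAVE_PG_INT64 1
#endif
/* if the probe failed, HAVE_PG_INT64 stays undefined and the 64-bit
 * lo_* declarations in libpq-fe.h are simply skipped */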

#44Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Nozomi Anzai (#42)
Re: 64-bit API for large object

Hi Anzai-san,

The latest patch is fair enough for me, so let me hand over its review
to the committers.

Thanks,

2012/10/1 Nozomi Anzai <anzai@sraoss.co.jp>:

Here is 64-bit API for large object version 3 patch.

I checked this patch. It looks good, but here are still some points to be
discussed.

* I have a question. What is the meaning of INT64_IS_BUSTED?
It seems to me a marker to indicate a platform without 64bit support.
However, the commit 901be0fad4034c9cf8a3588fd6cf2ece82e4b8ce
says as follows:
| Remove all the special-case code for INT64_IS_BUSTED, per decision that
| we're not going to support that anymore.

Removed INT64_IS_BUSTED.

* At inv_seek(), it seems to me it checks offset correctness in a wrong way,
as follows:
| case SEEK_SET:
| if (offset < 0)
| elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
| obj_desc->offset = offset;
| break;
That would be a right assumption if large object size were restricted to 2GB.
But the largest positive int64 is larger than the expected limitation.
So, it seems to me it should be compared with (INT_MAX * PAGE_SIZE)
instead.

Fixed.

* At inv_write(), it definitely needs a check to prevent writes beyond the 4TB
boundary. In case obj_desc->offset is a bit below 4TB, an additional 1GB write
will break the head of the large object because of "pageno" overflow.

Added such a check.

* Please also add checks on inv_read() to prevent LargeObjectDesc->offset
from unexpectedly overflowing the 4TB boundary.

Added such a check.

* At inv_truncate(), variable "off" is re-defined to int64. Is that change
really needed? All its usage is to store the result of "len % LOBLKSIZE".

Fixed and back to int32.

Thanks,

2012/9/24 Nozomi Anzai <anzai@sraoss.co.jp>:

Here is 64-bit API for large object version 2 patch.

I checked this patch. It can be applied onto the latest master branch
without any problems. My comments are below.

2012/9/11 Tatsuo Ishii <ishii@postgresql.org>:

Ok, here is the patch to implement 64-bit API for large object, to
allow to use up to 4TB large objects(or 16TB if BLCKSZ changed to
32KB). The patch is based on Jeremy Drake's patch posted on September
23, 2005
(http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php)
and reasonably updated/edited to adopt PostgreSQL 9.3 by Nozomi Anzai
for the backend part and Yugo Nagata for the rest(including
documentation patch).

Here are changes made in the patch:

1) Frontend lo_* libpq functions(fe-lobj.c)(Yugo Nagata)

lo_initialize() gathers backend 64-bit large object handling
function's oid, namely lo_lseek64, lo_tell64, lo_truncate64.

If the client calls lo_*64 functions and the backend does not support them,
the lo_*64 functions return an error to the caller. There might be an argument
that calls to lo_*64 functions could automatically be redirected to the
older 32-bit API. I don't know whether this is worth the trouble though.

I think it should definitely return an error code when the user tries to
use lo_*64 functions against a v9.2 or older backend, because
falling back to the 32-bit API can raise unexpected errors if the application
intends to seek beyond 2GB.

Currently lo_initialize() throws an error if one of the oids is not
available. I doubt we should do the same for the 64-bit functions since this
would make 9.3 libpq unable to access large objects stored in pre-9.3
PostgreSQL servers.

It seems to me this is the situation to split the pre-9.3 and 9.3-or-later
cases using a condition of "conn->sversion >= 90300".

Fixed so, and tested it by deleting the lo_tell64 row from pg_proc.
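
(Illustration only, not part of the patch: with the version split described
above, an application that wants to work against both old and new servers can
guard the 64-bit calls itself. The helper below is a sketch; seek_lo() and its
fallback policy are made up for the example, while PQserverVersion(),
lo_lseek() and lo_lseek64() are the real libpq entry points.)

#include <limits.h>
#include <stdio.h>
#include "libpq-fe.h"

/* Seek to a 64-bit offset, using lo_lseek64 only on 9.3-or-later servers. */
static pg_int64
seek_lo(PGconn *conn, int fd, pg_int64 offset)
{
	if (PQserverVersion(conn) >= 90300)
		return lo_lseek64(conn, fd, offset, SEEK_SET);
	if (offset <= INT_MAX)
		return lo_lseek(conn, fd, (int) offset, SEEK_SET);	/* 32-bit fallback */
	fprintf(stderr, "server too old to seek beyond 2GB: %s", PQerrorMessage(conn));
	return -1;
}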

To pass a 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr
is a pointer to the 64-bit integer and the actual data is placed somewhere
else. There might be another way: add a new member to union u to store
a 64-bit integer:

typedef struct
{
int len;
int isint;
union
{
int *ptr; /* can't use void (dec compiler barfs) */
int integer;
int64 bigint; /* 64-bit integer */
} u;
} PQArgBlock;

I'm a little bit worried about this way because PQArgBlock is a public
interface.

I'm inclined to add a new field to the union; that seems to me a
straightforward approach.
For example, the manner in lo_lseek64() seems confusing to me.
It sets 1 on the "isint" field even though a pointer is actually delivered.

+       argv[1].isint = 1;
+       argv[1].len = 8;
+       argv[1].u.ptr = (int *) &len;

Your proposal was not adopted per discussion.
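
(For reference, the convention the attached patch settles on in fe-lobj.c is
to keep isint = 0 and pass the 8-byte value by reference, already converted to
network byte order; roughly this fragment from the lo_lseek64() code shown
above:)

	offset = lo_hton64(offset);	/* convert to network byte order first */
	argv[1].isint = 0;		/* not a plain 4-byte integer argument... */
	argv[1].len = 8;		/* ...but an 8-byte binary value */
	argv[1].u.ptr = (int *) &offset;	/* sent by reference via PQfn */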

Also we add new type "pg_int64":

#ifndef NO_PG_INT64
#define HAVE_PG_INT64 1
typedef long long int pg_int64;
#endif

in postgres_ext.h per suggestion from Tom Lane:
http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php

I'm uncertain about the context of this discussion.

Does it matter if we include <stdint.h> and use int64_t instead
of the self-defined data type?

Your proposal was not adopted per discussion.
Per discussion, endianness translation was moved to fe-lobj.c.
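
(Not part of the patch, just an illustration of what such a translation does:
an endianness-independent way to produce the same network-byte-order image is
to assemble the eight bytes explicitly, as in this sketch; sketch_hton64 is a
made-up name.)

#include <string.h>

/* Sketch: return a pg_int64 whose in-memory image is the big-endian
 * (network order) representation of host64. */
static pg_int64
sketch_hton64(pg_int64 host64)
{
	unsigned long long u = (unsigned long long) host64;
	unsigned char	b[8];
	pg_int64	result;
	int		i;

	for (i = 0; i < 8; i++)
		b[i] = (unsigned char) ((u >> (56 - 8 * i)) & 0xff);	/* MSB first */
	memcpy(&result, b, sizeof(result));
	return result;
}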

2) Backend lo_* functions (be-fsstubs.c)(Nozomi Anzai)

Add lo_lseek64, lo_tell64, lo_truncate64 so that they can handle
64-bit seek position and data length. loread64 and lowrite64 are not
added because if a program tries to read/write more than 2GB at once,
it would be a sign that the program needs to be re-designed anyway.

I think it is reasonable.

3) Backend inv_api.c functions(Nozomi Anzai)

No need to add new functions. Just extend them to handle 64-bit data.

BTW , what will happen if older 32-bit libpq accesses large objects
over 2GB?

lo_read and lo_write: they can read or write lobjs using 32-bit API as
long as requested read/write data length is smaller than 2GB. So I
think we can safely allow them to access over 2GB lobjs.

lo_lseek: again as long as requested offset is smaller than 2GB, there
would be no problem.

lo_tell: if the current seek position is beyond 2GB, it returns an error.

Even though iteration of lo_lseek() may move the offset up to 4TB, it then
becomes impossible to use lo_tell() to obtain the current offset, so I think
this is reasonable behavior.

However, error code is not an appropriate one.

+       if (INT_MAX < offset)
+       {
+               ereport(ERROR,
+                               (errcode(ERRCODE_UNDEFINED_OBJECT),
+                                errmsg("invalid large-object
descriptor: %d", fd)));
+               PG_RETURN_INT32(-1);
+       }

According to the manpage of lseek(2)
EOVERFLOW
The resulting file offset cannot be represented in an off_t.

Please add a new error code such as ERRCODE_BLOB_OFFSET_OVERFLOW.

Changed the error code and error message. We added a new error code,
"ERRCODE_BLOB_OFFSET_OVERFLOW (22P07)".

4) src/test/examples/testlo64.c added for 64-bit API example(Yugo Nagata)

Comments and suggestions are welcome.

miscellaneous comments are below.

A regression test is helpful. Even though there is no need to try to create a
4TB large object, it is helpful to write some chunks around the design boundary.
Could you add some test cases that write some chunks around the 4TB offset?

Added 64-bit lobj test items to the regression test and confirmed they worked
correctly.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
Nozomi Anzai
SRA OSS, Inc. Japan

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
Nozomi Anzai
SRA OSS, Inc. Japan

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#45Tatsuo Ishii
ishii@postgresql.org
In reply to: Kohei KaiGai (#44)
1 attachment(s)
Re: 64-bit API for large object

As a committer, I have looked into the patch and it seems good to
commit. However, I want to make some small enhancements in the
documentation part:

1) The lo_open section needs to mention the new 64-bit APIs. Also it
should include a description of lo_truncate, but this is not the 64-bit
API authors' fault since it should have been there when lo_truncate
was added.

2) Mention that the 64-bit APIs are only available in PostgreSQL 9.3 or
later, and that if the API is requested against an older server version
it will fail.

If there's no objection, I would like to commit the attached patches.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Attachments:

lobj64-v4.patch (text/x-patch; charset=us-ascii)
diff --git a/doc/src/sgml/lobj.sgml b/doc/src/sgml/lobj.sgml
new file mode 100644
index 291409f..d34190f
*** a/doc/src/sgml/lobj.sgml
--- b/doc/src/sgml/lobj.sgml
***************
*** 41,47 ****
      larger than a single database page into a secondary storage area per table.
      This makes the large object facility partially obsolete.  One
      remaining advantage of the large object facility is that it allows values
!     up to 2 GB in size, whereas <acronym>TOAST</acronym>ed fields can be at
      most 1 GB.  Also, large objects can be randomly modified using a read/write
      API that is more efficient than performing such operations using
      <acronym>TOAST</acronym>.
--- 41,47 ----
      larger than a single database page into a secondary storage area per table.
      This makes the large object facility partially obsolete.  One
      remaining advantage of the large object facility is that it allows values
!     up to 4 TB in size, whereas <acronym>TOAST</acronym>ed fields can be at
      most 1 GB.  Also, large objects can be randomly modified using a read/write
      API that is more efficient than performing such operations using
      <acronym>TOAST</acronym>.
*************** int lo_open(PGconn *conn, Oid lobjId, in
*** 237,243 ****
       <function>lo_open</function> returns a (non-negative) large object
       descriptor for later use in <function>lo_read</function>,
       <function>lo_write</function>, <function>lo_lseek</function>,
!      <function>lo_tell</function>, and <function>lo_close</function>.
       The descriptor is only valid for
       the duration of the current transaction.
       On failure, -1 is returned.
--- 237,245 ----
       <function>lo_open</function> returns a (non-negative) large object
       descriptor for later use in <function>lo_read</function>,
       <function>lo_write</function>, <function>lo_lseek</function>,
! 	 <function>lo_lseek64</function>, <function>lo_tell</function>,
!      <function>lo_tell64</function>, <function>lo_truncate</function>,
! 	 <function>lo_truncate64</function>, and <function>lo_close</function>.
       The descriptor is only valid for
       the duration of the current transaction.
       On failure, -1 is returned.
*************** int lo_read(PGconn *conn, int fd, char *
*** 312,317 ****
--- 314,320 ----
       large object descriptor, call
  <synopsis>
  int lo_lseek(PGconn *conn, int fd, int offset, int whence);
+ pg_int64 lo_lseek64(PGconn *conn, int fd, pg_int64 offset, int whence);
  </synopsis>
       <indexterm><primary>lo_lseek</></> This function moves the
       current location pointer for the large object descriptor identified by
*************** int lo_lseek(PGconn *conn, int fd, int o
*** 321,327 ****
--- 324,339 ----
       <symbol>SEEK_CUR</> (seek from current position), and
       <symbol>SEEK_END</> (seek from object end).  The return value is
       the new location pointer, or -1 on error.
+      <indexterm><primary>lo_lseek64</></> <function>lo_lseek64</function>
+      is a function for large objects larger than 2GB. <symbol>pg_int64</>
+      is defined as an 8-byte integer type.
+ </para>
+ <para>
+      <function>lo_lseek64</> is new as of <productname>PostgreSQL</productname>
+      9.3; if this function is run against an older server version, it will
+      fail and return a negative value.
  </para>
+ 
  </sect2>
  
  <sect2 id="lo-tell">
*************** int lo_lseek(PGconn *conn, int fd, int o
*** 332,340 ****
--- 344,360 ----
       call
  <synopsis>
  int lo_tell(PGconn *conn, int fd);
+ pg_int64 lo_tell64(PGconn *conn, int fd);
  </synopsis>
       <indexterm><primary>lo_tell</></> If there is an error, the
       return value is negative.
+      <indexterm><primary>lo_tell64</></> <function>lo_tell64</function> is
+      a function for large objects larger than 2GB.
+ </para>
+ <para>
+      <function>lo_tell64</> is new as of <productname>PostgreSQL</productname>
+      9.3; if this function is run against an older server version, it will
+      fail and return a negative value.
  </para>
  </sect2>
  
*************** int lo_tell(PGconn *conn, int fd);
*** 345,350 ****
--- 365,371 ----
       To truncate a large object to a given length, call
  <synopsis>
  int lo_truncate(PGcon *conn, int fd, size_t len);
+ int lo_truncate64(PGcon *conn, int fd, pg_int64 len);
  </synopsis>
       <indexterm><primary>lo_truncate</></> truncates the large object
       descriptor <parameter>fd</> to length <parameter>len</>.  The
*************** int lo_truncate(PGcon *conn, int fd, siz
*** 352,357 ****
--- 373,380 ----
       previous <function>lo_open</function>.  If <parameter>len</> is
       greater than the current large object length, the large object
       is extended with null bytes ('\0').
+      <indexterm><primary>lo_truncate64</></> <function>lo_truncate64</function>
+      is a function for large objects larger than 2GB.
  </para>
  
  <para>
*************** int lo_truncate(PGcon *conn, int fd, siz
*** 359,365 ****
  </para>
  
  <para>
!      On success <function>lo_truncate</function> returns
       zero.  On error, the return value is negative.
  </para>
  
--- 382,388 ----
  </para>
  
  <para>
!      On success, <function>lo_truncate</function> and <function>lo_truncate64</function> return
       zero.  On error, the return value is negative.
  </para>
  
*************** int lo_truncate(PGcon *conn, int fd, siz
*** 368,373 ****
--- 391,401 ----
       8.3; if this function is run against an older server version, it will
       fail and return a negative value.
  </para>
+ <para>
+      <function>lo_truncate64</> is new as of <productname>PostgreSQL</productname>
+      9.3; if this function is run against an older server version, it will
+      fail and return a negative value.
+ </para>
  </sect2>
  
  <sect2 id="lo-close">
diff --git a/src/backend/libpq/be-fsstubs.c b/src/backend/libpq/be-fsstubs.c
new file mode 100644
index 6f7e474..4bc81ba
*** a/src/backend/libpq/be-fsstubs.c
--- b/src/backend/libpq/be-fsstubs.c
***************
*** 39,44 ****
--- 39,45 ----
  #include "postgres.h"
  
  #include <fcntl.h>
+ #include <limits.h>
  #include <sys/stat.h>
  #include <unistd.h>
  
*************** lo_lseek(PG_FUNCTION_ARGS)
*** 216,222 ****
  	int32		fd = PG_GETARG_INT32(0);
  	int32		offset = PG_GETARG_INT32(1);
  	int32		whence = PG_GETARG_INT32(2);
! 	int			status;
  
  	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
  		ereport(ERROR,
--- 217,223 ----
  	int32		fd = PG_GETARG_INT32(0);
  	int32		offset = PG_GETARG_INT32(1);
  	int32		whence = PG_GETARG_INT32(2);
! 	int64		status;
  
  	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
  		ereport(ERROR,
*************** lo_lseek(PG_FUNCTION_ARGS)
*** 225,233 ****
--- 226,270 ----
  
  	status = inv_seek(cookies[fd], offset, whence);
  
+ 	if (INT_MAX < status)
+ 	{
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_BLOB_OFFSET_OVERFLOW),
+ 				 errmsg("offset overflow: %d", fd)));
+ 		PG_RETURN_INT32(-1);
+ 	}
+ 
  	PG_RETURN_INT32(status);
  }
  
+ 
+ Datum
+ lo_lseek64(PG_FUNCTION_ARGS)
+ {
+ 	int32		fd = PG_GETARG_INT32(0);
+ 	int64		offset = PG_GETARG_INT64(1);
+ 	int32		whence = PG_GETARG_INT32(2);
+ 	MemoryContext currentContext;
+ 	int64			status;
+ 
+ 	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
+ 	{
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_UNDEFINED_OBJECT),
+ 				 errmsg("invalid large-object descriptor: %d", fd)));
+ 		PG_RETURN_INT64(-1);
+ 	}
+ 
+ 	Assert(fscxt != NULL);
+ 	currentContext = MemoryContextSwitchTo(fscxt);
+ 
+ 	status = inv_seek(cookies[fd], offset, whence);
+ 
+ 	MemoryContextSwitchTo(currentContext);
+ 
+ 	PG_RETURN_INT64(status);
+ }
+ 
  Datum
  lo_creat(PG_FUNCTION_ARGS)
  {
*************** Datum
*** 264,276 ****
  lo_tell(PG_FUNCTION_ARGS)
  {
  	int32		fd = PG_GETARG_INT32(0);
  
  	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
  		ereport(ERROR,
  				(errcode(ERRCODE_UNDEFINED_OBJECT),
  				 errmsg("invalid large-object descriptor: %d", fd)));
  
! 	PG_RETURN_INT32(inv_tell(cookies[fd]));
  }
  
  Datum
--- 301,346 ----
  lo_tell(PG_FUNCTION_ARGS)
  {
  	int32		fd = PG_GETARG_INT32(0);
+ 	int64		offset = 0;
  
  	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
  		ereport(ERROR,
  				(errcode(ERRCODE_UNDEFINED_OBJECT),
  				 errmsg("invalid large-object descriptor: %d", fd)));
  
! 	offset = inv_tell(cookies[fd]);
! 
! 	if (INT_MAX < offset)
! 	{
! 		ereport(ERROR,
! 				(errcode(ERRCODE_BLOB_OFFSET_OVERFLOW),
! 				 errmsg("offset overflow: %d", fd)));
! 		PG_RETURN_INT32(-1);
! 	}
! 
! 	PG_RETURN_INT32(offset);
! }
! 
! 
! Datum
! lo_tell64(PG_FUNCTION_ARGS)
! {
! 	int32		fd = PG_GETARG_INT32(0);
! 
! 	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
! 	{
! 		ereport(ERROR,
! 				(errcode(ERRCODE_UNDEFINED_OBJECT),
! 				 errmsg("invalid large-object descriptor: %d", fd)));
! 		PG_RETURN_INT64(-1);
! 	}
! 
! 	/*
! 	 * We assume we do not need to switch contexts for inv_tell. That is
! 	 * true for now, but is probably more than this module ought to
! 	 * assume...
! 	 */
! 	PG_RETURN_INT64(inv_tell(cookies[fd]));
  }
  
  Datum
*************** lo_truncate(PG_FUNCTION_ARGS)
*** 533,538 ****
--- 603,635 ----
  	PG_RETURN_INT32(0);
  }
  
+ Datum
+ lo_truncate64(PG_FUNCTION_ARGS)
+ {
+ 	int32		fd = PG_GETARG_INT32(0);
+ 	int64		len = PG_GETARG_INT64(1);
+ 
+ 	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_UNDEFINED_OBJECT),
+ 				 errmsg("invalid large-object descriptor: %d", fd)));
+ 
+ 	/* Permission checks */
+ 	if (!lo_compat_privileges &&
+ 		pg_largeobject_aclcheck_snapshot(cookies[fd]->id,
+ 										 GetUserId(),
+ 										 ACL_UPDATE,
+ 									   cookies[fd]->snapshot) != ACLCHECK_OK)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ 				 errmsg("permission denied for large object %u",
+ 						cookies[fd]->id)));
+ 
+ 	inv_truncate(cookies[fd], len);
+ 
+ 	PG_RETURN_INT32(0);
+ }
+ 
  /*
   * AtEOXact_LargeObject -
   *		 prepares large objects for transaction commit
diff --git a/src/backend/storage/large_object/inv_api.c b/src/backend/storage/large_object/inv_api.c
new file mode 100644
index 3adfb15..3f5688b
*** a/src/backend/storage/large_object/inv_api.c
--- b/src/backend/storage/large_object/inv_api.c
*************** inv_drop(Oid lobjId)
*** 324,333 ****
   * NOTE: LOs can contain gaps, just like Unix files.  We actually return
   * the offset of the last byte + 1.
   */
! static uint32
  inv_getsize(LargeObjectDesc *obj_desc)
  {
! 	uint32		lastbyte = 0;
  	ScanKeyData skey[1];
  	SysScanDesc sd;
  	HeapTuple	tuple;
--- 324,333 ----
   * NOTE: LOs can contain gaps, just like Unix files.  We actually return
   * the offset of the last byte + 1.
   */
! static uint64
  inv_getsize(LargeObjectDesc *obj_desc)
  {
! 	uint64		lastbyte = 0;
  	ScanKeyData skey[1];
  	SysScanDesc sd;
  	HeapTuple	tuple;
*************** inv_getsize(LargeObjectDesc *obj_desc)
*** 368,374 ****
  				heap_tuple_untoast_attr((struct varlena *) datafield);
  			pfreeit = true;
  		}
! 		lastbyte = data->pageno * LOBLKSIZE + getbytealen(datafield);
  		if (pfreeit)
  			pfree(datafield);
  	}
--- 368,374 ----
  				heap_tuple_untoast_attr((struct varlena *) datafield);
  			pfreeit = true;
  		}
! 		lastbyte = (uint64) data->pageno * LOBLKSIZE + getbytealen(datafield);
  		if (pfreeit)
  			pfree(datafield);
  	}
*************** inv_getsize(LargeObjectDesc *obj_desc)
*** 378,407 ****
  	return lastbyte;
  }
  
! int
! inv_seek(LargeObjectDesc *obj_desc, int offset, int whence)
  {
  	Assert(PointerIsValid(obj_desc));
  
  	switch (whence)
  	{
  		case SEEK_SET:
! 			if (offset < 0)
! 				elog(ERROR, "invalid seek offset: %d", offset);
  			obj_desc->offset = offset;
  			break;
  		case SEEK_CUR:
! 			if (offset < 0 && obj_desc->offset < ((uint32) (-offset)))
! 				elog(ERROR, "invalid seek offset: %d", offset);
  			obj_desc->offset += offset;
  			break;
  		case SEEK_END:
  			{
! 				uint32		size = inv_getsize(obj_desc);
  
! 				if (offset < 0 && size < ((uint32) (-offset)))
! 					elog(ERROR, "invalid seek offset: %d", offset);
! 				obj_desc->offset = size + offset;
  			}
  			break;
  		default:
--- 378,408 ----
  	return lastbyte;
  }
  
! int64
! inv_seek(LargeObjectDesc *obj_desc, int64 offset, int whence)
  {
  	Assert(PointerIsValid(obj_desc));
  
  	switch (whence)
  	{
  		case SEEK_SET:
! 			if (offset < 0 || offset >= MAX_LARGE_OBJECT_SIZE)
! 				elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
  			obj_desc->offset = offset;
  			break;
  		case SEEK_CUR:
! 			if ((offset + obj_desc->offset) < 0 ||
! 			   (offset + obj_desc->offset) >= MAX_LARGE_OBJECT_SIZE)
! 				elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
  			obj_desc->offset += offset;
  			break;
  		case SEEK_END:
  			{
! 				int64		pos = inv_getsize(obj_desc) + offset;
  
! 				if (pos < 0 || pos >= MAX_LARGE_OBJECT_SIZE)
! 					elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
! 				obj_desc->offset = pos;
  			}
  			break;
  		default:
*************** inv_seek(LargeObjectDesc *obj_desc, int 
*** 410,416 ****
  	return obj_desc->offset;
  }
  
! int
  inv_tell(LargeObjectDesc *obj_desc)
  {
  	Assert(PointerIsValid(obj_desc));
--- 411,417 ----
  	return obj_desc->offset;
  }
  
! int64
  inv_tell(LargeObjectDesc *obj_desc)
  {
  	Assert(PointerIsValid(obj_desc));
*************** int
*** 422,432 ****
  inv_read(LargeObjectDesc *obj_desc, char *buf, int nbytes)
  {
  	int			nread = 0;
! 	int			n;
! 	int			off;
  	int			len;
  	int32		pageno = (int32) (obj_desc->offset / LOBLKSIZE);
! 	uint32		pageoff;
  	ScanKeyData skey[2];
  	SysScanDesc sd;
  	HeapTuple	tuple;
--- 423,433 ----
  inv_read(LargeObjectDesc *obj_desc, char *buf, int nbytes)
  {
  	int			nread = 0;
! 	int64		n;
! 	int64		off;
  	int			len;
  	int32		pageno = (int32) (obj_desc->offset / LOBLKSIZE);
! 	uint64		pageoff;
  	ScanKeyData skey[2];
  	SysScanDesc sd;
  	HeapTuple	tuple;
*************** inv_read(LargeObjectDesc *obj_desc, char
*** 437,442 ****
--- 438,446 ----
  	if (nbytes <= 0)
  		return 0;
  
+ 	if ((nbytes + obj_desc->offset) > MAX_LARGE_OBJECT_SIZE)
+ 		elog(ERROR, "invalid read request size: %d", nbytes);
+ 
  	open_lo_relation();
  
  	ScanKeyInit(&skey[0],
*************** inv_read(LargeObjectDesc *obj_desc, char
*** 467,473 ****
  		 * there may be missing pages if the LO contains unwritten "holes". We
  		 * want missing sections to read out as zeroes.
  		 */
! 		pageoff = ((uint32) data->pageno) * LOBLKSIZE;
  		if (pageoff > obj_desc->offset)
  		{
  			n = pageoff - obj_desc->offset;
--- 471,477 ----
  		 * there may be missing pages if the LO contains unwritten "holes". We
  		 * want missing sections to read out as zeroes.
  		 */
! 		pageoff = ((uint64) data->pageno) * LOBLKSIZE;
  		if (pageoff > obj_desc->offset)
  		{
  			n = pageoff - obj_desc->offset;
*************** inv_write(LargeObjectDesc *obj_desc, con
*** 560,565 ****
--- 564,572 ----
  	if (nbytes <= 0)
  		return 0;
  
+ 	if ((nbytes + obj_desc->offset) > MAX_LARGE_OBJECT_SIZE)
+ 		elog(ERROR, "invalid write request size: %d", nbytes);
+ 
  	open_lo_relation();
  
  	indstate = CatalogOpenIndexes(lo_heap_r);
*************** inv_write(LargeObjectDesc *obj_desc, con
*** 718,727 ****
  }
  
  void
! inv_truncate(LargeObjectDesc *obj_desc, int len)
  {
  	int32		pageno = (int32) (len / LOBLKSIZE);
! 	int			off;
  	ScanKeyData skey[2];
  	SysScanDesc sd;
  	HeapTuple	oldtuple;
--- 725,734 ----
  }
  
  void
! inv_truncate(LargeObjectDesc *obj_desc, int64 len)
  {
  	int32		pageno = (int32) (len / LOBLKSIZE);
! 	int32		off;
  	ScanKeyData skey[2];
  	SysScanDesc sd;
  	HeapTuple	oldtuple;
diff --git a/src/backend/utils/errcodes.txt b/src/backend/utils/errcodes.txt
new file mode 100644
index 3e04164..db8ab53
*** a/src/backend/utils/errcodes.txt
--- b/src/backend/utils/errcodes.txt
*************** Section: Class 22 - Data Exception
*** 199,204 ****
--- 199,205 ----
  2200N    E    ERRCODE_INVALID_XML_CONTENT                                    invalid_xml_content
  2200S    E    ERRCODE_INVALID_XML_COMMENT                                    invalid_xml_comment
  2200T    E    ERRCODE_INVALID_XML_PROCESSING_INSTRUCTION                     invalid_xml_processing_instruction
+ 22P07    E    ERRCODE_BLOB_OFFSET_OVERFLOW                                   blob_offset_overflow
  
  Section: Class 23 - Integrity Constraint Violation
  
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index 77a3b41..a2da836
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DATA(insert OID = 955 (  lowrite		   PGN
*** 1040,1053 ****
--- 1040,1059 ----
  DESCR("large object write");
  DATA(insert OID = 956 (  lo_lseek		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 23 "23 23 23" _null_ _null_ _null_ _null_	lo_lseek _null_ _null_ _null_ ));
  DESCR("large object seek");
+ DATA(insert OID = 3170 (  lo_lseek64		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 20 "23 20 23" _null_ _null_ _null_ _null_	lo_lseek64 _null_ _null_ _null_ ));
+ DESCR("large object seek (64 bit)");
  DATA(insert OID = 957 (  lo_creat		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 26 "23" _null_ _null_ _null_ _null_ lo_creat _null_ _null_ _null_ ));
  DESCR("large object create");
  DATA(insert OID = 715 (  lo_create		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 26 "26" _null_ _null_ _null_ _null_ lo_create _null_ _null_ _null_ ));
  DESCR("large object create");
  DATA(insert OID = 958 (  lo_tell		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 23 "23" _null_ _null_ _null_ _null_ lo_tell _null_ _null_ _null_ ));
  DESCR("large object position");
+ DATA(insert OID = 3171 (  lo_tell64		   PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 20 "23" _null_ _null_ _null_ _null_ lo_tell64 _null_ _null_ _null_ ));
+ DESCR("large object position (64 bit)");
  DATA(insert OID = 1004 (  lo_truncate	   PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "23 23" _null_ _null_ _null_ _null_ lo_truncate _null_ _null_ _null_ ));
  DESCR("truncate large object");
+ DATA(insert OID = 3172 (  lo_truncate64	   PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "23 20" _null_ _null_ _null_ _null_ lo_truncate64 _null_ _null_ _null_ ));
+ DESCR("truncate large object (64 bit)");
  
  DATA(insert OID = 959 (  on_pl			   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "600 628" _null_ _null_ _null_ _null_	on_pl _null_ _null_ _null_ ));
  DATA(insert OID = 960 (  on_sl			   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "601 628" _null_ _null_ _null_ _null_	on_sl _null_ _null_ _null_ ));
diff --git a/src/include/libpq/be-fsstubs.h b/src/include/libpq/be-fsstubs.h
new file mode 100644
index 0c832da..d74ea0e
*** a/src/include/libpq/be-fsstubs.h
--- b/src/include/libpq/be-fsstubs.h
*************** extern Datum lowrite(PG_FUNCTION_ARGS);
*** 34,41 ****
--- 34,44 ----
  
  extern Datum lo_lseek(PG_FUNCTION_ARGS);
  extern Datum lo_tell(PG_FUNCTION_ARGS);
+ extern Datum lo_lseek64(PG_FUNCTION_ARGS);
+ extern Datum lo_tell64(PG_FUNCTION_ARGS);
  extern Datum lo_unlink(PG_FUNCTION_ARGS);
  extern Datum lo_truncate(PG_FUNCTION_ARGS);
+ extern Datum lo_truncate64(PG_FUNCTION_ARGS);
  
  /*
   * compatibility option for access control
diff --git a/src/include/postgres_ext.h b/src/include/postgres_ext.h
new file mode 100644
index b6ebb7a..76502de
*** a/src/include/postgres_ext.h
--- b/src/include/postgres_ext.h
*************** typedef unsigned int Oid;
*** 56,59 ****
--- 56,64 ----
  #define PG_DIAG_SOURCE_LINE		'L'
  #define PG_DIAG_SOURCE_FUNCTION 'R'
  
+ #ifndef NO_PG_INT64
+ #define HAVE_PG_INT64 1
+ typedef long long int pg_int64;
+ #endif
+ 
  #endif
diff --git a/src/include/storage/large_object.h b/src/include/storage/large_object.h
new file mode 100644
index 1fe07ee..52f01c6
*** a/src/include/storage/large_object.h
--- b/src/include/storage/large_object.h
*************** typedef struct LargeObjectDesc
*** 37,43 ****
  	Oid			id;				/* LO's identifier */
  	Snapshot	snapshot;		/* snapshot to use */
  	SubTransactionId subid;		/* owning subtransaction ID */
! 	uint32		offset;			/* current seek pointer */
  	int			flags;			/* locking info, etc */
  
  /* flag bits: */
--- 37,43 ----
  	Oid			id;				/* LO's identifier */
  	Snapshot	snapshot;		/* snapshot to use */
  	SubTransactionId subid;		/* owning subtransaction ID */
! 	uint64		offset;			/* current seek pointer */
  	int			flags;			/* locking info, etc */
  
  /* flag bits: */
*************** typedef struct LargeObjectDesc
*** 62,68 ****
   * This avoids unnecessary tuple updates caused by partial-page writes.
   */
  #define LOBLKSIZE		(BLCKSZ / 4)
! 
  
  /*
   * Function definitions...
--- 62,71 ----
   * This avoids unnecessary tuple updates caused by partial-page writes.
   */
  #define LOBLKSIZE		(BLCKSZ / 4)
! /*
!  * Maximum byte length for each large object
! */
! #define MAX_LARGE_OBJECT_SIZE	INT64CONST(INT_MAX * LOBLKSIZE)
  
  /*
   * Function definitions...
*************** extern Oid	inv_create(Oid lobjId);
*** 74,83 ****
  extern LargeObjectDesc *inv_open(Oid lobjId, int flags, MemoryContext mcxt);
  extern void inv_close(LargeObjectDesc *obj_desc);
  extern int	inv_drop(Oid lobjId);
! extern int	inv_seek(LargeObjectDesc *obj_desc, int offset, int whence);
! extern int	inv_tell(LargeObjectDesc *obj_desc);
  extern int	inv_read(LargeObjectDesc *obj_desc, char *buf, int nbytes);
  extern int	inv_write(LargeObjectDesc *obj_desc, const char *buf, int nbytes);
! extern void inv_truncate(LargeObjectDesc *obj_desc, int len);
  
  #endif   /* LARGE_OBJECT_H */
--- 77,86 ----
  extern LargeObjectDesc *inv_open(Oid lobjId, int flags, MemoryContext mcxt);
  extern void inv_close(LargeObjectDesc *obj_desc);
  extern int	inv_drop(Oid lobjId);
! extern int64	inv_seek(LargeObjectDesc *obj_desc, int64 offset, int whence);
! extern int64	inv_tell(LargeObjectDesc *obj_desc);
  extern int	inv_read(LargeObjectDesc *obj_desc, char *buf, int nbytes);
  extern int	inv_write(LargeObjectDesc *obj_desc, const char *buf, int nbytes);
! extern void inv_truncate(LargeObjectDesc *obj_desc, int64 len);
  
  #endif   /* LARGE_OBJECT_H */
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
new file mode 100644
index 9d95e26..56d0bb8
*** a/src/interfaces/libpq/exports.txt
--- b/src/interfaces/libpq/exports.txt
*************** PQping                    158
*** 161,163 ****
--- 161,166 ----
  PQpingParams              159
  PQlibVersion              160
  PQsetSingleRowMode        161
+ lo_lseek64                162
+ lo_tell64                 163
+ lo_truncate64             164
diff --git a/src/interfaces/libpq/fe-lobj.c b/src/interfaces/libpq/fe-lobj.c
new file mode 100644
index f3a6d03..fb17ac8
*** a/src/interfaces/libpq/fe-lobj.c
--- b/src/interfaces/libpq/fe-lobj.c
***************
*** 37,46 ****
--- 37,52 ----
  #include "libpq-int.h"
  #include "libpq/libpq-fs.h"		/* must come after sys/stat.h */
  
+ /* for ntohl/htonl */
+ #include <netinet/in.h>
+ #include <arpa/inet.h>
+ 
  #define LO_BUFSIZE		  8192
  
  static int	lo_initialize(PGconn *conn);
  static Oid	lo_import_internal(PGconn *conn, const char *filename, Oid oid);
+ static pg_int64	lo_hton64(pg_int64 host64);
+ static pg_int64	lo_ntoh64(pg_int64 net64);
  
  /*
   * lo_open
*************** lo_truncate(PGconn *conn, int fd, size_t
*** 174,179 ****
--- 180,238 ----
  	}
  }
  
+ /*
+  * lo_truncate64
+  *	  truncates an existing large object to the given size
+  *
+  * returns 0 upon success
+  * returns -1 upon failure
+  */
+ #ifdef HAVE_PG_INT64
+ int
+ lo_truncate64(PGconn *conn, int fd, pg_int64 len)
+ {
+ 	PQArgBlock	argv[2];
+ 	PGresult   *res;
+ 	int			retval;
+ 	int			result_len;
+ 
+ 	if (conn == NULL || conn->lobjfuncs == NULL)
+ 	{
+ 		if (lo_initialize(conn) < 0)
+ 			return -1;
+ 	}
+ 
+ 	if (conn->lobjfuncs->fn_lo_truncate64 == 0)
+ 	{
+ 		printfPQExpBuffer(&conn->errorMessage,
+ 			libpq_gettext("cannot determine OID of function lo_truncate64\n"));
+ 		return -1;
+ 	}
+ 
+ 	argv[0].isint = 1;
+ 	argv[0].len = 4;
+ 	argv[0].u.integer = fd;
+ 
+ 	len = lo_hton64(len);
+ 	argv[1].isint = 0;
+ 	argv[1].len = 8;
+ 	argv[1].u.ptr = (int *) &len;
+ 
+ 	res = PQfn(conn, conn->lobjfuncs->fn_lo_truncate64,
+ 			   &retval, &result_len, 1, argv, 2);
+ 
+ 	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+ 	{
+ 		PQclear(res);
+ 		return retval;
+ 	}
+ 	else
+ 	{
+ 		PQclear(res);
+ 		return -1;
+ 	}
+ }
+ #endif
  
  /*
   * lo_read
*************** lo_lseek(PGconn *conn, int fd, int offse
*** 311,316 ****
--- 370,432 ----
  }
  
  /*
+  * lo_lseek64
+  *	  change the current read or write location on a large object
+  * currently, only L_SET is a legal value for whence
+  *
+  */
+ 
+ #ifdef HAVE_PG_INT64
+ pg_int64
+ lo_lseek64(PGconn *conn, int fd, pg_int64 offset, int whence)
+ {
+ 	PQArgBlock	argv[3];
+ 	PGresult   *res;
+ 	pg_int64		retval;
+ 	int			result_len;
+ 
+ 	if (conn == NULL || conn->lobjfuncs == NULL)
+ 	{
+ 		if (lo_initialize(conn) < 0)
+ 			return -1;
+ 	}
+ 
+ 	if (conn->lobjfuncs->fn_lo_lseek64 == 0)
+ 	{
+ 		printfPQExpBuffer(&conn->errorMessage,
+ 			libpq_gettext("cannot determine OID of function lo_lseek64\n"));
+ 		return -1;
+ 	}
+ 
+ 	argv[0].isint = 1;
+ 	argv[0].len = 4;
+ 	argv[0].u.integer = fd;
+ 
+ 	offset = lo_hton64(offset);
+ 	argv[1].isint = 0;
+ 	argv[1].len = 8;
+ 	argv[1].u.ptr = (int *) &offset;
+ 
+ 	argv[2].isint = 1;
+ 	argv[2].len = 4;
+ 	argv[2].u.integer = whence;
+ 
+ 	res = PQfn(conn, conn->lobjfuncs->fn_lo_lseek64,
+ 			   (int *)&retval, &result_len, 0, argv, 3);
+ 	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+ 	{
+ 		PQclear(res);
+ 		return lo_ntoh64((pg_int64)retval);
+ 	}
+ 	else
+ 	{
+ 		PQclear(res);
+ 		return -1;
+ 	}
+ }
+ #endif
+ 
+ /*
   * lo_creat
   *	  create a new large object
   * the mode is ignored (once upon a time it had a use)
*************** lo_tell(PGconn *conn, int fd)
*** 436,441 ****
--- 552,603 ----
  }
  
  /*
+  * lo_tell64
+  *	  returns the current seek location of the large object
+  *
+  */
+ #ifdef HAVE_PG_INT64
+ pg_int64
+ lo_tell64(PGconn *conn, int fd)
+ {
+ 	pg_int64	retval;
+ 	PQArgBlock	argv[1];
+ 	PGresult   *res;
+ 	int			result_len;
+ 
+ 	if (conn == NULL || conn->lobjfuncs == NULL)
+ 	{
+ 		if (lo_initialize(conn) < 0)
+ 			return -1;
+ 	}
+ 
+ 	if (conn->lobjfuncs->fn_lo_tell64 == 0)
+ 	{
+ 		printfPQExpBuffer(&conn->errorMessage,
+ 			libpq_gettext("cannot determine OID of function lo_tell64\n"));
+ 		return -1;
+ 	}
+ 
+ 	argv[0].isint = 1;
+ 	argv[0].len = 4;
+ 	argv[0].u.integer = fd;
+ 
+ 	res = PQfn(conn, conn->lobjfuncs->fn_lo_tell64,
+ 			   (int *) &retval, &result_len, 0, argv, 1);
+ 	if (PQresultStatus(res) == PGRES_COMMAND_OK)
+ 	{
+ 		PQclear(res);
+ 		return lo_ntoh64((pg_int64) retval);
+ 	}
+ 	else
+ 	{
+ 		PQclear(res);
+ 		return -1;
+ 	}
+ }
+ #endif
+ 
+ /*
   * lo_unlink
   *	  delete a file
   *
*************** lo_initialize(PGconn *conn)
*** 713,720 ****
--- 875,885 ----
  			"'lo_create', "
  			"'lo_unlink', "
  			"'lo_lseek', "
+ 			"'lo_lseek64', "
  			"'lo_tell', "
+ 			"'lo_tell64', "
  			"'lo_truncate', "
+ 			"'lo_truncate64', "
  			"'loread', "
  			"'lowrite') "
  			"and pronamespace = (select oid from pg_catalog.pg_namespace "
*************** lo_initialize(PGconn *conn)
*** 765,774 ****
--- 930,945 ----
  			lobjfuncs->fn_lo_unlink = foid;
  		else if (strcmp(fname, "lo_lseek") == 0)
  			lobjfuncs->fn_lo_lseek = foid;
+ 		else if (strcmp(fname, "lo_lseek64") == 0)
+ 			lobjfuncs->fn_lo_lseek64 = foid;
  		else if (strcmp(fname, "lo_tell") == 0)
  			lobjfuncs->fn_lo_tell = foid;
+ 		else if (strcmp(fname, "lo_tell64") == 0)
+ 			lobjfuncs->fn_lo_tell64 = foid;
  		else if (strcmp(fname, "lo_truncate") == 0)
  			lobjfuncs->fn_lo_truncate = foid;
+ 		else if (strcmp(fname, "lo_truncate64") == 0)
+ 			lobjfuncs->fn_lo_truncate64 = foid;
  		else if (strcmp(fname, "loread") == 0)
  			lobjfuncs->fn_lo_read = foid;
  		else if (strcmp(fname, "lowrite") == 0)
*************** lo_initialize(PGconn *conn)
*** 836,845 ****
  		free(lobjfuncs);
  		return -1;
  	}
! 
  	/*
  	 * Put the structure into the connection control
  	 */
  	conn->lobjfuncs = lobjfuncs;
  	return 0;
  }
--- 1007,1082 ----
  		free(lobjfuncs);
  		return -1;
  	}
! 	if (conn->sversion >= 90300)
! 	{
! 		if (lobjfuncs->fn_lo_lseek64 == 0)
! 		{
! 			printfPQExpBuffer(&conn->errorMessage,
! 					libpq_gettext("cannot determine OID of function lo_lseek64\n"));
! 			free(lobjfuncs);
! 			return -1;
! 		}
! 		if (lobjfuncs->fn_lo_tell64 == 0)
! 		{
! 			printfPQExpBuffer(&conn->errorMessage,
! 					libpq_gettext("cannot determine OID of function lo_tell64\n"));
! 			free(lobjfuncs);
! 			return -1;
! 		}
! 		if (lobjfuncs->fn_lo_truncate64 == 0)
! 		{
! 			printfPQExpBuffer(&conn->errorMessage,
! 					libpq_gettext("cannot determine OID of function lo_truncate64\n"));
! 			free(lobjfuncs);
! 			return -1;
! 		}
! 	}
  	/*
  	 * Put the structure into the connection control
  	 */
  	conn->lobjfuncs = lobjfuncs;
  	return 0;
  }
+ 
+ /*
+  * lo_hton64
+  *	  converts a 64-bit integer from host byte order to network byte order
+  */
+ static pg_int64
+ lo_hton64(pg_int64 host64)
+ {
+ 	pg_int64 	result;
+ 	uint32_t	h32, l32;
+ 
+ 	/* High order half first, since we're doing MSB-first */
+ 	h32 = (uint32_t) (host64 >> 32);
+ 
+ 	/* Now the low order half */
+ 	l32 = (uint32_t) (host64 & 0xffffffff);
+ 
+ 	result = htonl(l32);
+ 	result <<= 32;
+ 	result |= htonl(h32);
+ 
+ 	return result;
+ }
+ 
+ /*
+  * lo_ntoh64
+  *	  converts a 64-bit integer from network byte order to host byte order
+  */
+ static pg_int64
+ lo_ntoh64(pg_int64 net64)
+ {
+ 	pg_int64 	result;
+ 	uint32_t	h32, l32;
+ 
+ 	l32 = (uint32_t) (net64 >> 32);
+ 	h32 = (uint32_t) (net64 & 0xffffffff);
+ 
+ 	result = ntohl(h32);
+ 	result <<= 32;
+ 	result |= ntohl(l32);
+ 
+ 	return result;
+ }
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
new file mode 100644
index 9d05dd2..73568ca
*** a/src/interfaces/libpq/libpq-fe.h
--- b/src/interfaces/libpq/libpq-fe.h
*************** extern Oid	lo_import(PGconn *conn, const
*** 548,553 ****
--- 548,559 ----
  extern Oid	lo_import_with_oid(PGconn *conn, const char *filename, Oid lobjId);
  extern int	lo_export(PGconn *conn, Oid lobjId, const char *filename);
  
+ #ifdef HAVE_PG_INT64
+ extern pg_int64	lo_lseek64(PGconn *conn, int fd, pg_int64 offset, int whence);
+ extern pg_int64	lo_tell64(PGconn *conn, int fd);
+ extern int	lo_truncate64(PGconn *conn, int fd, pg_int64 len);
+ #endif
+ 
  /* === in fe-misc.c === */
  
  /* Get the version of the libpq library in use */
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
new file mode 100644
index 4a6c8fe..375821e
*** a/src/interfaces/libpq/libpq-int.h
--- b/src/interfaces/libpq/libpq-int.h
*************** typedef struct pgLobjfuncs
*** 271,278 ****
--- 271,281 ----
  	Oid			fn_lo_create;	/* OID of backend function lo_create	*/
  	Oid			fn_lo_unlink;	/* OID of backend function lo_unlink	*/
  	Oid			fn_lo_lseek;	/* OID of backend function lo_lseek		*/
+ 	Oid			fn_lo_lseek64;	/* OID of backend function lo_lseek64		*/
  	Oid			fn_lo_tell;		/* OID of backend function lo_tell		*/
+ 	Oid			fn_lo_tell64;		/* OID of backend function lo_tell64		*/
  	Oid			fn_lo_truncate; /* OID of backend function lo_truncate	*/
+ 	Oid			fn_lo_truncate64; /* OID of backend function lo_truncate64	*/
  	Oid			fn_lo_read;		/* OID of backend function LOread		*/
  	Oid			fn_lo_write;	/* OID of backend function LOwrite		*/
  } PGlobjfuncs;
diff --git a/src/test/examples/Makefile b/src/test/examples/Makefile
new file mode 100644
index bbc6ee1..aee5c04
*** a/src/test/examples/Makefile
--- b/src/test/examples/Makefile
*************** override CPPFLAGS := -I$(libpq_srcdir) $
*** 14,20 ****
  override LDLIBS := $(libpq_pgport) $(LDLIBS)
  
  
! PROGS = testlibpq testlibpq2 testlibpq3 testlibpq4 testlo
  
  all: $(PROGS)
  
--- 14,20 ----
  override LDLIBS := $(libpq_pgport) $(LDLIBS)
  
  
! PROGS = testlibpq testlibpq2 testlibpq3 testlibpq4 testlo testlo64
  
  all: $(PROGS)
  
diff --git a/src/test/regress/input/largeobject.source b/src/test/regress/input/largeobject.source
new file mode 100644
index 40f40f8..4984d78
*** a/src/test/regress/input/largeobject.source
--- b/src/test/regress/input/largeobject.source
*************** SELECT lo_tell(fd) FROM lotest_stash_val
*** 125,130 ****
--- 125,153 ----
  SELECT lo_close(fd) FROM lotest_stash_values;
  END;
  
+ -- Test 64-bit large object functions.
+ BEGIN;
+ UPDATE lotest_stash_values SET fd = lo_open(loid, CAST(x'20000' | x'40000' AS integer));
+ 
+ SELECT lo_lseek64(fd, 4294967296, 0) FROM lotest_stash_values;
+ SELECT lowrite(fd, 'offset:4GB') FROM lotest_stash_values;
+ SELECT lo_tell64(fd) FROM lotest_stash_values;
+ 
+ SELECT lo_lseek64(fd, -10, 1) FROM lotest_stash_values;
+ SELECT lo_tell64(fd) FROM lotest_stash_values;
+ SELECT loread(fd, 10) FROM lotest_stash_values;
+ 
+ SELECT lo_truncate64(fd, 5000000000) FROM lotest_stash_values;
+ SELECT lo_lseek64(fd, 0, 2) FROM lotest_stash_values;
+ SELECT lo_tell64(fd) FROM lotest_stash_values;
+ 
+ SELECT lo_truncate64(fd, 3000000000) FROM lotest_stash_values;
+ SELECT lo_lseek64(fd, 0, 2) FROM lotest_stash_values;
+ SELECT lo_tell64(fd) FROM lotest_stash_values;
+ 
+ SELECT lo_close(fd) FROM lotest_stash_values;
+ END;
+ 
  -- lo_unlink(lobjId oid) returns integer
  -- return value appears to always be 1
  SELECT lo_unlink(loid) from lotest_stash_values;
diff --git a/src/test/regress/output/largeobject.source b/src/test/regress/output/largeobject.source
new file mode 100644
index 55aaf8f..74c4772
*** a/src/test/regress/output/largeobject.source
--- b/src/test/regress/output/largeobject.source
*************** SELECT lo_close(fd) FROM lotest_stash_va
*** 210,215 ****
--- 210,297 ----
  (1 row)
  
  END;
+ -- Test 64-bit large object functions.
+ BEGIN;
+ UPDATE lotest_stash_values SET fd = lo_open(loid, CAST(x'20000' | x'40000' AS integer));
+ SELECT lo_lseek64(fd, 4294967296, 0) FROM lotest_stash_values;
+  lo_lseek64 
+ ------------
+  4294967296
+ (1 row)
+ 
+ SELECT lowrite(fd, 'offset:4GB') FROM lotest_stash_values;
+  lowrite 
+ ---------
+       10
+ (1 row)
+ 
+ SELECT lo_tell64(fd) FROM lotest_stash_values;
+  lo_tell64  
+ ------------
+  4294967306
+ (1 row)
+ 
+ SELECT lo_lseek64(fd, -10, 1) FROM lotest_stash_values;
+  lo_lseek64 
+ ------------
+  4294967296
+ (1 row)
+ 
+ SELECT lo_tell64(fd) FROM lotest_stash_values;
+  lo_tell64  
+ ------------
+  4294967296
+ (1 row)
+ 
+ SELECT loread(fd, 10) FROM lotest_stash_values;
+    loread   
+ ------------
+  offset:4GB
+ (1 row)
+ 
+ SELECT lo_truncate64(fd, 5000000000) FROM lotest_stash_values;
+  lo_truncate64 
+ ---------------
+              0
+ (1 row)
+ 
+ SELECT lo_lseek64(fd, 0, 2) FROM lotest_stash_values;
+  lo_lseek64 
+ ------------
+  5000000000
+ (1 row)
+ 
+ SELECT lo_tell64(fd) FROM lotest_stash_values;
+  lo_tell64  
+ ------------
+  5000000000
+ (1 row)
+ 
+ SELECT lo_truncate64(fd, 3000000000) FROM lotest_stash_values;
+  lo_truncate64 
+ ---------------
+              0
+ (1 row)
+ 
+ SELECT lo_lseek64(fd, 0, 2) FROM lotest_stash_values;
+  lo_lseek64 
+ ------------
+  3000000000
+ (1 row)
+ 
+ SELECT lo_tell64(fd) FROM lotest_stash_values;
+  lo_tell64  
+ ------------
+  3000000000
+ (1 row)
+ 
+ SELECT lo_close(fd) FROM lotest_stash_values;
+  lo_close 
+ ----------
+         0
+ (1 row)
+ 
+ END;
  -- lo_unlink(lobjId oid) returns integer
  -- return value appears to always be 1
  SELECT lo_unlink(loid) from lotest_stash_values;
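
Taken together, a minimal client-side sketch of the new 64-bit calls (assuming a
9.3 server, an existing large object OID, and the usual libpq headers;
src/test/examples/testlo64.c in the patch is the complete example):

#include <stdio.h>
#include "libpq-fe.h"
#include "libpq/libpq-fs.h"		/* INV_READ / INV_WRITE */

/* Hypothetical helper: exercise the 64-bit seek/tell/truncate calls. */
static int
demo_lo64(PGconn *conn, Oid lobjId)
{
	int			fd;
	pg_int64	pos;

	PQclear(PQexec(conn, "begin"));		/* LO descriptors are transaction-local */

	fd = lo_open(conn, lobjId, INV_READ | INV_WRITE);
	if (fd < 0)
		return -1;

	/* Position the descriptor at 4GB, out of reach of the 32-bit lo_lseek() */
	if (lo_lseek64(conn, fd, ((pg_int64) 1) << 32, SEEK_SET) < 0)
		return -1;

	pos = lo_tell64(conn, fd);			/* reports 4294967296 */
	printf("seek position: %lld\n", (long long) pos);

	/* Shrink or extend the object using a 64-bit length */
	if (lo_truncate64(conn, fd, ((pg_int64) 3) * 1000 * 1000 * 1000) != 0)
		return -1;

	lo_close(conn, fd);
	PQclear(PQexec(conn, "commit"));
	return 0;
}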
#46Tatsuo Ishii
ishii@postgresql.org
In reply to: Tatsuo Ishii (#45)
Re: 64-bit API for large object

Ok, committed with minor edits (fixed header comments in testlo64.c).
Thank you Kaigai-san for review!
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

As a committer, I have looked into the patch and it seems it's good to
commit. However I want to make a small enhancement in the
documentation part:

1) The lo_open section needs to mention the new 64-bit APIs. It should
also include a description of lo_truncate, but that is not the 64-bit API
author's fault, since it should have been there when lo_truncate was
added.

2) Mention that the 64-bit APIs are only available in PostgreSQL 9.3 or
later, and that if the API is requested against an older server version
it will fail.

If there's no objection, I would like to commit the attached patches.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Hi Anzai-san,

The latest patch is fair enough for me, so let me hand its review over to
the committers.

Thanks,

2012/10/1 Nozomi Anzai <anzai@sraoss.co.jp>:

Here is 64-bit API for large object version 3 patch.

I checked this patch. It looks good, but here are still some points to be
discussed.

* I have a question. What is the meaning of INT64_IS_BUSTED?
It seems to me a marker to indicate a platform without 64bit support.
However, the commit 901be0fad4034c9cf8a3588fd6cf2ece82e4b8ce
says as follows:
| Remove all the special-case code for INT64_IS_BUSTED, per decision that
| we're not going to support that anymore.

Removed INT64_IS_BUSTED.

* At inv_seek(), it seems to me it checks offset correctness the wrong way,
as follows:
| case SEEK_SET:
| if (offset < 0)
| elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
| obj_desc->offset = offset;
| break;
That would be a right assumption if the large object size were restricted to
2GB, but the largest positive int64 is larger than the intended limit.
So it seems to me it should be compared with (INT_MAX * PAGE_SIZE)
instead.

Fixed.
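
For reference, the ceiling that check enforces works out as follows (a
back-of-the-envelope sketch, assuming the default 8K block size; the variable
names are only illustrative):

#include <stdio.h>
#include <limits.h>

int
main(void)
{
	long long	blcksz = 8192;				/* default BLCKSZ */
	long long	loblksize = blcksz / 4;		/* LOBLKSIZE: 2048-byte LO pages */
	long long	max_size = (long long) INT_MAX * loblksize;

	/* prints 4398046509056, i.e. just under 4TB; with 32KB blocks it is ~16TB */
	printf("max large object size = %lld bytes\n", max_size);
	return 0;
}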

* At inv_write(), it definitely needs a check to prevent writes beyond 4TB.
In the case when obj_desc->offset is a bit below 4TB, an additional 1GB write
would corrupt the head of the large object because of "pageno" overflow.

Added such a check.

* Please also add checks on inv_read() to prevent LargeObjectDesc->offset
from unexpectedly overflowing the 4TB boundary.

Added such a check.

* At inv_truncate(), variable "off" is re-defined to int64. Is that change
really needed? All its usage is to store the result of "len % LOBLKSIZE".

Fixed and back to int32.

Thanks,

2012/9/24 Nozomi Anzai <anzai@sraoss.co.jp>:

Here is 64-bit API for large object version 2 patch.

I checked this patch. It can be applied onto the latest master branch
without any problems. My comments are below.

2012/9/11 Tatsuo Ishii <ishii@postgresql.org>:

Ok, here is the patch to implement 64-bit API for large object, to
allow to use up to 4TB large objects(or 16TB if BLCKSZ changed to
32KB). The patch is based on Jeremy Drake's patch posted on September
23, 2005
(http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php)
and reasonably updated/edited to adopt PostgreSQL 9.3 by Nozomi Anzai
for the backend part and Yugo Nagata for the rest(including
documentation patch).

Here are changes made in the patch:

1) Frontend lo_* libpq functions(fe-lobj.c)(Yugo Nagata)

lo_initialize() gathers the OIDs of the backend's 64-bit large object
handling functions, namely lo_lseek64, lo_tell64, and lo_truncate64.

If a client calls the lo_*64 functions and the backend does not support them,
the lo_*64 functions return an error to the caller. There might be an argument
that calls to the lo_*64 functions could automatically be redirected to
the older 32-bit API. I don't know whether this is worth the trouble though.

I think it should definitely return an error code when a user tries to
use the lo_*64 functions against a v9.2 or older backend, because
falling back to the 32-bit API can raise unexpected errors if the application
intends to seek beyond 2GB.

Currently lo_initialize() throws an error if one of the OIDs is not
available. I doubt we should do the same for the 64-bit functions, since that
would make 9.3 libpq unable to access large objects stored in pre-9.3
PostgreSQL servers.

It seems to me this is a situation where we should split the pre-9.3 and
9.3-or-later cases using a condition of "conn->sversion >= 90300".

Fixed it that way, and tested it by deleting lo_tell64's row from pg_proc.
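
For completeness, this is roughly what that failure mode looks like from the
client side (a hypothetical probe function; the error text comes from the
lo_initialize()/lo_lseek64() checks in the patch):

#include <stdio.h>
#include "libpq-fe.h"

/* Hypothetical check: probe whether the connected server supports the
 * 64-bit large object API by attempting a seek beyond 2GB on an open fd. */
static int
server_has_lo64(PGconn *conn, int fd)
{
	if (lo_lseek64(conn, fd, ((pg_int64) 1) << 32, SEEK_SET) < 0)
	{
		/* against 9.2 or older this reports, e.g.,
		 * "cannot determine OID of function lo_lseek64" */
		fprintf(stderr, "%s", PQerrorMessage(conn));
		return 0;
	}
	return 1;
}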

To pass a 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr
is a pointer to the 64-bit integer and the actual data is placed somewhere
else. There might be another way: add a new member to union u to store a
64-bit integer:

typedef struct
{
	int			len;
	int			isint;
	union
	{
		int		   *ptr;		/* can't use void (dec compiler barfs) */
		int			integer;
		int64		bigint;		/* 64-bit integer */
	} u;
} PQArgBlock;

I'm a little bit worried about this way because PQArgBlock is a public
interface.

I'm inclined to add a new field to the union; that seems to me the
straightforward approach.
For example, the manner in lo_lseek64() seems confusing to me.
It sets 1 in the "isint" field even though a pointer is actually delivered.

+       argv[1].isint = 1;
+       argv[1].len = 8;
+       argv[1].u.ptr = (int *) &len;

Your proposal was not adopted per discussion.

Also we add new type "pg_int64":

#ifndef NO_PG_INT64
#define HAVE_PG_INT64 1
typedef long long int pg_int64;
#endif

in postgres_ext.h per suggestion from Tom Lane:
http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php

I'm uncertain about the context of this discussion.

Does it matter if we include <stdint.h> and use int64_t instead
of the self-defined data type?

Your proposal was not adopted per discussion.
Per discussion, endianness translation was moved to fe-lobj.c.

2) Backend lo_* functions (be-fsstubs.c)(Nozomi Anzai)

Add lo_lseek64, lo_tell64, lo_truncate64 so that they can handle
64-bit seek position and data length. loread64 and lowrite64 are not
added because if a program tries to read/write more than 2GB at once,
it would be a sign that the program needs to be re-designed anyway.

I think it is reasonable.
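
To illustrate how a client is expected to work without loread64/lowrite64, a
minimal sketch (hypothetical helper; it relies only on lo_lseek64 plus the
unchanged 32-bit lo_read) that copies bytes located past the 2GB mark in
sub-2GB chunks:

#include <stdio.h>
#include "libpq-fe.h"

#define CHUNK	(64 * 1024)

/* Hypothetical helper: dump `len` bytes starting at a 64-bit offset. */
static int
dump_lo_range(PGconn *conn, int fd, pg_int64 start, pg_int64 len, FILE *out)
{
	char		buf[CHUNK];

	if (lo_lseek64(conn, fd, start, SEEK_SET) < 0)
		return -1;

	while (len > 0)
	{
		int		want = (len > CHUNK) ? CHUNK : (int) len;
		int		got = lo_read(conn, fd, buf, want);	/* still a 32-bit sized call */

		if (got <= 0)
			return -1;
		fwrite(buf, 1, got, out);
		len -= got;
	}
	return 0;
}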

3) Backend inv_api.c functions(Nozomi Anzai)

No need to add new functions. Just extend them to handle 64-bit data.

BTW, what will happen if an older 32-bit libpq accesses large objects
over 2GB?

lo_read and lo_write: they can read or write lobjs using 32-bit API as
long as requested read/write data length is smaller than 2GB. So I
think we can safely allow them to access over 2GB lobjs.

lo_lseek: again as long as requested offset is smaller than 2GB, there
would be no problem.

lo_tell: if the current seek position is beyond 2GB, it returns an error.

Even though iteration of lo_lseek() may move the offset up to 4TB, it also
makes it impossible to use lo_tell() to obtain the current offset, so I think
it is reasonable behavior.

However, error code is not an appropriate one.

+       if (INT_MAX < offset)
+       {
+               ereport(ERROR,
+                               (errcode(ERRCODE_UNDEFINED_OBJECT),
+                                errmsg("invalid large-object descriptor: %d", fd)));
+               PG_RETURN_INT32(-1);
+       }

According to the manpage of lseek(2)
EOVERFLOW
The resulting file offset cannot be represented in an off_t.

Please add a new error code such as ERRCODE_BLOB_OFFSET_OVERFLOW.

Changed the error code and error message. We added a new error code
"ERRCODE_UNDEFINED_OBJECT (22P07)".

4) src/test/examples/testlo64.c added for 64-bit API example(Yugo Nagata)

Comments and suggestions are welcome.

miscellaneous comments are below.

A regression test would be helpful. Even though there is no need to create a
4TB large object, it is helpful to write some chunks around the design boundary.
Could you add some test cases that write some chunks around the 4TB offset?

Added 64-bit lobj test items to the regression test and confirmed they worked
correctly.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


--
Nozomi Anzai
SRA OSS, Inc. Japan


--
KaiGai Kohei <kaigai@kaigai.gr.jp>


--
Nozomi Anzai
SRA OSS, Inc. Japan


--
KaiGai Kohei <kaigai@kaigai.gr.jp>


#47Amit kapila
amit.kapila@huawei.com
In reply to: Tatsuo Ishii (#46)
Re: 64-bit API for large object

On Sunday, October 07, 2012 5:42 AM Tatsuo Ishii wrote:

Ok, committed with minor edits (fixed header comments in testlo64.c).
Thank you Kaigai-san for review!

Hello Tatsuo Ishii San,

Today when I tried to build the latest code on my Windows machine, I got a few errors from the checkin of this patch.

lo_hton64 (due to uint32_t)
.\src\interfaces\libpq\fe-lobj.c(1049) : error C2065: 'uint32_t' : undeclared identifier
inv_seek (due to MAX_LARGE_OBJECT_SIZE)
\src\backend\storage\large_object\inv_api.c(389) : error C2065: 'LOBLKSIZELL' : undeclared identifier
inv_read (due to MAX_LARGE_OBJECT_SIZE)
\src\backend\storage\large_object\inv_api.c(441) : error C2065: 'LOBLKSIZELL' : undeclared identifier

It may be a settings problem on my machine if it builds okay on other Windows machines.

With Regards,
Amit Kapila.


#48Tatsuo Ishii
ishii@postgresql.org
In reply to: Amit kapila (#47)
1 attachment(s)
Re: 64-bit API for large object

Amit,

Today when I tried to build the latest code on my Windows machine, I got a few errors from the checkin of this patch.

lo_hton64 (due to uint32_t)
.\src\interfaces\libpq\fe-lobj.c(1049) : error C2065: 'uint32_t' : undeclared identifier
inv_seek (due to MAX_LARGE_OBJECT_SIZE)
\src\backend\storage\large_object\inv_api.c(389) : error C2065: 'LOBLKSIZELL' : undeclared identifier
inv_read (due to MAX_LARGE_OBJECT_SIZE)
\src\backend\storage\large_object\inv_api.c(441) : error C2065: 'LOBLKSIZELL' : undeclared identifier

Thanks for the report. Can you please try the included patch?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Attachments:

lobj64fix.patch (text/x-patch; charset=us-ascii)
diff --git a/src/include/storage/large_object.h b/src/include/storage/large_object.h
new file mode 100644
index 52f01c6..715f0c3
*** a/src/include/storage/large_object.h
--- b/src/include/storage/large_object.h
*************** typedef struct LargeObjectDesc
*** 65,71 ****
  /*
   * Maximum byte length for each large object
  */
! #define MAX_LARGE_OBJECT_SIZE	INT64CONST(INT_MAX * LOBLKSIZE)
  
  /*
   * Function definitions...
--- 65,71 ----
  /*
   * Maximum byte length for each large object
  */
! #define MAX_LARGE_OBJECT_SIZE	((int64)INT_MAX * LOBLKSIZE)
  
  /*
   * Function definitions...
diff --git a/src/interfaces/libpq/fe-lobj.c b/src/interfaces/libpq/fe-lobj.c
new file mode 100644
index fb17ac8..022cfec
*** a/src/interfaces/libpq/fe-lobj.c
--- b/src/interfaces/libpq/fe-lobj.c
*************** static pg_int64
*** 1046,1058 ****
  lo_hton64(pg_int64 host64)
  {
  	pg_int64 	result;
! 	uint32_t	h32, l32;
  
  	/* High order half first, since we're doing MSB-first */
! 	h32 = (uint32_t) (host64 >> 32);
  
  	/* Now the low order half */
! 	l32 = (uint32_t) (host64 & 0xffffffff);
  
  	result = htonl(l32);
  	result <<= 32;
--- 1046,1058 ----
  lo_hton64(pg_int64 host64)
  {
  	pg_int64 	result;
! 	uint32	h32, l32;
  
  	/* High order half first, since we're doing MSB-first */
! 	h32 = (uint32) (host64 >> 32);
  
  	/* Now the low order half */
! 	l32 = (uint32) (host64 & 0xffffffff);
  
  	result = htonl(l32);
  	result <<= 32;
*************** static pg_int64
*** 1069,1078 ****
  lo_ntoh64(pg_int64 net64)
  {
  	pg_int64 	result;
! 	uint32_t	h32, l32;
  
! 	l32 = (uint32_t) (net64 >> 32);
! 	h32 = (uint32_t) (net64 & 0xffffffff);
  
  	result = ntohl(h32);
  	result <<= 32;
--- 1069,1078 ----
  lo_ntoh64(pg_int64 net64)
  {
  	pg_int64 	result;
! 	uint32	h32, l32;
  
! 	l32 = (uint32) (net64 >> 32);
! 	h32 = (uint32) (net64 & 0xffffffff);
  
  	result = ntohl(h32);
  	result <<= 32;
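
For anyone hitting the same build failure, the 'LOBLKSIZELL' identifier in the
error message points at the cause: INT64CONST() builds a 64-bit literal by
token-pasting a suffix onto its argument, so the suffix lands on the last token
of the expression. A rough sketch of the expansion (the macro names below are
only illustrative; the real definition lives in c.h and the suffix varies by
platform):

/* Illustrative suffix-pasting form of the 64-bit constant macro: */
#define LL_INT64CONST(x)	(x##LL)

/* LL_INT64CONST(INT_MAX * LOBLKSIZE) therefore expands to
 *     (INT_MAX * LOBLKSIZELL)
 * and 'LOBLKSIZELL' is an undeclared identifier, which matches the error
 * above.  The fix in the attached patch casts instead of pasting: */
#define MAX_LARGE_OBJECT_SIZE	((int64) INT_MAX * LOBLKSIZE)

The uint32_t failure is the same kind of portability issue: that typedef comes
from <stdint.h>, which not every supported Windows toolchain provides, so the
fix switches to PostgreSQL's own uint32 typedef instead.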
#49Amit Kapila
amit.kapila@huawei.com
In reply to: Tatsuo Ishii (#48)
Re: 64-bit API for large object

On Sunday, October 07, 2012 1:25 PM Tatsuo Ishii wrote:
Amit,

Today when I tried to build the latest code on my Windows machine, I got
a few errors from the checkin of this patch.

lo_hton64 (due to uint32_t)
.\src\interfaces\libpq\fe-lobj.c(1049) : error C2065: 'uint32_t' : undeclared identifier
inv_seek (due to MAX_LARGE_OBJECT_SIZE)
\src\backend\storage\large_object\inv_api.c(389) : error C2065: 'LOBLKSIZELL' : undeclared identifier
inv_read (due to MAX_LARGE_OBJECT_SIZE)
\src\backend\storage\large_object\inv_api.c(441) : error C2065: 'LOBLKSIZELL' : undeclared identifier

Thanks for the report. Can you please try the included patch?

The above errors no longer appear after the changes in the attached patch.

With Regards,
Amit Kapila.