64-bit API for large objects
Hi,
I found this in the TODO list:
Add API for 64-bit large object access
If this is still a valid TODO item and nobody is working on it, I
would like to work on it.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On Wed, 2012-08-22 at 07:27 +0900, Tatsuo Ishii wrote:
I found this in the TODO list:
Add API for 64-bit large object access
If this is still a valid TODO item and nobody is working on it, I
would like to work on it.
Large objects are limited to 2 GB in size, so a 64-bit API doesn't sound
very useful to me at the moment.
Peter Eisentraut <peter_e@gmx.net> writes:
On Wed, 2012-08-22 at 07:27 +0900, Tatsuo Ishii wrote:
I found this in the TODO list:
Add API for 64-bit large object access
If this is still a valid TODO item and nobody is working on it, I
would like to work on it.
Large objects are limited to 2 GB in size, so a 64-bit API doesn't sound
very useful to me at the moment.
Not entirely. pg_largeobject.pageno is int32, but that's still 2G pages
not bytes, so there's three or so orders of magnitude that could be
gotten by expanding the client-side API before we'd have to change the
server's on-disk representation.
There might well be some local variables in the server's largeobject
code that would need to be widened, but that's the easiest part of the
job.
regards, tom lane
Large objects are limited to 2 GB in size, so a 64-bit API doesn't sound
very useful to me at the moment.
Not entirely. pg_largeobject.pageno is int32, but that's still 2G pages
not bytes, so there's three or so orders of magnitude that could be
gotten by expanding the client-side API before we'd have to change the
server's on-disk representation.
Right. You have already explained that in this:
http://archives.postgresql.org/pgsql-hackers/2010-09/msg01888.php
There might well be some local variables in the server's largeobject
code that would need to be widened, but that's the easiest part of the
job.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On Wed, 2012-08-22 at 01:14 -0400, Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
On Wed, 2012-08-22 at 07:27 +0900, Tatsuo Ishii wrote:
I found this in the TODO list:
Add API for 64-bit large object access
If this is still a valid TODO item and nobody is working on it, I
would like to work on it.
Large objects are limited to 2 GB in size, so a 64-bit API doesn't sound
very useful to me at the moment.
Not entirely. pg_largeobject.pageno is int32, but that's still 2G pages
not bytes, so there's three or so orders of magnitude that could be
gotten by expanding the client-side API before we'd have to change the
server's on-disk representation.
Well then a 64-bit API would be very useful. Go for it. :-)
On Wed, 2012-08-22 at 01:14 -0400, Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
On Wed, 2012-08-22 at 07:27 +0900, Tatsuo Ishii wrote:
I found this in the TODO list:
Add API for 64-bit large object access
If this is still a valid TODO item and nobody is working on it, I
would like to work on it.
Large objects are limited to 2 GB in size, so a 64-bit API doesn't sound
very useful to me at the moment.
Not entirely. pg_largeobject.pageno is int32, but that's still 2G pages
not bytes, so there's three or so orders of magnitude that could be
gotten by expanding the client-side API before we'd have to change the
server's on-disk representation.
Well then a 64-bit API would be very useful. Go for it. :-)
Ok, I will do it.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Hi,
I found this in the TODO list:
Add API for 64-bit large object access
If this is still a valid TODO item and nobody is working on it, I
would like to work on it.
Here is the list of functions I think we need to change.
1) Frontend lo_* libpq functions (fe-lobj.c)
lo_initialize() needs to get the OIDs of the backend's 64-bit large
object handling functions, namely lo_lseek64, lo_tell64, lo_truncate64,
loread64 and lowrite64 (explained later). If they are not available,
fall back to the older 32-bit backend functions.
BTW, currently lo_initialize() throws an error if one of the OIDs is
not available. I doubt we should do the same for the 64-bit functions,
since that would make the 9.3 libpq unable to access large objects
stored in pre-9.2 PostgreSQL servers.
2) Backend lo_* functions (be-fsstubs.c)
Add lo_lseek64, lo_tell64, lo_truncate64, loread64 and lowrite64 so
that they can handle 64-bit seek positions and data lengths.
3) Backend inv_api.c functions
No need to add new functions. Just extend them to handle 64-bit data.
BTW, what will happen if an older 32-bit libpq accesses large objects
over 2GB?
lo_read and lo_write: they can read or write large objects using the
32-bit API as long as the requested read/write data length is smaller
than 2GB. So I think we can safely allow them to access large objects
over 2GB.
lo_lseek: again, as long as the requested offset is smaller than 2GB,
there would be no problem.
lo_tell: if the current seek position is beyond 2GB, return an error.
Comments, suggestions?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Tatsuo Ishii <ishii@postgresql.org> writes:
Here is the list of functions I think we need to change.
1) Frontend lo_* libpq functions (fe-lobj.c)
lo_initialize() needs to get the OIDs of the backend's 64-bit large
object handling functions, namely lo_lseek64, lo_tell64, lo_truncate64,
loread64 and lowrite64 (explained later). If they are not available,
fall back to the older 32-bit backend functions.
I don't particularly see a need for loread64 or lowrite64. Who's going
to be reading or writing more than 2GB at once? If someone tries,
they'd be well advised to reconsider their code design anyway.
regards, tom lane
1) Frontend lo_* libpq functions (fe-lobj.c)
lo_initialize() needs to get the OIDs of the backend's 64-bit large
object handling functions, namely lo_lseek64, lo_tell64, lo_truncate64,
loread64 and lowrite64 (explained later). If they are not available,
fall back to the older 32-bit backend functions.
I don't particularly see a need for loread64 or lowrite64. Who's going
to be reading or writing more than 2GB at once? If someone tries,
they'd be well advised to reconsider their code design anyway.
Ok, loread64 and lowrite64 will not be added.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Correct me if I am wrong.
After expanding the large object API to 64-bit, the max size of a large
object will be 8TB (assuming the default 8KB BLCKSZ).
large object max size = pageno(int32) * LOBLKSIZE
                      = (2^32-1) * (BLCKSZ / 4)
                      = (2^32-1) * (8192 / 4)
                      = 8TB
I just want to confirm my calculation is correct.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Tatsuo Ishii <ishii@postgresql.org> writes:
Correct me if I am wrong.
After expanding the large object API to 64-bit, the max size of a large
object will be 8TB (assuming the default 8KB BLCKSZ).
large object max size = pageno(int32) * LOBLKSIZE
                      = (2^32-1) * (BLCKSZ / 4)
                      = (2^32-1) * (8192 / 4)
                      = 8TB
I just want to confirm my calculation is correct.
pg_largeobject.pageno is a signed int, so I don't think we can let it go
past 2^31-1, so half that.
We could buy back the other bit if we redefined the column as oid
instead of int4 (to make it unsigned), but I think that would create
fairly considerable risk of confusion between the loid and pageno
columns (loid already being oid). I'd just as soon not go there,
at least not till we start seeing actual field complaints about
4TB being paltry ;-)
regards, tom lane
pg_largeobject.pageno is a signed int, so I don't think we can let it go
past 2^31-1, so half that.
We could buy back the other bit if we redefined the column as oid
instead of int4 (to make it unsigned), but I think that would create
fairly considerable risk of confusion between the loid and pageno
columns (loid already being oid). I'd just as soon not go there,
at least not till we start seeing actual field complaints about
4TB being paltry ;-)
Agreed. 4TB should be enough.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On Tue, Aug 28, 2012 at 10:51 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
pg_largeobject.pageno is a signed int, so I don't think we can let it go
past 2^31-1, so half that.
We could buy back the other bit if we redefined the column as oid
instead of int4 (to make it unsigned), but I think that would create
fairly considerable risk of confusion between the loid and pageno
columns (loid already being oid). I'd just as soon not go there,
at least not till we start seeing actual field complaints about
4TB being paltry ;-)
Agreed. 4TB should be enough.
...for anybody!
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Ok, here is the patch to implement the 64-bit API for large objects,
allowing large objects of up to 4TB (or 16TB if BLCKSZ is changed to
32KB). The patch is based on Jeremy Drake's patch posted on September
23, 2005
(http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php)
and reasonably updated/edited to adapt it to PostgreSQL 9.3 by Nozomi
Anzai for the backend part and Yugo Nagata for the rest (including the
documentation patch).
Here are the changes made in the patch:
1) Frontend lo_* libpq functions (fe-lobj.c) (Yugo Nagata)
lo_initialize() gathers the OIDs of the backend's 64-bit large object
handling functions, namely lo_lseek64, lo_tell64 and lo_truncate64.
If a client calls the lo_*64 functions and the backend does not support
them, the lo_*64 functions return an error to the caller. There might
be an argument that calls to the lo_*64 functions could automatically
be redirected to the older 32-bit API. I don't know whether this is
worth the trouble, though.
Currently lo_initialize() throws an error if one of the OIDs is not
available. I doubt we should do the same for the 64-bit functions,
since that would make the 9.3 libpq unable to access large objects
stored in pre-9.2 PostgreSQL servers.
To pass a 64-bit integer to PQfn, PQArgBlock is used like this: int
*ptr is a pointer to the 64-bit integer and the actual data is placed
somewhere else. There might be another way: add a new member to union u
to store a 64-bit integer:
typedef struct
{
    int         len;
    int         isint;
    union
    {
        int        *ptr;        /* can't use void (dec compiler barfs) */
        int         integer;
        int64       bigint;     /* 64-bit integer */
    } u;
} PQArgBlock;
I'm a little bit worried about this approach because PQArgBlock is a
public interface.
Also we add a new type "pg_int64":
#ifndef NO_PG_INT64
#define HAVE_PG_INT64 1
typedef long long int pg_int64;
#endif
in postgres_ext.h per suggestion from Tom Lane:
http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php
2) Backend lo_* functions (be-fsstubs.c) (Nozomi Anzai)
Add lo_lseek64, lo_tell64 and lo_truncate64 so that they can handle
64-bit seek positions and data lengths. loread64 and lowrite64 are not
added because if a program tries to read/write more than 2GB at once,
it would be a sign that the program needs to be redesigned anyway.
3) Backend inv_api.c functions (Nozomi Anzai)
No need to add new functions. Just extend them to handle 64-bit data.
BTW, what will happen if an older 32-bit libpq accesses large objects
over 2GB?
lo_read and lo_write: they can read or write large objects using the
32-bit API as long as the requested read/write data length is smaller
than 2GB. So I think we can safely allow them to access large objects
over 2GB.
lo_lseek: again, as long as the requested offset is smaller than 2GB,
there would be no problem.
lo_tell: if the current seek position is beyond 2GB, return an error.
4) src/test/examples/testlo64.c added as a 64-bit API example (Yugo Nagata)
Comments and suggestions are welcome.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
I checked this patch. It can be applied to the latest master branch
without any problems. My comments are below.
2012/9/11 Tatsuo Ishii <ishii@postgresql.org>:
Ok, here is the patch to implement the 64-bit API for large objects,
allowing large objects of up to 4TB (or 16TB if BLCKSZ is changed to
32KB). The patch is based on Jeremy Drake's patch posted on September
23, 2005
(http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php)
and reasonably updated/edited to adapt it to PostgreSQL 9.3 by Nozomi
Anzai for the backend part and Yugo Nagata for the rest (including the
documentation patch).
Here are the changes made in the patch:
1) Frontend lo_* libpq functions (fe-lobj.c) (Yugo Nagata)
lo_initialize() gathers the OIDs of the backend's 64-bit large object
handling functions, namely lo_lseek64, lo_tell64 and lo_truncate64.
If a client calls the lo_*64 functions and the backend does not support
them, the lo_*64 functions return an error to the caller. There might
be an argument that calls to the lo_*64 functions could automatically
be redirected to the older 32-bit API. I don't know whether this is
worth the trouble, though.
I think it should definitely return an error code when the user tries
to use the lo_*64 functions against a v9.2 or older backend, because
falling back to the 32-bit API can raise unexpected errors if the
application intends to seek to an area beyond 2GB.
Currently lo_initialize() throws an error if one of the OIDs is not
available. I doubt we should do the same for the 64-bit functions,
since that would make the 9.3 libpq unable to access large objects
stored in pre-9.2 PostgreSQL servers.
It seems to me this is a case for splitting the pre-9.3 and 9.3-and-later
behavior using a condition such as "conn->sversion >= 90300".
To pass a 64-bit integer to PQfn, PQArgBlock is used like this: int
*ptr is a pointer to the 64-bit integer and the actual data is placed
somewhere else. There might be another way: add a new member to union u
to store a 64-bit integer:
typedef struct
{
    int         len;
    int         isint;
    union
    {
        int        *ptr;        /* can't use void (dec compiler barfs) */
        int         integer;
        int64       bigint;     /* 64-bit integer */
    } u;
} PQArgBlock;
I'm a little bit worried about this approach because PQArgBlock is a
public interface.
I'm inclined to add a new field to the union; that seems to me the
straightforward approach.
For example, the manner in lo_seek64() seems confusing to me.
It sets 1 in the "isint" field even though a pointer is actually passed:
+ argv[1].isint = 1;
+ argv[1].len = 8;
+ argv[1].u.ptr = (int *) &len;
Also we add a new type "pg_int64":
#ifndef NO_PG_INT64
#define HAVE_PG_INT64 1
typedef long long int pg_int64;
#endif
in postgres_ext.h per suggestion from Tom Lane:
http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php
I'm uncertain about the context of this discussion.
Does it matter if we include <stdint.h> and use int64_t instead
of a self-defined data type?
2) Backend lo_* functions (be-fsstubs.c) (Nozomi Anzai)
Add lo_lseek64, lo_tell64 and lo_truncate64 so that they can handle
64-bit seek positions and data lengths. loread64 and lowrite64 are not
added because if a program tries to read/write more than 2GB at once,
it would be a sign that the program needs to be redesigned anyway.
I think that is reasonable.
3) Backend inv_api.c functions (Nozomi Anzai)
No need to add new functions. Just extend them to handle 64-bit data.
BTW, what will happen if an older 32-bit libpq accesses large objects
over 2GB?
lo_read and lo_write: they can read or write large objects using the
32-bit API as long as the requested read/write data length is smaller
than 2GB. So I think we can safely allow them to access large objects
over 2GB.
lo_lseek: again, as long as the requested offset is smaller than 2GB,
there would be no problem.
lo_tell: if the current seek position is beyond 2GB, return an error.
Even though iteration of lo_lseek() may move the offset to 4TB, it also
makes it impossible to use lo_tell() to obtain the current offset, so I
think that is reasonable behavior.
However, the error code is not an appropriate one.
+    if (INT_MAX < offset)
+    {
+        ereport(ERROR,
+                (errcode(ERRCODE_UNDEFINED_OBJECT),
+                 errmsg("invalid large-object descriptor: %d", fd)));
+        PG_RETURN_INT32(-1);
+    }
According to the manpage of lseek(2):
EOVERFLOW
    The resulting file offset cannot be represented in an off_t.
Please add a new error code such as ERRCODE_BLOB_OFFSET_OVERFLOW.
4) src/test/examples/testlo64.c added for 64-bit API example(Yugo Nagata)
Comments and suggestions are welcome.
Miscellaneous comments are below.
A regression test would be helpful. Even though there is no need to try
to create a 4TB large object, it is helpful to write some chunks around
the design boundary. Could you add some test cases that write chunks
around the 4TB offset?
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
3) Backend inv_api.c functions (Nozomi Anzai)
No need to add new functions. Just extend them to handle 64-bit data.
BTW, what will happen if an older 32-bit libpq accesses large objects
over 2GB?
lo_read and lo_write: they can read or write large objects using the
32-bit API as long as the requested read/write data length is smaller
than 2GB. So I think we can safely allow them to access large objects
over 2GB.
lo_lseek: again, as long as the requested offset is smaller than 2GB,
there would be no problem.
lo_tell: if the current seek position is beyond 2GB, return an error.
Even though iteration of lo_lseek() may move the offset to 4TB, it also
makes it impossible to use lo_tell() to obtain the current offset, so I
think that is reasonable behavior.
However, the error code is not an appropriate one.
+    if (INT_MAX < offset)
+    {
+        ereport(ERROR,
+                (errcode(ERRCODE_UNDEFINED_OBJECT),
+                 errmsg("invalid large-object descriptor: %d", fd)));
+        PG_RETURN_INT32(-1);
+    }
According to the manpage of lseek(2):
EOVERFLOW
    The resulting file offset cannot be represented in an off_t.
Please add a new error code such as ERRCODE_BLOB_OFFSET_OVERFLOW.
Agreed.
--
Nozomi Anzai
SRA OSS, Inc. Japan
To pass a 64-bit integer to PQfn, PQArgBlock is used like this: int
*ptr is a pointer to the 64-bit integer and the actual data is placed
somewhere else. There might be another way: add a new member to union u
to store a 64-bit integer:
typedef struct
{
    int         len;
    int         isint;
    union
    {
        int        *ptr;        /* can't use void (dec compiler barfs) */
        int         integer;
        int64       bigint;     /* 64-bit integer */
    } u;
} PQArgBlock;
I'm a little bit worried about this approach because PQArgBlock is a
public interface.
I'm inclined to add a new field to the union; that seems to me the
straightforward approach.
For example, the manner in lo_seek64() seems confusing to me.
It sets 1 in the "isint" field even though a pointer is actually passed:
+ argv[1].isint = 1;
+ argv[1].len = 8;
+ argv[1].u.ptr = (int *) &len;
I have to admit that this is confusing. However, I'm worried about
changing sizeof(PQArgBlock) from a compatibility point of view. Maybe
I'm just paranoid, though.
Also we add a new type "pg_int64":
#ifndef NO_PG_INT64
#define HAVE_PG_INT64 1
typedef long long int pg_int64;
#endif
in postgres_ext.h per suggestion from Tom Lane:
http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php
I'm uncertain about the context of this discussion.
Does it matter if we include <stdint.h> and use int64_t instead
of a self-defined data type?
I think Tom's point is that there are tons of applications which define
their own "int64_t" (at least there were in 2005).
Also pg_config.h has:
#define HAVE_STDINT_H 1
and this suggests that PostgreSQL supports platforms which do not have
stdint.h. If so, we need to take care of such platforms anyway.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Tatsuo Ishii <ishii@postgresql.org> writes:
To pass a 64-bit integer to PQfn, PQArgBlock is used like this: int
*ptr is a pointer to the 64-bit integer and the actual data is placed
somewhere else.
Yeah, I think we have to do it like that. Changing the size of
PQArgBlock would be a libpq ABI break, which IMO is sufficiently painful
to kill this whole proposal. Much better a little localized ugliness
in fe-lobj.c.
regards, tom lane
I think Tom's point is that there are tons of applications which define
their own "int64_t" (at least there were in 2005).
Also pg_config.h has:
#define HAVE_STDINT_H 1
and this suggests that PostgreSQL supports platforms which do not have
stdint.h. If so, we need to take care of such platforms anyway.
OK, that makes it clear to me. It might be helpful to leave a source
code comment explaining why we used a self-defined data type here.
2012/9/21 Tom Lane <tgl@sss.pgh.pa.us>:
Tatsuo Ishii <ishii@postgresql.org> writes:
To pass a 64-bit integer to PQfn, PQArgBlock is used like this: int
*ptr is a pointer to the 64-bit integer and the actual data is placed
somewhere else.
Yeah, I think we have to do it like that. Changing the size of
PQArgBlock would be a libpq ABI break, which IMO is sufficiently painful
to kill this whole proposal. Much better a little localized ugliness
in fe-lobj.c.
Hmm, I see. Please deliver the 64-bit integer argument by reference,
and don't forget the endian translations here.
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Currently lo_initialize() throws an error if one of the OIDs is not
available. I doubt we should do the same for the 64-bit functions,
since that would make the 9.3 libpq unable to access large objects
stored in pre-9.2 PostgreSQL servers.
It seems to me this is a case for splitting the pre-9.3 and 9.3-and-later
behavior using a condition such as "conn->sversion >= 90300".
Agreed. I'll fix it like that.
4) src/test/examples/testlo64.c added as a 64-bit API example (Yugo Nagata)
Comments and suggestions are welcome.
Miscellaneous comments are below.
A regression test would be helpful. Even though there is no need to try
to create a 4TB large object, it is helpful to write some chunks around
the design boundary. Could you add some test cases that write chunks
around the 4TB offset?
Agreed. I'll do that.
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
--
Yugo Nagata <nagata@sraoss.co.jp>