Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
Hi,
I run into a nasty behavior of current PostgreSQL JDBC.
I maintain images (from Webcams). In the Java and Hibernate (JPA) code
I specified a @Lob annotation on class MyData and a attribte/data type
"byte[] mydata;". Hibernate then generates two tables in PostgreSQL,
one called MyData with a column mydata of type oid and an internal one
called pg_largobjects (which contain foreign keys to the oid). That's
also explained in the JDBC docs [1]http://jdbc.postgresql.org/documentation/head/binary-data.html, saying "PostgreSQL provides two
distinct ways to store binary data. Binary data can be stored in a
table using the data type bytea or by using the Large Object feature
which stores the binary data in a separate table in a special format
and refers to that table by storing a value of type oid in your
table."
Now, when replacing the images (few hundred) with new ones using Java,
pg_largeobjects grew constantly until the file system run out of
space. So old image data did'nt get released! This is to me a bug
because the user/programmer must (and should) assume that there is a
strict 1:1 relationship between generated table MyData and its LOB
column data (stored in pg_largeobjects).
=> I finally found the supplied module 'lo' [2]http://www.postgresql.org/docs/current/interactive/lo.html which releases
detached records. Is this the recommended way to resolve this problem?
Searching for explanations I found a ticket HHH2244 [3]https://hibernate.onjira.com/browse/HHH-2244 which was
closed by the Hibernate team without action referring to the JDBC
Spec. which says: "An SQL BLOB is a built-in type that stores a Binary
Large Object as a column value in a row of a database table".
=> In other words: The PostgreSQL JDBC team should take action on this
but didn't until now, right?
There is another issue about "PostgreSQL and BLOBs" [4]http://relation.to/Bloggers/PostgreSQLAndBLOBs. First it
cites PostgreSQL JDBC docs [1]http://jdbc.postgresql.org/documentation/head/binary-data.html. The thread [4]http://relation.to/Bloggers/PostgreSQLAndBLOBs ends somehow too in a
'deadlock' concluding "...the PostgreSQLDialect (as of 3.5.5) needs
to change not to use MaterializedBlobType until the Postgres (JDBC)
team changes their driver (which does not seem to have happened in the
last 6 years)."
=> Any solutions or comments form insiders on this?
Yours, Stefan
[1]: http://jdbc.postgresql.org/documentation/head/binary-data.html
[2]: http://www.postgresql.org/docs/current/interactive/lo.html
[3]: https://hibernate.onjira.com/browse/HHH-2244
[4]: http://relation.to/Bloggers/PostgreSQLAndBLOBs
I'd like to backup my statement below regarding in JDBC driver from PostgreSQL:
When storing fields of type BLOB it inserts the binary string in
system table pg_largeobject (via user table). But when rows in user
table get updated or deleted it does not update nor delete
corresponding rows in table pg_largeobject.
That's really a bug!
Fortunately there's a solution indicated in the official docs
(http://www.postgresql.org/docs/9.1/static/lo.html)
To me, something like this should be implemented before hand in JDBC driver.
And in any case there should be a bold note about this in the JDBC
docs (http://jdbc.postgresql.org/documentation/head/binary-data.html )
Yours, Stefan
2012/1/6 Stefan Keller <sfkeller@gmail.com>:
Show quoted text
Hi,
I run into a nasty behavior of current PostgreSQL JDBC.
I maintain images (from Webcams). In the Java and Hibernate (JPA) code
I specified a @Lob annotation on class MyData and a attribte/data type
"byte[] mydata;". Hibernate then generates two tables in PostgreSQL,
one called MyData with a column mydata of type oid and an internal one
called pg_largobjects (which contain foreign keys to the oid). That's
also explained in the JDBC docs [1], saying "PostgreSQL provides two
distinct ways to store binary data. Binary data can be stored in a
table using the data type bytea or by using the Large Object feature
which stores the binary data in a separate table in a special format
and refers to that table by storing a value of type oid in your
table."Now, when replacing the images (few hundred) with new ones using Java,
pg_largeobjects grew constantly until the file system run out of
space. So old image data did'nt get released! This is to me a bug
because the user/programmer must (and should) assume that there is a
strict 1:1 relationship between generated table MyData and its LOB
column data (stored in pg_largeobjects).
=> I finally found the supplied module 'lo' [2] which releases
detached records. Is this the recommended way to resolve this problem?Searching for explanations I found a ticket HHH2244 [3] which was
closed by the Hibernate team without action referring to the JDBC
Spec. which says: "An SQL BLOB is a built-in type that stores a Binary
Large Object as a column value in a row of a database table".
=> In other words: The PostgreSQL JDBC team should take action on this
but didn't until now, right?There is another issue about "PostgreSQL and BLOBs" [4]. First it
cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too in a
'deadlock' concluding "...the PostgreSQLDialect (as of 3.5.5) needs
to change not to use MaterializedBlobType until the Postgres (JDBC)
team changes their driver (which does not seem to have happened in the
last 6 years)."
=> Any solutions or comments form insiders on this?Yours, Stefan
[1] http://jdbc.postgresql.org/documentation/head/binary-data.html
[2] http://www.postgresql.org/docs/current/interactive/lo.html
[3] https://hibernate.onjira.com/browse/HHH-2244
[4] http://relation.to/Bloggers/PostgreSQLAndBLOBs
Stefan Keller wrote on 06.01.2012 19:04:
I maintain images (from Webcams). In the Java and Hibernate (JPA) code
I specified a @Lob annotation on class MyData and a attribte/data type
"byte[] mydata;". Hibernate then generates two tables in PostgreSQL,
one called MyData with a column mydata of type oid and an internal one
called pg_largobjects (which contain foreign keys to the oid). That's
also explained in the JDBC docs [1], saying "PostgreSQL provides two
distinct ways to store binary data. Binary data can be stored in a
table using the data type bytea or by using the Large Object feature
which stores the binary data in a separate table in a special format
and refers to that table by storing a value of type oid in your
table."
I think you are better off using bytea unless you need to access only parts of the blob regularly.
2012/1/8 Thomas Kellerer <spam_eater@gmx.net> wrote:
I think you are better off using bytea unless you need to access only parts
of the blob regularly.
That's a valid tip. But it's to the current JDBC implementation to
take action because it currently leads to disk space leakage when
using JDBC and JPA/Hibernate.
After a tedious time of bug searching I found a possible solution and
a testimonial here: "One of the problems with the JDBC driver (and
this affects the ODBC driver also), is that the specification assumes
that references to BLOBs (Binary Large OBjects) are stored within a
table, and if that entry is changed, the associated BLOB is deleted
from the database. As PostgreSQL stands, this doesn't occur." (taken
from the docs http://www.postgresql.org/docs/9.1/static/lo.html )
In addition, unfortunately the (lo_unlink) problem of LargeObjects and
BLOBs is not mentioned in the JDBC docs:
http://jdbc.postgresql.org/documentation/head/binary-data.html
Stefan
2012/1/8 Thomas Kellerer <spam_eater@gmx.net>:
Show quoted text
Stefan Keller wrote on 06.01.2012 19:04:
I maintain images (from Webcams). In the Java and Hibernate (JPA) code
I specified a @Lob annotation on class MyData and a attribte/data type
"byte[] mydata;". Hibernate then generates two tables in PostgreSQL,
one called MyData with a column mydata of type oid and an internal one
called pg_largobjects (which contain foreign keys to the oid). That's
also explained in the JDBC docs [1], saying "PostgreSQL provides two
distinct ways to store binary data. Binary data can be stored in a
table using the data type bytea or by using the Large Object feature
which stores the binary data in a separate table in a special format
and refers to that table by storing a value of type oid in your
table."I think you are better off using bytea unless you need to access only parts
of the blob regularly.--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Stefan Keller wrote on 08.01.2012 19:13:
I think you are better off using bytea unless you need to access only parts
of the blob regularly.That's a valid tip. But it's to the current JDBC implementation to
take action because it currently leads to disk space leakage when
using JDBC and JPA/Hibernate.
But only if you use large objects.
From my perspective bytea is the (only) data type that matches the JDBC BLOB type.
And none of the problems you have occur when using bytea.
There is no match for PG's large objects in the JDBC API so I don't see your claim that it's a fault of the driver.
What's the reason for you to stick with LargeObjects?
2012/1/8 Thomas Kellerer <spam_eater@gmx.net>:> What's the reason for
you to stick with LargeObjects?
I simply used the @Lob annotation in Hibernate/JPA.
That's all to get a leaking pg_largeobject table.
See http://relation.to/Bloggers/PostgreSQLAndBLOBs,
https://hibernate.onjira.com/browse/HHH-2244 and
https://hibernate.onjira.com/browse/HHH-4876 for some background of
the dilemma.
Stefan
2012/1/8 Thomas Kellerer <spam_eater@gmx.net>:
Show quoted text
Stefan Keller wrote on 08.01.2012 19:13:
I think you are better off using bytea unless you need to access only
parts the blob regularly.That's a valid tip. But it's to the current JDBC implementation to
take action because it currently leads to disk space leakage when
using JDBC and JPA/Hibernate.But only if you use large objects.
From my perspective bytea is the (only) data type that matches the JDBC BLOB
type.
And none of the problems you have occur when using bytea.There is no match for PG's large objects in the JDBC API so I don't see your
claim that it's a fault of the driver.What's the reason for you to stick with LargeObjects?
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Stefan Keller wrote on 08.01.2012 20:35:
2012/1/8 Thomas Kellerer<spam_eater@gmx.net>:
What's the reason for you to stick with LargeObjects?I simply used the @Lob annotation in Hibernate/JPA.
That's all to get a leaking pg_largeobject table.
See http://relation.to/Bloggers/PostgreSQLAndBLOBs,
https://hibernate.onjira.com/browse/HHH-2244 and
https://hibernate.onjira.com/browse/HHH-4876 for some background of
the dilemma.
So it's clearly a Hibernate bug.
2012/1/8 Thomas Kellerer <spam_eater@gmx.net>:> So it's clearly a
Hibernate bug.
Obviously not :->
Hibernate mapping just uses one of two valid variants to map large
objects in JDBC.
So, AFAIK it's a PostgreSQL JDBC bug and an omission in the JDBC docs as well.
Stefan
2012/1/8 Thomas Kellerer <spam_eater@gmx.net>:
Show quoted text
Stefan Keller wrote on 08.01.2012 20:35:
2012/1/8 Thomas Kellerer<spam_eater@gmx.net>:
What's the reason for you to stick with LargeObjects?I simply used the @Lob annotation in Hibernate/JPA.
That's all to get a leaking pg_largeobject table.
See http://relation.to/Bloggers/PostgreSQLAndBLOBs,
https://hibernate.onjira.com/browse/HHH-2244 and
https://hibernate.onjira.com/browse/HHH-4876 for some background of
the dilemma.So it's clearly a Hibernate bug.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Sun, 8 Jan 2012 18:08:09 +0100, Stefan Keller wrote:
I'd like to backup my statement below regarding in JDBC driver from
PostgreSQL:When storing fields of type BLOB it inserts the binary string in
system table pg_largeobject (via user table). But when rows in user
table get updated or deleted it does not update nor delete
corresponding rows in table pg_largeobject.That's really a bug!
Fortunately there's a solution indicated in the official docs
(http://www.postgresql.org/docs/9.1/static/lo.html)
To me, something like this should be implemented before hand in JDBC
driver.
And in any case there should be a bold note about this in the JDBC
docs (http://jdbc.postgresql.org/documentation/head/binary-data.html
)Yours, Stefan
2012/1/6 Stefan Keller <sfkeller@gmail.com>:
Hi,
I run into a nasty behavior of current PostgreSQL JDBC.
I maintain images (from Webcams). In the Java and Hibernate (JPA)
code
I specified a @Lob annotation on class MyData and a attribte/data
type
"byte[] mydata;". Hibernate then generates two tables in PostgreSQL,
one called MyData with a column mydata of type oid and an internal
one
called pg_largobjects (which contain foreign keys to the oid).
That's
also explained in the JDBC docs [1], saying "PostgreSQL provides two
distinct ways to store binary data. Binary data can be stored in a
table using the data type bytea or by using the Large Object feature
which stores the binary data in a separate table in a special format
and refers to that table by storing a value of type oid in your
table."Now, when replacing the images (few hundred) with new ones using
Java,
pg_largeobjects grew constantly until the file system run out of
space. So old image data did'nt get released! This is to me a bug
because the user/programmer must (and should) assume that there is a
strict 1:1 relationship between generated table MyData and its LOB
column data (stored in pg_largeobjects).
=> I finally found the supplied module 'lo' [2] which releases
detached records. Is this the recommended way to resolve this
problem?Searching for explanations I found a ticket HHH2244 [3] which was
closed by the Hibernate team without action referring to the JDBC
Spec. which says: "An SQL BLOB is a built-in type that stores a
Binary
Large Object as a column value in a row of a database table".
=> In other words: The PostgreSQL JDBC team should take action on
this
but didn't until now, right?There is another issue about "PostgreSQL and BLOBs" [4]. First it
cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too in a
'deadlock' concluding "...the PostgreSQLDialect (as of 3.5.5) needs
to change not to use MaterializedBlobType until the Postgres (JDBC)
team changes their driver (which does not seem to have happened in
the
last 6 years)."
=> Any solutions or comments form insiders on this?Yours, Stefan
[1] http://jdbc.postgresql.org/documentation/head/binary-data.html
[2] http://www.postgresql.org/docs/current/interactive/lo.html
[3] https://hibernate.onjira.com/browse/HHH-2244
[4] http://relation.to/Bloggers/PostgreSQLAndBLOBs
This is common approach for PostgreSQL and some proxy of real life
Blobs. One time I submitted bug about this to Hibernate. But step by
step.
1. BLOBs are... Large Objects, they are stored as reference because
those objects are large, if you will store this objects as bytea then
select * will return all large data. It may not be comfortable not only
to download few GB of data, but to keep this on stack too. From your
perspective it doesn't matters because you put it in byte[]. But if You
will keep e.g. CD-ROM images then it's much more better to use streaming
approach then bytea[]. More over due to some security JDBC driver will
at least double memory consumed by bytea.
2. Specifying hibernate data type as bytea do not resolve problems
because it will still use LOB approach.
3. pg_largeobjects is system table, hibernate do not creates it.
4. Trigger based approach is good for this, but You need to do this
mannualy
5. If you want to use bytea use
@Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think you
should remove @Lob too) on your field.
Regards,
Radosław Smogura
Thanks, Radosław, for the clarification.
2012/1/8 Radosław Smogura <rsmogura@softperience.eu> wrote:
3. pg_largeobjects is system table, hibernate do not creates it.
4. Trigger based approach is good for this, but You need to do this mannualy
That's exactly my point:
Since JDBC manages creation of OID and pg_largeobjects it's also JDBC
which is responsible for update/delete of rows in pg_largeobjects.
As the user expects, the only thing JDBC would have to do is to issue
lo_unlink() when rows are updated or deleted.
Thus, it's currently a bug in the JDBC driver. And whatever the
solution is, it needs to be mentioned in the JDBC docs.
5. If you want to use bytea use
@Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think you
should remove @Lob too) on your field.
In fact, this annotation syntax usage looks like the Hibernate mapping
for PostgreSQL could be enhanced.
Yours, Stefan
2012/1/8 Radosław Smogura <rsmogura@softperience.eu>:
Show quoted text
On Sun, 8 Jan 2012 18:08:09 +0100, Stefan Keller wrote:
I'd like to backup my statement below regarding in JDBC driver from
PostgreSQL:When storing fields of type BLOB it inserts the binary string in
system table pg_largeobject (via user table). But when rows in user
table get updated or deleted it does not update nor delete
corresponding rows in table pg_largeobject.That's really a bug!
Fortunately there's a solution indicated in the official docs
(http://www.postgresql.org/docs/9.1/static/lo.html)
To me, something like this should be implemented before hand in JDBC
driver.
And in any case there should be a bold note about this in the JDBC
docs (http://jdbc.postgresql.org/documentation/head/binary-data.html )Yours, Stefan
2012/1/6 Stefan Keller <sfkeller@gmail.com>:
Hi,
I run into a nasty behavior of current PostgreSQL JDBC.
I maintain images (from Webcams). In the Java and Hibernate (JPA) code
I specified a @Lob annotation on class MyData and a attribte/data type
"byte[] mydata;". Hibernate then generates two tables in PostgreSQL,
one called MyData with a column mydata of type oid and an internal one
called pg_largobjects (which contain foreign keys to the oid). That's
also explained in the JDBC docs [1], saying "PostgreSQL provides two
distinct ways to store binary data. Binary data can be stored in a
table using the data type bytea or by using the Large Object feature
which stores the binary data in a separate table in a special format
and refers to that table by storing a value of type oid in your
table."Now, when replacing the images (few hundred) with new ones using Java,
pg_largeobjects grew constantly until the file system run out of
space. So old image data did'nt get released! This is to me a bug
because the user/programmer must (and should) assume that there is a
strict 1:1 relationship between generated table MyData and its LOB
column data (stored in pg_largeobjects).
=> I finally found the supplied module 'lo' [2] which releases
detached records. Is this the recommended way to resolve this problem?Searching for explanations I found a ticket HHH2244 [3] which was
closed by the Hibernate team without action referring to the JDBC
Spec. which says: "An SQL BLOB is a built-in type that stores a Binary
Large Object as a column value in a row of a database table".
=> In other words: The PostgreSQL JDBC team should take action on this
but didn't until now, right?There is another issue about "PostgreSQL and BLOBs" [4]. First it
cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too in a
'deadlock' concluding "...the PostgreSQLDialect (as of 3.5.5) needs
to change not to use MaterializedBlobType until the Postgres (JDBC)
team changes their driver (which does not seem to have happened in the
last 6 years)."
=> Any solutions or comments form insiders on this?Yours, Stefan
[1] http://jdbc.postgresql.org/documentation/head/binary-data.html
[2] http://www.postgresql.org/docs/current/interactive/lo.html
[3] https://hibernate.onjira.com/browse/HHH-2244
[4] http://relation.to/Bloggers/PostgreSQLAndBLOBsThis is common approach for PostgreSQL and some proxy of real life Blobs.
One time I submitted bug about this to Hibernate. But step by step.1. BLOBs are... Large Objects, they are stored as reference because those
objects are large, if you will store this objects as bytea then select *
will return all large data. It may not be comfortable not only to download
few GB of data, but to keep this on stack too. From your perspective it
doesn't matters because you put it in byte[]. But if You will keep e.g.
CD-ROM images then it's much more better to use streaming approach then
bytea[]. More over due to some security JDBC driver will at least double
memory consumed by bytea.2. Specifying hibernate data type as bytea do not resolve problems because
it will still use LOB approach.3. pg_largeobjects is system table, hibernate do not creates it.
4. Trigger based approach is good for this, but You need to do this mannualy
5. If you want to use bytea use
@Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think you
should remove @Lob too) on your field.Regards,
Radosław Smogura
On Sun, 8 Jan 2012 21:57:37 +0100, Stefan Keller wrote:
Thanks, Radosław, for the clarification.
2012/1/8 Radosław Smogura <rsmogura@softperience.eu> wrote:
3. pg_largeobjects is system table, hibernate do not creates it.
4. Trigger based approach is good for this, but You need to do this
mannualyThat's exactly my point:
Since JDBC manages creation of OID and pg_largeobjects it's also JDBC
which is responsible for update/delete of rows in pg_largeobjects.
As the user expects, the only thing JDBC would have to do is to issue
lo_unlink() when rows are updated or deleted.
Thus, it's currently a bug in the JDBC driver. And whatever the
solution is, it needs to be mentioned in the JDBC docs.
Not quite, PostgreSQL doesn't have LOB, nor OID type that is only
reference to LOB. In fact, BLOB behaviour in JDBC
is just thin wrapper for this what is missing in PostgreSQL - BLOBs. It
was build form available parts. In addition OID type may be used and
it's used as the system id or may be used as just some kind of row id -
all types, tables, sequences etc has OID. You may create table with
"WITH OIDS" clause. You may use OID data type just as replacement for
(unsigned) int, so JDBC can create LOB but it can't decide if given
field in row is reference to LOB or e.g. table and what with statements
"DELETE WHERE date > "?
In fact JDBC driver is so "stupid" that if you will call getBytes or
getBlob on any column with Oid it will ask for LOB.
Hibernate knows this and it does what is best - calls standard BLOB
interface, and creates table with Oid column.
And here again round trip, in case of deletion only Hibernate may
delete given LOB because only Hibernate and You knows that any value in
Oid column will reflect LOB - JDBC driver doesn't "knows " this, but...
oids may be shared because those are only numbers, not all tables may be
covered by Hibernate, and assuming huge imagination, someone may encode
OID by adding 1, may store it as long, etc. I know it's quite popular
that DB schema comes from entities, but not always You have such
flexibility and sometimes You create Entities for particular schema.
So, as You see only this trigger approach is currently (universally)
the best way. If this is that module I think about it's just trigger
which calls unlink on replace or deletion - in fact You may write own
without any problems. Those are few lines only.
5. If you want to use bytea use
@Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think
you
should remove @Lob too) on your field.In fact, this annotation syntax usage looks like the Hibernate
mapping
for PostgreSQL could be enhanced.
You have right, but Hibernate team will be in need to add auto deletion
for bulk updates, too. PostgreSQL isn't still so popular it's worth of
it (and they have no so much developers bug report may stand for months
without any comment). Look how many peoples ask for Lobs. It means no
one needs true LOB - true LOB large object stored outside table/row data
space, to allow out of statements operation like streaming, partial
updates etc. This is my definition of LOB, because this is idea of LOB.
I think only DB2 stores LOBs in row, PostgreSQL do not make this but
adverts this bytea ugly approach.
Personally, I create app which captures images from WebCam (like You) -
just frames not movies. From above reason I wanted to move to bytea, but
due to changes (wired instability and leak of backward compatibility) I
still have Oid. Because I have only two tables for Lobs I have garbage
collection simple script.
Best regards
Radek
Show quoted text
Yours, Stefan
2012/1/8 Radosław Smogura <rsmogura@softperience.eu>:
On Sun, 8 Jan 2012 18:08:09 +0100, Stefan Keller wrote:
I'd like to backup my statement below regarding in JDBC driver from
PostgreSQL:When storing fields of type BLOB it inserts the binary string in
system table pg_largeobject (via user table). But when rows in user
table get updated or deleted it does not update nor delete
corresponding rows in table pg_largeobject.That's really a bug!
Fortunately there's a solution indicated in the official docs
(http://www.postgresql.org/docs/9.1/static/lo.html)
To me, something like this should be implemented before hand in
JDBC
driver.
And in any case there should be a bold note about this in the JDBC
docs
(http://jdbc.postgresql.org/documentation/head/binary-data.html )Yours, Stefan
2012/1/6 Stefan Keller <sfkeller@gmail.com>:
Hi,
I run into a nasty behavior of current PostgreSQL JDBC.
I maintain images (from Webcams). In the Java and Hibernate (JPA)
code
I specified a @Lob annotation on class MyData and a attribte/data
type
"byte[] mydata;". Hibernate then generates two tables in
PostgreSQL,
one called MyData with a column mydata of type oid and an internal
one
called pg_largobjects (which contain foreign keys to the oid).
That's
also explained in the JDBC docs [1], saying "PostgreSQL provides
two
distinct ways to store binary data. Binary data can be stored in a
table using the data type bytea or by using the Large Object
feature
which stores the binary data in a separate table in a special
format
and refers to that table by storing a value of type oid in your
table."Now, when replacing the images (few hundred) with new ones using
Java,
pg_largeobjects grew constantly until the file system run out of
space. So old image data did'nt get released! This is to me a bug
because the user/programmer must (and should) assume that there is
a
strict 1:1 relationship between generated table MyData and its LOB
column data (stored in pg_largeobjects).
=> I finally found the supplied module 'lo' [2] which releases
detached records. Is this the recommended way to resolve this
problem?Searching for explanations I found a ticket HHH2244 [3] which was
closed by the Hibernate team without action referring to the JDBC
Spec. which says: "An SQL BLOB is a built-in type that stores a
Binary
Large Object as a column value in a row of a database table".
=> In other words: The PostgreSQL JDBC team should take action on
this
but didn't until now, right?There is another issue about "PostgreSQL and BLOBs" [4]. First it
cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too in
a
'deadlock' concluding "...the PostgreSQLDialect (as of 3.5.5)
needs
to change not to use MaterializedBlobType until the Postgres
(JDBC)
team changes their driver (which does not seem to have happened in
the
last 6 years)."
=> Any solutions or comments form insiders on this?Yours, Stefan
[1] http://jdbc.postgresql.org/documentation/head/binary-data.html
[2] http://www.postgresql.org/docs/current/interactive/lo.html
[3] https://hibernate.onjira.com/browse/HHH-2244
[4] http://relation.to/Bloggers/PostgreSQLAndBLOBsThis is common approach for PostgreSQL and some proxy of real life
Blobs.
One time I submitted bug about this to Hibernate. But step by step.1. BLOBs are... Large Objects, they are stored as reference because
those
objects are large, if you will store this objects as bytea then
select *
will return all large data. It may not be comfortable not only to
download
few GB of data, but to keep this on stack too. From your perspective
it
doesn't matters because you put it in byte[]. But if You will keep
e.g.
CD-ROM images then it's much more better to use streaming approach
then
bytea[]. More over due to some security JDBC driver will at least
double
memory consumed by bytea.2. Specifying hibernate data type as bytea do not resolve problems
because
it will still use LOB approach.3. pg_largeobjects is system table, hibernate do not creates it.
4. Trigger based approach is good for this, but You need to do this
mannualy5. If you want to use bytea use
@Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think
you
should remove @Lob too) on your field.Regards,
Radosław Smogura
2012/1/8 Radosław Smogura <rsmogura@softperience.eu> wrote:
Not quite, PostgreSQL doesn't have LOB, nor OID type that is only reference> to LOB. In fact, BLOB behaviour in JDBC> is just thin wrapper for this what is missing in PostgreSQL - BLOBs.
I can't follow: PostgreSQL has bytea and LO. I'm ok with bytea too but
I'd like to get the job done with LO too.
JDBC offers both:
http://jdbc.postgresql.org/documentation/head/binary-data.html
I tried to summarize the characteristics here:
http://www.gis.hsr.ch/wiki/PostgreSQL_-_Binary_Large_Objects
There's an important principle that the code which "allocates"
resources is also responsible to release it.
AFAIK in this case it's JDBC which choses to use LO (which creates
pg_largeobjects entries) and it's therefore also JDBC which has to
clean up.
Yours, Stefan
P.S. Just in order get some more insight I also tried to forward this
to the JDBC list (which currently seems to have problems accepting new
subscriptions).
2012/1/8 Radosław Smogura <rsmogura@softperience.eu>:
Show quoted text
On Sun, 8 Jan 2012 21:57:37 +0100, Stefan Keller wrote:
Thanks, Radosław, for the clarification.
2012/1/8 Radosław Smogura <rsmogura@softperience.eu> wrote:
3. pg_largeobjects is system table, hibernate do not creates it.
4. Trigger based approach is good for this, but You need to do this
mannualyThat's exactly my point:
Since JDBC manages creation of OID and pg_largeobjects it's also JDBC
which is responsible for update/delete of rows in pg_largeobjects.
As the user expects, the only thing JDBC would have to do is to issue
lo_unlink() when rows are updated or deleted.
Thus, it's currently a bug in the JDBC driver. And whatever the
solution is, it needs to be mentioned in the JDBC docs.Not quite, PostgreSQL doesn't have LOB, nor OID type that is only reference
to LOB. In fact, BLOB behaviour in JDBC
is just thin wrapper for this what is missing in PostgreSQL - BLOBs. It was
build form available parts. In addition OID type may be used and it's used
as the system id or may be used as just some kind of row id - all types,
tables, sequences etc has OID. You may create table with "WITH OIDS" clause.
You may use OID data type just as replacement for (unsigned) int, so JDBC
can create LOB but it can't decide if given field in row is reference to LOB
or e.g. table and what with statements "DELETE WHERE date > "?In fact JDBC driver is so "stupid" that if you will call getBytes or getBlob
on any column with Oid it will ask for LOB.Hibernate knows this and it does what is best - calls standard BLOB
interface, and creates table with Oid column.And here again round trip, in case of deletion only Hibernate may delete
given LOB because only Hibernate and You knows that any value in Oid column
will reflect LOB - JDBC driver doesn't "knows " this, but... oids may be
shared because those are only numbers, not all tables may be covered by
Hibernate, and assuming huge imagination, someone may encode OID by adding
1, may store it as long, etc. I know it's quite popular that DB schema comes
from entities, but not always You have such flexibility and sometimes You
create Entities for particular schema.So, as You see only this trigger approach is currently (universally) the
best way. If this is that module I think about it's just trigger which calls
unlink on replace or deletion - in fact You may write own without any
problems. Those are few lines only.5. If you want to use bytea use
@Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think you
should remove @Lob too) on your field.In fact, this annotation syntax usage looks like the Hibernate mapping
for PostgreSQL could be enhanced.You have right, but Hibernate team will be in need to add auto deletion for
bulk updates, too. PostgreSQL isn't still so popular it's worth of it (and
they have no so much developers bug report may stand for months without any
comment). Look how many peoples ask for Lobs. It means no one needs true LOB
- true LOB large object stored outside table/row data space, to allow out of
statements operation like streaming, partial updates etc. This is my
definition of LOB, because this is idea of LOB. I think only DB2 stores LOBs
in row, PostgreSQL do not make this but adverts this bytea ugly approach.Personally, I create app which captures images from WebCam (like You) - just
frames not movies. From above reason I wanted to move to bytea, but due to
changes (wired instability and leak of backward compatibility) I still have
Oid. Because I have only two tables for Lobs I have garbage collection
simple script.Best regards
RadekYours, Stefan
2012/1/8 Radosław Smogura <rsmogura@softperience.eu>:
On Sun, 8 Jan 2012 18:08:09 +0100, Stefan Keller wrote:
I'd like to backup my statement below regarding in JDBC driver from
PostgreSQL:When storing fields of type BLOB it inserts the binary string in
system table pg_largeobject (via user table). But when rows in user
table get updated or deleted it does not update nor delete
corresponding rows in table pg_largeobject.That's really a bug!
Fortunately there's a solution indicated in the official docs
(http://www.postgresql.org/docs/9.1/static/lo.html)
To me, something like this should be implemented before hand in JDBC
driver.
And in any case there should be a bold note about this in the JDBC
docs (http://jdbc.postgresql.org/documentation/head/binary-data.html )Yours, Stefan
2012/1/6 Stefan Keller <sfkeller@gmail.com>:
Hi,
I run into a nasty behavior of current PostgreSQL JDBC.
I maintain images (from Webcams). In the Java and Hibernate (JPA) code
I specified a @Lob annotation on class MyData and a attribte/data type
"byte[] mydata;". Hibernate then generates two tables in PostgreSQL,
one called MyData with a column mydata of type oid and an internal one
called pg_largobjects (which contain foreign keys to the oid). That's
also explained in the JDBC docs [1], saying "PostgreSQL provides two
distinct ways to store binary data. Binary data can be stored in a
table using the data type bytea or by using the Large Object feature
which stores the binary data in a separate table in a special format
and refers to that table by storing a value of type oid in your
table."Now, when replacing the images (few hundred) with new ones using Java,
pg_largeobjects grew constantly until the file system run out of
space. So old image data did'nt get released! This is to me a bug
because the user/programmer must (and should) assume that there is a
strict 1:1 relationship between generated table MyData and its LOB
column data (stored in pg_largeobjects).
=> I finally found the supplied module 'lo' [2] which releases
detached records. Is this the recommended way to resolve this problem?Searching for explanations I found a ticket HHH2244 [3] which was
closed by the Hibernate team without action referring to the JDBC
Spec. which says: "An SQL BLOB is a built-in type that stores a Binary
Large Object as a column value in a row of a database table".
=> In other words: The PostgreSQL JDBC team should take action on this
but didn't until now, right?There is another issue about "PostgreSQL and BLOBs" [4]. First it
cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too in a
'deadlock' concluding "...the PostgreSQLDialect (as of 3.5.5) needs
to change not to use MaterializedBlobType until the Postgres (JDBC)
team changes their driver (which does not seem to have happened in the
last 6 years)."
=> Any solutions or comments form insiders on this?Yours, Stefan
[1] http://jdbc.postgresql.org/documentation/head/binary-data.html
[2] http://www.postgresql.org/docs/current/interactive/lo.html
[3] https://hibernate.onjira.com/browse/HHH-2244
[4] http://relation.to/Bloggers/PostgreSQLAndBLOBsThis is common approach for PostgreSQL and some proxy of real life Blobs.
One time I submitted bug about this to Hibernate. But step by step.1. BLOBs are... Large Objects, they are stored as reference because those
objects are large, if you will store this objects as bytea then select *
will return all large data. It may not be comfortable not only to
download
few GB of data, but to keep this on stack too. From your perspective it
doesn't matters because you put it in byte[]. But if You will keep e.g.
CD-ROM images then it's much more better to use streaming approach then
bytea[]. More over due to some security JDBC driver will at least double
memory consumed by bytea.2. Specifying hibernate data type as bytea do not resolve problems
because
it will still use LOB approach.3. pg_largeobjects is system table, hibernate do not creates it.
4. Trigger based approach is good for this, but You need to do this
mannualy5. If you want to use bytea use
@Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think you
should remove @Lob too) on your field.Regards,
Radosław Smogura
On 9 January 2012 12:40, Stefan Keller <sfkeller@gmail.com> wrote:
There's an important principle that the code which "allocates"
resources is also responsible to release it.
That's one resource allocation model, yes. The other common model is
that resources are freed when there are no remaining references to
them, i.e. a GC model.
AFAIK in this case it's JDBC which choses to use LO (which creates
pg_largeobjects entries) and it's therefore also JDBC which has to
clean up.
If the application calls LargeObjectManager.create() then it's also
responsible for eventually calling LargeObjectManager.unlink().
If you're using JDBC's Blob API, there's no API there to tell the
driver to actually delete the underlying data (there is Blob.free(),
but that appears to just be about freeing local resources, not the
underlying storage).
As a LO is independent storage that might have multiple references to
it (the OID might be stored in many places), without explicit deletion
you need a GC mechanism to collect unreferenced LOs eventually -
that's what vacuumlo etc are doing.
What do you suggest that the driver does differently here? (Perhaps we
could do something like interpret Blob.truncate(0) as "delete the blob
right now" - but is that what Hibernate actually does?)
(Much of this is the whole LO vs. bytea argument all over again. If
you want to store data with a lifetime that's the same as the row it's
embedded in, then bytea is a much better mapping)
Oliver
2012/1/9 Oliver Jowett <oliver@opencloud.com>:
As a LO is independent storage that might have multiple references to> it (the OID might be stored in many places), without explicit deletion> you need a GC mechanism to collect unreferenced LOs eventually -> that's what vacuumlo etc are doing.
I can follow that. But that's not what the JDBC user expects nor is it
explained (nor mentioned) in the JDBC docs.
From a conceptual view I have just an entity MyWebcam with an
attribute called image. Attribute image is of attribute cardinality
1:1 (and private):
// Java using Hibernate/JPA:
@Entity
@Lob
@Basic(fetch=FetchType.LAZY)
public class MyWebcam {
private byte[] image;
private String name;
public byte[] getImage() { return image; }
public void setImage(byte[] _image) { image=_image; }
// ... other stuff
}
That's the classic use case.
Isn't it obvious that if setImage() sets another byte[] that the image
space get's cleared by the layers below?
And since Hibernate chose to use one variant of JDBC, it's also JDBC
which has to take care about orphans.
Yours, Stefan
2012/1/9 Oliver Jowett <oliver@opencloud.com>:
Show quoted text
On 9 January 2012 12:40, Stefan Keller <sfkeller@gmail.com> wrote:
There's an important principle that the code which "allocates"
resources is also responsible to release it.That's one resource allocation model, yes. The other common model is
that resources are freed when there are no remaining references to
them, i.e. a GC model.AFAIK in this case it's JDBC which choses to use LO (which creates
pg_largeobjects entries) and it's therefore also JDBC which has to
clean up.If the application calls LargeObjectManager.create() then it's also
responsible for eventually calling LargeObjectManager.unlink().If you're using JDBC's Blob API, there's no API there to tell the
driver to actually delete the underlying data (there is Blob.free(),
but that appears to just be about freeing local resources, not the
underlying storage).
As a LO is independent storage that might have multiple references to
it (the OID might be stored in many places), without explicit deletion
you need a GC mechanism to collect unreferenced LOs eventually -
that's what vacuumlo etc are doing.What do you suggest that the driver does differently here? (Perhaps we
could do something like interpret Blob.truncate(0) as "delete the blob
right now" - but is that what Hibernate actually does?)(Much of this is the whole LO vs. bytea argument all over again. If
you want to store data with a lifetime that's the same as the row it's
embedded in, then bytea is a much better mapping)Oliver
On 9 January 2012 14:29, Stefan Keller <sfkeller@gmail.com> wrote:
2012/1/9 Oliver Jowett <oliver@opencloud.com>:
As a LO is independent storage that might have multiple references to> it (the OID might be stored in many places), without explicit deletion> you need a GC mechanism to collect unreferenced LOs eventually -> that's what vacuumlo etc are doing.
I can follow that. But that's not what the JDBC user expects nor is it
explained (nor mentioned) in the JDBC docs.From a conceptual view I have just an entity MyWebcam with an
attribute called image. Attribute image is of attribute cardinality
1:1 (and private):// Java using Hibernate/JPA:
@Entity
@Lob
@Basic(fetch=FetchType.LAZY)
public class MyWebcam {
private byte[] image;
private String name;
public byte[] getImage() { return image; }
public void setImage(byte[] _image) { image=_image; }
// ... other stuff
}That's the classic use case.
Isn't it obvious that if setImage() sets another byte[] that the image
space get's cleared by the layers below?
And since Hibernate chose to use one variant of JDBC, it's also JDBC
which has to take care about orphans.
Well, either the Hibernate mapping is misconfigured, or your database
is misconfigured i.e. you are not collecting garbage LOs. If you have
a suitable GC mechanism configured, then what happens?
Otherwise, what should JDBC do differently here? Be specific. It would
be helpful if you could provide a native JDBC example, rather than a
Hibernate example, since it's not clear what JDBC calls are being made
by Hibernate.
Oliver
2012/1/9 Oliver Jowett <oliver@opencloud.com>:
Otherwise, what should JDBC do differently here? Be specific. It would
First, I pretty sure that Hibernate nor the Tomcat/Java GC are
misconfigured - since it works now after having installed the trigger
by hand.
To become more specific read the first two sections as a first hint
here in this official doc:
http://www.postgresql.org/docs/current/interactive/lo.html
I try to trace the JDBC calls coming from Hibernate (although the
problem seems to me pretty clearly located).
That investigation will take some time.
Yours, Stefan
2012/1/9 Oliver Jowett <oliver@opencloud.com>:
Show quoted text
On 9 January 2012 14:29, Stefan Keller <sfkeller@gmail.com> wrote:
2012/1/9 Oliver Jowett <oliver@opencloud.com>:
As a LO is independent storage that might have multiple references to> it (the OID might be stored in many places), without explicit deletion> you need a GC mechanism to collect unreferenced LOs eventually -> that's what vacuumlo etc are doing.
I can follow that. But that's not what the JDBC user expects nor is it
explained (nor mentioned) in the JDBC docs.From a conceptual view I have just an entity MyWebcam with an
attribute called image. Attribute image is of attribute cardinality
1:1 (and private):// Java using Hibernate/JPA:
@Entity
@Lob
@Basic(fetch=FetchType.LAZY)
public class MyWebcam {
private byte[] image;
private String name;
public byte[] getImage() { return image; }
public void setImage(byte[] _image) { image=_image; }
// ... other stuff
}That's the classic use case.
Isn't it obvious that if setImage() sets another byte[] that the image
space get's cleared by the layers below?
And since Hibernate chose to use one variant of JDBC, it's also JDBC
which has to take care about orphans.Well, either the Hibernate mapping is misconfigured, or your database
is misconfigured i.e. you are not collecting garbage LOs. If you have
a suitable GC mechanism configured, then what happens?Otherwise, what should JDBC do differently here? Be specific. It would
be helpful if you could provide a native JDBC example, rather than a
Hibernate example, since it's not clear what JDBC calls are being made
by Hibernate.Oliver
On 10 January 2012 00:06, Stefan Keller <sfkeller@gmail.com> wrote:
2012/1/9 Oliver Jowett <oliver@opencloud.com>:
Otherwise, what should JDBC do differently here? Be specific. It would
First, I pretty sure that Hibernate nor the Tomcat/Java GC are
misconfigured - since it works now after having installed the trigger
by hand.
You misunderstand - by GC I mean the process that collects garbage LOs
that are no longer referenced. I don't mean the JVM's heap GC.
You need a GC process like this if you are using LOs and not managing
their lifetimes explicitly from the application. Consider it part of
the necessary DB setup. You've already discovered the usual mechanisms
for it ('lo' or 'vacuumlo' depending on exactly what your data model
looks like).
To become more specific read the first two sections as a first hint
here in this official doc:
http://www.postgresql.org/docs/current/interactive/lo.html
FWIW, that documentation is pretty old (the JDBC docs now live
separately on jdbc.postgresql.org; the JDBC references in that
appendix are mostly historical)
But I'm not sure quite what you're referring to - those docs are
fairly clear about what you need to do? Specifically:
Now this is fine for PostgreSQL-specific applications, but standard code using JDBC or ODBC won't delete the objects, resulting in orphan objects — objects that are not referenced by anything, and simply occupy disk space.
Which is exactly my point - if you are going to use generic JDBC code
that does not explicitly delete LOs when they become detached, then
you need a separate mechanism to clean them up - that's just the way
the model works. If you want to avoid that, don't use LOs, use bytea.
So I'm still confused about what you'd like to see changed in the JDBC
driver. Can you explain?
Oliver
On 10 January 2012 00:29, Oliver Jowett <oliver@opencloud.com> wrote:
So I'm still confused about what you'd like to see changed in the JDBC
driver. Can you explain?
Perhaps what you're looking for here is "it all just works out of the
box". In that case, the missing piece seems to be that the DDL that
Hibernate emits (or the DDL which you have set up by hand - I don't
know how you have things set up) does not match the data model that
Hibernate is expecting. If Hibernate is expecting a data model where
the lifecycle of the LO is managed by the database and there's only at
most one reference to a particular LO (i.e. you can't link to the same
LO from multiple places), then it can set up appropriate cleanup
triggers on LO columns as part of its DDL automatically. Or it could
just map blobs to bytea, which might be a more natural mapping in that
case anyway and doesn't require triggers etc. So you'd need to talk to
the Hibernate guys about making those changes.
Oliver
On Mon, 9 Jan 2012 00:40:08 +0100, Stefan Keller wrote:
2012/1/8 Radosław Smogura <rsmogura@softperience.eu> wrote:
Not quite, PostgreSQL doesn't have LOB, nor OID type that is only
reference> to LOB. In fact, BLOB behaviour in JDBC> is just thin
wrapper for this what is missing in PostgreSQL - BLOBs.I can't follow: PostgreSQL has bytea and LO. I'm ok with bytea too
but
I'd like to get the job done with LO too.
JDBC offers both:
http://jdbc.postgresql.org/documentation/head/binary-data.html
I tried to summarize the characteristics here:
http://www.gis.hsr.ch/wiki/PostgreSQL_-_Binary_Large_ObjectsThere's an important principle that the code which "allocates"
resources is also responsible to release it.
AFAIK in this case it's JDBC which choses to use LO (which creates
pg_largeobjects entries) and it's therefore also JDBC which has to
clean up.
Hmm we have really different point of view. JDBC driver is just
interface which translates JDBC "commands" to database commands. JDBC
driver is not database. So responsible for this is database, but
database doesn't have "real" BLOBs, this what is made in PG JDBC driver
is just "not perfect" way to add this functionality to PostgreSQL. To
make complete system You need PostgreSQL, JDBC driver, and this trigger.
From reasons I mentioned before it's quite hard for JDBC driver to
implement deallocation, because driver will be in need to ask and to
"know" for OIDs to unlink. This could made from driver database which
need to parse queries, manages db structure etc, and this will be
executed twice in driver and in database - this is ineffective. Just for
example driver may support this for "DELETE FROM", but what will be if
deletion will be result of calling stored procedure? What if stored
procedure will be created in language different then pgSQL (driver still
may try to download it), but if this will be C, or any other? JDBC
driver must be universal and fully "compatible" with PG functionality,
as it may be used only as "helper" for presentation layer, true
processing logic may be somewhere e. g. legacy system for which we add
web interface.
Because of this JDBC driver can't expose desired behavior, even if
everyone agrees that current behavior is unperfect.
Only PG server may do this, or Hibernate for example by new annotation
@MangedPgLob.
Regards,
Radek
Show quoted text
Yours, Stefan
P.S. Just in order get some more insight I also tried to forward this
to the JDBC list (which currently seems to have problems accepting
new
subscriptions).2012/1/8 Radosław Smogura <rsmogura@softperience.eu>:
On Sun, 8 Jan 2012 21:57:37 +0100, Stefan Keller wrote:
Thanks, Radosław, for the clarification.
2012/1/8 Radosław Smogura <rsmogura@softperience.eu> wrote:
3. pg_largeobjects is system table, hibernate do not creates it.
4. Trigger based approach is good for this, but You need to do
this
mannualyThat's exactly my point:
Since JDBC manages creation of OID and pg_largeobjects it's also
JDBC
which is responsible for update/delete of rows in pg_largeobjects.
As the user expects, the only thing JDBC would have to do is to
issue
lo_unlink() when rows are updated or deleted.
Thus, it's currently a bug in the JDBC driver. And whatever the
solution is, it needs to be mentioned in the JDBC docs.Not quite, PostgreSQL doesn't have LOB, nor OID type that is only
reference
to LOB. In fact, BLOB behaviour in JDBC
is just thin wrapper for this what is missing in PostgreSQL - BLOBs.
It was
build form available parts. In addition OID type may be used and
it's used
as the system id or may be used as just some kind of row id - all
types,
tables, sequences etc has OID. You may create table with "WITH OIDS"
clause.
You may use OID data type just as replacement for (unsigned) int, so
JDBC
can create LOB but it can't decide if given field in row is
reference to LOB
or e.g. table and what with statements "DELETE WHERE date > "?In fact JDBC driver is so "stupid" that if you will call getBytes or
getBlob
on any column with Oid it will ask for LOB.Hibernate knows this and it does what is best - calls standard BLOB
interface, and creates table with Oid column.And here again round trip, in case of deletion only Hibernate may
delete
given LOB because only Hibernate and You knows that any value in Oid
column
will reflect LOB - JDBC driver doesn't "knows " this, but... oids
may be
shared because those are only numbers, not all tables may be covered
by
Hibernate, and assuming huge imagination, someone may encode OID by
adding
1, may store it as long, etc. I know it's quite popular that DB
schema comes
from entities, but not always You have such flexibility and
sometimes You
create Entities for particular schema.So, as You see only this trigger approach is currently (universally)
the
best way. If this is that module I think about it's just trigger
which calls
unlink on replace or deletion - in fact You may write own without
any
problems. Those are few lines only.5. If you want to use bytea use
@Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I
think you
should remove @Lob too) on your field.In fact, this annotation syntax usage looks like the Hibernate
mapping
for PostgreSQL could be enhanced.You have right, but Hibernate team will be in need to add auto
deletion for
bulk updates, too. PostgreSQL isn't still so popular it's worth of
it (and
they have no so much developers bug report may stand for months
without any
comment). Look how many peoples ask for Lobs. It means no one needs
true LOB
- true LOB large object stored outside table/row data space, to
allow out of
statements operation like streaming, partial updates etc. This is my
definition of LOB, because this is idea of LOB. I think only DB2
stores LOBs
in row, PostgreSQL do not make this but adverts this bytea ugly
approach.Personally, I create app which captures images from WebCam (like
You) - just
frames not movies. From above reason I wanted to move to bytea, but
due to
changes (wired instability and leak of backward compatibility) I
still have
Oid. Because I have only two tables for Lobs I have garbage
collection
simple script.Best regards
RadekYours, Stefan
2012/1/8 Radosław Smogura <rsmogura@softperience.eu>:
On Sun, 8 Jan 2012 18:08:09 +0100, Stefan Keller wrote:
I'd like to backup my statement below regarding in JDBC driver
from
PostgreSQL:When storing fields of type BLOB it inserts the binary string in
system table pg_largeobject (via user table). But when rows in
user
table get updated or deleted it does not update nor delete
corresponding rows in table pg_largeobject.That's really a bug!
Fortunately there's a solution indicated in the official docs
(http://www.postgresql.org/docs/9.1/static/lo.html)
To me, something like this should be implemented before hand in
JDBC
driver.
And in any case there should be a bold note about this in the
JDBC
docs
(http://jdbc.postgresql.org/documentation/head/binary-data.html )Yours, Stefan
2012/1/6 Stefan Keller <sfkeller@gmail.com>:
Hi,
I run into a nasty behavior of current PostgreSQL JDBC.
I maintain images (from Webcams). In the Java and Hibernate
(JPA) code
I specified a @Lob annotation on class MyData and a
attribte/data type
"byte[] mydata;". Hibernate then generates two tables in
PostgreSQL,
one called MyData with a column mydata of type oid and an
internal one
called pg_largobjects (which contain foreign keys to the oid).
That's
also explained in the JDBC docs [1], saying "PostgreSQL provides
two
distinct ways to store binary data. Binary data can be stored in
a
table using the data type bytea or by using the Large Object
feature
which stores the binary data in a separate table in a special
format
and refers to that table by storing a value of type oid in your
table."Now, when replacing the images (few hundred) with new ones using
Java,
pg_largeobjects grew constantly until the file system run out of
space. So old image data did'nt get released! This is to me a
bug
because the user/programmer must (and should) assume that there
is a
strict 1:1 relationship between generated table MyData and its
LOB
column data (stored in pg_largeobjects).
=> I finally found the supplied module 'lo' [2] which releases
detached records. Is this the recommended way to resolve this
problem?Searching for explanations I found a ticket HHH2244 [3] which
was
closed by the Hibernate team without action referring to the
JDBC
Spec. which says: "An SQL BLOB is a built-in type that stores a
Binary
Large Object as a column value in a row of a database table".
=> In other words: The PostgreSQL JDBC team should take action
on this
but didn't until now, right?There is another issue about "PostgreSQL and BLOBs" [4]. First
it
cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too
in a
'deadlock' concluding "...the PostgreSQLDialect (as of 3.5.5)
needs
to change not to use MaterializedBlobType until the Postgres
(JDBC)
team changes their driver (which does not seem to have happened
in the
last 6 years)."
=> Any solutions or comments form insiders on this?Yours, Stefan
[1]
http://jdbc.postgresql.org/documentation/head/binary-data.html
[2] http://www.postgresql.org/docs/current/interactive/lo.html
[3] https://hibernate.onjira.com/browse/HHH-2244
[4] http://relation.to/Bloggers/PostgreSQLAndBLOBsThis is common approach for PostgreSQL and some proxy of real life
Blobs.
One time I submitted bug about this to Hibernate. But step by
step.1. BLOBs are... Large Objects, they are stored as reference
because those
objects are large, if you will store this objects as bytea then
select *
will return all large data. It may not be comfortable not only to
download
few GB of data, but to keep this on stack too. From your
perspective it
doesn't matters because you put it in byte[]. But if You will keep
e.g.
CD-ROM images then it's much more better to use streaming approach
then
bytea[]. More over due to some security JDBC driver will at least
double
memory consumed by bytea.2. Specifying hibernate data type as bytea do not resolve problems
because
it will still use LOB approach.3. pg_largeobjects is system table, hibernate do not creates it.
4. Trigger based approach is good for this, but You need to do
this
mannualy5. If you want to use bytea use
@Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I
think you
should remove @Lob too) on your field.Regards,
Radosław Smogura
On 9 January 2012 14:55, Radosław Smogura <rsmogura@softperience.eu> wrote:
So responsible for this is database, but database doesn't have
"real" BLOBs, this what is made in PG JDBC driver is just "not perfect" way
to add this functionality to PostgreSQL.
I think you should elaborate on what you mean when you say that
Postgres doesn't have "real" BLOBs.
This discussion did make me wonder about something in Postgres'
LOB-support though. As explained earlier, the current implementation
allows for dedubbing LOB's, so that it's not necessary to store the
same large(!) object multiple times. That is also what's causing this
issue with the JDBC driver, or perhaps Hybernate in particular.
However, shouldn't it be up to the application designer to dedup large
objects or not?
The current implementation is probably rather convenient for projects
where duplicate large objects are common, but - as it turns out - it
can be quite inconvenient when that's opposite to expectations.
ISTMT this behaviour should at least be optional. Of course that
raises the question what should happen with an existing set of LOB's
when that setting gets changed.
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.