BLOB / CLOB support in PostgreSQL
Hi,
According to Pgjdbc GitHub statistics, the most popular page is
https://github.com/pgjdbc/pgjdbc/issues/1102, the
"org.postgresql.jdbc.PgConnection.createClob() is not yet implemented"
issue (1600 visits from 1400 unique visitors per fortnight).
There are workarounds to silence the error, however I'm sure CLOB (as in
"streaming text datatype") is not yet supported in the PostgreSQL backend.
I have browsed the pgsql-hackers mailing list regarding CLOB, and it looks
like there's no relevant discussion, so I'm quite sure I've done my
homework on "check prior mails regarding the subject".
**Issue**: there's no sensible way to map java.sql.Clob to the existing
backend datatypes. `text` can't stream (it always loads the data fully),
and it is limited to 1GB.
Java distinguishes java.sql.Blob and java.sql.Clob.
Blob is a "binary stream with streaming features". It can be mapped to the
existing "Large Objects", and the existing Large Object API somewhat suits
the implementation.
There are glitches (like "pgjdbc has to perform 4 API calls,
tell/seek/tell/seek, just to get the LO length once"), however that is
fine.
The Java Clob API is just a dozen methods (13, to be exact), however there
are two major issues with it:
1) A "Large Object" is just a binary object. There's no way to tell
whether the contents are a UTF-8 string, a Windows-1251 string, a
protobuf-encoded message, or something else.
That is, if pgjdbc encodes a java.sql.Clob (a large string) into some
binary form (e.g. UTF-8) and stores it as a PostgreSQL Large Object, then
this LO automatically becomes a "pgjdbc-specific blob".
There's no way to use the data in SQL or pl/pgsql or other applications.
For instance, one can't perform " where clob_column like '%abcd%' "
2) "characters". For instance, `long length()` should return the number of
characters in the string.
If pgjdbc implements java.sql.Clob as a UTF-8 encoded binary, then it would
have to **process the whole blob** in order to measure string length.
The same goes for `String getSubString(long pos, int length)`. It would
have to process all the bytes up to character `pos` (how otherwise would
it know the byte position for character `pos`?).
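The scan described above can be sketched in a few lines (a hypothetical
illustration, not pgjdbc code; the class and method names are made up, and
"character" is counted here as a Unicode code point):

```java
import java.nio.charset.StandardCharsets;

// Sketch: finding the byte offset of character index `pos` in UTF-8
// data requires examining every preceding byte, because each character
// occupies 1 to 4 bytes.
public class Utf8Scan {
    // Returns the byte offset at which character index `pos` starts.
    static long byteOffsetOf(byte[] utf8, long pos) {
        long byteOff = 0;
        for (long charOff = 0; charOff < pos; charOff++) {
            int lead = utf8[(int) byteOff] & 0xFF;
            // The leading byte encodes the width of the whole sequence.
            if (lead < 0x80) byteOff += 1;       // ASCII
            else if (lead < 0xE0) byteOff += 2;  // U+0080..U+07FF
            else if (lead < 0xF0) byteOff += 3;  // U+0800..U+FFFF
            else byteOff += 4;                   // U+10000..U+10FFFF
        }
        return byteOff;
    }

    public static void main(String[] args) {
        byte[] utf8 = "naïve".getBytes(StandardCharsets.UTF_8);
        // 'ï' occupies two bytes, so character 3 ('v') starts at byte 4.
        System.out.println(Utf8Scan.byteOffsetOf(utf8, 3)); // prints 4
    }
}
```

The lookup is O(pos), which is exactly why getSubString over a plain
UTF-8-encoded LO cannot be cheap.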
Currently pgjdbc encodes strings using client_encoding and stores them as
LO, and it has been like that for ages. Apparently that might easily
produce garbage in the DB if clients use various encodings; however,
pgjdbc's default setting is UTF-8, so the problem should not be that
visible.
I fully understand LO has issues with "removing obsolete entries", however
mapping java.sql.Clob to `text` seems to make less sense.
For instance: suppose pgjdbc chooses "Clob == text". Then a client hits
the 1GB limit.
"Streaming TOAST data" looks more like a research project rather than a
clear thing to implement.
What if there was a standard of storing strings in Large Objects?
For instance: "CLOB is a UTF-8 encoded string stored as a single LO". When
such an agreement exists, various applications could read and write the
data.
Of course, UTF-8 might not suit everybody, so a format might be "prefix
that specifies encoding, then encoded string".
Of course both variations above fail to support streaming (as in "need to
process all the contents in order to get the last character"), so it might
be better to use
"prefix that specifies encoding + 'index block' (that specifies offsets for
each 1M characters) + encoded string".
I'm sure there are known algorithms to store strings in binary format that
support subsequence / overwrite / length in reasonable time (O(1) or O(N)
with reasonable constant).
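As a sketch of such an "index block" layout (all names and the checkpoint
interval are my own, made up for illustration): recording the byte offset
of every K-th character bounds every lookup to scanning at most K
characters from the nearest checkpoint:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of a hypothetical "index block": byte offsets of every K-th
// character are stored alongside the UTF-8 payload, so random access
// only scans at most K characters instead of the whole value.
public class ClobIndex {
    final byte[] utf8;
    final int k;                                      // checkpoint interval, in characters
    final List<Long> checkpoints = new ArrayList<>(); // byte offset of character i*k

    ClobIndex(String s, int k) {
        this.utf8 = s.getBytes(StandardCharsets.UTF_8);
        this.k = k;
        long byteOff = 0, charOff = 0;
        checkpoints.add(0L);
        while (byteOff < utf8.length) {
            byteOff += width(utf8[(int) byteOff]);
            charOff++;
            if (charOff % k == 0) checkpoints.add(byteOff);
        }
    }

    // Width in bytes of the UTF-8 sequence starting with this lead byte.
    static int width(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;
        if (b < 0xE0) return 2;
        if (b < 0xF0) return 3;
        return 4;
    }

    // Byte offset of character `pos`: jump to the nearest checkpoint,
    // then scan at most k-1 characters.
    long byteOffsetOf(long pos) {
        long byteOff = checkpoints.get((int) (pos / k));
        for (long c = pos / k * k; c < pos; c++) {
            byteOff += width(utf8[(int) byteOff]);
        }
        return byteOff;
    }

    public static void main(String[] args) {
        // Four 2-byte Greek letters followed by ASCII.
        ClobIndex idx = new ClobIndex("ααααabcd", 4);
        System.out.println(idx.byteOffsetOf(6)); // prints 10
    }
}
```

With 1M-character checkpoints as proposed above, the index adds only a few
bytes per megabyte of text while keeping lookups bounded.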
There might be an option to use UTF-16 (so each "character" becomes
exactly 2 bytes), however it would come at the cost of space usage.
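For what it's worth, Java's Clob positions are expressed in `char` values
(UTF-16 code units), so a UTF-16 layout would turn position-to-byte-offset
into a plain multiplication; the subtlety is that supplementary characters
still occupy two such units (a standalone illustration):

```java
// Java strings count UTF-16 code units, not code points: a BMP string
// has length == number of characters, but an emoji (a surrogate pair)
// counts as two units even though it is one code point.
public class Utf16Units {
    public static void main(String[] args) {
        String bmp = "hello";          // 5 UTF-16 units
        String emoji = "\uD83D\uDE00"; // one code point, two UTF-16 units
        System.out.println(bmp.length());               // 5
        System.out.println(emoji.length());             // 2
        System.out.println(emoji.codePointCount(0, 2)); // 1
    }
}
```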
**Here goes the question**: do you think such an implementation ("large
string stored in Large Objects") could be merged into the core eventually?
Q2: any ideas/existing libraries for random access read-write large strings
stored as binary?
PS. Relevant pgjdbc PR is https://github.com/pgjdbc/pgjdbc/pull/1272
--
Regards,
Vladimir Sitnikov
Let me please bump the thread.
Just in case, I'm PgJDBC committer.
PgJDBC receives requests to "support CLOB" from time to time; however, I
believe it is impossible without support from the database.
To my best knowledge, the database does not have APIs for "streaming large
text data".
The only "streaming large binary data" API I know is LargeObject which
seems to be old-fashioned.
I believe Java is not the only client that wants streaming access for
binary and text data.
Here's a recent pull request to PgJDBC,
https://github.com/pgjdbc/pgjdbc/pull/1892, where Andrew suggests adding
BLOB/CLOB support via bytea/text,
and apparently, Andrew is surprised that the database lacks BLOB/CLOB
support.
Any ideas on the way to proceed here?
I don't think it is right to implement Clob via text, especially in case
the database provides its own "large text with streaming" datatype in the
future.
The concerns to avoid "Clob maps to text" could be:
a) Once the behavior is implemented, it is hard to change. That is,
applications would rely on it (it would become a de facto standard), and
it would be hard to move to a proper "text with streaming API" datatype.
b) If we make «clob is text», then people might start using
update/substring APIs (which are the primary motivation for Clob) without
realizing there's a full-value update behind the scenes. Currently, they
can use setString/getString for text, and it is crystal clear that the
text is updated fully on every update.
Vladimir
On 9/28/20 9:46 AM, Vladimir Sitnikov wrote:
[...]
and apparently, Andrew is surprised that the database lacks BLOB/CLOB
support.
I didn't at all say I was surprised. I said I didn't recall seeing a
call for what you're talking about, and indeed there has been very little.
[...]
The concerns to avoid "Clob maps to text" could be: [...]
I think your concerns are seriously misplaced. In the first place, what
I have proposed is an optional behaviour, and it does not at all
preclude other behaviour being made available in future, as well as the
current behaviour which would remain the default. Furthermore, it's
consistent with how the MySQL driver behaves, AIUI, and possibly others
as well. That consistency is a major motivator for the work I've done.
Do you suggest they should rip out their Clob interface until MySQL
supports a streaming data type?
But this is hardly the place for such arguments.
To the best of my knowledge there is no concrete proposal for the type
of data type / interface you suggest. I don't even have any very good
idea what such a thing would look like.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew>To the best of my knowledge there is no concrete proposal for the
Andrew>type of data type / interface you suggest.
To the best of my knowledge, one of the concrete proposals was in the very
first message.
Let me please copy it:
Vladimir> **Here goes the question**: do you think such an implementation
Vladimir> ("large string stored in Large Objects") could be merged into
Vladimir> the core eventually?
---
Andrew>In the first place, what
Andrew>I have proposed is an optional behaviour
Adding optional behaviors makes support complicated, so flags should be
avoided when
a standard solution is possible.
Even if you name the behavior "optional", people would still rely on it.
For instance, you implement Clob.truncate(long) as an in-driver-memory
truncate operation; however, in practice, that API should truncate the
value in the database.
Just in case: the current behavior for LO-based lobs is that
Blob.truncate(long) immediately truncates the LO.
So if an application works with the current LO mode, it would get a
behavior change if it flips the switch.
The same for setString and other modifications.
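The JDK's own fully materialized Clob, javax.sql.rowset.serial.SerialClob,
illustrates the in-memory semantics under discussion: its truncate() only
shortens the client-side copy, and nothing reaches any database (a
standalone illustration, not pgjdbc behavior):

```java
import javax.sql.rowset.serial.SerialClob;

// SerialClob is the JDK's reference in-memory Clob implementation.
// truncate() shortens only the materialized client-side copy.
public class MaterializedClobDemo {
    public static void main(String[] args) throws Exception {
        SerialClob clob = new SerialClob("hello world".toCharArray());
        clob.truncate(5);                            // in-memory only
        System.out.println(clob.length());           // 5
        System.out.println(clob.getSubString(1, 5)); // "hello" (positions are 1-based)
    }
}
```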
An escape hatch could be like "ok, let's throw unimplemented for clob
modifications"; however, it contradicts the spec, which says:
JDBC Spec> * All methods on the <code>Clob</code> interface must be fully
JDBC Spec> * implemented if the JDBC driver supports the data type.
What I am saying here is that the behavior you suggest spans far beyond
just "blob maps to bytea".
----
Andrew>Furthermore, it's
Andrew>consistent with how the MySQL driver behaves, AIUI,
Andrew>and possibly others as well
Oracle DB behaves differently: it has BLOB/CLOB column types, and the
Clob/Blob interfaces map to CLOB/BLOB.
Andrew>That consistency is a major motivator for the work I've done.
PostgreSQL supports large binary via LargeObject API, so the driver maps
Blob to LO.
If people want to use bytea, they can use `setBinaryStream(...)`.
What drives people to use Clob/Blob when the database lacks APIs for it?
Vladimir
On 9/28/20 4:44 PM, Vladimir Sitnikov wrote:
To the best of my knowledge, one of the concrete proposals was in the
very first message. [...]
You and I clearly have a different idea from what constitutes a concrete
proposal. This is hardly the ghost of a proposal.
---
[...]
For instance, you implement Clob.truncate(int) as in-driver-memory
truncate operation, however, in practice, that API should truncate the
value in the database. [...]
Again, the truncate() I implemented is 100% compatible with the MySQL
driver. I just checked the MSSQL driver, and it too appears to just
truncate what's in memory. So maybe we wouldn't be in such bad company.
----
[...]
What does drive people to use Clob/Blob when the database lacks APIs
for it?
The reason for my proposal (as I stated more than once) is that people
want to be able to use the same code across databases.
cheers
andrew.
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew>100% compatible with the MySQL
It is hardly a justification for a feature or for a change request.
Vladimir
On Tue, Sep 29, 2020 at 5:22 AM Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> wrote:
100% compatible with the MySQL
It is hardly a justification for a feature or for a change request.
Vladimir
Glad to see this topic.
The differing opinions on this feature come down to whether we need a
"perfect" solution or an "OK-for-most-use-cases" solution.
As for PG core developers, I'm +1 with "pg has its own serious problems",
and we lack the resources to handle everything well. However, "serious
problems" may mean different things to different people.
As one of the rare experienced Java developers here, it seems to me that
raising a "NotImplemented" error for some unimplemented APIs would not
make the maintenance work hard, since those are probably not commonly used
APIs. A partially supported API should be better than an API that is not
supported at all.
As an Oracle DBA before, I did see users needing CLOB/BLOB at times, but
most of them just want to save and retrieve big values. This case may be
more prominent in Oracle because the maximum length of varchar2 is so low.
When it comes to the JDBC standard:
JDBC Spec> * All methods on the <code>Clob</code> interface must be fully
JDBC Spec> * implemented if the JDBC driver supports the data type.
What would be the sense behind this? Based on my limited experience, it
does not seem reasonable.
To be short, I am supporting Andrew's proposal for now.
--
Best Regards
Andy Fan
On Mon, 28 Sep 2020 at 20:08, Andy Fan <zhihui.fan1213@gmail.com> wrote:
On Tue, Sep 29, 2020 at 5:22 AM Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> wrote:

100% compatible with the MySQL

It is hardly a justification for a feature or for a change request.

Vladimir
I would have to agree. This argument alone has not swayed discussions in
the past.
Glad to see this topic.
The obviously different opinion for this feature is based on if we need a
"perfect" solution or a "OK-to-most-user cases" solution.

As for PG core developers, I'm +1 with "pg have their own serious
problems", and we are lacking the resources to handle everything well.
However, "serious problems" to different people may be different.

As a rare experienced Java developer, looks raise "NotImplemented" error
for some unimplemented APIs will not make the maintenance work hard, that
probably not common used APIs. Not fully supported API should be better
than fully not supported APIs at all.

I don't agree with this statement. Currently the largest source of
problems with CLOB not being implemented is from hibernate which chose to
not follow the spec and for some reason fails if we throw a not implemented
(which is allowed by the spec). If we now implement something which only
partially implements the API I can imagine others failing in other ways.
Honestly I have no idea, but what is the point of the spec if we don't
adhere to it?
As an Oracle DBA before, I do see users need CLOB/BLOB some time but for
most of them, they just want to save/get big stuff. This case in Oracle
may be more outstanding because of the max length of varchar2 is too low.

It is rare to see people writing JDBC now. It is much more likely they
are using JPA or hibernate. Getting around this is rather trivial simply
by using @Type(org.hibernate.type.TextType)
When come to the JDBC standard
JDBC Spec> * All methods on the <code>Clob</code> interface must be fully
JDBC Spec> * implemented if the JDBC driver supports the data type.

What would be the sense behind this? This is not reasonable based on
limited experience.

The sense about this is that others writing code above it expect
everything to be implemented. If they aren't, then things fail.
As mentioned above hibernate fails to check for not implemented and we see
the number of issues that resulted in that. Imagine what happens when we
partially implement the interface. It will be all but useless to them.
I'm not currently convinced the risk/reward ratio is in our favour.
Dave
On 2020-09-28 15:46, Vladimir Sitnikov wrote:
The concerns to avoid "Clob maps to text" could be:
a) Once the behavior is implemented, it is hard to change. That is
applications would rely on it (and it becomes a defacto standard), and
it would be hard to move to the proper "text with streaming API" datatype.
b) If we make «clob is text», then people might start using
update/substring APIs (which is the primary motivation for Clob) without
realizing there’s full value update behind the scenes. Currently, they
can use setString/getString for text, and it is crystal clear that the
text is updated fully on every update.
When we added TOAST, we made the explicit decision to not add a "LONG"
type but instead have the toasting mechanism transparent in all
variable-length types. And that turned out to be a very successful
decision, because it allows this system to be used by all data types,
not only one or two hardcoded ones. Therefore, I'm very strongly of the
opinion that if a streaming system of the sort you allude to were added,
it would also be added transparently into the TOAST system.
The JDBC spec says
"""
An implementation of a Blob, Clob or NClob object may either be locator
based or result in the object being fully materialized on the client.
By default, a JDBC driver should implement the Blob, Clob and NClob
interfaces using the appropriate locator type. An application does not
deal directly with the locator types that are defined in SQL.
"""
(A "locator" in SQL is basically what you might call a streaming handle.)
So yes, this encourages the implementation of locators. But it also
specifies that if you don't have locators, you can implement this using
non-large-object types.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 9/29/20 10:26 AM, Peter Eisentraut wrote:
[...]
So yes, this encourages the implementation of locators. But it also
specifies that if you don't have locators, you can implement this
using non-large-object types.
So if I read this correctly what I have proposed is completely kosher
according to the spec - it's the "fully materialized on the client"
variant, just like the MySQL and MSSQL drivers.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 29 Sep 2020 at 14:33, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
wrote:
[...]
So if I read this correctly what I have proposed is completely kosher
according to the spec - it's the "fully materialized on the client"
variant, just like the MySQL and MSSQL drivers.
I haven't really looked at MySQL or MSSQL but do they implement the full
CLOB API ?
We would need to implement the full API.
BTW, just because it adheres to the spec doesn't seem to hold water in the
PostgreSQL project. Just sayin'
Dave
On 9/29/20 2:39 PM, Dave Cramer wrote:
[...]
I haven't really looked at MySQL or MSSQL but do they implement the
full CLOB API ?
We would need to implement the full API.

BTW, just because it adheres to the spec doesn't seem to hold water in
the PostgreSQL project. Just sayin'
I take your point, but my remark was more in response to the apparent
suggestion that what I submitted was not according to spec.
There are two Clob methods I didn't implement, and one Blob method - the
set*Stream methods. I think they should be implementable, but they will
make the implementation somewhat more complex.
Anyway, at this stage let's take the discussion back to the github forums.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew>You and I clearly have a different idea from what constitutes a
Andrew>concrete proposal. This is hardly the ghost of a proposal.
Can you please clarify what is a proposal from your point of view?
Is it documented?
I think I have read the relevant TODO items:
https://wiki.postgresql.org/wiki/Developer_FAQ#What_do_I_do_after_choosing_an_item_to_work_on.3F
Wiki clearly suggests posting a mail to pgsql-hackers before starting work.
Vladimir
On 9/29/20 3:48 PM, Vladimir Sitnikov wrote:
[...]
Can you please clarify what is a proposal from your point of view?
Is it documented?
A concrete proposal needs to be more than "a feature that does X". It
needs to contain a substantial implementation plan. What structures will
be affected, what APIs, what protocol changes, and so on. You don't need
to have the code for these things but you do need a description of
what's intended by way of implementation that is detailed enough for the
community to critique. If you don't, the danger is that you will spend a
lot of time coding and then present it to the community and they will
say "Oh, the design is all wrong." That's happened to a number of people
in the past, including some quite high profile people, and it's very sad
and frustrating for everyone when it happens.
A feature for streaming large data types could well be very valuable,
but right at the moment I at least don't have any idea what such a thing
could look like (and as you might imagine I'm quite interested).
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew>It needs to contain a substantial implementation plan
Here's an implementation plan, again, quoted from the very same mail:
Vladimir> Of course both variations above fail to support streaming
Vladimir> (as in "need to process all the contents in order to get the
Vladimir> last character"), so it might be better to use
Vladimir> "prefix that specifies encoding + 'index block' (that specifies
Vladimir> offsets for each 1M characters) + encoded string".
It does describe the data structure.
Andrew>what APIs
I believe it does not matter much.
However, it might be similar to the existing LO APIs, except the indices
are in characters rather than bytes.
Andrew>protocol changes
None.
Andrew>and so on
Well, it looks like I had everything you mentioned in the very first email.
Vladimir
From: Vladimir Sitnikov <sitnikov.vladimir@gmail.com>
Just in case, I'm PgJDBC committer.
Thank you very much for your great efforts on the wonderful PgJDBC. I have seen you being very active.
# I'd be happy if you send emails in text format so that the reply looks nice. Your email seems to be in HTML.
and apparently, Andrew is surprised that the database lacks BLOB/CLOB support.
I was a bit surprised too when I first saw that Postgres supports bytea but not blob/clob, because I had the impression that Postgres is highly SQL-standard compliant. I'm for adding blob/clob data types in the server.
At the same time, I wonder why Postgres had to add bytea instead of blob. It may be that there are or were some technical issues, and they may stand in the way even now.
One thing I can think of is the parameter format (text/binary). libpq's PQexecParams() can specify input format for each parameter, but it can only specify the output format for all returned columns, not for each column. As a consequence, the bytea host variable support added in PG 12 can INSERT 1 GB of binary data, but retrieval of the value fails with an error message like "invalid alloc request." That's because the server allocates twice the size of stored data to convert it into text format, whose size becomes about 2 GB. That exceeds the limit palloc() can allocate.
33.3. Command Execution Functions
https://www.postgresql.org/docs/devel/libpq-exec.html
The concerns to avoid "Clob maps to text" could be:
a) Once the behavior is implemented, it is hard to change. That is applications would rely on it (and it becomes a defacto standard), and it would be hard to move to the proper "text with streaming API" datatype.
b) If we make «clob is text», then people might start using update/substring APIs (which is the primary motivation for Clob) without realizing there’s full value update behind the scenes. Currently, they can use setString/getString for text, and it is crystal clear that the text is updated fully on every update.
And if we treat clob as a synonym for text (just like the relationship between char and nchar), even when the user writes clob in DDL, pg_dump will output it as text. That makes it a bit harder to use the output for other DBMSs.
Regards
Takayuki Tsunakawa