Regarding bytea column in Posgresql

Started by Deole, Pushkar (Pushkar)about 11 years ago5 messagesgeneral
Jump to latest

Hi,

I have been assigned to a product that uses Postgresql 9.3 as backend database. I am new to postgresql.
The product provides chat functionality between the uses and the completed chats are stored in the database table in a 'bytea' column in the form of xml. When I query the data from this column I see xml file with text data. I have couple of queries:

1. Is 'bytea' column intended for storing text data?

2. Typically a chat can have text data with several special characters (which can be represented in multi bytes), how these characters can be stored in a bytea column and retrieved back properly?

Thanks,
Pushkar

#2Bill Moran
wmoran@potentialtech.com
In reply to: Deole, Pushkar (Pushkar) (#1)
Re: Regarding bytea column in Posgresql

On Thu, 9 Apr 2015 11:03:30 +0000
"Deole, Pushkar (Pushkar)" <pdeole@avaya.com> wrote:

I have been assigned to a product that uses Postgresql 9.3 as backend database. I am new to postgresql.
The product provides chat functionality between the uses and the completed chats are stored in the database table in a 'bytea' column in the form of xml. When I query the data from this column I see xml file with text data. I have couple of queries:

1. Is 'bytea' column intended for storing text data?

No, it's intended for storing binary data.

2. Typically a chat can have text data with several special characters (which can be represented in multi bytes), how these characters can be stored in a bytea column and retrieved back properly?

bytea won't help you here. You'll have to manage the special
characters entirely in your code. bytea gives you back the
exact same types you put in, with no changes or interpretation.

A better choice would be to use a text field with a proper
text encoding (such as utf-8).

Probably an even better choice would be to use the XML datatype
in PostgreSQL, since you say that you're storing XML anyway.

The place where people tend to get tripped up with TEXT and
XML datatypes is that they're strict. If you try to store
text in a TEXT data type that isn't valid (i.e., multi-byte
characters that aren't correct) you'll get an error and the
data won't be accepted. The same thing happens if you try to
store invalid XML in an XML field (such as XML without proper
closing tags, etc). It seems that this strictness causes a lot
of people to avoid those data types, as there seem to be a lot
of people who would rather have garbage data in their database
than actually go to the work of fixing their application.

--
Bill Moran

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3John R Pierce
pierce@hogranch.com
In reply to: Bill Moran (#2)
Re: Regarding bytea column in Posgresql

On 4/9/2015 4:10 AM, Bill Moran wrote:

1. Is 'bytea' column intended for storing text data?

No, it's intended for storing binary data.

2. Typically a chat can have text data with several special characters (which can be represented in multi bytes), how these characters can be stored in a bytea column and retrieved back properly?

bytea won't help you here. You'll have to manage the special
characters entirely in your code. bytea gives you back the
exact same types you put in, with no changes or interpretation.

A better choice would be to use a text field with a proper
text encoding (such as utf-8).

one possible rationale for using BYTEA is that the data could be in
various encodings, which the application wishes to preserve, and keeps
track of somewhere else (perhaps in a field within the XML?).
PostgreSQL text types would insist that all text be stored in the same
encoding, and anything in a different encoding would have to be
converted to the database encoding. Me, I'd be inclined to convert
everything to UTF8 and store it as such, and convert it back to the
user's encoding on display, but the feasibility of this really depends
on the use cases.

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Craig Ringer
craig@2ndquadrant.com
In reply to: John R Pierce (#3)
Re: Regarding bytea column in Posgresql

On 10 April 2015 at 03:27, John R Pierce <pierce@hogranch.com> wrote:

one possible rationale for using BYTEA is that the data could be in
various encodings, which the application wishes to preserve, and keeps
track of somewhere else (perhaps in a field within the XML?).

Thanks for bringing this up, as it's a good reason to use bytea for XML.

XML actually has an encoding field in the DTD declaration, e.g.

<?xml version="1.0" encoding="UTF-8"?>

It is common - and of dubious correctness - for applications to store XML
in a 'text' or 'xml' field without changing the 'encoding' field in the
doctype to reflect the encoding at rest.

Personally I wish the 'xml' type in Pg knew how to change the encoding
declaration dynamically, but I know it's a hairy problem; e.g. if the
client_encoding is iso-8859-1, but the client then converts the XML
document to utf-8 internally, the encoding will be wrong if the client
doesn't change it back.

I've also run into XML documents that shove data in different encodings
into CDATA sections. This is wrong, of course, but apps sometimes do it
anyway.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In reply to: John R Pierce (#3)
Re: Regarding bytea column in Posgresql

John,

Please find my response inline.

Thanks..

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of John R Pierce
Sent: Friday, April 10, 2015 12:57 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Regarding bytea column in Posgresql

On 4/9/2015 4:10 AM, Bill Moran wrote:

1. Is 'bytea' column intended for storing text data?

No, it's intended for storing binary data.

2. Typically a chat can have text data with several special characters (which can be represented in multi bytes), how these characters can be stored in a bytea column and retrieved back properly?

bytea won't help you here. You'll have to manage the special
characters entirely in your code. bytea gives you back the exact same
types you put in, with no changes or interpretation.

A better choice would be to use a text field with a proper text
encoding (such as utf-8).

one possible rationale for using BYTEA is that the data could be in various encodings, which the application wishes to preserve, and keeps
track of somewhere else (perhaps in a field within the XML?).
PostgreSQL text types would insist that all text be stored in the same encoding, and anything in a different encoding would have to be
converted to the database encoding. Me, I'd be inclined to convert
everything to UTF8 and store it as such, and convert it back to the user's encoding on display, but the feasibility of this really depends on the use cases.

--
john r pierce, recycling bits in santa cruz

--> I confirmed with the developers who had originally developed this product and as per them, the xml is in UTF-8 encoding. So, ideally it would have been better to have a text or XML column type.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.postgresql.org_mailpref_pgsql-2Dgeneral&amp;d=AwICaQ&amp;c=BFpWQw8bsuKpl1SgiZH64Q&amp;r=9JxdEvGq7qy3NnL3UxN47w&amp;m=c2bN1y5Oj6SbZjRfape8xKIQXIPlu4MRCwHN76YWi1I&amp;s=BTWMlID5cymS2KBU5tS__D7oiNHtLnjEsuiZ8TswAvA&amp;e=

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general