Differences in Escaped bytea's when creating a plain pg_dump

Started by WRabout 4 years ago9 messagesgeneral

wolle321@freenet.de

about 4 years ago

Hello community,

I've some trouble in restoring a plain text pg_dump.
Postgres version is 13 x64 running on Windows10x64 installed from EDB
package.

The database has bytea_output = 'escape' option because of some
processing software needs it for historical reasons.

Dump command is:

pg_dump --file=mydump.sql --format=plain --verbose --encoding=UTF8
--schema=public --host=localhost --username=myuser --inserts dbname

We have two tables that have a bytea-row.

But when I look at the dumpfile there is a difference between the
escaped bytea-string. (note: both INSERT's from the same run of pg_dump
in the dumpfile)

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;

[snip]

INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476,
2000, 2400, 2400,
'\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000',
500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', '2021-08-31 11:53:22.442801', 0, 1);

[snip]

INSERT INTO public.profiles VALUES (1, 1, 's', 152,
'\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000!\\003\\000\\000\\000!\\003\\000\\000\\000!\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000',
'2016-08-25 00:00:00+02');

[snip]

When I restore them via pgadmin4 query tool, both INSERTS do work.

But when I read them with my c++ written software from file and send
them with pqxx-Library as a transaction, the first bytea-string
generates a fault 21020. (0x00 is not a valid utf8 sequence). I also
checked the read string in c++ debugger, the single backslashes in the
one case and the double backslashes in the other case are there.

So my questions are:
Why do we get one bytea-string with double backslashes (which works) and
another one with single backslashes (which actually not works with
pqxx-transactions)?

Can I convince pg_dump somehow, to generate double backslashes in all
bytea-strings?

Why does pgadmin understand both formats. pqxx-transaction does not?

Thank you for this great database-system. I really like it.
Wolfgang

--
May the source be with you

wolle321@freenet.de

about 4 years ago

In reply to: WR (#1)

Re: Differences in Escaped bytea's when creating a plain pg_dump

Am 23.06.2022 um 17:13 schrieb WR:

Hello community,

I've some trouble in restoring a plain text pg_dump.
Postgres version is 13 x64 running on Windows10x64 installed from EDB
package.

The database has bytea_output = 'escape' option because of some
processing software needs it for historical reasons.

Dump command is:

pg_dump --file=mydump.sql --format=plain --verbose --encoding=UTF8
--schema=public --host=localhost --username=myuser --inserts dbname

We have two tables that have a bytea-row.

But when I look at the dumpfile there is a difference between the
escaped bytea-string. (note: both INSERT's from the same run of pg_dump
in the dumpfile)

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;

[snip]

INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476,
2000, 2400, 2400,
'\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000',
500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', '2021-08-31 11:53:22.442801', 0, 1);

[snip]

INSERT INTO public.profiles VALUES (1, 1, 's', 152,
'\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000!\\003\\000\\000\\000!\\003\\000\\000\\000!\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000',
'2016-08-25 00:00:00+02');

[snip]

When I restore them via pgadmin4 query tool, both INSERTS do work.

But when I read them with my c++ written software from file and send
them with pqxx-Library as a transaction, the first bytea-string
generates a fault 21020. (0x00 is not a valid utf8 sequence). I also
checked the read string in c++ debugger, the single backslashes in the
one case and the double backslashes in the other case are there.

So my questions are:
Why do we get one bytea-string with double backslashes (which works) and
another one with single backslashes (which actually not works with
pqxx-transactions)?

Can I convince pg_dump somehow, to generate double backslashes in all
bytea-strings?

Why does pgadmin understand both formats. pqxx-transaction does not?

Thank you for this great database-system. I really like it.
Wolfgang

Hello again,

I've found one mistake in the data of the second table
(public.profiles). They seem to be really "double escaped" somehow. So
they are not valid anymore.

Now I know pg_dump doesn't make any difference between the two tables.
The only valid data is from table (public.oned_figures) with one
backslash. That was my fault, sorry.

But one question is left.

When I read this valid data into a c++ std::string (and I checked that
the single backslashes are still there). Why can't I put this
SQL-command to a pqxx-transaction and execute it. It looks like the
pqxx-transaction unescapes the bytea-string and then it finds the 0x00
bytes, which are not allowed in text-strings but should be in bytea-strings.

--
May the source be with you

Daniel Verite

daniel@manitou-mail.org

about 4 years ago

In reply to: WR (#2)

Re: Differences in Escaped bytea's when creating a plain pg_dump

WR wrote:

But one question is left.

When I read this valid data into a c++ std::string (and I checked that
the single backslashes are still there). Why can't I put this
SQL-command to a pqxx-transaction and execute it. It looks like the
pqxx-transaction unescapes the bytea-string and then it finds the 0x00
bytes, which are not allowed in text-strings but should be in bytea-strings.

You may check your server logs. They would have the error messages
with the queries as received by the server.

Note that passing '\000'::bytea with a single backslash can be
incorrect if standard_conforming_strings if set to off (not
recommended since version 8.1).
In that case the backslashes need to be doubled.

Example:

postgres=# set standard_conforming_strings to off;
SET

postgres=# set escape_string_warning to off;
SET

postgres=# select '\000'::bytea;
ERROR: invalid byte sequence for encoding "UTF8": 0x00

postgres=# select '\\000'::bytea;
bytea
-------
\x00
(1 row)

There might also be a problem with how the string is being fed with
the C++ code, but you'd need to show us the code to get feedback
on it.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

wolle321@freenet.de

about 4 years ago

In reply to: Daniel Verite (#3)

Re: Differences in Escaped bytea's when creating a plain pg_dump

I'm back at my problem today:

Example:

postgres=# set standard_conforming_strings to off;
SET

postgres=# set escape_string_warning to off;
SET

postgres=# select '\000'::bytea;
ERROR: invalid byte sequence for encoding "UTF8": 0x00

postgres=# select '\\000'::bytea;
bytea
-------
\x00
(1 row)

I made some test with pgadmin. Pgadmin (5.2) also reports this error
now. And it doesn't matter if standard_conforming_strings is on or off.

SET standard_conforming_strings = off;

INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476,
2000, 2400, 2400, '\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)',
'2021-08-31 11:53:22.442801', 0, 1);

And

SET standard_conforming_strings = on;

INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476,
2000, 2400, 2400, '\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)',
'2021-08-31 11:53:22.442801', 0, 1);

Both do report:

WARNUNG: nicht standardkonforme Verwendung von Escape in
Zeichenkettenkonstante
LINE 8: ...(1, 'Figure_Wolle1', 476, -476, 2000, 2400, 2400, '\000\000&...
^
HINT: Verwenden Sie die Syntax für Escape-Zeichenketten, z.B. E'\r\n'.

ERROR: FEHLER: ungültige Byte-Sequenz für Kodierung »UTF8«: 0x00

The warning can be avoided by adding the E before the string constant.

The only solution to avoid the error is, to double-backslash.

INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476,
2000, 2400, 2400, E'\\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)',
'2021-08-31 11:53:22.442801', 0, 1);

I also dumped the table again after INSERT, and the data was correct
(but again without E and with single backslash)

What I cant understand: why does pg_dump produce the string without the
E and without double-backslash, when it is needed? Now I have to write a
correction routine in c++, what fixes the dumps, before using them.

--
May the source be with you

Import Notes

Resolved by subject fallback

wolle321@freenet.de

about 4 years ago

In reply to: WR (#4)

Re: Differences in Escaped bytea's when creating a plain pg_dump

Another strange thing is:

in my first mail I wrote: running the dump in in pgadmin works, in the
last mail I wrote pgadmin also produces the error. I played a little bit
how this could be happend.

Everytime ich used the following sql text in the querytool:

SET standard_conforming_strings = on;

INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476,
2000, 2400, 2400, '\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)',
'2021-08-31 11:53:22.442801', 0, 1);

After each run I deleted the line with a View/Edit Data Panel of the Table.

First run worked.
Second run worked.
Then I changed to SET standard_conforming_strings = off;
Third run worked.
Fourth run throw the error
Then I changed back to SET standard_conforming_strings = on;
Fifth run throw the error too.
And only adding E and second backslash helped.

I could reproduce this behavior everytime I close the query tool and
opened it again.

But this looks more like a pgadmin-bug.

--
May the source be with you

Import Notes

Resolved by subject fallback

David G. Johnston

david.g.johnston@gmail.com

about 4 years ago

In reply to: WR (#4)

Re: Differences in Escaped bytea's when creating a plain pg_dump

On Sunday, June 26, 2022, WR <wolle321@freenet.de> wrote:

I made some test with pgadmin. Pgadmin (5.2) also reports this error now.
And it doesn't matter if standard_conforming_strings is on or off.

SET standard_conforming_strings = off;

INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476,
2000, 2400, 2400, '\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', '2021-08-31
11:53:22.442801', 0, 1);

And

SET standard_conforming_strings = on;

INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476,
2000, 2400, 2400, '\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', '2021-08-31
11:53:22.442801', 0, 1);

Both do report:

WARNUNG: nicht standardkonforme Verwendung von Escape in
Zeichenkettenkonstante
LINE 8: ...(1, 'Figure_Wolle1', 476, -476, 2000, 2400, 2400, '\000\000&...
^
HINT: Verwenden Sie die Syntax für Escape-Zeichenketten, z.B. E'\r\n'.

ERROR: FEHLER: ungültige Byte-Sequenz für Kodierung »UTF8«: 0x00

I’m not in a position to test/experiment on any guaranteed timeframe but
your observation that the outcome of those two commands is independent of
value of standard_conforming_strings is either a straight up bug or you’ve
made a mistake somewhere. The warning is fully conditioned on that setting
being off.

E’\\000’ and ‘\000’ passed as string literals to the bytea input routine
are defined to be equivalent inputs under standard_conforming_strings and
neither can produce the warning in that case.

I suggest doing self-contained examples that demonstrate the documented
behavior not working as documented (or not being functional even if
intended) to pinpoint any bug that might be lurking here. With only
fragments and statements that seem impossible we are left to assume
operator error. pg_dump is completely correct in what it is producing
(non-escape literal \000).

I also suggest using psql and pg_dump directly, and not pgAdmin, to
demonstrate a core PostgreSQL bug.

David J.

Wolfgang Rißler

wolfgang.rissler@freenet.de

about 4 years ago

In reply to: David G. Johnston (#6)

Re: Differences in Escaped bytea's when creating a plain pg_dump

Am 27.06.2022 um 09:32 schrieb David G. Johnston:

[snip]

I suggest doing self-contained examples that demonstrate the documented
behavior not working as documented (or not being functional even if
intended) to pinpoint any bug that might be lurking here. With only
fragments and statements that seem impossible we are left to assume
operator error. pg_dump is completely correct in what it is producing
(non-escape literal \000).

I also suggest using psql and pg_dump directly, and not pgAdmin, to
demonstrate a core PostgreSQL bug.

David J.

Thank you David,
I followed you advice, using pg_dump and psql directly. And the in in
contrast to pgAdmin psql works like expected and reproducable again and
again.
With
SET standard_conforming_strings = on;

an INSERT without E and double backslash works.

SET standard_conforming_strings = off;

I get the warning and the error. So there is no core PostgreSQL bug, I
think.

PgAdmin has different result, when running the same sql commands
repeatedly. Before filing a bug there, I should update to the actual
release.

Now I will test our c++ code and will hopefully find out, why I can't
run the dump from a sql-file (where is SET standard_conforming_strings =
on;) as a pqxx-transaction...

Wolfgang Rißler
mailto: wolfgang.rissler@freenet.de
mobil: +49 1520 9191759
- may the source be with you -

Daniel Verite

daniel@manitou-mail.org

about 4 years ago

In reply to: WR (#5)

Re: Differences in Escaped bytea's when creating a plain pg_dump

WR wrote:

First run worked.
Second run worked.
Then I changed to SET standard_conforming_strings = off;
Third run worked.
Fourth run throw the error
Then I changed back to SET standard_conforming_strings = on;
Fifth run throw the error too.
And only adding E and second backslash helped.

This kind of unstable behavior can be seen if the SET may not be
executed by the same session (same connection to the server) as the
subsequent queries.
SET affects only the session it's being run in.

For instance a connection pooler configured in statement mode may
produce that behavior. The solution in the case of a connection pooler
is to group related statements into a transaction.

Maybe pgAdmin has a pooler like that, but if you're issuing the statements
in the same SQL window, I would find it quite surprising that it doesn't
run them by the same session.
Or maybe you're mixing queries from different SQL windows that each
have their own connection, and in some cases you do the SET
in a window and the INSERT in another window.
Or maybe it's a pgAdmin bug.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Wolfgang Rißler

wolfgang.rissler@freenet.de

about 4 years ago

In reply to: Daniel Verite (#8)

Re: Differences in Escaped bytea's when creating a plain pg_dump

Am 27.06.2022 um 12:12 schrieb Daniel Verite:

WR wrote:

First run worked.
Second run worked.
Then I changed to SET standard_conforming_strings = off;
Third run worked.
Fourth run throw the error
Then I changed back to SET standard_conforming_strings = on;
Fifth run throw the error too.
And only adding E and second backslash helped.

This kind of unstable behavior can be seen if the SET may not be
executed by the same session (same connection to the server) as the
subsequent queries.
SET affects only the session it's being run in.

For instance a connection pooler configured in statement mode may
produce that behavior. The solution in the case of a connection pooler
is to group related statements into a transaction.

Maybe pgAdmin has a pooler like that, but if you're issuing the statements
in the same SQL window, I would find it quite surprising that it doesn't
run them by the same session.
Or maybe you're mixing queries from different SQL windows that each
have their own connection, and in some cases you do the SET
in a window and the INSERT in another window.
Or maybe it's a pgAdmin bug.

Ok, thank you Daniel,

in all tests I put the SET statement before the INSERT in the same query
tool and run it as a whole. (no statement was marked by mouse for single
execution). So I agree with you, that one transmission is used by
pgAdmin to run SET and INSERT.
I always had a SET before each INSERT, so there could be no unknown
state of standard_conforming_strings, no matter if pdAdmin uses
different connections for each run of the script or the same again.

The supplement of the second backslash in the c++ code costs me some
nerves. Sending an statement std::string without doublebackslashes to
pqxx::transaction results in the "0x00 is not a valid UTF8 sequence"
error 21020 (as I wrote already). I'm afraid, I can't simply replace all
backslashes in the whole dumptext with two of them.
Since I found out now, that psql works nice with my dumpfiles, maybe its
easier, to call psql from my code and not to use a pqxx::transaction. So
I can also save myself loading the dump from the sql-file. The more I
think about it, the more I love this idea.

Thanks a lot, Wolfgang

Wolfgang Rißler
mailto: wolfgang.rissler@freenet.de
mobil: +49 1520 9191759
- may the source be with you -