libpq and mysterious "invalid byte sequence for encoding UTF8".

Started by Jiří Pavlovskýabout 5 years ago7 messagesgeneral
Jump to latest
#1Jiří Pavlovský
jiri@pavlovsky.eu

I'm stuck trying to find a cause for

invalid byte sequence for encoding "UTF8".

It is an C program using libpq. I'm using `PQexecParams` to
execute the SQL query.

The offending byte sequence is completely random, sometimes the command
even runs ok. I thought I must have a memory allocation issue somewhere,
but even if I specify all the parameters as static strings, I still
receive the error with a random byte sequence.
What's more, the same query with the same parameters runs ok when I
create a small test program.
So I'm completely stuck. I verified all the possible sources for the
error like client_encoding etc, but could not find the source of the error.
What is confusing me is that the offending byte sequence is random, even
though the query parameters don't change.
Moreover, when I check the postgres log, the query and its parameters
appear to be correct.

I'm trying to update a record in the following table:

    CREATE TABLE public.contacts
    (
        contactid integer NOT NULL DEFAULT
nextval('contacts_contactid_seq'::regclass),
        paperid integer,
        pos character varying(50) COLLATE pg_catalog."default",
        title character varying(10) COLLATE pg_catalog."default",
        firstname character varying(20) COLLATE pg_catalog."default",
        lastname character varying(25) COLLATE pg_catalog."default",
        func character varying(25) COLLATE pg_catalog."default",
        tel1 text COLLATE pg_catalog."default",
        tel2 text COLLATE pg_catalog."default",
        fax1 text COLLATE pg_catalog."default",
        fax2 text COLLATE pg_catalog."default",
        email1 character varying(50) COLLATE pg_catalog."default",
        email2 character varying(50) COLLATE pg_catalog."default",
        maincontact boolean DEFAULT false,
        publdatacontact boolean DEFAULT false,
        invcontact boolean DEFAULT false,
        queries_recipient boolean,
        contact_log text COLLATE pg_catalog."default",
        salesforceid character(18) COLLATE pg_catalog."default",
        fakelastname boolean NOT NULL DEFAULT false,
        CONSTRAINT contacts_pk PRIMARY KEY (contactid),
        CONSTRAINT contacts_paperid_fkey FOREIGN KEY (paperid)
            REFERENCES public.papers (paperid) MATCH SIMPLE
            ON UPDATE CASCADE
            ON DELETE CASCADE
    );

Here is an actual code:

        const char* pparams[16] = {
    NULL,
    NULL,
    "1702",
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    "14340"
        };
   

   
    gchar *query="UPDATE contacts SET
Pos=$1::varchar,Title=$2::varchar,PaperID=$3::int,FirstName=$4::varchar,LastName=$5::varchar,Func=$6::varchar,Tel1=$7::text,Fax1=$8::text,Email1=$9::varchar,Tel2=$10::text,Fax2=$11::text,Email2=$12::varchar,MainContact=$13::boolean,PublDataContact=$14::boolean,InvContact=$15::boolean
WHERE ContactID=$16::int";
   
          result = PQexecParams(conn, query, 16, NULL, pparams, ssizes,
bbinary, 0);

An excerpt from Postgres log:

    Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-1] 2021-01-26
09:40:57.505 CET [11334] jira@project-syndicate LOG:  execute <unnamed>:
   
        UPDATE contacts SET Pos = $1::varchar, Title = $2::varchar,
PaperID = $3::int, FirstName = $4::varchar, LastName = $5::varchar, Func
= $6::varchar, Tel1 = $7::text, Fax1 = $8::text, Email1 = $9::varchar,
Tel2 = $10::text, Fax2 = $11::text, Email2 = $12::varchar, MainContact =
$13::boolean, PublDataContact = $14::boolean, InvContact = $15::boolean
WHERE ContactID = $16::int
        Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-2]
2021-01-26 09:40:57.505 CET [11334] jira@project-syndicate DETAIL: 
parameters: $1 = NULL, $2 = NULL, $3 = '1702', $4 = NULL, $5 = NULL, $6
= NULL, $7 = NULL, $8 = NULL, $9 = NULL, $10 = NULL, $11 = NULL, $12 =
NULL, $13 = NULL, $14 =  NULL, $15 = NULL, $16 = '14340'
        Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [114-1]
2021-01-26 09:40:57.544 CET [11334] jira@project-syndicate ERROR: 
invalid byte sequence for encoding "UTF8": 0x80

Any ideas as to what could be a cause of the error?

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jiří Pavlovský (#1)
Re: libpq and mysterious "invalid byte sequence for encoding UTF8".

=?UTF-8?B?SmnFmcOtIFBhdmxvdnNrw70=?= <jiri@pavlovsky.eu> writes:

    Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-1] 2021-01-26
09:40:57.505 CET [11334] jira@project-syndicate LOG:  execute <unnamed>:
   
        UPDATE contacts SET Pos = $1::varchar, Title = $2::varchar,
PaperID = $3::int, FirstName = $4::varchar, LastName = $5::varchar, Func
= $6::varchar, Tel1 = $7::text, Fax1 = $8::text, Email1 = $9::varchar,
Tel2 = $10::text, Fax2 = $11::text, Email2 = $12::varchar, MainContact =
$13::boolean, PublDataContact = $14::boolean, InvContact = $15::boolean
WHERE ContactID = $16::int
        Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-2]
2021-01-26 09:40:57.505 CET [11334] jira@project-syndicate DETAIL: 
parameters: $1 = NULL, $2 = NULL, $3 = '1702', $4 = NULL, $5 = NULL, $6
= NULL, $7 = NULL, $8 = NULL, $9 = NULL, $10 = NULL, $11 = NULL, $12 =
NULL, $13 = NULL, $14 =  NULL, $15 = NULL, $16 = '14340'
        Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [114-1]
2021-01-26 09:40:57.544 CET [11334] jira@project-syndicate ERROR: 
invalid byte sequence for encoding "UTF8": 0x80

This log entry shows that your query text and parameters all made it to
the backend just fine. So I don't think your issue is on the client side.
I'm wondering if the error could be from (say) triggers doing fancy data
manipulations. Noting your use of ON UPDATE CASCADE, it'd be worth
checking indirectly-affected tables as well.

regards, tom lane

#3Jiří Pavlovský
jiri@pavlovsky.eu
In reply to: Tom Lane (#2)
Re: libpq and mysterious "invalid byte sequence for encoding UTF8".

On 01.02.2021 15:55, Tom Lane wrote:

=?UTF-8?B?SmnFmcOtIFBhdmxvdnNrw70=?= <jiri@pavlovsky.eu> writes:

    Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-1] 2021-01-26
09:40:57.505 CET [11334] jira@project-syndicate LOG:  execute <unnamed>:

        UPDATE contacts SET Pos = $1::varchar, Title = $2::varchar,
PaperID = $3::int, FirstName = $4::varchar, LastName = $5::varchar, Func
= $6::varchar, Tel1 = $7::text, Fax1 = $8::text, Email1 = $9::varchar,
Tel2 = $10::text, Fax2 = $11::text, Email2 = $12::varchar, MainContact =
$13::boolean, PublDataContact = $14::boolean, InvContact = $15::boolean
WHERE ContactID = $16::int
        Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-2]
2021-01-26 09:40:57.505 CET [11334] jira@project-syndicate DETAIL:
parameters: $1 = NULL, $2 = NULL, $3 = '1702', $4 = NULL, $5 = NULL, $6
= NULL, $7 = NULL, $8 = NULL, $9 = NULL, $10 = NULL, $11 = NULL, $12 =
NULL, $13 = NULL, $14 =  NULL, $15 = NULL, $16 = '14340'
        Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [114-1]
2021-01-26 09:40:57.544 CET [11334] jira@project-syndicate ERROR:
invalid byte sequence for encoding "UTF8": 0x80

This log entry shows that your query text and parameters all made it to
the backend just fine. So I don't think your issue is on the client side.
I'm wondering if the error could be from (say) triggers doing fancy data
manipulations. Noting your use of ON UPDATE CASCADE, it'd be worth
checking indirectly-affected tables as well.

regards, tom lane

Thanks for the answer.

I don't think triggers or constraints are the issue.

The exact same code works if I create a small test program with the same
query and the same parameters.

But when used in the context of the whole application it gives the error.

 I don't know what should I try. I tried all possible combinations of
parameters.

I have just even tried to disable triggers and remove any constraints
from the table. Same error.

#4rob stone
floriparob@gmail.com
In reply to: Jiří Pavlovský (#3)
Re: libpq and mysterious "invalid byte sequence for encoding UTF8".

Hello,

On Mon, 2021-02-01 at 18:03 +0100, Jiří Pavlovský wrote:

Thanks for the answer.

I don't think triggers or constraints are the issue.

The exact same code works if I create a small test program with the
same
query and the same parameters.

But when used in the context of the whole application it gives the
error.

  I don't know what should I try. I tried all possible combinations
of
parameters.

I have just even tried to disable triggers and remove any constraints
from the table. Same error.

Columns:- maincontact boolean DEFAULT false,
publdatacontact boolean DEFAULT false,
invcontact boolean DEFAULT false,
queries_recipient boolean,
fakelastname boolean NOT NULL DEFAULT false,

are defined as booleans and all except one have defaults declared, yet
you are passing NULL's which the insert tries to cast to boolean.
Maybe this is the cause of the error??

My 2 cents worth.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: rob stone (#4)
Re: libpq and mysterious "invalid byte sequence for encoding UTF8".

rob stone <floriparob@gmail.com> writes:

Columns:- maincontact boolean DEFAULT false,
publdatacontact boolean DEFAULT false,
invcontact boolean DEFAULT false,
queries_recipient boolean,
fakelastname boolean NOT NULL DEFAULT false,

are defined as booleans and all except one have defaults declared, yet
you are passing NULL's which the insert tries to cast to boolean.
Maybe this is the cause of the error??

Oooh ... if we posit that the statement is expected to throw an error,
maybe what this error is complaining about is an inability to translate
a localized error message to the client's encoding. This could reflect
an incorrectly-encoded .po file, for example. So what we need to know
next is what lc_messages setting Jiří is using, and also the server
and client encodings.

regards, tom lane

#6Jiří Pavlovský
jiri@pavlovsky.eu
In reply to: Tom Lane (#5)
Re: libpq and mysterious "invalid byte sequence for encoding UTF8".

On 02.02.2021 3:53, Tom Lane wrote:

rob stone <floriparob@gmail.com> writes:

Columns:- maincontact boolean DEFAULT false,
publdatacontact boolean DEFAULT false,
invcontact boolean DEFAULT false,
queries_recipient boolean,
fakelastname boolean NOT NULL DEFAULT false,
are defined as booleans and all except one have defaults declared, yet
you are passing NULL's which the insert tries to cast to boolean.
Maybe this is the cause of the error??

Oooh ... if we posit that the statement is expected to throw an error,
maybe what this error is complaining about is an inability to translate
a localized error message to the client's encoding. This could reflect
an incorrectly-encoded .po file, for example. So what we need to know
next is what lc_messages setting Jiří is using, and also the server
and client encodings.

Still, this does not explain why it sometimes work and why the byte
sequence in the error message changes on every run, no?

I'm getting the error even when I changes the boolean fields from NULL
to FALSE.

Client and server encoding is UTF8. Locales are set to C.UTF8.

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jiří Pavlovský (#6)
Re: libpq and mysterious "invalid byte sequence for encoding UTF8".

=?UTF-8?B?SmnFmcOtIFBhdmxvdnNrw70=?= <jiri@pavlovsky.eu> writes:

Client and server encoding is UTF8. Locales are set to C.UTF8.

If lc_messages is C then no translation of error strings would happen,
so that theory goes out the window. Oh well.

Perhaps you could attach gdb to the backend and get a stack trace
from the call to errfinish?

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

regards, tom lane