Libpq: copy file to bytea column
Hi!
The data types of tableout.c1 and tablein.c1 are both bytea. I first export tableout.c1 to a file:
db1=# COPY (SELECT c1 FROM tableout LIMIT 1) TO '/tmp/t';
Then I try to import the file to another table.
This works without flaw:
db1=# COPY tablein FROM '/tmp/t';
However, I get the following errors from log when calling libpq functions PQputCopyData() and PQputCopyEnd().
2010-03-06 20:47:42 CST ERROR: invalid byte sequence for encoding "UTF8": 0x00
2010-03-06 20:47:42 CST HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
2010-03-06 20:47:42 CST CONTEXT: COPY in, line 1: "TPF0\011Report\000\003Tag\002\365\016ExportFromPage\002\000\014ExportToPage\002\000\006Values\016..."
2010-03-06 20:47:42 CST STATEMENT: COPY in FROM STDIN
Helps will be much appreciated!
CN
seiliki wrote:
The data types of tableout.c1 and tablein.c1 are both bytea.
I first export tableout.c1 to a file:db1=# COPY (SELECT c1 FROM tableout LIMIT 1) TO '/tmp/t';
Then I try to import the file to another table.
This works without flaw:
db1=# COPY tablein FROM '/tmp/t';
However, I get the following errors from log when calling
libpq functions PQputCopyData() and PQputCopyEnd().2010-03-06 20:47:42 CST ERROR: invalid byte sequence for
encoding "UTF8": 0x00
2010-03-06 20:47:42 CST HINT: This error can also happen if
the byte sequence does not match the encoding expected by the
server, which is controlled by "client_encoding".
2010-03-06 20:47:42 CST CONTEXT: COPY in, line 1:
"TPF0\011Report\000\003Tag\002\365\016ExportFromPage\002\000\014ExportToPage\002\000\006Values\016..."
2010-03-06 20:47:42 CST STATEMENT: COPY in FROM STDINHelps will be much appreciated!
The difference here is that your first examples are COPY TO/FROM
files on the database server, while your failing C code uses
COPY FROP STDIN.
You will not meet encoding problems with the first because
the file /tmp/t is created in the server encoding, so if you read
it later with a COPY FROM, it will automatically be in the encoding
that COPY FROM <file> expects.
When you use COPY FROM STDIN, you send data from the client to the
server, which is something quite different. Data sent from the client
are expected to be in the client encoding, not in the server encoding.
You can use the SQL statement "SET client_encoding = ..." to set the
client encoding to the encoding of your data, or you can use the
PGCLIENTENCODING environment variable to set it automatically.
The error message you are getting indicates that you client_encoding
is currently UTF8.
By the way, the line quoted in your error message does not look like
valid text at all - it contains "zero" bytes, and preceeding each
string there seems to be a byte indicating the length of the string
that follows.
Yours,
Laurenz Albe
The data types of tableout.c1 and tablein.c1 are both bytea.
I first export tableout.c1 to a file:db1=# COPY (SELECT c1 FROM tableout LIMIT 1) TO '/tmp/t';
Then I try to import the file to another table.
This works without flaw:
db1=# COPY tablein FROM '/tmp/t';
However, I get the following errors from log when calling
libpq functions PQputCopyData() and PQputCopyEnd().2010-03-06 20:47:42 CST ERROR: invalid byte sequence for
encoding "UTF8": 0x00
2010-03-06 20:47:42 CST HINT: This error can also happen if
the byte sequence does not match the encoding expected by the
server, which is controlled by "client_encoding".
2010-03-06 20:47:42 CST CONTEXT: COPY in, line 1:
"TPF0\011Report\000\003Tag\002\365\016ExportFromPage\002\000\014ExportToPage\002\000\006Values\016..."
2010-03-06 20:47:42 CST STATEMENT: COPY in FROM STDINThe difference here is that your first examples are COPY TO/FROM
files on the database server, while your failing C code uses
COPY FROP STDIN.You will not meet encoding problems with the first because
the file /tmp/t is created in the server encoding, so if you read
it later with a COPY FROM, it will automatically be in the encoding
that COPY FROM <file> expects.When you use COPY FROM STDIN, you send data from the client to the
server, which is something quite different. Data sent from the client
are expected to be in the client encoding, not in the server encoding.You can use the SQL statement "SET client_encoding = ..." to set the
client encoding to the encoding of your data, or you can use the
PGCLIENTENCODING environment variable to set it automatically.The error message you are getting indicates that you client_encoding
is currently UTF8.
By the way, the line quoted in your error message does not look like
valid text at all - it contains "zero" bytes, and preceeding each
string there seems to be a byte indicating the length of the string
that follows.
Changing client encoding can not make PQputCopyData() work, either. The problem seems to be caused by the fact that PQputCopyData() does not accept zero value ('\0' in C), which I think can not be encoded to any character regardless of which client encoding is used.
Yes, the data I am trying to import is binary data. That is why I use bytea to store it.
What I can't understand is why PQputCopyData() encodes incoming character string, which was exported by "COPY ... TO '/tmp/t'" SQL command under psql prompt, for bytea columns and refuse to accept or misinterpret zero value.
Below is the more or less self contained code that also yields the same problem.
----------------------------
#define BYTEA "abcd\\011\\000ABCDEFGHIJKLMNO"
int main()
{
PGconn *PgConnect;
PGresult *result;
PgConnect=PQconndb("host=localhost port=5432 dbname=db1 user=postgres password=passwd");
if(!PgConnect)
puts("Fail to connect.");
else{
result=PQexec(PgConnect,"COPY tablein FROM STDIN");
if(IsPgsqlExecutionOk(NULL,result)){
if(PQputCopyData(PgConnect,BYTEA,20) == 1){
if(PQputCopyEnd(PgConnect,NULL) == 1)
puts("Done");
else
puts(PQerrorMessage(PgConnect));
}
else
puts(PQerrorMessage(PgConnect));
}
PQfinish(PgConnect);
}
}
------Errors from the code:---------------
ERROR: invalid byte sequence for encoding "UTF8": 0x00
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
CONTEXT: COPY tablein, line 1: "abcd\011\000ABCDEFGH"
STATEMENT: COPY tablein FROM STDIN
----------------------------
Another issue I can not understand is that both PQputCopyData() and PQputCopyEnd() always return 1 even though the copy operation actually fails. That being said, the above code always shows "Done" but actually error occurs.
Regards,
CN
What I can't understand is why PQputCopyData() encodes incoming character string, which was exported by "COPY ... TO '/tmp/t'" SQL command under psql prompt, for bytea columns and refuse to accept or misinterpret zero value.
As far as PQputCopyData() is concerned, server and/or client does not have to decode input character string for bytea column, does they? My understanding is that all client or server need to do to import data to bytea column is un-escaping all sets of \nnn (or \x000) character strings back to binary bytes and unconditionally store the result to the target bytea column.
Encoding/Decoding is required only by TEXT and (VAR)CHAR columns, isn't it?
Errors like the following ones should not occur to bytea columns, should they?
ERROR: invalid byte sequence for encoding "UTF8": 0x00
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
CONTEXT: COPY tablein, line 1: "abcd\011\000ABCDEFGH"
STATEMENT: COPY tablein FROM STDIN
Regards,
CN
seiliki wrote:
The data types of tableout.c1 and tablein.c1 are both bytea.
[...]
However, I get the following errors from log when calling
libpq functions PQputCopyData() and PQputCopyEnd().2010-03-06 20:47:42 CST ERROR: invalid byte sequence for encoding "UTF8": 0x00
[...]
When you use COPY FROM STDIN, you send data from the client to the
server, which is something quite different. Data sent from the client
are expected to be in the client encoding, not in the server encoding.
[...]
Changing client encoding can not make PQputCopyData() work,
either. The problem seems to be caused by the fact that
PQputCopyData() does not accept zero value ('\0' in C),
which I think can not be encoded to any character regardless
of which client encoding is used.Yes, the data I am trying to import is binary data. That is
why I use bytea to store it.What I can't understand is why PQputCopyData() encodes
incoming character string, which was exported by "COPY ... TO
'/tmp/t'" SQL command under psql prompt, for bytea columns
and refuse to accept or misinterpret zero value.
Sorry, that was my mistake. I did not see that you said that the
columns were bytea. So you can safely ignore everything I said before.
Below is the more or less self contained code that also
yields the same problem.
----------------------------
#define BYTEA "abcd\\011\\000ABCDEFGHIJKLMNO"
I looked it again, and the problem is the number of backslashes.
You are using text mode COPY, not binary COPY, so all the
non-printable characters are escaped with backslash sequences.
As described in
http://www.postgresql.org/docs/8.4/static/datatype-binary.html
certain octets must be escaped.
The problem is that there are two levels of escaping:
First there is some escaping for text literals, and then there
is the bytea escape syntax.
So for the zero octet, \000 would be the bytea escape, and since
backslash is a special character for text literals, it must be
escaped itself.
Ok, so double backslash.
Now since you represent the string as a literal in C, you have a third
level of escaping. So each backslash must again be represented by a
double backslash.
This leads to C strings like "\\\\000".
Your "\\000" would send '\000' to PostgreSQL, which the string
parser interprets as character zero, see
http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-CONSTANTS
This is rejected as string constant right away.
[...]
Another issue I can not understand is that both
PQputCopyData() and PQputCopyEnd() always return 1 even
though the copy operation actually fails. That being said,
the above code always shows "Done" but actually error occurs.
As described in
http://www.postgresql.org/docs/current/static/libpq-copy.html
you'll have to call PQgetResult after PQputCopyEnd to get the
result of the COPY statement.
Here's my modified version of your sample that works for me:
#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>
#define BYTEA "abcd\\\\011\\\\000ABCDEFGHIJKLMNO"
int main()
{
PGconn *PgConnect;
PGresult *result;
PgConnect = PQconnectdb("");
if (!PgConnect)
puts("Fail to connect.");
else {
result = PQexec(PgConnect,"COPY tablein FROM STDIN");
if (PGRES_COPY_IN == PQresultStatus(result)) {
if (PQputCopyData(PgConnect,BYTEA,strlen(BYTEA)) == 1) {
if (PQputCopyEnd(PgConnect,NULL) == 1) {
PQclear(result);
result = PQgetResult(PgConnect);
if (PGRES_COMMAND_OK == PQresultStatus(result))
puts("Done");
else
puts(PQerrorMessage(PgConnect));
}
else
puts(PQerrorMessage(PgConnect));
}
else
puts(PQerrorMessage(PgConnect));
}
else
puts(PQerrorMessage(PgConnect));
PQfinish(PgConnect);
}
}
The problems I spotted in your version:
- there must be four backslashes in the #definition
- you forgot to call PQgetResult
- the second argument of PQputCopyData must be the length of the buffer,
not the length of the resulting bytea field (which is 21, by the way)
- you did not free the memory of "result"
Note that my sample also cannot serve as a textbook example, since
for example PQexec could return a NULL result if it runs out of memory.
Yours,
Laurenz Albe