BUG #14048: copy issues with jsonb

Started by Rick Ottenabout 10 years ago6 messagesbugs
Jump to latest
#1Rick Otten
rotten@windfish.net

The following bug has been logged on the website:

Bug reference: 14048
Logged by: Rick Otten
Email address: rotten@windfish.net
PostgreSQL version: 9.5.1
Operating system: Ubuntu 14.04
Description:

I have a few jsonb columns in a table and I've run into several issues with
COPY with that column type. I didn't see anything specific about it in the
to-do list, and Google didn't turn up anyone else specifically complaining,
so I'm filing it here. I apologize if this is a known issue.

The table is the backend for a web pixel (tracking) service. It has 4 jsonb
columns which contain variable data about the browser and session. (parsed
user agent, list of cookies, query parameters, etc...)

I wanted to copy a section of the table from one 9.5.1 database to another.

For some reason the CSV and TEXT format exports do not quote/escape the
nested JSON very well - after several tries, I gave up on that approach.
The "copy to" output appears to be too mangled. It was not obvious, at
first glance, where it is getting confused.

Next I tried BINARY mode. That worked ok for the rows I was moving last
week, but this week (for a different piece of the table) I'm getting the
following when I try to read it back in:

# copy (select * from pixels.pixel_requests where server_timestamp >
'2016-03-26 00:00') to '/home/rotten/pixels27Mar2016.bin' binary;

# copy pixels.pixel_requests_tmp_2 from '/home/rotten/pixels27Mar2016.bin'
binary;

ERROR: unsupported jsonb version number 63
CONTEXT: COPY pixel_requests_tmp_2, line 1, column parameters

The new tmp table has the same ddl as the original table. Both tables are
UTF-8 enabled. Both are the same version of PostgreSQL running on the same
version of Ubuntu.

I'm willing to share the copy output (313M - 288,000 rows) with someone who
is able to help troubleshoot the issue, but I don't want to post the data in
a public forum.

I'm not sure which row and column is causing the copy command to fail.
(There is already a to-do that may help with that.)

Meanwhile, I'll use pg_dump to get the rows I need. Hopefully that will be
able to extract and reload the jsonb data ok. pg_dump hasn't been a problem
in the past anyway...

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Rick Otten (#1)
Re: BUG #14048: copy issues with jsonb

On Sun, Mar 27, 2016 at 8:04 AM, <rotten@windfish.net> wrote:

The following bug has been logged on the website:

Bug reference: 14048
Logged by: Rick Otten
Email address: rotten@windfish.net
PostgreSQL version: 9.5.1
Operating system: Ubuntu 14.04
Description:

I have a few jsonb columns in a table and I've run into several issues with
COPY with that column type. I didn't see anything specific about it in the
to-do list, and Google didn't turn up anyone else specifically complaining,
so I'm filing it here. I apologize if this is a known issue.

The table is the backend for a web pixel (tracking) service. It has 4
jsonb
columns which contain variable data about the browser and session. (parsed
user agent, list of cookies, query parameters, etc...)

I wanted to copy a section of the table from one 9.5.1 database to another.

For some reason the CSV and TEXT format exports do not quote/escape the
nested JSON very well - after several tries, I gave up on that approach.
The "copy to" output appears to be too mangled. It was not obvious, at
first glance, where it is getting confused.

​[...]

Meanwhile, I'll use pg_dump to get the rows I need. Hopefully that will be
able to extract and reload the jsonb data ok. pg_dump hasn't been a
problem
in the past anyway...

​I'm reasonably certain pg_dump/pg_restore makes use of "COPY" to perform
its work - at least for custom format and plain non-inserts mode​ - so if
that works but whatever you are doing manually does not there is some
degree of suspicion that what you are doing may be incorrect.

We really need a sample record and a self-contained script that you can
make reproduce this behavior.

David J.

#3Rick Otten
rotten@windfish.net
In reply to: David G. Johnston (#2)
Re: BUG #14048: copy issues with jsonb

Ok. There are 288K rows in the file that was failing yesterday. I'll
play with date ranges until I can narrow it down to a few rows and send
it along.

I wasn't able to get to trying pgdump yesterday, I'll give it a try
today and let you know.

What I'm doing manually is exactly what I posted.

On 2016-03-27 22:25, David G. Johnston wrote:

Show quoted text

On Sun, Mar 27, 2016 at 8:04 AM, <rotten@windfish.net> wrote:

The following bug has been logged on the website:

Bug reference: 14048
Logged by: Rick Otten
Email address: rotten@windfish.net
PostgreSQL version: 9.5.1
Operating system: Ubuntu 14.04
Description:

I have a few jsonb columns in a table and I've run into several issues with
COPY with that column type. I didn't see anything specific about it in the
to-do list, and Google didn't turn up anyone else specifically complaining,
so I'm filing it here. I apologize if this is a known issue.

The table is the backend for a web pixel (tracking) service. It has 4 jsonb
columns which contain variable data about the browser and session. (parsed
user agent, list of cookies, query parameters, etc...)

I wanted to copy a section of the table from one 9.5.1 database to another.

For some reason the CSV and TEXT format exports do not quote/escape the
nested JSON very well - after several tries, I gave up on that approach.
The "copy to" output appears to be too mangled. It was not obvious, at
first glance, where it is getting confused.

​[...]

Meanwhile, I'll use pg_dump to get the rows I need. Hopefully that will be
able to extract and reload the jsonb data ok. pg_dump hasn't been a problem
in the past anyway...

​I'm reasonably certain pg_dump/pg_restore makes use of "COPY" to perform its work - at least for custom format and plain non-inserts mode​ - so if that works but whatever you are doing manually does not there is some degree of suspicion that what you are doing may be incorrect.

We really need a sample record and a self-contained script that you can make reproduce this behavior.

David J.

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: David G. Johnston (#2)
Re: BUG #14048: copy issues with jsonb

On Sun, Mar 27, 2016 at 7:25 PM, David G. Johnston <
david.g.johnston@gmail.com> wrote:

On Sun, Mar 27, 2016 at 8:04 AM, <rotten@windfish.net> wrote:

The following bug has been logged on the website:

Bug reference: 14048
Logged by: Rick Otten
Email address: rotten@windfish.net
PostgreSQL version: 9.5.1
Operating system: Ubuntu 14.04
Description:

I have a few jsonb columns in a table and I've run into several issues
with
COPY with that column type. I didn't see anything specific about it in
the
to-do list, and Google didn't turn up anyone else specifically
complaining,
so I'm filing it here. I apologize if this is a known issue.

The table is the backend for a web pixel (tracking) service. It has 4
jsonb
columns which contain variable data about the browser and session.
(parsed
user agent, list of cookies, query parameters, etc...)

I wanted to copy a section of the table from one 9.5.1 database to
another.

For some reason the CSV and TEXT format exports do not quote/escape the
nested JSON very well - after several tries, I gave up on that approach.
The "copy to" output appears to be too mangled. It was not obvious, at
first glance, where it is getting confused.

​[...]

Meanwhile, I'll use pg_dump to get the rows I need. Hopefully that will
be
able to extract and reload the jsonb data ok. pg_dump hasn't been a
problem
in the past anyway...

​I'm reasonably certain pg_dump/pg_restore makes use of "COPY" to perform
its work - at least for custom format and plain non-inserts mode​ - so if
that works but whatever you are doing manually does not there is some
degree of suspicion that what you are doing may be incorrect.

We really need a sample record and a self-contained script that you can
make reproduce this behavior.

​Rick provided a file offline which I peeked at. Responding to him
privately but lacking further public comments this report can be considered
"Not a bug".

David J.

#5Rick Otten
rotten@windfish.net
In reply to: David G. Johnston (#4)
Re: BUG #14048: copy issues with jsonb

So the problem is a column order thing and not something buried in my
jsonb data. Mr. Johnston was very helpful in resolving this.

Apparently when you do a "copy from with csv header", the 'header'
phrase is ignored in spite of the keyword being present in your
statement. By removing the header from the csv file and then pasting it
into the 'copy from' statement (to specify the column order), I was able
to load from csv format. The error looked to me like something was wrong
with the jsonb data, but really it was a column ordering problem.

Ditto for Binary. Unless you know the column order that was dumped into
the binary, you can't do the 'copy from'. (And since you can't see the
column order in the binary file very easily, you have to know it before
you dump it.) 'copy from binary' doesn't infer column order from the
file via any sort of binary header.

I noticed that pg_dump was being very explicit about the column
ordering, which is what tipped me off, and then it was confirmed by Mr.
Johnston and a series of experiments.

I apologize for thinking this was a bug. The slightly unexpected
behavior (to me) had me stumped for a while.

On 2016-03-28 11:48, David G. Johnston wrote:

On Sun, Mar 27, 2016 at 7:25 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Sun, Mar 27, 2016 at 8:04 AM, <rotten@windfish.net> wrote:

The following bug has been logged on the website:

Bug reference: 14048
Logged by: Rick Otten
Email address: rotten@windfish.net
PostgreSQL version: 9.5.1
Operating system: Ubuntu 14.04
Description:

I have a few jsonb columns in a table and I've run into several issues with
COPY with that column type. I didn't see anything specific about it in the
to-do list, and Google didn't turn up anyone else specifically complaining,
so I'm filing it here. I apologize if this is a known issue.

The table is the backend for a web pixel (tracking) service. It has 4 jsonb
columns which contain variable data about the browser and session. (parsed
user agent, list of cookies, query parameters, etc...)

I wanted to copy a section of the table from one 9.5.1 database to another.

For some reason the CSV and TEXT format exports do not quote/escape the
nested JSON very well - after several tries, I gave up on that approach.
The "copy to" output appears to be too mangled. It was not obvious, at
first glance, where it is getting confused.
​[...]
Meanwhile, I'll use pg_dump to get the rows I need. Hopefully that will be
able to extract and reload the jsonb data ok. pg_dump hasn't been a problem
in the past anyway...

​I'm reasonably certain pg_dump/pg_restore makes use of "COPY" to perform its work - at least for custom format and plain non-inserts mode​ - so if that works but whatever you are doing manually does not there is some degree of suspicion that what you are doing may be incorrect.

We really need a sample record and a self-contained script that you can make reproduce this behavior.

​Rick provided a file offline which I peeked at. Responding to him
privately but lacking further public comments this report can be
considered "Not a bug".

David J.

#6David G. Johnston
david.g.johnston@gmail.com
In reply to: Rick Otten (#5)
Re: BUG #14048: copy issues with jsonb

On Mon, Mar 28, 2016 at 9:35 AM, Rick Otten <rotten@windfish.net> wrote:

So the problem is a column order thing and not something buried in my
jsonb data. Mr. Johnston was very helpful in resolving this.

Apparently when you do a "copy from with csv header", the 'header' phrase
is ignored in spite of the keyword being present in your statement.

​It is not ignored - it is informing COPY that the first line of your input
file is not data but meta-data and thus must be ignored.​

There are external utilities available that make doing the whole copy thing
much more user-friendly. Given their existence not much effort is expended
trying to enhance the native copy facilities beyond making it extremely
fast for bulk loading well-specified data.

David J.