json datatype and table bloat?
I have a table that is triggering my nagios database bloat alert regularly.
Usually, I have to run VACUUM FULL more than once to get it under
the threshold. Today I tried repeatedly and cannot get the alert to resolve.
I had a discussion with one of the primary developers about how the table
is utilized, and it turns out they are basically only ever inserting into
it. This sort of flies in the face of conventional wisdom about bloat being
caused by frequent updates and deletes.
We were looking at it, and one of the things that struck me is that this
table has a column with a json datatype. I looked through
information_schema.columns and there is only one other table with a json
datatype, and I recall having bloat issues with this table in the past as
well.
I'm wondering if the json datatype is just naturally more bloated than
other types, or if the query in the check_postgresql.pl nagios script is
not accurate, or if maybe my thresholds are simply too low?
The table design itself is pretty simple:
 id         | integer                     | not null default nextval('table_schema.table_name_id_seq'::regclass)
 type       | character varying(255)     |
 criteria   | json                        |
 created_at | timestamp without time zone | not null
 updated_at | timestamp without time zone | not null
Indexes:
    "table_name_pkey" PRIMARY KEY, btree (id)
The nagios output looks like this (sanitized):
POSTGRES_BLOAT WARNING: DB "db_name" (host:host.domain.com) (db db_name)
table table_schema.table_name rows:29305 pages:1733 shouldbe:330 (5.3X)
wasted size:11493376 (10 MB)
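One way to cross-check an estimate like this, assuming the contrib pgstattuple extension is available and using the same sanitized names, is to measure the table directly rather than estimating from pg_stats:

```sql
-- pgstattuple scans the table and reports actual, not estimated, space usage.
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT table_len, tuple_len, dead_tuple_len, free_space, free_percent
FROM pgstattuple('table_schema.table_name');
```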
Thoughts?
Greg Haase
On Mon, Oct 28, 2013 at 4:17 PM, Gregory Haase <haaseg@onefreevoice.com> wrote:
As I understand it, vacuuming only removes the tuples removed by delete and
update operations.
The question is how the data is being inserted and whether anything
ever updates the rows in any way. Maybe this is a manual process? But
for 5x bloat, it would have to be a repeated process.
Maybe there was bad data that had to be corrected?
Next time this happens it would be worth seeing what VACUUM FULL VERBOSE
output is for that table.
--
Best Wishes,
Chris Travers
Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more.shtml
On Tue, Oct 29, 2013 at 5:38 AM, Chris Travers <chris.travers@gmail.com> wrote:
As I understand it, vacuuming only removes the tuples removed by delete and
update operations.
Well, or by rolled back transactions. So we have to wonder if the OP has
a lot of queries trying to insert and failing. Maybe check the log?
merlin
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Following up.
I don't see any rolled back transactions in the logs.
The part that troubles me really is that vacuum full doesn't actually fix
the problem. If there were bad data that had been corrected via mass
updates, I'd expect the bloat issue to be fixed by a vacuum full.
When I run the vacuum back to back, this is what I get:
db_name=# VACUUM FULL VERBOSE table_schema.table_name;
INFO: vacuuming "table_schema.table_name"
INFO: "table_name": found 2 removable, 29663 nonremovable row versions in
1754 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU 0.07s/0.10u sec elapsed 0.30 sec.
VACUUM
db_name=# VACUUM FULL VERBOSE table_schema.table_name;
INFO: vacuuming "table_schema.table_name"
INFO: "table_name": found 0 removable, 29663 nonremovable row versions in
1754 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU 0.09s/0.09u sec elapsed 0.32 sec.
VACUUM
I think the question to address may be: "Why does the check_postgres query
think there should only be 334 pages instead of 1754?"
The tbloat and wastedbytes calculations provided in the query from this
page: http://wiki.postgresql.org/wiki/Show_database_bloat seem to
correlate with the wasted bytes reported by nagios, though.
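For comparison, the planner's own notion of the table's size (what ANALYZE last recorded) can be checked directly; a sketch using the same sanitized names:

```sql
-- relpages/reltuples are what the last ANALYZE recorded for this table.
SELECT relpages, reltuples
FROM pg_class
WHERE oid = 'table_schema.table_name'::regclass;
```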
Greg Haase
On Tue, Oct 29, 2013 at 7:06 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
On 10/29/2013 12:41 PM, Gregory Haase wrote:
db_name=# VACUUM FULL VERBOSE table_schema.table_name;
INFO: vacuuming "table_schema.table_name"
INFO: "table_name": found 2 removable, 29663 nonremovable row
versions in 1754 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU 0.07s/0.10u sec elapsed 0.30 sec.
Is there an old transaction pending? That 'masks' vacuum from touching
any tuples newer than the start of that transaction.
--
john r pierce 37N 122W
somewhere on the middle of the left coast
So, between yesterday and today we actually failed over to our hot-standby
instance and the issue hasn't changed. I don't think you can have a pending
transaction across streaming replication.
On Tue, Oct 29, 2013 at 12:49 PM, John R Pierce <pierce@hogranch.com> wrote:
One more thing I just tried:
create table table_schema.table_name_new (like table_schema.table_name);
insert into table_schema.table_name_new select * from table_schema.table_name;
The new table shows the same amount of wasted bytes and pages as the old
one, so based on that I'm going to throw out any notion of updates or
deletes as the cause of bloat on this particular table.
-G
John R Pierce <pierce@hogranch.com> writes:
Is there an old transaction pending? That 'masks' vacuum from touching
any tuples newer than the start of that transaction.
If old transactions were the problem, vacuum would be reporting that
some-large-number of dead row versions couldn't be removed yet.
There doesn't seem to be anything obviously wrong here.
regards, tom lane
I've isolated the problem to the json field not showing up in pg_stats,
which affects the calculation of the avg row size in the bloat query.
I'm not sure if this is a json issue or some other kind of issue.
db_name=# select c.column_name, c.data_type
          from information_schema.columns c
          where c.table_name = 'table_name'
            and not exists (select 1 from pg_stats s
                            where c.table_name = s.tablename
                              and c.column_name = s.attname);
column_name | data_type
-------------+-----------
criteria | json
(1 row)
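For what it's worth, whether the relation even has a toast table (and how big it is) can be checked from the catalog; a sketch using the same sanitized names:

```sql
-- reltoastrelid is 0 when the table has no toast table.
SELECT reltoastrelid::regclass AS toast_table,
       pg_size_pretty(pg_relation_size(nullif(reltoastrelid, 0))) AS toast_size
FROM pg_class
WHERE oid = 'table_schema.table_name'::regclass;
```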
-G
On Tue, Oct 29, 2013 at 1:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gregory Haase <haaseg@onefreevoice.com> writes:
I've isolated the problem to the json field not showing up in pg_stats,
which affects the calculation of the avg row size in the bloat query.
I'm not sure if this is a json issue or some other kind of issue.
Possibly your "bloat query" is failing to consider the toast table
associated with this table? If the json values are large they'd
mostly be in the toast table not the main table.
(It's unfortunate that VACUUM FULL doesn't tell you about what's
in the toast table. I'd try just VACUUM VERBOSE here, without the
FULL, to get more info.)
regards, tom lane
Tom is correct: VACUUM VERBOSE shows that there is an associated toast
table. Neither the check_postgres.pl script nor the query on
http://wiki.postgresql.org/wiki/Show_database_bloat appears to take this
into consideration. Both rely on null_frac and avg_width from pg_stats to
estimate how big the table should be. I'm not sure how you would factor the
toast table into that estimate.
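One rough way to fold toast into the picture is to measure the pieces separately; a sketch, again assuming the sanitized names from earlier:

```sql
-- Heap, toast, and index sizes broken out for one table.
SELECT pg_size_pretty(pg_relation_size(c.oid)) AS heap,
       pg_size_pretty(coalesce(pg_relation_size(nullif(c.reltoastrelid, 0)), 0)) AS toast,
       pg_size_pretty(pg_indexes_size(c.oid)) AS indexes
FROM pg_class c
WHERE c.oid = 'table_schema.table_name'::regclass;
```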
-G
I spent some more time on this today, and I realized that the issue isn't
that there are records in the toast table. The issue is that there are NO
records in the toast table. Apparently, all the JSON values we are
inserting are too small to be toasted.
I setup a separate benchmark locally:
create table test_json_stats
(
test_json_stats_id serial,
json_data json,
insert_timestamp timestamp default now() not null
);
created a file called "json_bench" with the following:
BEGIN;
insert into test_json_stats (json_data) values ('{"counters": [ {
"first":"1","second":"2"}, { "first":"3","second":"4"}, {
"first":"5","second":"6"}, { "first":"7","second":"8"}, {
"first":"9","second":"10"}, { "first":"11","second":"12"}, {
"first":"13","second":"14"}, { "first":"15","second":"16"}, {
"first":"17","second":"18"}, { "first":"19","second":"20"}, {
"first":"21","second":"22"}, { "first":"23","second":"24"}, {
"first":"25","second":"26"}, { "first":"27","second":"28"}, {
"first":"29","second":"30"}, { "first":"31","second":"32"}, {
"first":"33","second":"34"}, { "first":"35","second":"36"}, {
"first":"37","second":"38"}, { "first":"39","second":"40"}, {
"first":"41","second":"42"}, { "first":"43","second":"44"}, {
"first":"45","second":"46"} ] }');
END;
Then ran pgbench:
pgbench -c 5 -t 2000000 -f json_bench greg
VACUUM VERBOSE shows that the test_json_stats table has over a million
pages while the toast table exists with zero pages:
INFO: vacuuming "public.test_json_stats"
INFO: "test_json_stats": found 0 removable, 0 nonremovable row versions in
0 out of 1010011 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.01u sec elapsed 0.01 sec.
INFO: vacuuming "pg_toast.pg_toast_51822"
INFO: index "pg_toast_51822_index" now contains 0 row versions in 1 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_toast_51822": found 0 removable, 0 nonremovable row versions in
0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
The json_data column is not accounted for in pg_stats:
select attname, null_frac, avg_width from pg_stats where tablename =
'test_json_stats';
attname | null_frac | avg_width
--------------------+-----------+-----------
test_json_stats_id | 0 | 4
insert_timestamp | 0 | 8
(2 rows)
So I'm not sure if I'd actually qualify this as a "bug", but it appears
that there is no way to currently get stats on a json data type.
I subsequently inserted a very large json value into the table that
consumed 2 pages in pg_toast_51822, but there still doesn't appear to be
any way to get stats on the column.
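In the meantime, the two statistics ANALYZE could plausibly collect can be computed by hand with a sequential scan; a sketch against the benchmark table above:

```sql
-- pg_column_size gives the on-disk (possibly compressed) size of each value.
SELECT avg(pg_column_size(json_data)) AS avg_width,
       sum((json_data IS NULL)::int)::float / count(*) AS null_frac
FROM test_json_stats;
```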
Greg Haase
Gregory Haase <haaseg@onefreevoice.com> writes:
The json_data column is not accounted for in pg_stats:
Ah! I hadn't twigged to the fact that your bloat measurement approach
assumed you had pg_stats entries for all the columns.
So I'm not sure if I'd actually qualify this as a "bug", but it appears
that there is no way to currently get stats on a json data type.
ANALYZE currently punts on columns that don't have an equality operator,
which json does not. There isn't that much in the way of stats that we
could collect, though I suppose we could still compute average datum width
and null fraction.
I'm not sure whether there are plans to invent an equality operator for
json.
regards, tom lane
On Fri, Nov 1, 2013 at 1:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
IMNSHO, this may not be a bug, but it's pretty close. All base types
should have an equality operator, as well as other supporting
infrastructure that the database itself depends on (in/out and
send/receive, for example). This is a pretty good example of why.
merlin
Along the lines of the equality operator: I have run into issues trying to
pivot a table/result set with a json type due to what seemed to be no
equality operator.
On Nov 4, 2013 10:14 AM, "Merlin Moncure" <mmoncure@gmail.com> wrote:
--
Sent from Gmail Mobile
ajelinek@gmail.com wrote
Along the lines of the equality operator; I have run into issues trying to
pivot a table/result set with a json type due to what seemed to be no
equality operator.
For the curious, and also use-case considerations for development, would you
be able to share what it is you are doing (and how) that combines full json
documents with pivoting?
Compound types holding source data for a pivot seems problematic since
generally all the pivot components are single-valued and, for data, often
numerical.
David J.
On Mon, Nov 4, 2013 at 12:14 PM, David Johnston <polobo@yahoo.com> wrote:
Would also like to see this. The json type has completely displaced
crosstab in my usage. I don't typically pivot json though: I pivot the
raw data and then transform to json. With limited exceptions I consider
storing json in actual table rows to be an anti-pattern (but it should
still work if you do it).
merlin
I could not figure out what I was doing last month to reproduce this. So I
did a small pivot POC, and it is erroring on the max function, so it is
probably not the same issue. My guess is I tried using the GREATEST
function as a hail mary (which would not have worked), got the following
message: ERROR: could not identify a comparison function for type json,
and then thought/hoped it was the same thing when reading the emails.
CREATE TABLE bad_table_json(id int, detail_type text, details json);
INSERT INTO bad_table_json values(1, 'a', '{"a":1}'::json);
INSERT INTO bad_table_json values(1, 'b', '{"a":1}'::json);
INSERT INTO bad_table_json values(1, 'c', '{"a":1}'::json);

SELECT id
      ,MAX(CASE WHEN detail_type = 'a' THEN details END) AS a
      ,MAX(CASE WHEN detail_type = 'b' THEN details END) AS b
      ,MAX(CASE WHEN detail_type = 'c' THEN details END) AS c
FROM bad_table_json
GROUP BY id;
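One possible workaround, since text does have comparison operators, is to aggregate the text form of the json and cast back afterward; a sketch, with the caveat that text comparison depends on key order and whitespace:

```sql
-- MAX works on text, so cast json out and back around the aggregate.
SELECT id
      ,MAX(CASE WHEN detail_type = 'a' THEN details::text END)::json AS a
      ,MAX(CASE WHEN detail_type = 'b' THEN details::text END)::json AS b
      ,MAX(CASE WHEN detail_type = 'c' THEN details::text END)::json AS c
FROM bad_table_json
GROUP BY id;
```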
While I agree that an equality operator doesn't really make sense for json
operationally, there are certain maintenance reasons why it may come in
handy. Removing duplicate records comes to mind.
Other than adding basic stats to the necessary columns, I would say that
the equality operator is really one of the most basic tenets of a
relational database and should probably exist for any data type,
regardless of its given usefulness.
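For example, such deduplication can still be done today by comparing the text form of the json column; a sketch against the sanitized table from earlier, with the caveat that textual equality is stricter than semantic json equality (key order and whitespace matter):

```sql
-- Hypothetical cleanup: keep the lowest id among textually identical rows.
DELETE FROM table_schema.table_name a
USING table_schema.table_name b
WHERE a.id > b.id
  AND a.type = b.type
  AND a.criteria::text = b.criteria::text;
```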
Greg Haase
On Nov 4, 2013 6:31 PM, "ajelinek@gmail.com" <ajelinek@gmail.com> wrote:
On Mon, Nov 4, 2013 at 8:31 PM, ajelinek@gmail.com <ajelinek@gmail.com> wrote:
Aside: here's a way to do that type of thing. It's not necessarily
faster, but it seems cleaner to me, and it bypasses the need for json
comparison. IMMUTABLE plpgsql is generally the fastest way to
implement highly iterated trivial functions.
CREATE OR REPLACE FUNCTION PickInternal(State anyelement, WantValue TEXT,
                                        PickValue TEXT, Value anyelement)
RETURNS anyelement AS
$$
BEGIN
  RETURN CASE WHEN WantValue = PickValue THEN Value ELSE State END;
END;
$$ LANGUAGE PLPGSQL IMMUTABLE;

CREATE AGGREGATE Pick(TEXT, TEXT, anyelement) (
  SFUNC=PickInternal,
  SType=anyelement);
SELECT id
,Pick('a', detail_type, details) AS a
,Pick('b', detail_type, details) AS b
,Pick('c', detail_type, details) AS c
FROM bad_table_json
GROUP BY id;
merlin