problem with partitioned table and indexed json field

Started by Raphael Bauduinover 12 years ago12 messagesgeneral
Jump to latest
#1Raphael Bauduin
rblists@gmail.com

Hi,

I have a partitioned table events, with one partition for each month, eg
events_2013_03. The partition is done on the field timestamp, and
constraints are set, but insertion of data is done in the partition
directly (so not with a trigger on the events table)
The field event is of type json, and has a field '_id', which I can access:

=> select event->>'_id' from events limit 1;
?column?
--------------------------
4f9a786f44650105b50aafc9

I created an index on each partition of the table, but not on the events
table itself:
create index events_${y}_${m}_event_id_index on events_${y}_${m}
((event->>'_id'));

Querying the max event_id from a partition works fine:
=> select max(event->>'_id') from events_2013_03;
max
--------------------------
5158cdfe4465012cff522b74

However, requesting on the parent table does return the whole json field,
and not only the '_id':
=> select max(event->>'_id') from events;
{"_id":"526eb3ad4465013e3e131a43","origin":..... }

An explain returns an error:
=> explain select max(event->>'_id') from events;
ERROR: no tlist entry for key 2

This problem appeared when I created the indexes, and removing the index
make the explain work fine, but the plan implies a sequential scan on the
tables which is exactly what I wanted to avoid with the indexes.

Does someone have an explanation, and possibly a way to solve this problem?

thanks

Raph

#2Merlin Moncure
mmoncure@gmail.com
In reply to: Raphael Bauduin (#1)
Re: problem with partitioned table and indexed json field

On Thu, Oct 31, 2013 at 5:46 AM, Raphael Bauduin <rblists@gmail.com> wrote:

Hi,

I have a partitioned table events, with one partition for each month, eg
events_2013_03. The partition is done on the field timestamp, and
constraints are set, but insertion of data is done in the partition directly
(so not with a trigger on the events table)
The field event is of type json, and has a field '_id', which I can access:

=> select event->>'_id' from events limit 1;
?column?
--------------------------
4f9a786f44650105b50aafc9

I created an index on each partition of the table, but not on the events
table itself:
create index events_${y}_${m}_event_id_index on events_${y}_${m}
((event->>'_id'));

Querying the max event_id from a partition works fine:
=> select max(event->>'_id') from events_2013_03;
max
--------------------------
5158cdfe4465012cff522b74

However, requesting on the parent table does return the whole json field,
and not only the '_id':
=> select max(event->>'_id') from events;
{"_id":"526eb3ad4465013e3e131a43","origin":..... }

An explain returns an error:
=> explain select max(event->>'_id') from events;
ERROR: no tlist entry for key 2

This problem appeared when I created the indexes, and removing the index
make the explain work fine, but the plan implies a sequential scan on the
tables which is exactly what I wanted to avoid with the indexes.

Does someone have an explanation, and possibly a way to solve this problem?

wow, that looks like a bug. Can you post the specific postgres version?

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Raphael Bauduin
rblists@gmail.com
In reply to: Merlin Moncure (#2)
Re: problem with partitioned table and indexed json field

It's postgresql 9.3, from the pgdg apt repository:
9.3.0-2.pgdg10.4+1

Raph

On Thu, Oct 31, 2013 at 1:48 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Thu, Oct 31, 2013 at 5:46 AM, Raphael Bauduin <rblists@gmail.com>
wrote:

Hi,

I have a partitioned table events, with one partition for each month, eg
events_2013_03. The partition is done on the field timestamp, and
constraints are set, but insertion of data is done in the partition

directly

(so not with a trigger on the events table)
The field event is of type json, and has a field '_id', which I can

access:

=> select event->>'_id' from events limit 1;
?column?
--------------------------
4f9a786f44650105b50aafc9

I created an index on each partition of the table, but not on the events
table itself:
create index events_${y}_${m}_event_id_index on events_${y}_${m}
((event->>'_id'));

Querying the max event_id from a partition works fine:
=> select max(event->>'_id') from events_2013_03;
max
--------------------------
5158cdfe4465012cff522b74

However, requesting on the parent table does return the whole json field,
and not only the '_id':
=> select max(event->>'_id') from events;
{"_id":"526eb3ad4465013e3e131a43","origin":..... }

An explain returns an error:
=> explain select max(event->>'_id') from events;
ERROR: no tlist entry for key 2

This problem appeared when I created the indexes, and removing the index
make the explain work fine, but the plan implies a sequential scan on the
tables which is exactly what I wanted to avoid with the indexes.

Does someone have an explanation, and possibly a way to solve this

problem?

wow, that looks like a bug. Can you post the specific postgres version?

merlin

--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Raphael Bauduin (#1)
Re: problem with partitioned table and indexed json field

Raphael Bauduin <rblists@gmail.com> writes:

An explain returns an error:
=> explain select max(event->>'_id') from events;
ERROR: no tlist entry for key 2

This is certainly a bug. Can we see a self-contained example that
triggers that?

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Raphael Bauduin
rblists@gmail.com
In reply to: Tom Lane (#4)
Re: problem with partitioned table and indexed json field

I'll look at providing such an example later this week.

Raph

On Thu, Oct 31, 2013 at 3:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Raphael Bauduin <rblists@gmail.com> writes:

An explain returns an error:
=> explain select max(event->>'_id') from events;
ERROR: no tlist entry for key 2

This is certainly a bug. Can we see a self-contained example that
triggers that?

regards, tom lane

--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org

#6Raphael Bauduin
rblists@gmail.com
In reply to: Raphael Bauduin (#5)
Re: problem with partitioned table and indexed json field

Hi,

I have narrowed it a bit. It happens when I create said index on an empty
field. Here's the scenario to reproduce it:

Let me know if you need more info

Cheers

Raph

create table events(id SERIAL,
timestamp timestamp,
event json);

create table events_2012_01( CHECK (timestamp>='2012-01-01' and
timestamp<'2012-2-01' )) inherits (events) ;
create table events_2012_02( CHECK (timestamp>='2012-02-01' and
timestamp<'2012-3-01' )) inherits (events) ;
insert into events_2012_01 (timestamp, event) values ('2012-01-22
08:38:56', '{"_id":"4f93c3a044650105b5074c9a","type":"t1"}');
insert into events_2012_02 (timestamp, event) values ('2012-02-22
08:38:56', '{"_id":"5f93c3a044650105b5074c9a","type":"t2"}');

-- create empty table
create table events_2012_03( CHECK (timestamp>='2012-03-01' and
timestamp<'2012-4-01' )) inherits (events) ;
explain select max(event->>'_id') from events where event is not null;
--OK

--create index
create index events_2012_03_event_id_index on events_2012_03
((event->>'_id'));
explain select max(event->>'_id') from events where event is not null;
--BANG

drop index events_2012_03_event_id_index;
explain select max(event->>'_id') from events where event is not null;
--OK

On Mon, Nov 4, 2013 at 8:39 AM, Raphael Bauduin <rblists@gmail.com> wrote:

I'll look at providing such an example later this week.

Raph

On Thu, Oct 31, 2013 at 3:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Raphael Bauduin <rblists@gmail.com> writes:

An explain returns an error:
=> explain select max(event->>'_id') from events;
ERROR: no tlist entry for key 2

This is certainly a bug. Can we see a self-contained example that
triggers that?

regards, tom lane

--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org

--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org

#7Raphael Bauduin
rblists@gmail.com
In reply to: Raphael Bauduin (#6)
Re: problem with partitioned table and indexed json field

Correction: It happens when I create said index on an empty *table*.

Raph

On Thu, Nov 7, 2013 at 11:10 AM, Raphael Bauduin <rblists@gmail.com> wrote:

Hi,

I have narrowed it a bit. It happens when I create said index on an empty
field. Here's the scenario to reproduce it:

Let me know if you need more info

Cheers

Raph

create table events(id SERIAL,
timestamp timestamp,
event json);

create table events_2012_01( CHECK (timestamp>='2012-01-01' and
timestamp<'2012-2-01' )) inherits (events) ;
create table events_2012_02( CHECK (timestamp>='2012-02-01' and
timestamp<'2012-3-01' )) inherits (events) ;
insert into events_2012_01 (timestamp, event) values ('2012-01-22
08:38:56', '{"_id":"4f93c3a044650105b5074c9a","type":"t1"}');
insert into events_2012_02 (timestamp, event) values ('2012-02-22
08:38:56', '{"_id":"5f93c3a044650105b5074c9a","type":"t2"}');

-- create empty table
create table events_2012_03( CHECK (timestamp>='2012-03-01' and
timestamp<'2012-4-01' )) inherits (events) ;
explain select max(event->>'_id') from events where event is not null;
--OK

--create index
create index events_2012_03_event_id_index on events_2012_03
((event->>'_id'));
explain select max(event->>'_id') from events where event is not null;
--BANG

drop index events_2012_03_event_id_index;
explain select max(event->>'_id') from events where event is not null;
--OK

On Mon, Nov 4, 2013 at 8:39 AM, Raphael Bauduin <rblists@gmail.com> wrote:

I'll look at providing such an example later this week.

Raph

On Thu, Oct 31, 2013 at 3:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Raphael Bauduin <rblists@gmail.com> writes:

An explain returns an error:
=> explain select max(event->>'_id') from events;
ERROR: no tlist entry for key 2

This is certainly a bug. Can we see a self-contained example that
triggers that?

regards, tom lane

--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org

--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org

--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Raphael Bauduin (#6)
Re: problem with partitioned table and indexed json field

Raphael Bauduin <rblists@gmail.com> writes:

I have narrowed it a bit. It happens when I create said index on an empty
field. Here's the scenario to reproduce it:

Thanks, I've reproduced the problem here. The query still seems to run OK,
it's just EXPLAIN that's falling over --- do you see the same?

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Raphael Bauduin
rblists@gmail.com
In reply to: Tom Lane (#8)
Re: problem with partitioned table and indexed json field

The query is also problematic here, because it returns the full json, and
not only the data I selected in the json.
Below, it should return only '_id', and not the whole json stored in event:

test3=> select max(event->>'_id') from events where event is not null;
max
------------------------------------------------
{"_id":"5f93c3a044650105b5074c9a","type":"t2"}

Thanks

raph

On Thu, Nov 7, 2013 at 4:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Raphael Bauduin <rblists@gmail.com> writes:

I have narrowed it a bit. It happens when I create said index on an empty
field. Here's the scenario to reproduce it:

Thanks, I've reproduced the problem here. The query still seems to run OK,
it's just EXPLAIN that's falling over --- do you see the same?

regards, tom lane

--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Raphael Bauduin (#9)
Re: problem with partitioned table and indexed json field

Raphael Bauduin <rblists@gmail.com> writes:

The query is also problematic here, because it returns the full json, and
not only the data I selected in the json.

Doh, right, you mentioned that in the original bug report, and now that
I'm paying a bit more attention I see it too. I was looking for
some sort of error from running the query, not just wrong data.

It looks like the problem is we're building a MergeAppend plan and not
getting the targetlist for the MergeAppend node right. I hacked EXPLAIN
very quickly to not fall over when it fails to find a sort key in the
node's targetlist, and here's what I see:

regression=# explain verbose select max(event->>'_id') from events where event is not null;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=58.75..58.76 rows=1 width=0)
Output: $0
InitPlan 1 (returns $0)
-> Limit (cost=58.70..58.75 rows=1 width=32)
Output: events.event
-> Merge Append (cost=58.70..200.88 rows=3268 width=32)
Sort Key: [no tlist entry for key 2]
-> Sort (cost=0.01..0.02 rows=1 width=32)
Output: events.event, ((events.event ->> '_id'::text))
Sort Key: ((events.event ->> '_id'::text))
-> Seq Scan on public.events (cost=0.00..0.00 rows=1 width=32)
Output: events.event, (events.event ->> '_id'::text)
Filter: ((events.event IS NOT NULL) AND ((events.event ->> '_id'::text) IS NOT NULL))
-> Sort (cost=29.20..31.92 rows=1089 width=32)
Output: events_2012_01.event, ((events_2012_01.event ->> '_id'::text))
Sort Key: ((events_2012_01.event ->> '_id'::text))
-> Seq Scan on public.events_2012_01 (cost=0.00..23.75 rows=1089 width=32)
Output: events_2012_01.event, (events_2012_01.event ->> '_id'::text)
Filter: ((events_2012_01.event IS NOT NULL) AND ((events_2012_01.event ->> '_id'::text) IS NOT NULL))
-> Sort (cost=29.20..31.92 rows=1089 width=32)
Output: events_2012_02.event, ((events_2012_02.event ->> '_id'::text))
Sort Key: ((events_2012_02.event ->> '_id'::text))
-> Seq Scan on public.events_2012_02 (cost=0.00..23.75 rows=1089 width=32)
Output: events_2012_02.event, (events_2012_02.event ->> '_id'::text)
Filter: ((events_2012_02.event IS NOT NULL) AND ((events_2012_02.event ->> '_id'::text) IS NOT NULL))
-> Index Scan Backward using events_2012_03_event_id_index on public.events_2012_03 (cost=0.15..63.30 rows=1089 width=32)
Output: events_2012_03.event, (events_2012_03.event ->> '_id'::text)
Index Cond: ((events_2012_03.event ->> '_id'::text) IS NOT NULL)
Filter: (events_2012_03.event IS NOT NULL)
(29 rows)

So everything looks right for the individual table-scan subplans, but
something's going badly wrong when making the MergeAppend ...
dunno what yet.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#10)
Re: problem with partitioned table and indexed json field

I wrote:

It looks like the problem is we're building a MergeAppend plan and not
getting the targetlist for the MergeAppend node right.

Found it --- simple oversight in building optimized min/max plans.
If you need a patch now, see
http://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=5d0731da521f090f80ea39529fe274ac6d6bffa1

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#12Raphael Bauduin
rblists@gmail.com
In reply to: Tom Lane (#11)
Re: problem with partitioned table and indexed json field

On Thu, Nov 7, 2013 at 7:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

It looks like the problem is we're building a MergeAppend plan and not
getting the targetlist for the MergeAppend node right.

Found it --- simple oversight in building optimized min/max plans.
If you need a patch now, see

http://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=5d0731da521f090f80ea39529fe274ac6d6bffa1

Wow, the patch is available thes same day I supplied the steps to reproduce
the bug! I don't think it's possible to be faster :-)
Thanks a alot!

Raph

regards, tom lane

--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org