problem with partitioned table and indexed json field
Hi,
I have a partitioned table events, with one partition for each month, eg
events_2013_03. The partition is done on the field timestamp, and
constraints are set, but insertion of data is done in the partition
directly (so not with a trigger on the events table)
The field event is of type json, and has a field '_id', which I can access:
=> select event->>'_id' from events limit 1;
?column?
--------------------------
4f9a786f44650105b50aafc9
I created an index on each partition of the table, but not on the events
table itself:
create index events_${y}_${m}_event_id_index on events_${y}_${m}
((event->>'_id'));
Querying the max event_id from a partition works fine:
=> select max(event->>'_id') from events_2013_03;
max
--------------------------
5158cdfe4465012cff522b74
However, requesting on the parent table does return the whole json field,
and not only the '_id':
=> select max(event->>'_id') from events;
{"_id":"526eb3ad4465013e3e131a43","origin":..... }
An explain returns an error:
=> explain select max(event->>'_id') from events;
ERROR: no tlist entry for key 2
This problem appeared when I created the indexes, and removing the index
make the explain work fine, but the plan implies a sequential scan on the
tables which is exactly what I wanted to avoid with the indexes.
Does someone have an explanation, and possibly a way to solve this problem?
thanks
Raph
On Thu, Oct 31, 2013 at 5:46 AM, Raphael Bauduin <rblists@gmail.com> wrote:
Hi,
I have a partitioned table events, with one partition for each month, eg
events_2013_03. The partition is done on the field timestamp, and
constraints are set, but insertion of data is done in the partition directly
(so not with a trigger on the events table)
The field event is of type json, and has a field '_id', which I can access:=> select event->>'_id' from events limit 1;
?column?
--------------------------
4f9a786f44650105b50aafc9I created an index on each partition of the table, but not on the events
table itself:
create index events_${y}_${m}_event_id_index on events_${y}_${m}
((event->>'_id'));Querying the max event_id from a partition works fine:
=> select max(event->>'_id') from events_2013_03;
max
--------------------------
5158cdfe4465012cff522b74However, requesting on the parent table does return the whole json field,
and not only the '_id':
=> select max(event->>'_id') from events;
{"_id":"526eb3ad4465013e3e131a43","origin":..... }An explain returns an error:
=> explain select max(event->>'_id') from events;
ERROR: no tlist entry for key 2This problem appeared when I created the indexes, and removing the index
make the explain work fine, but the plan implies a sequential scan on the
tables which is exactly what I wanted to avoid with the indexes.Does someone have an explanation, and possibly a way to solve this problem?
wow, that looks like a bug. Can you post the specific postgres version?
merlin
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
It's postgresql 9.3, from the pgdg apt repository:
9.3.0-2.pgdg10.4+1
Raph
On Thu, Oct 31, 2013 at 1:48 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Thu, Oct 31, 2013 at 5:46 AM, Raphael Bauduin <rblists@gmail.com>
wrote:Hi,
I have a partitioned table events, with one partition for each month, eg
events_2013_03. The partition is done on the field timestamp, and
constraints are set, but insertion of data is done in the partitiondirectly
(so not with a trigger on the events table)
The field event is of type json, and has a field '_id', which I canaccess:
=> select event->>'_id' from events limit 1;
?column?
--------------------------
4f9a786f44650105b50aafc9I created an index on each partition of the table, but not on the events
table itself:
create index events_${y}_${m}_event_id_index on events_${y}_${m}
((event->>'_id'));Querying the max event_id from a partition works fine:
=> select max(event->>'_id') from events_2013_03;
max
--------------------------
5158cdfe4465012cff522b74However, requesting on the parent table does return the whole json field,
and not only the '_id':
=> select max(event->>'_id') from events;
{"_id":"526eb3ad4465013e3e131a43","origin":..... }An explain returns an error:
=> explain select max(event->>'_id') from events;
ERROR: no tlist entry for key 2This problem appeared when I created the indexes, and removing the index
make the explain work fine, but the plan implies a sequential scan on the
tables which is exactly what I wanted to avoid with the indexes.Does someone have an explanation, and possibly a way to solve this
problem?
wow, that looks like a bug. Can you post the specific postgres version?
merlin
--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org
Raphael Bauduin <rblists@gmail.com> writes:
An explain returns an error:
=> explain select max(event->>'_id') from events;
ERROR: no tlist entry for key 2
This is certainly a bug. Can we see a self-contained example that
triggers that?
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
I'll look at providing such an example later this week.
Raph
On Thu, Oct 31, 2013 at 3:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Raphael Bauduin <rblists@gmail.com> writes:
An explain returns an error:
=> explain select max(event->>'_id') from events;
ERROR: no tlist entry for key 2This is certainly a bug. Can we see a self-contained example that
triggers that?regards, tom lane
--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org
Hi,
I have narrowed it a bit. It happens when I create said index on an empty
field. Here's the scenario to reproduce it:
Let me know if you need more info
Cheers
Raph
create table events(id SERIAL,
timestamp timestamp,
event json);
create table events_2012_01( CHECK (timestamp>='2012-01-01' and
timestamp<'2012-2-01' )) inherits (events) ;
create table events_2012_02( CHECK (timestamp>='2012-02-01' and
timestamp<'2012-3-01' )) inherits (events) ;
insert into events_2012_01 (timestamp, event) values ('2012-01-22
08:38:56', '{"_id":"4f93c3a044650105b5074c9a","type":"t1"}');
insert into events_2012_02 (timestamp, event) values ('2012-02-22
08:38:56', '{"_id":"5f93c3a044650105b5074c9a","type":"t2"}');
-- create empty table
create table events_2012_03( CHECK (timestamp>='2012-03-01' and
timestamp<'2012-4-01' )) inherits (events) ;
explain select max(event->>'_id') from events where event is not null;
--OK
--create index
create index events_2012_03_event_id_index on events_2012_03
((event->>'_id'));
explain select max(event->>'_id') from events where event is not null;
--BANG
drop index events_2012_03_event_id_index;
explain select max(event->>'_id') from events where event is not null;
--OK
On Mon, Nov 4, 2013 at 8:39 AM, Raphael Bauduin <rblists@gmail.com> wrote:
I'll look at providing such an example later this week.
Raph
On Thu, Oct 31, 2013 at 3:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Raphael Bauduin <rblists@gmail.com> writes:
An explain returns an error:
=> explain select max(event->>'_id') from events;
ERROR: no tlist entry for key 2This is certainly a bug. Can we see a self-contained example that
triggers that?regards, tom lane
--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org
--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org
Correction: It happens when I create said index on an empty *table*.
Raph
On Thu, Nov 7, 2013 at 11:10 AM, Raphael Bauduin <rblists@gmail.com> wrote:
Hi,
I have narrowed it a bit. It happens when I create said index on an empty
field. Here's the scenario to reproduce it:Let me know if you need more info
Cheers
Raph
create table events(id SERIAL,
timestamp timestamp,
event json);create table events_2012_01( CHECK (timestamp>='2012-01-01' and
timestamp<'2012-2-01' )) inherits (events) ;
create table events_2012_02( CHECK (timestamp>='2012-02-01' and
timestamp<'2012-3-01' )) inherits (events) ;
insert into events_2012_01 (timestamp, event) values ('2012-01-22
08:38:56', '{"_id":"4f93c3a044650105b5074c9a","type":"t1"}');
insert into events_2012_02 (timestamp, event) values ('2012-02-22
08:38:56', '{"_id":"5f93c3a044650105b5074c9a","type":"t2"}');-- create empty table
create table events_2012_03( CHECK (timestamp>='2012-03-01' and
timestamp<'2012-4-01' )) inherits (events) ;
explain select max(event->>'_id') from events where event is not null;
--OK--create index
create index events_2012_03_event_id_index on events_2012_03
((event->>'_id'));
explain select max(event->>'_id') from events where event is not null;
--BANGdrop index events_2012_03_event_id_index;
explain select max(event->>'_id') from events where event is not null;
--OKOn Mon, Nov 4, 2013 at 8:39 AM, Raphael Bauduin <rblists@gmail.com> wrote:
I'll look at providing such an example later this week.
Raph
On Thu, Oct 31, 2013 at 3:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Raphael Bauduin <rblists@gmail.com> writes:
An explain returns an error:
=> explain select max(event->>'_id') from events;
ERROR: no tlist entry for key 2This is certainly a bug. Can we see a self-contained example that
triggers that?regards, tom lane
--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org
--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org
Raphael Bauduin <rblists@gmail.com> writes:
I have narrowed it a bit. It happens when I create said index on an empty
field. Here's the scenario to reproduce it:
Thanks, I've reproduced the problem here. The query still seems to run OK,
it's just EXPLAIN that's falling over --- do you see the same?
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
The query is also problematic here, because it returns the full json, and
not only the data I selected in the json.
Below, it should return only '_id', and not the whole json stored in event:
test3=> select max(event->>'_id') from events where event is not null;
max
------------------------------------------------
{"_id":"5f93c3a044650105b5074c9a","type":"t2"}
Thanks
raph
On Thu, Nov 7, 2013 at 4:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Raphael Bauduin <rblists@gmail.com> writes:
I have narrowed it a bit. It happens when I create said index on an empty
field. Here's the scenario to reproduce it:Thanks, I've reproduced the problem here. The query still seems to run OK,
it's just EXPLAIN that's falling over --- do you see the same?regards, tom lane
--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org
Raphael Bauduin <rblists@gmail.com> writes:
The query is also problematic here, because it returns the full json, and
not only the data I selected in the json.
Doh, right, you mentioned that in the original bug report, and now that
I'm paying a bit more attention I see it too. I was looking for
some sort of error from running the query, not just wrong data.
It looks like the problem is we're building a MergeAppend plan and not
getting the targetlist for the MergeAppend node right. I hacked EXPLAIN
very quickly to not fall over when it fails to find a sort key in the
node's targetlist, and here's what I see:
regression=# explain verbose select max(event->>'_id') from events where event is not null;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=58.75..58.76 rows=1 width=0)
Output: $0
InitPlan 1 (returns $0)
-> Limit (cost=58.70..58.75 rows=1 width=32)
Output: events.event
-> Merge Append (cost=58.70..200.88 rows=3268 width=32)
Sort Key: [no tlist entry for key 2]
-> Sort (cost=0.01..0.02 rows=1 width=32)
Output: events.event, ((events.event ->> '_id'::text))
Sort Key: ((events.event ->> '_id'::text))
-> Seq Scan on public.events (cost=0.00..0.00 rows=1 width=32)
Output: events.event, (events.event ->> '_id'::text)
Filter: ((events.event IS NOT NULL) AND ((events.event ->> '_id'::text) IS NOT NULL))
-> Sort (cost=29.20..31.92 rows=1089 width=32)
Output: events_2012_01.event, ((events_2012_01.event ->> '_id'::text))
Sort Key: ((events_2012_01.event ->> '_id'::text))
-> Seq Scan on public.events_2012_01 (cost=0.00..23.75 rows=1089 width=32)
Output: events_2012_01.event, (events_2012_01.event ->> '_id'::text)
Filter: ((events_2012_01.event IS NOT NULL) AND ((events_2012_01.event ->> '_id'::text) IS NOT NULL))
-> Sort (cost=29.20..31.92 rows=1089 width=32)
Output: events_2012_02.event, ((events_2012_02.event ->> '_id'::text))
Sort Key: ((events_2012_02.event ->> '_id'::text))
-> Seq Scan on public.events_2012_02 (cost=0.00..23.75 rows=1089 width=32)
Output: events_2012_02.event, (events_2012_02.event ->> '_id'::text)
Filter: ((events_2012_02.event IS NOT NULL) AND ((events_2012_02.event ->> '_id'::text) IS NOT NULL))
-> Index Scan Backward using events_2012_03_event_id_index on public.events_2012_03 (cost=0.15..63.30 rows=1089 width=32)
Output: events_2012_03.event, (events_2012_03.event ->> '_id'::text)
Index Cond: ((events_2012_03.event ->> '_id'::text) IS NOT NULL)
Filter: (events_2012_03.event IS NOT NULL)
(29 rows)
So everything looks right for the individual table-scan subplans, but
something's going badly wrong when making the MergeAppend ...
dunno what yet.
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
I wrote:
It looks like the problem is we're building a MergeAppend plan and not
getting the targetlist for the MergeAppend node right.
Found it --- simple oversight in building optimized min/max plans.
If you need a patch now, see
http://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=5d0731da521f090f80ea39529fe274ac6d6bffa1
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Thu, Nov 7, 2013 at 7:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
It looks like the problem is we're building a MergeAppend plan and not
getting the targetlist for the MergeAppend node right.Found it --- simple oversight in building optimized min/max plans.
If you need a patch now, see
Wow, the patch is available thes same day I supplied the steps to reproduce
the bug! I don't think it's possible to be faster :-)
Thanks a alot!
Raph
regards, tom lane
--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org