BUG #15475: Views over CITEXT columns return no data
The following bug has been logged on the website:
Bug reference: 15475
Logged by: Paul Schaap
Email address: ps@ipggroup.com
PostgreSQL version: 11.0
Operating system: OSX 10.14 and Ubuntu 18.10
Description:
The steps to reproduce are:
1) Install CITEXT extension
2) Create a table with a CITEXT column
3) Create a view (not materialised) over the table
4) Query the view filtering on the CITEXT column
No data gets returned when querying via the view, this works under
Postgresql 10 but not 11.
On Wed, Oct 31, 2018 at 6:45 PM PG Bug reporting form <
noreply@postgresql.org> wrote:
The following bug has been logged on the website:
Bug reference: 15475
Logged by: Paul Schaap
Email address: ps@ipggroup.com
PostgreSQL version: 11.0
Operating system: OSX 10.14 and Ubuntu 18.10
Description:The steps to reproduce are:
1) Install CITEXT extension
2) Create a table with a CITEXT column
3) Create a view (not materialised) over the table
4) Query the view filtering on the CITEXT columnNo data gets returned when querying via the view, this works under
Postgresql 10 but not 11.
There is no step here where you populate the table.
Add one, I can't reproduce this (on Ubuntu 16.04, so maybe that is
important).
create extension citext ;
create table j1 (x citext);
insert into j1 values ('abc');
create view j2 as select * from j1;
select * from j2 where x ='aBc';
<see a row>
What are your encoding, collation, and ctype?
Cheers,
Jeff
Hi Jeff,
That was fast :-)
I am trying to create a script that reproduces the issue as it seems it is not quite that simple, please bear with me.
datcollate en_US.UTF-8
datctype en_US.UTF-8
SERVER_ENCODING UTF8
CHEERS
Paul
On 1 Nov 2018, at 10:00, Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Oct 31, 2018 at 6:45 PM PG Bug reporting form <noreply@postgresql.org <mailto:noreply@postgresql.org>> wrote:
The following bug has been logged on the website:Bug reference: 15475
Logged by: Paul Schaap
Email address: ps@ipggroup.com <mailto:ps@ipggroup.com>
PostgreSQL version: 11.0
Operating system: OSX 10.14 and Ubuntu 18.10
Description:The steps to reproduce are:
1) Install CITEXT extension
2) Create a table with a CITEXT column
3) Create a view (not materialised) over the table
4) Query the view filtering on the CITEXT columnNo data gets returned when querying via the view, this works under
Postgresql 10 but not 11.There is no step here where you populate the table.
Add one, I can't reproduce this (on Ubuntu 16.04, so maybe that is important).
create extension citext ;
create table j1 (x citext);
insert into j1 values ('abc');
create view j2 as select * from j1;
select * from j2 where x ='aBc';<see a row>
What are your encoding, collation, and ctype?
Cheers,
Jeff
--
PRIVACY & CONFIDENTIALITY NOTICE
The information contained in this
email is intended for the named recipients only. It may contain
privileged and confidential information and if you are not the intended
recipient, you should not copy it or disclose its contents to any other
person, or take any action in reliance on it. If you have received this
email in error, please notify us immediately by return email.
Hi Jeff,
I have it now but it might be difficult for you to reproduce.
1) pg_dump a lookup table from a Postgres 10(.5) with -Fc and pg_restore to Postgres 11
2) Join to the table in a view, but when you join do an INTEGER cast
CREATE OR REPLACE VIEW citext_view AS
SELECT citext_table.*
FROM citext_table
LEFT OUTER JOIN lookuptable ON citext_table.some_link:INT = lookuptable.linkvalue;
3) Test with different quantities of data as it works with smaller quantities of data, but fails with larger:
-- Works with 1,000
INSERT INTO citext_table SELECT NULL, 'A', GENERATE_SERIES(1, 1000);
SELECT type_id, COUNT(*) FROM citext_view WHERE type_id NOT IN ('2', '3') GROUP BY type_id;
-- gets a 1000 value
-- Doesn't work with 1,000,000
TRUNCATE TABLE citext_table;
INSERT INTO citext_table SELECT NULL, 'A', GENERATE_SERIES(1, 1000000);
SELECT type_id, COUNT(*) FROM citext_view WHERE type_id NOT IN ('2', '3') GROUP BY type_id;
-- No rows come back
n.b. I see nothing in the postgresql.log
The workarounds I have discovered are:
A) pg_dump in 11 and restore back in 11
B) pg_dump from 10 but NOT -Fc (so SQL) and restore to 11
C) Reverse the logic on the JOIN to citext_table.some_link = lookuptable.linkvalue::CITEXT;
CHEERS
Paul
On 1 Nov 2018, at 10:06, Paul Schaap <ps@ipggroup.com> wrote:
Hi Jeff,
That was fast :-)
I am trying to create a script that reproduces the issue as it seems it is not quite that simple, please bear with me.
datcollate en_US.UTF-8
datctype en_US.UTF-8
SERVER_ENCODING UTF8CHEERS
PaulOn 1 Nov 2018, at 10:00, Jeff Janes <jeff.janes@gmail.com <mailto:jeff.janes@gmail.com>> wrote:
On Wed, Oct 31, 2018 at 6:45 PM PG Bug reporting form <noreply@postgresql.org <mailto:noreply@postgresql.org>> wrote:
The following bug has been logged on the website:Bug reference: 15475
Logged by: Paul Schaap
Email address: ps@ipggroup.com <mailto:ps@ipggroup.com>
PostgreSQL version: 11.0
Operating system: OSX 10.14 and Ubuntu 18.10
Description:The steps to reproduce are:
1) Install CITEXT extension
2) Create a table with a CITEXT column
3) Create a view (not materialised) over the table
4) Query the view filtering on the CITEXT columnNo data gets returned when querying via the view, this works under
Postgresql 10 but not 11.There is no step here where you populate the table.
Add one, I can't reproduce this (on Ubuntu 16.04, so maybe that is important).
create extension citext ;
create table j1 (x citext);
insert into j1 values ('abc');
create view j2 as select * from j1;
select * from j2 where x ='aBc';<see a row>
What are your encoding, collation, and ctype?
Cheers,
Jeff
--
PRIVACY & CONFIDENTIALITY NOTICE
The information contained in this
email is intended for the named recipients only. It may contain
privileged and confidential information and if you are not the intended
recipient, you should not copy it or disclose its contents to any other
person, or take any action in reliance on it. If you have received this
email in error, please notify us immediately by return email.
"Paul" == Paul Schaap <ps@ipggroup.com> writes:
Paul> Hi Jeff,
Paul> I have it now but it might be difficult for you to reproduce.
Paul> 1) pg_dump a lookup table from a Postgres 10(.5) with -Fc and
Paul> pg_restore to Postgres 11
Paul> 2) Join to the table in a view, but when you join do an INTEGER
Paul> cast
Paul> 3) Test with different quantities of data as it works with
Paul> smaller quantities of data, but fails with larger:
Are there any indexes on either table?
--
Andrew (irc:RhodiumToad)
Hi Andrew,
Yes there are indexes on both tables, and the issue only shows up with high volumes.
Note I have subsequently isolated the issue down to a pg_dump 10 to pg_restore 11 using -Fc.
Kind Regards,
PAUL SCHAAP / General Manager
IPG Solutions Pty Limited
Full Service eCommerce Payments Partner
Level 6, Blade Building, 69-71 Walker Street, North Sydney, NSW, Australia
W www.ipggroup.com <http://www.ipggroup.com/> | M +61-4-0307-0000 <tel:+61403070000>
On 2 Nov 2018, at 02:43, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
"Paul" == Paul Schaap <ps@ipggroup.com> writes:
Paul> Hi Jeff,
Paul> I have it now but it might be difficult for you to reproduce.Paul> 1) pg_dump a lookup table from a Postgres 10(.5) with -Fc and
Paul> pg_restore to Postgres 11Paul> 2) Join to the table in a view, but when you join do an INTEGER
Paul> castPaul> 3) Test with different quantities of data as it works with
Paul> smaller quantities of data, but fails with larger:Are there any indexes on either table?
--
Andrew (irc:RhodiumToad)
--
PRIVACY & CONFIDENTIALITY NOTICE
The information contained in this
email is intended for the named recipients only. It may contain
privileged and confidential information and if you are not the intended
recipient, you should not copy it or disclose its contents to any other
person, or take any action in reliance on it. If you have received this
email in error, please notify us immediately by return email.
"Paul" == Paul Schaap <ps@ipggroup.com> writes:
Paul> Hi Andrew,
Paul> Yes there are indexes on both tables, and the issue only shows up
Paul> with high volumes.
Does it show up with low volumes too if you do set enable_seqscan=false;
Paul> Note I have subsequently isolated the issue down to a pg_dump 10
Paul> to pg_restore 11 using -Fc.
No, I don't think you have. In particular, it's not plausible that using
-Fc makes any substantive difference, since the commands run for the
restore will not be any different to restoring a plain-text dump.
--
Andrew.
Hi Andrew,
I agree it seems implausible to me too however it is 100% reproducible, see below track of it happening, I have changed the names of things to protect the innocent ;-) :
my_server.rds.amazonaws.com = Postgres 10.5 RDS
Local Postgres 11.0 on OSX, also tried on Ubuntu with the same data and it exhibited the same issue
$ psql -c "DROP TABLE my_lookup CASCADE;" my_databasename
NOTICE: drop cascades to view my_events
DROP TABLE
$ pg_dump -h my_server.rds.amazonaws.com -U my_username -d my_databasename -t my_lookup -Fc | pg_restore -d my_databasename
Password:
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 225; 1259 27069 TABLE my_lookup my_username
pg_restore: [archiver (db)] could not execute query: ERROR: role "my_username" does not exist
Command was: ALTER TABLE public.my_lookup OWNER TO my_username;
WARNING: errors ignored on restore: 1
$ psql -c "CREATE VIEW my_events AS SELECT * FROM my_events_raw LEFT OUTER JOIN my_lookup ON my_citext::INT = my_int;" my_databasename
CREATE VIEW
HERE IS THE STRANGE ANSWER:
$ psql -c "SELECT COUNT(*)
FROM my_events
WHERE my_date >= '2018-08-01'
AND my_date < '2018-09-01'
AND type_id NOT IN ('A','LIST','OF','NOTS');" my_databasename
count
-------
0
(1 row)
BUT HANG ON WE DO GET DATA IF I LIMIT:
$ psql -c "SELECT *
FROM my_events
WHERE my_date >= '2018-08-01'
AND my_date < '2018-09-01'
AND type_id NOT IN ('A','LIST','OF','NOTS') LIMIT 10;" my_databasename
<snip>
(10 rows)
OK LETS RELOAD USING A SQL METHOD:
$ psql -c "DROP TABLE my_lookup CASCADE;" my_databasename
NOTICE: drop cascades to view my_events
DROP TABLE
$ pg_dump -h my_server.rds.amazonaws.com -U my_username -d my_databasename -t my_lookup | psql -d my_databasename
Password:
SET
SET
SET
SET
SET
set_config
------------
(1 row)
SET
SET
SET
SET
SET
CREATE TABLE
ERROR: role "my_username" does not exist
COPY 355109
ALTER TABLE
CREATE INDEX
JUST SO YOU KNOW HOW MANY my_events THERE ARE AS THIS RELIES ON SOME SCALE:
$ psql -c "SELECT COUNT(*) FROM my_events_raw;" my_databasename
count
-----------
162430247
(1 row)
$ psql -c "CREATE VIEW my_events AS SELECT * FROM my_events_raw LEFT OUTER JOIN my_lookup ON my_citext::INT = my_int;" my_databasename
CREATE VIEW
BINGO, THE RIGHT ANSWER:
$ psql -c "SELECT COUNT(*)
FROM my_events
WHERE my_date >= '2018-08-01'
AND my_date < '2018-09-01'
AND type_id NOT IN ('A','LIST','OF','NOTS');" my_databasename
count
--------
168878
(1 row)
The only other clues I have for you are :
my_events_raw is partitioned by month, e.g. FOR VALUES FROM ('2018-02-01 00:00:00+00' ) TO ('2018-03-01 00:00:00+00');
that if I switch the my_citext from CITEXT to TEXT column magically works again, and switching it back again to CITEXT breaks it again
That if I reload the table using any non -Fc method it all works, also no matter if I drop or truncate and load the table
There are no NULL my_int's, there are NULL my_citext's
That if I reverse my_citext::INT = my_int to my_citext = my_int::CITEXT it also works correctly
Is there perhaps any other logging or log level I could set that may help?
I know this is baffling and is a weird edge case, and there are plenty of work arounds that I found, but maybe it hints at another issue so seems worth investigating especially for a .0 release.
CHEERS
Paul
On 2 Nov 2018, at 09:26, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
"Paul" == Paul Schaap <ps@ipggroup.com> writes:
Paul> Hi Andrew,
Paul> Yes there are indexes on both tables, and the issue only shows up
Paul> with high volumes.Does it show up with low volumes too if you do set enable_seqscan=false;
Paul> Note I have subsequently isolated the issue down to a pg_dump 10
Paul> to pg_restore 11 using -Fc.No, I don't think you have. In particular, it's not plausible that using
-Fc makes any substantive difference, since the commands run for the
restore will not be any different to restoring a plain-text dump.--
Andrew.
--
PRIVACY & CONFIDENTIALITY NOTICE
The information contained in this
email is intended for the named recipients only. It may contain
privileged and confidential information and if you are not the intended
recipient, you should not copy it or disclose its contents to any other
person, or take any action in reliance on it. If you have received this
email in error, please notify us immediately by return email.
"Paul" == Paul Schaap <ps@ipggroup.com> writes:
Paul> Hi Andrew,
Paul> I agree it seems implausible to me too however it is 100%
Paul> reproducible, see below track of it happening, I have changed the
Paul> names of things to protect the innocent ;-) :
Can you show the result of adding EXPLAIN ANALYZE in front of your
select queries, both in the failing and successful cases?
i.e.
$ psql -c "EXPLAIN ANALYZE SELECT COUNT(*)
FROM my_events
WHERE my_date >= '2018-08-01'
AND my_date < '2018-09-01'
AND type_id NOT IN ('A','LIST','OF','NOTS');" my_databasename
(and likewise for the other queries you showed)
--
Andrew (irc:RhodiumToad)
Hi Andrew,
Here you go:
BROKEN VERSION:
$ psql -c "EXPLAIN ANALYSE SELECT COUNT(*)
FROM my_events
WHERE my_date >= '2018-08-01'
AND my_date < '2018-09-01'
AND type_id NOT IN ('A','LIST','OF','NOTS');" my_databasename
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=88550586.73..88550586.74 rows=1 width=8) (actual time=32891.602..32891.602 rows=1 loops=1)
-> Gather (cost=88550586.52..88550586.73 rows=2 width=8) (actual time=32884.164..32894.896 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=88549586.52..88549586.53 rows=1 width=8) (actual time=32871.097..32871.097 rows=1 loops=3)
-> Merge Left Join (cost=627157.66..78783047.74 rows=3906615513 width=0) (actual time=32842.042..32863.090 rows=56293 loops=3)
Merge Cond: (((my_events_raw_201807.my_citext)::integer) = my_lookup.my_int)
-> Sort (cost=581238.25..586738.88 rows=2200254 width=32) (actual time=32842.038..32848.497 rows=56293 loops=3)
Sort Key: ((my_events_raw_201807.my_citext)::integer)
Sort Method: quicksort Memory: 3265kB
Worker 0: Sort Method: quicksort Memory: 3284kB
Worker 1: Sort Method: quicksort Memory: 3338kB
-> Result (cost=0.43..244162.87 rows=2200254 width=32) (actual time=10.142..32772.369 rows=56293 loops=3)
-> Parallel Append (cost=0.43..244162.87 rows=2200254 width=32) (actual time=10.139..32727.527 rows=56293 loops=3)
-> Parallel Bitmap Heap Scan on my_events_raw_201807 (cost=2158.55..86099.06 rows=29931 width=32) (actual time=24.819..364.996 rows=5154 loops=1)
Recheck Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
Rows Removed by Filter: 73532
-> Bitmap Index Scan on my_events_raw_201807_my_date_idx (cost=0.00..2140.59 rows=75616 width=0) (actual time=21.943..21.943 rows=78686 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
-> Parallel Index Scan using my_events_raw_201601_my_date_idx on my_events_raw_201601 (cost=0.43..8.46 rows=1 width=32) (actual time=0.047..0.047 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201602_my_date_idx on my_events_raw_201602 (cost=0.43..8.46 rows=1 width=32) (actual time=0.032..0.032 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201603_my_date_idx on my_events_raw_201603 (cost=0.43..8.46 rows=1 width=32) (actual time=0.030..0.030 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201604_my_date_idx on my_events_raw_201604 (cost=0.43..8.46 rows=1 width=32) (actual time=0.028..0.028 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201605_my_date_idx on my_events_raw_201605 (cost=0.43..8.46 rows=1 width=32) (actual time=0.037..0.037 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201606_my_date_idx on my_events_raw_201606 (cost=0.43..8.46 rows=1 width=32) (actual time=0.028..0.028 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201612_my_date_idx on my_events_raw_201612 (cost=0.43..8.46 rows=1 width=32) (actual time=0.030..0.030 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201701_my_date_idx on my_events_raw_201701 (cost=0.43..8.46 rows=1 width=32) (actual time=0.035..0.035 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201702_my_date_idx on my_events_raw_201702 (cost=0.43..8.46 rows=1 width=32) (actual time=0.035..0.035 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201703_my_date_idx on my_events_raw_201703 (cost=0.43..8.46 rows=1 width=32) (actual time=0.027..0.027 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201704_my_date_idx on my_events_raw_201704 (cost=0.43..8.46 rows=1 width=32) (actual time=0.035..0.035 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201705_my_date_idx on my_events_raw_201705 (cost=0.43..8.46 rows=1 width=32) (actual time=0.024..0.024 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201706_my_date_idx on my_events_raw_201706 (cost=0.43..8.46 rows=1 width=32) (actual time=0.027..0.027 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201707_my_date_idx on my_events_raw_201707 (cost=0.43..8.46 rows=1 width=32) (actual time=0.025..0.025 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201708_my_date_idx on my_events_raw_201708 (cost=0.43..8.46 rows=1 width=32) (actual time=0.026..0.026 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201709_my_date_idx on my_events_raw_201709 (cost=0.43..8.46 rows=1 width=32) (actual time=0.025..0.025 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201710_my_date_idx on my_events_raw_201710 (cost=0.43..8.46 rows=1 width=32) (actual time=0.031..0.031 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201711_my_date_idx on my_events_raw_201711 (cost=0.43..8.46 rows=1 width=32) (actual time=0.191..0.191 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201712_my_date_idx on my_events_raw_201712 (cost=0.43..8.46 rows=1 width=32) (actual time=0.030..0.030 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201801_my_date_idx on my_events_raw_201801 (cost=0.43..8.46 rows=1 width=32) (actual time=0.067..0.067 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201802_my_date_idx on my_events_raw_201802 (cost=0.43..8.46 rows=1 width=32) (actual time=0.182..0.182 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201803_my_date_idx on my_events_raw_201803 (cost=0.43..8.46 rows=1 width=32) (actual time=0.028..0.028 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201804_my_date_idx on my_events_raw_201804 (cost=0.43..8.46 rows=1 width=32) (actual time=0.030..0.030 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201805_my_date_idx on my_events_raw_201805 (cost=0.43..8.46 rows=1 width=32) (actual time=0.034..0.034 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201806_my_date_idx on my_events_raw_201806 (cost=0.43..8.46 rows=1 width=32) (actual time=0.038..0.038 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201809_my_date_idx on my_events_raw_201809 (cost=0.43..8.46 rows=1 width=32) (actual time=0.045..0.045 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201810_my_date_idx on my_events_raw_201810 (cost=0.43..8.46 rows=1 width=32) (actual time=0.033..0.033 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201607_my_date_idx on my_events_raw_201607 (cost=0.43..8.45 rows=1 width=32) (actual time=0.036..0.036 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201608_my_date_idx on my_events_raw_201608 (cost=0.43..8.45 rows=1 width=32) (actual time=0.030..0.030 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201609_my_date_idx on my_events_raw_201609 (cost=0.43..8.45 rows=1 width=32) (actual time=0.029..0.029 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201610_my_date_idx on my_events_raw_201610 (cost=0.43..8.45 rows=1 width=32) (actual time=0.032..0.032 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201611_my_date_idx on my_events_raw_201611 (cost=0.43..8.45 rows=1 width=32) (actual time=0.032..0.032 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201811_my_date_idx on my_events_raw_201811 (cost=0.14..8.17 rows=1 width=32) (actual time=0.011..0.011 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201812_my_date_idx on my_events_raw_201812 (cost=0.14..8.17 rows=1 width=32) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Seq Scan on my_events_raw_201808 (cost=0.00..146775.63 rows=2170289 width=32) (actual time=1.801..32576.815 rows=54575 loops=3)
Filter: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone) AND (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[])))
Rows Removed by Filter: 1806589
-> Materialize (cost=45919.42..47694.96 rows=355109 width=4) (never executed)
-> Sort (cost=45919.42..46807.19 rows=355109 width=4) (never executed)
Sort Key: my_lookup.my_int
-> Seq Scan on my_lookup (cost=0.00..8324.09 rows=355109 width=4) (never executed)
Planning Time: 130.655 ms
Execution Time: 32901.375 ms
(131 rows)
WORKING VERSION WITH CHANGE FROM CITEXT TO TEXT
$ psql -c "EXPLAIN ANALYSE SELECT COUNT(*)
FROM my_events
WHERE my_date >= '2018-08-01'
AND my_date < '2018-09-01'
AND type_id NOT IN ('A','LIST','OF','NOTS');" my_databasename
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=312923.12..312923.13 rows=1 width=8) (actual time=5459.973..5459.974 rows=1 loops=1)
-> Gather (cost=312922.91..312923.12 rows=2 width=8) (actual time=5459.093..5470.668 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=311922.91..311922.92 rows=1 width=8) (actual time=5440.634..5440.634 rows=1 loops=3)
-> Parallel Hash Left Join (cost=8680.58..306422.32 rows=2200235 width=0) (actual time=5440.619..5440.620 rows=0 loops=3)
Hash Cond: ((my_events_raw_201807.my_citext)::integer = my_lookup.my_int)
-> Parallel Append (cost=0.43..244162.87 rows=2200254 width=32) (actual time=10.760..5208.188 rows=56293 loops=3)
-> Parallel Bitmap Heap Scan on my_events_raw_201807 (cost=2158.55..86099.06 rows=29931 width=32) (actual time=28.020..151.682 rows=5154 loops=1)
Recheck Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
Rows Removed by Filter: 73532
-> Bitmap Index Scan on my_events_raw_201807_my_date_idx (cost=0.00..2140.59 rows=75616 width=0) (actual time=25.655..25.655 rows=78686 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
-> Parallel Index Scan using my_events_raw_201601_my_date_idx on my_events_raw_201601 (cost=0.43..8.46 rows=1 width=32) (actual time=0.150..0.150 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201602_my_date_idx on my_events_raw_201602 (cost=0.43..8.46 rows=1 width=32) (actual time=0.060..0.060 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201603_my_date_idx on my_events_raw_201603 (cost=0.43..8.46 rows=1 width=32) (actual time=0.035..0.035 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201604_my_date_idx on my_events_raw_201604 (cost=0.43..8.46 rows=1 width=32) (actual time=0.032..0.032 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201605_my_date_idx on my_events_raw_201605 (cost=0.43..8.46 rows=1 width=32) (actual time=0.032..0.032 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201606_my_date_idx on my_events_raw_201606 (cost=0.43..8.46 rows=1 width=32) (actual time=0.043..0.043 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201612_my_date_idx on my_events_raw_201612 (cost=0.43..8.46 rows=1 width=32) (actual time=0.038..0.038 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201701_my_date_idx on my_events_raw_201701 (cost=0.43..8.46 rows=1 width=32) (actual time=0.045..0.045 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201702_my_date_idx on my_events_raw_201702 (cost=0.43..8.46 rows=1 width=32) (actual time=0.038..0.038 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201703_my_date_idx on my_events_raw_201703 (cost=0.43..8.46 rows=1 width=32) (actual time=0.030..0.030 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201704_my_date_idx on my_events_raw_201704 (cost=0.43..8.46 rows=1 width=32) (actual time=0.026..0.026 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201705_my_date_idx on my_events_raw_201705 (cost=0.43..8.46 rows=1 width=32) (actual time=0.026..0.026 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201706_my_date_idx on my_events_raw_201706 (cost=0.43..8.46 rows=1 width=32) (actual time=0.029..0.029 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201707_my_date_idx on my_events_raw_201707 (cost=0.43..8.46 rows=1 width=32) (actual time=0.029..0.029 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201708_my_date_idx on my_events_raw_201708 (cost=0.43..8.46 rows=1 width=32) (actual time=0.033..0.033 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201709_my_date_idx on my_events_raw_201709 (cost=0.43..8.46 rows=1 width=32) (actual time=0.031..0.031 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201710_my_date_idx on my_events_raw_201710 (cost=0.43..8.46 rows=1 width=32) (actual time=0.034..0.035 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201711_my_date_idx on my_events_raw_201711 (cost=0.43..8.46 rows=1 width=32) (actual time=0.028..0.028 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201712_my_date_idx on my_events_raw_201712 (cost=0.43..8.46 rows=1 width=32) (actual time=0.031..0.031 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201801_my_date_idx on my_events_raw_201801 (cost=0.43..8.46 rows=1 width=32) (actual time=0.033..0.033 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201802_my_date_idx on my_events_raw_201802 (cost=0.43..8.46 rows=1 width=32) (actual time=0.030..0.030 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201803_my_date_idx on my_events_raw_201803 (cost=0.43..8.46 rows=1 width=32) (actual time=0.036..0.036 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201804_my_date_idx on my_events_raw_201804 (cost=0.43..8.46 rows=1 width=32) (actual time=0.055..0.055 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201805_my_date_idx on my_events_raw_201805 (cost=0.43..8.46 rows=1 width=32) (actual time=0.037..0.037 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201806_my_date_idx on my_events_raw_201806 (cost=0.43..8.46 rows=1 width=32) (actual time=0.037..0.037 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201809_my_date_idx on my_events_raw_201809 (cost=0.43..8.46 rows=1 width=32) (actual time=0.050..0.050 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201810_my_date_idx on my_events_raw_201810 (cost=0.43..8.46 rows=1 width=32) (actual time=0.035..0.035 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201607_my_date_idx on my_events_raw_201607 (cost=0.43..8.45 rows=1 width=32) (actual time=0.033..0.034 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201608_my_date_idx on my_events_raw_201608 (cost=0.43..8.45 rows=1 width=32) (actual time=0.044..0.044 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201609_my_date_idx on my_events_raw_201609 (cost=0.43..8.45 rows=1 width=32) (actual time=0.032..0.032 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201610_my_date_idx on my_events_raw_201610 (cost=0.43..8.45 rows=1 width=32) (actual time=0.030..0.030 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201611_my_date_idx on my_events_raw_201611 (cost=0.43..8.45 rows=1 width=32) (actual time=0.031..0.031 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201811_my_date_idx on my_events_raw_201811 (cost=0.14..8.17 rows=1 width=32) (actual time=0.010..0.010 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Index Scan using my_events_raw_201812_my_date_idx on my_events_raw_201812 (cost=0.14..8.17 rows=1 width=32) (actual time=0.007..0.007 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[]))
-> Parallel Seq Scan on my_events_raw_201808 (cost=0.00..146775.63 rows=2170289 width=32) (actual time=1.135..5141.246 rows=54575 loops=3)
Filter: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone) AND (type_id <> ALL ('{A,LIST,OF,NOTS}'::text[])))
Rows Removed by Filter: 1806589
-> Parallel Hash (cost=6252.62..6252.62 rows=147962 width=4) (actual time=116.203..116.204 rows=118370 loops=3)
Buckets: 131072 Batches: 8 Memory Usage: 2816kB
-> Parallel Seq Scan on my_lookup (cost=0.00..6252.62 rows=147962 width=4) (actual time=0.506..53.040 rows=118370 loops=3)
Planning Time: 141.878 ms
Execution Time: 5471.541 ms
(124 rows)
WORKING SQL LOADED VERSION WITH CITEXT:
$ psql -c "EXPLAIN ANALYSE SELECT COUNT(*)
FROM my_events
WHERE my_date >= '2018-08-01'
AND my_date < '2018-09-01'
AND type_id NOT IN ('A','LIST','OF','NOTS');" my_databasename
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=88550586.73..88550586.74 rows=1 width=8) (actual time=30522.517..30522.517 rows=1 loops=1)
-> Gather (cost=88550586.52..88550586.73 rows=2 width=8) (actual time=30522.496..30523.629 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=88549586.52..88549586.53 rows=1 width=8) (actual time=30500.142..30500.142 rows=1 loops=3)
-> Merge Left Join (cost=627157.66..78783047.74 rows=3906615513 width=0) (actual time=30470.062..30491.867 rows=56293 loops=3)
Merge Cond: (((my_events_raw_201807.my_citext)::integer) = my_lookup.my_int)
-> Sort (cost=581238.25..586738.88 rows=2200254 width=32) (actual time=30470.058..30476.691 rows=56293 loops=3)
Sort Key: ((my_events_raw_201807.my_citext)::integer)
Sort Method: quicksort Memory: 3265kB
Worker 0: Sort Method: quicksort Memory: 3354kB
Worker 1: Sort Method: quicksort Memory: 3269kB
-> Result (cost=0.43..244162.87 rows=2200254 width=32) (actual time=8.912..30402.504 rows=56293 loops=3)
-> Parallel Append (cost=0.43..244162.87 rows=2200254 width=32) (actual time=8.910..30360.150 rows=56293 loops=3)
-> Parallel Bitmap Heap Scan on my_events_raw_201807 (cost=2158.55..86099.06 rows=29931 width=32) (actual time=25.072..387.294 rows=5154 loops=1)
Recheck Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
Rows Removed by Filter: 73532
-> Bitmap Index Scan on my_events_raw_201807_my_date_idx (cost=0.00..2140.59 rows=75616 width=0) (actual time=22.898..22.898 rows=78686 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
-> Parallel Index Scan using my_events_raw_201601_my_date_idx on my_events_raw_201601 (cost=0.43..8.46 rows=1 width=32) (actual time=0.061..0.061 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201602_my_date_idx on my_events_raw_201602 (cost=0.43..8.46 rows=1 width=32) (actual time=0.036..0.036 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201603_my_date_idx on my_events_raw_201603 (cost=0.43..8.46 rows=1 width=32) (actual time=0.039..0.039 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201604_my_date_idx on my_events_raw_201604 (cost=0.43..8.46 rows=1 width=32) (actual time=0.037..0.038 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201605_my_date_idx on my_events_raw_201605 (cost=0.43..8.46 rows=1 width=32) (actual time=0.042..0.042 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201606_my_date_idx on my_events_raw_201606 (cost=0.43..8.46 rows=1 width=32) (actual time=0.037..0.037 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201612_my_date_idx on my_events_raw_201612 (cost=0.43..8.46 rows=1 width=32) (actual time=0.038..0.038 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201701_my_date_idx on my_events_raw_201701 (cost=0.43..8.46 rows=1 width=32) (actual time=0.045..0.045 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201702_my_date_idx on my_events_raw_201702 (cost=0.43..8.46 rows=1 width=32) (actual time=0.041..0.041 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201703_my_date_idx on my_events_raw_201703 (cost=0.43..8.46 rows=1 width=32) (actual time=0.035..0.035 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201704_my_date_idx on my_events_raw_201704 (cost=0.43..8.46 rows=1 width=32) (actual time=0.040..0.041 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201705_my_date_idx on my_events_raw_201705 (cost=0.43..8.46 rows=1 width=32) (actual time=0.033..0.033 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201706_my_date_idx on my_events_raw_201706 (cost=0.43..8.46 rows=1 width=32) (actual time=0.027..0.027 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201707_my_date_idx on my_events_raw_201707 (cost=0.43..8.46 rows=1 width=32) (actual time=0.034..0.034 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201708_my_date_idx on my_events_raw_201708 (cost=0.43..8.46 rows=1 width=32) (actual time=0.032..0.032 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201709_my_date_idx on my_events_raw_201709 (cost=0.43..8.46 rows=1 width=32) (actual time=0.032..0.032 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201710_my_date_idx on my_events_raw_201710 (cost=0.43..8.46 rows=1 width=32) (actual time=0.034..0.034 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201711_my_date_idx on my_events_raw_201711 (cost=0.43..8.46 rows=1 width=32) (actual time=0.035..0.035 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201712_my_date_idx on my_events_raw_201712 (cost=0.43..8.46 rows=1 width=32) (actual time=0.034..0.034 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201801_my_date_idx on my_events_raw_201801 (cost=0.43..8.46 rows=1 width=32) (actual time=0.036..0.036 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201802_my_date_idx on my_events_raw_201802 (cost=0.43..8.46 rows=1 width=32) (actual time=0.034..0.034 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201803_my_date_idx on my_events_raw_201803 (cost=0.43..8.46 rows=1 width=32) (actual time=0.030..0.030 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201804_my_date_idx on my_events_raw_201804 (cost=0.43..8.46 rows=1 width=32) (actual time=0.033..0.033 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201805_my_date_idx on my_events_raw_201805 (cost=0.43..8.46 rows=1 width=32) (actual time=0.036..0.036 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201806_my_date_idx on my_events_raw_201806 (cost=0.43..8.46 rows=1 width=32) (actual time=0.033..0.033 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201809_my_date_idx on my_events_raw_201809 (cost=0.43..8.46 rows=1 width=32) (actual time=0.046..0.046 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201810_my_date_idx on my_events_raw_201810 (cost=0.43..8.46 rows=1 width=32) (actual time=0.033..0.033 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201607_my_date_idx on my_events_raw_201607 (cost=0.43..8.45 rows=1 width=32) (actual time=0.036..0.036 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201608_my_date_idx on my_events_raw_201608 (cost=0.43..8.45 rows=1 width=32) (actual time=0.029..0.029 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201609_my_date_idx on my_events_raw_201609 (cost=0.43..8.45 rows=1 width=32) (actual time=0.030..0.030 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201610_my_date_idx on my_events_raw_201610 (cost=0.43..8.45 rows=1 width=32) (actual time=0.034..0.034 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201611_my_date_idx on my_events_raw_201611 (cost=0.43..8.45 rows=1 width=32) (actual time=0.033..0.033 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201811_my_date_idx on my_events_raw_201811 (cost=0.14..8.17 rows=1 width=32) (actual time=0.011..0.012 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Index Scan using my_events_raw_201812_my_date_idx on my_events_raw_201812 (cost=0.14..8.17 rows=1 width=32) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone))
Filter: (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[]))
-> Parallel Seq Scan on my_events_raw_201808 (cost=0.00..146775.63 rows=2170289 width=32) (actual time=0.759..30203.782 rows=54575 loops=3)
Filter: ((my_date >= '2018-08-01 00:00:00+10'::timestamp with time zone) AND (my_date < '2018-09-01 00:00:00+10'::timestamp with time zone) AND (type_id <> ALL ('{A,LIST,OF,NOTS}'::citext[])))
Rows Removed by Filter: 1806589
-> Materialize (cost=45919.42..47694.96 rows=355109 width=4) (never executed)
-> Sort (cost=45919.42..46807.19 rows=355109 width=4) (never executed)
Sort Key: my_lookup.my_int
-> Seq Scan on my_lookup (cost=0.00..8324.09 rows=355109 width=4) (never executed)
Planning Time: 203.442 ms
Execution Time: 30529.264 ms
(131 rows)
CHEERS
Paul
On 2 Nov 2018, at 10:32, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
"Paul" == Paul Schaap <ps@ipggroup.com> writes:
Paul> Hi Andrew,
Paul> I agree it seems implausible to me too however it is 100%
Paul> reproducible, see below track of it happening, I have changed the
Paul> names of things to protect the innocent ;-) :Can you show the result of adding EXPLAIN ANALYZE in front of your
select queries, both in the failing and successful cases?i.e.
$ psql -c "EXPLAIN ANALYZE SELECT COUNT(*)
FROM my_events
WHERE my_date >= '2018-08-01'
AND my_date < '2018-09-01'
AND type_id NOT IN ('A','LIST','OF','NOTS');" my_databasename(and likewise for the other queries you showed)
--
Andrew (irc:RhodiumToad)
--
PRIVACY & CONFIDENTIALITY NOTICE
The information contained in this
email is intended for the named recipients only. It may contain
privileged and confidential information and if you are not the intended
recipient, you should not copy it or disclose its contents to any other
person, or take any action in reliance on it. If you have received this
email in error, please notify us immediately by return email.
"Paul" == Paul Schaap <ps@ipggroup.com> writes:
Paul> Hi Andrew,
Paul> Here you go:
This is looking like a bug in parallel hash. Do the queries all work if
you do set enable_parallel_hash=false; ?
(Going by the row counts in the explains, you got the first two "broken"
and "working" ones backwards)
--
Andrew.
Hi Andrew,
Bingo, set enable_parallel_hash=false; gets a correct result whereas set enable_parallel_hash=true; gets 0.
Yes I might have reversed some of the explains, my excuse its Friday and I went to bed late and am burnt out today :-)
CHEERS
Paul
On 2 Nov 2018, at 11:07, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
"Paul" == Paul Schaap <ps@ipggroup.com> writes:
Paul> Hi Andrew,
Paul> Here you go:This is looking like a bug in parallel hash. Do the queries all work if
you do set enable_parallel_hash=false; ?(Going by the row counts in the explains, you got the first two "broken"
and "working" ones backwards)--
Andrew.
--
PRIVACY & CONFIDENTIALITY NOTICE
The information contained in this
email is intended for the named recipients only. It may contain
privileged and confidential information and if you are not the intended
recipient, you should not copy it or disclose its contents to any other
person, or take any action in reliance on it. If you have received this
email in error, please notify us immediately by return email.
"Paul" == Paul Schaap <ps@ipggroup.com> writes:
Paul> Hi Andrew,
Paul> Bingo, set enable_parallel_hash=false; gets a correct result
Paul> whereas set enable_parallel_hash=true; gets 0.
Paul> Yes I might have reversed some of the explains, my excuse its
Paul> Friday and I went to bed late and am burnt out today :-)
Are all the values of the my_citext column actually null?
--
Andrew (irc:RhodiumToad)
On Fri, Nov 2, 2018 at 2:07 PM Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
"Paul" == Paul Schaap <ps@ipggroup.com> writes:
Paul> Hi Andrew,
Paul> Bingo, set enable_parallel_hash=false; gets a correct result
Paul> whereas set enable_parallel_hash=true; gets 0.Paul> Yes I might have reversed some of the explains, my excuse its
Paul> Friday and I went to bed late and am burnt out today :-)Are all the values of the my_citext column actually null?
Thanks for the report Paul and the analysis Andrew. Discussed with
Andrew a bit on IRC. Summary: multi-batch left joins are not handling
NULLs correctly in the left table when partitioning. Looking into
this now.
--
Thomas Munro
http://www.enterprisedb.com
On Fri, Nov 2, 2018 at 2:29 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Fri, Nov 2, 2018 at 2:07 PM Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:"Paul" == Paul Schaap <ps@ipggroup.com> writes:
Paul> Hi Andrew,
Paul> Bingo, set enable_parallel_hash=false; gets a correct result
Paul> whereas set enable_parallel_hash=true; gets 0.Paul> Yes I might have reversed some of the explains, my excuse its
Paul> Friday and I went to bed late and am burnt out today :-)Are all the values of the my_citext column actually null?
Thanks for the report Paul and the analysis Andrew. Discussed with
Andrew a bit on IRC. Summary: multi-batch left joins are not handling
NULLs correctly in the left table when partitioning. Looking into
this now.
Here's a repro.
create table r as select generate_series(1, 1000000) i, null::int j;
update r set j = i where i <= 10;
create table s as select generate_series(1, 1000000) i;
analyze;
select count(*), count(r.j) from r left join s on r.j = s.i;
Unpatched master gives me a 16 batch Parallel Hash Join with the
incorrect answer:
count | count
-------+-------
10 | 10
With the attached patch the answer is correct:
count | count
---------+-------
1000000 | 10
The brown-paper-bag level fix is:
- false, /* outer join, currently unsupported */
+ HJ_FILL_OUTER(hjstate),
It is right and full outer joins that are not yet supported by
Parallel Hash Join. Left outer joins *are* supported. The effect of that
thinko is to make them behave like inner joins (but only in multi-batch
joins, ie when work_mem is exceeded).
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
0001-Fix-bug-in-multi-batch-Parallel-Hash-Left-Join.patchapplication/x-patch; name=0001-Fix-bug-in-multi-batch-Parallel-Hash-Left-Join.patchDownload+1-2
On Fri, Nov 2, 2018 at 3:00 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
... The effect of that
thinko is to make them behave like inner joins (but only in multi-batch
joins, ie when work_mem is exceeded).
To be more precise, I meant they behave like inner joins just for rows
with NULL keys, not in general.
--
Thomas Munro
http://www.enterprisedb.com
On Fri, Nov 2, 2018 at 3:00 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Fri, Nov 2, 2018 at 2:29 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:On Fri, Nov 2, 2018 at 2:07 PM Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:"Paul" == Paul Schaap <ps@ipggroup.com> writes:
Paul> Hi Andrew,
Paul> Bingo, set enable_parallel_hash=false; gets a correct result
Paul> whereas set enable_parallel_hash=true; gets 0.Paul> Yes I might have reversed some of the explains, my excuse its
Paul> Friday and I went to bed late and am burnt out today :-)Are all the values of the my_citext column actually null?
Thanks for the report Paul and the analysis Andrew. Discussed with
Andrew a bit on IRC. Summary: multi-batch left joins are not handling
NULLs correctly in the left table when partitioning. Looking into
this now.Here's a repro.
create table r as select generate_series(1, 1000000) i, null::int j;
update r set j = i where i <= 10;
create table s as select generate_series(1, 1000000) i;
analyze;
select count(*), count(r.j) from r left join s on r.j = s.i;Unpatched master gives me a 16 batch Parallel Hash Join with the
incorrect answer:count | count
-------+-------
10 | 10With the attached patch the answer is correct:
count | count
---------+-------
1000000 | 10The brown-paper-bag level fix is:
- false, /* outer join, currently unsupported */ + HJ_FILL_OUTER(hjstate),It is right and full outer joins that are not yet supported by
Parallel Hash Join. Left outer joins *are* supported. The effect of that
thinko is to make them behave like inner joins (but only in multi-batch
joins, ie when work_mem is exceeded).
Pushed.
--
Thomas Munro
http://www.enterprisedb.com