Parallel Scan Bug: invalid attnum: 0

Started by Steve Randallover 9 years ago6 messagesbugs

srandall@s3.com

over 9 years ago

running pg 9.6.1 on AWS

max_parallel_workers_per_gather = 4

This only occurs in some schemas, not all. The affected schemas have
approx 1 million rows in the table. Much larger schemas (65 million rows)
and smaller schemas(100K rows) do not have the problem.

table definition:
CREATE TABLE ro_summaries (
day date NOT NULL,
entry_firm varchar(15) NOT NULL,
route_firm varchar(15) NOT NULL,
order_type varchar(3) NOT NULL,
symbol varchar(15) NOT NULL,
bucket_id int4 NOT NULL,
entry_status varchar(2) NOT NULL,
covered_order bool NOT NULL,
orders_sent int4 NOT NULL DEFAULT 0,
shares_sent int4 NOT NULL DEFAULT 0,
orders_pos int4 NOT NULL DEFAULT 0,
orders_zero int4 NOT NULL DEFAULT 0,
orders_neg int4 NOT NULL DEFAULT 0,
orders_canceled int4 NOT NULL DEFAULT 0,
orders_null int4 NOT NULL DEFAULT 0,
shares_pos int4 NOT NULL DEFAULT 0,
shares_zero int4 NOT NULL DEFAULT 0,
shares_neg int4 NOT NULL DEFAULT 0,
shares_canceled int4 NOT NULL DEFAULT 0,
shares_null int4 NOT NULL DEFAULT 0,
pi_pos numeric NOT NULL DEFAULT 0,
pi_neg numeric NOT NULL DEFAULT 0,
efq numeric NULL,
effective_spread numeric NULL,
quoted_spread numeric NULL,
realized_spread numeric NULL,
speed numeric NULL,
part1_shares int4 NULL,
trades_total int4 NULL,
orders_filled int4 NULL DEFAULT 0,
efq_shares int4 NULL,
exchange varchar(10) NULL,
notional_value numeric NULL
)
WITHOUT OIDS
TABLESPACE pg_default
GO
CREATE UNIQUE INDEX rosum_ux01
ON ro_summaries USING btree (day date_ops, order_type text_ops, bucket_id
int4_ops, route_firm text_ops, entry_firm text_ops, symbol text_ops,
entry_status text_ops, covered_order bool_ops)

view definition:
CREATE OR REPLACE VIEW ro_summaries_v
AS
SELECT ro_summaries.day,
'S3'::character varying AS source_code,
ro_summaries.entry_firm,
ro_summaries.route_firm,
ro_summaries.order_type,
ro_summaries.symbol,
ro_summaries.bucket_id,
ro_summaries.entry_status,
ro_summaries.covered_order,
ro_summaries.orders_sent,
ro_summaries.shares_sent,
(((ro_summaries.shares_pos + ro_summaries.shares_neg) +
ro_summaries.shares_zero) + ro_summaries.shares_null) AS shares_total,
(ro_summaries.shares_pos + ro_summaries.shares_zero) AS
shares_atorbetter,
ro_summaries.shares_pos,
ro_summaries.shares_neg,
ro_summaries.trades_total,
ro_summaries.pi_pos,
ro_summaries.pi_neg,
ro_summaries.efq,
ro_summaries.effective_spread,
ro_summaries.quoted_spread,
ro_summaries.realized_spread,
ro_summaries.speed,
ro_summaries.part1_shares,
ro_summaries.orders_filled,
ro_summaries.efq_shares,
ro_summaries.exchange,
ro_summaries.notional_value
FROM msco_data.ro_summaries;

offending query:
select
count(*) OVER() AS full_count,
source_code,
sum(shares_atorbetter) shares_atorbetter,
'All ' day_display, 'All ' entry_firm, 'All ' route_firm, 'All ' symbol,
'All ' order_type,
0 bucket_id,
case when sum(shares_total) > 0 then
(sum(shares_atorbetter)::numeric/sum(shares_total)::numeric)*100 end
at_or_better,
case WHEN sum((shares_total)) > 0 THEN
(sum(shares_pos)::numeric/sum((shares_total))::numeric)::numeric*100 END
pi_percent, coalesce(sum(pi_pos),0) pi_amount,
case WHEN sum(shares_pos) > 0 THEN (sum(pi_pos)::numeric /
sum(shares_pos)::numeric)::numeric ELSE 0 END pi_per_unit,
case WHEN sum((shares_total)) > 0 THEN
(sum(shares_neg)::numeric/sum((shares_total))::numeric)::numeric*100 END
slippage_percent,
coalesce(sum(pi_neg),0) slippage_amount,
case WHEN sum((efq_shares)) > 0 THEN
sum((efq_shares)*effective_spread)::numeric/sum((efq_shares))::numeric END
effective_spread,
case WHEN sum((efq_shares)) > 0 THEN
sum((efq_shares)*quoted_spread)::numeric/sum((efq_shares))::numeric END
quoted_spread,
case WHEN sum((efq_shares)) > 0 THEN
sum((efq_shares)*realized_spread)::numeric/sum((efq_shares))::numeric END
realized_spread,
case when sum((efq_shares)*quoted_spread) > 0 THEN
(sum((efq_shares)*effective_spread)::numeric/sum((efq_shares)*quoted_spread)::numeric)
* 100 END EFQ,
sum(trades_total) trades, sum((shares_total)) shares,
sum(shares_pos) shares_pos, sum(shares_neg) shares_neg,
sum(orders_sent) orders, sum(shares_sent) ordered_shares,
sum(pi_pos + pi_neg) variance,
case when sum((shares_total)) > 0 THEN sum(pi_pos + pi_neg) /
sum((shares_total))::numeric END avg_variance,
case when sum(orders_sent) > 0 THEN sum(shares_sent) / sum(orders_sent) END
avg_shares_order,
case when sum(shares_sent) > 0 then
least((sum((shares_total))::numeric/sum(shares_sent)::numeric)::numeric*100,100)
END pct_shares_executed,
sum(notional_value) notional_value,
sum(orders_filled) orders_filled,
case when sum(part1_shares) > 0 then sum(part1_shares * speed)::numeric /
sum(part1_shares)::numeric end speed
from ro_summaries_v
where day = '2016-11-07'
group by source_code
ORDER BY day_display desc NULLS LAST
LIMIT 25 OFFSET 0;

Commenting out source_code (and the group by) allows the query to run.
Alternately, commenting out several of the case statements also allows it
to run.

In the meantime I've set max_parallel_workers_per_gather = 0 and the query
works fine.

Please let me know if you need any more information.

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Steve Randall (#1)

Re: Parallel Scan Bug: invalid attnum: 0

Steve Randall <srandall@s3.com> writes:

running pg 9.6.1 on AWS
max_parallel_workers_per_gather = 4
This only occurs in some schemas, not all. The affected schemas have
approx 1 million rows in the table. Much larger schemas (65 million rows)
and smaller schemas(100K rows) do not have the problem.

FWIW, I failed to duplicate this from the information provided.
Can you get us EXPLAIN output for the troublesome case, or does it
fail before printing the EXPLAIN output? How about a stack trace
from the errfinish call?

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Steve Randall

srandall@s3.com

over 9 years ago

In reply to: Tom Lane (#2)

Re: Parallel Scan Bug: invalid attnum: 0

I have been able to reproduce the error in my development environment.
EXPLAIN output is attached. EXPLAIN ANALYZE fails with the error
mentioned.

I have to do a little configuring to get the stack trace logs, but it will
be a couple of weeks before they are ready. Tomorrow I head to San
Francisco for a few days off prior to pgConf.

On Wed, Nov 9, 2016 at 12:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Steve Randall <srandall@s3.com> writes:

running pg 9.6.1 on AWS
max_parallel_workers_per_gather = 4
This only occurs in some schemas, not all. The affected schemas have
approx 1 million rows in the table. Much larger schemas (65 million

rows)

and smaller schemas(100K rows) do not have the problem.

FWIW, I failed to duplicate this from the information provided.
Can you get us EXPLAIN output for the troublesome case, or does it
fail before printing the EXPLAIN output? How about a stack trace
from the errfinish call?

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Steve Randall (#3)

Re: Parallel Scan Bug: invalid attnum: 0

Steve Randall <srandall@s3.com> writes:

I have been able to reproduce the error in my development environment.
EXPLAIN output is attached. EXPLAIN ANALYZE fails with the error
mentioned.

Ah, I've duplicated it. My previous attempt to reverse-engineer your
test case was not selecting a partial-aggregation plan.

Immediate impression is that the logic for planning partial grouped
aggregation did not get the zero-sort-keys case right. In my hands
it produces

TRAP: BadArgument("!(nkeys > 0)", File: "tuplesort.c", Line: 763)
2016-11-10 10:44:26.955 EST [14415] LOG: server process (PID 15726) was terminated by signal 6: Aborted

but in a non-assert build of course you'd get some other misbehavior.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Tom Lane (#4)

Re: Parallel Scan Bug: invalid attnum: 0

I wrote:

Immediate impression is that the logic for planning partial grouped
aggregation did not get the zero-sort-keys case right.

Yeah: after flattening your view, the planner was left with GROUP BY
'S3'::character varying, which is a no-op, but it mistakenly inserted
a no-op Sort into the plan to sort by that. One-liner fix is here
if you need it right away:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7defc3b97a31537547053946808a83e7234d1b61

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Steve Randall

srandall@s3.com

over 9 years ago

In reply to: Tom Lane (#5)

Re: Parallel Scan Bug: invalid attnum: 0

Thank you.

On Thu, Nov 10, 2016 at 10:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

I wrote:

Immediate impression is that the logic for planning partial grouped
aggregation did not get the zero-sort-keys case right.

Yeah: after flattening your view, the planner was left with GROUP BY
'S3'::character varying, which is a no-op, but it mistakenly inserted
a no-op Sort into the plan to sort by that. One-liner fix is here
if you need it right away:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=
7defc3b97a31537547053946808a83e7234d1b61

regards, tom lane

Parallel Scan Bug: invalid attnum: 0

Attachments: