DBT-3 with SF=20 failed
Hello,
I got the following error during a DBT-3 benchmark run with SF=20.
psql:query21.sql:50: ERROR: invalid memory alloc request size 1073741824
psql:query21.sql:50: ERROR: invalid memory alloc request size 1073741824
It looks to me like the Hash node tries to allocate a 1GB area using palloc0(),
but this exceeds the limit of the non-huge allocation interface.
(gdb) bt
#0 0x00007f669d29a989 in raise () from /lib64/libc.so.6
#1 0x00007f669d29c098 in abort () from /lib64/libc.so.6
#2 0x000000000090ccfd in ExceptionalCondition (conditionName=0xb18130 "!(((Size) (size) <= ((Size) 0x3fffffff)))",
errorType=0xb17efd "FailedAssertion", fileName=0xb17e40 "mcxt.c", lineNumber=856) at assert.c:54
#3 0x000000000093ad53 in palloc0 (size=1073741824) at mcxt.c:856
#4 0x0000000000673045 in ExecHashTableCreate (node=0x7f669de951f0, hashOperators=0x24dbf90, keepNulls=0 '\000') at nodeHash.c:391
#5 0x00000000006752e1 in ExecHashJoin (node=0x24d74e0) at nodeHashjoin.c:169
#6 0x000000000065abf4 in ExecProcNode (node=0x24d74e0) at execProcnode.c:477
#7 0x0000000000681026 in ExecNestLoop (node=0x24d6668) at nodeNestloop.c:123
#8 0x000000000065abca in ExecProcNode (node=0x24d6668) at execProcnode.c:469
#9 0x0000000000681026 in ExecNestLoop (node=0x24d61f8) at nodeNestloop.c:123
#10 0x000000000065abca in ExecProcNode (node=0x24d61f8) at execProcnode.c:469
#11 0x0000000000681026 in ExecNestLoop (node=0x24d5478) at nodeNestloop.c:123
#12 0x000000000065abca in ExecProcNode (node=0x24d5478) at execProcnode.c:469
#13 0x0000000000681026 in ExecNestLoop (node=0x24d51d0) at nodeNestloop.c:123
#14 0x000000000065abca in ExecProcNode (node=0x24d51d0) at execProcnode.c:469
The attached patch replaces this palloc0() with MemoryContextAllocHuge() + memset().
Indeed, this hash table is built over a relation with nrows=119994544,
so it is not strange for the hash-slot array itself to be larger than 1GB.
The other allocation request, which may reset or expand the hash-slot array, may
also need the "Huge" version of memory allocation, I think.
Thanks,
Below is the query itself and EXPLAIN result.
--------------------------------------------------------------------
dbt3c=# EXPLAIN VERBOSE
dbt3c-# select
dbt3c-# s_name,
dbt3c-# count(*) as numwait
dbt3c-# from
dbt3c-# supplier,
dbt3c-# lineitem l1,
dbt3c-# orders,
dbt3c-# nation
dbt3c-# where
dbt3c-# s_suppkey = l1.l_suppkey
dbt3c-# and o_orderkey = l1.l_orderkey
dbt3c-# and o_orderstatus = 'F'
dbt3c-# and l1.l_receiptdate > l1.l_commitdate
dbt3c-# and exists (
dbt3c(# select
dbt3c(# *
dbt3c(# from
dbt3c(# lineitem l2
dbt3c(# where
dbt3c(# l2.l_orderkey = l1.l_orderkey
dbt3c(# and l2.l_suppkey <> l1.l_suppkey
dbt3c(# )
dbt3c-# and not exists (
dbt3c(# select
dbt3c(# *
dbt3c(# from
dbt3c(# lineitem l3
dbt3c(# where
dbt3c(# l3.l_orderkey = l1.l_orderkey
dbt3c(# and l3.l_suppkey <> l1.l_suppkey
dbt3c(# and l3.l_receiptdate > l3.l_commitdate
dbt3c(# )
dbt3c-# and s_nationkey = n_nationkey
dbt3c-# and n_name = 'UNITED KINGDOM'
dbt3c-# group by
dbt3c-# s_name
dbt3c-# order by
dbt3c-# numwait desc,
dbt3c-# s_name
dbt3c-# LIMIT 100;
QUERY PLAN
--------------------------------------------------------------------------------
Limit (cost=6792765.24..6792765.24 rows=1 width=26)
Output: supplier.s_name, (count(*))
-> Sort (cost=6792765.24..6792765.24 rows=1 width=26)
Output: supplier.s_name, (count(*))
Sort Key: (count(*)) DESC, supplier.s_name
-> HashAggregate (cost=6792765.22..6792765.23 rows=1 width=26)
Output: supplier.s_name, count(*)
Group Key: supplier.s_name
-> Nested Loop Anti Join (cost=4831094.94..6792765.21 rows=1 width=26)
Output: supplier.s_name
-> Nested Loop (cost=4831094.37..6792737.52 rows=1 width=34)
Output: supplier.s_name, l1.l_suppkey, l1.l_orderkey
Join Filter: (supplier.s_nationkey = nation.n_nationkey)
-> Nested Loop (cost=4831094.37..6792736.19 rows=1 width=38)
Output: supplier.s_name, supplier.s_nationkey, l1.l_suppkey, l1.l_orderkey
-> Nested Loop (cost=4831093.81..6792728.20 rows=1 width=42)
Output: supplier.s_name, supplier.s_nationkey, l1.l_suppkey, l1.l_orderkey, l2.l_orderkey
Join Filter: (l1.l_suppkey = supplier.s_suppkey)
-> Hash Semi Join (cost=4831093.81..6783870.20 rows=1 width=12)
Output: l1.l_suppkey, l1.l_orderkey, l2.l_orderkey
Hash Cond: (l1.l_orderkey = l2.l_orderkey)
Join Filter: (l2.l_suppkey <> l1.l_suppkey)
-> Index Scan using lineitem_l_orderkey_idx_part1 on public.lineitem l1 (cost=0.57..1847781.73 rows=39998181 width=8)
Output: l1.l_orderkey, l1.l_partkey, l1.l_suppkey, l1.l_linenumber, l1.l_quantity, l1.l_extendedprice, l1.l_discount, l1.l_tax, l1.l_returnflag, l1.l_linestatus, l1.l_shipdate, l1.l_commitdate, l1.l_receiptdate, l1.l_shipinstruct, l1.l_shipmode, l1.l_comment
-> Hash (cost=3331161.44..3331161.44 rows=119994544 width=8)
Output: l2.l_orderkey, l2.l_suppkey
-> Seq Scan on public.lineitem l2 (cost=0.00..3331161.44 rows=119994544 width=8)
Output: l2.l_orderkey, l2.l_suppkey
-> Seq Scan on public.supplier (cost=0.00..6358.00 rows=200000 width=34)
Output: supplier.s_suppkey, supplier.s_name, supplier.s_address, supplier.s_nationkey, supplier.s_phone, supplier.s_acctbal, supplier.s_comment
-> Index Scan using orders_o_orderkey_o_orderdate_idx on public.orders (cost=0.56..7.98 rows=1 width=4)
Output: orders.o_orderkey, orders.o_custkey, orders.o_orderstatus, orders.o_totalprice, orders.o_orderdate, orders.o_orderpriority, orders.o_clerk, orders.o_shippriority, orders.o_comment
Index Cond: (orders.o_orderkey = l1.l_orderkey)
Filter: (orders.o_orderstatus = 'F'::bpchar)
-> Seq Scan on public.nation (cost=0.00..1.31 rows=1 width=4)
Output: nation.n_nationkey, nation.n_name, nation.n_regionkey, nation.n_comment
Filter: (nation.n_name = 'UNITED KINGDOM'::bpchar)
-> Index Scan using lineitem_l_orderkey_idx_part1 on public.lineitem l3 (cost=0.57..13.69 rows=89 width=8)
Output: l3.l_orderkey, l3.l_partkey, l3.l_suppkey, l3.l_linenumber, l3.l_quantity, l3.l_extendedprice, l3.l_discount, l3.l_tax, l3.l_returnflag, l3.l_linestatus, l3.l_shipdate, l3.l_commitdate, l3.l_receiptdate, l3.l_shipinstruct, l3.l_shipmode, l3.l_comment
Index Cond: (l3.l_orderkey = l1.l_orderkey)
Filter: (l3.l_suppkey <> l1.l_suppkey)
(41 rows)
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Jun 10, 2015 at 9:57 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
curious: what was work_mem set to?
merlin
curious: what was work_mem set to?
work_mem=48GB
My machine has 256GB of physical RAM.
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 06/11/2015 09:53 AM, Kouhei Kaigai wrote:
curious: what was work_mem set to?
work_mem=48GB
My machine mounts 256GB physical RAM.
work_mem can be allocated several times per backend. Nodes like sort and
hash_aggregate may each allocate that much. You should set work_mem to a
fraction of physical-RAM / concurrent-connections depending on the
complexity of your queries. 48GB does not sound reasonable.
Regards, Jan
--
Jan Wieck
Senior Software Engineer
http://slony.info
On Wed, Jun 10, 2015 at 10:57 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
The attached patch replaces this palloc0() by MemoryContextAllocHuge() + memset().
Indeed, this hash table is constructed towards the relation with nrows=119994544,
so, it is not strange even if hash-slot itself is larger than 1GB.
You forgot to attach the patch, I think. It looks to me like the size
of a HashJoinTuple is going to be 16 bytes, so 1GB/16 = ~64 million.
That's a lot of buckets, but maybe not unreasonably many if you've got
enough memory.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
2015-06-11 23:20 GMT+09:00 Jan Wieck <jan@wi3ck.info>:
On 06/11/2015 09:53 AM, Kouhei Kaigai wrote:
curious: what was work_mem set to?
work_mem=48GB
My machine mounts 256GB physical RAM.
work_mem can be allocated several times per backend. Nodes like sort and
hash_aggregate may each allocate that much. You should set work_mem to a
fraction of physical-RAM / concurrent-connections depending on the
complexity of your queries. 48GB does not sound reasonable.
A small max_connections combined with a large work_mem is a usual configuration
for typical OLAP workloads.
Even if the configuration is not reasonable, this is not the right error message;
users cannot tell from it how to fix the problem.
psql:query21.sql:50: ERROR: invalid memory alloc request size 1073741824
Hi,
On 06/11/15 16:20, Jan Wieck wrote:
On 06/11/2015 09:53 AM, Kouhei Kaigai wrote:
curious: what was work_mem set to?
work_mem=48GB
My machine mounts 256GB physical RAM.
work_mem can be allocated several times per backend. Nodes like sort
and hash_aggregate may each allocate that much. You should set
work_mem to a fraction of physical-RAM / concurrent-connections
depending on the complexity of your queries. 48GB does not sound
reasonable.
That's true, but there are cases where values like this may be useful
(e.g. for a particular query). We do allow such work_mem values, so I
consider this failure to be a bug.
It probably existed in the past, but was amplified by the hash join
improvements I did for 9.5, because that uses NTUP_PER_BUCKET=1 instead
of NTUP_PER_BUCKET=10. So the arrays of buckets are much larger, and we
also use much more memory than we did in the past.
Interestingly, the hash code checks for INT_MAX overflows on a number of
places, but does not check for this ...
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2015-06-11 23:28 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jun 10, 2015 at 10:57 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
The attached patch replaces this palloc0() by MemoryContextAllocHuge() + memset().
Indeed, this hash table is constructed towards the relation with nrows=119994544,
so, it is not strange even if hash-slot itself is larger than 1GB.
You forgot to attach the patch, I think.
Oops, I forgot to attach indeed.
It looks to me like the size
of a HashJoinTuple is going to be 16 bytes, so 1GB/16 = ~64 million.
That's a lot of buckets, but maybe not unreasonably many if you've got
enough memory.
EXPLAIN says this Hash node reads an underlying SeqScan with
119994544 (~119 million) rows, but that is still much smaller than my
work_mem setting.
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachment: hashslot-allocation-by-huge-alloc.patch (application/octet-stream, +8/-4)
2015-06-11 23:33 GMT+09:00 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
Interestingly, the hash code checks for INT_MAX overflows on a number of
places, but does not check for this ...
Which number should be changed in this case?
Indeed, nbuckets is declared as int, so INT_MAX is the hard limit on the
number of hash slots. However, some extreme usage can easily create a
situation where we hit this restriction.
Should we make nbuckets a long int?
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
Interestingly, the hash code checks for INT_MAX overflows on a number of
places, but does not check for this ...
Yeah, and at least at one time there were checks to prevent the hash table
request from exceeding MaxAllocSize. Did those get removed by somebody?
regards, tom lane
On 06/11/15 16:54, Tom Lane wrote:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
Interestingly, the hash code checks for INT_MAX overflows on a number of
places, but does not check for this ...
Yeah, and at least at one time there were checks to prevent the hash
table request from exceeding MaxAllocSize. Did those get removed by
somebody?
I think the problem is in this piece of code:
    if ((hashtable->nbatch == 1) &&
        (hashtable->nbuckets_optimal <= INT_MAX / 2) &&  /* overflow protection */
        (ntuples >= (hashtable->nbuckets_optimal * NTUP_PER_BUCKET)))
    {
        hashtable->nbuckets_optimal *= 2;
        hashtable->log2_nbuckets_optimal += 1;
    }
ISTM it does not check against the max_pointers (that's only done in
ExecChooseHashTableSize).
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 06/11/15 16:28, Robert Haas wrote:
On Wed, Jun 10, 2015 at 10:57 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
The attached patch replaces this palloc0() by MemoryContextAllocHuge() + memset().
Indeed, this hash table is constructed towards the relation with nrows=119994544,
so, it is not strange even if hash-slot itself is larger than 1GB.
You forgot to attach the patch, I think. It looks to me like the size
of a HashJoinTuple is going to be 16 bytes, so 1GB/16 = ~64 million.
That's a lot of buckets, but maybe not unreasonably many if you've got
enough memory.
Actually, HashJoinTuple is just a pointer, so it's 8 bytes, and 1GB is
enough for 134217728 (~134 million) rows, which is more than the 119994544
rows from the plan.
Also, looking at the error message again:
ERROR: invalid memory alloc request size 1073741824
but this time with beer goggles, I noticed that the amount reported is
exactly 1GB. The backtrace also shows the error happens right inside
ExecHashTableCreate (and not in the resize which may happen later),
which means it gets the nbuckets from ExecChooseHashTableSize directly.
The resize is probably still broken as I mentioned before, but this
crashes before reaching that code, as the estimates are high enough to
trigger the issue. But ExecChooseHashTableSize is supposed to keep all
the checks from previous versions, and I think it actually does.
But I don't see any checks there regarding the 1GB boundary. What I see
is this:

    max_pointers = (work_mem * 1024L) / sizeof(void *);
    max_pointers = Min(max_pointers, INT_MAX / 2);
    ...
    dbuckets = Min(dbuckets, max_pointers);
That has nothing to do with 1GB, and it has been in the code since the time
work_mem was limited to 2GB, so perhaps there was some reasoning that
it's sufficient (because the tuples stored in the hash table will need
more than 1/2 of the memory, or something like that).
But today this issue is more likely, because people have more RAM and
use higher work_mem values, so the max_pointers value gets much higher.
In the extreme it may get to INT_MAX/2, so ~1 billion, and the bucket
array would then take ~8GB on 64-bit machines (on 32-bit machines we'd
get twice the number of pointers for the same memory, but that's mostly
irrelevant, because of the memory size limits).
It's also true that the hash-join improvements in 9.5, namely the
decrease of NTUP_PER_BUCKET from 10 to 1, made this error more likely.
With 9.4 we'd use only 16777216 buckets (128MB), because that gets us
below 10 tuples per bucket. But now we're shooting for 1 tuple per
bucket, so we end up with ~134M buckets, and that's 1GB.
I see two ways to fix this:
(1) enforce the 1GB limit (probably better for back-patching, if that's
necessary)
(2) make it work with hash tables over 1GB
I'm in favor of (2) if there's a good way to do that. It seems a bit
stupid not to be able to use a fast hash table because of some
artificial limit. Are there any fundamental reasons not to use the
MemoryContextAllocHuge fix proposed by KaiGai-san?
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12 June 2015 at 02:40, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
I've just run into this problem while running a TPC-H benchmark of 100GB,
on a machine with 64GB of RAM.
When attempting to run Q21 with a work_mem of 10GB I'm getting:
ERROR: invalid memory alloc request size 1073741824
Setting work_mem to 1GB or less gets the query running.
I've patched the code with your patch, Kohei, and it's now working.
Thought I'd better post this just in case this gets forgotten about.
Thanks
David
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 12 June 2015 at 00:29, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
I see two ways to fix this:
(1) enforce the 1GB limit (probably better for back-patching, if that's
necessary)
(2) make it work with hash tables over 1GB
I'm in favor of (2) if there's a good way to do that. It seems a bit
stupid not to be able to use a fast hash table because there's some
artificial limit. Are there any fundamental reasons not to use the
MemoryContextAllocHuge fix, proposed by KaiGai-san?
If there are no objections, I will apply the patch for 2) to HEAD and
backpatch to 9.5.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2015-08-19 20:12 GMT+09:00 Simon Riggs <simon@2ndquadrant.com>:
On 12 June 2015 at 00:29, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
I see two ways to fix this:
(1) enforce the 1GB limit (probably better for back-patching, if that's
necessary)
(2) make it work with hash tables over 1GB
I'm in favor of (2) if there's a good way to do that. It seems a bit
stupid not to be able to use a fast hash table because there's some
artificial limit. Are there any fundamental reasons not to use the
MemoryContextAllocHuge fix, proposed by KaiGai-san?
If there are no objections, I will apply the patch for 2) to HEAD and
backpatch to 9.5.
Please don't rush. :-)
It is not difficult to replace palloc() with palloc_huge(); however, it may
lead to another problem once the planner gives us a crazy estimate.
Below is my comment on the other thread.
==========
Also, we may need to pay attention to the reliability of the planner's
scale estimation.
Even though the plan below says that the Join generates 60521928028 rows,
it actually generates 776157676 rows (about 1.3%).
tpcds100=# EXPLAIN ANALYZE select
ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
from web_sales ws1,web_sales ws2
where ws1.ws_order_number = ws2.ws_order_number
and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=25374644.08..1160509591.61 rows=60521928028
width=24) (actual time=138347.979..491889.343 rows=776157676 loops=1)
Merge Cond: (ws1.ws_order_number = ws2.ws_order_number)
Join Filter: (ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
Rows Removed by Join Filter: 127853313
-> Sort (cost=12687322.04..12867325.16 rows=72001248 width=16)
(actual time=73252.300..79017.420 rows=72001237 loops=1)
Sort Key: ws1.ws_order_number
Sort Method: quicksort Memory: 7083296kB
-> Seq Scan on web_sales ws1 (cost=0.00..3290612.48
rows=72001248 width=16) (actual time=0.023..39951.201 rows=72001237
loops=1)
-> Sort (cost=12687322.04..12867325.16 rows=72001248 width=16)
(actual time=65095.655..128885.811 rows=904010978 loops=1)
Sort Key: ws2.ws_order_number
Sort Method: quicksort Memory: 7083296kB
-> Seq Scan on web_sales ws2 (cost=0.00..3290612.48
rows=72001248 width=16) (actual time=0.014..31046.888 rows=72001237
loops=1)
Planning time: 0.232 ms
Execution time: 530176.521 ms
(14 rows)
So, even if we allow nodeHash.c to allocate hash buckets larger than
1GB, the initial size should still be determined carefully.
Probably, 1GB is a good starting point, even if it is expanded later.
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
On 19 August 2015 at 12:55, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
2015-08-19 20:12 GMT+09:00 Simon Riggs <simon@2ndquadrant.com>:
On 12 June 2015 at 00:29, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:
I see two ways to fix this:
(1) enforce the 1GB limit (probably better for back-patching, if that's
necessary)
(2) make it work with hash tables over 1GB
I'm in favor of (2) if there's a good way to do that. It seems a bit
stupid not to be able to use a fast hash table because there's some
artificial limit. Are there any fundamental reasons not to use the
MemoryContextAllocHuge fix, proposed by KaiGai-san?
If there are no objections, I will apply the patch for 2) to HEAD and
backpatch to 9.5.
Please don't rush. :-)
Please explain what rush you see?
It is not difficult to replace palloc() with palloc_huge(); however, it may
lead to another problem once the planner gives us a crazy estimate.
Below is my comment on the other thread.
Yes, I can read both threads and would apply patches for each problem.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2015-08-19 21:29 GMT+09:00 Simon Riggs <simon@2ndquadrant.com>:
On 19 August 2015 at 12:55, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
2015-08-19 20:12 GMT+09:00 Simon Riggs <simon@2ndquadrant.com>:
On 12 June 2015 at 00:29, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:
I see two ways to fix this:
(1) enforce the 1GB limit (probably better for back-patching, if that's
necessary)
(2) make it work with hash tables over 1GB
I'm in favor of (2) if there's a good way to do that. It seems a bit
stupid not to be able to use a fast hash table because there's some
artificial limit. Are there any fundamental reasons not to use the
MemoryContextAllocHuge fix, proposed by KaiGai-san?
If there are no objections, I will apply the patch for 2) to HEAD and
backpatch to 9.5.
Please don't rush. :-)
Please explain what rush you see?
Unless we have a fail-safe mechanism for when the planner estimates far
more tuples than are actually needed, a crazy estimate will consume a
massive amount of RAM. That's a bad side effect.
My previous patch didn't pay attention to this scenario, so it needs to
be revised.
Thanks,
It is not difficult to replace palloc() with palloc_huge(); however, it may
lead to another problem once the planner gives us a crazy estimate.
Below is my comment on the other thread.
Yes, I can read both threads and would apply patches for each problem.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Simon Riggs <simon@2ndQuadrant.com> writes:
On 19 August 2015 at 12:55, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
Please don't rush. :-)
Please explain what rush you see?
Yours. You appear to be in a hurry to apply patches that there's no
consensus on.
It is not difficult to replace palloc() with palloc_huge(); however, it
may lead to another problem once the planner gives us a crazy estimate.
Below is my comment on the other thread.
Yes, I can read both threads and would apply patches for each problem.
I don't see anything very wrong with constraining the initial allocation
to 1GB, or even less. That will prevent consuming insane amounts of
work_mem when the planner's rows estimate is too high rather than too low.
And we do have the ability to increase the hash table size on the fly.
The real problem here is the potential integer overflow in
ExecChooseHashTableSize. Now that we know there is one, that should be
fixed (and not only in HEAD/9.5). But I believe Kaigai-san withdrew his
initial proposed patch, and we don't have a replacement as far as I saw.
regards, tom lane
On 19 August 2015 at 14:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
On 19 August 2015 at 12:55, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
Please don't rush. :-)
Please explain what rush you see?
Yours. You appear to be in a hurry to apply patches that there's no
consensus on.
I think that comment is unreasonable.
The problem was reported 2 months ago; following independent confirmation
of the proposed patch, I suggested committing it, with these words:
"If there are no objections, I will apply the patch for 2) to HEAD and
backpatch to 9.5."
I was clearly waiting for objections before acting, to test whether there
was consensus or not.
Please explain what procedure you would like committers to follow?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 08/19/2015 01:55 PM, Kohei KaiGai wrote:
Merge Join (cost=25374644.08..1160509591.61 rows=60521928028
width=24) (actual time=138347.979..491889.343 rows=776157676 loops=1)
Merge Cond: (ws1.ws_order_number = ws2.ws_order_number)
Join Filter: (ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
Rows Removed by Join Filter: 127853313
-> Sort (cost=12687322.04..12867325.16 rows=72001248 width=16)
(actual time=73252.300..79017.420 rows=72001237 loops=1)
Sort Key: ws1.ws_order_number
Sort Method: quicksort Memory: 7083296kB
-> Seq Scan on web_sales ws1 (cost=0.00..3290612.48
rows=72001248 width=16) (actual time=0.023..39951.201 rows=72001237
loops=1)
-> Sort (cost=12687322.04..12867325.16 rows=72001248 width=16)
(actual time=65095.655..128885.811 rows=904010978 loops=1)
Sort Key: ws2.ws_order_number
Sort Method: quicksort Memory: 7083296kB
-> Seq Scan on web_sales ws2 (cost=0.00..3290612.48
rows=72001248 width=16) (actual time=0.014..31046.888 rows=72001237
loops=1)
Planning time: 0.232 ms
Execution time: 530176.521 ms
(14 rows)
So, even if we allow nodeHash.c to allocate hash buckets larger than
1GB, the initial size should still be determined carefully.
Probably, 1GB is a good starting point, even if it is expanded later.
I'm not sure I understand what the problem is here? Could you elaborate?
The initial size of the hash table is determined using the estimate, and
if we overestimate, it will create more buckets (i.e. consume more
memory) and/or start batching (which might be unnecessary).
But I don't really see any "more careful" way to do this, without
penalizing the cases where the estimate is actually correct - e.g. by
starting with much smaller buckets (and then resizing the hash table,
which is not free). Or by starting without batching, betting that we
won't actually need it.
I think it'll be very difficult to get those working without causing
real trouble for cases where we actually do have good estimates (and
those are the vast majority of queries).
But both of those are features, and we're dealing with a bug fix here.
kind regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services