Memory Issue with array_agg?
When using array_agg on a large table, memory usage seems to spike up until
Postgres crashes with the following error:
2013-08-17 18:41:02 UTC [2716]: [2] WARNING: terminating connection because
of crash of another server process
2013-08-17 18:41:02 UTC [2716]: [3] DETAIL: The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2013-08-17 18:41:02 UTC [2716]: [4] HINT: In a moment you should be able to
reconnect to the database and repeat your command.
I've definitely isolated it to array_agg: when I changed the query to use
string_agg, it worked fine. I also tried array_agg on a few different
queries, all yielding the same issue, and swapping in string_agg fixed it
each time.
This particular table has over 2 million rows and is 1.2 GB, and when I
ran the query while watching htop, the virtual size of the Postgres process
ballooned to 13.9 GB before crashing.
The version of Postgres I am using is: PostgreSQL 9.2.4 on
x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.2-2ubuntu1)
4.7.2, 64-bit
Any help would be much appreciated, thanks!
-Robert
Robert Sosinski (rsosinski@ticketevolution.com) wrote:
> When using array_agg on a large table, memory usage seems to spike up until
> Postgres crashes with the following error:
This sounds like bug #7916.
/messages/by-id/E1UCeEU-0004hY-Pq@wrigleys.postgresql.org
As noted in that thread, changing the AllocSet parameters for
accumArrayResult may help (in the AllocSetContextCreate() call in
ArrayBuildState *accumArrayResult(), src/backend/utils/adt/arrayfuncs.c,
change ALLOCSET_DEFAULT_*SIZE to ALLOCSET_SMALL_*SIZE).
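Rough arithmetic on why this bites (my estimate, not a measurement): if
each per-group ArrayBuildState gets its own memory context whose initial
block is ALLOCSET_DEFAULT_INITSIZE (8 kB), then holding ~2 million groups
in memory at once reserves on the order of 2,000,000 × 8 kB ≈ 16 GB, the
same ballpark as the 13.9 GB process size you saw.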
Also, lowering work_mem may help, depending on your SQL.
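For example, at the session level (a sketch only; big_table stands in for
your table, and the right value depends on your queries):

-- With a smaller work_mem the planner is less likely to choose a
-- HashAggregate, which keeps a per-group array build state in memory
-- for every group at once; a sorted GroupAggregate builds only one
-- group's array at a time.
set work_mem = '16MB';
select guid, array_agg(id) from big_table group by guid;
reset work_mem;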
Regards,
Christoph
--
Spare Space
Hello
Could you please send an example or a test case?
Regards
Pavel Stehule
Hi Pavel,
What kind of example do you need? I can't give you the actual data I have
in the table, but I can give you an example query and the schema, attached
below. From there, I would just put in 2 million rows' worth (1.2 GB) of
data. Average sizes of the extended columns (using the pg_column_size
function), in bytes, are:
guid: 33
name: 2.41
currency: 4
fields: 120.32
example query:
-- find duplicate records using a guid
select guid, array_agg(id) from orders group by guid;
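If you only want the guids that actually have duplicates, the same idea
filtered with HAVING:

-- only groups with more than one row, i.e. actual duplicates
select guid, array_agg(id)
from orders
group by guid
having count(*) > 1;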
example schema:
Table "public.things"
Column | Type | Modifiers
| Storage | Stats target | Description
------------+-----------------------------+-----------------------------------------------------+----------+--------------+-------------
id | integer | not null default
nextval('things_id_seq'::regclass) | plain | |
version | integer | not null
| plain | |
created_at | timestamp without time zone | not null
| plain | |
updated_at | timestamp without time zone | not null
| plain | |
foo_id | integer | not null
| plain | |
bar_id | integer | not null
| plain | |
baz_id | integer | not null
| plain | |
guid | character varying | not null
| extended | |
name | character varying | not null
| extended | |
price | numeric(12,2) | not null
| main | |
currency | character varying | not null
| extended | |
amount | integer | not null
| plain | |
the_date | date | not null
| plain | |
fields | hstore |
| extended | |
Indexes:
"things_pkey" PRIMARY KEY, btree (id)
"things_foo_id_idx" btree (foo_id)
"things_bar_id_idx" btree (bar_id)
"things_baz_id_idx" btree (baz_id)
"things_guid_uidx" UNIQUE, btree (guid)
"things_lpad_lower_name_eidx" btree (lpad(lower(name::text), 10,
'0'::text))
"things_price_idx" btree (price)
Foreign-key constraints:
"things_foo_id_fkey" FOREIGN KEY (foo_id) REFERENCES foos(id)
"things_bar_id_fkey" FOREIGN KEY (bar_id) REFERENCES bars(id)
"things_baz_id_fkey" FOREIGN KEY (baz_id) REFERENCES bazs(id)
Triggers:
timestamps_trig BEFORE INSERT OR UPDATE ON things FOR EACH ROW EXECUTE
PROCEDURE timestamps_tfun()
Let me know if you need anything else.
Thanks,
How many distinct guids are there, and how many duplicates?
regards
Pavel
At the moment, all guids are distinct; however, before I zapped the
duplicates, there were 280 of them.
Currently, there are over 2 million distinct guids.
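For reference, numbers like these come from a query along the following
lines (table and column names per the example schema I sent earlier):

select count(*) as total_rows, count(distinct guid) as distinct_guids
from things;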
-Robert
Hello
It is strange. I am trying to simulate it without success, on 1M rows
where every id is duplicated twice:
processing string_agg: approx. 30 MB
processing array_agg: approx. 32 MB
postgres=# create table foo(a int, b varchar);
CREATE TABLE
postgres=# insert into foo select i, md5(i::text) from
generate_series(1,1000000) g(i);
INSERT 0 1000000
postgres=# insert into foo select i, md5(i::text) from
generate_series(1,1000000) g(i);
INSERT 0 1000000
postgres=# CREATE INDEX on foo(b);
CREATE INDEX
postgres=# ANALYZE foo;
ANALYZE
postgres=# explain analyze select string_agg(a::text,',') from foo group by b;
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=410045.19..447831.37 rows=1022895 width=37) (actual time=10195.972..14993.493 rows=1000000 loops=1)
   ->  Sort  (cost=410045.19..415045.19 rows=2000000 width=37) (actual time=10195.944..13659.985 rows=2000000 loops=1)
         Sort Key: b
         Sort Method: external merge  Disk: 97768kB
         ->  Seq Scan on foo  (cost=0.00..36667.00 rows=2000000 width=37) (actual time=0.018..321.197 rows=2000000 loops=1)
 Total runtime: 15066.397 ms
(6 rows)
postgres=# explain analyze select array_agg(a::text) from foo group by b;
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=410045.19..447831.37 rows=1022895 width=37) (actual time=10062.095..15697.755 rows=1000000 loops=1)
   ->  Sort  (cost=410045.19..415045.19 rows=2000000 width=37) (actual time=10062.059..13613.300 rows=2000000 loops=1)
         Sort Key: b
         Sort Method: external merge  Disk: 97768kB
         ->  Seq Scan on foo  (cost=0.00..36667.00 rows=2000000 width=37) (actual time=0.029..311.423 rows=2000000 loops=1)
 Total runtime: 15799.226 ms
(6 rows)
Regards
Pavel
Can you send an EXPLAIN result for both use cases?
Pavel
Hi Pavel,
Here are the explains you asked for:
explain analyze select string_agg(id::text,',') from things group by guid;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=400357.78..433784.93 rows=1337086 width=37) (actual time=41434.485..53195.185 rows=2378626 loops=1)
   ->  Sort  (cost=400357.78..403700.49 rows=1337086 width=37) (actual time=41434.433..44992.736 rows=2378626 loops=1)
         Sort Key: guid
         Sort Method: quicksort  Memory: 284135kB
         ->  Seq Scan on things  (cost=0.00..264304.86 rows=1337086 width=37) (actual time=0.027..21429.179 rows=2378626 loops=1)
 Total runtime: 56295.362 ms
(6 rows)
explain analyze select array_agg(id::text) from things group by guid;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=400357.78..433784.93 rows=1337086 width=37) (actual time=23953.922..38157.059 rows=2378626 loops=1)
   ->  Sort  (cost=400357.78..403700.49 rows=1337086 width=37) (actual time=23953.847..27527.316 rows=2378626 loops=1)
         Sort Key: guid
         Sort Method: quicksort  Memory: 284135kB
         ->  Seq Scan on things  (cost=0.00..264304.86 rows=1337086 width=37) (actual time=0.007..4941.752 rows=2378626 loops=1)
 Total runtime: 41280.897 ms
(6 rows)
Those queries run fine on the machine now, and memory is not inflating.
But I just ran this one, and it blew up:
explain with t as (
  select id, guid,
         md5(concat_ws(':', fields -> 'a', fields -> 'b', fields -> 'c',
                       fields -> 'd', fields -> 'e', foo_id::text))
  from things
)
select md5, count(id), array_agg(id)
from t
group by 1
having count(id) > 1;
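Presumably swapping in string_agg would sidestep it here too, as it did
for the earlier queries; a sketch of that workaround (the duplicate ids
come back as a comma-separated string instead of an array):

with t as (
  select id, guid,
         md5(concat_ws(':', fields -> 'a', fields -> 'b', fields -> 'c',
                       fields -> 'd', fields -> 'e', foo_id::text)) as md5
  from things
)
select md5, count(id), string_agg(id::text, ',')
from t
group by 1
having count(id) > 1;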
-Robert