Postgres 10.3 delete on partition table cannot execute, out of memory.

Started by Mark Magnusabout 8 years ago3 messagesbugs
Jump to latest
#1Mark Magnus
mark.magnus@impactdata.com.au

Bug

Partitioned table has 7202 partitions. No partition contains more than 50
records. Partitioning is done on a foreign key. Any delete operation i.e.
"delete from contacts where id = ?" or
"delete from contacts where id = ? and account_id = ?" or
"delete from contacts where account_id = ?"
results in out of memory condition.

Default Postgres Configuration with exception
max_locks_per_transaction = 1024

Postgres Logs

2018-03-15 14:26:40.340 AEDT [7120] LOG: server process (PID 8177) was
terminated by signal 9: Killed
2018-03-15 14:26:40.340 AEDT [7120] DETAIL: Failed process was running:
delete from contacts where id = 82398 and account_id = 9000
2018-03-15 14:26:40.354 AEDT [7120] LOG: terminating any other active
server processes
2018-03-15 14:26:40.367 AEDT [3821] WARNING: terminating connection
because of crash of another server process
2018-03-15 14:26:40.367 AEDT [3821] DETAIL: The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2018-03-15 14:26:40.367 AEDT [3821] HINT: In a moment you should be able
to reconnect to the database and repeat your command.
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres WARNING: terminating
connection because of crash of another server process
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres DETAIL: The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres HINT: In a moment you
should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development WARNING:
terminating connection because of crash of another server process
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development DETAIL:
The postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development HINT: In
a moment you should be able to reconnect to the database and repeat your
command.
2018-03-15 14:26:40.569 AEDT [7120] LOG: all server processes terminated;
reinitializing
2018-03-15 14:26:40.639 AEDT [9244] LOG: database system was interrupted;
last known up at 2018-03-15 13:08:47 AEDT
2018-03-15 14:26:41.745 AEDT [9251] mark@postgres FATAL: the database
system is in recovery mode
2018-03-15 14:26:41.746 AEDT [9252] mark@postgres FATAL: the database
system is in recovery mode
2018-03-15 14:26:44.778 AEDT [9244] LOG: database system was not properly
shut down; automatic recovery in progress
2018-03-15 14:26:44.798 AEDT [9244] LOG: redo starts at 0/56782CE0
2018-03-15 14:26:44.798 AEDT [9244] LOG: invalid record length at
0/56782D18: wanted 24, got 0
2018-03-15 14:26:44.798 AEDT [9244] LOG: redo done at 0/56782CE0
2018-03-15 14:26:44.870 AEDT [7120] LOG: database system is ready to
accept connections

System

ubuntu-16.4 LTS

mark-thinkpad-x230
description: Notebook
product: 2306CTO (LENOVO_MT_2306)
vendor: LENOVO
version: ThinkPad X230
serial: PK12NFA
width: 64 bits
capabilities: smbios-2.7 dmi-2.7 vsyscall32
configuration: administrator_password=disabled chassis=notebook
family=ThinkPad X230 power-on_password=disabled sku=LENOVO_MT_2306
uuid=814592E2-9652-CB11-A65F-83069E2A672A
*-core
description: Motherboard
product: 2306CTO
vendor: LENOVO
physical id: 0
version: Not Defined
serial: 1ZST039410C
slot: Not Available
*-cpu
description: CPU
product: Intel(R) Core(TM) i3-3130M CPU @ 2.60GHz
vendor: Intel Corp.
physical id: 1
bus info: cpu@0
version: Intel(R) Core(TM) i3-3130M CPU @ 2.60GHz
serial: None
slot: CPU Socket - U3E1
size: 2208MHz
capacity: 2600MHz
width: 64 bits
clock: 100MHz
capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae
mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse
sse2 ss ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts
rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq
dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2
x2apic popcnt tsc_deadline_timer xsave avx f16c lahf_lm epb retpoline
kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt
dtherm arat pln pts cpufreq
configuration: cores=2 enabledcores=2 threads=4

P.S. Please feel free to contact me for more information.

Regards

Mark Magnus
Software Developer

Impact Data Pty Ltd
Data Driven. Direct. Digital.

a: Level 1/428 Little Burke Street, Melbourne, VIC 3000
e: mark.magnus@impactdata.com.au <jay@impactdata.com.au>
w: www.impactdata.com.au

#2Fabio Pardi
f.pardi@portavita.eu
In reply to: Mark Magnus (#1)
Re: Postgres 10.3 delete on partition table cannot execute, out of memory.

Hi Mark,

To me it looks like OOM was triggered.

Could you post to us a bit more info?

like how much RAM on the machine, Postgres memory settings (like work_mem shared buffers maintenance_work_mem , max_connections), what was happening on the DB at the time of the incident and linux logs (if OOM was triggered, then is likely you will find something like that), please?

Regards,

Fabio Pardi

Show quoted text

On 03/15/2018 05:15 AM, Mark Magnus wrote:

Bug

Partitioned table has 7202 partitions. No partition contains more than 50 records. Partitioning is done on a foreign key. Any delete operation i.e.
"delete from contacts where id = ?" or
"delete from contacts where id = ? and account_id = ?" or
"delete from contacts where account_id = ?"
results in out of memory condition.

Default Postgres Configuration with exception
max_locks_per_transaction = 1024

Postgres Logs

2018-03-15 14:26:40.340 AEDT [7120] LOG:  server process (PID 8177) was terminated by signal 9: Killed
2018-03-15 14:26:40.340 AEDT [7120] DETAIL:  Failed process was running: delete from contacts where id = 82398 and account_id = 9000
2018-03-15 14:26:40.354 AEDT [7120] LOG:  terminating any other active server processes
2018-03-15 14:26:40.367 AEDT [3821] WARNING:  terminating connection because of crash of another server process
2018-03-15 14:26:40.367 AEDT [3821] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-15 14:26:40.367 AEDT [3821] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres WARNING:  terminating connection because of crash of another server process
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development WARNING:  terminating connection because of crash of another server process
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.569 AEDT [7120] LOG:  all server processes terminated; reinitializing
2018-03-15 14:26:40.639 AEDT [9244] LOG:  database system was interrupted; last known up at 2018-03-15 13:08:47 AEDT
2018-03-15 14:26:41.745 AEDT [9251] mark@postgres FATAL:  the database system is in recovery mode
2018-03-15 14:26:41.746 AEDT [9252] mark@postgres FATAL:  the database system is in recovery mode
2018-03-15 14:26:44.778 AEDT [9244] LOG:  database system was not properly shut down; automatic recovery in progress
2018-03-15 14:26:44.798 AEDT [9244] LOG:  redo starts at 0/56782CE0
2018-03-15 14:26:44.798 AEDT [9244] LOG:  invalid record length at 0/56782D18: wanted 24, got 0
2018-03-15 14:26:44.798 AEDT [9244] LOG:  redo done at 0/56782CE0
2018-03-15 14:26:44.870 AEDT [7120] LOG:  database system is ready to accept connections

System

ubuntu-16.4 LTS

mark-thinkpad-x230       
    description: Notebook
    product: 2306CTO (LENOVO_MT_2306)
    vendor: LENOVO
    version: ThinkPad X230
    serial: PK12NFA
    width: 64 bits
    capabilities: smbios-2.7 dmi-2.7 vsyscall32
    configuration: administrator_password=disabled chassis=notebook family=ThinkPad X230 power-on_password=disabled sku=LENOVO_MT_2306 uuid=814592E2-9652-CB11-A65F-83069E2A672A
  *-core
       description: Motherboard
       product: 2306CTO
       vendor: LENOVO
       physical id: 0
       version: Not Defined
       serial: 1ZST039410C
       slot: Not Available
     *-cpu
          description: CPU
          product: Intel(R) Core(TM) i3-3130M CPU @ 2.60GHz
          vendor: Intel Corp.
          physical id: 1
          bus info: cpu@0
          version: Intel(R) Core(TM) i3-3130M CPU @ 2.60GHz
          serial: None
          slot: CPU Socket - U3E1
          size: 2208MHz
          capacity: 2600MHz
          width: 64 bits
          clock: 100MHz
          capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx f16c lahf_lm epb retpoline kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts cpufreq
          configuration: cores=2 enabledcores=2 threads=4

P.S. Please feel free to contact me for more information.

Regards

Mark Magnus
Software Developer 

Impact Data Pty Ltd
Data Driven. Direct. Digital.

a:  Level 1/428 Little Burke Street, Melbourne, VIC 3000
e:  mark.magnus@impactdata.com.au <mailto:jay@impactdata.com.au>
w:  www.impactdata.com.au <http://www.impactdata.com.au/&gt;

#3Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Mark Magnus (#1)
Re: Postgres 10.3 delete on partition table cannot execute, out of memory.

Hi.

On 2018/03/15 13:15, Mark Magnus wrote:

Bug

Partitioned table has 7202 partitions. No partition contains more than 50
records. Partitioning is done on a foreign key. Any delete operation i.e.
"delete from contacts where id = ?" or
"delete from contacts where id = ? and account_id = ?" or
"delete from contacts where account_id = ?"
results in out of memory condition.

Default Postgres Configuration with exception
max_locks_per_transaction = 1024

As Fabio says, it seems most likely that OOM was triggered while planning
the delete operation. I can reproduce OOM being triggered on my modest
development machine, so perhaps that's what's happening in your case too.

This is unfortunately expected, given that the underlying planning
mechanism cannot cope beyond a few hundred partitions. :-( See a relevant
note in the documentation; last line of the page at this link:
https://www.postgresql.org/docs/devel/static/ddl-partitioning.html.

Until things improve in that area, one workaround might be to perform the
delete operation directly on the partition, as it's possible to do that.
Or redesign your schema to use less number of partitions.

Thanks,
Amit