large numbers of inserts out of memory strategy
I'm writing a migration utility to move data from a non-RDBMS data
source to a postgres db. Currently I'm generating SQL INSERT
statements involving 6 related tables for each 'thing'. With 100k or
more 'things' to migrate I'm generating a lot of statements, and when I
try to import using psql, postgres fails with 'out of memory' when
running on a Linux VM with 4G of memory. If I break the import into smaller
chunks, say ~50K statements, then it succeeds. I can change my
migration utility to generate multiple files, each with a limited
number of INSERTs, to get around this issue, but maybe there's
another/better way?
Ted
On Nov 28, 2017, at 10:17 AM, Ted Toth <txtoth@gmail.com> wrote:
I'm writing a migration utility to move data from a non-RDBMS data
source to a postgres db. Currently I'm generating SQL INSERT
statements involving 6 related tables for each 'thing'. With 100k or
more 'things' to migrate I'm generating a lot of statements, and when I
try to import using psql, postgres fails with 'out of memory' when
running on a Linux VM with 4G of memory. If I break the import into smaller
chunks, say ~50K statements, then it succeeds. I can change my
migration utility to generate multiple files, each with a limited
number of INSERTs, to get around this issue, but maybe there's
another/better way?
Ted
What tools / languages are you using?
Hi,
On 11/28/2017 06:17 PM, Ted Toth wrote:
I'm writing a migration utility to move data from a non-RDBMS data
source to a postgres db. Currently I'm generating SQL INSERT
statements involving 6 related tables for each 'thing'. With 100k or
more 'things' to migrate I'm generating a lot of statements, and when I
try to import using psql, postgres fails with 'out of memory' when
running on a Linux VM with 4G of memory. If I break the import into smaller
chunks, say ~50K statements, then it succeeds. I can change my
migration utility to generate multiple files, each with a limited
number of INSERTs, to get around this issue, but maybe there's
another/better way?
The question is what exactly runs out of memory, and how did you modify
the configuration (particularly related to memory).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Nov 28, 2017 at 11:19 AM, Rob Sargent <robjsargent@gmail.com> wrote:
On Nov 28, 2017, at 10:17 AM, Ted Toth <txtoth@gmail.com> wrote:
I'm writing a migration utility to move data from a non-RDBMS data
source to a postgres db. Currently I'm generating SQL INSERT
statements involving 6 related tables for each 'thing'. With 100k or
more 'things' to migrate I'm generating a lot of statements, and when I
try to import using psql, postgres fails with 'out of memory' when
running on a Linux VM with 4G of memory. If I break the import into smaller
chunks, say ~50K statements, then it succeeds. I can change my
migration utility to generate multiple files, each with a limited
number of INSERTs, to get around this issue, but maybe there's
another/better way?
Ted
What tools / languages are you using?
I'm using python to read binary source files and create the text files
containing the SQL. Then I'm running psql -f <file containing SQL>.
On Tue, Nov 28, 2017 at 11:22 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Hi,
On 11/28/2017 06:17 PM, Ted Toth wrote:
I'm writing a migration utility to move data from a non-RDBMS data
source to a postgres db. Currently I'm generating SQL INSERT
statements involving 6 related tables for each 'thing'. With 100k or
more 'things' to migrate I'm generating a lot of statements, and when I
try to import using psql, postgres fails with 'out of memory' when
running on a Linux VM with 4G of memory. If I break the import into smaller
chunks, say ~50K statements, then it succeeds. I can change my
migration utility to generate multiple files, each with a limited
number of INSERTs, to get around this issue, but maybe there's
another/better way?
The question is what exactly runs out of memory, and how did you modify
the configuration (particularly related to memory).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I'm pretty new to postgres so I haven't changed any configuration
settings, and the log is a bit hard for me to make sense of :(
On 11/28/2017 10:50 AM, Ted Toth wrote:
On Tue, Nov 28, 2017 at 11:19 AM, Rob Sargent <robjsargent@gmail.com> wrote:
On Nov 28, 2017, at 10:17 AM, Ted Toth <txtoth@gmail.com> wrote:
I'm writing a migration utility to move data from a non-RDBMS data
source to a postgres db. Currently I'm generating SQL INSERT
statements involving 6 related tables for each 'thing'. With 100k or
more 'things' to migrate I'm generating a lot of statements, and when I
try to import using psql, postgres fails with 'out of memory' when
running on a Linux VM with 4G of memory. If I break the import into smaller
chunks, say ~50K statements, then it succeeds. I can change my
migration utility to generate multiple files, each with a limited
number of INSERTs, to get around this issue, but maybe there's
another/better way?
Ted
What tools / languages are you using?
I'm using python to read binary source files and create the text files
containing the SQL. Then I'm running psql -f <file containing SQL>.
If you're going out to the file system, I would use COPY of CSV files
(if the number of records per table is non-trivial). Is there any bulk
loading available from python?
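For example, a minimal psycopg2 sketch of that kind of bulk load might look like this (the file name, delimiter, and connection parameters are placeholders; the column list is taken from the INSERT shown later in the thread):

import psycopg2

# Sketch only: assumes a tab-delimited file whose columns line up with the
# target table; connection parameters are placeholders.
conn = psycopg2.connect(dbname="mydb", user="me")
with conn, conn.cursor() as cur, open("thing.tsv") as f:
    # copy_from streams the file to the server via COPY (text format), so the
    # whole data set never has to be parsed as one giant SQL script.
    cur.copy_from(f, "thing", columns=(
        "ltn", "classification", "machine", "source",
        "thgrec", "flags", "serial", "type"))
conn.close()

COPY processes rows as they stream in, so server-side memory use doesn't grow with the size of the file.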
On 11/28/2017 06:54 PM, Ted Toth wrote:
On Tue, Nov 28, 2017 at 11:22 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Hi,
On 11/28/2017 06:17 PM, Ted Toth wrote:
I'm writing a migration utility to move data from a non-RDBMS data
source to a postgres db. Currently I'm generating SQL INSERT
statements involving 6 related tables for each 'thing'. With 100k or
more 'things' to migrate I'm generating a lot of statements, and when I
try to import using psql, postgres fails with 'out of memory' when
running on a Linux VM with 4G of memory. If I break the import into smaller
chunks, say ~50K statements, then it succeeds. I can change my
migration utility to generate multiple files, each with a limited
number of INSERTs, to get around this issue, but maybe there's
another/better way?
The question is what exactly runs out of memory, and how did you modify
the configuration (particularly related to memory).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I'm pretty new to postgres so I haven't changed any configuration
settings, and the log is a bit hard for me to make sense of :(
The most interesting part of the log is this:
SPI Proc: 2464408024 total in 279 blocks; 1672 free (1 chunks);
2464406352 used
PL/pgSQL function context: 537911352 total in 74 blocks; 2387536
free (4 chunks); 535523816 used
That is, most of the memory is allocated for SPI (2.4GB) and PL/pgSQL
procedure (500MB). How do you do the load? What libraries/drivers?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 28 Nov 2017 11:17:07 -0600
Ted Toth <txtoth@gmail.com> wrote:
I'm writing a migration utility to move data from a non-RDBMS data
source to a postgres db. Currently I'm generating SQL INSERT
statements involving 6 related tables for each 'thing'. With 100k or
more 'things' to migrate I'm generating a lot of statements, and when I
try to import using psql, postgres fails with 'out of memory' when
running on a Linux VM with 4G of memory. If I break the import into smaller
chunks, say ~50K statements, then it succeeds. I can change my
migration utility to generate multiple files, each with a limited
number of INSERTs, to get around this issue, but maybe there's
another/better way?
Chunking the loads has a number of advantages other than avoiding
OOM errors, not the least of which are possibly parallel loading
and being able to restart after other failures without re-processing
all of the input data.
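A rough sketch of that chunked generation in python (the 50K figure echoes the size that already worked for you; the file-name pattern is arbitrary):

CHUNK_SIZE = 50000

def write_chunks(statements, prefix="migration"):
    """Write an iterable of SQL statement strings into numbered files of at
    most CHUNK_SIZE statements each, so that each psql run stays small."""
    out = None
    count = 0
    chunk = 0
    for stmt in statements:
        if out is None or count >= CHUNK_SIZE:
            if out is not None:
                out.close()
            chunk += 1
            count = 0
            out = open("%s_%04d.sql" % (prefix, chunk), "w")
        out.write(stmt.rstrip().rstrip(";") + ";\n")
        count += 1
    if out is not None:
        out.close()

Each numbered file can then be run with psql -f, re-run individually after a failure, or loaded in parallel.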
Note that 4GiB of core is not all that much by today's standards.
You might want to run top/vmstat and ask if the PG daemons are using
all/most of the available memory. If PG is sucking up all of the core
as it is then tuning the database may not have much of an effect; if
there is lots of spare memory then it'll be worth looking at ways to
tune PG.
Note also that "out of memory" frequently means virtual memory.
Q: Does the VM have swap configured?
If not then add 8GiB and see if that solves your problem; if so then
how much swap is in use when you get the OOM error?
--
Steven Lembark 1505 National Ave
Workhorse Computing Rockford, IL 61103
lembark@wrkhors.com +1 888 359 3508
On Tue, Nov 28, 2017 at 12:04 PM, Steven Lembark <lembark@wrkhors.com> wrote:
On Tue, 28 Nov 2017 11:17:07 -0600
Ted Toth <txtoth@gmail.com> wrote:
I'm writing a migration utility to move data from a non-RDBMS data
source to a postgres db. Currently I'm generating SQL INSERT
statements involving 6 related tables for each 'thing'. With 100k or
more 'things' to migrate I'm generating a lot of statements, and when I
try to import using psql, postgres fails with 'out of memory' when
running on a Linux VM with 4G of memory. If I break the import into smaller
chunks, say ~50K statements, then it succeeds. I can change my
migration utility to generate multiple files, each with a limited
number of INSERTs, to get around this issue, but maybe there's
another/better way?
Chunking the loads has a number of advantages other than avoiding
OOM errors, not the least of which are possibly parallel loading
and being able to restart after other failures without re-processing
all of the input data.
Note that 4GiB of core is not all that much by today's standards.
You might want to run top/vmstat and ask if the PG daemons are using
all/most of the available memory. If PG is sucking up all of the core
as it is then tuning the database may not have much of an effect; if
there is lots of spare memory then it'll be worth looking at ways to
tune PG.
Note also that "out of memory" frequently means virtual memory.
Q: Does the VM have swap configured?
If not then add 8GiB and see if that solves your problem; if so then
how much swap is in use when you get the OOM error?
--
Steven Lembark 1505 National Ave
Workhorse Computing Rockford, IL 61103
lembark@wrkhors.com +1 888 359 3508
I understand that 4G is not much ... Yeah, in top I see the postmaster
process RES grow until it fails. The VM is basically a CentOS 6 box
with 4G of swap.
On Tue, Nov 28, 2017 at 12:01 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
On 11/28/2017 06:54 PM, Ted Toth wrote:
On Tue, Nov 28, 2017 at 11:22 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Hi,
On 11/28/2017 06:17 PM, Ted Toth wrote:
I'm writing a migration utility to move data from a non-RDBMS data
source to a postgres db. Currently I'm generating SQL INSERT
statements involving 6 related tables for each 'thing'. With 100k or
more 'things' to migrate I'm generating a lot of statements, and when I
try to import using psql, postgres fails with 'out of memory' when
running on a Linux VM with 4G of memory. If I break the import into smaller
chunks, say ~50K statements, then it succeeds. I can change my
migration utility to generate multiple files, each with a limited
number of INSERTs, to get around this issue, but maybe there's
another/better way?
The question is what exactly runs out of memory, and how did you modify
the configuration (particularly related to memory).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I'm pretty new to postgres so I haven't changed any configuration
settings, and the log is a bit hard for me to make sense of :(
The most interesting part of the log is this:
SPI Proc: 2464408024 total in 279 blocks; 1672 free (1 chunks);
2464406352 used
PL/pgSQL function context: 537911352 total in 74 blocks; 2387536
free (4 chunks); 535523816 used
That is, most of the memory is allocated for SPI (2.4GB) and PL/pgSQL
procedure (500MB). How do you do the load? What libraries/drivers?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I'm doing the load with 'psql -f'. I'm using the 9.6 el6 RPMs, which I
downloaded from the postgres repo, on a CentOS VM.
On 11/28/2017 07:26 PM, Ted Toth wrote:
On Tue, Nov 28, 2017 at 12:01 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
...
That is, most of the memory is allocated for SPI (2.4GB) and PL/pgSQL
procedure (500MB). How do you do the load? What libraries/drivers?
I'm doing the load with 'psql -f'. I'm using the 9.6 el6 RPMs, which I
downloaded from the postgres repo, on a CentOS VM.
So what does the script actually do? Because psql certainly is not
running pl/pgsql procedures on its own. We need to understand why
you're getting OOM in the first place - just inserts alone should not
cause failures like that. Please show us more detailed explanation of
what the load actually does, so that we can try reproducing it.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Nov 28, 2017 at 12:38 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
So what does the script actually do? Because psql certainly is not
running pl/pgsql procedures on its own. We need to understand why
you're getting OOM in the first place - just inserts alone should not
cause failures like that. Please show us more detailed explanation of
what the load actually does, so that we can try reproducing it.
Perhaps the script is one giant insert statement?
—Brian
Brian Crowell <brian@fluggo.com> writes:
On Tue, Nov 28, 2017 at 12:38 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
So what does the script actually do? Because psql certainly is not
running pl/pgsql procedures on its own. We need to understand why
you're getting OOM in the first place - just inserts alone should not
cause failures like that. Please show us more detailed explanation of
what the load actually does, so that we can try reproducing it.
Perhaps the script is one giant insert statement?
It's pretty clear from the memory map that the big space consumption
is inside a single invocation of a plpgsql function:
SPI Proc: 2464408024 total in 279 blocks; 1672 free (1 chunks); 2464406352 used
PL/pgSQL function context: 537911352 total in 74 blocks; 2387536 free (4 chunks); 535523816 used
So whatever's going on here, there's more to it than a giant client-issued
INSERT (or COPY), or for that matter a large number of small ones. What
would seem to be required is a many-megabyte-sized plpgsql function body
or DO block.
Actually, the truly weird thing about that map is that the "PL/pgSQL
function context" seems to be a child of a "SPI Proc" context, whereas
it's entirely clear from the code that it ought to be a direct child of
TopMemoryContext. I have no idea how this state of affairs came to be,
and am interested to find out.
regards, tom lane
On Tue, Nov 28, 2017 at 9:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Brian Crowell <brian@fluggo.com> writes:
On Tue, Nov 28, 2017 at 12:38 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
So what does the script actually do? Because psql certainly is not
running pl/pgsql procedures on its own. We need to understand why
you're getting OOM in the first place - just inserts alone should not
cause failures like that. Please show us more detailed explanation of
what the load actually does, so that we can try reproducing it.
Perhaps the script is one giant insert statement?
It's pretty clear from the memory map that the big space consumption
is inside a single invocation of a plpgsql function:
SPI Proc: 2464408024 total in 279 blocks; 1672 free (1 chunks); 2464406352 used
PL/pgSQL function context: 537911352 total in 74 blocks; 2387536 free (4 chunks); 535523816 used
So whatever's going on here, there's more to it than a giant client-issued
INSERT (or COPY), or for that matter a large number of small ones. What
would seem to be required is a many-megabyte-sized plpgsql function body
or DO block.
Actually, the truly weird thing about that map is that the "PL/pgSQL
function context" seems to be a child of a "SPI Proc" context, whereas
it's entirely clear from the code that it ought to be a direct child of
TopMemoryContext. I have no idea how this state of affairs came to be,
and am interested to find out.
regards, tom lane
Yes I did generate 1 large DO block:
DO $$
DECLARE thingid bigint; thingrec bigint; thingdataid bigint;
BEGIN
INSERT INTO thing
(ltn,classification,machine,source,thgrec,flags,serial,type) VALUES
('T007336','THING',0,1025,7336,7,'XXX869977564',1) RETURNING id,thgrec
INTO thingid,thingrec;
INSERT INTO recnum_thing (recnum,thing_id) VALUES (thingrec,thingid);
INSERT INTO thingstatus
(thing_id,nrpts,rmks_cs,force_type_id,ftn_cs,force_code,arr_cs,mask,toi_state,plot_id,signa_cs,lastchange,des_cs,rig_cs,ownship,correlation,maxrpts,rtn_cs,ctc_cs,group_mask,dep_cs)
VALUES (thingid,121,'{0,0,0,0}',440,0,23,0,0,0,'{23,-1,3803,3805,-1,-1,0,6}',0,1509459164,0,0,0,0,1000,0,0,0,0);
INSERT INTO thinger
(thing_id,spe_key,cse_unc,lat_spd,cov,dtg,lng,spd,ave_spd,cse,tol,nrpts,lat,alpha,sigma,spd_unc,lng_spd)
VALUES (thingid,-1,0.0,-6.58197336634e-08,'{4.27624291532e-09,0.0,3.07802916488e-09,0.0,4.27624291532e-09,0.0,3.07802916488e-09,4.16110417234e-08,0.0,4.16110417234e-08}',1509459163,2.21596980095,0.000226273215958,1.0,0.0,0.10000000149,121,0.584555745125,10.0,4.23079740131e-08,0.0,-2.49999881907e-10);
INSERT INTO thingdata
(thing_id,category,db_num,xref,org_type,trademark,shortname,fcode,platform,callsign,type,orig_xref,shipclass,home_base,uic,service,di,lngfixed,hull,precision,alert,flag,besufx,name,mmsi,catcode,ntds,imo,pn_num,chxref,threat,sconum,latfixed,db_type,pif,echelon,jtn,quantity,overwrite)
VALUES (thingid,'XXX','','','','','004403704','23','','','','','UNEQUATED','','','','',0.0,'','{0,0,0,0,0}','','KS','','UNKNOWN','004403704','','','','','','AFD','',0.0,3,'','',0,0,0)
RETURNING id INTO thingdataid;
INSERT INTO thingnum (thingdata_id,thgnum,state,dtg,cmd) VALUES
(thingdataid,'013086',0,1502970401,'FOO');
<repeated for each thing>
END $$;
Should I limit the number of 'thing' inserts within a DO block, or
wrap each 'thing' insert in its own DO block?
Ted Toth <txtoth@gmail.com> writes:
On Tue, Nov 28, 2017 at 9:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So whatever's going on here, there's more to it than a giant client-issued
INSERT (or COPY), or for that matter a large number of small ones. What
would seem to be required is a many-megabyte-sized plpgsql function body
or DO block.
Yes I did generate 1 large DO block:
Apparently by "large" you mean "hundreds of megabytes". Don't do that,
at least not on a machine that hasn't got hundreds of megabytes to spare.
The entire thing has to be sucked into memory and parsed before anything
will happen.
regards, tom lane
What tools / languages are you using?
I'm using python to read binary source files and create the text files
containing the SQL. Then I'm running psql -f <file containing SQL>.
Then chunking the input should be trivial.
There are a variety of techniques you can use to do things like disable
indexes during loading, etc. Maybe load them into temp tables and
then insert the temps into the destination tables. The point is to
amortize the memory load over the entire load period.
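A rough psycopg2 sketch of the temp-table idea (the table, file, and column names are just placeholders; each real table would need the same treatment):

import psycopg2

# Sketch only: stage rows in a temp table via COPY, then move them into the
# real table with one set-based INSERT ... SELECT. Connection details are
# placeholders.
conn = psycopg2.connect(dbname="mydb", user="me")
with conn, conn.cursor() as cur, open("thing.tsv") as f:
    cur.execute("CREATE TEMP TABLE thing_stage (LIKE thing INCLUDING DEFAULTS)")
    # Illustrative column list only; the real load would use the full set.
    cur.copy_from(f, "thing_stage", columns=("ltn", "thgrec"))
    cur.execute("INSERT INTO thing (ltn, thgrec) SELECT ltn, thgrec FROM thing_stage")
conn.close()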
--
Steven Lembark 1505 National Ave
Workhorse Computing Rockford, IL 61103
lembark@wrkhors.com +1 888 359 3508
I'm pretty new to postgres so I haven't changed any configuration
settings, and the log is a bit hard for me to make sense of :(
Diving into the shark tank is a helluva way to learn how to swim :-)
Are you interested in finding docs on how to deal with the tuning?
--
Steven Lembark 1505 National Ave
Workhorse Computing Rockford, IL 61103
lembark@wrkhors.com +1 888 359 3508
On 28/11/17, Rob Sargent (robjsargent@gmail.com) wrote:
On 11/28/2017 10:50 AM, Ted Toth wrote:
On Tue, Nov 28, 2017 at 11:19 AM, Rob Sargent <robjsargent@gmail.com> wrote:
On Nov 28, 2017, at 10:17 AM, Ted Toth <txtoth@gmail.com> wrote:
I'm writing a migration utility to move data from a non-RDBMS data
source to a postgres db. Currently I'm generating SQL INSERT
statements involving 6 related tables for each 'thing'. With 100k or
more 'things' to migrate I'm generating a lot of statements, and when I
try to import using psql, postgres fails with 'out of memory' when
running on a Linux VM with 4G of memory. If I break the import into smaller
chunks, say ~50K statements, then it succeeds. I can change my
migration utility to generate multiple files, each with a limited
number of INSERTs, to get around this issue, but maybe there's
another/better way?
What tools / languages are you using?
I'm using python to read binary source files and create the text files
containing the SQL. Then I'm running psql -f <file containing SQL>.
If you're going out to the file system, I would use COPY of CSV files
(if the number of records per table is non-trivial). Is there any bulk
loading available from python?
psycopg2 has a copy_from function and (possibly pertinent in this case) a
copy_expert function which allows the read buffer size to be specified.
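For instance (a sketch only; the file name, column list, and buffer size are placeholders):

import psycopg2

conn = psycopg2.connect(dbname="mydb", user="me")
with conn, conn.cursor() as cur, open("recnum_thing.csv") as f:
    # copy_expert takes the full COPY statement, so real CSV parsing is done
    # server-side, and 'size' sets the read buffer in bytes.
    cur.copy_expert(
        "COPY recnum_thing (recnum, thing_id) FROM STDIN WITH (FORMAT csv)",
        f, size=65536)
conn.close()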
On 2017-11-29 08:32:02 -0600, Ted Toth wrote:
Yes I did generate 1 large DO block:
DO $$
DECLARE thingid bigint; thingrec bigint; thingdataid bigint;
BEGIN
INSERT INTO thing
(ltn,classification,machine,source,thgrec,flags,serial,type) VALUES
('T007336','THING',0,1025,7336,7,'XXX869977564',1) RETURNING id,thgrec
INTO thingid,thingrec;
INSERT INTO recnum_thing (recnum,thing_id) VALUES (thingrec,thingid);
INSERT INTO thingstatus
(thing_id,nrpts,rmks_cs,force_type_id,ftn_cs,force_code,arr_cs,mask,toi_state,plot_id,signa_cs,lastchange,des_cs,rig_cs,ownship,correlation,maxrpts,rtn_cs,ctc_cs,group_mask,dep_cs)
VALUES (thingid,121,'{0,0,0,0}',440,0,23,0,0,0,'{23,-1,3803,3805,-1,-1,0,6}',0,1509459164,0,0,0,0,1000,0,0,0,0);
INSERT INTO thinger
(thing_id,spe_key,cse_unc,lat_spd,cov,dtg,lng,spd,ave_spd,cse,tol,nrpts,lat,alpha,sigma,spd_unc,lng_spd)
VALUES (thingid,-1,0.0,-6.58197336634e-08,'{4.27624291532e-09,0.0,3.07802916488e-09,0.0,4.27624291532e-09,0.0,3.07802916488e-09,4.16110417234e-08,0.0,4.16110417234e-08}',1509459163,2.21596980095,0.000226273215958,1.0,0.0,0.10000000149,121,0.584555745125,10.0,4.23079740131e-08,0.0,-2.49999881907e-10);
INSERT INTO thingdata
(thing_id,category,db_num,xref,org_type,trademark,shortname,fcode,platform,callsign,type,orig_xref,shipclass,home_base,uic,service,di,lngfixed,hull,precision,alert,flag,besufx,name,mmsi,catcode,ntds,imo,pn_num,chxref,threat,sconum,latfixed,db_type,pif,echelon,jtn,quantity,overwrite)
VALUES (thingid,'XXX','','','','','004403704','23','','','','','UNEQUATED','','','','',0.0,'','{0,0,0,0,0}','','KS','','UNKNOWN','004403704','','','','','','AFD','',0.0,3,'','',0,0,0)
RETURNING id INTO thingdataid;
INSERT INTO thingnum (thingdata_id,thgnum,state,dtg,cmd) VALUES
(thingdataid,'013086',0,1502970401,'FOO');
<repeated for each thing>
END $$;
Should I limit the number of 'thing' inserts within a DO block, or
wrap each 'thing' insert in its own DO block?
I would suggest getting rid of the do block entirely if that is
possible. Just create lots of insert statements. You can get the current
value of a sequence with currval('sequence_name').
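For example, the generating utility could emit something like this instead of a DO block (a sketch only; the sequence names assume the default serial naming, which is a guess about the actual schema, and the column lists are trimmed down):

def emit_one_thing(out, ltn, thgrec, recnum, thgnum):
    # Plain INSERTs; currval() picks up the id just assigned by the previous
    # INSERT in the same session, so no PL/pgSQL variables are needed.
    out.write("INSERT INTO thing (ltn, thgrec) VALUES ('%s', %d);\n"
              % (ltn, thgrec))
    out.write("INSERT INTO recnum_thing (recnum, thing_id) "
              "VALUES (%d, currval('thing_id_seq'));\n" % recnum)
    out.write("INSERT INTO thingdata (thing_id, category) "
              "VALUES (currval('thing_id_seq'), 'XXX');\n")
    out.write("INSERT INTO thingnum (thingdata_id, thgnum) "
              "VALUES (currval('thingdata_id_seq'), '%s');\n" % thgnum)

with open("things.sql", "w") as out:
    emit_one_thing(out, "T007336", 7336, 7336, "013086")

Since psql then sends each statement separately, nothing the size of the whole migration has to be parsed in backend memory at once.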
Alternately or in addition, since you are using python, you might want
to insert directly into the database from python using psycopg2. For
separate insert statements that should have about the same performance.
(It is usually much faster to write to a csv file and load that with
copy than to insert each row, but you don't do that and it might be
difficult in your case).
hp
--
_ | Peter J. Holzer | we build much bigger, better disasters now
|_|_) | | because we have much more sophisticated
| | | hjp@hjp.at | management tools.
__/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
On Thu, Nov 30, 2017 at 4:22 AM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2017-11-29 08:32:02 -0600, Ted Toth wrote:
Yes I did generate 1 large DO block:
DO $$
DECLARE thingid bigint; thingrec bigint; thingdataid bigint;
BEGIN
INSERT INTO thing
(ltn,classification,machine,source,thgrec,flags,serial,type) VALUES
('T007336','THING',0,1025,7336,7,'XXX869977564',1) RETURNING id,thgrec
INTO thingid,thingrec;
INSERT INTO recnum_thing (recnum,thing_id) VALUES (thingrec,thingid);
INSERT INTO thingstatus
(thing_id,nrpts,rmks_cs,force_type_id,ftn_cs,force_code,arr_cs,mask,toi_state,plot_id,signa_cs,lastchange,des_cs,rig_cs,ownship,correlation,maxrpts,rtn_cs,ctc_cs,group_mask,dep_cs)
VALUES (thingid,121,'{0,0,0,0}',440,0,23,0,0,0,'{23,-1,3803,3805,-1,-1,0,6}',0,1509459164,0,0,0,0,1000,0,0,0,0);
INSERT INTO thinger
(thing_id,spe_key,cse_unc,lat_spd,cov,dtg,lng,spd,ave_spd,cse,tol,nrpts,lat,alpha,sigma,spd_unc,lng_spd)
VALUES (thingid,-1,0.0,-6.58197336634e-08,'{4.27624291532e-09,0.0,3.07802916488e-09,0.0,4.27624291532e-09,0.0,3.07802916488e-09,4.16110417234e-08,0.0,4.16110417234e-08}',1509459163,2.21596980095,0.000226273215958,1.0,0.0,0.10000000149,121,0.584555745125,10.0,4.23079740131e-08,0.0,-2.49999881907e-10);
INSERT INTO thingdata
(thing_id,category,db_num,xref,org_type,trademark,shortname,fcode,platform,callsign,type,orig_xref,shipclass,home_base,uic,service,di,lngfixed,hull,precision,alert,flag,besufx,name,mmsi,catcode,ntds,imo,pn_num,chxref,threat,sconum,latfixed,db_type,pif,echelon,jtn,quantity,overwrite)
VALUES (thingid,'XXX','','','','','004403704','23','','','','','UNEQUATED','','','','',0.0,'','{0,0,0,0,0}','','KS','','UNKNOWN','004403704','','','','','','AFD','',0.0,3,'','',0,0,0)
RETURNING id INTO thingdataid;
INSERT INTO thingnum (thingdata_id,thgnum,state,dtg,cmd) VALUES
(thingdataid,'013086',0,1502970401,'FOO');
<repeated for each thing>
END $$;
Should I limit the number of 'thing' inserts within a DO block, or
wrap each 'thing' insert in its own DO block?
Thanks for the specific suggestions.
I would suggest getting rid of the do block entirely if that is
possible. Just create lots of insert statements. You can get the current
value of a sequence with currval('sequence_name').
What is the downside of using a DO block? I'd have to do a nextval on
each sequence before I could use currval, right? Or I could do 'select
last_value from <sequence>'.
One thing that is unclear to me is when commits occur while using psql;
would you know where in the docs I can find information on this
subject?
Alternately or in addition, since you are using python, you might want
to insert directly into the database from python using psycopg2. For
separate insert statements that should have about the same performance.
(It is usually much faster to write to a csv file and load that with
copy than to insert each row, but you don't do that and it might be
difficult in your case).
Yes, I thought about generating csv files but didn't see a way to deal
with the foreign keys.
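One possible way around that (a sketch, not something suggested above): reserve the ids from the sequence ahead of time and write them into both CSV files yourself, so the foreign keys line up by construction. The sequence name and row layout below are guesses:

import csv
import psycopg2

# Placeholder input; in the real utility these rows would come from the
# binary source files.
things = [("T007336", 7336), ("T007337", 7337)]

conn = psycopg2.connect(dbname="mydb", user="me")
with conn, conn.cursor() as cur:
    # Reserve one id per row up front from the (guessed) sequence name.
    cur.execute("SELECT nextval('thing_id_seq') FROM generate_series(1, %s)",
                (len(things),))
    ids = [row[0] for row in cur.fetchall()]
conn.close()

with open("thing.csv", "w", newline="") as t, \
     open("recnum_thing.csv", "w", newline="") as r:
    tw, rw = csv.writer(t), csv.writer(r)
    for thing_id, (ltn, thgrec) in zip(ids, things):
        tw.writerow([thing_id, ltn, thgrec])   # id column written explicitly
        rw.writerow([thgrec, thing_id])        # FK matches by construction

Both files could then be loaded with COPY, listing the id columns explicitly.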
hp
--
_ | Peter J. Holzer | we build much bigger, better disasters now
|_|_) | | because we have much more sophisticated
| | | hjp@hjp.at | management tools.
__/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>