Inserting streamed data

Started by Kevin Old over 23 years ago · 5 messages · general

kold@carolina.rr.com

over 23 years ago

Hello everyone,

I have data that is streamed to my server and stored in a text file. I
need to get that data into my database as fast as possible. There are
approximately 160,000 rows in this text file. I understand I can use
the COPY command to insert large chunks of data from a text file, but I
can't use it in this situation. Each record in the text file has 502
"fields". I pull out 50 of those. I haven't found a way to manipulate
the COPY command to pull out the values I need. So that solution would
be out.

I have a perl script that goes through the file and pulls out the 50
fields, then inserts them into the database, but it seems to be very
slow. I think I just need some minor performance tuning, but don't know
which variables to set in the postgresql.conf file that would help with
the speed of the inserts.

Here's my postgresql.conf file now:

max_connections = 10
shared_buffers = 20

I'm running Solaris 2.7 with 2 GB of RAM.

Also, I saw this at
http://developer.postgresql.org/docs/postgres/kernel-resources.html

[snip...]

Solaris

At least in version 2.6, the default maximum size of a shared
memory segment is too low for PostgreSQL. The relevant settings
can be changed in /etc/system, for example:

set shmsys:shminfo_shmmax=0x2000000
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=256
set shmsys:shminfo_shmseg=256

set semsys:seminfo_semmap=256
set semsys:seminfo_semmni=512
set semsys:seminfo_semmns=512
set semsys:seminfo_semmsl=32

[snip...]

Should I do this?
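
For scale, here is what the numbers quoted above work out to. This is only arithmetic on the values in this message, assuming PostgreSQL's default 8 kB block size (shared_buffers is a count of blocks in releases of this era). It is a sketch for orientation, not a tuning recommendation.

```python
# Assumption: default PostgreSQL block size of 8 kB (BLCKSZ = 8192).
BLOCK_SIZE = 8192

shared_buffers = 20    # value from the postgresql.conf shown above
shmmax = 0x2000000     # value from the quoted /etc/system example

# shared_buffers counts blocks, so multiply by the block size for bytes.
buffer_bytes = shared_buffers * BLOCK_SIZE

print(f"shared_buffers = {shared_buffers} -> {buffer_bytes // 1024} kB")
print(f"shmmax = {shmmax} bytes -> {shmmax // (1024 * 1024)} MB")
```

So the current buffer cache is 160 kB, while the shmmax in the documentation example would allow a shared memory segment of up to 32 MB.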

Thanks,
Kevin

--
Kevin Old <kold@carolina.rr.com>

Doug McNaught

doug@mcnaught.org

over 23 years ago

In reply to: Kevin Old (#1)

Re: Inserting streamed data

Kevin Old <kold@carolina.rr.com> writes:

I have data that is streamed to my server and stored in a text file. I
need to get that data into my database as fast as possible. There are
approximately 160,000 rows in this text file. I understand I can use
the COPY command to insert large chunks of data from a text file, but I
can't use it in this situation. Each record in the text file has 502
"fields". I pull out 50 of those. I haven't found a way to manipulate
the COPY command to pull out the values I need. So that solution would
be out.

I have a perl script that goes through the file and pulls out the 50
fields, then inserts them into the database, but it seems to be very
slow. I think I just need some minor performance tuning, but don't know
which variables to set in the postgresql.conf file that would help with
the speed of the inserts.

First: are you batching up multiple INSERTs in a transaction? If you
don't, every INSERT is committed on its own and it will be very slow
indeed.

Second, why not have the Perl script pull out the fields you want,
paste them together and feed them to COPY? That should eliminate the
parse overhead of multiple INSERTs.
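
A minimal sketch of the first point, written in Python rather than the Perl script mentioned in the thread. It wraps every batch of INSERT statements in BEGIN/COMMIT so that rows are not committed one at a time. The names `sql_literal` and `batched_inserts`, the table and column names, and the batch size of 1000 are all hypothetical. The hand-rolled quoting assumes standard_conforming_strings is on; a real loader would more likely use its driver's parameter binding.

```python
def sql_literal(value):
    """Quote a value as a SQL string literal; None becomes NULL.

    Assumption: standard_conforming_strings is on, so only single
    quotes need doubling.
    """
    if value is None:
        return "NULL"
    return "'" + str(value).replace("'", "''") + "'"


def batched_inserts(table, columns, rows, batch_size=1000):
    """Yield SQL statements with BEGIN/COMMIT around each batch of INSERTs."""
    col_list = ", ".join(columns)
    pending = 0
    for row in rows:
        if pending == 0:
            yield "BEGIN;"
        values = ", ".join(sql_literal(v) for v in row)
        yield f"INSERT INTO {table} ({col_list}) VALUES ({values});"
        pending += 1
        if pending == batch_size:
            yield "COMMIT;"
            pending = 0
    if pending:
        # Commit the final, partially filled batch.
        yield "COMMIT;"


# Hypothetical usage: three rows, two per transaction.
for stmt in batched_inserts("quotes", ["id", "symbol"],
                            [(1, "A"), (2, "B"), (3, None)], batch_size=2):
    print(stmt)
```

The output could be piped into psql. With 160,000 rows, a single transaction around the whole load is also an option.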

-Doug

Import Notes

Reply to msg id not found: Kevin Old's message of 31 Oct 2002 13:11:49 -0500

David Blood

david@matraex.com

over 23 years ago

In reply to: Doug McNaught (#2)

Re: Inserting streamed data

Why not use Perl, awk, or sed to rebuild the text file with only the
columns you want, in the order you want, and then COPY it in? This is
the only way we have found to insert large amounts of data quickly.
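
The rebuild step could look roughly like the sketch below, written in Python instead of Perl, awk, or sed. The thread does not say how the 502 fields are delimited, so the `|` delimiter, the function names, and the field positions are assumptions. The output follows COPY's default text format: tab-separated columns, with backslash, tab, newline, and carriage return escaped.

```python
def copy_escape(value):
    """Escape one field for COPY's default text format."""
    if value is None:
        return r"\N"  # COPY's default NULL marker
    return (value.replace("\\", "\\\\")  # backslash must be escaped first
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def rebuild_line(line, wanted, delimiter="|"):
    """Keep only the fields at the positions in `wanted`, in that order."""
    fields = line.rstrip("\n").split(delimiter)
    return "\t".join(copy_escape(fields[i]) for i in wanted)


def rebuild(infile, outfile, wanted, delimiter="|"):
    """Stream infile to outfile one line at a time, so memory use stays flat."""
    for line in infile:
        outfile.write(rebuild_line(line, wanted, delimiter) + "\n")


# Hypothetical usage: keep fields 3 and 0 (0-based) of each record.
print(rebuild_line("a|b|c|d\n", [3, 0]))
```

The rebuilt file could then be loaded with something like `COPY mytable FROM '/path/to/rebuilt.txt'` (table name and path are placeholders), or with psql's `\copy` when the file lives on the client. Empty fields stay empty strings here; mapping them to NULL would be a separate decision.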

David Blood
Matraex, Inc

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Kevin Old
Sent: Thursday, October 31, 2002 11:12 AM
To: pgsql
Subject: [GENERAL] Inserting streamed data

[snip...]

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

codeWarrior

GPatnude@adelphia.net

over 23 years ago

In reply to: Kevin Old (#1)

Re: Inserting streamed data

Does your table have an index? You can probably speed it up
significantly by:

Preparing the data file
Beginning a transaction
Dropping the index
Doing the 160,000 inserts
Rebuilding the index
Committing the transaction, which ends it
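
The steps above can be sketched as a small script generator. This is an illustration only: the function name and the table, index, and column names are hypothetical, and the load statements are passed in so that either INSERTs or a COPY can be used. Keep in mind that DROP INDEX locks the table, so other sessions using it will block until the COMMIT.

```python
def bulk_load_script(table, index, index_columns, load_statements):
    """Return SQL statements that drop an index, load data, and rebuild it.

    Everything runs inside one transaction, so a failed load rolls back
    and leaves the original index in place.
    """
    cols = ", ".join(index_columns)
    script = ["BEGIN;", f"DROP INDEX {index};"]
    script.extend(load_statements)  # the INSERTs (or a COPY) go here
    script.append(f"CREATE INDEX {index} ON {table} ({cols});")
    script.append("COMMIT;")
    return script


# Hypothetical usage with a single INSERT standing in for the 160,000.
for stmt in bulk_load_script(
        "quotes", "quotes_symbol_idx", ["symbol"],
        ["INSERT INTO quotes (symbol) VALUES ('A');"]):
    print(stmt)
```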

"Kevin Old" <kold@carolina.rr.com> wrote in message
news:1036087909.3123.54.camel@oc...

[snip...]

Csaba Nagy

nagy@domeus.de

over 23 years ago

In reply to: codeWarrior (#4)

Re: Inserting streamed data

Why don't you pull out the fields with the Perl script, write them to a
temporary file, and use COPY to import from that one?
Perl should be fast with the files, Postgres with the COPY...

Regards,
Csaba.

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Greg Patnude
Sent: Saturday, November 2, 2002 18:08
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Inserting streamed data

[snip...]
