Work table

Started by Robert James · over 12 years ago · 10 messages · general
#1 Robert James
srobertjames@gmail.com

I'm using Postgres for data analysis (interactive and batch). I need
to focus the analysis on a subset of one table, and, for both
performance and simplicity, have a function which loads that subset
into another table (DELETE FROM another_table; INSERT INTO
another_table SELECT ...).

Oddly enough, although the SELECT itself is very quick (< 1 s), the
DELETE and INSERT can take over a minute! I can't figure out why.
another_table is simple: it has only 7 fields. Two of those fields
are indexed, each with a standard single-column index. There are no
triggers on it.

What is the cause of this behavior? What should I do to make this
faster? Is there a recommended workaround?

(I'm hesitant to drop another_table and recreate it each time, since
many views depend on it.)

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2 Thomas Kellerer
spam_eater@gmx.net
In reply to: Robert James (#1)
Re: Work table

Robert James wrote on 27.10.2013 20:47:

I'm using Postgres for data analysis (interactive and batch). I need
to focus the analysis on a subset of one table, and, for both
performance and simplicity, have a function which loads that subset
into another table (DELETE FROM another_table; INSERT INTO
another_table SELECT ...).

Oddly enough, although the SELECT itself is very quick (< 1 s), the
DELETE and INSERT can take over a minute! I can't figure out why.
another_table is simple: it has only 7 fields. Two of those fields
are indexed, using a simple one field standard index. There are no
triggers on it.

What is the cause of this behavior? What should I do to make this
faster? Is there a recommended work around?

(I'm hesitant to drop another_table and recreate it each time, since
many views depend on it.)

DELETE can be quite lengthy - especially with a large number of rows.

If you use TRUNCATE instead, it will be *much* quicker, with the additional
benefit that if you INSERT the rows in the same transaction, the INSERT will
require much less I/O because it's not WAL-logged.
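A minimal sketch of the suggested pattern (the source query is a placeholder, since the thread does not show it; note that the WAL-skipping optimization applies only when wal_level is minimal and no archiving or replication is active):

```sql
BEGIN;

TRUNCATE another_table;

-- Because another_table was truncated in this same transaction,
-- PostgreSQL can skip WAL-logging the inserted rows
-- (only with wal_level = minimal and no archiving/replication configured).
INSERT INTO another_table
SELECT *
FROM   source_table     -- placeholder: the poster's actual source query
WHERE  some_condition;  -- placeholder filter

COMMIT;
```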


#3 Robert James
srobertjames@gmail.com
In reply to: Thomas Kellerer (#2)
Re: Work table

On 10/27/13, Thomas Kellerer <spam_eater@gmx.net> wrote:


DELETE can be a quite lengthy thing to do - especially with a large number of rows.

If you use TRUNCATE instead, this will be *much* quicker with the additional benefit, that if you INSERT the rows in the same transaction, the INSERT will require much less I/O because it's not logged.

Changing DELETE to TRUNCATE and putting it all in a transaction
brought the time down to 40 seconds. But this is still awfully slow,
when the SELECT is under a second.

Is there another problem here? Perhaps something to do with
triggering autovacuum?

Or should I be using a different type of table for work tables? (A RAM-only table?)


#4 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Robert James (#3)
Re: Work table

On 10/27/2013 02:23 PM, Robert James wrote:


Changing DELETE to TRUNCATE and putting it all in a transaction
brought the time down to 40 seconds. But this is still awfully slow,
when the SELECT is under a second.

Is there another problem here? Perhaps something to do with
triggering autovacuum?

Is there a FK relationship involved?

Could we see the schema for another_table?

Or should I be using a different type of table for work tables? (RAM only table)

--
Adrian Klaver
adrian.klaver@gmail.com


#5 Robert James
srobertjames@gmail.com
In reply to: Adrian Klaver (#4)
Re: Work table

On 10/27/13, Adrian Klaver <adrian.klaver@gmail.com> wrote:


Is there a FK relationship involved?

Could we see the schema for another_table?

1. No FK
2. I removed the indexes from the table
TRUNCATE takes only 40 ms, but the INSERT still takes 10s!
3. ALTER TABLE another_table SET (autovacuum_enabled = true,
toast.autovacuum_enabled = true); didn't seem to make a difference

4. Here's the schema:

CREATE TABLE another_table
(
  id serial NOT NULL,
  eventtime timestamp without time zone NOT NULL,
  reporter character varying NOT NULL,
  loc character varying NOT NULL,
  city character varying NOT NULL,
  stanza character varying,
  purdue character varying,
  CONSTRAINT segment_pkey PRIMARY KEY (id)
)


#6 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Robert James (#5)
Re: Work table

On 10/27/2013 02:48 PM, Robert James wrote:


1. No FK
2. I removed the indexes from the table
TRUNCATE takes only 40 ms, but the INSERT still takes 10s!

So how many records are we talking about?

Also complete this sentence :)

INSERT INTO
another_table SELECT ...

In other words what is the SELECT statement for the INSERT?

Also, you mentioned the above was in a function. What is the function
body and how is it being called?


--
Adrian Klaver
adrian.klaver@gmail.com


#7 Eelke Klein
eelke@bolt.nl
In reply to: Robert James (#3)
Re: Work table

2013/10/27 Robert James <srobertjames@gmail.com>


Changing DELETE to TRUNCATE and putting it all in a transaction
brought the time down to 40 seconds. But this is still awfully slow,
when the SELECT is under a second.

How many rows are being inserted?

Is there another problem here? Perhaps something to do with
triggering autovacuum?

Or should I be using a different type of table for work tables? (RAM only table)

You could use a TEMP or UNLOGGED table depending on how long you need it to
stay around (for these types of tables data won't be forced to disk before
returning from the COMMIT).
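A sketch of both options (work_table and work_session are hypothetical names; UNLOGGED tables require PostgreSQL 9.1 or later):

```sql
-- UNLOGGED: persists across sessions and is visible to other sessions,
-- skips WAL entirely, but is emptied after a crash. Ordinary views can
-- reference it, which fits the "many views depend on it" constraint.
CREATE UNLOGGED TABLE work_table (LIKE another_table INCLUDING ALL);

-- TEMP: private to the current session and dropped when the session ends,
-- so permanent views cannot be defined on top of it.
CREATE TEMP TABLE work_session AS
SELECT * FROM another_table WITH NO DATA;
```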

#8 Robert James
srobertjames@gmail.com
In reply to: Adrian Klaver (#6)
Re: Work table

It could be > 1 Million rows.

SELECT is:

SELECT *
FROM another_table
WHERE
eventtime > (SELECT e FROM tags WHERE id = $1) AND
eventtime < (SELECT e FROM tags WHERE id = $2)
;

$1 and $2 are integers.

The SELECT ran just now, returning over 1 million rows, in 1.6 seconds.

Inserting into the work table causes weird behavior - it takes over a
minute, PG's CPU climbs to 100%, and subsequent queries
sometimes slow down too. After a lot of these, PG sometimes
becomes unresponsive until I restart it.

The function is just a wrapper to set $1 and $2. I get the same
behavior when I try just its SQL, no function.
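One way to see where the time actually goes (a diagnostic sketch, not something suggested in the thread; work_copy is a hypothetical scratch target, and the literal ids stand in for $1 and $2):

```sql
EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO work_copy
SELECT *
FROM another_table
WHERE eventtime > (SELECT e FROM tags WHERE id = 1)
  AND eventtime < (SELECT e FROM tags WHERE id = 2);
```

The per-node timings and buffer counts show whether the cost is in the scan, the tuple insertion, or index maintenance.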


#9 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Robert James (#8)
Re: Work table

On 10/28/2013 12:50 PM, Robert James wrote:

It could be > 1 Million rows.

Well that would be a reason.

SELECT is:

SELECT *
FROM another_table
WHERE
eventtime > (SELECT e FROM tags WHERE id = $1) AND
eventtime < (SELECT e FROM tags WHERE id = $2)
;

$1 and $2 are integers.

SELECT ran just now, returning >1Million rows, in 1.6 seconds.

Not surprising there is less overhead for a SELECT than an INSERT.

Inserting into work table causes weird behavior - it takes over a
minute, PG CPU climbs to 100%, but then other subsequent queries
sometimes seem to slow down too. After a lot of these, sometimes PG
acts irresponsive until I restart it.

Though earlier you said the INSERT takes 10s.


--
Adrian Klaver
adrian.klaver@gmail.com


#10 Bill Moran
wmoran@potentialtech.com
In reply to: Robert James (#8)
Re: Work table

On Mon, 28 Oct 2013 15:50:42 -0400 Robert James <srobertjames@gmail.com> wrote:

It could be > 1 Million rows.

SELECT is:

SELECT *
FROM another_table
WHERE
eventtime > (SELECT e FROM tags WHERE id = $1) AND
eventtime < (SELECT e FROM tags WHERE id = $2)
;

$1 and $2 are integers.

SELECT ran just now, returning >1Million rows, in 1.6 seconds.

Inserting into the work table causes weird behavior - it takes over a
minute, PG's CPU climbs to 100%, and subsequent queries
sometimes slow down too. After a lot of these, PG sometimes
becomes unresponsive until I restart it.

Depending on the nature of your hardware, this might not be weird ...

For example, if you have enough RAM that all of another_table fits
in RAM, and (comparatively) slow disks, the SELECT would run
considerably faster than an INSERT of the same size.

--
Bill Moran <wmoran@potentialtech.com>
