pgsql: In COPY, insert tuples to the heap in batches.

Started by Heikki Linnakangas, about 14 years ago, 3 messages
#1 Heikki Linnakangas
heikki.linnakangas@iki.fi

In COPY, insert tuples to the heap in batches.

This greatly reduces the WAL volume, especially when the table is narrow.
The overhead of locking the heap page is also reduced. Reduced WAL traffic
also makes it scale a lot better, if you run multiple COPY processes at
the same time.
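
A rough sketch of why the saving is largest for narrow tables. This is a toy model, not code from the patch: the function names are hypothetical and `RECORD_OVERHEAD` is an assumed illustrative figure, not PostgreSQL's actual WAL record header size.

```python
# Toy model of WAL volume: one WAL record per tuple vs. one record per batch.
# All sizes are illustrative assumptions, not PostgreSQL's real record layout.

RECORD_OVERHEAD = 32  # assumed fixed bytes paid once per WAL record


def wal_bytes_per_tuple(n_tuples: int, tuple_size: int) -> int:
    """Every tuple gets its own WAL record, so the overhead is paid per tuple."""
    return n_tuples * (RECORD_OVERHEAD + tuple_size)


def wal_bytes_batched(n_tuples: int, tuple_size: int, batch_size: int) -> int:
    """One WAL record per batch, so the overhead is paid once per batch."""
    n_batches = -(-n_tuples // batch_size)  # ceiling division
    return n_batches * RECORD_OVERHEAD + n_tuples * tuple_size


def savings(n_tuples: int, tuple_size: int, batch_size: int) -> float:
    """Fraction of WAL volume saved by batching in this model."""
    single = wal_bytes_per_tuple(n_tuples, tuple_size)
    batched = wal_bytes_batched(n_tuples, tuple_size, batch_size)
    return 1 - batched / single


if __name__ == "__main__":
    for size in (8, 1000):  # narrow vs. wide tuples
        print(size, round(savings(1000, size, 100), 3))
```

Under these assumed numbers, 1000 tuples of 8 bytes cost 40000 bytes of WAL one at a time and 8320 bytes in batches of 100 (about 79% less), while 1000-byte tuples save only about 3%: the fixed per-record overhead dominates when the payload is small.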

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/d326d9e8ea1d690cf6d968000efaa5121206d231

Modified Files
--------------
src/backend/access/heap/heapam.c | 484 ++++++++++++++++++++++++++++++++++----
src/backend/commands/copy.c | 166 ++++++++++++-
src/backend/postmaster/pgstat.c | 6 +-
src/include/access/heapam.h | 2 +
src/include/access/htup.h | 31 +++
src/include/pgstat.h | 2 +-
6 files changed, 629 insertions(+), 62 deletions(-)

#2 Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#1)
Re: [COMMITTERS] pgsql: In COPY, insert tuples to the heap in batches.

On Wed, Nov 9, 2011 at 9:06 AM, Heikki Linnakangas
<heikki.linnakangas@iki.fi> wrote:

> In COPY, insert tuples to the heap in batches.
>
> This greatly reduces the WAL volume, especially when the table is narrow.
> The overhead of locking the heap page is also reduced. Reduced WAL traffic
> also makes it scale a lot better, if you run multiple COPY processes at
> the same time.

Sounds good.

I can't see where this applies backup blocks. If it does, can you
document why/where/how it differs from other WAL records?

There's no need for conflict processing on replay with this new WAL
record type. But you should document that and alter the comments that
say it is necessary. Search "conflict".

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#3 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#2)
Re: [COMMITTERS] pgsql: In COPY, insert tuples to the heap in batches.

On 09.11.2011 15:25, Simon Riggs wrote:

> On Wed, Nov 9, 2011 at 9:06 AM, Heikki Linnakangas
> <heikki.linnakangas@iki.fi> wrote:
>
>> In COPY, insert tuples to the heap in batches.
>>
>> This greatly reduces the WAL volume, especially when the table is narrow.
>> The overhead of locking the heap page is also reduced. Reduced WAL traffic
>> also makes it scale a lot better, if you run multiple COPY processes at
>> the same time.
>
> Sounds good.
>
> I can't see where this applies backup blocks. If it does, can you
> document why/where/how it differs from other WAL records?

Good catch, I missed that. I copied the redo function from normal
insertion, but missed that heap_redo() takes care of backup blocks for
you, while heap2_redo() does not.

I'll go fix that.
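
The pitfall described above can be illustrated with a toy model. This is a sketch in Python, not PostgreSQL source: the dict-shaped record and every function name below are hypothetical stand-ins for the two redo dispatchers, and the model assumes a handler skips its logical replay when the record carries a full-page image for the target page.

```python
# Toy model, not PostgreSQL code. A "record" is a dict; a backup block is a
# full page image carried in the record that replaces the logical replay.


def restore_backup_blocks(record, pages):
    """Copy any full-page images carried by the record onto the pages."""
    for page_no, image in record.get("backup_blocks", {}).items():
        pages[page_no] = list(image)


def apply_insert(record, pages):
    """Replay the insert, unless a backup block already covers the page."""
    page_no = record["page"]
    if page_no in record.get("backup_blocks", {}):
        return  # assumes somebody restored the page image already
    pages.setdefault(page_no, []).append(record["tuple"])


def redo_central(record, pages):
    """Dispatcher style 1: restores backup blocks up front for all handlers."""
    restore_backup_blocks(record, pages)
    apply_insert(record, pages)


def redo_per_handler(record, pages, handler):
    """Dispatcher style 2: leaves backup-block restoration to each handler."""
    handler(record, pages)


def multi_insert_copied(record, pages):
    """Handler copied unchanged from style 1: never restores backup blocks."""
    apply_insert(record, pages)


def multi_insert_fixed(record, pages):
    """Same handler with the missing restoration step added."""
    restore_backup_blocks(record, pages)
    apply_insert(record, pages)


if __name__ == "__main__":
    rec = {"page": 0, "tuple": "t1", "backup_blocks": {0: ["t1"]}}
    broken, fixed = {}, {}
    redo_per_handler(rec, broken, multi_insert_copied)
    redo_per_handler(rec, fixed, multi_insert_fixed)
    print(broken, fixed)
```

In this model the copied handler replays nothing at all for a record that carries a page image, because it relies on a restoration step that only the other dispatcher performs.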

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com