Re: Introducing an advanced Frequent Update

Started by Luke Lonerganabout 19 years ago4 messages

LLonergan@greenplum.com

about 19 years ago

Simon,

Bring it on! We at GP have been evaluating various approaches to index
organized tables which unify index with heap storage to solve some of
the problems you mention. Split index and heap is a big issue in
Postgres and we'd all welcome a good solution to it, even for limited
circumstances like single index organization or the like.

- Luke

Show quoted text

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Simon Riggs
Sent: Monday, November 06, 2006 1:51 PM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] Introducing an advanced Frequent Update
Optimization

EnterpriseDB has been running a research project to improve
the performance of heavily updated tables. We have a number
of approaches prototyped and we'd like to discuss the best of
these now on -hackers for community input and patch
submission to PostgreSQL core.

The most important step with any proposal is to agree that we
have an issue that needs improvement, discuss how widespread
that issue is and find some clear test cases that show up the
problems. Tests are:

1. pgbench reveals performance that will degrade over a long period.

2. DBT-2 reveals performance that will degrade over a long
period. Many tests over a 2 hour period don't fully show
this, especially when the test is cafeully tuned.

3. Some common scenarios in applications are where some rows
of a table are "hot" from being constantly updated, while
others are not. An example of such a test case is the
truckin' test, included here. It's based directly on a
specific customer application, but its been generalised to
make sure the underlying design pattern is clear.

These tests reveal the following issues, all of which are known:

- update performs inserts into indexes, as well as into heap blocks

- VACUUM can remove heap blocks easily, but performs much
worse on indexes, making VACUUM a less good solution. We have
now been able to speed up index VACUUM, but this require us
to scan the whole index for correct locking. VACUUM scans the
whole table, whereas dead rows may well be localised.
Heap-needs-vacuum-bitmap has been proposed here, but no
solution currently exists for vacuuming only parts of indexes
and so proposals for concurrent vacuums are now being considered.

- indexes that have been stretched apart by updates do not
ever coalesce again and require regular REINDEX, which is not
yet possible concurrently; the contention caused by this
would be catastrophic for performance, even if anybody knew
of a way to do this concurrently.

- There are specific issues with the optimizer's ability to
understand dead row numbers, which can in some cases lead to
SeqScan plans that are inappropriate when tables grow because
of updates. This is a red-herring that can lead to people
thinking the situation is worse than it is; that needs
fixing, but the core issues mentioned above remain.

To alleviate these problems we've added features such as WITH
fillfactor for heaps and table-level autovacuum tuning.
Tuning all of these features to good effect is an art form
that is beyond the reasonable for most users. Many internal
optimizations have been made in this area and as can be seen,
many are still required to achieve better performance.

The proposal about to be made takes a more radical approach
and re-examines the architecture of the heap, to allow us to
consider much faster designs for heavy UPDATEs. Although
initially radical, the proposal appears to be fully MVCC
correct, crash safe as well as being much faster under heavy
updates, while approximately neutral in other cases with no
major downsides.

Why should we care? The UPDATE case has obvious use-cases in
a business design pattern I'll call CustomerAccountDebit
which is pervasive in pay-per-use websites, banks, telephone
companies, road traffic monitoring etc etc. It's also
pervasive in Data Warehousing where summary
tables/materialized views are regularly updated to maintain a
current picture of spending, movements or any other
accumulation of event detail. It's everywhere, basically.

Your various viewpoints on the above are welcome, but
assuming for the moment that you agree so far, we can move
towards the proposal...

These discussions will likely be lengthy if taken seriously
and need to cover a range of different topics to ensure we
cover what we know and ensure we listen to all the feedback
everybody gives. To that end, I'd like to introduce two
colleagues of mine to the community, Pavan Deolasee and
Nikhil Sontakke who have been working hard on developing the
prototypes and measuring/tuning them respectively.

I would stress that we are not bringing our first prototype
to the table, but actually design #5. We think you'll be
interested, but we won't take that for granted.

Our next steps will be to

- discuss various other approaches to the problem, and why we
are now proposing one specific approach and receive "why dont
we..." feedback and additional ideas (Simon)

- discuss the proposal in technical depth, explain the
challenges that remain and ask for feedback and input on
those, with specific regard to low-level coding (Pavan)

- present details of performance testing done so far (Nikhil)

- explain the measures we have taken to prove the correctness
of our approach for MVCC, crash safety and PITR (Simon)

Each of these areas will be started as a separate thread of
discussion on -hackers, to allow us to stay focused on those topics.

But before we do that, any comments on the above?

---

The truckin test case included here consists of a complex
update function that is executed by a custom pgbench script,
on the postgres database.

psql -f truckin.sql postgres

for each test

pgbench -n -f truckin.pgb postgres

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

Heikki Linnakangas

heikki@enterprisedb.com

about 19 years ago

In reply to: Luke Lonergan (#1)

Block B-tree etc. (was Re: Introducing an advanced Frequent Update)

Luke Lonergan wrote:

Simon,

Bring it on! We at GP have been evaluating various approaches to index
organized tables which unify index with heap storage to solve some of
the problems you mention. Split index and heap is a big issue in
Postgres and we'd all welcome a good solution to it, even for limited
circumstances like single index organization or the like.

I don't think Simon's proposal is meant to address that issue, but did
you follow the thread I started in September about Block B-Tree index:
http://archives.postgresql.org/pgsql-hackers/2006-09/msg02041.php

That's our plan to achieve speedups that other DBMS's have achieved with
Index-Organized-Tables or Clustered Indexes. We're running initial
performance tests of it as we speak, and if all goes well we're hoping
to get that into PostgreSQL 8.3.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Josh Berkus

josh@agliodbs.com

about 19 years ago

In reply to: Heikki Linnakangas (#2)

Re: Block B-tree etc. (was Re: Introducing an advanced Frequent Update)

Luke,

That's our plan to achieve speedups that other DBMS's have achieved with
Index-Organized-Tables or Clustered Indexes. We're running initial
performance tests of it as we speak, and if all goes well we're hoping
to get that into PostgreSQL 8.3.

Also, if the EDB folks weren't clear, the two proposals are meant to
compliment each other.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

Luke Lonergan

llonergan@greenplum.com

about 19 years ago

In reply to: Heikki Linnakangas (#2)

Re: Block B-tree etc. (was Re: Introducing an advanced Frequent

Heikki,

On 11/7/06 1:51 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote:

I don't think Simon's proposal is meant to address that issue, but did
you follow the thread I started in September about Block B-Tree index:
http://archives.postgresql.org/pgsql-hackers/2006-09/msg02041.php

Aha - mystery solved - thanks for the clarification.

Block Btree ~= Lossy Btree

- Luke