MVCC and index-only read

Started by Leonardo Francalanciover 17 years ago19 messagesgeneral
Jump to latest
#1Leonardo Francalanci
m_lists@yahoo.it

Hi,

if I got it right the reason some aggregates (such as COUNT) using only index columns are "slow" on postgresql is that it uses MVCC, so it has to read the data as well as the index. It makes sense to me, but I don't understand is how other databases (such as Oracle) do it.
Can someone explain?

Thank you.

Unisciti alla community di Io fotografo e video, il nuovo corso di fotografia di Gazzetta dello sport:
http://www.flickr.com/groups/iofotografoevideo

#2Sam Mason
sam@samason.me.uk
In reply to: Leonardo Francalanci (#1)
Re: MVCC and index-only read

On Tue, Nov 18, 2008 at 04:49:35PM +0000, Scara Maccai wrote:

if I got it right the reason some aggregates (such as COUNT) using
only index columns are "slow" on postgresql is that it uses MVCC, so
it has to read the data as well as the index.

Every aggregate (of which COUNT is just one example) has to read data
from both the index and the table. The reason is that each row in a
table has two important identifiers; the transaction that created it and
the transaction that killed it. Every time a query scans the table it
looks to see that both the transaction that created it COMMITed and that
transaction that killed it (if any) didn't COMMIT. The index doesn't
contain these two identifiers so when scanning the index the code needs
to go and check what these are.

There are various optimizations in PG so that it doesn't need to
actually check the transaction numbers the whole time, thus speeding
things up a bit, but the semantics/behavior is the same.

It makes sense to me,
but I don't understand is how other databases (such as Oracle) do it.

I believe Oracle maintains a separate log (not sure how it's structured)
that contains this information and all the data in both the main table
and index can be considered committed.

There are tradeoffs in both directions; PG's implementation allows
greater concurrency, but Oracle's way is more optimized for read access.
Which implementation is better depends a lot on your work load.

There has been talk of adding the transaction identifiers into the
indexes in PG, which would mean that index scans wouldn't need to go
to the table. The problem is that the indexes would be larger and
modifying data would incur larger overheads as both the data and index
would have to be updated.

I hope someone will point out any mistakes I've made!

Sam

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sam Mason (#2)
Re: MVCC and index-only read

Sam Mason <sam@samason.me.uk> writes:

On Tue, Nov 18, 2008 at 04:49:35PM +0000, Scara Maccai wrote:

It makes sense to me,
but I don't understand is how other databases (such as Oracle) do it.

I believe Oracle maintains a separate log (not sure how it's structured)
that contains this information and all the data in both the main table
and index can be considered committed.

FWIW, I believe that count(*) is pretty slow in Oracle too. The DBs
that can do it fast are the ones that maintain a centralized counter
of the number of rows in each table. Which makes count(*) nice and
fast, at the cost of horrendous concurrency impacts for updates; plus
there's no chance of real MVCC operation. (In an MVCC world the correct
answer for count(*) can vary depending on who's asking --- there's no
hope of doing that with a single counter.)

regards, tom lane

#4Jonah H. Harris
jonah.harris@gmail.com
In reply to: Sam Mason (#2)
Re: MVCC and index-only read

On Tue, Nov 18, 2008 at 12:48 PM, Sam Mason <sam@samason.me.uk> wrote:

It makes sense to me,
but I don't understand is how other databases (such as Oracle) do it.

There are tradeoffs in both directions; [...] but Oracle's way is more optimized

<response type="snarky">
For the most part, that's all you needed to say :)
</response>

--
Jonah H. Harris, Senior DBA
myYearbook.com

#5Leonardo Francalanci
m_lists@yahoo.it
In reply to: Tom Lane (#3)
Re: MVCC and index-only read

FWIW, I believe that count(*) is pretty slow in Oracle too.

Well COUNT was only an example. I think (but I'm not sure AT ALL) that

SELECT A FROM myTAB where A <10000

only uses the index (if there's an index defined for A) in Oracle.

But mine was just curiosity... which I think you and Sam answered.

Thank you.

Unisciti alla community di Io fotografo e video, il nuovo corso di fotografia di Gazzetta dello sport:
http://www.flickr.com/groups/iofotografoevideo

#6Jonah H. Harris
jonah.harris@gmail.com
In reply to: Leonardo Francalanci (#5)
Re: MVCC and index-only read

On Tue, Nov 18, 2008 at 2:02 PM, Scara Maccai <m_lists@yahoo.it> wrote:

SELECT A FROM myTAB where A <10000

only uses the index (if there's an index defined for A) in Oracle.

Well, not exactly. That's called a "covered" index because the query
could be satisfied directly from the index (the attribute is covered
by the index). Oracle sometimes satisfies it with an index fast full
scan, but not always; it depends on the cost of other access methods
and/or what Oracle believes is currently in cache.

--
Jonah H. Harris, Senior DBA
myYearbook.com

#7Thomas Kellerer
spam_eater@gmx.net
In reply to: Jonah H. Harris (#6)
Re: MVCC and index-only read

Jonah H. Harris wrote on 18.11.2008 20:15:

On Tue, Nov 18, 2008 at 2:02 PM, Scara Maccai <m_lists@yahoo.it> wrote:

SELECT A FROM myTAB where A <10000

only uses the index (if there's an index defined for A) in Oracle.

Well, not exactly. That's called a "covered" index because the query
could be satisfied directly from the index (the attribute is covered
by the index). Oracle sometimes satisfies it with an index fast full
scan, but not always; it depends on the cost of other access methods
and/or what Oracle believes is currently in cache.

If all the columns from the select list are available in the index, then Oracle
will always prefer the index scan over a table scan (at least I have never seen
something else). Even for a SELECT that returns all rows of the table.

They are taking this concept even further with index organized tables, where no
real "table data" exists, everything is stored in the index (quited nice for
e.g. link tables that only consist of two or three integer columns)

Thomas

#8Scott Marlowe
scott.marlowe@gmail.com
In reply to: Thomas Kellerer (#7)
Re: MVCC and index-only read

On Tue, Nov 18, 2008 at 12:33 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:

If all the columns from the select list are available in the index, then
Oracle will always prefer the index scan over a table scan (at least I have
never seen something else). Even for a SELECT that returns all rows of the
table.

They are taking this concept even further with index organized tables, where
no real "table data" exists, everything is stored in the index (quited nice
for e.g. link tables that only consist of two or three integer columns)

Sounds like they're borrowing the code from innodb that does much the
same thing. In Innodb, if a field is indexed, it lives only as an
index, not in the table and an index at the same time.

#9Jonah H. Harris
jonah.harris@gmail.com
In reply to: Thomas Kellerer (#7)
Re: MVCC and index-only read

On Tue, Nov 18, 2008 at 2:33 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:

If all the columns from the select list are available in the index, then
Oracle will always prefer the index scan over a table scan (at least I have
never seen something else). Even for a SELECT that returns all rows of the
table.

No, it doesn't always prefer index fast full scan.

They are taking this concept even further with index organized tables, where
no real "table data" exists, everything is stored in the index (quited nice
for e.g. link tables that only consist of two or three integer columns)

Those are essentially clustered indexes, and they're not quite stored
exactly the same..

--
Jonah H. Harris, Senior DBA
myYearbook.com

#10Jonah H. Harris
jonah.harris@gmail.com
In reply to: Scott Marlowe (#8)
Re: MVCC and index-only read

On Tue, Nov 18, 2008 at 2:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

Sounds like they're borrowing the code from innodb that does much the
same thing. In Innodb, if a field is indexed, it lives only as an
index, not in the table and an index at the same time.

They aren't borrowing anything, Oracle has had this functionality
since at least Oracle 8i (1999).

--
Jonah H. Harris, Senior DBA
myYearbook.com

#11Scott Marlowe
scott.marlowe@gmail.com
In reply to: Jonah H. Harris (#10)
Re: MVCC and index-only read

On Tue, Nov 18, 2008 at 1:03 PM, Jonah H. Harris <jonah.harris@gmail.com> wrote:

On Tue, Nov 18, 2008 at 2:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

Sounds like they're borrowing the code from innodb that does much the
same thing. In Innodb, if a field is indexed, it lives only as an
index, not in the table and an index at the same time.

They aren't borrowing anything, Oracle has had this functionality
since at least Oracle 8i (1999).

Whoa, calm down Francis. I'm not suggesting they stole it or
something. Just that they're using the same basic concepts.

#12Scott Marlowe
scott.marlowe@gmail.com
In reply to: Scott Marlowe (#11)
Re: MVCC and index-only read

On Tue, Nov 18, 2008 at 1:07 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

On Tue, Nov 18, 2008 at 1:03 PM, Jonah H. Harris <jonah.harris@gmail.com> wrote:

On Tue, Nov 18, 2008 at 2:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

Sounds like they're borrowing the code from innodb that does much the
same thing. In Innodb, if a field is indexed, it lives only as an
index, not in the table and an index at the same time.

They aren't borrowing anything, Oracle has had this functionality
since at least Oracle 8i (1999).

Whoa, calm down Francis. I'm not suggesting they stole it or
something. Just that they're using the same basic concepts.

Oh, and citation needed. I don't remember seeing anything about
oracle using indexes as sole storage units back in 8i

--
When fascism comes to America, it will be draped in a flag and
carrying a cross - Sinclair Lewis

#13Jonah H. Harris
jonah.harris@gmail.com
In reply to: Scott Marlowe (#11)
Re: MVCC and index-only read

On Tue, Nov 18, 2008 at 3:07 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

They aren't borrowing anything, Oracle has had this functionality
since at least Oracle 8i (1999).

Whoa, calm down Francis.

My name's not Francis :)

I'm not suggesting they stole it or something. Just that they're using
the same basic concepts.

Hmm...

--- snip

Sounds like they're borrowing the code from innodb that does much the same thing

You can't borrow something you started developing prior to InnoDB's release.

--
Jonah H. Harris, Senior DBA
myYearbook.com

#14Jonah H. Harris
jonah.harris@gmail.com
In reply to: Scott Marlowe (#12)
Re: MVCC and index-only read

On Tue, Nov 18, 2008 at 3:09 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

Oh, and citation needed. I don't remember seeing anything about
oracle using indexes as sole storage units back in 8i

Your memory-foo is weak. See ORGANIZATION INDEX:

http://download-west.oracle.com/docs/cd/A87860_01/doc/server.817/a85397/statem3e.htm#2061671

--
Jonah H. Harris, Senior DBA
myYearbook.com

#15Joshua D. Drake
jd@commandprompt.com
In reply to: Jonah H. Harris (#14)
Re: MVCC and index-only read

On Tue, 2008-11-18 at 15:28 -0500, Jonah H. Harris wrote:

On Tue, Nov 18, 2008 at 3:09 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

Oh, and citation needed. I don't remember seeing anything about
oracle using indexes as sole storage units back in 8i

Your memory-foo is weak. See ORGANIZATION INDEX:

http://download-west.oracle.com/docs/cd/A87860_01/doc/server.817/a85397/statem3e.htm#2061671

Off topic much?

--
Jonah H. Harris, Senior DBA
myYearbook.com

--

#16Jonah H. Harris
jonah.harris@gmail.com
In reply to: Joshua D. Drake (#15)
Re: MVCC and index-only read

On Tue, Nov 18, 2008 at 3:45 PM, Joshua D. Drake <jd@commandprompt.com> wrote:

Off topic much?

Hey, all I did was make a joke; other people wanted to get all
*correct* about it :)

Anyway, as this has been discussed at least twenty times before, this
is a waste of a thread.

--
Jonah H. Harris, Senior DBA
myYearbook.com

#17Daniel Verite
daniel@manitou-mail.org
In reply to: Scott Marlowe (#12)
Re: MVCC and index-only read

Scott Marlowe wrote:

They aren't borrowing anything, Oracle has had this functionality
since at least Oracle 8i (1999).

Whoa, calm down Francis. I'm not suggesting they stole it or
something. Just that they're using the same basic concepts.

Oh, and citation needed. I don't remember seeing anything about
oracle using indexes as sole storage units back in 8i

They call that IOT, for Index Organized Tables, and they were
introduced with Oracle8 (1997). Check out
http://www.orafaq.com/wiki/Oracle_8

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org

#18Thomas Kellerer
spam_eater@gmx.net
In reply to: Jonah H. Harris (#9)
Re: MVCC and index-only read

Jonah H. Harris wrote on 18.11.2008 20:58:

On Tue, Nov 18, 2008 at 2:33 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:

If all the columns from the select list are available in the index, then
Oracle will always prefer the index scan over a table scan (at least I have
never seen something else). Even for a SELECT that returns all rows of the
table.

No, it doesn't always prefer index fast full scan.

Hmm. I was not talking about an index _fast full_ scan, I was talking about
index scans in general. Personally I have never seen Oracle using a table scan
(whatever kind) if all columns in the select are present in the index.

And the manual actually suggests the same:

"If the statement accesses only columns of the index, then Oracle reads the
indexed column values directly from the index, rather than from the table"
http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/optimops.htm#i52300

They are taking this concept even further with index organized tables, where
no real "table data" exists, everything is stored in the index (quited nice
for e.g. link tables that only consist of two or three integer columns)

Those are essentially clustered indexes, and they're not quite stored
exactly the same..

Hmm, my understanding of a clustered index, that it "orders" the table data
according to the index, but there is still "table data" and "index data", right?

That is a bit different to an index-organized table were only a B-Tree index
exists. This is not mandatory, but for my example (a link table with two PK
columns) only a B-Tree index is created.

(I have to admit I don't really know the concept of clustered indexes)

Thomas

#19Jonah H. Harris
jonah.harris@gmail.com
In reply to: Thomas Kellerer (#18)
Re: MVCC and index-only read

On Tue, Nov 18, 2008 at 3:54 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:

Hmm. I was not talking about an index _fast full_ scan, I was talking about
index scans in general. Personally I have never seen Oracle using a table
scan (whatever kind) if all columns in the select are present in the index.

And the manual actually suggests the same:

"If the statement accesses only columns of the index, then Oracle reads the
indexed column values directly from the index, rather than from the table"
http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/optimops.htm#i52300

The manual is wrong.

Those are essentially clustered indexes, and they're not quite stored
exactly the same..

Hmm, my understanding of a clustered index, that it "orders" the table data
according to the index, but there is still "table data" and "index data",
right?

That is a bit different to an index-organized table were only a B-Tree index
exists. This is not mandatory, but for my example (a link table with two PK
columns) only a B-Tree index is created.

Well, clustered indexes mean different things to different vendors.
Oracle's implementation stores the data with the index as does SQL
Server, but in a little different fashion.

--
Jonah H. Harris, Senior DBA
myYearbook.com