Including Snapshot Info with Indexes
Hi,
Currently The index implementation in Postgresql does not store the
Snapshot information in the Index. If we add the snapshot information into
the indexing structure, we will have the following advantages.
a) There can be index only scans like Oracle
b) Unique indexes will become less costly, as older index tuples can be
found out.
c) Even the index scans will get faster, since some of the index tuples
won't translate into HeapScans.
d) Deletes and Updates will become slightly costly, as they have to update
these indexes.
I propose we add a DDL like "Create Index .. With Snapshot", which will have
different relkind.
The design in my mind is to add the Snapshot info together with the values
as first variables. We need to change the Index_getattr slightly to have the
relkind as input parameter, based on which we can have an offset.
Please reply back with your comments.
Thanks,
Gokul.
Gokulakannan Somasundaram wrote:
Currently The index implementation in Postgresql does not store the
Snapshot information in the Index. If we add the snapshot information into
the indexing structure, we will have the following advantages.
This idea has been discussed to death many times before. Please search
the archives.
a) There can be index only scans like Oracle
IMO, the most promising approach to achieving index-only-scans at the
moment is the Dead Space Map, as discussed in the 8.3 dev cycle.
b) Unique indexes will become less costly, as older index tuples can be
found out.
Doesn't seem like a big benefit, considering that in most cases there
won't be any tuples in the index with a duplicate key. A common
exception to that is (non-HOT) updating a row. But in that case, the
page containing the old tuple is already in cache, so the lookup of the
visibility from the heap is cheap.
c) Even the index scans will get faster, since some of the index tuples
won't translate into HeapScans.
That's the same as doing an index-only-scan, right?
d) Deletes and Updates will become slightly costly, as they have to update
these indexes.
I think you're grossly underestimating the cost of that. For example, on
a table with 3 indexes. a delete currently requires one index lookup +
one heap lookup. With visibility in the indexes, that would require 3
index lookups + one heap lookup. That's 4 vs. 2 page accesses, not
taking into account the non-leaf b-tree pages. The real impact will
depend on what's in cache, but the cost can be very high.
Also, the full visibility information would need 12 bytes of space per
tuple. An index tuple on an int4 key currently takes 12 bytes, so that
would double the index size. Storage size has a big impact on
performance. More bytes means more I/O, less data fits in cache, and
more WAL traffic.
There's non-trivial implementation issues involved as well. You'd need a
way to reliably find all the index pointers for a given heap tuple
(search the archives for "retail vacuum" for the issues involved in
that. Broken user defined functions are a problem for example). And
you'd need to keep them all locked at the same time to modify them all
atomically, which is prone to deadlocks.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Mon, 2007-10-08 at 09:40 +0100, Heikki Linnakangas wrote:
This idea has been discussed to death many times before. Please search
the archives.
Somewhat related to the "visibility in index" thing: would it be
possible to have a kind of index-table ? We do have here some tables
which have 2-4 fields, and the combination of them forms the primary key
of the table. There are usually no other indexes on the table, and the
net result is that the PK index duplicates the heap. So it would be cool
if the index would be THE heap somehow...
The most prominent example of this in our DBs is this table:
db> \d table_a
Table "public.table_a"
Column | Type | Modifiers
-----------+--------+-----------
id_1 | bigint | not null
id_2 | bigint | not null
Indexes:
"pk_table_a" PRIMARY KEY, btree (id_1, id_2)
db> select reltuples::bigint, relpages from pg_class where
relname='table_a';
reltuples | relpages
-----------+----------
445286464 | 710090
(1 row)
db> select reltuples::bigint, relpages from pg_class where
relname='pk_table_a';
reltuples | relpages
-----------+----------
445291072 | 599848
(1 row)
This postgres install is compiled with 32K page size (for the ones who
wonder about the page counts). In any case, it is clear that the index
basically duplicates the heap...
Cheers,
Csaba.
Csaba Nagy wrote:
On Mon, 2007-10-08 at 09:40 +0100, Heikki Linnakangas wrote:
This idea has been discussed to death many times before. Please search
the archives.Somewhat related to the "visibility in index" thing: would it be
possible to have a kind of index-table ? We do have here some tables
which have 2-4 fields, and the combination of them forms the primary key
of the table. There are usually no other indexes on the table, and the
net result is that the PK index duplicates the heap. So it would be cool
if the index would be THE heap somehow...
The clustered index patch I worked on for 8.3, but didn't make it in,
would have helped that use case a lot.
A column-store kind of mechanism would be nice. Some columns could be
stored in index-like structures, while other would be in the heap. I
haven't seen any practical proposal on how to do it though.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On 10/8/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
Gokulakannan Somasundaram wrote:
Currently The index implementation in Postgresql does not store the
Snapshot information in the Index. If we add the snapshot informationinto
the indexing structure, we will have the following advantages.
This idea has been discussed to death many times before. Please search
the archives.a) There can be index only scans like Oracle
IMO, the most promising approach to achieving index-only-scans at the
moment is the Dead Space Map, as discussed in the 8.3 dev cycle.
Index only scans means that in order to get certain results, we may not
goto the table at all. For example, if you have an index on columns a and b,
and if there is a query like "select b from table where a between a1 and
a2", then the explain plan need not goto the table. I can't understand how
dead space map will provide such a functionality. In short each index will
act like an Index Organized Table, if the all the columns of the query are
present in the index.
b) Unique indexes will become less costly, as older index tuples can be
found out.
Doesn't seem like a big benefit, considering that in most cases there
won't be any tuples in the index with a duplicate key. A common
exception to that is (non-HOT) updating a row. But in that case, the
page containing the old tuple is already in cache, so the lookup of the
visibility from the heap is cheap.
Its not a big benefit. agreed.
c) Even the index scans will get faster, since some of the index tuples
won't translate into HeapScans.
That's the same as doing an index-only-scan, right?
No here if you have an index on a(say). If there is a query like select *
form table where a between a1 and a2, currently the scan goes to the table
to verify the visibility. Of course if the tuple satisfies vacuum, then it
is marked in the index, which is an optimization. This is not index-only
scan. This is a normal index scan, which can skip certain random I/Os.
d) Deletes and Updates will become slightly costly, as they have to update
these indexes.
I think you're grossly underestimating the cost of that. For example, on
a table with 3 indexes. a delete currently requires one index lookup +
one heap lookup. With visibility in the indexes, that would require 3
index lookups + one heap lookup. That's 4 vs. 2 page accesses, not
taking into account the non-leaf b-tree pages. The real impact will
depend on what's in cache, but the cost can be very high.
That's true. But i am not asking to replace the current index
implementation, but to provide an extra option while indexing. Say if a
particular database setup doesn't do much deletes and updates(imagine tables
with partitioning, where the partitions/tables are dropped instead of
deletes. They can have an option to "create index .. with snapshot"
Imagine the Index Vacuum also will do lesser Random I/Os
Also, the full visibility information would need 12 bytes of space per
tuple. An index tuple on an int4 key currently takes 12 bytes, so that
would double the index size. Storage size has a big impact on
performance. More bytes means more I/O, less data fits in cache, and
more WAL traffic.
I am thinking of certain optimizations here. we have a bit unused in
indextuple structure. If a particular tuple is not deleted, then we can
signify that using that bit and save 6 bytes of saving the xmax and cmax. We
are trading of this space efficiency in place of Random I/Os, which is not a
bad trade-off , i suppose. Again this is going to optional for the user. If
users have an option to create Bitmap index/ Binary index, why can't they
have this option as well?
There's non-trivial implementation issues involved as well. You'd need a
way to reliably find all the index pointers for a given heap tuple
(search the archives for "retail vacuum" for the issues involved in
that. Broken user defined functions are a problem for example). And
you'd need to keep them all locked at the same time to modify them all
atomically, which is prone to deadlocks.
I think Vacuum need not goto the table, as the visibility information is
present in the index itself. I don't know whether i have given the correct
answer here.
Expecting your reply..
Thanks,
Gokul.
On 10/8/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
Csaba Nagy wrote:
On Mon, 2007-10-08 at 09:40 +0100, Heikki Linnakangas wrote:
This idea has been discussed to death many times before. Please search
the archives.Somewhat related to the "visibility in index" thing: would it be
possible to have a kind of index-table ? We do have here some tables
which have 2-4 fields, and the combination of them forms the primary key
of the table. There are usually no other indexes on the table, and the
net result is that the PK index duplicates the heap. So it would be cool
if the index would be THE heap somehow...The clustered index patch I worked on for 8.3, but didn't make it in,
would have helped that use case a lot.A column-store kind of mechanism would be nice. Some columns could be
stored in index-like structures, while other would be in the heap. I
haven't seen any practical proposal on how to do it though.
I think it more resembles the Oracle's IOT with overflow. I think my
proposal, once implemented can be easily extended to incorporate
IOT/Clustered indexes. One thing is for sure. Without storing Visibility
info, Unique Secondary indexes(Indexes on IOT/Clustered indexed tables) is
not possible, as it is not possible to re-locate the same entry in a b-tree,
if we are storing the Primary key in place of tuple-id.
--
Show quoted text
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Gokulakannan Somasundaram wrote:
On 10/8/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
IMO, the most promising approach to achieving index-only-scans at the
moment is the Dead Space Map, as discussed in the 8.3 dev cycle.Index only scans means that in order to get certain results, we may not
goto the table at all. For example, if you have an index on columns a and b,
and if there is a query like "select b from table where a between a1 and
a2", then the explain plan need not goto the table. I can't understand how
dead space map will provide such a functionality.
Please read the discussions in the archives. The dead space map holds
visibility information in a condensed form. For index-only-scans, we
need to know if all tuples on page are are visible to us. If the dead
space map is designed with index-only-scans in mind, we can store a bit
there indicating "all tuples on this page are visible to everyone".
Pages that have that bit set don't need to be visited to check visibility.
What information exactly is going to be stored in the dead space map is
still debated. For vacuuming, we need to know which pages contain dead
tuples that are worth vacuuming, which isn't the same thing as "all
tuples are visible to everyone", but it's quite close.
Heap pages that do have dead or recently modified rows do need to be
visited, so the executor needs to always be prepared to check visibility
from the heap. However, on a table that's not very frequently updated,
most bits will be set and the effect will be the same as genuine
index-only-scans. On a table that is frequently updated, you would
suffer a big hit in update performance with the "duplicate visibility
information in all indexes" scheme as well, as the updates would need to
update the indexes as well, so the performance characteristics are
roughly the same.
That's true. But i am not asking to replace the current index
implementation, but to provide an extra option while indexing. Say if a
particular database setup doesn't do much deletes and updates(imagine tables
with partitioning, where the partitions/tables are dropped instead of
deletes. They can have an option to "create index .. with snapshot"
A nice property of utilizing the dead space map for index-only-scans is
that it doesn't require any action from the DBA. It will "just work". It
also adapts well to tables that have parts that are frequently updated,
and other parts that are not. The frequently updated parts will behave
like what we have now, index-only-scans are not possible because, but
deletes/updates are cheap. But the less frequently updated parts will
eventually have the bits set in the map, and we can do index-only-scans
for those parts.
Imagine the Index Vacuum also will do lesser Random I/Os
Index vacuum doesn't do random I/Os as it is.
Also, the full visibility information would need 12 bytes of space per
tuple. An index tuple on an int4 key currently takes 12 bytes, so that
would double the index size. Storage size has a big impact on
performance. More bytes means more I/O, less data fits in cache, and
more WAL traffic.I am thinking of certain optimizations here. we have a bit unused in
indextuple structure. If a particular tuple is not deleted, then we can
signify that using that bit and save 6 bytes of saving the xmax and cmax. We
are trading of this space efficiency in place of Random I/Os, which is not a
bad trade-off , i suppose. Again this is going to optional for the user. If
users have an option to create Bitmap index/ Binary index, why can't they
have this option as well?
Because we have to maintain it? :)
Speaking of bitmap indexes, that would be nice to have. It looks like
Gavin dropped the ball on the patch, so if you want to continue that
work, that would be great.
There's non-trivial implementation issues involved as well. You'd need a
way to reliably find all the index pointers for a given heap tuple
(search the archives for "retail vacuum" for the issues involved in
that. Broken user defined functions are a problem for example). And
you'd need to keep them all locked at the same time to modify them all
atomically, which is prone to deadlocks.I think Vacuum need not goto the table, as the visibility information is
present in the index itself.
Vacuum isn't the problem here. The problem is: given heap tuple X, how
do you find the the corresponding index tuples? The obvious solution is
to calculate the index keys from the heap tuple, and use them to look
up. But what if you have an expression index on a user-defined function,
and that user-defined function is broken so that it returns a different
value than when the index tuple was inserted? You won't find the index
tuples in that case, so you won't be able to update the visibility
information. Granted, you've got a broken user-defined-function in that
case, and you're going to get bogus query results anyway. But not
finding the index tuple when needed would lead to more serious
corruption. This is the same problem many proposed retail vacuum schemes
have stumbled into, which is why I suggested searching for that.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Ühel kenal päeval, E, 2007-10-08 kell 11:41, kirjutas Heikki
Linnakangas:
The dead space map holds
visibility information in a condensed form. For index-only-scans, we
need to know if all tuples on page are are visible to us. If the dead
space map is designed with index-only-scans in mind, we can store a bit
there indicating "all tuples on this page are visible to everyone".
Pages that have that bit set don't need to be visited to check visibility.What information exactly is going to be stored in the dead space map is
still debated. For vacuuming, we need to know which pages contain dead
tuples that are worth vacuuming, which isn't the same thing as "all
tuples are visible to everyone", but it's quite close.
I would prefer a separate MVC visibility heap (aka. extended "dead space
map") which would duplicate whole visibility info from heap pages, just
in compressed form. After a few releases with duplicated visibility
info, we could remove it from the data heap.
If the whole visibility info (cmin, cmax, tmin, tmax, flags, (+ size for
DSM uses)) for tuples, were in a separate heap, it would allow for a lot
of on-the-fly compression. for example we could:
* get rid of both tmin and tmax for all completed transactions
* reduce any deleted tuple to just flags
* reduce any tuple produced by aborted transaction to just flags
* reduce any tuple visible to all backends to just flags
* RRL compress (runs of) pages produced by same transaction
* RRL compress (runs of) pages with all tuples visible
* RRL compress (runs of) pages with all tuples deleted
depending on distribution of Inserts/Updates/Deletes it will make
visibility info a little or a lot smaller than it is currently, greatly
enchancing chances that it stays in cache (even for OLAP loads, because
data for these are usually produced by bulk inserts and thus their
visibility info is highly compressable)
It also makes VACUUM more efficient, as it's initial scan (find
vacuumable tuples) will need to do lot less IO.
And it allows for more intelligent choices for new tuple placement ,
especially if we want to preserve clustering.
And of course it gives you kind of index-only scans (mostly read index + check in vis.heap)
-------------
Hannu
Ühel kenal päeval, E, 2007-10-08 kell 12:12, kirjutas Gokulakannan
Somasundaram:
Hi,
Currently The index implementation in Postgresql does not store
the Snapshot information in the Index.
..
Please reply back with your comments.
I think you got enoght "search for previous discussion" answers on this
aone already ;)
So I propose a few another ideas to investigate
1. get rid of indexes for TOAST tables
instead of TOAST tuple id store CTID of first TOAST block directly, and
use something like skip lists inside the TOAST block headers to access
next TOAST tuples.
2. store visibility info in TOAST tables
if you store visibility info in TOAST tuples, it becomes possible to
update just the small part of the tuple in the main heap and keep the
bulk of toasted data in place.
both of these ideas are much more complicated to implement than it
appears from my simple description, but should have big benefits for a
sizable number of scenarios which use TOAST.
-------------
Hannu
On Mon, 2007-10-08 at 14:58 +0300, Hannu Krosing wrote:
1. get rid of indexes for TOAST tables
instead of TOAST tuple id store CTID of first TOAST block directly, and
use something like skip lists inside the TOAST block headers to access
next TOAST tuples.
It should be possible to optimise TOAST for when there is just a single
chunk that is toasted. Since we often (and by default) compress data
stored in TOAST tables this would be a frequently used optimisation.
Instead of storing the TOAST OID, which is then looked-up in the index
to find the TOAST tid, we can just store the tid directly in the toast
pointer on the main heap. That way we'll get faster read and write
access for a good proportion of rows by avoiding the toast index and
going straight to the toast heap.
We'd need a different kind of toast pointer which would be 2 bytes
longer than the normal pointer. I think we have a spare flag bit to
indicate this.
That's just a rough sketch, I've not checked the details on this.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Hi Heikki,
I am always slightly late in understanding things. Let me try
to understand the use of DSM. It is a bitmap index on whether all the tuples
in a particular block is visible to all the backends, whether a particular
block contains tuples which are invisible to everyone. But i think this will
get subjected to the same limitations of Bitmap index. Even Oracle suggests
the use of Bitmap index for only data warehousing tables, where the Bitmap
indexes will be dropped and recreated after every bulk load. This is not a
viable alternative for OLTP transactions. But i think i am late in the game
as i haven't participated in those discussions
One Bitmap index block usually maps to lot of blocks in the heap.
So locking of one page to update the DSM for update/delete/insert would hit
the concurrency. But again all these are my observation w.r.t oracle bitmap
indexes. May be i am missing something in DSM.
I couldn't get that piece of discussion in the archive, which
discusses the design of Retail Vacuum. So please advise me again here.
Let's take up Retail Vacuuming again. The User defined function
which would return different values at different time can be classified as
non-deterministic functions. We can say that this index cannot be created
on a non-deterministic function. This is the way it is implemented in
Oracle. What they have done is they have classified certain built-in
operators and functions as deterministic. Similarly they have classified a
few as non-deterministic operators and functions. Can we follow a similar
approach?
Thanks,
Gokul.
Show quoted text
On 10/8/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
Gokulakannan Somasundaram wrote:
On 10/8/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
IMO, the most promising approach to achieving index-only-scans at the
moment is the Dead Space Map, as discussed in the 8.3 dev cycle.Index only scans means that in order to get certain results, we may not
goto the table at all. For example, if you have an index on columns aand b,
and if there is a query like "select b from table where a between a1 and
a2", then the explain plan need not goto the table. I can't understandhow
dead space map will provide such a functionality.
Please read the discussions in the archives. The dead space map holds
visibility information in a condensed form. For index-only-scans, we
need to know if all tuples on page are are visible to us. If the dead
space map is designed with index-only-scans in mind, we can store a bit
there indicating "all tuples on this page are visible to everyone".
Pages that have that bit set don't need to be visited to check visibility.What information exactly is going to be stored in the dead space map is
still debated. For vacuuming, we need to know which pages contain dead
tuples that are worth vacuuming, which isn't the same thing as "all
tuples are visible to everyone", but it's quite close.Heap pages that do have dead or recently modified rows do need to be
visited, so the executor needs to always be prepared to check visibility
from the heap. However, on a table that's not very frequently updated,
most bits will be set and the effect will be the same as genuine
index-only-scans. On a table that is frequently updated, you would
suffer a big hit in update performance with the "duplicate visibility
information in all indexes" scheme as well, as the updates would need to
update the indexes as well, so the performance characteristics are
roughly the same.That's true. But i am not asking to replace the current index
implementation, but to provide an extra option while indexing. Say if a
particular database setup doesn't do much deletes and updates(imaginetables
with partitioning, where the partitions/tables are dropped instead of
deletes. They can have an option to "create index .. with snapshot"A nice property of utilizing the dead space map for index-only-scans is
that it doesn't require any action from the DBA. It will "just work". It
also adapts well to tables that have parts that are frequently updated,
and other parts that are not. The frequently updated parts will behave
like what we have now, index-only-scans are not possible because, but
deletes/updates are cheap. But the less frequently updated parts will
eventually have the bits set in the map, and we can do index-only-scans
for those parts.Imagine the Index Vacuum also will do lesser Random I/Os
Index vacuum doesn't do random I/Os as it is.
Also, the full visibility information would need 12 bytes of space per
tuple. An index tuple on an int4 key currently takes 12 bytes, so that
would double the index size. Storage size has a big impact on
performance. More bytes means more I/O, less data fits in cache, and
more WAL traffic.I am thinking of certain optimizations here. we have a bit unused in
indextuple structure. If a particular tuple is not deleted, then we can
signify that using that bit and save 6 bytes of saving the xmax andcmax. We
are trading of this space efficiency in place of Random I/Os, which is
not a
bad trade-off , i suppose. Again this is going to optional for the user.
If
users have an option to create Bitmap index/ Binary index, why can't
they
have this option as well?
Because we have to maintain it? :)
Speaking of bitmap indexes, that would be nice to have. It looks like
Gavin dropped the ball on the patch, so if you want to continue that
work, that would be great.There's non-trivial implementation issues involved as well. You'd need a
way to reliably find all the index pointers for a given heap tuple
(search the archives for "retail vacuum" for the issues involved in
that. Broken user defined functions are a problem for example). And
you'd need to keep them all locked at the same time to modify them all
atomically, which is prone to deadlocks.I think Vacuum need not goto the table, as the visibility information is
present in the index itself.Vacuum isn't the problem here. The problem is: given heap tuple X, how
do you find the the corresponding index tuples? The obvious solution is
to calculate the index keys from the heap tuple, and use them to look
up. But what if you have an expression index on a user-defined function,
and that user-defined function is broken so that it returns a different
value than when the index tuple was inserted? You won't find the index
tuples in that case, so you won't be able to update the visibility
information. Granted, you've got a broken user-defined-function in that
case, and you're going to get bogus query results anyway. But not
finding the index tuple when needed would lead to more serious
corruption. This is the same problem many proposed retail vacuum schemes
have stumbled into, which is why I suggested searching for that.--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Gokulakannan Somasundaram wrote:
Hi Heikki, I am always slightly late in understanding things. Let me
try to understand the use of DSM. It is a bitmap index on whether all
the tuples in a particular block is visible to all the backends,
whether a particular block contains tuples which are invisible to
everyone. But i think this will get subjected to the same limitations
of Bitmap index. Even Oracle suggests the use of Bitmap index for
only data warehousing tables, where the Bitmap indexes will be
dropped and recreated after every bulk load. This is not a viable
alternative for OLTP transactions. But i think i am late in the game
as i haven't participated in those discussions
While the DSM might be similar in spirit to a bitmap index, the actual
implementation has a lot more freedome I'd say, since you can tailor it
exactly to the need of tracking some summarized visibility info. So not
all shortcomings of bitmap indices must necessarily apply to the DSM
also. But of course thats mostly handwavering...
One Bitmap index block usually maps to lot of blocks in the heap. So
locking of one page to update the DSM for update/delete/insert would
hit the concurrency. But again all these are my observation w.r.t
oracle bitmap indexes. May be i am missing something in DSM.
A simple DSM would probably contain a bit per page that says "all xmin <
GlobalXmin, and all xmax unset or aborted". That bit would only get SET
during VACUUM, and only unset during INSERT/UPDATE/DELETE. If setting it
is protected by a VACUUM-grade lock on the page, we might get away with
no locking during the unset, making the locking overhead pretty small.
I couldn't get that piece of discussion in the archive, which
discusses the design of Retail Vacuum. So please advise me again
here. Let's take up Retail Vacuuming again. The User defined function
which would return different values at different time can be
classified as non-deterministic functions. We can say that this
index cannot be created on a non-deterministic function. This is the
way it is implemented in Oracle. What they have done is they have
classified certain built-in operators and functions as deterministic.
Similarly they have classified a few as non-deterministic operators
and functions. Can we follow a similar approach?
Postgres already distinguishes VOLATILE,STABLE and IMMUTABLE functions.
It doesn't, however, risk physical data corruption, even if you get that
classification wrong. The worst that happens AFAIK are wrong query
results - but fixing your function, followed by a REINDEX always
corrects the problme. If you start poking holes into that safety net,
there'll be a lot of pushback I believe - and IMHO rightly so, because
people do, and always will, get such classifications wrong.
greetings, Florian Pflug
Gokulakannan Somasundaram wrote:
I am always slightly late in understanding things. Let me try
to understand the use of DSM. It is a bitmap index on whether all the tuples
in a particular block is visible to all the backends, whether a particular
block contains tuples which are invisible to everyone. But i think this will
get subjected to the same limitations of Bitmap index. Even Oracle suggests
the use of Bitmap index for only data warehousing tables, where the Bitmap
indexes will be dropped and recreated after every bulk load. This is not a
viable alternative for OLTP transactions.
Well, it's not quite the same as a bitmap index, though both use a
bitmap. You didn't quite get into details on what the limitations are
and why it wouldn't be suitable for OLTP, but I don't see any
significant problems.
But i think i am late in the game
as i haven't participated in those discussions
Better late than never :).
One Bitmap index block usually maps to lot of blocks in the heap.
So locking of one page to update the DSM for update/delete/insert would hit
the concurrency. But again all these are my observation w.r.t oracle bitmap
indexes. May be i am missing something in DSM.
Yeah, the DSM page could become a contention bottleneck. My current
thinking is that we'd have a flag in the heap page header, that would be
set together with the bit in the DSM. When the flag in the page header
is set, you don't need to lock and update the DSM because you know the
bit is already set. Vacuum would have to clear both the DSM bit and the
flag.
Let's take up Retail Vacuuming again. The User defined function
which would return different values at different time can be classified as
non-deterministic functions. We can say that this index cannot be created
on a non-deterministic function. This is the way it is implemented in
Oracle. What they have done is they have classified certain built-in
operators and functions as deterministic. Similarly they have classified a
few as non-deterministic operators and functions. Can we follow a similar
approach?
We already do. A function must be marked as IMMUTABLE in order to use it
in an index expression. But we can't enforce that the user defined
function really behaves like an immutable function should. If someone
creates a user-defined function in C that calls the C random() function,
we can't stop it.
As I said earlier, using an index like that will of course lead to bogus
results. But it won't currently cause any server crashes or more serious
corruption.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On 10/8/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
Gokulakannan Somasundaram wrote:
I am always slightly late in understanding things. Let me
try
to understand the use of DSM. It is a bitmap index on whether all the
tuples
in a particular block is visible to all the backends, whether a
particular
block contains tuples which are invisible to everyone. But i think this
will
get subjected to the same limitations of Bitmap index. Even Oracle
suggests
the use of Bitmap index for only data warehousing tables, where the
Bitmap
indexes will be dropped and recreated after every bulk load. This is not
a
viable alternative for OLTP transactions.
Well, it's not quite the same as a bitmap index, though both use a
bitmap. You didn't quite get into details on what the limitations are
and why it wouldn't be suitable for OLTP, but I don't see any
significant problems.But i think i am late in the game
as i haven't participated in those discussionsBetter late than never :).
One Bitmap index block usually maps to lot of blocks in the
heap.
So locking of one page to update the DSM for update/delete/insert would
hit
the concurrency. But again all these are my observation w.r.t oracle
bitmap
indexes. May be i am missing something in DSM.
Yeah, the DSM page could become a contention bottleneck. My current
thinking is that we'd have a flag in the heap page header, that would be
set together with the bit in the DSM. When the flag in the page header
is set, you don't need to lock and update the DSM because you know the
bit is already set. Vacuum would have to clear both the DSM bit and the
flag.
It matters to us, where the index scan will goto. If the Index Scan is going
to touch DSM for understanding visibility(This might degrade the performance
of some of the index scans, if they have to wait to acquire the share lock,
and learn that they have to goto the heap to understand their visibility
requirements.) In the mean while, if the vacuum, inserts/updates/deletes are
holding the BUFFER_EXCLUSIVE lock on that, this would hurt the Select
transactions. Since there is only one bit per block in the DSM(best case),
there might be one DSM block per 8000 table blocks. All the transactions
which are accessing the 8000 blocks will be waiting on this one DSM block.
If we are going to update the Heap page header and asking the Indexscan to
refer to that, then there is no reduction in random I/Os. Can't we say that
if the snapshot info is embedded with index, we can avoid all these
difficulties? Most importantly it won't affect the performance of current
postgres in any way.
Let's take up Retail Vacuuming again. The User defined function
which would return different values at different time can be classified
as
non-deterministic functions. We can say that this index cannot be
created
on a non-deterministic function. This is the way it is implemented in
Oracle. What they have done is they have classified certain built-in
operators and functions as deterministic. Similarly they have classifieda
few as non-deterministic operators and functions. Can we follow a
similar
approach?
We already do. A function must be marked as IMMUTABLE in order to use it
in an index expression. But we can't enforce that the user defined
function really behaves like an immutable function should. If someone
creates a user-defined function in C that calls the C random() function,
we can't stop it.
A function is said to be deterministic, if it returns the same value,
irrespective of how many times, it is invoked. I think this definition
clearly puts the random function under the non-deterministic category. If we
have such a classification, do you think we can resolve this issue?
As I said earlier, using an index like that will of course lead to bogus
results. But it won't currently cause any server crashes or more serious
corruption.
One more final word on unique indexes. Whenever we are doing an update,
there will be insertions into the unique indexes which will trigger table
lookups. Ofcourse there is more probability, that the table block would be
in memory(un-pinned). Still contention for a shared resource is avoided, if
the snapshot info is stored with the indexes.
Let me get one more clarification, what would be type of performance results
with this implementation, that would encourage the hackers community to
accept the extra maintenance overhead.
Thanks,
Gokul.
On 10/8/07, Florian G. Pflug <fgp@phlo.org> wrote:
Gokulakannan Somasundaram wrote:
Hi Heikki, I am always slightly late in understanding things. Let me
try to understand the use of DSM. It is a bitmap index on whether all
the tuples in a particular block is visible to all the backends,
whether a particular block contains tuples which are invisible to
everyone. But i think this will get subjected to the same limitations
of Bitmap index. Even Oracle suggests the use of Bitmap index for
only data warehousing tables, where the Bitmap indexes will be
dropped and recreated after every bulk load. This is not a viable
alternative for OLTP transactions. But i think i am late in the game
as i haven't participated in those discussionsWhile the DSM might be similar in spirit to a bitmap index, the actual
implementation has a lot more freedome I'd say, since you can tailor it
exactly to the need of tracking some summarized visibility info. So not
all shortcomings of bitmap indices must necessarily apply to the DSM
also. But of course thats mostly handwavering...One Bitmap index block usually maps to lot of blocks in the heap. So
locking of one page to update the DSM for update/delete/insert would
hit the concurrency. But again all these are my observation w.r.t
oracle bitmap indexes. May be i am missing something in DSM.A simple DSM would probably contain a bit per page that says "all xmin <
GlobalXmin, and all xmax unset or aborted". That bit would only get SET
during VACUUM, and only unset during INSERT/UPDATE/DELETE. If setting it
is protected by a VACUUM-grade lock on the page, we might get away with
no locking during the unset, making the locking overhead pretty small.
Let me try to understand. Do you mean to say some kind of Test and Set
implementation for Insert/Update/Delete?
So that would mean that there won't be any lock during the change of bit
flags. Why do we need lock to set it then?
It looks like a great idea.
I couldn't get that piece of discussion in the archive, which
discusses the design of Retail Vacuum. So please advise me again
here. Let's take up Retail Vacuuming again. The User defined function
which would return different values at different time can be
classified as non-deterministic functions. We can say that this
index cannot be created on a non-deterministic function. This is the
way it is implemented in Oracle. What they have done is they have
classified certain built-in operators and functions as deterministic.
Similarly they have classified a few as non-deterministic operators
and functions. Can we follow a similar approach?Postgres already distinguishes VOLATILE,STABLE and IMMUTABLE functions.
It doesn't, however, risk physical data corruption, even if you get that
classification wrong. The worst that happens AFAIK are wrong query
results - but fixing your function, followed by a REINDEX always
corrects the problme. If you start poking holes into that safety net,
there'll be a lot of pushback I believe - and IMHO rightly so, because
people do, and always will, get such classifications wrong.
A deterministic function is classified as one, which returns the same
results, irrespective of how many times, it is invoked. So if we form a
classification like that, do you think we will resolve the issue of Retail
Vaccum? In the case of User-Defined functions, the user should be defining
it as Deterministic. Can we frame a set of guidelines, or may be some test
procedure, which can declare a certain function as deterministic? I am just
saying from the top of my mind. Even otherwise, if we can even restrict this
indexing to only Built-in deterministic functions., don't you think it would
help the cause of a majority? I have just made the proposal to create the
index with snapshot a optional one.
Thanks,
Gokul.
On 10/9/07, Gokulakannan Somasundaram <gokul007@gmail.com> wrote:
On 10/8/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
Gokulakannan Somasundaram wrote:
I am always slightly late in understanding things. Let me
try
to understand the use of DSM. It is a bitmap index on whether all the
tuples
in a particular block is visible to all the backends, whether a
particular
block contains tuples which are invisible to everyone. But i think
this will
get subjected to the same limitations of Bitmap index. Even Oracle
suggests
the use of Bitmap index for only data warehousing tables, where the
Bitmap
indexes will be dropped and recreated after every bulk load. This is
not a
viable alternative for OLTP transactions.
Well, it's not quite the same as a bitmap index, though both use a
bitmap. You didn't quite get into details on what the limitations are
and why it wouldn't be suitable for OLTP, but I don't see any
significant problems.But i think i am late in the game
as i haven't participated in those discussionsBetter late than never :).
One Bitmap index block usually maps to lot of blocks in the
heap.
So locking of one page to update the DSM for update/delete/insert
would hit
the concurrency. But again all these are my observation w.r.t oracle
bitmap
indexes. May be i am missing something in DSM.
Yeah, the DSM page could become a contention bottleneck. My current
thinking is that we'd have a flag in the heap page header, that would beset together with the bit in the DSM. When the flag in the page header
is set, you don't need to lock and update the DSM because you know the
bit is already set. Vacuum would have to clear both the DSM bit and the
flag.It matters to us, where the index scan will goto. If the Index Scan is
going to touch DSM for understanding visibility(This might degrade the
performance of some of the index scans, if they have to wait to acquire the
share lock, and learn that they have to goto the heap to understand their
visibility requirements.) In the mean while, if the vacuum,
inserts/updates/deletes are holding the BUFFER_EXCLUSIVE lock on that, this
would hurt the Select transactions. Since there is only one bit per block in
the DSM(best case), there might be one DSM block per 8000 table blocks. All
the transactions which are accessing the 8000 blocks will be waiting on this
one DSM block. If we are going to update the Heap page header and asking
the Indexscan to refer to that, then there is no reduction in random I/Os.
Can't we say that if the snapshot info is embedded with index, we can avoid
all these difficulties? Most importantly it won't affect the performance of
current postgres in any way.Let's take up Retail Vacuuming again. The User defined
functionwhich would return different values at different time can be
classified as
non-deterministic functions. We can say that this index cannot be
created
on a non-deterministic function. This is the way it is implemented in
Oracle. What they have done is they have classified certain built-in
operators and functions as deterministic. Similarly they haveclassified a
few as non-deterministic operators and functions. Can we follow a
similar
approach?
We already do. A function must be marked as IMMUTABLE in order to use it
in an index expression. But we can't enforce that the user defined
function really behaves like an immutable function should. If someone
creates a user-defined function in C that calls the C random() function,
we can't stop it.A function is said to be deterministic, if it returns the same value,
irrespective of how many times, it is invoked. I think this definition
clearly puts the random function under the non-deterministic category. If we
have such a classification, do you think we can resolve this issue?
If we frame a set of guidelines/test procedure, do you think it might solve
the issue? Even, if we don't allow this type of indexing to anything other
than built-in deterministic functions, i feel it would serve most of the
indexing requirements.
As I said earlier, using an index like that will of course lead to bogus
Show quoted text
results. But it won't currently cause any server crashes or more serious
corruption.
One more final word on unique indexes. Whenever we are doing an update,
there will be insertions into the unique indexes which will trigger table
lookups. Ofcourse there is more probability, that the table block would be
in memory(un-pinned). Still contention for a shared resource is avoided, if
the snapshot info is stored with the indexes.Let me get one more clarification, what would be type of performance
results with this implementation, that would encourage the hackers community
to accept the extra maintenance overhead.Thanks,
Gokul.
"Gokulakannan Somasundaram" <gokul007@gmail.com> writes:
On 10/9/07, Gokulakannan Somasundaram <gokul007@gmail.com> wrote:
A function is said to be deterministic, if it returns the same value,
irrespective of how many times, it is invoked. I think this definition
clearly puts the random function under the non-deterministic category. If we
have such a classification, do you think we can resolve this issue?If we frame a set of guidelines/test procedure, do you think it might solve
the issue? Even, if we don't allow this type of indexing to anything other
than built-in deterministic functions, i feel it would serve most of the
indexing requirements.
We already do this. c.f. IMMUTABLE at
http://www.postgresql.org/docs/8.2/interactive/xfunc-volatility.html
and
http://www.postgresql.org/docs/8.2/interactive/sql-createindex.html
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
[snip]
In the case of User-Defined functions, the user should be defining it
as Deterministic.
The user CAN already define his functions as
"Deterministic=IMMUTABLE"... the problem is that many of us will define
functions as immutable, when in fact they are not. And do that by
mistake... and there's nothing postgres can do about that.
Can we frame a set of guidelines, or may be some test procedure, which
can declare a certain function as deterministic?
You mean postgres should check your function if it is really immutable ?
I can't imagine any way to do it correctly in reasonable time :-)
Imagine a function of 10 parameters which returns the sum of the
parameters all the time except for parameters all 1 it will randomly
return a value _once in a thousand executions_... please find a generic
algorithm which spots this function as not immutable in reasonable
execution time ;-)
So this example is a bit extreme, but don't underestimate the user ;-)
I am just saying from the top of my mind. Even otherwise, if we can
even restrict this indexing to only Built-in deterministic functions.,
don't you think it would help the cause of a majority? I have just
made the proposal to create the index with snapshot a optional one.
Restrictions like this are always confusing for the end user (i.e. why
can I use built-ins here and not my own ?). I leave to the actual coders
to say anything about code maintenance concerns...
Cheers,
Csaba.
Csaba Nagy wrote:
Can we frame a set of guidelines, or may be some test procedure, which
can declare a certain function as deterministic?You mean postgres should check your function if it is really immutable ?
I can't imagine any way to do it correctly in reasonable time :-)
Imagine a function of 10 parameters which returns the sum of the
parameters all the time except for parameters all 1 it will randomly
return a value _once in a thousand executions_... please find a generic
algorithm which spots this function as not immutable in reasonable
execution time ;-)
So this example is a bit extreme, but don't underestimate the user ;-)
I think you're overly pessimistic here ;-) This classification can be done quite
efficiently as long as your language is "static enough". The trick is not to
execute the function, but to scan the code to find all other functions and SQL
statements a given function may possibly call. If your function calls no SQL
statements, and only other functions already marked IMMUTABLE, then it must be
IMMUTABLE itself.
It does seem that only pl/pgsql is "static enough" for this to work, though,
making this idea rather unappealing.
I am just saying from the top of my mind. Even otherwise, if we can
even restrict this indexing to only Built-in deterministic functions.,
don't you think it would help the cause of a majority? I have just
made the proposal to create the index with snapshot a optional one.Restrictions like this are always confusing for the end user (i.e. why
can I use built-ins here and not my own ?). I leave to the actual coders
to say anything about code maintenance concerns...
Yes, and some built-ins have gotten that classification wrong too in the past
IIRC. Which probably is a good reason not to trust our users to get it right ;-)
greetings, Florian Pflug
I think you're overly pessimistic here ;-) This classification can be done quite
efficiently as long as your language is "static enough". The trick is not to
execute the function, but to scan the code to find all other functions and SQL
statements a given function may possibly call. If your function calls no SQL
statements, and only other functions already marked IMMUTABLE, then it must be
IMMUTABLE itself.
OK, I have a "black-box" mindset right now due to the problem I'm
currently working on, so I didn't even think about checking the source
code of the function (which is the right thing to do if you have the
source code)... in which case you're right, I was overly pessimistic :-)
Cheers,
Csaba.