Will partial index creation use existing index?

Started by Steve Crawfordover 18 years ago6 messagesgeneral
Jump to latest
#1Steve Crawford
scrawford@pinpointresearch.com

Does PostgreSQL use an existing index, if possible, when creating a
partial index?

By way of background, we have some nightly bulk processing that includes
a couple of 15-30 million row tables. Most of the processing is only
looking at prior-day data (up to ~200,000 rows) and for efficiency
requires several indexes. Except for this one process, the indexes are
useless and I'd rather not constantly maintain them.

There is an index on the timestamp column so I have considered creating
the indexes on a temporary basis with something like:
create index foo_bar on foo (bar)
where timestamp_col > current_date - interval '1 day';

(Yes this is simplified, I am aware of the Daylight Saving Time
off-by-an-hour implications.)

It seems that creating this partial index would be more efficient if the
existing index were used but "explain create index..." just gives me an
error and the query seems to run way too long to be processing only the
one day data. For comparison, on a relatively large 225,000 row day I
can create temporary table ondeay... on the same criteria and create 10
indexes and analyze the table in well under 10 seconds which is way
faster than creating even a single partial index on the full table.

Cheers,
Steve

#2Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Steve Crawford (#1)
Re: Will partial index creation use existing index?

On Jul 18, 2007, at 2:16 PM, Steve Crawford wrote:

Does PostgreSQL use an existing index, if possible, when creating a
partial index?

By way of background, we have some nightly bulk processing that
includes
a couple of 15-30 million row tables. Most of the processing is only
looking at prior-day data (up to ~200,000 rows) and for efficiency
requires several indexes. Except for this one process, the indexes are
useless and I'd rather not constantly maintain them.

There is an index on the timestamp column so I have considered
creating
the indexes on a temporary basis with something like:
create index foo_bar on foo (bar)
where timestamp_col > current_date - interval '1 day';

(Yes this is simplified, I am aware of the Daylight Saving Time
off-by-an-hour implications.)

It seems that creating this partial index would be more efficient
if the
existing index were used but "explain create index..." just gives
me an
error and the query seems to run way too long to be processing only
the
one day data. For comparison, on a relatively large 225,000 row day I
can create temporary table ondeay... on the same criteria and
create 10
indexes and analyze the table in well under 10 seconds which is way
faster than creating even a single partial index on the full table.

Check the source code, but I'm 99% certain that CREATE INDEX doesn't
consider any existing indexes. While what you're describing is
theoretically possible, it's not a very common use-case, so it's
rather unlikely to get worked on unless other folks show up with
*real life* examples of where this would be useful.

You might also want to consider partitioning the table.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#3Bruce Momjian
bruce@momjian.us
In reply to: Jim Nasby (#2)
Re: Will partial index creation use existing index?

"Jim Nasby" <decibel@decibel.org> writes:

Check the source code, but I'm 99% certain that CREATE INDEX doesn't consider
any existing indexes. While what you're describing is theoretically possible,
it's not a very common use-case, so it's rather unlikely to get worked on
unless other folks show up with *real life* examples of where this would be
useful.

Yeah we don't support this feature. It is something useful that I even though
of doing a while back. I think it's on the TODO somewhere.

The main use case for it is actually REINDEX. Since you already have an index
which contains precisely the records you want to index and already in order
too.

The main disadvantage is that it's not clear when it would actually be faster.
Generally index scans are slower than reading the whole table and sorting.
Probably it would have to run an SPI query to use the planner to find the best
way to get the rows it wants.

Another problem is that presumably you're reindexing because the existing
index *isn't* in such good shape. You may even be doing it because the
existing index is corrupt.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#4Ted Byers
r.ted.byers@rogers.com
In reply to: Steve Crawford (#1)
Re: Will partial index creation use existing index?

I don't know about partial indexing, but in a similar situation to what you describe, I have resorted to special tables designed specifically to hold one day's worth of data and to support our daily processing. I'd put the daily data into specific tables, with all the requisite indeces, and then, as part of the daily processing and after it has been processed, copy the day's data into the main tables and empty the tables used to hold the data temporarily. It may not be elegant or pretty, but it serves to greatly simplify processing since you know a priori that these tables content only the data you need to process, and you can be certain that eventually it finds its way into the main data tables. And you have your indeces only on the tables used for daily processing, rather than on your main tables. An additional concern I had was that if the process for getting the data is vulnerable to error (and in my case it was), you can apply your verification procedures to it
before bad data gets into your main tables, thereby reducing the risk of bad data getting into the database.

I though this might be an option for you if you have trouble getting your partial indeces to work well for you.

HTH

Ted

Steve Crawford <scrawford@pinpointresearch.com> wrote:
Does PostgreSQL use an existing index, if possible, when creating a
partial index?

By way of background, we have some nightly bulk processing that includes
a couple of 15-30 million row tables. Most of the processing is only
looking at prior-day data (up to ~200,000 rows) and for efficiency
requires several indexes. Except for this one process, the indexes are
useless and I'd rather not constantly maintain them.

#5Ron Johnson
ron.l.johnson@cox.net
In reply to: Bruce Momjian (#3)
Re: Will partial index creation use existing index?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/24/07 14:48, Gregory Stark wrote:
[snip]

The main use case for it is actually REINDEX. Since you already have an index
which contains precisely the records you want to index and already in order
too.

The main disadvantage is that it's not clear when it would actually be faster.
Generally index scans are slower than reading the whole table and sorting.
Probably it would have to run an SPI query to use the planner to find the best
way to get the rows it wants.

I believe you, but it's totally counter to prima-fascia logic.

Scanning the whole table means that you have to read in a whole
bunch of columns that you don't really give a rat's arse about, and
thus is a waste, whereas directly reading an existing index means
that you've got perfect locality of data, since you're only reading
what you care about.

Another problem is that presumably you're reindexing because the existing
index *isn't* in such good shape. You may even be doing it because the
existing index is corrupt.

That, of course, is an excellent point.

- --
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFGpqHkS9HxQb37XmcRAu93AKC04YXUWvMI6YiLfBNoy2BYtQw28ACdHqE/
kVqHiPwBONv0Tudy5OnA/SE=
=Fbuw
-----END PGP SIGNATURE-----

#6Bruce Momjian
bruce@momjian.us
In reply to: Ron Johnson (#5)
Re: Will partial index creation use existing index?

"Ron Johnson" <ron.l.johnson@cox.net> writes:

The main disadvantage is that it's not clear when it would actually be faster.
Generally index scans are slower than reading the whole table and sorting.
Probably it would have to run an SPI query to use the planner to find the best
way to get the rows it wants.

I believe you, but it's totally counter to prima-fascia logic.

Scanning the whole table means that you have to read in a whole
bunch of columns that you don't really give a rat's arse about, and
thus is a waste, whereas directly reading an existing index means
that you've got perfect locality of data, since you're only reading
what you care about.

Well you still have to read the table either way. The main difference is
you'll end up reading it in a random access order which is slower and also
means potentially reading parts of it many times over before you're done.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com