Ignore hash indices on replicas

Started by Steven Schlanskerover 13 years ago6 messagesgeneral
Jump to latest
#1Steven Schlansker
steven@likeness.com

I'm using Postgres hash indices on a streaming replica master.
As is documented, hash indices are not logged, so the replica does not have access to them.

I understand that the current wisdom is "don't use hash indices", but (unfortunately?) I have benchmarks that
show that our particular application is faster by quite a bit when a hash index is available.

I assume that fixing the hash index logging issue hasn't been a priority due to low interest / technical limitations, but I'm curious for a stopgap measure -- can we somehow configure Postgres to ignore hash indices on a replica, using other b-tree indices or even a sequential scan? I know I can do this on a per-connection basis by disabling various index lookup methods, but it'd be nice if it just ignored invalid indices on its own.

I've not seen much reference to this problem around, but I do apologize if I've missed it in the manual or it is extremely obvious how you do this :)

Thanks,
Steven

#2Jeff Davis
pgsql@j-davis.com
In reply to: Steven Schlansker (#1)
Re: Ignore hash indices on replicas

On Tue, 2012-07-10 at 00:09 -0700, Steven Schlansker wrote:

I understand that the current wisdom is "don't use hash indices", but
(unfortunately?) I have benchmarks that
show that our particular application is faster by quite a bit when a
hash index is available.

Can you publish the results somewhere? It might provoke some interest.

I assume that fixing the hash index logging issue hasn't been a
priority due to low interest / technical limitations, but I'm curious
for a stopgap measure -- can we somehow configure Postgres to ignore
hash indices on a replica, using other b-tree indices or even a
sequential scan? I know I can do this on a per-connection basis by
disabling various index lookup methods, but it'd be nice if it just
ignored invalid indices on its own.

This might work for you:

http://sigaev.ru/git/gitweb.cgi?p=plantuner.git;a=blob;hb=HEAD;f=README.plantuner

Regards,
Jeff Davis

#3Scott Marlowe
scott.marlowe@gmail.com
In reply to: Steven Schlansker (#1)
Re: Ignore hash indices on replicas

On Tue, Jul 10, 2012 at 1:09 AM, Steven Schlansker <steven@likeness.com> wrote:

I'm using Postgres hash indices on a streaming replica master.
As is documented, hash indices are not logged, so the replica does not have access to them.

I understand that the current wisdom is "don't use hash indices", but (unfortunately?) I have benchmarks that
show that our particular application is faster by quite a bit when a hash index is available.

You could use a slony slave and have different indexes etc between
master and slave but it's more complex to setup, maintain and monitor
for most people.

#4Steven Schlansker
steven@likeness.com
In reply to: Scott Marlowe (#3)
Re: Ignore hash indices on replicas

On Aug 19, 2012, at 8:01 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

On Tue, Jul 10, 2012 at 1:09 AM, Steven Schlansker <steven@likeness.com> wrote:

I'm using Postgres hash indices on a streaming replica master.
As is documented, hash indices are not logged, so the replica does not have access to them.

I understand that the current wisdom is "don't use hash indices", but (unfortunately?) I have benchmarks that
show that our particular application is faster by quite a bit when a hash index is available.

You could use a slony slave and have different indexes etc between
master and slave but it's more complex to setup, maintain and monitor
for most people.

Thanks for the suggestion, but we finally have replication working in a way we understand / like and I don't really consider this a viable option. The built-in replication has been treating us very well.

#5Steven Schlansker
steven@likeness.com
In reply to: Jeff Davis (#2)
Re: Ignore hash indices on replicas

On Aug 19, 2012, at 2:37 PM, Jeff Davis <pgsql@j-davis.com> wrote:

On Tue, 2012-07-10 at 00:09 -0700, Steven Schlansker wrote:

I understand that the current wisdom is "don't use hash indices", but
(unfortunately?) I have benchmarks that
show that our particular application is faster by quite a bit when a
hash index is available.

Can you publish the results somewhere? It might provoke some interest.

I might be able to spend some time looking at making this public, but the general parameters are:

122M rows, lookup key is a UUID type. Lookups are ~1000 random keys at a time (as in, a giant SELECT * FROM table WHERE key IN (?,?,?,?,…)

I assume that fixing the hash index logging issue hasn't been a
priority due to low interest / technical limitations, but I'm curious
for a stopgap measure -- can we somehow configure Postgres to ignore
hash indices on a replica, using other b-tree indices or even a
sequential scan? I know I can do this on a per-connection basis by
disabling various index lookup methods, but it'd be nice if it just
ignored invalid indices on its own.

This might work for you:

http://sigaev.ru/git/gitweb.cgi?p=plantuner.git;a=blob;hb=HEAD;f=README.plantuner

Thanks for the link; that looks interesting. It is a bit unfortunate that I would have to find and exclude indices manually, but very doable...

#6Jeff Janes
jeff.janes@gmail.com
In reply to: Steven Schlansker (#5)
Re: Ignore hash indices on replicas

On Mon, Aug 20, 2012 at 10:29 AM, Steven Schlansker <steven@likeness.com> wrote:

On Aug 19, 2012, at 2:37 PM, Jeff Davis <pgsql@j-davis.com> wrote:

On Tue, 2012-07-10 at 00:09 -0700, Steven Schlansker wrote:

show that our particular application is faster by quite a bit when a
hash index is available.

Can you publish the results somewhere? It might provoke some interest.

I might be able to spend some time looking at making this public, but the general parameters are:

122M rows, lookup key is a UUID type. Lookups are ~1000 random keys at a time (as in, a giant SELECT * FROM table WHERE key IN (?,?,?,?,…)

How well cached is the data?

If it has to be read from disk, then a bitmap index scan on a btree
index with effective_io_concurrency set high might do quite well,
assuming you can convince the planner to use one.

Cheers,

Jeff