Is it possible to have a "fast-write" Index?
There are several use cases where I see useful an index, but adding it will
slow too much inserts and updates.
For example, when we have 10 million rows on a table, and it's a table
which has frequent updates, we need several index to speed up selects, but
then we'll slow down updates a lot, specially when we have 10 or more
indexes.
Other cases involve indexes for text search, which are used only for user
search and aren't that important, so we want to have them, but we don't
want the overload they put whenever we write on the table.
I know different approaches that already solve some of those problems in
some ways (table partitioning, partial indexes, etc), but i don't feel they
are the solution to every problem of this kind.
Some people already asked for "delayed write" indexes, but the idea gets
discarded because the index could get out of sync, so it can omit results
and this is unacceptable. But i think maybe that could be fixed in several
ways and we can have a fast and reliable index (but maybe not so fast on
selects).
Since I do not know every internal of postgres, i feel simpler to share
here and ask which things can or cannot be done.
Let's imagine there is a new type of index called "weird_btree", where we
trade-off simplicity for speed. In almost every mode, we will rely on
VACUUM to put our index in optimal state.
Mode 1: on "aminsert" mark the index as INVALID. So, if you modified the
table you need to run REINDEX/CREATE INDEX CONCURRENTLY before doing
SELECT. This is almost the same as create index concurrently, the main
difference is you don't have to remember to drop the index before writing.
(I don't see much benefit here)
Mode 2: on "aminsert", put the new entry in a plain, unordered list instead
of the btree. Inserting at the end of a list should be faster than big
btrees and you'll know later which entries you missed indexing.
Mode 2.a: on index scan (amrescan, amgettuple), pretend that after the
btree there is the list and output every row, out-of order. You will have
to tell postgres that our index isn't sorted and it will have to recheck
every row.
Mode 2.b: mark the index invalid instead. When doing the next vacuum, sort
the list and insert it to the btree in a bulk operation. If it's ok, mark
the index valid.
Mode 3: on "aminsert", put the new entry on a second btree; leaving the
first one untouched. Because the second btree is new, will be small, and
writes should be faster. When doing a index scan, read tuples from both at
same time (like merge sort). On vacuum, merge the second btree onto the
first. On this mode, the index is sorted and there's no need of recheck.
Anyone thinks this would be a interesting feature for postgresql?
Did I miss something?
PD: Maybe it's also possible to take advantage of clustering, and have
indexes which entries are range of TIDs; but i'm not sure if this is too
exotic, or if it will make a difference.
Sincerely,
David.
On 6/5/15 11:07 AM, deavid wrote:
Did I miss something?
These are interesting ideas but the problem here is the problem is far
to hypothetical. You're trying to defer index maintenance cost in a case
where if there's any real problem the index pages are already in memory.
So if it's too slow it's not because of IO... but then why is it too slow?
If you have significantly more than 10M rows then IO would be much more
likely to be a problem, but at that point you should probably just be
partitioning anyway.
If you want to attract attention here I think you'll need to come up
with some concrete scenarios and provide data on where all the
performance hit actually is.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
deavid <deavidsedice@gmail.com> writes:
Some people already asked for "delayed write" indexes, but the idea gets
discarded because the index could get out of sync, so it can omit results
and this is unacceptable. But i think maybe that could be fixed in several
ways and we can have a fast and reliable index (but maybe not so fast on
selects).
FWIW, GIN indexes already implement something that's like your mode 2 but
a bit better: there's an unordered "pending insertions" list that has to
be scanned by every search in addition to checking the main index body.
Every so often the pending insertions list is automatically flushed into
the main index body.
The reason we do this for GIN is that that index type puts a huge premium
on doing inserts "in bulk"; it's a lot more efficient if you push many
rows into the index at once, because frequently they'll be inserting into
the same per-key posting lists. I do not see much opportunity for a
corresponding gain for btree.
So I really doubt that anyone would have any enthusiasm for saddling btree
with a similar mechanism. It's complicated (and has been the cause of
multiple bugs); it's hard to figure out when is the optimal time to flush
the pending insertions; and it slows down searches in favor of making
inserts cheap, which is generally not the way to bet --- if that's the
tradeoff you want, why not drop the index altogether?
But anyway, since you can use contrib/btree_gin to get more or less btree
semantics for GIN indexes (except for uniqueness enforcement), you might
try whether just replacing your btree indexes with GIN indexes provides
any win for your insertions.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jun 5, 2015 at 2:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
deavid <deavidsedice@gmail.com> writes:
Some people already asked for "delayed write" indexes, but the idea gets
discarded because the index could get out of sync, so it can omit results
and this is unacceptable. But i think maybe that could be fixed in several
ways and we can have a fast and reliable index (but maybe not so fast on
selects).FWIW, GIN indexes already implement something that's like your mode 2 but
a bit better: there's an unordered "pending insertions" list that has to
be scanned by every search in addition to checking the main index body.
Every so often the pending insertions list is automatically flushed into
the main index body.The reason we do this for GIN is that that index type puts a huge premium
on doing inserts "in bulk"; it's a lot more efficient if you push many
rows into the index at once, because frequently they'll be inserting into
the same per-key posting lists. I do not see much opportunity for a
corresponding gain for btree.
A forest of btrees (say mode 2.c) may not be a bad idea. When tables
grow consistently, the cost of I/O is usually high in FPW and random
I/O due to the large spread of index updates. I don't have numbers,
but on the databases I've handled it certainly was so.
If you have a btree_forest am that will consist of several btrees that
follow the GIN pattern only instead of an unordered list you have an
ordered btree (which also simplifies merging), you should gain a lot.
The big btrees will be read-only, so they will be compact (100% fill
rate), you will generate less WAL (updates are all local on the small
"staging" btree) and even the disk may perform better with that
pattern.
It is in fact a pattern used by inverted indexes already, so it
wouldn't be too far-fetched.
It is however hard to figure out when compaction has to happen.
Concurrency shouldn't be an issue though, since all but the smallest
btree would be read-only, so you only need a lock while modifying the
forest structure (adding a new btree, swapping components with merged
versions, etc).
It would indeed, though, require a lot of extra storage to perform
compaction. An alternative would be to implement compaction as a
massive insert/delete instead. Certainly, how exactly compaction gets
implemented would be key in deciding whether the approach breaks even.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
deavid wrote:
There are several use cases where I see useful an index, but adding it will
slow too much inserts and updates.
Maybe try a BRIN index. You can't use them for text search currently,
or many other cases for that matter, but there are enough interesting
cases in which they are useful that perhaps you don't need anything
extra.
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 06/06/15 04:07, deavid wrote:
There are several use cases where I see useful an index, but adding it
will slow too much inserts and updates.
For example, when we have 10 million rows on a table, and it's a table
which has frequent updates, we need several index to speed up selects,
but then we'll slow down updates a lot, specially when we have 10 or
more indexes.
Other cases involve indexes for text search, which are used only for
user search and aren't that important, so we want to have them, but we
don't want the overload they put whenever we write on the table.
I know different approaches that already solve some of those problems
in some ways (table partitioning, partial indexes, etc), but i don't
feel they are the solution to every problem of this kind.Some people already asked for "delayed write" indexes, but the idea
gets discarded because the index could get out of sync, so it can omit
results and this is unacceptable. But i think maybe that could be
fixed in several ways and we can have a fast and reliable index (but
maybe not so fast on selects).Since I do not know every internal of postgres, i feel simpler to
share here and ask which things can or cannot be done.Let's imagine there is a new type of index called "weird_btree", where
we trade-off simplicity for speed. In almost every mode, we will rely
on VACUUM to put our index in optimal state.Mode 1: on "aminsert" mark the index as INVALID. So, if you modified
the table you need to run REINDEX/CREATE INDEX CONCURRENTLY before
doing SELECT. This is almost the same as create index concurrently,
the main difference is you don't have to remember to drop the index
before writing. (I don't see much benefit here)Mode 2: on "aminsert", put the new entry in a plain, unordered list
instead of the btree. Inserting at the end of a list should be faster
than big btrees and you'll know later which entries you missed indexing.Mode 2.a: on index scan (amrescan, amgettuple), pretend that after the
btree there is the list and output every row, out-of order. You will
have to tell postgres that our index isn't sorted and it will have to
recheck every row.Mode 2.b: mark the index invalid instead. When doing the next vacuum,
sort the list and insert it to the btree in a bulk operation. If it's
ok, mark the index valid.Mode 3: on "aminsert", put the new entry on a second btree; leaving
the first one untouched. Because the second btree is new, will be
small, and writes should be faster. When doing a index scan, read
tuples from both at same time (like merge sort). On vacuum, merge the
second btree onto the first. On this mode, the index is sorted and
there's no need of recheck.Anyone thinks this would be a interesting feature for postgresql?
Did I miss something?PD: Maybe it's also possible to take advantage of clustering, and have
indexes which entries are range of TIDs; but i'm not sure if this is
too exotic, or if it will make a difference.Sincerely,
David.
How about a hybrid indexing system, with 2 parts:
(1) existing index system which is checked first and has been mostly
optimised for speed of reading. If there are only a few inserts/updates
and the system is not heavily loaded, then it gets modified
immediately. The threshold for being too busy, and few enough changes,
could be configurable.
(2) overflow index optimised for writing. Possible in memory and not
backed to permanent storage. A crash would require a complete index
rebuild - but only when there were entries in it (or at least more than
some configurable threshold, to allow for cases were some missing index
entries are acceptable).
So where the index is needed for a query, part 1 is checked first, and
the part 2 if necessary
Have a background process that will move entries from part 2 to part 1,
when the systems is less busy.
Cheers,
Gavin
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thanks to everybody for answering. I wasn't expecting this attention; this
is a great community :-)
Jim asked me about something real. Well, the problem is this showed up more
than five years ago, and keeps popping from time to time since in different
circumstances. I solved them in different ways each time, depending the
exact use-case. I wanted to generalize, because seems a good feature for
several situations; and I don't expect a solution for me as each time I hit
with this I found some way to sort it out.
As Jim said, we need here are figures for real examples, and i don't have
yet. I'll do my "homework" and email back with exact problems with exact
timing. Give me a week or two.
Also, some of you are talking about IO. Well, it's hard to say without the
figures here, but I'm pretty sure I'm hitting CPU time only. We use SSD on
those big databases, and also in my tests i tried setting fsync=off.
So the problem is: i see a low iowait, and CPU time for one core is at
80-90% most of the time. I can buy more ram, better disks, or cpu's with
more cores. But one cpu core would have more-or-less the same speed no
matter how much money you invest.
When someone wants a delayed-write index is similar to setting
"synchronous_commit = off". We want to give an OK to the backend as soon
as is possible and do this work in background. But we also want some
reliability against crashes.
Also, if the task is done in background it may be done from other backend,
so probably several indexes could be packed at once using different backend
processes. We could use the entire cpu if our index writes aren't tied to
the session who wrote the row.
PD: I'm very interested on existent approaches like GIN or BRIN (this one
is new to me). Thanks a lot; i'll try them in my tests.
On Fri, Jun 05, 2015 at 11:54:01PM +0000, deavid wrote:
Thanks to everybody for answering. I wasn't expecting this attention; this
is a great community :-)Jim asked me about something real. Well, the problem is this showed up more
than five years ago, and keeps popping from time to time since in different
circumstances. I solved them in different ways each time, depending the
exact use-case. I wanted to generalize, because seems a good feature for
several situations; and I don't expect a solution for me as each time I hit
with this I found some way to sort it out.
As Jim said, we need here are figures for real examples, and i don't have
yet. I'll do my "homework" and email back with exact problems with exact
timing. Give me a week or two.Also, some of you are talking about IO. Well, it's hard to say without the
figures here, but I'm pretty sure I'm hitting CPU time only. We use SSD on
those big databases, and also in my tests i tried setting fsync=off.So the problem is: i see a low iowait, and CPU time for one core is at
80-90% most of the time. I can buy more ram, better disks, or cpu's with
more cores. But one cpu core would have more-or-less the same speed no
matter how much money you invest.When someone wants a delayed-write index is similar to setting
"synchronous_commit = off". We want to give an OK to the backend as soon
as is possible and do this work in background. But we also want some
reliability against crashes.Also, if the task is done in background it may be done from other backend,
so probably several indexes could be packed at once using different backend
processes. We could use the entire cpu if our index writes aren't tied to
the session who wrote the row.PD: I'm very interested on existent approaches like GIN or BRIN (this one
is new to me). Thanks a lot; i'll try them in my tests.
Hi David,
Here is an interesting read comparing LSM and Fractal Tree indexing:
Regards,
Ken
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi again. I tried to do some test on my office computer, but after spending
2-3 hours I gave up. I'm going to need a real SSD disk to try these things.
100k rows of my "delivery notes" table use 100MB of disk; and 2Gb of RAM
may be not enough to emulate a fast IO. (I was disabling fsync, activating
write caches, etc)
I downloaded Postgresql 9.5-dev from git sources, compiled everything and
restored there two client databases (10Gb each one). My goal was to test
gin_btree and brin indexes as well, but i gave up before doing a complete
test of gin.
Now I have a better plan, I'm going to use my laptop (intel i5, 4Gb of ram)
and i will put here a spare SSD i wasn't using (OCZ Agility 2 120Gb). Hope
this time I could get some figures closer to production.
By now, my results were a bit disappointing: (comparing gin_btree against
regular btree for a column with very low cardinality)
- create index and updates: about 10-20% faster (i had a primary key, so
btree unique checks may be here blurring the results)
- selects: about 2-5 times slower
- index size: about 2 times smaller
What i've found is, I was wrong on fillfactor. (Maybe something has changed
here since postgresql 8.1). I believed a fillfactor lower than 80 will do
more harm than good. At least that was the case 5 years ago. Now I could
get a noticeable speedup with fillfactor=50 in the case of updating the
whole table.
Hope i could setup this laptop soon and get those tests done.
El sáb., 6 jun. 2015 a las 13:07, ktm@rice.edu (<ktm@rice.edu>) escribió:
Show quoted text
On Fri, Jun 05, 2015 at 11:54:01PM +0000, deavid wrote:
Thanks to everybody for answering. I wasn't expecting this attention;
this
is a great community :-)
Jim asked me about something real. Well, the problem is this showed up
more
than five years ago, and keeps popping from time to time since in
different
circumstances. I solved them in different ways each time, depending the
exact use-case. I wanted to generalize, because seems a good feature for
several situations; and I don't expect a solution for me as each time Ihit
with this I found some way to sort it out.
As Jim said, we need here are figures for real examples, and i don't have
yet. I'll do my "homework" and email back with exact problems with exact
timing. Give me a week or two.Also, some of you are talking about IO. Well, it's hard to say without
the
figures here, but I'm pretty sure I'm hitting CPU time only. We use SSD
on
those big databases, and also in my tests i tried setting fsync=off.
So the problem is: i see a low iowait, and CPU time for one core is at
80-90% most of the time. I can buy more ram, better disks, or cpu's with
more cores. But one cpu core would have more-or-less the same speed no
matter how much money you invest.When someone wants a delayed-write index is similar to setting
"synchronous_commit = off". We want to give an OK to the backend as soon
as is possible and do this work in background. But we also want some
reliability against crashes.Also, if the task is done in background it may be done from other
backend,
so probably several indexes could be packed at once using different
backend
processes. We could use the entire cpu if our index writes aren't tied to
the session who wrote the row.PD: I'm very interested on existent approaches like GIN or BRIN (this one
is new to me). Thanks a lot; i'll try them in my tests.Hi David,
Here is an interesting read comparing LSM and Fractal Tree indexing:
Regards,
Ken
On Wed, Jun 10, 2015 at 6:01 PM, deavid <deavidsedice@gmail.com> wrote:
By now, my results were a bit disappointing: (comparing gin_btree against
regular btree for a column with very low cardinality)
- create index and updates: about 10-20% faster (i had a primary key, so
btree unique checks may be here blurring the results)
That could be the effect of GIN's buffering (lets call it LSM? it's similar)
So that with pure btrees could get a similar speedup.
On Wed, Jun 10, 2015 at 6:01 PM, deavid <deavidsedice@gmail.com> wrote:
What i've found is, I was wrong on fillfactor. (Maybe something has changed
here since postgresql 8.1). I believed a fillfactor lower than 80 will do
more harm than good. At least that was the case 5 years ago. Now I could get
a noticeable speedup with fillfactor=50 in the case of updating the whole
table.
8.1 didn't have HOT. I'd bet it's that.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 6/5/15 6:54 PM, deavid wrote:
So the problem is: i see a low iowait, and CPU time for one core is at
80-90% most of the time. I can buy more ram, better disks, or cpu's with
more cores. But one cpu core would have more-or-less the same speed no
matter how much money you invest.When someone wants a delayed-write index is similar to setting
"synchronous_commit = off". We want to give an OK to the backend as
soon as is possible and do this work in background. But we also want
some reliability against crashes.Also, if the task is done in background it may be done from other
backend, so probably several indexes could be packed at once using
different backend processes. We could use the entire cpu if our index
writes aren't tied to the session who wrote the row.
Something that might help here would be doing the index maintenance in
parallel via background workers. There's now enough parallelism
infrastructure that it shouldn't be too hard to hack up a quick test of
that.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jun 5, 2015 at 10:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So I really doubt that anyone would have any enthusiasm for saddling btree
with a similar mechanism. It's complicated (and has been the cause of
multiple bugs); it's hard to figure out when is the optimal time to flush
the pending insertions; and it slows down searches in favor of making
inserts cheap, which is generally not the way to bet --- if that's the
tradeoff you want, why not drop the index altogether?
I have seen a case that a major fact table with up to 7 indices, every
15~60 mins with large amount of data loading, and there are
concurrrent seeks against indices at the same time. We can play with
partitioning, or sarcrifice some application semantics, to alleviate
the pressure but it is good to see if we can improve: sorting and
batching insert into btree is helpful for better IO and locking
behavior. So can we guard the case that hard to handle, e.g., the
indices enforcing some constraints (like uniqueness), and improve the
loading senario?
Hint bits update is also painful in above case, but it is out of the topic here.
Thanks,
Qingqing
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 5 June 2015 at 18:07, deavid <deavidsedice@gmail.com> wrote:
There are several use cases where I see useful an index, but adding it
will slow too much inserts and updates.
For example, when we have 10 million rows on a table, and it's a table
which has frequent updates, we need several index to speed up selects, but
then we'll slow down updates a lot, specially when we have 10 or more
indexes.
Other cases involve indexes for text search, which are used only for user
search and aren't that important, so we want to have them, but we don't
want the overload they put whenever we write on the table.
I know different approaches that already solve some of those problems in
some ways (table partitioning, partial indexes, etc), but i don't feel they
are the solution to every problem of this kind.Some people already asked for "delayed write" indexes, but the idea gets
discarded because the index could get out of sync, so it can omit results
and this is unacceptable. But i think maybe that could be fixed in several
ways and we can have a fast and reliable index (but maybe not so fast on
selects).
This is exactly the use case and mechanism for BRIN indexes.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
So I just ran a test case for hash, btree, gin_btree and brin indexes. Also
without indexes, and without primary keys.
* Testing "deliverynotes" table.
- Definition and use case:
It is a table contaning real delivery note headers of several years
It consists of 300k rows, 128 columns, 63 indexes, 243Mb of data
excluding indexes. Since is a table visible for users, almost every
column can be searched so we need lots of indexes. We do not need
searches to be the fastest possible, we only need to accelerate a
bit our user searches; without harming too much writes.
- Things to test:
- measure index creation times.
- measure index space.
- with indexes but without primary key
- with everything
- Create fully, delete everything and Insert again data in blocks
- Test updates for recent data
I attached the logs for every test, if anyone wants to see what i'm exactly
testing.
This was tested on my i5 laptop with 4Gb of RAM and a 120Gb SSD (OCZ
Agility 3). I'm trying to measure CPU time, not I/O time, so some
configurations and tests are specific to avoid as much as IO as I can.
I'm using a dev build for Postgresql 9.5 downloaded from git sources.
Conclusions:
- Gin_btree seems slower in almost every case. It's writes are marginally
better than regular btrees even when using work_mem=160MB. (May be 20%
faster than btree). They are smaller than I thought.
- BRIN indexes seem very fast for writes. For selects maybe is a blend
between having indexes and don't having them. They don't recognize that
some values are simply out of range of indexed values, and that's a pity.
If the values we want are packed together I guess I would get even better
results.
- Primary keys and uniqueness checks doesn't seem to make any difference
here.
- Having no indexes at all is faster than I imagined. (Sometimes it beats
BRIN or Btree) Maybe because the IO here is faster than usual.
- Hash indexes: i tried to do something, but they take too much time to
build and i don't know why. If creates are slow, updates should be slow
too. I'm not going to test them again.
And finally, don't know why but i couldn't vacuum or analyze tables. It
always get stalled without doing anything; so i had to comment every
vacuum. Maybe there is a bug in this dev version or i misconfigured
something.
El vie., 12 jun. 2015 a las 7:27, Simon Riggs (<simon@2ndquadrant.com>)
escribió:
Show quoted text
On 5 June 2015 at 18:07, deavid <deavidsedice@gmail.com> wrote:
There are several use cases where I see useful an index, but adding it
will slow too much inserts and updates.
For example, when we have 10 million rows on a table, and it's a table
which has frequent updates, we need several index to speed up selects, but
then we'll slow down updates a lot, specially when we have 10 or more
indexes.
Other cases involve indexes for text search, which are used only for user
search and aren't that important, so we want to have them, but we don't
want the overload they put whenever we write on the table.
I know different approaches that already solve some of those problems in
some ways (table partitioning, partial indexes, etc), but i don't feel they
are the solution to every problem of this kind.Some people already asked for "delayed write" indexes, but the idea gets
discarded because the index could get out of sync, so it can omit results
and this is unacceptable. But i think maybe that could be fixed in several
ways and we can have a fast and reliable index (but maybe not so fast on
selects).This is exactly the use case and mechanism for BRIN indexes.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
Sorry; Because some misconfiugration vacuum and analyze were'nt working.
Now I'm getting better numbers for BRIN indexes where there are zero rows
to match.
El sáb., 13 jun. 2015 a las 3:17, deavid (<deavidsedice@gmail.com>)
escribió:
Show quoted text
So I just ran a test case for hash, btree, gin_btree and brin indexes.
Also without indexes, and without primary keys.
* Testing "deliverynotes" table.
- Definition and use case:
It is a table contaning real delivery note headers of several years
It consists of 300k rows, 128 columns, 63 indexes, 243Mb of data
excluding indexes. Since is a table visible for users, almost every
column can be searched so we need lots of indexes. We do not need
searches to be the fastest possible, we only need to accelerate a
bit our user searches; without harming too much writes.
- Things to test:
- measure index creation times.
- measure index space.
- with indexes but without primary key
- with everything
- Create fully, delete everything and Insert again data in blocks
- Test updates for recent dataI attached the logs for every test, if anyone wants to see what i'm
exactly testing.
This was tested on my i5 laptop with 4Gb of RAM and a 120Gb SSD (OCZ
Agility 3). I'm trying to measure CPU time, not I/O time, so some
configurations and tests are specific to avoid as much as IO as I can.
I'm using a dev build for Postgresql 9.5 downloaded from git sources.Conclusions:
- Gin_btree seems slower in almost every case. It's writes are marginally
better than regular btrees even when using work_mem=160MB. (May be 20%
faster than btree). They are smaller than I thought.
- BRIN indexes seem very fast for writes. For selects maybe is a blend
between having indexes and don't having them. They don't recognize that
some values are simply out of range of indexed values, and that's a pity.
If the values we want are packed together I guess I would get even better
results.
- Primary keys and uniqueness checks doesn't seem to make any difference
here.
- Having no indexes at all is faster than I imagined. (Sometimes it beats
BRIN or Btree) Maybe because the IO here is faster than usual.
- Hash indexes: i tried to do something, but they take too much time to
build and i don't know why. If creates are slow, updates should be slow
too. I'm not going to test them again.And finally, don't know why but i couldn't vacuum or analyze tables. It
always get stalled without doing anything; so i had to comment every
vacuum. Maybe there is a bug in this dev version or i misconfigured
something.El vie., 12 jun. 2015 a las 7:27, Simon Riggs (<simon@2ndquadrant.com>)
escribió:On 5 June 2015 at 18:07, deavid <deavidsedice@gmail.com> wrote:
There are several use cases where I see useful an index, but adding it
will slow too much inserts and updates.
For example, when we have 10 million rows on a table, and it's a table
which has frequent updates, we need several index to speed up selects, but
then we'll slow down updates a lot, specially when we have 10 or more
indexes.
Other cases involve indexes for text search, which are used only for
user search and aren't that important, so we want to have them, but we
don't want the overload they put whenever we write on the table.
I know different approaches that already solve some of those problems in
some ways (table partitioning, partial indexes, etc), but i don't feel they
are the solution to every problem of this kind.Some people already asked for "delayed write" indexes, but the idea gets
discarded because the index could get out of sync, so it can omit results
and this is unacceptable. But i think maybe that could be fixed in several
ways and we can have a fast and reliable index (but maybe not so fast on
selects).This is exactly the use case and mechanism for BRIN indexes.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I did another try on BRIN and GIN indexes, and I compared to regular btree
indexes. Now i have 16M rows to do the test.
The numbers seem to be good. Both GIN and BRIN seem good options for
certain tables with more writes than reads (Specially BRIN is very good)
I want to share with you my test; I used real-world data, but i didn't had
time to do something accurate or real-word uses. I know the methodology is
not enough, and maybe some calculations on the spreadsheet are wrong. I
tried to do my best.
I'm using an SSD and I'm trying to compare CPU cost, not I/O.
In short, the results were: (compared to btree)
- INSERT: GIN is 50% faster; BRIN is 6x faster. This is the best scenario.
- UPDATE: each case has a winner; for big updates BRIN is 10x faster and
GIN is 25x faster. For small updates (most real world cases) BTREE is
always the winner; but BRIN gives some good results too.
- DELETE: Almost no difference between the three.
- SELECT: BTREE here is the winner. BRIN is 10% slower, and GIN performance
seems a bit random.
VACUUM, ANALYZE and other tasks are 6x faster with BRIN, 50% faster with
GIN.
Index sizes are 50% smaller with GIN, but with BRIN they are very very
small
Hope you find useful these numbers.
El sáb., 13 jun. 2015 a las 11:41, deavid (<deavidsedice@gmail.com>)
escribió:
Show quoted text
Sorry; Because some misconfiugration vacuum and analyze were'nt working.
Now I'm getting better numbers for BRIN indexes where there are zero rows
to match.El sáb., 13 jun. 2015 a las 3:17, deavid (<deavidsedice@gmail.com>)
escribió:So I just ran a test case for hash, btree, gin_btree and brin indexes.
Also without indexes, and without primary keys.
* Testing "deliverynotes" table.
- Definition and use case:
It is a table contaning real delivery note headers of several years
It consists of 300k rows, 128 columns, 63 indexes, 243Mb of data
excluding indexes. Since is a table visible for users, almost every
column can be searched so we need lots of indexes. We do not need
searches to be the fastest possible, we only need to accelerate a
bit our user searches; without harming too much writes.
- Things to test:
- measure index creation times.
- measure index space.
- with indexes but without primary key
- with everything
- Create fully, delete everything and Insert again data in blocks
- Test updates for recent dataI attached the logs for every test, if anyone wants to see what i'm
exactly testing.
This was tested on my i5 laptop with 4Gb of RAM and a 120Gb SSD (OCZ
Agility 3). I'm trying to measure CPU time, not I/O time, so some
configurations and tests are specific to avoid as much as IO as I can.
I'm using a dev build for Postgresql 9.5 downloaded from git sources.Conclusions:
- Gin_btree seems slower in almost every case. It's writes are marginally
better than regular btrees even when using work_mem=160MB. (May be 20%
faster than btree). They are smaller than I thought.
- BRIN indexes seem very fast for writes. For selects maybe is a blend
between having indexes and don't having them. They don't recognize that
some values are simply out of range of indexed values, and that's a pity.
If the values we want are packed together I guess I would get even better
results.
- Primary keys and uniqueness checks doesn't seem to make any difference
here.
- Having no indexes at all is faster than I imagined. (Sometimes it beats
BRIN or Btree) Maybe because the IO here is faster than usual.
- Hash indexes: i tried to do something, but they take too much time to
build and i don't know why. If creates are slow, updates should be slow
too. I'm not going to test them again.And finally, don't know why but i couldn't vacuum or analyze tables. It
always get stalled without doing anything; so i had to comment every
vacuum. Maybe there is a bug in this dev version or i misconfigured
something.El vie., 12 jun. 2015 a las 7:27, Simon Riggs (<simon@2ndquadrant.com>)
escribió:On 5 June 2015 at 18:07, deavid <deavidsedice@gmail.com> wrote:
There are several use cases where I see useful an index, but adding it
will slow too much inserts and updates.
For example, when we have 10 million rows on a table, and it's a table
which has frequent updates, we need several index to speed up selects, but
then we'll slow down updates a lot, specially when we have 10 or more
indexes.
Other cases involve indexes for text search, which are used only for
user search and aren't that important, so we want to have them, but we
don't want the overload they put whenever we write on the table.
I know different approaches that already solve some of those problems
in some ways (table partitioning, partial indexes, etc), but i don't feel
they are the solution to every problem of this kind.Some people already asked for "delayed write" indexes, but the idea
gets discarded because the index could get out of sync, so it can omit
results and this is unacceptable. But i think maybe that could be fixed in
several ways and we can have a fast and reliable index (but maybe not so
fast on selects).This is exactly the use case and mechanism for BRIN indexes.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
compare_indexes.odsapplication/vnd.oasis.opendocument.spreadsheet; name=compare_indexes.odsDownload
PK I��F�l9�. . mimetypeapplication/vnd.oasis.opendocument.spreadsheetPK I��F content.xml�]ms����_�q;��T��B����N����q^�q:���Z�e�I�������R�%[v�`[��8����Y,^��?�N'�/Q���l�zd����(����>�:j�����Srv��(^L�Y�&��mA�Y�+���]��^fq����(���^2�fU��2�W�U�d�� :{^��G_slf�]���K.���Gix��l�P����l����s�@�O�a_���$����w���^�{yy�]r/I�]���Ok��5n~�N
�h��&�),�R�v+�4�C�|�,��bz��� ��F���(�k���<+��e���/�[�yx�h;+����GxS�����0?��}U�<,��;���t�-�`W�j��s��%z9�$��&CI�B\F��-/�/��_�q�K����a8�5�L�U�h���1��D�"�[2�n��g�[_����O��h^����;�,���U���n�Tt�h��y]1gx���j�����v�a�V�q:���8�nH��G�]��w����ezgJ�SS�������:K.f��
��}�Gil��"[o�
�<I�r��-�a�����b����$Ig��1��'��R��~#�~���z2:���k�f�����q�<��n:�EIK��{]��/��u����puF�p������ur��m���;���(���)���W�i<�����p�d?^���{��W|g��"�]d�q�� �q>��%L����w�v����n��0��ey4��Li���Z�p29
��.�M���8H���J[zt_ ��Y�"=���@�a�xOm���y� �[�Yx'�&��^�s9�3VEiGY�,���Q�g�4��M����xd�r�1���B�%q���=�l�p+����-M.� )�R��L�y����
�[��,�$�<����r�<���r��z���)�Q�����Sy���b�_SjI�r�0���$���g�6�@���&#x�$���7E-��E���)�{�oW@��/0��:��aq����4��o��8���(zMiwX`t;���z2S3:,~\������J��J���q��-�M�+R�[�R$E��0t��,�h����q���a]���r�&������U�e�Y<��$A��s������w�I<������*��)ax����UoK���:a6��ye(E?���=,t�5�S�����W��Qn�V��xp���]E0�G�y��*iW�]P��_���e��y�W���pnJ)�?��e]�X3� �"{Us]��h���WL��'����
O+�O���Si�+U"�A�'9��n��������?���gI�}�$������������}��w�y���n����i�:mP�
��`�L���F^�CY$���os�.���"`/��s3~�����p�_�P�{��( G����;���EG��*g 6Wj��A�h\z����z����u�K�G}��u��h�I-yX��i�p�:�'nP�
$��2��X�J�Y�?�oX��k�lL{As���j��6�:��N[���i�;q�:m�f�Qc��Fw�{X�c���~�B,��2)���I�N���������f�4�GP+����i���)���I�����J&S5N.��������V5�����<������{h��G��W��|��<py?���v%b��{���5��lql�������Oo��o�z�����a+������_i�JY�Rf��F��I��O��a�E+��d�M����V�q �� >( ������Z���`�- 8J �����6��m6XA}��s#�f�* ��`o�e�l��>��eA� p;��i����m6X&%�^�te��* ��`o�e�l��>K��h@�Tv�S2����'o�zMZ�������{�Yo���w�~���wH��?���b�By�5b�A����T�
�) nH��9�� Fp��������j���L�{�����Lif7Q�e�\����@:���7d-���Yh %��*�O�-���~s��������Q���!9��@�8a��8�.�1~u� N��%t���D�S<(-���)�PG#*)Q��f�u#�$���(�;j��������'-J�=0-v�.�z��8��O�%%���g�Q77G[ ]��I�[X(�;6<@��xck��`M�m��t�<ni[����A�q �T��tUB�����Nb���Qc����P���#!$�N}���XdK�i%_ ���a8r)�@:���� _��"[�M+)�����0\q��H7B>!8!
���V�L���%z
2p6��Fn�,��jB0��UQ@_���B&�p �JAp�T ]��(�w+�������D ������ �9������ag
��&��(�O�f��|8>���[��-��EL��TP�*Y��Zo�Z����w�����������i\!w�`��O����H~��F�_j�������Hk
u�C�����\X���(�)�A�cN7���x
;cH��n�g��� �q��l�����t������6��9�D~�Yh��n����3N#e�q�A9��l8M�G�
�1���� ������NY�DB�:Xku.}��{�Z/lJ[��nHj��y�D~���B���y�D����6Xg����������kNJ��mbp��&e�NY�o����Xn���Br�Y;PHO!�(�m�hy7���B����c7]X��*��&n�n{�F�3�K���
�v��;cn�v�FJr�a�,����FS�1l���3�9\�����++�t��c����y��O���`����W�\ !�'I@}����<����Ng��f.�7��7���q�E���n7��d���j��`@s�� ��U�)�����c�N�:����&���|��x��r����V���dj�
�t�}
7�Z.�G�BI��r���������.g<��3�Tdjo�t�����a��������������<��"��6j ��z��g�p'�-Yc/u4O}���5HGu��sf���=��hH�#+jv4���<�PB��!���)��lW:[�������G���a�����EG�"����,��9����h�����5V u�K3@G��"%�%�����4�������HG� X�rs>�^��R�3��_bY��~
�#OK��C��<�$�{����-�G�7�PGnN0�
��p�!;t��v�� _��BP���.���-
�y��J���PW3�����������$�W������wc�g���n�9��K����?���b��� ��5������"�H�^*<�5��))��A��>�v�*�&m"�j!�M�U��2���^�w���}�K��f@�|-|BS�9>��f����s��������1����AH���m�53{r62�����
Rn����3��n�j�t���s�3��+p�������=}�5�=_h�SI��~U�o���������qWY{}�If�m��*�w���c6PEy���1W�����]ckCf{U�#b���d��3��v��2h�J���&4�x�������gm]���!D��%'��E����X��l���1"���� {���m��}{>��B<,���9�LI[;vM{U�g�PT2p��o���
����� b:�|&�{6W�9�L����*@gB�tR2������5��B=����;��`����2�5���<
�a�OA~�g������3m�
*h@�"\�cTgg*+>�����A*_�I���m��$KEg{M�G?�Z�c/�������j�(�<�@oy�V���
��u�L=�)5���G~b��6sL�C{U���o��B��;��r�4�o�I A6���$@nVU��[��l����f� 5zs�Y^���O�S��pr�
��c���A��2J����>�jk�t�m�s :��&$�p����y�/<P�<Fg��Dx�����:��qG�rwt�t#���-���P���{��f�}7M����n��;Cg�7����'R)%t��������mU9�W��b������6��p>I�o��Yxf\�r�R�Xxn�
�`��}a>+�]r���x������a��0{��G��L��O������kB���@��!9A����r=��@��
����u�6T�A_�}��gVa�/����M��bl��*��~W*J�J9��R$\�4S���)���a�`/�T/51�D_�i�eq2�L��R$���<�����d����(^L�YB�r����PK��� Z� PK I��F meta.xml��K�� �����
�56r<�,F]L5��J�E�Ii1D�3����&m����;�s��N�
��v��
�Ahn���
��}
�P��������}��-�&�V��yi�z��i:�1���1��9�^-��fS������6��sG��0��D�0-�O�+*��;�VM��� ��+;6��R#{Y�s������}�Vuk �&�7l\�I>�`���cT��;������8O��uLh�<����0�dY��'YR����
G%8��[Z0J-�4��,I���+6����_|(z;�U���h���_���k#�
�:��^��=�o6�~�{/��8������g����{���4� vGk~w8�IK�{�D/9��#��i�|��IL�k�
Bnz����Y������\T��V=A����5�[�D�PK({R�� P PK I��F
styles.xml�Z�n�6��S*Pt�h��Oc7q.��E����(�+%
$'����~o�7����i�����]�$H �;�����������������b3R�<�Ev���:.7���4�1Y'<�rR($�-#r��\������cI���9�k�yI
'�n��f);b�M7���"7j����d����
�)���*���iS<�S�o$C)G1�K���7�.��R�:w��|w4�"��*4������J0�J��0���b�6'
O�Oc�&U~E�dj�����H��vu\NS��i��u69�����-������r�L���)�c���Y�
&��W/�q%��kil��X�r�6-�)�9��j����e�������w�*"�xc{�y�G�! ��!��BozP�I(H������������U9NU=���H�^(�sB�B��kJv�Z�l��Uh@��5*��B��) ������q���Hl�Y2�MI�S��uKC3b����ihh���G����oC=��1 ����8����u)�c.�1A �������g�Yw��P����w����:��9e��c\r�� g�YK�����[�v��R�%U1��k,�qf8n�s�;��7���b��T$�M���,7{�����a���/b�A� i���ZE[=n�,giBR\���r�k�Lf��08x��.����BQh���A/QB����� �O�|O�N���1t �S�f��*��,�1��`�O��C�'.C�`���u3�B7����Z}�3r3�����P������Q�S<b���wG���BK*�?h{������� >f���S���y�\�qjX��*�K���ZDnq�w�G�`����7y�&�)��W���p� ��Y�����*bU��0VA"P����] ��e���X�}�C��r���|��,-��W}�0J����b&�g�E��{�oW�Q^I��]�\��k(Q���*@:�H ��b�B��$��StY��i���� } �04Y����_�a�������_��z�}h��+J}Si��-n��� 4��rZ@0*��\B3�$8�,�����+!�b}���":~�]\sQ�o��aF�)As�P� � h1G�s���U�����y���=������gNW;������k<P�������6�`f�F�u�I���o�_KO�K�EB��q�YE�|�8S�.|���d5�������d�C�m�'bO�rG���So��)�5�����0����}�,nc]�$�3�����Vk�p����9�(w�c������G2�Vs�z���
��f;W<"�an1�3�@v4��At2���~b�������������;mb�u�0��}Ww�� ��`��x������@�WBwv)�q��\X� �P��o����� A"���i���E���Y3����S(���DP�����kdxp��up� ���Ow���[^���^��"�u�jw�L u���v��XZc)���l;L���� ����6�a,2�a$���AQ���W\)��.�w�p����/o%�&zLl�v|0�w�o�wz���D��_��W%�VW����|@���R4�gN
f�7��������4��e�'��~�����uwT��������*��5�Z�X�����g����������b�������p�)H8��v�;������s�Yn��o����g��������[�1���ik�>jE���.v��[��������L�a��O�{���U� ���w�h��2Yt�:tRd���V�����@rE����Y�����DQm�n~�9���Uk�S�8]����O��cH�����^bm~���"����3,gahn���� �c�)|��������_PK�e�~6 � PK I��F manifest.rdf���n�0��<�e��@/r(��j��5�X/������VQ�������F3�����a�����T4c)%�Hh��+:�.���:���+��j���*�wn*9_��-7l���(x��<O�"��8qH��� �Bi��|9�� fWQt���y� =��:���
a�R��� ��@� L��t��NK�3��Q9�����`����<`�+�������^����\��|�hz�czu����#�`�2�O��;y���.�����vDl@��g�����UG�PK��h� � PK I��F Configurations2/popupmenu/PK I��F Configurations2/statusbar/PK I��F Configurations2/toolbar/PK I��F Configurations2/menubar/PK I��F Configurations2/floater/PK I��F '