Comparison of Oracle and PostgreSQL full text search
Thanks to some very helpful input here in earlier threads, I was
finally able to pull together a working prototype Full Text Search
'engine' on PostgreSQL and compare it directly to the way the
production Oracle Text works. The good news is that PostgreSQL is
bloody fast! The slightly iffy news is that the boss is now moaning
about possible training costs!
For what it's worth, I wrote up the performance comparison here:
http://diznix.com/dizwell/archives/153
Maybe it will be of use to anyone else wondering if it's possible to
do full text search and save a couple hundred thousand dollars whilst
you're at it!
Regards
HJR
On Tue, Jul 27, 2010 at 7:58 PM, Howard Rogers <hjr@diznix.com> wrote:
Thanks to some very helpful input here in earlier threads, I was
finally able to pull together a working prototype Full Text Search
'engine' on PostgreSQL and compare it directly to the way the
production Oracle Text works. The good news is that PostgreSQL is
bloody fast! The slightly iffy news is that the boss is now moaning
about possible training costs!
Someone running Oracle is complaining about training costs? That
seems a bit like complaining about needing to give the bellboy a $1
tip at a $1k a night hotel.
For what it's worth, I wrote up the performance comparison here:
http://diznix.com/dizwell/archives/153
Cool, I'll read up. Thanks.
Howard Rogers, 28.07.2010 03:58:
Thanks to some very helpful input here in earlier threads, I was
finally able to pull together a working prototype Full Text Search
'engine' on PostgreSQL and compare it directly to the way the
production Oracle Text works. The good news is that PostgreSQL is
bloody fast! The slightly iffy news is that the boss is now moaning
about possible training costs!
Why is it that managers always see short-term savings but fail to see long-term expenses?
For what it's worth, I wrote up the performance comparison here:
http://diznix.com/dizwell/archives/153
Maybe it will be of use to anyone else wondering if it's possible to
do full text search and save a couple hundred thousand dollars whilst
you're at it!
Very interesting reading.
Would you mind sharing the tables, index structures and search queries that you used (both for Oracle and Postgres)?
Regards
Thomas
2010/7/28 Thomas Kellerer <spam_eater@gmx.net>:
Why is it that managers always see short term savings but fail to see
longterm expenses?
It's all about CAPEX vs OPEX, baby!
Joking aside, it's actually myopia. Because they ALREADY spent money on
training, they don't see the need for extra training (and its costs), as if
people stayed there forever and knowledge were a fixed, finished thing!
The point would be to put the costs in a time perspective, that is: how
much would it cost over, say, 5 years with PG, and the same for Oracle.
--
Vincenzo Romano
NotOrAnd Information Technologies
NON QVIETIS MARIBVS NAVTA PERITVS
Howard,
that was a great read!
I especially like your sentence
""" Considering that any search containing more than a half-dozen
search terms is more like an essay than a realistic search; and
considering that returning half a million matches is more a data dump
than a sensible search facility,"""
which really pulls some benchmark perspectives back into real life.
Thank you,
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
Using PostgreSQL is mostly about sleeping well at night.
On Tue, Jul 27, 2010 at 9:58 PM, Howard Rogers <hjr@diznix.com> wrote:
Thanks to some very helpful input here in earlier threads, I was
finally able to pull together a working prototype Full Text Search
'engine' on PostgreSQL and compare it directly to the way the
production Oracle Text works. The good news is that PostgreSQL is
bloody fast! The slightly iffy news is that the boss is now moaning
about possible training costs!
For what it's worth, I wrote up the performance comparison here:
http://diznix.com/dizwell/archives/153
I always thought there is a clause in their user agreement preventing
the users from publishing benchmarks like that. I must be mistaken.
zhong ming wu wrote:
On Tue, Jul 27, 2010 at 9:58 PM, Howard Rogers <hjr@diznix.com> wrote:
For what it's worth, I wrote up the performance comparison here:
http://diznix.com/dizwell/archives/153
I always thought there is a clause in their user agreement preventing
the users from publishing benchmarks like that. I must be mistaken.
Perhaps not, as I remember such issues from a few years ago, when the
company I worked at profiled Postgres against Oracle. Oracle doesn't want
poorly-tuned systems being used as benchmarks. Or so they claim.
Our tests -- very much oriented at PostGIS -- found Oracle to be between 5
and 15% _faster_, depending on the specifics of the task. We decided to go
with Postgres given the price difference (several hundred thousand dollars
for Oracle in the configuration we needed vs. zip for Postgres -- we
already had trained Postgres DBAs).
YMMV.
Greg Williamson
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
zhong ming wu wrote:
I always thought there is a clause in their user agreement preventing
the users from publishing benchmarks like that. I must be mistaken.
No, you're correct. Currently, to download the current Oracle 11gR2, one
must agree to:
http://www.oracle.com/technetwork/licenses/standard-license-152015.html
which contains:
<quote>
[...]
You may not:
[...]
- disclose results of any program benchmark tests without our prior consent.
[...]
</quote>
Not having such frustrating license terms is also what makes PostgreSQL a
nicer alternative!
Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
On Tue, 27 Jul 2010 23:24:12 -0600, Scott Marlowe <scott.marlowe@gmail.com> wrote:
Someone running Oracle is complaining about training costs? That
seems a bit like complaining about needing to give the bellboy a $1
tip at a $1k a night hotel.
Depending on how they are running their licensing
(user/processor/standard/enterprise), Oracle can actually be reasonable, as
commercial databases go. That said, PostgreSQL training is cheap. If you
have enough people in your org to justify an on-site training, a 2-day
Administration + Performance + Maintenance class is only $5k (remember:
on-site).
Over time, though, there is no question that PostgreSQL will save you
money. You can get an Enterprise-class support contract for $500.00 a month
per server.
Sincerely,
Joshua D. Drake
--
PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
Thomas Kellerer <spam_eater@gmx.net> writes:
Howard Rogers, 28.07.2010 03:58:
For what it's worth, I wrote up the performance comparison here:
http://diznix.com/dizwell/archives/153
Very interesting reading.
Indeed.
Would you mind sharing the tables, index structures and search queries that you used (both for Oracle and Postgres)?
What I'd be interested in is EXPLAIN ANALYZE results. In particular,
I wonder if the planner was switching from indexscan to seqscan plans
for the cases where many rows would be returned, and if so whether it
got the cutover point right.
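One quick way to probe that cutover, should anyone get the chance to rerun
this, is to compare the chosen plan against the forced alternative at both
a very common term and a rare one. A sketch (the table, column, and term
here are placeholders, not the actual schema):

```sql
-- Plan the optimizer picks on its own:
EXPLAIN ANALYZE
SELECT count(*)
FROM   docs
WHERE  to_tsvector('english', body) @@ to_tsquery('english', 'woman');

SET enable_bitmapscan = off;   -- discourage the index path for this session

-- Same query with the planner pushed toward a seqscan:
EXPLAIN ANALYZE
SELECT count(*)
FROM   docs
WHERE  to_tsvector('english', body) @@ to_tsquery('english', 'woman');

RESET enable_bitmapscan;       -- restore default planner behaviour
```

If the forced seqscan wins at the high-match end, the cutover point is off.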
regards, tom lane
On Wed, Jul 28, 2010 at 8:38 PM, Daniel Verite <daniel@manitou-mail.org> wrote:
zhong ming wu wrote:
I always thought there is a clause in their user agreement preventing
the users from publishing benchmarks like that. I must be mistaken.
No you're correct. Currently, to download the current Oracle 11.2g, one must
agree to:
http://www.oracle.com/technetwork/licenses/standard-license-152015.html
which contains:
<quote>
[...]
You may not:
[...]
- disclose results of any program benchmark tests without our prior consent.
[...]
</quote>
Not having such frustrating license terms is also what makes PostgreSQL a
nicer alternative!
Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
Hi Daniel:
Nice catch - the thing is, you've linked to the *technet* license. The
one you sign up to when you download the product for free, for
development, prototyping and self-learning purposes. That's not the
same license as the one you sign up to when you pay them stacks of
cash for the 'proper' product for a production deployment (which I
haven't read lately, so I can't say the same silly term isn't in
there, but I'm just saying: the license you linked to is not the one
that applies).
Also, I would argue that what I did was not a 'benchmark test'. We
capture the results and timings of queries as part of our production
application, for management and review purposes. Those are real
results, experienced by real users... not what I'd call a benchmark
"test". (The PostgreSQL results are, certainly, an artificial
benchmark, but then the Oracle license doesn't cover those, happily!)
Regards
HJR
On 28/07/10 02:58, Howard Rogers wrote:
For what it's worth, I wrote up the performance comparison here:
http://diznix.com/dizwell/archives/153
Thanks very much Howard.
It might be my schoolboy-physics ability to fit a curve to two data
points, but does anyone else think that the second and third graphs look
like a sinusoidal variation overlaid on a steadily increasing baseline?
--
Richard Huxton
Archonet Ltd
Greg Williamson wrote:
Our tests -- very much oriented at postGIS found Oracle to be between 5
and 15% _faster_ depending on the specifics of the task. We decided to go
with postgres given the price difference (several hundred thousand dollars for
Oracle in the configuration we needed vs. zip for postgres -- we already had
trained postgres DBAs).
Interesting. Do you have information about the versions of PostGIS/GEOS
that you were using? Of course we'd be very interested to see examples
of test cases with bad performance on the postgis-users list so that we
can improve them.
ATB,
Mark.
--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063
Sirius Labs: http://www.siriusit.co.uk/labs
On 28 July 2010 02:58, Howard Rogers <hjr@diznix.com> wrote:
For what it's worth, I wrote up the performance comparison here:
http://diznix.com/dizwell/archives/153
Thanks, very interesting results. I wonder, are the results being
sorted by the database? The performance degradation for large numbers
of results might be explained by it switching over from an internal to
an external sort, in which case tweaking work_mem might make a
difference.
Of course this is pure speculation without the EXPLAIN ANALYSE output.
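Something along these lines, purely as a sketch (the table, column, and
search term are invented for illustration, not taken from the actual schema):

```sql
-- Bump sort memory for this session only, then re-check the plan: if a
-- "Sort Method: external merge Disk: ..." line turns into an in-memory
-- sort, work_mem was the bottleneck.
SET work_mem = '256MB';

EXPLAIN ANALYZE
SELECT *
FROM   docs
WHERE  to_tsvector('english', body) @@ to_tsquery('english', 'woman')
ORDER  BY sort_id
LIMIT  10;

RESET work_mem;
```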
Regards,
Dean
On Thu, Jul 29, 2010 at 10:33 PM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On 28 July 2010 02:58, Howard Rogers <hjr@diznix.com> wrote:
For what it's worth, I wrote up the performance comparison here:
http://diznix.com/dizwell/archives/153
Thanks, very interesting results. I wonder, are the results being
sorted by the database? The performance degradation for large numbers
of results might be explained by it switching over from an internal to
an external sort, in which case tweaking work_mem might make a
difference.
Of course this is pure speculation without the EXPLAIN ANALYSE output.
Regards,
Dean
Yes, the results were being sorted. I did various tests, changing
work_mem, shared_buffers and much else, one by one, until I arrived at
the combination of settings that gave me the best 'total search time'
results. Personally, I couldn't see any difference in the explain
plans, but I was in a bit of a hurry and I may have missed it.
For the search term 'woman', which matches 1,590,275 documents, here's
the explain plan:
"Sort (cost=185372.88..185372.93 rows=20 width=312) (actual
time=10537.152..10537.154 rows=20 loops=1)"
" Sort Key: a.rf, a.sort_id"
" Sort Method: quicksort Memory: 48kB"
" -> Result (cost=109119.55..185372.45 rows=20 width=312) (actual
time=4309.020..10537.116 rows=20 loops=1)"
" -> Append (cost=109119.55..185372.45 rows=20 width=312)
(actual time=4309.018..10537.108 rows=20 loops=1)"
" -> Subquery Scan a (cost=109119.55..109119.68 rows=10
width=312) (actual time=4309.018..4309.026 rows=10 loops=1)"
" -> Limit (cost=109119.55..109119.58 rows=10
width=641) (actual time=4309.016..4309.019 rows=10 loops=1)"
" -> Sort (cost=109119.55..109121.94
rows=957 width=641) (actual time=4309.014..4309.015 rows=10 loops=1)"
" Sort Key: search_rm.sort_id"
" Sort Method: top-N heapsort Memory: 35kB"
" -> Bitmap Heap Scan on search_rm
(cost=6651.07..109098.87 rows=957 width=641) (actual
time=272.851..4021.458 rows=583275 loops=1)"
" Recheck Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''woman'''::tsquery)"
" Filter:
((bitand(sales_method_code, 1) > 0) AND (bitand(subsiter, 1) > 0) AND
(bitand(filetype, 1) > 0))"
" -> Bitmap Index Scan on
rmsearch_idx (cost=0.00..6650.83 rows=25826 width=0) (actual
time=165.711..165.711 rows=586235 loops=1)"
" Index Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''woman'''::tsquery)"
" -> Subquery Scan b (cost=76252.65..76252.77 rows=10
width=312) (actual time=6228.073..6228.080 rows=10 loops=1)"
" -> Limit (cost=76252.65..76252.67 rows=10
width=727) (actual time=6228.072..6228.075 rows=10 loops=1)"
" -> Sort (cost=76252.65..76254.29 rows=655
width=727) (actual time=6228.071..6228.072 rows=10 loops=1)"
" Sort Key: search_rf.sort_id"
" Sort Method: top-N heapsort Memory: 38kB"
" -> Bitmap Heap Scan on search_rf
(cost=5175.18..76238.49 rows=655 width=727) (actual
time=363.684..5748.279 rows=1007000 loops=1)"
" Recheck Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''woman'''::tsquery)"
" Filter:
((bitand(sales_method_code, 1) > 0) AND (bitand(subsiter, 1) > 0) AND
(bitand(filetype, 1) > 0))"
" -> Bitmap Index Scan on
rfsearch_idx (cost=0.00..5175.02 rows=17694 width=0) (actual
time=242.859..242.859 rows=1030282 loops=1)"
" Index Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''woman'''::tsquery)"
"Total runtime: 10538.832 ms"
And here's the plan for the search term "clover", which matches only
2,808 records in total:
" -> Result (cost=109119.55..185372.45 rows=20 width=312) (actual
time=16.807..23.990 rows=20 loops=1)"
" -> Append (cost=109119.55..185372.45 rows=20 width=312)
(actual time=16.806..23.985 rows=20 loops=1)"
" -> Subquery Scan a (cost=109119.55..109119.68 rows=10
width=312) (actual time=16.806..16.812 rows=10 loops=1)"
" -> Limit (cost=109119.55..109119.58 rows=10
width=641) (actual time=16.805..16.807 rows=10 loops=1)"
" -> Sort (cost=109119.55..109121.94
rows=957 width=641) (actual time=16.804..16.805 rows=10 loops=1)"
" Sort Key: search_rm.sort_id"
" Sort Method: top-N heapsort Memory: 35kB"
" -> Bitmap Heap Scan on search_rm
(cost=6651.07..109098.87 rows=957 width=641) (actual
time=1.054..15.577 rows=1807 loops=1)"
" Recheck Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''clover'''::tsquery)"
" Filter:
((bitand(sales_method_code, 1) > 0) AND (bitand(subsiter, 1) > 0) AND
(bitand(filetype, 1) > 0))"
" -> Bitmap Index Scan on
rmsearch_idx (cost=0.00..6650.83 rows=25826 width=0) (actual
time=0.615..0.615 rows=1807 loops=1)"
" Index Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''clover'''::tsquery)"
" -> Subquery Scan b (cost=76252.65..76252.77 rows=10
width=312) (actual time=7.161..7.166 rows=10 loops=1)"
" -> Limit (cost=76252.65..76252.67 rows=10
width=727) (actual time=7.161..7.163 rows=10 loops=1)"
" -> Sort (cost=76252.65..76254.29 rows=655
width=727) (actual time=7.160..7.161 rows=10 loops=1)"
" Sort Key: search_rf.sort_id"
" Sort Method: top-N heapsort Memory: 35kB"
" -> Bitmap Heap Scan on search_rf
(cost=5175.18..76238.49 rows=655 width=727) (actual time=0.433..6.642
rows=1001 loops=1)"
" Recheck Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''clover'''::tsquery)"
" Filter:
((bitand(sales_method_code, 1) > 0) AND (bitand(subsiter, 1) > 0) AND
(bitand(filetype, 1) > 0))"
" -> Bitmap Index Scan on
rfsearch_idx (cost=0.00..5175.02 rows=17694 width=0) (actual
time=0.250..0.250 rows=1045 loops=1)"
" Index Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''clover'''::tsquery)"
"Total runtime: 24.143 ms"
I can't see any change to the sorting behaviour there. work_mem was set to
4096MB, shared_buffers to 12228MB, temp_buffers to 1024MB, and
effective_cache_size to 18442MB.
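For Thomas's earlier question, the closest I can offer now is a
reconstruction from those plans. The DDL below is inferred from the plan
output, not copied from the box, so treat it as approximate: the column and
index names come from the plans, GIN is my assumption (a GiST index would
produce the same Bitmap Index Scan lines), bitand() is a user-defined
helper rather than a Postgres built-in, and I've left its filters out of
the query for brevity.

```sql
-- Expression indexes matching the "Index Cond" lines in the plans:
CREATE INDEX rmsearch_idx ON search_rm
    USING gin (to_tsvector('english', textsearch));

CREATE INDEX rfsearch_idx ON search_rf
    USING gin (to_tsvector('english', textsearch));

-- Shape of the search implied by the Append/Limit/Sort plan: take the
-- top 10 from each table, append them, and re-sort the combined 20.
SELECT * FROM (
    SELECT rf, sort_id
    FROM   search_rm
    WHERE  to_tsvector('english', textsearch) @@ to_tsquery('english', 'woman')
    ORDER  BY sort_id
    LIMIT  10
) a
UNION ALL
SELECT * FROM (
    SELECT rf, sort_id
    FROM   search_rf
    WHERE  to_tsvector('english', textsearch) @@ to_tsquery('english', 'woman')
    ORDER  BY sort_id
    LIMIT  10
) b
ORDER BY rf, sort_id;
```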
Sadly, I won't be able to provide much further analysis or
information, because the box concerned is being wiped. The MD decided
that, as a matter of corporate governance, he couldn't punt the
company on PostgreSQL, so my experimenting days are over. Back to
Oracle: slower, but with a support contract he can sue on, I guess!
Regards
HJR
Greg Williamson wrote:
Our tests -- very much oriented at postGIS found Oracle to be between 5
and 15% _faster_ depending on the specifics of the task. We decided to go
with postgres given the price difference (several hundred thousand dollars for
Oracle in the configuration we needed vs. zip for postgres -- we already had
trained postgres DBAs).
Can always throw the licensing savings toward larger hardware too; $100K
buys a pretty big server nowadays. At the FAA's talk about their
internal deployment of PostgreSQL:
https://www.postgresqlconference.org/2010/east/talks/faa_airports_gis_and_postgresql
They were reporting that some of their difficult queries were
dramatically faster on PostgreSQL; I vaguely recall one of them was 100X
the speed it ran under Oracle Spatial. It was crazy. As always this
sort of thing is very workload dependent. There are certainly queries
(such as some of the ones from the TPC-H that big DB vendors optimize
for) that can be 100X faster on Oracle too.
--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
On Thu, Jul 29, 2010 at 5:42 PM, Greg Smith <greg@2ndquadrant.com> wrote:
Greg Williamson wrote:
Our tests -- very much oriented at postGIS found Oracle to be between 5
and 15% _faster_ depending on the specifics of the task. We decided to go
with postgres given the price difference (several hundred thousand dollars
for
Oracle in the configuration we needed vs. zip for postgres -- we already
had
trained postgres DBAs).
Can always throw the licensing savings toward larger hardware too; $100K
buys a pretty big server nowadays.
Hear hear! You can get a quad x 12 core (48 cores total) server with
128G ram and 32 15k6 hard drives for well under $25k nowadays. For
$50k or so you can throw 100 hard drives at the problem.
On 30 July 2010 00:38, Howard Rogers <hjr@diznix.com> wrote:
I can't see any change to the sorting behaviour there. Work_mem was
set to 4096MB, shared buffers to 12228MB, temp_buffers to 1024MB,
effective_cache_size to 18442MB.
Ah yes. The sorting idea was a complete red herring. The top-N
heapsort to pick the 10 best results will never use much memory. It
looks like it all boils down to the sheer number of matches against
the search term that have to be considered in the first case. Others
on this list might have better ideas as to whether this can be
improved upon.
Sadly, I won't be able to provide much further analysis or
information, because the box concerned is being wiped. The MD decided
that, as a matter of corporate governance, he couldn't punt the
company on PostgreSQL, so my experimenting days are over. Back to
Oracle: slower, but with a support contract he can sue on, I guess!
Yeah, I've been there too.
Thanks and good luck,
Dean
Regards
HJR
Howard Rogers <hjr@diznix.com> writes:
Sadly, I won't be able to provide much further analysis or
information, because the box concerned is being wiped. The MD decided
that, as a matter of corporate governance, he couldn't punt the
company on PostgreSQL, so my experimenting days are over. Back to
Oracle: slower, but with a support contract he can sue on, I guess!
Too bad. I'm sure EnterpriseDB or one of the other PG support companies
would be happy to sell you a support contract, if having somebody to sue
is an essential part of happiness.
regards, tom lane
On Thu, Jul 29, 2010 at 8:04 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
$50k or so you can throw 100 hard drives at the problem.
Or even one of these: http://www.ramsan.com/products/ramsan-620.asp :-)