Where **not** to use PostgreSQL?
Hi experts,
where would you suggest someone to **not** use PostgreSQL?
Why would you do this?
What alternative would you suggest instead?
Regards,
Thomas Güttler
--
Thomas Guettler http://www.thomas-guettler.de/
I am looking for feedback: https://github.com/guettli/programming-guidelines
Hi
On Thu, Feb 28, 2019 at 12:47 PM Thomas Güttler <guettliml@thomas-guettler.de> wrote:
Hi experts,
where would you suggest someone to **not** use PostgreSQL?
Why would you do this?
What alternative would you suggest instead?
Don't use Postgres as a cache, and don't use Postgres for non-transactional,
short-lived, frequently updated data.
Use in-memory databases instead.
Pavel
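[Editor's sketch: Pavel's point is that short-lived, frequently overwritten data needs none of Postgres's MVCC or durability machinery. A minimal in-memory TTL cache in Python, purely illustrative, shows the access pattern a dedicated cache such as Redis serves: every write is an overwrite and expired entries simply vanish.]

```python
import time

class TTLCache:
    """Minimal in-memory cache with per-entry expiry.

    Illustrates the kind of short-lived, non-transactional data that is
    better kept out of a durable MVCC store like Postgres: every set()
    is an overwrite and expired entries just disappear -- no dead
    tuples, no vacuum, no WAL.
    """

    def __init__(self, default_ttl=60.0):
        self.default_ttl = default_ttl
        self._store = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl=None):
        ttl = self.default_ttl if ttl is None else ttl
        self._store[key] = (time.monotonic() + ttl, value)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy expiry on read
            return default
        return value

cache = TTLCache(default_ttl=0.05)
cache.set("session:42", {"user": "thomas"})
assert cache.get("session:42") == {"user": "thomas"}
time.sleep(0.06)
assert cache.get("session:42") is None  # expired, silently gone
```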
On Thu, Feb 28, 2019 at 1:09 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:
Hi
On Thu, Feb 28, 2019 at 12:47 PM Thomas Güttler <guettliml@thomas-guettler.de> wrote:
Hi experts,
where would you suggest someone to **not** use PostgreSQL?
Hard question. There are a lot of general places where PostgreSQL is not
by itself the best fit, and where a careful weighing of pros and cons would
need to be made before deciding to use it.
Having used PostgreSQL in place of ElasticSearch for PB-scale deployments,
as a high-throughput queue system, and for near-real-time OLAP workloads, I am
fairly aware of how hard it can be pushed.
So the answers here are not "don't use PostgreSQL here" but "think about it
first and consider alternatives."
Why would you do this?
Replacing with "What would you consider to be the tradeoffs?"
What alternative would you suggest instead?
So a few possibilities:
1. a) TB-scale full text search systems.
b) PostgreSQL's full text search is quite capable but not so powerful
that it can completely replace Lucene-based systems. So you have to
consider complexity vs. functionality if you are tying it in with other data that
is already in PostgreSQL. Note further that my experience with at least
ElasticSearch is that it is easier to scale something built on multiple
PostgreSQL instances into the PB range than it is to scale ElasticSearch
into the PB range.
c) Solr or ElasticSearch
2. a) High performance job queues
b) PostgreSQL index and table structures are not well suited to large
numbers of ordered deletes. There are ways around these problems and again
if other data is in PostgreSQL, the tradeoff is around transactional
behavior and complexity there vs ease of scaling performance.
c) Redis if the job queue easily fits into a small enough amount of
memory, or Kafka if it does not
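[Editor's sketch to make the tradeoff concrete: a table-backed queue selects the oldest ready job and then deletes it, and in Postgres every such delete leaves a dead tuple behind until vacuum reclaims it. The pattern below uses Python's stdlib sqlite3 as a stand-in; the table and column names are made up, and in Postgres the claim query would typically add FOR UPDATE SKIP LOCKED so concurrent workers don't collide.]

```python
import sqlite3

# Stand-in for a Postgres-backed job queue. In Postgres the claim would
# usually look like:
#   DELETE FROM jobs
#   WHERE id = (SELECT id FROM jobs ORDER BY id
#               LIMIT 1 FOR UPDATE SKIP LOCKED)
#   RETURNING id, payload;
# Each consumed job becomes a dead tuple until vacuum removes it, which
# is why high-throughput queues are hard on Postgres's storage layer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO jobs (payload) VALUES (?)",
                 [("job-a",), ("job-b",), ("job-c",)])

def claim_next_job(conn):
    """Claim and delete the oldest job; returns (id, payload) or None."""
    row = conn.execute(
        "SELECT id, payload FROM jobs ORDER BY id LIMIT 1").fetchone()
    if row is None:
        return None
    conn.execute("DELETE FROM jobs WHERE id = ?", (row[0],))
    conn.commit()
    return row

assert claim_next_job(conn) == (1, "job-a")  # ordered consumption
assert claim_next_job(conn) == (2, "job-b")
```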
On to where you actually should never use PostgreSQL:
Don't use PostgreSQL for things where you do not want or cannot guarantee
transactional atomicity.
While it is possible to have untrusted languages have side effects in the
real world, the fact is that mixing transactions and non-transactional
behavior in this way adds a lot of really ugly complexity.
Use another development environment instead.
Don't use Postgres as a cache, don't use Postgres for non-transactional,
short-lived, frequently updated data. Use in-memory databases instead.
Pavel
--
Best Wishes,
Chris Travers
Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more
On Thu, Feb 28, 2019 at 1:24 PM Chris Travers <chris.travers@gmail.com>
wrote:
1. a) TB-scale full text search systems.
b) PostgreSQL's full text search is quite capable but not so powerful
that it can completely replace Lucene-based systems. So you have to
consider complexity vs functionality if you are tying with other data that
is already in PostgreSQL. Note further that my experience with at least
ElasticSearch is that it is easier to scale something built on multiple
PostgreSQL instances into the PB range than it is to scale ElasticSearch
into the PB range.
c) Solr or ElasticSearch
One question about your use of PostgreSQL for a TB-scale full-text search
system: Did you order search results using ts_rank or ts_rank_cd? I'm
asking because in my experience, PostgreSQL full-text search is extremely
efficient, until you need ranking. It's because the indexes don't contain
the necessary information for ranking, and because of this the heap has to
be consulted, which implies a lot of random IO.
I'd be curious to know a bit more about your experience in this regard.
Regards,
Nicolas Grilly
PS: A potential solution to the performance issue I mentioned is this PG
extension: https://github.com/postgrespro/rum
On Thu, Feb 28, 2019 at 1:50 PM Nicolas Grilly <nicolas@gardentechno.com>
wrote:
One question about your use of PostgreSQL for a TB-scale full-text search
system: Did you order search results using ts_rank or ts_rank_cd? I'm
asking because in my experience, PostgreSQL full-text search is extremely
efficient, until you need ranking. It's because the indexes don't contain
the necessary information for ranking, and because of this the heap has to
be consulted, which implies a lot of random IO.
I'd be curious to know a bit more about your experience in this regard.
Where I did this on the TB scale, we had some sort of ranking but it was
not based on ts_rank.
On the PB scale systems I work on now, it is distributed, and we don't
order in PostgreSQL (or anywhere else, though if someone wants to write to
disk and sort, they can do this, I guess).
On Thu, Feb 28, 2019 at 2:12 PM Chris Travers <chris.travers@gmail.com>
wrote:
Where I did this on the TB scale, we had some sort of ranking but it was
not based on ts_rank.
On the PB scale systems I work on now, it is distributed, and we don't
order in PostgreSQL (or anywhere else, though if someone wants to write to
disk and sort, they can do this I guess)
Thanks!
I wish more people would ask this question, to me, it is the true mark of
experience. In general, I think of PostgreSQL as the leading Relational
Database. The farther you get away from relational data and relational
queries, the more I would say, you should look for other products or
solutions. But if you want to store relational data and then run queries
over it, then stick with PostgreSQL.
My 2 cents..
Mark
On Thu, Feb 28, 2019 at 8:28 AM Nicolas Grilly <nicolas@gardentechno.com>
wrote:
On 2/28/19 5:47 AM, Thomas Güttler wrote:
Hi experts,
where would you suggest someone to **not** use PostgreSQL?
1. Small embedded systems. SQLite is great for that.
2. Easy-to-implement Master-Master replication. (The Percona fork of MySQL
does that really well, if you can handle MySQL's limitations.)
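[Editor's sketch of the embedded case: the whole database is a single file (or memory) inside the application process, with no server to install. Python's stdlib ships a driver; the table and values below are made up for illustration.]

```python
import sqlite3

# SQLite needs no server: the entire database lives in one file
# (":memory:" here) and runs inside the application process.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [("temp", 21.5), ("temp", 22.0)])
avg = conn.execute("SELECT AVG(value) FROM readings").fetchone()[0]
assert avg == 21.75
```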
--
Angular momentum makes the world go 'round.
On Thu, Feb 28, 2019 at 4:50 AM Nicolas Grilly <nicolas@gardentechno.com>
wrote:
One question about your use of PostgreSQL for a TB-scale full-text search
system: Did you order search results using ts_rank or ts_rank_cd? I'm
asking because in my experience, PostgreSQL full-text search is extremely
efficient, until you need ranking. It's because the indexes don't contain
the necessary information for ranking, and because of this the heap has to
be consulted, which implies a lot of random IO.
Check out the RUM index extension, it adds ranking information to indexes
to speed up exactly the problem you pointed out.
I often avoid PostgreSQL when using software for which PostgreSQL
support is secondary. Usually this is the case where MySQL is the
default, but PostgreSQL is on the "also supported" list. "Also" is
too often a synonym for "badly", here, and unless I really want to be
able to approach the underlying database *as a database*, the better
choice is to go with the flow. Even when I do need that, I'll
consider whether the features I need are exclusive to, or much better
on, PostgreSQL.
When developing something myself, I've also chosen MySQL because the
other technical people likely to be involved are at least somewhat
familiar with it. A person who is marginally competent with databases
doesn't need the added difficulty of learning a new DBMS while
learning whatever I created.
It's always a pleasure when I don't have such issues, and I can use PostgreSQL.
On Thu, Feb 28, 2019 at 6:47 AM Thomas Güttler
<guettliml@thomas-guettler.de> wrote:
Hi experts,
where would you suggest someone to **not** use PostgreSQL?
Why would you do this?
What alternative would you suggest instead?
--
Ray Brinzer
Doh, sorry I missed your postscript!
On Thu, Feb 28, 2019 at 8:02 AM Nicolas Grilly <nicolas@garden-paris.com>
wrote:
On Thu, Feb 28, 2019 at 4:48 PM, Michel Pelletier <pelletier.michel@gmail.com> wrote:
Check out the RUM index extension, it adds ranking information to indexes
to speed up exactly the problem you pointed out.
I mentioned it at the end of my message :-)
It would be great to have it integrated in the standard distribution.
Thank you very much for your friendly answers.
I added some parts to my guidelines:
https://github.com/guettli/programming-guidelines/blob/master/README.rst#where-to-not-use-postgresql
Feedback is welcome.
Regards,
Thomas Güttler
On 28.02.19 at 12:47, Thomas Güttler wrote:
Hi experts,
where would you suggest someone to **not** use PostgreSQL?
Why would you do this?
What alternative would you suggest instead?
Regards,
Thomas Güttler
On Thu, Feb 28, 2019 at 6:24 AM Chris Travers <chris.travers@gmail.com> wrote:
1. a) TB-scale full text search systems.
b) PostgreSQL's full text search is quite capable but not so powerful that it can completely replace Lucene-based systems. So you have to consider complexity vs functionality if you are tying with other data that is already in PostgreSQL. Note further that my experience with at least ElasticSearch is that it is easier to scale something built on multiple PostgreSQL instances into the PB range than it is to scale ElasticSearch into the PB range.
c) Solr or ElasticSearch
In my company we had to swap out solr for postgres. The main
challenge was that solr's limited query language was not able to deal
with complex authorization use cases that were coming in after the
original project was deployed. Our only solution was to heavily
denormalize the documents so that when simple changes happened on the
OLTP side we had to push large amounts of data into SOLR. In addition
to being slow, solr (or I guess the underlying lucene) started to
develop weird failure modes; there were unpredictable swings in memory
and/or disk usage, and underlying system resource exhaustion (especially
file descriptors). The whole thing felt unstable; we had tested heavily with the
original requirements but the tech did not evolve with the project.
The solution was to junk the whole thing and replace it with an API
compatible version of solr in the database. To index the document we
use a special search string with upper case keys and lower case values
in a tab delimited text string; pg_trgm/gin does the rest of the
lifting. It can't compete with solr on best-case behavior but gives
much better worst-case behavior, and, since we don't have to
denormalize, the system fits within memory, making scaling a snap.
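[Editor's sketch of what such a search string might look like. The exact format Merlin's system uses isn't shown, so the field names and layout here are guesses: fields are flattened to "KEY value" pairs, uppercase keys and lowercase values, joined with tabs, and the resulting column would get a trigram GIN index so substring matches hit the index.]

```python
def build_search_string(doc: dict) -> str:
    """Flatten a document into one tab-delimited string for trigram
    indexing: upper-case keys, lower-case values. In Postgres the
    column holding this string could be indexed with
      CREATE INDEX ON docs USING gin (search_string gin_trgm_ops);
    so a query such as
      WHERE search_string LIKE '%AUTHOR guettler%'
    can use the trigram index. The layout is a guess at the scheme
    described above, not the actual production format.
    """
    parts = []
    for key, value in sorted(doc.items()):
        parts.append(f"{key.upper()} {str(value).lower()}")
    return "\t".join(parts)

s = build_search_string({"author": "Guettler", "title": "Guidelines"})
assert s == "AUTHOR guettler\tTITLE guidelines"
assert "AUTHOR guettler" in s
```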
The moral of the story here is 'Just use postgres'. This is not
zealotry; if I were a microsoft inclined person, I might be advising
use of sql server. If you are not totally and completely aware of the
limits of the system you are probably operating within them. The
database is evolving rapidly and insanely powerful servers,
supercomputers even, from the perspective of even 10 years ago, can be
had for a mouse click on the cheap.
There are of course a few use cases where postgres is not the optimal tech:
highly unstructured data, super-high transaction rate master-master
loss-tolerant data archiving, warm log storage, etc. These problems
show up quite rarely in the world of data which is generally directed
towards systems support of business applications. If you haven't
mastered the database first, you probably shouldn't be building out
complex systems in non-database technology since you don't know what
you don't know (for example, the immense value that transactions bring
to the table).
merlin
For small, lightweight, portable jobs, SQLite is quite nice for times you
don't want to create a full PG installation but still prefer SQL for
lookups.
--
Steven Lembark 3920 10th Ave South
Workhorse Computing Birmingham, AL 35222
lembark@wrkhors.com +1 888 359 3508
Thomas Güttler wrote on 28.02.2019 at 12:47:
where would you suggest someone to **not** use PostgreSQL?
Why would you do this?
What alternative would you suggest instead?
Due to the MVCC implementation, I would hesitate to use Postgres in environments that have an extremely high and constant rate of DELETE and UPDATE statements, e.g. several thousand or even tens of thousands of transactions per second, without any "quiet" times where vacuum could "catch up" and clean out dead tuples.
Tuning autovacuum to cope with that is very challenging
(at least until the new UNDO log implementation is ready ;)
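[Editor's back-of-the-envelope sketch of why this matters; the numbers are made up for illustration. Every UPDATE leaves a dead tuple behind, so if the update rate exceeds what vacuum can reclaim, dead tuples accumulate without bound.]

```python
def dead_tuples_after(seconds, update_rate, vacuum_rate):
    """Dead tuples accumulated after `seconds`, assuming every UPDATE
    leaves one dead tuple and vacuum reclaims at a steady rate.
    Purely illustrative: real autovacuum works in bursts, is throttled
    by cost-based delays, and needs quiet periods to catch up.
    """
    return max(0, (update_rate - vacuum_rate) * seconds)

# 20,000 updates/s against a vacuum that reclaims 15,000 tuples/s:
# the table gains 5,000 dead tuples every second, i.e. 18 million
# dead tuples per hour of sustained load.
assert dead_tuples_after(3600, 20_000, 15_000) == 18_000_000
```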