Where **not** to use PostgreSQL?

Started by Thomas Güttlerabout 7 years ago15 messagesgeneral
Jump to latest
#1Thomas Güttler
guettliml@thomas-guettler.de

Hi experts,

where would you suggest someone to **not** use PostgreSQL?

Why would you do this?

What alternative would you suggest instead?

Regards,
Thomas Güttler

--
Thomas Guettler http://www.thomas-guettler.de/
I am looking for feedback: https://github.com/guettli/programming-guidelines

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Thomas Güttler (#1)
Re: Where **not** to use PostgreSQL?

Hi

čt 28. 2. 2019 v 12:47 odesílatel Thomas Güttler <
guettliml@thomas-guettler.de> napsal:

Hi experts,

where would you suggest someone to **not** use PostgreSQL?

Why would you do this?

What alternative would you suggest instead?

Don't use Postgres like cache, don't use Postgres for non transactional
short life often updated data.

Use inmemory databases instead

Pavel

Show quoted text

Regards,
Thomas Güttler

--
Thomas Guettler http://www.thomas-guettler.de/
I am looking for feedback:
https://github.com/guettli/programming-guidelines

#3Chris Travers
chris.travers@gmail.com
In reply to: Pavel Stehule (#2)
Re: Where **not** to use PostgreSQL?

On Thu, Feb 28, 2019 at 1:09 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

Hi

čt 28. 2. 2019 v 12:47 odesílatel Thomas Güttler <
guettliml@thomas-guettler.de> napsal:

Hi experts,

where would you suggest someone to **not** use PostgreSQL?

Hard question. There are a lot of general places where PostgreSQL is not
by itself the best fit, and where a careful weighing of pros and cons would
need to be made before deciding to use it.

Having used PostgreSQL in place of ElasticSearch for PB-scale deployments,
as a high-throughput queue system, and near-real-time OLAP workloads, I am
fairly aware of how hard it can be pushed.

So the answers here are not "don't use PostgreSQL here" but "think about it
first and consider alternatives."

Why would you do this?

Replacing with "What would you consider to be the tradeoffs?"

What alternative would you suggest instead?

So a few possibilities:

1. a) TB-scale full text search systems.
b) PostgreSQL's full text search is quite capable but not so powerful
that it can completely replace Lucene-based systems. So you have to
consider complexity vs functionality if you are tying with other data that
is already in PostgreSQL. Note further that my experience with at least
ElasticSearch is that it is easier to scale something built on multiple
PostgreSQL instances into the PB range than it is to scale ElasticSearch
into the PB range.
c) Solr or ElasticSearch

2. a) High performance job queues
b) PostgreSQL index and table structures are not well suited to large
numbers of ordered deletes. There are ways around these problems and again
if other data is in PostgreSQL, the tradeoff is around transactional
behavior and complexity there vs ease of scaling performance.
c) Redis if the job queue easily fits into a small enough amount of
memory, or Kafka if it does not

On to where you actually should never use PostgreSQL:

Don't use PostgreSQL for things where you do not want or cannot guarantee
transactional atomicity.
While it is possible to have untrusted languages have side effects in the
real world, the fact is that mixing transactions and non-transactional
behavior in this way adds a lot of really ugly complexity.
Use another development environment instead.

Don't use Postgres like cache, don't use Postgres for non transactional
short life often updated data.

Use inmemory databases instead

Pavel

Regards,
Thomas Güttler

--
Thomas Guettler http://www.thomas-guettler.de/
I am looking for feedback:
https://github.com/guettli/programming-guidelines

--
Best Wishes,
Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more

#4Nicolas Grilly
nicolas@gardentechno.com
In reply to: Chris Travers (#3)
Re: Where **not** to use PostgreSQL?

On Thu, Feb 28, 2019 at 1:24 PM Chris Travers <chris.travers@gmail.com>
wrote:

1. a) TB-scale full text search systems.
b) PostgreSQL's full text search is quite capable but not so powerful
that it can completely replace Lucene-based systems. So you have to
consider complexity vs functionality if you are tying with other data that
is already in PostgreSQL. Note further that my experience with at least
ElasticSearch is that it is easier to scale something built on multiple
PostgreSQL instances into the PB range than it is to scale ElasticSearch
into the PB range.
c) Solr or ElasticSearch

One question about your use of PostgreSQL for a TB-scale full-text search
system: Did you order search results using ts_rank or ts_rank_cd? I'm
asking because in my experience, PostgreSQL full-text search is extremely
efficient, until you need ranking. It's because the indexes don't contain
the necessary information for ranking, and because of this the heap has to
be consulted, which implies a lot of random IO.

I'd be curious to know a bit more about your experience in this regard.

Regards,

Nicolas Grilly

PS: A potential solution to the performance issue I mentioned is this PG
extension: https://github.com/postgrespro/rum

#5Chris Travers
chris.travers@gmail.com
In reply to: Nicolas Grilly (#4)
Re: Where **not** to use PostgreSQL?

On Thu, Feb 28, 2019 at 1:50 PM Nicolas Grilly <nicolas@gardentechno.com>
wrote:

On Thu, Feb 28, 2019 at 1:24 PM Chris Travers <chris.travers@gmail.com>
wrote:

1. a) TB-scale full text search systems.
b) PostgreSQL's full text search is quite capable but not so
powerful that it can completely replace Lucene-based systems. So you have
to consider complexity vs functionality if you are tying with other data
that is already in PostgreSQL. Note further that my experience with at
least ElasticSearch is that it is easier to scale something built on
multiple PostgreSQL instances into the PB range than it is to scale
ElasticSearch into the PB range.
c) Solr or ElasticSearch

One question about your use of PostgreSQL for a TB-scale full-text search
system: Did you order search results using ts_rank or ts_rank_cd? I'm
asking because in my experience, PostgreSQL full-text search is extremely
efficient, until you need ranking. It's because the indexes don't contain
the necessary information for ranking, and because of this the heap has to
be consulted, which implies a lot of random IO.

I'd be curious to know a bit more about your experience in this regard.

Where I did this on the TB scale, we had some sort of ranking but it was
not based on ts_rank.

On the PB scale systems I work on now, it is distributed, and we don't
order in PostgreSQL (or anywhere else, though if someone wants to write to
disk and sort, they can do this I guess)

Regards,

Nicolas Grilly

PS: A potential solution to the performance issue I mentioned is this PG
extension: https://github.com/postgrespro/rum

--
Best Wishes,
Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more

#6Nicolas Grilly
nicolas@gardentechno.com
In reply to: Chris Travers (#5)
Re: Where **not** to use PostgreSQL?

On Thu, Feb 28, 2019 at 2:12 PM Chris Travers <chris.travers@gmail.com>
wrote:

Where I did this on the TB scale, we had some sort of ranking but it was
not based on ts_rank.

On the PB scale systems I work on now, it is distributed, and we don't
order in PostgreSQL (or anywhere else, though if someone wants to write to
disk and sort, they can do this I guess)

Thanks!

Show quoted text
#7Mark Moellering
markmoellering@psyberation.com
In reply to: Nicolas Grilly (#6)
Re: Where **not** to use PostgreSQL?

I wish more people would ask this question, to me, it is the true mark of
experience. In general, I think of PostgreSQL as the leading Relational
Database. The farther you get away from relational data and relational
queries, the more I would say, you should look for other products or
solutions. But if you want to store relational data and then run queries
over it, then stick with PostgreSQL.

My 2 scents..

Mark

On Thu, Feb 28, 2019 at 8:28 AM Nicolas Grilly <nicolas@gardentechno.com>
wrote:

Show quoted text

On Thu, Feb 28, 2019 at 2:12 PM Chris Travers <chris.travers@gmail.com>
wrote:

Where I did this on the TB scale, we had some sort of ranking but it was
not based on ts_rank.

On the PB scale systems I work on now, it is distributed, and we don't
order in PostgreSQL (or anywhere else, though if someone wants to write to
disk and sort, they can do this I guess)

Thanks!

#8Ron
ronljohnsonjr@gmail.com
In reply to: Thomas Güttler (#1)
Re: Where **not** to use PostgreSQL?

On 2/28/19 5:47 AM, Thomas Güttler wrote:

Hi experts,

where would you suggest someone to **not** use PostgreSQL?

1. Small embedded systems.  SQLite is great for that.

2. Easy-to-implement Master-Master replication.  (The Percona fork of MySQL
does that really well, if you can handle MySQL's limitations.)

--
Angular momentum makes the world go 'round.

#9Michel Pelletier
pelletier.michel@gmail.com
In reply to: Nicolas Grilly (#4)
Re: Where **not** to use PostgreSQL?

On Thu, Feb 28, 2019 at 4:50 AM Nicolas Grilly <nicolas@gardentechno.com>
wrote:

On Thu, Feb 28, 2019 at 1:24 PM Chris Travers <chris.travers@gmail.com>
wrote:

1. a) TB-scale full text search systems.
b) PostgreSQL's full text search is quite capable but not so
powerful that it can completely replace Lucene-based systems. So you have
to consider complexity vs functionality if you are tying with other data
that is already in PostgreSQL. Note further that my experience with at
least ElasticSearch is that it is easier to scale something built on
multiple PostgreSQL instances into the PB range than it is to scale
ElasticSearch into the PB range.
c) Solr or ElasticSearch

One question about your use of PostgreSQL for a TB-scale full-text search
system: Did you order search results using ts_rank or ts_rank_cd? I'm
asking because in my experience, PostgreSQL full-text search is extremely
efficient, until you need ranking. It's because the indexes don't contain
the necessary information for ranking, and because of this the heap has to
be consulted, which implies a lot of random IO.

Check out the RUM index extension, it adds ranking information to indexes
to speed up exactly the problem you pointed out:

https://github.com/postgrespro/rum

#10Raymond Brinzer
ray.brinzer@gmail.com
In reply to: Thomas Güttler (#1)
Re: Where **not** to use PostgreSQL?

I often avoid PostgreSQL when using software for which PostgreSQL
support is secondary. Usually this is the case where MySQL is the
default, but PostgreSQL is on the "also supported" list. "Also" is
too often a synonym for "badly", here, and unless I really want to be
able to approach the underlying database *as a database*, the better
choice is to go with the flow. Even when I do need that, I'll
consider whether the features I need are exclusive to, or much better
on, PostgreSQL.

When developing something myself, I've also chosen MySQL because the
other technical people likely to be involved are at least somewhat
familiar with it. A person who is marginally competent with databases
doesn't need the added difficulty of learning a new DBMS while
learning whatever I created.

It's always a pleasure when I don't have such issues, and I can use PostgreSQL.

On Thu, Feb 28, 2019 at 6:47 AM Thomas Güttler
<guettliml@thomas-guettler.de> wrote:

Hi experts,

where would you suggest someone to **not** use PostgreSQL?

Why would you do this?

What alternative would you suggest instead?

Regards,
Thomas Güttler

--
Thomas Guettler http://www.thomas-guettler.de/
I am looking for feedback: https://github.com/guettli/programming-guidelines

--
Ray Brinzer

#11Michel Pelletier
pelletier.michel@gmail.com
In reply to: Thomas Güttler (#1)
Re: Where **not** to use PostgreSQL?

Doh, sorry I missed your postscript!

On Thu, Feb 28, 2019 at 8:02 AM Nicolas Grilly <nicolas@garden-paris.com>
wrote:

Show quoted text

Le jeu. 28 févr. 2019 à 16:48, Michel Pelletier <
pelletier.michel@gmail.com> a écrit :

Check out the RUM index extension, it adds ranking information to indexes
to speed up exactly the problem you pointed out:

https://github.com/postgrespro/rum

I mentioned it at the end of my message:-)

It would be great to have it integrated in the standard distribution.

#12Thomas Güttler
guettliml@thomas-guettler.de
In reply to: Thomas Güttler (#1)
Thank you. Was: Where **not** to use PostgreSQL?

Thank you very much for your friendly answers.

I added some parts to my guidlines:

 https://github.com/guettli/programming-guidelines/blob/master/README.rst#where-to-not-use-postgresql

Feedback is welcome.

Regards,

  Thomas Güttler

Am 28.02.19 um 12:47 schrieb Thomas Güttler:

Hi experts,

where would you suggest someone to **not** use PostgreSQL?

Why would you do this?

What alternative would you suggest instead?

Regards,
  Thomas Güttler

--
Thomas Guettler http://www.thomas-guettler.de/
I am looking for feedback: https://github.com/guettli/programming-guidelines

#13Merlin Moncure
mmoncure@gmail.com
In reply to: Chris Travers (#3)
Re: Where **not** to use PostgreSQL?

On Thu, Feb 28, 2019 at 6:24 AM Chris Travers <chris.travers@gmail.com> wrote:

On Thu, Feb 28, 2019 at 1:09 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hi

čt 28. 2. 2019 v 12:47 odesílatel Thomas Güttler <guettliml@thomas-guettler.de> napsal:

Hi experts,

where would you suggest someone to **not** use PostgreSQL?

Hard question. There are a lot of general places where PostgreSQL is not by itself the best fit, and where a careful weighing of pros and cons would need to be made before deciding to use it.

Having used PostgreSQL in place of ElasticSearch for PB-scale deployments, as a high-throughput queue system, and near-real-time OLAP workloads, I am fairly aware of how hard it can be pushed.

So the answers here are not "don't use PostgreSQL here" but "think about it first and consider alternatives."

Why would you do this?

Replacing with "What would you consider to be the tradeoffs?"

What alternative would you suggest instead?

So a few possibilities:

1. a) TB-scale full text search systems.
b) PostgreSQL's full text search is quite capable but not so powerful that it can completely replace Lucene-based systems. So you have to consider complexity vs functionality if you are tying with other data that is already in PostgreSQL. Note further that my experience with at least ElasticSearch is that it is easier to scale something built on multiple PostgreSQL instances into the PB range than it is to scale ElasticSearch into the PB range.
c) Solr or ElasticSearch

In my company we had to swap out solr for postgres. The main
challenge was that solr's limited query language was not able to deal
with complex authorization use cases that were coming in after the
original project was deployed. Our only solution was to heavily
denormalize the documents so that when simple changes happened on the
OLTP side we had to push large amounts of data into SOLR. In addition
to being slow, solr (or I guess the underlying lucene) started to
develop weird failure modes; there were unpredictable swings in memory
and/or disk usage, underlying system resource exhaustion (especially
fd). The whole thing felt unstable; we had tested heavily with the
original requirements but the tech did not evolve with the project.

The solution was to junk the whole thing and replace it with an API
compatible version of solr in the database. To index the document we
use a special search string with upper case keys and lower case values
in a tab delimited text string; pg_trgm/gin does the rest of the
lifting. It can't compete with solr on best case behavior but give
much better worst case behavior, and, since we don't have to
denormalize, the system fits within memory making scaling a snap.

The moral of the story here is 'Just use postgres'. This is not
zealotry; if I were a microsoft inclined person, I might be advising
use of sql server. If you are not totally and completely aware of the
limits of the system you are probably operating within them. The
database is evolving rapidly and insanely powerful servers,
supercomputers even, from the perspective of even 10 years ago, can be
had for a mouse click on the cheap.

There of course a few use cases were postgres is not optimal tech;
highly unstructured data...super high transaction rate master master
loss tolerant data archiving, warm log storage, etc. These problems
show up quite rarely in the world of data which is generally directed
towards systems support of business applications. If you haven't
mastered the database first, you probably shouldn't be building out
complex systems in non-database technology since you don't know what
you don't know (for example, the immense value that transactions bring
to the table).

merlin

#14Steven Lembark
lembark@wrkhors.com
In reply to: Thomas Güttler (#1)
Re: Where **not** to use PostgreSQL?

For small, lightweight, portable SQLite is quite nice for times you
don't want to create a full PG installation and still prefer SQL for
lookups.

--
Steven Lembark 3920 10th Ave South
Workhorse Computing Birmingham, AL 35222
lembark@wrkhors.com +1 888 359 3508

#15Thomas Kellerer
spam_eater@gmx.net
In reply to: Thomas Güttler (#1)
Re: Where **not** to use PostgreSQL?

Thomas Güttler schrieb am 28.02.2019 um 12:47:

where would you suggest someone to **not** use PostgreSQL?

Why would you do this?

What alternative would you suggest instead?

Due to the MVCC implementation, I would hesitate to use Postgres in environments that have an extremely high and constant rate of DELETE an UPDATE statements, e.g. several thousands or even tens of thousands transactions per second without any "quiet" times where vacuum could "catch up" clean out dead tuples.

Tuning autovacuum to cope with that is very challenging
(at least until the new UNDO log implementation is ready ;)