Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Started by Nonameover 5 years ago40 messagesgeneral
Jump to latest
#1Noname
tutiluren@tutanota.com

Even though I highly appreciate that PostgreSQL, a database software which doesn't cost money, exists *at all*, that fact is oftentimes overshadowed by a small but important number of very frustrating issues which I consider to largely ruin the overall "experience" of using PostgreSQL. I'd call them "almost show-stopping". I realize very well that not everyone "has the same priorities" (clearly not) or the expertise, will and free time to work on a certain "area" of the overall project/ecosystem, but these issues/bugs/limitations are so problematic to me that I have to express them directly to the PostgreSQL developers:

1. All non-ANSI characters are turned into "?"s for application_name. My administration panel is thus full of gibberish such as: "?a??? ???? ? ?m??? ??i???u?" and it always looks as if something is awfully broken. I would not dare to show it to a CEO or other important person, as they'd just go: "We're switching to IBM, effective immediately. Throw this open source rubbish out at once!" The explanation I've heard for this is that it's basically a security issue, as it's possible to set the application_name before the something-something (safe Unicode handler code?) has kicked in, but I have no problems with setting the application_name to Unicode characters *after* the database connection has already been fully established, in a separate query, as I already do, and I doubt that anyone else would have, either. So that explanation, while probably technically true, doesn't seem to make any sense.

2. pg_dump misinterprets non-ANSI values for the "--exclude-*" options (at least the --exclude-table-data one, which is the one I've tested) on Windows, resulting in it being impossible to make more "sophisticated" backups of PostgreSQL databases; it's either all or nothing. Other programs, including my own test scripts and commands, are perfectly able to use any Unicode character sent from/through both cmd.exe and PHP CLI, but not pg_dump, so the idea that "Windows it at fault" here just doesn't seem true. (Although I don't doubt for a second that it often *is* the case... Microsoft is not a nice entity in any way.) I spent a lot of time and efforts experimenting with and asking about this, but eventually gave up and concluded that it was yet another bug in an open source project "only" on Windows with no real/pressing interest in fixing it. For me, this means that I lose a ton of fresh data every day, or have to make *gigantic* backups. (I have several huge "temporary debug log" tables whose data have zero long-term value but tons of short-term value.) It makes me feel crippled and excluded in an uncomfortable manner.

3. The ability to embed PG to run in an automatic, quiet manner as part of something else. I know about SQLite, but it's extremely limited to the point of being virtually useless IMO, which is why I cannot use that for anything nontrivial. I want my familiar PostgreSQL, only not require it to be manually and separately installed on the machine where it is to run as part of some "application". If I could just "embed" it, this would allow me to create a single EXE which I can simply put on a different machine to run my entire "system" which otherwise takes *tons* of tedious, error-prone manual labor to install, set up and maintain. Of course, this is probably much easier said than done, but I don't understand why PG's architecture necessarily dictates that PG must be a stand-alone, separate thing. Or rather, why some "glue" cannot enable it to be used just like SQLite from a *practical* perspective, even if it still is a "server-client model" underneath the hood. (Which doesn't matter at all to me, nor should it matter to anyone else.)

4. There is no built-in means to have PG manage (or even suggest) indexes on its own. Trying to figure out what indexes to create/delete/fine-tune, and determine all the extremely complex rules for this art (yes, I just called index management an *art*, because it is!), is just utterly hopeless to me. It never gets any easier. Not even after many years. It's the by far worst part of databases to me (combined with point five). Having to use third-party solutions ensures that it isn't done in practice, at least for me. I don't trust, nor do I want to deal with, external software and extensions in my databases. I still have nightmares from PostGIS, which I only keep around, angrily, out of absolute necessity. I fundamentally don't like third-party add-ons to things, but want the core product to properly support things. Besides, this (adding/managing indexes) is not even some niche/obscure use-case, but something which is crucial for basically any nontrivial database of any kind!

5. Ever since my early days with PG in the mid-2000s, I've tried numerous times to read the manual, wikis and comments for the configuration files, specifically the performance directives, and asked many, many times for help about that, yet never been able to figure out what they want me to enter for all the numerous options. At this point, it isn't me being lazy/stupid; it's objectively very difficult to understand all of that. The practical end result of this is that I've always gone back to using the untouched default configuration file (except for the logging-related options), which, especially in the past on FreeBSD, *severely* crippled my PG database to not even come close to taking advantage of the full power of the hardware. Instead, it felt like I was using maybe 1% of the machine's power, even with a proper database design and indexes and all of that stuff, simply because the default config was so "conservative" and it couldn't be just set to "use whatever resources are available". I wish so much for PG to have a mode where it self-tunes itself as needed, over time, based on the actual workload, or at least allowed some kind of abstract "performance mode" such as: "you are allowed to use significant system resources, PG", or: "You are one of my most important applications. Just use as much power as you currently need, but at least save about 10% for the rest of the system, will you?" Maybe this is also harder than it sounds to accomplish, but for somebody like me who has zero funding, I cannot hire some professional to sit down with me and fine-tune my system for $899/hour. Also, besides the purely monetary issue, there are serious privacy implications with that scene. I wouldn't want an outsider to have intimate knowledge of my database system, which more than probably is a requirement for them to be able to do their job properly.

I'm sorry if any of the above sounds insulting/"entitled". These are the main things which truly bother me about PG and the "PG ecosystem", and I'd love to hear some first-hand comments on them. At least point 1 and 2 seem like they would be almost trivial to fix, at least compared to the rest.

#2Guyren Howe
guyren@gmail.com
In reply to: Noname (#1)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

On Sep 14, 2020, 13:22 -0700, tutiluren@tutanota.com, wrote:

3. The ability to embed PG to run in an automatic, quiet manner as part of something else. I know about SQLite, but it's extremely limited to the point of being virtually useless IMO, which is why I cannot use that for anything nontrivial

I can’t for the life of me imagine how you arrived at this. SQLite is very capable indeed. Its dialect of SQL is (deliberately) very similar to Postgres, featuring such niceties as recursive CTEs and window functions, and it can handle heavy use and multi-terabyte databases if you need (cf bedrockdb).

#3Guyren Howe
guyren@gmail.com
In reply to: Noname (#1)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Regarding indexes, I recommend the website use-the-index-luke.com. That guy’s other website about modern SQL is also great.

Regarding self-tuning, it was actually part of the original vision for relational databases that they would do that, but I’m not aware of any modern SQL database that does, although there are third-party tools that will eg offer index suggestions (I don’t know which is best, so I won’t propose any).

Show quoted text

On Sep 14, 2020, 13:22 -0700, tutiluren@tutanota.com, wrote:

4. There is no built-in means to have PG manage (or even suggest) indexes on its own. Trying to figure out what indexes to create/delete/fine-tune, and determine all the extremely complex rules for this art (yes, I just called index management an *art*, because it is!), is just utterly hopeless to me. It never gets any easier. Not even after many years. It's the by far worst part of databases to me (combined with point five). Having to use third-party solutions ensures that it isn't done in practice, at least for me. I don't trust, nor do I want to deal with, external software and extensions in my databases. I still have nightmares from PostGIS, which I only keep around, angrily, out of absolute necessity. I fundamentally don't like third-party add-ons to things, but want the core product to properly support things. Besides, this (adding/managing indexes) is not even some niche/obscure use-case, but something which is crucial for basically any nontrivial database of any kind!

#4Ravi Krishna
sravikrishna@aol.com
In reply to: Guyren Howe (#2)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

I can’t for the life of me imagine how you arrived at this. SQLite is very capable indeed.

Its dialect of SQL is (deliberately) very similar to Postgres, featuring such niceties as
recursive CTEs and window functions, and it can handle heavy use and multi-terabyte
databases if you need (cf bedrockdb).

But it is still a single user database in the sense that concurrent access to a table is not
handled well. I use it for single user embedded applications

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#1)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

tutiluren@tutanota.com writes:

1. All non-ANSI characters are turned into "?"s for application_name.

Yeah, that's hard to do much with unfortunately. We cannot assume that
all databases in an installation share the same encoding, so for globally
visible strings like application_name, the only safe solution is to
restrict them to ASCII.

On the other hand, the very same thing could be said of database names
and role names, yet we have never worried much about whether those were
encoding-safe when viewed from databases with different encodings, nor
have there been many complaints about the theoretical unsafety. So maybe
this is just overly anal-retentive and we should drop the restriction,
or at least pass through data that doesn't appear to be invalidly
encoded.

2. pg_dump misinterprets non-ANSI values for the "--exclude-*" options
(at least the --exclude-table-data one, which is the one I've tested) on
Windows, resulting in it being impossible to make more "sophisticated"
backups of PostgreSQL databases; it's either all or nothing.

TBH, I'm going to throw that back on you as probably pilot error. There's
no reason for such cases not to work if you're running the terminal window
in the same encoding that pg_dump thinks it's using.

I spent a lot of time and efforts experimenting with and asking about
this, but eventually gave up and concluded that it was yet another bug
in an open source project "only" on Windows with no real/pressing
interest in fixing it.

Yeah, I remember that discussion. It being open source cuts two ways:
problems get fixed by people who have the motivation to find a workable
fix. Not being a Windows user, I don't particularly care about whether
this case is broken or not, and even if I cared more, I do not have
the resources to figure it out myself. There are other people around the
project who do Windows, but you evidently have not managed to persuade
them that this is something they should spend time on, either.

3. The ability to embed PG to run in an automatic, quiet manner as part
of something else.

That would be a packaging problem for the "something else" to figure out,
no? There are enough platform-specific issues about "automatic" that
I don't see it as very practical for the core Postgres project to take
on, in any case. I'm aware of some packagings, such as Postgres.app for
macOS (https://postgresapp.com), that might be closer to what you are
looking for than the core project ever could be.

4. There is no built-in means to have PG manage (or even suggest)
indexes on its own.

Indeed, and that does not seem to me like something that ought to be
built in. There is at least one "index advisor" plug-in floating
around, which perhaps could help you.

5. Ever since my early days with PG in the mid-2000s, I've tried
numerous times to read the manual, wikis and comments for the
configuration files, specifically the performance directives, and asked
many, many times for help about that, yet never been able to figure out
what they want me to enter for all the numerous options. At this point,
it isn't me being lazy/stupid; it's objectively very difficult to
understand all of that.

We do the best we can as far as the documentation goes; if you have
concrete suggestions about how to improve that, we'll surely listen.

The practical end result of this is that I've always gone back to using
the untouched default configuration file (except for the logging-related
options), which, especially in the past on FreeBSD, *severely* crippled
my PG database to not even come close to taking advantage of the full
power of the hardware.

You seem to be wishing that the default configuration was designed to
try to commandeer the whole machine. It's deliberate project policy
that that not be so --- and given that a couple of bullet points up,
you were arguing for Postgres to be easily usable as an invisible
part of some other app, it doesn't seem like you really want it to
be so either.

There is a lot of info out there about Postgres tuning, and yes some
of it is contradictory, because one size doesn't fit all.  People
have different goals about how they want the system to act, and the
proper settings may vary across platforms or PG versions, and there's a
fair amount of just plain different opinions.  But I think you may be
overcomplicating it.  IMO there's not that much you have to adjust to
start with --- maybe just increase shared_buffers and possibly work_mem
--- and then tweak other settings only when you find yourself running into
that limit.  If you can figure out our logging settings then you can
figure out the rest of this, too.

regards, tom lane

#6raf
raf@raf.org
In reply to: Tom Lane (#5)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

On Mon, Sep 14, 2020 at 05:39:57PM -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:

tutiluren@tutanota.com writes:

1. All non-ANSI characters are turned into "?"s for application_name.

Yeah, that's hard to do much with unfortunately. We cannot assume that
all databases in an installation share the same encoding, so for globally
visible strings like application_name, the only safe solution is to
restrict them to ASCII.

On the other hand, the very same thing could be said of database names
and role names, yet we have never worried much about whether those were
encoding-safe when viewed from databases with different encodings, nor
have there been many complaints about the theoretical unsafety. So maybe
this is just overly anal-retentive and we should drop the restriction,
or at least pass through data that doesn't appear to be invalidly
encoded.

Perhaps recode database/role names from the source
database's encoding into utf8, and then recode from utf8
to the destination database's encoding?

For "globally visible strings", maybe recode to the
client_encoding setting, or a new encoding setting for
this purpose since client_encoding seems to be linked
to the database that the client is connected to. I'd've
thought that the application name would arrive encoded
as client_encoding (which defaults to the database
encoding). Maybe globally visible strings need to be
recoded on arrival from the client_encoding to utf8 (or
a "server_encoding" or "global_encoding" setting) so
they are always stored in a known encoding so they can
be recoded as necessary when viewed via connections to
other databases using a different encoding.

Just some thoughts. If they don't make any sense, feel
free to ignore them. :-)

5. Ever since my early days with PG in the mid-2000s, I've tried
numerous times to read the manual, wikis and comments for the
configuration files, specifically the performance directives, and asked
many, many times for help about that, yet never been able to figure out
what they want me to enter for all the numerous options. At this point,
it isn't me being lazy/stupid; it's objectively very difficult to
understand all of that.

We do the best we can as far as the documentation goes; if you have
concrete suggestions about how to improve that, we'll surely listen.

I thought the documentation on postgres performance tuning was fine.
Here's what I got from it (for a dedicated database server host):

shared_buffers = 1GB # Should be 1/4 of RAM
work_mem = 16MB # Should be bigger to do more sorts in-memory but it's per sort per user so not too big
maintenance_work_mem = 128MB # Should be <= 256MB to make vacuum fast without taking away too much RAM from other tasks
min_wal_size = 480MB # Should be at least 3 * 10 * 16MB to avoid too many checkpoints slowing down writes
checkpoint_completion_target = 0.9 # Should be 0.9 if checkpoint_segments was increased to spread out checkpoint writes
random_page_cost = 4.0 # Should be 4.0 for HDD, 1.5-2.5 for SSD, 1.0-1.01 if db fits entirely in RAM
effective_cache_size = 3GB # Should be 1/2 to 3/4 of RAM

It's the "Should..." comments that matter. And it might be out of date...
Actually, it is out of date. The comment for checkpoint_completion_target
refers to checkpoint_segments which no longer exists (since 9.5) so
disregard that.

cheers,
raf

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: raf (#6)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

raf <raf@raf.org> writes:

On Mon, Sep 14, 2020 at 05:39:57PM -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:

On the other hand, the very same thing could be said of database names
and role names, yet we have never worried much about whether those were
encoding-safe when viewed from databases with different encodings, nor
have there been many complaints about the theoretical unsafety. So maybe
this is just overly anal-retentive and we should drop the restriction,
or at least pass through data that doesn't appear to be invalidly
encoded.

Perhaps recode database/role names from the source
database's encoding into utf8, and then recode from utf8
to the destination database's encoding?

A lot of people seem to believe that transcoding through utf8
is 100% safe. They're wrong :-( --- the Japanese, at least,
have reason not to trust it, because of the existence of multiple
incompatible conversion standards. And you're still left with the
question of what to do when the destination encoding hasn't
got the character.

Moreover, this is all moderately expensive unless the encodings in
question are already utf8 or latin1. So if we go this way I'd
prefer to do it as I said above -- just drop or question-mark-ize
any characters that don't pass validation in the recipient DB.
That's fairly cheap and it will work perfectly in the typical case
where the whole cluster is on one encoding anyway.

regards, tom lane

#8Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#5)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

On Mon, Sep 14, 2020 at 05:39:57PM -0400, Tom Lane wrote:

tutiluren@tutanota.com writes:

1. All non-ANSI characters are turned into "?"s for application_name.

Yeah, that's hard to do much with unfortunately. We cannot assume that
all databases in an installation share the same encoding, so for globally
visible strings like application_name, the only safe solution is to
restrict them to ASCII.

On the other hand, the very same thing could be said of database names
and role names, yet we have never worried much about whether those were
encoding-safe when viewed from databases with different encodings, nor
have there been many complaints about the theoretical unsafety. So maybe
this is just overly anal-retentive and we should drop the restriction,
or at least pass through data that doesn't appear to be invalidly
encoded.

I think the issue is that role and database names are controlled by
privileged users, while application_name is not.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#9Bruce Momjian
bruce@momjian.us
In reply to: Noname (#1)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

On Mon, Sep 14, 2020 at 10:22:31PM +0200, tutiluren@tutanota.com wrote:

4. There is no built-in means to have PG manage (or even suggest) indexes on
its own. Trying to figure out what indexes to create/delete/fine-tune, and
determine all the extremely complex rules for this art (yes, I just called
index management an *art*, because it is!), is just utterly hopeless to me. It
never gets any easier. Not even after many years. It's the by far worst part of
databases to me (combined with point five). Having to use third-party solutions
ensures that it isn't done in practice, at least for me. I don't trust, nor do
I want to deal with, external software and extensions in my databases. I still
have nightmares from PostGIS, which I only keep around, angrily, out of
absolute necessity. I fundamentally don't like third-party add-ons to things,
but want the core product to properly support things. Besides, this (adding/
managing indexes) is not even some niche/obscure use-case, but something which
is crucial for basically any nontrivial database of any kind!

I think you did a very good job of explaining your issues. I think the
underlying problem is that Postgres is targeting a wide market, and your
use-case for a more limited or self-contained database doesn't fit many
of those markets. Also, PostGIS is one of the most complex extensions,
so adding simpler ones should not be as hard.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#8)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Sep 14, 2020 at 05:39:57PM -0400, Tom Lane wrote:

On the other hand, the very same thing could be said of database names
and role names, yet we have never worried much about whether those were
encoding-safe when viewed from databases with different encodings, nor
have there been many complaints about the theoretical unsafety. So maybe
this is just overly anal-retentive and we should drop the restriction,
or at least pass through data that doesn't appear to be invalidly
encoded.

I think the issue is that role and database names are controlled by
privileged users, while application_name is not.

That's certainly an argument against a completely laissez-faire approach,
but if we filtered invalidly-encoded data on the reading side, it seems
like we would be in good enough shape.

regards, tom lane

#11Joshua D. Drake
jd@commandprompt.com
In reply to: Noname (#1)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Howdy,

First let me say thanks for the feedback! It is rare that we receive
detailed "user feedback" on these lists so it is good to hear from the
outside world. I am only going to address a few things as others have
addressed the rest.

2. pg_dump misinterprets non-ANSI values for the "--exclude-*" options (at
least the --exclude-table-data one, which is the one I've tested) on
Windows, resulting in it being impossible to make more "sophisticated"
backups of PostgreSQL databases; it's either all or nothing. Other
programs, including my own test scripts and commands, are perfectly able to
use any Unicode character sent from/through both cmd.exe and PHP CLI, but
not pg_dump, so the idea that "Windows it at fault" here just doesn't seem
true. (Although I don't doubt for a second that it often *is* the case...
Microsoft is not a nice entity in any way.) I spent a lot of time and
efforts experimenting with and asking about this, but eventually gave up
and concluded that it was yet another bug in an open source project "only"
on Windows with no real/pressing interest in fixing it. For me, this means
that I lose a ton of fresh data every day, or have to make *gigantic*
backups. (I have several huge "temporary debug log" tables whose data have
zero long-term value but tons of short-term value.) It makes me feel
crippled and excluded in an uncomfortable manner.

I have to agree that pg_dump is largely a step child backup program. It has
consistently been found over the years to be lacking in a number of areas.
Unfortunately, working on pg_dump isn't sexy and it is difficult to get
volunteers or even paid resources to do such a thing. The real solution for
pg_dump is a complete refactor which includes pg_dumpall and it is not a
small undertaking. It should be noted that it is also a less and less used
program. On our team it is normally used for only very specific needs
(grabbing a schema) and we use binary backups or logical replication to
receive specific data.

3. The ability to embed PG to run in an automatic, quiet manner as part of
something else. I know about SQLite, but it's extremely limited to the
point of being virtually useless IMO, which is why I cannot use that for
anything nontrivial. I want my familiar PostgreSQL, only not require it to
be manually and separately installed on the machine where it is to run as
part of some "application". If I could just "embed" it, this would allow me
to create a single EXE which I can simply put on a different machine to run
my entire "system" which otherwise takes *tons* of tedious, error-prone
manual labor to install, set up and maintain. Of course, this is probably
much easier said than done, but I don't understand why PG's architecture
necessarily dictates that PG must be a stand-alone, separate thing. Or
rather, why some "glue" cannot enable it to be used just like SQLite from a
*practical* perspective, even if it still is a "server-client model"
underneath the hood. (Which doesn't matter at all to me, nor should it
matter to anyone else.)

This is really using the wrong tool for the job type of issue. PG was never
designed for such a scenario.

4. There is no built-in means to have PG manage (or even suggest) indexes
on its own. Trying to figure out what indexes to create/delete/fine-tune,
and determine all the extremely complex rules for this art (yes, I just
called index management an *art*, because it is!), is just utterly hopeless
to me. It never gets any easier. Not even after many years. It's the by far
worst part of databases to me (combined with point five). Having to use
third-party solutions ensures that it isn't done in practice, at least for
me. I don't trust, nor do I want to deal with, external software and
extensions in my databases. I still have nightmares from PostGIS, which I
only keep around, angrily, out of absolute necessity. I fundamentally don't
like third-party add-ons to things, but want the core product to properly
support things. Besides, this (adding/managing indexes) is not even some
niche/obscure use-case, but something which is crucial for basically any
nontrivial database of any kind!

I think you are looking at this from a very windows centric way. Open
Source has its origins from the Unix paradigm where each tool was designed
to solve one type of problem and you used multiple tools to create a
"solution". Though we have strayed from that on some items due to the
evolving nature of software needs, that is still at our core and for good
reason. Having tools, flags etc... to do such things (including your point
#3) creates complexity best left to "vendors" not the software project.

5. Ever since my early days with PG in the mid-2000s, I've tried numerous
times to read the manual, wikis and comments for the configuration files,
specifically the performance directives, and asked many, many times for
help about that, yet never been able to figure out what they want me to
enter for all the numerous options. At this point, it isn't me being
lazy/stupid; it's objectively very difficult to understand all of that.

This is absolutely true. The Postgresql documentation is FANTASTIC if you
already understand what is going on or you need a reference. We have
improved this a bit in 13 with the glossary but we still don't have
definitive "5 steps to make your PostgreSQL server fast" and that is very
much because it is a complicated question and it takes a lot of knowledge
to do it correctly.

The practical end result of this is that I've always gone back to using
the untouched default configuration file (except for the logging-related
options), which, especially in the past on FreeBSD, *severely* crippled my
PG database to not even come close to taking advantage of the full power of
the hardware. Instead, it felt like I was using maybe 1% of the machine's
power, even with a proper database design and indexes and all of that
stuff, simply because the default config was so "conservative" and it
couldn't be just set to "use whatever resources are available".

Not to be unkind but this does seem lazy. There are literally hundreds of
"how to make postgres go fast", "how to optimize postgres" if you take 15
minutes to Google. It is true that the project (outside of the wiki)
doesn't have much information in the official documentation but that
doesn't mean that the information is not available.

I wish so much for PG to have a mode where it self-tunes itself as needed,
over time, based on the actual workload, or at least allowed some kind of
abstract "performance mode" such as: "you are allowed to use significant
system resources, PG", or: "You are one of my most important applications.
Just use as much power as you currently need, but at least save about 10%
for the rest of the system, will you?" Maybe this is also harder than it
sounds to accomplish, but for somebody like me who has zero funding, I
cannot hire some professional to sit down with me and fine-tune my system
for $899/hour.

See my comment about Google. The information is out there and easy to find.
There is a lot of fresh and free content right here:

https://postgresconf.org/conferences/postgres-webinar-series/schedule/events

I hope you find the information you are looking for. There is a very
helpful community on Discord and Slack as well:

- https://discord.gg/E7WjZhz
- https://postgres-slack.herokuapp.com/

Thanks,

JD

#12Fabio Ugo Venchiarutti
f.venchiarutti@ocado.com
In reply to: Joshua D. Drake (#11)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

On 21/09/2020 17:53, Joshua Drake wrote:

3. The ability to embed PG to run in an automatic, quiet manner as part
of something else. I know about SQLite, but it's extremely limited to
the point of being virtually useless IMO, which is why I cannot use that
for anything nontrivial. I want my familiar PostgreSQL, only not require
it to be manually and separately installed on the machine where it is to
run as part of some "application". If I could just "embed" it, this
would allow me to create a single EXE which I can simply put on a
different machine to run my entire "system" which otherwise takes *tons*
of tedious, error-prone manual labor to install, set up and maintain. Of
course, this is probably much easier said than done, but I don't
understand why PG's architecture necessarily dictates that PG must be a
stand-alone, separate thing. Or rather, why some "glue" cannot enable it
to be used just like SQLite from a *practical* perspective, even if it
still is a "server-client model" underneath the hood. (Which doesn't
matter at all to me, nor should it matter to anyone else.)

It depends what you mean by "embedded".
If you want sqlite's linked library approach, where the database calls
literally run your process' address space, then that's a no go, as
postgres is a multi-user database server with its own process hierarchy.

However, postgres also is a rather agnostic command that does not detach
from the parent's terminal/stdio unless instructed to, so nothing stops
your bespoke application from launching and managing its own postmaster
as a directly managed child process - started as part of application
initialisation - and having that listen on a local socket only available
to the application itself; this is what we implemented in some of our
installations where postgres is enslaved to the cluster control system -
the configuration is re-generated at every restart (IIRC Patroni does
something similar minus the unix socket part).

A plethora of systems are built around the notion of programs calling
other programs and managing the process' life cycle. The limiting factor
to such architecture tends to be the OS's process control semantics and
API (notoriously weaker or more contrived on non-unix-like OSs), but
that's not postgres' fault.

--
Regards

Fabio Ugo Venchiarutti
OSPCFC Network Engineering Dpt.
Ocado Technology

--

Notice:
This email is confidential and may contain copyright material of
members of the Ocado Group. Opinions and views expressed in this message
may not necessarily reflect the opinions and views of the members of the
Ocado Group.

If you are not the intended recipient, please notify us
immediately and delete all copies of this message. Please note that it is
your responsibility to scan this message for viruses.

References to the
"Ocado Group" are to Ocado Group plc (registered in England and Wales with
number 7098618) and its subsidiary undertakings (as that expression is
defined in the Companies Act 2006) from time to time. The registered office
of Ocado Group plc is Buildings One & Two, Trident Place, Mosquito Way,
Hatfield, Hertfordshire, AL10 9UL.

#13Jonathan Strong
jonathanrstrong@gmail.com
In reply to: Fabio Ugo Venchiarutti (#12)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Joshua - adding to what Fabio said (with which I agree wholeheartedly!) -

It will be worthwhile for you to itemize the attributes and needed features
for making a final decision about the architecture of the application you
plan to deploy.

While you are familiar with PostgreSQL and like its features, it may not
wind up being the best match for what you are trying to accomplish. I'm not
sure your concerns about SQLite are true roadblocks. If you are looking to
deploy an executable supported by one or more DLLs but no external
processes, you won't find many mature and truly embeddable DBMSs out there.
SQLite is widely used and well supported, and some of the functionality
you'd normally consider putting in stored procedures might fit in with the
idea of creating custom functions. A few more steps than creating a
PostgreSQL stored procedure or function, but perhaps this approach would
make SQLite a reasonable choice for you. See:

https://stackoverflow.com/questions/2108870/how-to-create-custom-functions-in-sqlite

Back to the question about attributes and features of the DBMS, and Fabio's
question to you re: your definition of an embedded database. An embedded
database is typically used by a single user and is only available within
the context of the current application and user / memory space. It should
also be self-tuning and self-managing, as a typical end user can't and
shouldn't be expected to also handle any DBA tasks. Several databases are
considered "embedded", but the criteria used for this label may not match
your own. If you can spell out your specific expectations (e.g., the app
and database can be deployed in a single exe, etc.), this will help
tremendously in homing in on the right choice.

- Jon

<https://www.linkedin.com/in/jonstrong/&gt;
<https://www.jonathanrstrong.com&gt;

*Jonathan Strong*

CIO / CTO / Consultant

*P:* 609-532-1715 *E:* jonathanrstrong@gmail.com

*Quora Top Writer <https://www.quora.com/profile/Jonathan-R-Strong&gt;*

On Tue, Sep 22, 2020 at 10:18 AM Fabio Ugo Venchiarutti <
f.venchiarutti@ocado.com> wrote:

Show quoted text

On 21/09/2020 17:53, Joshua Drake wrote:

3. The ability to embed PG to run in an automatic, quiet manner as part
of something else. I know about SQLite, but it's extremely limited to
the point of being virtually useless IMO, which is why I cannot use that
for anything nontrivial. I want my familiar PostgreSQL, only not require
it to be manually and separately installed on the machine where it is to
run as part of some "application". If I could just "embed" it, this
would allow me to create a single EXE which I can simply put on a
different machine to run my entire "system" which otherwise takes *tons*
of tedious, error-prone manual labor to install, set up and maintain. Of
course, this is probably much easier said than done, but I don't
understand why PG's architecture necessarily dictates that PG must be a
stand-alone, separate thing. Or rather, why some "glue" cannot enable it
to be used just like SQLite from a *practical* perspective, even if it
still is a "server-client model" underneath the hood. (Which doesn't
matter at all to me, nor should it matter to anyone else.)

It depends what you mean by "embedded".
If you want sqlite's linked library approach, where the database calls
literally run your process' address space, then that's a no go, as
postgres is a multi-user database server with its own process hierarchy.

However, postgres also is a rather agnostic command that does not detach
from the parent's terminal/stdio unless instructed to, so nothing stops
your bespoke application from launching and managing its own postmaster
as a directly managed child process - started as part of application
initialisation - and having that listen on a local socket only available
to the application itself; this is what we implemented in some of our
installations where postgres is enslaved to the cluster control system -
the configuration is re-generated at every restart (IIRC Patroni does
something similar minus the unix socket part).

A plethora of systems are built around the notion of programs calling
other programs and managing the process' life cycle. The limiting factor
to such architecture tends to be the OS's process control semantics and
API (notoriously weaker or more contrived on non-unix-like OSs), but
that's not postgres' fault.

--
Regards

Fabio Ugo Venchiarutti
OSPCFC Network Engineering Dpt.
Ocado Technology

--

Notice:
This email is confidential and may contain copyright material of
members of the Ocado Group. Opinions and views expressed in this message
may not necessarily reflect the opinions and views of the members of the
Ocado Group.

If you are not the intended recipient, please notify us
immediately and delete all copies of this message. Please note that it is
your responsibility to scan this message for viruses.

References to the
"Ocado Group" are to Ocado Group plc (registered in England and Wales with
number 7098618) and its subsidiary undertakings (as that expression is
defined in the Companies Act 2006) from time to time. The registered
office
of Ocado Group plc is Buildings One & Two, Trident Place, Mosquito Way,
Hatfield, Hertfordshire, AL10 9UL.

#14Noname
tutiluren@tutanota.com
In reply to: Noname (#1)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Sep 21, 2020, 7:53 PM by jd@commandprompt.com:

I have to agree that pg_dump is largely a step child backup program. It has consistently been found over the years to be lacking in a number of areas. Unfortunately, working on pg_dump isn't sexy and it is difficult to get volunteers or even paid resources to do such a thing. The real solution for pg_dump is a complete refactor which includes pg_dumpall and it is not a small undertaking. It should be noted that it is also a less and less used program. On our team it is normally used for only very specific needs (grabbing a schema) and we use binary backups or logical replication to receive specific data.

Huh? Are you saying that there is another, superior way to back up PostgreSQL databases other than pg_dump? I re-read the manual on it just now, but didn't see a single word about it being "legacy" or "deprecated" or even that there's any other way to do it. What do you mean?

3. The ability to embed PG to run in an automatic, quiet manner as part of something else. I know about SQLite, but it's extremely limited to the point of being virtually useless IMO, which is why I cannot use that for anything nontrivial. I want my familiar PostgreSQL, only not require it to be manually and separately installed on the machine where it is to run as part of some "application". If I could just "embed" it, this would allow me to create a single EXE which I can simply put on a different machine to run my entire "system" which otherwise takes *tons* of tedious, error-prone manual labor to install, set up and maintain. Of course, this is probably much easier said than done, but I don't understand why PG's architecture necessarily dictates that PG must be a stand-alone, separate thing. Or rather, why some "glue" cannot enable it to be used just like SQLite from a *practical* perspective, even if it still is a "server-client model" underneath the hood. (Which doesn't matter at all to me, nor should it matter to anyone else.)

This is really using the wrong tool for the job type of issue. PG was never designed for such a scenario.

I hate the "wrong tool for the job" argument. It assumes that everyone has infinite time, energy and brain capacity to learn endless redundant tools just to "use the right tool for the job" rather than "use what you actually know". I know PG. I don't know SQLite. They are very different. So obviously, I want to use PG.

What exactly makes PG unsuitable for this? I don't get it. But at the same time, I also realize that it's not going to happen at this point. The entire concept of a desktop computer appears to be phased out as we speak...

4. There is no built-in means to have PG manage (or even suggest) indexes on its own. Trying to figure out what indexes to create/delete/fine-tune, and determine all the extremely complex rules for this art (yes, I just called index management an *art*, because it is!), is just utterly hopeless to me. It never gets any easier. Not even after many years. It's the by far worst part of databases to me (combined with point five). Having to use third-party solutions ensures that it isn't done in practice, at least for me. I don't trust, nor do I want to deal with, external software and extensions in my databases. I still have nightmares from PostGIS, which I only keep around, angrily, out of absolute necessity. I fundamentally don't like third-party add-ons to things, but want the core product to properly support things. Besides, this (adding/managing indexes) is not even some niche/obscure use-case, but something which is crucial for basically any nontrivial database of any kind!

I think you are looking at this from a very windows centric way. Open Source has its origins from the Unix paradigm where each tool was designed to solve one type of problem and you used multiple tools to create a "solution". Though we have strayed from that on some items due to the evolving nature of software needs, that is still at our core and for good reason. Having tools, flags etc... to do such things (including your point #3) creates complexity best left to "vendors" not the software project.

While I understand what you mean, and even agree in theory, in practice, this always results in crappy third-party solutions which I don't want to deal with. PostGIS, for example, forces me to use "postgis" for its schema instead of "PostGIS" just because they arrogantly didn't construct their internal queries properly. "Little" things like that.

The practical end result of this is that I've always gone back to using the untouched default configuration file (except for the logging-related options), which, especially in the past on FreeBSD, *severely* crippled my PG database to not even come close to taking advantage of the full power of the hardware. Instead, it felt like I was using maybe 1% of the machine's power, even with a proper database design and indexes and all of that stuff, simply because the default config was so "conservative" and it couldn't be just set to "use whatever resources are available".

Not to be unkind but this does seem lazy. There are literally hundreds of "how to make postgres go fast", "how to optimize postgres" if you take 15 minutes to Google. It is true that the project (outside of the wiki) doesn't have much information in the official documentation but that doesn't mean that the information is not available.

Hundreds of crappy, outdated, confusing, badly written "web tutorialz" are worth nothing. A couple of clear, unambiguous documentation paragraphs are worth their (metaphorical) weight in gold.

Claiming that "the information is out there" is just hand-waving. It's shifting the burden to the user to actively hunt for information, and very likely be misled by all the garbage articles out there. I learned some horrible practices early on from "web tutz" and it took me many years to unlearn that stuff.

I know that writing documentation isn't fun, but it's necessary. Also, my overall point was to not even have to deal with the specifics, but just be able to tell PG with a single config option that it's allowed to "use most of the machine's resources".

I wish so much for PG to have a mode where it self-tunes itself as needed, over time, based on the actual workload, or at least allowed some kind of abstract "performance mode" such as: "you are allowed to use significant system resources, PG", or: "You are one of my most important applications. Just use as much power as you currently need, but at least save about 10% for the rest of the system, will you?" Maybe this is also harder than it sounds to accomplish, but for somebody like me who has zero funding, I cannot hire some professional to sit down with me and fine-tune my system for $899/hour.

See my comment about Google. The information is out there and easy to find.

I guess I'm the worst idiot in the world, then, who can't DuckDuckGo (Google is evil) it even after 15 years.

Seriously, I didn't type my feedback "for fun". It may be difficult for very intelligent people to understand (as often is the case, because you operate on a whole different level), but the performance-related PostgreSQL configuration options are a *nightmare* to me and many others. I spent *forever* reading about them and couldn't make any sense of it all. Each time I tried, I would give up, frustrated and angry, with no real clue what "magic numbers" it wanted.

It's quite baffling to me how this can be so difficult for you all to understand. Even if we disregard the sheer intelligence factor, it's clear that users of PG don't have the same intimate knowledge of PG's internals as the PG developers, nor could possibly be expected to.

As mentioned, I kept going back to the default configuration over and over again. Anyone who doesn't is either a genius or pretends/thinks that they understand it. (Or I'm extremely dumb.)

Very often, I get the feeling that things like that are the way they are on purpose. Work security and whatnot. But it's very frustrating for people like me who can't afford to buy help and don't have the enormous brain capacity necessary to comprehend the complex relations between the numerous performance-related config options. It really is that difficult.

Discord and Slack

Those modern services don't even let me load them. It's the same thing with everything these days: "verify with phone", "we've detected suspicious activity", "fake error message", etc.

#15Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Noname (#14)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

On 9/22/20 3:28 PM, tutiluren@tutanota.com wrote:

Sep 21, 2020, 7:53 PM by jd@commandprompt.com:

I have to agree that pg_dump is largely a step child backup program.
It has consistently been found over the years to be lacking in a
number of areas. Unfortunately, working on pg_dump isn't sexy and it
is difficult to get volunteers or even paid resources to do such a
thing. The real solution for pg_dump is a complete refactor which
includes pg_dumpall and it is not a small undertaking. It should be
noted that it is also a less and less used program. On our team it
is normally used for only very specific needs (grabbing a schema)
and we use binary backups or logical replication to receive specific
data.

Huh? Are you saying that there is another, superior way to back up
PostgreSQL databases other than pg_dump? I re-read the manual on it just
now, but didn't see a single word about it being "legacy" or
"deprecated" or even that there's any other way to do it. What do you mean?

It is not deprecated or legacy and it is still used by many. The issue
is that people try to do 'partial' dumps without reading the docs on
what those switches actually do and the potential conflicts therein. For
large installations there it is often better to use some form of
replication(binary and/or logical) to maintain a continuous backup.
pg_dump can take a long time on very big databases, so it may not keep up.

This is really using the wrong tool for the job type of issue. PG
was never designed for such a scenario.

I hate the "wrong tool for the job" argument. It assumes that everyone
has infinite time, energy and brain capacity to learn endless redundant
tools just to "use the right tool for the job" rather than "use what you
actually know". I know PG. I don't know SQLite. They are very different.
So obviously, I want to use PG.

This comes down to what your definition of embedded is? A matter of
determining whether we are talking apples or oranges.

What exactly makes PG unsuitable for this? I don't get it. But at the
same time, I also realize that it's not going to happen at this point.
The entire concept of a desktop computer appears to be phased out as we
speak...

While I understand what you mean, and even agree in theory, in practice,
this always results in crappy third-party solutions which I don't want
to deal with. PostGIS, for example, forces me to use "postgis" for its
schema instead of "PostGIS" just because they arrogantly didn't
construct their internal queries properly. "Little" things like that.

Huh? A schema is just a name space, why does it matter how the extension
chooses to define it? I mean you could have number of permutations of
postgis.

The practical end result of this is that I've always gone back
to using the untouched default configuration file (except for
the logging-related options), which, especially in the past on
FreeBSD, *severely* crippled my PG database to not even come
close to taking advantage of the full power of the hardware.
Instead, it felt like I was using maybe 1% of the machine's
power, even with a proper database design and indexes and all of
that stuff, simply because the default config was so
"conservative" and it couldn't be just set to "use whatever
resources are available".

Not to be unkind but this does seem lazy. There are literally
hundreds of "how to make postgres go fast", "how to optimize
postgres" if you take 15 minutes to Google. It is true that the
project (outside of the wiki) doesn't have much information in the
official documentation but that doesn't mean that the information is
not available.

Hundreds of crappy, outdated, confusing, badly written "web tutorialz"
are worth nothing. A couple of clear, unambiguous documentation
paragraphs are worth their (metaphorical) weight in gold.

The problem is the 'unambiguous' part. There are so many ways Postgres
is used, writing a configuration doc that suited everyone would have so
many if/and/or/buts that it would define ambiguity. If you want that
information post an email here with the information on your server specs
and proposed usage.

Claiming that "the information is out there" is just hand-waving. It's
shifting the burden to the user to actively hunt for information, and
very likely be misled by all the garbage articles out there. I learned
some horrible practices early on from "web tutz" and it took me many
years to unlearn that stuff.

I know that writing documentation isn't fun, but it's necessary. Also,
my overall point was to not even have to deal with the specifics, but
just be able to tell PG with a single config option that it's allowed to
"use most of the machine's resources".

That would entail building an AI into the code that would deal with all
the possible OS(versions), Postgres(versions), hardware permutations.

I wish so much for PG to have a mode where it self-tunes itself
as needed, over time, based on the actual workload, or at least
allowed some kind of abstract "performance mode" such as: "you
are allowed to use significant system resources, PG", or: "You
are one of my most important applications. Just use as much
power as you currently need, but at least save about 10% for the
rest of the system, will you?" Maybe this is also harder than it
sounds to accomplish, but for somebody like me who has zero
funding, I cannot hire some professional to sit down with me and
fine-tune my system for $899/hour.

See my comment about Google. The information is out there and easy
to find.

I guess I'm the worst idiot in the world, then, who can't DuckDuckGo
(Google is evil) it even after 15 years.

Seriously, I didn't type my feedback "for fun". It may be difficult for
very intelligent people to understand (as often is the case, because you
operate on a whole different level), but the performance-related
PostgreSQL configuration options are a *nightmare* to me and many
others. I spent *forever* reading about them and couldn't make any sense
of it all. Each time I tried, I would give up, frustrated and angry,
with no real clue what "magic numbers" it wanted.

It's quite baffling to me how this can be so difficult for you all to
understand. Even if we disregard the sheer intelligence factor, it's
clear that users of PG don't have the same intimate knowledge of PG's
internals as the PG developers, nor could possibly be expected to.

As mentioned, I kept going back to the default configuration over and
over again. Anyone who doesn't is either a genius or pretends/thinks
that they understand it. (Or I'm extremely dumb.)

Does your server runs to your satisfaction with the default settings?
If not what issues do you see?

Very often, I get the feeling that things like that are the way they are
on purpose. Work security and whatnot. But it's very frustrating for
people like me who can't afford to buy help and don't have the enormous
brain capacity necessary to comprehend the complex relations between the
numerous performance-related config options. It really is that difficult.

Discord and Slack

Those modern services don't even let me load them. It's the same thing
with everything these days: "verify with phone", "we've detected
suspicious activity", "fake error message", etc.

--
Adrian Klaver
adrian.klaver@aklaver.com

#16Noname
tutiluren@tutanota.com
In reply to: Noname (#1)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

I have to agree that pg_dump is largely a step child backup program.
It has consistently been found over the years to be lacking in a
number of areas. Unfortunately, working on pg_dump isn't sexy and it
is difficult to get volunteers or even paid resources to do such a
thing. The real solution for pg_dump is a complete refactor which
includes pg_dumpall and it is not a small undertaking. It should be
noted that it is also a less and less used program. On our team it
is normally used for only very specific needs (grabbing a schema)
and we use binary backups or logical replication to receive specific
data.

Huh? Are you saying that there is another, superior way to back up PostgreSQL databases other than pg_dump? I re-read the manual on it just now, but didn't see a single word about it being "legacy" or "deprecated" or even that there's any other way to do it. What do you mean?

It is not deprecated or legacy and it is still used by many. The issue is that people try to do 'partial' dumps without reading the docs on what those switches actually do and the potential conflicts therein. For large installations there it is often better to use some form of replication(binary and/or logical) to maintain a continuous backup. pg_dump can take a long time on very big databases, so it may not keep up.

Well, I've never done partial backups, frankly because I don't trust it to actually work. Full backups may be more wasteful with disk space, but at least "makes sense" in some abstract way. (Also, I try to keep as little data as possible, so they are never *gigantic*.)

This is really using the wrong tool for the job type of issue. PG
was never designed for such a scenario.

I hate the "wrong tool for the job" argument. It assumes that everyone has infinite time, energy and brain capacity to learn endless redundant tools just to "use the right tool for the job" rather than "use what you actually know". I know PG. I don't know SQLite. They are very different. So obviously, I want to use PG.

This comes down to what your definition of embedded is? A matter of determining whether we are talking apples or oranges.

Just what I said in my original question. Something which is bundled and invisible to the user.

While I understand what you mean, and even agree in theory, in practice, this always results in crappy third-party solutions which I don't want to deal with. PostGIS, for example, forces me to use "postgis" for its schema instead of "PostGIS" just because they arrogantly didn't construct their internal queries properly. "Little" things like that.

Huh? A schema is just a name space, why does it matter how the extension chooses to define it? I mean you could have number of permutations of postgis.

I'm saying that PostGIS has a bug due to incorrectly constructed internal queries which makes it impossible to properly name the schema where PostGIS is to reside, causing my database to look very ugly when it has to say "postgis" instead of "PostGIS" for PostGIS's schema. And that was an example of how sloppy/bad third-party things always are, and is one reason why I don't like it when I have to rely on "extensions".

I know that writing documentation isn't fun, but it's necessary. Also, my overall point was to not even have to deal with the specifics, but just be able to tell PG with a single config option that it's allowed to "use most of the machine's resources".

That would entail building an AI into the code that would deal with all the possible OS(versions), Postgres(versions), hardware permutations.

I... guess. If "AI" means "a series of ifs". Which is what software... is? I doubt that people who can make the world's most advanced open source database cannot check the amount of RAM and see how fast the CPU/disk is.

Does your server runs to your satisfaction with the default settings?

Right now, yes, but that says very little as I'm the only user of it. I've had many nightmares in the past, however, where even determining whether the changes in the config did anything (good or bad) has been impossible. I fundamentally don't like the idea that the config is so "conservative" (crippled) with no obvious/easy way to "set a different general mode". If you honestly think that the numerous performance-related options are easy to understand, I don't know what to say.

#17Rob Sargent
robjsargent@gmail.com
In reply to: Noname (#16)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

That would entail building an AI into the code that would deal with
all the possible OS(versions), Postgres(versions), hardware
permutations.

I... guess. If "AI" means "a series of ifs". Which is what software...
is? I doubt that people who can make the world's most advanced open
source database cannot check the amount of RAM and see how fast the
CPU/disk is.

But the very nature of "embedded" implies the db is subservient to
something else, and that something else may need most of the resources.

But I heartily agree that tuning a server can be difficult. It's highly
dependent on the context and hardware resources is only a portion of
that context. I've seen generous help on this list for specific
configuration help.

#18Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Noname (#16)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

On 9/23/20 11:51 AM, tutiluren@tutanota.com wrote:

It is not deprecated or legacy and it is still used by many. The
issue is that people try to do 'partial' dumps without reading the
docs on what those switches actually do and the potential conflicts
therein. For large installations there it is often better to use
some form of replication(binary and/or logical) to maintain a
continuous backup. pg_dump can take a long time on very big
databases, so it may not keep up.

Well, I've never done partial backups, frankly because I don't trust it
to actually work. Full backups may be more wasteful with disk space, but
at least "makes sense" in some abstract way. (Also, I try to keep as
little data as possible, so they are never *gigantic*.)

Well not partial as in incremental. Instead dump only some portion of
the schema with or without its associated data.

This comes down to what your definition of embedded is? A matter of
determining whether we are talking apples or oranges.

Just what I said in my original question. Something which is bundled and
invisible to the user.

That is going to be difficult with Postgres as it is its own process. It
would take a good bit of tooling to hide that from the user.

Huh? A schema is just a name space, why does it matter how the
extension chooses to define it? I mean you could have number of
permutations of postgis.

I'm saying that PostGIS has a bug due to incorrectly constructed
internal queries which makes it impossible to properly name the schema
where PostGIS is to reside, causing my database to look very ugly when
it has to say "postgis" instead of "PostGIS" for PostGIS's schema. And
that was an example of how sloppy/bad third-party things always are, and
is one reason why I don't like it when I have to rely on "extensions".

If that is the sum of your issues with PostGIS then I really don't have
much sympathy. They are extensions so you aren't required to use them
and rely on their way of doing things. You have the choice of writing
your own code/extension or do without completely. I personally find
compromise on my part wrt to extensions greatly reduces the amount of
coding/effort I have to do to move forward. Honestly, I don't see how it
is different from the compromises that arise out of using the core
project itself.

That would entail building an AI into the code that would deal with
all the possible OS(versions), Postgres(versions), hardware
permutations.

I... guess. If "AI" means "a series of ifs". Which is what software...
is? I doubt that people who can make the world's most advanced open
source database cannot check the amount of RAM and see how fast the
CPU/disk is.

It is more then that. It would have to take into account the behavior
changes that happen in Postgres between major versions. It also would
have to account for OS specific parameters and the changes that happen
there between OS versions. It also would need to 'know' how the database
was going to be used; readonly, heavy writes, etc. Also how the database
should play with other programs on the same machine. Add to the mix
containers, cloud instances and so on and you are outrunning the ability
of 'ifs' to handle it.

Does your server runs to your satisfaction with the default settings?

Right now, yes, but that says very little as I'm the only user of it.
I've had many nightmares in the past, however, where even determining
whether the changes in the config did anything (good or bad) has been
impossible. I fundamentally don't like the idea that the config is so
"conservative" (crippled) with no obvious/easy way to "set a different
general mode". If you honestly think that the numerous
performance-related options are easy to understand, I don't know what to
say.

The thing is 'general mode' is going to mean something different to
someone running a database in the MB-low GB range vs. high GB vs. TB vs.
PB.

As to monitoring see:

https://www.postgresql.org/docs/12/monitoring-stats.html

https://www.postgresql.org/docs/12/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT

I know it is an extension, still it is useful:

https://www.postgresql.org/docs/12/pgstatstatements.html

--
Adrian Klaver
adrian.klaver@aklaver.com

#19Tony Shelver
tshelver@gmail.com
In reply to: Adrian Klaver (#18)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

On 9/23/20 11:51 AM, tutiluren@tutanota.com wrote:

Huh? A schema is just a name space, why does it matter how the
extension chooses to define it? I mean you could have number of
permutations of postgis.

I'm saying that PostGIS has a bug due to incorrectly constructed
internal queries which makes it impossible to properly name the schema
where PostGIS is to reside, causing my database to look very ugly when
it has to say "postgis" instead of "PostGIS" for PostGIS's schema. And
that was an example of how sloppy/bad third-party things always are, and
is one reason why I don't like it when I have to rely on "extensions".

Ummmm? I have PostGIS installed in my core app schema, in part because at
the time I didn't know what I was doing. Better to have been in public...

You may also want to look at comparable ANSI (standards based) database
products (Oracle for example) when it comes to the use of case in naming
conventions. Different products handle things in different ways.

You may want to google around the issue, for example
https://postgis.net/2017/11/07/tip-move-postgis-schema/ for moving schemas.

You may want to do some research on where PostGIS comes from. It could
never have been developed as a core part of Postgres, so the fact that
products like PostGIS are so domain specific.
The fact that the Postgesql extension system is so flexible and robust was
probably a key factor in the choice it's choice in the development of
PostGIS.

Effectively, you may lose a bit, but you gain a whole lot more. My
suggestion is 'live with it'.

Or, move to a product that suits your use cases / desires better, But, good
luck finding another open source "free" (or any) product with the
functionality, robusiness and performance of PostGIS / Postgresql.
We tried the MS SQLServer equivalent, the install and use there is way
nastier than PostGIS. Both the use and the performance sucked in
comparison for our use cases.
MySQL's equivalent is nowhere near as functional, robust, as well
documented or as widely used.

Oracle's equivalent is probably the closest. Pity that installing Oracle
and their products as a whole is a nightmare, and rather wallet-draining...

Have a look at GIS / Mapping projects around the world, a majority are
implemented on PostGIS. Openstreetmap is probably the biggest (think open
source version of Google Maps), and it moved to PostGIS from MySQL several
years ago.
We did a lot of research into PostGIS, as GIS / tracking is a core part of
our business.
We didn't find a product that could compare on

- Maturity
- Functionality
- Performance
- Cost
- Documentation
- Support (huge community)

#20Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Tony Shelver (#19)
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

On 24/09/2020 18:13, Tony Shelver wrote:

On 9/23/20 11:51 AM, tutiluren@tutanota.com
<mailto:tutiluren@tutanota.com> wrote:

     Huh? A schema is just a name space, why does it matter how the
     extension chooses to define it? I mean you could have number of
     permutations of postgis.

I'm saying that PostGIS has a bug due to incorrectly constructed
internal queries which makes it impossible to properly name the schema
where PostGIS is to reside, causing my database to look very ugly when
it has to say "postgis" instead of "PostGIS" for PostGIS's schema. And
that was an example of how sloppy/bad third-party things always are, and
is one reason why I don't like it when I have to rely on "extensions".

All lowercase is good, as you don't have to remember which bits are
capitalized.  And besides, there are far more important issues to
consider as Tony has covered in detail.  It is definitely not a bug!

[..]

Oracle's equivalent is probably the closest.  Pity that installing
Oracle and their products as a whole is a nightmare, and rather
wallet-draining...

Have a look at GIS / Mapping projects around the world, a majority are
implemented on PostGIS.  Openstreetmap is probably the biggest (think
open source version of Google Maps), and it moved to PostGIS from
MySQL several years ago.
We did a lot of research into PostGIS, as GIS / tracking is a core
part of our business.
We didn't find a product that could compare on

* Maturity
* Functionality
* Performance
* Cost
* Documentation
* Support (huge community)

I remember going to free seminars promoting the Oracle database over 25
years ago, and the only thing I can remembers now is the pie charts
saying how much revenue Oracle had versus all the others.  Never  how
many transactions, size of databases, number of users, nor any really
useful metric -- just how much Oracle was being paid!

To be honest, I've heard many bad things about Oracle, and rarely
anything good.

There are many reasons for going with PostgreSQL and PostGIS, but you
are free to use something else if you prefer.

Cheers,
Gavin

#21Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#10)
#22Bruce Momjian
bruce@momjian.us
In reply to: Noname (#14)
#23Noname
tutiluren@tutanota.com
In reply to: Noname (#1)
#24Ron
ronljohnsonjr@gmail.com
In reply to: Bruce Momjian (#22)
#25Noname
tutiluren@tutanota.com
In reply to: Noname (#1)
#26Noname
tutiluren@tutanota.com
In reply to: Adrian Klaver (#18)
#27Thomas Kellerer
shammat@gmx.net
In reply to: Noname (#23)
#28Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Thomas Kellerer (#27)
#29Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Noname (#26)
#30Thomas Kellerer
shammat@gmx.net
In reply to: Adrian Klaver (#28)
#31Johannes Graën
johannes@selfnet.de
In reply to: Noname (#26)
#32Bruce Momjian
bruce@momjian.us
In reply to: Ron (#24)
#33Bruce Momjian
bruce@momjian.us
In reply to: Thomas Kellerer (#27)
#34Sam Gendler
sgendler@ideasculptor.com
In reply to: Noname (#26)
#35John DeSoi
desoi@pgedit.com
In reply to: Noname (#14)
#36Noname
tutiluren@tutanota.com
In reply to: Noname (#1)
#37Christophe Pettus
xof@thebuild.com
In reply to: Noname (#36)
#38Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Noname (#36)
#39Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Adrian Klaver (#18)
#40Achilleas Mantzios
achill@matrix.gatewaynet.com
In reply to: Noname (#36)