Let's make PostgreSQL multi-threaded

Started by Heikki Linnakangas almost 3 years ago | 142 messages | hackers

heikki.linnakangas@enterprisedb.com

almost 3 years ago

I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
so that the whole server runs in a single process, with multiple
threads. It has been discussed many times in the past, last thread on
pgsql-hackers was back in 2017 when Konstantin made some experiments [0].

I feel that there is now pretty strong consensus that it would be a good
thing, more so than before. Lots of work to get there, and lots of
details to be hashed out, but no objections to the idea at a high level.

The purpose of this email is to make that silent consensus explicit. If
you have objections to switching from the current multi-process
architecture to a single-process, multi-threaded architecture, please
speak up.

If there are no major objections, I'm going to update the developer FAQ,
removing the excuses there for why we don't use threads [1]. And we can
start to talk about the path to get there. Below is a list of some
hurdles and proposed high-level solutions. This isn't an exhaustive
list, just some of the most obvious problems:

# Transition period

The transition surely cannot be done fully in one release. Even if we
could pull it off in core, extensions will need more time to adapt.
There will be a transition period of at least one release, probably
more, where you can choose multi-process or multi-thread model using a
GUC. Depending on how it goes, we can document it as experimental at first.

# Thread per connection

To get started, it's most straightforward to have one thread per
connection, just replacing backend process with a backend thread. In the
future, we might want to have a thread pool with some kind of a
scheduler to assign active queries to worker threads. Or multiple
threads per connection, or spawn additional helper threads for specific
tasks. But that's future work.

# Global variables

We have a lot of global and static variables:

$ objdump -t bin/postgres | grep -e "\.data" -e "\.bss" | grep -v "data.rel.ro" | wc -l
1666

Some of them are pointers to shared memory structures and can stay as
they are. But many of them are per-connection state. The most
straightforward conversion for those is to turn them into thread-local
variables, like Konstantin did in [0].

It might be good to have some kind of a Session context struct that we
pass everywhere, or maybe have a single thread-local variable to hold
it. Many of the global variables would become fields in the Session. But
that's future work.
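
A minimal sketch of the two ideas above, assuming C11 `_Thread_local` and POSIX threads. The `Session` struct, its fields, and the function names are hypothetical illustrations, not existing PostgreSQL code: each backend thread gets one thread-local pointer to its own session state, so code that used to touch a global touches a field instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection state; field names are illustrative only. */
typedef struct Session
{
    int  backend_id;        /* stand-in for a per-connection identifier */
    long queries_executed;  /* stand-in for what used to be a static counter */
} Session;

/* One thread-local pointer takes the place of many separate globals. */
static _Thread_local Session *CurrentSession = NULL;

/* Thread entry point: each backend thread owns its own Session. */
static void *
backend_main(void *arg)
{
    assert(arg != NULL);

    Session session = { .backend_id = *(int *) arg, .queries_executed = 0 };

    CurrentSession = &session;
    for (int i = 0; i < 1000; i++)
        CurrentSession->queries_executed++;   /* no lock: state is thread-private */

    /* Hand the private counter back to the joiner, encoded in the pointer. */
    return (void *) (intptr_t) CurrentSession->queries_executed;
}

/* Run up to 16 backend threads concurrently; return the sum of their counters. */
long
run_backends(int nthreads)
{
    pthread_t threads[16];
    int       ids[16];
    int       started = 0;
    long      total = 0;

    if (nthreads > 16)
        nthreads = 16;

    for (int i = 0; i < nthreads; i++)
    {
        ids[i] = i;
        if (pthread_create(&threads[i], NULL, backend_main, &ids[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
    {
        void *result;

        pthread_join(threads[i], &result);
        total += (long) (intptr_t) result;
    }
    return total;
}
```

Calling run_backends(4) starts four threads that each count to 1000 in their own Session without interfering with one another.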

# Extensions

A lot of extensions also contain global variables or other things that
break in a multi-threaded environment. We need a way to label extensions
that support multi-threading. And in the future, also extensions that
*require* a multi-threaded server.

Let's add flags to the control file to mark if the extension is
thread-safe and/or process-safe. If you try to load an extension that's
not compatible with the server's mode, throw an error.
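
As a rough sketch of what such a load-time check could look like: the enum, struct, and function below are hypothetical names invented for illustration, and the two booleans stand in for whatever flags the control file would end up carrying.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: the mode the running server was started in. */
typedef enum ServerMode
{
    SERVER_MODE_MULTI_PROCESS,
    SERVER_MODE_MULTI_THREAD
} ServerMode;

/* Hypothetical: compatibility flags parsed from an extension's control file. */
typedef struct ExtensionCompat
{
    bool thread_safe;   /* may be loaded into a multi-threaded server */
    bool process_safe;  /* may be loaded into a multi-process server */
} ExtensionCompat;

/* Return true if the extension may be loaded in the given mode. */
bool
extension_is_loadable(const ExtensionCompat *ext, ServerMode mode)
{
    assert(ext != NULL);
    return (mode == SERVER_MODE_MULTI_THREAD) ? ext->thread_safe
                                              : ext->process_safe;
}
```

The loader would raise an error whenever this returns false. An extension that sets only thread_safe is one that *requires* a multi-threaded server.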

We might need new functions in addition to _PG_init, called at connection
startup and shutdown. And background worker API probably needs some changes.

# Exposed PIDs

We expose backend process PIDs to users in a few places.
pg_stat_activity.pid and pg_terminate_backend(), for example. They need
to be replaced, or we can assign a fake PID to each connection when
running in multi-threaded mode.

# Signals

We use signals for communication between backends. SIGURG in latches,
and SIGUSR1 in procsignal, for example. Those primitives need to be
rewritten with some other signalling mechanism in multi-threaded mode.
In principle, it's possible to set per-thread signal handlers, and send
a signal to a particular thread (pthread_kill), but I think it's better
to just rewrite them.
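
To illustrate what "some other signalling mechanism" could mean inside one process, here is a minimal sketch of a latch built on a POSIX mutex and condition variable. `ThreadLatch` and its functions are hypothetical and are not PostgreSQL's actual Latch API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-process latch; NOT PostgreSQL's actual Latch API. */
typedef struct ThreadLatch
{
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            is_set;
} ThreadLatch;

void
thread_latch_init(ThreadLatch *latch)
{
    assert(latch != NULL);
    pthread_mutex_init(&latch->mutex, NULL);
    pthread_cond_init(&latch->cond, NULL);
    latch->is_set = false;
}

/* Wake the waiter; takes the place of sending a signal to another backend. */
void
thread_latch_set(ThreadLatch *latch)
{
    pthread_mutex_lock(&latch->mutex);
    latch->is_set = true;
    pthread_cond_signal(&latch->cond);
    pthread_mutex_unlock(&latch->mutex);
}

/* Block until the latch is set, then clear it so it can be reused. */
void
thread_latch_wait(ThreadLatch *latch)
{
    pthread_mutex_lock(&latch->mutex);
    while (!latch->is_set)      /* loop guards against spurious wakeups */
        pthread_cond_wait(&latch->cond, &latch->mutex);
    latch->is_set = false;
    pthread_mutex_unlock(&latch->mutex);
}

/* Helper thread for the demo: sets the latch it is handed. */
static void *
setter_thread(void *arg)
{
    thread_latch_set((ThreadLatch *) arg);
    return NULL;
}

/* Demo: returns true once another thread has woken us through the latch. */
bool
thread_latch_demo(void)
{
    ThreadLatch latch;
    pthread_t   setter;

    thread_latch_init(&latch);
    if (pthread_create(&setter, NULL, setter_thread, &latch) != 0)
        return false;
    thread_latch_wait(&latch);
    pthread_join(setter, NULL);
    return true;
}
```

Because the set flag is checked under the mutex, a wakeup that arrives before the waiter starts waiting is not lost.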

We also document that you can send SIGINT, SIGTERM or SIGHUP to an
individual backend process. I think we need to deprecate that, and maybe
come up with some convenient replacement. E.g. send a message with
backend ID to a unix domain socket, and a new pg_kill executable to send
those messages.

# Restart on crash

If a backend process crashes, postmaster terminates all other backends
and restarts the system. That's hard (impossible?) to do safely if
everything runs in one process. We can continue to have a separate
postmaster process that just monitors the main process and restarts it
on crash.
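
A minimal sketch of such a monitoring process, assuming POSIX fork() and waitpid(). The function names and the restart limit are hypothetical; a real postmaster would also have to reinitialize shared state and run crash recovery before restarting.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical supervisor: run server_main in a child process and start a
 * fresh child whenever it dies abnormally. Returns the number of restarts
 * performed, or -1 if fork() or waitpid() fails.
 */
int
supervise(int (*server_main)(int attempt), int max_restarts)
{
    int restarts = 0;

    assert(server_main != NULL);

    for (;;)
    {
        int   status;
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(server_main(restarts));   /* child: the "main process" */

        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return restarts;                /* clean shutdown: stop here */
        if (restarts >= max_restarts)
            return restarts;                /* give up after too many crashes */
        restarts++;                         /* crash or error exit: restart */
    }
}

/* Demo server: crashes on the first attempt, exits cleanly on the second. */
int
demo_server(int attempt)
{
    if (attempt == 0)
        raise(SIGKILL);     /* simulate a crash of the whole server process */
    return 0;
}
```

With demo_server, supervise() sees one abnormal termination, restarts once, and then observes a clean exit.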

# Thread-safe libraries

Need to switch to thread-safe versions of library functions, e.g.
uselocale() instead of setlocale().
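
For illustration, a sketch of the uselocale() pattern, assuming a POSIX.1-2008 system (some platforms need an extra header such as <xlocale.h>). The helper name is hypothetical. setlocale() changes the locale of the whole process, whereas uselocale() changes it for the calling thread only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a number using the "C" locale, switching the
 * locale for the calling thread only. Returns 0 on success, -1 on failure.
 */
int
format_in_c_locale(double value, char *buf, size_t buflen)
{
    locale_t c_locale;
    locale_t previous;
    int      n;

    assert(buf != NULL && buflen > 0);

    c_locale = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
    if (c_locale == (locale_t) 0)
        return -1;

    previous = uselocale(c_locale);     /* affects this thread only */
    n = snprintf(buf, buflen, "%.2f", value);
    uselocale(previous);                /* restore before freeing */
    freelocale(c_locale);

    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

Other threads keep whatever locale they had, so concurrent backends with different locale settings would not step on each other.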

The Python interpreter has a Global Interpreter Lock. It's not possible
to create two completely independent Python interpreters in the same
process; there will be some lock contention on the GIL. Fortunately, the
Python community just accepted https://peps.python.org/pep-0684/. That's
exactly what we need: it makes it possible for separate interpreters to
have their own GILs. It's not clear to me if that's in Python 3.12
already, or under development for some future version, but by the time
we make the switch in Postgres, there probably will be a solution in
CPython.

At a quick glance, I think Perl and Tcl are fine: you can have multiple
interpreters in one process. Need to check any other libraries we use.

[0]: /messages/by-id/9defcb14-a918-13fe-4b80-a0b02ff85527@postgrespro.ru

[1]: https://wiki.postgresql.org/wiki/Developer_FAQ#Why_don.27t_you_use_raw_devices.2C_async-I.2FO.2C_.3Cinsert_your_favorite_wizz-bang_feature_here.3E.3F

--
Heikki Linnakangas
Neon (https://neon.tech)

Tom Lane

tgl@sss.pgh.pa.us

almost 3 years ago

In reply to: Heikki Linnakangas (#1)

Re: Let's make PostgreSQL multi-threaded

Heikki Linnakangas <hlinnaka@iki.fi> writes:

I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
so that the whole server runs in a single process, with multiple
threads. It has been discussed many times in the past, last thread on
pgsql-hackers was back in 2017 when Konstantin made some experiments [0].

I feel that there is now pretty strong consensus that it would be a good
thing, more so than before. Lots of work to get there, and lots of
details to be hashed out, but no objections to the idea at a high level.

The purpose of this email is to make that silent consensus explicit. If
you have objections to switching from the current multi-process
architecture to a single-process, multi-threaded architecture, please
speak up.

For the record, I think this will be a disaster. There is far too much
code that will get broken, largely silently, and much of it is not
under our control.

regards, tom lane

Tristan Partin

tristan@neon.tech

almost 3 years ago

In reply to: Heikki Linnakangas (#1)

Re: Let's make PostgreSQL multi-threaded

On Mon Jun 5, 2023 at 9:51 AM CDT, Heikki Linnakangas wrote:

# Global variables

We have a lot of global and static variables:

$ objdump -t bin/postgres | grep -e "\.data" -e "\.bss" | grep -v "data.rel.ro" | wc -l
1666

Some of them are pointers to shared memory structures and can stay as
they are. But many of them are per-connection state. The most
straightforward conversion for those is to turn them into thread-local
variables, like Konstantin did in [0].

It might be good to have some kind of a Session context struct that we
pass everywhere, or maybe have a single thread-local variable to hold
it. Many of the global variables would become fields in the Session. But
that's future work.

+1 to the session context idea, after the simpler thread_local
storage idea.

# Extensions

A lot of extensions also contain global variables or other things that
break in a multi-threaded environment. We need a way to label extensions
that support multi-threading. And in the future, also extensions that
*require* a multi-threaded server.

Let's add flags to the control file to mark if the extension is
thread-safe and/or process-safe. If you try to load an extension that's
not compatible with the server's mode, throw an error.

We might need new functions in addition to _PG_init, called at connection
startup and shutdown. And background worker API probably needs some changes.

It would be a good idea to start exposing a variable through pkg-config
to tell whether the backend is multi-threaded or multi-process.

# Exposed PIDs

We expose backend process PIDs to users in a few places.
pg_stat_activity.pid and pg_terminate_backend(), for example. They need
to be replaced, or we can assign a fake PID to each connection when
running in multi-threaded mode.

Would it be possible to just transparently slot in the thread ID
instead?

# Thread-safe libraries

Need to switch to thread-safe versions of library functions, e.g.
uselocale() instead of setlocale().

Seems like a good starting point.

The Python interpreter has a Global Interpreter Lock. It's not possible
to create two completely independent Python interpreters in the same
process, there will be some lock contention on the GIL. Fortunately, the
python community just accepted https://peps.python.org/pep-0684/. That's
exactly what we need: it makes it possible for separate interpreters to
have their own GILs. It's not clear to me if that's in Python 3.12
already, or under development for some future version, but by the time
we make the switch in Postgres, there probably will be a solution in
cpython.

3.12 is the currently in-development version of Python. 3.12 is planned
for release in October of this year.

A workaround that some projects seem to do is to use multiple Python
interpreters [0], though it seems uncommon. It might be important to note
depending on the minimum version of Python Postgres aims to support (not
sure on this policy).

The C-API of Python also provides mechanisms for releasing the GIL. I am
not familiar with how Postgres uses Python, but I have seen huge
improvements to performance with well-placed GIL releases in
multi-threaded contexts. Surely this API would just become a no-op after
the PEP is implemented.

[0]: https://peps.python.org/pep-0684/#existing-use-of-multiple-interpreters

--
Tristan Partin
Neon (https://neon.tech)

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 3 years ago

In reply to: Tom Lane (#2)

Re: Let's make PostgreSQL multi-threaded

On 05/06/2023 11:18, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
so that the whole server runs in a single process, with multiple
threads. It has been discussed many times in the past, last thread on
pgsql-hackers was back in 2017 when Konstantin made some experiments [0].

I feel that there is now pretty strong consensus that it would be a good
thing, more so than before. Lots of work to get there, and lots of
details to be hashed out, but no objections to the idea at a high level.

The purpose of this email is to make that silent consensus explicit. If
you have objections to switching from the current multi-process
architecture to a single-process, multi-threaded architecture, please
speak up.

For the record, I think this will be a disaster. There is far too much
code that will get broken, largely silently, and much of it is not
under our control.

Noted. Other large projects have gone through this transition. It's not
easy, but it's a lot easier now than it was 10 years ago. The platform
and compiler support is there now, all libraries have thread-safe
interfaces, etc.

I don't expect you or others to buy into any particular code change at
this point, or to contribute time into it. Just to accept that it's a
worthwhile goal. If the implementation turns out to be a disaster, then
it won't be accepted, of course. But I'm optimistic.

--
Heikki Linnakangas
Neon (https://neon.tech)

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 3 years ago

In reply to: Tristan Partin (#3)

Re: Let's make PostgreSQL multi-threaded

On 05/06/2023 11:28, Tristan Partin wrote:

On Mon Jun 5, 2023 at 9:51 AM CDT, Heikki Linnakangas wrote:

# Extensions

A lot of extensions also contain global variables or other things that
break in a multi-threaded environment. We need a way to label extensions
that support multi-threading. And in the future, also extensions that
*require* a multi-threaded server.

Let's add flags to the control file to mark if the extension is
thread-safe and/or process-safe. If you try to load an extension that's
not compatible with the server's mode, throw an error.

We might need new functions in addition to _PG_init, called at connection
startup and shutdown. And background worker API probably needs some changes.

It would be a good idea to start exposing a variable through pkg-config
to tell whether the backend is multi-threaded or multi-process.

I think we need to support both modes without having to recompile the
server or the extensions. So it needs to be a runtime check.

# Exposed PIDs

We expose backend process PIDs to users in a few places.
pg_stat_activity.pid and pg_terminate_backend(), for example. They need
to be replaced, or we can assign a fake PID to each connection when
running in multi-threaded mode.

Would it be possible to just transparently slot in the thread ID
instead?

Perhaps. It might break applications that use the PID directly with e.g.
'kill <PID>', though.

The Python interpreter has a Global Interpreter Lock. It's not possible
to create two completely independent Python interpreters in the same
process, there will be some lock contention on the GIL. Fortunately, the
python community just accepted https://peps.python.org/pep-0684/. That's
exactly what we need: it makes it possible for separate interpreters to
have their own GILs. It's not clear to me if that's in Python 3.12
already, or under development for some future version, but by the time
we make the switch in Postgres, there probably will be a solution in
cpython.

3.12 is the currently in-development version of Python. 3.12 is planned
for release in October of this year.

A workaround that some projects seem to do is to use multiple Python
interpreters[0], though it seems uncommon. It might be important to note
depending on the minimum version of Python Postgres aims to support (not
sure on this policy).

The C-API of Python also provides mechanisms for releasing the GIL. I am
not familiar with how Postgres uses Python, but I have seen huge
improvements to performance with well-placed GIL releases in
multi-threaded contexts. Surely this API would just become a no-op after
the PEP is implemented.

[0]: https://peps.python.org/pep-0684/#existing-use-of-multiple-interpreters

Oh, cool. I'm inclined to jump straight to PEP-684 and require Python
3.12 in multi-threaded mode, though, or just accept that it's slow. But
let's see what the state of the world is when we get there.

--
Heikki Linnakangas
Neon (https://neon.tech)

Ranier Vilela

ranier.vf@gmail.com

almost 3 years ago

In reply to: Heikki Linnakangas (#5)

Re: Let's make PostgreSQL multi-threaded

On 05/06/2023 11:18, Tom Lane wrote:

Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:

I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
so that the whole server runs in a single process, with multiple
threads. It has been discussed many times in the past, last thread on
pgsql-hackers was back in 2017 when Konstantin made some experiments [0].

I feel that there is now pretty strong consensus that it would be a good
thing, more so than before. Lots of work to get there, and lots of
details to be hashed out, but no objections to the idea at a high level.

The purpose of this email is to make that silent consensus explicit. If
you have objections to switching from the current multi-process
architecture to a single-process, multi-threaded architecture, please
speak up.

For the record, I think this will be a disaster. There is far too much
code that will get broken, largely silently, and much of it is not
under our control.

I fully agree with Tom.

First, it is not clear what the benefits of the architecture change are.

Performance?

Development becomes much more complicated and error-prone.

There is still a lot of low-hanging fruit that can improve
performance.
And the code can gradually and safely remove multithreading barriers.

1. gradual reduction of global variables
2. introduction of local context structures
3. shrink current structures (to fit in 32, 64 boundaries)
4. scope reduction

My 2c.

regards,

Ranier Vilela

Bruce Momjian

bruce@momjian.us

almost 3 years ago

In reply to: Ranier Vilela (#6)

Re: Let's make PostgreSQL multi-threaded

On Mon, Jun 5, 2023 at 01:26:00PM -0300, Ranier Vilela wrote:

On 05/06/2023 11:18, Tom Lane wrote:

For the record, I think this will be a disaster. There is far too much
code that will get broken, largely silently, and much of it is not
under our control.

I fully agreed with Tom.

First, it is not clear what are the benefits of architecture change?

Performance?

Development becomes much more complicated and error-prone.

I agree the costs of going threaded have been reduced with compiler and
library improvements, but I don't know if they are reduced enough for
the change to be a net benefit, except on Windows where the process
creation overhead is high.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

Ranier Vilela

ranier.vf@gmail.com

almost 3 years ago

In reply to: Bruce Momjian (#7)

Re: Let's make PostgreSQL multi-threaded

On Mon, Jun 5, 2023 at 13:42, Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Jun 5, 2023 at 01:26:00PM -0300, Ranier Vilela wrote:

On 05/06/2023 11:18, Tom Lane wrote:

For the record, I think this will be a disaster. There is far too much
code that will get broken, largely silently, and much of it is not
under our control.

I fully agreed with Tom.

First, it is not clear what are the benefits of architecture change?

Performance?

Development becomes much more complicated and error-prone.

I agree the costs of going threaded have been reduced with compiler and
library improvements, but I don't know if they are reduced enough for
the change to be a net benefit, except on Windows where the process
creation overhead is high.

Yeah, but process creation, even on Windows, is a tiny part of response
time.
A DBMS has one connection per user, so one process or thread.

Unlike a web server like Nginx, with hundreds of thousands of connections.
For the record, Nginx is multithreaded and uses -Werror by default (make
all warnings into errors).

regards,
Ranier Vilela

Bruce Momjian

bruce@momjian.us

almost 3 years ago

In reply to: Heikki Linnakangas (#1)

Re: Let's make PostgreSQL multi-threaded

On Mon, Jun 5, 2023 at 05:51:57PM +0300, Heikki Linnakangas wrote:

# Restart on crash

If a backend process crashes, postmaster terminates all other backends and
restarts the system. That's hard (impossible?) to do safely if everything
runs in one process. We can continue to have a separate postmaster process that
just monitors the main process and restarts it on crash.

It would be good to know what new class of errors would cause server
restarts, e.g., memory allocation failures?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

#10

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 3 years ago

In reply to: Ranier Vilela (#6)

Re: Let's make PostgreSQL multi-threaded

On 05/06/2023 12:26, Ranier Vilela wrote:

First, it is not clear what are the benefits of architecture change?

Performance?

I doubt it makes much performance difference, at least not initially. It
might help a little with backend startup time, and maybe some other
things. It might also reduce the overhead of context switches and TLB
misses.

In the long run, a single-process architecture makes it easier to have
shared catalog caches, plan cache, etc. which can improve performance.
And it can make it easier to launch helper threads for things where
worker processes would be too heavy-weight. But those benefits will
require more work, they won't happen just by replacing processes with
threads.

The ease of developing things like that is my motivation.

Development becomes much more complicated and error-prone.

I don't agree with that.

We currently bend over backwards to make all allocations fixed-sized in
shared memory. You learn to live with that, but a lot of things would be
simpler if you could allocate and free in shared memory more freely.
It's no panacea, you still need to be careful with locking and
concurrency. But a lot simpler.
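
To make the contrast concrete, a minimal sketch of a structure shared between threads that grows on demand with plain malloc(), something that is awkward with fixed-size shared memory segments. The cache and its function names are hypothetical, and entries are never evicted in this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared cache entry; every thread sees the same heap. */
typedef struct CacheEntry
{
    struct CacheEntry *next;
    char              *key;     /* variable length, allocated on demand */
} CacheEntry;

static CacheEntry     *cache_head = NULL;
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add a key of any length. Returns 0 on success, -1 on out of memory. */
int
cache_add(const char *key)
{
    CacheEntry *entry;

    assert(key != NULL);

    entry = malloc(sizeof(CacheEntry));
    if (entry == NULL)
        return -1;
    entry->key = malloc(strlen(key) + 1);
    if (entry->key == NULL)
    {
        free(entry);
        return -1;
    }
    strcpy(entry->key, key);

    /* Locking is still needed, but no size has to be fixed up front. */
    pthread_mutex_lock(&cache_lock);
    entry->next = cache_head;
    cache_head = entry;
    pthread_mutex_unlock(&cache_lock);
    return 0;
}

/* Return 1 if the key is present, 0 otherwise. */
int
cache_contains(const char *key)
{
    int found = 0;

    assert(key != NULL);

    pthread_mutex_lock(&cache_lock);
    for (CacheEntry *e = cache_head; e != NULL; e = e->next)
    {
        if (strcmp(e->key, key) == 0)
        {
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&cache_lock);
    return found;
}
```

Any thread can call cache_add() and every other thread sees the new entry immediately; the mutex is the only coordination required.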

We have built dynamic shared memory etc. over the years to work around
the limitations of shared memory. But it's still a lot more complicated.

Code that doesn't need to communicate with other processes/threads is
simple to write in either model.

There are still many low-hanging fruit to be had that can improve
performance.
And the code can gradually and safely remove multithreading barriers.

1. gradual reduction of global variables
2. introduction of local context structures
3. shrink current structures (to fit in 32, 64 boundaries)

4. scope reduction

Right, the reason I started this thread is to explicitly note that it is
a worthy goal. If it's not, the above steps would be pointless. But if
we agree that it is a worthy goal, we can start to incrementally work
towards it.

--
Heikki Linnakangas
Neon (https://neon.tech)

#11

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 3 years ago

In reply to: Bruce Momjian (#9)

Re: Let's make PostgreSQL multi-threaded

On 05/06/2023 13:10, Bruce Momjian wrote:

On Mon, Jun 5, 2023 at 05:51:57PM +0300, Heikki Linnakangas wrote:

# Restart on crash

If a backend process crashes, postmaster terminates all other backends and
restarts the system. That's hard (impossible?) to do safely if everything
runs in one process. We can continue to have a separate postmaster process that
just monitors the main process and restarts it on crash.

It would be good to know what new class of errors would cause server
restarts, e.g., memory allocation failures?

You mean "out of memory"? No, that would be horrible.

I don't think there would be any new class of errors that would cause
server restarts. In theory, having a separate address space for each
backend gives you some protection. In practice, there are a lot of
shared memory structures anyway that you can stomp over, and a segfault
or unexpected exit of any backend process causes postmaster to restart
the whole system anyway.

--
Heikki Linnakangas
Neon (https://neon.tech)

#12

Merlin Moncure

mmoncure@gmail.com

almost 3 years ago

In reply to: Heikki Linnakangas (#10)

Re: Let's make PostgreSQL multi-threaded

On Mon, Jun 5, 2023 at 12:25 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

We currently bend over backwards to make all allocations fixed-sized in
shared memory. You learn to live with that, but a lot of things would be
simpler if you could allocate and free in shared memory more freely.
It's no panacea, you still need to be careful with locking and
concurrency. But a lot simpler.

Would this help with oom killer in linux?

Isn't it true that pgbouncer provides a lot of the same benefits?

merlin

#13

Jonathan S. Katz

jkatz@postgresql.org

almost 3 years ago

In reply to: Heikki Linnakangas (#4)

Re: Let's make PostgreSQL multi-threaded

On 6/5/23 11:33 AM, Heikki Linnakangas wrote:

On 05/06/2023 11:18, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
so that the whole server runs in a single process, with multiple
threads. It has been discussed many times in the past, last thread on
pgsql-hackers was back in 2017 when Konstantin made some experiments
[0].

I feel that there is now pretty strong consensus that it would be a good
thing, more so than before. Lots of work to get there, and lots of
details to be hashed out, but no objections to the idea at a high level.

The purpose of this email is to make that silent consensus explicit. If
you have objections to switching from the current multi-process
architecture to a single-process, multi-threaded architecture, please
speak up.

For the record, I think this will be a disaster. There is far too much
code that will get broken, largely silently, and much of it is not
under our control.

Noted. Other large projects have gone through this transition. It's not
easy, but it's a lot easier now than it was 10 years ago. The platform
and compiler support is there now, all libraries have thread-safe
interfaces, etc.

I don't expect you or others to buy into any particular code change at
this point, or to contribute time into it. Just to accept that it's a
worthwhile goal. If the implementation turns out to be a disaster, then
it won't be accepted, of course. But I'm optimistic.

I don't have enough expertise in this area to comment on if it'd be a
"disaster" or not. My zoomed out observations are two-fold:

1. It seems like there's a lack of consensus on which of processes vs.
threads yields the best performance benefit, and from talking to folks
with greater expertise than me, this can vary between workloads. I
believe one DB even gives users a choice if they want to run in processes
vs. threads.

2. While I wouldn't want to necessarily discourage a moonshot effort, I
would ask if developer time could be better spent on tackling some of
the other problems around vertical scalability? Per some PGCon
discussions, there's still room for improvement in how PostgreSQL can
best utilize resources available on very large "commodity" machines (a
448-core / 24TB RAM instance comes to mind).

I'm purposely giving a nonanswer on whether it's a worthwhile goal, but
rather I'd be curious where it could stack up against some other efforts
to continue to help PostgreSQL improve performance and handle very large
workloads.

Thanks,

Jonathan

#14

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 3 years ago

In reply to: Merlin Moncure (#12)

Re: Let's make PostgreSQL multi-threaded

On 05/06/2023 13:32, Merlin Moncure wrote:

Would this help with oom killer in linux?

Hmm, I guess the OOM killer would better understand what Postgres is
doing, it's not very smart about accounting shared memory. You still
wouldn't want the OOM killer to kill Postgres, though, so I think you'd
still want to disable it in production systems.

Isn't it true that pgbouncer provides a lot of the same benefits?

I guess there is some overlap, although I don't really think of it that
way. Firstly, pgbouncer has its own set of problems. Secondly, switching
to threads would not make connection poolers obsolete. Maybe in the
distant future, Postgres could handle thousands of connections with
ease, and threads would make that easier to achieve, but that would
need a lot more work.

--
Heikki Linnakangas
Neon (https://neon.tech)

#15

Bruce Momjian

bruce@momjian.us

almost 3 years ago

In reply to: Heikki Linnakangas (#11)

Re: Let's make PostgreSQL multi-threaded

On Mon, Jun 5, 2023 at 08:29:16PM +0300, Heikki Linnakangas wrote:

On 05/06/2023 13:10, Bruce Momjian wrote:

On Mon, Jun 5, 2023 at 05:51:57PM +0300, Heikki Linnakangas wrote:

# Restart on crash

If a backend process crashes, postmaster terminates all other backends and
restarts the system. That's hard (impossible?) to do safely if everything
runs in one process. We can continue to have a separate postmaster process that
just monitors the main process and restarts it on crash.

It would be good to know what new class of errors would cause server
restarts, e.g., memory allocation failures?

You mean "out of memory"? No, that would be horrible.

I don't think there would be any new class of errors that would cause server
restarts. In theory, having a separate address space for each backend gives
you some protection. In practice, there are a lot of shared memory
structures anyway that you can stomp over, and a segfault or unexpected exit
of any backend process causes postmaster to restart the whole system anyway.

Uh, yes, but don't we detect failures while modifying shared memory and
force a restart? Wouldn't the scope of failures be much larger?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

#16

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 3 years ago

In reply to: Bruce Momjian (#15)

Re: Let's make PostgreSQL multi-threaded

On 05/06/2023 14:04, Bruce Momjian wrote:

On Mon, Jun 5, 2023 at 08:29:16PM +0300, Heikki Linnakangas wrote:

I don't think there would be any new class of errors that would cause server
restarts. In theory, having a separate address space for each backend gives
you some protection. In practice, there are a lot of shared memory
structures anyway that you can stomp over, and a segfault or unexpected exit
of any backend process causes postmaster to restart the whole system anyway.

Uh, yes, but don't we detect failures while modifying shared memory and
force a restart? Wouldn't the scope of failures be much larger?

If one process writes over shared memory that it shouldn't, it can cause
a crash in that process or some other process that reads it. Same with
multiple threads, no difference there.

With a single process, one thread can modify another thread's "backend
private" memory, and cause the other thread to crash. Perhaps that's
what you meant?

In practice, I don't think it's so bad. Even in a multi-threaded
environment, common bugs like buffer overflows and use-after-free are
still much more likely to access memory owned by the same thread, thanks
to how memory allocators work. And a completely random memory access is
still more likely to cause a segfault than corrupting another thread's
memory. And tools like CLOBBER_FREED_MEMORY/MEMORY_CONTEXT_CHECKING and
valgrind are pretty good at catching memory access bugs at development
time, whether it's multiple processes or threads.

--
Heikki Linnakangas
Neon (https://neon.tech)

#17

Andrew Dunstan

andrew@dunslane.net

almost 3 years ago

In reply to: Tom Lane (#2)

Re: Let's make PostgreSQL multi-threaded

On 2023-06-05 Mo 11:18, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
so that the whole server runs in a single process, with multiple
threads. It has been discussed many times in the past, last thread on
pgsql-hackers was back in 2017 when Konstantin made some experiments [0].

I feel that there is now pretty strong consensus that it would be a good
thing, more so than before. Lots of work to get there, and lots of
details to be hashed out, but no objections to the idea at a high level.

The purpose of this email is to make that silent consensus explicit. If
you have objections to switching from the current multi-process
architecture to a single-process, multi-threaded architecture, please
speak up.

For the record, I think this will be a disaster. There is far too much
code that will get broken, largely silently, and much of it is not
under our control.

If we were starting out today we would probably choose a threaded
implementation. But moving to threaded now seems to me like a
multi-year-multi-person project with the prospect of years to come
chasing bugs and the prospect of fairly modest advantages. The risk to
reward doesn't look great.

That's my initial reaction. I could be convinced otherwise.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#18

Pavel Stehule

pavel.stehule@gmail.com

almost 3 years ago

In reply to: Heikki Linnakangas (#10)

Re: Let's make PostgreSQL multi-threaded

In the long run, a single-process architecture makes it easier to have
shared catalog caches, plan cache, etc. which can improve performance.
And it can make it easier to launch helper threads for things where
worker processes would be too heavy-weight. But those benefits will
require more work, they won't happen just by replacing processes with
threads.

A shared plan cache is not a silver bullet. Managing a shared plan cache
well is very difficult. Our heuristic for custom plans in prepared
statements is rudimentary, and with a shared cache you would need to
reduce the use of custom plans too.

Many of the issues are known from Oracle. The benefit may be limited to
very primitive, very fast queries, or to extremely complex queries where a
generic plan is used. The current implementation of local plan caches has
a lot of issues (that cannot be fixed), but a shared plan cache is another
level of complexity.
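
For readers unfamiliar with the custom-plan heuristic mentioned above, here is a rough sketch of how the choice between a custom and a generic plan is commonly described for prepared statements: the first five executions use custom plans, and after that the generic plan is used if its estimated cost is lower than the average cost of the custom plans seen so far. This is a simplified model, not PostgreSQL's actual code; the function and parameter names are hypothetical, and it ignores settings such as `plan_cache_mode` and other details.

```python
def choose_custom_plan(num_custom_plans: int,
                       total_custom_cost: float,
                       generic_cost: float) -> bool:
    """Return True to build a custom plan, False to use the generic plan.

    Simplified illustration only; names and structure are assumptions.
    """
    # Assumed rule: always plan the first five executions individually.
    if num_custom_plans < 5:
        return True

    # Afterwards, compare the generic plan against the average custom plan.
    avg_custom_cost = total_custom_cost / num_custom_plans
    if generic_cost < avg_custom_cost:
        return False  # generic plan looks cheaper on average

    return True
```

For example, `choose_custom_plan(5, 500.0, 90.0)` picks the generic plan, because 90 is below the custom average of 100. The decision rests on only a handful of per-session cost samples, and a shared cache would have to make such decisions for all sessions at once.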

Regards

Pavel

#19

Joe Conway

mail@joeconway.com

almost 3 years ago

In reply to: Andrew Dunstan (#17)

Re: Let's make PostgreSQL multi-threaded

On 6/5/23 14:51, Andrew Dunstan wrote:

On 2023-06-05 Mo 11:18, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
so that the whole server runs in a single process, with multiple
threads. It has been discussed many times in the past, last thread on
pgsql-hackers was back in 2017 when Konstantin made some experiments [0].
I feel that there is now pretty strong consensus that it would be a good
thing, more so than before. Lots of work to get there, and lots of
details to be hashed out, but no objections to the idea at a high level.
The purpose of this email is to make that silent consensus explicit. If
you have objections to switching from the current multi-process
architecture to a single-process, multi-threaded architecture, please
speak up.

For the record, I think this will be a disaster. There is far too much
code that will get broken, largely silently, and much of it is not
under our control.

If we were starting out today we would probably choose a threaded
implementation. But moving to threaded now seems to me like a
multi-year-multi-person project with the prospect of years to come
chasing bugs and the prospect of fairly modest advantages. The risk to
reward doesn't look great.

That's my initial reaction. I could be convinced otherwise.

I read through the thread thus far, and Andrew's response is the one
that best aligns with my reaction.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#20

Jonah H. Harris

jonah.harris@gmail.com

almost 3 years ago

In reply to: Tom Lane (#2)

Re: Let's make PostgreSQL multi-threaded

On Mon, Jun 5, 2023 at 8:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

For the record, I think this will be a disaster. There is far too much
code that will get broken, largely silently, and much of it is not
under our control.

While I've long been in favor of a multi-threaded implementation, now in my
old age, I tend to agree with Tom. I'd be interested in Konstantin's
thoughts (and PostgresPro's experience) on multi-threading vs. internal
pooling with the current process-based model. I recall looking at and
playing with Konstantin's implementations of both, which were impressive.
Yes, the latter doesn't solve all the same issues, but it does address many
of the real-world ones for which multi-threading is argued. Personally, I
think not only would a significant amount of time be spent dealing with
in-the-field stability regressions before a multi-threaded implementation
matures, but it would also steepen the learning curve for anyone trying to
start with internals development.

--
Jonah H. Harris

#21

Bruce Momjian

bruce@momjian.us

almost 3 years ago

In reply to: Heikki Linnakangas (#16)

#22

Peter Geoghegan

pg@bowt.ie

almost 3 years ago

In reply to: Bruce Momjian (#21)

#23

Bruce Momjian

bruce@momjian.us

almost 3 years ago

In reply to: Peter Geoghegan (#22)

#24

Jeremy Schneider

schneider@ardentperf.com

almost 3 years ago

In reply to: Jonah H. Harris (#20)

#25

Peter Geoghegan

pg@bowt.ie

almost 3 years ago

In reply to: Bruce Momjian (#23)

#26

Tatsuo Ishii

t-ishii@sra.co.jp

almost 3 years ago

In reply to: Andrew Dunstan (#17)

#27

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 3 years ago

In reply to: Jonah H. Harris (#20)

#28

Robert Haas

robertmhaas@gmail.com

almost 3 years ago

In reply to: Heikki Linnakangas (#1)

#29

Robert Haas

robertmhaas@gmail.com

almost 3 years ago

In reply to: Robert Haas (#28)

#30

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 3 years ago

In reply to: Robert Haas (#28)

#31

Chapman Flack

chap@anastigmatix.net

almost 3 years ago

In reply to: Konstantin Knizhnik (#27)

#32

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 3 years ago

In reply to: Chapman Flack (#31)

#33

Chapman Flack

chap@anastigmatix.net

almost 3 years ago

In reply to: Heikki Linnakangas (#32)

#34

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 3 years ago

In reply to: Robert Haas (#29)

#35

Robert Haas

robertmhaas@gmail.com

almost 3 years ago

In reply to: Heikki Linnakangas (#30)

#36

Kirk Wolak

wolakk@gmail.com

almost 3 years ago

In reply to: Robert Haas (#35)

#37

Robert Haas

robertmhaas@gmail.com

almost 3 years ago

In reply to: Kirk Wolak (#36)

#38

Bruce Momjian

bruce@momjian.us

almost 3 years ago

In reply to: Heikki Linnakangas (#1)

#39

Thomas Munro

thomas.munro@gmail.com

almost 3 years ago

In reply to: Andrew Dunstan (#17)

#40

Tom Lane

tgl@sss.pgh.pa.us

almost 3 years ago

In reply to: Thomas Munro (#39)

#41

Dilip Kumar

dilipbalaut@gmail.com

almost 3 years ago

In reply to: Robert Haas (#35)

#42

Dilip Kumar

dilipbalaut@gmail.com

almost 3 years ago

In reply to: Tom Lane (#40)

#43

Joe Conway

mail@joeconway.com

almost 3 years ago

In reply to: Tom Lane (#40)

#44

Robert Haas

robertmhaas@gmail.com

almost 3 years ago

In reply to: Tom Lane (#40)

#45

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 3 years ago

In reply to: Heikki Linnakangas (#1)

#46

Yura Sokolov

y.sokolov@postgrespro.ru

almost 3 years ago

In reply to: Robert Haas (#44)

#47

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 3 years ago

In reply to: Heikki Linnakangas (#4)

#48

Thomas Munro

thomas.munro@gmail.com

almost 3 years ago

In reply to: Tomas Vondra (#47)

#49

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Heikki Linnakangas (#1)

#50

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Jonathan S. Katz (#13)

#51

Peter Eisentraut

peter_e@gmx.net

almost 3 years ago

In reply to: Andres Freund (#49)

#52

Thomas Kellerer

shammat@gmx.net

almost 3 years ago

In reply to: Tomas Vondra (#47)

#53

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Bruce Momjian (#23)

#54

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Peter Eisentraut (#51)

#55

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Robert Haas (#44)

#56

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Bruce Momjian (#38)

#57

Jeremy Schneider

schneider@ardentperf.com

almost 3 years ago

In reply to: Thomas Kellerer (#52)

#58

Thomas Munro

thomas.munro@gmail.com

almost 3 years ago

In reply to: Jeremy Schneider (#57)

#59

Dilip Kumar

dilipbalaut@gmail.com

almost 3 years ago

In reply to: Andres Freund (#49)

#60

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

almost 3 years ago

In reply to: Jeremy Schneider (#57)

#61

Hannu Krosing

hannu@tm.ee

almost 3 years ago

In reply to: Andres Freund (#50)

#62

Hannu Krosing

hannu@tm.ee

almost 3 years ago

In reply to: Thomas Kellerer (#52)

#63

Hannu Krosing

hannu@tm.ee

almost 3 years ago

In reply to: Andres Freund (#56)

#64

Hannu Krosing

hannu@tm.ee

almost 3 years ago

In reply to: Hannu Krosing (#61)

#65

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 3 years ago

In reply to: Thomas Munro (#58)

#66

Andrew Dunstan

andrew@dunslane.net

almost 3 years ago

In reply to: Andres Freund (#55)

#67

José Luis Tallón

jltallon@adv-solutions.net

almost 3 years ago

In reply to: Andres Freund (#50)

#68

Matthias van de Meent

boekewurm+postgres@gmail.com

almost 3 years ago

In reply to: Hannu Krosing (#61)

#69

Hannu Krosing

hannu@tm.ee

almost 3 years ago

In reply to: Matthias van de Meent (#68)

#70

Robert Haas

robertmhaas@gmail.com

almost 3 years ago

In reply to: Hannu Krosing (#63)

#71

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 3 years ago

In reply to: Robert Haas (#44)

#72

Robert Haas

robertmhaas@gmail.com

almost 3 years ago

In reply to: Andres Freund (#49)

#73

Robert Haas

robertmhaas@gmail.com

almost 3 years ago

In reply to: Andres Freund (#50)

#74

Robert Haas

robertmhaas@gmail.com

almost 3 years ago

In reply to: Peter Eisentraut (#51)

#75

Robert Haas

robertmhaas@gmail.com

almost 3 years ago

In reply to: Andres Freund (#53)

#76

Bruce Momjian

bruce@momjian.us

almost 3 years ago

In reply to: Andres Freund (#56)

#77

Robert Haas

robertmhaas@gmail.com

almost 3 years ago

In reply to: Hannu Krosing (#69)

#78

Hannu Krosing

hannu@tm.ee

almost 3 years ago

In reply to: Robert Haas (#77)

#79

Matthias van de Meent

boekewurm+postgres@gmail.com

almost 3 years ago

In reply to: Hannu Krosing (#69)

#80

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Bruce Momjian (#76)

#81

Matthias van de Meent

boekewurm+postgres@gmail.com

almost 3 years ago

In reply to: Hannu Krosing (#78)

#82

Robert Haas

robertmhaas@gmail.com

almost 3 years ago

In reply to: Hannu Krosing (#78)

#83

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: José Luis Tallón (#67)

#84

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Hannu Krosing (#64)

#85

Greg Sabino Mullane

greg@turnstep.com

almost 3 years ago

In reply to: Hannu Krosing (#69)

#86

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Konstantin Knizhnik (#71)

#87

Ilya Anfimov

ilan@tzirechnoy.com

almost 3 years ago

In reply to: Thomas Munro (#39)

#88

Hannu Krosing

hannu@tm.ee

almost 3 years ago

In reply to: Tom Lane (#2)

#89

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Hannu Krosing (#78)

#90

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Matthias van de Meent (#81)

#91

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Robert Haas (#82)

#92

Thomas Munro

thomas.munro@gmail.com

almost 3 years ago

In reply to: Ilya Anfimov (#87)

#93

José Luis Tallón

jltallon@adv-solutions.net

almost 3 years ago

In reply to: Robert Haas (#72)

#94

Thomas Munro

thomas.munro@gmail.com

almost 3 years ago

In reply to: Andres Freund (#84)

#95

Dmitry Dolgov

9erthalion6@gmail.com

almost 3 years ago

In reply to: Heikki Linnakangas (#5)

#96

Dave Cramer

pg@fastcrypt.com

almost 3 years ago

In reply to: Hannu Krosing (#88)

#97

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Thomas Munro (#94)

#98

Stephan Doliov

stephan.doliov@gmail.com

almost 3 years ago

In reply to: Andres Freund (#97)

#99

Dave Cramer

pg@fastcrypt.com

almost 3 years ago

In reply to: Stephan Doliov (#98)

#100

Matthias van de Meent

boekewurm+postgres@gmail.com

almost 3 years ago

In reply to: Dave Cramer (#99)

#101

Stephen Frost

sfrost@snowman.net

almost 3 years ago

In reply to: Dave Cramer (#99)

#102

Bruce Momjian

bruce@momjian.us

almost 3 years ago

In reply to: Ashutosh Bapat (#45)

#103

Bruce Momjian

bruce@momjian.us

almost 3 years ago

In reply to: Thomas Munro (#58)

#104

Dave Cramer

pg@fastcrypt.com

almost 3 years ago

In reply to: Stephen Frost (#101)

#105

Hannu Krosing

hannu@tm.ee

almost 3 years ago

In reply to: Heikki Linnakangas (#1)

#106

James Addison

jay@jp-hosting.net

almost 3 years ago

In reply to: Heikki Linnakangas (#1)

#107

Dilip Kumar

dilipbalaut@gmail.com

almost 3 years ago

In reply to: Hannu Krosing (#105)

#108

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 3 years ago

In reply to: Dave Cramer (#104)

#109

Joel Jacobson

joel@compiler.org

almost 3 years ago

In reply to: Tomas Vondra (#108)

#110

Pavel Borisov

pashkin.elfe@gmail.com

almost 3 years ago

In reply to: Dilip Kumar (#107)

#111

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Pavel Borisov (#110)

#112

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 3 years ago

In reply to: Hannu Krosing (#105)

#113

Michael Paquier

michael@paquier.xyz

almost 3 years ago

In reply to: Andres Freund (#111)

#114

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 3 years ago

In reply to: Pavel Borisov (#110)

#115

Kyotaro Horiguchi

horikyota.ntt@gmail.com

almost 3 years ago

In reply to: Konstantin Knizhnik (#114)

#116

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 3 years ago

In reply to: Kyotaro Horiguchi (#115)

#117

Kyotaro Horiguchi

horikyota.ntt@gmail.com

almost 3 years ago

In reply to: Konstantin Knizhnik (#116)

#118

Andreas Karlsson

andreas.karlsson@percona.com

almost 3 years ago

In reply to: Konstantin Knizhnik (#116)

#119

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 3 years ago

In reply to: Kyotaro Horiguchi (#117)

#120

Kyotaro Horiguchi

horikyota.ntt@gmail.com

almost 3 years ago

In reply to: Konstantin Knizhnik (#119)

#121

Andreas Karlsson

andreas.karlsson@percona.com

almost 3 years ago

In reply to: Kyotaro Horiguchi (#120)

#122

James Addison

jay@jp-hosting.net

almost 3 years ago

In reply to: Andres Freund (#111)

#123

Hannu Krosing

hannu@tm.ee

almost 3 years ago

In reply to: Kyotaro Horiguchi (#115)

#124

Robert Haas

robertmhaas@gmail.com

almost 3 years ago

In reply to: James Addison (#122)

#125

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Kyotaro Horiguchi (#115)

#126

Robert Haas

robertmhaas@gmail.com

almost 3 years ago

In reply to: Hannu Krosing (#123)

#127

James Addison

jay@jp-hosting.net

almost 3 years ago

In reply to: Robert Haas (#124)

#128

James Addison

jay@jp-hosting.net

almost 3 years ago

In reply to: Konstantin Knizhnik (#114)

#129

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 3 years ago

In reply to: James Addison (#128)

#130

James Addison

jay@jp-hosting.net

almost 3 years ago

In reply to: Konstantin Knizhnik (#129)

#131

Hannu Krosing

hannu@tm.ee

almost 3 years ago

In reply to: Konstantin Knizhnik (#129)

#132

Hannu Krosing

hannu@tm.ee

almost 3 years ago

In reply to: James Addison (#130)

#133

Hannu Krosing

hannu@tm.ee

almost 3 years ago

In reply to: Hannu Krosing (#132)

#134

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 3 years ago

In reply to: James Addison (#130)

#135

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 3 years ago

In reply to: Hannu Krosing (#132)

#136

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 3 years ago

In reply to: Bruce Momjian (#102)

#137

David Geier

geidav.pg@gmail.com

almost 3 years ago

In reply to: Andres Freund (#50)

#138

Matthias van de Meent

boekewurm+postgres@gmail.com

almost 3 years ago

In reply to: Hannu Krosing (#133)

#139

Merlin Moncure

mmoncure@gmail.com

almost 3 years ago

In reply to: David Geier (#137)

#140

Mark Woodward

woodwardm@google.com

almost 3 years ago

In reply to: Heikki Linnakangas (#112)

#141

David Geier

geidav.pg@gmail.com

almost 3 years ago

In reply to: Merlin Moncure (#139)

#142

Stephen Frost

sfrost@snowman.net

almost 3 years ago

In reply to: David Geier (#141)