getting rid of "thread fork emulation" in pgbench?

Started by Fabien COELHOalmost 11 years ago8 messages
#1Fabien COELHO
coelho@cri.ensmp.fr

Hello hackers,

This is take 2, I already suggested this 2 years ago...

There is a "thread fork emulation" hack which allows pgbench to be
compiled without threads by emulating them with forks, obviously with
limited capabilities. This feature is triggered by configuring postgresql
with --disable-thread-safety.

I'm not aware of platforms which would still benefit from this feature (we
are probably talking of a PostgreSQL 9.6 maybe released in 2016), and this
thread emulation is a real pain for maintaining and extending it: some
features cannot be implemented, or must be implemented twice, or must be
implemented in a heavy way because it cannot be assumed to be in a
threaded implementation.

My question is: would someone still object strongly to the removal of this
"thread fork emulation" hack in pgbench?

That would mean arguing that it is more important than a simpler code base
for pgbench, which would help both its maintenance and extension, and must
be kept despite these costs, hence the "strongly".

Tom's opinion, 2 years ago, was to object strongly:
/messages/by-id/24846.1372352618@sss.pgh.pa.us

"I would object strongly to that, as it would represent a significant
movement of the goalposts on what is required to build Postgres at all, ie
platforms on which --enable-thread-safety is unavailable or expensive
would be out in the cold. Perhaps that set is approaching empty, but a
project that's still standardized on C89 has little business making such a
choice IMO."

However, about not providing a full feature under thread emulation:

"Perhaps that's worth doing. I agree with Fabien that full support of
this feature in the process model is more trouble than it's worth,
though, and I wouldn't scream loudly if we just didn't support it.
--disable-thread-safety doesn't have to be entirely penalty-free."

Robert Haas did also object strongly:
/messages/by-id/CA+TgmoaJawAM0A4N1RnsndFhTOYYbwnt+7N7XWLcW=n-uDC79Q@mail.gmail.com

"I don't believe that to be an acceptable restriction. We generally
require features to work on all platforms we support. We have made
occasional compromises there, but generally only when the restriction is
fundamental to the platform rather than for developer convenience."

So the underlying question is, does Tom and Robert's opinion have changed
from 2 years ago?

If not, I would really like to know at least *one* current platform for
which --disable-thread-safety is required, just to know why I waste time
over such things.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fabien COELHO (#1)
Re: getting rid of "thread fork emulation" in pgbench?

Fabien COELHO <coelho@cri.ensmp.fr> writes:

There is a "thread fork emulation" hack which allows pgbench to be
compiled without threads by emulating them with forks, obviously with
limited capabilities. This feature is triggered by configuring postgresql
with --disable-thread-safety.

I'm not aware of platforms which would still benefit from this feature (we
are probably talking of a PostgreSQL 9.6 maybe released in 2016), and this
thread emulation is a real pain for maintaining and extending it: some
features cannot be implemented, or must be implemented twice, or must be
implemented in a heavy way because it cannot be assumed to be in a
threaded implementation.

My question is: would someone still object strongly to the removal of this
"thread fork emulation" hack in pgbench?

By my count, the pthread-emulation code amounts to about 175 lines out of
the nearly 4000 in pgbench.c. That's not an undue burden IMO (it's not,
for comparison's sake, very much more than the amount of WIN32-specific
code in that file). And what will you do instead? It would be fine
I think for pgbench to not allow --jobs different from 1 on a threadless
platform, but not for it to fail to work altogether. It looks to me like
allowing it to compile without that code would take nearly as much
effort/mess as what's there now.

So the underlying question is, does Tom and Robert's opinion have changed
from 2 years ago?

Not mine.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Tom Lane (#2)
Re: getting rid of "thread fork emulation" in pgbench?

Hello Tom,

My question is: would someone still object strongly to the removal of this
"thread fork emulation" hack in pgbench?

By my count, the pthread-emulation code amounts to about 175 lines out of
the nearly 4000 in pgbench.c. That's not an undue burden IMO (it's not,
for comparison's sake, very much more than the amount of WIN32-specific
code in that file).

From my point of view, the burden is how things can be implemented if you
have to assume "no threads".

In the discussion at hand which motivates this renewed request, the
question is whether external merge sorts implying rescanning and
reprinting files over and over should be implemented in pgbench so as to
recombine log files.

Now if we have threads, it is simpler (if the overhead is reasonable) that
threads share a file handle and just generate one file, so there is no
merging. That is not possible with processes at least for the aggregated
logs because the thread data must be combined, or would require that
pgbench use shared memory and so on, but then it is really a lot of bother
for a limited feature.

The alternative is that the feature would not be available if no thread,
but then people object to that, on principle.

And what will you do instead? It would be fine I think for pgbench to
not allow --jobs different from 1 on a threadless platform, but not for
it to fail to work altogether.

Sure. No thread really means working with only one thread.

It looks to me like allowing it to compile without that code would take
nearly as much effort/mess as what's there now.

My motivation is to simplify how things are done by simply assuming that
threads are available and can share data, esp for counters.

So the underlying question is, does Tom and Robert's opinion have changed
from 2 years ago?

Not mine.

Ok! I'll ask again in 2017, and probably again in 2019:-)

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fabien COELHO (#3)
Re: getting rid of "thread fork emulation" in pgbench?

Fabien COELHO <coelho@cri.ensmp.fr> writes:

And what will you do instead? It would be fine I think for pgbench to
not allow --jobs different from 1 on a threadless platform, but not for
it to fail to work altogether.

Sure. No thread really means working with only one thread.

It looks to me like allowing it to compile without that code would take
nearly as much effort/mess as what's there now.

My motivation is to simplify how things are done by simply assuming that
threads are available and can share data, esp for counters.

pgbench does not get to "assume that threads are available", at least
not as long as the project as a whole supports --disable-thread-safety.

As I said, I wouldn't have a problem with restricting the --jobs setting
on threadless platforms, which seems like it would fix your problem since
you wouldn't need to have more than one process involved.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Tom Lane (#4)
Re: getting rid of "thread fork emulation" in pgbench?

As I said, I wouldn't have a problem with restricting the --jobs setting
on threadless platforms, which seems like it would fix your problem since
you wouldn't need to have more than one process involved.

Indeed!

So I undestand that "thread fork emulation" could eventually go from your
point of view, provided that pgbench compiles and runs with one job.

I'll try to submit a patch to that effect, maybe to June CF.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Robert Haas
robertmhaas@gmail.com
In reply to: Fabien COELHO (#3)
Re: getting rid of "thread fork emulation" in pgbench?

On Sun, Mar 29, 2015 at 1:49 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Now if we have threads, it is simpler (if the overhead is reasonable) that
threads share a file handle and just generate one file, so there is no
merging.

I really, really wish you'd stop arguing against the patch to allow
merging of pgbench logs in this basis. There may or may not be other
reasons to reject that patch, but arguing that we can or should reject
it because of the hypothetical possibility that sharing a file handle
will work out just isn't right. People who work hard to develop a
patch that does something useful don't deserve to have it kicked to
the curb because someone can imagine a development path that would
eventually obsolete it. That patch deserves to be evaluated on its
own merits.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Robert Haas (#6)
Re: getting rid of "thread fork emulation" in pgbench?

Hello Robert,

I really, really wish you'd stop arguing against the patch to allow
merging of pgbench logs in this basis.

Hmmm. I'm lost. I thought that discussing how to best implement a feature
was part of reviewing a patch.

There may or may not be other reasons to reject that patch, but arguing
that we can or should reject it because of the hypothetical possibility
that sharing a file handle will work out just isn't right.

I must say that I do not understand why discussing options, and actually
implementing and testing them for hours to see the performance impact, is
bad, but it will save time next time.

As for pgbench thread forlk emulation induced complexity, I already paid
for some stuff I contributed to pgbench, so I feel I'm entitled to rant
about it:-)

People who work hard to develop a patch

I know. I also spent quite some time on testing alternate implementations
on this one.

that does something useful don't deserve to have it kicked to the curb
because someone can imagine a development path that would eventually
obsolete it. That patch deserves to be evaluated on its own merits.

Sure. I did that in the first post. The code is heavy and awkward, include
copy-pastes, the merge sorts (there are 2 of thems) complexity are bad,
the different log formats are hard coded and must be updated in several
place if modified or extended, ...

IMO this would be best outside of pgbench in a script, but this means that
the script must be given indication about the file format which depends on
the options used to run pgbench, which makes it awkward as well.

So the current status is between "rejected" and "returned with feedback",
make your pick.

Now I think that the feature provides value and I am trying to help
salvage it. I can do other things.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Robert Haas
robertmhaas@gmail.com
In reply to: Fabien COELHO (#7)
Re: getting rid of "thread fork emulation" in pgbench?

On Mon, Mar 30, 2015 at 2:00 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

I really, really wish you'd stop arguing against the patch to allow
merging of pgbench logs in this basis.

Hmmm. I'm lost. I thought that discussing how to best implement a feature
was part of reviewing a patch.

Of course it is. But the feature you are talking about and the
feature Tomas is talking about are two different features.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers