OS scheduler bugs affecting high-concurrency contention

Started by Kevin Grittner over 9 years ago, 2 messages

kgrittn@gmail.com

over 9 years ago

There is a paper that anyone interested in performance at high
concurrency, especially in Linux, should read[1]. While doing
other work, a group of researchers saw behavior that they suspected
was due to scheduler bugs in Linux. No existing tools made proving
that practical, so they developed such a tool set and used it to
find four bugs in the Linux kernel which were introduced in the
releases below, have not yet been fixed, and have the following
maximum impact when running the NAS benchmarks[2], based on running
with and without the researchers' fixes for the bugs:

2.6.32: 22%
2.6.38: 13x
3.9: 27x
3.19: 138x

That's right -- one of these OS scheduler bugs in production
versions of Linux can make one of NASA's benchmarks run for 138
times as long as it does without the bug. I don't feel that I can
interpret the results of any high-concurrency benchmarks in a
meaningful way without knowing which of these bugs were present in
the OS used for the benchmark. Just as an example, it is helpful
to know that the benchmarks Andres presented were run on 3.16, so
it would have three of these OS bugs affecting results, but not the
most severe one. I encourage you to read the paper and draw your
own conclusions.
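The version reasoning above (3.16 carries three of the four bugs, but not the 3.19 one) can be sketched as a small Python helper. This is a hypothetical illustration, not a tool from the paper: `SCHEDULER_BUGS`, `parse_version` and `bugs_present` are made-up names, and the sketch assumes, as stated above, that none of the bugs has been fixed in any later release. It ignores distribution kernels that may carry backported fixes.

```python
import re

# Release in which each scheduler bug was introduced, and the maximum
# impact reported for it, as listed above. Assumption: no later mainline
# release and no distribution backport has fixed any of them.
SCHEDULER_BUGS = [
    ("2.6.32", "22%"),
    ("2.6.38", "13x"),
    ("3.9", "27x"),
    ("3.19", "138x"),
]


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '3.16' or a uname-style '3.16.0-4-amd64' into a comparable tuple."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a kernel version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def bugs_present(kernel: str) -> list[tuple[str, str]]:
    """Return the bugs introduced at or before the given kernel release."""
    running = parse_version(kernel)
    return [
        (introduced, impact)
        for introduced, impact in SCHEDULER_BUGS
        if parse_version(introduced) <= running
    ]


if __name__ == "__main__":
    # The kernel used for the benchmarks mentioned above.
    for introduced, impact in bugs_present("3.16"):
        print(f"bug introduced in {introduced}: up to {impact}")
```

Comparing tuples of integers rather than strings avoids the trap where "3.9" would sort after "3.19" lexically.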

Anyway, please don't confuse this thread with the one on the
"snapshot too old" patch -- I am still working on that and will
post results there when they are ready.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1]: Jean-Pierre Lozi, Baptiste Lepers, Justin Funston, Fabien Gaud, Vivien Quéma, Alexandra Fedorova. The Linux Scheduler: a Decade of Wasted Cores. In Proceedings of the 11th European Conference on Computer Systems, EuroSys’16. April, 2016, London, UK. http://www.ece.ubc.ca/~sasha/papers/eurosys16-final29.pdf

[2]: NAS Parallel Benchmarks. http://www.nas.nasa.gov/publications/npb.html

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andrea Suisani

sickpig@opinioni.net

over 9 years ago

In reply to: Kevin Grittner (#1)

Re: OS scheduler bugs affecting high-concurrency contention

Hi,

On 04/16/2016 04:15 PM, Kevin Grittner wrote:

> There is a paper that anyone interested in performance at high
> concurrency, especially in Linux, should read[1]. While doing
> other work, a group of researchers saw behavior that they suspected
> was due to scheduler bugs in Linux. No existing tools made proving
> that practical, so they developed such a tool set and used it to
> find four bugs in the Linux kernel which were introduced in the
> releases below, have not yet been fixed, and have the following
> maximum impact when running the NAS benchmarks, based on running
> with and without the researchers' fixes for the bugs:
>
> 2.6.32: 22%
> 2.6.38: 13x
> 3.9: 27x
> 3.19: 138x
>
> That's right -- one of these OS scheduler bugs in production
> versions of Linux can make one of NASA's benchmarks run for 138
> times as long as it does without the bug. I don't feel that I can
> interpret the results of any high-concurrency benchmarks in a
> meaningful way without knowing which of these bugs were present in
> the OS used for the benchmark. Just as an example, it is helpful
> to know that the benchmarks Andres presented were run on 3.16, so
> it would have three of these OS bugs affecting results, but not the
> most severe one. I encourage you to read the paper and draw your
> own conclusions.
>
> Anyway, please don't confuse this thread with the one on the
> "snapshot too old" patch -- I am still working on that and will
> post results there when they are ready.

Thanks for the link, appreciated.

On a slightly related topic, Jens Axboe has proposed a patchset [1]
to improve the performance of background buffered writeback.

LWN.net has recently published an article about the issue at hand [2].

Maybe this work could at least partly solve the problem PostgreSQL users
experience while the checkpoint process flushes all pending changes to disk
and recycles the transaction logs.

--
Andrea Suisani
suisani@opinioni.net
Demetra opinioni.net srl

[1]: "[PATCHSET v3][RFC] Make background writeback not suck" http://thread.gmane.org/gmane.linux.kernel/2186732

[2]: "Toward less-annoying background writeback" https://lwn.net/SubscriberLink/682582/93d9e5b6bed03a32/
