Death by regexp_replace

Started by Benedikt Grundmannalmost 10 years ago13 messages
#1Benedikt Grundmann
bgrundmann@janestreet.com

Today we discovered that we had a backend whose client had gone away, the
automatic query watching process had send both pg_cancel and
pg_terminate_backend but nevertheless the process was sitting there
consuming resources and had been for over 1 day...

gdb revealed that we were sitting in pg_regexec (we forced it to return 16
aka invalid regex to return our system into a good state).

Here is the regular expression and the text to run on:

*WARNING DO NOT DO THIS ON A PRODUCTION BOX*

select regexp_replace('VODI GR,VOD LN,VOD LN,VODN MM,VODPF US,VOD US,VZC
LN', '([^,]+)(,*\1)+', '\1');

This was in postgres 9.2

Cheers,

Bene

#2Robert Haas
robertmhaas@gmail.com
In reply to: Benedikt Grundmann (#1)
Re: Death by regexp_replace

On Fri, Jan 15, 2016 at 10:12 AM, Benedikt Grundmann
<bgrundmann@janestreet.com> wrote:

Today we discovered that we had a backend whose client had gone away, the
automatic query watching process had send both pg_cancel and
pg_terminate_backend but nevertheless the process was sitting there
consuming resources and had been for over 1 day...

gdb revealed that we were sitting in pg_regexec (we forced it to return 16
aka invalid regex to return our system into a good state).

Here is the regular expression and the text to run on:

WARNING DO NOT DO THIS ON A PRODUCTION BOX

select regexp_replace('VODI GR,VOD LN,VOD LN,VODN MM,VODPF US,VOD US,VZC
LN', '([^,]+)(,*\1)+', '\1');

This was in postgres 9.2

9.2.what? Tom just fixed a whole bunch of bugs in this area, so if
you're running less than 9.2.14, please test whether this can be
reproduced with that version.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Jan de Visser
jan@de-visser.net
In reply to: Robert Haas (#2)
Re: Death by regexp_replace

On 2016-01-15 10:25 AM, Robert Haas wrote:

On Fri, Jan 15, 2016 at 10:12 AM, Benedikt Grundmann
<bgrundmann@janestreet.com> wrote:

Today we discovered that we had a backend whose client had gone away, the
automatic query watching process had send both pg_cancel and
pg_terminate_backend but nevertheless the process was sitting there
consuming resources and had been for over 1 day...

gdb revealed that we were sitting in pg_regexec (we forced it to return 16
aka invalid regex to return our system into a good state).

Here is the regular expression and the text to run on:

WARNING DO NOT DO THIS ON A PRODUCTION BOX

select regexp_replace('VODI GR,VOD LN,VOD LN,VODN MM,VODPF US,VOD US,VZC
LN', '([^,]+)(,*\1)+', '\1');

This was in postgres 9.2

9.2.what? Tom just fixed a whole bunch of bugs in this area, so if
you're running less than 9.2.14, please test whether this can be
reproduced with that version.

I just tried this on 9.4.5 (stock Ubuntu 15.10 release), waited a minute
and killed the backend.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Benedikt Grundmann (#1)
Re: Death by regexp_replace

Benedikt Grundmann <bgrundmann@janestreet.com> wrote:

Today we discovered that we had a backend whose client had gone away, the
automatic query watching process had send both pg_cancel and
pg_terminate_backend but nevertheless the process was sitting there
consuming resources and had been for over 1 day...
gdb revealed that we were sitting in pg_regexec (we forced it to return 16
aka invalid regex to return our system into a good state).
Here is the regular expression and the text to run on:
*WARNING DO NOT DO THIS ON A PRODUCTION BOX*
select regexp_replace('VODI GR,VOD LN,VOD LN,VODN MM,VODPF US,VOD US,VZC
LN', '([^,]+)(,*\1)+', '\1');

This was in postgres 9.2

9.2 what? This responds to cancel just fine for me. See 9.2.14
release notes.

(FWIW, I think you probably wanted ,+ not ,* in the regex, else there's
practically no constraint there, leading to having to consider O(N^2)
or more possibilities.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Kevin Grittner
kgrittn@gmail.com
In reply to: Tom Lane (#4)
Re: Death by regexp_replace

On Fri, Jan 15, 2016 at 9:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

*WARNING DO NOT DO THIS ON A PRODUCTION BOX*
select regexp_replace('VODI GR,VOD LN,VOD LN,VODN MM,VODPF US,VOD US,VZC
LN', '([^,]+)(,*\1)+', '\1');

This responds to cancel just fine for me.

(FWIW, I think you probably wanted ,+ not ,* in the regex, else there's
practically no constraint there, leading to having to consider O(N^2)
or more possibilities.)

On master (commit cf7dfbf2) it responds to pg_cancel_backend(),
but it seems to be in an endless loop until you do that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Benedikt Grundmann
bgrundmann@janestreet.com
In reply to: Kevin Grittner (#5)
Re: Death by regexp_replace

9.2.6

On Fri, Jan 15, 2016 at 3:48 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

Show quoted text

On Fri, Jan 15, 2016 at 9:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

*WARNING DO NOT DO THIS ON A PRODUCTION BOX*
select regexp_replace('VODI GR,VOD LN,VOD LN,VODN MM,VODPF US,VOD US,VZC
LN', '([^,]+)(,*\1)+', '\1');

This responds to cancel just fine for me.

(FWIW, I think you probably wanted ,+ not ,* in the regex, else there's
practically no constraint there, leading to having to consider O(N^2)
or more possibilities.)

On master (commit cf7dfbf2) it responds to pg_cancel_backend(),
but it seems to be in an endless loop until you do that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#5)
Re: Death by regexp_replace

Kevin Grittner <kgrittn@gmail.com> writes:

On Fri, Jan 15, 2016 at 9:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

(FWIW, I think you probably wanted ,+ not ,* in the regex, else there's
practically no constraint there, leading to having to consider O(N^2)
or more possibilities.)

On master (commit cf7dfbf2) it responds to pg_cancel_backend(),
but it seems to be in an endless loop until you do that.

A bit of further experimentation suggests the runtime growth is actually
more like O(2^N). It will terminate in a reasonable amount of time if the
input string is about half as long as the given example.

The problem is that so far as the DFA engine is concerned, the pattern
substring '(,*\1)+' can match almost anything at all, because it's
equivalent to '(,*[^,]+)+' which is easily seen to match any string
whatever that's got at least one non-comma. So, for each possible match
to the substring '([^,]+)', of which there are lots, it has to consider
every possible way of breaking up all the rest of the string into one or
more substrings. The vast majority of those ways will fail when the
backref match is checked, but there's no way to realize it before that.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Benedikt Grundmann
bgrundmann@janestreet.com
In reply to: Tom Lane (#7)
Re: Death by regexp_replace

On Fri, Jan 15, 2016 at 4:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Kevin Grittner <kgrittn@gmail.com> writes:

On Fri, Jan 15, 2016 at 9:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

(FWIW, I think you probably wanted ,+ not ,* in the regex, else there's
practically no constraint there, leading to having to consider O(N^2)
or more possibilities.)

On master (commit cf7dfbf2) it responds to pg_cancel_backend(),
but it seems to be in an endless loop until you do that.

A bit of further experimentation suggests the runtime growth is actually
more like O(2^N). It will terminate in a reasonable amount of time if the
input string is about half as long as the given example.

The problem is that so far as the DFA engine is concerned, the pattern
substring '(,*\1)+' can match almost anything at all, because it's
equivalent to '(,*[^,]+)+' which is easily seen to match any string
whatever that's got at least one non-comma. So, for each possible match
to the substring '([^,]+)', of which there are lots, it has to consider
every possible way of breaking up all the rest of the string into one or
more substrings. The vast majority of those ways will fail when the
backref match is checked, but there's no way to realize it before that.

To be clear I'm perfectly happy with that query taking forever (I didn't

write it ;-)). The only thing I was unhappy about was that
pg_cancel/terminate_backend didn't work. If that is fixed great.

Show quoted text

regards, tom lane

#9Benedikt Grundmann
bgrundmann@janestreet.com
In reply to: Benedikt Grundmann (#8)
Re: Death by regexp_replace

On Fri, Jan 15, 2016 at 4:39 PM, Benedikt Grundmann <
bgrundmann@janestreet.com> wrote:

On Fri, Jan 15, 2016 at 4:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Kevin Grittner <kgrittn@gmail.com> writes:

On Fri, Jan 15, 2016 at 9:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

(FWIW, I think you probably wanted ,+ not ,* in the regex, else there's
practically no constraint there, leading to having to consider O(N^2)
or more possibilities.)

On master (commit cf7dfbf2) it responds to pg_cancel_backend(),
but it seems to be in an endless loop until you do that.

A bit of further experimentation suggests the runtime growth is actually
more like O(2^N). It will terminate in a reasonable amount of time if the
input string is about half as long as the given example.

The problem is that so far as the DFA engine is concerned, the pattern
substring '(,*\1)+' can match almost anything at all, because it's
equivalent to '(,*[^,]+)+' which is easily seen to match any string
whatever that's got at least one non-comma. So, for each possible match
to the substring '([^,]+)', of which there are lots, it has to consider
every possible way of breaking up all the rest of the string into one or
more substrings. The vast majority of those ways will fail when the
backref match is checked, but there's no way to realize it before that.

To be clear I'm perfectly happy with that query taking forever (I didn't

write it ;-)). The only thing I was unhappy about was that
pg_cancel/terminate_backend didn't work. If that is fixed great.

regards, tom lane

Hmm I just wanted to get the rpm for the latest 9.2 release for centos6 but
it looks like you haven't released at least the link on this page for 9.2

http://yum.postgresql.org/repopackages.php

says 7 in the filename which is certainly not 14 ;-)

http://yum.postgresql.org/9.2/redhat/rhel-6-x86_64/pgdg-centos92-9.2-7.noarch.rpm

Is that expected?

Thanks,

Bene

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Benedikt Grundmann (#9)
package links on website (was Re: [HACKERS] Death by regexp_replace)

Benedikt Grundmann <bgrundmann@janestreet.com> wrote:

Hmm I just wanted to get the rpm for the latest 9.2 release for centos6 but
it looks like you haven't released at least the link on this page for 9.2

http://yum.postgresql.org/repopackages.php

says 7 in the filename which is certainly not 14 ;-)

http://yum.postgresql.org/9.2/redhat/rhel-6-x86_64/pgdg-centos92-9.2-7.noarch.rpm

Is that expected?

I see up-to-date-looking RPMs in

http://yum.postgresql.org/9.2/redhat/rhel-6-x86_64/

but I agree that the repopackages page doesn't seem to be up to date.

regards, tom lane

--
Sent via pgsql-www mailing list (pgsql-www@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-www

#11Robert Haas
robertmhaas@gmail.com
In reply to: Benedikt Grundmann (#9)
Re: Death by regexp_replace

Hmm I just wanted to get the rpm for the latest 9.2 release for centos6 but
it looks like you haven't released at least the link on this page for 9.2

http://yum.postgresql.org/repopackages.php

says 7 in the filename which is certainly not 14 ;-)

http://yum.postgresql.org/9.2/redhat/rhel-6-x86_64/pgdg-centos92-9.2-7.noarch.rpm

Is that expected?

Adding Devrim, who I believe maintains that stuff.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Devrim Gündüz
devrim@gunduz.org
In reply to: Robert Haas (#11)
Re: Death by regexp_replace

Hi,

That is the version of *repo* RPM, not PostgreSQL itself.Once you install it, you can grab the latest version with

yum install postgresql92-server

Regards, Devrim

On January 15, 2016 7:48:53 PM GMT+02:00, Robert Haas <robertmhaas@gmail.com> wrote:

Hmm I just wanted to get the rpm for the latest 9.2 release for

centos6 but

it looks like you haven't released at least the link on this page for

9.2

http://yum.postgresql.org/repopackages.php

says 7 in the filename which is certainly not 14 ;-)

http://yum.postgresql.org/9.2/redhat/rhel-6-x86_64/pgdg-centos92-9.2-7.noarch.rpm

Is that expected?

Adding Devrim, who I believe maintains that stuff.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#13Benedikt Grundmann
bgrundmann@janestreet.com
In reply to: Devrim Gündüz (#12)
Re: Death by regexp_replace

thanks

On Fri, Jan 15, 2016 at 7:22 PM, Devrim Gündüz <devrim@gunduz.org> wrote:

Show quoted text

Hi,

That is the version of *repo* RPM, not PostgreSQL itself.Once you install
it, you can grab the latest version with

yum install postgresql92-server

Regards, Devrim

On January 15, 2016 7:48:53 PM GMT+02:00, Robert Haas <
robertmhaas@gmail.com> wrote:

Hmm I just wanted to get the rpm for the latest 9.2 release for centos6 but

it looks like you haven't released at least the link on this page for 9.2

http://yum.postgresql.org/repopackages.php

says 7 in the filename which is certainly not 14 ;-)

http://yum.postgresql.org/9.2/redhat/rhel-6-x86_64/pgdg-centos92-9.2-7.noarch.rpm

Is that expected?

Adding Devrim, who I believe maintains that stuff.

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.