regexp_replace to remove sql comments

Started by Mikeover 10 years ago4 messagesgeneral
Jump to latest
#1Mike
mike@wolman.co.uk

Hi,

I am trying to clean up the query field returned by the
pg_stat_statements extension and remove all comments.

Some of the queries in the query field contain comments like '-- some
comment' and also '/* c style comments */'

I have managed to strip off the '--' comments and also white space but
after trying numerous regex for this via google but I am stuck.

WITH to_clean AS (
SELECT
regexp_replace(
regexp_replace(trim(query), '--[^\r\n]*', '') --clear up
comments like this one <-- this is ok
, '\s+', ' ', 'g') as q --clear up white space <-- this is ok
FROM public.pg_stat_statements
WHERE dbid IN (SELECT oid FROM pg_database WHERE datname =
current_database())
)

SELECT regexp_replace(q,'/\*.*\*/','') as q /* strip off comments like
this */ <-- cannot get a regex to do this
FROM to_clean ORDER BY q

Im now thinking it may be better to do in a pgsql function as I think if
the comments are in queries then they need to be ignored.

Has anyone done anything like this?

Thanks,

Mike.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Marc Mamin
M.Mamin@intershop.de
In reply to: Mike (#1)
Re: regexp_replace to remove sql comments

________________________________________
Von: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org]&quot; im Auftrag von &quot;Mike [mike@wolman.co.uk]
Gesendet: Mittwoch, 28. Oktober 2015 20:04
An: pgsql-general@postgresql.org
Betreff: [GENERAL] regexp_replace to remove sql comments

Hi,

I am trying to clean up the query field returned by the
pg_stat_statements extension and remove all comments.

Some of the queries in the query field contain comments like '-- some
comment' and also '/* c style comments */'

I have managed to strip off the '--' comments and also white space but
after trying numerous regex for this via google but I am stuck.

WITH to_clean AS (
SELECT
regexp_replace(
regexp_replace(trim(query), '--[^\r\n]*', '') --clear up
comments like this one <-- this is ok
, '\s+', ' ', 'g') as q --clear up white space <-- this is ok
FROM public.pg_stat_statements
WHERE dbid IN (SELECT oid FROM pg_database WHERE datname =
current_database())
)

SELECT regexp_replace(q,'/\*.*\*/','') as q /* strip off comments like
this */ <-- cannot get a regex to do this
FROM to_clean ORDER BY q

Hi,
Does this help ?

select regexp_replace(' aaa /*
x
y
z
*/ foo', '\/\*.+\*\/','','g'):

regards,

Marc Mamin

Im now thinking it may be better to do in a pgsql function as I think if
the comments are in queries then they need to be ignored.

Has anyone done anything like this?

Thanks,

Mike.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Mike
mike@wolman.co.uk
In reply to: Marc Mamin (#2)
Re: regexp_replace to remove sql comments

Thanks with a bit of moving stuff about I think thats sorted it - in
case anyone every needs it:

SELECT
query,
trim(regexp_replace(
regexp_replace(
regexp_replace(query,'\/\*.+\*\/','','g'),
'--[^\r\n]*', ' ', 'g')
, '\s+', ' ', 'g')) as q
FROM public.pg_stat_statements
WHERE dbid IN (SELECT oid FROM pg_database WHERE datname =
current_database())

order by query

Thanks again,

Mike.

On 28/10/2015 22:43, Marc Mamin wrote:

', '\/\*.+\*\/','','g'):

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mike (#3)
Re: regexp_replace to remove sql comments

Mike <mike@wolman.co.uk> writes:

Thanks with a bit of moving stuff about I think thats sorted it - in
case anyone every needs it:
SELECT
query,
trim(regexp_replace(
regexp_replace(
regexp_replace(query,'\/\*.+\*\/','','g'),
'--[^\r\n]*', ' ', 'g')
, '\s+', ' ', 'g')) as q
FROM public.pg_stat_statements
WHERE dbid IN (SELECT oid FROM pg_database WHERE datname =
current_database())

This doesn't look too reliable from here:

1. Doesn't handle multiline /* comments.

2. Does wrong thing if more than one /* comment appears on one line.
(You could improve that by using .*? instead of .+, but then it'd
do the wrong thing with nested /* comments.)

3. Breaks things if either -- or /* appear inside a string literal,
double-quoted identifier, or $$ literal.

I'm not at all sure that it's possible to handle this requirement 100%
correctly with regexes; they're unable to do context-sensitive processing.

But so far as pg_stat_statements is concerned, why would you need to
do this at all? The duplicate-query elimination it does should be
insensitive to comments already.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general