\dn [PATTERN] handling not quite right...

Started by Sean Chittendenabout 22 years ago3 messagesbugs

sean@chittenden.org

about 22 years ago

I haven't looked in great detail into why this is happpening, but it
seems as though processNamePattern() doesn't handle ?'s correctly in
the negative lookahead context correctly.

1) \dn [pattern] strips ?'s and replaces them with periods. This may
be intentional (as the comment in describe.c suggests, converting input
from shell-style wildcards gets converted into regexp notation), but is
quite annoying. Ex:

test=# \dn foo(?!_log|_shadow)
********* QUERY **********
SELECT n.nspname AS "Name",
u.usename AS "Owner"
FROM pg_catalog.pg_namespace n LEFT JOIN pg_catalog.pg_user u
ON n.nspowner=u.usesysid
WHERE (n.nspname NOT LIKE 'pg\\_temp\\_%' OR
n.nspname = (pg_catalog.current_schemas(true))[1])
AND n.nspname ~ '^foo(.!_log|_shadow)$'
ORDER BY 1;
**************************

Which is incorrect, IMHO. Instead the last bit of the query should be:

AND n.nspname ~ '^foo(?!_log|_shadow)$'

2) This brings up a large deficiency with the way that \d? [pattern]
handling is done in psql(1). It'd be slick if there was a way to have
psql's pattern routine look at the first non-whitespace character or
two to change change the structure of the query. Something like \dn
!.*_shadow% would change the RE operator from ~ to !~ and \dn %bar%
would translate to LIKE('bar%'). Doing the regexp equiv of
!LIKE('%_shadow') isn't trivial because '^.*(?!_shadow)$' doesn't
return the expected result for various reasons. Oh! This'd be a "gun
pointed at foot" feature, but having the first character being an =
would, without escaping, drop the remainder of the input directly into
the query (ex: \dn =nspname != (LIKE('%_log') OR LIKE('%_shadow'))).
Maybe a psql(1) variable that changes the behavior of the pattern
queries from using an RE to a LIKE statement could also be a
possibility. The more I think about this, a leading pipe could be used
to pipe the output to a utility, so that \dn | egrep -v '(log|shadow)
would work and would be the easiest solution.

Maybe a better "bug report" would be, what's the suggested way of doing:

n.nspname !~ '_(log|shadow)$'?

from a list pattern?

-sc

--
Sean Chittenden

Tom Lane

tgl@sss.pgh.pa.us

about 22 years ago

In reply to: Sean Chittenden (#1)

Re: \dn [PATTERN] handling not quite right...

Sean Chittenden <sean@chittenden.org> writes:

I haven't looked in great detail into why this is happpening, but it
seems as though processNamePattern() doesn't handle ?'s correctly in
the negative lookahead context correctly.

Negative lookahead context!? You are several sigmas beyond the subset
of regex functionality that \d and friends are intended to support.
Given that we're defining * and ? as shell-style wildcards, it's not
going to be feasible to handle very much of ordinary regex usage let
alone advanced.

The more I think about this, a leading pipe could be used
to pipe the output to a utility, so that \dn | egrep -v '(log|shadow)
would work and would be the easiest solution.

This on the other hand seems more like a potentially useful feature,
although I'm unclear on what you expect to get sent through the pipe
exactly --- you want column headers for instance? What if you're using
a nondefault display layout?

regards, tom lane

Sean Chittenden

sean@chittenden.org

about 22 years ago

In reply to: Tom Lane (#2)

Re: \dn [PATTERN] handling not quite right...

I haven't looked in great detail into why this is happpening, but it
seems as though processNamePattern() doesn't handle ?'s correctly in
the negative lookahead context correctly.

Negative lookahead context!? You are several sigmas beyond the subset
of regex functionality that \d and friends are intended to support.
Given that we're defining * and ? as shell-style wildcards, it's not
going to be feasible to handle very much of ordinary regex usage let
alone advanced.

I was worried you'd say as much. I'm in a situation where I've got a
few hundred schemas floating around now and about half of them end with
_log or _shadow and I was surprised at how non-trivial it was to filter
out the _log or _shadow schemas with \dn. I tried thinking up the psql
equiv of tcsh's fignore but had no luck (ex: set fignore = (\~ .o
.bak)).

The more I think about this, a leading pipe could be used
to pipe the output to a utility, so that \dn | egrep -v '(log|shadow)
would work and would be the easiest solution.

This on the other hand seems more like a potentially useful feature,
although I'm unclear on what you expect to get sent through the pipe
exactly --- you want column headers for instance? What if you're using
a nondefault display layout?

Instead of using printf(), fprintf(), fwrite(), or whatever it is that
psql(1) uses internally for displaying result sets, have it use the
following chunk of pseudo code:

if (pipe_symbol_found) {
char buf[8192];
size_t len = 0;
memset(&buf, 0, sizeof(buf));

fh = popen(..., "r+"); /* or setsocketpair() + fork() */
fwrite(formatted_output_buffer, strlen(formatted_output_buffer), 1,
fh);

while((len = read(fileno(fh), buf, sizeof(buf)))) {
fwrite(buf, len, 1, stdout);
}
} else {
/* whatever the current code does */
}

That doesn't take into account the set option that lets you write the
output to a file, but that is trivial to handle. To answer your
question, I'd send _everything_ through the pipe and use the pipe as a
blanket IO filter. I haven't thought about this, but would it be
possible to hand the data off to sh(1) and have it handle the
pipe/redirection foo that way psql doesn't have to have any
pipe/redirection brains? If so, I think that'd be slick since you
could do things like '\dn | tail -n +3 | grep -v blah' to handle your
concern about having the header sent through and a utility not wanting
it.

Too bad tee(1) doesn't support a -p option to have tee(1)'s argument
sent to sh(1) or a pipe instead of a file, then there'd be some real
interesting things that one could script. Ex:

\dn | tee -p 'head -n 3 >> /dev/stdout' | tail -n +3 | egrep -v
'_(log|shadow)$'

Which'd show you the header, but everything after the header would be
sent to egrep(1). I can't understand why win32 users think *NIX's CLI
can be confusing.... *grin*.

Just some thoughts. -sc

--
Sean Chittenden