Re: [PATCH v1] pg_ls_tmpdir to show directories
Re-added -hackers.
Thanks for reviewing.
On Fri, Dec 27, 2019 at 05:22:47PM +0100, Fabien COELHO wrote:
The implementation simply extends an existing functions with a boolean to
allow for sub-directories. However, the function does not seem to show
subdir contents recursively. Should it be the case?
STM that "//"-comments are not project policy.
Sure, but the patch is less important than the design, which needs to be
addressed first. The goal is to somehow show tmpfiles (or at least dirs) used
by parallel workers. I mentioned a few possible ways, of which this was the
simplest to implement. Showing files beneath the dir is probably good, but
need to decide how to present it. Should there be a column for the dir (null
if not a shared filesets)? Or some other presentation, like a boolean column
"is_shared_fileset".
I'm unconvinced by the skipping condition:
+ if (!S_ISREG(attrib.st_mode) && + (!dir_ok && S_ISDIR(attrib.st_mode))) continue;which is pretty hard to read. ISTM you meant "not A and not (B and C)"
instead?
I can write it as two ifs. And, it's probably better to say:
if (!ISDIR() || !dir_ok)
..which is same as: !(B && C), as you said.
Catalog update should be a date + number? Maybe this is best left to the
committer?
Yes, but I preferred to include it, causing a deliberate conflict, to ensure
it's not forgotten.
Thanks,
Justin
Import Notes
Reply to msg id not found: alpine.DEB.2.21.1912271545240.27864@pseudoReference msg id not found: alpine.DEB.2.21.1912271545240.27864@pseudo
Re-added -hackers.
Indeed, I left it out by accident. I tried to bounce the original mail but
it did not work.
Thanks for reviewing.
On Fri, Dec 27, 2019 at 05:22:47PM +0100, Fabien COELHO wrote:
The implementation simply extends an existing functions with a boolean to
allow for sub-directories. However, the function does not seem to show
subdir contents recursively. Should it be the case?STM that "//"-comments are not project policy.
Sure, but the patch is less important than the design, which needs to be
addressed first. The goal is to somehow show tmpfiles (or at least dirs) used
by parallel workers. I mentioned a few possible ways, of which this was the
simplest to implement. Showing files beneath the dir is probably good, but
need to decide how to present it. Should there be a column for the dir (null
if not a shared filesets)? Or some other presentation, like a boolean column
"is_shared_fileset".
Why not simply showing the files underneath their directories?
/path/to/tmp/file1
/path/to/tmp/subdir1/file2
In which case probably showing the directory itself is not useful,
and the is_dir column could be dropped?
I'm unconvinced by the skipping condition:
+ if (!S_ISREG(attrib.st_mode) && + (!dir_ok && S_ISDIR(attrib.st_mode))) continue;which is pretty hard to read. ISTM you meant "not A and not (B and C)"
instead?I can write it as two ifs.
Hmmm. Not sure it would help that much. At least the condition must be
right. Also, the comment should be updated.
And, it's probably better to say:
if (!ISDIR() || !dir_ok)
I cannot say I like it. dir_ok is cheaper to test so could come first.
..which is same as: !(B && C), as you said.
Ok, so you confirm that the condition was wrong.
Catalog update should be a date + number? Maybe this is best left to
the committer?Yes, but I preferred to include it, causing a deliberate conflict, to ensure
it's not forgotten.
Ok.
--
Fabien.
On Fri, Dec 27, 2019 at 06:50:24PM +0100, Fabien COELHO wrote:
On Fri, Dec 27, 2019 at 05:22:47PM +0100, Fabien COELHO wrote:
The implementation simply extends an existing functions with a boolean to
allow for sub-directories. However, the function does not seem to show
subdir contents recursively. Should it be the case?STM that "//"-comments are not project policy.
Sure, but the patch is less important than the design, which needs to be
addressed first. The goal is to somehow show tmpfiles (or at least dirs) used
by parallel workers. I mentioned a few possible ways, of which this was the
simplest to implement. Showing files beneath the dir is probably good, but
need to decide how to present it. Should there be a column for the dir (null
if not a shared filesets)? Or some other presentation, like a boolean column
"is_shared_fileset".Why not simply showing the files underneath their directories?
/path/to/tmp/file1
/path/to/tmp/subdir1/file2In which case probably showing the directory itself is not useful,
and the is_dir column could be dropped?
The names are expected to look like this:
$ sudo find /var/lib/pgsql/12/data/base/pgsql_tmp -ls
142977 4 drwxr-x--- 3 postgres postgres 4096 Dec 27 13:51 /var/lib/pgsql/12/data/base/pgsql_tmp
169868 4 drwxr-x--- 2 postgres postgres 4096 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset
169347 5492 -rw-r----- 1 postgres postgres 5619712 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/0.0
169346 5380 -rw-r----- 1 postgres postgres 5505024 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/1.0
I think we'd have to show sudbdir/file1, subdir/file2, not just file1, file2.
It doesn't seem useful or nice to show a bunch of files called 0.0 or 1.0.
Actually the results should be unique, either on filename or (dir,file).
"ls" wouldn't list same name twice, unless you list multiple dirs, like:
|ls a/b c/d.
It's worth thinking if subdir should be a separate column.
I'm interested to hear back from others.
Justin
Hello Justin,
Why not simply showing the files underneath their directories?
/path/to/tmp/file1
/path/to/tmp/subdir1/file2In which case probably showing the directory itself is not useful,
and the is_dir column could be dropped?The names are expected to look like this:
$ sudo find /var/lib/pgsql/12/data/base/pgsql_tmp -ls
142977 4 drwxr-x--- 3 postgres postgres 4096 Dec 27 13:51 /var/lib/pgsql/12/data/base/pgsql_tmp
169868 4 drwxr-x--- 2 postgres postgres 4096 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset
169347 5492 -rw-r----- 1 postgres postgres 5619712 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/0.0
169346 5380 -rw-r----- 1 postgres postgres 5505024 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/1.0I think we'd have to show sudbdir/file1, subdir/file2, not just file1, file2.
It doesn't seem useful or nice to show a bunch of files called 0.0 or 1.0.
Actually the results should be unique, either on filename or (dir,file).
Ok, so this suggests recursing into subdirs, which requires to make a
separate function of the inner loop.
It's worth thinking if subdir should be a separate column.
My 0.02ᅵᅵ: I would rather simply keep the full path and just add subdir
contents, so that the function output does not change at all.
--
Fabien.
On Sat, Dec 28, 2019 at 07:52:55AM +0100, Fabien COELHO wrote:
Why not simply showing the files underneath their directories?
/path/to/tmp/file1
/path/to/tmp/subdir1/file2In which case probably showing the directory itself is not useful,
and the is_dir column could be dropped?The names are expected to look like this:
$ sudo find /var/lib/pgsql/12/data/base/pgsql_tmp -ls
142977 4 drwxr-x--- 3 postgres postgres 4096 Dec 27 13:51 /var/lib/pgsql/12/data/base/pgsql_tmp
169868 4 drwxr-x--- 2 postgres postgres 4096 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset
169347 5492 -rw-r----- 1 postgres postgres 5619712 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/0.0
169346 5380 -rw-r----- 1 postgres postgres 5505024 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/1.0I think we'd have to show sudbdir/file1, subdir/file2, not just file1, file2.
It doesn't seem useful or nice to show a bunch of files called 0.0 or 1.0.
Actually the results should be unique, either on filename or (dir,file).Ok, so this suggests recursing into subdirs, which requires to make a
separate function of the inner loop.
Yea, it suggests that; but, SRF_RETURN_NEXT doesn't make that very easy.
It'd need to accept the fcinfo argument, and pg_ls_dir_files would call it once
for every tuple to be returned. So it's recursive and saves its state...
The attached is pretty ugly, but I can't see how to do better.
The alternative seems to be to build up a full list of pathnames in the SRF
initial branch, and stat them all later. Or stat them all in the INITIAL case,
and keep a list of stat structures to be emited during future calls.
BTW, it seems to me this error message should be changed:
snprintf(path, sizeof(path), "%s/%s", fctx->location, de->d_name);
if (stat(path, &attrib) < 0)
ereport(ERROR,
(errcode_for_file_access(),
- errmsg("could not stat directory \"%s\": %m", dir)));
+ errmsg("could not stat file \"%s\": %m", path)));
Hello Justin,
Ok, so this suggests recursing into subdirs, which requires to make a
separate function of the inner loop.Yea, it suggests that; but, SRF_RETURN_NEXT doesn't make that very easy.
It'd need to accept the fcinfo argument, and pg_ls_dir_files would call it once
for every tuple to be returned. So it's recursive and saves its state...The attached is pretty ugly, but I can't see how to do better.
Hmmm… I do agree with pretty ugly:-)
If it really neads to be in the structure, why not save the open directory
field in the caller and restore it after the recursive call, and pass the
rest of the structure as a pointer? Something like:
... root_function(...)
{
struct status_t status = initialization();
...
call rec_function(&status, path)
...
cleanup();
}
... rec_function(struct *status, path)
{
status->dir = opendir(path);
for (dir contents)
{
if (it_is_a_dir)
{
/* some comment about why we do that… */
dir_t saved = status->dir;
status->dir = NULL;
rec_function(status, subpath);
status->dir = saved;
}
}
closedir(status->dir), status->dir = NULL;
}
The alternative seems to be to build up a full list of pathnames in the SRF
initial branch, and stat them all later. Or stat them all in the INITIAL case,
and keep a list of stat structures to be emited during future calls.
--
Fabien.
Here's a version which uses an array of directory_fctx, rather than of DIR and
location. That avoids changing the data structure and collatoral implications
to pg_ls_dir().
Currently, this *shows* subdirs of subdirs, but doesn't decend into them.
So I think maybe toplevel subdirs should be shown, too.
And maybe the is_dir flag should be re-introduced (although someone could call
pg_stat_file if needed).
I'm interested to hear feedback on that, although this patch still isn't great.
Hello Justin,
About this v4.
It applies cleanly.
I'm trying to think about how to get rid of the strange structure and
hacks, and the arbitrary looking size 2 array.
Also the recursion is one step, but I'm not sure why, ISTM it could/should
go on always?
Maybe the recursive implementation was not such a good idea, if I
suggested it is because I did not noticed the "return next" re-entrant
stuff, shame on me.
Looking at the code, ISTM that relying on a stack/list would be much
cleaner and easier to understand. The code could look like:
ls_func()
if (first_time)
initialize description
set next directory to visit
while (1)
if next directory
init/push next directory to visit as current
read current directory
if emty
pop/close current directory
elif is_a_dir and recursion allowed
set next directory to visit
else ...
return next tuple
cleanup
The point is to avoid a hack around the directory_fctx array, to have only
one place to push/init a new directory (i.e. call AllocateDir and play
around with the memory context) instead of two, and to allow deeper
recursion if useful.
Details : snprintf return is not checked. Maybe it should say why an
overflow cannot occur.
"bool nulls[3] = { 0,};" -> "bool nulls[3} = { false, false, false };"?
--
Fabien.
On Wed, Jan 15, 2020 at 11:21:36AM +0100, Fabien COELHO wrote:
I'm trying to think about how to get rid of the strange structure and hacks,
and the arbitrary looking size 2 array.Also the recursion is one step, but I'm not sure why, ISTM it could/should
go on always?
Because tmpfiles only go one level deep.
Looking at the code, ISTM that relying on a stack/list would be much cleaner
and easier to understand. The code could look like:
I'm willing to change the implementation, but only after there's an agreement
about the desired behavior (extra column, one level, etc).
Justin
Hello Justin,
I'm trying to think about how to get rid of the strange structure and hacks,
and the arbitrary looking size 2 array.Also the recursion is one step, but I'm not sure why, ISTM it could/should
go on always?Because tmpfiles only go one level deep.
I'm not sure it is a general rule. ISTM that extensions can use tmp files,
and we would have no control about what they would do there.
Looking at the code, ISTM that relying on a stack/list would be much cleaner
and easier to understand. The code could look like:I'm willing to change the implementation, but only after there's an agreement
about the desired behavior (extra column, one level, etc).
For the level, ISTM that the implementation should not make this
assumption. If in practice there is just one level, then the function will
not recurse deep, no problem.
For the column, I'm not sure that "isdir" is necessary.
You could put it implicitely in the file name by ending it with "/",
and/or showing the directory contents is enough a hint that there is a
directory?
Also, I'm not fully sure why ".*" files should be skipped, maybe it should
be an option? Or the user can filter it with SQL if it does not want them?
--
Fabien.
On Thu, Jan 16, 2020 at 09:34:32AM +0100, Fabien COELHO wrote:
Also, I'm not fully sure why ".*" files should be skipped, maybe it should
be an option? Or the user can filter it with SQL if it does not want them?
I think if someone wants the full generality, they can do this:
postgres=# SELECT name, s.size, s.modification, s.isdir FROM (SELECT 'base/pgsql_tmp'p)p, pg_ls_dir(p)name, pg_stat_file(p||'/'||name)s;
name | size | modification | isdir
------+------+------------------------+-------
.foo | 4096 | 2020-01-16 08:57:04-05 | t
In my mind, pg_ls_tmpdir() is for showing tmpfiles, not just a shortcut to
SELECT pg_ls_dir((SELECT 'base/pgsql_tmp'p)); -- or, for all tablespaces:
WITH x AS (SELECT format('/PG_%s_%s', split_part(current_setting('server_version'), '.', 1), catalog_version_no) suffix FROM pg_control_system()), y AS (SELECT a, pg_ls_dir(a) AS d FROM (SELECT DISTINCT COALESCE(NULLIF(pg_tablespace_location(oid),'')||suffix, 'base') a FROM pg_tablespace,x)a) SELECT a, pg_ls_dir(a||'/pgsql_tmp') FROM y WHERE d='pgsql_tmp';
I think changing dotfiles is topic for another patch.
That would also affect pg_ls_dir, and everything else that uses the backing
function pg_ls_dir_files_recurse. I'd have to ask why not also show . and .. ?
(In fact, if I were to change anything, I would propose to limit pg_ls_tmpdir()
to files matching PG_TEMP_FILE_PREFIX).
Justin
Hi Fabien,
On 1/16/20 9:38 AM, Justin Pryzby wrote:
On Thu, Jan 16, 2020 at 09:34:32AM +0100, Fabien COELHO wrote:
Also, I'm not fully sure why ".*" files should be skipped, maybe it should
be an option? Or the user can filter it with SQL if it does not want them?I think if someone wants the full generality, they can do this:
postgres=# SELECT name, s.size, s.modification, s.isdir FROM (SELECT 'base/pgsql_tmp'p)p, pg_ls_dir(p)name, pg_stat_file(p||'/'||name)s;
name | size | modification | isdir
------+------+------------------------+-------
.foo | 4096 | 2020-01-16 08:57:04-05 | tIn my mind, pg_ls_tmpdir() is for showing tmpfiles, not just a shortcut to
SELECT pg_ls_dir((SELECT 'base/pgsql_tmp'p)); -- or, for all tablespaces:
WITH x AS (SELECT format('/PG_%s_%s', split_part(current_setting('server_version'), '.', 1), catalog_version_no) suffix FROM pg_control_system()), y AS (SELECT a, pg_ls_dir(a) AS d FROM (SELECT DISTINCT COALESCE(NULLIF(pg_tablespace_location(oid),'')||suffix, 'base') a FROM pg_tablespace,x)a) SELECT a, pg_ls_dir(a||'/pgsql_tmp') FROM y WHERE d='pgsql_tmp';I think changing dotfiles is topic for another patch.
That would also affect pg_ls_dir, and everything else that uses the backing
function pg_ls_dir_files_recurse. I'd have to ask why not also show . and .. ?(In fact, if I were to change anything, I would propose to limit pg_ls_tmpdir()
to files matching PG_TEMP_FILE_PREFIX).
We seem to be at an impasse on this patch. What do you think of
Justin's comments here?
Do you still believe a different implementation is required?
Regards,
--
-David
david@pgmasters.net
On Tue, Mar 03, 2020 at 02:51:54PM -0500, David Steele wrote:
Hi Fabien,
On 1/16/20 9:38 AM, Justin Pryzby wrote:
On Thu, Jan 16, 2020 at 09:34:32AM +0100, Fabien COELHO wrote:
Also, I'm not fully sure why ".*" files should be skipped, maybe it should
be an option? Or the user can filter it with SQL if it does not want them?I think if someone wants the full generality, they can do this:
postgres=# SELECT name, s.size, s.modification, s.isdir FROM (SELECT 'base/pgsql_tmp'p)p, pg_ls_dir(p)name, pg_stat_file(p||'/'||name)s;
name | size | modification | isdir
------+------+------------------------+-------
.foo | 4096 | 2020-01-16 08:57:04-05 | tIn my mind, pg_ls_tmpdir() is for showing tmpfiles, not just a shortcut to
SELECT pg_ls_dir((SELECT 'base/pgsql_tmp'p)); -- or, for all tablespaces:
WITH x AS (SELECT format('/PG_%s_%s', split_part(current_setting('server_version'), '.', 1), catalog_version_no) suffix FROM pg_control_system()), y AS (SELECT a, pg_ls_dir(a) AS d FROM (SELECT DISTINCT COALESCE(NULLIF(pg_tablespace_location(oid),'')||suffix, 'base') a FROM pg_tablespace,x)a) SELECT a, pg_ls_dir(a||'/pgsql_tmp') FROM y WHERE d='pgsql_tmp';I think changing dotfiles is topic for another patch.
That would also affect pg_ls_dir, and everything else that uses the backing
function pg_ls_dir_files_recurse. I'd have to ask why not also show . and .. ?(In fact, if I were to change anything, I would propose to limit pg_ls_tmpdir()
to files matching PG_TEMP_FILE_PREFIX).We seem to be at an impasse on this patch. What do you think of Justin's
comments here?
Actually, I found Fabien's comment regarding extensions use of tmp dir to be
convincing. And I'm willing to update the patch to use a stack to show
arbitrarily-deep files/dirs rather than just one level deep (as used for shared
filesets in core postgres).
But I don't think it makes sense to go through more implementation/review
cycles without some agreement from a larger group regarding the
desired/intended interface. Should there be a column for "parent dir" ? Or a
column for "is_dir" ? Should dirs be shown at all, or only files ?
--
Justin
On 2020-Mar-03, Justin Pryzby wrote:
But I don't think it makes sense to go through more implementation/review
cycles without some agreement from a larger group regarding the
desired/intended interface. Should there be a column for "parent dir" ? Or a
column for "is_dir" ? Should dirs be shown at all, or only files ?
IMO: is_dir should be there (and subdirs should be listed), but
parent_dir should not appear. Also, the "path" should show the complete
pathname, including containing dirs, starting from whatever the "root"
is for the operation.
So for the example in the initial email, it would look like
path isdir
pgsql_tmp11025.0.sharedfileset/ t
pgsql_tmp11025.0.sharedfileset/0.0 f
pgsql_tmp11025.0.sharedfileset/1.0 f
plus additional columns, same as pg_ls_waldir et al.
I'd rather not have the code assume that there's a single level of
subdirs, or assuming that an entry in the subdir cannot itself be a dir;
that might end up hiding files for no good reason.
I don't understand what purpose is served by having pg_ls_waldir() hide
directories.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/3/20, 12:24 PM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:
IMO: is_dir should be there (and subdirs should be listed), but
parent_dir should not appear. Also, the "path" should show the complete
pathname, including containing dirs, starting from whatever the "root"
is for the operation.So for the example in the initial email, it would look like
path isdir
pgsql_tmp11025.0.sharedfileset/ t
pgsql_tmp11025.0.sharedfileset/0.0 f
pgsql_tmp11025.0.sharedfileset/1.0 fplus additional columns, same as pg_ls_waldir et al.
I'd rather not have the code assume that there's a single level of
subdirs, or assuming that an entry in the subdir cannot itself be a dir;
that might end up hiding files for no good reason.
+1
Nathan
Import Notes
Resolved by subject fallback
On Tue, Mar 03, 2020 at 05:23:13PM -0300, Alvaro Herrera wrote:
On 2020-Mar-03, Justin Pryzby wrote:
But I don't think it makes sense to go through more implementation/review
cycles without some agreement from a larger group regarding the
desired/intended interface. Should there be a column for "parent dir" ? Or a
column for "is_dir" ? Should dirs be shown at all, or only files ?IMO: is_dir should be there (and subdirs should be listed), but
parent_dir should not appear. Also, the "path" should show the complete
pathname, including containing dirs, starting from whatever the "root"
is for the operation.So for the example in the initial email, it would look like
path isdir
pgsql_tmp11025.0.sharedfileset/ t
pgsql_tmp11025.0.sharedfileset/0.0 f
pgsql_tmp11025.0.sharedfileset/1.0 fplus additional columns, same as pg_ls_waldir et al.
I'd rather not have the code assume that there's a single level of
subdirs, or assuming that an entry in the subdir cannot itself be a dir;
that might end up hiding files for no good reason.
Thanks for your input, see attached.
I'm not sure if prefer the 0002 patch alone (which recurses into dirs all at
once during the initial call), or 0002+3+4, which incrementally reads the dirs
on each call (but requires keeping dirs opened).
I don't understand what purpose is served by having pg_ls_waldir() hide
directories.
We could talk about whether the other functions should show dirs, if it's worth
breaking their return type. Or if they should show hidden or special files,
which doesn't require breaking the return. But until then I am to leave the
behavior alone.
--
Justin
Attachments:
v6-0001-BUG-in-errmsg.patchtext/x-diff; charset=us-asciiDownload+1-2
v6-0002-pg_ls_tmpdir-to-show-directories.patchtext/x-diff; charset=us-asciiDownload+114-43
v6-0003-Convert-data-structure-to-use-a-list.patchtext/x-diff; charset=us-asciiDownload+41-38
v6-0004-Read-each-dir-incrementally-during-each-SRF-call.patchtext/x-diff; charset=us-asciiDownload+47-61
On Thu, Mar 05, 2020 at 10:18:38AM -0600, Justin Pryzby wrote:
I'm not sure if prefer the 0002 patch alone (which recurses into dirs all at
once during the initial call), or 0002+3+4, which incrementally reads the dirs
on each call (but requires keeping dirs opened).
I fixed an issue that leading dirs were being shown which should not have been,
which was easier in the 0004 patch, so squished. And fixed a bug that
"special" files weren't excluded, and "missing_ok" wasn't effective.
I don't understand what purpose is served by having pg_ls_waldir() hide
directories.We could talk about whether the other functions should show dirs, if it's worth
breaking their return type. Or if they should show hidden or special files,
which doesn't require breaking the return. But until then I am to leave the
behavior alone.
I don't see why any of the functions would exclude dirs, but ls_tmpdir deserves
to be fixed since core postgres dynamically creates dirs there.
Also ... I accidentally changed the behavior: master not only doesn't decend
into dirs, it hides them - that was my original complaint. I propose to *also*
change at least tmpdir and logdir to show dirs, but don't decend. I left
waldir alone for now.
Since v12 ls_tmpdir and since v10 logdir and waldir exclude dirs, I think we
should backpatch documentation to say so.
ISTM pg_ls_tmpdir and ls_logdir should be called with missing_ok=true, since
they're not created until they're used.
--
Justin
Attachments:
v7-0001-BUG-in-errmsg.patchtext/x-diff; charset=us-asciiDownload+1-2
v7-0002-Document-historic-behavior-about-hiding-directori.patchtext/x-diff; charset=us-asciiDownload+3-1
v7-0003-Document-historic-behavior-about-hiding-directori.patchtext/x-diff; charset=us-asciiDownload+1-1
v7-0004-pg_ls_tmpdir-to-show-directories.patchtext/x-diff; charset=us-asciiDownload+134-56
v7-0005-Change-logdir-and-archive_statusdir-to-include-di.patchtext/x-diff; charset=us-asciiDownload+8-9
v7-0006-Change-pg_ls_logdir-to-ignore-error-if-initial-to.patchtext/x-diff; charset=us-asciiDownload+1-2
Hello Justin,
Some feedback about the v7 patch set.
About v7.1, seems ok.
About v7.2 & v7.3 seems ok, altought the two could be merged.
About v7.4:
The documentation sentences could probably be improved "for for", "used
... used". Maybe:
For the temporary directory for <parameter>tablespace</parameter>, ...
->
For <parameter>tablespace</parameter> temporary directory, ...
Directories are used for temporary files used by parallel
processes, and are shown recursively.
->
Directories holding temporary files used by parallel
processes are shown recursively.
It seems that lists are used as FIFO structures by appending, fetching &
deleting last, all of which are O(n). ISTM it would be better to use the
head of the list by inserting, getting and deleting first, which are O(1).
ISTM that several instances of: "pg_ls_dir_files(..., true, false);"
should be "pg_ls_dir_files(..., true, DIR_HIDE);".
About v7.5 looks like a doc update which should be merged with v7.4.
Alas, ISTM that there are no tests on any of these functions:-(
--
Fabien.
On Sat, Mar 07, 2020 at 03:14:37PM +0100, Fabien COELHO wrote:
Some feedback about the v7 patch set.
Thanks for looking again
About v7.1, seems ok.
About v7.2 & v7.3 seems ok, altought the two could be merged.
These are separate since I proprose that one should be backpatched to v12 and
the other to v10.
About v7.4:
...
It seems that lists are used as FIFO structures by appending, fetching &
deleting last, all of which are O(n). ISTM it would be better to use the
head of the list by inserting, getting and deleting first, which are O(1).
I think you're referring to linked lists, but pglists are now arrays, for which
that's backwards. See 1cff1b95a and d97b714a2. For example, list_delete_last
says:
* This is the opposite of list_delete_first(), but is noticeably cheaper
* with a long list, since no data need be moved.
ISTM that several instances of: "pg_ls_dir_files(..., true, false);" should
be "pg_ls_dir_files(..., true, DIR_HIDE);".
Oops, that affects an intermediate commit and maybe due to merge conflict.
Thanks.
About v7.5 looks like a doc update which should be merged with v7.4.
No, v7.5 updates pg_proc.dat and changes the return type of two functions.
It's a short commit since all the infrastructure is implemented to make the
functions do whatever we want. But it's deliberately separate since I'm
proposing a breaking change, and one that hasn't been discussed until now.
Alas, ISTM that there are no tests on any of these functions:-(
Yeah. Everything that includes any output is going to include timestamps;
those could be filtered out. waldir is going to have random filenames, and a
differing number of rows. But we should exercize pg_ls_dir_files at least
once..
My previous version had a bug with ignore_missing with pg_ls_tmpdir, which
would've been caught by a test like:
SELECT FROM pg_ls_tmpdir() WHERE name='Does not exist'; -- Never true, so the function runs to completion but returns zero rows.
The 0006 commit changes that for logdir, too. Without 0006, that will ERROR if
the dir doesn't exist (which I think would be the default during regression
tests).
It'd be nice to run pg_ls_tmpdir before the tmpdir exists, and again
afterwards. But I'm having trouble finding a single place to put it. The
closest I can find is dbsize.sql. Any ideas ?
--
Justin
It seems that lists are used as FIFO structures by appending, fetching &
deleting last, all of which are O(n). ISTM it would be better to use the
head of the list by inserting, getting and deleting first, which are O(1).I think you're referring to linked lists, but pglists are now arrays,
Ok… I forgot about this change, so my point is void, you took the right
one.
--
Fabien.