Random pg_upgrade test failure on drongo

Started by Hayato Kuroda (Fujitsu), about 2 years ago, 31 messages
#1 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com

Dear hackers,

While tracking the buildfarm, I found that drongo failed the test pg_upgrade/003_logical_slots [1].
A strange point is that the test passed in the next iteration. Currently I'm not
sure of the reason, but I will keep an eye on it and investigate if it
happens again.

I think this failure is not related to our logical slots work, even though it
failed in 003_logical_slots.pl. For more detail, please see the latter part of this message.

For further investigation, a server log taken during the upgrade may be needed. It is
written to the data directory, so the BF system does not upload it. I may need additional
information if it fails again.

# Analysis of failure

According to the output, pg_upgrade seems to have failed while restoring objects
to the new cluster [2].

As a code-level analysis, the pg_upgrade command failed in exec_prog().
In that function, pg_restore was being executed for the database "postgres".
Below is a brief call stack. Note that pg_restore is not used for migrating
logical replication slots; that is done by the pg_upgrade binary itself. Also, the
migration is done after all objects are copied, not in create_new_objects().

```
exec_prog()
parallel_exec_prog("pg_restore ... ") <-- Since -j option is not specified, it is just a wrapper
create_new_objects()
main()
```

In exec_prog(), the system() call was made but returned a non-zero value.
The documentation says that system() returns the value that is returned by the
command interpreter, when the input is not NULL [3]. Unfortunately, the current code
does not output the return code. Also, the BF system does not upload the data
directory for failed tests. Therefore, I could not get more information for the investigation.
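
As an illustration only (this is not the actual exec_prog() code, and the helper
name is hypothetical), logging the raw system() return value would look something
like the sketch below; a failing buildfarm run would then at least record what the
command interpreter reported:

```
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical sketch, not pg_upgrade's exec_prog(): log the raw system()
 * return value so a failing run leaves a trace of what the command
 * interpreter reported.
 */
static int
run_and_log(const char *cmd, FILE *log)
{
	int			rc = system(cmd);

	if (rc != 0)
		fprintf(log, "command \"%s\" failed: system() returned %d\n",
				cmd, rc);
	return rc;
}
```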

[1]: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=drongo&dt=2023-11-07%2013%3A43%3A23&stg=pg_upgrade-check
[2]:
```
...
# No postmaster PID for node "oldpub"
# Running: pg_upgrade --no-sync -d C:\\prog\\bf\\root\\HEAD\\pgsql.build/testrun/pg_upgrade/003_logical_slots\\data/t_003_logical_slots_oldpub_data/pgdata -D C:\\prog\\bf\\root\\HEAD\\pgsql.build/testrun/pg_upgrade/003_logical_slots\\data/t_003_logical_slots_newpub_data/pgdata -b C:/prog/bf/root/HEAD/PGSQL~1.BUI/TMP_IN~1/prog/bf/root/HEAD/inst/bin -B C:/prog/bf/root/HEAD/PGSQL~1.BUI/TMP_IN~1/prog/bf/root/HEAD/inst/bin -s 127.0.0.1 -p 54813 -P 54814 --copy
Performing Consistency Checks
...
Setting frozenxid and minmxid counters in new cluster ok
Restoring global objects in the new cluster ok
Restoring database schemas in the new cluster
*failure*

Consult the last few lines of "C:/prog/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_logical_slots/data/t_003_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231107T142224.580/log/pg_upgrade_dump_5.log" for
the probable cause of the failure.
Failure, exiting
[14:23:26.632](70.141s) not ok 10 - run of pg_upgrade of old cluster
[14:23:26.632](0.000s) # Failed test 'run of pg_upgrade of old cluster'
# at C:/prog/bf/root/HEAD/pgsql/src/bin/pg_upgrade/t/003_logical_slots.pl line 170.
### Starting node "newpub"
# Running: pg_ctl -w -D C:\\prog\\bf\\root\\HEAD\\pgsql.build/testrun/pg_upgrade/003_logical_slots\\data/t_003_logical_slots_newpub_data/pgdata -l C:\\prog\\bf\\root\\HEAD\\pgsql.build/testrun/pg_upgrade/003_logical_slots\\log/003_logical_slots_newpub.log -o --cluster-name=newpub start
waiting for server to start.... done
server started
# Postmaster PID for node "newpub" is 4604
[14:23:28.398](1.766s) not ok 11 - check the slot exists on new cluster
[14:23:28.398](0.001s) # Failed test 'check the slot exists on new cluster'
# at C:/prog/bf/root/HEAD/pgsql/src/bin/pg_upgrade/t/003_logical_slots.pl line 176.
[14:23:28.399](0.000s) # got: ''
# expected: 'regress_sub|t'
...
```
[3]: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/system-wsystem?view=msvc-170

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#2 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#1)
RE: Random pg_upgrade test failure on drongo

Dear hackers,

While tracking a buildfarm, I found that drongo failed the test
pg_upgrade/003_logical_slots [1].
A strange point is that the test passed in the next iteration. Currently I'm not
sure the reason, but I will keep my eye for it and will investigate if it
happens again.

This email is just an update. We found that fairywren also failed for
the same reason [2]. It fails inconsistently, but there might be something bad on
Windows. I'm now trying to reproduce it with my colleagues to analyze it in more detail.
I'm also working with Andrew to get the logs emitted during the upgrade.
I will continue to keep an eye on it.

[2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-11-08%2010%3A22%3A45

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#3 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#2)
RE: Random pg_upgrade test failure on drongo

Dear hackers,

This email is an update. The machine drongo failed the test again a week ago [1],
and I finally got the log files. PSA the files.

## Observed failure

pg_upgrade_server.log is the server log taken during the pg_upgrade command. According to
it, the TRUNCATE command seems to have failed due to a "File exists" error.

```
2023-11-15 00:02:02.239 UTC [1752:18] 003_logical_slots.pl ERROR: could not create file "base/1/2683": File exists
2023-11-15 00:02:02.239 UTC [1752:19] 003_logical_slots.pl STATEMENT:
-- For binary upgrade, preserve pg_largeobject and index relfilenodes
SELECT pg_catalog.binary_upgrade_set_next_index_relfilenode('2683'::pg_catalog.oid);
SELECT pg_catalog.binary_upgrade_set_next_heap_relfilenode('2613'::pg_catalog.oid);
TRUNCATE pg_catalog.pg_largeobject;
...
```

## Analysis

I think it was caused by the STATUS_DELETE_PENDING problem and is not related to the
recent updates for pg_upgrade.

The file "base/1/2683" is an index file for pg_largeobject_loid_pn_index, and the
output meant that file creation was failed. Below is a backtrace.

```
pgwin32_open() // <-- this returns -1
open()
BasicOpenFilePerm()
PathNameOpenFilePerm()
PathNameOpenFile()
mdcreate()
smgrcreate()
RelationCreateStorage()
RelationSetNewRelfilenumber()
ExecuteTruncateGuts()
ExecuteTruncate()
```

But this is strange. Before calling mdcreate(), we surely unlink any file that
has the same name. Below is the call trace down to the unlink.

```
pgunlink()
unlink()
mdunlinkfork()
mdunlink()
smgrdounlinkall()
RelationSetNewRelfilenumber() // common path with above
ExecuteTruncateGuts()
ExecuteTruncate()
```

I found that Thomas said [2] that pgunlink() sometimes cannot remove a file even if
it returns OK; at that time the NTSTATUS is STATUS_DELETE_PENDING. Also, a comment
in pgwin32_open_handle() mentions the same thing:

```
/*
* ERROR_ACCESS_DENIED is returned if the file is deleted but not yet
* gone (Windows NT status code is STATUS_DELETE_PENDING). In that
* case, we'd better ask for the NT status too so we can translate it
* to a more Unix-like error. We hope that nothing clobbers the NT
* status in between the internal NtCreateFile() call and CreateFile()
* returning.
*
```

The definition of STATUS_DELETE_PENDING can be seen in [3]. Based on that, open()
can indeed fail with STATUS_DELETE_PENDING if the deletion of the file is still
pending when we try to open it.

Another thread [4] also hits this issue while doing rmtree->unlink, and it retries
the removal if it fails with STATUS_DELETE_PENDING. So, should we retry the open when
it fails in the same way? Anyway, such a fix seems out of scope for pg_upgrade.
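
Just to illustrate the "retry the open" idea, here is a minimal sketch, under the
assumption that the pending deletion surfaces as EACCES (as the pgwin32_open_handle()
comment above describes); this is not PostgreSQL code, and the function name is made up:

```
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical sketch: retry open() for a while when it fails with EACCES,
 * on the assumption that a pending deletion of the same name is still in
 * progress and will finish shortly.
 */
static int
open_retry_delete_pending(const char *path, int flags, int mode)
{
	for (int loops = 0; loops < 100; loops++)	/* give up after ~10 sec */
	{
		int			fd = open(path, flags, mode);

		if (fd >= 0 || errno != EACCES)
			return fd;			/* success, or an unrelated error */
		usleep(100000);			/* wait 100 ms and retry */
	}
	errno = EACCES;
	return -1;
}
```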

What do you think? Do you have any thoughts about it?

## Acknowledgement

I want to say thanks to Sholk and Vingesh for helping with the analysis.

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2023-11-15%2006%3A16%3A15
[2]: /messages/by-id/CA+hUKGKsdzw06c5nnb=KYG9GmvyykoVpJA_VR3k0r7cZOKcx6Q@mail.gmail.com
[3]: https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55
[4]: /messages/by-id/20220919213217.ptqfdlcc5idk5xup@awork3.anarazel.de

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#4 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#3)
1 attachment(s)
RE: Random pg_upgrade test failure on drongo

Dear hackers,

This email tells an update. The machine drongo failed the test a week ago [1]
and finally got logfiles. PSA files.

Oh, sorry. I forgot to attach the files. You can see pg_upgrade_server.log for now.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

pg_upgrade_server.log (application/octet-stream)
#5 Alexander Lakhin
exclusion@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#4)
1 attachment(s)
Re: Random pg_upgrade test failure on drongo

Hello Kuroda-san,

21.11.2023 13:37, Hayato Kuroda (Fujitsu) wrote:

This email tells an update. The machine drongo failed the test a week ago [1]
and finally got logfiles. PSA files.

Oh, sorry. I missed to attach files. You can see pg_upgrade_server.log for now.

I can easily reproduce this failure on my workstation by running 5 copies of the
test 003_logical_slots in parallel inside a Windows VM with its CPU resources
limited to 50%, like so:
VBoxManage controlvm "Windows" cpuexecutioncap 50

set PGCTLTIMEOUT=180
python3 -c "NUMITERATIONS=20;NUMTESTS=5;import os;tsts='';exec('for i in range(1,NUMTESTS+1):
tsts+=f\"pg_upgrade_{i}/003_logical_slots \"'); exec('for i in range(1,NUMITERATIONS+1):print(f\"iteration {i}\");
assert(os.system(f\"meson test --num-processes {NUMTESTS} {tsts}\") == 0)')"
...
iteration 2
ninja: Entering directory `C:\src\postgresql\build'
ninja: no work to do.
1/5 postgresql:pg_upgrade_2 / pg_upgrade_2/003_logical_slots ERROR            60.30s   exit status 25
...
pg_restore: error: could not execute query: ERROR:  could not create file "base/1/2683": File exists
...

I agree with your analysis and would like to propose a PoC fix (see
attached). With this patch applied, 20 iterations succeeded for me.

Best regards,
Alexander

Attachments:

win-unlink-through-rename.patch (text/x-patch)
diff --git a/src/port/dirmod.c b/src/port/dirmod.c
index 07dd190cbc..5185648388 100644
--- a/src/port/dirmod.c
+++ b/src/port/dirmod.c
@@ -121,6 +121,8 @@ pgunlink(const char *path)
 	bool		is_lnk;
 	int			loops = 0;
 	struct stat st;
+	char		tmppath[MAX_PATH];
+	char	   *curpath = path;
 
 	/*
 	 * This function might be called for a regular file or for a junction
@@ -129,9 +131,14 @@ pgunlink(const char *path)
 	 * if we can unlink directly, since that's expected to be the most common
 	 * case.
 	 */
-	if (unlink(path) == 0)
-		return 0;
-	if (errno != EACCES)
+	snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);
+	if (pgrename(path, tmppath) == 0)
+	{
+		if (unlink(tmppath) == 0)
+			return 0;
+		curpath = tmppath;
+	}
+	else if (errno != EACCES)
 		return -1;
 
 	/*
@@ -150,7 +157,7 @@ pgunlink(const char *path)
 	 * fail. We want to wait until the file truly goes away so that simple
 	 * recursive directory unlink algorithms work.
 	 */
-	if (lstat(path, &st) < 0)
+	if (lstat(curpath, &st) < 0)
 	{
 		if (lstat_error_was_status_delete_pending())
 			is_lnk = false;
@@ -167,7 +174,7 @@ pgunlink(const char *path)
 	 * someone else to close the file, as the caller might be holding locks
 	 * and blocking other backends.
 	 */
-	while ((is_lnk ? rmdir(path) : unlink(path)) < 0)
+	while ((is_lnk ? rmdir(curpath) : unlink(curpath)) < 0)
 	{
 		if (errno != EACCES)
 			return -1;
@@ -175,6 +182,14 @@ pgunlink(const char *path)
 			return -1;
 		pg_usleep(100000);		/* us */
 	}
+
+	loops = 0;
+	while (lstat(curpath, &st) < 0 && lstat_error_was_status_delete_pending())
+	{
+		if (++loops > 100)		/* time out after 10 sec */
+			return -1;
+		pg_usleep(100000);		/* us */
+	}
 	return 0;
 }
 
#6 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Alexander Lakhin (#5)
RE: Random pg_upgrade test failure on drongo

Dear Alexander,

I can easily reproduce this failure on my workstation by running 5 tests
003_logical_slots in parallel inside Windows VM with it's CPU resources
limited to 50%, like so:
VBoxManage controlvm "Windows" cpuexecutioncap 50

set PGCTLTIMEOUT=180
python3 -c "NUMITERATIONS=20;NUMTESTS=5;import os;tsts='';exec('for i in
range(1,NUMTESTS+1):
tsts+=f\"pg_upgrade_{i}/003_logical_slots \"'); exec('for i in
range(1,NUMITERATIONS+1):print(f\"iteration {i}\");
assert(os.system(f\"meson test --num-processes {NUMTESTS} {tsts}\") == 0)')"
...
iteration 2
ninja: Entering directory `C:\src\postgresql\build'
ninja: no work to do.
1/5 postgresql:pg_upgrade_2 / pg_upgrade_2/003_logical_slots
ERROR 60.30s exit status 25
...
pg_restore: error: could not execute query: ERROR: could not create file
"base/1/2683": File exists
...

Great. I do not have such an environment, so I could not reproduce it. This seems to
suggest that the failure occurred because the system was busy.

I agree with your analysis and would like to propose a PoC fix (see
attached). With this patch applied, 20 iterations succeeded for me.

Thanks; here are my comments. I'm not very familiar with Windows, so I may say
something wrong.

* I'm not sure why the file/directory name is changed before doing the unlink.
Could you add a description?
* IIUC, the important point is the latter part, which waits until the status is
changed. Based on that, can we remove the double rmtree() from cleanup_output_dirs()?
It seems to have been added for a similar motivation.

```
+	loops = 0;
+	while (lstat(curpath, &st) < 0 && lstat_error_was_status_delete_pending())
+	{
+		if (++loops > 100)		/* time out after 10 sec */
+			return -1;
+		pg_usleep(100000);		/* us */
+	}
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#7 Alexander Lakhin
exclusion@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#6)
1 attachment(s)
Re: Random pg_upgrade test failure on drongo

Hello Kuroda-san,

23.11.2023 15:15, Hayato Kuroda (Fujitsu) wrote:

I agree with your analysis and would like to propose a PoC fix (see
attached). With this patch applied, 20 iterations succeeded for me.

Thanks, here are comments. I'm quite not sure for the windows, so I may say
something wrong.

* I'm not sure why the file/directory name was changed before doing a unlink.
Could you add descriptions?

Please look at the simple test program attached. It demonstrates the
failure for me when running in two sessions as follows:
unlink-open test 150 1000
                unlink-open test2 150 1000
...
iteration 60
iteration 61
fopen() after unlink() failed (13)

Process Monitor shows:
...
        9:16:55.9249792 AM    unlink-open.exe    3232 WriteFile    C:\src\test2    SUCCESS    Offset: 138,412,032,
Length: 1,048,576
### unlink() performed for the file "test":
9:16:55.9852903 AM    unlink-open.exe    4968    CreateFile C:\src\test    SUCCESS    Desired Access: Read Attributes,
Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write,
Delete, AllocationSize: n/a, OpenResult: Opened
9:16:55.9853637 AM    unlink-open.exe    4968 QueryAttributeTagFile    C:\src\test    SUCCESS    Attributes: A,
ReparseTag: 0x0
### file "test" gets into state DELETE PENDING:
9:16:55.9853756 AM    unlink-open.exe    4968 SetDispositionInformationFile    C:\src\test    SUCCESS    Delete: True
9:16:55.9853888 AM    unlink-open.exe    4968    CloseFile C:\src\test    SUCCESS
### concurrent operations with file "test2":
        9:16:55.9866362 AM    unlink-open.exe    3232 WriteFile    C:\src\test2    SUCCESS    Offset: 139,460,608,
Length: 1,048,576
...
        9:16:55.9972373 AM    unlink-open.exe    3232 WriteFile    C:\src\test2    SUCCESS    Offset: 157,286,400,
Length: 1,048,576
        9:16:55.9979040 AM    unlink-open.exe    3232 CloseFile    C:\src\test2    SUCCESS
### unlink() for "test2":
        9:16:56.1029981 AM    unlink-open.exe    3232 CreateFile    C:\src\test2    SUCCESS    Desired Access: Read
Attributes, Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode:
Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
        9:16:56.1030432 AM    unlink-open.exe    3232 QueryAttributeTagFile    C:\src\test2    SUCCESS    Attributes:
A, ReparseTag: 0x0
### file "test2" gets into state DELETE PENDING:
        9:16:56.1030517 AM    unlink-open.exe    3232 SetDispositionInformationFile    C:\src\test2    SUCCESS   
Delete: True
        9:16:56.1030625 AM    unlink-open.exe    3232 CloseFile    C:\src\test2    SUCCESS
### and then it opened successfully:
        9:16:56.1189503 AM    unlink-open.exe    3232 CreateFile    C:\src\test2    SUCCESS    Desired Access: Generic
Write, Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N,
ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created
        9:16:56.1192016 AM    unlink-open.exe    3232 CloseFile    C:\src\test2    SUCCESS
### operations with file "test2" continued:
        9:16:56.1193394 AM    unlink-open.exe    3232 CreateFile    C:\src\test2    SUCCESS    Desired Access: Read
Attributes, Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode:
Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
        9:16:56.1193895 AM    unlink-open.exe    3232 QueryAttributeTagFile    C:\src\test2    SUCCESS    Attributes:
A, ReparseTag: 0x0
        9:16:56.1194042 AM    unlink-open.exe    3232 SetDispositionInformationFile    C:\src\test2    SUCCESS   
Delete: True
        9:16:56.1194188 AM    unlink-open.exe    3232 CloseFile    C:\src\test2    SUCCESS
        9:16:56.1198459 AM    unlink-open.exe    3232 CreateFile    C:\src\test2    SUCCESS    Desired Access: Generic
Write, Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N,
ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created
        9:16:56.1200302 AM    unlink-open.exe    3232 WriteFile    C:\src\test2    SUCCESS    Offset: 0, Length:
1,048,576, Priority: Normal
...
        9:16:56.1275871 AM    unlink-open.exe    3232 WriteFile    C:\src\test2    SUCCESS    Offset: 10,485,760,
Length: 1,048,576

### at the same time, CreateFile() for file "test" failed:
9:16:56.1276453 AM    unlink-open.exe    4968    CreateFile C:\src\test    DELETE PENDING    Desired Access: Generic
Write, Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N,
ShareMode: Read, Write, AllocationSize: 0
9:16:56.1279359 AM    unlink-open.exe    3232    WriteFile C:\src\test2    SUCCESS    Offset: 11,534,336, Length: 1,048,576
9:16:56.1283452 AM    unlink-open.exe    3232    WriteFile C:\src\test2    SUCCESS    Offset: 12,582,912, Length: 1,048,576
...

But with rename(MoveFileEx), I see:
unlink-open test 150 1000 rename
...
9:38:01.7035286 AM    unlink-open.exe    10208    WriteFile C:\src\test    SUCCESS    Offset: 156,237,824, Length: 1,048,576
9:38:01.7075621 AM    unlink-open.exe    10208    WriteFile C:\src\test    SUCCESS    Offset: 157,286,400, Length: 1,048,576
9:38:01.7101299 AM    unlink-open.exe    10208    CloseFile C:\src\test    SUCCESS
9:38:01.7130802 AM    unlink-open.exe    10208    CreateFile C:\src\test    SUCCESS    Desired Access: Read Attributes,
Delete, Synchronize, Disposition: Open, Options: Synchronous IO Non-Alert, Open Reparse Point, Attributes: n/a,
ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
9:38:01.7132844 AM    unlink-open.exe    10208 QueryAttributeTagFile    C:\src\test    SUCCESS    Attributes: A,
ReparseTag: 0x0
9:38:01.7133420 AM    unlink-open.exe    10208 QueryBasicInformationFile    C:\src\test    SUCCESS    CreationTime:
11/24/2023 9:38:01 AM, LastAccessTime: 11/24/2023 9:38:01 AM, LastWriteTime: 11/24/2023 9:38:01 AM, ChangeTime:
11/24/2023 9:38:01 AM, FileAttributes: A
9:38:01.7135191 AM    unlink-open.exe    10208    CreateFile C:\src    SUCCESS    Desired Access: Write Data/Add File,
Synchronize, Disposition: Open, Options: , Attributes: n/a, ShareMode: Read, Write, AllocationSize: n/a, OpenResult: Opened

### file "test" renamed to "test.tmp", it doesn't get into state DELETE PENDING
9:38:01.7136221 AM    unlink-open.exe    10208 SetRenameInformationFile    C:\src\test    SUCCESS ReplaceIfExists: True,
FileName: C:\src\test.tmp
9:38:01.8384110 AM    unlink-open.exe    10208    CloseFile C:\src    SUCCESS
9:38:01.8388203 AM    unlink-open.exe    10208    CloseFile C:\src\test.tmp    SUCCESS

### then file "test.tmp" deleted as usual:
9:38:01.8394278 AM    unlink-open.exe    10208    CreateFile C:\src\test.tmp    SUCCESS    Desired Access: Read
Attributes, Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode:
Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
9:38:01.8396534 AM    unlink-open.exe    10208 QueryAttributeTagFile    C:\src\test.tmp    SUCCESS    Attributes: A,
ReparseTag: 0x0
9:38:01.8396885 AM    unlink-open.exe    10208 SetDispositionInformationFile    C:\src\test.tmp    SUCCESS Delete: True
9:38:01.8397312 AM    unlink-open.exe    10208    CloseFile C:\src\test.tmp    SUCCESS
9:38:01.9162566 AM    unlink-open.exe    10208    CreateFile C:\src\test    SUCCESS    Desired Access: Generic Write,
Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N,
ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created
9:38:01.9167628 AM    unlink-open.exe    10208    CloseFile C:\src\test    SUCCESS

So the same test run with MoveFileEx():
unlink-open test 150 1000 rename
                unlink-open test2 150 1000 rename
successfully passes for me in the same environment (Windows VM slowed down to 50%).

That is, my idea was to try removing the file by renaming it first, as a fast
path (thus avoiding that troublesome DELETE PENDING state), and if that
fails, to perform the removal as before. Maybe the whole function could be
simplified, but I'm not sure about special cases yet.

* IIUC, the important points is the latter part, which waits until the status is
changed. Based on that, can we remove a double rmtree() from cleanup_output_dirs()?
They seems to be add for the similar motivation.

I haven't yet reproduced the failure that motivated that doubling (IIUC, it
was observed in [1]), even with c28911750 reverted, so I need more time to
research that issue before I can answer this question.

[1]: /messages/by-id/20230131172806.GM22427@telsasoft.com

Best regards,
Alexander

Attachments:

unlink-open.c (text/x-csrc)
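
The attachment body is not reproduced in this archive; a minimal sketch of the kind
of loop such a test program might run is shown below. The structure, the argument
order, and the optional "rename" mode are assumptions based on the description and
the Process Monitor traces above, not the actual unlink-open.c:

```
#include <errno.h>
#include <io.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <windows.h>

/* Hypothetical sketch of an unlink-then-reopen stress loop; not the actual
 * unlink-open.c.  Usage: unlink-open <path> <MB> <iterations> [rename] */
int
main(int argc, char **argv)
{
	const char *path = argv[1];
	int			nmb = atoi(argv[2]);
	int			iters = atoi(argv[3]);
	int			use_rename = (argc > 4 && strcmp(argv[4], "rename") == 0);
	char	   *buf = calloc(1, 1024 * 1024);
	char		tmppath[MAX_PATH];

	snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);
	for (int i = 1; i <= iters; i++)
	{
		FILE	   *f = fopen(path, "wb");

		printf("iteration %d\n", i);
		for (int m = 0; m < nmb; m++)
			fwrite(buf, 1, 1024 * 1024, f);	/* grow the file */
		fclose(f);

		if (use_rename)
		{
			/* "rename" mode: move the file aside first so the original
			 * name never lingers in the DELETE PENDING state */
			if (!MoveFileExA(path, tmppath, MOVEFILE_REPLACE_EXISTING))
			{
				printf("MoveFileEx() failed (%lu)\n", GetLastError());
				return 1;
			}
			DeleteFileA(tmppath);
		}
		else if (_unlink(path) != 0)
		{
			printf("unlink() failed (%d)\n", errno);
			return 1;
		}

		/* reopening the same name is where DELETE PENDING bites */
		f = fopen(path, "wb");
		if (f == NULL)
		{
			printf("fopen() after unlink() failed (%d)\n", errno);
			return 1;
		}
		fclose(f);
	}
	return 0;
}
```
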
#8 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Alexander Lakhin (#7)
RE: Random pg_upgrade test failure on drongo

Dear Alexander,

Please look at the simple test program attached. It demonstrates the
failure for me when running in two sessions as follows:
unlink-open test 150 1000
unlink-open test2 150 1000

Thanks for attaching the program; it helps us understand the issue.
I wanted to confirm your environment: this failure occurred on Windows Server XXXX, right?

That is, my idea was to try removing a file through renaming it as a fast
path (thus avoiding that troublesome state DELETE PENDING), and if that
fails, to perform removal as before. May be the whole function might be
simplified, but I'm not sure about special cases yet.

I felt that your result showed that pgrename() is delayed more rarely than unlink().
That's why a file with the original name no longer exists when the subsequent open() is called.

As for the special cases, I would like more senior developers to check them.

* IIUC, the important points is the latter part, which waits until the status is
changed. Based on that, can we remove a double rmtree() from

cleanup_output_dirs()?

They seems to be add for the similar motivation.

I couldn't yet reproduce a failure, which motivated that doubling (IIUC, it
was observed in [1]), with c28911750 reverted, so I need more time to
research that issue to answer this question.

Yeah, in the first place, this failure seldom occurred....

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#9 Alexander Lakhin
exclusion@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#8)
Re: Random pg_upgrade test failure on drongo

Hello Kuroda-san,

25.11.2023 18:19, Hayato Kuroda (Fujitsu) wrote:

Thanks for attaching a program. This helps us to understand the issue.
I wanted to confirm your env - this failure was occurred on windows server XXXX, right?

I see that behavior on:
Windows 10 Version 1607 (OS Build 14393.0)
Windows Server 2016 Version 1607 (OS Build 14393.0)
Windows Server 2019 Version 1809 (OS Build 17763.1)

But it's not reproduced on:
Windows 10 Version 1809 (OS Build 17763.1) (triple-checked)
Windows Server 2019 Version 1809 (OS Build 17763.592)
Windows 10 Version 22H2 (OS Build 19045.3693)
Windows 11 Version 21H2 (OS Build 22000.613)

So it looks like the failure occurs depending not on the Windows edition, but
rather on its build. For Windows Server 2019 the "good" build is
somewhere between 17763.1 and 17763.592, and for Windows 10 it's between
14393.0 and 17763.1.
(Maybe there was some change related to the FILE_DISPOSITION_POSIX_SEMANTICS/
FILE_DISPOSITION_ON_CLOSE implementation; I don't know where to find
information about that change.)

It would also be interesting to know the full version/build of the OS on drongo and
fairywren.

That is, my idea was to try removing a file through renaming it as a fast
path (thus avoiding that troublesome state DELETE PENDING), and if that
fails, to perform removal as before. May be the whole function might be
simplified, but I'm not sure about special cases yet.

I felt that your result showed pgrename() would be more rarely delayed than unlink().
That's why a file which has original name would not exist when subsequent open() was called.

I think that's because unlink() is performed asynchronously on those old
Windows versions, but rename() is always synchronous.

* IIUC, the important points is the latter part, which waits until the status is
changed. Based on that, can we remove a double rmtree() from

cleanup_output_dirs()?

They seems to be add for the similar motivation.

I couldn't yet reproduce a failure, which motivated that doubling (IIUC, it
was observed in [1]), with c28911750 reverted, so I need more time to
research that issue to answer this question.

Yeah, as the first place, this failure seldom occurred....

I've managed to reproduce that issue (or at least a situation that
manifested similarly) with a sleep added in miscinit.c:
        ereport(IsPostmasterEnvironment ? LOG : NOTICE,
                        (errmsg("database system is shut down")));
+       pg_usleep(500000L);

With this change, I get the same warning as in [1] when running 10 copies of the test
002_pg_upgrade in parallel with a minimal olddump (on iterations
33, 46, and 8). And with my PoC patch applied, I could see the same warning
as well (on iteration 6).

I believe that's because rename() can't rename a directory containing an
open file, just as unlink() can't remove it.

In the light of the above, I think that the issue in question should be
fixed in accordance with, or as a supplement to, [2].

[1]: /messages/by-id/20230131172806.GM22427@telsasoft.com
[2]: /messages/by-id/CA+hUKG+ajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN+icg@mail.gmail.com

Best regards,
Alexander

#10 Andrew Dunstan
andrew@dunslane.net
In reply to: Alexander Lakhin (#9)
Re: Random pg_upgrade test failure on drongo

On 2023-11-27 Mo 07:00, Alexander Lakhin wrote:

Hello Kuroda-san,

25.11.2023 18:19, Hayato Kuroda (Fujitsu) wrote:

Thanks for attaching a program. This helps us to understand the issue.
I wanted to confirm your env - this failure was occurred on windows
server XXXX, right?

I see that behavior on:
Windows 10 Version 1607 (OS Build 14393.0)
Windows Server 2016 Version 1607 (OS Build 14393.0)
Windows Server 2019 Version 1809 (OS Build 17763.1)

But it's not reproduced on:
Windows 10 Version 1809 (OS Build 17763.1) (triple-checked)
Windows Server 2019 Version 1809 (OS Build 17763.592)
Windows 10 Version 22H2 (OS Build 19045.3693)
Windows 11 Version 21H2 (OS Build 22000.613)

So it looks like the failure occurs depending not on Windows edition, but
rather on it's build. For Windows Server 2019 the "good" build is
somewhere between 17763.1 and 17763.592, but for Windows 10 it's between
14393.0 and 17763.1.
(Maybe there was some change related to FILE_DISPOSITION_POSIX_SEMANTICS/
FILE_DISPOSITION_ON_CLOSE implementation; I don't know where to find
information about that change.)

It's also interesting, what is full version/build of OS on drongo and
fairywren.

It's WS 2019 1809/17763.4252. The latest available AFAICT is 17763.5122

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#11 Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#10)
Re: Random pg_upgrade test failure on drongo

On 2023-11-27 Mo 07:39, Andrew Dunstan wrote:

On 2023-11-27 Mo 07:00, Alexander Lakhin wrote:

Hello Kuroda-san,

25.11.2023 18:19, Hayato Kuroda (Fujitsu) wrote:

Thanks for attaching a program. This helps us to understand the issue.
I wanted to confirm your env - this failure was occurred on windows
server XXXX, right?

I see that behavior on:
Windows 10 Version 1607 (OS Build 14393.0)
Windows Server 2016 Version 1607 (OS Build 14393.0)
Windows Server 2019 Version 1809 (OS Build 17763.1)

But it's not reproduced on:
Windows 10 Version 1809 (OS Build 17763.1) (triple-checked)
Windows Server 2019 Version 1809 (OS Build 17763.592)
Windows 10 Version 22H2 (OS Build 19045.3693)
Windows 11 Version 21H2 (OS Build 22000.613)

So it looks like the failure occurs depending not on Windows edition,
but
rather on it's build. For Windows Server 2019 the "good" build is
somewhere between 17763.1 and 17763.592, but for Windows 10 it's between
14393.0 and 17763.1.
(Maybe there was some change related to
FILE_DISPOSITION_POSIX_SEMANTICS/
FILE_DISPOSITION_ON_CLOSE implementation; I don't know where to find
information about that change.)

It's also interesting, what is full version/build of OS on drongo and
fairywren.

It's WS 2019 1809/17763.4252. The latest available AFAICT is 17763.5122

I've updated it to 17763.5122 now.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#12 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Alexander Lakhin (#9)
RE: Random pg_upgrade test failure on drongo

Dear Alexander, Andrew,

Thanks for your analysis!

I see that behavior on:
Windows 10 Version 1607 (OS Build 14393.0)
Windows Server 2016 Version 1607 (OS Build 14393.0)
Windows Server 2019 Version 1809 (OS Build 17763.1)

But it's not reproduced on:
Windows 10 Version 1809 (OS Build 17763.1) (triple-checked)
Windows Server 2019 Version 1809 (OS Build 17763.592)
Windows 10 Version 22H2 (OS Build 19045.3693)
Windows 11 Version 21H2 (OS Build 22000.613)

So it looks like the failure occurs depending not on Windows edition, but
rather on it's build. For Windows Server 2019 the "good" build is
somewhere between 17763.1 and 17763.592, but for Windows 10 it's between
14393.0 and 17763.1.
(Maybe there was some change related to
FILE_DISPOSITION_POSIX_SEMANTICS/
FILE_DISPOSITION_ON_CLOSE implementation; I don't know where to find
information about that change.)

It's also interesting, what is full version/build of OS on drongo and
fairywren.

Thanks for your interest in the issue. I have been tracking the failure, but it has not occurred again.
Your analysis suggests that the BF failures could be solved by updating the OSes.

I think that's because unlink() is performed asynchronously on those old
Windows versions, but rename() is always synchronous.

OK. Actually, I could not find any documentation about that, but your experiment shows the facts.

I've managed to reproduce that issue (or at least a situation that
manifested similarly) with a sleep added in miscinit.c:
ereport(IsPostmasterEnvironment ? LOG : NOTICE,
(errmsg("database system is shut down")));
+ pg_usleep(500000L);

With this change, I get the same warning as in [1] when running in
parallel 10 tests 002_pg_upgrade with a minimal olddump (on iterations
33, 46, 8). And with my PoC patch applied, I could see the same warning
as well (on iteration 6).

I believe that's because rename() can't rename a directory containing an
open file, just as unlink() can't remove it.

In the light of the above, I think that the issue in question should be
fixed in accordance with/as a supplement to [2].

OK, I understand that more needs to be fixed around here. For now, we should stay focused on the main points.

Your patch seems good, but it needs more review from Windows-savvy developers.
What do others think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#13 Alexander Lakhin
exclusion@gmail.com
In reply to: Andrew Dunstan (#11)
1 attachment(s)
Re: Random pg_upgrade test failure on drongo

Hello Andrew and Kuroda-san,

27.11.2023 16:58, Andrew Dunstan wrote:

It's also interesting, what is full version/build of OS on drongo and
fairywren.

It's WS 2019 1809/17763.4252. The latest available AFAICT is 17763.5122

I've updated it to 17763.5122 now.

Thank you for the information! It pushed me to upgrade my Server
2019 1809 17763.592 to 17763.4252. And then I discovered that I had
difficulties reproducing the issue on all my VMs after a reboot (even
on old versions/builds). It took me a while to understand what was going on
and what affects reproduction of the issue.
I was puzzled by the fact that I couldn't reproduce the issue with my
unlink-open test program under seemingly the same conditions as before,
until I realized that the issue reproduces only when the target directory
is open in Windows Explorer.

Now I'm sorry for bringing more mystery into the topic and for the misleading
information.

So, the issue is reproduced only when something scans the working directory
for files or opens them. I added the same logic to my test program (see
unlink-open-scandir attached), and now I see the failure on Windows Server
2019 (Version 10.0.17763.4252).
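
The unlink-open-scandir.c attachment is not reproduced in this archive; purely for
illustration, the scanning side might look roughly like the hypothetical sketch
below, which just enumerates the directory and briefly opens each file with full
sharing, much as a file browser would:

```
#include <stdio.h>
#include <windows.h>

/* Hypothetical sketch of the directory-scanning pass added to the test
 * program; not the actual unlink-open-scandir.c. */
static void
scan_directory_once(const char *dir)
{
	char		pattern[MAX_PATH];
	WIN32_FIND_DATAA fd;
	HANDLE		h;

	snprintf(pattern, sizeof(pattern), "%s\\*", dir);
	h = FindFirstFileA(pattern, &fd);
	if (h == INVALID_HANDLE_VALUE)
		return;
	do
	{
		char		path[MAX_PATH];
		HANDLE		f;

		if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
			continue;
		snprintf(path, sizeof(path), "%s\\%s", dir, fd.cFileName);
		/* open read-only with full sharing, as Explorer typically does */
		f = CreateFileA(path, GENERIC_READ,
						FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
						NULL, OPEN_EXISTING, 0, NULL);
		if (f != INVALID_HANDLE_VALUE)
			CloseHandle(f);
	} while (FindNextFileA(h, &fd));
	FindClose(h);
}
```
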
A script like this:
start cmd /c "unlink-open-scandir test1 10 5000 >log1 2>&1"
...
start cmd /c "unlink-open-scandir test10 10 5000 >log10 2>&1"

results in:
C:\temp>find "failed" log*
---------- LOG1
---------- LOG10
fopen() after unlink() failed (13)
---------- LOG2
fopen() after unlink() failed (13)
---------- LOG3
fopen() after unlink() failed (13)
---------- LOG4
fopen() after unlink() failed (13)
---------- LOG5
fopen() after unlink() failed (13)
---------- LOG6
fopen() after unlink() failed (13)
---------- LOG7
fopen() after unlink() failed (13)
---------- LOG8
fopen() after unlink() failed (13)
---------- LOG9
fopen() after unlink() failed (13)

C:\temp>type log10
...
iteration 108
fopen() after unlink() failed (13)

The same observed on:
Windows 10 Version 1809 (OS Build 17763.1)

But no failures on:
Windows 10 Version 22H2 (OS Build 19045.3693)
Windows 11 Version 21H2 (OS Build 22000.613)

So the behavior change really took place, but my previous estimations were
incorrect (my apologies).

BTW, "rename" mode of the test program can produce more rare errors on
rename:
---------- LOG3
MoveFileEx() failed (0)

but not on open.

30.11.2023 13:00, Hayato Kuroda (Fujitsu) wrote:

Thanks for your interest for the issue. I have been tracking the failure but been not occurred.
Your analysis seems to solve BF failures, by updating OSes.

Yes, but I don't think that leaving Server 2019 behind (I suppose Windows
Server 2019 build 20348 would have the same behaviour as Windows 10 19045)
is acceptable. (Though looking at the Cirrus CI logs, I see that what is
labeled "Windows Server 2019" there is in fact Windows Server 2022.)

I think that's because unlink() is performed asynchronously on those old
Windows versions, but rename() is always synchronous.

OK. Actually I could not find descriptions about them, but your experiment showed facts.

I don't know what this peculiarity is called, but it looks like when some
other process holds the file's handle, unlink() returns as if the file had been
deleted completely, but the subsequent open() fails.

Best regards,
Alexander

Attachments:

unlink-open-scandir.c (text/x-csrc)
#14 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Alexander Lakhin (#5)
RE: Random pg_upgrade test failure on drongo

Dear Alexander,

I agree with your analysis and would like to propose a PoC fix (see
attached). With this patch applied, 20 iterations succeeded for me.

There have been no other reviewers, so I will review it again. Let's move the PoC
toward a concrete patch. Note that I focused only on the fix for the random failure;
other parts are out of scope.

Basically, the code comments should be updated accordingly.

01.

```
/*
* This function might be called for a regular file or for a junction
* point (which we use to emulate symlinks). The latter must be unlinked
* with rmdir() on Windows. Before we worry about any of that, let's see
* if we can unlink directly, since that's expected to be the most common
* case.
*/
snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);
if (pgrename(path, tmppath) == 0)
{
if (unlink(tmppath) == 0)
return 0;
curpath = tmppath;
}
```

You could add a comment above this change, because it is not trivial.
Below is my draft:

```
* XXX: we rename the target file to ".tmp" before calling unlink. On
* Windows, the unlinked file may linger in the STATUS_DELETE_PENDING state,
* in which case creating a file with the same name would fail. Renaming is
* assumed to be a synchronous operation.
```

02.

```
loops = 0;
while (lstat(curpath, &st) < 0 && lstat_error_was_status_delete_pending())
{
if (++loops > 100) /* time out after 10 sec */
return -1;
pg_usleep(100000); /* us */
}
```

A comment could be added above this part. Below is my draft:

```
/*
* Wait until the removal has really finished, to avoid ERRORs when other
* functions try to create a file with the same name.
*/
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#15 Alexander Lakhin
exclusion@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#14)
Re: Random pg_upgrade test failure on drongo

Hello Kuroda-san,

28.12.2023 06:08, Hayato Kuroda (Fujitsu) wrote:

Dear Alexander,

I agree with your analysis and would like to propose a PoC fix (see
attached). With this patch applied, 20 iterations succeeded for me.

There are no reviewers so that I will review again. Let's move the PoC
to the concrete patch. Note that I only focused on fixes of random failure,
other parts are out-of-scope.

Thinking about that fix more, I'm not satisfied with the proposed approach.
First, it turns every unlink operation into two write operations
(rename + unlink), not to mention the new risk of leaving stale .tmp files
around (perhaps that's OK for regular files (MoveFileEx can overwrite existing
files), but not for directories).
Second, it does this on all Windows versions, including modern ones,
which, as we know, are not affected by the issue.

So I started to think about another approach: perform the unlink as it's
implemented now, but then wait until the DELETE_PENDING state is gone.
And I was very surprised to see that this state is not transient in our case.
Additional investigation showed that the test fails not because some outside
process opens a file (concretely, {template1_id/postgres_id}/2683) that is
being deleted, but because of an internal process that opens the file and
holds a handle to it indefinitely.
And the internal process is ... the background writer (BgBufferSync()).

So, I tried just adding bgwriter_lru_maxpages = 0 to postgresql.conf and
got 20 x 10 tests passing.

Thus, if we want just to get rid of the test failure, maybe it's enough to
add this to the test's config...

The other way to go is to find out whether the background writer process
should react to the shared-inval message sent from smgrdounlinkall() and
close its handle to that file.

Maybe we could also (after changing the bgwriter's behaviour) add a waiting
loop into pgwin32_open_handle() to completely rule out transient open()
failures due to some other process (such as Windows Explorer) opening a file
being deleted, but I would not complicate things until we have a clear
vision/plan for using modern APIs/relying on modern OS versions' behavior.
I mean proceeding with something like:
https://commitfest.postgresql.org/40/3951/

Best regards,
Alexander

#16 Amit Kapila
amit.kapila16@gmail.com
In reply to: Alexander Lakhin (#15)
Re: Random pg_upgrade test failure on drongo

On Tue, Jan 2, 2024 at 10:30 AM Alexander Lakhin <exclusion@gmail.com> wrote:

28.12.2023 06:08, Hayato Kuroda (Fujitsu) wrote:

Dear Alexander,

I agree with your analysis and would like to propose a PoC fix (see
attached). With this patch applied, 20 iterations succeeded for me.

There are no reviewers so that I will review again. Let's move the PoC
to the concrete patch. Note that I only focused on fixes of random failure,
other parts are out-of-scope.

Thinking about that fix more, I'm not satisfied with the approach proposed.
First, it turns every unlink operation into two write operations
(rename + unlink), not to say about new risks of having stale .tmp files
(perhaps, it's ok for regular files (MoveFileEx can overwrite existing
files), but not for directories)
Second, it does that on any Windows OS versions, including modern ones,
which are not affected by the issue, as we know.

So I started to think about other approach: to perform unlink as it's
implemented now, but then wait until the DELETE_PENDING state is gone.

There is a comment in the code which suggests we shouldn't wait
indefinitely. See "However, we won't wait indefinitely for someone
else to close the file, as the caller might be holding locks and
blocking other backends."

And I was very surprised to see that this state is not transient in our case.
Additional investigation showed that the test fails not because some aside
process opens a file (concretely, {template1_id/postgres_id}/2683), that is
being deleted, but because of an internal process that opens the file and
holds a handle to it indefinitely.
And the internal process is ... background writer (BgBufferSync()).

So, I tried just adding bgwriter_lru_maxpages = 0 to postgresql.conf and
got 20 x 10 tests passing.

Thus, it we want just to get rid of the test failure, maybe it's enough to
add this to the test's config...

What about checkpoints? Can't it do the same while writing the buffers?

--
With Regards,
Amit Kapila.

#17 Alexander Lakhin
exclusion@gmail.com
In reply to: Amit Kapila (#16)
1 attachment(s)
Re: Random pg_upgrade test failure on drongo

Hello Amit,

03.01.2024 14:42, Amit Kapila wrote:

So I started to think about other approach: to perform unlink as it's
implemented now, but then wait until the DELETE_PENDING state is gone.

There is a comment in the code which suggests we shouldn't wait
indefinitely. See "However, we won't wait indefinitely for someone
else to close the file, as the caller might be holding locks and
blocking other backends."

Yes, I saw it, but initially I thought that we have a transient condition
there, so waiting in open() (instead of failing immediately) seemed like a
good idea then...

And the internal process is ... background writer (BgBufferSync()).

So, I tried just adding bgwriter_lru_maxpages = 0 to postgresql.conf and
got 20 x 10 tests passing.

Thus, it we want just to get rid of the test failure, maybe it's enough to
add this to the test's config...

What about checkpoints? Can't it do the same while writing the buffers?

As we're dealing here with pg_upgrade/pg_restore, it may not be very easy to get
the desired effect, but I think it's not impossible in principle.
More details below.
What happens during the pg_upgrade execution is essentially:
1) CREATE DATABASE "postgres" WITH TEMPLATE = template0 OID = 5 ...;
-- this command flushes file buffers as well
2) ALTER DATABASE postgres OWNER TO ...
3) COMMENT ON DATABASE "postgres" IS ...
4)     -- For binary upgrade, preserve pg_largeobject and index relfilenodes
    SELECT pg_catalog.binary_upgrade_set_next_index_relfilenode('2683'::pg_catalog.oid);
    SELECT pg_catalog.binary_upgrade_set_next_heap_relfilenode('2613'::pg_catalog.oid);
    TRUNCATE pg_catalog.pg_largeobject;
--  ^^^ here we can get the error "could not create file "base/5/2683": File exists"
...

We get the effect discussed when the background writer process decides to
flush a file buffer for pg_largeobject during stage 1.
(Thus, if a checkpoint somehow happened to occur during CREATE DATABASE,
the result must be the same.)
And another important factor is shared_buffers = 1MB (set during the test).
With the default setting of 128MB I couldn't see the failure.

It can be reproduced easily (on old Windows versions) just by running
pg_upgrade in a loop (I got failures on iterations 22, 37, and 17 with the
default cluster).
If the old cluster contains a dozen databases, the failure probability
increases significantly (with 10 additional databases I got failures
on iterations 4, 1, and 6).

Please see the reproducing script attached.

Best regards,
Alexander

Attachments:

pg_upgrade_error-repro.txt (text/plain)
#18 Amit Kapila
amit.kapila16@gmail.com
In reply to: Alexander Lakhin (#17)
Re: Random pg_upgrade test failure on drongo

On Thu, Jan 4, 2024 at 5:30 PM Alexander Lakhin <exclusion@gmail.com> wrote:

03.01.2024 14:42, Amit Kapila wrote:

And the internal process is ... background writer (BgBufferSync()).

So, I tried just adding bgwriter_lru_maxpages = 0 to postgresql.conf and
got 20 x 10 tests passing.

Thus, it we want just to get rid of the test failure, maybe it's enough to
add this to the test's config...

What about checkpoints? Can't it do the same while writing the buffers?

As we deal here with pg_upgrade/pg_restore, it must not be very easy to get
the desired effect, but I think it's not impossible in principle.
More details below.
What happens during the pg_upgrade execution is essentially:
1) CREATE DATABASE "postgres" WITH TEMPLATE = template0 OID = 5 ...;
-- this command flushes file buffers as well
2) ALTER DATABASE postgres OWNER TO ...
3) COMMENT ON DATABASE "postgres" IS ...
4) -- For binary upgrade, preserve pg_largeobject and index relfilenodes
SELECT pg_catalog.binary_upgrade_set_next_index_relfilenode('2683'::pg_catalog.oid);
SELECT pg_catalog.binary_upgrade_set_next_heap_relfilenode('2613'::pg_catalog.oid);
TRUNCATE pg_catalog.pg_largeobject;
-- ^^^ here we can get the error "could not create file "base/5/2683": File exists"
...

We get the effect discussed when the background writer process decides to
flush a file buffer for pg_largeobject during stage 1.
(Thus, if a checkpoint somehow happened to occur during CREATE DATABASE,
the result must be the same.)
And another important factor is shared_buffers = 1MB (set during the test).
With the default setting of 128MB I couldn't see the failure.

It can be reproduced easily (on old Windows versions) just by running
pg_upgrade in a loop (I've got failures on iterations 22, 37, 17 (with the
default cluster)).
If an old cluster contains dozen of databases, this increases the failure
probability significantly (with 10 additional databases I've got failures
on iterations 4, 1, 6).

I don't have an old Windows environment to test, but I agree with your
analysis and theory. The question is what we should do about these new
random BF failures. I think we should set bgwriter_lru_maxpages to 0
and checkpoint_timeout to 1h for these new tests. Doing an invasive
fix as part of this doesn't sound reasonable, because this is an
existing problem and there seems to be another patch by Thomas that
probably deals with the root cause of the existing problem [1], as
pointed out by you.

[1]: https://commitfest.postgresql.org/40/3951/

--
With Regards,
Amit Kapila.

#19 Jim Nasby
jim.nasby@gmail.com
In reply to: Amit Kapila (#18)
Re: Random pg_upgrade test failure on drongo

On 1/4/24 10:19 PM, Amit Kapila wrote:

On Thu, Jan 4, 2024 at 5:30 PM Alexander Lakhin <exclusion@gmail.com> wrote:

03.01.2024 14:42, Amit Kapila wrote:

And the internal process is ... background writer (BgBufferSync()).

So, I tried just adding bgwriter_lru_maxpages = 0 to postgresql.conf and
got 20 x 10 tests passing.

Thus, it we want just to get rid of the test failure, maybe it's enough to
add this to the test's config...

What about checkpoints? Can't it do the same while writing the buffers?

As we deal here with pg_upgrade/pg_restore, it must not be very easy to get
the desired effect, but I think it's not impossible in principle.
More details below.
What happens during the pg_upgrade execution is essentially:
1) CREATE DATABASE "postgres" WITH TEMPLATE = template0 OID = 5 ...;
-- this command flushes file buffers as well
2) ALTER DATABASE postgres OWNER TO ...
3) COMMENT ON DATABASE "postgres" IS ...
4) -- For binary upgrade, preserve pg_largeobject and index relfilenodes
SELECT pg_catalog.binary_upgrade_set_next_index_relfilenode('2683'::pg_catalog.oid);
SELECT pg_catalog.binary_upgrade_set_next_heap_relfilenode('2613'::pg_catalog.oid);
TRUNCATE pg_catalog.pg_largeobject;
-- ^^^ here we can get the error "could not create file "base/5/2683": File exists"
...

We get the effect discussed when the background writer process decides to
flush a file buffer for pg_largeobject during stage 1.
(Thus, if a checkpoint somehow happened to occur during CREATE DATABASE,
the result must be the same.)
And another important factor is shared_buffers = 1MB (set during the test).
With the default setting of 128MB I couldn't see the failure.

It can be reproduced easily (on old Windows versions) just by running
pg_upgrade in a loop (I've got failures on iterations 22, 37, 17 (with the
default cluster)).
If an old cluster contains dozen of databases, this increases the failure
probability significantly (with 10 additional databases I've got failures
on iterations 4, 1, 6).

I don't have an old Windows environment to test but I agree with your
analysis and theory. The question is what should we do for these new
random BF failures? I think we should set bgwriter_lru_maxpages to 0
and checkpoint_timeout to 1hr for these new tests. Doing some invasive
fix as part of this doesn't sound reasonable because this is an
existing problem and there seems to be another patch by Thomas that
probably deals with the root cause of the existing problem [1] as
pointed out by you.

[1] - https://commitfest.postgresql.org/40/3951/

Isn't this just sweeping the problem (non-POSIX behavior on SMB and
ReFS) under the carpet? I realize that synthetic test workloads like
pg_upgrade in a loop aren't themselves real-world scenarios, but what
about other cases? Even if we're certain it's not possible for these
issues to wedge a server, it's still not a good experience for users to
get random, unexplained IO-related errors...
--
Jim Nasby, Data Architect, Austin TX

#20 Amit Kapila
amit.kapila16@gmail.com
In reply to: Jim Nasby (#19)
Re: Random pg_upgrade test failure on drongo

On Mon, Jan 8, 2024 at 9:36 PM Jim Nasby <jim.nasby@gmail.com> wrote:

On 1/4/24 10:19 PM, Amit Kapila wrote:

On Thu, Jan 4, 2024 at 5:30 PM Alexander Lakhin <exclusion@gmail.com> wrote:

03.01.2024 14:42, Amit Kapila wrote:

And the internal process is ... background writer (BgBufferSync()).

So, I tried just adding bgwriter_lru_maxpages = 0 to postgresql.conf and
got 20 x 10 tests passing.

Thus, it we want just to get rid of the test failure, maybe it's enough to
add this to the test's config...

What about checkpoints? Can't it do the same while writing the buffers?

As we deal here with pg_upgrade/pg_restore, it must not be very easy to get
the desired effect, but I think it's not impossible in principle.
More details below.
What happens during the pg_upgrade execution is essentially:
1) CREATE DATABASE "postgres" WITH TEMPLATE = template0 OID = 5 ...;
-- this command flushes file buffers as well
2) ALTER DATABASE postgres OWNER TO ...
3) COMMENT ON DATABASE "postgres" IS ...
4) -- For binary upgrade, preserve pg_largeobject and index relfilenodes
SELECT pg_catalog.binary_upgrade_set_next_index_relfilenode('2683'::pg_catalog.oid);
SELECT pg_catalog.binary_upgrade_set_next_heap_relfilenode('2613'::pg_catalog.oid);
TRUNCATE pg_catalog.pg_largeobject;
-- ^^^ here we can get the error "could not create file "base/5/2683": File exists"
...

We get the effect discussed when the background writer process decides to
flush a file buffer for pg_largeobject during stage 1.
(Thus, if a checkpoint somehow happened to occur during CREATE DATABASE,
the result must be the same.)
And another important factor is shared_buffers = 1MB (set during the test).
With the default setting of 128MB I couldn't see the failure.

It can be reproduced easily (on old Windows versions) just by running
pg_upgrade in a loop (I've got failures on iterations 22, 37, 17 (with the
default cluster)).
If an old cluster contains dozen of databases, this increases the failure
probability significantly (with 10 additional databases I've got failures
on iterations 4, 1, 6).

I don't have an old Windows environment to test but I agree with your
analysis and theory. The question is what should we do for these new
random BF failures? I think we should set bgwriter_lru_maxpages to 0
and checkpoint_timeout to 1hr for these new tests. Doing some invasive
fix as part of this doesn't sound reasonable because this is an
existing problem and there seems to be another patch by Thomas that
probably deals with the root cause of the existing problem [1] as
pointed out by you.

[1] - https://commitfest.postgresql.org/40/3951/

Isn't this just sweeping the problem (non-POSIX behavior on SMB and
ReFS) under the carpet? I realize that synthetic test workloads like
pg_upgrade in a loop aren't themselves real-world scenarios, but what
about other cases? Even if we're certain it's not possible for these
issues to wedge a server, it's still not a good experience for users to
get random, unexplained IO-related errors...

The point is that this is an existing, known Windows behavior, and only in
certain versions at that. The fix doesn't seem to be straightforward, so it
seems advisable to avoid the random BF failures by using an appropriate
configuration.

--
With Regards,
Amit Kapila.

#21 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#18)
1 attachment(s)
RE: Random pg_upgrade test failure on drongo

Dear Amit, Alexander,

We get the effect discussed when the background writer process decides to
flush a file buffer for pg_largeobject during stage 1.
(Thus, if a checkpoint somehow happened to occur during CREATE DATABASE,
the result must be the same.)
And another important factor is shared_buffers = 1MB (set during the test).
With the default setting of 128MB I couldn't see the failure.

It can be reproduced easily (on old Windows versions) just by running
pg_upgrade in a loop (I've got failures on iterations 22, 37, 17 (with the
default cluster)).
If an old cluster contains a dozen databases, this increases the failure
probability significantly (with 10 additional databases I've got failures
on iterations 4, 1, 6).

I don't have an old Windows environment to test but I agree with your
analysis and theory. The question is what should we do for these new
random BF failures? I think we should set bgwriter_lru_maxpages to 0
and checkpoint_timeout to 1hr for these new tests. Doing some invasive
fix as part of this doesn't sound reasonable because this is an
existing problem and there seems to be another patch by Thomas that
probably deals with the root cause of the existing problem [1] as
pointed out by you.

[1] - https://commitfest.postgresql.org/40/3951/

Based on the suggestion by Amit, I have created a patch with the alternative
approach. It only adjusts GUC settings. The reported failure is only for
003_logical_slots, but the patch also includes changes for the recently added
test, 004_subscription. IIUC, there is a possibility that 004 could fail as well.

Per our understanding, this patch should stop the random failures. Alexander,
could you test it to confirm?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

0001-Add-GUC-settings-to-avoid-random-failures-on-Windows.patch (application/octet-stream)
From d80da7f3433ef66ca80006ee35bd3f47ea1cc3fc Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Tue, 9 Jan 2024 05:16:15 +0000
Subject: [PATCH] Add GUC settings to avoid random failures on Windows

---
 src/bin/pg_upgrade/t/003_logical_slots.pl |  4 ++++
 src/bin/pg_upgrade/t/004_subscription.pl  | 12 +++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0e7014ecce..b880fed804 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -22,6 +22,10 @@ $oldpub->append_conf('postgresql.conf', 'autovacuum = off');
 # Initialize new cluster
 my $newpub = PostgreSQL::Test::Cluster->new('newpub');
 $newpub->init(allows_streaming => 'logical');
+$newpub->append_conf('postgresql.conf', q{
+bgwriter_lru_maxpages = 0
+checkpoint_timeout = 1h
+});
 
 # Setup a common pg_upgrade command to be used by all the test cases
 my @pg_upgrade_cmd = (
diff --git a/src/bin/pg_upgrade/t/004_subscription.pl b/src/bin/pg_upgrade/t/004_subscription.pl
index 9e8f9b7970..0c2342d430 100644
--- a/src/bin/pg_upgrade/t/004_subscription.pl
+++ b/src/bin/pg_upgrade/t/004_subscription.pl
@@ -27,6 +27,11 @@ my $oldbindir = $old_sub->config_data('--bindir');
 # Initialize the new subscriber
 my $new_sub = PostgreSQL::Test::Cluster->new('new_sub');
 $new_sub->init;
+$new_sub->append_conf('postgresql.conf', q{
+bgwriter_lru_maxpages = 0
+checkpoint_timeout = 1h
+});
+
 my $newbindir = $new_sub->config_data('--bindir');
 
 # In a VPATH build, we'll be started in the source directory, but we want
@@ -200,7 +205,12 @@ $old_sub->stop;
 # ------------------------------------------------------
 my $new_sub1 = PostgreSQL::Test::Cluster->new('new_sub1');
 $new_sub1->init;
-$new_sub1->append_conf('postgresql.conf', "max_replication_slots = 0");
+$new_sub1->append_conf('postgresql.conf', q{
+bgwriter_lru_maxpages = 0
+checkpoint_timeout = 1h
+max_replication_slots = 0
+});
+
 
 # pg_upgrade will fail because the new cluster has insufficient
 # max_replication_slots.
-- 
2.43.0

#22Alexander Lakhin
exclusion@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#21)
Re: Random pg_upgrade test failure on drongo

Hello Kuroda-san,

09.01.2024 08:49, Hayato Kuroda (Fujitsu) wrote:

Based on the suggestion by Amit, I have created a patch with the alternative
approach. It only adjusts GUC settings. The reported failure is only for
003_logical_slots, but the patch also includes changes for the recently added
test, 004_subscription. IIUC, there is a possibility that 004 could fail as well.

Per our understanding, this patch should stop the random failures. Alexander,
could you test it to confirm?

Yes, the patch fixes the issue for me (without the patch I observe failures
on iterations 1-2, with 10 tests running in parallel, but with the patch
10 iterations succeeded).

But as far as I can see, 004_subscription is not affected by the issue,
because it doesn't enable streaming for nodes new_sub, new_sub1.
As I noted before, I could see the failure only with
shared_buffers = 1MB (which is set with allows_streaming => 'logical').
So I'm not sure whether we need to modify 004 (or any other test that
runs pg_upgrade).
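
(For reference, a minimal sketch of that difference, using the init calls from
the patch above; the conservative settings applied by the TAP harness under
allows_streaming are as described in the analysis above:)

```
use PostgreSQL::Test::Cluster;

# 003_logical_slots.pl: streaming is enabled, so the harness applies its
# conservative settings to the new cluster, including the small
# shared_buffers noted above.
my $newpub = PostgreSQL::Test::Cluster->new('newpub');
$newpub->init(allows_streaming => 'logical');

# 004_subscription.pl: plain init, so shared_buffers keeps its default and
# the window for bgwriter to hold a file handle during pg_upgrade is much
# smaller.
my $new_sub = PostgreSQL::Test::Cluster->new('new_sub');
$new_sub->init;
```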

As to checkpoint_timeout, personally I would not increase it, because it
seems unbelievable to me that pg_restore (with the cluster containing only
two empty databases) can run for longer than 5 minutes. I'd rather
investigate such a situation separately, in case we encounter it, but maybe
it's only me.
On the other hand, if a checkpoint could occur for some reason within a
shorter time span, then increasing the timeout would not matter, I suppose.
(I've also tested the bgwriter_lru_maxpages-only modification of your patch
and can confirm that it works as well.)
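
(That variant amounts to roughly the following one-line change for the new
cluster in 003_logical_slots.pl, leaving checkpoint_timeout at its default;
a sketch, not the posted patch:)

```
# bgwriter_lru_maxpages-only variant: disable LRU background writing on the
# new cluster; checkpoint_timeout is left alone.
$newpub->append_conf('postgresql.conf', 'bgwriter_lru_maxpages = 0');
```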

Best regards,
Alexander

#23Amit Kapila
amit.kapila16@gmail.com
In reply to: Alexander Lakhin (#22)
Re: Random pg_upgrade test failure on drongo

On Tue, Jan 9, 2024 at 2:30 PM Alexander Lakhin <exclusion@gmail.com> wrote:

09.01.2024 08:49, Hayato Kuroda (Fujitsu) wrote:

Based on the suggestion by Amit, I have created a patch with the alternative
approach. It only adjusts GUC settings. The reported failure is only for
003_logical_slots, but the patch also includes changes for the recently added
test, 004_subscription. IIUC, there is a possibility that 004 could fail as well.

Per our understanding, this patch should stop the random failures. Alexander,
could you test it to confirm?

Yes, the patch fixes the issue for me (without the patch I observe failures
on iterations 1-2, with 10 tests running in parallel, but with the patch
10 iterations succeeded).

But as far as I can see, 004_subscription is not affected by the issue,
because it doesn't enable streaming for nodes new_sub, new_sub1.
As I noted before, I could see the failure only with
shared_buffers = 1MB (which is set with allows_streaming => 'logical').
So I'm not sure whether we need to modify 004 (or any other test that
runs pg_upgrade).

I see your point, and the probable reason for the failure with
shared_buffers=1MB is that the probability of bgwriter holding the
file handle for pg_largeobject increases. So, let's change it only for
003.

As to checkpoint_timeout, personally I would not increase it, because it
seems unbelievable to me that pg_restore (with the cluster containing only
two empty databases) can run for longer than 5 minutes. I'd rather
investigate such situation separately, in case we encounter it, but maybe
it's only me.

I feel it is okay to set a higher value of checkpoint_timeout for the same
reason, though the probability is lower. I feel it is important to explain
in the comments why we are using these settings in the new test. I have
thought of something like: "During the upgrade, bgwriter or checkpointer
could hold the file handle for some removed file. Now, during restore, when
we try to create a file with the same name, it errors out. This behavior is
specific only to certain Windows versions, and the probability of seeing
this behavior is higher in this test because we use wal_level = logical via
allows_streaming => 'logical', which in turn sets shared_buffers to 1MB."

Thoughts?

--
With Regards,
Amit Kapila.

#24Alexander Lakhin
exclusion@gmail.com
In reply to: Amit Kapila (#23)
Re: Random pg_upgrade test failure on drongo

Hello Amit,

09.01.2024 13:08, Amit Kapila wrote:

As to checkpoint_timeout, personally I would not increase it, because it
seems unbelievable to me that pg_restore (with the cluster containing only
two empty databases) can run for longer than 5 minutes. I'd rather
investigate such a situation separately, in case we encounter it, but maybe
it's only me.

I feel it is okay to set a higher value of checkpoint_timeout for the same
reason, though the probability is lower. I feel it is important to explain
in the comments why we are using these settings in the new test. I have
thought of something like: "During the upgrade, bgwriter or checkpointer
could hold the file handle for some removed file. Now, during restore, when
we try to create a file with the same name, it errors out. This behavior is
specific only to certain Windows versions, and the probability of seeing
this behavior is higher in this test because we use wal_level = logical via
allows_streaming => 'logical', which in turn sets shared_buffers to 1MB."

Thoughts?

I would describe that behavior as "During upgrade, when pg_restore performs
CREATE DATABASE, bgwriter or checkpointer may flush buffers and hold a file
handle for pg_largeobject, so a later TRUNCATE pg_largeobject command will
fail if the OS (such as older Windows versions) doesn't completely remove an
unlinked file while it is still open. ..."

Best regards,
Alexander

#25Amit Kapila
amit.kapila16@gmail.com
In reply to: Alexander Lakhin (#24)
1 attachment(s)
Re: Random pg_upgrade test failure on drongo

On Tue, Jan 9, 2024 at 4:30 PM Alexander Lakhin <exclusion@gmail.com> wrote:

09.01.2024 13:08, Amit Kapila wrote:

As to checkpoint_timeout, personally I would not increase it, because it
seems unbelievable to me that pg_restore (with the cluster containing only
two empty databases) can run for longer than 5 minutes. I'd rather
investigate such a situation separately, in case we encounter it, but maybe
it's only me.

I feel it is okay to set a higher value of checkpoint_timeout for the same
reason, though the probability is lower. I feel it is important to explain
in the comments why we are using these settings in the new test. I have
thought of something like: "During the upgrade, bgwriter or checkpointer
could hold the file handle for some removed file. Now, during restore, when
we try to create a file with the same name, it errors out. This behavior is
specific only to certain Windows versions, and the probability of seeing
this behavior is higher in this test because we use wal_level = logical via
allows_streaming => 'logical', which in turn sets shared_buffers to 1MB."

Thoughts?

I would describe that behavior as "During upgrade, when pg_restore performs
CREATE DATABASE, bgwriter or checkpointer may flush buffers and hold a file
handle for pg_largeobject, so a later TRUNCATE pg_largeobject command will
fail if the OS (such as older Windows versions) doesn't completely remove an
unlinked file while it is still open. ..."

I am slightly hesitant to add any particular system table name in the
comments, as this can happen for any other system table as well, so I have
slightly adjusted the comments in the attached. However, I think it is
okay to mention the particular system table name in the commit
message. Let me know what you think.

--
With Regards,
Amit Kapila.

Attachments:

v2-0001-Fix-an-intermetant-BF-failure-in-003_logical_slot.patch (application/octet-stream)
From 62adfa86fb94aea38a1f4d5d62e0d817cba58f33 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Wed, 10 Jan 2024 14:57:52 +0530
Subject: [PATCH v2] Fix an intermittent BF failure in 003_logical_slots.

During upgrade, when pg_restore performs CREATE DATABASE, bgwriter or
checkpointer may flush buffers and hold a file handle for pg_largeobject,
so a later TRUNCATE pg_largeobject command will fail if the OS (such as
older Windows versions) doesn't completely remove an unlinked file while
it is still open. The probability of seeing this behavior is higher in
this test because we use wal_level = logical via allows_streaming =>
'logical', which in turn sets shared_buffers to 1MB and makes it more
probable for bgwriter to hold the file handle.

Diagnosed-by: Alexander Lakhin
Author: Hayato Kuroda, Amit Kapila
Reviewed-by: Alexander Lakhin
Discussion: https://postgr.es/m/TYAPR01MB5866AB7FD922CE30A2565B8BF5A8A@TYAPR01MB5866.jpnprd01.prod.outlook.com
---
 src/bin/pg_upgrade/t/003_logical_slots.pl | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl
index 0e7014ecce..0ab368247b 100644
--- a/src/bin/pg_upgrade/t/003_logical_slots.pl
+++ b/src/bin/pg_upgrade/t/003_logical_slots.pl
@@ -23,6 +23,19 @@ $oldpub->append_conf('postgresql.conf', 'autovacuum = off');
 my $newpub = PostgreSQL::Test::Cluster->new('newpub');
 $newpub->init(allows_streaming => 'logical');
 
+# During upgrade, when pg_restore performs CREATE DATABASE, bgwriter or
+# checkpointer may flush buffers and hold a file handle for a system table.
+# So, if we later need to re-create the file with the same name, for example
+# via a TRUNCATE command on the same table, then the command will fail if
+# the OS (such as older Windows versions) doesn't completely remove an
+# unlinked file while it is still open. The probability of seeing this
+# behavior is higher in this test because we use wal_level = logical via
+# allows_streaming => 'logical', which in turn sets shared_buffers to 1MB.
+$newpub->append_conf('postgresql.conf', q{
+bgwriter_lru_maxpages = 0
+checkpoint_timeout = 1h
+});
+
 # Setup a common pg_upgrade command to be used by all the test cases
 my @pg_upgrade_cmd = (
 	'pg_upgrade', '--no-sync',
-- 
2.28.0.windows.1

#26Alexander Lakhin
exclusion@gmail.com
In reply to: Amit Kapila (#25)
Re: Random pg_upgrade test failure on drongo

10.01.2024 12:31, Amit Kapila wrote:

I am slightly hesitant to add any particular system table name in the
comments, as this can happen for any other system table as well, so I have
slightly adjusted the comments in the attached. However, I think it is
okay to mention the particular system table name in the commit
message. Let me know what you think.

Thank you, Amit!

I'd like to note that the culprit is exactly pg_largeobject as coded in
dumpDatabase():
    /*
     * pg_largeobject comes from the old system intact, so set its
     * relfrozenxids, relminmxids and relfilenode.
     */
    if (dopt->binary_upgrade)
...
        appendPQExpBufferStr(loOutQry,
                             "TRUNCATE pg_catalog.pg_largeobject;\n");

I see no other TRUNCATEs (or similar logic) around, so I would specify the
table name in the comments. Though maybe I'm missing something...

Best regards,
Alexander

#27Amit Kapila
amit.kapila16@gmail.com
In reply to: Alexander Lakhin (#26)
Re: Random pg_upgrade test failure on drongo

On Wed, Jan 10, 2024 at 3:30 PM Alexander Lakhin <exclusion@gmail.com> wrote:

10.01.2024 12:31, Amit Kapila wrote:

I am slightly hesitant to add any particular system table name in the
comments, as this can happen for any other system table as well, so I have
slightly adjusted the comments in the attached. However, I think it is
okay to mention the particular system table name in the commit
message. Let me know what you think.

Thank you, Amit!

I'd like to note that the culprit is exactly pg_largeobject as coded in
dumpDatabase():
    /*
     * pg_largeobject comes from the old system intact, so set its
     * relfrozenxids, relminmxids and relfilenode.
     */
    if (dopt->binary_upgrade)
...
        appendPQExpBufferStr(loOutQry,
                             "TRUNCATE pg_catalog.pg_largeobject;\n");

I see no other TRUNCATEs (or similar logic) around, so I would specify the
table name in the comments. Though maybe I'm missing something...

But tomorrow it could happen for other tables, and if we change this
TRUNCATE logic for pg_largeobject (the chances of which are low), then
there is always a chance that one misses updating this comment. I feel
keeping it generic in this case would be better, as the problem is
generic but it is currently seen for pg_largeobject.

--
With Regards,
Amit Kapila.

#28Alexander Lakhin
exclusion@gmail.com
In reply to: Amit Kapila (#27)
Re: Random pg_upgrade test failure on drongo

10.01.2024 13:37, Amit Kapila wrote:

But tomorrow it could happen for other tables, and if we change this
TRUNCATE logic for pg_largeobject (the chances of which are low), then
there is always a chance that one misses updating this comment. I feel
keeping it generic in this case would be better, as the problem is
generic but it is currently seen for pg_largeobject.

Yes, for sure. So let's keep it generic as you prefer.

Thank you!

Best regards,
Alexander

#29Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Alexander Lakhin (#28)
RE: Random pg_upgrade test failure on drongo

Dear Alexander, Amit,

But tomorrow it could happen for other tables, and if we change this
TRUNCATE logic for pg_largeobject (the chances of which are low), then
there is always a chance that one misses updating this comment. I feel
keeping it generic in this case would be better, as the problem is
generic but it is currently seen for pg_largeobject.

Yes, for sure. So let's keep it generic as you prefer.

Thank you!

Thanks for working on the patch. I'm also OK with pushing Amit's fix patch.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#30Amit Kapila
amit.kapila16@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#29)
Re: Random pg_upgrade test failure on drongo

On Thu, Jan 11, 2024 at 8:15 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

But tomorrow it could happen for other tables, and if we change this
TRUNCATE logic for pg_largeobject (the chances of which are low), then
there is always a chance that one misses updating this comment. I feel
keeping it generic in this case would be better, as the problem is
generic but it is currently seen for pg_largeobject.

Yes, for sure. So let's keep it generic as you prefer.

Thank you!

Thanks for working on the patch. I'm also OK with pushing Amit's fix patch.

Thanks to both of you. I have pushed the patch.

--
With Regards,
Amit Kapila.

#31Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Amit Kapila (#30)
RE: Random pg_upgrade test failure on drongo

Dear hackers,

Thanks to both of you. I have pushed the patch.

I have been tracking the BF animals these days, and this failure has not been seen anymore.
I think we can close the topic. Again, thanks for all the efforts.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED