Cygwin cleanup
Hi,
Continuing a discussion started over at [1]. Moving this to a new
thread so that other thread can focus on Unix cleanup, and both
threads can get CI coverage...
1. In a few places, it is alleged that both __CYGWIN__ and WIN32
might be defined at the same time. Do you think we should try to get
rid of that possibility? I understand that we have to have a few
tests for __CYGWIN__ here and there, because eg file permissions don't
work quite right and there's not much we can do about that. But it
seems a bit unhelpful if we also have to worry about a more-or-less
POSIX-ish build taking WIN32 paths at uncertain times if we forget to
defend against that, or wonder why some places are not consistent.
A quick recap of the three flavours of Windows platform we have to
handle, as I understand it:
* MSVC: Windowsy toolchain, Windowsy C
  * custom perl scripts instead of configure
  * msbuild instead of make
  * MSVC compiler
  * Windows C APIs
  * we provide our own emulation of some POSIX C APIs on top
* MSYS: Unixy toolchain, Windowsy C
  * configure (portname = "win32")
  * make
  * GCC compiler
  * Windows C APIs
  * we provide our own emulation of some POSIX C APIs on top
* Cygwin: Unixy toolchain, Unixy C
  * configure (portname = "cygwin")
  * make
  * GCC compiler
  * POSIX C APIs (emulations provided by the Cygwin runtime libraries)
(The configure/make part will be harmonised by the Meson project.)
The macro WIN32 is visibly defined by something in/near msbuild in
MSVC builds: /D WIN32 is right here in the build transcripts (whereas
the compiler defines _WIN32; good compiler). I am not sure how
exactly it is first defined in MSYS builds; I suspect that MSYS gcc
might define it itself, but I don't have access to MSYS to check. As
for Cygwin, the only translation unit where I could find both
__CYGWIN__ and WIN32 defined is dirmod.c, and that results from
including <windows.h> and ultimately <minwindef.h> (even though WIN32
isn't defined yet at that time). I couldn't understand why we do
that, but I probably didn't read enough commit history. The purpose
of dirmod.c on Cygwin today is only to wrap otherwise pure POSIX code
in retry loops to handle those spurious EACCES errors due to NT
sharing violations, so there is no need for that.
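As an illustration of the retry-loop idea described above, here is a minimal sketch in plain POSIX C. It is not the dirmod.c code: the function name rename_with_retry, the max_tries parameter and the 100ms sleep are all assumptions made for the example.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: retry rename() while it fails with EACCES, which
 * is how a sharing violation may surface through a POSIX layer.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
rename_with_retry(const char *from, const char *to, int max_tries)
{
    int tries = 0;

    while (rename(from, to) < 0)
    {
        if (errno != EACCES)
            return -1;          /* a real error: report it immediately */
        if (++tries >= max_tries)
            return -1;          /* still EACCES after max_tries: give up */
        usleep(100000);         /* assumed delay: 100ms before retrying */
    }
    return 0;
}
```

Nothing in such a loop needs a Windows header: only errno values and POSIX calls are involved.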
Proposal: let's make it a programming rule that we don't allow
definitions from Windows headers to leak into Cygwin translation
units, preferably by never including them, or if we really must, let's
grant specific exemptions in an isolated and well documented way. We
don't seem to need any such exemptions currently. Places where we
currently worry about the contradictory macros could become
conditional #error directives instead.
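A minimal sketch of what such a guard could look like, assuming it lives in some commonly included header. The helper build_flavour() is purely illustrative and is not a PostgreSQL function.

```c
/*
 * Refuse to compile if a Cygwin translation unit has picked up WIN32,
 * for example by including a Windows header.
 */
#if defined(__CYGWIN__) && defined(WIN32)
#error "__CYGWIN__ and WIN32 must not be defined at the same time"
#endif

/* Hypothetical helper: with the guard above, the branches cannot overlap. */
static const char *
build_flavour(void)
{
#if defined(WIN32)
    return "native Windows (MSVC or MSYS)";
#elif defined(__CYGWIN__)
    return "Cygwin";
#else
    return "POSIX";
#endif
}
```

With the guard in place, code can test defined(WIN32) alone rather than defined(WIN32) && !defined(__CYGWIN__).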
2. To make it possible to test any of that, you either need a working
Windows+Cygwin setup, or working CI. I'm a salty old Unix hacker so I
opted for the latter, and I also hope this will eventually be useful
to others. Unfortunately I haven't figured out how to get everything
working yet, so some of the check-world tests are failing. Clues most
welcome!
The version I'm posting here is set to run always, so that cfbot will
show it alongside others. But I would imagine that if we got a
committable-quality version of this, it'd probably be opt-in, so you'd
have to say "ci-os-only: cygwin", or "ci-os-only: cygwin, windows" etc
in a commit to your private github account to ask for it (or maybe
we'll come up with a way to tell cfbot we want the full works of CI
checks; the same decision will come up for MSYS, OpenBSD and NetBSD CI
support that my colleague is working on). There are other things to
fix too, including abysmal performance; see commit message.
3. You can't really run PostgreSQL on Cygwin for real, because its
implementation of signals does not have reliable signal masking, so
unsubtle and probably also subtle breakage occurs. That was reported
upstream by Noah years ago, but they aren't working on a fix.
lorikeet shows random failures, and presumably any CI system will do
the same... I even wondered about putting our own magic entry/exit
macros into signal handlers, that would use atomics to implement a
second level of signal masking (?!) but that's an uncommonly large
bandaid for a defective platform... and trying to fix Cygwin itself
is a rabbithole too far for me.
4. When building with Cygwin GCC 11.3 you get a bunch of warnings
that don't show up on other platforms, seemingly indicating that it
interprets -Wimplicit-fallthrough=3 differently. Huh?
[1]: /messages/by-id/CA+hUKGKZ_FjkBnjGADk+pa2g4oKDcG8=SE5V23sPTP0EELfyzQ@mail.gmail.com
Attachments:
0001-WIP-CI-support-for-Cygwin.patch (text/x-patch)
From 8f300f5b804d5fd2268709d40e31b52c86d6799c Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 25 Jul 2022 23:05:10 +1200
Subject: [PATCH 1/2] WIP CI support for Cygwin.
XXX Doesn't get all the way through yet...
XXX Needs some --with-X options
XXX This should use a canned Docker image with all the right packages
installed
XXX We would never want this to run by default in CI, but it'd be nice
to be able to ask for it with ci-os-only! (See commented out line)
XXX configure is soooo slooow, can we cache it?! Compiling is also
insanely slow, but ccache gets it down to a couple of minutes if you're
lucky
XXX I don't know how to put variables like BUILD_JOBS into the scripts
XXX I have no idea if crash dump works, and if this should share
elements with the msys work in commitfest #3575
---
.cirrus.yml | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/.cirrus.yml b/.cirrus.yml
index f23d6cae55..b5238f5f52 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -456,6 +456,57 @@ task:
path: "crashlog-*.txt"
type: text/plain
+task:
+ name: Windows - Cygwin
+
+ env:
+ CPUS: 4
+ BUILD_JOBS: 4
+ TEST_JOBS: 8
+
+ #only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+
+ windows_container:
+ image: cirrusci/windowsservercore:2019
+ os_version: 2019
+ cpu: $CPUS
+ memory: 4G
+
+ ccache_cache:
+ folder: C:\tools\cygwin\tmp\ccache
+
+ sysinfo_script: |
+ chcp
+ systeminfo
+ powershell -Command get-psdrive -psprovider filesystem
+ set
+
+ setup_additional_packages_script: |
+ choco install -y --no-progress cygwin
+ C:\tools\cygwin\cygwinsetup.exe -q -P cygrunsrv,make,gcc-core,ccache,binutils,libtool,pkg-config,flex,bison,zlib-devel,libssl-devel,libreadline-devel,perl,perl-IPC-Run
+ C:\tools\cygwin\bin\bash.exe --login -c "cygserver-config -y" || EXIT /b 1
+ C:\tools\cygwin\bin\bash.exe --login -c "echo 'kern.ipc.semmni 1024' >> /etc/cygserver.conf" || EXIT /b 1
+ C:\tools\cygwin\bin\bash.exe --login -c "echo 'kern.ipc.semmns 1024' >> /etc/cygserver.conf" || EXIT /b 1
+ C:\tools\cygwin\bin\bash.exe --login -c "net start cygserver" || EXIT /b 1
+
+ configure_script:
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && CCACHE_DIR=/tmp/ccache ./configure --enable-cassert --enable-debug --enable-tap-tests CC='ccache gcc'" || EXIT /b 1
+
+ build_script:
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && CCACHE_DIR=/tmp/ccache make -s -j4 world-bin" || EXIT /b 1
+
+ test_world_script:
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && CCACHE_DIR=/tmp/ccache make -s -j1 check-world -Otarget" || EXIT /b 1
+
+ on_failure:
+ <<: *on_failure
+ crashlog_artifacts:
+ path: "crashlog-*.txt"
+ type: text/plain
+
+ always:
+ upload_caches: ccache
+
task:
name: CompilerWarnings
--
2.35.1
0002-WIP-Do-not-pollute-Cygwin-namespace-with-Windows-hea.patch (text/x-patch)
From ea47a8af0332876629b53620788d40bcb7b1e96c Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 26 Jul 2022 12:56:06 +1200
Subject: [PATCH 2/2] WIP Do not pollute Cygwin namespace with Windows headers.
Establish that __CYGWIN__ and WIN32 should generally not be defined at
the same time. If ever it's necessary to access Windows APIs directly
when building for Cygwin, that should be done in an extremely localized
way, not allowing Windows macros and declarations to leak into other
translation units.
dirmod.c initially looked like a potential case for a localized
exemption, but on closer inspection it doesn't currently have any reason
to include <windows.h> -- so don't.
In passing, remove anachronistic comments about ancient Windows
versions.
---
src/include/pg_config_manual.h | 2 +-
src/include/port.h | 17 +++++++------
src/port/dirmod.c | 46 +++++++++++++++++-----------------
3 files changed, 33 insertions(+), 32 deletions(-)
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 5ee2c46267..0d1d5d7c9b 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -148,7 +148,7 @@
* fork()). On other platforms, it's only useful for verifying those
* otherwise Windows-specific code paths.
*/
-#if defined(WIN32) && !defined(__CYGWIN__)
+#if defined(WIN32)
#define EXEC_BACKEND
#endif
diff --git a/src/include/port.h b/src/include/port.h
index d39b04141f..8cf1112a54 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -18,10 +18,12 @@
/*
* Windows has enough specialized port stuff that we push most of it off
* into another file.
- * Note: Some CYGWIN includes might #define WIN32.
*/
-#if defined(WIN32) && !defined(__CYGWIN__)
+#if defined(WIN32)
#include "port/win32_port.h"
+#if defined(__CYGWIN__)
+#error "__CYGWIN__ should not be defined at the same time as WIN32"
+#endif
#endif
/* socket has a different definition on WIN32 */
@@ -150,7 +152,7 @@ extern int pg_disable_aslr(void);
#define EXE ""
#endif
-#if defined(WIN32) && !defined(__CYGWIN__)
+#if defined(WIN32)
#define DEVNULL "nul"
#else
#define DEVNULL "/dev/null"
@@ -276,12 +278,11 @@ extern int pgunlink(const char *path);
* Win32 also doesn't have symlinks, but we can emulate them with
* junction points on newer Win32 versions.
*
- * Cygwin has its own symlinks which work on Win95/98/ME where
- * junction points don't, so use those instead. We have no way of
- * knowing what type of system Cygwin binaries will be run on.
- * Note: Some CYGWIN includes might #define WIN32.
+ * Cygwin has its own symlinks that work where junction points don't, so use
+ * those instead. We have no way of knowing what type of system Cygwin
+ * binaries will be run on.
*/
-#if defined(WIN32) && !defined(__CYGWIN__)
+#if defined(WIN32)
extern int pgsymlink(const char *oldpath, const char *newpath);
extern int pgreadlink(const char *path, char *buf, size_t size);
extern bool pgwin32_is_junction(const char *path);
diff --git a/src/port/dirmod.c b/src/port/dirmod.c
index 7ce042e75d..1bfbead098 100644
--- a/src/port/dirmod.c
+++ b/src/port/dirmod.c
@@ -6,8 +6,10 @@
* Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
- * This includes replacement versions of functions that work on
- * Win32 (NT4 and newer).
+ * This includes replacement versions of functions that work on Windows.
+ * For Cygwin, the purpose of these replacements is to provide retry loops
+ * around POSIX functions. For native Windows, we also redirect to
+ * native Windows APIs.
*
* IDENTIFICATION
* src/port/dirmod.c
@@ -21,25 +23,24 @@
#include "postgres_fe.h"
#endif
+#if defined(WIN32) && defined(__CYGWIN__)
+#error "WIN32 should not be defined at the same time as __CYGWIN__"
+#endif
+
+#if !defined(WIN32) && !defined(__CYGWIN__)
+#error "one of WIN32 or __CYGWIN__ is expected"
+#endif
+
/* Don't modify declarations in system headers */
-#if defined(WIN32) || defined(__CYGWIN__)
#undef rename
#undef unlink
-#endif
#include <unistd.h>
#include <sys/stat.h>
-#if defined(WIN32) || defined(__CYGWIN__)
-#ifndef __CYGWIN__
+#if defined(WIN32)
#include <winioctl.h>
-#else
-#include <windows.h>
-#include <w32api/winioctl.h>
#endif
-#endif
-
-#if defined(WIN32) || defined(__CYGWIN__)
/*
* pgrename
@@ -56,24 +57,24 @@ pgrename(const char *from, const char *to)
* someone else to close the file, as the caller might be holding locks
* and blocking other backends.
*/
-#if defined(WIN32) && !defined(__CYGWIN__)
+#if defined(WIN32)
while (!MoveFileEx(from, to, MOVEFILE_REPLACE_EXISTING))
#else
while (rename(from, to) < 0)
#endif
{
-#if defined(WIN32) && !defined(__CYGWIN__)
+#if defined(WIN32)
DWORD err = GetLastError();
_dosmaperr(err);
/*
- * Modern NT-based Windows versions return ERROR_SHARING_VIOLATION if
- * another process has the file open without FILE_SHARE_DELETE.
- * ERROR_LOCK_VIOLATION has also been seen with some anti-virus
- * software. This used to check for just ERROR_ACCESS_DENIED, so
- * presumably you can get that too with some OS versions. We don't
- * expect real permission errors where we currently use rename().
+ * Windows returns ERROR_SHARING_VIOLATION if another process has the
+ * file open without FILE_SHARE_DELETE. ERROR_LOCK_VIOLATION has also
+ * been seen with some anti-virus software. This used to check for
+ * just ERROR_ACCESS_DENIED, so presumably you can get that too with
+ * some OS versions. We don't expect real permission errors where we
+ * currently use rename().
*/
if (err != ERROR_ACCESS_DENIED &&
err != ERROR_SHARING_VIOLATION &&
@@ -121,10 +122,9 @@ pgunlink(const char *path)
/* We undefined these above; now redefine for possible use below */
#define rename(from, to) pgrename(from, to)
#define unlink(path) pgunlink(path)
-#endif /* defined(WIN32) || defined(__CYGWIN__) */
-#if defined(WIN32) && !defined(__CYGWIN__) /* Cygwin has its own symlinks */
+#if defined(WIN32)
/*
* pgsymlink support:
@@ -352,4 +352,4 @@ pgwin32_is_junction(const char *path)
}
return ((attr & FILE_ATTRIBUTE_REPARSE_POINT) == FILE_ATTRIBUTE_REPARSE_POINT);
}
-#endif /* defined(WIN32) && !defined(__CYGWIN__) */
+#endif /* defined(WIN32) */
--
2.35.1
Thomas Munro <thomas.munro@gmail.com> writes:
3. You can't really run PostgreSQL on Cygwin for real, because its
implementation of signals does not have reliable signal masking, so
unsubtle and probably also subtle breakage occurs. That was reported
upstream by Noah years ago, but they aren't working on a fix.
lorikeet shows random failures, and presumably any CI system will do
the same...
If that's an accurate statement, shouldn't we just drop Cygwin support?
Now that we have a native Windows build, it's hard to see how any live
user would prefer to use the Cygwin build.
regards, tom lane
On Tue, Jul 26, 2022 at 4:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
3. You can't really run PostgreSQL on Cygwin for real, because its
implementation of signals does not have reliable signal masking, so
unsubtle and probably also subtle breakage occurs. That was reported
upstream by Noah years ago, but they aren't working on a fix.
lorikeet shows random failures, and presumably any CI system will do
the same...
If that's an accurate statement, shouldn't we just drop Cygwin support?
This thread rejected the idea last time around:
/messages/by-id/136712b0-0619-5619-4634-0f0286acaef7@2ndQuadrant.com
lorikeet still shows the issue. Failures often involve assertions
about PMSignalState or mq->mq_sender. Hmm, it's running Cygwin 3.2.0
(March 2021) and the latest release is 3.3.5, so it's remotely
possible that it's been fixed recently. Maybe that'd be somewhere in
here, but it's not jumping out:
https://github.com/cygwin/cygwin/commits/master/winsup/cygwin/signal.cc
(Oooh, another implementation of signalfd...)
Thomas Munro <thomas.munro@gmail.com> writes:
On Tue, Jul 26, 2022 at 4:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
If that's an accurate statement, shouldn't we just drop Cygwin support?
This thread rejected the idea last time around:
/messages/by-id/136712b0-0619-5619-4634-0f0286acaef7@2ndQuadrant.com
I think maybe we should re-open the discussion. I've certainly
reached the stage of fed-up-ness. That platform seems seriously
broken, upstream is making no progress on fixing it, and there
doesn't seem to be any real-world use-case. The only positive
argument for it is that Readline doesn't work in the other
Windows builds --- but we've apparently not rechecked that
statement in eighteen years, so maybe things are better now.
If we could just continue to blithely ignore lorikeet's failures,
I wouldn't mind so much; but doing any significant amount of new
code development work for the platform seems like throwing away
developer time.
regards, tom lane
On Tue, Jul 26, 2022 at 7:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think maybe we should re-open the discussion. I've certainly
reached the stage of fed-up-ness. That platform seems seriously
broken, upstream is making no progress on fixing it, and there
doesn't seem to be any real-world use-case. The only positive
argument for it is that Readline doesn't work in the other
Windows builds --- but we've apparently not rechecked that
statement in eighteen years, so maybe things are better now.
If we could just continue to blithely ignore lorikeet's failures,
I wouldn't mind so much; but doing any significant amount of new
code development work for the platform seems like throwing away
developer time.
I agree with that. All things being equal, I like the idea of
supporting a bunch of different platforms, and Cygwin doesn't really
look that dead. It has recent releases. But if blocking signals
doesn't actually work on that platform, making PostgreSQL work
reliably there seems really difficult.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Jul 26, 2022 at 04:24:25PM +1200, Thomas Munro wrote:
3. You can't really run PostgreSQL on Cygwin for real, because its
implementation of signals does not have reliable signal masking, so
unsubtle and probably also subtle breakage occurs. That was reported
upstream by Noah years ago, but they aren't working on a fix.
lorikeet shows random failures, and presumably any CI system will do
the same...
Reference: /messages/by-id/20170321034703.GB2097809@tornado.leadboat.com
On my 2nd try:
https://cirrus-ci.com/task/5311911574110208
TRAP: FailedAssertion("mq->mq_sender == NULL", File: "shm_mq.c", Line: 230, PID: 16370)
2022-07-26 06:32:35.525 PDT [15538][postmaster] LOG: background worker "parallel worker" (PID 16370) was terminated by signal 6: Aborted
XXX Doesn't get all the way through yet...
Mainly because getopt was causing all tap tests to fail.
I tried to fix that in configure, but ended up changing the callers.
This is getting close, but I don't think it has actually managed to pass all
tests yet: https://cirrus-ci.com/task/5274721116749824
4. When building with Cygwin GCC 11.3 you get a bunch of warnings
that don't show up on other platforms, seemingly indicating that it
interprets -Wimplicit-fallthrough=3 differently. Huh?
Evidently due to the same getopt issues.
XXX This should use a canned Docker image with all the right packages
installed
Has anyone tried using non-canned images? It sounds like this could reduce
the 4min startup time for Windows.
https://cirrus-ci.org/guide/docker-builder-vm/#dockerfile-as-a-ci-environment
XXX configure is soooo slooow, can we cache it?! Compiling is also
insanely slow, but ccache gets it down to a couple of minutes if you're
lucky
One reason compiling was slow is because you ended up with -O2.
You can cache configure as long as you're willing to re-run it whenever options
were changed. That also applies to the existing headerscheck.
XXX I don't know how to put variables like BUILD_JOBS into the scripts
What do you mean? If it's outside of bash and in the Windows shell, it's like %var%, right?
https://cirrus-ci.org/guide/writing-tasks/#environment-variables
I just noticed that cirrus is misbehaving: if there's a variable called CI
(which there is), then it expands $CI_FOO like ${CI}_FOO rather than ${CI_FOO}.
I've also seen weirdness when variable names or operators appear in the commit
message...
XXX Needs some --with-X options
Done
XXX We would never want this to run by default in CI, but it'd be nice
to be able to ask for it with ci-os-only! (See commented out line)
only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
Doesn't this already do what's needed?
As long as it doesn't also check: CHANGE_MESSAGE !~ 'ci-os-only',
the task will run only on request.
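For readers unfamiliar with the pattern, the following sketch approximates what that only_if expression accepts, using POSIX extended regular expressions. This is an assumption-laden illustration: Cirrus evaluates the expression with its own regex engine, and the function name wants_cygwin is made up here.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical check: returns 1 if the commit message contains a
 * "ci-os-only:" line (not the first line) that mentions cygwin, else 0.
 */
static int
wants_cygwin(const char *commit_message)
{
    regex_t re;
    int     matched;

    /*
     * The C escapes put real newline characters into the pattern.
     * Without REG_NEWLINE, '.' also matches newlines and ^/$ anchor the
     * whole string, so this is a whole-message match.
     */
    if (regcomp(&re, "^.*\nci-os-only:[^\n]*cygwin.*$",
                REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    matched = (regexec(&re, commit_message, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

Under these assumptions, a message without a ci-os-only line does not match, so the task is skipped unless asked for.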
XXX I have no idea if crash dump works, and if this should share
elements with the msys work in commitfest #3575
Based on the crash above, it wasn't working. And after some changes ... it
still doesn't work.
windows_os is probably skipping too many things.
--
Justin
Attachments:
0001-WIP-CI-support-for-Cygwin.patch (text/x-diff)
From 174cc603ff951b86794f57fcc8c7d326d948e006 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 25 Jul 2022 23:05:10 +1200
Subject: [PATCH] WIP CI support for Cygwin.
ci-os-only: cygwin, xindows, xinux
https://cirrus-ci.com/task/5145086722834432
XXX This should use a canned Docker image with all the right packages
installed
XXX I have no idea if crash dump works, and if this should share
elements with the msys work in commitfest #3575
---
.cirrus.yml | 59 +++++++++++++++++++
configure | 18 +++---
configure.ac | 6 +-
src/interfaces/libpq/t/001_uri.pl | 11 +++-
src/interfaces/libpq/t/002_api.pl | 13 +++-
.../libpq_pipeline/t/001_libpq_pipeline.pl | 14 ++++-
src/test/perl/PostgreSQL/Test/Cluster.pm | 2 +-
src/test/perl/PostgreSQL/Test/Utils.pm | 12 +++-
src/tools/ci/cores_backtrace.sh | 6 +-
9 files changed, 120 insertions(+), 21 deletions(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index f23d6cae552..0389b7758d1 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -456,6 +456,65 @@ task:
path: "crashlog-*.txt"
type: text/plain
+task:
+ name: Windows - Cygwin
+
+ env:
+ CPUS: 4
+ BUILD_JOBS: 4
+ TEST_JOBS: 3
+ CCACHE_DIR: /tmp/ccache
+ CCACHE_LOGFILE: ccache.log
+ CONFIGURE_FLAGS: --enable-cassert --enable-debug --enable-tap-tests --with-ldap --with-ssl=openssl --with-gssapi
+ CONFIGURE_CACHE: /tmp/ccache/configure.cache
+ PG_TEST_USE_UNIX_SOCKETS: 1
+
+ only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+
+ windows_container:
+ image: cirrusci/windowsservercore:2019
+ os_version: 2019
+ cpu: $CPUS
+ memory: 4G
+
+ ccache_cache:
+ folder: C:\tools\cygwin\tmp\ccache
+
+ setup_additional_packages_script: |
+ choco install --cache C:\tools\cygwin\choco -y --no-progress cygwin
+ C:\tools\cygwin\cygwinsetup.exe -q -P cygrunsrv,make,gcc-core,ccache,binutils,libtool,pkg-config,flex,bison,zlib-devel,libssl-devel,libkrb5-devel,openldap-devel,libreadline-devel,perl,perl-IPC-Run
+ C:\tools\cygwin\bin\bash.exe --login -c "cygserver-config -y" || EXIT /b 1
+ C:\tools\cygwin\bin\bash.exe --login -c "echo 'kern.ipc.semmni 1024' >> /etc/cygserver.conf" || EXIT /b 1
+ C:\tools\cygwin\bin\bash.exe --login -c "echo 'kern.ipc.semmns 1024' >> /etc/cygserver.conf" || EXIT /b 1
+ C:\tools\cygwin\bin\bash.exe --login -c "net start cygserver" || EXIT /b 1
+
+ sysinfo_script: |
+ chcp
+ systeminfo
+ powershell -Command get-psdrive -psprovider filesystem
+ set
+ C:\tools\cygwin\bin\bash.exe --login -c "id; uname -a; ulimit -a -H; ulimit -a -S; export" || EXIT /b 1
+
+ configure_script:
+ # Try to configure with the cache file, and retry without if it fails, in case the flags changed.
+ - C:\tools\cygwin\bin\bash.exe --login -xc "cd '%cd%' && for i in 1 2; do ./configure --cache-file=${CONFIGURE_CACHE} ${CONFIGURE_FLAGS} CC='ccache gcc' CFLAGS='-Og -ggdb' && break; rm -v ${CONFIGURE_CACHE}; done" || EXIT /b 1
+
+ build_script:
+ - C:\tools\cygwin\bin\bash.exe --login -xc "ccache --zero-stats" || EXIT /b 1
+ - C:\tools\cygwin\bin\bash.exe --login -xc "cd '%cd%' && make -s -j '%BUILD_JOBS%' world-bin" || EXIT /b 1
+ - C:\tools\cygwin\bin\bash.exe --login -xc "ccache --show-stats" || EXIT /b 1
+
+ always:
+ upload_caches: ccache
+
+ test_world_script:
+ - C:\tools\cygwin\bin\bash.exe --login -xc "cd '%cd%' && timeout 33m make -s -j '%TEST_JOBS%' check-world -Otarget" || EXIT /b 1
+
+ on_failure:
+ <<: *on_failure
+ cores_script:
+ - C:\tools\cygwin\bin\bash.exe --login -xc "cd '%cd%' && src/tools/ci/cores_backtrace.sh linux ." || EXIT /b 1
+
task:
name: CompilerWarnings
diff --git a/configure b/configure
index c5bc3823958..8c6ca4381fb 100755
--- a/configure
+++ b/configure
@@ -5675,15 +5675,15 @@ fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CC} supports -Wimplicit-fallthrough=3, for CFLAGS" >&5
-$as_echo_n "checking whether ${CC} supports -Wimplicit-fallthrough=3, for CFLAGS... " >&6; }
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CC} supports -Wimplicit-fallthrough 3, for CFLAGS" >&5
+$as_echo_n "checking whether ${CC} supports -Wimplicit-fallthrough 3, for CFLAGS... " >&6; }
if ${pgac_cv_prog_CC_cflags__Wimplicit_fallthrough_3+:} false; then :
$as_echo_n "(cached) " >&6
else
pgac_save_CFLAGS=$CFLAGS
pgac_save_CC=$CC
CC=${CC}
-CFLAGS="${CFLAGS} -Wimplicit-fallthrough=3"
+CFLAGS="${CFLAGS} -Wimplicit-fallthrough 3"
ac_save_c_werror_flag=$ac_c_werror_flag
ac_c_werror_flag=yes
cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -5710,19 +5710,19 @@ fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_CC_cflags__Wimplicit_fallthrough_3" >&5
$as_echo "$pgac_cv_prog_CC_cflags__Wimplicit_fallthrough_3" >&6; }
if test x"$pgac_cv_prog_CC_cflags__Wimplicit_fallthrough_3" = x"yes"; then
- CFLAGS="${CFLAGS} -Wimplicit-fallthrough=3"
+ CFLAGS="${CFLAGS} -Wimplicit-fallthrough 3"
fi
- { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CXX} supports -Wimplicit-fallthrough=3, for CXXFLAGS" >&5
-$as_echo_n "checking whether ${CXX} supports -Wimplicit-fallthrough=3, for CXXFLAGS... " >&6; }
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CXX} supports -Wimplicit-fallthrough 3, for CXXFLAGS" >&5
+$as_echo_n "checking whether ${CXX} supports -Wimplicit-fallthrough 3, for CXXFLAGS... " >&6; }
if ${pgac_cv_prog_CXX_cxxflags__Wimplicit_fallthrough_3+:} false; then :
$as_echo_n "(cached) " >&6
else
pgac_save_CXXFLAGS=$CXXFLAGS
pgac_save_CXX=$CXX
CXX=${CXX}
-CXXFLAGS="${CXXFLAGS} -Wimplicit-fallthrough=3"
+CXXFLAGS="${CXXFLAGS} -Wimplicit-fallthrough 3"
ac_save_cxx_werror_flag=$ac_cxx_werror_flag
ac_cxx_werror_flag=yes
ac_ext=cpp
@@ -5761,7 +5761,7 @@ fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_CXX_cxxflags__Wimplicit_fallthrough_3" >&5
$as_echo "$pgac_cv_prog_CXX_cxxflags__Wimplicit_fallthrough_3" >&6; }
if test x"$pgac_cv_prog_CXX_cxxflags__Wimplicit_fallthrough_3" = x"yes"; then
- CXXFLAGS="${CXXFLAGS} -Wimplicit-fallthrough=3"
+ CXXFLAGS="${CXXFLAGS} -Wimplicit-fallthrough 3"
fi
@@ -17000,7 +17000,7 @@ fi
# mingw has adopted a GNU-centric interpretation of optind/optreset,
# so always use our version on Windows.
-if test "$PORTNAME" = "win32"; then
+if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then
case " $LIBOBJS " in
*" getopt.$ac_objext "* ) ;;
*) LIBOBJS="$LIBOBJS getopt.$ac_objext"
diff --git a/configure.ac b/configure.ac
index 61d0dd5d586..2be0cae8784 100644
--- a/configure.ac
+++ b/configure.ac
@@ -504,8 +504,8 @@ if test "$GCC" = yes -a "$ICC" = no; then
PGAC_PROG_CXX_CFLAGS_OPT([-Wendif-labels])
PGAC_PROG_CC_CFLAGS_OPT([-Wmissing-format-attribute])
PGAC_PROG_CXX_CFLAGS_OPT([-Wmissing-format-attribute])
- PGAC_PROG_CC_CFLAGS_OPT([-Wimplicit-fallthrough=3])
- PGAC_PROG_CXX_CFLAGS_OPT([-Wimplicit-fallthrough=3])
+ PGAC_PROG_CC_CFLAGS_OPT([-Wimplicit-fallthrough 3])
+ PGAC_PROG_CXX_CFLAGS_OPT([-Wimplicit-fallthrough 3])
PGAC_PROG_CC_CFLAGS_OPT([-Wcast-function-type])
PGAC_PROG_CXX_CFLAGS_OPT([-Wcast-function-type])
# This was included in -Wall/-Wformat in older GCC versions
@@ -1947,7 +1947,7 @@ fi
# mingw has adopted a GNU-centric interpretation of optind/optreset,
# so always use our version on Windows.
-if test "$PORTNAME" = "win32"; then
+if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then
AC_LIBOBJ(getopt)
AC_LIBOBJ(getopt_long)
fi
diff --git a/src/interfaces/libpq/t/001_uri.pl b/src/interfaces/libpq/t/001_uri.pl
index beaf13b49ca..c46aeb71853 100644
--- a/src/interfaces/libpq/t/001_uri.pl
+++ b/src/interfaces/libpq/t/001_uri.pl
@@ -224,7 +224,16 @@ sub test_uri
$expect{'exit'} = $expect{stderr} eq '';
- my $cmd = [ 'libpq_uri_regress', $uri ];
+ my $cmd;
+ if (-e "$ENV{'TESTDIR'}/test/libpq_uri_regress.exe")
+ {
+ $cmd = [ "$ENV{'TESTDIR'}/test/libpq_uri_regress.exe", $uri ];
+ }
+ else
+ {
+ $cmd = [ 'libpq_uri_regress', $uri ];
+ }
+
$result{exit} = IPC::Run::run $cmd, '>', \$result{stdout}, '2>',
\$result{stderr};
diff --git a/src/interfaces/libpq/t/002_api.pl b/src/interfaces/libpq/t/002_api.pl
index fa00221ae29..3bf89e85823 100644
--- a/src/interfaces/libpq/t/002_api.pl
+++ b/src/interfaces/libpq/t/002_api.pl
@@ -6,7 +6,18 @@ use PostgreSQL::Test::Utils;
use Test::More;
# Test PQsslAttribute(NULL, "library")
-my ($out, $err) = run_command([ 'libpq_testclient', '--ssl' ]);
+
+my $cmd;
+if (-e "$ENV{'TESTDIR'}/test/libpq_testclient.exe")
+{
+ $cmd = "$ENV{'TESTDIR'}/test/libpq_testclient.exe";
+}
+else
+{
+ $cmd = 'libpq_testclient';
+}
+
+my ($out, $err) = run_command([ $cmd, '--ssl' ]);
if ($ENV{with_ssl} eq 'openssl')
{
diff --git a/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl b/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl
index 0821329c8d3..126cd7a0085 100644
--- a/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl
+++ b/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl
@@ -14,7 +14,17 @@ $node->start;
my $numrows = 700;
-my ($out, $err) = run_command([ 'libpq_pipeline', 'tests' ]);
+my $libpq_pipeline;
+if (-e "libpq_pipeline.exe")
+{
+ $libpq_pipeline = "libpq_pipeline.exe";
+}
+else
+{
+ $libpq_pipeline = "libpq_pipeline";
+}
+
+my ($out, $err) = run_command([ $libpq_pipeline, 'tests' ]);
die "oops: $err" unless $err eq '';
my @tests = split(/\s+/, $out);
@@ -39,7 +49,7 @@ for my $testname (@tests)
# Execute the test
$node->command_ok(
[
- 'libpq_pipeline', @extraargs,
+ $libpq_pipeline, @extraargs,
$testname, $node->connstr('postgres')
],
"libpq_pipeline $testname");
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index c8c7bc5045a..febf5988434 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -827,7 +827,7 @@ sub start
# compatibility with older versions.
$ret = PostgreSQL::Test::Utils::system_log(
'pg_ctl', '-w', '-D', $self->data_dir,
- '-l', $self->logfile, '-o', "--cluster-name=$name",
+ '-l', $self->logfile, '-o', "-c cluster-name=$name",
'start');
if ($ret != 0)
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 1ca2cc59170..f10ef55de9a 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -88,10 +88,11 @@ our @EXPORT = qw(
$windows_os
$is_msys2
+ $is_cygwin
$use_unix_sockets
);
-our ($windows_os, $is_msys2, $use_unix_sockets, $timeout_default,
+our ($windows_os, $is_msys2, $is_cygwin, $use_unix_sockets, $timeout_default,
$tmp_check, $log_path, $test_logfile);
BEGIN
@@ -140,13 +141,18 @@ BEGIN
$ENV{PGAPPNAME} = basename($0);
# Must be set early
- $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys';
+ $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys'
+ || $Config{osname} eq 'cygwin';
+
# Check if this environment is MSYS2.
$is_msys2 =
$windows_os
&& -x '/usr/bin/uname'
&& `uname -or` =~ /^[2-9].*Msys/;
+ # Check if this environment is Cygwin
+ $is_cygwin = $Config{osname} eq 'cygwin';
+
if ($windows_os)
{
require Win32API::File;
@@ -707,7 +713,7 @@ sub dir_symlink
{
my $oldname = shift;
my $newname = shift;
- if ($windows_os)
+ if ($windows_os && !$is_cygwin)
{
$oldname =~ s,/,\\,g;
$newname =~ s,/,\\,g;
diff --git a/src/tools/ci/cores_backtrace.sh b/src/tools/ci/cores_backtrace.sh
index 28d3cecfc67..475dd609a22 100755
--- a/src/tools/ci/cores_backtrace.sh
+++ b/src/tools/ci/cores_backtrace.sh
@@ -1,5 +1,8 @@
#! /bin/sh
+#set -e
+set -x
+
if [ $# -ne 2 ]; then
echo "cores_backtrace.sh <os> <directory>"
exit 1
@@ -18,7 +21,7 @@ case $os in
esac
first=1
-for corefile in $(find "$directory" -type f) ; do
+for corefile in $(find "$directory" -type f \( -name 'core.*' -o -name core \) ) ; do
if [ "$first" -eq 1 ]; then
first=0
else
@@ -30,6 +33,7 @@ for corefile in $(find "$directory" -type f) ; do
lldb -c $corefile --batch -o 'thread backtrace all' -o 'quit'
else
auxv=$(gdb --quiet --core ${corefile} --batch -ex 'info auxv' 2>/dev/null)
+ echo "auxv $auxv"
if [ $? -ne 0 ]; then
echo "could not process ${corefile}"
continue
--
2.17.1
On Wed, Jul 27, 2022 at 6:44 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Tue, Jul 26, 2022 at 04:24:25PM +1200, Thomas Munro wrote:
3. You can't really run PostgreSQL on Cygwin for real, because its
implementation of signals does not have reliable signal masking, so
unsubtle and probably also subtle breakage occurs. That was reported
upstream by Noah years ago, but they aren't working on a fix.
lorikeet shows random failures, and presumably any CI system will do
the same...
Reference: /messages/by-id/20170321034703.GB2097809@tornado.leadboat.com
On my 2nd try:
https://cirrus-ci.com/task/5311911574110208
TRAP: FailedAssertion("mq->mq_sender == NULL", File: "shm_mq.c", Line: 230, PID: 16370)
2022-07-26 06:32:35.525 PDT [15538][postmaster] LOG: background worker "parallel worker" (PID 16370) was terminated by signal 6: Aborted
Thanks for working on this!
Huh, that Cygwin being shipped by Choco is quite old, older than
lorikeet's, but not old enough to not have the bug:
[04:33:55.234] Starting cygwin install, version 2.918
Based on clues in Noah's emails in the archives, I think versions from
maybe somewhere around 2015 didn't have the bug, and then the bug
appeared, and AFAIK it's still here. I wonder if you can tell Choco
to install an ancient version, but even if that's possible you'd be
dealing with other stupid problems and bugs.
XXX Doesn't get all the way through yet...
Mainly because getopt was causing all tap tests to fail.
I tried to fix that in configure, but ended up changing the callers.
This is getting close, but I don't think has actually managed to pass all tests
yet.. https://cirrus-ci.com/task/5274721116749824
Woo.
4. When building with Cygwin GCC 11.3 you get a bunch of warnings
that don't show up on other platforms, seemingly indicating that it
interprets -Wimplicit-fallthrough=3 differently. Huh?
Evidently due to the same getopt issues.
Ahh, nice detective work.
XXX This should use a canned Docker image with all the right packages
installed
Has anyone tried using non-canned images ? It sounds like this could reduce
the 4min startup time for windows.
https://cirrus-ci.org/guide/docker-builder-vm/#dockerfile-as-a-ci-environment
Yeah, I had that working once. Not sure what the pros and cons would be for us.
XXX configure is soooo slooow, can we cache it?! Compiling is also
insanely slow, but ccache gets it down to a couple of minutes if you're
lucky
One reason compiling was slow is because you ended up with -O2.
Ah, right.
You can cache configure as long as you're willing to re-run it whenever options
were changed. That also applies to the existing headerscheck.
XXX I don't know how to put variables like BUILD_JOBS into the scripts
WDYM ? If it's outside of bash and in windows shell it's like %var%, right ?
https://cirrus-ci.org/guide/writing-tasks/#environment-variables
Right. I should have taken the clue from the %cd% (I got a few ideas
about how to do this from libarchive's CI scripting[1]).
I just noticed that cirrus is misbehaving: if there's a variable called CI
(which there is), then it expands $CI_FOO like ${CI}_FOO rather than ${CI_FOO}.
I've also seen weirdness when variable names or operators appear in the commit
message...
XXX Needs some --with-X options
Done
Neat.
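On the $CI_FOO expansion noted above: for comparison, a POSIX shell takes the longest valid name after '$', so $CI_FOO is read as ${CI_FOO}, and braces are what force the other reading. A tiny sketch, with made-up values for CI and CI_FOO chosen only so the two readings differ:

```shell
#!/bin/sh
# Hypothetical values, picked only to make the two readings distinguishable.
CI=true
CI_FOO=bar

# A POSIX shell reads the longest valid variable name after '$'.
plain="$CI_FOO"      # same as ${CI_FOO}
braced="${CI}_FOO"   # value of CI followed by the literal text _FOO

echo "plain:  $plain"
echo "braced: $braced"
```

Writing the braces explicitly in scripts avoids depending on which of the two readings a given tool picks.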
XXX We would never want this to run by default in CI, but it'd be nice
to be able to ask for it with ci-os-only! (See commented out line)
only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
Doesn't this already do what's needed?
As long as it doesn't also check: CHANGE_MESSAGE !~ 'ci-os-only',
the task will run only on request.
Yeah I was just trying to say that I was sharing the script in a way
that always runs, but for commit we'd want that. This is all far too
slow for cfbot to have to deal with on every build. Looks like we can
expect to be able to build and test fast on Windows soonish, though,
so maybe one day we'd just turn Cygwin and MSYS on?
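To make the intent of that only_if pattern concrete, here is a rough shell approximation of what it is meant to select: a commit message where some line after the first starts with "ci-os-only:" and mentions cygwin. This is only a sketch; wants_cygwin is a hypothetical helper, and it does not claim to reproduce Cirrus's own expression evaluator.

```shell
#!/bin/sh
# wants_cygwin is a hypothetical helper, not part of Cirrus or PostgreSQL.
# It approximates '.*\nci-os-only:[^\n]*cygwin.*': skip the subject line,
# then look for a line that begins with "ci-os-only:" and names cygwin.
wants_cygwin() {
    printf '%s\n' "$1" | tail -n +2 | grep -q '^ci-os-only:.*cygwin'
}

msg='WIP CI support for Cygwin.

ci-os-only: cygwin'

if wants_cygwin "$msg"; then
    echo "run Cygwin task"
else
    echo "skip Cygwin task"
fi
```

A message with no ci-os-only line at all is rejected by this check, which is the "runs only on request" behaviour discussed above.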
[1]: https://github.com/libarchive/libarchive/blob/master/build/ci/cirrus_ci/ci.cmd
On Fri, Jul 29, 2022 at 10:04:04AM +1200, Thomas Munro wrote:
Thanks for working on this!
Huh, that Cygwin being shipped by Choco is quite old, older than
lorikeet's, but not old enough to not have the bug:
[04:33:55.234] Starting cygwin install, version 2.918
Hm, I think that's the version of "cygwinsetup" but not cygwin..
It also says this: [13:16:36.014] Cygwin v3.3.4.20220408 [Approved]
I wonder if you can tell Choco
to install an ancient version, but even if that's possible you'd be
dealing with other stupid problems and bugs.
Yes: choco install -y --no-progress --version 4.6.1 ccache
XXX This should use a canned Docker image with all the right packages
installed
Has anyone tried using non-canned images ? It sounds like this could reduce
the 4min startup time for windows.
https://cirrus-ci.org/guide/docker-builder-vm/#dockerfile-as-a-ci-environment
Yeah, I had that working once. Not sure what the pros and cons would be for us.
I think it could be a lot faster to start, since cirrus caches the generated
docker image locally. Rather than (I gather) pulling the image every time.
XXX We would never want this to run by default in CI, but it'd be nice
to be able to ask for it with ci-os-only! (See commented out line)
only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
Doesn't this already do what's needed?
As long as it doesn't also check: CHANGE_MESSAGE !~ 'ci-os-only',
the task will run only on request.
Yeah I was just trying to say that I was sharing the script in a way
that always runs, but for commit we'd want that. This is all far too
slow for cfbot to have to deal with on every build.
It occurred to me today that if cfbot preserved the original patch series, and
commit messages, that would allow patch authors to write things like
"ci-os-only: docs" for a doc only patch. I've never gotten cirrus'
changesOnly() stuff to work...
Looks like we can expect to be able to build and test fast on Windows
soonish, though,
Do you mean with meson ?
so maybe one day we'd just turn Cygwin and MSYS on?
I didn't understand this ?
--
Justin
On Fri, Jul 29, 2022 at 10:23 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Fri, Jul 29, 2022 at 10:04:04AM +1200, Thomas Munro wrote:
[04:33:55.234] Starting cygwin install, version 2.918
Hm, I think that's the version of "cygwinsetup" but not cygwin..
It also says this: [13:16:36.014] Cygwin v3.3.4.20220408 [Approved]
Oops. Ok so we're testing the very latest then, and it definitely
still has the bug as we thought.
It occurred to me today that if cfbot preserved the original patch series, and
commit messages, that would allow patch authors to write things like
"ci-os-only: docs" for a doc only patch. I've never gotten cirrus'
changesOnly() stuff to work...
Maybe it's time to switch to "git am -3 ..." and reject patches that
don't apply that way.
Looks like we can expect to be able to build and test fast on Windows
soonish, though,
Do you mean with meson ?
Yeah. Also there are some other things we can do to speed up testing
on Windows (and elsewhere), like not running every test query with new
psql + backend process pair, which takes at least a few hundred ms and
sometimes up to several seconds on this platform; I have some patches
I need to finish...
so maybe one day we'd just turn Cygwin and MSYS on?
I didn't understand this ?
I mean, if, some sunny day, we can compile and test on Windows at
non-glacial speeds, then it would become possible to contemplate
having cfbot run these tasks for every patch every time.
On Wed, Jul 27, 2022 at 5:09 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jul 26, 2022 at 7:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think maybe we should re-open the discussion. I've certainly
reached the stage of fed-up-ness. That platform seems seriously
broken, upstream is making no progress on fixing it, and there
doesn't seem to be any real-world use-case. The only positive
argument for it is that Readline doesn't work in the other
Windows builds --- but we've apparently not rechecked that
statement in eighteen years, so maybe things are better now.
If we could just continue to blithely ignore lorikeet's failures,
I wouldn't mind so much; but doing any significant amount of new
code development work for the platform seems like throwing away
developer time.
I agree with that. All things being equal, I like the idea of
supporting a bunch of different platforms, and Cygwin doesn't really
look that dead. It has recent releases. But if blocking signals
doesn't actually work on that platform, making PostgreSQL work
reliably there seems really difficult.
It's one thing to drop old dead Unixes but I don't think anyone would
enjoy dropping support for an active open source project. The best
outcome would be for people who have an interest in seeing PostgreSQL
work correctly on Cygwin to help get the bug fixed. Here are the
threads I'm aware of:
https://cygwin.com/pipermail/cygwin/2017-August/234001.html
https://cygwin.com/pipermail/cygwin/2017-August/234097.html
I wonder if these problems would go away as a nice incidental
side-effect if we used latches for postmaster wakeups. I don't
know... maybe, if the problem is just with the postmaster's pattern of
blocking/unblocking? Maybe backend startup is simple enough that it
doesn't hit the bug? From a quick glance, I think the assertion
failures that occur in regular backends can possibly be blamed on the
postmaster getting confused about its children due to unexpected
handler re-entry.
On Fri, Jul 29, 2022 at 10:04:04AM +1200, Thomas Munro wrote:
XXX We would never want this to run by default in CI, but it'd be nice
to be able to ask for it with ci-os-only! (See commented out line)
only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
Doesn't this already do what's needed?
As long as it doesn't also check: CHANGE_MESSAGE !~ 'ci-os-only',
the task will run only on request.
Yeah I was just trying to say that I was sharing the script in a way
that always runs, but for commit we'd want that.
That makes more sense after noticing that you created a cf entry (for which
cfbot has been skipping my patch due to my "only_if" line). There's still a
few persistent issues:
This fails ~50% of the time in recovery 010-truncate
I hacked around this by setting data_sync_retry.
https://cirrus-ci.com/task/5289444063313920
I found these, not sure if they're relevant.
/messages/by-id/CAA4eK1Kft05mwNuZbTVRmz8SNS3r+uriuCT8DxL5KJy5btoS-A@mail.gmail.com
/messages/by-id/CAFiTN-uGxgo5258hZy2QJoz=s7_Cs7v9=b8Z2GgFV7qmQUOwxw@mail.gmail.com
And an fsync abort in 013 which seems similar to this other one.
data_sync_retry also avoids this issue.
https://cirrus-ci.com/task/6283023745286144?logs=cores#L34
/messages/by-id/CAMVYW_4QhjZ-19Xpr2x1B19soRCNu1BXHM8g1mOnAVtd5VViDw@mail.gmail.com
And sometimes various assertions failing in regress parallel_select (and then times out)
https://api.cirrus-ci.com/v1/artifact/task/5537540282253312/log/src/test/regress/log/postmaster.log
https://api.cirrus-ci.com/v1/artifact/task/6108746773430272/log/src/test/regress/log/postmaster.log
Or "could not map dynamic shared memory segment" (actually in 027-stream-regress):
https://cirrus-ci.com/task/6168860746317824
And segfault in vacuum parallel
https://api.cirrus-ci.com/v1/artifact/task/5404589569605632/log/src/test/regress/log/postmaster.log
Sometimes semctl() failed: Resource temporarily unavailable
https://api.cirrus-ci.com/v1/artifact/task/5027860623654912/log/src/test/subscription/tmp_check/log/014_binary_publisher.log
https://api.cirrus-ci.com/v1/artifact/task/5027860623654912/log/src/bin/pg_rewind/tmp_check/log/001_basic_standby_local.log
Some more
https://cirrus-ci.com/task/6468927780814848
If you're lucky, there's only 1 or 2 problems, of which those are different
symptoms.. Maybe for now this needs to disable tap tests :(
This shows that it *can* pass, if slowly, and infrequently:
https://cirrus-ci.com/task/6546858536337408
This fixes my changes to configure for getopt.
And simplifies the changes to *.pl (the .exe changes weren't necessary at all).
And removes the changes for implicit-fallthrough; I realized that configure was
just deciding that it didn't work and not using it at all.
And adds support for backtraces.
And removes kerberos and adds libxml
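For the backtrace support: the patch turns a Cygwin .stackdump into symbolized frames by feeding the second column of each frame line to addr2line. A minimal sketch of just the address-extraction step, using the same awk filter as the patch. The sample file contents are invented for illustration and only shaped like a frame table, and stackdump_addrs is a hypothetical name:

```shell
#!/bin/sh
# stackdump_addrs is a hypothetical helper: print the second column of every
# line that starts with "0", the same filter the patch applies.
stackdump_addrs() {
    awk '/^0/ { print $2 }' "$1"
}

# Invented sample input, not taken from a real crash.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Stack trace:
Frame        Function    Args
0007FFFFB730 00018006F733 (000000000000, 000000000000)
0007FFFFB830 000180046C37 (000000000000, 000000000000)
EOF

stackdump_addrs "$sample"
rm -f "$sample"
```

In the patch the resulting addresses are piped into addr2line -f -i -e ./src/backend/postgres.exe, so frames from any other binary would presumably not resolve usefully.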
Why did you write "|| exit /b 1" in all the bash invocations ? I think cirrus
handles that automatically.
--
Justin
Attachments:
v3-0001-WIP-CI-support-for-Cygwin.patch (text/x-diff; charset=us-ascii)
From b929ea7acc33a2fda1ec10693736a2fa83d364e1 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 25 Jul 2022 23:05:10 +1200
Subject: [PATCH v3] WIP CI support for Cygwin.
ci-os-only: cygwin
See also: d8e78714-dc77-4a64-783f-e863ba4d951f@2ndquadrant.com
https://cirrus-ci.com/task/5145086722834432
XXX This should use a canned Docker image with all the right packages
installed? But if the larger image is slower to start, then maybe not...
---
.cirrus.yml | 67 +++++++++++++++++++++++
configure | 2 +-
configure.ac | 2 +-
src/test/perl/PostgreSQL/Test/Cluster.pm | 4 +-
src/test/perl/PostgreSQL/Test/Utils.pm | 12 +++-
src/test/recovery/t/020_archive_status.pl | 2 +-
src/tools/ci/cores_backtrace.sh | 28 +++++++++-
src/tools/ci/pg_ci_base.conf | 2 +
8 files changed, 109 insertions(+), 10 deletions(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index 4b7918ef456..84341ac1b94 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -34,6 +34,7 @@ on_failure: &on_failure
- "**/*.log"
- "**/*.diffs"
- "**/regress_log_*"
+ - "**/*.stackdump"
type: text/plain
task:
@@ -464,6 +465,72 @@ task:
type: text/plain
+task:
+ name: Windows - Cygwin
+ #XXX only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+ timeout_in: 90m
+
+ env:
+ CPUS: 4
+ BUILD_JOBS: 4
+ TEST_JOBS: 1
+ CCACHE_DIR: /tmp/ccache
+ CONFIGURE_FLAGS: --enable-debug --enable-tap-tests --with-ldap --with-ssl=openssl --with-libxml --enable-cassert
+ # --with-gssapi
+ CONFIGURE_CACHE: /tmp/ccache/configure.cache
+ PG_TEST_USE_UNIX_SOCKETS: 1
+ CCACHE_LOGFILE: ccache.log
+ EXTRA_REGRESS_OPTS: --max-connections=1
+ PG_TEST_EXTRA: ldap ssl # disable kerberos
+
+ windows_container:
+ image: cirrusci/windowsservercore:2019
+ os_version: 2019
+ cpu: $CPUS
+ memory: 4G
+
+ setup_additional_packages_script: |
+ choco install -y --no-progress cygwin
+ C:\tools\cygwin\cygwinsetup.exe -q -P cygrunsrv,make,gcc-core,ccache,binutils,libtool,pkg-config,flex,bison,zlib-devel,libxml2-devel,libxslt-devel,libssl-devel,openldap-devel,libreadline-devel,perl,perl-IPC-Run
+ REM libkrb5-devel,krb5-server
+ C:\tools\cygwin\bin\bash.exe --login -c "cygserver-config -y"
+ C:\tools\cygwin\bin\bash.exe --login -c "echo 'kern.ipc.semmni 1024' >> /etc/cygserver.conf"
+ C:\tools\cygwin\bin\bash.exe --login -c "echo 'kern.ipc.semmns 1024' >> /etc/cygserver.conf"
+ C:\tools\cygwin\bin\bash.exe --login -c "net start cygserver"
+
+ sysinfo_script: |
+ chcp
+ systeminfo
+ powershell -Command get-psdrive -psprovider filesystem
+ set
+ C:\tools\cygwin\bin\bash.exe --login -c "id; uname -a; ulimit -a -H; ulimit -a -S; export"
+
+ ccache_cache:
+ folder: C:\tools\cygwin\tmp\ccache
+
+ configure_script:
+ # Try to configure with the cache file, and retry without if it fails, in case the flags changed.
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && for i in 1 2; do ./configure --cache-file=${CONFIGURE_CACHE} ${CONFIGURE_FLAGS} CC='ccache gcc' CFLAGS='-Og -ggdb' && break; rm -v ${CONFIGURE_CACHE}; done"
+
+ build_script:
+ #- C:\tools\cygwin\bin\bash.exe --login -c "ccache --max-size ${CCACHE_MAXSIZE}"
+ - C:\tools\cygwin\bin\bash.exe --login -c "ccache --zero-stats"
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && make -s -j ${BUILD_JOBS} world-bin"
+ - C:\tools\cygwin\bin\bash.exe --login -c "ccache --show-stats"
+
+ upload_caches: ccache
+
+ test_world_script:
+ #- C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 44m make -s -j ${TEST_JOBS} check ${CHECKFLAGS} -C src/test/subscription"
+ #- C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 44m make -s -j ${TEST_JOBS} check ${CHECKFLAGS} -C src/test/recovery"
+ #- C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 44m make -s check ${CHECKFLAGS} -C src/bin -j 2"
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 77m make -s -j ${TEST_JOBS} ${CHECK} PROVE_FLAGS='-j2 --timer' ${CHECKFLAGS}"
+
+ on_failure:
+ <<: *on_failure
+ cores_script:
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && src/tools/ci/cores_backtrace.sh cygwin ."
+
task:
name: CompilerWarnings
diff --git a/configure b/configure
index c5bc3823958..d147cf372db 100755
--- a/configure
+++ b/configure
@@ -17000,7 +17000,7 @@ fi
# mingw has adopted a GNU-centric interpretation of optind/optreset,
# so always use our version on Windows.
-if test "$PORTNAME" = "win32"; then
+if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then
case " $LIBOBJS " in
*" getopt.$ac_objext "* ) ;;
*) LIBOBJS="$LIBOBJS getopt.$ac_objext"
diff --git a/configure.ac b/configure.ac
index 61d0dd5d586..6dba8291d64 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1947,7 +1947,7 @@ fi
# mingw has adopted a GNU-centric interpretation of optind/optreset,
# so always use our version on Windows.
-if test "$PORTNAME" = "win32"; then
+if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then
AC_LIBOBJ(getopt)
AC_LIBOBJ(getopt_long)
fi
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index c8c7bc5045a..29894b6a98c 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1052,7 +1052,7 @@ sub enable_restoring
# the path contains spaces.
$path =~ s{\\}{\\\\}g if ($PostgreSQL::Test::Utils::windows_os);
my $copy_command =
- $PostgreSQL::Test::Utils::windows_os
+ $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin
? qq{copy "$path\\\\%f" "%p"}
: qq{cp "$path/%f" "%p"};
@@ -1122,7 +1122,7 @@ sub enable_archiving
# the path contains spaces.
$path =~ s{\\}{\\\\}g if ($PostgreSQL::Test::Utils::windows_os);
my $copy_command =
- $PostgreSQL::Test::Utils::windows_os
+ $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin
? qq{copy "%p" "$path\\\\%f"}
: qq{cp "%p" "$path/%f"};
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 1ca2cc59170..c7786089a4b 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -88,10 +88,11 @@ our @EXPORT = qw(
$windows_os
$is_msys2
+ $is_cygwin
$use_unix_sockets
);
-our ($windows_os, $is_msys2, $use_unix_sockets, $timeout_default,
+our ($windows_os, $is_msys2, $is_cygwin, $use_unix_sockets, $timeout_default,
$tmp_check, $log_path, $test_logfile);
BEGIN
@@ -140,13 +141,18 @@ BEGIN
$ENV{PGAPPNAME} = basename($0);
# Must be set early
- $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys';
+ $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys' ||
+ $Config{osname} eq 'cygwin';
+
# Check if this environment is MSYS2.
$is_msys2 =
$windows_os
&& -x '/usr/bin/uname'
&& `uname -or` =~ /^[2-9].*Msys/;
+ # Check if this environment is Cygwin
+ $is_cygwin = $Config{osname} eq 'cygwin';
+
if ($windows_os)
{
require Win32API::File;
@@ -707,7 +713,7 @@ sub dir_symlink
{
my $oldname = shift;
my $newname = shift;
- if ($windows_os)
+ if ($windows_os && !$is_cygwin)
{
$oldname =~ s,/,\\,g;
$newname =~ s,/,\\,g;
diff --git a/src/test/recovery/t/020_archive_status.pl b/src/test/recovery/t/020_archive_status.pl
index e6e4eb56a90..0b2716fd7c9 100644
--- a/src/test/recovery/t/020_archive_status.pl
+++ b/src/test/recovery/t/020_archive_status.pl
@@ -26,7 +26,7 @@ my $primary_data = $primary->data_dir;
# a portable solution, use an archive command based on a command known to
# work but will fail: copy with an incorrect original path.
my $incorrect_command =
- $PostgreSQL::Test::Utils::windows_os
+ $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin
? qq{copy "%p_does_not_exist" "%f_does_not_exist"}
: qq{cp "%p_does_not_exist" "%f_does_not_exist"};
$primary->safe_psql(
diff --git a/src/tools/ci/cores_backtrace.sh b/src/tools/ci/cores_backtrace.sh
index 28d3cecfc67..64c980039cd 100755
--- a/src/tools/ci/cores_backtrace.sh
+++ b/src/tools/ci/cores_backtrace.sh
@@ -1,5 +1,8 @@
#! /bin/sh
+#set -e
+set -x
+
if [ $# -ne 2 ]; then
echo "cores_backtrace.sh <os> <directory>"
exit 1
@@ -8,9 +11,21 @@ fi
os=$1
directory=$2
+findargs=''
case $os in
freebsd|linux|macos)
- ;;
+ ;;
+ cygwin)
+ # XXX Evidently I don't know how to write two arguments here without pathname expansion later, other than eval.
+ #findargs='-name "*.stackdump"'
+ for corefile in $(find "$directory" -type f -name "*.stackdump") ; do
+ binary=`basename "$corefile" .stackdump`
+ echo;echo;
+ echo "dumping ${corefile} for ${binary}"
+ awk '/^0/{print $2}' $corefile |addr2line -f -i -e ./src/backend/postgres.exe
+ done
+ exit 0
+ ;;
*)
echo "unsupported operating system ${os}"
exit 1
@@ -18,7 +33,7 @@ case $os in
esac
first=1
-for corefile in $(find "$directory" -type f) ; do
+for corefile in $(find "$directory" -type f $findargs) ; do
if [ "$first" -eq 1 ]; then
first=0
else
@@ -28,6 +43,13 @@ for corefile in $(find "$directory" -type f) ; do
if [ "$os" = 'macos' ]; then
lldb -c $corefile --batch -o 'thread backtrace all' -o 'quit'
+ elif [ "$os" = 'cygwin' ]; then
+ # https://cirrus-ci.com/task/4964259674193920
+ #binary=${corefile%.stackdump}
+ #binary=${corefile#*/}
+ binary=`basename "$corefile" .stackdump`
+ echo "dumping ${corefile} for ${binary}"
+ awk '/^0/{print $2}' $corefile |addr2line -f -i -e ./src/backend/postgres.exe
else
auxv=$(gdb --quiet --core ${corefile} --batch -ex 'info auxv' 2>/dev/null)
if [ $? -ne 0 ]; then
@@ -48,3 +70,5 @@ for corefile in $(find "$directory" -type f) ; do
gdb --batch --quiet -ex "thread apply all bt full" -ex "quit" "$binary" "$corefile" 2>/dev/null
fi
done
+
+exit 0
diff --git a/src/tools/ci/pg_ci_base.conf b/src/tools/ci/pg_ci_base.conf
index d8faa9c26c1..206dd993ccc 100644
--- a/src/tools/ci/pg_ci_base.conf
+++ b/src/tools/ci/pg_ci_base.conf
@@ -12,3 +12,5 @@ log_connections = true
log_disconnections = true
log_line_prefix = '%m [%p][%b] %q[%a][%v:%x] '
log_lock_waits = true
+
+data_sync_retry = on
--
2.17.1
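On the XXX note in cores_backtrace.sh about passing several find arguments without eval: one option, sketched here and not taken from the patch, is to keep the extra arguments in the positional parameters with set --, so that a quoted pattern reaches find as a single unexpanded word. The function name find_cores is hypothetical.

```shell
#!/bin/sh
# find_cores is a hypothetical helper, not part of the patch.
find_cores() {
    os=$1
    directory=$2
    # Reuse "$@" as an argument list: each element stays a separate word and
    # the glob is passed to find instead of being expanded by the shell.
    case $os in
        cygwin) set -- -name '*.stackdump' ;;
        *)      set -- ;;   # no extra filter, as in the patch's default
    esac
    find "$directory" -type f "$@"
}
```

Called as find_cores cygwin . this lists only *.stackdump files, while any other os value lists every regular file, which matches what an empty findargs does in the patch.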
Hi,
On 2022-07-28 17:23:19 -0500, Justin Pryzby wrote:
On Fri, Jul 29, 2022 at 10:04:04AM +1200, Thomas Munro wrote:
XXX This should use a canned Docker image with all the right packages
installed
Has anyone tried using non-canned images ? It sounds like this could reduce
the 4min startup time for windows.
https://cirrus-ci.org/guide/docker-builder-vm/#dockerfile-as-a-ci-environment
Yeah, I had that working once. Not sure what the pros and cons would be for us.
I think it could be a lot faster to start, since cirrus caches the generated
docker image locally. Rather than (I gather) pulling the image every time.
I'm quite certain that is not true. All the docker images built are just
uploaded to the google container registry and then downloaded onto a
*separate* windows host. The dockerfile: stuff generates a separate task
running on a separate machine...
It's a bit better for non-windows containers, because there google has some
optimization for pulling image (pieces) on demand or such.
Greetings,
Andres Freund
On Thu, Aug 4, 2022 at 3:38 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
[train wreck]
Oh my, so I'm getting the impression we might actually be totally
unstable on Cygwin. Which surprises me because ... wait a minute ...
lorikeet isn't even running most of the tests. So... we don't really
know the degree to which any of this works at all?
This shows that it *can* pass, if slowly, and infrequently:
https://cirrus-ci.com/task/6546858536337408
Ok, that's slightly reassuring, so maybe we *can* fix this, but I'm
one step closer to what Tom said, re wasting developer time...
[lots of improvements]
Cool.
Why did you write "|| exit /b 1" in all the bash invocations ? I think cirrus
handles that automatically.
Cargo-culted from libarchive.
Hi,
On 2022-08-04 16:16:06 +1200, Thomas Munro wrote:
Ok, that's slightly reassuring, so maybe we *can* fix this, but I'm
one step closer to what Tom said, re wasting developer time...
It might be worth checking whether the cygwin installer, which at some point
at least allowed installing postgres, has download numbers available anywhere.
It's possible we could e.g. get away with just allowing libpq to be built.
Greetings,
Andres Freund
On Thu, Aug 4, 2022 at 4:16 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Thu, Aug 4, 2022 at 3:38 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
[train wreck]
Oh my, so I'm getting the impression we might actually be totally
unstable on Cygwin. Which surprises me because ... wait a minute ...
lorikeet isn't even running most of the tests. So... we don't really
know the degree to which any of this works at all?
Hmm, it's possible that all these failures are just new-to-me effects
of the known bug. Certainly the assertion failures are of the usual
type, and I think it might be possible for the weird parallel query
failure to be explained by the postmaster forking extra phantom child
processes.
It may be madness to try to work around this, but I wonder if we could
use a static local variable that we update with atomic compare
exchange, inside PG_SIGNAL_HANDLER_ENTRY(), and
PG_SIGNAL_HANDLER_EXIT() macros that do nothing on every other system.
On entry, if you can do 0->1 it means you are allowed to run the
function. If it's non-zero, set n->n+1 and return immediately: signal
blocked, but queued for later. On exit, you CAS n->0. If n was > 1,
then you have to jump back to the top and run the function body again.
Thomas Munro <thomas.munro@gmail.com> writes:
It may be madness to try to work around this, but I wonder if we could
use a static local variable that we update with atomic compare
exchange, inside PG_SIGNAL_HANDLER_ENTRY(), and
PG_SIGNAL_HANDLER_EXIT() macros that do nothing on every other system.
On entry, if you can do 0->1 it means you are allowed to run the
function. If it's non-zero, set n->n+1 and return immediately: signal
blocked, but queued for later. On exit, you CAS n->0. If n was > 1,
then you have to jump back to the top and run the function body again.
And ... we're expending all this effort for what exactly?
regards, tom lane
On Thu, Aug 4, 2022 at 5:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
It may be madness to try to work around this, but I wonder if we could
use a static local variable that we update with atomic compare
exchange, inside PG_SIGNAL_HANDLER_ENTRY(), and
PG_SIGNAL_HANDLER_EXIT() macros that do nothing on every other system.
On entry, if you can do 0->1 it means you are allowed to run the
function. If it's non-zero, set n->n+1 and return immediately: signal
blocked, but queued for later. On exit, you CAS n->0. If n was > 1,
then you have to jump back to the top and run the function body again.
And ... we're expending all this effort for what exactly?
I'd be almost as happy if we ripped it all out, shut down lorikeet and
added it to the list of fallen platforms. I'd feel a bit like a
vandal, though. My suggestion is a last-ditch idea for Noah (CCd)
and/or Andrew to consider, who (respectively) blocked this last time
and run lorikeet. No plans to write that patch myself...
On Thu, Aug 04, 2022 at 04:16:06PM +1200, Thomas Munro wrote:
On Thu, Aug 4, 2022 at 3:38 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
[train wreck]
Oh my, so I'm getting the impression we might actually be totally
unstable on Cygwin. Which surprises me because ... wait a minute ...
lorikeet isn't even running most of the tests. So... we don't really
know the degree to which any of this works at all?
Right.
Maybe it's of limited interest, but ..
This updates the patch to build and test with meson.
Which first requires patching some meson.builds.
I guess that's needed for some current BF members, too.
Unfortunately, ccache+PCH causes gcc to crash :(
--
Justin
Attachments:
0001-meson-PROVE-is-not-required.patch (text/x-diff; charset=us-ascii)
From 0e0d5f33c8f5f3174b0576597971c80834bf76b8 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Fri, 30 Sep 2022 08:56:07 -0500
Subject: [PATCH 1/4] meson: PROVE is not required
It ought to be possible to build the application without running tests,
or without running TAP tests. And it's essential for supporting
buildfarm/cygwin, where TAP tests consistently fail..
---
meson.build | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/meson.build b/meson.build
index 2d225f706d2..fd8619d6997 100644
--- a/meson.build
+++ b/meson.build
@@ -323,7 +323,7 @@ python = find_program(get_option('PYTHON'), required: true, native: true)
flex = find_program(get_option('FLEX'), native: true, version: '>= 2.5.35')
bison = find_program(get_option('BISON'), native: true, version: '>= 2.3')
sed = find_program(get_option('SED'), 'sed', native: true)
-prove = find_program(get_option('PROVE'), native: true)
+prove = find_program(get_option('PROVE'), native: true, required: false)
tar = find_program(get_option('TAR'), native: true)
gzip = find_program(get_option('GZIP'), native: true)
program_lz4 = find_program(get_option('LZ4'), native: true, required: false)
--
2.25.1
0002-meson-other-fixes-for-cygwin.patch (text/x-diff; charset=us-ascii)
From 330f91de111f8bee7969818c9001ee6bbc74a048 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Fri, 30 Sep 2022 13:39:43 -0500
Subject: [PATCH 2/4] meson: other fixes for cygwin
XXX: what about HAVE_BUGGY_STRTOF ?
---
meson.build | 8 ++++++--
src/port/meson.build | 4 ++++
src/test/regress/meson.build | 2 +-
3 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/meson.build b/meson.build
index fd8619d6997..1c3e156f378 100644
--- a/meson.build
+++ b/meson.build
@@ -211,6 +211,10 @@ if host_system == 'aix'
elif host_system == 'cygwin'
cppflags += '-D_GNU_SOURCE'
+ dlsuffix = '.dll'
+ mod_link_args_fmt = ['@0@']
+ mod_link_with_name = 'lib@0@.exe.a'
+ mod_link_with_dir = 'libdir'
elif host_system == 'darwin'
dlsuffix = '.dylib'
@@ -2300,8 +2304,8 @@ gnugetopt_dep = cc.find_library('gnugetopt', required: false)
# (i.e., allow '-' as a flag character), so use our version on those platforms
# - We want to use system's getopt_long() only if the system provides struct
# option
-always_replace_getopt = host_system in ['windows', 'openbsd', 'solaris']
-always_replace_getopt_long = host_system == 'windows' or not cdata.has('HAVE_STRUCT_OPTION')
+always_replace_getopt = host_system in ['windows', 'cygwin', 'openbsd', 'solaris']
+always_replace_getopt_long = host_system in ['windows', 'cygwin'] or not cdata.has('HAVE_STRUCT_OPTION')
# Required on BSDs
execinfo_dep = cc.find_library('execinfo', required: false)
diff --git a/src/port/meson.build b/src/port/meson.build
index c2222696f1b..0ba83cc7930 100644
--- a/src/port/meson.build
+++ b/src/port/meson.build
@@ -40,6 +40,10 @@ if host_system == 'windows'
'win32setlocale.c',
'win32stat.c',
)
+elif host_system == 'cygwin'
+ pgport_sources += files(
+ 'dirmod.c',
+ )
endif
if cc.get_id() == 'msvc'
diff --git a/src/test/regress/meson.build b/src/test/regress/meson.build
index 3dcfc11278f..6ec3c77af53 100644
--- a/src/test/regress/meson.build
+++ b/src/test/regress/meson.build
@@ -10,7 +10,7 @@ regress_sources = pg_regress_c + files(
# patterns like ".*-.*-mingw.*". We probably can do better, but for now just
# replace 'gcc' with 'mingw' on windows.
host_tuple_cc = cc.get_id()
-if host_system == 'windows' and host_tuple_cc == 'gcc'
+if host_system in ['windows', 'cygwin'] and host_tuple_cc == 'gcc'
host_tuple_cc = 'mingw'
endif
host_tuple = '@0@-@1@-@2@'.format(host_cpu, host_system, host_tuple_cc)
--
2.25.1
0003-WIP-CI-support-for-Cygwin.patch (text/x-diff; charset=us-ascii)
From a054c0bfb95213e0fd37a7f0c59ed29510f2b873 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 25 Jul 2022 23:05:10 +1200
Subject: [PATCH 3/4] WIP CI support for Cygwin.
ci-os-only: cygwin
See also: d8e78714-dc77-4a64-783f-e863ba4d951f@2ndquadrant.com
https://cirrus-ci.com/task/5145086722834432
XXX This should use a canned Docker image with all the right packages
installed? But if the larger image is slower to start, then maybe not...
---
.cirrus.yml | 62 +++++++++++++++++++++++
configure | 2 +-
configure.ac | 2 +-
src/test/perl/PostgreSQL/Test/Cluster.pm | 4 +-
src/test/perl/PostgreSQL/Test/Utils.pm | 12 +++--
src/test/recovery/t/020_archive_status.pl | 2 +-
src/tools/ci/cores_backtrace.sh | 31 +++++++++++-
src/tools/ci/pg_ci_base.conf | 3 ++
8 files changed, 108 insertions(+), 10 deletions(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index d95ff4bded8..cd4cbf9e5ed 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -477,6 +477,68 @@ task:
type: text/plain
+task:
+ name: Windows - Cygwin
+ #XXX only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+ timeout_in: 90m
+
+ env:
+ CPUS: 4
+ BUILD_JOBS: 4
+ TEST_JOBS: 1
+ CCACHE_DIR: /tmp/ccache
+ CONFIGURE_FLAGS: --enable-debug --enable-tap-tests --with-ldap --with-ssl=openssl --with-libxml --enable-cassert
+ # --with-gssapi
+ CONFIGURE_CACHE: /tmp/ccache/configure.cache
+ PG_TEST_USE_UNIX_SOCKETS: 1
+ CCACHE_LOGFILE: ccache.log
+ EXTRA_REGRESS_OPTS: --max-connections=1
+ PG_TEST_EXTRA: ldap ssl # disable kerberos
+
+ windows_container:
+ image: cirrusci/windowsservercore:2019
+ os_version: 2019
+ cpu: $CPUS
+ memory: 4G
+
+ setup_additional_packages_script: |
+ choco install -y --no-progress cygwin
+ C:\tools\cygwin\cygwinsetup.exe -q -P cygrunsrv,make,gcc-core,ccache,binutils,libtool,pkg-config,flex,bison,zlib-devel,libxml2-devel,libxslt-devel,libssl-devel,openldap-devel,libreadline-devel,perl,perl-IPC-Run
+ REM libkrb5-devel,krb5-server
+ C:\tools\cygwin\bin\bash.exe --login -c "cygserver-config -y"
+ C:\tools\cygwin\bin\bash.exe --login -c "echo 'kern.ipc.semmni 1024' >> /etc/cygserver.conf"
+ C:\tools\cygwin\bin\bash.exe --login -c "echo 'kern.ipc.semmns 1024' >> /etc/cygserver.conf"
+ C:\tools\cygwin\bin\bash.exe --login -c "net start cygserver"
+
+ sysinfo_script: |
+ chcp
+ systeminfo
+ powershell -Command get-psdrive -psprovider filesystem
+ set
+ C:\tools\cygwin\bin\bash.exe --login -c "id; uname -a; ulimit -a -H; ulimit -a -S; export"
+
+ ccache_cache:
+ folder: C:\tools\cygwin\tmp\ccache
+
+ configure_script:
+ # Try to configure with the cache file, and retry without if it fails, in case the flags changed.
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && for i in 1 2; do ./configure --cache-file=${CONFIGURE_CACHE} ${CONFIGURE_FLAGS} CC='ccache gcc' CFLAGS='-Og -ggdb' && break; rm -v ${CONFIGURE_CACHE}; done"
+
+ build_script:
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && make -s -j ${BUILD_JOBS} world-bin"
+ - C:\tools\cygwin\bin\bash.exe --login -c "ccache --show-stats"
+
+ always:
+ upload_caches: ccache
+
+ test_world_script:
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 77m make -s -j ${TEST_JOBS} ${CHECK} PROVE_FLAGS='-j2 --timer' ${CHECKFLAGS}"
+
+ on_failure:
+ <<: *on_failure
+ cores_script:
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && src/tools/ci/cores_backtrace.sh cygwin ."
+
task:
name: CompilerWarnings
diff --git a/configure b/configure
index 5ea790d6380..f4b761db92d 100755
--- a/configure
+++ b/configure
@@ -16439,7 +16439,7 @@ fi
# mingw has adopted a GNU-centric interpretation of optind/optreset,
# so always use our version on Windows.
-if test "$PORTNAME" = "win32"; then
+if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then
case " $LIBOBJS " in
*" getopt.$ac_objext "* ) ;;
*) LIBOBJS="$LIBOBJS getopt.$ac_objext"
diff --git a/configure.ac b/configure.ac
index d80cdb5ca25..9b5f8c1cdb4 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1877,7 +1877,7 @@ fi
# mingw has adopted a GNU-centric interpretation of optind/optreset,
# so always use our version on Windows.
-if test "$PORTNAME" = "win32"; then
+if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then
AC_LIBOBJ(getopt)
AC_LIBOBJ(getopt_long)
fi
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index d80134b26f3..5b5e8e67137 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1052,7 +1052,7 @@ sub enable_restoring
# the path contains spaces.
$path =~ s{\\}{\\\\}g if ($PostgreSQL::Test::Utils::windows_os);
my $copy_command =
- $PostgreSQL::Test::Utils::windows_os
+ $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin
? qq{copy "$path\\\\%f" "%p"}
: qq{cp "$path/%f" "%p"};
@@ -1122,7 +1122,7 @@ sub enable_archiving
# the path contains spaces.
$path =~ s{\\}{\\\\}g if ($PostgreSQL::Test::Utils::windows_os);
my $copy_command =
- $PostgreSQL::Test::Utils::windows_os
+ $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin
? qq{copy "%p" "$path\\\\%f"}
: qq{cp "%p" "$path/%f"};
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 99d33451064..fb7fca57239 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -88,10 +88,11 @@ our @EXPORT = qw(
$windows_os
$is_msys2
+ $is_cygwin
$use_unix_sockets
);
-our ($windows_os, $is_msys2, $use_unix_sockets, $timeout_default,
+our ($windows_os, $is_msys2, $is_cygwin, $use_unix_sockets, $timeout_default,
$tmp_check, $log_path, $test_logfile);
BEGIN
@@ -140,13 +141,18 @@ BEGIN
$ENV{PGAPPNAME} = basename($0);
# Must be set early
- $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys';
+ $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys' ||
+ $Config{osname} eq 'cygwin';
+
# Check if this environment is MSYS2.
$is_msys2 =
$windows_os
&& -x '/usr/bin/uname'
&& `uname -or` =~ /^[2-9].*Msys/;
+ # Check if this environment is Cygwin
+ $is_cygwin = $Config{osname} eq 'cygwin';
+
if ($windows_os)
{
require Win32API::File;
@@ -707,7 +713,7 @@ sub dir_symlink
{
my $oldname = shift;
my $newname = shift;
- if ($windows_os)
+ if ($windows_os && !$is_cygwin)
{
$oldname =~ s,/,\\,g;
$newname =~ s,/,\\,g;
diff --git a/src/test/recovery/t/020_archive_status.pl b/src/test/recovery/t/020_archive_status.pl
index 2108d50073a..fd46f45c627 100644
--- a/src/test/recovery/t/020_archive_status.pl
+++ b/src/test/recovery/t/020_archive_status.pl
@@ -26,7 +26,7 @@ my $primary_data = $primary->data_dir;
# a portable solution, use an archive command based on a command known to
# work but will fail: copy with an incorrect original path.
my $incorrect_command =
- $PostgreSQL::Test::Utils::windows_os
+ $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin
? qq{copy "%p_does_not_exist" "%f_does_not_exist"}
: qq{cp "%p_does_not_exist" "%f_does_not_exist"};
$primary->safe_psql(
diff --git a/src/tools/ci/cores_backtrace.sh b/src/tools/ci/cores_backtrace.sh
index 28d3cecfc67..27f93147e4e 100755
--- a/src/tools/ci/cores_backtrace.sh
+++ b/src/tools/ci/cores_backtrace.sh
@@ -1,5 +1,8 @@
#! /bin/sh
+#set -e
+set -x
+
if [ $# -ne 2 ]; then
echo "cores_backtrace.sh <os> <directory>"
exit 1
@@ -8,9 +11,24 @@ fi
os=$1
directory=$2
+findargs=''
case $os in
freebsd|linux|macos)
- ;;
+ ;;
+
+ cygwin)
+ # XXX Evidently I don't know how to write two arguments here without pathname expansion later, other than eval.
+ #findargs='-name "*.stackdump"'
+ for stack in $(find "$directory" -type f -name "*.stackdump") ; do
+ binary=`basename "$stack" .stackdump`
+ echo;echo;
+ echo "dumping ${stack} for ${binary}"
+ #awk '/^0/{print $2}' $stack |addr2line -f -i -e "./src/backend/$binary.exe"
+ awk '/^0/{print $2}' $stack |addr2line -f -i -e src/backend/postgres.exe
+ done
+ exit 0
+ ;;
+
*)
echo "unsupported operating system ${os}"
exit 1
@@ -18,7 +36,7 @@ case $os in
esac
first=1
-for corefile in $(find "$directory" -type f) ; do
+for corefile in $(find "$directory" -type f $findargs) ; do
if [ "$first" -eq 1 ]; then
first=0
else
@@ -28,6 +46,13 @@ for corefile in $(find "$directory" -type f) ; do
if [ "$os" = 'macos' ]; then
lldb -c $corefile --batch -o 'thread backtrace all' -o 'quit'
+ elif [ "$os" = 'cygwin' ]; then
+ # https://cirrus-ci.com/task/4964259674193920
+ #binary=${corefile%.stackdump}
+ #binary=${corefile#*/}
+ binary=`basename "$corefile" .stackdump`
+ echo "dumping ${corefile} for ${binary}"
+ awk '/^0/{print $2}' $corefile |addr2line -f -i -e ./src/backend/postgres.exe
else
auxv=$(gdb --quiet --core ${corefile} --batch -ex 'info auxv' 2>/dev/null)
if [ $? -ne 0 ]; then
@@ -48,3 +73,5 @@ for corefile in $(find "$directory" -type f) ; do
gdb --batch --quiet -ex "thread apply all bt full" -ex "quit" "$binary" "$corefile" 2>/dev/null
fi
done
+
+exit 0
diff --git a/src/tools/ci/pg_ci_base.conf b/src/tools/ci/pg_ci_base.conf
index d8faa9c26c1..0d43b387006 100644
--- a/src/tools/ci/pg_ci_base.conf
+++ b/src/tools/ci/pg_ci_base.conf
@@ -12,3 +12,6 @@ log_connections = true
log_disconnections = true
log_line_prefix = '%m [%p][%b] %q[%a][%v:%x] '
log_lock_waits = true
+
+data_sync_retry = on
+shared_memory_type = mmap
--
2.25.1
0004-s-convert-to-meson.patch (text/x-diff; charset=us-ascii)
From bd5fa65f2c9fbc3715382f2bb26d546ef3981729 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 28 Sep 2022 19:54:59 -0500
Subject: [PATCH 4/4] s!convert to meson
Note that PCH causes gcc to crash:
https://cirrus-ci.com/task/5856316085239808
https://cirrus-ci.com/task/5982327657463808
https://community.chocolatey.org/packages/Cygwin#versionhistory
a semi-relevant message about msys(not cygwin): https://www.postgresql.org/message-id/9f4f22be-f9f1-b350-bc06-521226b87f7a%40dunslane.net
ci-os-only: cygwin
---
.cirrus.yml | 29 +++++++++++++++++------------
src/tools/ci/cores_backtrace.sh | 4 ++--
2 files changed, 19 insertions(+), 14 deletions(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index cd4cbf9e5ed..006058a09b0 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -43,6 +43,7 @@ on_failure_meson: &on_failure_meson
- "build*/testrun/**/*.log"
- "build*/testrun/**/*.diffs"
- "build*/testrun/**/regress_log_*"
+ - "**/*.stackdump"
type: text/plain
# In theory it'd be nice to upload the junit files meson generates, so that
@@ -480,30 +481,35 @@ task:
task:
name: Windows - Cygwin
#XXX only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+ only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
timeout_in: 90m
env:
CPUS: 4
- BUILD_JOBS: 4
- TEST_JOBS: 1
+ BUILD_JOBS: $CPUS
+ TEST_JOBS: $CPUS
CCACHE_DIR: /tmp/ccache
- CONFIGURE_FLAGS: --enable-debug --enable-tap-tests --with-ldap --with-ssl=openssl --with-libxml --enable-cassert
- # --with-gssapi
- CONFIGURE_CACHE: /tmp/ccache/configure.cache
+ CCACHE_DEPEND: 1
+ # compress because PCH are huge
+ CCACHE_COMPRESS: 1
+ # Actually, do not use ccache for PCH ... it crashes gcc
+ #CCACHE_SLOPPINESS: pch_defines,time_macros,include_file_ctime,include_file_mtime
PG_TEST_USE_UNIX_SOCKETS: 1
CCACHE_LOGFILE: ccache.log
EXTRA_REGRESS_OPTS: --max-connections=1
PG_TEST_EXTRA: ldap ssl # disable kerberos
+ CFLAGS: -Og -ggdb
windows_container:
- image: cirrusci/windowsservercore:2019
+ image: cirrusci/windowsservercore:2019-2022.06.23
os_version: 2019
cpu: $CPUS
memory: 4G
setup_additional_packages_script: |
choco install -y --no-progress cygwin
- C:\tools\cygwin\cygwinsetup.exe -q -P cygrunsrv,make,gcc-core,ccache,binutils,libtool,pkg-config,flex,bison,zlib-devel,libxml2-devel,libxslt-devel,libssl-devel,openldap-devel,libreadline-devel,perl,perl-IPC-Run
+ C:\tools\cygwin\cygwinsetup.exe -q -P cygrunsrv,make,gcc-core,ccache,binutils,libtool,pkg-config,flex,bison,zlib-devel,libxml2-devel,libxslt-devel,libssl-devel,openldap-devel,libreadline-devel,perl,meson,ninja
+ REM perl-IPC-Run,
REM libkrb5-devel,krb5-server
C:\tools\cygwin\bin\bash.exe --login -c "cygserver-config -y"
C:\tools\cygwin\bin\bash.exe --login -c "echo 'kern.ipc.semmni 1024' >> /etc/cygserver.conf"
@@ -521,21 +527,20 @@ task:
folder: C:\tools\cygwin\tmp\ccache
configure_script:
- # Try to configure with the cache file, and retry without if it fails, in case the flags changed.
- - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && for i in 1 2; do ./configure --cache-file=${CONFIGURE_CACHE} ${CONFIGURE_FLAGS} CC='ccache gcc' CFLAGS='-Og -ggdb' && break; rm -v ${CONFIGURE_CACHE}; done"
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && meson setup --buildtype=debug -Dcassert=true -Db_pch=false -Dssl=openssl -Duuid=e2fs -Dtap_tests=disabled -DPG_TEST_EXTRA='$PG_TEST_EXTRA' build"
build_script:
- - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && make -s -j ${BUILD_JOBS} world-bin"
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && ninja -C build -j${BUILD_JOBS}"
- C:\tools\cygwin\bin\bash.exe --login -c "ccache --show-stats"
always:
upload_caches: ccache
test_world_script:
- - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 77m make -s -j ${TEST_JOBS} ${CHECK} PROVE_FLAGS='-j2 --timer' ${CHECKFLAGS}"
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && meson test $MTEST_ARGS --num-processes ${TEST_JOBS}"
on_failure:
- <<: *on_failure
+ <<: *on_failure_meson
cores_script:
- C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && src/tools/ci/cores_backtrace.sh cygwin ."
diff --git a/src/tools/ci/cores_backtrace.sh b/src/tools/ci/cores_backtrace.sh
index 27f93147e4e..1e593429946 100755
--- a/src/tools/ci/cores_backtrace.sh
+++ b/src/tools/ci/cores_backtrace.sh
@@ -23,8 +23,8 @@ case $os in
binary=`basename "$stack" .stackdump`
echo;echo;
echo "dumping ${stack} for ${binary}"
- #awk '/^0/{print $2}' $stack |addr2line -f -i -e "./src/backend/$binary.exe"
- awk '/^0/{print $2}' $stack |addr2line -f -i -e src/backend/postgres.exe
+ #awk '/^0/{print $2}' $stack |addr2line -f -i -e "./build/src/backend/$binary.exe"
+ awk '/^0/{print $2}' $stack |addr2line -f -i -e ./build/tmp_install/usr/local/pgsql/bin/postgres.exe
done
exit 0
;;
--
2.25.1
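On the XXX note in cores_backtrace.sh about passing two arguments to find without eval: one possible approach, sketched here and not taken from the patch, is to keep the extra arguments in the positional parameters and expand them with "$@", so the pattern stays quoted and is never subject to pathname expansion by the shell. The function name list_dumps is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: list_dumps is an illustrative name.
# Usage: list_dumps <directory> <os>
list_dumps() {
    directory=$1
    os=$2
    # Reuse the positional parameters to hold the extra find arguments.
    # Each word stays a separate, literal argument when expanded as "$@".
    case $os in
        cygwin) set -- -name '*.stackdump' ;;
        *)      set -- ;;
    esac
    # With no extra arguments, "$@" expands to nothing at all.
    find "$directory" -type f "$@"
}
```

Something like `list_dumps . cygwin` would then list only the *.stackdump files, while other values of <os> list every regular file, as the existing loop does.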
On Thu, Oct 20, 2022 at 10:40:40PM -0500, Justin Pryzby wrote:
On Thu, Aug 04, 2022 at 04:16:06PM +1200, Thomas Munro wrote:
On Thu, Aug 4, 2022 at 3:38 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
[train wreck]
Oh my, so I'm getting the impression we might actually be totally
unstable on Cygwin. Which surprises me because ... wait a minute ...
lorikeet isn't even running most of the tests. So... we don't really
know the degree to which any of this works at all?
Right.
Maybe it's of limited interest, but ..
This updates the patch to build and test with meson.
Which first requires patching some meson.builds.
I guess that's needed for some current BF members, too.
Unfortunately, ccache+PCH causes gcc to crash :(
Resending with the 'only-if' line commented (doh).
And some fixes to 001, as Andres pointed out on the other thread.
--
Justin
Attachments:
0001-meson-other-fixes-for-cygwin.patch (text/x-diff; charset=us-ascii)
From 2741472080eceac5cb6d002c39eaf319d7f72b50 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Fri, 30 Sep 2022 13:39:43 -0500
Subject: [PATCH 1/3] meson: other fixes for cygwin
XXX: what about HAVE_BUGGY_STRTOF ?
See: 20221021034040.GT16921@telsasoft.com
---
meson.build | 8 ++++++--
src/port/meson.build | 4 ++++
src/test/regress/meson.build | 2 ++
3 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/meson.build b/meson.build
index ce2f223a409..ed24370672a 100644
--- a/meson.build
+++ b/meson.build
@@ -211,6 +211,10 @@ if host_system == 'aix'
elif host_system == 'cygwin'
cppflags += '-D_GNU_SOURCE'
+ dlsuffix = '.dll'
+ mod_link_args_fmt = ['@0@']
+ mod_link_with_name = 'lib@0@.exe.a'
+ mod_link_with_dir = 'libdir'
elif host_system == 'darwin'
dlsuffix = '.dylib'
@@ -2301,8 +2305,8 @@ gnugetopt_dep = cc.find_library('gnugetopt', required: false)
# (i.e., allow '-' as a flag character), so use our version on those platforms
# - We want to use system's getopt_long() only if the system provides struct
# option
-always_replace_getopt = host_system in ['windows', 'openbsd', 'solaris']
-always_replace_getopt_long = host_system == 'windows' or not cdata.has('HAVE_STRUCT_OPTION')
+always_replace_getopt = host_system in ['windows', 'cygwin', 'openbsd', 'solaris']
+always_replace_getopt_long = host_system in ['windows', 'cygwin'] or not cdata.has('HAVE_STRUCT_OPTION')
# Required on BSDs
execinfo_dep = cc.find_library('execinfo', required: false)
diff --git a/src/port/meson.build b/src/port/meson.build
index c2222696f1b..0ba83cc7930 100644
--- a/src/port/meson.build
+++ b/src/port/meson.build
@@ -40,6 +40,10 @@ if host_system == 'windows'
'win32setlocale.c',
'win32stat.c',
)
+elif host_system == 'cygwin'
+ pgport_sources += files(
+ 'dirmod.c',
+ )
endif
if cc.get_id() == 'msvc'
diff --git a/src/test/regress/meson.build b/src/test/regress/meson.build
index f1adcd9198c..72a23737fa7 100644
--- a/src/test/regress/meson.build
+++ b/src/test/regress/meson.build
@@ -12,6 +12,8 @@ regress_sources = pg_regress_c + files(
host_tuple_cc = cc.get_id()
if host_system == 'windows' and host_tuple_cc == 'gcc'
host_tuple_cc = 'mingw'
+elif host_system == 'cygwin' and host_tuple_cc == 'gcc'
+ host_tuple_cc = 'cygwin'
endif
host_tuple = '@0@-@1@-@2@'.format(host_cpu, host_system, host_tuple_cc)
--
2.25.1
0002-WIP-CI-support-for-Cygwin.patch (text/x-diff; charset=us-ascii)
From 8f31be4d0bf036df890e32568dbc056c36fd57c5 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 25 Jul 2022 23:05:10 +1200
Subject: [PATCH 2/3] WIP CI support for Cygwin.
ci-os-only: cygwin
See also: d8e78714-dc77-4a64-783f-e863ba4d951f@2ndquadrant.com
https://cirrus-ci.com/task/5145086722834432
XXX This should use a canned Docker image with all the right packages
installed? But if the larger image is slower to start, then maybe not...
---
.cirrus.yml | 76 +++++++++++++++++++++++
configure | 2 +-
configure.ac | 2 +-
src/test/perl/PostgreSQL/Test/Cluster.pm | 4 +-
src/test/perl/PostgreSQL/Test/Utils.pm | 12 +++-
src/test/recovery/t/020_archive_status.pl | 2 +-
src/tools/ci/cores_backtrace.sh | 33 +++++++++-
src/tools/ci/pg_ci_base.conf | 2 +
8 files changed, 122 insertions(+), 11 deletions(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index 9f2282471a9..02b0f3b7045 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -464,6 +464,82 @@ task:
type: text/plain
+task:
+ name: Windows - Cygwin
+ only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+ #XXX only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+ timeout_in: 90m
+
+ env:
+ CPUS: 4
+ BUILD_JOBS: 4
+ TEST_JOBS: 1
+ CCACHE_DIR: /tmp/ccache
+ CCACHE_LOGFILE: ccache.log
+ CONFIGURE_FLAGS: --enable-cassert --enable-debug --with-ldap --with-ssl=openssl --with-libxml
+ # --enable-tap-tests
+ # --disable-dynamicbase
+ # --with-gssapi
+ CONFIGURE_CACHE: /tmp/ccache/configure.cache
+ PG_TEST_USE_UNIX_SOCKETS: 1
+ EXTRA_REGRESS_OPTS: --max-connections=1
+ PG_TEST_EXTRA: ldap ssl # disable kerberos
+
+ windows_container:
+ image: cirrusci/windowsservercore:2019-2022.06.23
+ os_version: 2019
+ cpu: $CPUS
+ memory: 4G
+
+ setup_additional_packages_script: |
+ choco install -y --no-progress cygwin
+ C:\tools\cygwin\cygwinsetup.exe -q -P cygrunsrv,make,gcc-core,ccache,binutils,libtool,pkg-config,flex,bison,zlib-devel,libxml2-devel,libxslt-devel,libssl-devel,openldap-devel,libreadline-devel,perl
+ REM perl-IPC-Run,
+ REM libkrb5-devel,krb5-server
+ C:\tools\cygwin\bin\bash.exe --login -c "cygserver-config -y"
+ C:\tools\cygwin\bin\bash.exe --login -c "echo 'kern.ipc.semmni 1024' >> /etc/cygserver.conf"
+ C:\tools\cygwin\bin\bash.exe --login -c "echo 'kern.ipc.semmns 1024' >> /etc/cygserver.conf"
+ C:\tools\cygwin\bin\bash.exe --login -c "net start cygserver"
+
+ sysinfo_script: |
+ chcp
+ systeminfo
+ powershell -Command get-psdrive -psprovider filesystem
+ set
+ C:\tools\cygwin\bin\bash.exe --login -c "id; uname -a; ulimit -a -H; ulimit -a -S; export"
+
+ ccache_cache:
+ folder: C:\tools\cygwin\tmp\ccache
+ fingerprint_key: ccache/cygwin
+ reupload_on_changes: true
+
+ configure_script:
+ # Try to configure with the cache file, and retry without if it fails, in case the flags changed.
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && for i in 1 2; do ./configure --cache-file=${CONFIGURE_CACHE} ${CONFIGURE_FLAGS} CC='ccache gcc' CFLAGS='-Og -ggdb' && break; rm -v ${CONFIGURE_CACHE}; done"
+
+ build_script:
+ #- C:\tools\cygwin\bin\bash.exe --login -c "ccache --max-size ${CCACHE_MAXSIZE}"
+ - C:\tools\cygwin\bin\bash.exe --login -c "ccache --zero-stats"
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && make -s -j ${BUILD_JOBS} world-bin"
+ - C:\tools\cygwin\bin\bash.exe --login -c "ccache --show-stats"
+
+ upload_caches: ccache
+
+ test_world_script:
+ #- C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 44m make -s -j ${TEST_JOBS} check ${CHECKFLAGS} -C src/test/subscription"
+ #- C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 44m make -s -j ${TEST_JOBS} check ${CHECKFLAGS} -C src/test/recovery"
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 44m make -s -j ${TEST_JOBS} check ${CHECKFLAGS} -C src/test/modules/test_misc"
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 44m make -s -j ${TEST_JOBS} check ${CHECKFLAGS} -C src/interfaces/libpq"
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 44m make -s -j ${TEST_JOBS} check ${CHECKFLAGS} -C src/bin/psql"
+ #- C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 44m make -s check ${CHECKFLAGS} -C src/bin -j 2"
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 77m make -s -j ${TEST_JOBS} ${CHECK} PROVE_FLAGS='-j2 --timer' ${CHECKFLAGS}"
+
+ on_failure:
+ <<: *on_failure_ac
+ cores_script:
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && src/tools/ci/cores_backtrace.sh cygwin ."
+
+
task:
name: CompilerWarnings
diff --git a/configure b/configure
index 3966368b8d9..68e366dcfc2 100755
--- a/configure
+++ b/configure
@@ -16494,7 +16494,7 @@ fi
# mingw has adopted a GNU-centric interpretation of optind/optreset,
# so always use our version on Windows.
-if test "$PORTNAME" = "win32"; then
+if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then
case " $LIBOBJS " in
*" getopt.$ac_objext "* ) ;;
*) LIBOBJS="$LIBOBJS getopt.$ac_objext"
diff --git a/configure.ac b/configure.ac
index f76b7ee31fc..33c5475520e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1878,7 +1878,7 @@ fi
# mingw has adopted a GNU-centric interpretation of optind/optreset,
# so always use our version on Windows.
-if test "$PORTNAME" = "win32"; then
+if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then
AC_LIBOBJ(getopt)
AC_LIBOBJ(getopt_long)
fi
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index d80134b26f3..5b5e8e67137 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1052,7 +1052,7 @@ sub enable_restoring
# the path contains spaces.
$path =~ s{\\}{\\\\}g if ($PostgreSQL::Test::Utils::windows_os);
my $copy_command =
- $PostgreSQL::Test::Utils::windows_os
+ $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin
? qq{copy "$path\\\\%f" "%p"}
: qq{cp "$path/%f" "%p"};
@@ -1122,7 +1122,7 @@ sub enable_archiving
# the path contains spaces.
$path =~ s{\\}{\\\\}g if ($PostgreSQL::Test::Utils::windows_os);
my $copy_command =
- $PostgreSQL::Test::Utils::windows_os
+ $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin
? qq{copy "%p" "$path\\\\%f"}
: qq{cp "%p" "$path/%f"};
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 99d33451064..fb7fca57239 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -88,10 +88,11 @@ our @EXPORT = qw(
$windows_os
$is_msys2
+ $is_cygwin
$use_unix_sockets
);
-our ($windows_os, $is_msys2, $use_unix_sockets, $timeout_default,
+our ($windows_os, $is_msys2, $is_cygwin, $use_unix_sockets, $timeout_default,
$tmp_check, $log_path, $test_logfile);
BEGIN
@@ -140,13 +141,18 @@ BEGIN
$ENV{PGAPPNAME} = basename($0);
# Must be set early
- $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys';
+ $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys' ||
+ $Config{osname} eq 'cygwin';
+
# Check if this environment is MSYS2.
$is_msys2 =
$windows_os
&& -x '/usr/bin/uname'
&& `uname -or` =~ /^[2-9].*Msys/;
+ # Check if this environment is Cygwin
+ $is_cygwin = $Config{osname} eq 'cygwin';
+
if ($windows_os)
{
require Win32API::File;
@@ -707,7 +713,7 @@ sub dir_symlink
{
my $oldname = shift;
my $newname = shift;
- if ($windows_os)
+ if ($windows_os && !$is_cygwin)
{
$oldname =~ s,/,\\,g;
$newname =~ s,/,\\,g;
diff --git a/src/test/recovery/t/020_archive_status.pl b/src/test/recovery/t/020_archive_status.pl
index fe9ac06b32d..63452b49bd3 100644
--- a/src/test/recovery/t/020_archive_status.pl
+++ b/src/test/recovery/t/020_archive_status.pl
@@ -26,7 +26,7 @@ my $primary_data = $primary->data_dir;
# a portable solution, use an archive command based on a command known to
# work but will fail: copy with an incorrect original path.
my $incorrect_command =
- $PostgreSQL::Test::Utils::windows_os
+ $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin
? qq{copy "%p_does_not_exist" "%f_does_not_exist"}
: qq{cp "%p_does_not_exist" "%f_does_not_exist"};
$primary->safe_psql(
diff --git a/src/tools/ci/cores_backtrace.sh b/src/tools/ci/cores_backtrace.sh
index 28d3cecfc67..02bd50b10fa 100755
--- a/src/tools/ci/cores_backtrace.sh
+++ b/src/tools/ci/cores_backtrace.sh
@@ -1,5 +1,8 @@
#! /bin/sh
+#set -e
+set -x
+
if [ $# -ne 2 ]; then
echo "cores_backtrace.sh <os> <directory>"
exit 1
@@ -8,17 +11,32 @@ fi
os=$1
directory=$2
+findargs=''
case $os in
freebsd|linux|macos)
- ;;
+ ;;
+
+ cygwin)
+ # XXX Evidently I don't know how to write two arguments here without pathname expansion later, other than eval.
+ #findargs='-name "*.stackdump"'
+ for stack in $(find "$directory" -type f -name "*.stackdump") ; do
+ binary=`basename "$stack" .stackdump`
+ echo;echo;
+ echo "dumping ${stack} for ${binary}"
+ awk '/^0/{print $2}' $stack |addr2line -f -i -e ./src/backend/postgres.exe
+ #awk '/^0/{print $2}' $stack |addr2line -f -i -e "./src/backend/$binary.exe"
+ done
+ exit 0
+ ;;
+
*)
echo "unsupported operating system ${os}"
exit 1
- ;;
+ ;;
esac
first=1
-for corefile in $(find "$directory" -type f) ; do
+for corefile in $(find "$directory" -type f $findargs) ; do
if [ "$first" -eq 1 ]; then
first=0
else
@@ -28,6 +46,13 @@ for corefile in $(find "$directory" -type f) ; do
if [ "$os" = 'macos' ]; then
lldb -c $corefile --batch -o 'thread backtrace all' -o 'quit'
+ elif [ "$os" = 'cygwin' ]; then
+ # https://cirrus-ci.com/task/4964259674193920
+ #binary=${corefile%.stackdump}
+ #binary=${corefile#*/}
+ binary=`basename "$corefile" .stackdump`
+ echo "dumping ${corefile} for ${binary}"
+ awk '/^0/{print $2}' $corefile |addr2line -f -i -e ./src/backend/postgres.exe
else
auxv=$(gdb --quiet --core ${corefile} --batch -ex 'info auxv' 2>/dev/null)
if [ $? -ne 0 ]; then
@@ -48,3 +73,5 @@ for corefile in $(find "$directory" -type f) ; do
gdb --batch --quiet -ex "thread apply all bt full" -ex "quit" "$binary" "$corefile" 2>/dev/null
fi
done
+
+exit 0
diff --git a/src/tools/ci/pg_ci_base.conf b/src/tools/ci/pg_ci_base.conf
index d8faa9c26c1..206dd993ccc 100644
--- a/src/tools/ci/pg_ci_base.conf
+++ b/src/tools/ci/pg_ci_base.conf
@@ -12,3 +12,5 @@ log_connections = true
log_disconnections = true
log_line_prefix = '%m [%p][%b] %q[%a][%v:%x] '
log_lock_waits = true
+
+data_sync_retry = on
--
2.25.1
0003-f-convert-to-meson.patch (text/x-diff; charset=us-ascii)
From 9aace45e27a1bc0bacd5922db9878d724b7bc492 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 28 Sep 2022 19:54:59 -0500
Subject: [PATCH 3/3] f!convert to meson
https://cirrus-ci.com/task/5982327657463808
https://community.chocolatey.org/packages/Cygwin#versionhistory
ci-os-only: cygwin
---
.cirrus.yml | 34 +++++++++++++--------------------
src/tools/ci/cores_backtrace.sh | 4 ++--
2 files changed, 15 insertions(+), 23 deletions(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index 02b0f3b7045..57610e669e8 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -466,24 +466,24 @@ task:
task:
name: Windows - Cygwin
- only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
#XXX only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
- timeout_in: 90m
+ #XXX only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+ #timeout_in: 90m
env:
CPUS: 4
- BUILD_JOBS: 4
- TEST_JOBS: 1
+ BUILD_JOBS: $CPUS
+ TEST_JOBS: $CPUS
CCACHE_DIR: /tmp/ccache
CCACHE_LOGFILE: ccache.log
- CONFIGURE_FLAGS: --enable-cassert --enable-debug --with-ldap --with-ssl=openssl --with-libxml
- # --enable-tap-tests
# --disable-dynamicbase
# --with-gssapi
CONFIGURE_CACHE: /tmp/ccache/configure.cache
PG_TEST_USE_UNIX_SOCKETS: 1
EXTRA_REGRESS_OPTS: --max-connections=1
PG_TEST_EXTRA: ldap ssl # disable kerberos
+ CC: ccache gcc
+ CFLAGS: -Og -ggdb
windows_container:
image: cirrusci/windowsservercore:2019-2022.06.23
@@ -493,7 +493,7 @@ task:
setup_additional_packages_script: |
choco install -y --no-progress cygwin
- C:\tools\cygwin\cygwinsetup.exe -q -P cygrunsrv,make,gcc-core,ccache,binutils,libtool,pkg-config,flex,bison,zlib-devel,libxml2-devel,libxslt-devel,libssl-devel,openldap-devel,libreadline-devel,perl
+ C:\tools\cygwin\cygwinsetup.exe -q -P cygrunsrv,make,gcc-core,ccache,binutils,libtool,pkg-config,flex,bison,zlib-devel,libxml2-devel,libxslt-devel,libssl-devel,openldap-devel,libreadline-devel,perl,meson,ninja
REM perl-IPC-Run,
REM libkrb5-devel,krb5-server
C:\tools\cygwin\bin\bash.exe --login -c "cygserver-config -y"
@@ -514,28 +514,20 @@ task:
reupload_on_changes: true
configure_script:
- # Try to configure with the cache file, and retry without if it fails, in case the flags changed.
- - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && for i in 1 2; do ./configure --cache-file=${CONFIGURE_CACHE} ${CONFIGURE_FLAGS} CC='ccache gcc' CFLAGS='-Og -ggdb' && break; rm -v ${CONFIGURE_CACHE}; done"
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && meson setup --buildtype=debug -Dcassert=true -Dssl=openssl -Duuid=e2fs -Dtap_tests=disabled -DPG_TEST_EXTRA='$PG_TEST_EXTRA' build"
build_script:
- #- C:\tools\cygwin\bin\bash.exe --login -c "ccache --max-size ${CCACHE_MAXSIZE}"
- - C:\tools\cygwin\bin\bash.exe --login -c "ccache --zero-stats"
- - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && make -s -j ${BUILD_JOBS} world-bin"
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && ninja -C build -j${BUILD_JOBS}"
- C:\tools\cygwin\bin\bash.exe --login -c "ccache --show-stats"
- upload_caches: ccache
+ always:
+ upload_caches: ccache
test_world_script:
- #- C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 44m make -s -j ${TEST_JOBS} check ${CHECKFLAGS} -C src/test/subscription"
- #- C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 44m make -s -j ${TEST_JOBS} check ${CHECKFLAGS} -C src/test/recovery"
- - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 44m make -s -j ${TEST_JOBS} check ${CHECKFLAGS} -C src/test/modules/test_misc"
- - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 44m make -s -j ${TEST_JOBS} check ${CHECKFLAGS} -C src/interfaces/libpq"
- - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 44m make -s -j ${TEST_JOBS} check ${CHECKFLAGS} -C src/bin/psql"
- #- C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 44m make -s check ${CHECKFLAGS} -C src/bin -j 2"
- - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && timeout 77m make -s -j ${TEST_JOBS} ${CHECK} PROVE_FLAGS='-j2 --timer' ${CHECKFLAGS}"
+ - C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && meson test $MTEST_ARGS --num-processes ${TEST_JOBS}"
on_failure:
- <<: *on_failure_ac
+ <<: *on_failure_meson
cores_script:
- C:\tools\cygwin\bin\bash.exe --login -c "cd '%cd%' && src/tools/ci/cores_backtrace.sh cygwin ."
diff --git a/src/tools/ci/cores_backtrace.sh b/src/tools/ci/cores_backtrace.sh
index 02bd50b10fa..1f0f8795fc6 100755
--- a/src/tools/ci/cores_backtrace.sh
+++ b/src/tools/ci/cores_backtrace.sh
@@ -23,8 +23,8 @@ case $os in
binary=`basename "$stack" .stackdump`
echo;echo;
echo "dumping ${stack} for ${binary}"
- awk '/^0/{print $2}' $stack |addr2line -f -i -e ./src/backend/postgres.exe
- #awk '/^0/{print $2}' $stack |addr2line -f -i -e "./src/backend/$binary.exe"
+ awk '/^0/{print $2}' $stack |addr2line -f -i -e ./build/tmp_install/usr/local/pgsql/bin/postgres.exe
+ #awk '/^0/{print $2}' $stack |addr2line -f -i -e "./build/src/backend/$binary.exe"
done
exit 0
;;
--
2.25.1
Hi,
On 2022-11-08 19:04:37 -0600, Justin Pryzby wrote:
From 2741472080eceac5cb6d002c39eaf319d7f72b50 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Fri, 30 Sep 2022 13:39:43 -0500
Subject: [PATCH 1/3] meson: other fixes for cygwin
XXX: what about HAVE_BUGGY_STRTOF ?
What about it? As noted in another thread, HAVE_BUGGY_STRTOF is defined in a
header, and shouldn't be affected by the buildsystem.
Pushed this commit.
XXX This should use a canned Docker image with all the right packages
installed? But if the larger image is slower to start, then maybe not...
I think once we convert the windows containers to windows VMs we can just
install both cygwin and mingw in the same image. The overhead of installing
too much seems far far smaller there.
+ CONFIGURE_FLAGS: --enable-cassert --enable-debug --with-ldap --with-ssl=openssl --with-libxml
+ # --enable-tap-tests
I assume this is disabled as tap tests fail?
+ C:\tools\cygwin\bin\bash.exe --login -c "cygserver-config -y"
I'd copy the approach used for mingw of putting most of this in an environment
variable.
+findargs=''
case $os in
freebsd|linux|macos)
- ;;
+ ;;
+
+ cygwin)
+ # XXX Evidently I don't know how to write two arguments here without pathname expansion later, other than eval.
+ #findargs='-name "*.stackdump"'
+ for stack in $(find "$directory" -type f -name "*.stackdump") ; do
+ binary=`basename "$stack" .stackdump`
+ echo;echo;
+ echo "dumping ${stack} for ${binary}"
+ awk '/^0/{print $2}' $stack |addr2line -f -i -e ./src/backend/postgres.exe
+ #awk '/^0/{print $2}' $stack |addr2line -f -i -e "./src/backend/$binary.exe"
+ done
+ exit 0
+ ;;
Is this stuff actually needed? Could we use the infrastructure we use for
backtraces with msvc instead? Or use something that understands .stackdump
files?
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm [...]
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm [...]
+++ b/src/test/recovery/t/020_archive_status.pl [...]
I think these should be in a separate commit, they're not actually about CI.
Greetings,
Andres Freund
On Fri, Jul 29, 2022 at 10:57 AM Thomas Munro <thomas.munro@gmail.com> wrote:
I wonder if these problems would go away as a nice incidental
side-effect if we used latches for postmaster wakeups. I don't
know... maybe, if the problem is just with the postmaster's pattern of
blocking/unblocking? Maybe backend startup is simple enough that it
doesn't hit the bug? From a quick glance, I think the assertion
failures that occur in regular backends can possibly be blamed on the
postmaster getting confused about its children due to unexpected
handler re-entry.
Just to connect the dots, that's what this patch does:
/messages/by-id/CA+hUKG+Z-HpOj1JsO9eWUP+ar7npSVinsC_npxSy+jdOMsx=Gg@mail.gmail.com
(There may be other places that break under Cygwin's flaky sa_mask
implementation, I don't know and haven't seen any clues about that.)
On Wed, 9 Nov 2022 at 06:34, Justin Pryzby <pryzby@telsasoft.com> wrote:
On Thu, Oct 20, 2022 at 10:40:40PM -0500, Justin Pryzby wrote:
On Thu, Aug 04, 2022 at 04:16:06PM +1200, Thomas Munro wrote:
On Thu, Aug 4, 2022 at 3:38 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
[train wreck]
Oh my, so I'm getting the impression we might actually be totally
unstable on Cygwin. Which surprises me because ... wait a minute ...
lorikeet isn't even running most of the tests. So... we don't really
know the degree to which any of this works at all?
Right.
Maybe it's of limited interest, but ..
This updates the patch to build and test with meson.
Which first requires patching some meson.builds.
I guess that's needed for some current BF members, too.
Unfortunately, ccache+PCH causes gcc to crash :(
Resending with the 'only-if' line commented (doh).
And some fixes to 001 as Andres pointed out on another thread.
Is there still some work pending for this thread, given that Andres has
committed part of it? If so, can you post an updated patch?
Regards,
Vignesh
On Tue, Jan 03, 2023 at 05:54:56PM +0530, vignesh C wrote:
On Thu, Oct 20, 2022 at 10:40:40PM -0500, Justin Pryzby wrote:
On Thu, Aug 04, 2022 at 04:16:06PM +1200, Thomas Munro wrote:
On Thu, Aug 4, 2022 at 3:38 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
[train wreck]
Oh my, so I'm getting the impression we might actually be totally
unstable on Cygwin. Which surprises me because ... wait a minute ...
lorikeet isn't even running most of the tests. So... we don't really
know the degree to which any of this works at all?
Right.
Maybe it's of limited interest, but ..
This updates the patch to build and test with meson.
Which first requires patching some meson.builds.
I guess that's needed for some current BF members, too.
Unfortunately, ccache+PCH causes gcc to crash :(
Resending with the 'only-if' line commented (doh).
And some fixes to 001 as Andres pointed out on another thread.
Is there still some work pending for this thread, given that Andres has
committed part of it? If so, can you post an updated patch?
Thomas, what's your opinion ?
On Wed, Jan 4, 2023 at 3:25 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Tue, Jan 03, 2023 at 05:54:56PM +0530, vignesh C wrote:
Is there still some work pending for this thread, given that Andres has
committed part of it? If so, can you post an updated patch?
Thomas, what's your opinion?
One observation is that your CI patch *nearly* succeeds, even if
hacked to turn on the full TAP tests, if applied on top of the
WaitEventSet-for-postmaster patch:
https://cirrus-ci.com/task/4533371804581888
No cigar though, it still failed a few times for me in the
subscription tests with EAGAIN, when accessing semaphores:
semctl(24576010, 14, SETVAL, 0) failed: Resource temporarily unavailable
That isn't an error I expect from semctl(), but from some cursory
research it seems like that system call is actually talking to the
cygserver process over a pipe (?) to implement SysV semaphores. Maybe
it couldn't keep up, but doesn't like to block? Perhaps we could try
to tune that server, but let's try the POSIX kind of semaphores
instead. From a quick peek at the source, they are implemented some
other way on direct native NT voodoo, no cygserver involved.
https://cirrus-ci.com/task/5142810819559424 [still running at time of writing]
Gotta run, but I'll check again in the morning to see if that does better...
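For anyone unfamiliar with the call in that error message, here is a minimal sketch of the SysV sequence it comes from: create a semaphore set, reset one member with SETVAL, read it back, and remove the set. The helper name sysv_setval_roundtrip and its parameters are made up for illustration; this is not the PostgreSQL semaphore code. The POSIX alternative mentioned above uses sem_init(), sem_post() and sem_wait() instead and, on Cygwin, apparently does not go through cygserver.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * The caller has to declare this union for semctl() on Linux/glibc.
 * Assumption: some other platforms already declare it in <sys/sem.h>,
 * in which case this definition has to be dropped.
 */
union semun
{
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/*
 * Hypothetical helper: create a private set of nsems semaphores, reset
 * member semnum to 0 with SETVAL (the call that failed in the log above),
 * read it back and remove the set.
 * Returns the value read back (0 on success), or -1 on error.
 */
int
sysv_setval_roundtrip(int nsems, int semnum)
{
    union semun arg;
    int semid;
    int value;

    semid = semget(IPC_PRIVATE, nsems, IPC_CREAT | 0600);
    if (semid < 0)
    {
        fprintf(stderr, "semget failed: %s\n", strerror(errno));
        return -1;
    }

    arg.val = 0;
    if (semctl(semid, semnum, SETVAL, arg) < 0)
    {
        /* This is the step that reported EAGAIN on Cygwin. */
        fprintf(stderr, "semctl(%d, %d, SETVAL, 0) failed: %s\n",
                semid, semnum, strerror(errno));
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    value = semctl(semid, semnum, GETVAL);
    semctl(semid, 0, IPC_RMID);     /* always clean up the set */
    return value;
}
```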
On Fri, Jan 6, 2023 at 1:22 AM Thomas Munro <thomas.munro@gmail.com> wrote:
https://cirrus-ci.com/task/5142810819559424 [still running at time of writing]
Gotta run, but I'll check again in the morning to see if that does better...
Yes! Two successful runs with all TAP tests so far. So it looks like
we can probably stop lorikeet's spurious failures, by happy
coincidence due to other work, and we could seriously consider
committing this optional CI test for it, much like we have the
optional MSYS build. Any interest in producing a tidied up version of
the patch, Justin? Or I can, but I'll go and work on other things
first.
I pushed a change to switch the semaphore implementation. I haven't
personally seen that failure mode on lorikeet, but I would guess
that's because (1) it's only running a tiny subset of the tests, (2)
it crashes for the other reason with higher likelihood, and/or (3)
it's not using much concurrency yet because the build farm doesn't use
meson yet.
On Wed, Nov 9, 2022 at 2:04 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
+data_sync_retry = on
Sharing with the list some clues that Justin and I figured out about
what that part is doing. Without it, you get failures like:
PANIC: could not open file "pg_logical/snapshots/0-14FE6B0.snap":
No such file or directory
That's been seen before:
/messages/by-id/17827.1549866683@sss.pgh.pa.us
That thread concluded that the operating system must have a non-atomic
rename(), ie a kernel bug. I don't know why Cygwin would display that
behaviour and our native Windows build not; maybe timing, or maybe our
own open() and rename() wrappers for Windows do something important
differently than Cygwin's open() and rename().
On reflection, that seems a bit too flimsy to have in-tree without
more investigation, which I won't have time for myself, so I'm going
to withdraw this entry.
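To illustrate the sequence in question, here is a minimal sketch of the write, rename and reopen pattern, as I read the snapbuild.c hunk in the patch. The helper write_then_rename and its arguments are made up for illustration and this is not the PostgreSQL code. With an atomic rename(), the final open() can never observe the name as missing.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write data to tmppath, flush it, rename it over
 * path, then reopen path and flush it under its final name.
 * Returns 0 on success, -1 on error.
 */
int
write_then_rename(const char *tmppath, const char *path, const char *data)
{
    size_t len = strlen(data);
    int fd;

    fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        unlink(tmppath);
        return -1;
    }
    close(fd);

    /* POSIX: path names either the old file or the new one at all times. */
    if (rename(tmppath, path) != 0)
    {
        unlink(tmppath);
        return -1;
    }

    /*
     * Reopen under the final name. If rename() were not atomic, an open()
     * racing with another rename onto the same path could fail with ENOENT,
     * which matches the "No such file or directory" in the PANIC above.
     */
    fd = open(path, O_RDONLY);
    if (fd < 0)
    {
        perror("open after rename");
        return -1;
    }
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}
```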
On 2023-01-05 Th 16:39, Thomas Munro wrote:
On Fri, Jan 6, 2023 at 1:22 AM Thomas Munro <thomas.munro@gmail.com> wrote:
https://cirrus-ci.com/task/5142810819559424 [still running at time of writing]
Gotta run, but I'll check again in the morning to see if that does better...
Yes! Two successful runs with all TAP tests so far. So it looks like
we can probably stop lorikeet's spurious failures, by happy
coincidence due to other work, and we could seriously consider
committing this optional CI test for it, much like we have the
optional MSYS build. Any interest in producing a tidied up version of
the patch, Justin? Or I can, but I'll go and work on other things
first.
I pushed a change to switch the semaphore implementation. I haven't
personally seen that failure mode on lorikeet, but I would guess
that's because (1) it's only running a tiny subset of the tests, (2)
it crashes for the other reason with higher likelihood, and/or (3)
it's not using much concurrency yet because the build farm doesn't use
meson yet.
OK, should I now try re-enabling TAP tests on lorikeet?
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Sat, Jan 7, 2023 at 3:40 AM Andrew Dunstan <andrew@dunslane.net> wrote:
OK, should I now try re-enabling TAP tests on lorikeet?
Not before https://commitfest.postgresql.org/41/4032/ is committed.
After that, it might be worth a try? I have no idea if the PANIC
problem I mentioned last night would apply to lorikeet's kernel too.
To summarise the kinds of failure we have analysed in this thread:
1. SysV semaphores are buggy; fixed, I hope, by recent commit (= just
don't use them).
2. The regular crashes we already knew about from other threads due
to signal masking being buggy seem to be fixed, coincidentally, by CF
#4032, not yet committed (= don't rely on sa_mask for correctness).
3. PANIC apparently caused by non-atomic rename(), based on analysis
of similar failures seen on other old buggy OSes back in 2018.
If lorikeet has problem #3 (which it may not; we know from CF #3951
that kernel versions differ in related respects and Server 2019 as
used on CI seems to have the most conservative/old Windows behaviour)
then it might fail in the TAP tests just like the proposed
CI-for-Cygwin patch, unless you also do data_sync_retry=on, which
seems like a pretty ugly workaround to me. I don't know what else
might be broken by non-atomic rename(), and I'd rather not find out :-D
I finished up here by trying to tidy up some weird-looking
nonsense in our code while working on general portability cleanup,
since I needed a way to check if __CYGWIN__ stuff still works, but
what we found out is that it's more broken than anyone realised, and
now I have to pull the emergency rabbit hole ejection cord because I
have less than zero time for or interest in debugging Cygwin.
On Sat, Jan 07, 2023 at 12:39:11AM +1300, Thomas Munro wrote:
On Wed, Nov 9, 2022 at 2:04 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
+data_sync_retry = on
Sharing with the list some clues that Justin and I figured out about
what that part is doing. Without it, you get failures like:
PANIC: could not open file "pg_logical/snapshots/0-14FE6B0.snap":
No such file or directory
That's been seen before:
/messages/by-id/17827.1549866683@sss.pgh.pa.us
That thread concluded that the operating system must have a non-atomic
rename(), ie a kernel bug. I don't know why Cygwin would display that
behaviour and our native Windows build not; maybe timing, or maybe our
own open() and rename() wrappers for Windows do something important
differently than Cygwin's open() and rename().
On reflection, that seems a bit too flimsy to have in-tree without
more investigation, which I won't have time for myself, so I'm going
to withdraw this entry.
Not so fast :)
Here's my latest copy of the patch. Most recently, rather than setting
data_sync_retry=on, I changed SnapBuildSerialize() to call
fsync_fname_ext() rather than fsync_fname(), which uses PANIC (except
when data_sync_retry is enabled). That seems to work, showing that the
problem is limited to SnapBuildSerialize(), and not a problem with all
fsync()...
https://cirrus-ci.com/task/5990885733695488
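To make the difference between the two calls concrete, here is a small standalone model of how I understand the error level to be chosen in fd.c: fsync_fname() asks for ERROR but has that promoted to PANIC unless data_sync_retry is on, whereas a direct fsync_fname_ext() call reports at exactly the level it is given. The enum and the function model_data_sync_elevel() are made up for illustration; this is a model, not the PostgreSQL source.

```c
#include <stdbool.h>

/* Stand-ins for PostgreSQL's elog levels; the numeric values are arbitrary. */
enum model_elevel
{
    MODEL_ERROR = 1,
    MODEL_PANIC = 2
};

/*
 * Model of the promotion applied on the fsync_fname() path: keep the
 * requested level only if data_sync_retry is on, otherwise escalate to
 * PANIC.
 */
int
model_data_sync_elevel(bool data_sync_retry, int elevel)
{
    return data_sync_retry ? elevel : MODEL_PANIC;
}
```

Under this model, passing ERROR straight to fsync_fname_ext() behaves like data_sync_retry=on for that one call site only, leaving every other fsync at its default severity.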
Thomas raised a good question, which was how the tests were passing when
SnapBuildSerialize() was raising an error, which is what it would've
been doing when I used data_sync_retry=on.
So .. why is wal_sync_method being used to control fsync for things
other than WAL?
See 6dc7760ac (c. 2005) which added wal_fsync_writethrough, at which
point (since 9b178555f, c. 2004) wal_sync_method was already being used
for SLOG.
Now, it's also being used for logical decoding (since b89e1510 and
858ec1185, c. 2014) in rewriteheap.c/snapbuild.c. And pidfiles (since
ee0e525bf, 2010). And the control file (8b938d36f7, 2019). Note that
data_sync_retry wasn't added until 9ccdd7f66 (c. 2018)
It looks like logical decoding may be the "most wrong" place that
wal_sync_method is being used, so maybe my change is reasonable to
consider, and not just a workaround.
I'm going to re-open the CF entry to let this run for a while to see how
it works out.
--
Justin
Attachments:
0001-WIP-CI-support-for-Cygwin.patch (text/x-diff; charset=us-ascii)
From b07add11b8bf39f5bfbae4f9072470f31da97360 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 25 Jul 2022 23:05:10 +1200
Subject: [PATCH] WIP CI support for Cygwin.
ci-os-only: cygwin
See also: d8e78714-dc77-4a64-783f-e863ba4d951f@2ndquadrant.com
https://cirrus-ci.com/task/5145086722834432
XXX This should use a canned Docker image with all the right packages
installed? But if the larger image is slower to start, then maybe not...
---
.cirrus.yml | 83 +++++++++++++++++++++
configure | 2 +-
configure.ac | 2 +-
src/backend/replication/logical/snapbuild.c | 4 +-
src/test/perl/PostgreSQL/Test/Cluster.pm | 4 +-
src/test/perl/PostgreSQL/Test/Utils.pm | 12 ++-
src/test/recovery/t/020_archive_status.pl | 2 +-
src/tools/ci/cores_backtrace.sh | 19 ++++-
8 files changed, 118 insertions(+), 10 deletions(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index d13726ed893..4507f734e94 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -737,6 +737,89 @@ task:
type: text/plain
+task:
+ name: Windows - Cygwin
+
+ # due to resource constraints we don't run this task by default for now
+ trigger_type: manual
+ # worth using only_if despite being manual, otherwise this task will show up
+ # when e.g. ci-os-only: linux is used.
+ only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+ # otherwise it'll be sorted before other tasks
+ depends_on: SanityCheck
+
+ #XXX only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+ #timeout_in: 120m
+
+ env:
+ CPUS: 4
+ BUILD_JOBS: $CPUS
+ TEST_JOBS: $CPUS
+ CCACHE_DIR: /tmp/ccache
+ CCACHE_LOGFILE: ccache.log
+ # --disable-dynamicbase
+ # --with-gssapi
+ CONFIGURE_CACHE: /tmp/ccache/configure.cache
+ PG_TEST_USE_UNIX_SOCKETS: 1
+ EXTRA_REGRESS_OPTS: --max-connections=1
+ PG_TEST_EXTRA: ldap ssl # disable kerberos
+ CC: ccache gcc
+ CFLAGS: -Og -ggdb
+ BASH: C:\tools\cygwin\bin\bash.exe --login
+
+ #windows_container:
+ #image: cirrusci/windowsservercore:2019-2022.06.23
+ #os_version: 2019
+ compute_engine_instance:
+ image_project: $IMAGE_PROJECT
+ image: family/pg-ci-windows-ci-vs-2019
+ platform: windows
+ cpu: $CPUS
+ memory: 4G
+
+ setup_additional_packages_script: |
+ choco install -y --no-progress cygwin
+ C:\tools\cygwin\cygwinsetup.exe -q -P cygrunsrv,make,gcc-core,ccache,binutils,libtool,pkg-config,flex,bison,zlib-devel,libxml2-devel,libxslt-devel,libssl-devel,openldap-devel,libreadline-devel,perl,meson,ninja,perl-IPC-Run
+ REM libkrb5-devel,krb5-server
+ %BASH% -c "cygserver-config -y"
+ %BASH% -c "echo 'kern.ipc.semmni 1024' >> /etc/cygserver.conf"
+ %BASH% -c "echo 'kern.ipc.semmns 1024' >> /etc/cygserver.conf"
+ %BASH% -c "net start cygserver"
+
+ sysinfo_script: |
+ chcp
+ systeminfo
+ powershell -Command get-psdrive -psprovider filesystem
+ set
+ %BASH% -c "id; uname -a; ulimit -a -H; ulimit -a -S; export"
+
+ ccache_cache:
+ folder: C:\tools\cygwin\tmp\ccache
+ fingerprint_key: ccache/cygwin
+ reupload_on_changes: true
+
+ configure_script: |
+ %BASH% -c "cd '%cd%' && meson setup --buildtype=debug -Dcassert=true -Dssl=openssl -Duuid=e2fs -DPG_TEST_EXTRA='$PG_TEST_EXTRA' build"
+
+ build_script: |
+ %BASH% -c "cd '%cd%' && ninja -C build -j${BUILD_JOBS}"
+ %BASH% -c "ccache --show-stats"
+
+ always:
+ upload_caches: ccache
+
+ #%BASH% -c "cd '%cd%' && echo 'data_sync_retry = on' >> src/tools/ci/pg_ci_base.conf"
+ #%BASH% -c "cd '%cd%' && echo 'wal_sync_method = fdatasync' >> src/tools/ci/pg_ci_base.conf"
+ # --repeat 9
+ test_world_script: |
+ %BASH% -c "cd '%cd%' && meson test $MTEST_ARGS --num-processes ${TEST_JOBS}"
+
+ on_failure:
+ <<: *on_failure_meson
+ cores_script: |
+ %BASH% -c "cd '%cd%' && src/tools/ci/cores_backtrace.sh cygwin ."
+
+
task:
name: CompilerWarnings
# task that did not run, count as a success, so we need to recheck Linux'
diff --git a/configure b/configure
index 5d07fd0bb91..72d56f00534 100755
--- a/configure
+++ b/configure
@@ -16477,7 +16477,7 @@ fi
# mingw has adopted a GNU-centric interpretation of optind/optreset,
# so always use our version on Windows.
-if test "$PORTNAME" = "win32"; then
+if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then
case " $LIBOBJS " in
*" getopt.$ac_objext "* ) ;;
*) LIBOBJS="$LIBOBJS getopt.$ac_objext"
diff --git a/configure.ac b/configure.ac
index e9b74ced6ca..e0a9c332060 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1899,7 +1899,7 @@ fi
# mingw has adopted a GNU-centric interpretation of optind/optreset,
# so always use our version on Windows.
-if test "$PORTNAME" = "win32"; then
+if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then
AC_LIBOBJ(getopt)
AC_LIBOBJ(getopt_long)
fi
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 829c5681120..f0929600fcc 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1812,7 +1812,9 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
}
/* make sure we persist */
- fsync_fname(path, false);
+ if (fsync_fname_ext(path, false, false, ERROR))
+ elog(ERROR, "failed to fsync");
+
fsync_fname("pg_logical/snapshots", true);
/*
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 04921ca3a3d..31ac1d020a4 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1097,7 +1097,7 @@ sub enable_restoring
# the path contains spaces.
$path =~ s{\\}{\\\\}g if ($PostgreSQL::Test::Utils::windows_os);
my $copy_command =
- $PostgreSQL::Test::Utils::windows_os
+ $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin
? qq{copy "$path\\\\%f" "%p"}
: qq{cp "$path/%f" "%p"};
@@ -1167,7 +1167,7 @@ sub enable_archiving
# the path contains spaces.
$path =~ s{\\}{\\\\}g if ($PostgreSQL::Test::Utils::windows_os);
my $copy_command =
- $PostgreSQL::Test::Utils::windows_os
+ $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin
? qq{copy "%p" "$path\\\\%f"}
: qq{cp "%p" "$path/%f"};
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 878e12b15ed..0c3f4dc35a0 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -88,10 +88,11 @@ our @EXPORT = qw(
$windows_os
$is_msys2
+ $is_cygwin
$use_unix_sockets
);
-our ($windows_os, $is_msys2, $use_unix_sockets, $timeout_default,
+our ($windows_os, $is_msys2, $is_cygwin, $use_unix_sockets, $timeout_default,
$tmp_check, $log_path, $test_logfile);
BEGIN
@@ -140,13 +141,18 @@ BEGIN
$ENV{PGAPPNAME} = basename($0);
# Must be set early
- $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys';
+ $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys'
+ || $Config{osname} eq 'cygwin';
+
# Check if this environment is MSYS2.
$is_msys2 =
$windows_os
&& -x '/usr/bin/uname'
&& `uname -or` =~ /^[2-9].*Msys/;
+ # Check if this environment is Cygwin
+ $is_cygwin = $Config{osname} eq 'cygwin';
+
if ($windows_os)
{
require Win32API::File;
@@ -707,7 +713,7 @@ sub dir_symlink
{
my $oldname = shift;
my $newname = shift;
- if ($windows_os)
+ if ($windows_os && !$is_cygwin)
{
$oldname =~ s,/,\\,g;
$newname =~ s,/,\\,g;
diff --git a/src/test/recovery/t/020_archive_status.pl b/src/test/recovery/t/020_archive_status.pl
index 13ada994dbb..0462d1d90c2 100644
--- a/src/test/recovery/t/020_archive_status.pl
+++ b/src/test/recovery/t/020_archive_status.pl
@@ -26,7 +26,7 @@ my $primary_data = $primary->data_dir;
# a portable solution, use an archive command based on a command known to
# work but will fail: copy with an incorrect original path.
my $incorrect_command =
- $PostgreSQL::Test::Utils::windows_os
+ $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin
? qq{copy "%p_does_not_exist" "%f_does_not_exist"}
: qq{cp "%p_does_not_exist" "%f_does_not_exist"};
$primary->safe_psql(
diff --git a/src/tools/ci/cores_backtrace.sh b/src/tools/ci/cores_backtrace.sh
index 28d3cecfc67..c49f9b07752 100755
--- a/src/tools/ci/cores_backtrace.sh
+++ b/src/tools/ci/cores_backtrace.sh
@@ -1,5 +1,7 @@
#! /bin/sh
+#set -e
+
if [ $# -ne 2 ]; then
echo "cores_backtrace.sh <os> <directory>"
exit 1
@@ -8,9 +10,22 @@ fi
os=$1
directory=$2
+findargs=''
case $os in
freebsd|linux|macos)
- ;;
+ ;;
+
+ cygwin)
+ for stack in $(find "$directory" -type f -name "*.stackdump") ; do
+ binary=`basename "$stack" .stackdump`
+ echo;echo;
+ echo "dumping ${stack} for ${binary}"
+ awk '/^0/{print $2}' $stack |addr2line -f -i -e ./build/tmp_install/usr/local/pgsql/bin/postgres.exe
+ #awk '/^0/{print $2}' $stack |addr2line -f -i -e "./build/src/backend/$binary.exe"
+ done
+ exit 0
+ ;;
+
*)
echo "unsupported operating system ${os}"
exit 1
@@ -48,3 +63,5 @@ for corefile in $(find "$directory" -type f) ; do
gdb --batch --quiet -ex "thread apply all bt full" -ex "quit" "$binary" "$corefile" 2>/dev/null
fi
done
+
+exit 0
--
2.25.1
On Wed, Jan 11, 2023 at 10:39:49PM -0600, Justin Pryzby wrote:
Here's my latest copy of the patch.
+ # due to resource constraints we don't run this task by default for now
+ trigger_type: manual
Now, with trigger_type commented, so Thomas doesn't have to click
"trigger" for me.
Attachments:
0001-WIP-CI-support-for-Cygwin.patch (text/x-diff; charset=us-ascii)
From 16d2553e1e1d95aea1c215d7e909c7ea57fe160e Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 25 Jul 2022 23:05:10 +1200
Subject: [PATCH] WIP CI support for Cygwin.
See also: d8e78714-dc77-4a64-783f-e863ba4d951f@2ndquadrant.com
https://cirrus-ci.com/task/5145086722834432
XXX This should use a canned Docker image with all the right packages
installed? But if the larger image is slower to start, then maybe not...
XXX trigger_type: manual
ci-os-only: cygwin
---
.cirrus.yml | 83 +++++++++++++++++++++
configure | 2 +-
configure.ac | 2 +-
src/backend/replication/logical/snapbuild.c | 4 +-
src/test/perl/PostgreSQL/Test/Cluster.pm | 4 +-
src/test/perl/PostgreSQL/Test/Utils.pm | 12 ++-
src/test/recovery/t/020_archive_status.pl | 2 +-
src/tools/ci/cores_backtrace.sh | 19 ++++-
8 files changed, 118 insertions(+), 10 deletions(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index d13726ed893..f704da2ff14 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -737,6 +737,89 @@ task:
type: text/plain
+task:
+ name: Windows - Cygwin
+
+ # due to resource constraints we don't run this task by default for now
+ #XXX trigger_type: manual
+ # worth using only_if despite being manual, otherwise this task will show up
+ # when e.g. ci-os-only: linux is used.
+ only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+ # otherwise it'll be sorted before other tasks
+ depends_on: SanityCheck
+
+ #XXX only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+ #timeout_in: 120m
+
+ env:
+ CPUS: 4
+ BUILD_JOBS: $CPUS
+ TEST_JOBS: $CPUS
+ CCACHE_DIR: /tmp/ccache
+ CCACHE_LOGFILE: ccache.log
+ # --disable-dynamicbase
+ # --with-gssapi
+ CONFIGURE_CACHE: /tmp/ccache/configure.cache
+ PG_TEST_USE_UNIX_SOCKETS: 1
+ EXTRA_REGRESS_OPTS: --max-connections=1
+ PG_TEST_EXTRA: ldap ssl # disable kerberos
+ CC: ccache gcc
+ CFLAGS: -Og -ggdb
+ BASH: C:\tools\cygwin\bin\bash.exe --login
+
+ #windows_container:
+ #image: cirrusci/windowsservercore:2019-2022.06.23
+ #os_version: 2019
+ compute_engine_instance:
+ image_project: $IMAGE_PROJECT
+ image: family/pg-ci-windows-ci-vs-2019
+ platform: windows
+ cpu: $CPUS
+ memory: 4G
+
+ setup_additional_packages_script: |
+ choco install -y --no-progress cygwin
+ C:\tools\cygwin\cygwinsetup.exe -q -P cygrunsrv,make,gcc-core,ccache,binutils,libtool,pkg-config,flex,bison,zlib-devel,libxml2-devel,libxslt-devel,libssl-devel,openldap-devel,libreadline-devel,perl,meson,ninja,perl-IPC-Run
+ REM libkrb5-devel,krb5-server
+ %BASH% -c "cygserver-config -y"
+ %BASH% -c "echo 'kern.ipc.semmni 1024' >> /etc/cygserver.conf"
+ %BASH% -c "echo 'kern.ipc.semmns 1024' >> /etc/cygserver.conf"
+ %BASH% -c "net start cygserver"
+
+ sysinfo_script: |
+ chcp
+ systeminfo
+ powershell -Command get-psdrive -psprovider filesystem
+ set
+ %BASH% -c "id; uname -a; ulimit -a -H; ulimit -a -S; export"
+
+ ccache_cache:
+ folder: C:\tools\cygwin\tmp\ccache
+ fingerprint_key: ccache/cygwin
+ reupload_on_changes: true
+
+ configure_script: |
+ %BASH% -c "cd '%cd%' && meson setup --buildtype=debug -Dcassert=true -Dssl=openssl -Duuid=e2fs -DPG_TEST_EXTRA='$PG_TEST_EXTRA' build"
+
+ build_script: |
+ %BASH% -c "cd '%cd%' && ninja -C build -j${BUILD_JOBS}"
+ %BASH% -c "ccache --show-stats"
+
+ always:
+ upload_caches: ccache
+
+ #%BASH% -c "cd '%cd%' && echo 'data_sync_retry = on' >> src/tools/ci/pg_ci_base.conf"
+ #%BASH% -c "cd '%cd%' && echo 'wal_sync_method = fdatasync' >> src/tools/ci/pg_ci_base.conf"
+ # --repeat 9
+ test_world_script: |
+ %BASH% -c "cd '%cd%' && meson test $MTEST_ARGS --num-processes ${TEST_JOBS}"
+
+ on_failure:
+ <<: *on_failure_meson
+ cores_script: |
+ %BASH% -c "cd '%cd%' && src/tools/ci/cores_backtrace.sh cygwin ."
+
+
task:
name: CompilerWarnings
# task that did not run, count as a success, so we need to recheck Linux'
diff --git a/configure b/configure
index 5d07fd0bb91..72d56f00534 100755
--- a/configure
+++ b/configure
@@ -16477,7 +16477,7 @@ fi
# mingw has adopted a GNU-centric interpretation of optind/optreset,
# so always use our version on Windows.
-if test "$PORTNAME" = "win32"; then
+if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then
case " $LIBOBJS " in
*" getopt.$ac_objext "* ) ;;
*) LIBOBJS="$LIBOBJS getopt.$ac_objext"
diff --git a/configure.ac b/configure.ac
index e9b74ced6ca..e0a9c332060 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1899,7 +1899,7 @@ fi
# mingw has adopted a GNU-centric interpretation of optind/optreset,
# so always use our version on Windows.
-if test "$PORTNAME" = "win32"; then
+if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then
AC_LIBOBJ(getopt)
AC_LIBOBJ(getopt_long)
fi
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 829c5681120..f0929600fcc 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1812,7 +1812,9 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
}
/* make sure we persist */
- fsync_fname(path, false);
+ if (fsync_fname_ext(path, false, false, ERROR))
+ elog(ERROR, "failed to fsync");
+
fsync_fname("pg_logical/snapshots", true);
/*
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 04921ca3a3d..31ac1d020a4 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1097,7 +1097,7 @@ sub enable_restoring
# the path contains spaces.
$path =~ s{\\}{\\\\}g if ($PostgreSQL::Test::Utils::windows_os);
my $copy_command =
- $PostgreSQL::Test::Utils::windows_os
+ $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin
? qq{copy "$path\\\\%f" "%p"}
: qq{cp "$path/%f" "%p"};
@@ -1167,7 +1167,7 @@ sub enable_archiving
# the path contains spaces.
$path =~ s{\\}{\\\\}g if ($PostgreSQL::Test::Utils::windows_os);
my $copy_command =
- $PostgreSQL::Test::Utils::windows_os
+ $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin
? qq{copy "%p" "$path\\\\%f"}
: qq{cp "%p" "$path/%f"};
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 878e12b15ed..0c3f4dc35a0 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -88,10 +88,11 @@ our @EXPORT = qw(
$windows_os
$is_msys2
+ $is_cygwin
$use_unix_sockets
);
-our ($windows_os, $is_msys2, $use_unix_sockets, $timeout_default,
+our ($windows_os, $is_msys2, $is_cygwin, $use_unix_sockets, $timeout_default,
$tmp_check, $log_path, $test_logfile);
BEGIN
@@ -140,13 +141,18 @@ BEGIN
$ENV{PGAPPNAME} = basename($0);
# Must be set early
- $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys';
+ $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys'
+ || $Config{osname} eq 'cygwin';
+
# Check if this environment is MSYS2.
$is_msys2 =
$windows_os
&& -x '/usr/bin/uname'
&& `uname -or` =~ /^[2-9].*Msys/;
+ # Check if this environment is Cygwin
+ $is_cygwin = $Config{osname} eq 'cygwin';
+
if ($windows_os)
{
require Win32API::File;
@@ -707,7 +713,7 @@ sub dir_symlink
{
my $oldname = shift;
my $newname = shift;
- if ($windows_os)
+ if ($windows_os && !$is_cygwin)
{
$oldname =~ s,/,\\,g;
$newname =~ s,/,\\,g;
diff --git a/src/test/recovery/t/020_archive_status.pl b/src/test/recovery/t/020_archive_status.pl
index 13ada994dbb..0462d1d90c2 100644
--- a/src/test/recovery/t/020_archive_status.pl
+++ b/src/test/recovery/t/020_archive_status.pl
@@ -26,7 +26,7 @@ my $primary_data = $primary->data_dir;
# a portable solution, use an archive command based on a command known to
# work but will fail: copy with an incorrect original path.
my $incorrect_command =
- $PostgreSQL::Test::Utils::windows_os
+ $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin
? qq{copy "%p_does_not_exist" "%f_does_not_exist"}
: qq{cp "%p_does_not_exist" "%f_does_not_exist"};
$primary->safe_psql(
diff --git a/src/tools/ci/cores_backtrace.sh b/src/tools/ci/cores_backtrace.sh
index 28d3cecfc67..c49f9b07752 100755
--- a/src/tools/ci/cores_backtrace.sh
+++ b/src/tools/ci/cores_backtrace.sh
@@ -1,5 +1,7 @@
#! /bin/sh
+#set -e
+
if [ $# -ne 2 ]; then
echo "cores_backtrace.sh <os> <directory>"
exit 1
@@ -8,9 +10,22 @@ fi
os=$1
directory=$2
+findargs=''
case $os in
freebsd|linux|macos)
- ;;
+ ;;
+
+ cygwin)
+ for stack in $(find "$directory" -type f -name "*.stackdump") ; do
+ binary=`basename "$stack" .stackdump`
+ echo;echo;
+ echo "dumping ${stack} for ${binary}"
+ awk '/^0/{print $2}' $stack |addr2line -f -i -e ./build/tmp_install/usr/local/pgsql/bin/postgres.exe
+ #awk '/^0/{print $2}' $stack |addr2line -f -i -e "./build/src/backend/$binary.exe"
+ done
+ exit 0
+ ;;
+
*)
echo "unsupported operating system ${os}"
exit 1
@@ -48,3 +63,5 @@ for corefile in $(find "$directory" -type f) ; do
gdb --batch --quiet -ex "thread apply all bt full" -ex "quit" "$binary" "$corefile" 2>/dev/null
fi
done
+
+exit 0
--
2.25.1
Hi,
On 2023-01-11 22:39:49 -0600, Justin Pryzby wrote:
Thomas raised a good question, which was how the tests were passing when
SnapBuildSerialize() was raising an error, which is what it would've
been doing when I used data_sync_retry=no.
Presumably some test not checking for failures in a part of the test.
So .. why is wal_sync_method being used to control fsync for things
other than WAL?
Historical raisins, I think. The problem is that macOS lies about fsync, and
one needs special magic to make it behave like a real fsync. Somebody thought
that instead of inventing a separate GUC to control whether the "real" fsync
is used for other subsystems, it'd be better to reuse wal_sync_method.
Note that this isn't the function that is actually used for WAL (that's
issue_xlog_fsync()), and that pg_fsync() only uses the GUC to know whether to
use pg_fsync_writethrough() or pg_fsync_no_writethrough(fd).
See 6dc7760ac (c. 2005) which added wal_fsync_writethrough, at which
point (since 9b178555f, c. 2004) wal_sync_method was already being used
for SLOG.
Now, it's also being used for logical decoding (since b89e1510 and
858ec1185, c. 2014) in rewriteheap.c/snapbuild.c. And pidfiles (since
ee0e525bf, 2010). And the control file (8b938d36f7, 2019). Note that
data_sync_retry wasn't added until 9ccdd7f66 (c. 2018).
It looks like logical decoding may be the "most wrong" place that
wal_sync_method is being used, so maybe my change is reasonable to
consider, and not just a workaround.
I don't follow. What does using fsync_fname() vs fsync_fname_ext() have to do
with pg_fsync() using wal_sync_method? fsync_fname() is just a wrapper around
fsync_fname_ext(). Both end up in pg_fsync().
Are you actually proposing that we don't PANIC after an fsync for the category
of files that you list here, even with data_sync_retry set?
Greetings,
Andres Freund
On Thu, Jan 12, 2023 at 06:43:54PM -0800, Andres Freund wrote:
It looks like logical decoding may be the "most wrong" place that
wal_sync_method is being used, so maybe my change is reasonable to
consider, and not just a workaround.
I don't follow. What does using fsync_fname() vs fsync_fname_ext() have to do
with pg_fsync() using wal_sync_method? fsync_fname() is just a wrapper around
fsync_fname_ext(). Both end up in pg_fsync().
My patch used fsync_fname_ext() which would cause an ERROR rather than a
PANIC when failing to fsync the logical decoding pathname.
Are you actually proposing that we don't PANIC after an fsync for the category
of files that you list here, even with data_sync_retry set?
Yes, but I'm referring only to my change to SnapBuildSerialize().
The rest of the verbiage was me trying to figure out the
history/evolution of pg_fsync usage.
--
Justin
Hi,
On 2023-01-12 22:17:55 -0600, Justin Pryzby wrote:
On Thu, Jan 12, 2023 at 06:43:54PM -0800, Andres Freund wrote:
Are you actually proposing that we don't PANIC after an fsync for the category
of files that you list here, even with data_sync_retry set?
Yes, but I'm referring only to my change to SnapBuildSerialize().
I can't see how that change could be correct?
Greetings,
Andres Freund
On Thu, Jan 12, 2023 at 10:17:55PM -0600, Justin Pryzby wrote:
On Thu, Jan 12, 2023 at 06:43:54PM -0800, Andres Freund wrote:
It looks like logical decoding may be the "most wrong" place that
wal_sync_method is being used, so maybe my change is reasonable to
consider, and not just a workaround.
I don't follow. What does using fsync_fname() vs fsync_fname_ext() have to do
with pg_fsync() using wal_sync_method? fsync_fname() is just a wrapper around
fsync_fname_ext(). Both end up in pg_fsync().
My patch used fsync_fname_ext() which would cause an ERROR rather than a
PANIC when failing to fsync the logical decoding pathname.
Are you actually proposing that we don't PANIC after an fsync for the category
of files that you list here, even with data_sync_retry set?
Yes, but I'm referring only to my change to SnapBuildSerialize().
The rest of the verbiage was me trying to figure out the
history/evolution of pg_fsync usage.
Also note the existing comment (originating from Thomas' "fsync-gate"
commit, which introduced data_sync_retry):
+ * It's safe to just ERROR on fsync() here because we'll retry the whole
+ * operation including the writes.
Also, it seems to work fine if one calls pg_fsync() again, rather than
calling fsync_fname(), which re-opens the file.
It also seems to work fine if one omits the initial call to
fsync_fname("pg_logical/snapshots", true);
Since SnapBuildSerialize() isn't atomic (the system could crash at any
point), I'm not seeing why these wouldn't be adequately safe. But also
hoping Thomas will comment on that.
--
Justin
Hi,
On 2023-01-23 17:28:14 -0600, Justin Pryzby wrote:
On Thu, Jan 12, 2023 at 10:17:55PM -0600, Justin Pryzby wrote:
On Thu, Jan 12, 2023 at 06:43:54PM -0800, Andres Freund wrote:
It looks like logical decoding may be the "most wrong" place that
wal_sync_method is being used, so maybe my change is reasonable to
consider, and not just a workaround.
I don't follow. What does using fsync_fname() vs fsync_fname_ext() have to do
with pg_fsync() using wal_sync_method? fsync_fname() is just a wrapper around
fsync_fname_ext(). Both end up in pg_fsync().
My patch used fsync_fname_ext() which would cause an ERROR rather than a
PANIC when failing to fsync the logical decoding pathname.
Are you actually proposing that we don't PANIC after an fsync for the category
of files that you list here, even with data_sync_retry set?
Yes, but I'm referring only to my change to SnapBuildSerialize().
The rest of the verbiage was me trying to figure out the
history/evolution of pg_fsync usage.
Also note the existing comment (originating from Thomas' "fsync-gate"
commit, which introduced data_sync_retry):
+ * It's safe to just ERROR on fsync() here because we'll retry the whole
+ * operation including the writes.
Also, it seems to work fine if one calls pg_fsync() again, rather than
calling fsync_fname(), which re-opens the file.
I don't think that'd achieve the same thing necessarily. But it's notoriously
hard to know what which OS requires in this area.
It also seems to work fine if one omits the initial call to
fsync_fname("pg_logical/snapshots", true);
I don't think it's a good idea to randomly weaken individual fsyncs just because
it somehow, without any theory as to how, fixes tests on cygwin.
Since SnapBuildSerialize() isn't atomic (the system could crash at any
point), I'm not seeing why these wouldn't be adequately safe.
I'm not sure what you mean by that. It's atomic from a crash safety view: It
first writes into a tempfile, fsyncs that + directory, then renames the file
into place, fsyncs new filename + directory again. Tempfiles are removed after
a crash. In case of a crash you can either end up with an "old" or a "new"
file.
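A minimal sketch of that write/fsync/rename sequence, in Python rather than the C of snapbuild.c, assuming POSIX semantics (a directory can be opened read-only and fsync'd, and rename replaces the target in one step). The names fsync_path and durable_replace and the ".tmp" suffix are hypothetical and are not PostgreSQL functions:

```python
import os


def fsync_path(path):
    """Open a file or directory read-only and fsync it (POSIX assumption)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def durable_replace(directory, name, data):
    """Install data as directory/name so a crash leaves the old or the new file."""
    final_path = os.path.join(directory, name)
    tmp_path = final_path + ".tmp"  # hypothetical temp-file naming

    # 1. Write the new contents into a temporary file and flush them to disk.
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # 2. fsync the directory so the temporary file's entry is durable.
    fsync_path(directory)

    # 3. Rename the temporary file into place.
    os.replace(tmp_path, final_path)

    # 4. fsync the new file name and the directory again.
    fsync_path(final_path)
    fsync_path(directory)
    return final_path
```

A crash before step 3 leaves only the previous file plus a leftover temporary file to clean up; a crash after step 3 leaves the new file. The sketch does not model the error policy (ERROR vs PANIC) discussed in this thread.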
Greetings,
Andres Freund
Note that cirrus failed like this:
2023-01-25 23:17:10.417 GMT [29821][walsender] [sub1][3/0:0] ERROR: could not open file "pg_logical/snapshots/0-14F2060.snap": Is a directory
2023-01-25 23:17:10.417 GMT [29821][walsender] [sub1][3/0:0] STATEMENT: START_REPLICATION SLOT "sub1" LOGICAL 0/0 (proto_version '4', origin 'any', publication_names '"pub1"')
2023-01-25 23:17:10.418 GMT [29850][walsender] [pg_16413_sync_16394_7192732880582452157][6/0:0] PANIC: could not open file "pg_logical/snapshots/0-14F2060.snap": No such file or directory
2023-01-25 23:17:10.418 GMT [29850][walsender] [pg_16413_sync_16394_7192732880582452157][6/0:0] STATEMENT: START_REPLICATION SLOT "pg_16413_sync_16394_7192732880582452157" LOGICAL 0/14F2060 (proto_version '4', origin 'any', publication_names '"pub3"')
I don't understand how "Is a directory" happened ..
It looks like maybe the call stack would've been:
SnapBuildSerializationPoint()
xlog_decode() or standby_decode() ?
LogicalDecodingProcessRecord()
XLogSendLogical()
WalSndLoop()
StartLogicalReplication()
--
Justin
On Fri, Jan 13, 2023 at 5:17 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
My patch used fsync_fname_ext() which would cause an ERROR rather than a
PANIC when failing to fsync the logical decoding pathname.
FTR While analysing a lot of CI logs trying to debug something else I
came across a plain Windows/MSVC (not Cygwin) build that panicked like
this:
https://cirrus-ci.com/task/6689224833892352
https://api.cirrus-ci.com/v1/artifact/task/6689224833892352/testrun/build/testrun/subscription/013_partition/log/013_partition_publisher.log
https://api.cirrus-ci.com/v1/artifact/task/6689224833892352/crashlog/crashlog-postgres.exe_0af4_2023-02-05_21-53-20-018.txt
On Wed, Feb 8, 2023 at 8:06 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Fri, Jan 13, 2023 at 5:17 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
My patch used fsync_fname_ext() which would cause an ERROR rather than a
PANIC when failing to fsync the logical decoding pathname.
FTR While analysing a lot of CI logs trying to debug something else I
came across a plain Windows/MSVC (not Cygwin) build that panicked like
this:
https://cirrus-ci.com/task/6689224833892352
https://api.cirrus-ci.com/v1/artifact/task/6689224833892352/testrun/build/testrun/subscription/013_partition/log/013_partition_publisher.log
https://api.cirrus-ci.com/v1/artifact/task/6689224833892352/crashlog/crashlog-postgres.exe_0af4_2023-02-05_21-53-20-018.txt
Here are some more flapping CI failures due to this phenomenon
(nothing to do with Cygwin, this is just regular Windows):
4509011781877760 | Windows - Server 2019, VS 2019 - Meson & ninja
4525770962370560 | Windows - Server 2019, VS 2019 - Meson & ninja
5664518341132288 | Windows - Server 2019, VS 2019 - Meson & ninja
5689846694412288 | Windows - Server 2019, VS 2019 - Meson & ninja
5853025126842368 | Windows - Server 2019, VS 2019 - Meson & ninja
6639943179567104 | Windows - Server 2019, VS 2019 - Meson & ninja
6727728217456640 | Windows - Server 2019, VS 2019 - Meson & ninja
6740158104469504 | Windows - Server 2019, VS 2019 - Meson & ninja
They all say something like 'PANIC: could not open file
"pg_logical/snapshots/0-1597938.snap": No such file or directory',
because they all do rename(some_temporary_file, that_name), then try
to re-open and sync it, but rename() on Windows fails to be atomic so
a concurrent process can see an intermediate ENOENT state. I see a
few 'local' workarounds we could do to fix that, but ... there seems
to be a much better idea staring us in the face in the comments!
I think this would be fixed as a happy by-product of this TODO in
SnapBuildSerialize():
* TODO: Do the fsync() via checkpoints/restartpoints, doing it here has
* some noticeable overhead since it's performed synchronously during
* decoding?
I have done no analysis myself of whether that is sound, but assuming
it is, I think the way to achieve that is to tweak FileTag so that it
can describe the file to be fsync'd, and use the sync.c machinery to
fsync the file in the background. Presumably that would provide a
huge speed up for logical decoding, and people would rejoice.
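To illustrate only the general shape of that idea: a sketch, in Python, of registering files for a later fsync and then processing the requests in one batch, as a checkpoint might. DeferredSyncQueue and its methods are hypothetical. This does not reproduce FileTag or the sync.c API, and it ignores request cancellation and the ERROR/PANIC policy:

```python
import os


class DeferredSyncQueue:
    """Hypothetical stand-in for a queue of pending fsync requests."""

    def __init__(self):
        self._pending = set()

    def register(self, path):
        """Record that path must be fsync'd by the next batch; duplicates collapse."""
        self._pending.add(path)

    def process(self):
        """fsync every pending file and return the paths that were synced."""
        synced = []
        for path in sorted(self._pending):
            fd = os.open(path, os.O_RDONLY)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
            synced.append(path)
        # Reached only if every fsync succeeded; on failure the requests stay queued.
        self._pending.clear()
        return synced
```

The decoding path would then only call register() after writing a snapshot, and the synchronous fsync cost would move to whichever process calls process().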
Some other topics that came up in this thread:
* Now that PostgreSQL seems to be stable enough on Cygwin to get
through the basic regression tests reliably, lorikeet might as well
run the full TAP test suite?
* Justin complained about the weird effects of wal_sync_method, and I
finally got around to showing how I think that should be untangled, in
https://commitfest.postgresql.org/44/4453/