AIX support - alignment issues
Hi,
(sorry for sending this twice to you Noah, forgot -hackers the first time
round)
We've had a bunch of changes to manually deal with our alignment code not
understanding AIX alignment.
commit f3b421da5f4addc95812b9db05a24972b8fd9739
Author: Peter Eisentraut <peter_e@gmx.net>
Date: 2016-12-21 12:00:00 -0500
Reorder pg_sequence columns to avoid alignment issue
commit 79b716cfb7a1be2a61ebb4418099db1258f35e30
Author: Amit Kapila <akapila@postgresql.org>
Date: 2022-04-07 09:39:25 +0530
Reorder subskiplsn in pg_subscription to avoid alignment issues.
A good explanation of the problem is in /messages/by-id/20220402081346.GD3719101@rfd.leadboat.com
It strikes me as a remarkably bad idea to manually try to maintain the correct
alignment. Even with the tests added it's still quite manual and requires
contorted struct layouts (see e.g. [1]).
I think we should either teach our system the correct alignment rules or we
should drop AIX support.
If we decide we want to continue supporting AIX we should bite the bullet and
add a 64-bit int TYPALIGN_*. It might be worth translating that to bytes when
building tupledescs, so we don't need more branches (reducing them compared to
today).
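A minimal sketch of the "translate to bytes" idea. Assumptions: the letters 'c', 's', 'i' and 'd' follow pg_type's typalign convention, 'l' is a hypothetical new letter for 64-bit integers, the helper names are hypothetical, and the AIX alignments (4 for double, 8 for int64) are taken from the thread's description rather than measured. The switch runs once per attribute when the tuple descriptor is built, so the deforming loop only does arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed AIX values, per the thread: double 4, int64 8 (hypothetical names). */
#define SKETCH_ALIGNOF_DOUBLE 4
#define SKETCH_ALIGNOF_INT64  8

/* Translate a typalign letter into a byte count once, at tupdesc build time.
 * 'l' is a hypothetical new letter for 64-bit integers. */
static uint8_t
typalign_to_bytes(char typalign)
{
    switch (typalign)
    {
        case 'c': return 1;
        case 's': return 2;
        case 'i': return 4;
        case 'd': return SKETCH_ALIGNOF_DOUBLE;
        case 'l': return SKETCH_ALIGNOF_INT64;
        default:  return 0;     /* unknown letter */
    }
}

/* Hot path during deforming: round the offset up with no per-letter branch.
 * alignbytes must be a nonzero power of two. */
static size_t
align_offset(size_t off, uint8_t alignbytes)
{
    assert(alignbytes != 0 && (alignbytes & (alignbytes - 1)) == 0);
    return (off + alignbytes - 1) & ~((size_t) alignbytes - 1);
}
```

Each attribute would then carry its precomputed byte count (for example in a hypothetical attalignbytes field), replacing the per-attribute branch on the letter.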
Personally I think we should just drop AIX. The amount of effort to keep it
working is substantial due to being quite different from other unices ([2]), it is
very outdated, and the whole ecosystem is barely on life support ([3]). And all of that
for very little real world use.
Afaics we don't have access to an up-to-date AIX system. Some of us have access to
7.2 via the GCC compile farm, but not 7.3. Most other niche-y operating
systems we can start in a VM, but I've yet to see a legal and affordable way
to do that with AIX.
I think Noah has done quite a heroic effort at keeping the AIX animals in a
kind-of-healthy state, but without more widespread access and more widespread
usage it seems like a doomed effort.
Greetings,
Andres Freund
[1]: /messages/by-id/CAFiTN-uiAngcW50Trwa94F1EWY2BxEx+B38QSyX3DtV3dzEGhA@mail.gmail.com
[2]: linking etc. is handled entirely differently, so there's a fair bit of dedicated AIX code around the buildsystem - a lot of it vestigial stuff, see references to aix3.2.5 etc.
[3]: 7.2 was released on 2015-10-05, 7.3 on 2021-12-10; the set of changes is pretty darn small for that timeframe https://www.ibm.com/common/ssi/cgi-bin/ssialias?infotype=AN&subtype=CA&htmlfid=897/ENUS221-328&appname=USN
Bull / Atos stopped their AIX work on 2022-03-01 - unfortunately they
didn't even keep the announcement of that online.
https://www.linkedin.com/pulse/said-say-bull-closing-down-aix-open-source-platform-michaelis
https://github.com/power-devops/bullfreeware
On Sat, Jul 2, 2022 at 11:34 AM Andres Freund <andres@anarazel.de> wrote:
Personally I think we should just drop AIX. The amount of effort to keep it
working is substantial due to being quite different from other unices ([2]), it is
very outdated, and the whole ecosystem is barely on life support ([3]). And all of that
for very little real world use.
I tend to agree about dropping AIX. But I wonder if there is an
argument against that proposal that doesn't rely on AIX being relevant
to at least one user. Has supporting AIX ever led to the discovery of
a bug that didn't just affect AIX? In other words, are AIX systems
peculiar in some particular way that clearly makes them more likely to
flush out a certain class of bugs? What is the best argument *against*
desupporting AIX that you know of?
Desupporting AIX doesn't mean that any AIX users will be left in the
lurch immediately. Obviously these users will be able to use a
supported version of Postgres for several more years.
--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
I tend to agree about dropping AIX. But I wonder if there is an
argument against that proposal that doesn't rely on AIX being relevant
to at least one user. Has supporting AIX ever led to the discovery of
a bug that didn't just affect AIX?
Searching the commit log quickly finds
591e088dd
datetime.c's parsing logic has assumed that strtod() will accept
a string that looks like ".", which it does in glibc, but not on
some less-common platforms such as AIX.
glibc's behavior is clearly not meeting the letter of the POSIX spec here.
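The portable pattern is to avoid relying on how strtod() treats a bare ".": under the C standard the subject sequence needs at least one digit, so a conforming strtod() reports no conversion for it. The helper below is a hypothetical sketch for a fractional-seconds field, not the change made in 591e088dd.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse a fractional-seconds field such as ".5" or ".".
 * A bare "." (no digits) is handled explicitly as a zero fraction instead of
 * depending on whether the C library's strtod() accepts it. */
static double
parse_fraction(const char *str, char **endptr)
{
    assert(str != NULL && endptr != NULL);
    if (str[0] == '.' && (str[1] < '0' || str[1] > '9'))
    {
        /* "." not followed by a digit: consume the dot, value is zero */
        *endptr = (char *) str + 1;
        return 0.0;
    }
    return strtod(str, endptr);
}
```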
a745b9365
I'm not sure how we've managed not to notice this problem, but it
seems to explain slow execution of the 017_shm.pl test script on AIX
since commit 4fdbf9af5, which added a speculative "pg_ctl stop" with
the idea of making real sure that the postmaster isn't there. In the
test steps that kill-9 and then restart the postmaster, it's possible
to get past the initial signal attempt before kill() stops working
for the doomed postmaster. If that happens, pg_ctl waited till
PGCTLTIMEOUT before giving up ... and the buildfarm's AIX members
have that set very high.
Admittedly, this one is more about "slow" than about "AIX".
57b5a9646
Most versions of tar are willing to overlook the missing terminator, but
the AIX buildfarm animals were not. Fix by inventing a new kind of
bbstreamer that just blindly adds a terminator, and using it whenever we
don't parse the tar archive.
Another place where we failed to conform to relevant standards.
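For reference, the POSIX ustar format ends an archive with two 512-byte blocks of binary zeros. A sketch of "blindly adding a terminator" to an in-memory archive, assuming a plain heap buffer rather than the real bbstreamer interface (the function name is hypothetical):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TAR_BLOCK_SIZE 512

/* Hypothetical helper: append the end-of-archive marker (two zero-filled
 * 512-byte blocks) without parsing the archive contents.
 * Returns the new length, or 0 if reallocation fails (buffer unchanged). */
static size_t
append_tar_terminator(unsigned char **buf, size_t len)
{
    size_t newlen = len + 2 * TAR_BLOCK_SIZE;
    unsigned char *p = realloc(*buf, newlen);

    if (p == NULL)
        return 0;
    memset(p + len, 0, 2 * TAR_BLOCK_SIZE);
    *buf = p;
    return newlen;
}
```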
b9b610577
Fix ancient violation of zlib's API spec.
And another.
Now, it's certainly possible that AIX is the only surviving platform
that hasn't adopted bug-compatible-with-glibc interpretations of
POSIX. But I think the standard is the standard, and we ought to
stay within it. So I find value in these fixes.
regards, tom lane
Hi,
On 2022-07-02 11:54:16 -0700, Peter Geoghegan wrote:
I tend to agree about dropping AIX. But I wonder if there is an
argument against that proposal that doesn't rely on AIX being relevant
to at least one user. Has supporting AIX ever led to the discovery of
a bug that didn't just affect AIX?
Yes, it clearly has. But I tend to think that that's far outweighed by the
complications triggered by AIX support. It'd be a different story if AIX
didn't have a very peculiar linking model and were more widely accessible.
What is the best argument *against* desupporting AIX that you know of?
Hm.
- a distinct set of system libraries that can help find portability issues
- With IBM's compiler, it adds a compiler that PG builds with and that is not
  otherwise used. So the warnings could theoretically help find issues that we
  wouldn't otherwise see - but I don't think that's been particularly useful
  (nor monitored). And the compiler is buggy enough to have added a fair bit of
  work over the years.
Desupporting AIX doesn't mean that any AIX users will be left in the
lurch immediately. Obviously these users will be able to use a
supported version of Postgres for several more years.
Right.
Greetings,
Andres Freund
On Sat, Jul 2, 2022 at 12:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Now, it's certainly possible that AIX is the only surviving platform
that hasn't adopted bug-compatible-with-glibc interpretations of
POSIX. But I think the standard is the standard, and we ought to
stay within it. So I find value in these fixes.
I imagine that there is strong evolutionary pressure pushing minority
platforms in the direction of bug-compatible-with-glibc. There is
definitely a similar trend around things like endianness and alignment
pickiness. But it wasn't always so.
It seems fair to wonder if AIX bucks the glibc-compatible trend
because it is already on the verge of extinction. If it wasn't just
about dead already then somebody would have gone to the trouble of
making it bug-compatible-with-glibc by now. (To be clear, I'm not
arguing that this is a good thing.)
Maybe it is still worth hanging on to AIX support for the time being,
but it would be nice to have some idea of where we *will* finally draw
the line. If the complaints from Andres aren't a good enough reason
now, then what other hypothetical reasons might be good enough in the
future? It seems fairly likely that Postgres desupporting AIX will
happen (say) at some time in the next decade, no matter what happens
today.
--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
Maybe it is still worth hanging on to AIX support for the time being,
but it would be nice to have some idea of where we *will* finally draw
the line. If the complaints from Andres aren't a good enough reason
now, then what other hypothetical reasons might be good enough in the
future? It seems fairly likely that Postgres desupporting AIX will
happen (say) at some time in the next decade, no matter what happens
today.
Agreed. But I think that this sort of thing is better driven by
"when there's no longer anyone willing to do the legwork" than
by project policy. IOW, we'll stop when Noah gets tired of doing
it (and no one steps up to take his place).
In the case at hand, given that the test added by 79b716cfb/c1da0acbb
correctly detects troublesome catalog layouts (and no I've not studied
it myself), I don't see that we have to do more right now.
I am a little concerned though that we don't have access to the latest
version of AIX --- that seems like a non-maintainable situation.
regards, tom lane
Hi,
On 2022-07-02 16:34:35 -0400, Tom Lane wrote:
Agreed. But I think that this sort of thing is better driven by
"when there's no longer anyone willing to do the legwork" than
by project policy. IOW, we'll stop when Noah gets tired of doing
it (and no one steps up to take his place).
I do think we should take the impact it has on everyone into account, not just
Noah's willingness. If it's just "does somebody still kind of maintain it"
then we'll bear the distributed cost of complications for irrelevant platforms
way longer than worthwhile.
In the case at hand, given that the test added by 79b716cfb/c1da0acbb
correctly detects troublesome catalog layouts (and no I've not studied
it myself), I don't see that we have to do more right now.
What made me look at this issue right now is that the alignment issue led the
56-bit relfilenode patch to move the relfilenode field to the start of pg_class
(ahead of the oid), because a 64-bit value cannot be after a NameData. Now, I
think we can do a bit better by moving a few more fields around. But the
restriction still seems quite onerous.
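The arithmetic behind that restriction, as a sketch. Assumptions: a 4-byte Oid followed by a NameData with the default NAMEDATALEN of 64, as at the start of pg_class, and, as described in the thread, tuple code aligning the 64-bit field to 4 bytes on AIX (the double alignment) while the compiler uses 8 for int64 in the C struct. Helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NAMEDATALEN 64   /* PostgreSQL's default NAMEDATALEN */

static size_t
align_up(size_t off, size_t align)
{
    assert(align > 0);
    return (off + align - 1) / align * align;
}

/* Offset of a 64-bit field placed after an Oid and a NameData, computed
 * with the given alignment for that 64-bit field. */
static size_t
int64_offset_after_name(size_t int64_align)
{
    size_t off = 4;                 /* Oid */

    off += SKETCH_NAMEDATALEN;      /* NameData: char array, 1-byte aligned */
    return align_up(off, int64_align);
}
```

With 4-byte alignment the field lands at 68; with 8-byte alignment at 72. A GETSTRUCT()-style overlay of the C struct would therefore read the wrong bytes for that field and everything after it.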
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
What made me look at this issue right now is that the alignment issue led the
56-bit relfilenode patch to move the relfilenode field to the start of pg_class
(ahead of the oid),
Agreed, up with that we should not put. However ...
because a 64-bit value cannot be after a NameData.
... this coding rule strikes me as utterly ridiculous. Why can't we
instead insist that NAMEDATALEN must be a multiple of 8? Anyone who
tries to make it different from that is likely to be wasting padding
space even on platforms where there's not a deeper problem.
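Tom's rule could be enforced at compile time. A sketch with local stand-ins for NAMEDATALEN and NameData (mirroring PostgreSQL's fixed-size char array). The layout argument only holds if the NameData itself starts at an 8-byte-aligned offset, which the hypothetical struct below arranges by putting an int64 first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NAMEDATALEN 64

/* The proposed rule: a NameData starting on an 8-byte boundary never
 * forces padding before a following 8-byte field. */
static_assert(NAMEDATALEN % 8 == 0, "NAMEDATALEN must be a multiple of 8");

typedef struct
{
    char data[NAMEDATALEN];
} NameData;

/* Hypothetical catalog-like row: name begins at offset 8. */
typedef struct
{
    int64_t  first;
    NameData name;
    int64_t  after_name;    /* offset 8 + NAMEDATALEN under 4- or 8-byte rules */
} SketchCatalogRow;
```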
regards, tom lane
On Sun, Jul 3, 2022 at 8:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I am a little concerned though that we don't have access to the latest
version of AIX --- that seems like a non-maintainable situation.
The release history doesn't look toooo bad on that front: the live
versions are 7.1 (2010-2023), 7.2 (2015-TBA) and 7.3 (2021-TBA). 7.3
only came out half a year ago, slightly after Windows 11, which we
aren't testing yet either. Those GCC AIX systems seem to be provided
by IBM and the Open Source Lab at Oregon State University which has a
POWER lab providing ongoing CI services etc to various OSS projects,
so I would assume that upgrades (and retirement of the
about-to-be-desupported 7.1 system) will come along eventually.
I don't have a dog in this race, but AIX is clearly not in the same
category as HP-UX (and maybe Solaris is somewhere in between). AIX
runs on hardware you can buy today that got a major refresh last year
(Power 10), while HP-UX runs only on discontinued CPUs, so while it's
a no-brainer to drop HP-UX support, it's a trickier question for AIX.
I guess the way open source is supposed to work is that someone with a
real interest in PostgreSQL on AIX helps maintain it, not only keeping
it building and passing tests, but making it work really well (cf huge
pages, scalable event handling, probably more things that would be
obvious to an AIX expert...), and representing ongoing demand and
interests from the AIX user community...
Thomas Munro <thomas.munro@gmail.com> writes:
On Sun, Jul 3, 2022 at 8:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I am a little concerned though that we don't have access to the latest
version of AIX --- that seems like a non-maintainable situation.
The release history doesn't look toooo bad on that front: the live
versions are 7.1 (2010-2023), 7.2 (2015-TBA) and 7.3 (2021-TBA). 7.3
only came out half a year ago, slightly after Windows 11, which we
aren't testing yet either. Those GCC AIX systems seem to be provided
by IBM and the Open Source Lab at Oregon State University which has a
POWER lab providing ongoing CI services etc to various OSS projects,
so I would assume that upgrades (and retirement of the
about-to-be-desupported 7.1 system) will come along eventually.
OK, we can wait awhile to see what happens on that.
I don't have a dog in this race, but AIX is clearly not in the same
category as HP-UX (and maybe Solaris is somewhere in between). AIX
runs on hardware you can buy today that got a major refresh last year
(Power 10), while HP-UX runs only on discontinued CPUs, so while it's
a no-brainer to drop HP-UX support, it's a trickier question for AIX.
Yeah. FTR, I'm out of the HP-UX game: due to a hardware failure,
I can no longer boot that installation. I would have preferred to
keep pademelon, with its pre-C99 compiler, going until v11 is EOL,
but that ain't happening. I see that EDB are still running a couple
of HP-UX/IA64 animals, but I wonder if they're prepared to do anything
to support those animals --- like, say, fix platform-specific bugs.
Robert has definitely indicated displeasure with doing so, but
I don't know if he makes the decisions on that.
I would not stand in the way of dropping HP-UX and IA64 support as of
v16. (I do still feel that HPPA is of interest, to keep us honest
about spinlock support --- but that dual-stack arrangement that IA64
uses is surely not part of anyone's future.)
I have no opinion either way about Solaris.
regards, tom lane
Hi,
On 2022-07-04 10:33:37 +1200, Thomas Munro wrote:
I don't have a dog in this race, but AIX is clearly not in the same
category as HP-UX (and maybe Solaris is somewhere in between).
The reason to consider whether it's worth supporting AIX is that its library
model is very different from other Unix-like platforms (much closer to Windows
though). We also have dedicated compiler support for it, which I guess could
separately be dropped.
Greetings,
Andres Freund
Hi,
On 2022-07-03 20:08:19 -0400, Tom Lane wrote:
I would have preferred to keep pademelon, with its pre-C99 compiler, going
until v11 is EOL, but that ain't happening.
I'm not too worried about that - clang with
-std=c89 -Wc99-extensions -Werror=c99-extensions
as it's running on mylodon for the older branches seems to do a decent
job. And is obviously much faster :)
I would not stand in the way of dropping HP-UX and IA64 support as of
v16.
Cool.
I do still feel that HPPA is of interest, to keep us honest
about spinlock support
I.e. forgetting to initialize them? Or the weird alignment stuff it has?
I'd started to work on a patch to detect missing initialization for both
spinlocks and lwlocks, I think that'd be good to have for more common cases.
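One possible shape for such a check, as a sketch with hypothetical names: an assert-enabled build stamps a magic value at initialization and verifies it on every acquire and release, so a lock that was merely zeroed (which happens to mean "unlocked" on most platforms, but not on HPPA) trips an assertion. The lock word is a single-threaded stand-in; a real spinlock would use an atomic test-and-set loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_LOCK_MAGIC 0x5350494eu   /* arbitrary marker value */

typedef struct
{
    volatile int locked;    /* stand-in for the real lock word */
    uint32_t     magic;     /* set only by sketch_lock_init() */
} SketchSpinLock;

static void
sketch_lock_init(SketchSpinLock *lock)
{
    lock->locked = 0;
    lock->magic = SKETCH_LOCK_MAGIC;
}

static bool
sketch_lock_is_initialized(const SketchSpinLock *lock)
{
    return lock->magic == SKETCH_LOCK_MAGIC;
}

static void
sketch_lock_acquire(SketchSpinLock *lock)
{
    assert(sketch_lock_is_initialized(lock));   /* catches missing init */
    assert(!lock->locked);                      /* single-threaded sketch */
    lock->locked = 1;
}

static void
sketch_lock_release(SketchSpinLock *lock)
{
    assert(sketch_lock_is_initialized(lock));
    lock->locked = 0;
}
```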
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2022-07-03 20:08:19 -0400, Tom Lane wrote:
I do still feel that HPPA is of interest, to keep us honest
about spinlock support
I.e. forgetting to initialize them? Or the weird alignment stuff it has?
The nonzero initialization mainly, and to a lesser extent the weird
size of a lock. I think the fact that the active word is only part
of the lock struct is pretty well encapsulated.
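A sketch of the HPPA arrangement being described, assuming the usual PA-RISC constraints: the ldcw ("load and clear word") instruction needs a 16-byte-aligned operand and zeroes the word when taking the lock, so "free" must be nonzero, and the lock reserves four ints so that one of them is always suitably aligned. The type and function names are hypothetical stand-ins, not PostgreSQL's s_lock.h definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Four ints: for any 4-byte-aligned start address, one of them is
 * 16-byte aligned. */
typedef struct
{
    int sema[4];
} sketch_hppa_slock_t;

/* Select the 16-byte-aligned word inside the lock struct. */
static volatile int *
sketch_active_word(sketch_hppa_slock_t *lock)
{
    return (volatile int *) (((uintptr_t) lock + 15) & ~(uintptr_t) 15);
}

static void
sketch_hppa_lock_init(sketch_hppa_slock_t *lock)
{
    /* Nonzero means "unlocked"; set every word since only one is active. */
    for (int i = 0; i < 4; i++)
        lock->sema[i] = -1;
}
```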
I'd started to work on a patch to detect missing initialization for both
spinlocks and lwlocks, I think that'd be good to have for more common cases.
No objection to having more than one check for this ;-)
regards, tom lane
On Mon, Jul 4, 2022 at 12:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I would not stand in the way of dropping HP-UX and IA64 support as of
v16. (I do still feel that HPPA is of interest, to keep us honest
about spinlock support --- but that dual-stack arrangement that IA64
uses is surely not part of anyone's future.)
I tried to find everything relating to HP-UX, aCC, ia64 and hppa. Or
do you still want to keep the hppa bits for NetBSD (I wasn't sure if
your threat to set up a NetBSD/hppa system was affected by the
hardware failure you mentioned)? Or just leave it in there in
orphaned hall-of-fame state, like m68k, m88k, Vax?
Attachments:
0001-Remove-HP-UX-aCC-ia64-and-hppa-support.patch (text/x-patch, +13 -850)
Thomas Munro <thomas.munro@gmail.com> writes:
On Mon, Jul 4, 2022 at 12:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I would not stand in the way of dropping HP-UX and IA64 support as of
v16. (I do still feel that HPPA is of interest, to keep us honest
about spinlock support --- but that dual-stack arrangement that IA64
uses is surely not part of anyone's future.)
I tried to find everything relating to HP-UX, aCC, ia64 and hppa. Or
do you still want to keep the hppa bits for NetBSD (I wasn't sure if
your threat to set up a NetBSD/hppa system was affected by the
hardware failure you mentioned)?
No, the hardware failure is that the machine's SCSI controller seems
to be fried, thus internal drives no longer accessible. I have a
working NetBSD-current installation on an external USB drive, and plan
to commission it as a buildfarm animal once NetBSD 10 is officially
branched. It'll be a frankencritter of the first order, because
USB didn't exist when the machine was built, but hey...
regards, tom lane
Hi,
On 2022-07-02 11:33:54 -0700, Andres Freund wrote:
If we decide we want to continue supporting AIX we should bite the bullet and
add a 64-bit int TYPALIGN_*. It might be worth translating that to bytes when
building tupledescs, so we don't need more branches (reducing them compared to
today).
I just thought of an easier way - why don't we introduce a 'catalog_double'
that's defined to be pg_attribute_aligned(whatever-we-need) on AIX? Then we
can get rid of the manually enforced alignedness and we don't need to contort
catalog order.
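A sketch of one reading of the proposal, assuming the tuple code would then align 'd' columns to 8 bytes on AIX, so that only double members of catalog structs (which the AIX compiler places on 4-byte boundaries) need forcing. catalog_double and the value 8 are hypothetical; the pg_attribute_aligned() stand-in matches how PostgreSQL's c.h defines it for GCC-compatible compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's pg_attribute_aligned() on GCC-compatible compilers. */
#ifndef pg_attribute_aligned
#define pg_attribute_aligned(a) __attribute__((aligned(a)))
#endif

/* Hypothetical: the alignment the tuple code would use for 'd' columns. */
#define CATALOG_DOUBLE_ALIGN 8

typedef double catalog_double pg_attribute_aligned(CATALOG_DOUBLE_ALIGN);

/* Hypothetical catalog struct: no manual reordering, because the compiler
 * now places the double where 8-byte 'd' alignment in tuples would. */
typedef struct
{
    int32_t        a;
    catalog_double d;
} SketchCatalogStruct;

/* The manual layout rule becomes a compile-time assertion. */
static_assert(offsetof(SketchCatalogStruct, d) == CATALOG_DOUBLE_ALIGN,
              "catalog_double must land on an 8-byte boundary");
```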
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
I just thought of an easier way - why don't we introduce a 'catalog_double'
that's defined to be pg_attribute_aligned(whatever-we-need) on AIX? Then we
can get rid of the manually enforced alignedness and we don't need to contort
catalog order.
Hm, do all the AIX compilers we care about have support for that?
If so, it seems like a great idea.
regards, tom lane
On 05.07.22 07:31, Andres Freund wrote:
On 2022-07-02 11:33:54 -0700, Andres Freund wrote:
If we decide we want to continue supporting AIX we should bite the bullet and
add a 64-bit int TYPALIGN_*. It might be worth translating that to bytes when
building tupledescs, so we don't need more branches (reducing them compared to
today).
I just thought of an easier way - why don't we introduce a 'catalog_double'
that's defined to be pg_attribute_aligned(whatever-we-need) on AIX? Then we
can get rid of the manually enforced alignedness and we don't need to contort
catalog order.
Isn't the problem that on AIX, double and int64 have different alignment
requirements, and we just check the one for double and apply it to
int64? That ought to be fixable by two separate alignment checks in
configure and a new alignment letter for pg_type.
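The configure side of that suggestion, sketched in C: autoconf's AC_CHECK_ALIGNOF measures a type's alignment as the offset of a member placed right after a char in a struct, which is the in-struct alignment that catalog layouts depend on. Running the probe separately for double and int64_t would expose the difference the thread describes for AIX (reportedly 4 versus 8). Type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Probe structs in the style of AC_CHECK_ALIGNOF: the member after a char
 * sits at the type's in-struct alignment. */
typedef struct { char c; double  x; } align_probe_double;
typedef struct { char c; int64_t x; } align_probe_int64;

static_assert(offsetof(align_probe_int64, x) <= sizeof(int64_t),
              "alignment cannot exceed the type's size here");

static size_t
alignof_double_in_struct(void)
{
    return offsetof(align_probe_double, x);
}

static size_t
alignof_int64_in_struct(void)
{
    return offsetof(align_probe_int64, x);
}
```

Emitting both values (say, as separate ALIGNOF_DOUBLE and ALIGNOF_INT64_T macros) is what would let pg_type carry a distinct alignment letter for int64.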
Hi,
On 2022-07-05 01:36:24 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
I just thought of an easier way - why don't we introduce a 'catalog_double'
that's defined to be pg_attribute_aligned(whatever-we-need) on AIX? Then we
can get rid of the manually enforced alignedness and we don't need to contort
catalog order.
Hm, do all the AIX compilers we care about have support for that?
If so, it seems like a great idea.
Afaics we support xlc and gcc on AIX, and we enable the attribute for both
already. So, I think they do.
Greetings,
Andres Freund
Hi,
On 2022-07-05 08:13:21 +0200, Peter Eisentraut wrote:
On 05.07.22 07:31, Andres Freund wrote:
On 2022-07-02 11:33:54 -0700, Andres Freund wrote:
If we decide we want to continue supporting AIX we should bite the bullet and
add a 64-bit int TYPALIGN_*. It might be worth translating that to bytes when
building tupledescs, so we don't need more branches (reducing them compared to
today).
I just thought of an easier way - why don't we introduce a 'catalog_double'
that's defined to be pg_attribute_aligned(whatever-we-need) on AIX? Then we
can get rid of the manually enforced alignedness and we don't need to contort
catalog order.
Isn't the problem that on AIX, double and int64 have different alignment
requirements, and we just check the one for double and apply it to int64?
That ought to be fixable by two separate alignment checks in configure and a
new alignment letter for pg_type.
Except that that's quite a bit of work to get right, particularly without
regressing the performance on all platforms. The attalign switches during
tuple deforming are already quite hot.
Greetings,
Andres Freund