AIX support
Hello Team,
We are working on AIX systems and noticed the thread on removing AIX support in Postgres going forward.
https://github.com/postgres/postgres/commit/0b16bb8776bb834eb1ef8204ca95dd7667ab948b
We would be glad to understand any outstanding issues hindering the support on AIX.
It is important for us to have Postgres supported on AIX, as we are using Postgres extensively on AIX.
Also we would like to provide any feasible support from our end for enabling the support on AIX.
We would request the community to extend the support on AIX.
Thanks & regards,
Sriram.
On 2024-Mar-21, Sriram RK wrote:
Hello Team,
We are working on AIX systems and noticed the thread on removing AIX support in Postgres going forward.
https://github.com/postgres/postgres/commit/0b16bb8776bb834eb1ef8204ca95dd7667ab948b
We would be glad to understand any outstanding issues hindering the support on AIX.
There's a Discussion link at the bottom of that commit message. I
suggest you read that discussion in full, and consider how much effort
you or your company are willing to spend on doing the maintenance of the
port yourselves for the community. Maybe ponder this question: would it
be less onerous to migrate your Postgres servers to Linux, like Phil
Florent described in what is currently the last message of that thread?
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Para tener más hay que desear menos"
Sriram RK <sriram.rk@outlook.com> writes:
We are working on AIX systems and noticed the thread on removing AIX support in Postgres going forward.
https://github.com/postgres/postgres/commit/0b16bb8776bb834eb1ef8204ca95dd7667ab948b
We would be glad to understand any outstanding issues hindering the
support on AIX.
Did you read the linked thread? Starting say about here:
/messages/by-id/20240224172345.32@rfd.leadboat.com
Also we would like to provide any feasible support from our end for enabling the support on AIX.
Who is "we", and how much resources are you prepared to put into this?
We would request the community to extend the support on AIX.
The community, in the sense of the existing people doing significant
work on Postgres, are absolutely not going to do that. If you can
bring a bunch of work to fix all the problems noted in the discussion
thread, and commit to providing ongoing development manpower and
hardware to keep it working, maybe something could happen. But I
suspect you will find it cheaper to start thinking about migrating
off AIX.
regards, tom lane
Thanks, Tom and Alvaro, for the info.
We shall look into the details and get back.
Hi Team,
We are setting up the build environment and trying to build the source, and are also trying to analyze the assert from the AIX point of view.
Also, we would like to know if we can access the buildfarm (POWER machines) to run any of the specific tests to hit this assert.
Thanks & regards,
Sriram.
On Thu, Mar 28, 2024 at 11:09:43AM +0000, Sriram RK wrote:
We are setting up the build environment and trying to build the source, and are also trying to analyze the assert from the AIX point of view.
The thread Alvaro and Tom cited contains an analysis. It's a compiler bug.
You can get past the compiler bug by upgrading your compiler; both ibm-clang
17.1.1.2 and gcc 13.2.0 are free from the bug.
Also, we would like to know if we can access the buildfarm (POWER machines) to run any of the specific tests to hit this assert.
https://portal.cfarm.net/users/new/ is the form to request access. It lists
the eligibility criteria.
On Fri, Mar 29, 2024 at 3:48 PM Noah Misch <noah@leadboat.com> wrote:
On Thu, Mar 28, 2024 at 11:09:43AM +0000, Sriram RK wrote:
We are setting up the build environment and trying to build the source, and are also trying to analyze the assert from the AIX point of view.
The thread Alvaro and Tom cited contains an analysis. It's a compiler bug.
You can get past the compiler bug by upgrading your compiler; both ibm-clang
17.1.1.2 and gcc 13.2.0 are free from the bug.
For the specific issue that triggered that, I strongly suspect that it
would go away if we just used smgrzeroextend() instead of smgrextend()
using that variable with the alignment requirement, since, as far as I
can tell from build farm clues, the otherwise similar function-local
static variable used by the former (ie one that the linker must still
control the location of AFAIK?) seems to work fine.
But we didn't remove AIX because of that, it was just the straw that
broke the camel's back.
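
The "alignment requirement" discussed above means that a buffer's start address must be a multiple of some block size. As a rough illustration only, the sketch below checks that property and obtains an aligned buffer by over-allocating at run time instead of relying on the compiler and linker to place a static variable. It is written in Python for brevity and is not taken from the Postgres source; the value of `IO_ALIGN` and the helper names `is_aligned` and `aligned_buffer` are assumptions made for this example.

```python
import ctypes

IO_ALIGN = 4096  # assumed alignment for direct I/O buffers; illustrative only


def is_aligned(address, alignment=IO_ALIGN):
    """Return True if address is a multiple of alignment."""
    return address % alignment == 0


def aligned_buffer(size, alignment=IO_ALIGN):
    """Allocate size bytes whose start address is a multiple of alignment.

    Over-allocates by `alignment` bytes and returns a view that starts at
    the first aligned offset, so no compiler or linker support is needed.
    """
    raw = ctypes.create_string_buffer(size + alignment)
    # Distance from the raw start address up to the next aligned address.
    offset = -ctypes.addressof(raw) % alignment
    # from_buffer shares memory with raw and keeps raw alive.
    return (ctypes.c_char * size).from_buffer(raw, offset)
```

A check such as `is_aligned(ctypes.addressof(aligned_buffer(8192)))` plays the role of the assertion that failed on the affected AIX compilers: there, a variable declared with an alignment requirement ended up at a misaligned address.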
Noah Misch <noah@leadboat.com> writes:
On Thu, Mar 28, 2024 at 11:09:43AM +0000, Sriram RK wrote:
Also, we would like to know if we can access the buildfarm (POWER machines) to run any of the specific tests to hit this assert.
https://portal.cfarm.net/users/new/ is the form to request access. It lists
the eligibility criteria.
There might be some confusion here about what system we are talking
about. The Postgres buildfarm is described at
https://buildfarm.postgresql.org/index.html
but it consists of a large number of individual machines run by
individual owners. There would not be a lot of point in adding a
new AIX machine to the Postgres buildfarm right now, because it
would surely fail to build HEAD. What Noah is referencing is
the GCC compile farm, which happens to include some AIX machines.
The existing AIX entries in the Postgres buildfarm are run (by Noah)
on those GCC compile farm machines, which really the GCC crowd have
been *very* forgiving about letting us abuse like that. If you have
your own AIX hardware there's not a lot of reason that you should
need to access the GCC farm.
What you do need to do to reproduce the described problems is
check out the Postgres git tree and rewind to just before
commit 0b16bb877, where we deleted AIX support. Any attempt
to restore AIX support would have to start with reverting that
commit (and perhaps the followup f0827b443).
regards, tom lane
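
The checkout step described above could be scripted roughly as follows. This is a minimal sketch, not something posted in the thread: it assumes `git` is on the PATH, and the helper names `reproduction_commands` and `run_all`, the branch name `aix-restore`, and the default directory name are hypothetical. It relies only on standard git syntax, in which `<commit>^` names the first parent of a commit.

```python
import subprocess

REMOVAL_COMMIT = "0b16bb877"  # commit that deleted AIX support (from the thread)


def reproduction_commands(repo_url, workdir="postgres"):
    """Return git commands that check out the tree just before the AIX removal."""
    return [
        ["git", "clone", repo_url, workdir],
        # "<commit>^" is git's notation for the first parent of <commit>.
        ["git", "-C", workdir, "checkout", "-b", "aix-restore", REMOVAL_COMMIT + "^"],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # The GitHub URL matches the commit link quoted earlier in the thread.
    for argv in reproduction_commands("https://github.com/postgres/postgres.git"):
        print(" ".join(argv))
```

Running the script only prints the commands; passing the list to `run_all` would execute them. Restoring AIX support on current HEAD would additionally involve reverting the removal commit, which is likely to need manual conflict resolution and is not attempted here.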
On Fri, Mar 29, 2024 at 4:00 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Fri, Mar 29, 2024 at 3:48 PM Noah Misch <noah@leadboat.com> wrote:
The thread Alvaro and Tom cited contains an analysis. It's a compiler bug.
You can get past the compiler bug by upgrading your compiler; both ibm-clang
17.1.1.2 and gcc 13.2.0 are free from the bug.
For the specific issue that triggered that, I strongly suspect that it
would go away if we just used smgrzeroextend() instead of smgrextend()
using that variable with the alignment requirement, since, as far as I
can tell from build farm clues, the otherwise similar function-local
static variable used by the former (ie one that the linker must still
control the location of AFAIK?) seems to work fine.
Oh, sorry, I had missed the part where newer compilers fix the issue
too. Old out-of-support versions of AIX running old compilers, what
fun.
Thomas Munro <thomas.munro@gmail.com> writes:
Oh, sorry, I had missed the part where newer compilers fix the issue
too. Old out-of-support versions of AIX running old compilers, what
fun.
Indeed. One of the topics that needs investigation if you want to
pursue this is which AIX system and compiler versions still deserve
support, and which of the AIX hacks we had been carrying still need
to be there based on that analysis. For context, we've been pruning
support for extinct-in-the-wild OS versions pretty aggressively
over the past couple of years, and I'd expect to apply the same
standard to AIX.
regards, tom lane
What you do need to do to reproduce the described problems is
check out the Postgres git tree and rewind to just before
commit 0b16bb877, where we deleted AIX support. Any attempt
to restore AIX support would have to start with reverting that
commit (and perhaps the followup f0827b443).
regards, tom lane
Hi Team, thank you for all the info.
We have built the source on our nodes, and the build was successful with the configuration below.
Postgres - github-bcdfa5f2e2f
AIX - 71c
xlc - 13.1.0
Bison - 3.0.5
Going ahead, we want to build the changes that were removed as part of “0b16bb8776bb8”, with the latest xlc and gcc versions.
We were building the source from the Postgres FTP server (https://ftp.postgresql.org/pub/source/) and would like to understand whether there are any source-level differences between the FTP server and the source on GitHub.
Regards,
Sriram.
On Fri, Apr 05, 2024 at 04:12:06PM +0000, Sriram RK wrote:
What you do need to do to reproduce the described problems is
check out the Postgres git tree and rewind to just before
commit 0b16bb877, where we deleted AIX support. Any attempt
to restore AIX support would have to start with reverting that
commit (and perhaps the followup f0827b443).
Going ahead, we want to build the changes that were removed as part of “0b16bb8776bb8”, with the latest xlc and gcc versions.
We were building the source from the Postgres FTP server (https://ftp.postgresql.org/pub/source/) and would like to understand whether there are any source-level differences between the FTP server and the source on GitHub.
To succeed in this endeavor, you'll need to develop fluency in the tools to
answer questions like that, or bring in someone who is fluent. In this case,
GNU diff is the standard tool for answering your question. These resources
cover other topics you would need to learn:
https://wiki.postgresql.org/wiki/Developer_FAQ
https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F
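
As one illustration of answering that kind of question, the sketch below compares an unpacked release tarball against a git checkout using Python's standard `filecmp` module, in the spirit of a recursive `diff -r`. It is an assumption-laden example rather than a recommended workflow: the function name `tree_differences` and any directory paths passed to it are hypothetical, and GNU diff remains the tool actually suggested above.

```python
import filecmp
import os


def tree_differences(left, right, ignore=(".git",)):
    """Recursively list paths that differ between two directory trees.

    Note: filecmp treats files with identical size and modification time
    as equal without reading them; other files are compared by content.
    """
    diffs = []

    def walk(cmp, prefix):
        for name in cmp.left_only:
            diffs.append(("only in left", os.path.join(prefix, name)))
        for name in cmp.right_only:
            diffs.append(("only in right", os.path.join(prefix, name)))
        for name in cmp.diff_files:
            diffs.append(("differs", os.path.join(prefix, name)))
        # Descend into directories present on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(left, right, ignore=list(ignore)), "")
    return sorted(diffs)
```

Calling `tree_differences("unpacked-tarball", "git-checkout")` with two such directories returns a sorted list of `(kind, relative_path)` tuples; files reported on only one side would still need to be interpreted by hand.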
Thanks Noah and Team,
We (IBM-AIX team) looked into this issue
/messages/by-id/20240225194322.a5@rfd.leadboat.com
This is related to the compiler issue. The compilers xlc(13.1) and gcc(8.0) have issues. But we verified that this issue is resolved with the newer compiler versions openXL(xlc17.1) and gcc(12.0) onwards.
We reported this issue to the xlc team and they have noted this issue. A fix might be possible in May for this issue in xlc v16. We would like to understand if the community can start using the latest compilers to build the source.
Also as part of the support, we will help in fixing all the issues related to AIX and continue to support AIX for Postgres. If we need another CI environment we can work to make one available. But for the time being, can we start reverting the patch that removed AIX support?
We want to make a note that postgres is used extensively in our IBM product and is being exploited by multiple customers.
Please let us know if there are any specific details you'd like to discuss further.
Regards,
Sriram.
On 18 April 2024 14:15:43 GMT+03:00, Sriram RK <sriram.rk@outlook.com> wrote:
Thanks Noah and Team,
We (IBM-AIX team) looked into this issue
/messages/by-id/20240225194322.a5@rfd.leadboat.com
This is related to the compiler issue. The compilers xlc(13.1) and gcc(8.0) have issues. But we verified that this issue is resolved with the newer compiler versions openXL(xlc17.1) and gcc(12.0) onwards.
We reported this issue to the xlc team and they have noted this issue. A fix might be possible in May for this issue in xlc v16. We would like to understand if the community can start using the latest compilers to build the source.
Also as part of the support, we will help in fixing all the issues related to AIX and continue to support AIX for Postgres. If we need another CI environment we can work to make one available. But for the time being, can we start reverting the patch that removed AIX support?
Let's start by setting up a new AIX buildfarm member. Regardless of what we do with v17, we continue to support AIX on the stable branches, and we really need a buildfarm member to keep the stable branches working anyway.
We want to make a note that postgres is used extensively in our IBM product and is being exploited by multiple customers.
Noted. I'm glad to hear you are interested to put in some effort for this. The situation from the current maintainers is that none of us have much interest, or resources or knowledge to keep the AIX build working, so we'll definitely need the help.
No promises on v17, but let's at least make sure the stable branches keep working. And with the patches and buildfarm support from you, maybe v17 is feasible too.
- Heikki
Let's start by setting up a new AIX buildfarm member. Regardless of what we do with v17, we continue to support AIX on the stable branches, and we really need a buildfarm member to keep the stable branches working anyway.
Thanks Heikki. We have already built the source code (v17+, bcdfa5f2e2f) on our local nodes. We will try to set up the buildfarm and let you know.
Is there any specific configuration we are looking for?
Regards,
Sriram.
Hi,
On 2024-04-18 11:15:43 +0000, Sriram RK wrote:
We (IBM-AIX team) looked into this issue
/messages/by-id/20240225194322.a5@rfd.leadboat.com
This is related to the compiler issue. The compilers xlc(13.1) and gcc(8.0)
have issues. But we verified that this issue is resolved with the newer
compiler versions openXL(xlc17.1) and gcc(12.0) onwards.
The reason we used these compilers was that those were the only ones we had
kinda somewhat reasonable access to, via the gcc projects'
"compile farm" https://portal.cfarm.net/
We have to rely on whatever the aix machines there provide. They're not
particularly plentiful resource-wise either.
This is generally one of the big issues with AIX support. There are other
niche-y OSs that don't have a lot of users, e.g. openbsd, but in contrast to
AIX I can just start an openbsd VM within a few minutes and iron out whatever
portability issue I'm dealing with.
Not being AIX customers we also can't raise bugs about compiler bugs, so we're
stuck doing bad workarounds.
Also as part of the support, we will help in fixing all the issues related
to AIX and continue to support AIX for Postgres. If we need another CI
environment we can work to make one available. But for the time being, can
we start reverting the patch that removed AIX support?
The state it was in when it was removed is not one that I am OK with adding back.
We want to make a note that postgres is used extensively in our IBM product
and is being exploited by multiple customers.
To be blunt: Then it'd have been nice to see some investment in that before
now. Both on the code level and the infrastructure level (i.e. access to
machines etc).
Greetings,
Andres Freund
On Fri, Apr 19, 2024 at 6:01 AM Andres Freund <andres@anarazel.de> wrote:
On 2024-04-18 11:15:43 +0000, Sriram RK wrote:
We (IBM-AIX team) looked into this issue
/messages/by-id/20240225194322.a5@rfd.leadboat.com
This is related to the compiler issue. The compilers xlc(13.1) and gcc(8.0)
have issues. But we verified that this issue is resolved with the newer
compiler versions openXL(xlc17.1) and gcc(12.0) onwards.
The reason we used these compilers was that those were the only ones we had
kinda somewhat reasonable access to, via the gcc projects'
"compile farm" https://portal.cfarm.net/
We have to rely on whatever the aix machines there provide. They're not
particularly plentiful resource-wise either.
To be fair, those OSUOSL machines are donated by IBM:
https://osuosl.org/services/powerdev/
It's just that they seem to be mostly focused on supporting Linux on
POWER, with only a couple of AIX hosts (partitions/virtual machines?)
made available via portal.cfarm.net, and they only very recently added
a modern AIX 7.3 host. That's cfarm119, upgraded in September-ish,
long after many threads on this list that between-the-lines threatened
to drop support.
This is generally one of the big issues with AIX support. There are other
niche-y OSs that don't have a lot of users, e.g. openbsd, but in contrast to
AIX I can just start an openbsd VM within a few minutes and iron out whatever
portability issue I'm dealing with.
Yeah. It is a known secret that you can run AIX inside Qemu/kvm (it
appears that IBM has made changes to it to make that possible, because
earlier AIX versions didn't like Qemu's POWER emulation or
virtualisation, there are blog posts about it), but IBM doesn't
actually make the images available to non-POWER-hardware owners (you
need a serial number). If I were an OS vendor and wanted developers
to target my OS for free, at the very top of my TODO list I would
have: provide an easy to use image for developers to be able to spin
something up in minutes and possibly even use in CI systems. That's
the reason I can fix any minor portability issue on Linux, illumos,
*BSD quickly and Windows with only moderate extra pain. Even Oracle
knows this, see Solaris CBE.
We want to make a note that postgres is used extensively in our IBM product
and is being exploited by multiple customers.
To be blunt: Then it'd have been nice to see some investment in that before
now. Both on the code level and the infrastructure level (i.e. access to
machines etc).
In the AIX space generally, there were even clues that funding had
been completely cut even for packaging PostgreSQL. I was aware of two
packaging projects (not sure how they were related):
1. The ATOS packaging group, who used to show up on our mailing lists
and discuss code changes, which announced it was shutting down:
https://github.com/power-devops/bullfreeware
2. And last time I looked a few months back, the IBM AIX Toolbox
packaging project only had PostgreSQL 10 or 11 packages, already out
of support by us, meaning that their maintainer had given up, too:
https://www.ibm.com/support/pages/aix-toolbox-open-source-software-downloads-alpha
However I see that recently (last month?) someone has added PostgreSQL
15, so something has only just reawoken there?
There are quite a lot of threads about AIX problems, but they are
almost all just us non-AIX-users trying to unbreak stupid stuff on the
build farm, which at some points began to seem distinctly quixotic:
chivalric hackers valiantly trying to keep the entire Unix family tree
working even though we don't remember why and the versions involved are
out of support even by the vendor. Of the three old giant commercial
Unixes, HP-UX was dropped without another mention (it really was a
windmill after all), Solaris is somehow easier to deal with (I could
guess it's because it influenced Linux and BSD so much, ELF and linker
details spring to mind), while AIX fails on every dimension:
unrepresented by users, lacking in build farm, unavailable to
non-customers, and unusual among Unixen.
For any compiler- or hardware-related issue, we should be able to provide support.
We are in the process of identifying the AIX systems that can be added to the CI/buildfarm environment.
Regards,
Sriram.
On 19.04.24 13:04, Sriram RK wrote:
For any compiler- or hardware-related issue, we should be able to provide support.
We are in the process of identifying the AIX systems that can be added
to the CI/buildfarm environment.
I think we should manage expectations here, if there is any hope of
getting AIX support back into PG17.
I have some sympathy for this. The discussion about removing AIX
support had a very short turnaround and happened in an unrelated thread,
without any sort of public announcement or consultation. So this report
of "hey, we were still using that" is timely and fair.
But the underlying issue that led to the removal (something to do with
direct I/O support and alignment) would still need to be addressed. And
this probably wouldn't just need some infrastructure support; it would
require contributions from someone who actively knows how to develop on
this platform. Now, direct I/O is currently sort of an experimental
feature, so disabling it on AIX, as was initially suggested in that
discussion, might be okay for now, but the issue will come up again.
Even if this new buildfarm support is forthcoming, there has to be some
sort of deadline in any resurrection attempts for PG17. The first beta
date has been set for 23 May. If we are making the reinstatement of AIX
support contingent on new buildfarm support, those machines need to be
available, at least initially, at least for backbranches, like in a
week. Which seems tight.
I can see several ways going forward:
1. We revert the removal of AIX support and carry on with the status quo
ante. (The removal of AIX is a regression; it is timely and in scope
now to revert the change.)
2. Like (1), but we consider that notice has been given, and we will
remove it early in PG18 (like August) unless the situation improves.
3. We leave it out of PG17 and consider a new AIX port for PG18 on its
own merits.
Note that such a "new" port would probably require quite a bit of
development and research work, to clean up all the cruft that had
accumulated over the years in the old port. Another looming issue is
that the meson build system only supported AIX with gcc before the
removal. I don't know what it would take to expand that to support
xclang, but if it requires meson upstream work, you have that to do, too.
Peter Eisentraut <peter@eisentraut.org> writes:
I have some sympathy for this. The discussion about removing AIX
support had a very short turnaround and happened in an unrelated thread,
without any sort of public announcement or consultation. So this report
of "hey, we were still using that" is timely and fair.
Yup, that's a totally fair complaint. Still ...
I can see several ways going forward:
1. We revert the removal of AIX support and carry on with the status quo
ante. (The removal of AIX is a regression; it is timely and in scope
now to revert the change.)
2. Like (1), but we consider that notice has been given, and we will
remove it early in PG18 (like August) unless the situation improves.
3. We leave it out of PG17 and consider a new AIX port for PG18 on its
own merits.
Andres has ably summarized the reasons why the status quo ante was
getting untenable. The direct-I/O problem could have been tolerable
on its own, but in reality it was the straw that broke the camel's
back so far as our willingness to maintain AIX support went. There
were just too many hacks and workarounds for too many problems,
with too few people interested in looking for better answers.
So I'm totally not in favor of #1, at least not without some hard
commitments and follow-through on really cleaning up the mess
(which maybe looks more like your #2). What's needed here, as
you said, is for someone with a decent amount of expertise in
modern AIX to review all the issues. Maybe framing that as a
"new port" per #3 would be a good way to think about it. But
I don't want to just revert the AIX-ectomy and continue drifting.
On the whole, it wouldn't be the worst thing in the world if PG 17
lacks AIX support but that comes back in PG 18. That approach would
solve the schedule-crunch aspect and give time for considered review
of how many of the hacks removed in 0b16bb877 really need to be put
back, versus being obsolete or amenable to a nicer solution in
late-model AIX. If we take a "new port" mindset then it would be
totally reasonable to say that it only supports very recent AIX
releases, so I'd hope at least some of the cruft could be removed.
regards, tom lane