[PATCH] Prefer getenv("HOME") to find the UNIX home directory

Started by Anders Kaseorgover 4 years ago19 messageshackers
Jump to latest
#1Anders Kaseorg
andersk@mit.edu

According to getpwnam(3):

An application that wants to determine its user's home directory
should inspect the value of HOME (rather than the value
getpwuid(getuid())->pw_dir) since this allows the user to modify
their notion of "the home directory" during a login session.

This is important for systems where many users share the same UID, and for test systems that change HOME to avoid interference with the user’s real home directory. It matches what most applications do, as well as what glibc does for glob("~", GLOB_TILDE, …) and wordexp("~", …).

There was some previous discussion of this in 2016, where although there were some questions about the use case, there seemed to be general support for the concept:

/messages/by-id/CAEH6cQqbdbXoUHJBbX9ixwfjFFsUC-a8hFntKcci=diWgBb3fQ@mail.gmail.com

Regardless of whether one thinks modifying HOME is a good idea, if we happen to find ourselves in that case, we should respect the modified HOME, so that when the user creates (say) a ~/.pgpass file, we’ll look for it at the same place the user’s editor created it. getenv() also skips the overhead of reading /etc/passwd as an added bonus.

The way I ran into this issue myself was in a test suite that runs on GitHub Actions, which automatically sets HOME=/github/home.

Anders

Attachments:

v1-0001-Prefer-getenv-HOME-to-find-the-UNIX-home-director.patchtext/x-patch; name=v1-0001-Prefer-getenv-HOME-to-find-the-UNIX-home-director.patchDownload+32-17
#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Anders Kaseorg (#1)
Re: [PATCH] Prefer getenv("HOME") to find the UNIX home directory

On 2021-Oct-14, Anders Kaseorg wrote:

This is important for systems where many users share the same UID, and
for test systems that change HOME to avoid interference with the
user’s real home directory. It matches what most applications do, as
well as what glibc does for glob("~", GLOB_TILDE, …) and wordexp("~",
…).

There was some previous discussion of this in 2016, where although
there were some questions about the use case, there seemed to be
general support for the concept:

/messages/by-id/CAEH6cQqbdbXoUHJBbX9ixwfjFFsUC-a8hFntKcci=diWgBb3fQ@mail.gmail.com

I think modifying $HOME is a strange way to customize things, but given
how widespread it is [claimed to be] today, it seems reasonable to do
things that way.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#3Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#2)
Re: [PATCH] Prefer getenv("HOME") to find the UNIX home directory

On Mon, Oct 18, 2021 at 07:23:50PM -0300, Alvaro Herrera wrote:

I think modifying $HOME is a strange way to customize things, but given
how widespread it is [claimed to be] today, it seems reasonable to do
things that way.

I am not sure about this claim, but it seems to me that we could get
rid of the duplications in src/port/path.c, libpq/fe-connect.c and
psql/command.c (this one is different for WIN32 but consistency would
be a good thing) as the proposed patch outlines. So I would suggest
to begin with that rather than changing three places to do the same
thing.
--
Michael

#4Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Alvaro Herrera (#2)
Re: [PATCH] Prefer getenv("HOME") to find the UNIX home directory

At Mon, 18 Oct 2021 19:23:50 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in

On 2021-Oct-14, Anders Kaseorg wrote:

This is important for systems where many users share the same UID, and
for test systems that change HOME to avoid interference with the
user’s real home directory. It matches what most applications do, as
well as what glibc does for glob("~", GLOB_TILDE, …) and wordexp("~",
…).

There was some previous discussion of this in 2016, where although
there were some questions about the use case, there seemed to be
general support for the concept:

/messages/by-id/CAEH6cQqbdbXoUHJBbX9ixwfjFFsUC-a8hFntKcci=diWgBb3fQ@mail.gmail.com

I think modifying $HOME is a strange way to customize things, but given
how widespread it is [claimed to be] today, it seems reasonable to do
things that way.

I tend to agree to this, but seeing ssh ignoring $HOME, I'm not sure
it's safe that we follow the variable at least when accessing
confidentiality(?) files. Since I don't understand the exact
reasoning for the ssh's behavior so it's just my humbole opinion.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#5Anders Kaseorg
andersk@mit.edu
In reply to: Kyotaro Horiguchi (#4)
Re: [PATCH] Prefer getenv("HOME") to find the UNIX home directory

On 10/19/21 01:34, Kyotaro Horiguchi wrote:

I tend to agree to this, but seeing ssh ignoring $HOME, I'm not sure
it's safe that we follow the variable at least when accessing
confidentiality(?) files. Since I don't understand the exact
reasoning for the ssh's behavior so it's just my humbole opinion.

According to https://bugzilla.mindrot.org/show_bug.cgi?id=3048#c1, it
used to be supported to install the ssh binary as setuid. A
setuid/setgid binary needs to treat all environment variables with
suspicion: if it can be convinced to write a file to $HOME with root
privileges, then a user who modifies $HOME before invoking the binary
could cause it to write to a file that the user normally couldn’t.

There’s no such concern for a binary that isn’t setuid/setgid. Anyone
with the ability to modify $HOME can be assumed to already have full
control of the user account.

Anders

#6Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Anders Kaseorg (#5)
Re: [PATCH] Prefer getenv("HOME") to find the UNIX home directory

At Tue, 19 Oct 2021 02:44:03 -0700, Anders Kaseorg <andersk@mit.edu> wrote in

On 10/19/21 01:34, Kyotaro Horiguchi wrote:

I tend to agree to this, but seeing ssh ignoring $HOME, I'm not sure
it's safe that we follow the variable at least when accessing
confidentiality(?) files. Since I don't understand the exact
reasoning for the ssh's behavior so it's just my humbole opinion.

According to https://bugzilla.mindrot.org/show_bug.cgi?id=3048#c1, it
used to be supported to install the ssh binary as setuid. A
setuid/setgid binary needs to treat all environment variables with
suspicion: if it can be convinced to write a file to $HOME with root
privileges, then a user who modifies $HOME before invoking the binary
could cause it to write to a file that the user normally couldn’t.

There’s no such concern for a binary that isn’t setuid/setgid. Anyone
with the ability to modify $HOME can be assumed to already have full
control of the user account.

Thansk for the link. Still I'm not sure it's the fact but it sounds
reasonable enough. If that's the case, I vote +1 for psql or other
commands honoring $HOME.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#7Daniel Gustafsson
daniel@yesql.se
In reply to: Kyotaro Horiguchi (#6)
Re: [PATCH] Prefer getenv("HOME") to find the UNIX home directory

On 20 Oct 2021, at 07:40, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Tue, 19 Oct 2021 02:44:03 -0700, Anders Kaseorg <andersk@mit.edu> wrote in

On 10/19/21 01:34, Kyotaro Horiguchi wrote:

I tend to agree to this, but seeing ssh ignoring $HOME, I'm not sure
it's safe that we follow the variable at least when accessing
confidentiality(?) files. Since I don't understand the exact
reasoning for the ssh's behavior so it's just my humbole opinion.

According to https://bugzilla.mindrot.org/show_bug.cgi?id=3048#c1, it
used to be supported to install the ssh binary as setuid. A
setuid/setgid binary needs to treat all environment variables with
suspicion: if it can be convinced to write a file to $HOME with root
privileges, then a user who modifies $HOME before invoking the binary
could cause it to write to a file that the user normally couldn’t.

There’s no such concern for a binary that isn’t setuid/setgid. Anyone
with the ability to modify $HOME can be assumed to already have full
control of the user account.

Thansk for the link. Still I'm not sure it's the fact but it sounds
reasonable enough. If that's the case, I vote +1 for psql or other
commands honoring $HOME.

Is the proposed change portable across all linux/unix systems we support?
Reading aobut indicates that it's likely to be, but neither NetBSD nor FreeBSD
have the upthread referenced wording in their manpages.

--
Daniel Gustafsson https://vmware.com/

#8Anders Kaseorg
andersk@mit.edu
In reply to: Daniel Gustafsson (#7)
Re: [PATCH] Prefer getenv("HOME") to find the UNIX home directory

On 10/20/21 04:55, Daniel Gustafsson wrote:

Is the proposed change portable across all linux/unix systems we support?
Reading aobut indicates that it's likely to be, but neither NetBSD nor FreeBSD
have the upthread referenced wording in their manpages.

Since the proposed change falls back to the old behavior if HOME is
unset or empty, I assume this is a question about convention and not
literally about whether it will work on these systems. I don’t find it
surprising that this convention isn’t explicitly called out in every
system’s manpage for the wrong function, but it still applies to these
systems.

POSIX specifies that the shell uses the HOME environment variable for
‘cd’ with no arguments and for the expansion of ~. This implies by
reference that this behavior is required of wordexp() as well.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/cd.html
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_01
https://pubs.opengroup.org/onlinepubs/9699919799/functions/wordexp.html

libc’s glob() and wordexp() respect HOME in glibc, musl, NetBSD, and
FreeBSD.

https://sourceware.org/git/?p=glibc.git;a=blob;f=posix/glob.c;hb=glibc-2.34#l622
https://sourceware.org/git/?p=glibc.git;a=blob;f=posix/wordexp.c;hb=glibc-2.34#l293

https://git.musl-libc.org/cgit/musl/tree/src/regex/glob.c?h=v1.2.2#n203
https://git.musl-libc.org/cgit/musl/tree/src/misc/wordexp.c?h=v1.2.2#n111

https://github.com/NetBSD/src/blob/netbsd-9/lib/libc/gen/glob.c#L424
https://github.com/NetBSD/src/blob/netbsd-9/lib/libc/gen/wordexp.c#L129-L150
https://github.com/NetBSD/src/blob/netbsd-9/bin/sh/expand.c#L434-L441

https://github.com/freebsd/freebsd-src/blob/release/13.0.0/lib/libc/gen/glob.c#L457
https://github.com/freebsd/freebsd-src/blob/release/13.0.0/lib/libc/gen/wordexp.c#L171-L190
https://github.com/freebsd/freebsd-src/blob/release/13.0.0/bin/sh/expand.c#L396

(Today I learned that musl and BSD libc literally spawn a shell process
to handle wordexp(). Wow.)

Anders

#9Chapman Flack
chap@anastigmatix.net
In reply to: Anders Kaseorg (#1)
Re: Is my home $HOME or is it getpwent()->pw_dir ?

On 12/20/21 09:15, Peter Eisentraut wrote:

On 18.12.21 21:57, Chapman Flack wrote:

When I start psql, strace shows $HOME being honored when looking
for .terminfo and .inputrc, and getpwent()->pw_dir being used
to look for .pgpass, .psqlrc, and .psql_history, which of course
aren't there.

I'm sure the .terminfo and .inputrc lookups are being done by library code.
In my experience, it seems traditionally unixy to let $HOME take precedence.

See this patch: https://commitfest.postgresql.org/36/3362/

Wow, just a couple months ago. Yes, I should have tagged on to that
rather than starting a new thread.

I was proposing an option or variable on the assumption that just changing
the default behavior would be off the table. But I am +1 on just changing
the default behavior, if that's not off the table.

Regards,
-Chap

*seeing that RFC 5322 3.6.4 permits more than one msg-id for in-reply-to,
crosses fingers to see what PGLister will make of it*

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Anders Kaseorg (#8)
Re: [PATCH] Prefer getenv("HOME") to find the UNIX home directory

Anders Kaseorg <andersk@mit.edu> writes:

On 10/20/21 04:55, Daniel Gustafsson wrote:

Is the proposed change portable across all linux/unix systems we support?
Reading aobut indicates that it's likely to be, but neither NetBSD nor FreeBSD
have the upthread referenced wording in their manpages.

Since the proposed change falls back to the old behavior if HOME is
unset or empty, I assume this is a question about convention and not
literally about whether it will work on these systems. I don’t find it
surprising that this convention isn’t explicitly called out in every
system’s manpage for the wrong function, but it still applies to these
systems.

Given the POSIX requirements, it's basically impossible to believe
that there are interesting cases where $HOME isn't set. Thus, it
seems to me that keeping the getpwuid calls will just mean carrying
untestable dead code, so we should simplify matters by ripping
those out and *only* consulting $HOME.

The v1 patch also neglects the matter of documentation. I think
the simplest and most transparent thing to do is just to explicitly
mention $HOME everyplace we talk about files that are sought there,
in place of our current convention to write "~". (I'm too lazy
to go digging in the git history, but I have a feeling that this is
undoing somebody's intentional change from a long time back.)

BTW, not directly impacted by this patch but adjacent to it,
I noted that on Windows psql's \cd defaults to changing to "/".
That seems a bit surprising, and we definitely fail to document it.
I settled for noting it in the documentation, but should we make
it do something else?

PFA v2 patch.

regards, tom lane

Attachments:

v2-0001-use-HOME-for-home-directory.patchtext/x-diff; charset=us-ascii; name=v2-0001-use-HOME-for-home-directory.patchDownload+42-52
#11Anders Kaseorg
andersk@mit.edu
In reply to: Tom Lane (#10)
Re: [PATCH] Prefer getenv("HOME") to find the UNIX home directory

On 1/9/22 10:59, Tom Lane wrote:

Given the POSIX requirements, it's basically impossible to believe
that there are interesting cases where $HOME isn't set. Thus, it
seems to me that keeping the getpwuid calls will just mean carrying
untestable dead code, so we should simplify matters by ripping
those out and *only* consulting $HOME.

While POSIX requires that the login program put you in a conforming
environment, nothing stops the user from building a non-conforming
environment, such as with ‘env -i’. One could argue that such a user
deserves whatever broken behavior they might get. But to me it seems
prudent to continue working there if it worked before.

The v1 patch also neglects the matter of documentation. I think
the simplest and most transparent thing to do is just to explicitly
mention $HOME everyplace we talk about files that are sought there,
in place of our current convention to write "~". (I'm too lazy
to go digging in the git history, but I have a feeling that this is
undoing somebody's intentional change from a long time back.)

The reason I didn’t change the documentation is that this is already
what “~” is supposed to mean according to POSIX and common
implementations. See previous discussion:

/messages/by-id/1634252654444.90107@mit.edu
/messages/by-id/d452fd57-8c34-0a94-79c1-4498eb4ffbdc@mit.edu

I consider my patch a bug fix that implements the behavior one would
already expect from the existing documentation.

Therefore, I still prefer my v1 patch on both counts. I am willing to
be overruled if you still disagree, but I wanted to explain my reasoning.

Anders

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Anders Kaseorg (#11)
Re: [PATCH] Prefer getenv("HOME") to find the UNIX home directory

Anders Kaseorg <andersk@mit.edu> writes:

On 1/9/22 10:59, Tom Lane wrote:

Given the POSIX requirements, it's basically impossible to believe
that there are interesting cases where $HOME isn't set. Thus, it
seems to me that keeping the getpwuid calls will just mean carrying
untestable dead code, so we should simplify matters by ripping
those out and *only* consulting $HOME.

While POSIX requires that the login program put you in a conforming
environment, nothing stops the user from building a non-conforming
environment, such as with ‘env -i’. One could argue that such a user
deserves whatever broken behavior they might get. But to me it seems
prudent to continue working there if it worked before.

The only case that the v1 patch helps such a user for is if they
unset HOME or set it precisely to ''. If they set it to anything
else, it's still broken from their perspective. So I do not find
that that argument holds water.

Moreover, ISTM that the only plausible use-case for unsetting HOME
is to prevent programs from finding stuff in your home directory.
What would be the point otherwise? So it's pretty hard to envision
a case where somebody is actually using, and happy with, the
behavior you argue we ought to keep.

The v1 patch also neglects the matter of documentation.

The reason I didn’t change the documentation is that this is already
what “~” is supposed to mean according to POSIX and common
implementations.

The point here is precisely that we're changing what *we* think ~
means. I don't think we can just leave the docs unchanged.

regards, tom lane

#13Anders Kaseorg
andersk@mit.edu
In reply to: Tom Lane (#12)
Re: [PATCH] Prefer getenv("HOME") to find the UNIX home directory

On 1/9/22 13:04, Tom Lane wrote:

The only case that the v1 patch helps such a user for is if they
unset HOME or set it precisely to ''. If they set it to anything
else, it's still broken from their perspective. So I do not find
that that argument holds water.

Moreover, ISTM that the only plausible use-case for unsetting HOME
is to prevent programs from finding stuff in your home directory.
What would be the point otherwise? So it's pretty hard to envision
a case where somebody is actually using, and happy with, the
behavior you argue we ought to keep.

Obviously a user who intentionally breaks their environment should
expect problems. But what I’m saying is that a user could have written
a script that unsets HOME by *accident* while intending to clear *other*
things out of the environment. They might have developed it by starting
with an empty environment and adding back the minimal set of variables
they needed to get something to work. Since most programs (including
most libcs and shells) do in fact fall back to getpwuid when HOME is
unset, they may not have noticed an unset HOME as a problem. Unsetting
HOME does not, in practice, prevent most programs from finding stuff in
your home directory.

Anders

#14Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#10)
Re: [PATCH] Prefer getenv("HOME") to find the UNIX home directory

On Sun, Jan 09, 2022 at 01:59:02PM -0500, Tom Lane wrote:

Given the POSIX requirements, it's basically impossible to believe
that there are interesting cases where $HOME isn't set.

I've run into this before - children of init may not have HOME set.

It's easy enough to add it if it's needed, but should probably be called out in
the release notes.

--
Justin

#15Christoph Moench-Tegeder
cmt@burggraben.net
In reply to: Tom Lane (#10)
Re: [PATCH] Prefer getenv("HOME") to find the UNIX home directory

## Tom Lane (tgl@sss.pgh.pa.us):

Given the POSIX requirements, it's basically impossible to believe
that there are interesting cases where $HOME isn't set.

When I look at a random Debian with the usual PGDG packages, the
postmaster process (and every backend) has a rather minimal environment
without HOME. When I remember the code correctly, walreceiver uses
the functions from fe-connect.c and may need to find the service file,
a password file or certificates. If I'm correct with that, requiring
HOME to be set would be a significant change for existing "normal"
installations.
What about containers and similar "reduced" environments?

Regards,
Christoph

--
Spare Space

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christoph Moench-Tegeder (#15)
Re: [PATCH] Prefer getenv("HOME") to find the UNIX home directory

Christoph Moench-Tegeder <cmt@burggraben.net> writes:

## Tom Lane (tgl@sss.pgh.pa.us):

Given the POSIX requirements, it's basically impossible to believe
that there are interesting cases where $HOME isn't set.

When I look at a random Debian with the usual PGDG packages, the
postmaster process (and every backend) has a rather minimal environment
without HOME. When I remember the code correctly, walreceiver uses
the functions from fe-connect.c and may need to find the service file,
a password file or certificates. If I'm correct with that, requiring
HOME to be set would be a significant change for existing "normal"
installations.
What about containers and similar "reduced" environments?

Isn't that a flat out violation of POSIX 8.3 Other Environment Variables?

HOME
The system shall initialize this variable at the time of login to
be a pathname of the user's home directory. See <pwd.h>.

To claim it's not, you have to claim these programs aren't logged in,
in which case where did they get any privileges from?

regards, tom lane

#17Christoph Moench-Tegeder
cmt@burggraben.net
In reply to: Tom Lane (#16)
Re: [PATCH] Prefer getenv("HOME") to find the UNIX home directory

## Tom Lane (tgl@sss.pgh.pa.us):

Isn't that a flat out violation of POSIX 8.3 Other Environment Variables?

HOME
The system shall initialize this variable at the time of login to
be a pathname of the user's home directory. See <pwd.h>.

To claim it's not, you have to claim these programs aren't logged in,
in which case where did they get any privileges from?

After poking around across some Linuxes, it looks like people silently
agreed that "services" are not logged-in users: among the daemons,
having HOME set (as observed in /proc/*/environ) is an exception,
not the norm. I'm not sure if that's a "new" thing with systemd,
I don't have a linux with pure SysV-init available (but I guess those
are rare animals anyways).

Regards,
Christoph

--
Spare Space

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christoph Moench-Tegeder (#17)
Re: [PATCH] Prefer getenv("HOME") to find the UNIX home directory

Christoph Moench-Tegeder <cmt@burggraben.net> writes:

## Tom Lane (tgl@sss.pgh.pa.us):

Isn't that a flat out violation of POSIX 8.3 Other Environment Variables?

After poking around across some Linuxes, it looks like people silently
agreed that "services" are not logged-in users: among the daemons,
having HOME set (as observed in /proc/*/environ) is an exception,
not the norm.

Meh. I guess there's not much point in arguing with facts on the
ground. Anders' proposed behavior seems like the way to go then,
at least so far as libpq is concerned. (I remain skeptical that
psql would be run in such an environment, but there's no value
in making it act different from libpq.)

regards, tom lane

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#18)
Re: [PATCH] Prefer getenv("HOME") to find the UNIX home directory

I wrote:

Meh. I guess there's not much point in arguing with facts on the
ground. Anders' proposed behavior seems like the way to go then,
at least so far as libpq is concerned.

So I pushed that, but while working on it I grew quite annoyed
at the messy API exhibited by src/port/thread.c, particularly
at how declaring its functions in port.h requires #including
<netdb.h> and <pwd.h> there. That means those headers are
read by every compile in our tree, though only a tiny number
of modules actually need either. So here are a couple of
follow-on patches to improve that situation.

For pqGethostbyname, there is no consumer other than
src/port/getaddrinfo.c, which makes it even sillier because that
file isn't even compiled on a large majority of platforms, making
pqGethostbyname dead code that we nonetheless build everywhere.
So 0001 attached just moves that function into getaddrinfo.c.

For pqGetpwuid, the best solution seemed to be to add a
less platform-dependent API layer, which I did in 0002
attached. Perhaps someone would object to the API details
I chose, but by and large this seems like an improvement
that reduces the amount of code duplication in the callers.

Thoughts?

regards, tom lane

Attachments:

0001-make-pqGethostbyname-static.patchtext/x-diff; charset=us-ascii; name=0001-make-pqGethostbyname-static.patchDownload+48-45
0002-simplify-pqGetpwuid-API.patchtext/x-diff; charset=us-ascii; name=0002-simplify-pqGetpwuid-API.patchDownload+120-73