BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()

Started by PanBianover 8 years ago10 messagesbugs
Jump to latest
#1PanBian
bianpan2016@163.com

The following bug has been logged on the website:

Bug reference: 14929
Logged by: Pan Bian
Email address: bianpan2016@163.com
PostgreSQL version: 10.1
Operating system: Linux
Description:

File: src/backend/access/transam/twophase.c
Function: restoreTwoPhaseData
Line: 1738

AllocateDir() will return a NULL pointer if it fails to open the specified
directory. However, in function restoreTwoPhaseData(), its return value is
not checked. This may result in a NULL pointer dereference when trying to
free it (see line 1759).

For your convenience, I copy and paste related codes as follows:

1732 void
1733 restoreTwoPhaseData(void)
1734 {
1735 DIR *cldir;
1736 struct dirent *clde;
1737
1738 cldir = AllocateDir(TWOPHASE_DIR);
1739 LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
1740 while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL)
1741 {
...
1758 LWLockRelease(TwoPhaseStateLock);
1759 FreeDir(cldir);
1760 }

Thank you!

Pan Bian

#2Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: PanBian (#1)
Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()

On 2017/11/27 18:31, bianpan2016@163.com wrote:

The following bug has been logged on the website:

Bug reference: 14929
Logged by: Pan Bian
Email address: bianpan2016@163.com
PostgreSQL version: 10.1
Operating system: Linux
Description:

File: src/backend/access/transam/twophase.c
Function: restoreTwoPhaseData
Line: 1738

AllocateDir() will return a NULL pointer if it fails to open the specified
directory. However, in function restoreTwoPhaseData(), its return value is
not checked. This may result in a NULL pointer dereference when trying to
free it (see line 1759).

For your convenience, I copy and paste related codes as follows:

1732 void
1733 restoreTwoPhaseData(void)
1734 {
1735 DIR *cldir;
1736 struct dirent *clde;
1737
1738 cldir = AllocateDir(TWOPHASE_DIR);
1739 LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
1740 while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL)
1741 {
...
1758 LWLockRelease(TwoPhaseStateLock);
1759 FreeDir(cldir);
1760 }

Thanks for the report.

It seems like a good idea to check cldir for NULL before freeing. Please
find attached a patch to implement the same.

Thanks,
Amit

Attachments:

AllocateDir-result-could-be-NULL.patchtext/plain; charset=UTF-8; name=AllocateDir-result-could-be-NULL.patchDownload+2-1
#3Michael Paquier
michael@paquier.xyz
In reply to: PanBian (#1)
Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()

On Mon, Nov 27, 2017 at 6:31 PM, <bianpan2016@163.com> wrote:

AllocateDir() will return a NULL pointer if it fails to open the specified
directory. However, in function restoreTwoPhaseData(), its return value is
not checked. This may result in a NULL pointer dereference when trying to
free it (see line 1759).

You are missing the fact that ReadDir goes through ReadDirExtended,
which drops an ERROR log if the folder allocated is NULL.
--
Michael

#4Michael Paquier
michael@paquier.xyz
In reply to: Amit Langote (#2)
Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()

On Mon, Nov 27, 2017 at 7:35 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

It seems like a good idea to check cldir for NULL before freeing. Please
find attached a patch to implement the same.

There are other places where this could be added like
CheckPointLogicalRewriteHeap() if you'd like to make the code more
consistent, but my opinion would be on the contrary to drop all the
checks after AllocateDir if ReadDir is directly used. Not worth
bothering at the end.
--
Michael

#5Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Michael Paquier (#3)
Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()

On 2017/11/27 19:53, Michael Paquier wrote:

On Mon, Nov 27, 2017 at 6:31 PM, <bianpan2016@163.com> wrote:

AllocateDir() will return a NULL pointer if it fails to open the specified
directory. However, in function restoreTwoPhaseData(), its return value is
not checked. This may result in a NULL pointer dereference when trying to
free it (see line 1759).

You are missing the fact that ReadDir goes through ReadDirExtended,
which drops an ERROR log if the folder allocated is NULL.

I noticed that too, but isn't possible that elevel might be such that we
end up returning to restoreTwoPhaseData() after all and hit the line in it
that will then dereference the NULL cldir? Maybe, that never happens
because, elevel is never less than ERROR in that code path?

Thanks,
Amit

#6Michael Paquier
michael@paquier.xyz
In reply to: Amit Langote (#5)
Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()

On Mon, Nov 27, 2017 at 8:09 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

On 2017/11/27 19:53, Michael Paquier wrote:

On Mon, Nov 27, 2017 at 6:31 PM, <bianpan2016@163.com> wrote:

AllocateDir() will return a NULL pointer if it fails to open the specified
directory. However, in function restoreTwoPhaseData(), its return value is
not checked. This may result in a NULL pointer dereference when trying to
free it (see line 1759).

You are missing the fact that ReadDir goes through ReadDirExtended,
which drops an ERROR log if the folder allocated is NULL.

I noticed that too, but isn't possible that elevel might be such that we
end up returning to restoreTwoPhaseData() after all and hit the line in it
that will then dereference the NULL cldir? Maybe, that never happens
because, elevel is never less than ERROR in that code path?

I don't quite follow. ReadDir is called at least once via
ReadDirExtended with elevel = ERROR. So if the input folder is NULL,
the call to ReadDir directly fails so you would not face a pointer
dereference. Something that can turn wrong though is the error message
generated if errno is changed in LWLockAcquire between the calls of
AllocateDir and ReadDir. For that the current code pattern is actually
incorrect.
--
Michael

#7PanBian
bianpan2016@163.com
In reply to: Michael Paquier (#3)
Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()

On Mon, Nov 27, 2017 at 07:53:30PM +0900, Michael Paquier wrote:

On Mon, Nov 27, 2017 at 6:31 PM, <bianpan2016@163.com> wrote:

AllocateDir() will return a NULL pointer if it fails to open the specified
directory. However, in function restoreTwoPhaseData(), its return value is
not checked. This may result in a NULL pointer dereference when trying to
free it (see line 1759).

You are missing the fact that ReadDir goes through ReadDirExtended,
which drops an ERROR log if the folder allocated is NULL.

You are right. Its my carelessness. ReadDir will not return back on a
NULL dir parameter. The code is bug free. Sorry for the trouble.

Thank you all,
Pan Bian

Show quoted text

--
Michael

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: PanBian (#7)
Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()

PanBian <bianpan2016@163.com> writes:

On Mon, Nov 27, 2017 at 07:53:30PM +0900, Michael Paquier wrote:

You are missing the fact that ReadDir goes through ReadDirExtended,
which drops an ERROR log if the folder allocated is NULL.

You are right. Its my carelessness. ReadDir will not return back on a
NULL dir parameter. The code is bug free. Sorry for the trouble.

There are some issues here, though:

1. twophase.c does this:

cldir = AllocateDir(TWOPHASE_DIR);
LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL)
{

which is flat out wrong because LWLockAcquire might well clobber errno.
I don't see any good reason why we couldn't just swap the order of
those two calls.

2. basebackup.c and some other places do things like

dir = AllocateDir("pg_wal");
if (!dir)
ereport(ERROR,
(errmsg("could not open directory \"%s\": %m", "pg_wal")));
while ((de = ReadDir(dir, "pg_wal")) != NULL)

Not only is this a waste of code, because the error message is no better
than what ReadDir would provide, but it's wrong because it omits
errcode_for_file_access(), causing the SQLSTATE to be reported as XX000.
There are other places that are even lazier and use elog(), failing
translatability as well as the errcode test.

There might be some other problems I missed in a quick scan.

So there's definitely room for a cleanup patch here, but the originally
proposed change isn't it.

regards, tom lane

#9Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#8)
Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()

On Tue, Nov 28, 2017 at 12:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

PanBian <bianpan2016@163.com> writes:

On Mon, Nov 27, 2017 at 07:53:30PM +0900, Michael Paquier wrote:

You are missing the fact that ReadDir goes through ReadDirExtended,
which drops an ERROR log if the folder allocated is NULL.

You are right. Its my carelessness. ReadDir will not return back on a
NULL dir parameter. The code is bug free. Sorry for the trouble.

There are some issues here, though:

1. twophase.c does this:

cldir = AllocateDir(TWOPHASE_DIR);
LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL)
{

which is flat out wrong because LWLockAcquire might well clobber errno.
I don't see any good reason why we couldn't just swap the order of
those two calls.

I have not checked if it actually updates errno or not, but relying on
the fact that it may do it sucks.

2. basebackup.c and some other places do things like

dir = AllocateDir("pg_wal");
if (!dir)
ereport(ERROR,
(errmsg("could not open directory \"%s\": %m", "pg_wal")));
while ((de = ReadDir(dir, "pg_wal")) != NULL)

Not only is this a waste of code, because the error message is no better
than what ReadDir would provide, but it's wrong because it omits
errcode_for_file_access(), causing the SQLSTATE to be reported as XX000.
There are other places that are even lazier and use elog(), failing
translatability as well as the errcode test.

I agree with using ereport() everywhere, a path may have been created
by initdb, but anything used after a base backup should be reported to
the user.

There might be some other problems I missed in a quick scan.

So there's definitely room for a cleanup patch here, but the originally
proposed change isn't it.

I have spotted more problems. In pg_available_extensions,
AllocateDir() does nothing in the event of an error but forgets to
reset errno. In perform_base_backup, CheckXLogRemoved() is called
without saving errno, so the error message generated after may be
wrong.

I think that this requires its own new thread with a more extended
analysis on -hackers to attract attention, this goes way beyond the
original complain about a pointer dereference. And there is a
collection of small issues. I'll try to look at that after I am done
with my CF duties except if someone beats me or volunteers for it.
--
Michael

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#9)
Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()

Michael Paquier <michael.paquier@gmail.com> writes:

On Tue, Nov 28, 2017 at 12:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

1. twophase.c does this:
cldir = AllocateDir(TWOPHASE_DIR);
LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL)
which is flat out wrong because LWLockAcquire might well clobber errno.

I have not checked if it actually updates errno or not, but relying on
the fact that it may do it sucks.

I think it might only change errno in some code paths, for example
LWLockAcquire -> wait needed -> PGSemaphoreLock -> sem_wait gets
interrupted -> EINTR. Which is the worst of all worlds, because
the report would usually be right and only occasionally mislead you.

There might be some other problems I missed in a quick scan.

I have spotted more problems.

Ugh.

I think that this requires its own new thread with a more extended
analysis on -hackers to attract attention,

Agreed.

One thing I was thinking about is that there are places such as
DeleteAllExportedSnapshotFiles and RelationCacheInitFileRemove that
want the error message to come out only at LOG level, not ERROR,
because they're running in the postmaster. But not only are they
issuing a mostly-duplicative ereport call, but they're really not
protecting themselves fully, because you'd probably want failures
inside ReadDir to be reported at LOG level as well. So what this
leads me to think is that we should export ReadDirExtended (currently
that's only accessible inside fd.c) and then the use-case would be
solvable with

dir = AllocateDir(path);
while ((dirent = ReadDirExtended(dir, path, LOG)) != NULL)
process dirent;
FreeDir(dir);

This line of thinking also says that FreeDir needs to be tweaked to
do nothing if dir == NULL, assuming that somebody should have logged
the directory-open failure already.

The other thing that really could use discussion, as you mentioned
upthread, is how useful are custom error messages for AllocateDir
failures. I'm not real sure for instance that
could not open write-ahead log directory "pg_wal": ...
is a useful improvement over a generic message
could not open directory "pg_wal": ...
Experts will know what pg_wal is anyway, and non-experts may not
gain much from being told it's a write-ahead log directory.

I'm prepared to listen to arguments that these custom messages
(or at least some of them) are worth keeping, but I think someone
ought to make that case, rather than letting them stand just because
they're there now.

regards, tom lane