logical decoding : exceeded maxAllocatedDescs for .spill files

Started by Amit Khandekar — 116 messages — pgsql-hackers
#1Amit Khandekar
amitdkhan.pg@gmail.com

Hi,

I reproduced the error "exceeded maxAllocatedDescs (492) while trying
to open file ...", which was also discussed about in the thread [1].
This issue is similar but not exactly the same as [1]. In [1], the
file for which this error used to show up was
"pg_logical/mappings/map...." , while here it's the .spill file. And
here the issue, in short, seems to be: the .spill file does not get
closed there and then, unlike in [1] where there was a file descriptor
leak.

I could reproduce it using a transaction containing a long series of
sub-transactions (possibly could be reproduced without
sub-transactions, but looking at the code I could come up with this
script using sub-transactions easily) :

create table tab(id int);

-- Function that does huge changes in a single transaction
create or replace function f(id int) returns int as
$$
begin
    -- Iterate this more than 492 times (max transient file
    -- descriptors PG would typically allow).
    -- This will create that many sub-transactions due to presence of
    -- the exception block.
    FOR i IN 1..600 LOOP

        BEGIN
            -- Iterate more than 4096 times (so that changes spill to
            -- disk: max_changes_in_memory)
            FOR j IN 1..5000 LOOP
                insert into tab values (1);
            END LOOP;
        EXCEPTION
            when division_by_zero then perform 'dummy';
        END;

    END LOOP;

    return id;
end $$ language plpgsql;

SELECT * FROM pg_create_logical_replication_slot('logical', 'test_decoding');

begin;
select f(1); -- Do huge changes in a single transaction
commit;

\! pg_recvlogical -d postgres --slot=logical --verbose --start -f -

pg_recvlogical: starting log streaming at 0/0 (slot logical)
pg_recvlogical: streaming initiated
pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot logical)
BEGIN 1869
pg_recvlogical: confirming write up to 0/1B6D6E38, flush to 0/1B6D6E38
(slot logical)
pg_recvlogical: error: unexpected termination of replication stream:
ERROR: exceeded maxAllocatedDescs (492) while trying to open file
"pg_replslot/logical/xid-2362-lsn-0-24000000.spill"
pg_recvlogical: disconnected; waiting 5 seconds to try again

Looking at the code, what might be happening is,
ReorderBufferIterTXNInit()=>ReorderBufferRestoreChanges() opens the
files, but leaves them open if end of file is not reached. Eventually
if end of file is reached, it gets closed. The function returns back
without closing the file descriptor if reorder buffer changes being
restored are more than max_changes_in_memory. Probably later on, the
rest of the changes get restored in another
ReorderBufferRestoreChanges() call. But meanwhile, if there are a lot
of such files opened, we can run out of the max files that PG decides
to keep open (it has some logic that takes into account system files
allowed to be open, and already opened).

Offhand, what I am thinking is, we need to close the file descriptor
before returning from ReorderBufferRestoreChanges(), and keep track of
the file offset and file path, so that next time we can resume reading
from there.
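
A minimal standalone sketch of this close-and-resume idea, using plain
POSIX calls rather than the actual reorderbuffer structures
(SpillFileState and restore_some_changes are invented names, purely for
illustration):

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical per-spill-file restore state: path plus resume offset. */
typedef struct SpillFileState
{
    char  path[64];
    off_t next_offset;          /* where the next restore resumes */
} SpillFileState;

/*
 * Read up to nbytes starting at the remembered offset, then close the
 * fd immediately, so no descriptors stay open between restore calls.
 */
static ssize_t
restore_some_changes(SpillFileState *st, char *buf, size_t nbytes)
{
    int     fd = open(st->path, O_RDONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = pread(fd, buf, nbytes, st->next_offset);
    if (n > 0)
        st->next_offset += n;   /* remember where to resume */
    close(fd);                  /* fd released before returning */
    return n;
}
```

The point is only that the (path, offset) pair is enough state to pick
up where the previous call left off, at the cost of an extra
open()/close() per resumed batch.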

Comments ?

[1]: /messages/by-id/738a590a-2ce5-9394-2bef-7b1caad89b37@2ndquadrant.com

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Khandekar (#1)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On 2019-Sep-11, Amit Khandekar wrote:

I reproduced the error "exceeded maxAllocatedDescs (492) while trying
to open file ...", which was also discussed about in the thread [1].
This issue is similar but not exactly the same as [1]. In [1], the
file for which this error used to show up was
"pg_logical/mappings/map...." , while here it's the .spill file. And
here the issue , in short, seems to be : The .spill file does not get
closed there and then, unlike in [1] where there was a file descriptor
leak.

Uh-oh :-( Thanks for the reproducer -- I confirm it breaks things.

Looking at the code, what might be happening is,
ReorderBufferIterTXNInit()=>ReorderBufferRestoreChanges() opens the
files, but leaves them open if end of file is not reached. Eventually
if end of file is reached, it gets closed. The function returns back
without closing the file descriptor if reorder buffer changes being
restored are more than max_changes_in_memory. Probably later on, the
rest of the changes get restored in another
ReorderBufferRestoreChanges() call. But meanwhile, if there are a lot
of such files opened, we can run out of the max files that PG decides
to keep open (it has some logic that takes into account system files
allowed to be open, and already opened).

Makes sense.

Offhand, what I am thinking is, we need to close the file descriptor
before returning from ReorderBufferRestoreChanges(), and keep track of
the file offset and file path, so that next time we can resume reading
from there.

I think doing this all the time would make restore very slow -- there's
a reason we keep the files open, after all. It would be better if we
can keep the descriptors open as much as possible, and only close them
if there's trouble. I was under the impression that using
OpenTransientFile was already taking care of that, but that's evidently
not the case.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alvaro Herrera (#2)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera wrote:

On 2019-Sep-11, Amit Khandekar wrote:

I reproduced the error "exceeded maxAllocatedDescs (492) while trying
to open file ...", which was also discussed about in the thread [1].
This issue is similar but not exactly the same as [1]. In [1], the
file for which this error used to show up was
"pg_logical/mappings/map...." , while here it's the .spill file. And
here the issue , in short, seems to be : The .spill file does not get
closed there and then, unlike in [1] where there was a file descriptor
leak.

Uh-oh :-( Thanks for the reproducer -- I confirm it breaks things.

Looking at the code, what might be happening is,
ReorderBufferIterTXNInit()=>ReorderBufferRestoreChanges() opens the
files, but leaves them open if end of file is not reached. Eventually
if end of file is reached, it gets closed. The function returns back
without closing the file descriptor if reorder buffer changes being
restored are more than max_changes_in_memory. Probably later on, the
rest of the changes get restored in another
ReorderBufferRestoreChanges() call. But meanwhile, if there are a lot
of such files opened, we can run out of the max files that PG decides
to keep open (it has some logic that takes into account system files
allowed to be open, and already opened).

Makes sense.

Offhand, what I am thinking is, we need to close the file descriptor
before returning from ReorderBufferRestoreChanges(), and keep track of
the file offset and file path, so that next time we can resume reading
from there.

I think doing this all the time would make restore very slow -- there's a
reason we keep the files open, after all.

How much slower? It certainly will have a hit, but maybe it's negligible
compared to all the other stuff happening in this code?

It would be better if we can keep the descriptors open as much as
possible, and only close them if there's trouble. I was under the
impression that using OpenTransientFile was already taking care of that,
but that's evidently not the case.

I don't see how the current API could do that transparently - it does
track the files, but the user only gets a file descriptor. With just a
file descriptor, how could the code know to do reopen/seek when it's going
just through the regular fopen/fclose?

Anyway, I agree we need to do something, to fix this corner case (many
serialized in-progress transactions). ISTM we have two options - either do
something in the context of reorderbuffer.c, or extend the transient file
API somehow. I'd say the second option is the right thing going forward,
because it does allow doing it transparently and without leaking details
about maxAllocatedDescs etc. There are two issues, though - it does
require changes / extensions to the API, and it's not backpatchable.

So maybe we should start with the localized fix in reorderbuffer, and I
agree tracking offset seems reasonable.

As a sidenote - in the other thread about streaming, one of the patches
does change how we log subxact assignments. In the end, this allows using
just a single file for the top-level transaction, instead of having one
file per subxact. That would also solve this.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#3)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On 2019-Sep-12, Tomas Vondra wrote:

On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera wrote:

On 2019-Sep-11, Amit Khandekar wrote:

I think doing this all the time would make restore very slow -- there's a
reason we keep the files open, after all.

How much slower? It certainly will have a hit, but maybe it's negligible
compared to all the other stuff happening in this code?

I dunno -- that's a half-assed guess based on there being many more
syscalls, and on the fact that the API is how it is in the first place.
(Andres did a lot of perf benchmarking and tweaking back when he was
writing this ... I just point out that I have a colleague that had to
invent *a lot* of new MemCxt infrastructure in order to make some of
Andres' perf optimizations cleaner, just as a semi-related data point.
Anyway, I digress.)

Anyway, such a fix would pessimize all cases, including every single
case that works today (which evidently is almost every single usage of
this feature, since we hadn't heard of this problem until yesterday), in
order to solve a problem that you can only find in very rare ones.

Another point of view is that we should make it work first, then make it
fast. But the point remains that it works fine and fast for 99.99% of
cases.

It would be better if we can keep the descriptors open as much as
possible, and only close them if there's trouble. I was under the
impression that using OpenTransientFile was already taking care of that,
but that's evidently not the case.

I don't see how the current API could do that transparently - it does
track the files, but the user only gets a file descriptor. With just a
file descriptor, how could the code know to do reopen/seek when it's going
just through the regular fopen/fclose?

Yeah, I don't know what was in Amit's mind, but it seemed obvious to me
that such a fix required changing that API so that a seekpos is kept
together with the fd. ReorderBufferRestoreChange is static in
reorderbuffer.c so it's not a problem to change its ABI.

I agree with trying to get a reorderbuffer-localized back-patchable fix
first, then see how to improve from that.

As a sidenote - in the other thread about streaming, one of the patches
does change how we log subxact assignments. In the end, this allows using
just a single file for the top-level transaction, instead of having one
file per subxact. That would also solve this.

:-( While I would love to get that patch done and get rid of the issue,
it's not a backpatchable fix either. ... however, it does mean we maybe
can get away with the reorderbuffer.c-local fix, and then just use your
streaming stuff to get rid of the perf problem forever?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#3)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Thu, Sep 12, 2019 at 5:31 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

I don't see how the current API could do that transparently - it does
track the files, but the user only gets a file descriptor. With just a
file descriptor, how could the code know to do reopen/seek when it's going
just through the regular fopen/fclose?

Anyway, I agree we need to do something, to fix this corner case (many
serialized in-progress transactions). ISTM we have two options - either do
something in the context of reorderbuffer.c, or extend the transient file
API somehow. I'd say the second option is the right thing going forward,
because it does allow doing it transparently and without leaking details
about maxAllocatedDescs etc. There are two issues, though - it does
require changes / extensions to the API, and it's not backpatchable.

So maybe we should start with the localized fix in reorderbuffer, and I
agree tracking offset seems reasonable.

We've already got code that knows how to track this sort of thing.
You just need to go through the File abstraction (PathNameOpenFile or
PathNameOpenFilePerm or OpenTemporaryFile) rather than using the
functions that deal directly with fds (OpenTransientFile,
BasicOpenFile, etc.). It seems like it would be better to reuse the
existing VFD layer than to invent a whole new one specific to logical
replication.
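
To make the fd-pooling principle concrete, here is a toy, self-contained
illustration (VfdOpen/VfdRead and the eviction policy are invented for
the sketch; the real VFD layer in fd.c is far more elaborate, and this
toy additionally tracks a per-file position, which it needs to make
reopening transparent):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_VFDS     8      /* virtual handles callers can hold   */
#define MAX_REAL_FDS 2      /* deliberately tiny kernel-fd budget */

typedef struct Vfd
{
    char  path[64];
    int   fd;               /* real descriptor, or -1 if closed   */
    off_t pos;              /* caller-visible file position       */
} Vfd;

static Vfd vfds[MAX_VFDS];
static int nopen;           /* real fds currently open            */

/* Close some other virtual fd's real descriptor to stay in budget. */
static void
release_one(int keep)
{
    for (int i = 0; i < MAX_VFDS; i++)
        if (i != keep && vfds[i].path[0] != '\0' && vfds[i].fd >= 0)
        {
            close(vfds[i].fd);
            vfds[i].fd = -1;    /* pos is retained, so we can resume */
            nopen--;
            return;
        }
}

/* Ensure slot v has a real fd, evicting and reopening as needed. */
static int
ensure_open(int v)
{
    if (vfds[v].fd >= 0)
        return vfds[v].fd;
    if (nopen >= MAX_REAL_FDS)
        release_one(v);
    vfds[v].fd = open(vfds[v].path, O_RDONLY);
    if (vfds[v].fd >= 0)
        nopen++;
    return vfds[v].fd;
}

static int
VfdOpen(const char *path)
{
    for (int v = 0; v < MAX_VFDS; v++)
        if (vfds[v].path[0] == '\0')
        {
            snprintf(vfds[v].path, sizeof(vfds[v].path), "%s", path);
            vfds[v].fd = -1;
            vfds[v].pos = 0;
            return v;
        }
    return -1;
}

static ssize_t
VfdRead(int v, char *buf, size_t n)
{
    if (ensure_open(v) < 0)
        return -1;
    ssize_t got = pread(vfds[v].fd, buf, n, vfds[v].pos);
    if (got > 0)
        vfds[v].pos += got;
    return got;
}
```

Callers hold as many virtual handles as they like; the layer transparently
keeps the number of real kernel fds within budget.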

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#5)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Hi,

On 2019-09-12 09:41:02 -0400, Robert Haas wrote:

On Thu, Sep 12, 2019 at 5:31 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

I don't see how the current API could do that transparently - it does
track the files, but the user only gets a file descriptor. With just a
file descriptor, how could the code know to do reopen/seek when it's going
just through the regular fopen/fclose?

Anyway, I agree we need to do something, to fix this corner case (many
serialized in-progress transactions). ISTM we have two options - either do
something in the context of reorderbuffer.c, or extend the transient file
API somehow. I'd say the second option is the right thing going forward,
because it does allow doing it transparently and without leaking details
about maxAllocatedDescs etc. There are two issues, though - it does
require changes / extensions to the API, and it's not backpatchable.

So maybe we should start with the localized fix in reorderbuffer, and I
agree tracking offset seems reasonable.

We've already got code that knows how to track this sort of thing.
You just need to go through the File abstraction (PathNameOpenFile or
PathNameOpenFilePerm or OpenTemporaryFile) rather than using the
functions that deal directly with fds (OpenTransientFile,
BasicOpenFile, etc.). It seems like it would be better to reuse the
existing VFD layer than to invent a whole new one specific to logical
replication.

Yea, I agree that that is the right fix.

Greetings,

Andres Freund

#7Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#4)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Hi,

On 2019-09-12 09:57:55 -0300, Alvaro Herrera wrote:

On 2019-Sep-12, Tomas Vondra wrote:

On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera wrote:

On 2019-Sep-11, Amit Khandekar wrote:

I think doing this all the time would make restore very slow -- there's a
reason we keep the files open, after all.

How much slower? It certainly will have a hit, but maybe it's negligible
compared to all the other stuff happening in this code?

I'd expect it to be significant.

As a sidenote - in the other thread about streaming, one of the patches
does change how we log subxact assignments. In the end, this allows using
just a single file for the top-level transaction, instead of having one
file per subxact. That would also solve this.

Uhm, how is rollback to savepoint going to be handled in that case? I
don't think it's great to just retain space for all rolled back
subtransactions.

Greetings,

Andres Freund

#8Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#7)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Thu, Sep 12, 2019 at 11:34:01AM -0700, Andres Freund wrote:

Hi,

On 2019-09-12 09:57:55 -0300, Alvaro Herrera wrote:

On 2019-Sep-12, Tomas Vondra wrote:

On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera wrote:

On 2019-Sep-11, Amit Khandekar wrote:

I think doing this all the time would make restore very slow -- there's a
reason we keep the files open, after all.

How much slower? It certainly will have a hit, but maybe it's negligible
compared to all the other stuff happening in this code?

I'd expect it to be significant.

As a sidenote - in the other thread about streaming, one of the patches
does change how we log subxact assignments. In the end, this allows using
just a single file for the top-level transaction, instead of having one
file per subxact. That would also solve this.

Uhm, how is rollback to savepoint going to be handled in that case? I
don't think it's great to just retain space for all rolled back
subtransactions.

The code can just do ftruncate() to the position of the subtransaction.
That's what the patch [1] does.
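
For illustration, the savepoint bookkeeping can be as simple as
remembering the single spill file's size at SAVEPOINT time
(spill_mark_savepoint/spill_rollback_to are made-up names, not taken
from the patch):

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* At SAVEPOINT: capture the spill file's current end-of-file. */
static off_t
spill_mark_savepoint(int fd)
{
    return lseek(fd, 0, SEEK_END);
}

/* At ROLLBACK TO SAVEPOINT: throw away everything after the mark. */
static int
spill_rollback_to(int fd, off_t mark)
{
    return ftruncate(fd, mark);
}
```

So rolled-back subxact changes don't permanently retain space: the
truncate reclaims them, and the next subxact's changes simply overwrite
from the mark.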

[1]: https://commitfest.postgresql.org/24/1927/

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Robert Haas (#5)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Thu, 12 Sep 2019 at 19:11, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Sep 12, 2019 at 5:31 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

I don't see how the current API could do that transparently - it does
track the files, but the user only gets a file descriptor. With just a
file descriptor, how could the code know to do reopen/seek when it's going
just through the regular fopen/fclose?

Anyway, I agree we need to do something, to fix this corner case (many
serialized in-progress transactions). ISTM we have two options - either do
something in the context of reorderbuffer.c, or extend the transient file
API somehow. I'd say the second option is the right thing going forward,
because it does allow doing it transparently and without leaking details
about maxAllocatedDescs etc. There are two issues, though - it does
require changes / extensions to the API, and it's not backpatchable.

So maybe we should start with the localized fix in reorderbuffer, and I
agree tracking offset seems reasonable.

We've already got code that knows how to track this sort of thing.

You mean tracking excess kernel fds right ? Yeah, we can use VFDs so
that excess fds are automatically closed. But Alvaro seems to be
talking in context of tracking of file seek position. VFD does not
have a mechanism to track file offsets if one of the vfd cached file
is closed and reopened. Robert, are you suggesting to add this
capability to VFD ? I agree that we could do it, but for
back-patching, offhand I couldn't think of a simpler way.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Khandekar (#9)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Amit Khandekar <amitdkhan.pg@gmail.com> writes:

You mean tracking excess kernel fds right ? Yeah, we can use VFDs so
that excess fds are automatically closed. But Alvaro seems to be
talking in context of tracking of file seek position. VFD does not
have a mechanism to track file offsets if one of the vfd cached file
is closed and reopened.

Hm. It used to, but somebody got rid of that on the theory that
we could use pread/pwrite instead. I'm inclined to think that that
was the right tradeoff, but it'd mean that getting logical decoding
to adhere to the VFD API requires extra work to track file position
on the caller side.

Again, though, the advice that's been given here is that we should
fix logical decoding to use the VFD API as it stands, not change
that API. I concur with that.

regards, tom lane

#11Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#10)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Khandekar <amitdkhan.pg@gmail.com> writes:

You mean tracking excess kernel fds right ? Yeah, we can use VFDs so
that excess fds are automatically closed. But Alvaro seems to be
talking in context of tracking of file seek position. VFD does not
have a mechanism to track file offsets if one of the vfd cached file
is closed and reopened.

Hm. It used to, but somebody got rid of that on the theory that
we could use pread/pwrite instead. I'm inclined to think that that
was the right tradeoff, but it'd mean that getting logical decoding
to adhere to the VFD API requires extra work to track file position
on the caller side.

Oops. I forgot that we'd removed that.

Again, though, the advice that's been given here is that we should
fix logical decoding to use the VFD API as it stands, not change
that API. I concur with that.

A reasonable position. So I guess logical decoding has to track the
file position itself, but perhaps use the VFD layer for managing FD
pooling.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Robert Haas (#11)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Fri, 13 Sep 2019 at 22:01, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Again, though, the advice that's been given here is that we should
fix logical decoding to use the VFD API as it stands, not change
that API. I concur with that.

A reasonable position. So I guess logical decoding has to track the
file position itself, but perhaps use the VFD layer for managing FD
pooling.

Yeah, something like the attached patch. I think this tracking of
offsets would have been cleaner if we add in-built support in VFD. But
yeah, for back branches at least, we need to handle it outside of VFD.
Or maybe we would add it if we find one more use-case.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

use_vfd_for_logrep.patch (application/octet-stream) +36 -18
#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Khandekar (#12)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Amit Khandekar <amitdkhan.pg@gmail.com> writes:

On Fri, 13 Sep 2019 at 22:01, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Again, though, the advice that's been given here is that we should
fix logical decoding to use the VFD API as it stands, not change
that API. I concur with that.

A reasonable position. So I guess logical decoding has to track the
file position itself, but perhaps use the VFD layer for managing FD
pooling.

Yeah, something like the attached patch. I think this tracking of
offsets would have been cleaner if we add in-built support in VFD. But
yeah, for back branches at least, we need to handle it outside of VFD.
Or maybe we would add it if we find one more use-case.

Again, we had that and removed it, for what seem to me to be solid
reasons. It adds cycles when we're forced to close/reopen a file,
and it also adds failure modes that we could do without (ie, failure
of either the ftell or the lseek, which are particularly nasty because
they shouldn't happen according to the VFD abstraction). I do not
think there is going to be any argument strong enough to make us
put it back, especially not for non-mainstream callers like logical
decoding.

regards, tom lane

#14Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#13)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Hi,

On 2019-09-14 14:34:21 -0400, Tom Lane wrote:

Amit Khandekar <amitdkhan.pg@gmail.com> writes:

On Fri, 13 Sep 2019 at 22:01, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Again, though, the advice that's been given here is that we should
fix logical decoding to use the VFD API as it stands, not change
that API. I concur with that.

A reasonable position. So I guess logical decoding has to track the
file position itself, but perhaps use the VFD layer for managing FD
pooling.

Yeah, something like the attached patch. I think this tracking of
offsets would have been cleaner if we add in-built support in VFD. But
yeah, for back branches at least, we need to handle it outside of VFD.
Or maybe we would add it if we find one more use-case.

Again, we had that and removed it, for what seem to me to be solid
reasons. It adds cycles when we're forced to close/reopen a file,
and it also adds failure modes that we could do without (ie, failure
of either the ftell or the lseek, which are particularly nasty because
they shouldn't happen according to the VFD abstraction). I do not
think there is going to be any argument strong enough to make us
put it back, especially not for non-mainstream callers like logical
decoding.

Yea, I think that's the right call. Avoiding kernel seeks is quite
worthwhile, and we shouldn't undo it just because of this usecase. And
that'll become more and more important performance-wise (and has already
done so, with all the intel fixes making syscalls much slower).

I could see an argument for adding a separate generic layer providing
position tracking ontop of the VFD abstraction however. Seems quite
possible that there's some other parts of the system that could benefit
from using VFDs rather than plain fds. And they'd probably also need the
positional tracking.

Greetings,

Andres Freund

#15Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Andres Freund (#14)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Tue, 17 Sep 2019 at 21:19, Andres Freund <andres@anarazel.de> wrote:

On 2019-09-14 14:34:21 -0400, Tom Lane wrote:

Amit Khandekar <amitdkhan.pg@gmail.com> writes:

Yeah, something like the attached patch. I think this tracking of
offsets would have been cleaner if we add in-built support in VFD. But
yeah, for back branches at least, we need to handle it outside of VFD.
Or maybe we would add it if we find one more use-case.

Again, we had that and removed it, for what seem to me to be solid
reasons. It adds cycles when we're forced to close/reopen a file,
and it also adds failure modes that we could do without (ie, failure
of either the ftell or the lseek, which are particularly nasty because
they shouldn't happen according to the VFD abstraction).

Ok. So you mean, when the caller would call FileRead() for sequential
reading, underneath VFD would do a pread(), but if pread() returns
error, the errno can belong to read() or it might as well belong to
lseek(). If it's due to lseek(), it's not expected from the caller
because for the caller it's just a sequential read. Yeah, makes sense.

I do not
think there is going to be any argument strong enough to make us
put it back, especially not for non-mainstream callers like logical
decoding.

Ok. Also, more about putting back is in the below comments ...

Yea, I think that's the right call. Avoiding kernel seeks is quite
worthwhile, and we shouldn't undo it just because of this usecase. And
that'll become more and more important performance-wise (and has already
done so, with all the intel fixes making syscalls much slower).

By the way, I was not thinking about adding back the read() and
lseek() calls. I was saying we continue to use the pread() call, so
it's just a single system call. FileReadAt(..., offset) would do
pread() with user-supplied offset, and FileRead() would do pread()
using internally tracked offset. So for the user, FileReadAt() is like
pread(), and FileRead() would be like read().
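
In other words, something like this (FileReadAt/FileRead here are just
a sketch of the proposed semantics, not actual fd.c functions; the
tracked offset lives with the caller):

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Both variants map onto a single pread() system call, so there is no
 * separate lseek() whose errno the caller would have to explain.
 */

/* Explicit-offset read: behaves like pread(). */
static ssize_t
FileReadAt(int fd, char *buf, size_t n, off_t off)
{
    return pread(fd, buf, n, off);
}

/* Sequential read: behaves like read(), using a caller-tracked offset. */
static ssize_t
FileRead(int fd, char *buf, size_t n, off_t *tracked_off)
{
    ssize_t got = pread(fd, buf, n, *tracked_off);

    if (got > 0)
        *tracked_off += got;    /* advance the tracked position */
    return got;
}
```

Note that FileReadAt never touches the tracked offset, so interleaving
the two variants on the same file stays well-defined.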

But I agree with Tom's objection about having to unnecessarily handle
lseek error codes.

I could see an argument for adding a separate generic layer providing
position tracking ontop of the VFD abstraction however. Seems quite
possible that there's some other parts of the system that could benefit
from using VFDs rather than plain fds. And they'd probably also need the
positional tracking.

Yeah, that also could be done.

Probably, for now at least, what everyone seems to agree is to take my
earlier attached patch forward.

I am going to see if I can add a TAP test for the patch, and will add
the patch into the commitfest soon.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#16Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Amit Khandekar (#15)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Wed, 18 Sep 2019 at 12:24, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

Probably, for now at least, what everyone seems to agree is to take my
earlier attached patch forward.

I am going to see if I can add a TAP test for the patch, and will add
the patch into the commitfest soon.

Attached is an updated patch v2. Has a new test scenario added in
contrib/test_decoding/sql/spill test, and some minor code cleanup.

Going to add this into Nov commitfest.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

use_vfd_for_logrep_v2.patch (application/x-patch) +73 -20
#17Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Khandekar (#16)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Thu, Oct 3, 2019 at 4:48 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

On Wed, 18 Sep 2019 at 12:24, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

Probably, for now at least, what everyone seems to agree is to take my
earlier attached patch forward.

I am going to see if I can add a TAP test for the patch, and will add
the patch into the commitfest soon.

Attached is an updated patch v2.

I see that you have made changes in ReorderBufferRestoreChanges to use
PathNameOpenFile, but not in ReorderBufferSerializeTXN. Is there a
reason for the same? In my test environment, with the test provided
by you, I got the error (reported in this thread) via
ReorderBufferSerializeTXN. See call stack below:

!errfinish(int dummy=0, ...) Line 441 C
!OpenTransientFilePerm(const char * fileName=0x012deeac, int
fileFlags=33033, unsigned short fileMode=384) Line 2272 + 0x57 bytes
C
!OpenTransientFile(const char * fileName=0x012deeac, int
fileFlags=33033) Line 2256 + 0x15 bytes C
!ReorderBufferSerializeTXN(ReorderBuffer * rb=0x01ee4d80,
ReorderBufferTXN * txn=0x1f9a6ce8) Line 2302 + 0x11 bytes C
!ReorderBufferIterTXNInit(ReorderBuffer * rb=0x01ee4d80,
ReorderBufferTXN * txn=0x01f08f80) Line 1044 + 0xd bytes C

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#18Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Amit Kapila (#17)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Mon, 18 Nov 2019 at 17:20, Amit Kapila <amit.kapila16@gmail.com> wrote:

I see that you have made changes in ReorderBufferRestoreChanges to use
PathNameOpenFile, but not in ReorderBufferSerializeTXN. Is there a
reason for the same? In my test environment, with the test provided
by you, I got the error (reported in this thread) via
ReorderBufferSerializeTXN.

You didn't get this error with the patch applied, did you ?

If you were debugging this without the patch applied, I suspect the
reason ReorderBufferSerializeTXN() => OpenTransientFile() generates
this error is that the max limit must have already been crossed
because of earlier calls to ReorderBufferRestoreChanges().

Note that in ReorderBufferSerializeTXN(), OpenTransientFile() is
sufficient because the code in that function makes sure the fd gets
closed within the function itself.

If you are getting this error even with the patch applied, then this
needs investigation.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#19Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#17)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Mon, Nov 18, 2019 at 5:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Oct 3, 2019 at 4:48 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

On Wed, 18 Sep 2019 at 12:24, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

Probably, for now at least, what everyone seems to agree is to take my
earlier attached patch forward.

I am going to see if I can add a TAP test for the patch, and will add
the patch into the commitfest soon.

Attached is an updated patch v2.

I see that you have made changes in ReorderBufferRestoreChanges to use
PathNameOpenFile, but not in ReorderBufferSerializeTXN. Is there a
reason for the same?

I have one more question regarding this patch. It seems to me that
the files opened via OpenTransientFile or OpenTemporaryFile are
automatically closed at transaction end (abort), but that doesn't seem
to be the case for files opened with PathNameOpenFile. See
AtEOXact_Files and AtEOSubXact_Files. So, now with the change
proposed by this patch, don't we need to deal with it in some other way?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#20Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Amit Kapila (#19)
Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Mon, 18 Nov 2019 at 17:52, Amit Kapila <amit.kapila16@gmail.com> wrote:

I have one more question regarding this patch. It seems to me that
the files opened via OpenTransientFile or OpenTemporaryFile are
automatically closed at transaction end (abort), but that doesn't seem
to be the case for files opened with PathNameOpenFile. See
AtEOXact_Files and AtEOSubXact_Files. So, now with the change
proposed by this patch, don't we need to deal with it in some other way?

For the APIs that use VFDs (like PathNameOpenFile), the opened files
are always recorded in the VfdCache array. So cleanup at
(sub)transaction end is not required, because the kernel fds get
closed dynamically in ReleaseLruFiles() whenever the number of open
fds reaches the max_safe_fds limit. So if a transaction aborts, the
fds might remain open, but they will get cleaned up whenever we
require more fds, through ReleaseLruFiles(). For files opened through
OpenTransientFile(), by contrast, the VfdCache is not involved, so
those do need transaction-end cleanup.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

[Messages #21–#116 are collapsed in this archive view. Later
participants in the thread include Amit Kapila, Amit Khandekar,
Thomas Munro, Juan José Santamaría Flecha, Alvaro Herrera, Noah Misch,
Tom Lane, Tomas Vondra, Kuntal Ghosh, Daniel Gustafsson, and
Andres Freund.]