logical decoding : exceeded maxAllocatedDescs for .spill files

Started by Amit Khandekarover 6 years ago116 messageshackers

amitdkhan.pg@gmail.com

over 6 years ago

Hi,

I reproduced the error "exceeded maxAllocatedDescs (492) while trying
to open file ...", which was also discussed about in the thread [1]/messages/by-id/738a590a-2ce5-9394-2bef-7b1caad89b37@2ndquadrant.com.
This issue is similar but not exactly the same as [1]/messages/by-id/738a590a-2ce5-9394-2bef-7b1caad89b37@2ndquadrant.com. In [1]/messages/by-id/738a590a-2ce5-9394-2bef-7b1caad89b37@2ndquadrant.com, the
file for which this error used to show up was
"pg_logical/mappings/map...." , while here it's the .spill file. And
here the issue , in short, seems to be : The .spill file does not get
closed there and then, unlike in [1]/messages/by-id/738a590a-2ce5-9394-2bef-7b1caad89b37@2ndquadrant.com where there was a file descriptor
leak.

I could reproduce it using a transaction containing a long series of
sub-transactions (possibly could be reproduced without
sub-transactions, but looking at the code I could come up with this
script using sub-transactions easily) :

create table tab(id int);

-- Function that does huge changes in a single transaction
create or replace function f(id int) returns int as
$$
begin
-- Iterate this more than 492 times (max transient file
descriptors PG would typically allow)
-- This will create that many sub-transactions due to presence of
exception block.
FOR i IN 1..600 LOOP

BEGIN
-- Iterate more than 4096 times (so that changes spill to
disk: max_changes_in_memory)
FOR j IN 1..5000 LOOP
insert into tab values (1);
END LOOP;
EXCEPTION
when division_by_zero then perform 'dummy';
END;

END LOOP;

return id;
end $$ language plpgsql;

SELECT * FROM pg_create_logical_replication_slot('logical', 'test_decoding');

begin;
select f(1); -- Do huge changes in a single transaction
commit;

\! pg_recvlogical -d postgres --slot=logical --verbose --start -f -

pg_recvlogical: starting log streaming at 0/0 (slot logical)
pg_recvlogical: streaming initiated
pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot logical)
BEGIN 1869
pg_recvlogical: confirming write up to 0/1B6D6E38, flush to 0/1B6D6E38
(slot logical)
pg_recvlogical: error: unexpected termination of replication stream:
ERROR: exceeded maxAllocatedDescs (492) while trying to open file
"pg_replslot/logical/xid-2362-lsn-0-24000000.spill"
pg_recvlogical: disconnected; waiting 5 seconds to try again

Looking at the code, what might be happening is,
ReorderBufferIterTXNInit()=>ReorderBufferRestoreChanges() opens the
files, but leaves them open if end of file is not reached. Eventually
if end of file is reached, it gets closed. The function returns back
without closing the file descriptor if reorder buffer changes being
restored are more than max_changes_in_memory. Probably later on, the
rest of the changes get restored in another
ReorderBufferRestoreChanges() call. But meanwhile, if there are a lot
of such files opened, we can run out of the max files that PG decides
to keep open (it has some logic that takes into account system files
allowed to be open, and already opened).

Offhand, what I am thinking is, we need to close the file descriptor
before returning from ReorderBufferRestoreChanges(), and keep track of
the file offset and file path, so that next time we can resume reading
from there.

Comments ?

[1]: /messages/by-id/738a590a-2ce5-9394-2bef-7b1caad89b37@2ndquadrant.com

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Alvaro Herrera

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Amit Khandekar (#1)

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On 2019-Sep-11, Amit Khandekar wrote:

I reproduced the error "exceeded maxAllocatedDescs (492) while trying
to open file ...", which was also discussed about in the thread [1].
This issue is similar but not exactly the same as [1]. In [1], the
file for which this error used to show up was
"pg_logical/mappings/map...." , while here it's the .spill file. And
here the issue , in short, seems to be : The .spill file does not get
closed there and then, unlike in [1] where there was a file descriptor
leak.

Uh-oh :-( Thanks for the reproducer -- I confirm it breaks things.

Looking at the code, what might be happening is,
ReorderBufferIterTXNInit()=>ReorderBufferRestoreChanges() opens the
files, but leaves them open if end of file is not reached. Eventually
if end of file is reached, it gets closed. The function returns back
without closing the file descriptor if reorder buffer changes being
restored are more than max_changes_in_memory. Probably later on, the
rest of the changes get restored in another
ReorderBufferRestoreChanges() call. But meanwhile, if there are a lot
of such files opened, we can run out of the max files that PG decides
to keep open (it has some logic that takes into account system files
allowed to be open, and already opened).

Makes sense.

Offhand, what I am thinking is, we need to close the file descriptor
before returning from ReorderBufferRestoreChanges(), and keep track of
the file offset and file path, so that next time we can resume reading
from there.

I think doing this all the time would make restore very slow -- there's
a reason we keep the files open, after all. It would be better if we
can keep the descriptors open as much as possible, and only close them
if there's trouble. I was under the impression that using
OpenTransientFile was already taking care of that, but that's evidently
not the case.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 6 years ago

In reply to: Alvaro Herrera (#2)

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant wrote:

On 2019-Sep-11, Amit Khandekar wrote:

I reproduced the error "exceeded maxAllocatedDescs (492) while trying
to open file ...", which was also discussed about in the thread [1].
This issue is similar but not exactly the same as [1]. In [1], the
file for which this error used to show up was
"pg_logical/mappings/map...." , while here it's the .spill file. And
here the issue , in short, seems to be : The .spill file does not get
closed there and then, unlike in [1] where there was a file descriptor
leak.

Uh-oh :-( Thanks for the reproducer -- I confirm it breaks things.

Looking at the code, what might be happening is,
ReorderBufferIterTXNInit()=>ReorderBufferRestoreChanges() opens the
files, but leaves them open if end of file is not reached. Eventually
if end of file is reached, it gets closed. The function returns back
without closing the file descriptor if reorder buffer changes being
restored are more than max_changes_in_memory. Probably later on, the
rest of the changes get restored in another
ReorderBufferRestoreChanges() call. But meanwhile, if there are a lot
of such files opened, we can run out of the max files that PG decides
to keep open (it has some logic that takes into account system files
allowed to be open, and already opened).

Makes sense.

Offhand, what I am thinking is, we need to close the file descriptor
before returning from ReorderBufferRestoreChanges(), and keep track of
the file offset and file path, so that next time we can resume reading
from there.

I think doing this all the time would make restore very slow -- there's a
reason we keep the files open, after all.

How much slower? It certainly will have a hit, but maybe it's negligible
compared to all the other stuff happening in this code?

It would be better if we can keep the descriptors open as much as
possible, and only close them if there's trouble. I was under the
impression that using OpenTransientFile was already taking care of that,
but that's evidently not the case.

I don't see how the current API could do that transparently - it does
track the files, but the user only gets a file descriptor. With just a
file descriptor, how could the code know to do reopen/seek when it's going
just through the regular fopen/fclose?

Anyway, I agree we need to do something, to fix this corner case (many
serialized in-progress transactions). ISTM we have two options - either do
something in the context of reorderbuffer.c, or extend the transient file
API somehow. I'd say the second option is the right thing going forward,
because it does allow doing it transparently and without leaking details
about maxAllocatedDescs etc. There are two issues, though - it does
require changes / extensions to the API, and it's not backpatchable.

So maybe we should start with the localized fix in reorderbuffer, and I
agree tracking offset seems reasonable.

As a sidenote - in the other thread about streaming, one of the patches
does change how we log subxact assignments. In the end, this allows using
just a single file for the top-level transaction, instead of having one
file per subxact. That would also solve this.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Alvaro Herrera

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Tomas Vondra (#3)

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On 2019-Sep-12, Tomas Vondra wrote:

On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant wrote:

On 2019-Sep-11, Amit Khandekar wrote:

I think doing this all the time would make restore very slow -- there's a
reason we keep the files open, after all.

How much slower? It certainly will have a hit, but maybe it's negligible
compared to all the other stuff happening in this code?

I dunno -- that's a half-assed guess based on there being many more
syscalls, and on the fact that the API is how it is in the first place.
(Andres did a lot of perf benchmarking and tweaking back when he was
writing this ... I just point out that I have a colleague that had to
invent *a lot* of new MemCxt infrastructure in order to make some of
Andres' perf optimizations cleaner, just as a semi-related data point.
Anyway, I digress.)

Anyway, such a fix would pessimize all cases, including every single
case that works today (which evidently is almost every single usage of
this feature, since we hadn't heard of this problem until yesterday), in
order to solve a problem that you can only find in very rare ones.

Another point of view is that we should make it work first, then make it
fast. But the point remains that it works fine and fast for 99.99% of
cases.

It would be better if we can keep the descriptors open as much as
possible, and only close them if there's trouble. I was under the
impression that using OpenTransientFile was already taking care of that,
but that's evidently not the case.

I don't see how the current API could do that transparently - it does
track the files, but the user only gets a file descriptor. With just a
file descriptor, how could the code know to do reopen/seek when it's going
just through the regular fopen/fclose?

Yeah, I don't know what was in Amit's mind, but it seemed obvious to me
that such a fix required changing that API so that a seekpos is kept
together with the fd. ReorderBufferRestoreChange is static in
reorderbuffer.c so it's not a problem to change its ABI.

I agree with trying to get a reorderbuffer-localized back-patchable fix
first, then we how to improve from that.

As a sidenote - in the other thread about streaming, one of the patches
does change how we log subxact assignments. In the end, this allows using
just a single file for the top-level transaction, instead of having one
file per subxact. That would also solve this.

:-( While I would love to get that patch done and get rid of the issue,
it's not a backpatchable fix either. ... however, it does mean we maybe
can get away with the reorderbuffer.c-local fix, and then just use your
streaming stuff to get rid of the perf problem forever?

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Robert Haas

robertmhaas@gmail.com

over 6 years ago

In reply to: Tomas Vondra (#3)

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Thu, Sep 12, 2019 at 5:31 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

I don't see how the current API could do that transparently - it does
track the files, but the user only gets a file descriptor. With just a
file descriptor, how could the code know to do reopen/seek when it's going
just through the regular fopen/fclose?

Anyway, I agree we need to do something, to fix this corner case (many
serialized in-progress transactions). ISTM we have two options - either do
something in the context of reorderbuffer.c, or extend the transient file
API somehow. I'd say the second option is the right thing going forward,
because it does allow doing it transparently and without leaking details
about maxAllocatedDescs etc. There are two issues, though - it does
require changes / extensions to the API, and it's not backpatchable.

So maybe we should start with the localized fix in reorderbuffer, and I
agree tracking offset seems reasonable.

We've already got code that knows how to track this sort of thing.
You just need to go through the File abstraction (PathNameOpenFile or
PathNameOpenFilePerm or OpenTemporaryFile) rather than using the
functions that deal directly with fds (OpenTransientFile,
BasicOpenFile, etc.). It seems like it would be better to reuse the
existing VFD layer than to invent a whole new one specific to logical
replication.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Andres Freund

andres@anarazel.de

over 6 years ago

In reply to: Robert Haas (#5)

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Hi,

On 2019-09-12 09:41:02 -0400, Robert Haas wrote:

On Thu, Sep 12, 2019 at 5:31 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

I don't see how the current API could do that transparently - it does
track the files, but the user only gets a file descriptor. With just a
file descriptor, how could the code know to do reopen/seek when it's going
just through the regular fopen/fclose?

Anyway, I agree we need to do something, to fix this corner case (many
serialized in-progress transactions). ISTM we have two options - either do
something in the context of reorderbuffer.c, or extend the transient file
API somehow. I'd say the second option is the right thing going forward,
because it does allow doing it transparently and without leaking details
about maxAllocatedDescs etc. There are two issues, though - it does
require changes / extensions to the API, and it's not backpatchable.

So maybe we should start with the localized fix in reorderbuffer, and I
agree tracking offset seems reasonable.

We've already got code that knows how to track this sort of thing.
You just need to go through the File abstraction (PathNameOpenFile or
PathNameOpenFilePerm or OpenTemporaryFile) rather than using the
functions that deal directly with fds (OpenTransientFile,
BasicOpenFile, etc.). It seems like it would be better to reuse the
existing VFD layer than to invent a whole new one specific to logical
replication.

Yea, I agree that that is the right fix.

Greetings,

Andres Freund

andres@anarazel.de

over 6 years ago

In reply to: Alvaro Herrera (#4)

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Hi,

On 2019-09-12 09:57:55 -0300, Alvaro Herrera wrote:

On 2019-Sep-12, Tomas Vondra wrote:

On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant wrote:

On 2019-Sep-11, Amit Khandekar wrote:

I think doing this all the time would make restore very slow -- there's a
reason we keep the files open, after all.

How much slower? It certainly will have a hit, but maybe it's negligible
compared to all the other stuff happening in this code?

I'd expect it to be significant.

As a sidenote - in the other thread about streaming, one of the patches
does change how we log subxact assignments. In the end, this allows using
just a single file for the top-level transaction, instead of having one
file per subxact. That would also solve this.

Uhm, how is rollback to savepoint going to be handled in that case? I
don't think it's great to just retain space for all rolled back
subtransactions.

Greetings,

Andres Freund

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 6 years ago

In reply to: Andres Freund (#7)

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Thu, Sep 12, 2019 at 11:34:01AM -0700, Andres Freund wrote:

Hi,

On 2019-09-12 09:57:55 -0300, Alvaro Herrera wrote:

On 2019-Sep-12, Tomas Vondra wrote:

On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant wrote:

On 2019-Sep-11, Amit Khandekar wrote:

I think doing this all the time would make restore very slow -- there's a
reason we keep the files open, after all.

How much slower? It certainly will have a hit, but maybe it's negligible
compared to all the other stuff happening in this code?

I'd expect it to be significant.

As a sidenote - in the other thread about streaming, one of the patches
does change how we log subxact assignments. In the end, this allows using
just a single file for the top-level transaction, instead of having one
file per subxact. That would also solve this.

Uhm, how is rollback to savepoint going to be handled in that case? I
don't think it's great to just retain space for all rolled back
subtransactions.

The code can just do ftruncate() to the position of the subtransaction.
That's what the patch [1]https://commitfest.postgresql.org/24/1927/ does.

[1]: https://commitfest.postgresql.org/24/1927/

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Amit Khandekar

amitdkhan.pg@gmail.com

over 6 years ago

In reply to: Robert Haas (#5)

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Thu, 12 Sep 2019 at 19:11, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Sep 12, 2019 at 5:31 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

I don't see how the current API could do that transparently - it does
track the files, but the user only gets a file descriptor. With just a
file descriptor, how could the code know to do reopen/seek when it's going
just through the regular fopen/fclose?

Anyway, I agree we need to do something, to fix this corner case (many
serialized in-progress transactions). ISTM we have two options - either do
something in the context of reorderbuffer.c, or extend the transient file
API somehow. I'd say the second option is the right thing going forward,
because it does allow doing it transparently and without leaking details
about maxAllocatedDescs etc. There are two issues, though - it does
require changes / extensions to the API, and it's not backpatchable.

So maybe we should start with the localized fix in reorderbuffer, and I
agree tracking offset seems reasonable.

We've already got code that knows how to track this sort of thing.

You mean tracking excess kernel fds right ? Yeah, we can use VFDs so
that excess fds are automatically closed. But Alvaro seems to be
talking in context of tracking of file seek position. VFD does not
have a mechanism to track file offsets if one of the vfd cached file
is closed and reopened. Robert, are you suggesting to add this
capability to VFD ? I agree that we could do it, but for
back-patching, offhand I couldn't think of a simpler way.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#10

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Amit Khandekar (#9)

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Amit Khandekar <amitdkhan.pg@gmail.com> writes:

You mean tracking excess kernel fds right ? Yeah, we can use VFDs so
that excess fds are automatically closed. But Alvaro seems to be
talking in context of tracking of file seek position. VFD does not
have a mechanism to track file offsets if one of the vfd cached file
is closed and reopened.

Hm. It used to, but somebody got rid of that on the theory that
we could use pread/pwrite instead. I'm inclined to think that that
was the right tradeoff, but it'd mean that getting logical decoding
to adhere to the VFD API requires extra work to track file position
on the caller side.

Again, though, the advice that's been given here is that we should
fix logical decoding to use the VFD API as it stands, not change
that API. I concur with that.

regards, tom lane

#11

Robert Haas

robertmhaas@gmail.com

over 6 years ago

In reply to: Tom Lane (#10)

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Khandekar <amitdkhan.pg@gmail.com> writes:

You mean tracking excess kernel fds right ? Yeah, we can use VFDs so
that excess fds are automatically closed. But Alvaro seems to be
talking in context of tracking of file seek position. VFD does not
have a mechanism to track file offsets if one of the vfd cached file
is closed and reopened.

Hm. It used to, but somebody got rid of that on the theory that
we could use pread/pwrite instead. I'm inclined to think that that
was the right tradeoff, but it'd mean that getting logical decoding
to adhere to the VFD API requires extra work to track file position
on the caller side.

Oops. I forgot that we'd removed that.

Again, though, the advice that's been given here is that we should
fix logical decoding to use the VFD API as it stands, not change
that API. I concur with that.

A reasonable position. So I guess logical decoding has to track the
file position itself, but perhaps use the VFD layer for managing FD
pooling.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12

Amit Khandekar

amitdkhan.pg@gmail.com

over 6 years ago

In reply to: Robert Haas (#11)

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Fri, 13 Sep 2019 at 22:01, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Again, though, the advice that's been given here is that we should
fix logical decoding to use the VFD API as it stands, not change
that API. I concur with that.

A reasonable position. So I guess logical decoding has to track the
file position itself, but perhaps use the VFD layer for managing FD
pooling.

Yeah, something like the attached patch. I think this tracking of
offsets would have been cleaner if we add in-built support in VFD. But
yeah, for bank branches at least, we need to handle it outside of VFD.
Or may be we would add it if we find one more use-case.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#13

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Amit Khandekar (#12)

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Amit Khandekar <amitdkhan.pg@gmail.com> writes:

On Fri, 13 Sep 2019 at 22:01, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Again, though, the advice that's been given here is that we should
fix logical decoding to use the VFD API as it stands, not change
that API. I concur with that.

A reasonable position. So I guess logical decoding has to track the
file position itself, but perhaps use the VFD layer for managing FD
pooling.

Yeah, something like the attached patch. I think this tracking of
offsets would have been cleaner if we add in-built support in VFD. But
yeah, for bank branches at least, we need to handle it outside of VFD.
Or may be we would add it if we find one more use-case.

Again, we had that and removed it, for what seem to me to be solid
reasons. It adds cycles when we're forced to close/reopen a file,
and it also adds failure modes that we could do without (ie, failure
of either the ftell or the lseek, which are particularly nasty because
they shouldn't happen according to the VFD abstraction). I do not
think there is going to be any argument strong enough to make us
put it back, especially not for non-mainstream callers like logical
decoding.

regards, tom lane

#14

Andres Freund

andres@anarazel.de

over 6 years ago

In reply to: Tom Lane (#13)

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Hi,

On 2019-09-14 14:34:21 -0400, Tom Lane wrote:

Amit Khandekar <amitdkhan.pg@gmail.com> writes:

On Fri, 13 Sep 2019 at 22:01, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Again, though, the advice that's been given here is that we should
fix logical decoding to use the VFD API as it stands, not change
that API. I concur with that.

A reasonable position. So I guess logical decoding has to track the
file position itself, but perhaps use the VFD layer for managing FD
pooling.

Yeah, something like the attached patch. I think this tracking of
offsets would have been cleaner if we add in-built support in VFD. But
yeah, for bank branches at least, we need to handle it outside of VFD.
Or may be we would add it if we find one more use-case.

Again, we had that and removed it, for what seem to me to be solid
reasons. It adds cycles when we're forced to close/reopen a file,
and it also adds failure modes that we could do without (ie, failure
of either the ftell or the lseek, which are particularly nasty because
they shouldn't happen according to the VFD abstraction). I do not
think there is going to be any argument strong enough to make us
put it back, especially not for non-mainstream callers like logical
decoding.

Yea, I think that's the right call. Avoiding kernel seeks is quite
worthwhile, and we shouldn't undo it just because of this usecase. And
that'll become more and more important performance-wise (and has already
done so, with all the intel fixes making syscalls much slower).

I could see an argument for adding a separate generic layer providing
position tracking ontop of the VFD abstraction however. Seems quite
possible that there's some other parts of the system that could benefit
from using VFDs rather than plain fds. And they'd probably also need the
positional tracking.

Greetings,

Andres Freund

#15

Amit Khandekar

amitdkhan.pg@gmail.com

over 6 years ago

In reply to: Andres Freund (#14)

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Tue, 17 Sep 2019 at 21:19, Andres Freund <andres@anarazel.de> wrote:

On 2019-09-14 14:34:21 -0400, Tom Lane wrote:

Amit Khandekar <amitdkhan.pg@gmail.com> writes:

Yeah, something like the attached patch. I think this tracking of
offsets would have been cleaner if we add in-built support in VFD. But
yeah, for bank branches at least, we need to handle it outside of VFD.
Or may be we would add it if we find one more use-case.

Again, we had that and removed it, for what seem to me to be solid
reasons. It adds cycles when we're forced to close/reopen a file,
and it also adds failure modes that we could do without (ie, failure
of either the ftell or the lseek, which are particularly nasty because
they shouldn't happen according to the VFD abstraction).

Ok. So you mean, when the caller would call FileRead() for sequential
reading, underneath VFD would do a pread(), but if pread() returns
error, the errno can belong to read() or it might as well belong to
lseek(). If it's due to lseek(), it's not expected from the caller
because for the caller it's just a sequential read. Yeah, makes sense.

I do not
think there is going to be any argument strong enough to make us
put it back, especially not for non-mainstream callers like logical
decoding.

Ok. Also, more about putting back is in the below comments ...

Yea, I think that's the right call. Avoiding kernel seeks is quite
worthwhile, and we shouldn't undo it just because of this usecase. And
that'll become more and more important performance-wise (and has already
done so, with all the intel fixes making syscalls much slower).

By the way, I was not thinking about adding back the read() and
lseek() calls. I was saying we continue to use the pread() call, so
it's just a single system call. FileReadAt(..., offset) would do
pread() with user-supplied offset, and FileRead() would do pread()
using internally tracked offset. So for the user, FileReadAt() is like
pread(), and FileRead() would be like read().

But I agree with Tom's objection about having to unnecessarily handle
lseek error codes.

I could see an argument for adding a separate generic layer providing
position tracking ontop of the VFD abstraction however. Seems quite
possible that there's some other parts of the system that could benefit
from using VFDs rather than plain fds. And they'd probably also need the
positional tracking.

Yeah, that also could be done.

Probably, for now at least, what everyone seems to agree is to take my
earlier attached patch forward.

I am going to see if I can add a TAP test for the patch, and will add
the patch into the commitfest soon.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#16

Amit Khandekar

amitdkhan.pg@gmail.com

over 6 years ago

In reply to: Amit Khandekar (#15)

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

On Wed, 18 Sep 2019 at 12:24, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

Probably, for now at least, what everyone seems to agree is to take my
earlier attached patch forward.

I am going to see if I can add a TAP test for the patch, and will add
the patch into the commitfest soon.

Attached is an updated patch v2. Has a new test scenario added in
contrib/test_decoding/sql/spill test, and some minor code cleanup.

Going to add this into Nov commitfest.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

logical decoding : exceeded maxAllocatedDescs for .spill files

Attachments:

Attachments: