pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use of pgstat by AV.
pgstat: Bring up pgstat in BaseInit() to fix uninitialized use of pgstat by AV.
Previously pgstat_initialize() was called in InitPostgres() and
AuxiliaryProcessMain(). As it turns out there was at least one case where we
reported stats before pgstat_initialize() was called, see
AutoVacWorkerMain()'s intentionally early call to pgstat_report_autovac().
This turns out to not be a problem with the current pgstat implementation as
pgstat_initialize() only registers a shutdown callback. But in the shared
memory based stats implementation we are working towards pgstat_initialize()
has to do more work.
After b406478b87e BaseInit() is a central place where initialization shared by
normal backends and auxiliary backends can be put. Obviously BaseInit() is
called before InitPostgres() registers ShutdownPostgres. Previously
ShutdownPostgres was the first before_shmem_exit callback; now that's commonly
pgstats. That should be fine.
Previously pgstat_initialize() was not called in bootstrap mode, but there
does not appear to be a need for that. It's now done unconditionally.
To detect future issues like this, assertions are added to a few places
verifying that the pgstat subsystem is initialized and not yet shut down.
Author: Andres Freund <andres@anarazel.de>
Discussion: /messages/by-id/20210405092914.mmxqe7j56lsjfsej@alap3.anarazel.de
Discussion: /messages/by-id/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/ee3f8d3d3aec0d7c961d6b398d31504bb272a450
Modified Files
--------------
src/backend/postmaster/auxprocess.c | 2 --
src/backend/postmaster/pgstat.c | 53 +++++++++++++++++++++++++++++++++++--
src/backend/utils/init/postinit.c | 23 +++++++++-------
3 files changed, 65 insertions(+), 13 deletions(-)
Andres Freund <andres@anarazel.de> writes:
pgstat: Bring up pgstat in BaseInit() to fix uninitialized use of pgstat by AV.
sifaka took exception to this, or at least I guess it was this one out
of the four you pushed at once:
TRAP: FailedAssertion("pgstat_is_initialized && !pgstat_is_shutdown", File: "pgstat.c", Line: 4810, PID: 74447)
0 postgres 0x0000000100e5a520 ExceptionalCondition + 124
1 postgres 0x0000000100ca1dec pgstat_reset_counters + 0
2 postgres 0x0000000100ca2548 pgstat_report_tempfile + 80
3 postgres 0x0000000100d07ca4 FileClose + 536
4 postgres 0x0000000100d09764 CleanupTempFiles + 160
5 postgres 0x0000000100d0f8c0 proc_exit_prepare + 228
6 postgres 0x0000000100d0f79c proc_exit + 24
7 postgres 0x0000000100e5ae04 errfinish + 856
8 postgres 0x0000000100f1a804 ProcessInterrupts.cold.9 + 88
9 postgres 0x0000000100d36a98 ProcessInterrupts + 604
10 postgres 0x0000000100cd93c4 sendDir + 1516
11 postgres 0x0000000100cd83a0 perform_base_backup + 3592
12 postgres 0x0000000100cd72c0 SendBaseBackup + 256
13 postgres 0x0000000100ce6a48 exec_replication_command + 1852
14 postgres 0x0000000100d39190 PostgresMain + 3260
15 postgres 0x0000000100ca9c78 process_startup_packet_die + 0
16 postgres 0x0000000100ca94ec ClosePostmasterPorts + 0
17 postgres 0x0000000100ca6a0c PostmasterMain + 4584
18 postgres 0x0000000100c0b798 help + 0
19 libdyld.dylib 0x0000000184391430 start + 4
regards, tom lane
Hi,
On 2021-08-06 22:44:07 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
pgstat: Bring up pgstat in BaseInit() to fix uninitialized use of pgstat by AV.
sifaka took exception to this, or at least I guess it was this one out
of the four you pushed at once:
Longfin as well. It's the assertions from ee3f8d3d3ae, but possibly only
exposed after fb2c5028e63.
Not sure why it's your two animals that report this issue, but not others? Why
would a backend doing SendBaseBackup() previously have allocated temp files?
Don't get me wrong - it's good that they surfaced the issue, and it's an issue
independent of the specific trigger.
Glad I added the assertions...
See also /messages/by-id/20210803023612.iziacxk5syn2r4ut@alap3.anarazel.de
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
Not sure why it's your two animals that report this issue, but not others? Why
would a backend doing SendBaseBackup() previously have allocated temp files?
Guessing the common factor is "macOS", but that's just a guess.
I can poke into it tomorrow.
regards, tom lane
Hi,
On 2021-08-06 23:13:28 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
Not sure why it's your two animals that report this issue, but not others? Why
would a backend doing SendBaseBackup() previously have allocated temp files?
Guessing the common factor is "macOS", but that's just a guess.
Probably not a bad one...
I can poke into it tomorrow.
Thanks! Might be interesting to run the pg_basebackup tests with
log_temp_files=0...
Greetings,
Andres Freund
I wrote:
Guessing the common factor is "macOS", but that's just a guess.
I can poke into it tomorrow.
I did try it real quick on my Mac laptop, and that fails too.
Here's a more accurate backtrace, in case that helps.
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
* frame #0: 0x00007fff2033e92e libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff2036d5bd libsystem_pthread.dylib`pthread_kill + 263
frame #2: 0x00007fff202c2406 libsystem_c.dylib`abort + 125
frame #3: 0x0000000108cdf87f postgres`ExceptionalCondition(conditionName=<unavailable>, errorType=<unavailable>, fileName=<unavailable>, lineNumber=<unavailable>) at assert.c:69:2 [opt]
frame #4: 0x0000000108b0f551 postgres`pgstat_send [inlined] pgstat_assert_is_up at pgstat.c:4810:2 [opt]
frame #5: 0x0000000108b0f532 postgres`pgstat_send(msg=<unavailable>, len=<unavailable>) at pgstat.c:3032 [opt]
frame #6: 0x0000000108b0fbef postgres`pgstat_report_tempfile(filesize=<unavailable>) at pgstat.c:1812:2 [opt]
frame #7: 0x0000000108b7abfe postgres`FileClose [inlined] ReportTemporaryFileUsage(path="base/pgsql_tmp/pgsql_tmp35840.0", size=0) at fd.c:1483:2 [opt]
frame #8: 0x0000000108b7abf6 postgres`FileClose(file=1) at fd.c:1987 [opt]
frame #9: 0x0000000108b7c3b8 postgres`CleanupTempFiles(isCommit=false, isProcExit=true) at fd.c:0 [opt]
frame #10: 0x0000000108b82661 postgres`proc_exit_prepare(code=1) at ipc.c:209:3 [opt]
frame #11: 0x0000000108b8253d postgres`proc_exit(code=1) at ipc.c:107:2 [opt]
frame #12: 0x0000000108ce0201 postgres`errfinish(filename=<unavailable>, lineno=<unavailable>, funcname=<unavailable>) at elog.c:666:3 [opt]
frame #13: 0x0000000108db362b postgres`ProcessInterrupts.cold.9 at postgres.c:3222:3 [opt]
frame #14: 0x0000000108baa2a4 postgres`ProcessInterrupts at postgres.c:3218:22 [opt]
frame #15: 0x0000000108b49f4a postgres`sendDir(path=".", basepathlen=1, sizeonly=false, tablespaces=0x00007fb47f0719c8, sendtblspclinks=true, manifest=<unavailable>, spcoid=0x0000000000000000) at basebackup.c:1277:3 [opt]
frame #16: 0x0000000108b48dc8 postgres`perform_base_backup(opt=0x00007ffee73ae0a0) at basebackup.c:432:5 [opt]
frame #17: 0x0000000108b47bf0 postgres`SendBaseBackup(cmd=<unavailable>) at basebackup.c:949:2 [opt]
frame #18: 0x0000000108b58055 postgres`exec_replication_command(cmd_string="BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes' ") at walsender.c:1625:4 [opt]
frame #19: 0x0000000108bac8aa postgres`PostgresMain(argc=<unavailable>, argv=<unavailable>, dbname=<unavailable>, username=<unavailable>) at postgres.c:4484:12 [opt]
frame #20: 0x0000000108b178db postgres`BackendRun(port=<unavailable>) at postmaster.c:4519:2 [opt]
frame #21: 0x0000000108b17178 postgres`ServerLoop [inlined] BackendStartup(port=<unavailable>) at postmaster.c:4241:3 [opt]
frame #22: 0x0000000108b17154 postgres`ServerLoop at postmaster.c:1758 [opt]
frame #23: 0x0000000108b1421b postgres`PostmasterMain(argc=4, argv=0x00007fb47ec06640) at postmaster.c:1430:11 [opt]
frame #24: 0x0000000108a6ac13 postgres`main(argc=<unavailable>, argv=<unavailable>) at main.c:199:3 [opt]
frame #25: 0x00007fff20388f3d libdyld.dylib`start + 1
I see two core dumps with what seem to be this same trace after
running pg_basebackup's tests.
regards, tom lane
Hi,
On 2021-08-06 23:22:36 -0400, Tom Lane wrote:
I wrote:
Guessing the common factor is "macOS", but that's just a guess.
I can poke into it tomorrow.
I did try it real quick on my Mac laptop, and that fails too.
Here's a more accurate backtrace, in case that helps.
Thanks!
I now managed to reproduce it on a CI system (after some initial
failed attempts due to pg_upgrade tests failing due to path length issues):
https://api.cirrus-ci.com/v1/artifact/task/4863568911794176/log/src/bin/pg_basebackup/tmp_check/log/010_pg_basebackup_main.log
2021-08-06 21:18:40.096 PDT [22031] 010_pg_basebackup.pl LOG: received replication command: BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes'
2021-08-06 21:18:40.096 PDT [22031] 010_pg_basebackup.pl STATEMENT: BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes'
2021-08-06 21:18:42.210 PDT [22031] 010_pg_basebackup.pl LOG: could not send data to client: Broken pipe
2021-08-06 21:18:42.210 PDT [22031] 010_pg_basebackup.pl STATEMENT: BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes'
2021-08-06 21:18:42.210 PDT [22031] 010_pg_basebackup.pl FATAL: connection to client lost
2021-08-06 21:18:42.210 PDT [22031] 010_pg_basebackup.pl STATEMENT: BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes'
TRAP: FailedAssertion("pgstat_is_initialized && !pgstat_is_shutdown", File: "pgstat.c", Line: 4810, PID: 22031)
vs what I locally get:
2021-08-06 20:52:06.163 PDT [1397252] 010_pg_basebackup.pl LOG: received replication command: BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes'
2021-08-06 20:52:06.163 PDT [1397252] 010_pg_basebackup.pl STATEMENT: BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes'
2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl LOG: could not send data to client: Broken pipe
2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl STATEMENT: BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes'
2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl ERROR: base backup could not send data, aborting backup
2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl STATEMENT: BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes'
2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl FATAL: connection to client lost
Note that the OSX version doesn't have the "base backup could not send data,
aborting backup" message. I guess that is what leads to the difference in
behaviour:
The temp file is created by InitializeBackupManifest(). In the !OSX case, we
first abort via an ERROR, which triggers the cleanup via
WalSndResourceCleanup(). On OSX however, we immediately error out with FATAL
for some reason (timing? network buffering differences?), which will never
reach WalSndErrorCleanup(). Therefore the temp file only gets deleted during
proc_exit(), which triggers the issue...
Not yet really sure what the best way to deal with this is. Presumably this
issue would be fixed if AtProcExit_Files()/CleanupTempFiles() were scheduled
via before_shmem_exit(). And perhaps it's not too off to schedule
CleanupTempFiles() there - but it doesn't quite seem entirely right either.
I'd kinda like to avoid having to overhaul the process exit infrastructure as
a prerequisite to getting the shared memory stats patch in :(.
Greetings,
Andres Freund
Hi,
On 2021-08-06 21:49:52 -0700, Andres Freund wrote:
The temp file is created by InitializeBackupManifest(). In the !OSX case, we
first abort via an ERROR, which triggers the cleanup via
WalSndResourceCleanup(). On OSX however, we immediately error out with FATAL
for some reason (timing? network buffering differences?), which will never
reach WalSndErrorCleanup(). Therefore the temp file only gets deleted during
proc_exit(), which triggers the issue...
Not yet really sure what the best way to deal with this is. Presumably this
issue would be fixed if AtProcExit_Files()/CleanupTempFiles() were scheduled
via before_shmem_exit(). And perhaps it's not too off to schedule
CleanupTempFiles() there - but it doesn't quite seem entirely right either.
Huh. I just noticed that AtProcExit_Files() is not even scheduled via
on_shmem_exit() but on_proc_exit(). That means that even before fb2c5028e63
we sent pgstat messages *well* after pgstat_shutdown_hook() already
ran. Crufty.
Just hacking in an earlier CleanupTempFiles() does "fix" the OSX issue:
https://cirrus-ci.com/task/5941265494704128?logs=macos_basebackup#L4
I'm inclined to leave things as-is until tomorrow to see if other things are
shaken loose and then either commit a bandaid along those lines or revert the
patch. Or something proper if we can figure it out till then.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2021-08-06 21:49:52 -0700, Andres Freund wrote:
Not yet really sure what the best way to deal with this is. Presumably this
issue would be fixed if AtProcExit_Files()/CleanupTempFiles() were scheduled
via before_shmem_exit(). And perhaps it's not too off to schedule
CleanupTempFiles() there - but it doesn't quite seem entirely right either.
Huh. I just noticed that AtProcExit_Files() is not even scheduled via
on_shmem_exit() but on_proc_exit(). That means that even before fb2c5028e63
we sent pgstat messages *well* after pgstat_shutdown_hook() already
ran. Crufty.
Just hacking in an earlier CleanupTempFiles() does "fix" the OSX issue:
https://cirrus-ci.com/task/5941265494704128?logs=macos_basebackup#L4
So if I have the lay of the land correctly:
1. Somebody decided it'd be a great idea for temp file cleanup to send
stats collector messages.
2. Temp file cleanup happens after shmem disconnection.
3. This accidentally worked, up to now, because stats transmission happens
via a separate socket not shared memory.
4. We can't keep both #1 and #2 if we'd like to switch to shmem-based
stats collection.
Intuitively it seems like temp file management should be a low-level,
backend-local function and therefore should be okay to run after
shmem disconnection. I do not have a warm feeling about reversing that
module layering --- what's to stop someone from breaking things by
trying to use a temp file in their on_proc_exit or on_shmem_exit hook?
Maybe what needs to go overboard is point 1.
More generally, this points up the fact that we don't have a well-defined
module hierarchy that would help us understand what code can safely do which.
I'm not volunteering to design that, but maybe it needs to happen soon.
regards, tom lane
Hi,
On 2021-08-07 11:43:07 -0400, Tom Lane wrote:
So if I have the lay of the land correctly:
1. Somebody decided it'd be a great idea for temp file cleanup to send
stats collector messages.
2. Temp file cleanup happens after shmem disconnection.
3. This accidentally worked, up to now, because stats transmission happens
via a separate socket not shared memory.
4. We can't keep both #1 and #2 if we'd like to switch to shmem-based
stats collection.
Sounds accurate to me.
Intuitively it seems like temp file management should be a low-level,
backend-local function and therefore should be okay to run after
shmem disconnection. I do not have a warm feeling about reversing that
module layering --- what's to stop someone from breaking things by
trying to use a temp file in their on_proc_exit or on_shmem_exit hook?
We could just add an assert preventing that from happening. It's hard to
believe that there could be a good reason to use temp files in those hook
points.
I'm somewhat inclined to split InitFileAccess() into two by separating out
InitTemporaryFileAccess() or such. InitFileAccess() would continue to happen
early and register a proc exit hook that errors out when there's a temp file
(as a backstop for non cassert builds). The new InitTemporaryFileAccess()
would happen a bit later and schedule CleanupTempFiles() to happen via
before_shmem_exit(). And we add an Assert(!proc_exit_inprogress) to the
routines for opening a temp file.
Maybe what needs to go overboard is point 1.
Keeping stats of temp files seems useful enough that I'm a bit hesitant to go
there. I guess we could just prevent pgstats_report_tempfile() from being
called when CleanupTempFiles() is called during proc exit, but that doesn't
seem great.
More generally, this points up the fact that we don't have a well-defined
module hierarchy that would help us understand what code can safely do which.
I'm not volunteering to design that, but maybe it needs to happen soon.
I agree. Part of the reason for whacking around process startup (in both
pushed and still pending commits) was that previously it wasn't just poorly
defined, it differed significantly between platforms. And I'm quite unhappy
with the vagueness in which we defined the meaning of the various shutdown
callbacks ([1]).
I suspect to even get to the point of doing a useful redesign of the module
hierarchy, we'd need to unify more of the process initialization between
EXEC_BACKEND and normal builds.
I've been bitten by all this often enough to be motivated to propose
something. However I want to get the basics of the shared memory stats stuff
in first - it's a pain to keep it updated, and we'll need to find and solve all
of the issues it has anyway, even if we go for a redesign of module / startup
/ shutdown layering.
Greetings,
Andres Freund
[1]: /messages/by-id/20210803023612.iziacxk5syn2r4ut@alap3.anarazel.de
Andres Freund <andres@anarazel.de> writes:
On 2021-08-07 11:43:07 -0400, Tom Lane wrote:
Intuitively it seems like temp file management should be a low-level,
backend-local function and therefore should be okay to run after
shmem disconnection. I do not have a warm feeling about reversing that
module layering --- what's to stop someone from breaking things by
trying to use a temp file in their on_proc_exit or on_shmem_exit hook?
We could just add an assert preventing that from happening. It's hard to
believe that there could be a good reason to use temp files in those hook
points.
The bigger picture here is that anyplace anybody ever wants to add stats
collection in suddenly becomes a "must run before shmem disconnection"
module. I think that way madness lies --- we can't have the entire
backend shut down before shmem disconnection. So I feel like there's
a serious design problem here, and it's not confined to temp files,
or at least it won't stay confined to temp files. (There may indeed
be other problems already, that we just haven't had the good luck to
have buildfarm timing vagaries expose to us.)
Maybe the solution is to acknowledge that we might lose some events
during backend shutdown, and redefine the behavior as "we ignore
event reports after pgstat shutdown", not "we assert that there never
can be any such reports".
I'm somewhat inclined to split InitFileAccess() into two by separating out
InitTemporaryFileAccess() or such. InitFileAccess() would continue to happen
early and register a proc exit hook that errors out when there's a temp file
(as a backstop for non cassert builds). The new InitTemporaryFileAccess()
would happen a bit later and schedule CleanupTempFiles() to happen via
before_shmem_exit(). And we add an Assert(!proc_exit_inprogress) to the
routines for opening a temp file.
Maybe that would work, but after you multiply it by a bunch of different
scenarios in different modules, it's going to get less and less attractive.
I've been bitten by all this often enough to be motivated to propose
something. However I want to get the basics of the shared memory stats stuff
in first - it's a pain to keep it updated, and we'll need to find and solve all
of the issues it has anyway, even if we go for a redesign of module / startup
/ shutdown layering.
Fair. But I suggest that the first cut should look more like what
I suggest above, ie just be willing to lose events during shutdown.
The downsides of that are not so enormous that we should be willing
to undertake major klugery to avoid it before we've even got a
semi-working system.
regards, tom lane
Hi,
On 2021-08-07 13:06:47 -0400, Tom Lane wrote:
The bigger picture here is that anyplace anybody ever wants to add stats
collection in suddenly becomes a "must run before shmem disconnection"
module. I think that way madness lies --- we can't have the entire
backend shut down before shmem disconnection. So I feel like there's
a serious design problem here, and it's not confined to temp files,
or at least it won't stay confined to temp files. (There may indeed
be other problems already, that we just haven't had the good luck to
have buildfarm timing vagaries expose to us.)
I think more often it should not end up as "must run before shmem
disconnection" as a whole, but should be split into a portion running at that
point.
Maybe the solution is to acknowledge that we might lose some events
during backend shutdown, and redefine the behavior as "we ignore
event reports after pgstat shutdown", not "we assert that there never
can be any such reports".
I think it's fine to make such calls, but that it ought to reside in the stats
emitting modules. Only it can decide whether needing to emit stats during
shutdown is a rare edge case or a commonly expected path. E.g. the case of
parallel worker shutdown sending stats too late is something common enough to
be problematic, so I don't want to make it too hard to detect such cases.
I'm somewhat inclined to split InitFileAccess() into two by separating out
InitTemporaryFileAccess() or such. InitFileAccess() would continue to happen
early and register a proc exit hook that errors out when there's a temp file
(as a backstop for non cassert builds). The new InitTemporaryFileAccess()
would happen a bit later and schedule CleanupTempFiles() to happen via
before_shmem_exit(). And we add an Assert(!proc_exit_inprogress) to the
routines for opening a temp file.
Maybe that would work, but after you multiply it by a bunch of different
scenarios in different modules, it's going to get less and less attractive.
I'm not quite convinced. Even if we had a nicer ordering / layering of
subsystems, we'd still have to deal with subsystems that don't fit nicely
because they have conflicting needs. And we'd still need detection of use of
subsystems that already have not been initialized / are already shut down.
One example of that is imo fd.c being a very low level module that might be
needed during shutdown of other modules but which also currently depends on
the stats subsystem for temp file management. Which doesn't really make sense,
because stats very well could depend on fd.c routines.
I think needing to split the current fd.c subsystem into a lower-level (file
access) and a higher level (temporary file management) module is precisely
what a better designed module layering system will *force* us to do.
I've been bitten by all this often enough to be motivated to propose
something. However I want to get the basics of the shared memory stats stuff
in first - it's a pain to keep it updated, and we'll need to find and solve all
of the issues it has anyway, even if we go for a redesign of module / startup
/ shutdown layering.
Fair. But I suggest that the first cut should look more like what
I suggest above, ie just be willing to lose events during shutdown.
The downsides of that are not so enormous that we should be willing
to undertake major klugery to avoid it before we've even got a
semi-working system.
I think that's more likely to hide bugs unfortunately. Consider fa91d4c91f2 -
I might not have found that if we had just ignored "too late" pgstats activity
in pgstats.c or fd.c, and that's not an edge case.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2021-08-07 13:06:47 -0400, Tom Lane wrote:
Fair. But I suggest that the first cut should look more like what
I suggest above, ie just be willing to lose events during shutdown.
The downsides of that are not so enormous that we should be willing
to undertake major klugery to avoid it before we've even got a
semi-working system.
I think that's more likely to hide bugs unfortunately. Consider fa91d4c91f2 -
I might not have found that if we had just ignored "too late" pgstats activity
in pgstats.c or fd.c, and that's not an edge case.
Depends what you want to define as a bug. What I am not happy about
is the prospect of random assertion failures for the next six months
while you finish redesigning half of the system. The rest of us
have work we want to get done, too. I don't object to the idea of
making no-lost-events an end goal, but we are clearly not ready
for that today.
regards, tom lane
Hi,
On 2021-08-07 13:37:16 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2021-08-07 13:06:47 -0400, Tom Lane wrote:
Fair. But I suggest that the first cut should look more like what
I suggest above, ie just be willing to lose events during shutdown.
The downsides of that are not so enormous that we should be willing
to undertake major klugery to avoid it before we've even got a
semi-working system.
I think that's more likely to hide bugs unfortunately. Consider fa91d4c91f2 -
I might not have found that if we had just ignored "too late" pgstats activity
in pgstats.c or fd.c, and that's not an edge case.
Depends what you want to define as a bug. What I am not happy about
is the prospect of random assertion failures for the next six months
while you finish redesigning half of the system. The rest of us
have work we want to get done, too. I don't object to the idea of
making no-lost-events an end goal, but we are clearly not ready
for that today.
I don't know what to do about that. How would we even find these cases if they
aren't hit during regression tests on my machine (nor on a lot of others)?
Obviously I had run the regression tests many times before pushing the earlier
changes.
The check for pgstat being up is in one central place and thus easily can be
turned into a warning if problems around the shutdown sequence become a
frequent issue on HEAD. If you think it's better to turn that into a WARNING
now and then turn it back into an assert later, I can live with that as well, but I
don't think it'll lead to a better outcome.
The shared memory stats stuff isn't my personal project - it's
Horiguchi-san's. I picked it up because it seems like an important thing to
address and because it'd been in maybe a dozen CFs without a lot of
progress. It just turns out that there's a lot of prerequisite changes :(.
Greetings,
Andres Freund
Hi,
On 2021-08-07 09:48:50 -0700, Andres Freund wrote:
I'm somewhat inclined to split InitFileAccess() into two by separating out
InitTemporaryFileAccess() or such. InitFileAccess() would continue to happen
early and register a proc exit hook that errors out when there's a temp file
(as a backstop for non cassert builds). The new InitTemporaryFileAccess()
would happen a bit later and schedule CleanupTempFiles() to happen via
before_shmem_exit(). And we add an Assert(!proc_exit_inprogress) to the
routines for opening a temp file.
Attached is a patch showing what this could look like. Note that the PANIC
should likely be a WARNING instead, but the PANIC is more useful for running
some initial tests...
I'm not sure whether we'd want to continue having the proc exit hook? It seems
to me that asserts would provide a decent enough protection against
introducing new temp files during shutdown.
Alternatively we could make the asserts in OpenTemporaryFile et al
elog(ERROR)s, and be pretty certain that no temp files would be open too late?
Greetings,
Andres Freund
Attachments:
file-access.diff (text/x-diff; charset=us-ascii)
diff --git i/src/include/storage/fd.h w/src/include/storage/fd.h
index 2d843eb9929..34602ae0069 100644
--- i/src/include/storage/fd.h
+++ w/src/include/storage/fd.h
@@ -158,6 +158,7 @@ extern int MakePGDirectory(const char *directoryName);
/* Miscellaneous support routines */
extern void InitFileAccess(void);
+extern void InitTemporaryFileAccess(void);
extern void set_max_safe_fds(void);
extern void closeAllVfds(void);
extern void SetTempTablespaces(Oid *tableSpaces, int numSpaces);
diff --git i/src/backend/storage/file/fd.c w/src/backend/storage/file/fd.c
index df45aa56841..57556a0ac86 100644
--- i/src/backend/storage/file/fd.c
+++ w/src/backend/storage/file/fd.c
@@ -231,6 +231,9 @@ static bool have_xact_temporary_files = false;
*/
static uint64 temporary_files_size = 0;
+/* Temporary file access initialized and not yet shut down? */
+static bool temporary_files_allowed = false;
+
/*
* List of OS handles opened with AllocateFile, AllocateDir and
* OpenTransientFile.
@@ -328,6 +331,7 @@ static bool reserveAllocatedDesc(void);
static int FreeDesc(AllocateDesc *desc);
static void AtProcExit_Files(int code, Datum arg);
+static void BeforeShmemExit_Files(int code, Datum arg);
static void CleanupTempFiles(bool isCommit, bool isProcExit);
static void RemovePgTempRelationFiles(const char *tsdirname);
static void RemovePgTempRelationFilesInDbspace(const char *dbspacedirname);
@@ -868,6 +872,9 @@ durable_rename_excl(const char *oldfile, const char *newfile, int elevel)
*
* This is called during either normal or standalone backend start.
* It is *not* called in the postmaster.
+ *
+ * Note that this does not initialize temporary file access, that is
+ * separately initialized via InitTemporaryFileAccess().
*/
void
InitFileAccess(void)
@@ -886,9 +893,34 @@ InitFileAccess(void)
SizeVfdCache = 1;
- /* register proc-exit hook to ensure temp files are dropped at exit */
+ /*
+ * Register proc-exit hook to ensure temp files are dropped at exit. This
+ * serves as a backstop to BeforeShmemExit_Files() in case somebody
+ * creates a temp file in a shutdown hook.
+ */
on_proc_exit(AtProcExit_Files, 0);
- before_shmem_exit(AtProcExit_Files, 0);
+}
+
+/*
+ * InitTemporaryFileAccess --- initialize temporary file access during startup
+ *
+ * This is called during either normal or standalone backend start.
+ * It is *not* called in the postmaster.
+ */
+void
+InitTemporaryFileAccess(void)
+{
+ Assert(SizeVfdCache != 0); /* InitFileAccess() needs to have run */
+
+ /*
+ * Register before-shmem-exit hook to ensure temp files are dropped while
+ * we can still report stats.
+ */
+ before_shmem_exit(BeforeShmemExit_Files, 0);
+
+#ifdef USE_ASSERT_CHECKING
+ temporary_files_allowed = true;
+#endif
}
/*
@@ -1671,6 +1703,8 @@ OpenTemporaryFile(bool interXact)
{
File file = 0;
+ Assert(temporary_files_allowed); /* check temp file access is up */
+
/*
* Make sure the current resource owner has space for this File before we
* open it, if we'll be registering it below.
@@ -1806,6 +1840,8 @@ PathNameCreateTemporaryFile(const char *path, bool error_on_failure)
{
File file;
+ Assert(temporary_files_allowed); /* check temp file access is up */
+
ResourceOwnerEnlargeFiles(CurrentResourceOwner);
/*
@@ -1844,6 +1880,8 @@ PathNameOpenTemporaryFile(const char *path, int mode)
{
File file;
+ Assert(temporary_files_allowed); /* check temp file access is up */
+
ResourceOwnerEnlargeFiles(CurrentResourceOwner);
file = PathNameOpenFile(path, mode | PG_BINARY);
@@ -3004,11 +3042,28 @@ AtEOXact_Files(bool isCommit)
numTempTableSpaces = -1;
}
+/*
+ * BeforeShmemExit_Files
+ *
+ * before_shmem_exit hook to clean up temp files during backend shutdown.
+ * Here, we want to clean up *all* temp files including interXact ones.
+ */
+static void
+BeforeShmemExit_Files(int code, Datum arg)
+{
+ CleanupTempFiles(false, true);
+
+ /* prevent further temp files from being created */
+ temporary_files_allowed = false;
+}
+
/*
* AtProcExit_Files
*
* on_proc_exit hook to clean up temp files during backend shutdown.
- * Here, we want to clean up *all* temp files including interXact ones.
+ *
+ * All temp files should already have been cleaned up by
+ * BeforeShmemExit_Files(); this is just a backstop / debugging aid.
*/
static void
AtProcExit_Files(int code, Datum arg)
@@ -3027,6 +3082,10 @@ AtProcExit_Files(int code, Datum arg)
* that's not the case, we are being called for transaction commit/abort
* and should only remove transaction-local temp files. In either case,
* also clean up "allocated" stdio files, dirs and fds.
+ *
+ * This will be called twice during shutdown. Once from
+ * BeforeShmemExit_Files(), once from AtProcExit_Files(). The latter should
+ * not find any temporary files and is just a backstop / debugging aid.
*/
static void
CleanupTempFiles(bool isCommit, bool isProcExit)
@@ -3055,7 +3114,14 @@ CleanupTempFiles(bool isCommit, bool isProcExit)
* debugging cross-check.
*/
if (isProcExit)
+ {
+ /* FIXME: replace PANIC with WARNING before commit */
+ if (!temporary_files_allowed)
+ elog(PANIC,
+ "temporary file %s open while temporary file access is not allowed",
+ VfdCache[i].fileName);
FileClose(i);
+ }
else if (fdstate & FD_CLOSE_AT_EOXACT)
{
elog(WARNING,
diff --git i/src/backend/utils/init/postinit.c w/src/backend/utils/init/postinit.c
index 87dc060b201..5089dd43ae2 100644
--- i/src/backend/utils/init/postinit.c
+++ w/src/backend/utils/init/postinit.c
@@ -517,6 +517,12 @@ BaseInit(void)
*/
DebugFileOpen();
+ /*
+ * Initialize file access. Done early so other subsystems can access
+ * files.
+ */
+ InitFileAccess();
+
/*
* Initialize statistics reporting. This needs to happen early to ensure
* that pgstat's shutdown callback runs after the shutdown callbacks of
@@ -525,11 +531,16 @@ BaseInit(void)
*/
pgstat_initialize();
- /* Do local initialization of file, storage and buffer managers */
- InitFileAccess();
+ /* Do local initialization of storage and buffer managers */
InitSync();
smgrinit();
InitBufferPoolAccess();
+
+ /*
+ * Initialize temporary file access after pgstat, so that the temporary
+ * file shutdown hook can report temporary file statistics.
+ */
+ InitTemporaryFileAccess();
}
Andres Freund <andres@anarazel.de> writes:
On 2021-08-07 13:37:16 -0400, Tom Lane wrote:
Depends what you want to define as a bug. What I am not happy about
is the prospect of random assertion failures for the next six months
while you finish redesigning half of the system. The rest of us
have work we want to get done, too. I don't object to the idea of
making no-lost-events an end goal, but we are clearly not ready
for that today.
I don't know what to do about that. How would we even find these cases if they
aren't hit during regression tests on my machine (nor on a lot of others)?
The regression tests really aren't that helpful for testing the problem
scenario here, which basically is SIGTERM'ing a query-in-progress.
I'm rather surprised that the buildfarm managed to exercise that at all.
You might try setting up a test scaffold that runs the core regression
tests and SIGINT's the postmaster, or alternatively SIGTERM's some
individual session, at random times partway through. Obviously this
will make the regression tests report failure, but what to look for
is if anything dumps core on the way out.
regards, tom lane
Hi,
On 2021-08-07 15:12:38 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2021-08-07 13:37:16 -0400, Tom Lane wrote:
Depends what you want to define as a bug. What I am not happy about
is the prospect of random assertion failures for the next six months
while you finish redesigning half of the system. The rest of us
have work we want to get done, too. I don't object to the idea of
making no-lost-events an end goal, but we are clearly not ready
for that today.

I don't know what to do about that. How would we even find these cases if they
aren't hit during regression tests on my machine (nor on a lot of others)?

The regression tests really aren't that helpful for testing the problem
scenario here, which basically is SIGTERM'ing a query-in-progress.
I'm rather surprised that the buildfarm managed to exercise that at all.
They're also not that helpful because this problem likely is unreachable for
any tempfiles other than the one in InitializeBackupManifest(). Pretty much
all, or even all, the other tempfiles are cleaned up either via transaction
and/or resowner cleanup.
I wonder if we should do something about WalSndResourceCleanup() not being
reached for FATALs? I think at least a note in WalSndResourceCleanup()
commenting on that fact seems like it might be a good idea?
It seems like it could eventually be a problem that the resowners added in
0d8c9c1210c4 aren't ever cleaned up in case of a FATAL error. Most resowner
cleanup actions are also backstopped with some form of on-exit hook, but I
don't think it's all - e.g. buffer pins aren't.
I guess I should start a thread about this on -hackers...
You might try setting up a test scaffold that runs the core regression
tests and SIGINT's the postmaster, or alternatively SIGTERM's some
individual session, at random times partway through. Obviously this
will make the regression tests report failure, but what to look for
is if anything dumps core on the way out.
Worth trying.
Greetings,
Andres Freund
Hi,
On 2021-08-07 12:01:31 -0700, Andres Freund wrote:
Attached is a patch showing what this could look like. Note that the PANIC
should likely not be that but a WARNING, but the PANIC is more useful for running
some initial tests...
I pushed a slightly evolved version of this. As the commit message noted, this
may not be the best approach, but we can revise after further discussion.
I'm not sure whether we'd want to continue having the proc exit hook? It seems
to me that asserts would provide a decent enough protection against
introducing new temp files during shutdown.
Alternatively we could make the asserts in OpenTemporaryFile et al
elog(ERROR)s, and be pretty certain that no temp files would be open too late?
I ended up removing the proc exit hook and not converting the asserts to an
elog(). Happy to change either.
Greetings,
Andres Freund
On Sat, Aug 7, 2021 at 11:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Intuitively it seems like temp file management should be a low-level,
backend-local function and therefore should be okay to run after
shmem disconnection. I do not have a warm feeling about reversing that
module layering --- what's to stop someone from breaking things by
trying to use a temp file in their on_proc_exit or on_shmem_exit hook?
Maybe what needs to go overboard is point 1.

More generally, this points up the fact that we don't have a well-defined
module hierarchy that would help us understand what code can safely do which.
I'm not volunteering to design that, but maybe it needs to happen soon.
Yeah, I was quite surprised when I saw this commit, because my first
reaction was - why in the world would temporary file shutdown properly
precede DSM shutdown, given that temporary files are a low-level
mechanism? The explanation that we're trying to send statistics at
that point makes sense as far as it goes, but it seems to me that we
cannot be far from having a circular dependency. All we need is to
have DSM require the use of temporary files, and we'll end up needing
DSM shutdown to happen both before and after temporary file cleanup.
/me wonders idly about dynamic_shared_memory_type=file
I think that subsystems like "memory" and "files" really ought to be
the lowest-level things we have, and should be shut down last. Stuff
like "send a message to the stats collector" seems like a higher level
thing that may require those lower-level facilities in order to
operate, and must therefore be shut down first. Maybe some subsystems
need to be divided into upper and lower levels to make this work, or,
well, I don't know, something else. But I'm deeply suspicious that
lifting stuff like this to the front of the shutdown sequence is just
papering over the problem, and not actually solving it.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
On 2021-08-09 11:46:30 -0400, Robert Haas wrote:
I think that subsystems like "memory" and "files" really ought to be
the lowest-level things we have, and should be shut down last.
I don't disagree with that - but there's a difference between having that as
an abstract goal, and having it a dependency of committing somebody's patch.
Stuff like "send a message to the stats collector" seems like a higher level
thing that may require those lower-level facilities in order to operate, and
must therefore be shut down first.
Yep.
Maybe some subsystems need to be divided into upper and lower levels to make
this work, or, well, I don't know, something else.
That's what I ended up doing, right? There's now InitFileAccess() and
InitTemporaryFileAccess().
But I'm deeply suspicious that lifting stuff like this to the front of the
shutdown sequence is just papering over the problem, and not actually
solving it.
If you have a concrete proposal that you think makes sense to tie shared
memory stats to, I'm happy to entertain it. One main motivator for b406478b87e
etc was to allow rejiggering things like this more easily.
Greetings,
Andres Freund
Hi,
On Sun, Aug 8, 2021 at 11:24 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2021-08-07 12:01:31 -0700, Andres Freund wrote:
Attached is a patch showing what this could look like. Note that the PANIC
should likely not be that but a WARNING, but the PANIC is more useful for running
some initial tests...

I pushed a slightly evolved version of this. As the commit message noted, this
may not be the best approach, but we can revise after further discussion.
While testing streaming logical replication, I got another assertion
failure with the current HEAD (e2ce88b58f) when the apply worker
raised an error during applying spooled changes:
TRAP: FailedAssertion("pgstat_is_initialized && !pgstat_is_shutdown",
File: "pgstat.c", Line: 4810, PID: 11084)
0   postgres                  0x000000010704fd9a ExceptionalCondition + 234
1   postgres                  0x0000000106d41e62 pgstat_assert_is_up + 66
2   postgres                  0x0000000106d43854 pgstat_send + 20
3   postgres                  0x0000000106d4433e pgstat_report_tempfile + 94
4   postgres                  0x0000000106df9519 ReportTemporaryFileUsage + 25
5   postgres                  0x0000000106df945a PathNameDeleteTemporaryFile + 282
6   postgres                  0x0000000106df8c7e unlink_if_exists_fname + 174
7   postgres                  0x0000000106df8b3a walkdir + 426
8   postgres                  0x0000000106df8982 PathNameDeleteTemporaryDir + 82
9   postgres                  0x0000000106dfe591 SharedFileSetDeleteAll + 113
10  postgres                  0x0000000106dfdf62 SharedFileSetDeleteOnProcExit + 66
11  postgres                  0x0000000106e05275 proc_exit_prepare + 325
12  postgres                  0x0000000106e050a3 proc_exit + 19
13  postgres                  0x0000000106d3ba99 StartBackgroundWorker + 649
14  postgres                  0x0000000106d54e85 do_start_bgworker + 613
15  postgres                  0x0000000106d4ef26 maybe_start_bgworkers + 486
16  postgres                  0x0000000106d4d767 sigusr1_handler + 631
17  libsystem_platform.dylib  0x00007fff736705fd _sigtramp + 29
18  ???                       0x0000000000000000 0x0 + 0
19  postgres                  0x0000000106d4c990 PostmasterMain + 6640
20  postgres                  0x0000000106c24fa3 main + 819
21  libdyld.dylib             0x00007fff73477cc9 start + 1
The apply worker registers SharedFileSetDeleteOnProcExit() when
creating a file set to serialize the changes. When it raises an error
due to conflict during applying the change, the callback eventually
reports the temp file statistics but pgstat already shut down,
resulting in this assertion failure.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, Aug 10, 2021 at 4:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
The apply worker registers SharedFileSetDeleteOnProcExit() when
creating a file set to serialize the changes. When it raises an error
due to conflict during applying the change, the callback eventually
reports the temp file statistics but pgstat already shut down,
resulting in this assertion failure.
I think we can try to fix this by registering to clean up these files
via before_shmem_exit() as done by Andres in commit 675c945394.
Similar to that commit, we can change the function name
SharedFileSetDeleteOnProcExit to SharedFileSetDeleteOnShmExit and
register it via before_shmem_exit() instead of on_proc_exit(). Can you
try that and see if it fixes the issue for you unless you have better
ideas to try out?
--
With Regards,
Amit Kapila.
On Wed, Aug 11, 2021 at 10:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Aug 10, 2021 at 4:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
The apply worker registers SharedFileSetDeleteOnProcExit() when
creating a file set to serialize the changes. When it raises an error
due to conflict during applying the change, the callback eventually
reports the temp file statistics but pgstat already shut down,
resulting in this assertion failure.

I think we can try to fix this by registering to clean up these files
via before_shmem_exit() as done by Andres in commit 675c945394.
Similar to that commit, we can change the function name
SharedFileSetDeleteOnProcExit to SharedFileSetDeleteOnShmExit and
register it via before_shmem_exit() instead of on_proc_exit(). Can you
try that and see if it fixes the issue for you unless you have better
ideas to try out?
It seems to me that moving the shared fileset cleanup to
before_shmem_exit() is the right approach to fix this problem. The
issue is fixed by the attached patch.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
0001-Move-shared-fileset-cleanup-to-before_shmem_exit.patch
From 4134a5edc2cdf80ddb0c1d9e3c76378329418f7d Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 12 Aug 2021 10:57:41 +0900
Subject: [PATCH] Move shared fileset cleanup to before_shmem_exit().
The reported problem is that shared file set created in
SharedFileSetInit() by logical replication apply worker is cleaned up
in SharedFileSetDeleteOnProcExit() when the process exited on an error
due to a conflict. As shared fileset cleanup causes pgstat reporting
for underlying temporary files, the assertions added in ee3f8d3d3ae
caused failures.
To fix the problem, similar to 675c945394, move shared fileset cleanup
to a before_shmem_exit() hook, ensuring that the fileset is dropped
while we can still report stats for underlying temporary files.
---
src/backend/storage/file/sharedfileset.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/src/backend/storage/file/sharedfileset.c b/src/backend/storage/file/sharedfileset.c
index ed37c940ad..0d9700bf56 100644
--- a/src/backend/storage/file/sharedfileset.c
+++ b/src/backend/storage/file/sharedfileset.c
@@ -36,7 +36,7 @@
static List *filesetlist = NIL;
static void SharedFileSetOnDetach(dsm_segment *segment, Datum datum);
-static void SharedFileSetDeleteOnProcExit(int status, Datum arg);
+static void SharedFileSetDeleteBeforeShmemExit(int status, Datum arg);
static void SharedFileSetPath(char *path, SharedFileSet *fileset, Oid tablespace);
static void SharedFilePath(char *path, SharedFileSet *fileset, const char *name);
static Oid ChooseTablespace(const SharedFileSet *fileset, const char *name);
@@ -112,7 +112,12 @@ SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg)
* fileset clean up.
*/
Assert(filesetlist == NIL);
- on_proc_exit(SharedFileSetDeleteOnProcExit, 0);
+
+ /*
+ * Register before-shmem-exit hook to ensure fileset is dropped
+ * while we can still report stats for underlying temporary files.
+ */
+ before_shmem_exit(SharedFileSetDeleteBeforeShmemExit, 0);
registered_cleanup = true;
}
@@ -259,12 +264,12 @@ SharedFileSetOnDetach(dsm_segment *segment, Datum datum)
}
/*
- * Callback function that will be invoked on the process exit. This will
+ * Callback function that will be invoked before shmem exit. This will
* process the list of all the registered sharedfilesets and delete the
* underlying files.
*/
static void
-SharedFileSetDeleteOnProcExit(int status, Datum arg)
+SharedFileSetDeleteBeforeShmemExit(int status, Datum arg)
{
/*
* Remove all the pending shared fileset entries. We don't use foreach()
--
2.24.3 (Apple Git-128)
On Thu, Aug 12, 2021 at 7:39 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Aug 11, 2021 at 10:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Aug 10, 2021 at 4:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
The apply worker registers SharedFileSetDeleteOnProcExit() when
creating a file set to serialize the changes. When it raises an error
due to conflict during applying the change, the callback eventually
reports the temp file statistics but pgstat already shut down,
resulting in this assertion failure.

I think we can try to fix this by registering to clean up these files
via before_shmem_exit() as done by Andres in commit 675c945394.
Similar to that commit, we can change the function name
SharedFileSetDeleteOnProcExit to SharedFileSetDeleteOnShmExit and
register it via before_shmem_exit() instead of on_proc_exit(). Can you
try that and see if it fixes the issue for you unless you have better
ideas to try out?

It seems to me that moving the shared fileset cleanup to
before_shmem_exit() is the right approach to fix this problem. The
issue is fixed by the attached patch.
+1, the fix makes sense to me.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Aug 12, 2021 at 11:38 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Aug 12, 2021 at 7:39 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Aug 11, 2021 at 10:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Aug 10, 2021 at 4:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
The apply worker registers SharedFileSetDeleteOnProcExit() when
creating a file set to serialize the changes. When it raises an error
due to conflict during applying the change, the callback eventually
reports the temp file statistics but pgstat already shut down,
resulting in this assertion failure.

I think we can try to fix this by registering to clean up these files
via before_shmem_exit() as done by Andres in commit 675c945394.
Similar to that commit, we can change the function name
SharedFileSetDeleteOnProcExit to SharedFileSetDeleteOnShmExit and
register it via before_shmem_exit() instead of on_proc_exit(). Can you
try that and see if it fixes the issue for you unless you have better
ideas to try out?

It seems to me that moving the shared fileset cleanup to
before_shmem_exit() is the right approach to fix this problem. The
issue is fixed by the attached patch.

+1, the fix makes sense to me.
I have also tested and fix works for me. The fix works because
pgstat_initialize() is called before we register clean up in
SharedFileSetInit(). I am not sure if we need an Assert to ensure that
and if so how we can do that? Any suggestions?
--
With Regards,
Amit Kapila.
On Thu, Aug 12, 2021 at 3:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Aug 12, 2021 at 11:38 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Aug 12, 2021 at 7:39 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Aug 11, 2021 at 10:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Aug 10, 2021 at 4:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
The apply worker registers SharedFileSetDeleteOnProcExit() when
creating a file set to serialize the changes. When it raises an error
due to conflict during applying the change, the callback eventually
reports the temp file statistics but pgstat already shut down,
resulting in this assertion failure.

I think we can try to fix this by registering to clean up these files
via before_shmem_exit() as done by Andres in commit 675c945394.
Similar to that commit, we can change the function name
SharedFileSetDeleteOnProcExit to SharedFileSetDeleteOnShmExit and
register it via before_shmem_exit() instead of on_proc_exit(). Can you
try that and see if it fixes the issue for you unless you have better
ideas to try out?

It seems to me that moving the shared fileset cleanup to
before_shmem_exit() is the right approach to fix this problem. The
issue is fixed by the attached patch.

+1, the fix makes sense to me.
I have also tested and fix works for me. The fix works because
pgstat_initialize() is called before we register clean up in
SharedFileSetInit(). I am not sure if we need an Assert to ensure that
and if so how we can do that? Any suggestions?
I think that the assertion added by ee3f8d3d3ae ensures that
pgstat_initialize() is called before the callback for fileset cleanup
is registered, no?
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Thu, Aug 12, 2021 at 1:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Aug 12, 2021 at 3:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Aug 12, 2021 at 11:38 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Aug 12, 2021 at 7:39 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Aug 11, 2021 at 10:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Aug 10, 2021 at 4:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
The apply worker registers SharedFileSetDeleteOnProcExit() when
creating a file set to serialize the changes. When it raises an error
due to conflict during applying the change, the callback eventually
reports the temp file statistics but pgstat already shut down,
resulting in this assertion failure.

I think we can try to fix this by registering to clean up these files
via before_shmem_exit() as done by Andres in commit 675c945394.
Similar to that commit, we can change the function name
SharedFileSetDeleteOnProcExit to SharedFileSetDeleteOnShmExit and
register it via before_shmem_exit() instead of on_proc_exit(). Can you
try that and see if it fixes the issue for you unless you have better
ideas to try out?

It seems to me that moving the shared fileset cleanup to
before_shmem_exit() is the right approach to fix this problem. The
issue is fixed by the attached patch.

+1, the fix makes sense to me.
I have also tested and fix works for me. The fix works because
pgstat_initialize() is called before we register clean up in
SharedFileSetInit(). I am not sure if we need an Assert to ensure that
and if so how we can do that? Any suggestions?

I think that the assertion added by ee3f8d3d3ae ensures that
pgstat_initialize() is called before the callback for fileset cleanup
is registered, no?
Right, it ensures that the callback for the fileset is called after
pgstat_initialize() and before pgstat_shutdown.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Aug 12, 2021 at 1:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Aug 12, 2021 at 3:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have also tested and fix works for me. The fix works because
pgstat_initialize() is called before we register clean up in
SharedFileSetInit(). I am not sure if we need an Assert to ensure that
and if so how we can do that? Any suggestions?

I think that the assertion added by ee3f8d3d3ae ensures that
pgstat_initialize() is called before the callback for fileset cleanup
is registered, no?
I think I am missing something here, can you please explain?
--
With Regards,
Amit Kapila.
Hi,
On 2021-08-12 11:46:09 +0530, Amit Kapila wrote:
On Thu, Aug 12, 2021 at 11:38 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Aug 12, 2021 at 7:39 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
It seems to me that moving the shared fileset cleanup to
before_shmem_exit() is the right approach to fix this problem. The
issue is fixed by the attached patch.

+1, the fix makes sense to me.
I'm not so sure. Why does sharedfileset have its own proc exit hook in the
first place? ISTM that this should be dealt with using resowners, rather than
a sharedfileset specific mechanism?
That said, I think it's fine to go for the ordering change in the short term.
I have also tested and fix works for me. The fix works because
pgstat_initialize() is called before we register clean up in
SharedFileSetInit(). I am not sure if we need an Assert to ensure that
and if so how we can do that? Any suggestions?
I don't think we need to assert that - we'd see failures soon enough if
that rule were violated...
Greetings,
Andres Freund
On Thu, Aug 12, 2021 at 1:52 PM Andres Freund <andres@anarazel.de> wrote:
On 2021-08-12 11:46:09 +0530, Amit Kapila wrote:
On Thu, Aug 12, 2021 at 11:38 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Aug 12, 2021 at 7:39 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
It seems to me that moving the shared fileset cleanup to
before_shmem_exit() is the right approach to fix this problem. The
issue is fixed by the attached patch.

+1, the fix makes sense to me.
I'm not so sure. Why does sharedfileset have its own proc exit hook in the
first place? ISTM that this should be dealt with using resowners, rather than
a sharedfileset specific mechanism?
The underlying temporary files need to be closed at xact end but need
to survive across transactions. These are registered with the resource
owner via PathNameOpenTemporaryFile/PathNameCreateTemporaryFile and
then closed at xact end. So, we need a way to remove the files used by
the process (apply worker in this particular case) before process exit
and used this proc_exit hook (possibly on the lines of
AtProcExit_Files).
That said, I think it's fine to go for the ordering change in the short term.
I have also tested and fix works for me. The fix works because
pgstat_initialize() is called before we register clean up in
SharedFileSetInit(). I am not sure if we need an Assert to ensure that
and if so how we can do that? Any suggestions?

I don't think we need to assert that - we'd see failures soon enough if
that rule were violated...
Fair enough.
--
With Regards,
Amit Kapila.
Hi,
On 2021-08-12 15:06:23 +0530, Amit Kapila wrote:
On Thu, Aug 12, 2021 at 1:52 PM Andres Freund <andres@anarazel.de> wrote:
I'm not so sure. Why does sharedfileset have its own proc exit hook in the
first place? ISTM that this should be dealt with using resowners, rather than
a sharedfileset specific mechanism?
The underlying temporary files need to be closed at xact end but need
to survive across transactions.
Why do they need to be closed at xact end? To avoid wasting memory due to too
many buffered files?
These are registered with the resource owner via
PathNameOpenTemporaryFile/PathNameCreateTemporaryFile and then closed
at xact end. So, we need a way to remove the files used by the process
(apply worker in this particular case) before process exit and used
this proc_exit hook (possibly on the lines of AtProcExit_Files).
What I'm wondering is why it is a good idea to have a SharedFileSet specific
cleanup mechanism. One that only operates on process lifetime level, rather
than something more granular. I get that the lifetime of the files here needs to be
longer than a transaction, but that can easily be addressed by having a longer
lived resource owner.
Process lifetime may work well for the current worker.c, but even there it
doesn't seem optimal. One e.g. could easily imagine that we'd want to handle
connection errors or configuration changes without restarting the worker, in
which case process lifetime obviously isn't a good idea anymore.
I think SharedFileSetInit() needs a comment explaining that it needs to be
called in a process-lifetime memory context if used without dsm
segments. Because otherwise SharedFileSetDeleteOnProcExit() will access
already freed memory (both for filesetlist and the SharedFileSet itself).
Greetings,
Andres Freund
Hi,
On 2021-08-12 05:48:19 -0700, Andres Freund wrote:
I think SharedFileSetInit() needs a comment explaining that it needs to be
called in a process-lifetime memory context if used without dsm
segments. Because otherwise SharedFileSetDeleteOnProcExit() will access
already freed memory (both for filesetlist and the SharedFileSet itself).
Oh. And I think it's not ok that SharedFileSetDeleteAll() unconditionally does
SharedFileSetUnregister(). SharedFileSetUnregister() asserts out if there's no
match, but DSM based sets are never entered into filesetlist. So one cannot
have a non-DSM and DSM set coexisting. Which seems surprising.
Greetings,
Andres Freund
On Thu, Aug 12, 2021 at 6:18 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2021-08-12 15:06:23 +0530, Amit Kapila wrote:
On Thu, Aug 12, 2021 at 1:52 PM Andres Freund <andres@anarazel.de> wrote:
I'm not so sure. Why does sharedfileset have its own proc exit hook in the
first place? ISTM that this should be dealt with using resowners, rather than
a sharedfileset specific mechanism?

The underlying temporary files need to be closed at xact end but need
to survive across transactions.

Why do they need to be closed at xact end? To avoid wasting memory due to too
many buffered files?
Yes.
These are registered with the resource owner via
PathNameOpenTemporaryFile/PathNameCreateTemporaryFile and then closed
at xact end. So, we need a way to remove the files used by the process
(apply worker in this particular case) before process exit and used
this proc_exit hook (possibly on the lines of AtProcExit_Files).

What I'm wondering is why it is a good idea to have a SharedFileSet specific
cleanup mechanism. One that only operates on process lifetime level, rather
than something more granular. I get that the lifetime of the files here needs to be
longer than a transaction, but that can easily be addressed by having a longer
lived resource owner.

Process lifetime may work well for the current worker.c, but even there it
doesn't seem optimal. One e.g. could easily imagine that we'd want to handle
connection errors or configuration changes without restarting the worker, in
which case process lifetime obviously isn't a good idea anymore.
I don't deny that we can't make this at a more granular level. IIRC,
at that time, we tried to follow AtProcExit_Files which cleans up temp
files at proc exit and we needed something similar for temporary files
used via SharedFileSet. I think we can extend this API but I guess it
is better to then do it for dsm-based as well so that these get
tracked via resowner.
I think SharedFileSetInit() needs a comment explaining that it needs to be
called in a process-lifetime memory context if used without dsm
segments.
We already have a comment about proc_exit clean up of files but will
extend that a bit about memory context.
--
With Regards,
Amit Kapila.
On Thu, Aug 12, 2021 at 6:24 PM Andres Freund <andres@anarazel.de> wrote:
On 2021-08-12 05:48:19 -0700, Andres Freund wrote:
I think SharedFileSetInit() needs a comment explaining that it needs to be
called in a process-lifetime memory context if used without dsm
segments. Because otherwise SharedFileSetDeleteOnProcExit() will access
already freed memory (both for filesetlist and the SharedFileSet itself).

Oh. And I think it's not ok that SharedFileSetDeleteAll() unconditionally does
SharedFileSetUnregister(). SharedFileSetUnregister() asserts out if there's no
match, but DSM based sets are never entered into filesetlist. So one cannot
have a non-DSM and DSM set coexisting. Which seems surprising.
Oops, it should be allowed to have both non-DSM and DSM set
coexisting. I think we can remove Assert from
SharedFileSetUnregister(). The other way could be to pass a parameter
to SharedFileSetDeleteAll() to tell whether to unregister or not.
--
With Regards,
Amit Kapila.
Hi,
(dropping -committers to avoid moderation stalls due to xposting to multiple lists -
I find that more annoying than helpful)
On 2021-08-13 14:38:37 +0530, Amit Kapila wrote:
What I'm wondering is why it is a good idea to have a SharedFileSet specific
cleanup mechanism. One that only operates on process lifetime level, rather
than something more granular. I get that the lifetime of the files here needs
to be longer than a transaction, but that can easily be addressed by having a
longer lived resource owner.

Process lifetime may work well for the current worker.c, but even there it
doesn't seem optimal. One e.g. could easily imagine that we'd want to handle
connection errors or configuration changes without restarting the worker, in
which case process lifetime obviously isn't a good idea anymore.

I don't deny that we could make this more granular. IIRC,
at that time, we tried to follow AtProcExit_Files which cleans up temp
files at proc exit and we needed something similar for temporary files
used via SharedFileSet.
The comparison to AtProcExit_Files isn't convincing to me - normally temp
files are cleaned up long before that via resowner cleanup.
To me the reuse of SharedFileSet for worker.c as executed seems like a bad
design. As things stand there's little code shared between dsm/non-dsm shared
file sets. The cleanup semantics are entirely different. Several functions
don't work if used on the "wrong kind" of set (e.g. SharedFileSetAttach()).
I think we can extend this API but I guess it is better to then do it
for dsm-based as well so that these get tracked via resowner.
DSM segments are resowner managed already, so it's not obvious that that'd buy
us much? Although I guess there could be a few edge cases that'd look cleaner,
because we could reliably trigger cleanup in the leader instead of relying on
dsm detach hooks + refcounts to manage when a set is physically deleted?
Greetings,
Andres Freund
On Fri, Aug 13, 2021 at 9:29 PM Andres Freund <andres@anarazel.de> wrote:
I think we can extend this API but I guess it is better to then do it
for dsm-based as well so that these get tracked via resowner.

DSM segments are resowner managed already, so it's not obvious that that'd buy
us much? Although I guess there could be a few edge cases that'd look cleaner,
because we could reliably trigger cleanup in the leader instead of relying on
dsm detach hooks + refcounts to manage when a set is physically deleted?
In an off-list discussion with Thomas and Amit, we tried to discuss
how to clean up the shared fileset in the current use case. Thomas
suggested that, instead of using an individual shared fileset for storing
the data for each XID, why don't we just create a single shared fileset
for the complete worker lifetime, and when the worker is exiting we can
just remove that shared fileset. And for storing XID data, we can
just create the files under the same shared fileset and delete those
files when we no longer need them. I like this idea and it looks much
cleaner, after this, we can get rid of the special cleanup mechanism
using 'filesetlist'. I have attached a patch for the same.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
0001-Better-usage-of-sharedfileset-in-apply-worker.patch (application/octet-stream)
From 94c6673707af74638f85c33ae33c253c22f9ad63 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Mon, 16 Aug 2021 15:49:12 +0530
Subject: [PATCH] Better usage of sharedfileset in apply worker
Instead of using a separate shared fileset for each xid, use one shared
fileset for the whole lifetime of the worker. So for each xid, just create
a shared buffile under that shared fileset and remove the file whenever we
are done with it. The subxact file only needs to be created once we get
the first subtransaction, and to detect that we also extend the buffile
open and delete interfaces to tolerate missing files.
---
src/backend/replication/logical/worker.c | 217 +++++++-----------------------
src/backend/storage/file/buffile.c | 21 ++-
src/backend/storage/file/sharedfileset.c | 79 -----------
src/backend/utils/sort/logtape.c | 2 +-
src/backend/utils/sort/sharedtuplestore.c | 2 +-
src/include/storage/buffile.h | 5 +-
6 files changed, 68 insertions(+), 258 deletions(-)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index ecaed15..b747593 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -221,20 +221,6 @@ typedef struct ApplyExecutionData
PartitionTupleRouting *proute; /* partition routing info */
} ApplyExecutionData;
-/*
- * Stream xid hash entry. Whenever we see a new xid we create this entry in the
- * xidhash and along with it create the streaming file and store the fileset handle.
- * The subxact file is created iff there is any subxact info under this xid. This
- * entry is used on the subsequent streams for the xid to get the corresponding
- * fileset handles, so storing them in hash makes the search faster.
- */
-typedef struct StreamXidHash
-{
- TransactionId xid; /* xid is the hash key and must be first */
- SharedFileSet *stream_fileset; /* shared file set for stream data */
- SharedFileSet *subxact_fileset; /* shared file set for subxact info */
-} StreamXidHash;
-
static MemoryContext ApplyMessageContext = NULL;
MemoryContext ApplyContext = NULL;
@@ -255,10 +241,12 @@ static bool in_streamed_transaction = false;
static TransactionId stream_xid = InvalidTransactionId;
/*
- * Hash table for storing the streaming xid information along with shared file
- * set for streaming and subxact files.
+ * Shared fileset for storing the changes and subxact information for the
+ * streaming transaction. We will use only one shared fileset and for each
+ * xid a separate changes and subxact files will be created under the same
+ * shared fileset.
*/
-static HTAB *xidhash = NULL;
+static SharedFileSet *xidfileset = NULL;
/* BufFile handle of the current streaming file */
static BufFile *stream_fd = NULL;
@@ -1129,7 +1117,6 @@ static void
apply_handle_stream_start(StringInfo s)
{
bool first_segment;
- HASHCTL hash_ctl;
if (in_streamed_transaction)
ereport(ERROR,
@@ -1157,17 +1144,22 @@ apply_handle_stream_start(StringInfo s)
errmsg_internal("invalid transaction ID in streamed replication transaction")));
/*
- * Initialize the xidhash table if we haven't yet. This will be used for
+ * Initialize the xidfileset if we haven't yet. This will be used for
* the entire duration of the apply worker so create it in permanent
* context.
*/
- if (xidhash == NULL)
+ if (xidfileset == NULL)
{
- hash_ctl.keysize = sizeof(TransactionId);
- hash_ctl.entrysize = sizeof(StreamXidHash);
- hash_ctl.hcxt = ApplyContext;
- xidhash = hash_create("StreamXidHash", 1024, &hash_ctl,
- HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ MemoryContext oldctx;
+
+ /*
+ * We need to maintain shared fileset across multiple stream
+ * start/stop calls. So, need to allocate it in a persistent context.
+ */
+ oldctx = MemoryContextSwitchTo(ApplyContext);
+ xidfileset = palloc(sizeof(SharedFileSet));
+ SharedFileSetInit(xidfileset, NULL);
+ MemoryContextSwitchTo(oldctx);
}
/* open the spool file for this transaction */
@@ -1258,7 +1250,6 @@ apply_handle_stream_abort(StringInfo s)
BufFile *fd;
bool found = false;
char path[MAXPGPATH];
- StreamXidHash *ent;
subidx = -1;
begin_replication_step();
@@ -1287,19 +1278,9 @@ apply_handle_stream_abort(StringInfo s)
return;
}
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
/* open the changes file */
changes_filename(path, MyLogicalRepWorker->subid, xid);
- fd = BufFileOpenShared(ent->stream_fileset, path, O_RDWR);
+ fd = BufFileOpenShared(xidfileset, path, O_RDWR, false);
/* OK, truncate the file at the right offset */
BufFileTruncateShared(fd, subxact_data.subxacts[subidx].fileno,
@@ -1327,7 +1308,6 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
int nchanges;
char path[MAXPGPATH];
char *buffer = NULL;
- StreamXidHash *ent;
MemoryContext oldcxt;
BufFile *fd;
@@ -1345,17 +1325,7 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
changes_filename(path, MyLogicalRepWorker->subid, xid);
elog(DEBUG1, "replaying changes from file \"%s\"", path);
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
- fd = BufFileOpenShared(ent->stream_fileset, path, O_RDONLY);
+ fd = BufFileOpenShared(xidfileset, path, O_RDONLY, false);
buffer = palloc(BLCKSZ);
initStringInfo(&s2);
@@ -2509,6 +2479,16 @@ UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
}
/*
+* Cleanup shared fileset if created.
+*/
+static void
+worker_cleanup(int code, Datum arg)
+{
+ if (xidfileset != NULL)
+ SharedFileSetDeleteAll(xidfileset);
+}
+
+/*
* Apply main loop.
*/
static void
@@ -2534,6 +2514,9 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
"LogicalStreamingContext",
ALLOCSET_DEFAULT_SIZES);
+ /* do cleanup on worker exit (e.g. after DROP SUBSCRIPTION) */
+ on_shmem_exit(worker_cleanup, (Datum) 0);
+
/* mark as idle, before starting to loop */
pgstat_report_activity(STATE_IDLE, NULL);
@@ -2957,18 +2940,11 @@ subxact_info_write(Oid subid, TransactionId xid)
{
char path[MAXPGPATH];
Size len;
- StreamXidHash *ent;
BufFile *fd;
Assert(TransactionIdIsValid(xid));
- /* Find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- /* By this time we must have created the transaction entry */
- Assert(ent);
+ subxact_filename(path, subid, xid);
/*
* If there is no subtransaction then nothing to do, but if already have
@@ -2976,39 +2952,17 @@ subxact_info_write(Oid subid, TransactionId xid)
*/
if (subxact_data.nsubxacts == 0)
{
- if (ent->subxact_fileset)
- {
- cleanup_subxact_info();
- SharedFileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
+ cleanup_subxact_info();
+ BufFileDeleteShared(xidfileset, path, true);
return;
}
subxact_filename(path, subid, xid);
- /*
- * Create the subxact file if it not already created, otherwise open the
- * existing file.
- */
- if (ent->subxact_fileset == NULL)
- {
- MemoryContext oldctx;
-
- /*
- * We need to maintain shared fileset across multiple stream
- * start/stop calls. So, need to allocate it in a persistent context.
- */
- oldctx = MemoryContextSwitchTo(ApplyContext);
- ent->subxact_fileset = palloc(sizeof(SharedFileSet));
- SharedFileSetInit(ent->subxact_fileset, NULL);
- MemoryContextSwitchTo(oldctx);
-
- fd = BufFileCreateShared(ent->subxact_fileset, path);
- }
- else
- fd = BufFileOpenShared(ent->subxact_fileset, path, O_RDWR);
+ /* Try to open the subxact file, if it doesn't exist then create it */
+ fd = BufFileOpenShared(xidfileset, path, O_RDWR, true);
+ if (fd == NULL)
+ fd = BufFileCreateShared(xidfileset, path);
len = sizeof(SubXactInfo) * subxact_data.nsubxacts;
@@ -3035,35 +2989,22 @@ subxact_info_read(Oid subid, TransactionId xid)
char path[MAXPGPATH];
Size len;
BufFile *fd;
- StreamXidHash *ent;
MemoryContext oldctx;
Assert(!subxact_data.subxacts);
Assert(subxact_data.nsubxacts == 0);
Assert(subxact_data.nsubxacts_max == 0);
- /* Find the stream xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
+ subxact_filename(path, subid, xid);
/*
- * If subxact_fileset is not valid that mean we don't have any subxact
- * info
+ * Open the subxact file. If the subxact file was not created, that means
+ * we don't have any subxact info, so there is nothing to be done.
*/
- if (ent->subxact_fileset == NULL)
+ fd = BufFileOpenShared(xidfileset, path, O_RDONLY, true);
+ if (fd == NULL)
return;
- subxact_filename(path, subid, xid);
-
- fd = BufFileOpenShared(ent->subxact_fileset, path, O_RDONLY);
-
/* read number of subxact items */
if (BufFileRead(fd, &subxact_data.nsubxacts,
sizeof(subxact_data.nsubxacts)) !=
@@ -3204,36 +3145,14 @@ static void
stream_cleanup_files(Oid subid, TransactionId xid)
{
char path[MAXPGPATH];
- StreamXidHash *ent;
-
- /* Find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
/* Delete the change file and release the stream fileset memory */
changes_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->stream_fileset);
- pfree(ent->stream_fileset);
- ent->stream_fileset = NULL;
+ BufFileDeleteShared(xidfileset, path, false);
/* Delete the subxact file and release the memory, if it exist */
- if (ent->subxact_fileset)
- {
- subxact_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
-
- /* Remove the xid entry from the stream xid hash */
- hash_search(xidhash, (void *) &xid, HASH_REMOVE, NULL);
+ subxact_filename(path, subid, xid);
+ SharedFileSetDelete(xidfileset, path, true);
}
/*
@@ -3253,21 +3172,13 @@ static void
stream_open_file(Oid subid, TransactionId xid, bool first_segment)
{
char path[MAXPGPATH];
- bool found;
MemoryContext oldcxt;
- StreamXidHash *ent;
Assert(in_streamed_transaction);
Assert(OidIsValid(subid));
Assert(TransactionIdIsValid(xid));
Assert(stream_fd == NULL);
- /* create or find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_ENTER,
- &found);
-
changes_filename(path, subid, xid);
elog(DEBUG1, "opening file \"%s\" for streamed changes", path);
@@ -3283,44 +3194,14 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
* writing, in append mode.
*/
if (first_segment)
- {
- MemoryContext savectx;
- SharedFileSet *fileset;
-
- if (found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
-
- /*
- * We need to maintain shared fileset across multiple stream
- * start/stop calls. So, need to allocate it in a persistent context.
- */
- savectx = MemoryContextSwitchTo(ApplyContext);
- fileset = palloc(sizeof(SharedFileSet));
-
- SharedFileSetInit(fileset, NULL);
- MemoryContextSwitchTo(savectx);
-
- stream_fd = BufFileCreateShared(fileset, path);
-
- /* Remember the fileset for the next stream of the same transaction */
- ent->xid = xid;
- ent->stream_fileset = fileset;
- ent->subxact_fileset = NULL;
- }
+ stream_fd = BufFileCreateShared(xidfileset, path);
else
{
- if (!found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
-
/*
* Open the file and seek to the end of the file because we always
* append the changes file.
*/
- stream_fd = BufFileOpenShared(ent->stream_fileset, path, O_RDWR);
+ stream_fd = BufFileOpenShared(xidfileset, path, O_RDWR, false);
BufFileSeek(stream_fd, 0, 0, SEEK_END);
}
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index a4be5fe..99a1d3d 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -278,10 +278,12 @@ BufFileCreateShared(SharedFileSet *fileset, const char *name)
* with BufFileCreateShared in the same SharedFileSet using the same name.
* The backend that created the file must have called BufFileClose() or
* BufFileExportShared() to make sure that it is ready to be opened by other
- * backends and render it read-only.
+ * backends and render it read-only. If missing_ok is true then it will
+ * return NULL if the file does not exist, otherwise it will raise an error.
*/
BufFile *
-BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
+BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode,
+ bool missing_ok)
{
BufFile *file;
char segment_name[MAXPGPATH];
@@ -318,10 +320,14 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
* name.
*/
if (nfiles == 0)
+ {
+ if (missing_ok)
+ return NULL;
ereport(ERROR,
(errcode_for_file_access(),
- errmsg("could not open temporary file \"%s\" from BufFile \"%s\": %m",
+ errmsg("could not open temporary file \"%s\" from BufFile \"%s\": %m",
segment_name, name)));
+ }
file = makeBufFileCommon(nfiles);
file->files = files;
@@ -341,10 +347,11 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
* the SharedFileSet to be cleaned up.
*
* Only one backend should attempt to delete a given name, and should know
- * that it exists and has been exported or closed.
+ * that it exists and has been exported or closed, otherwise missing_ok
+ * should be passed as true.
*/
void
-BufFileDeleteShared(SharedFileSet *fileset, const char *name)
+BufFileDeleteShared(SharedFileSet *fileset, const char *name, bool missing_ok)
{
char segment_name[MAXPGPATH];
int segment = 0;
@@ -358,7 +365,7 @@ BufFileDeleteShared(SharedFileSet *fileset, const char *name)
for (;;)
{
SharedSegmentName(segment_name, name, segment);
- if (!SharedFileSetDelete(fileset, segment_name, true))
+ if (!SharedFileSetDelete(fileset, segment_name, !missing_ok))
break;
found = true;
++segment;
@@ -366,7 +373,7 @@ BufFileDeleteShared(SharedFileSet *fileset, const char *name)
CHECK_FOR_INTERRUPTS();
}
- if (!found)
+ if (!found && !missing_ok)
elog(ERROR, "could not delete unknown shared BufFile \"%s\"", name);
}
diff --git a/src/backend/storage/file/sharedfileset.c b/src/backend/storage/file/sharedfileset.c
index ed37c94..832b0a6 100644
--- a/src/backend/storage/file/sharedfileset.c
+++ b/src/backend/storage/file/sharedfileset.c
@@ -33,10 +33,7 @@
#include "storage/sharedfileset.h"
#include "utils/builtins.h"
-static List *filesetlist = NIL;
-
static void SharedFileSetOnDetach(dsm_segment *segment, Datum datum);
-static void SharedFileSetDeleteOnProcExit(int status, Datum arg);
static void SharedFileSetPath(char *path, SharedFileSet *fileset, Oid tablespace);
static void SharedFilePath(char *path, SharedFileSet *fileset, const char *name);
static Oid ChooseTablespace(const SharedFileSet *fileset, const char *name);
@@ -101,23 +98,6 @@ SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg)
/* Register our cleanup callback. */
if (seg)
on_dsm_detach(seg, SharedFileSetOnDetach, PointerGetDatum(fileset));
- else
- {
- static bool registered_cleanup = false;
-
- if (!registered_cleanup)
- {
- /*
- * We must not have registered any fileset before registering the
- * fileset clean up.
- */
- Assert(filesetlist == NIL);
- on_proc_exit(SharedFileSetDeleteOnProcExit, 0);
- registered_cleanup = true;
- }
-
- filesetlist = lcons((void *) fileset, filesetlist);
- }
}
/*
@@ -225,9 +205,6 @@ SharedFileSetDeleteAll(SharedFileSet *fileset)
SharedFileSetPath(dirpath, fileset, fileset->tablespaces[i]);
PathNameDeleteTemporaryDir(dirpath);
}
-
- /* Unregister the shared fileset */
- SharedFileSetUnregister(fileset);
}
/*
@@ -259,62 +236,6 @@ SharedFileSetOnDetach(dsm_segment *segment, Datum datum)
}
/*
- * Callback function that will be invoked on the process exit. This will
- * process the list of all the registered sharedfilesets and delete the
- * underlying files.
- */
-static void
-SharedFileSetDeleteOnProcExit(int status, Datum arg)
-{
- /*
- * Remove all the pending shared fileset entries. We don't use foreach()
- * here because SharedFileSetDeleteAll will remove the current element in
- * filesetlist. Though we have used foreach_delete_current() to remove the
- * element from filesetlist it could only fix up the state of one of the
- * loops, see SharedFileSetUnregister.
- */
- while (list_length(filesetlist) > 0)
- {
- SharedFileSet *fileset = (SharedFileSet *) linitial(filesetlist);
-
- SharedFileSetDeleteAll(fileset);
- }
-
- filesetlist = NIL;
-}
-
-/*
- * Unregister the shared fileset entry registered for cleanup on proc exit.
- */
-void
-SharedFileSetUnregister(SharedFileSet *input_fileset)
-{
- ListCell *l;
-
- /*
- * If the caller is following the dsm based cleanup then we don't maintain
- * the filesetlist so return.
- */
- if (filesetlist == NIL)
- return;
-
- foreach(l, filesetlist)
- {
- SharedFileSet *fileset = (SharedFileSet *) lfirst(l);
-
- /* Remove the entry from the list */
- if (input_fileset == fileset)
- {
- filesetlist = foreach_delete_current(filesetlist, l);
- return;
- }
- }
-
- /* Should have found a match */
- Assert(false);
-}
-
-/*
* Build the path for the directory holding the files backing a SharedFileSet
* in a given tablespace.
*/
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index cafc087..08612f0 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -564,7 +564,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
lt = <s->tapes[i];
pg_itoa(i, filename);
- file = BufFileOpenShared(fileset, filename, O_RDONLY);
+ file = BufFileOpenShared(fileset, filename, O_RDONLY, false);
filesize = BufFileSize(file);
/*
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 57e35db..ad18991 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -559,7 +559,7 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
sts_filename(name, accessor, accessor->read_participant);
accessor->read_file =
- BufFileOpenShared(accessor->fileset, name, O_RDONLY);
+ BufFileOpenShared(accessor->fileset, name, O_RDONLY, false);
}
/* Seek and load the chunk header. */
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 566523d..3f997e1 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -49,8 +49,9 @@ extern long BufFileAppend(BufFile *target, BufFile *source);
extern BufFile *BufFileCreateShared(SharedFileSet *fileset, const char *name);
extern void BufFileExportShared(BufFile *file);
extern BufFile *BufFileOpenShared(SharedFileSet *fileset, const char *name,
- int mode);
-extern void BufFileDeleteShared(SharedFileSet *fileset, const char *name);
+ int mode, bool missing_ok);
+extern void BufFileDeleteShared(SharedFileSet *fileset, const char *name,
+ bool missing_ok);
extern void BufFileTruncateShared(BufFile *file, int fileno, off_t offset);
#endif /* BUFFILE_H */
--
1.8.3.1
On Mon, Aug 16, 2021 at 8:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Aug 13, 2021 at 9:29 PM Andres Freund <andres@anarazel.de> wrote:
I think we can extend this API but I guess it is better to then do it
for dsm-based as well so that these get tracked via resowner.

DSM segments are resowner managed already, so it's not obvious that that'd buy
us much? Although I guess there could be a few edge cases that'd look cleaner,
because we could reliably trigger cleanup in the leader instead of relying on
dsm detach hooks + refcounts to manage when a set is physically deleted?

In an off-list discussion with Thomas and Amit, we tried to discuss
how to clean up the shared files set in the current use case. Thomas
suggested that instead of using individual shared fileset for storing
the data for each XID why don't we just create a single shared fileset
for complete worker lifetime and when the worker is exiting we can
just remove that shared fileset. And for storing XID data, we can
just create the files under the same shared fileset and delete those
files when we no longer need them. I like this idea and it looks much
cleaner, after this, we can get rid of the special cleanup mechanism
using 'filesetlist'. I have attached a patch for the same.
It seems to me that this idea would obviate any need for resource
owners as we will have only one fileset now. I have a few initial
comments on the patch:
1.
+ /* do cleanup on worker exit (e.g. after DROP SUBSCRIPTION) */
+ on_shmem_exit(worker_cleanup, (Datum) 0);
It should be registered with before_shmem_exit() hook to allow sending
stats for file removal.
2. After these changes, the comments atop stream_open_file and
SharedFileSetInit need to be changed.
3. In function subxact_info_write(), we are computing subxact file
path twice which doesn't seem to be required.
4.
+ if (missing_ok)
+ return NULL;
ereport(ERROR,
(errcode_for_file_access(),
- errmsg("could not open temporary file \"%s\" from BufFile \"%s\": %m",
+ errmsg("could not open temporary file \"%s\" from BufFile \"%s\": %m",
segment_name, name)));
There seems to be a formatting issue with errmsg. Also, it is better
to keep an empty line before ereport.
5. How can we provide a strict mechanism to disallow using dsm APIs
for a non-dsm FileSet? One idea could be that we can have a variable
(probably bool) in SharedFileSet structure which will be initialized
in SharedFileSetInit based on whether the caller has provided dsm
segment. Then in other DSM-based APIs, we can check if it is used for
the wrong type.
--
With Regards,
Amit Kapila.
On Tue, Aug 17, 2021 at 10:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Aug 16, 2021 at 8:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Aug 13, 2021 at 9:29 PM Andres Freund <andres@anarazel.de> wrote:
I think we can extend this API but I guess it is better to then do it
for dsm-based as well so that these get tracked via resowner.

DSM segments are resowner managed already, so it's not obvious that that'd buy
us much? Although I guess there could be a few edge cases that'd look cleaner,
because we could reliably trigger cleanup in the leader instead of relying on
dsm detach hooks + refcounts to manage when a set is physically deleted?

In an off-list discussion with Thomas and Amit, we tried to discuss
how to clean up the shared files set in the current use case. Thomas
suggested that instead of using individual shared fileset for storing
the data for each XID why don't we just create a single shared fileset
for complete worker lifetime and when the worker is exiting we can
just remove that shared fileset. And for storing XID data, we can
just create the files under the same shared fileset and delete those
files when we no longer need them. I like this idea and it looks much
cleaner, after this, we can get rid of the special cleanup mechanism
using 'filesetlist'. I have attached a patch for the same.

It seems to me that this idea would obviate any need for resource
owners as we will have only one fileset now. I have a few initial
comments on the patch:
One more comment:
@@ -2976,39 +2952,17 @@ subxact_info_write(Oid subid, TransactionId xid)
..
+ /* Try to open the subxact file, if it doesn't exist then create it */
+ fd = BufFileOpenShared(xidfileset, path, O_RDWR, true);
+ if (fd == NULL)
+ fd = BufFileCreateShared(xidfileset, path);
..
Instead of trying to create the file here based on whether it exists
or not, can't we create it in subxact_info_add where we allocate memory
for subxacts for the first time? If that works then in the above code,
the file should always exist.
--
With Regards,
Amit Kapila.
On Tue, Aug 17, 2021 at 12:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
One more comment:

@@ -2976,39 +2952,17 @@ subxact_info_write(Oid subid, TransactionId xid)
..
+ /* Try to open the subxact file, if it doesn't exist then create it */
+ fd = BufFileOpenShared(xidfileset, path, O_RDWR, true);
+ if (fd == NULL)
+ fd = BufFileCreateShared(xidfileset, path);
..

Instead of trying to create the file here based on whether it exists
or not, can't we create it in subxact_info_add where we allocate memory
for subxacts for the first time? If that works then in the above code,
the file should always exist.
One problem with this approach is that for now we delay creating the
subxact file till the end of the stream, and if by the end of the stream
all the subtransactions got aborted within the same stream then we
don't even create that file. But with this suggestion, we would always
create the file as soon as we get the first subtransaction.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Aug 17, 2021 at 10:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Aug 16, 2021 at 8:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Aug 13, 2021 at 9:29 PM Andres Freund <andres@anarazel.de> wrote:
I think we can extend this API but I guess it is better to then do it
for dsm-based as well so that these get tracked via resowner.

DSM segments are resowner managed already, so it's not obvious that that'd buy
us much? Although I guess there could be a few edge cases that'd look cleaner,
because we could reliably trigger cleanup in the leader instead of relying on
dsm detach hooks + refcounts to manage when a set is physically deleted?

In an off-list discussion with Thomas and Amit, we tried to discuss
how to clean up the shared files set in the current use case. Thomas
suggested that instead of using individual shared fileset for storing
the data for each XID why don't we just create a single shared fileset
for complete worker lifetime and when the worker is exiting we can
just remove that shared fileset. And for storing XID data, we can
just create the files under the same shared fileset and delete those
files when we no longer need them. I like this idea and it looks much
cleaner, after this, we can get rid of the special cleanup mechanism
using 'filesetlist'. I have attached a patch for the same.

It seems to me that this idea would obviate any need for resource
owners as we will have only one fileset now. I have a few initial
comments on the patch:

1.
+ /* do cleanup on worker exit (e.g. after DROP SUBSCRIPTION) */
+ on_shmem_exit(worker_cleanup, (Datum) 0);

It should be registered with before_shmem_exit() hook to allow sending
stats for file removal.
Done
2. After these changes, the comments atop stream_open_file and
SharedFileSetInit need to be changed.
Done
3. In function subxact_info_write(), we are computing subxact file
path twice which doesn't seem to be required.
Fixed
4.
+ if (missing_ok)
+ return NULL;
ereport(ERROR,
(errcode_for_file_access(),
- errmsg("could not open temporary file \"%s\" from BufFile \"%s\": %m",
+ errmsg("could not open temporary file \"%s\" from BufFile \"%s\": %m",
segment_name, name)));

There seems to be a formatting issue with errmsg. Also, it is better
to keep an empty line before ereport.
Done
5. How can we provide a strict mechanism to disallow using dsm APIs
for a non-dsm FileSet? One idea could be that we can have a variable
(probably bool) in SharedFileSet structure which will be initialized
in SharedFileSetInit based on whether the caller has provided dsm
segment. Then in other DSM-based APIs, we can check if it is used for
the wrong type.
Yeah, we can do something like that. Can't we just use an existing
variable instead of adding a new one? E.g. refcnt is required only when
multiple processes are attached, so maybe if the dsm segment is not passed
then we can keep refcnt as 0 and based on that we can give an error. For
example, if we try to call SharedFileSetAttach for a SharedFileSet
which has refcnt as 0 then we error out?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v2-0001-Better-usage-of-sharedfileset-in-apply-worker.patch (application/octet-stream)
From ec692a4880f2316d8f3bfe6a6f14f752bb05a757 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Mon, 16 Aug 2021 15:49:12 +0530
Subject: [PATCH v2] Better usage of sharedfileset in apply worker
Instead of using a separate shared fileset for each xid, use one shared
fileset for whole lifetime of the worker. So for each xid, just create
shared buffile under that shared fileset and remove the file whenever we
are done with the file. For subxact file we only need to create once
we get the first subtransaction and for detecting that we also extend the
buffile open and buffile delete interfaces to allow the missing files.
---
src/backend/replication/logical/worker.c | 226 +++++++-----------------------
src/backend/storage/file/buffile.c | 20 ++-
src/backend/storage/file/sharedfileset.c | 79 -----------
src/backend/utils/sort/logtape.c | 2 +-
src/backend/utils/sort/sharedtuplestore.c | 2 +-
src/include/storage/buffile.h | 5 +-
6 files changed, 73 insertions(+), 261 deletions(-)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index ecaed15..a2481b3 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -221,20 +221,6 @@ typedef struct ApplyExecutionData
PartitionTupleRouting *proute; /* partition routing info */
} ApplyExecutionData;
-/*
- * Stream xid hash entry. Whenever we see a new xid we create this entry in the
- * xidhash and along with it create the streaming file and store the fileset handle.
- * The subxact file is created iff there is any subxact info under this xid. This
- * entry is used on the subsequent streams for the xid to get the corresponding
- * fileset handles, so storing them in hash makes the search faster.
- */
-typedef struct StreamXidHash
-{
- TransactionId xid; /* xid is the hash key and must be first */
- SharedFileSet *stream_fileset; /* shared file set for stream data */
- SharedFileSet *subxact_fileset; /* shared file set for subxact info */
-} StreamXidHash;
-
static MemoryContext ApplyMessageContext = NULL;
MemoryContext ApplyContext = NULL;
@@ -255,10 +241,12 @@ static bool in_streamed_transaction = false;
static TransactionId stream_xid = InvalidTransactionId;
/*
- * Hash table for storing the streaming xid information along with shared file
- * set for streaming and subxact files.
+ * Shared fileset for storing the changes and subxact information for the
+ * streaming transaction. We will use only one shared fileset and for each
+ * xid a separate changes and subxact files will be created under the same
+ * shared fileset.
*/
-static HTAB *xidhash = NULL;
+static SharedFileSet *xidfileset = NULL;
/* BufFile handle of the current streaming file */
static BufFile *stream_fd = NULL;
@@ -1129,7 +1117,6 @@ static void
apply_handle_stream_start(StringInfo s)
{
bool first_segment;
- HASHCTL hash_ctl;
if (in_streamed_transaction)
ereport(ERROR,
@@ -1157,17 +1144,22 @@ apply_handle_stream_start(StringInfo s)
errmsg_internal("invalid transaction ID in streamed replication transaction")));
/*
- * Initialize the xidhash table if we haven't yet. This will be used for
+ * Initialize the xidfileset if we haven't yet. This will be used for
* the entire duration of the apply worker so create it in permanent
* context.
*/
- if (xidhash == NULL)
+ if (xidfileset == NULL)
{
- hash_ctl.keysize = sizeof(TransactionId);
- hash_ctl.entrysize = sizeof(StreamXidHash);
- hash_ctl.hcxt = ApplyContext;
- xidhash = hash_create("StreamXidHash", 1024, &hash_ctl,
- HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ MemoryContext oldctx;
+
+ /*
+ * We need to keep the shared fileset for the worker lifetime so, need
+ * to allocate it in a persistent context.
+ */
+ oldctx = MemoryContextSwitchTo(ApplyContext);
+ xidfileset = palloc(sizeof(SharedFileSet));
+ SharedFileSetInit(xidfileset, NULL);
+ MemoryContextSwitchTo(oldctx);
}
/* open the spool file for this transaction */
@@ -1258,7 +1250,6 @@ apply_handle_stream_abort(StringInfo s)
BufFile *fd;
bool found = false;
char path[MAXPGPATH];
- StreamXidHash *ent;
subidx = -1;
begin_replication_step();
@@ -1287,19 +1278,9 @@ apply_handle_stream_abort(StringInfo s)
return;
}
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
/* open the changes file */
changes_filename(path, MyLogicalRepWorker->subid, xid);
- fd = BufFileOpenShared(ent->stream_fileset, path, O_RDWR);
+ fd = BufFileOpenShared(xidfileset, path, O_RDWR, false);
/* OK, truncate the file at the right offset */
BufFileTruncateShared(fd, subxact_data.subxacts[subidx].fileno,
@@ -1327,7 +1308,6 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
int nchanges;
char path[MAXPGPATH];
char *buffer = NULL;
- StreamXidHash *ent;
MemoryContext oldcxt;
BufFile *fd;
@@ -1345,17 +1325,7 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
changes_filename(path, MyLogicalRepWorker->subid, xid);
elog(DEBUG1, "replaying changes from file \"%s\"", path);
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
- fd = BufFileOpenShared(ent->stream_fileset, path, O_RDONLY);
+ fd = BufFileOpenShared(xidfileset, path, O_RDONLY, false);
buffer = palloc(BLCKSZ);
initStringInfo(&s2);
@@ -2509,6 +2479,16 @@ UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
}
/*
+* Cleanup shared fileset if created.
+*/
+static void
+worker_cleanup(int code, Datum arg)
+{
+ if (xidfileset != NULL)
+ SharedFileSetDeleteAll(xidfileset);
+}
+
+/*
* Apply main loop.
*/
static void
@@ -2534,6 +2514,12 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
"LogicalStreamingContext",
ALLOCSET_DEFAULT_SIZES);
+ /*
+ * Register before-shmem-exit hook to ensure fileset is dropped while we
+ * can still report stats for underlying temporary files.
+ */
+ before_shmem_exit(worker_cleanup, (Datum) 0);
+
/* mark as idle, before starting to loop */
pgstat_report_activity(STATE_IDLE, NULL);
@@ -2957,18 +2943,11 @@ subxact_info_write(Oid subid, TransactionId xid)
{
char path[MAXPGPATH];
Size len;
- StreamXidHash *ent;
BufFile *fd;
Assert(TransactionIdIsValid(xid));
- /* Find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- /* By this time we must have created the transaction entry */
- Assert(ent);
+ subxact_filename(path, subid, xid);
/*
* If there is no subtransaction then nothing to do, but if already have
@@ -2976,39 +2955,15 @@ subxact_info_write(Oid subid, TransactionId xid)
*/
if (subxact_data.nsubxacts == 0)
{
- if (ent->subxact_fileset)
- {
- cleanup_subxact_info();
- SharedFileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
+ cleanup_subxact_info();
+ BufFileDeleteShared(xidfileset, path, true);
return;
}
- subxact_filename(path, subid, xid);
-
- /*
- * Create the subxact file if it not already created, otherwise open the
- * existing file.
- */
- if (ent->subxact_fileset == NULL)
- {
- MemoryContext oldctx;
-
- /*
- * We need to maintain shared fileset across multiple stream
- * start/stop calls. So, need to allocate it in a persistent context.
- */
- oldctx = MemoryContextSwitchTo(ApplyContext);
- ent->subxact_fileset = palloc(sizeof(SharedFileSet));
- SharedFileSetInit(ent->subxact_fileset, NULL);
- MemoryContextSwitchTo(oldctx);
-
- fd = BufFileCreateShared(ent->subxact_fileset, path);
- }
- else
- fd = BufFileOpenShared(ent->subxact_fileset, path, O_RDWR);
+ /* Try to open the subxact file, if it doesn't exist then create it */
+ fd = BufFileOpenShared(xidfileset, path, O_RDWR, true);
+ if (fd == NULL)
+ fd = BufFileCreateShared(xidfileset, path);
len = sizeof(SubXactInfo) * subxact_data.nsubxacts;
@@ -3035,35 +2990,22 @@ subxact_info_read(Oid subid, TransactionId xid)
char path[MAXPGPATH];
Size len;
BufFile *fd;
- StreamXidHash *ent;
MemoryContext oldctx;
Assert(!subxact_data.subxacts);
Assert(subxact_data.nsubxacts == 0);
Assert(subxact_data.nsubxacts_max == 0);
- /* Find the stream xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
+ subxact_filename(path, subid, xid);
/*
- * If subxact_fileset is not valid that mean we don't have any subxact
- * info
+ * Open the subxact file. If subxact file is not created that mean we
+ * don't have any subxact info so nothing to be done.
*/
- if (ent->subxact_fileset == NULL)
+ fd = BufFileOpenShared(xidfileset, path, O_RDONLY, true);
+ if (fd == NULL)
return;
- subxact_filename(path, subid, xid);
-
- fd = BufFileOpenShared(ent->subxact_fileset, path, O_RDONLY);
-
/* read number of subxact items */
if (BufFileRead(fd, &subxact_data.nsubxacts,
sizeof(subxact_data.nsubxacts)) !=
@@ -3204,36 +3146,14 @@ static void
stream_cleanup_files(Oid subid, TransactionId xid)
{
char path[MAXPGPATH];
- StreamXidHash *ent;
-
- /* Find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
/* Delete the change file and release the stream fileset memory */
changes_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->stream_fileset);
- pfree(ent->stream_fileset);
- ent->stream_fileset = NULL;
+ BufFileDeleteShared(xidfileset, path, false);
/* Delete the subxact file and release the memory, if it exist */
- if (ent->subxact_fileset)
- {
- subxact_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
-
- /* Remove the xid entry from the stream xid hash */
- hash_search(xidhash, (void *) &xid, HASH_REMOVE, NULL);
+ subxact_filename(path, subid, xid);
+ SharedFileSetDelete(xidfileset, path, true);
}
/*
@@ -3243,8 +3163,8 @@ stream_cleanup_files(Oid subid, TransactionId xid)
*
* Open a file for streamed changes from a toplevel transaction identified
* by stream_xid (global variable). If it's the first chunk of streamed
- * changes for this transaction, initialize the shared fileset and create the
- * buffile, otherwise open the previously created file.
+ * changes for this transaction, create the buffile, otherwise open the
+ * previously created file.
*
* This can only be called at the beginning of a "streaming" block, i.e.
* between stream_start/stream_stop messages from the upstream.
@@ -3253,21 +3173,13 @@ static void
stream_open_file(Oid subid, TransactionId xid, bool first_segment)
{
char path[MAXPGPATH];
- bool found;
MemoryContext oldcxt;
- StreamXidHash *ent;
Assert(in_streamed_transaction);
Assert(OidIsValid(subid));
Assert(TransactionIdIsValid(xid));
Assert(stream_fd == NULL);
- /* create or find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_ENTER,
- &found);
-
changes_filename(path, subid, xid);
elog(DEBUG1, "opening file \"%s\" for streamed changes", path);
@@ -3283,44 +3195,14 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
* writing, in append mode.
*/
if (first_segment)
- {
- MemoryContext savectx;
- SharedFileSet *fileset;
-
- if (found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
-
- /*
- * We need to maintain shared fileset across multiple stream
- * start/stop calls. So, need to allocate it in a persistent context.
- */
- savectx = MemoryContextSwitchTo(ApplyContext);
- fileset = palloc(sizeof(SharedFileSet));
-
- SharedFileSetInit(fileset, NULL);
- MemoryContextSwitchTo(savectx);
-
- stream_fd = BufFileCreateShared(fileset, path);
-
- /* Remember the fileset for the next stream of the same transaction */
- ent->xid = xid;
- ent->stream_fileset = fileset;
- ent->subxact_fileset = NULL;
- }
+ stream_fd = BufFileCreateShared(xidfileset, path);
else
{
- if (!found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
-
/*
* Open the file and seek to the end of the file because we always
* append the changes file.
*/
- stream_fd = BufFileOpenShared(ent->stream_fileset, path, O_RDWR);
+ stream_fd = BufFileOpenShared(xidfileset, path, O_RDWR, false);
BufFileSeek(stream_fd, 0, 0, SEEK_END);
}
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index a4be5fe..55bbc3c 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -278,10 +278,12 @@ BufFileCreateShared(SharedFileSet *fileset, const char *name)
* with BufFileCreateShared in the same SharedFileSet using the same name.
* The backend that created the file must have called BufFileClose() or
* BufFileExportShared() to make sure that it is ready to be opened by other
- * backends and render it read-only.
+ * backends and render it read-only. If missing_ok is true then it will return
+ * NULL if file doesn not exist otherwise error.
*/
BufFile *
-BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
+BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode,
+ bool missing_ok)
{
BufFile *file;
char segment_name[MAXPGPATH];
@@ -318,10 +320,15 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
* name.
*/
if (nfiles == 0)
+ {
+ if (missing_ok)
+ return NULL;
+
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not open temporary file \"%s\" from BufFile \"%s\": %m",
segment_name, name)));
+ }
file = makeBufFileCommon(nfiles);
file->files = files;
@@ -341,10 +348,11 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
* the SharedFileSet to be cleaned up.
*
* Only one backend should attempt to delete a given name, and should know
- * that it exists and has been exported or closed.
+ * that it exists and has been exported or closed otherwise missing_ok should
+ * be passed true.
*/
void
-BufFileDeleteShared(SharedFileSet *fileset, const char *name)
+BufFileDeleteShared(SharedFileSet *fileset, const char *name, bool missing_ok)
{
char segment_name[MAXPGPATH];
int segment = 0;
@@ -358,7 +366,7 @@ BufFileDeleteShared(SharedFileSet *fileset, const char *name)
for (;;)
{
SharedSegmentName(segment_name, name, segment);
- if (!SharedFileSetDelete(fileset, segment_name, true))
+ if (!SharedFileSetDelete(fileset, segment_name, !missing_ok))
break;
found = true;
++segment;
@@ -366,7 +374,7 @@ BufFileDeleteShared(SharedFileSet *fileset, const char *name)
CHECK_FOR_INTERRUPTS();
}
- if (!found)
+ if (!found && !missing_ok)
elog(ERROR, "could not delete unknown shared BufFile \"%s\"", name);
}
diff --git a/src/backend/storage/file/sharedfileset.c b/src/backend/storage/file/sharedfileset.c
index ed37c94..832b0a6 100644
--- a/src/backend/storage/file/sharedfileset.c
+++ b/src/backend/storage/file/sharedfileset.c
@@ -33,10 +33,7 @@
#include "storage/sharedfileset.h"
#include "utils/builtins.h"
-static List *filesetlist = NIL;
-
static void SharedFileSetOnDetach(dsm_segment *segment, Datum datum);
-static void SharedFileSetDeleteOnProcExit(int status, Datum arg);
static void SharedFileSetPath(char *path, SharedFileSet *fileset, Oid tablespace);
static void SharedFilePath(char *path, SharedFileSet *fileset, const char *name);
static Oid ChooseTablespace(const SharedFileSet *fileset, const char *name);
@@ -101,23 +98,6 @@ SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg)
/* Register our cleanup callback. */
if (seg)
on_dsm_detach(seg, SharedFileSetOnDetach, PointerGetDatum(fileset));
- else
- {
- static bool registered_cleanup = false;
-
- if (!registered_cleanup)
- {
- /*
- * We must not have registered any fileset before registering the
- * fileset clean up.
- */
- Assert(filesetlist == NIL);
- on_proc_exit(SharedFileSetDeleteOnProcExit, 0);
- registered_cleanup = true;
- }
-
- filesetlist = lcons((void *) fileset, filesetlist);
- }
}
/*
@@ -225,9 +205,6 @@ SharedFileSetDeleteAll(SharedFileSet *fileset)
SharedFileSetPath(dirpath, fileset, fileset->tablespaces[i]);
PathNameDeleteTemporaryDir(dirpath);
}
-
- /* Unregister the shared fileset */
- SharedFileSetUnregister(fileset);
}
/*
@@ -259,62 +236,6 @@ SharedFileSetOnDetach(dsm_segment *segment, Datum datum)
}
/*
- * Callback function that will be invoked on the process exit. This will
- * process the list of all the registered sharedfilesets and delete the
- * underlying files.
- */
-static void
-SharedFileSetDeleteOnProcExit(int status, Datum arg)
-{
- /*
- * Remove all the pending shared fileset entries. We don't use foreach()
- * here because SharedFileSetDeleteAll will remove the current element in
- * filesetlist. Though we have used foreach_delete_current() to remove the
- * element from filesetlist it could only fix up the state of one of the
- * loops, see SharedFileSetUnregister.
- */
- while (list_length(filesetlist) > 0)
- {
- SharedFileSet *fileset = (SharedFileSet *) linitial(filesetlist);
-
- SharedFileSetDeleteAll(fileset);
- }
-
- filesetlist = NIL;
-}
-
-/*
- * Unregister the shared fileset entry registered for cleanup on proc exit.
- */
-void
-SharedFileSetUnregister(SharedFileSet *input_fileset)
-{
- ListCell *l;
-
- /*
- * If the caller is following the dsm based cleanup then we don't maintain
- * the filesetlist so return.
- */
- if (filesetlist == NIL)
- return;
-
- foreach(l, filesetlist)
- {
- SharedFileSet *fileset = (SharedFileSet *) lfirst(l);
-
- /* Remove the entry from the list */
- if (input_fileset == fileset)
- {
- filesetlist = foreach_delete_current(filesetlist, l);
- return;
- }
- }
-
- /* Should have found a match */
- Assert(false);
-}
-
-/*
* Build the path for the directory holding the files backing a SharedFileSet
* in a given tablespace.
*/
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index cafc087..08612f0 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -564,7 +564,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
lt = <s->tapes[i];
pg_itoa(i, filename);
- file = BufFileOpenShared(fileset, filename, O_RDONLY);
+ file = BufFileOpenShared(fileset, filename, O_RDONLY, false);
filesize = BufFileSize(file);
/*
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 57e35db..ad18991 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -559,7 +559,7 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
sts_filename(name, accessor, accessor->read_participant);
accessor->read_file =
- BufFileOpenShared(accessor->fileset, name, O_RDONLY);
+ BufFileOpenShared(accessor->fileset, name, O_RDONLY, false);
}
/* Seek and load the chunk header. */
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 566523d..3f997e1 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -49,8 +49,9 @@ extern long BufFileAppend(BufFile *target, BufFile *source);
extern BufFile *BufFileCreateShared(SharedFileSet *fileset, const char *name);
extern void BufFileExportShared(BufFile *file);
extern BufFile *BufFileOpenShared(SharedFileSet *fileset, const char *name,
- int mode);
-extern void BufFileDeleteShared(SharedFileSet *fileset, const char *name);
+ int mode, bool missing_ok);
+extern void BufFileDeleteShared(SharedFileSet *fileset, const char *name,
+ bool missing_ok);
extern void BufFileTruncateShared(BufFile *file, int fileno, off_t offset);
#endif /* BUFFILE_H */
--
1.8.3.1
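The missing_ok flag this patch adds turns "create the subxact file if absent, otherwise open it" into a simple open-or-create idiom in subxact_info_write(). Here is a standalone sketch of that idiom using plain stdio as a stand-in for the BufFile API; open_maybe_missing and open_or_create are illustrative names, not PostgreSQL functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for BufFileOpenShared(..., missing_ok): return NULL instead
 * of raising an error when the file does not exist. */
static FILE *
open_maybe_missing(const char *path, bool missing_ok)
{
	FILE	   *fp = fopen(path, "r+");

	if (fp == NULL && !missing_ok)
		fprintf(stderr, "could not open \"%s\"\n", path);	/* ereport(ERROR)
															 * in the real code */
	return fp;
}

/* The open-or-create idiom the patch uses for the subxact file. */
static FILE *
open_or_create(const char *path)
{
	FILE	   *fp = open_maybe_missing(path, true);

	if (fp == NULL)
		fp = fopen(path, "w+");		/* BufFileCreateShared analogue */
	return fp;
}
```

The same flag also makes deletion tolerant: BufFileDeleteShared(..., missing_ok = true) lets the caller remove a subxact file that may never have been created.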
On Tue, Aug 17, 2021 at 1:30 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Tue, Aug 17, 2021 at 10:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
5. How can we provide a strict mechanism to not allow to use dsm APIs
for non-dsm FileSet? One idea could be that we can have a variable
(probably bool) in SharedFileSet structure which will be initialized
in SharedFileSetInit based on whether the caller has provided dsm
segment. Then in other DSM-based APIs, we can check if it is used for
the wrong type.

Yeah, we can do something like that, can't we just use an existing
variable instead of adding new, e.g. refcnt is required only when
multiple processes are attached, so maybe if dsm segment is not passed
then we can keep refcnt as 0 and based on we can give an error. For
example, if we try to call SharedFileSetAttach for the SharedFileSet
which has refcnt as 0 then we error out?
But as of now, we treat refcnt as 0 for a SharedFileSet that is already
destroyed. See SharedFileSetAttach.
--
With Regards,
Amit Kapila.
Hi,
On 2021-08-17 10:54:30 +0530, Amit Kapila wrote:
5. How can we provide a strict mechanism to not allow to use dsm APIs
for non-dsm FileSet? One idea could be that we can have a variable
(probably bool) in SharedFileSet structure which will be initialized
in SharedFileSetInit based on whether the caller has provided dsm
segment. Then in other DSM-based APIs, we can check if it is used for
the wrong type.
Well, isn't the issue here that it's not a shared file set in case you
explicitly don't want to share it? ISTM that the proper way to address
this would be to split out a FileSet from SharedFileSet that's then used
for worker.c and sharedfileset.c. Rather than making sharedfileset.c
support a non-shared mode.
Greetings,
Andres Freund
Hi,
I took a quick look at the v2 patch and noticed a typo.
+ * backends and render it read-only. If missing_ok is true then it will return
+ * NULL if file doesn not exist otherwise error.
*/
doesn not=> doesn't
Best regards,
Houzj
On Wed, Aug 18, 2021 9:17 AM houzj.fnst@fujitsu.com wrote:
Hi,
I took a quick look at the v2 patch and noticed a typo.
+ * backends and render it read-only. If missing_ok is true then it will return
+ * NULL if file doesn not exist otherwise error.
  */

doesn not => doesn't
Here are some other comments:
1).
+BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode,
+ bool missing_ok)
{
BufFile *file;
char segment_name[MAXPGPATH];
...
files = palloc(sizeof(File) * capacity);
...
@@ -318,10 +320,15 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
* name.
*/
if (nfiles == 0)
+ {
+ if (missing_ok)
+ return NULL;
+
I think it might be better to pfree(files) when returning NULL.
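The concern is that the palloc'd segment array escapes on the new early return. A simplified standalone model of the suggested fix, with malloc/free standing in for palloc/pfree and an invented open_segments helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified model: release the scratch array before taking the
 * missing_ok early return instead of leaking it. */
static int *
open_segments(int nfiles, bool missing_ok)
{
	int		   *files = malloc(sizeof(int) * 16);	/* palloc'd array */

	if (nfiles == 0 && missing_ok)
	{
		free(files);			/* the fix: don't leak on the early return */
		return NULL;
	}

	/*
	 * nfiles == 0 && !missing_ok would ereport(ERROR) in the real code,
	 * where memory-context cleanup releases the allocation anyway.
	 */
	return files;
}
```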
2).
/* Delete the subxact file and release the memory, if it exist */
- if (ent->subxact_fileset)
- {
- subxact_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
-
- /* Remove the xid entry from the stream xid hash */
- hash_search(xidhash, (void *) &xid, HASH_REMOVE, NULL);
+ subxact_filename(path, subid, xid);
+ SharedFileSetDelete(xidfileset, path, true);
Without the patch, it doesn't throw an error if the file does not exist,
but with the patch, it passes error_on_failure=true to SharedFileSetDelete().
Was that intentional?
Best regards,
Houzj
On Wed, Aug 18, 2021 at 8:23 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
On Wed, Aug 18, 2021 9:17 AM houzj.fnst@fujitsu.com wrote:
Hi,
I took a quick look at the v2 patch and noticed a typo.
+ * backends and render it read-only. If missing_ok is true then it will return
+ * NULL if file doesn not exist otherwise error.
  */
doesn not => doesn't

Here are some other comments:

1).
+BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode,
+ bool missing_ok)
{
BufFile *file;
char segment_name[MAXPGPATH];
...
files = palloc(sizeof(File) * capacity);
...
@@ -318,10 +320,15 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
* name.
*/
if (nfiles == 0)
+ {
+ if (missing_ok)
+ return NULL;
+

I think it might be better to pfree(files) when return NULL.

2).
/* Delete the subxact file and release the memory, if it exist */
- if (ent->subxact_fileset)
- {
- subxact_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
-
- /* Remove the xid entry from the stream xid hash */
- hash_search(xidhash, (void *) &xid, HASH_REMOVE, NULL);
+ subxact_filename(path, subid, xid);
+ SharedFileSetDelete(xidfileset, path, true);

Without the patch it doesn't throw an error if not exist,
But with the patch, it pass error_on_failure=true to SharedFileSetDelete().
Don't we need to use BufFileDeleteShared instead of
SharedFileSetDelete as you have used to remove the changes file?
--
With Regards,
Amit Kapila.
On Wed, Aug 18, 2021 at 9:30 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Aug 18, 2021 at 8:23 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:On Wed, Aug 18, 2021 9:17 AM houzj.fnst@fujitsu.com wrote:
Hi,
I took a quick look at the v2 patch and noticed a typo.
+ * backends and render it read-only. If missing_ok is true then it will return
+ * NULL if file doesn not exist otherwise error.
  */
doesn not => doesn't

Here are some other comments:

1).
+BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode,
+ bool missing_ok)
{
BufFile *file;
char segment_name[MAXPGPATH];
...
files = palloc(sizeof(File) * capacity);
...
@@ -318,10 +320,15 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
* name.
*/
if (nfiles == 0)
+ {
+ if (missing_ok)
+ return NULL;
+

I think it might be better to pfree(files) when return NULL.

2).
/* Delete the subxact file and release the memory, if it exist */
- if (ent->subxact_fileset)
- {
- subxact_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
-
- /* Remove the xid entry from the stream xid hash */
- hash_search(xidhash, (void *) &xid, HASH_REMOVE, NULL);
+ subxact_filename(path, subid, xid);
+ SharedFileSetDelete(xidfileset, path, true);

Without the patch it doesn't throw an error if not exist,
But with the patch, it pass error_on_failure=true to SharedFileSetDelete().

Don't we need to use BufFileDeleteShared instead of
SharedFileSetDelete as you have used to remove the changes file?
Yeah, it should be BufFileDeleteShared.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Aug 17, 2021 at 4:34 PM Andres Freund <andres@anarazel.de> wrote:
On 2021-08-17 10:54:30 +0530, Amit Kapila wrote:
5. How can we provide a strict mechanism to not allow to use dsm APIs
for non-dsm FileSet? One idea could be that we can have a variable
(probably bool) in SharedFileSet structure which will be initialized
in SharedFileSetInit based on whether the caller has provided dsm
segment. Then in other DSM-based APIs, we can check if it is used for
the wrong type.

Well, isn't the issue here that it's not a shared file set in case you
explicitly don't want to share it? ISTM that the proper way to address
this would be to split out a FileSet from SharedFileSet that's then used
for worker.c and sharedfileset.c.
Okay, but note that to accomplish the same, we need to tweak the
BufFile (buffile.c) APIs as well so that they can work with FileSet.
As per the initial analysis, there doesn't seem to be any problem with
that though.
--
With Regards,
Amit Kapila.
On Wed, Aug 18, 2021 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Aug 17, 2021 at 4:34 PM Andres Freund <andres@anarazel.de> wrote:
On 2021-08-17 10:54:30 +0530, Amit Kapila wrote:
5. How can we provide a strict mechanism to not allow to use dsm APIs
for non-dsm FileSet? One idea could be that we can have a variable
(probably bool) in SharedFileSet structure which will be initialized
in SharedFileSetInit based on whether the caller has provided dsm
segment. Then in other DSM-based APIs, we can check if it is used for
the wrong type.

Well, isn't the issue here that it's not a shared file set in case you
explicitly don't want to share it? ISTM that the proper way to address
this would be to split out a FileSet from SharedFileSet that's then used
for worker.c and sharedfileset.c.

Okay, but note that to accomplish the same, we need to tweak the
BufFile (buffile.c) APIs as well so that they can work with FileSet.
As per the initial analysis, there doesn't seem to be any problem with
that though.
I was looking into this, so if we want to do that I think the outline
will look like this:

- There will be fileset.c and fileset.h files, and we will expose a
new structure FileSet, which will be the same as SharedFileSet except
for the mutex and refcount. The fileset.c will expose FileSetInit(),
FileSetCreate(), FileSetOpen(), FileSetDelete() and FileSetDeleteAll()
interfaces.
- sharefileset.c will internally call the fileset.c's interfaces. The
SharedFileSet structure will also contain FileSet and other members
i.e. mutex and refcount.
- the buffile.c's interfaces which are ending with Shared e.g.
BufFileCreateShared, BufFileOpenShared, should be converted to
BufFileCreate and
BufFileOpen respectively. And the input to these interfaces can be
converted to FileSet instead of SharedFileSet.
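Under the outline above, the split could look roughly like this sketch. The member names are illustrative only (the real FileSet would also carry the tablespace array and file counters), and plain ints stand in for slock_t:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Sketch of the proposed split: FileSet holds only what a single
 * backend needs; SharedFileSet embeds it and adds the members that are
 * meaningful only with a dsm segment. */
typedef struct FileSet
{
	pid_t		creator_pid;	/* PID of the creating process */
	int			number;			/* identifier for this set of files */
} FileSet;

typedef struct SharedFileSet
{
	FileSet		fs;				/* embedded non-shared part */
	int			mutex;			/* stand-in for slock_t */
	int			refcnt;			/* number of attached backends */
} SharedFileSet;
```

Embedding FileSet as the first member would let sharedfileset.c pass &sfs->fs straight to the fileset.c routines, so the worker.c case can use a bare FileSet with no DSM machinery at all.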
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Aug 18, 2021 at 3:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Aug 18, 2021 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Aug 17, 2021 at 4:34 PM Andres Freund <andres@anarazel.de> wrote:
On 2021-08-17 10:54:30 +0530, Amit Kapila wrote:
5. How can we provide a strict mechanism to not allow to use dsm APIs
for non-dsm FileSet? One idea could be that we can have a variable
(probably bool) in SharedFileSet structure which will be initialized
in SharedFileSetInit based on whether the caller has provided dsm
segment. Then in other DSM-based APIs, we can check if it is used for
the wrong type.

Well, isn't the issue here that it's not a shared file set in case you
explicitly don't want to share it? ISTM that the proper way to address
this would be to split out a FileSet from SharedFileSet that's then used
for worker.c and sharedfileset.c.

Okay, but note that to accomplish the same, we need to tweak the
BufFile (buffile.c) APIs as well so that they can work with FileSet.
As per the initial analysis, there doesn't seem to be any problem with
that though.

I was looking into this, so if we want to do that I think the outline
will look like this

- There will be a fileset.c and fileset.h files, and we will expose a
new structure FileSet, which will be the same as SharedFileSet, except
mutext and refcount. The fileset.c will expose FileSetInit(),
FileSetCreate(), FileSetOpen(), FileSetDelete() and FileSetDeleteAll()
interfaces.

- sharefileset.c will internally call the fileset.c's interfaces. The
SharedFileSet structure will also contain FileSet and other members
i.e. mutex and refcount.

- the buffile.c's interfaces which are ending with Shared e.g.
BufFileCreateShared, BufFileOpenShared, should be converted to
BufFileCreate and BufFileOpen respectively.
The other alternative to name buffile APIs could be
BufFileCreateFileSet, BufFileOpenFileSet, etc.
--
With Regards,
Amit Kapila.
On Wed, Aug 18, 2021 at 3:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Aug 18, 2021 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Aug 17, 2021 at 4:34 PM Andres Freund <andres@anarazel.de> wrote:
On 2021-08-17 10:54:30 +0530, Amit Kapila wrote:
5. How can we provide a strict mechanism to not allow to use dsm APIs
for non-dsm FileSet? One idea could be that we can have a variable
(probably bool) in SharedFileSet structure which will be initialized
in SharedFileSetInit based on whether the caller has provided dsm
segment. Then in other DSM-based APIs, we can check if it is used for
the wrong type.

Well, isn't the issue here that it's not a shared file set in case you
explicitly don't want to share it? ISTM that the proper way to address
this would be to split out a FileSet from SharedFileSet that's then used
for worker.c and sharedfileset.c.

Okay, but note that to accomplish the same, we need to tweak the
BufFile (buffile.c) APIs as well so that they can work with FileSet.
As per the initial analysis, there doesn't seem to be any problem with
that though.

I was looking into this, so if we want to do that I think the outline
will look like this:

- There will be fileset.c and fileset.h files, and we will expose a
new structure FileSet, which will be the same as SharedFileSet except
for the mutex and refcount. fileset.c will expose the FileSetInit(),
FileSetCreate(), FileSetOpen(), FileSetDelete() and FileSetDeleteAll()
interfaces.

- sharedfileset.c will internally call fileset.c's interfaces. The
SharedFileSet structure will also contain a FileSet along with the other
members, i.e. the mutex and refcount.

- buffile.c's interfaces ending with Shared, e.g. BufFileCreateShared
and BufFileOpenShared, should be renamed to BufFileCreate and
BufFileOpen respectively. And the input to these interfaces can be
converted to FileSet instead of SharedFileSet.
Here is the first draft based on the idea we discussed: 0001 splits
sharedfileset.c into sharedfileset.c and fileset.c, and 0002 is the same
patch I submitted earlier (use a single fileset throughout the worker),
just rebased on top of 0001. Please let me know your thoughts.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v1-0002-Better-usage-of-sharedfileset-in-apply-worker.patch (text/x-patch; charset=US-ASCII)
From 7368136290a67e49f6bf0ad5773be67243e99637 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Fri, 20 Aug 2021 11:32:33 +0530
Subject: [PATCH v1 2/2] Better usage of sharedfileset in apply worker
Instead of using a separate shared fileset for each xid, use one shared
fileset for the whole lifetime of the worker. For each xid, just create
the shared buffile under that shared fileset and remove the file whenever
we are done with it. The subxact file only needs to be created once we
see the first subtransaction; to detect that, we also extend the buffile
open and buffile delete interfaces to tolerate missing files.
---
src/backend/replication/logical/worker.c | 229 +++++++-----------------------
src/backend/storage/file/buffile.c | 23 ++-
src/backend/utils/sort/logtape.c | 2 +-
src/backend/utils/sort/sharedtuplestore.c | 3 +-
src/include/storage/buffile.h | 5 +-
5 files changed, 76 insertions(+), 186 deletions(-)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 9901cf6..77cad7f 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -221,20 +221,6 @@ typedef struct ApplyExecutionData
PartitionTupleRouting *proute; /* partition routing info */
} ApplyExecutionData;
-/*
- * Stream xid hash entry. Whenever we see a new xid we create this entry in the
- * xidhash and along with it create the streaming file and store the fileset handle.
- * The subxact file is created iff there is any subxact info under this xid. This
- * entry is used on the subsequent streams for the xid to get the corresponding
- * fileset handles, so storing them in hash makes the search faster.
- */
-typedef struct StreamXidHash
-{
- TransactionId xid; /* xid is the hash key and must be first */
- FileSet *stream_fileset; /* shared file set for stream data */
- FileSet *subxact_fileset; /* shared file set for subxact info */
-} StreamXidHash;
-
static MemoryContext ApplyMessageContext = NULL;
MemoryContext ApplyContext = NULL;
@@ -255,10 +241,12 @@ static bool in_streamed_transaction = false;
static TransactionId stream_xid = InvalidTransactionId;
/*
- * Hash table for storing the streaming xid information along with shared file
- * set for streaming and subxact files.
+ * Fileset for storing the changes and subxact information for the streaming
+ * transaction. We will use only one fileset and for each xid a separate
+ * changes and subxact files will be created under the same fileset.
*/
-static HTAB *xidhash = NULL;
+static FileSet *xidfileset = NULL;
+
/* BufFile handle of the current streaming file */
static BufFile *stream_fd = NULL;
@@ -1129,7 +1117,6 @@ static void
apply_handle_stream_start(StringInfo s)
{
bool first_segment;
- HASHCTL hash_ctl;
if (in_streamed_transaction)
ereport(ERROR,
@@ -1157,17 +1144,21 @@ apply_handle_stream_start(StringInfo s)
errmsg_internal("invalid transaction ID in streamed replication transaction")));
/*
- * Initialize the xidhash table if we haven't yet. This will be used for
- * the entire duration of the apply worker so create it in permanent
- * context.
+ * Initialize the xidfileset if we haven't yet. This will be used for the
+ * entire duration of the apply worker so create it in permanent context.
*/
- if (xidhash == NULL)
+ if (xidfileset == NULL)
{
- hash_ctl.keysize = sizeof(TransactionId);
- hash_ctl.entrysize = sizeof(StreamXidHash);
- hash_ctl.hcxt = ApplyContext;
- xidhash = hash_create("StreamXidHash", 1024, &hash_ctl,
- HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ MemoryContext oldctx;
+
+ /*
+ * We need to keep the shared fileset for the worker lifetime so, need
+ * to allocate it in a persistent context.
+ */
+ oldctx = MemoryContextSwitchTo(ApplyContext);
+ xidfileset = palloc(sizeof(FileSet));
+ FileSetInit(xidfileset);
+ MemoryContextSwitchTo(oldctx);
}
/* open the spool file for this transaction */
@@ -1258,7 +1249,6 @@ apply_handle_stream_abort(StringInfo s)
BufFile *fd;
bool found = false;
char path[MAXPGPATH];
- StreamXidHash *ent;
subidx = -1;
begin_replication_step();
@@ -1287,19 +1277,9 @@ apply_handle_stream_abort(StringInfo s)
return;
}
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
/* open the changes file */
changes_filename(path, MyLogicalRepWorker->subid, xid);
- fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(xidfileset, path, O_RDWR, false);
/* OK, truncate the file at the right offset */
BufFileTruncateFileSet(fd, subxact_data.subxacts[subidx].fileno,
@@ -1327,7 +1307,6 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
int nchanges;
char path[MAXPGPATH];
char *buffer = NULL;
- StreamXidHash *ent;
MemoryContext oldcxt;
BufFile *fd;
@@ -1345,17 +1324,7 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
changes_filename(path, MyLogicalRepWorker->subid, xid);
elog(DEBUG1, "replaying changes from file \"%s\"", path);
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
- fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(xidfileset, path, O_RDONLY, false);
buffer = palloc(BLCKSZ);
initStringInfo(&s2);
@@ -2509,6 +2478,16 @@ UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
}
/*
+ * Cleanup fileset if created.
+ */
+static void
+worker_cleanup(int code, Datum arg)
+{
+ if (xidfileset != NULL)
+ FileSetDeleteAll(xidfileset);
+}
+
+/*
* Apply main loop.
*/
static void
@@ -2534,6 +2513,12 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
"LogicalStreamingContext",
ALLOCSET_DEFAULT_SIZES);
+ /*
+ * Register before-shmem-exit hook to ensure fileset is dropped while we
+ * can still report stats for underlying temporary files.
+ */
+ before_shmem_exit(worker_cleanup, (Datum) 0);
+
/* mark as idle, before starting to loop */
pgstat_report_activity(STATE_IDLE, NULL);
@@ -2957,18 +2942,11 @@ subxact_info_write(Oid subid, TransactionId xid)
{
char path[MAXPGPATH];
Size len;
- StreamXidHash *ent;
BufFile *fd;
Assert(TransactionIdIsValid(xid));
- /* Find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- /* By this time we must have created the transaction entry */
- Assert(ent);
+ subxact_filename(path, subid, xid);
/*
* If there is no subtransaction then nothing to do, but if already have
@@ -2976,39 +2954,15 @@ subxact_info_write(Oid subid, TransactionId xid)
*/
if (subxact_data.nsubxacts == 0)
{
- if (ent->subxact_fileset)
- {
- cleanup_subxact_info();
- FileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
+ cleanup_subxact_info();
+ BufFileDeleteFileSet(xidfileset, path, true);
return;
}
- subxact_filename(path, subid, xid);
-
- /*
- * Create the subxact file if it not already created, otherwise open the
- * existing file.
- */
- if (ent->subxact_fileset == NULL)
- {
- MemoryContext oldctx;
-
- /*
- * We need to maintain shared fileset across multiple stream
- * start/stop calls. So, need to allocate it in a persistent context.
- */
- oldctx = MemoryContextSwitchTo(ApplyContext);
- ent->subxact_fileset = palloc(sizeof(FileSet));
- FileSetInit(ent->subxact_fileset);
- MemoryContextSwitchTo(oldctx);
-
- fd = BufFileCreateFileSet(ent->subxact_fileset, path);
- }
- else
- fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDWR);
+ /* Try to open the subxact file, if it doesn't exist then create it */
+ fd = BufFileOpenFileSet(xidfileset, path, O_RDWR, true);
+ if (fd == NULL)
+ fd = BufFileCreateFileSet(xidfileset, path);
len = sizeof(SubXactInfo) * subxact_data.nsubxacts;
@@ -3035,34 +2989,17 @@ subxact_info_read(Oid subid, TransactionId xid)
char path[MAXPGPATH];
Size len;
BufFile *fd;
- StreamXidHash *ent;
MemoryContext oldctx;
Assert(!subxact_data.subxacts);
Assert(subxact_data.nsubxacts == 0);
Assert(subxact_data.nsubxacts_max == 0);
- /* Find the stream xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
- /*
- * If subxact_fileset is not valid that mean we don't have any subxact
- * info
- */
- if (ent->subxact_fileset == NULL)
- return;
-
subxact_filename(path, subid, xid);
- fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(xidfileset, path, O_RDONLY, true);
+ if (fd == NULL)
+ return;
/* read number of subxact items */
if (BufFileRead(fd, &subxact_data.nsubxacts,
@@ -3204,36 +3141,13 @@ static void
stream_cleanup_files(Oid subid, TransactionId xid)
{
char path[MAXPGPATH];
- StreamXidHash *ent;
-
- /* Find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
/* Delete the change file and release the stream fileset memory */
changes_filename(path, subid, xid);
- FileSetDeleteAll(ent->stream_fileset);
- pfree(ent->stream_fileset);
- ent->stream_fileset = NULL;
+ BufFileDeleteFileSet(xidfileset, path, false);
- /* Delete the subxact file and release the memory, if it exist */
- if (ent->subxact_fileset)
- {
- subxact_filename(path, subid, xid);
- FileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
-
- /* Remove the xid entry from the stream xid hash */
- hash_search(xidhash, (void *) &xid, HASH_REMOVE, NULL);
+ subxact_filename(path, subid, xid);
+ BufFileDeleteFileSet(xidfileset, path, true);
}
/*
@@ -3243,8 +3157,8 @@ stream_cleanup_files(Oid subid, TransactionId xid)
*
* Open a file for streamed changes from a toplevel transaction identified
* by stream_xid (global variable). If it's the first chunk of streamed
- * changes for this transaction, initialize the shared fileset and create the
- * buffile, otherwise open the previously created file.
+ * changes for this transaction, create the buffile, otherwise open the
+ * previously created file.
*
* This can only be called at the beginning of a "streaming" block, i.e.
* between stream_start/stream_stop messages from the upstream.
@@ -3253,20 +3167,13 @@ static void
stream_open_file(Oid subid, TransactionId xid, bool first_segment)
{
char path[MAXPGPATH];
- bool found;
MemoryContext oldcxt;
- StreamXidHash *ent;
Assert(in_streamed_transaction);
Assert(OidIsValid(subid));
Assert(TransactionIdIsValid(xid));
Assert(stream_fd == NULL);
- /* create or find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_ENTER,
- &found);
changes_filename(path, subid, xid);
elog(DEBUG1, "opening file \"%s\" for streamed changes", path);
@@ -3283,44 +3190,14 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
* writing, in append mode.
*/
if (first_segment)
- {
- MemoryContext savectx;
- FileSet *fileset;
-
- if (found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
-
- /*
- * We need to maintain shared fileset across multiple stream
- * start/stop calls. So, need to allocate it in a persistent context.
- */
- savectx = MemoryContextSwitchTo(ApplyContext);
- fileset = palloc(sizeof(FileSet));
-
- FileSetInit(fileset);
- MemoryContextSwitchTo(savectx);
-
- stream_fd = BufFileCreateFileSet(fileset, path);
-
- /* Remember the fileset for the next stream of the same transaction */
- ent->xid = xid;
- ent->stream_fileset = fileset;
- ent->subxact_fileset = NULL;
- }
+ stream_fd = BufFileCreateFileSet(xidfileset, path);
else
{
- if (!found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
-
/*
* Open the file and seek to the end of the file because we always
* append the changes file.
*/
- stream_fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
+ stream_fd = BufFileOpenFileSet(xidfileset, path, O_RDWR, false);
BufFileSeek(stream_fd, 0, 0, SEEK_END);
}
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 8e9307d..9b95f7f 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -278,10 +278,12 @@ BufFileCreateFileSet(FileSet *fileset, const char *name)
* with BufFileCreateFileSet in the same FileSet using the same name.
* The backend that created the file must have called BufFileClose() or
* BufFileExportFileSet() to make sure that it is ready to be opened by other
- * backends and render it read-only.
+ * backends and render it read-only. If missing_ok is true then it will return
+ * NULL if file doesn't exist otherwise error.
*/
BufFile *
-BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
+BufFileOpenFileSet(FileSet *fileset, const char *name, int mode,
+ bool missing_ok)
{
BufFile *file;
char segment_name[MAXPGPATH];
@@ -318,10 +320,18 @@ BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
* name.
*/
if (nfiles == 0)
+ {
+ /* free the memory */
+ pfree(files);
+
+ if (missing_ok)
+ return NULL;
+
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not open temporary file \"%s\" from BufFile \"%s\": %m",
segment_name, name)));
+ }
file = makeBufFileCommon(nfiles);
file->files = files;
@@ -341,10 +351,11 @@ BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
* the FileSet to be cleaned up.
*
* Only one backend should attempt to delete a given name, and should know
- * that it exists and has been exported or closed.
+ * that it exists and has been exported or closed otherwise missing_ok should
+ * be passed true.
*/
void
-BufFileDeleteFileSet(FileSet *fileset, const char *name)
+BufFileDeleteFileSet(FileSet *fileset, const char *name, bool missing_ok)
{
char segment_name[MAXPGPATH];
int segment = 0;
@@ -358,7 +369,7 @@ BufFileDeleteFileSet(FileSet *fileset, const char *name)
for (;;)
{
SegmentName(segment_name, name, segment);
- if (!FileSetDelete(fileset, segment_name, true))
+ if (!FileSetDelete(fileset, segment_name, !missing_ok))
break;
found = true;
++segment;
@@ -366,7 +377,7 @@ BufFileDeleteFileSet(FileSet *fileset, const char *name)
CHECK_FOR_INTERRUPTS();
}
- if (!found)
+ if (!found && !missing_ok)
elog(ERROR, "could not delete unknown shared BufFile \"%s\"", name);
}
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index f7994d7..debf12e1 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -564,7 +564,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
lt = &lts->tapes[i];
pg_itoa(i, filename);
- file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY);
+ file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY, false);
filesize = BufFileSize(file);
/*
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 72acd54..8c5135c 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -560,7 +560,8 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
sts_filename(name, accessor, accessor->read_participant);
accessor->read_file =
- BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY);
+ BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY,
+ false);
}
/* Seek and load the chunk header. */
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 032a823..5e9df44 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -49,8 +49,9 @@ extern long BufFileAppend(BufFile *target, BufFile *source);
extern BufFile *BufFileCreateFileSet(FileSet *fileset, const char *name);
extern void BufFileExportFileSet(BufFile *file);
extern BufFile *BufFileOpenFileSet(FileSet *fileset, const char *name,
- int mode);
-extern void BufFileDeleteFileSet(FileSet *fileset, const char *name);
+ int mode, bool missing_ok);
+extern void BufFileDeleteFileSet(FileSet *fileset, const char *name,
+ bool missing_ok);
extern void BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset);
#endif /* BUFFILE_H */
--
1.8.3.1
v1-0001-Sharedfileset-refactoring.patch (text/x-patch; charset=US-ASCII)
From fbdd2aafce9f182b1b8d685080bdfd66928f7eee Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Wed, 18 Aug 2021 15:52:21 +0530
Subject: [PATCH v1 1/2] Sharedfileset refactoring
---
src/backend/replication/logical/worker.c | 40 +++---
src/backend/storage/file/Makefile | 1 +
src/backend/storage/file/buffile.c | 68 ++++-----
src/backend/storage/file/fd.c | 2 +-
src/backend/storage/file/fileset.c | 201 ++++++++++++++++++++++++++
src/backend/storage/file/sharedfileset.c | 228 +-----------------------------
src/backend/utils/sort/logtape.c | 8 +-
src/backend/utils/sort/sharedtuplestore.c | 5 +-
src/include/storage/buffile.h | 12 +-
src/include/storage/fileset.h | 41 ++++++
src/include/storage/sharedfileset.h | 14 +-
11 files changed, 316 insertions(+), 304 deletions(-)
create mode 100644 src/backend/storage/file/fileset.c
create mode 100644 src/include/storage/fileset.h
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index ecaed15..9901cf6 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -39,13 +39,13 @@
* BufFile infrastructure supports temporary files that exceed the OS file size
* limit, (b) provides a way for automatic clean up on the error and (c) provides
* a way to survive these files across local transactions and allow to open and
- * close at stream start and close. We decided to use SharedFileSet
+ * close at stream start and close. We decided to use FileSet
* infrastructure as without that it deletes the files on the closure of the
* file and if we decide to keep stream files open across the start/stop stream
* then it will consume a lot of memory (more than 8K for each BufFile and
* there could be multiple such BufFiles as the subscriber could receive
* multiple start/stop streams for different transactions before getting the
- * commit). Moreover, if we don't use SharedFileSet then we also need to invent
+ * commit). Moreover, if we don't use FileSet then we also need to invent
* a new way to pass filenames to BufFile APIs so that we are allowed to open
* the file we desired across multiple stream-open calls for the same
* transaction.
@@ -231,8 +231,8 @@ typedef struct ApplyExecutionData
typedef struct StreamXidHash
{
TransactionId xid; /* xid is the hash key and must be first */
- SharedFileSet *stream_fileset; /* shared file set for stream data */
- SharedFileSet *subxact_fileset; /* shared file set for subxact info */
+ FileSet *stream_fileset; /* shared file set for stream data */
+ FileSet *subxact_fileset; /* shared file set for subxact info */
} StreamXidHash;
static MemoryContext ApplyMessageContext = NULL;
@@ -1299,10 +1299,10 @@ apply_handle_stream_abort(StringInfo s)
/* open the changes file */
changes_filename(path, MyLogicalRepWorker->subid, xid);
- fd = BufFileOpenShared(ent->stream_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
/* OK, truncate the file at the right offset */
- BufFileTruncateShared(fd, subxact_data.subxacts[subidx].fileno,
+ BufFileTruncateFileSet(fd, subxact_data.subxacts[subidx].fileno,
subxact_data.subxacts[subidx].offset);
BufFileClose(fd);
@@ -1355,7 +1355,7 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
errmsg_internal("transaction %u not found in stream XID hash table",
xid)));
- fd = BufFileOpenShared(ent->stream_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDONLY);
buffer = palloc(BLCKSZ);
initStringInfo(&s2);
@@ -2979,7 +2979,7 @@ subxact_info_write(Oid subid, TransactionId xid)
if (ent->subxact_fileset)
{
cleanup_subxact_info();
- SharedFileSetDeleteAll(ent->subxact_fileset);
+ FileSetDeleteAll(ent->subxact_fileset);
pfree(ent->subxact_fileset);
ent->subxact_fileset = NULL;
}
@@ -3001,14 +3001,14 @@ subxact_info_write(Oid subid, TransactionId xid)
* start/stop calls. So, need to allocate it in a persistent context.
*/
oldctx = MemoryContextSwitchTo(ApplyContext);
- ent->subxact_fileset = palloc(sizeof(SharedFileSet));
- SharedFileSetInit(ent->subxact_fileset, NULL);
+ ent->subxact_fileset = palloc(sizeof(FileSet));
+ FileSetInit(ent->subxact_fileset);
MemoryContextSwitchTo(oldctx);
- fd = BufFileCreateShared(ent->subxact_fileset, path);
+ fd = BufFileCreateFileSet(ent->subxact_fileset, path);
}
else
- fd = BufFileOpenShared(ent->subxact_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDWR);
len = sizeof(SubXactInfo) * subxact_data.nsubxacts;
@@ -3062,7 +3062,7 @@ subxact_info_read(Oid subid, TransactionId xid)
subxact_filename(path, subid, xid);
- fd = BufFileOpenShared(ent->subxact_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDONLY);
/* read number of subxact items */
if (BufFileRead(fd, &subxact_data.nsubxacts,
@@ -3219,7 +3219,7 @@ stream_cleanup_files(Oid subid, TransactionId xid)
/* Delete the change file and release the stream fileset memory */
changes_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->stream_fileset);
+ FileSetDeleteAll(ent->stream_fileset);
pfree(ent->stream_fileset);
ent->stream_fileset = NULL;
@@ -3227,7 +3227,7 @@ stream_cleanup_files(Oid subid, TransactionId xid)
if (ent->subxact_fileset)
{
subxact_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->subxact_fileset);
+ FileSetDeleteAll(ent->subxact_fileset);
pfree(ent->subxact_fileset);
ent->subxact_fileset = NULL;
}
@@ -3285,7 +3285,7 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
if (first_segment)
{
MemoryContext savectx;
- SharedFileSet *fileset;
+ FileSet *fileset;
if (found)
ereport(ERROR,
@@ -3297,12 +3297,12 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
* start/stop calls. So, need to allocate it in a persistent context.
*/
savectx = MemoryContextSwitchTo(ApplyContext);
- fileset = palloc(sizeof(SharedFileSet));
+ fileset = palloc(sizeof(FileSet));
- SharedFileSetInit(fileset, NULL);
+ FileSetInit(fileset);
MemoryContextSwitchTo(savectx);
- stream_fd = BufFileCreateShared(fileset, path);
+ stream_fd = BufFileCreateFileSet(fileset, path);
/* Remember the fileset for the next stream of the same transaction */
ent->xid = xid;
@@ -3320,7 +3320,7 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
* Open the file and seek to the end of the file because we always
* append the changes file.
*/
- stream_fd = BufFileOpenShared(ent->stream_fileset, path, O_RDWR);
+ stream_fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
BufFileSeek(stream_fd, 0, 0, SEEK_END);
}
diff --git a/src/backend/storage/file/Makefile b/src/backend/storage/file/Makefile
index 5e1291b..660ac51 100644
--- a/src/backend/storage/file/Makefile
+++ b/src/backend/storage/file/Makefile
@@ -16,6 +16,7 @@ OBJS = \
buffile.o \
copydir.o \
fd.o \
+ fileset.o \
reinit.o \
sharedfileset.o
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index a4be5fe..8e9307d 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -39,7 +39,7 @@
* BufFile also supports temporary files that can be used by the single backend
* when the corresponding files need to be survived across the transaction and
* need to be opened and closed multiple times. Such files need to be created
- * as a member of a SharedFileSet.
+ * as a member of a FileSet.
*-------------------------------------------------------------------------
*/
@@ -77,8 +77,8 @@ struct BufFile
bool dirty; /* does buffer need to be written? */
bool readOnly; /* has the file been set to read only? */
- SharedFileSet *fileset; /* space for segment files if shared */
- const char *name; /* name of this BufFile if shared */
+ FileSet *fileset; /* space for fileset for fileset based file */
+ const char *name; /* name of this BufFile */
/*
* resowner is the ResourceOwner to use for underlying temp files. (We
@@ -104,7 +104,7 @@ static void extendBufFile(BufFile *file);
static void BufFileLoadBuffer(BufFile *file);
static void BufFileDumpBuffer(BufFile *file);
static void BufFileFlush(BufFile *file);
-static File MakeNewSharedSegment(BufFile *file, int segment);
+static File MakeNewSegment(BufFile *file, int segment);
/*
* Create BufFile and perform the common initialization.
@@ -160,7 +160,7 @@ extendBufFile(BufFile *file)
if (file->fileset == NULL)
pfile = OpenTemporaryFile(file->isInterXact);
else
- pfile = MakeNewSharedSegment(file, file->numFiles);
+ pfile = MakeNewSegment(file, file->numFiles);
Assert(pfile >= 0);
@@ -214,7 +214,7 @@ BufFileCreateTemp(bool interXact)
* Build the name for a given segment of a given BufFile.
*/
static void
-SharedSegmentName(char *name, const char *buffile_name, int segment)
+SegmentName(char *name, const char *buffile_name, int segment)
{
snprintf(name, MAXPGPATH, "%s.%d", buffile_name, segment);
}
@@ -223,25 +223,25 @@ SharedSegmentName(char *name, const char *buffile_name, int segment)
* Create a new segment file backing a shared BufFile.
*/
static File
-MakeNewSharedSegment(BufFile *buffile, int segment)
+MakeNewSegment(BufFile *buffile, int segment)
{
char name[MAXPGPATH];
File file;
/*
* It is possible that there are files left over from before a crash
- * restart with the same name. In order for BufFileOpenShared() not to
+ * restart with the same name. In order for BufFileOpen() not to
* get confused about how many segments there are, we'll unlink the next
* segment number if it already exists.
*/
- SharedSegmentName(name, buffile->name, segment + 1);
- SharedFileSetDelete(buffile->fileset, name, true);
+ SegmentName(name, buffile->name, segment + 1);
+ FileSetDelete(buffile->fileset, name, true);
/* Create the new segment. */
- SharedSegmentName(name, buffile->name, segment);
- file = SharedFileSetCreate(buffile->fileset, name);
+ SegmentName(name, buffile->name, segment);
+ file = FileSetCreate(buffile->fileset, name);
- /* SharedFileSetCreate would've errored out */
+ /* FileSetCreate would've errored out */
Assert(file > 0);
return file;
@@ -259,7 +259,7 @@ MakeNewSharedSegment(BufFile *buffile, int segment)
* unrelated SharedFileSet objects.
*/
BufFile *
-BufFileCreateShared(SharedFileSet *fileset, const char *name)
+BufFileCreateFileSet(FileSet *fileset, const char *name)
{
BufFile *file;
@@ -267,7 +267,7 @@ BufFileCreateShared(SharedFileSet *fileset, const char *name)
file->fileset = fileset;
file->name = pstrdup(name);
file->files = (File *) palloc(sizeof(File));
- file->files[0] = MakeNewSharedSegment(file, 0);
+ file->files[0] = MakeNewSegment(file, 0);
file->readOnly = false;
return file;
@@ -275,13 +275,13 @@ BufFileCreateShared(SharedFileSet *fileset, const char *name)
/*
* Open a file that was previously created in another backend (or this one)
- * with BufFileCreateShared in the same SharedFileSet using the same name.
+ * with BufFileCreateFileSet in the same FileSet using the same name.
* The backend that created the file must have called BufFileClose() or
- * BufFileExportShared() to make sure that it is ready to be opened by other
+ * BufFileExportFileSet() to make sure that it is ready to be opened by other
* backends and render it read-only.
*/
BufFile *
-BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
+BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
{
BufFile *file;
char segment_name[MAXPGPATH];
@@ -304,8 +304,8 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
files = repalloc(files, sizeof(File) * capacity);
}
/* Try to load a segment. */
- SharedSegmentName(segment_name, name, nfiles);
- files[nfiles] = SharedFileSetOpen(fileset, segment_name, mode);
+ SegmentName(segment_name, name, nfiles);
+ files[nfiles] = FileSetOpen(fileset, segment_name, mode);
if (files[nfiles] <= 0)
break;
++nfiles;
@@ -333,18 +333,18 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
}
/*
- * Delete a BufFile that was created by BufFileCreateShared in the given
- * SharedFileSet using the given name.
+ * Delete a BufFile that was created by BufFileCreateFileSet in the given
+ * FileSet using the given name.
*
* It is not necessary to delete files explicitly with this function. It is
* provided only as a way to delete files proactively, rather than waiting for
- * the SharedFileSet to be cleaned up.
+ * the FileSet to be cleaned up.
*
* Only one backend should attempt to delete a given name, and should know
* that it exists and has been exported or closed.
*/
void
-BufFileDeleteShared(SharedFileSet *fileset, const char *name)
+BufFileDeleteFileSet(FileSet *fileset, const char *name)
{
char segment_name[MAXPGPATH];
int segment = 0;
@@ -357,8 +357,8 @@ BufFileDeleteShared(SharedFileSet *fileset, const char *name)
*/
for (;;)
{
- SharedSegmentName(segment_name, name, segment);
- if (!SharedFileSetDelete(fileset, segment_name, true))
+ SegmentName(segment_name, name, segment);
+ if (!FileSetDelete(fileset, segment_name, true))
break;
found = true;
++segment;
@@ -371,12 +371,12 @@ BufFileDeleteShared(SharedFileSet *fileset, const char *name)
}
/*
- * BufFileExportShared --- flush and make read-only, in preparation for sharing.
+ * BufFileExportFileSet --- flush and make read-only, in preparation for sharing.
*/
void
-BufFileExportShared(BufFile *file)
+BufFileExportFileSet(BufFile *file)
{
- /* Must be a file belonging to a SharedFileSet. */
+ /* Must be a file belonging to a FileSet. */
Assert(file->fileset != NULL);
/* It's probably a bug if someone calls this twice. */
@@ -854,11 +854,11 @@ BufFileAppend(BufFile *target, BufFile *source)
}
/*
- * Truncate a BufFile created by BufFileCreateShared up to the given fileno and
- * the offset.
+ * Truncate a BufFile created by BufFileCreateFileSet up to the given fileno
+ * and the offset.
*/
void
-BufFileTruncateShared(BufFile *file, int fileno, off_t offset)
+BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset)
{
int numFiles = file->numFiles;
int newFile = fileno;
@@ -876,9 +876,9 @@ BufFileTruncateShared(BufFile *file, int fileno, off_t offset)
{
if ((i != fileno || offset == 0) && i != 0)
{
- SharedSegmentName(segment_name, file->name, i);
+ SegmentName(segment_name, file->name, i);
FileClose(file->files[i]);
- if (!SharedFileSetDelete(file->fileset, segment_name, true))
+ if (!FileSetDelete(file->fileset, segment_name, true))
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not delete shared fileset \"%s\": %m",
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index b58b399..433e283 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -1921,7 +1921,7 @@ PathNameDeleteTemporaryFile(const char *path, bool error_on_failure)
/*
* Unlike FileClose's automatic file deletion code, we tolerate
- * non-existence to support BufFileDeleteShared which doesn't know how
+ * non-existence to support BufFileDeleteFileSet which doesn't know how
* many segments it has to delete until it runs out.
*/
if (stat_errno == ENOENT)
diff --git a/src/backend/storage/file/fileset.c b/src/backend/storage/file/fileset.c
new file mode 100644
index 0000000..f2a585d
--- /dev/null
+++ b/src/backend/storage/file/fileset.c
@@ -0,0 +1,201 @@
+/*-------------------------------------------------------------------------
+ *
+ * fileset.c
+ * temporary file set management.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/storage/file/fileset.c
+ *
+ * FileSets provide a temporary namespace (think directory) so that files can
+ * be discovered by name.
+ *
+ * FileSets can be used by backends when the temporary files need to be
+ * opened/closed multiple times and the underlying files need to survive across
+ * transactions.
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <limits.h>
+
+#include "catalog/pg_tablespace.h"
+#include "commands/tablespace.h"
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "storage/ipc.h"
+#include "storage/fileset.h"
+#include "utils/builtins.h"
+
+static void FileSetPath(char *path, FileSet *fileset, Oid tablespace);
+static void FilePath(char *path, FileSet *fileset, const char *name);
+static Oid ChooseTablespace(const FileSet *fileset, const char *name);
+
+/*
+ * Initialize a space for temporary files. This API can be used by shared
+ * fileset as well as if the temporary files are used only by single backend
+ * but the files need to be opened and closed multiple times and also the
+ * underlying files need to survive across transactions.
+ *
+ * Files will be distributed over the tablespaces configured in
+ * temp_tablespaces.
+ *
+ * Under the covers the set is one or more directories which will eventually
+ * be deleted.
+ */
+void
+FileSetInit(FileSet *fileset)
+{
+ static uint32 counter = 0;
+
+ fileset->creator_pid = MyProcPid;
+ fileset->number = counter;
+ counter = (counter + 1) % INT_MAX;
+
+ /* Capture the tablespace OIDs so that all backends agree on them. */
+ PrepareTempTablespaces();
+ fileset->ntablespaces =
+ GetTempTablespaces(&fileset->tablespaces[0],
+ lengthof(fileset->tablespaces));
+ if (fileset->ntablespaces == 0)
+ {
+ /* If the GUC is empty, use current database's default tablespace */
+ fileset->tablespaces[0] = MyDatabaseTableSpace;
+ fileset->ntablespaces = 1;
+ }
+ else
+ {
+ int i;
+
+ /*
+ * An entry of InvalidOid means use the default tablespace for the
+ * current database. Replace that now, to be sure that all users of
+ * the FileSet agree on what to do.
+ */
+ for (i = 0; i < fileset->ntablespaces; i++)
+ {
+ if (fileset->tablespaces[i] == InvalidOid)
+ fileset->tablespaces[i] = MyDatabaseTableSpace;
+ }
+ }
+}
+
+/*
+ * Create a new file in the given set.
+ */
+File
+FileSetCreate(FileSet *fileset, const char *name)
+{
+ char path[MAXPGPATH];
+ File file;
+
+ FilePath(path, fileset, name);
+ file = PathNameCreateTemporaryFile(path, false);
+
+ /* If we failed, see if we need to create the directory on demand. */
+ if (file <= 0)
+ {
+ char tempdirpath[MAXPGPATH];
+ char filesetpath[MAXPGPATH];
+ Oid tablespace = ChooseTablespace(fileset, name);
+
+ TempTablespacePath(tempdirpath, tablespace);
+ FileSetPath(filesetpath, fileset, tablespace);
+ PathNameCreateTemporaryDir(tempdirpath, filesetpath);
+ file = PathNameCreateTemporaryFile(path, true);
+ }
+
+ return file;
+}
+
+/*
+ * Open a file that was created with FileSetCreate().
+ */
+File
+FileSetOpen(FileSet *fileset, const char *name, int mode)
+{
+ char path[MAXPGPATH];
+ File file;
+
+ FilePath(path, fileset, name);
+ file = PathNameOpenTemporaryFile(path, mode);
+
+ return file;
+}
+
+/*
+ * Delete a file that was created with FileSetCreate().
+ * Return true if the file existed, false if it didn't.
+ */
+bool
+FileSetDelete(FileSet *fileset, const char *name,
+ bool error_on_failure)
+{
+ char path[MAXPGPATH];
+
+ FilePath(path, fileset, name);
+
+ return PathNameDeleteTemporaryFile(path, error_on_failure);
+}
+
+/*
+ * Delete all files in the set.
+ */
+void
+FileSetDeleteAll(FileSet *fileset)
+{
+ char dirpath[MAXPGPATH];
+ int i;
+
+ /*
+ * Delete the directory we created in each tablespace. Doesn't fail
+ * because we use this in error cleanup paths, but can generate LOG
+ * message on IO error.
+ */
+ for (i = 0; i < fileset->ntablespaces; ++i)
+ {
+ FileSetPath(dirpath, fileset, fileset->tablespaces[i]);
+ PathNameDeleteTemporaryDir(dirpath);
+ }
+}
+
+/*
+ * Build the path for the directory holding the files backing a FileSet in a
+ * given tablespace.
+ */
+static void
+FileSetPath(char *path, FileSet *fileset, Oid tablespace)
+{
+ char tempdirpath[MAXPGPATH];
+
+ TempTablespacePath(tempdirpath, tablespace);
+ snprintf(path, MAXPGPATH, "%s/%s%lu.%u.sharedfileset",
+ tempdirpath, PG_TEMP_FILE_PREFIX,
+ (unsigned long) fileset->creator_pid, fileset->number);
+}
+
+/*
+ * Sorting hat to determine which tablespace a given temporary file belongs in.
+ */
+static Oid
+ChooseTablespace(const FileSet *fileset, const char *name)
+{
+ uint32 hash = hash_any((const unsigned char *) name, strlen(name));
+
+ return fileset->tablespaces[hash % fileset->ntablespaces];
+}
+
+/*
+ * Compute the full path of a file in a FileSet.
+ */
+static void
+FilePath(char *path, FileSet *fileset, const char *name)
+{
+ char dirpath[MAXPGPATH];
+
+ FileSetPath(dirpath, fileset, ChooseTablespace(fileset, name));
+ snprintf(path, MAXPGPATH, "%s/%s", dirpath, name);
+}
diff --git a/src/backend/storage/file/sharedfileset.c b/src/backend/storage/file/sharedfileset.c
index ed37c94..475df46 100644
--- a/src/backend/storage/file/sharedfileset.c
+++ b/src/backend/storage/file/sharedfileset.c
@@ -33,13 +33,7 @@
#include "storage/sharedfileset.h"
#include "utils/builtins.h"
-static List *filesetlist = NIL;
-
static void SharedFileSetOnDetach(dsm_segment *segment, Datum datum);
-static void SharedFileSetDeleteOnProcExit(int status, Datum arg);
-static void SharedFileSetPath(char *path, SharedFileSet *fileset, Oid tablespace);
-static void SharedFilePath(char *path, SharedFileSet *fileset, const char *name);
-static Oid ChooseTablespace(const SharedFileSet *fileset, const char *name);
/*
* Initialize a space for temporary files that can be opened by other backends.
@@ -63,61 +57,14 @@ static Oid ChooseTablespace(const SharedFileSet *fileset, const char *name);
void
SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg)
{
- static uint32 counter = 0;
-
SpinLockInit(&fileset->mutex);
fileset->refcnt = 1;
- fileset->creator_pid = MyProcPid;
- fileset->number = counter;
- counter = (counter + 1) % INT_MAX;
-
- /* Capture the tablespace OIDs so that all backends agree on them. */
- PrepareTempTablespaces();
- fileset->ntablespaces =
- GetTempTablespaces(&fileset->tablespaces[0],
- lengthof(fileset->tablespaces));
- if (fileset->ntablespaces == 0)
- {
- /* If the GUC is empty, use current database's default tablespace */
- fileset->tablespaces[0] = MyDatabaseTableSpace;
- fileset->ntablespaces = 1;
- }
- else
- {
- int i;
- /*
- * An entry of InvalidOid means use the default tablespace for the
- * current database. Replace that now, to be sure that all users of
- * the SharedFileSet agree on what to do.
- */
- for (i = 0; i < fileset->ntablespaces; i++)
- {
- if (fileset->tablespaces[i] == InvalidOid)
- fileset->tablespaces[i] = MyDatabaseTableSpace;
- }
- }
+ FileSetInit(&fileset->fs);
/* Register our cleanup callback. */
if (seg)
on_dsm_detach(seg, SharedFileSetOnDetach, PointerGetDatum(fileset));
- else
- {
- static bool registered_cleanup = false;
-
- if (!registered_cleanup)
- {
- /*
- * We must not have registered any fileset before registering the
- * fileset clean up.
- */
- Assert(filesetlist == NIL);
- on_proc_exit(SharedFileSetDeleteOnProcExit, 0);
- registered_cleanup = true;
- }
-
- filesetlist = lcons((void *) fileset, filesetlist);
- }
}
/*
@@ -148,86 +95,12 @@ SharedFileSetAttach(SharedFileSet *fileset, dsm_segment *seg)
}
/*
- * Create a new file in the given set.
- */
-File
-SharedFileSetCreate(SharedFileSet *fileset, const char *name)
-{
- char path[MAXPGPATH];
- File file;
-
- SharedFilePath(path, fileset, name);
- file = PathNameCreateTemporaryFile(path, false);
-
- /* If we failed, see if we need to create the directory on demand. */
- if (file <= 0)
- {
- char tempdirpath[MAXPGPATH];
- char filesetpath[MAXPGPATH];
- Oid tablespace = ChooseTablespace(fileset, name);
-
- TempTablespacePath(tempdirpath, tablespace);
- SharedFileSetPath(filesetpath, fileset, tablespace);
- PathNameCreateTemporaryDir(tempdirpath, filesetpath);
- file = PathNameCreateTemporaryFile(path, true);
- }
-
- return file;
-}
-
-/*
- * Open a file that was created with SharedFileSetCreate(), possibly in
- * another backend.
- */
-File
-SharedFileSetOpen(SharedFileSet *fileset, const char *name, int mode)
-{
- char path[MAXPGPATH];
- File file;
-
- SharedFilePath(path, fileset, name);
- file = PathNameOpenTemporaryFile(path, mode);
-
- return file;
-}
-
-/*
- * Delete a file that was created with SharedFileSetCreate().
- * Return true if the file existed, false if didn't.
- */
-bool
-SharedFileSetDelete(SharedFileSet *fileset, const char *name,
- bool error_on_failure)
-{
- char path[MAXPGPATH];
-
- SharedFilePath(path, fileset, name);
-
- return PathNameDeleteTemporaryFile(path, error_on_failure);
-}
-
-/*
* Delete all files in the set.
*/
void
SharedFileSetDeleteAll(SharedFileSet *fileset)
{
- char dirpath[MAXPGPATH];
- int i;
-
- /*
- * Delete the directory we created in each tablespace. Doesn't fail
- * because we use this in error cleanup paths, but can generate LOG
- * message on IO error.
- */
- for (i = 0; i < fileset->ntablespaces; ++i)
- {
- SharedFileSetPath(dirpath, fileset, fileset->tablespaces[i]);
- PathNameDeleteTemporaryDir(dirpath);
- }
-
- /* Unregister the shared fileset */
- SharedFileSetUnregister(fileset);
+ return FileSetDeleteAll(&fileset->fs);
}
/*
@@ -255,100 +128,5 @@ SharedFileSetOnDetach(dsm_segment *segment, Datum datum)
* this function so we can safely access its data.
*/
if (unlink_all)
- SharedFileSetDeleteAll(fileset);
-}
-
-/*
- * Callback function that will be invoked on the process exit. This will
- * process the list of all the registered sharedfilesets and delete the
- * underlying files.
- */
-static void
-SharedFileSetDeleteOnProcExit(int status, Datum arg)
-{
- /*
- * Remove all the pending shared fileset entries. We don't use foreach()
- * here because SharedFileSetDeleteAll will remove the current element in
- * filesetlist. Though we have used foreach_delete_current() to remove the
- * element from filesetlist it could only fix up the state of one of the
- * loops, see SharedFileSetUnregister.
- */
- while (list_length(filesetlist) > 0)
- {
- SharedFileSet *fileset = (SharedFileSet *) linitial(filesetlist);
-
- SharedFileSetDeleteAll(fileset);
- }
-
- filesetlist = NIL;
-}
-
-/*
- * Unregister the shared fileset entry registered for cleanup on proc exit.
- */
-void
-SharedFileSetUnregister(SharedFileSet *input_fileset)
-{
- ListCell *l;
-
- /*
- * If the caller is following the dsm based cleanup then we don't maintain
- * the filesetlist so return.
- */
- if (filesetlist == NIL)
- return;
-
- foreach(l, filesetlist)
- {
- SharedFileSet *fileset = (SharedFileSet *) lfirst(l);
-
- /* Remove the entry from the list */
- if (input_fileset == fileset)
- {
- filesetlist = foreach_delete_current(filesetlist, l);
- return;
- }
- }
-
- /* Should have found a match */
- Assert(false);
-}
-
-/*
- * Build the path for the directory holding the files backing a SharedFileSet
- * in a given tablespace.
- */
-static void
-SharedFileSetPath(char *path, SharedFileSet *fileset, Oid tablespace)
-{
- char tempdirpath[MAXPGPATH];
-
- TempTablespacePath(tempdirpath, tablespace);
- snprintf(path, MAXPGPATH, "%s/%s%lu.%u.sharedfileset",
- tempdirpath, PG_TEMP_FILE_PREFIX,
- (unsigned long) fileset->creator_pid, fileset->number);
-}
-
-/*
- * Sorting hat to determine which tablespace a given shared temporary file
- * belongs in.
- */
-static Oid
-ChooseTablespace(const SharedFileSet *fileset, const char *name)
-{
- uint32 hash = hash_any((const unsigned char *) name, strlen(name));
-
- return fileset->tablespaces[hash % fileset->ntablespaces];
-}
-
-/*
- * Compute the full path of a file in a SharedFileSet.
- */
-static void
-SharedFilePath(char *path, SharedFileSet *fileset, const char *name)
-{
- char dirpath[MAXPGPATH];
-
- SharedFileSetPath(dirpath, fileset, ChooseTablespace(fileset, name));
- snprintf(path, MAXPGPATH, "%s/%s", dirpath, name);
+ FileSetDeleteAll(&fileset->fs);
}
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index cafc087..f7994d7 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -564,7 +564,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
lt = <s->tapes[i];
pg_itoa(i, filename);
- file = BufFileOpenShared(fileset, filename, O_RDONLY);
+ file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY);
filesize = BufFileSize(file);
/*
@@ -610,7 +610,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
* offset).
*
* The only thing that currently prevents writing to the leader tape from
- * working is the fact that BufFiles opened using BufFileOpenShared() are
+ * working is the fact that BufFiles opened using BufFileOpenFileSet() are
* read-only by definition, but that could be changed if it seemed
* worthwhile. For now, writing to the leader tape will raise a "Bad file
* descriptor" error, so tuplesort must avoid writing to the leader tape
@@ -722,7 +722,7 @@ LogicalTapeSetCreate(int ntapes, bool preallocate, TapeShare *shared,
char filename[MAXPGPATH];
pg_itoa(worker, filename);
- lts->pfile = BufFileCreateShared(fileset, filename);
+ lts->pfile = BufFileCreateFileSet(&fileset->fs, filename);
}
else
lts->pfile = BufFileCreateTemp(false);
@@ -1096,7 +1096,7 @@ LogicalTapeFreeze(LogicalTapeSet *lts, int tapenum, TapeShare *share)
/* Handle extra steps when caller is to share its tapeset */
if (share)
{
- BufFileExportShared(lts->pfile);
+ BufFileExportFileSet(lts->pfile);
share->firstblocknumber = lt->firstBlockNumber;
}
}
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 57e35db..72acd54 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -310,7 +310,8 @@ sts_puttuple(SharedTuplestoreAccessor *accessor, void *meta_data,
/* Create one. Only this backend will write into it. */
sts_filename(name, accessor, accessor->participant);
- accessor->write_file = BufFileCreateShared(accessor->fileset, name);
+ accessor->write_file =
+ BufFileCreateFileSet(&accessor->fileset->fs, name);
/* Set up the shared state for this backend's file. */
participant = &accessor->sts->participants[accessor->participant];
@@ -559,7 +560,7 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
sts_filename(name, accessor, accessor->read_participant);
accessor->read_file =
- BufFileOpenShared(accessor->fileset, name, O_RDONLY);
+ BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY);
}
/* Seek and load the chunk header. */
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 566523d..032a823 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -26,7 +26,7 @@
#ifndef BUFFILE_H
#define BUFFILE_H
-#include "storage/sharedfileset.h"
+#include "storage/fileset.h"
/* BufFile is an opaque type whose details are not known outside buffile.c. */
@@ -46,11 +46,11 @@ extern int BufFileSeekBlock(BufFile *file, long blknum);
extern int64 BufFileSize(BufFile *file);
extern long BufFileAppend(BufFile *target, BufFile *source);
-extern BufFile *BufFileCreateShared(SharedFileSet *fileset, const char *name);
-extern void BufFileExportShared(BufFile *file);
-extern BufFile *BufFileOpenShared(SharedFileSet *fileset, const char *name,
+extern BufFile *BufFileCreateFileSet(FileSet *fileset, const char *name);
+extern void BufFileExportFileSet(BufFile *file);
+extern BufFile *BufFileOpenFileSet(FileSet *fileset, const char *name,
int mode);
-extern void BufFileDeleteShared(SharedFileSet *fileset, const char *name);
-extern void BufFileTruncateShared(BufFile *file, int fileno, off_t offset);
+extern void BufFileDeleteFileSet(FileSet *fileset, const char *name);
+extern void BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset);
#endif /* BUFFILE_H */
diff --git a/src/include/storage/fileset.h b/src/include/storage/fileset.h
new file mode 100644
index 0000000..dfe4da8
--- /dev/null
+++ b/src/include/storage/fileset.h
@@ -0,0 +1,41 @@
+/*-------------------------------------------------------------------------
+ *
+ * fileset.h
+ * temporary file management.
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/fileset.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FILESET_H
+#define FILESET_H
+
+#include "storage/fd.h"
+
+/*
+ * A set of temporary files.
+ */
+typedef struct FileSet
+{
+ pid_t creator_pid; /* PID of the creating process */
+ uint32 number; /* per-PID identifier */
+ int ntablespaces; /* number of tablespaces to use */
+ Oid tablespaces[8]; /* OIDs of tablespaces to use. Assumes that
+ * it's rare that there are more than 8 temp
+ * tablespaces. */
+} FileSet;
+
+extern void FileSetInit(FileSet *fileset);
+extern File FileSetCreate(FileSet *fileset, const char *name);
+extern File FileSetOpen(FileSet *fileset, const char *name,
+ int mode);
+extern bool FileSetDelete(FileSet *fileset, const char *name,
+ bool error_on_failure);
+extern void FileSetDeleteAll(FileSet *fileset);
+
+#endif
diff --git a/src/include/storage/sharedfileset.h b/src/include/storage/sharedfileset.h
index 09ba121..59becfb 100644
--- a/src/include/storage/sharedfileset.h
+++ b/src/include/storage/sharedfileset.h
@@ -17,6 +17,7 @@
#include "storage/dsm.h"
#include "storage/fd.h"
+#include "storage/fileset.h"
#include "storage/spin.h"
/*
@@ -24,24 +25,13 @@
*/
typedef struct SharedFileSet
{
- pid_t creator_pid; /* PID of the creating process */
- uint32 number; /* per-PID identifier */
+ FileSet fs;
slock_t mutex; /* mutex protecting the reference count */
int refcnt; /* number of attached backends */
- int ntablespaces; /* number of tablespaces to use */
- Oid tablespaces[8]; /* OIDs of tablespaces to use. Assumes that
- * it's rare that there more than temp
- * tablespaces. */
} SharedFileSet;
extern void SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg);
extern void SharedFileSetAttach(SharedFileSet *fileset, dsm_segment *seg);
-extern File SharedFileSetCreate(SharedFileSet *fileset, const char *name);
-extern File SharedFileSetOpen(SharedFileSet *fileset, const char *name,
- int mode);
-extern bool SharedFileSetDelete(SharedFileSet *fileset, const char *name,
- bool error_on_failure);
extern void SharedFileSetDeleteAll(SharedFileSet *fileset);
-extern void SharedFileSetUnregister(SharedFileSet *input_fileset);
#endif
--
1.8.3.1
On Sat, Aug 21, 2021 8:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Aug 18, 2021 at 3:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I was looking into this, so if we want to do that I think the outline
will look like this:

- There will be fileset.c and fileset.h files, and we will expose a
new structure FileSet, which will be the same as SharedFileSet except
for the mutex and refcount. fileset.c will expose the FileSetInit(),
FileSetCreate(), FileSetOpen(), FileSetDelete() and FileSetDeleteAll()
interfaces.

- sharedfileset.c will internally call fileset.c's interfaces. The
SharedFileSet structure will also contain a FileSet and the other
members, i.e. the mutex and refcount.

- buffile.c's interfaces ending with Shared, e.g. BufFileCreateShared
and BufFileOpenShared, should be converted to BufFileCreate and
BufFileOpen respectively, and their input can be converted to FileSet
instead of SharedFileSet.

Here is the first draft based on the idea we discussed: 0001 splits
sharedfileset.c into sharedfileset.c and fileset.c, and 0002 is the
same patch I submitted earlier (use a single fileset throughout the
worker), just rebased on top of 0001. Please let me know your thoughts.
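The split outlined above can be sketched as a standalone C fragment. This is a hypothetical simplification: the real structures in storage/fileset.h and storage/sharedfileset.h carry a tablespace array, a spinlock, and PostgreSQL-specific headers, and the PID argument here stands in for MyProcPid.

```c
#include <assert.h>

/* Simplified stand-ins for the real PostgreSQL types. */
typedef struct FileSet
{
    int         creator_pid;    /* PID of the creating process */
    unsigned    number;         /* per-PID identifier */
} FileSet;

typedef struct SharedFileSet
{
    FileSet     fs;             /* embedded, DSM-independent part */
    int         refcnt;         /* number of attached backends */
} SharedFileSet;

/* Backend-local initialization: no shared memory involved. */
static void
FileSetInit(FileSet *fileset, int pid)
{
    static unsigned counter = 0;

    fileset->creator_pid = pid;
    fileset->number = counter++;
}

/* The shared variant only adds cross-backend state and delegates. */
static void
SharedFileSetInit(SharedFileSet *fileset, int pid)
{
    fileset->refcnt = 1;
    FileSetInit(&fileset->fs, pid);
}
```

The point of the composition is that everything DSM-independent lives in FileSet, while SharedFileSet layers the reference counting on top.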
Hi,
Here are some comments for the new version patches.
1)
+ TempTablespacePath(tempdirpath, tablespace);
+ snprintf(path, MAXPGPATH, "%s/%s%lu.%u.sharedfileset",
+ tempdirpath, PG_TEMP_FILE_PREFIX,
+ (unsigned long) fileset->creator_pid, fileset->number);
Do we need to use a different filename for shared and un-shared filesets?
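For context, both variants currently produce the same directory name. A standalone mimic of the naming in FileSetPath() and the hashing in ChooseTablespace() might look like this (the helper names and the string hash are illustrative, not the real implementation, though the format string and "pgsql_tmp" prefix match the patch):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXPGPATH 1024
#define PG_TEMP_FILE_PREFIX "pgsql_tmp"

/* Build the per-fileset directory name from the creator PID and the
 * per-PID counter; note the ".sharedfileset" suffix is used for both
 * shared and un-shared filesets. */
static void
fileset_dir_name(char *path, const char *tempdirpath,
                 unsigned long creator_pid, unsigned number)
{
    snprintf(path, MAXPGPATH, "%s/%s%lu.%u.sharedfileset",
             tempdirpath, PG_TEMP_FILE_PREFIX, creator_pid, number);
}

/* A file name deterministically picks one of ntablespaces slots; any
 * stable string hash illustrates the idea. */
static int
choose_tablespace_index(const char *name, int ntablespaces)
{
    unsigned hash = 5381;

    for (; *name; name++)
        hash = hash * 33 + (unsigned char) *name;
    return (int) (hash % (unsigned) ntablespaces);
}
```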
2)
I think we can remove or adjust the following comments in sharedfileset.c.
----
* SharedFileSets can be used by backends when the temporary files need to be
* opened/closed multiple times and the underlying files need to survive across
* transactions.
----
* We can also use this interface if the temporary files are used only by
* single backend but the files need to be opened and closed multiple times
* and also the underlying files need to survive across transactions. For
----
3)
The 0002 patch still uses the word "shared fileset" in some places; I
think we should change it to "fileset".
4)
-extern File SharedFileSetCreate(SharedFileSet *fileset, const char *name);
-extern File SharedFileSetOpen(SharedFileSet *fileset, const char *name,
- int mode);
-extern bool SharedFileSetDelete(SharedFileSet *fileset, const char *name,
- bool error_on_failure);
extern void SharedFileSetDeleteAll(SharedFileSet *fileset);
-extern void SharedFileSetUnregister(SharedFileSet *input_fileset);
I noticed the patch deletes part of the public API; is it better to keep
the old APIs and let them invoke the new APIs internally? Having said
that, I didn't find any open source extension using these old APIs, so it
might be fine to delete them.
Best regards,
Hou zj
On Mon, Aug 23, 2021 at 9:11 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
4)
-extern File SharedFileSetCreate(SharedFileSet *fileset, const char *name);
-extern File SharedFileSetOpen(SharedFileSet *fileset, const char *name,
- int mode);
-extern bool SharedFileSetDelete(SharedFileSet *fileset, const char *name,
- bool error_on_failure);
extern void SharedFileSetDeleteAll(SharedFileSet *fileset);
-extern void SharedFileSetUnregister(SharedFileSet *input_fileset);

I noticed the patch deletes part of the public API; is it better to keep
the old APIs and let them invoke the new APIs internally? Having said
that, I didn't find any open source extension using these old APIs, so it
might be fine to delete them.
Right, those were internally used by buffile.c, but now we have changed
buffile.c to directly use the fileset interfaces, so we had better remove
them.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Aug 23, 2021 at 9:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Aug 23, 2021 at 9:11 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

4)
-extern File SharedFileSetCreate(SharedFileSet *fileset, const char *name);
-extern File SharedFileSetOpen(SharedFileSet *fileset, const char *name,
- int mode);
-extern bool SharedFileSetDelete(SharedFileSet *fileset, const char *name,
- bool error_on_failure);
extern void SharedFileSetDeleteAll(SharedFileSet *fileset);
-extern void SharedFileSetUnregister(SharedFileSet *input_fileset);

I noticed the patch deletes part of the public API; is it better to keep
the old APIs and let them invoke the new APIs internally? Having said
that, I didn't find any open source extension using these old APIs, so it
might be fine to delete them.

Right, those were internally used by buffile.c, but now we have changed
buffile.c to directly use the fileset interfaces, so we had better remove
them.
I also don't see any reason to keep those exposed from sharedfileset.h.
I see that even in the original commit dc6c4c9dc2, these APIs seem to
have been introduced to be used by buffile. Andres/Thomas, do let us know
if you think otherwise?
One more comment: I think v1-0001-Sharedfileset-refactoring doesn't have
a way of cleaning up worker.c temporary files on error/exit. It is better
to have that, to make it an independent patch.
--
With Regards,
Amit Kapila.
On Mon, Aug 23, 2021 at 11:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Aug 23, 2021 at 9:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Aug 23, 2021 at 9:11 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

4)
-extern File SharedFileSetCreate(SharedFileSet *fileset, const char *name);
-extern File SharedFileSetOpen(SharedFileSet *fileset, const char *name,
- int mode);
-extern bool SharedFileSetDelete(SharedFileSet *fileset, const char *name,
- bool error_on_failure);
extern void SharedFileSetDeleteAll(SharedFileSet *fileset);
-extern void SharedFileSetUnregister(SharedFileSet *input_fileset);

I noticed the patch deletes part of the public API; is it better to keep
the old APIs and let them invoke the new APIs internally? Having said
that, I didn't find any open source extension using these old APIs, so it
might be fine to delete them.

Right, those were internally used by buffile.c, but now we have changed
buffile.c to directly use the fileset interfaces, so we had better remove
them.

I also don't see any reason to keep those exposed from sharedfileset.h.
I see that even in the original commit dc6c4c9dc2, these APIs seem to
have been introduced to be used by buffile. Andres/Thomas, do let us know
if you think otherwise?

One more comment: I think v1-0001-Sharedfileset-refactoring doesn't have
a way of cleaning up worker.c temporary files on error/exit. It is better
to have that, to make it an independent patch.
I think we should handle that in worker.c itself, by adding a
before_dsm_detach/before_shmem_exit callback, right? Or are you thinking
that in FileSetInit we keep the filesetlist mechanism, like we were doing
in SharedFileSetInit? I think that would just add unnecessary complexity
to the first patch which will eventually go away in the second patch. And
if we did that, then SharedFileSetInit could not directly use
FileSetInit; otherwise the dsm-based fileset would also get registered
for cleanup in filesetlist, so we might need to pass a parameter to
FileSetInit() saying whether to register for cleanup or not. That would
again not look clean, because then we are again adding the conditional
cleanup, which IMHO is the same problem we are trying to clean up in
SharedFileSetInit by introducing a new FileSetInit.
I think what we can do is introduce a new function FileSetInitInternal()
that will do what FileSetInit() does today, and both SharedFileSetInit()
and FileSetInit() will call this function; along with that,
SharedFileSetInit() will register the dsm-based cleanup and FileSetInit()
will do the filesetlist-based cleanup. But IMHO, we should go in this
direction only if we are sure that we want to commit the first patch and
not the second.
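The filesetlist mechanism under discussion boils down to a process-local registry plus an exit callback. A minimal standalone sketch, with hypothetical types in place of the real List/on_proc_exit machinery (and a counter standing in for actual file deletion), is:

```c
#include <assert.h>
#include <stddef.h>

typedef struct FileSetNode
{
    int                 id;
    struct FileSetNode *next;
} FileSetNode;

static FileSetNode *filesetlist = NULL;
static int          deleted = 0;   /* stands in for deleting the files */

/* Register a fileset for cleanup, consing onto the head like lcons(). */
static void
register_fileset(FileSetNode *node)
{
    node->next = filesetlist;
    filesetlist = node;
}

/* Exit callback: pop entries one by one rather than iterating, since
 * deleting an entry removes the current head (the same pattern
 * SharedFileSetDeleteOnProcExit uses). */
static void
delete_all_on_exit(void)
{
    while (filesetlist != NULL)
    {
        FileSetNode *node = filesetlist;

        filesetlist = node->next;   /* unregister first, then "delete" */
        deleted++;
    }
}
```

The design question in the thread is simply which layer owns this registry: fileset.c itself, or the worker.c caller.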
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Sat, Aug 21, 2021 at 9:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Aug 18, 2021 at 3:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Aug 18, 2021 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Aug 17, 2021 at 4:34 PM Andres Freund <andres@anarazel.de> wrote:
On 2021-08-17 10:54:30 +0530, Amit Kapila wrote:
5. How can we provide a strict mechanism to not allow to use dsm APIs
for non-dsm FileSet? One idea could be that we can have a variable
(probably bool) in SharedFileSet structure which will be initialized
in SharedFileSetInit based on whether the caller has provided dsm
segment. Then in other DSM-based APIs, we can check if it is used for
the wrong type.

Well, isn't the issue here that it's not a shared file set in case you
explicitly don't want to share it? ISTM that the proper way to address
this would be to split out a FileSet from SharedFileSet that's then used
for worker.c and sharedfileset.c.

Okay, but note that to accomplish the same, we need to tweak the
BufFile (buffile.c) APIs as well so that they can work with FileSet.
As per the initial analysis, there doesn't seem to be any problem with
that though.

I was looking into this, so if we want to do that I think the outline
will look like this:

- There will be fileset.c and fileset.h files, and we will expose a
new structure FileSet, which will be the same as SharedFileSet except
for the mutex and refcount. fileset.c will expose the FileSetInit(),
FileSetCreate(), FileSetOpen(), FileSetDelete() and FileSetDeleteAll()
interfaces.

- sharedfileset.c will internally call fileset.c's interfaces. The
SharedFileSet structure will also contain a FileSet and the other
members, i.e. the mutex and refcount.

- buffile.c's interfaces ending with Shared, e.g. BufFileCreateShared
and BufFileOpenShared, should be converted to BufFileCreate and
BufFileOpen respectively, and their input can be converted to FileSet
instead of SharedFileSet.

Here is the first draft based on the idea we discussed: 0001 splits
sharedfileset.c into sharedfileset.c and fileset.c, and 0002 is the
same patch I submitted earlier (use a single fileset throughout the
worker), just rebased on top of 0001. Please let me know your thoughts.
Here are some comments on 0001 patch:
+/*
+ * Initialize a space for temporary files. This API can be used by shared
+ * fileset as well as if the temporary files are used only by single backend
+ * but the files need to be opened and closed multiple times and also the
+ * underlying files need to survive across transactions.
+ *
+ * Files will be distributed over the tablespaces configured in
+ * temp_tablespaces.
+ *
+ * Under the covers the set is one or more directories which will eventually
+ * be deleted.
+ */
I think it's better to mention cleaning up here like we do in the
comment of SharedFileSetInit().
---
I think we need to clean up both stream_fileset and subxact_fileset on
proc exit. The 0002 patch cleans up the fileset, but I think we need to
do that in the 0001 patch as well.
---
There still are some comments using "shared fileset" in both buffile.c
and worker.c.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Aug 23, 2021 at 12:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Aug 23, 2021 at 11:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Aug 23, 2021 at 9:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Aug 23, 2021 at 9:11 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

4)
-extern File SharedFileSetCreate(SharedFileSet *fileset, const char *name);
-extern File SharedFileSetOpen(SharedFileSet *fileset, const char *name,
- int mode);
-extern bool SharedFileSetDelete(SharedFileSet *fileset, const char *name,
- bool error_on_failure);
extern void SharedFileSetDeleteAll(SharedFileSet *fileset);
-extern void SharedFileSetUnregister(SharedFileSet *input_fileset);

I noticed the patch deletes part of the public API; would it be better to
keep the old API and have it invoke the new API internally? Having said
that, I didn't find any open source extension using these old APIs, so it
might be fine to delete them.

Right, those were internally used by buffile.c, but now we have changed
buffile.c to directly use the fileset interfaces, so we better remove
them.

I also don't see any reason to keep those exposed from
sharedfileset.h. I see that even in the original commit dc6c4c9dc2,
these APIs seem to be introduced to be used by buffile. Andres/Thomas,
do let us know if you think otherwise?

One more comment:
I think v1-0001-Sharedfileset-refactoring doesn't have a way for
cleaning up worker.c temporary files on error/exit. It is better to
have that to make it an independent patch.

I think we should handle that in worker.c itself, by adding a
before_dsm_detach-style function registered with before_shmem_exit, right?
Yeah, I thought of handling it in worker.c similar to what you've done in
the 0002 patch.
--
With Regards,
Amit Kapila.
On Mon, Aug 23, 2021 at 1:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Note: merge comments from multiple mails
I think we should handle that in worker.c itself, by adding a
before_dsm_detach-style function registered with before_shmem_exit, right?

Yeah, I thought of handling it in worker.c similar to what you've done in
the 0002 patch.
I have done handling in worker.c
On Mon, Aug 23, 2021 at 9:11 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
On Sat, Aug 21, 2021 8:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
1)
+	TempTablespacePath(tempdirpath, tablespace);
+	snprintf(path, MAXPGPATH, "%s/%s%lu.%u.sharedfileset",
+			 tempdirpath, PG_TEMP_FILE_PREFIX,
+			 (unsigned long) fileset->creator_pid, fileset->number);

Do we need to use a different filename for shared and un-shared filesets?
I was also thinking the same; does it make sense to name it just
"%s/%s%lu.%u.fileset"?
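For reference, a minimal sketch of the name-building logic with the neutral ".fileset" suffix being discussed; the helper name and the hard-coded paths below are illustrative, not the actual patch code:

```c
#include <stdio.h>
#include <string.h>

#define MAXPGPATH 1024
#define PG_TEMP_FILE_PREFIX "pgsql_tmp"

/*
 * Build the per-fileset directory path.  The ".fileset" suffix is the
 * proposed neutral replacement for ".sharedfileset" (an assumption based
 * on this thread, not committed naming).
 */
static void
fileset_dir_path(char *path, const char *tempdirpath,
				 unsigned long creator_pid, unsigned int number)
{
	snprintf(path, MAXPGPATH, "%s/%s%lu.%u.fileset",
			 tempdirpath, PG_TEMP_FILE_PREFIX, creator_pid, number);
}
```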
On Mon, Aug 23, 2021 at 1:08 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Here are some comments on 0001 patch:
+/*
+ * Initialize a space for temporary files. This API can be used by shared
+ * fileset as well as if the temporary files are used only by single backend
+ * but the files need to be opened and closed multiple times and also the
+ * underlying files need to survive across transactions.
+ *
+ * Files will be distributed over the tablespaces configured in
+ * temp_tablespaces.
+ *
+ * Under the covers the set is one or more directories which will eventually
+ * be deleted.
+ */

I think it's better to mention cleaning up here like we do in the
comment of SharedFileSetInit().
Right, done
---
I think we need to clean up both stream_fileset and subxact_fileset on
proc exit. The 0002 patch cleans up the filesets, but I think we need to
do that in the 0001 patch as well.
Done
---
There still are some comments using "shared fileset" in both buffile.c
and worker.c.
Done
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v2-0001-Sharedfileset-refactoring.patch (text/x-patch)
From ced3a5e0222c2f89a977add167615f09d99d107a Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Wed, 18 Aug 2021 15:52:21 +0530
Subject: [PATCH v2 1/2] Sharedfileset refactoring
Currently, sharedfileset.c is designed for a very specific purpose: one
backend creates the fileset, which can then be shared across multiple
backends, so the fileset must be created in DSM. But in some use cases we
need exactly the sharedfileset behavior, i.e. the files created under it
should be named files, should survive the transaction, and should support
being opened and closed, without any DSM. This patch refactors the code
into two files: a) fileset.c, which provides the general-purpose
interfaces, and b) sharedfileset.c, which internally uses fileset.c but
is specific to DSM-based filesets.
---
src/backend/replication/logical/worker.c | 89 +++++++----
src/backend/storage/file/Makefile | 1 +
src/backend/storage/file/buffile.c | 82 +++++-----
src/backend/storage/file/fd.c | 2 +-
src/backend/storage/file/fileset.c | 204 +++++++++++++++++++++++++
src/backend/storage/file/sharedfileset.c | 239 +-----------------------------
src/backend/utils/sort/logtape.c | 8 +-
src/backend/utils/sort/sharedtuplestore.c | 5 +-
src/include/storage/buffile.h | 12 +-
src/include/storage/fileset.h | 40 +++++
src/include/storage/sharedfileset.h | 14 +-
11 files changed, 366 insertions(+), 330 deletions(-)
create mode 100644 src/backend/storage/file/fileset.c
create mode 100644 src/include/storage/fileset.h
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index ecaed15..d98ebab 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -39,13 +39,13 @@
* BufFile infrastructure supports temporary files that exceed the OS file size
* limit, (b) provides a way for automatic clean up on the error and (c) provides
* a way to survive these files across local transactions and allow to open and
- * close at stream start and close. We decided to use SharedFileSet
+ * close at stream start and close. We decided to use FileSet
* infrastructure as without that it deletes the files on the closure of the
* file and if we decide to keep stream files open across the start/stop stream
* then it will consume a lot of memory (more than 8K for each BufFile and
* there could be multiple such BufFiles as the subscriber could receive
* multiple start/stop streams for different transactions before getting the
- * commit). Moreover, if we don't use SharedFileSet then we also need to invent
+ * commit). Moreover, if we don't use FileSet then we also need to invent
* a new way to pass filenames to BufFile APIs so that we are allowed to open
* the file we desired across multiple stream-open calls for the same
* transaction.
@@ -231,8 +231,8 @@ typedef struct ApplyExecutionData
typedef struct StreamXidHash
{
TransactionId xid; /* xid is the hash key and must be first */
- SharedFileSet *stream_fileset; /* shared file set for stream data */
- SharedFileSet *subxact_fileset; /* shared file set for subxact info */
+ FileSet *stream_fileset; /* file set for stream data */
+ FileSet *subxact_fileset; /* file set for subxact info */
} StreamXidHash;
static MemoryContext ApplyMessageContext = NULL;
@@ -255,8 +255,8 @@ static bool in_streamed_transaction = false;
static TransactionId stream_xid = InvalidTransactionId;
/*
- * Hash table for storing the streaming xid information along with shared file
- * set for streaming and subxact files.
+ * Hash table for storing the streaming xid information along with filesets
+ * for streaming and subxact files.
*/
static HTAB *xidhash = NULL;
@@ -1299,10 +1299,10 @@ apply_handle_stream_abort(StringInfo s)
/* open the changes file */
changes_filename(path, MyLogicalRepWorker->subid, xid);
- fd = BufFileOpenShared(ent->stream_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
/* OK, truncate the file at the right offset */
- BufFileTruncateShared(fd, subxact_data.subxacts[subidx].fileno,
+ BufFileTruncateFileSet(fd, subxact_data.subxacts[subidx].fileno,
subxact_data.subxacts[subidx].offset);
BufFileClose(fd);
@@ -1355,7 +1355,7 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
errmsg_internal("transaction %u not found in stream XID hash table",
xid)));
- fd = BufFileOpenShared(ent->stream_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDONLY);
buffer = palloc(BLCKSZ);
initStringInfo(&s2);
@@ -2508,6 +2508,33 @@ UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
}
}
+ /*
+ * Cleanup filesets.
+ */
+static void
+worker_cleanup(int code, Datum arg)
+{
+ HASH_SEQ_STATUS status;
+ StreamXidHash *hentry;
+
+ /*
+ * Scan the xidhash table if created and from each entry delete stream
+ * fileset and the subxact fileset.
+ */
+ if (xidhash)
+ {
+ hash_seq_init(&status, xidhash);
+ while ((hentry = (StreamXidHash *) hash_seq_search(&status)) != NULL)
+ {
+ FileSetDeleteAll(hentry->stream_fileset);
+
+ /* Delete the subxact fileset only if it is created */
+ if (hentry->subxact_fileset)
+ FileSetDeleteAll(hentry->subxact_fileset);
+ }
+ }
+}
+
/*
* Apply main loop.
*/
@@ -2534,6 +2561,12 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
"LogicalStreamingContext",
ALLOCSET_DEFAULT_SIZES);
+ /*
+ * Register before-shmem-exit hook to ensure filesets are dropped while we
+ * can still report stats for underlying temporary files.
+ */
+ before_shmem_exit(worker_cleanup, (Datum) 0);
+
/* mark as idle, before starting to loop */
pgstat_report_activity(STATE_IDLE, NULL);
@@ -2979,7 +3012,7 @@ subxact_info_write(Oid subid, TransactionId xid)
if (ent->subxact_fileset)
{
cleanup_subxact_info();
- SharedFileSetDeleteAll(ent->subxact_fileset);
+ FileSetDeleteAll(ent->subxact_fileset);
pfree(ent->subxact_fileset);
ent->subxact_fileset = NULL;
}
@@ -2997,18 +3030,18 @@ subxact_info_write(Oid subid, TransactionId xid)
MemoryContext oldctx;
/*
- * We need to maintain shared fileset across multiple stream
- * start/stop calls. So, need to allocate it in a persistent context.
+ * We need to maintain fileset across multiple stream start/stop calls.
+ * So, need to allocate it in a persistent context.
*/
oldctx = MemoryContextSwitchTo(ApplyContext);
- ent->subxact_fileset = palloc(sizeof(SharedFileSet));
- SharedFileSetInit(ent->subxact_fileset, NULL);
+ ent->subxact_fileset = palloc(sizeof(FileSet));
+ FileSetInit(ent->subxact_fileset);
MemoryContextSwitchTo(oldctx);
- fd = BufFileCreateShared(ent->subxact_fileset, path);
+ fd = BufFileCreateFileSet(ent->subxact_fileset, path);
}
else
- fd = BufFileOpenShared(ent->subxact_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDWR);
len = sizeof(SubXactInfo) * subxact_data.nsubxacts;
@@ -3062,7 +3095,7 @@ subxact_info_read(Oid subid, TransactionId xid)
subxact_filename(path, subid, xid);
- fd = BufFileOpenShared(ent->subxact_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDONLY);
/* read number of subxact items */
if (BufFileRead(fd, &subxact_data.nsubxacts,
@@ -3219,7 +3252,7 @@ stream_cleanup_files(Oid subid, TransactionId xid)
/* Delete the change file and release the stream fileset memory */
changes_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->stream_fileset);
+ FileSetDeleteAll(ent->stream_fileset);
pfree(ent->stream_fileset);
ent->stream_fileset = NULL;
@@ -3227,7 +3260,7 @@ stream_cleanup_files(Oid subid, TransactionId xid)
if (ent->subxact_fileset)
{
subxact_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->subxact_fileset);
+ FileSetDeleteAll(ent->subxact_fileset);
pfree(ent->subxact_fileset);
ent->subxact_fileset = NULL;
}
@@ -3243,8 +3276,8 @@ stream_cleanup_files(Oid subid, TransactionId xid)
*
* Open a file for streamed changes from a toplevel transaction identified
* by stream_xid (global variable). If it's the first chunk of streamed
- * changes for this transaction, initialize the shared fileset and create the
- * buffile, otherwise open the previously created file.
+ * changes for this transaction, initialize the fileset and create the buffile,
+ * otherwise open the previously created file.
*
* This can only be called at the beginning of a "streaming" block, i.e.
* between stream_start/stream_stop messages from the upstream.
@@ -3285,7 +3318,7 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
if (first_segment)
{
MemoryContext savectx;
- SharedFileSet *fileset;
+ FileSet *fileset;
if (found)
ereport(ERROR,
@@ -3293,16 +3326,16 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
/*
- * We need to maintain shared fileset across multiple stream
- * start/stop calls. So, need to allocate it in a persistent context.
+ * We need to maintain fileset across multiple stream start/stop calls.
+ * So, need to allocate it in a persistent context.
*/
savectx = MemoryContextSwitchTo(ApplyContext);
- fileset = palloc(sizeof(SharedFileSet));
+ fileset = palloc(sizeof(FileSet));
- SharedFileSetInit(fileset, NULL);
+ FileSetInit(fileset);
MemoryContextSwitchTo(savectx);
- stream_fd = BufFileCreateShared(fileset, path);
+ stream_fd = BufFileCreateFileSet(fileset, path);
/* Remember the fileset for the next stream of the same transaction */
ent->xid = xid;
@@ -3320,7 +3353,7 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
* Open the file and seek to the end of the file because we always
* append the changes file.
*/
- stream_fd = BufFileOpenShared(ent->stream_fileset, path, O_RDWR);
+ stream_fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
BufFileSeek(stream_fd, 0, 0, SEEK_END);
}
diff --git a/src/backend/storage/file/Makefile b/src/backend/storage/file/Makefile
index 5e1291b..660ac51 100644
--- a/src/backend/storage/file/Makefile
+++ b/src/backend/storage/file/Makefile
@@ -16,6 +16,7 @@ OBJS = \
buffile.o \
copydir.o \
fd.o \
+ fileset.o \
reinit.o \
sharedfileset.o
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index a4be5fe..b9ca298 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -39,7 +39,7 @@
* BufFile also supports temporary files that can be used by the single backend
* when the corresponding files need to be survived across the transaction and
* need to be opened and closed multiple times. Such files need to be created
- * as a member of a SharedFileSet.
+ * as a member of a FileSet.
*-------------------------------------------------------------------------
*/
@@ -77,8 +77,8 @@ struct BufFile
bool dirty; /* does buffer need to be written? */
bool readOnly; /* has the file been set to read only? */
- SharedFileSet *fileset; /* space for segment files if shared */
- const char *name; /* name of this BufFile if shared */
+ FileSet *fileset; /* space for fileset for fileset based file */
+ const char *name; /* name of this BufFile */
/*
* resowner is the ResourceOwner to use for underlying temp files. (We
@@ -104,7 +104,7 @@ static void extendBufFile(BufFile *file);
static void BufFileLoadBuffer(BufFile *file);
static void BufFileDumpBuffer(BufFile *file);
static void BufFileFlush(BufFile *file);
-static File MakeNewSharedSegment(BufFile *file, int segment);
+static File MakeNewSegment(BufFile *file, int segment);
/*
* Create BufFile and perform the common initialization.
@@ -160,7 +160,7 @@ extendBufFile(BufFile *file)
if (file->fileset == NULL)
pfile = OpenTemporaryFile(file->isInterXact);
else
- pfile = MakeNewSharedSegment(file, file->numFiles);
+ pfile = MakeNewSegment(file, file->numFiles);
Assert(pfile >= 0);
@@ -214,34 +214,34 @@ BufFileCreateTemp(bool interXact)
* Build the name for a given segment of a given BufFile.
*/
static void
-SharedSegmentName(char *name, const char *buffile_name, int segment)
+SegmentName(char *name, const char *buffile_name, int segment)
{
snprintf(name, MAXPGPATH, "%s.%d", buffile_name, segment);
}
/*
- * Create a new segment file backing a shared BufFile.
+ * Create a new segment file backing a fileset BufFile.
*/
static File
-MakeNewSharedSegment(BufFile *buffile, int segment)
+MakeNewSegment(BufFile *buffile, int segment)
{
char name[MAXPGPATH];
File file;
/*
* It is possible that there are files left over from before a crash
- * restart with the same name. In order for BufFileOpenShared() not to
+ * restart with the same name. In order for BufFileOpen() not to
* get confused about how many segments there are, we'll unlink the next
* segment number if it already exists.
*/
- SharedSegmentName(name, buffile->name, segment + 1);
- SharedFileSetDelete(buffile->fileset, name, true);
+ SegmentName(name, buffile->name, segment + 1);
+ FileSetDelete(buffile->fileset, name, true);
/* Create the new segment. */
- SharedSegmentName(name, buffile->name, segment);
- file = SharedFileSetCreate(buffile->fileset, name);
+ SegmentName(name, buffile->name, segment);
+ file = FileSetCreate(buffile->fileset, name);
- /* SharedFileSetCreate would've errored out */
+ /* FileSetCreate would've errored out */
Assert(file > 0);
return file;
@@ -251,7 +251,7 @@ MakeNewSharedSegment(BufFile *buffile, int segment)
* Create a BufFile that can be discovered and opened read-only by other
* backends that are attached to the same SharedFileSet using the same name.
*
- * The naming scheme for shared BufFiles is left up to the calling code. The
+ * The naming scheme for fileset BufFiles is left up to the calling code. The
* name will appear as part of one or more filenames on disk, and might
* provide clues to administrators about which subsystem is generating
* temporary file data. Since each SharedFileSet object is backed by one or
@@ -259,7 +259,7 @@ MakeNewSharedSegment(BufFile *buffile, int segment)
* unrelated SharedFileSet objects.
*/
BufFile *
-BufFileCreateShared(SharedFileSet *fileset, const char *name)
+BufFileCreateFileSet(FileSet *fileset, const char *name)
{
BufFile *file;
@@ -267,7 +267,7 @@ BufFileCreateShared(SharedFileSet *fileset, const char *name)
file->fileset = fileset;
file->name = pstrdup(name);
file->files = (File *) palloc(sizeof(File));
- file->files[0] = MakeNewSharedSegment(file, 0);
+ file->files[0] = MakeNewSegment(file, 0);
file->readOnly = false;
return file;
@@ -275,13 +275,13 @@ BufFileCreateShared(SharedFileSet *fileset, const char *name)
/*
* Open a file that was previously created in another backend (or this one)
- * with BufFileCreateShared in the same SharedFileSet using the same name.
+ * with BufFileCreateFileSet in the same FileSet using the same name.
* The backend that created the file must have called BufFileClose() or
- * BufFileExportShared() to make sure that it is ready to be opened by other
+ * BufFileExportFileSet() to make sure that it is ready to be opened by other
* backends and render it read-only.
*/
BufFile *
-BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
+BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
{
BufFile *file;
char segment_name[MAXPGPATH];
@@ -304,8 +304,8 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
files = repalloc(files, sizeof(File) * capacity);
}
/* Try to load a segment. */
- SharedSegmentName(segment_name, name, nfiles);
- files[nfiles] = SharedFileSetOpen(fileset, segment_name, mode);
+ SegmentName(segment_name, name, nfiles);
+ files[nfiles] = FileSetOpen(fileset, segment_name, mode);
if (files[nfiles] <= 0)
break;
++nfiles;
@@ -333,18 +333,18 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
}
/*
- * Delete a BufFile that was created by BufFileCreateShared in the given
- * SharedFileSet using the given name.
+ * Delete a BufFile that was created by BufFileCreateFileSet in the given
+ * FileSet using the given name.
*
* It is not necessary to delete files explicitly with this function. It is
* provided only as a way to delete files proactively, rather than waiting for
- * the SharedFileSet to be cleaned up.
+ * the FileSet to be cleaned up.
*
* Only one backend should attempt to delete a given name, and should know
* that it exists and has been exported or closed.
*/
void
-BufFileDeleteShared(SharedFileSet *fileset, const char *name)
+BufFileDeleteFileSet(FileSet *fileset, const char *name)
{
char segment_name[MAXPGPATH];
int segment = 0;
@@ -357,8 +357,8 @@ BufFileDeleteShared(SharedFileSet *fileset, const char *name)
*/
for (;;)
{
- SharedSegmentName(segment_name, name, segment);
- if (!SharedFileSetDelete(fileset, segment_name, true))
+ SegmentName(segment_name, name, segment);
+ if (!FileSetDelete(fileset, segment_name, true))
break;
found = true;
++segment;
@@ -367,16 +367,16 @@ BufFileDeleteShared(SharedFileSet *fileset, const char *name)
}
if (!found)
- elog(ERROR, "could not delete unknown shared BufFile \"%s\"", name);
+ elog(ERROR, "could not delete unknown BufFile \"%s\"", name);
}
/*
- * BufFileExportShared --- flush and make read-only, in preparation for sharing.
+ * BufFileExportFileSet --- flush and make read-only, in preparation for sharing.
*/
void
-BufFileExportShared(BufFile *file)
+BufFileExportFileSet(BufFile *file)
{
- /* Must be a file belonging to a SharedFileSet. */
+ /* Must be a file belonging to a FileSet. */
Assert(file->fileset != NULL);
/* It's probably a bug if someone calls this twice. */
@@ -785,7 +785,7 @@ BufFileTellBlock(BufFile *file)
#endif
/*
- * Return the current shared BufFile size.
+ * Return the current fileset based BufFile size.
*
* Counts any holes left behind by BufFileAppend as part of the size.
* ereport()s on failure.
@@ -811,8 +811,8 @@ BufFileSize(BufFile *file)
}
/*
- * Append the contents of source file (managed within shared fileset) to
- * end of target file (managed within same shared fileset).
+ * Append the contents of source file (managed within fileset) to
+ * end of target file (managed within same fileset).
*
* Note that operation subsumes ownership of underlying resources from
* "source". Caller should never call BufFileClose against source having
@@ -854,11 +854,11 @@ BufFileAppend(BufFile *target, BufFile *source)
}
/*
- * Truncate a BufFile created by BufFileCreateShared up to the given fileno and
- * the offset.
+ * Truncate a BufFile created by BufFileCreateFileSet up to the given fileno
+ * and the offset.
*/
void
-BufFileTruncateShared(BufFile *file, int fileno, off_t offset)
+BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset)
{
int numFiles = file->numFiles;
int newFile = fileno;
@@ -876,12 +876,12 @@ BufFileTruncateShared(BufFile *file, int fileno, off_t offset)
{
if ((i != fileno || offset == 0) && i != 0)
{
- SharedSegmentName(segment_name, file->name, i);
+ SegmentName(segment_name, file->name, i);
FileClose(file->files[i]);
- if (!SharedFileSetDelete(file->fileset, segment_name, true))
+ if (!FileSetDelete(file->fileset, segment_name, true))
ereport(ERROR,
(errcode_for_file_access(),
- errmsg("could not delete shared fileset \"%s\": %m",
+ errmsg("could not delete fileset \"%s\": %m",
segment_name)));
numFiles--;
newOffset = MAX_PHYSICAL_FILESIZE;
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index b58b399..433e283 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -1921,7 +1921,7 @@ PathNameDeleteTemporaryFile(const char *path, bool error_on_failure)
/*
* Unlike FileClose's automatic file deletion code, we tolerate
- * non-existence to support BufFileDeleteShared which doesn't know how
+ * non-existence to support BufFileDeleteFileSet which doesn't know how
* many segments it has to delete until it runs out.
*/
if (stat_errno == ENOENT)
diff --git a/src/backend/storage/file/fileset.c b/src/backend/storage/file/fileset.c
new file mode 100644
index 0000000..a720c49
--- /dev/null
+++ b/src/backend/storage/file/fileset.c
@@ -0,0 +1,204 @@
+/*-------------------------------------------------------------------------
+ *
+ * fileset.c
+ * temporary file set management.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/storage/file/fileset.c
+ *
+ * FileSets provide a temporary namespace (think directory) so that files can
+ * be discovered by name
+ *
+ * FileSets can be used by backends when the temporary files need to be
+ * opened/closed multiple times and the underlying files need to survive across
+ * transactions.
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <limits.h>
+
+#include "catalog/pg_tablespace.h"
+#include "commands/tablespace.h"
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "storage/ipc.h"
+#include "storage/fileset.h"
+#include "utils/builtins.h"
+
+static void FileSetPath(char *path, FileSet *fileset, Oid tablespace);
+static void FilePath(char *path, FileSet *fileset, const char *name);
+static Oid ChooseTablespace(const FileSet *fileset, const char *name);
+
+/*
+ * Initialize a space for temporary files. This API can be used by shared
+ * fileset as well as if the temporary files are used only by single backend
+ * but the files need to be opened and closed multiple times and also the
+ * underlying files need to survive across transactions.
+ *
+ * The callers are expected to explicitly remove such files by using
+ * SharedFileSetDelete/ SharedFileSetDeleteAll.
+ *
+ * Files will be distributed over the tablespaces configured in
+ * temp_tablespaces.
+ *
+ * Under the covers the set is one or more directories which will eventually
+ * be deleted.
+ */
+void
+FileSetInit(FileSet *fileset)
+{
+ static uint32 counter = 0;
+
+ fileset->creator_pid = MyProcPid;
+ fileset->number = counter;
+ counter = (counter + 1) % INT_MAX;
+
+ /* Capture the tablespace OIDs so that all backends agree on them. */
+ PrepareTempTablespaces();
+ fileset->ntablespaces =
+ GetTempTablespaces(&fileset->tablespaces[0],
+ lengthof(fileset->tablespaces));
+ if (fileset->ntablespaces == 0)
+ {
+ /* If the GUC is empty, use current database's default tablespace */
+ fileset->tablespaces[0] = MyDatabaseTableSpace;
+ fileset->ntablespaces = 1;
+ }
+ else
+ {
+ int i;
+
+ /*
+ * An entry of InvalidOid means use the default tablespace for the
+ * current database. Replace that now, to be sure that all users of
+ * the FileSet agree on what to do.
+ */
+ for (i = 0; i < fileset->ntablespaces; i++)
+ {
+ if (fileset->tablespaces[i] == InvalidOid)
+ fileset->tablespaces[i] = MyDatabaseTableSpace;
+ }
+ }
+}
+
+/*
+ * Create a new file in the given set.
+ */
+File
+FileSetCreate(FileSet *fileset, const char *name)
+{
+ char path[MAXPGPATH];
+ File file;
+
+ FilePath(path, fileset, name);
+ file = PathNameCreateTemporaryFile(path, false);
+
+ /* If we failed, see if we need to create the directory on demand. */
+ if (file <= 0)
+ {
+ char tempdirpath[MAXPGPATH];
+ char filesetpath[MAXPGPATH];
+ Oid tablespace = ChooseTablespace(fileset, name);
+
+ TempTablespacePath(tempdirpath, tablespace);
+ FileSetPath(filesetpath, fileset, tablespace);
+ PathNameCreateTemporaryDir(tempdirpath, filesetpath);
+ file = PathNameCreateTemporaryFile(path, true);
+ }
+
+ return file;
+}
+
+/*
+ * Open a file that was created with FileSetCreate() */
+File
+FileSetOpen(FileSet *fileset, const char *name, int mode)
+{
+ char path[MAXPGPATH];
+ File file;
+
+ FilePath(path, fileset, name);
+ file = PathNameOpenTemporaryFile(path, mode);
+
+ return file;
+}
+
+/*
+ * Delete a file that was created with FileSetCreate().
+ * Return true if the file existed, false if didn't.
+ */
+bool
+FileSetDelete(FileSet *fileset, const char *name,
+ bool error_on_failure)
+{
+ char path[MAXPGPATH];
+
+ FilePath(path, fileset, name);
+
+ return PathNameDeleteTemporaryFile(path, error_on_failure);
+}
+
+/*
+ * Delete all files in the set.
+ */
+void
+FileSetDeleteAll(FileSet *fileset)
+{
+ char dirpath[MAXPGPATH];
+ int i;
+
+ /*
+ * Delete the directory we created in each tablespace. Doesn't fail
+ * because we use this in error cleanup paths, but can generate LOG
+ * message on IO error.
+ */
+ for (i = 0; i < fileset->ntablespaces; ++i)
+ {
+ FileSetPath(dirpath, fileset, fileset->tablespaces[i]);
+ PathNameDeleteTemporaryDir(dirpath);
+ }
+}
+
+/*
+ * Build the path for the directory holding the files backing a FileSet in a
+ * given tablespace.
+ */
+static void
+FileSetPath(char *path, FileSet *fileset, Oid tablespace)
+{
+ char tempdirpath[MAXPGPATH];
+
+ TempTablespacePath(tempdirpath, tablespace);
+ snprintf(path, MAXPGPATH, "%s/%s%lu.%u.sharedfileset",
+ tempdirpath, PG_TEMP_FILE_PREFIX,
+ (unsigned long) fileset->creator_pid, fileset->number);
+}
+
+/*
+ * Sorting hat to determine which tablespace a given temporary file belongs in.
+ */
+static Oid
+ChooseTablespace(const FileSet *fileset, const char *name)
+{
+ uint32 hash = hash_any((const unsigned char *) name, strlen(name));
+
+ return fileset->tablespaces[hash % fileset->ntablespaces];
+}
+
+/*
+ * Compute the full path of a file in a FileSet.
+ */
+static void
+FilePath(char *path, FileSet *fileset, const char *name)
+{
+ char dirpath[MAXPGPATH];
+
+ FileSetPath(dirpath, fileset, ChooseTablespace(fileset, name));
+ snprintf(path, MAXPGPATH, "%s/%s", dirpath, name);
+}
diff --git a/src/backend/storage/file/sharedfileset.c b/src/backend/storage/file/sharedfileset.c
index ed37c94..96ad968 100644
--- a/src/backend/storage/file/sharedfileset.c
+++ b/src/backend/storage/file/sharedfileset.c
@@ -13,10 +13,6 @@
* files can be discovered by name, and a shared ownership semantics so that
* shared files survive until the last user detaches.
*
- * SharedFileSets can be used by backends when the temporary files need to be
- * opened/closed multiple times and the underlying files need to survive across
- * transactions.
- *
*-------------------------------------------------------------------------
*/
@@ -33,13 +29,7 @@
#include "storage/sharedfileset.h"
#include "utils/builtins.h"
-static List *filesetlist = NIL;
-
static void SharedFileSetOnDetach(dsm_segment *segment, Datum datum);
-static void SharedFileSetDeleteOnProcExit(int status, Datum arg);
-static void SharedFileSetPath(char *path, SharedFileSet *fileset, Oid tablespace);
-static void SharedFilePath(char *path, SharedFileSet *fileset, const char *name);
-static Oid ChooseTablespace(const SharedFileSet *fileset, const char *name);
/*
* Initialize a space for temporary files that can be opened by other backends.
@@ -47,13 +37,6 @@ static Oid ChooseTablespace(const SharedFileSet *fileset, const char *name);
* SharedFileSet with 'seg'. Any contained files will be deleted when the
* last backend detaches.
*
- * We can also use this interface if the temporary files are used only by
- * single backend but the files need to be opened and closed multiple times
- * and also the underlying files need to survive across transactions. For
- * such cases, dsm segment 'seg' should be passed as NULL. Callers are
- * expected to explicitly remove such files by using SharedFileSetDelete/
- * SharedFileSetDeleteAll or we remove such files on proc exit.
- *
* Files will be distributed over the tablespaces configured in
* temp_tablespaces.
*
@@ -63,61 +46,14 @@ static Oid ChooseTablespace(const SharedFileSet *fileset, const char *name);
void
SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg)
{
- static uint32 counter = 0;
-
SpinLockInit(&fileset->mutex);
fileset->refcnt = 1;
- fileset->creator_pid = MyProcPid;
- fileset->number = counter;
- counter = (counter + 1) % INT_MAX;
-
- /* Capture the tablespace OIDs so that all backends agree on them. */
- PrepareTempTablespaces();
- fileset->ntablespaces =
- GetTempTablespaces(&fileset->tablespaces[0],
- lengthof(fileset->tablespaces));
- if (fileset->ntablespaces == 0)
- {
- /* If the GUC is empty, use current database's default tablespace */
- fileset->tablespaces[0] = MyDatabaseTableSpace;
- fileset->ntablespaces = 1;
- }
- else
- {
- int i;
- /*
- * An entry of InvalidOid means use the default tablespace for the
- * current database. Replace that now, to be sure that all users of
- * the SharedFileSet agree on what to do.
- */
- for (i = 0; i < fileset->ntablespaces; i++)
- {
- if (fileset->tablespaces[i] == InvalidOid)
- fileset->tablespaces[i] = MyDatabaseTableSpace;
- }
- }
+ FileSetInit(&fileset->fs);
/* Register our cleanup callback. */
if (seg)
on_dsm_detach(seg, SharedFileSetOnDetach, PointerGetDatum(fileset));
- else
- {
- static bool registered_cleanup = false;
-
- if (!registered_cleanup)
- {
- /*
- * We must not have registered any fileset before registering the
- * fileset clean up.
- */
- Assert(filesetlist == NIL);
- on_proc_exit(SharedFileSetDeleteOnProcExit, 0);
- registered_cleanup = true;
- }
-
- filesetlist = lcons((void *) fileset, filesetlist);
- }
}
/*
@@ -148,86 +84,12 @@ SharedFileSetAttach(SharedFileSet *fileset, dsm_segment *seg)
}
/*
- * Create a new file in the given set.
- */
-File
-SharedFileSetCreate(SharedFileSet *fileset, const char *name)
-{
- char path[MAXPGPATH];
- File file;
-
- SharedFilePath(path, fileset, name);
- file = PathNameCreateTemporaryFile(path, false);
-
- /* If we failed, see if we need to create the directory on demand. */
- if (file <= 0)
- {
- char tempdirpath[MAXPGPATH];
- char filesetpath[MAXPGPATH];
- Oid tablespace = ChooseTablespace(fileset, name);
-
- TempTablespacePath(tempdirpath, tablespace);
- SharedFileSetPath(filesetpath, fileset, tablespace);
- PathNameCreateTemporaryDir(tempdirpath, filesetpath);
- file = PathNameCreateTemporaryFile(path, true);
- }
-
- return file;
-}
-
-/*
- * Open a file that was created with SharedFileSetCreate(), possibly in
- * another backend.
- */
-File
-SharedFileSetOpen(SharedFileSet *fileset, const char *name, int mode)
-{
- char path[MAXPGPATH];
- File file;
-
- SharedFilePath(path, fileset, name);
- file = PathNameOpenTemporaryFile(path, mode);
-
- return file;
-}
-
-/*
- * Delete a file that was created with SharedFileSetCreate().
- * Return true if the file existed, false if didn't.
- */
-bool
-SharedFileSetDelete(SharedFileSet *fileset, const char *name,
- bool error_on_failure)
-{
- char path[MAXPGPATH];
-
- SharedFilePath(path, fileset, name);
-
- return PathNameDeleteTemporaryFile(path, error_on_failure);
-}
-
-/*
* Delete all files in the set.
*/
void
SharedFileSetDeleteAll(SharedFileSet *fileset)
{
- char dirpath[MAXPGPATH];
- int i;
-
- /*
- * Delete the directory we created in each tablespace. Doesn't fail
- * because we use this in error cleanup paths, but can generate LOG
- * message on IO error.
- */
- for (i = 0; i < fileset->ntablespaces; ++i)
- {
- SharedFileSetPath(dirpath, fileset, fileset->tablespaces[i]);
- PathNameDeleteTemporaryDir(dirpath);
- }
-
- /* Unregister the shared fileset */
- SharedFileSetUnregister(fileset);
+ return FileSetDeleteAll(&fileset->fs);
}
/*
@@ -255,100 +117,5 @@ SharedFileSetOnDetach(dsm_segment *segment, Datum datum)
* this function so we can safely access its data.
*/
if (unlink_all)
- SharedFileSetDeleteAll(fileset);
-}
-
-/*
- * Callback function that will be invoked on the process exit. This will
- * process the list of all the registered sharedfilesets and delete the
- * underlying files.
- */
-static void
-SharedFileSetDeleteOnProcExit(int status, Datum arg)
-{
- /*
- * Remove all the pending shared fileset entries. We don't use foreach()
- * here because SharedFileSetDeleteAll will remove the current element in
- * filesetlist. Though we have used foreach_delete_current() to remove the
- * element from filesetlist it could only fix up the state of one of the
- * loops, see SharedFileSetUnregister.
- */
- while (list_length(filesetlist) > 0)
- {
- SharedFileSet *fileset = (SharedFileSet *) linitial(filesetlist);
-
- SharedFileSetDeleteAll(fileset);
- }
-
- filesetlist = NIL;
-}
-
-/*
- * Unregister the shared fileset entry registered for cleanup on proc exit.
- */
-void
-SharedFileSetUnregister(SharedFileSet *input_fileset)
-{
- ListCell *l;
-
- /*
- * If the caller is following the dsm based cleanup then we don't maintain
- * the filesetlist so return.
- */
- if (filesetlist == NIL)
- return;
-
- foreach(l, filesetlist)
- {
- SharedFileSet *fileset = (SharedFileSet *) lfirst(l);
-
- /* Remove the entry from the list */
- if (input_fileset == fileset)
- {
- filesetlist = foreach_delete_current(filesetlist, l);
- return;
- }
- }
-
- /* Should have found a match */
- Assert(false);
-}
-
-/*
- * Build the path for the directory holding the files backing a SharedFileSet
- * in a given tablespace.
- */
-static void
-SharedFileSetPath(char *path, SharedFileSet *fileset, Oid tablespace)
-{
- char tempdirpath[MAXPGPATH];
-
- TempTablespacePath(tempdirpath, tablespace);
- snprintf(path, MAXPGPATH, "%s/%s%lu.%u.sharedfileset",
- tempdirpath, PG_TEMP_FILE_PREFIX,
- (unsigned long) fileset->creator_pid, fileset->number);
-}
-
-/*
- * Sorting hat to determine which tablespace a given shared temporary file
- * belongs in.
- */
-static Oid
-ChooseTablespace(const SharedFileSet *fileset, const char *name)
-{
- uint32 hash = hash_any((const unsigned char *) name, strlen(name));
-
- return fileset->tablespaces[hash % fileset->ntablespaces];
-}
-
-/*
- * Compute the full path of a file in a SharedFileSet.
- */
-static void
-SharedFilePath(char *path, SharedFileSet *fileset, const char *name)
-{
- char dirpath[MAXPGPATH];
-
- SharedFileSetPath(dirpath, fileset, ChooseTablespace(fileset, name));
- snprintf(path, MAXPGPATH, "%s/%s", dirpath, name);
+ FileSetDeleteAll(&fileset->fs);
}
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index cafc087..f7994d7 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -564,7 +564,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
lt = <s->tapes[i];
pg_itoa(i, filename);
- file = BufFileOpenShared(fileset, filename, O_RDONLY);
+ file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY);
filesize = BufFileSize(file);
/*
@@ -610,7 +610,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
* offset).
*
* The only thing that currently prevents writing to the leader tape from
- * working is the fact that BufFiles opened using BufFileOpenShared() are
+ * working is the fact that BufFiles opened using BufFileOpenFileSet() are
* read-only by definition, but that could be changed if it seemed
* worthwhile. For now, writing to the leader tape will raise a "Bad file
* descriptor" error, so tuplesort must avoid writing to the leader tape
@@ -722,7 +722,7 @@ LogicalTapeSetCreate(int ntapes, bool preallocate, TapeShare *shared,
char filename[MAXPGPATH];
pg_itoa(worker, filename);
- lts->pfile = BufFileCreateShared(fileset, filename);
+ lts->pfile = BufFileCreateFileSet(&fileset->fs, filename);
}
else
lts->pfile = BufFileCreateTemp(false);
@@ -1096,7 +1096,7 @@ LogicalTapeFreeze(LogicalTapeSet *lts, int tapenum, TapeShare *share)
/* Handle extra steps when caller is to share its tapeset */
if (share)
{
- BufFileExportShared(lts->pfile);
+ BufFileExportFileSet(lts->pfile);
share->firstblocknumber = lt->firstBlockNumber;
}
}
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 57e35db..72acd54 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -310,7 +310,8 @@ sts_puttuple(SharedTuplestoreAccessor *accessor, void *meta_data,
/* Create one. Only this backend will write into it. */
sts_filename(name, accessor, accessor->participant);
- accessor->write_file = BufFileCreateShared(accessor->fileset, name);
+ accessor->write_file =
+ BufFileCreateFileSet(&accessor->fileset->fs, name);
/* Set up the shared state for this backend's file. */
participant = &accessor->sts->participants[accessor->participant];
@@ -559,7 +560,7 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
sts_filename(name, accessor, accessor->read_participant);
accessor->read_file =
- BufFileOpenShared(accessor->fileset, name, O_RDONLY);
+ BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY);
}
/* Seek and load the chunk header. */
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 566523d..032a823 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -26,7 +26,7 @@
#ifndef BUFFILE_H
#define BUFFILE_H
-#include "storage/sharedfileset.h"
+#include "storage/fileset.h"
/* BufFile is an opaque type whose details are not known outside buffile.c. */
@@ -46,11 +46,11 @@ extern int BufFileSeekBlock(BufFile *file, long blknum);
extern int64 BufFileSize(BufFile *file);
extern long BufFileAppend(BufFile *target, BufFile *source);
-extern BufFile *BufFileCreateShared(SharedFileSet *fileset, const char *name);
-extern void BufFileExportShared(BufFile *file);
-extern BufFile *BufFileOpenShared(SharedFileSet *fileset, const char *name,
+extern BufFile *BufFileCreateFileSet(FileSet *fileset, const char *name);
+extern void BufFileExportFileSet(BufFile *file);
+extern BufFile *BufFileOpenFileSet(FileSet *fileset, const char *name,
int mode);
-extern void BufFileDeleteShared(SharedFileSet *fileset, const char *name);
-extern void BufFileTruncateShared(BufFile *file, int fileno, off_t offset);
+extern void BufFileDeleteFileSet(FileSet *fileset, const char *name);
+extern void BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset);
#endif /* BUFFILE_H */
diff --git a/src/include/storage/fileset.h b/src/include/storage/fileset.h
new file mode 100644
index 0000000..8795fd1
--- /dev/null
+++ b/src/include/storage/fileset.h
@@ -0,0 +1,40 @@
+/*-------------------------------------------------------------------------
+ *
+ * fileset.h
+ * temporary file management.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/fileset.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FILESET_H
+#define FILESET_H
+
+#include "storage/fd.h"
+
+/*
+ * A set of temporary files.
+ */
+typedef struct FileSet
+{
+ pid_t creator_pid; /* PID of the creating process */
+ uint32 number; /* per-PID identifier */
+ int ntablespaces; /* number of tablespaces to use */
+ Oid tablespaces[8]; /* OIDs of tablespaces to use. Assumes that
+ * it's rare that there more than temp
+ * tablespaces. */
+} FileSet;
+
+extern void FileSetInit(FileSet *fileset);
+extern File FileSetCreate(FileSet *fileset, const char *name);
+extern File FileSetOpen(FileSet *fileset, const char *name,
+ int mode);
+extern bool FileSetDelete(FileSet *fileset, const char *name,
+ bool error_on_failure);
+extern void FileSetDeleteAll(FileSet *fileset);
+
+#endif
diff --git a/src/include/storage/sharedfileset.h b/src/include/storage/sharedfileset.h
index 09ba121..59becfb 100644
--- a/src/include/storage/sharedfileset.h
+++ b/src/include/storage/sharedfileset.h
@@ -17,6 +17,7 @@
#include "storage/dsm.h"
#include "storage/fd.h"
+#include "storage/fileset.h"
#include "storage/spin.h"
/*
@@ -24,24 +25,13 @@
*/
typedef struct SharedFileSet
{
- pid_t creator_pid; /* PID of the creating process */
- uint32 number; /* per-PID identifier */
+ FileSet fs;
slock_t mutex; /* mutex protecting the reference count */
int refcnt; /* number of attached backends */
- int ntablespaces; /* number of tablespaces to use */
- Oid tablespaces[8]; /* OIDs of tablespaces to use. Assumes that
- * it's rare that there more than temp
- * tablespaces. */
} SharedFileSet;
extern void SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg);
extern void SharedFileSetAttach(SharedFileSet *fileset, dsm_segment *seg);
-extern File SharedFileSetCreate(SharedFileSet *fileset, const char *name);
-extern File SharedFileSetOpen(SharedFileSet *fileset, const char *name,
- int mode);
-extern bool SharedFileSetDelete(SharedFileSet *fileset, const char *name,
- bool error_on_failure);
extern void SharedFileSetDeleteAll(SharedFileSet *fileset);
-extern void SharedFileSetUnregister(SharedFileSet *input_fileset);
#endif
--
1.8.3.1
Attachment: v2-0002-Better-usage-of-sharedfileset-in-apply-worker.patch (text/x-patch)
From d764f828e1cbbf6a69b9adf929282cb12b501ddb Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Mon, 23 Aug 2021 15:07:28 +0530
Subject: [PATCH v2 2/2] Better usage of sharedfileset in apply worker
Instead of using a separate shared fileset for each xid, use one shared
fileset for the whole lifetime of the worker. For each xid, just create a
shared buffile under that shared fileset and remove the file once we are
done with it. The subxact file only needs to be created once we see the
first subtransaction; to detect that case, the buffile open and buffile
delete interfaces are extended to tolerate missing files.
---
src/backend/replication/logical/worker.c | 238 ++++++------------------------
src/backend/storage/file/buffile.c | 23 ++-
src/backend/utils/sort/logtape.c | 2 +-
src/backend/utils/sort/sharedtuplestore.c | 3 +-
src/include/storage/buffile.h | 5 +-
5 files changed, 65 insertions(+), 206 deletions(-)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index d98ebab..f991e09 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -221,20 +221,6 @@ typedef struct ApplyExecutionData
PartitionTupleRouting *proute; /* partition routing info */
} ApplyExecutionData;
-/*
- * Stream xid hash entry. Whenever we see a new xid we create this entry in the
- * xidhash and along with it create the streaming file and store the fileset handle.
- * The subxact file is created iff there is any subxact info under this xid. This
- * entry is used on the subsequent streams for the xid to get the corresponding
- * fileset handles, so storing them in hash makes the search faster.
- */
-typedef struct StreamXidHash
-{
- TransactionId xid; /* xid is the hash key and must be first */
- FileSet *stream_fileset; /* file set for stream data */
- FileSet *subxact_fileset; /* file set for subxact info */
-} StreamXidHash;
-
static MemoryContext ApplyMessageContext = NULL;
MemoryContext ApplyContext = NULL;
@@ -255,10 +241,11 @@ static bool in_streamed_transaction = false;
static TransactionId stream_xid = InvalidTransactionId;
/*
- * Hash table for storing the streaming xid information along with filesets
- * for streaming and subxact files.
+ * Fileset for storing the changes and subxact information for the streaming
+ * transaction. We will use only one fileset and for each xid a separate
+ * changes and subxact files will be created under the same fileset.
*/
-static HTAB *xidhash = NULL;
+static FileSet *xidfileset = NULL;
/* BufFile handle of the current streaming file */
static BufFile *stream_fd = NULL;
@@ -1129,7 +1116,6 @@ static void
apply_handle_stream_start(StringInfo s)
{
bool first_segment;
- HASHCTL hash_ctl;
if (in_streamed_transaction)
ereport(ERROR,
@@ -1157,17 +1143,21 @@ apply_handle_stream_start(StringInfo s)
errmsg_internal("invalid transaction ID in streamed replication transaction")));
/*
- * Initialize the xidhash table if we haven't yet. This will be used for
- * the entire duration of the apply worker so create it in permanent
- * context.
+ * Initialize the xidfileset if we haven't yet. This will be used for the
+ * entire duration of the apply worker so create it in permanent context.
*/
- if (xidhash == NULL)
+ if (xidfileset == NULL)
{
- hash_ctl.keysize = sizeof(TransactionId);
- hash_ctl.entrysize = sizeof(StreamXidHash);
- hash_ctl.hcxt = ApplyContext;
- xidhash = hash_create("StreamXidHash", 1024, &hash_ctl,
- HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ MemoryContext oldctx;
+
+ /*
+ * We need to keep the fileset for the worker lifetime so need to
+ * allocate it in a persistent context.
+ */
+ oldctx = MemoryContextSwitchTo(ApplyContext);
+ xidfileset = palloc(sizeof(FileSet));
+ FileSetInit(xidfileset);
+ MemoryContextSwitchTo(oldctx);
}
/* open the spool file for this transaction */
@@ -1258,7 +1248,6 @@ apply_handle_stream_abort(StringInfo s)
BufFile *fd;
bool found = false;
char path[MAXPGPATH];
- StreamXidHash *ent;
subidx = -1;
begin_replication_step();
@@ -1287,19 +1276,9 @@ apply_handle_stream_abort(StringInfo s)
return;
}
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
/* open the changes file */
changes_filename(path, MyLogicalRepWorker->subid, xid);
- fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(xidfileset, path, O_RDWR, false);
/* OK, truncate the file at the right offset */
BufFileTruncateFileSet(fd, subxact_data.subxacts[subidx].fileno,
@@ -1327,7 +1306,6 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
int nchanges;
char path[MAXPGPATH];
char *buffer = NULL;
- StreamXidHash *ent;
MemoryContext oldcxt;
BufFile *fd;
@@ -1345,17 +1323,7 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
changes_filename(path, MyLogicalRepWorker->subid, xid);
elog(DEBUG1, "replaying changes from file \"%s\"", path);
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
- fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(xidfileset, path, O_RDONLY, false);
buffer = palloc(BLCKSZ);
initStringInfo(&s2);
@@ -2508,31 +2476,14 @@ UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
}
}
- /*
- * Cleanup filesets.
+/*
+ * Cleanup fileset if created.
*/
static void
worker_cleanup(int code, Datum arg)
{
- HASH_SEQ_STATUS status;
- StreamXidHash *hentry;
-
- /*
- * Scan the xidhash table if created and from each entry delete stream
- * fileset and the subxact fileset.
- */
- if (xidhash)
- {
- hash_seq_init(&status, xidhash);
- while ((hentry = (StreamXidHash *) hash_seq_search(&status)) != NULL)
- {
- FileSetDeleteAll(hentry->stream_fileset);
-
- /* Delete the subxact fileset only if it is created */
- if (hentry->subxact_fileset)
- FileSetDeleteAll(hentry->subxact_fileset);
- }
- }
+ if (xidfileset != NULL)
+ FileSetDeleteAll(xidfileset);
}
/*
@@ -2562,7 +2513,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
ALLOCSET_DEFAULT_SIZES);
/*
- * Register before-shmem-exit hook to ensure filesets are dropped while we
+ * Register before-shmem-exit hook to ensure fileset is dropped while we
* can still report stats for underlying temporary files.
*/
before_shmem_exit(worker_cleanup, (Datum) 0);
@@ -2990,18 +2941,11 @@ subxact_info_write(Oid subid, TransactionId xid)
{
char path[MAXPGPATH];
Size len;
- StreamXidHash *ent;
BufFile *fd;
Assert(TransactionIdIsValid(xid));
- /* Find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- /* By this time we must have created the transaction entry */
- Assert(ent);
+ subxact_filename(path, subid, xid);
/*
* If there is no subtransaction then nothing to do, but if already have
@@ -3009,39 +2953,18 @@ subxact_info_write(Oid subid, TransactionId xid)
*/
if (subxact_data.nsubxacts == 0)
{
- if (ent->subxact_fileset)
- {
- cleanup_subxact_info();
- FileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
+ cleanup_subxact_info();
+ BufFileDeleteFileSet(xidfileset, path, true);
+
return;
}
subxact_filename(path, subid, xid);
- /*
- * Create the subxact file if it not already created, otherwise open the
- * existing file.
- */
- if (ent->subxact_fileset == NULL)
- {
- MemoryContext oldctx;
-
- /*
- * We need to maintain fileset across multiple stream start/stop calls.
- * So, need to allocate it in a persistent context.
- */
- oldctx = MemoryContextSwitchTo(ApplyContext);
- ent->subxact_fileset = palloc(sizeof(FileSet));
- FileSetInit(ent->subxact_fileset);
- MemoryContextSwitchTo(oldctx);
-
- fd = BufFileCreateFileSet(ent->subxact_fileset, path);
- }
- else
- fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDWR);
+ /* Try to open the subxact file, if it doesn't exist then create it */
+ fd = BufFileOpenFileSet(xidfileset, path, O_RDWR, true);
+ if (fd == NULL)
+ fd = BufFileCreateFileSet(xidfileset, path);
len = sizeof(SubXactInfo) * subxact_data.nsubxacts;
@@ -3068,34 +2991,17 @@ subxact_info_read(Oid subid, TransactionId xid)
char path[MAXPGPATH];
Size len;
BufFile *fd;
- StreamXidHash *ent;
MemoryContext oldctx;
Assert(!subxact_data.subxacts);
Assert(subxact_data.nsubxacts == 0);
Assert(subxact_data.nsubxacts_max == 0);
- /* Find the stream xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
- /*
- * If subxact_fileset is not valid that mean we don't have any subxact
- * info
- */
- if (ent->subxact_fileset == NULL)
- return;
-
subxact_filename(path, subid, xid);
- fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(xidfileset, path, O_RDONLY, true);
+ if (fd == NULL)
+ return;
/* read number of subxact items */
if (BufFileRead(fd, &subxact_data.nsubxacts,
@@ -3237,36 +3143,13 @@ static void
stream_cleanup_files(Oid subid, TransactionId xid)
{
char path[MAXPGPATH];
- StreamXidHash *ent;
-
- /* Find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
/* Delete the change file and release the stream fileset memory */
changes_filename(path, subid, xid);
- FileSetDeleteAll(ent->stream_fileset);
- pfree(ent->stream_fileset);
- ent->stream_fileset = NULL;
-
- /* Delete the subxact file and release the memory, if it exist */
- if (ent->subxact_fileset)
- {
- subxact_filename(path, subid, xid);
- FileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
+ BufFileDeleteFileSet(xidfileset, path, false);
- /* Remove the xid entry from the stream xid hash */
- hash_search(xidhash, (void *) &xid, HASH_REMOVE, NULL);
+ subxact_filename(path, subid, xid);
+ BufFileDeleteFileSet(xidfileset, path, true);
}
/*
@@ -3276,8 +3159,8 @@ stream_cleanup_files(Oid subid, TransactionId xid)
*
* Open a file for streamed changes from a toplevel transaction identified
* by stream_xid (global variable). If it's the first chunk of streamed
- * changes for this transaction, initialize the fileset and create the buffile,
- * otherwise open the previously created file.
+ * changes for this transaction, create the buffile, otherwise open the
+ * previously created file.
*
* This can only be called at the beginning of a "streaming" block, i.e.
* between stream_start/stream_stop messages from the upstream.
@@ -3286,20 +3169,13 @@ static void
stream_open_file(Oid subid, TransactionId xid, bool first_segment)
{
char path[MAXPGPATH];
- bool found;
MemoryContext oldcxt;
- StreamXidHash *ent;
Assert(in_streamed_transaction);
Assert(OidIsValid(subid));
Assert(TransactionIdIsValid(xid));
Assert(stream_fd == NULL);
- /* create or find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_ENTER,
- &found);
changes_filename(path, subid, xid);
elog(DEBUG1, "opening file \"%s\" for streamed changes", path);
@@ -3316,44 +3192,14 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
* writing, in append mode.
*/
if (first_segment)
- {
- MemoryContext savectx;
- FileSet *fileset;
-
- if (found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
-
- /*
- * We need to maintain fileset across multiple stream start/stop calls.
- * So, need to allocate it in a persistent context.
- */
- savectx = MemoryContextSwitchTo(ApplyContext);
- fileset = palloc(sizeof(FileSet));
-
- FileSetInit(fileset);
- MemoryContextSwitchTo(savectx);
-
- stream_fd = BufFileCreateFileSet(fileset, path);
-
- /* Remember the fileset for the next stream of the same transaction */
- ent->xid = xid;
- ent->stream_fileset = fileset;
- ent->subxact_fileset = NULL;
- }
+ stream_fd = BufFileCreateFileSet(xidfileset, path);
else
{
- if (!found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
-
/*
* Open the file and seek to the end of the file because we always
* append the changes file.
*/
- stream_fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
+ stream_fd = BufFileOpenFileSet(xidfileset, path, O_RDWR, false);
BufFileSeek(stream_fd, 0, 0, SEEK_END);
}
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index b9ca298..dbaa0e6 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -278,10 +278,12 @@ BufFileCreateFileSet(FileSet *fileset, const char *name)
* with BufFileCreateFileSet in the same FileSet using the same name.
* The backend that created the file must have called BufFileClose() or
* BufFileExportFileSet() to make sure that it is ready to be opened by other
- * backends and render it read-only.
+ * backends and render it read-only. If missing_ok is true then it will return
+ * NULL if file doesn't exist otherwise error.
*/
BufFile *
-BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
+BufFileOpenFileSet(FileSet *fileset, const char *name, int mode,
+ bool missing_ok)
{
BufFile *file;
char segment_name[MAXPGPATH];
@@ -318,10 +320,18 @@ BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
* name.
*/
if (nfiles == 0)
+ {
+ /* free the memory */
+ pfree(files);
+
+ if (missing_ok)
+ return NULL;
+
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not open temporary file \"%s\" from BufFile \"%s\": %m",
segment_name, name)));
+ }
file = makeBufFileCommon(nfiles);
file->files = files;
@@ -341,10 +351,11 @@ BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
* the FileSet to be cleaned up.
*
* Only one backend should attempt to delete a given name, and should know
- * that it exists and has been exported or closed.
+ * that it exists and has been exported or closed otherwise missing_ok should
+ * be passed true.
*/
void
-BufFileDeleteFileSet(FileSet *fileset, const char *name)
+BufFileDeleteFileSet(FileSet *fileset, const char *name, bool missing_ok)
{
char segment_name[MAXPGPATH];
int segment = 0;
@@ -358,7 +369,7 @@ BufFileDeleteFileSet(FileSet *fileset, const char *name)
for (;;)
{
SegmentName(segment_name, name, segment);
- if (!FileSetDelete(fileset, segment_name, true))
+ if (!FileSetDelete(fileset, segment_name, !missing_ok))
break;
found = true;
++segment;
@@ -366,7 +377,7 @@ BufFileDeleteFileSet(FileSet *fileset, const char *name)
CHECK_FOR_INTERRUPTS();
}
- if (!found)
+ if (!found && !missing_ok)
elog(ERROR, "could not delete unknown BufFile \"%s\"", name);
}
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index f7994d7..debf12e1 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -564,7 +564,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
lt = <s->tapes[i];
pg_itoa(i, filename);
- file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY);
+ file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY, false);
filesize = BufFileSize(file);
/*
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 72acd54..8c5135c 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -560,7 +560,8 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
sts_filename(name, accessor, accessor->read_participant);
accessor->read_file =
- BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY);
+ BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY,
+ false);
}
/* Seek and load the chunk header. */
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 032a823..5e9df44 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -49,8 +49,9 @@ extern long BufFileAppend(BufFile *target, BufFile *source);
extern BufFile *BufFileCreateFileSet(FileSet *fileset, const char *name);
extern void BufFileExportFileSet(BufFile *file);
extern BufFile *BufFileOpenFileSet(FileSet *fileset, const char *name,
- int mode);
-extern void BufFileDeleteFileSet(FileSet *fileset, const char *name);
+ int mode, bool missing_ok);
+extern void BufFileDeleteFileSet(FileSet *fileset, const char *name,
+ bool missing_ok);
extern void BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset);
#endif /* BUFFILE_H */
--
1.8.3.1
On Mon, Aug 23, 2021 at 3:13 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Aug 23, 2021 at 1:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Note: merge comments from multiple mails
I think we should handle that in worker.c itself, by adding a
before_dsm_detach function before_shmem_exit, right?

Yeah, I thought of handling it in worker.c, similar to what you've done in the 0002 patch.

I have done the handling in worker.c.
On Mon, Aug 23, 2021 at 9:11 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:On Sat, Aug 21, 2021 8:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
1)
+ TempTablespacePath(tempdirpath, tablespace);
+ snprintf(path, MAXPGPATH, "%s/%s%lu.%u.sharedfileset",
+          tempdirpath, PG_TEMP_FILE_PREFIX,
+          (unsigned long) fileset->creator_pid, fileset->number);

Do we need to use a different filename for shared and un-shared filesets?

I was also thinking about the same; does it make sense to name it just
"%s/%s%lu.%u.fileset"?
I think it is reasonable to use .fileset as proposed by you.
Few other comments:
=================
1.
+ /*
+ * Register before-shmem-exit hook to ensure filesets are dropped while we
+ * can still report stats for underlying temporary files.
+ */
+ before_shmem_exit(worker_cleanup, (Datum) 0);
+
Do we really need to register a new callback here? Won't the existing
logical replication worker exit routine (logicalrep_worker_onexit) be
sufficient for this patch's purpose?
2.
- SharedFileSet *fileset; /* space for segment files if shared */
- const char *name; /* name of this BufFile if shared */
+ FileSet *fileset; /* space for fileset for fileset based file */
+ const char *name; /* name of this BufFile */
The comments for the above two variables can be written as (a) space
for fileset based segment files, (b) name of fileset based BufFile.
3.
/*
- * Create a new segment file backing a shared BufFile.
+ * Create a new segment file backing a fileset BufFile.
*/
static File
-MakeNewSharedSegment(BufFile *buffile, int segment)
+MakeNewSegment(BufFile *buffile, int segment)
I think it is better to name this function as MakeNewFileSetSegment.
You can slightly change the comment as: "Create a new segment file
backing a fileset based BufFile."
4.
/*
* It is possible that there are files left over from before a crash
- * restart with the same name. In order for BufFileOpenShared() not to
+ * restart with the same name. In order for BufFileOpen() not to
* get confused about how many segments there are, we'll unlink the next
Typo. /BufFileOpen/BufFileOpenFileSet
5.
static void
-SharedSegmentName(char *name, const char *buffile_name, int segment)
+SegmentName(char *name, const char *buffile_name, int segment)
Can we name this as FileSetSegmentName?
6.
*
- * The naming scheme for shared BufFiles is left up to the calling code. The
+ * The naming scheme for fileset BufFiles is left up to the calling code.
Isn't it better to say "... fileset based BufFiles .."?
7.
+ * FileSets provide a temporary namespace (think directory) so that files can
+ * be discovered by name
A full stop is missing at the end of the statement.
8.
+ *
+ * The callers are expected to explicitly remove such files by using
+ * SharedFileSetDelete/ SharedFileSetDeleteAll.
+ *
+ * Files will be distributed over the tablespaces configured in
+ * temp_tablespaces.
+ *
+ * Under the covers the set is one or more directories which will eventually
+ * be deleted.
+ */
+void
+FileSetInit(FileSet *fileset)
Is there a need to mention 'Shared' in API names (SharedFileSetDelete/
SharedFileSetDeleteAll) in the comments? Also, there doesn't seem to
be a need for extra space before *DeleteAll API in comments.
9.
* Files will be distributed over the tablespaces configured in
* temp_tablespaces.
*
* Under the covers the set is one or more directories which will eventually
* be deleted.
*/
void
SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg)
I think we can remove the part of the above comment where it says
"Files will be distributed over ..." as that is already mentioned atop
FileSetInit.
--
With Regards,
Amit Kapila.
On Tue, Aug 24, 2021 at 12:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I was also thinking about the same, does it make sense to name it just
""%s/%s%lu.%u.fileset"?
Done
I think it is reasonable to use .fileset as proposed by you.
Few other comments:
=================
1.
+ /*
+ * Register before-shmem-exit hook to ensure filesets are dropped while we
+ * can still report stats for underlying temporary files.
+ */
+ before_shmem_exit(worker_cleanup, (Datum) 0);
+
Do we really need to register a new callback here? Won't the existing
logical replication worker exit routine (logicalrep_worker_onexit) be
sufficient for this patch's purpose?
Right, we don't need an extra function for this.
2.
- SharedFileSet *fileset; /* space for segment files if shared */
- const char *name; /* name of this BufFile if shared */
+ FileSet *fileset; /* space for fileset for fileset based file */
+ const char *name; /* name of this BufFile */
The comments for the above two variables can be written as (a) space
for fileset based segment files, (b) name of fileset based BufFile.
Done
3.
/*
- * Create a new segment file backing a shared BufFile.
+ * Create a new segment file backing a fileset BufFile.
*/
static File
-MakeNewSharedSegment(BufFile *buffile, int segment)
+MakeNewSegment(BufFile *buffile, int segment)
I think it is better to name this function as MakeNewFileSetSegment.
You can slightly change the comment as: "Create a new segment file
backing a fileset based BufFile."
Make sense
4.
/*
* It is possible that there are files left over from before a crash
- * restart with the same name. In order for BufFileOpenShared() not to
+ * restart with the same name. In order for BufFileOpen() not to
* get confused about how many segments there are, we'll unlink the next
Typo. /BufFileOpen/BufFileOpenFileSet
Fixed
5.
static void
-SharedSegmentName(char *name, const char *buffile_name, int segment)
+SegmentName(char *name, const char *buffile_name, int segment)
Can we name this as FileSetSegmentName?
Done
6.
*
- * The naming scheme for shared BufFiles is left up to the calling code. The
+ * The naming scheme for fileset BufFiles is left up to the calling code.
Isn't it better to say "... fileset based BufFiles .."?
Done
7.
+ * FileSets provide a temporary namespace (think directory) so that files can
+ * be discovered by name
A full stop is missing at the end of the statement.
Fixed
8.
+ *
+ * The callers are expected to explicitly remove such files by using
+ * SharedFileSetDelete/ SharedFileSetDeleteAll.
+ *
+ * Files will be distributed over the tablespaces configured in
+ * temp_tablespaces.
+ *
+ * Under the covers the set is one or more directories which will eventually
+ * be deleted.
+ */
+void
+FileSetInit(FileSet *fileset)
Is there a need to mention 'Shared' in API names (SharedFileSetDelete/
SharedFileSetDeleteAll) in the comments? Also, there doesn't seem to
be a need for extra space before *DeleteAll API in comments.
Fixed
9.
* Files will be distributed over the tablespaces configured in
* temp_tablespaces.
*
* Under the covers the set is one or more directories which will eventually
* be deleted.
*/
void
SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg)
I think we can remove the part of the above comment where it says
"Files will be distributed over ..." as that is already mentioned atop
FileSetInit.
Done
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v3-0001-Sharedfileset-refactoring.patch (text/x-patch)
From 94a8a22726f08f07259b49ccee340ab1d4cba116 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Wed, 18 Aug 2021 15:52:21 +0530
Subject: [PATCH v3 1/2] Sharedfileset refactoring
Currently, sharedfileset.c is designed for a very specific purpose:
one backend creates the fileset, which can then be shared across
multiple backends, so the fileset must be created in DSM. But some
use cases need the same behavior for a single backend: the files
should be named, should survive the transaction, and should allow
being opened and closed multiple times. This patch refactors the
code into two files: a) fileset.c, which provides general-purpose
interfaces, and b) sharedfileset.c, which internally uses fileset.c
but is specific to DSM-based filesets.
---
src/backend/replication/logical/launcher.c | 3 +
src/backend/replication/logical/worker.c | 83 ++++++----
src/backend/storage/file/Makefile | 1 +
src/backend/storage/file/buffile.c | 84 +++++-----
src/backend/storage/file/fd.c | 2 +-
src/backend/storage/file/fileset.c | 204 ++++++++++++++++++++++++
src/backend/storage/file/sharedfileset.c | 244 +----------------------------
src/backend/utils/sort/logtape.c | 8 +-
src/backend/utils/sort/sharedtuplestore.c | 5 +-
src/include/replication/worker_internal.h | 1 +
src/include/storage/buffile.h | 12 +-
src/include/storage/fileset.h | 40 +++++
src/include/storage/sharedfileset.h | 14 +-
13 files changed, 367 insertions(+), 334 deletions(-)
create mode 100644 src/backend/storage/file/fileset.c
create mode 100644 src/include/storage/fileset.h
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index e3b11da..8b1772d 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -648,6 +648,9 @@ logicalrep_worker_onexit(int code, Datum arg)
logicalrep_worker_detach();
+ /* Cleanup filesets used for streaming transactions. */
+ logicalrep_worker_cleanupfileset();
+
ApplyLauncherWakeup();
}
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index ecaed15..07a2c90 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -39,13 +39,13 @@
* BufFile infrastructure supports temporary files that exceed the OS file size
* limit, (b) provides a way for automatic clean up on the error and (c) provides
* a way to survive these files across local transactions and allow to open and
- * close at stream start and close. We decided to use SharedFileSet
+ * close at stream start and close. We decided to use FileSet
* infrastructure as without that it deletes the files on the closure of the
* file and if we decide to keep stream files open across the start/stop stream
* then it will consume a lot of memory (more than 8K for each BufFile and
* there could be multiple such BufFiles as the subscriber could receive
* multiple start/stop streams for different transactions before getting the
- * commit). Moreover, if we don't use SharedFileSet then we also need to invent
+ * commit). Moreover, if we don't use FileSet then we also need to invent
* a new way to pass filenames to BufFile APIs so that we are allowed to open
* the file we desired across multiple stream-open calls for the same
* transaction.
@@ -231,8 +231,8 @@ typedef struct ApplyExecutionData
typedef struct StreamXidHash
{
TransactionId xid; /* xid is the hash key and must be first */
- SharedFileSet *stream_fileset; /* shared file set for stream data */
- SharedFileSet *subxact_fileset; /* shared file set for subxact info */
+ FileSet *stream_fileset; /* file set for stream data */
+ FileSet *subxact_fileset; /* file set for subxact info */
} StreamXidHash;
static MemoryContext ApplyMessageContext = NULL;
@@ -255,8 +255,8 @@ static bool in_streamed_transaction = false;
static TransactionId stream_xid = InvalidTransactionId;
/*
- * Hash table for storing the streaming xid information along with shared file
- * set for streaming and subxact files.
+ * Hash table for storing the streaming xid information along with filesets
+ * for streaming and subxact files.
*/
static HTAB *xidhash = NULL;
@@ -1299,10 +1299,10 @@ apply_handle_stream_abort(StringInfo s)
/* open the changes file */
changes_filename(path, MyLogicalRepWorker->subid, xid);
- fd = BufFileOpenShared(ent->stream_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
/* OK, truncate the file at the right offset */
- BufFileTruncateShared(fd, subxact_data.subxacts[subidx].fileno,
+ BufFileTruncateFileSet(fd, subxact_data.subxacts[subidx].fileno,
subxact_data.subxacts[subidx].offset);
BufFileClose(fd);
@@ -1355,7 +1355,7 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
errmsg_internal("transaction %u not found in stream XID hash table",
xid)));
- fd = BufFileOpenShared(ent->stream_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDONLY);
buffer = palloc(BLCKSZ);
initStringInfo(&s2);
@@ -2509,6 +2509,33 @@ UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
}
/*
+ * Cleanup filesets.
+ */
+void
+logicalrep_worker_cleanupfileset()
+{
+ HASH_SEQ_STATUS status;
+ StreamXidHash *hentry;
+
+ /*
+ * Scan the xidhash table if created and from each entry delete stream
+ * fileset and the subxact fileset.
+ */
+ if (xidhash)
+ {
+ hash_seq_init(&status, xidhash);
+ while ((hentry = (StreamXidHash *) hash_seq_search(&status)) != NULL)
+ {
+ FileSetDeleteAll(hentry->stream_fileset);
+
+ /* Delete the subxact fileset only if it is created */
+ if (hentry->subxact_fileset)
+ FileSetDeleteAll(hentry->subxact_fileset);
+ }
+ }
+}
+
+/*
* Apply main loop.
*/
static void
@@ -2979,7 +3006,7 @@ subxact_info_write(Oid subid, TransactionId xid)
if (ent->subxact_fileset)
{
cleanup_subxact_info();
- SharedFileSetDeleteAll(ent->subxact_fileset);
+ FileSetDeleteAll(ent->subxact_fileset);
pfree(ent->subxact_fileset);
ent->subxact_fileset = NULL;
}
@@ -2997,18 +3024,18 @@ subxact_info_write(Oid subid, TransactionId xid)
MemoryContext oldctx;
/*
- * We need to maintain shared fileset across multiple stream
- * start/stop calls. So, need to allocate it in a persistent context.
+ * We need to maintain fileset across multiple stream start/stop calls.
+ * So, need to allocate it in a persistent context.
*/
oldctx = MemoryContextSwitchTo(ApplyContext);
- ent->subxact_fileset = palloc(sizeof(SharedFileSet));
- SharedFileSetInit(ent->subxact_fileset, NULL);
+ ent->subxact_fileset = palloc(sizeof(FileSet));
+ FileSetInit(ent->subxact_fileset);
MemoryContextSwitchTo(oldctx);
- fd = BufFileCreateShared(ent->subxact_fileset, path);
+ fd = BufFileCreateFileSet(ent->subxact_fileset, path);
}
else
- fd = BufFileOpenShared(ent->subxact_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDWR);
len = sizeof(SubXactInfo) * subxact_data.nsubxacts;
@@ -3062,7 +3089,7 @@ subxact_info_read(Oid subid, TransactionId xid)
subxact_filename(path, subid, xid);
- fd = BufFileOpenShared(ent->subxact_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDONLY);
/* read number of subxact items */
if (BufFileRead(fd, &subxact_data.nsubxacts,
@@ -3219,7 +3246,7 @@ stream_cleanup_files(Oid subid, TransactionId xid)
/* Delete the change file and release the stream fileset memory */
changes_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->stream_fileset);
+ FileSetDeleteAll(ent->stream_fileset);
pfree(ent->stream_fileset);
ent->stream_fileset = NULL;
@@ -3227,7 +3254,7 @@ stream_cleanup_files(Oid subid, TransactionId xid)
if (ent->subxact_fileset)
{
subxact_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->subxact_fileset);
+ FileSetDeleteAll(ent->subxact_fileset);
pfree(ent->subxact_fileset);
ent->subxact_fileset = NULL;
}
@@ -3243,8 +3270,8 @@ stream_cleanup_files(Oid subid, TransactionId xid)
*
* Open a file for streamed changes from a toplevel transaction identified
* by stream_xid (global variable). If it's the first chunk of streamed
- * changes for this transaction, initialize the shared fileset and create the
- * buffile, otherwise open the previously created file.
+ * changes for this transaction, initialize the fileset and create the buffile,
+ * otherwise open the previously created file.
*
* This can only be called at the beginning of a "streaming" block, i.e.
* between stream_start/stream_stop messages from the upstream.
@@ -3285,7 +3312,7 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
if (first_segment)
{
MemoryContext savectx;
- SharedFileSet *fileset;
+ FileSet *fileset;
if (found)
ereport(ERROR,
@@ -3293,16 +3320,16 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
/*
- * We need to maintain shared fileset across multiple stream
- * start/stop calls. So, need to allocate it in a persistent context.
+ * We need to maintain fileset across multiple stream start/stop calls.
+ * So, need to allocate it in a persistent context.
*/
savectx = MemoryContextSwitchTo(ApplyContext);
- fileset = palloc(sizeof(SharedFileSet));
+ fileset = palloc(sizeof(FileSet));
- SharedFileSetInit(fileset, NULL);
+ FileSetInit(fileset);
MemoryContextSwitchTo(savectx);
- stream_fd = BufFileCreateShared(fileset, path);
+ stream_fd = BufFileCreateFileSet(fileset, path);
/* Remember the fileset for the next stream of the same transaction */
ent->xid = xid;
@@ -3320,7 +3347,7 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
* Open the file and seek to the end of the file because we always
* append the changes file.
*/
- stream_fd = BufFileOpenShared(ent->stream_fileset, path, O_RDWR);
+ stream_fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
BufFileSeek(stream_fd, 0, 0, SEEK_END);
}
diff --git a/src/backend/storage/file/Makefile b/src/backend/storage/file/Makefile
index 5e1291b..660ac51 100644
--- a/src/backend/storage/file/Makefile
+++ b/src/backend/storage/file/Makefile
@@ -16,6 +16,7 @@ OBJS = \
buffile.o \
copydir.o \
fd.o \
+ fileset.o \
reinit.o \
sharedfileset.o
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index a4be5fe..df3e099 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -39,7 +39,7 @@
* BufFile also supports temporary files that can be used by the single backend
* when the corresponding files need to be survived across the transaction and
* need to be opened and closed multiple times. Such files need to be created
- * as a member of a SharedFileSet.
+ * as a member of a FileSet.
*-------------------------------------------------------------------------
*/
@@ -77,8 +77,8 @@ struct BufFile
bool dirty; /* does buffer need to be written? */
bool readOnly; /* has the file been set to read only? */
- SharedFileSet *fileset; /* space for segment files if shared */
- const char *name; /* name of this BufFile if shared */
+ FileSet *fileset; /* space for fileset based segment files */
+ const char *name; /* name of fileset based BufFile */
/*
* resowner is the ResourceOwner to use for underlying temp files. (We
@@ -104,7 +104,7 @@ static void extendBufFile(BufFile *file);
static void BufFileLoadBuffer(BufFile *file);
static void BufFileDumpBuffer(BufFile *file);
static void BufFileFlush(BufFile *file);
-static File MakeNewSharedSegment(BufFile *file, int segment);
+static File MakeNewFileSetSegment(BufFile *file, int segment);
/*
* Create BufFile and perform the common initialization.
@@ -160,7 +160,7 @@ extendBufFile(BufFile *file)
if (file->fileset == NULL)
pfile = OpenTemporaryFile(file->isInterXact);
else
- pfile = MakeNewSharedSegment(file, file->numFiles);
+ pfile = MakeNewFileSetSegment(file, file->numFiles);
Assert(pfile >= 0);
@@ -214,34 +214,34 @@ BufFileCreateTemp(bool interXact)
* Build the name for a given segment of a given BufFile.
*/
static void
-SharedSegmentName(char *name, const char *buffile_name, int segment)
+FileSetSegmentName(char *name, const char *buffile_name, int segment)
{
snprintf(name, MAXPGPATH, "%s.%d", buffile_name, segment);
}
/*
- * Create a new segment file backing a shared BufFile.
+ * Create a new segment file backing a fileset based BufFile.
*/
static File
-MakeNewSharedSegment(BufFile *buffile, int segment)
+MakeNewFileSetSegment(BufFile *buffile, int segment)
{
char name[MAXPGPATH];
File file;
/*
* It is possible that there are files left over from before a crash
- * restart with the same name. In order for BufFileOpenShared() not to
+ * restart with the same name. In order for BufFileOpenFileSet() not to
* get confused about how many segments there are, we'll unlink the next
* segment number if it already exists.
*/
- SharedSegmentName(name, buffile->name, segment + 1);
- SharedFileSetDelete(buffile->fileset, name, true);
+ FileSetSegmentName(name, buffile->name, segment + 1);
+ FileSetDelete(buffile->fileset, name, true);
/* Create the new segment. */
- SharedSegmentName(name, buffile->name, segment);
- file = SharedFileSetCreate(buffile->fileset, name);
+ FileSetSegmentName(name, buffile->name, segment);
+ file = FileSetCreate(buffile->fileset, name);
- /* SharedFileSetCreate would've errored out */
+ /* FileSetCreate would've errored out */
Assert(file > 0);
return file;
@@ -251,15 +251,15 @@ MakeNewSharedSegment(BufFile *buffile, int segment)
* Create a BufFile that can be discovered and opened read-only by other
* backends that are attached to the same SharedFileSet using the same name.
*
- * The naming scheme for shared BufFiles is left up to the calling code. The
- * name will appear as part of one or more filenames on disk, and might
+ * The naming scheme for fileset based BufFiles is left up to the calling code.
+ * The name will appear as part of one or more filenames on disk, and might
* provide clues to administrators about which subsystem is generating
* temporary file data. Since each SharedFileSet object is backed by one or
* more uniquely named temporary directory, names don't conflict with
* unrelated SharedFileSet objects.
*/
BufFile *
-BufFileCreateShared(SharedFileSet *fileset, const char *name)
+BufFileCreateFileSet(FileSet *fileset, const char *name)
{
BufFile *file;
@@ -267,7 +267,7 @@ BufFileCreateShared(SharedFileSet *fileset, const char *name)
file->fileset = fileset;
file->name = pstrdup(name);
file->files = (File *) palloc(sizeof(File));
- file->files[0] = MakeNewSharedSegment(file, 0);
+ file->files[0] = MakeNewFileSetSegment(file, 0);
file->readOnly = false;
return file;
@@ -275,13 +275,13 @@ BufFileCreateShared(SharedFileSet *fileset, const char *name)
/*
* Open a file that was previously created in another backend (or this one)
- * with BufFileCreateShared in the same SharedFileSet using the same name.
+ * with BufFileCreateFileSet in the same FileSet using the same name.
* The backend that created the file must have called BufFileClose() or
- * BufFileExportShared() to make sure that it is ready to be opened by other
+ * BufFileExportFileSet() to make sure that it is ready to be opened by other
* backends and render it read-only.
*/
BufFile *
-BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
+BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
{
BufFile *file;
char segment_name[MAXPGPATH];
@@ -304,8 +304,8 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
files = repalloc(files, sizeof(File) * capacity);
}
/* Try to load a segment. */
- SharedSegmentName(segment_name, name, nfiles);
- files[nfiles] = SharedFileSetOpen(fileset, segment_name, mode);
+ FileSetSegmentName(segment_name, name, nfiles);
+ files[nfiles] = FileSetOpen(fileset, segment_name, mode);
if (files[nfiles] <= 0)
break;
++nfiles;
@@ -333,18 +333,18 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
}
/*
- * Delete a BufFile that was created by BufFileCreateShared in the given
- * SharedFileSet using the given name.
+ * Delete a BufFile that was created by BufFileCreateFileSet in the given
+ * FileSet using the given name.
*
* It is not necessary to delete files explicitly with this function. It is
* provided only as a way to delete files proactively, rather than waiting for
- * the SharedFileSet to be cleaned up.
+ * the FileSet to be cleaned up.
*
* Only one backend should attempt to delete a given name, and should know
* that it exists and has been exported or closed.
*/
void
-BufFileDeleteShared(SharedFileSet *fileset, const char *name)
+BufFileDeleteFileSet(FileSet *fileset, const char *name)
{
char segment_name[MAXPGPATH];
int segment = 0;
@@ -357,8 +357,8 @@ BufFileDeleteShared(SharedFileSet *fileset, const char *name)
*/
for (;;)
{
- SharedSegmentName(segment_name, name, segment);
- if (!SharedFileSetDelete(fileset, segment_name, true))
+ FileSetSegmentName(segment_name, name, segment);
+ if (!FileSetDelete(fileset, segment_name, true))
break;
found = true;
++segment;
@@ -367,16 +367,16 @@ BufFileDeleteShared(SharedFileSet *fileset, const char *name)
}
if (!found)
- elog(ERROR, "could not delete unknown shared BufFile \"%s\"", name);
+ elog(ERROR, "could not delete unknown BufFile \"%s\"", name);
}
/*
- * BufFileExportShared --- flush and make read-only, in preparation for sharing.
+ * BufFileExportFileSet --- flush and make read-only, in preparation for sharing.
*/
void
-BufFileExportShared(BufFile *file)
+BufFileExportFileSet(BufFile *file)
{
- /* Must be a file belonging to a SharedFileSet. */
+ /* Must be a file belonging to a FileSet. */
Assert(file->fileset != NULL);
/* It's probably a bug if someone calls this twice. */
@@ -785,7 +785,7 @@ BufFileTellBlock(BufFile *file)
#endif
/*
- * Return the current shared BufFile size.
+ * Return the current fileset based BufFile size.
*
* Counts any holes left behind by BufFileAppend as part of the size.
* ereport()s on failure.
@@ -811,8 +811,8 @@ BufFileSize(BufFile *file)
}
/*
- * Append the contents of source file (managed within shared fileset) to
- * end of target file (managed within same shared fileset).
+ * Append the contents of source file (managed within fileset) to
+ * end of target file (managed within same fileset).
*
* Note that operation subsumes ownership of underlying resources from
* "source". Caller should never call BufFileClose against source having
@@ -854,11 +854,11 @@ BufFileAppend(BufFile *target, BufFile *source)
}
/*
- * Truncate a BufFile created by BufFileCreateShared up to the given fileno and
- * the offset.
+ * Truncate a BufFile created by BufFileCreateFileSet up to the given fileno
+ * and the offset.
*/
void
-BufFileTruncateShared(BufFile *file, int fileno, off_t offset)
+BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset)
{
int numFiles = file->numFiles;
int newFile = fileno;
@@ -876,12 +876,12 @@ BufFileTruncateShared(BufFile *file, int fileno, off_t offset)
{
if ((i != fileno || offset == 0) && i != 0)
{
- SharedSegmentName(segment_name, file->name, i);
+ FileSetSegmentName(segment_name, file->name, i);
FileClose(file->files[i]);
- if (!SharedFileSetDelete(file->fileset, segment_name, true))
+ if (!FileSetDelete(file->fileset, segment_name, true))
ereport(ERROR,
(errcode_for_file_access(),
- errmsg("could not delete shared fileset \"%s\": %m",
+ errmsg("could not delete fileset \"%s\": %m",
segment_name)));
numFiles--;
newOffset = MAX_PHYSICAL_FILESIZE;
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index b58b399..433e283 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -1921,7 +1921,7 @@ PathNameDeleteTemporaryFile(const char *path, bool error_on_failure)
/*
* Unlike FileClose's automatic file deletion code, we tolerate
- * non-existence to support BufFileDeleteShared which doesn't know how
+ * non-existence to support BufFileDeleteFileSet which doesn't know how
* many segments it has to delete until it runs out.
*/
if (stat_errno == ENOENT)
diff --git a/src/backend/storage/file/fileset.c b/src/backend/storage/file/fileset.c
new file mode 100644
index 0000000..91b06dd
--- /dev/null
+++ b/src/backend/storage/file/fileset.c
@@ -0,0 +1,204 @@
+/*-------------------------------------------------------------------------
+ *
+ * fileset.c
+ * temporary file set management.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/storage/file/fileset.c
+ *
+ * FileSets provide a temporary namespace (think directory) so that files can
+ * be discovered by name.
+ *
+ * FileSets can be used by backends when the temporary files need to be
+ * opened/closed multiple times and the underlying files need to survive across
+ * transactions.
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <limits.h>
+
+#include "catalog/pg_tablespace.h"
+#include "commands/tablespace.h"
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "storage/ipc.h"
+#include "storage/fileset.h"
+#include "utils/builtins.h"
+
+static void FileSetPath(char *path, FileSet *fileset, Oid tablespace);
+static void FilePath(char *path, FileSet *fileset, const char *name);
+static Oid ChooseTablespace(const FileSet *fileset, const char *name);
+
+/*
+ * Initialize a space for temporary files. This API can be used by shared
+ * fileset as well as if the temporary files are used only by single backend
+ * but the files need to be opened and closed multiple times and also the
+ * underlying files need to survive across transactions.
+ *
+ * The callers are expected to explicitly remove such files by using
+ * FileSetDelete/FileSetDeleteAll.
+ *
+ * Files will be distributed over the tablespaces configured in
+ * temp_tablespaces.
+ *
+ * Under the covers the set is one or more directories which will eventually
+ * be deleted.
+ */
+void
+FileSetInit(FileSet *fileset)
+{
+ static uint32 counter = 0;
+
+ fileset->creator_pid = MyProcPid;
+ fileset->number = counter;
+ counter = (counter + 1) % INT_MAX;
+
+ /* Capture the tablespace OIDs so that all backends agree on them. */
+ PrepareTempTablespaces();
+ fileset->ntablespaces =
+ GetTempTablespaces(&fileset->tablespaces[0],
+ lengthof(fileset->tablespaces));
+ if (fileset->ntablespaces == 0)
+ {
+ /* If the GUC is empty, use current database's default tablespace */
+ fileset->tablespaces[0] = MyDatabaseTableSpace;
+ fileset->ntablespaces = 1;
+ }
+ else
+ {
+ int i;
+
+ /*
+ * An entry of InvalidOid means use the default tablespace for the
+ * current database. Replace that now, to be sure that all users of
+ * the FileSet agree on what to do.
+ */
+ for (i = 0; i < fileset->ntablespaces; i++)
+ {
+ if (fileset->tablespaces[i] == InvalidOid)
+ fileset->tablespaces[i] = MyDatabaseTableSpace;
+ }
+ }
+}
+
+/*
+ * Create a new file in the given set.
+ */
+File
+FileSetCreate(FileSet *fileset, const char *name)
+{
+ char path[MAXPGPATH];
+ File file;
+
+ FilePath(path, fileset, name);
+ file = PathNameCreateTemporaryFile(path, false);
+
+ /* If we failed, see if we need to create the directory on demand. */
+ if (file <= 0)
+ {
+ char tempdirpath[MAXPGPATH];
+ char filesetpath[MAXPGPATH];
+ Oid tablespace = ChooseTablespace(fileset, name);
+
+ TempTablespacePath(tempdirpath, tablespace);
+ FileSetPath(filesetpath, fileset, tablespace);
+ PathNameCreateTemporaryDir(tempdirpath, filesetpath);
+ file = PathNameCreateTemporaryFile(path, true);
+ }
+
+ return file;
+}
+
+/*
+ * Open a file that was created with FileSetCreate().
+ */
+File
+FileSetOpen(FileSet *fileset, const char *name, int mode)
+{
+ char path[MAXPGPATH];
+ File file;
+
+ FilePath(path, fileset, name);
+ file = PathNameOpenTemporaryFile(path, mode);
+
+ return file;
+}
+
+/*
+ * Delete a file that was created with FileSetCreate().
+ * Return true if the file existed, false if it didn't.
+ */
+bool
+FileSetDelete(FileSet *fileset, const char *name,
+ bool error_on_failure)
+{
+ char path[MAXPGPATH];
+
+ FilePath(path, fileset, name);
+
+ return PathNameDeleteTemporaryFile(path, error_on_failure);
+}
+
+/*
+ * Delete all files in the set.
+ */
+void
+FileSetDeleteAll(FileSet *fileset)
+{
+ char dirpath[MAXPGPATH];
+ int i;
+
+ /*
+ * Delete the directory we created in each tablespace. Doesn't fail
+ * because we use this in error cleanup paths, but can generate LOG
+ * message on IO error.
+ */
+ for (i = 0; i < fileset->ntablespaces; ++i)
+ {
+ FileSetPath(dirpath, fileset, fileset->tablespaces[i]);
+ PathNameDeleteTemporaryDir(dirpath);
+ }
+}
+
+/*
+ * Build the path for the directory holding the files backing a FileSet in a
+ * given tablespace.
+ */
+static void
+FileSetPath(char *path, FileSet *fileset, Oid tablespace)
+{
+ char tempdirpath[MAXPGPATH];
+
+ TempTablespacePath(tempdirpath, tablespace);
+ snprintf(path, MAXPGPATH, "%s/%s%lu.%u.fileset",
+ tempdirpath, PG_TEMP_FILE_PREFIX,
+ (unsigned long) fileset->creator_pid, fileset->number);
+}
+
+/*
+ * Sorting hat to determine which tablespace a given temporary file belongs in.
+ */
+static Oid
+ChooseTablespace(const FileSet *fileset, const char *name)
+{
+ uint32 hash = hash_any((const unsigned char *) name, strlen(name));
+
+ return fileset->tablespaces[hash % fileset->ntablespaces];
+}
+
+/*
+ * Compute the full path of a file in a FileSet.
+ */
+static void
+FilePath(char *path, FileSet *fileset, const char *name)
+{
+ char dirpath[MAXPGPATH];
+
+ FileSetPath(dirpath, fileset, ChooseTablespace(fileset, name));
+ snprintf(path, MAXPGPATH, "%s/%s", dirpath, name);
+}
diff --git a/src/backend/storage/file/sharedfileset.c b/src/backend/storage/file/sharedfileset.c
index ed37c94..5bb3d44 100644
--- a/src/backend/storage/file/sharedfileset.c
+++ b/src/backend/storage/file/sharedfileset.c
@@ -13,10 +13,6 @@
* files can be discovered by name, and a shared ownership semantics so that
* shared files survive until the last user detaches.
*
- * SharedFileSets can be used by backends when the temporary files need to be
- * opened/closed multiple times and the underlying files need to survive across
- * transactions.
- *
*-------------------------------------------------------------------------
*/
@@ -33,13 +29,7 @@
#include "storage/sharedfileset.h"
#include "utils/builtins.h"
-static List *filesetlist = NIL;
-
static void SharedFileSetOnDetach(dsm_segment *segment, Datum datum);
-static void SharedFileSetDeleteOnProcExit(int status, Datum arg);
-static void SharedFileSetPath(char *path, SharedFileSet *fileset, Oid tablespace);
-static void SharedFilePath(char *path, SharedFileSet *fileset, const char *name);
-static Oid ChooseTablespace(const SharedFileSet *fileset, const char *name);
/*
* Initialize a space for temporary files that can be opened by other backends.
@@ -47,77 +37,22 @@ static Oid ChooseTablespace(const SharedFileSet *fileset, const char *name);
* SharedFileSet with 'seg'. Any contained files will be deleted when the
* last backend detaches.
*
- * We can also use this interface if the temporary files are used only by
- * single backend but the files need to be opened and closed multiple times
- * and also the underlying files need to survive across transactions. For
- * such cases, dsm segment 'seg' should be passed as NULL. Callers are
- * expected to explicitly remove such files by using SharedFileSetDelete/
- * SharedFileSetDeleteAll or we remove such files on proc exit.
- *
- * Files will be distributed over the tablespaces configured in
- * temp_tablespaces.
- *
* Under the covers the set is one or more directories which will eventually
* be deleted.
*/
void
SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg)
{
- static uint32 counter = 0;
-
+ /* Initialize the shared fileset specific members. */
SpinLockInit(&fileset->mutex);
fileset->refcnt = 1;
- fileset->creator_pid = MyProcPid;
- fileset->number = counter;
- counter = (counter + 1) % INT_MAX;
-
- /* Capture the tablespace OIDs so that all backends agree on them. */
- PrepareTempTablespaces();
- fileset->ntablespaces =
- GetTempTablespaces(&fileset->tablespaces[0],
- lengthof(fileset->tablespaces));
- if (fileset->ntablespaces == 0)
- {
- /* If the GUC is empty, use current database's default tablespace */
- fileset->tablespaces[0] = MyDatabaseTableSpace;
- fileset->ntablespaces = 1;
- }
- else
- {
- int i;
- /*
- * An entry of InvalidOid means use the default tablespace for the
- * current database. Replace that now, to be sure that all users of
- * the SharedFileSet agree on what to do.
- */
- for (i = 0; i < fileset->ntablespaces; i++)
- {
- if (fileset->tablespaces[i] == InvalidOid)
- fileset->tablespaces[i] = MyDatabaseTableSpace;
- }
- }
+ /* Initialize the fileset. */
+ FileSetInit(&fileset->fs);
/* Register our cleanup callback. */
if (seg)
on_dsm_detach(seg, SharedFileSetOnDetach, PointerGetDatum(fileset));
- else
- {
- static bool registered_cleanup = false;
-
- if (!registered_cleanup)
- {
- /*
- * We must not have registered any fileset before registering the
- * fileset clean up.
- */
- Assert(filesetlist == NIL);
- on_proc_exit(SharedFileSetDeleteOnProcExit, 0);
- registered_cleanup = true;
- }
-
- filesetlist = lcons((void *) fileset, filesetlist);
- }
}
/*
@@ -148,86 +83,12 @@ SharedFileSetAttach(SharedFileSet *fileset, dsm_segment *seg)
}
/*
- * Create a new file in the given set.
- */
-File
-SharedFileSetCreate(SharedFileSet *fileset, const char *name)
-{
- char path[MAXPGPATH];
- File file;
-
- SharedFilePath(path, fileset, name);
- file = PathNameCreateTemporaryFile(path, false);
-
- /* If we failed, see if we need to create the directory on demand. */
- if (file <= 0)
- {
- char tempdirpath[MAXPGPATH];
- char filesetpath[MAXPGPATH];
- Oid tablespace = ChooseTablespace(fileset, name);
-
- TempTablespacePath(tempdirpath, tablespace);
- SharedFileSetPath(filesetpath, fileset, tablespace);
- PathNameCreateTemporaryDir(tempdirpath, filesetpath);
- file = PathNameCreateTemporaryFile(path, true);
- }
-
- return file;
-}
-
-/*
- * Open a file that was created with SharedFileSetCreate(), possibly in
- * another backend.
- */
-File
-SharedFileSetOpen(SharedFileSet *fileset, const char *name, int mode)
-{
- char path[MAXPGPATH];
- File file;
-
- SharedFilePath(path, fileset, name);
- file = PathNameOpenTemporaryFile(path, mode);
-
- return file;
-}
-
-/*
- * Delete a file that was created with SharedFileSetCreate().
- * Return true if the file existed, false if didn't.
- */
-bool
-SharedFileSetDelete(SharedFileSet *fileset, const char *name,
- bool error_on_failure)
-{
- char path[MAXPGPATH];
-
- SharedFilePath(path, fileset, name);
-
- return PathNameDeleteTemporaryFile(path, error_on_failure);
-}
-
-/*
* Delete all files in the set.
*/
void
SharedFileSetDeleteAll(SharedFileSet *fileset)
{
- char dirpath[MAXPGPATH];
- int i;
-
- /*
- * Delete the directory we created in each tablespace. Doesn't fail
- * because we use this in error cleanup paths, but can generate LOG
- * message on IO error.
- */
- for (i = 0; i < fileset->ntablespaces; ++i)
- {
- SharedFileSetPath(dirpath, fileset, fileset->tablespaces[i]);
- PathNameDeleteTemporaryDir(dirpath);
- }
-
- /* Unregister the shared fileset */
- SharedFileSetUnregister(fileset);
+ return FileSetDeleteAll(&fileset->fs);
}
/*
@@ -255,100 +116,5 @@ SharedFileSetOnDetach(dsm_segment *segment, Datum datum)
* this function so we can safely access its data.
*/
if (unlink_all)
- SharedFileSetDeleteAll(fileset);
-}
-
-/*
- * Callback function that will be invoked on the process exit. This will
- * process the list of all the registered sharedfilesets and delete the
- * underlying files.
- */
-static void
-SharedFileSetDeleteOnProcExit(int status, Datum arg)
-{
- /*
- * Remove all the pending shared fileset entries. We don't use foreach()
- * here because SharedFileSetDeleteAll will remove the current element in
- * filesetlist. Though we have used foreach_delete_current() to remove the
- * element from filesetlist it could only fix up the state of one of the
- * loops, see SharedFileSetUnregister.
- */
- while (list_length(filesetlist) > 0)
- {
- SharedFileSet *fileset = (SharedFileSet *) linitial(filesetlist);
-
- SharedFileSetDeleteAll(fileset);
- }
-
- filesetlist = NIL;
-}
-
-/*
- * Unregister the shared fileset entry registered for cleanup on proc exit.
- */
-void
-SharedFileSetUnregister(SharedFileSet *input_fileset)
-{
- ListCell *l;
-
- /*
- * If the caller is following the dsm based cleanup then we don't maintain
- * the filesetlist so return.
- */
- if (filesetlist == NIL)
- return;
-
- foreach(l, filesetlist)
- {
- SharedFileSet *fileset = (SharedFileSet *) lfirst(l);
-
- /* Remove the entry from the list */
- if (input_fileset == fileset)
- {
- filesetlist = foreach_delete_current(filesetlist, l);
- return;
- }
- }
-
- /* Should have found a match */
- Assert(false);
-}
-
-/*
- * Build the path for the directory holding the files backing a SharedFileSet
- * in a given tablespace.
- */
-static void
-SharedFileSetPath(char *path, SharedFileSet *fileset, Oid tablespace)
-{
- char tempdirpath[MAXPGPATH];
-
- TempTablespacePath(tempdirpath, tablespace);
- snprintf(path, MAXPGPATH, "%s/%s%lu.%u.sharedfileset",
- tempdirpath, PG_TEMP_FILE_PREFIX,
- (unsigned long) fileset->creator_pid, fileset->number);
-}
-
-/*
- * Sorting hat to determine which tablespace a given shared temporary file
- * belongs in.
- */
-static Oid
-ChooseTablespace(const SharedFileSet *fileset, const char *name)
-{
- uint32 hash = hash_any((const unsigned char *) name, strlen(name));
-
- return fileset->tablespaces[hash % fileset->ntablespaces];
-}
-
-/*
- * Compute the full path of a file in a SharedFileSet.
- */
-static void
-SharedFilePath(char *path, SharedFileSet *fileset, const char *name)
-{
- char dirpath[MAXPGPATH];
-
- SharedFileSetPath(dirpath, fileset, ChooseTablespace(fileset, name));
- snprintf(path, MAXPGPATH, "%s/%s", dirpath, name);
+ FileSetDeleteAll(&fileset->fs);
}
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index cafc087..f7994d7 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -564,7 +564,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
lt = <s->tapes[i];
pg_itoa(i, filename);
- file = BufFileOpenShared(fileset, filename, O_RDONLY);
+ file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY);
filesize = BufFileSize(file);
/*
@@ -610,7 +610,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
* offset).
*
* The only thing that currently prevents writing to the leader tape from
- * working is the fact that BufFiles opened using BufFileOpenShared() are
+ * working is the fact that BufFiles opened using BufFileOpenFileSet() are
* read-only by definition, but that could be changed if it seemed
* worthwhile. For now, writing to the leader tape will raise a "Bad file
* descriptor" error, so tuplesort must avoid writing to the leader tape
@@ -722,7 +722,7 @@ LogicalTapeSetCreate(int ntapes, bool preallocate, TapeShare *shared,
char filename[MAXPGPATH];
pg_itoa(worker, filename);
- lts->pfile = BufFileCreateShared(fileset, filename);
+ lts->pfile = BufFileCreateFileSet(&fileset->fs, filename);
}
else
lts->pfile = BufFileCreateTemp(false);
@@ -1096,7 +1096,7 @@ LogicalTapeFreeze(LogicalTapeSet *lts, int tapenum, TapeShare *share)
/* Handle extra steps when caller is to share its tapeset */
if (share)
{
- BufFileExportShared(lts->pfile);
+ BufFileExportFileSet(lts->pfile);
share->firstblocknumber = lt->firstBlockNumber;
}
}
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 57e35db..72acd54 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -310,7 +310,8 @@ sts_puttuple(SharedTuplestoreAccessor *accessor, void *meta_data,
/* Create one. Only this backend will write into it. */
sts_filename(name, accessor, accessor->participant);
- accessor->write_file = BufFileCreateShared(accessor->fileset, name);
+ accessor->write_file =
+ BufFileCreateFileSet(&accessor->fileset->fs, name);
/* Set up the shared state for this backend's file. */
participant = &accessor->sts->participants[accessor->participant];
@@ -559,7 +560,7 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
sts_filename(name, accessor, accessor->read_participant);
accessor->read_file =
- BufFileOpenShared(accessor->fileset, name, O_RDONLY);
+ BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY);
}
/* Seek and load the chunk header. */
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index 41c7487..a6c9d4e 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -79,6 +79,7 @@ extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
extern void logicalrep_worker_stop(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
+extern void logicalrep_worker_cleanupfileset(void);
extern int logicalrep_sync_worker_count(Oid subid);
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 566523d..032a823 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -26,7 +26,7 @@
#ifndef BUFFILE_H
#define BUFFILE_H
-#include "storage/sharedfileset.h"
+#include "storage/fileset.h"
/* BufFile is an opaque type whose details are not known outside buffile.c. */
@@ -46,11 +46,11 @@ extern int BufFileSeekBlock(BufFile *file, long blknum);
extern int64 BufFileSize(BufFile *file);
extern long BufFileAppend(BufFile *target, BufFile *source);
-extern BufFile *BufFileCreateShared(SharedFileSet *fileset, const char *name);
-extern void BufFileExportShared(BufFile *file);
-extern BufFile *BufFileOpenShared(SharedFileSet *fileset, const char *name,
+extern BufFile *BufFileCreateFileSet(FileSet *fileset, const char *name);
+extern void BufFileExportFileSet(BufFile *file);
+extern BufFile *BufFileOpenFileSet(FileSet *fileset, const char *name,
int mode);
-extern void BufFileDeleteShared(SharedFileSet *fileset, const char *name);
-extern void BufFileTruncateShared(BufFile *file, int fileno, off_t offset);
+extern void BufFileDeleteFileSet(FileSet *fileset, const char *name);
+extern void BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset);
#endif /* BUFFILE_H */
diff --git a/src/include/storage/fileset.h b/src/include/storage/fileset.h
new file mode 100644
index 0000000..8795fd1
--- /dev/null
+++ b/src/include/storage/fileset.h
@@ -0,0 +1,40 @@
+/*-------------------------------------------------------------------------
+ *
+ * fileset.h
+ * temporary file management.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/fileset.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FILESET_H
+#define FILESET_H
+
+#include "storage/fd.h"
+
+/*
+ * A set of temporary files.
+ */
+typedef struct FileSet
+{
+ pid_t creator_pid; /* PID of the creating process */
+ uint32 number; /* per-PID identifier */
+ int ntablespaces; /* number of tablespaces to use */
+ Oid tablespaces[8]; /* OIDs of tablespaces to use. Assumes that
+ * there are rarely more than a few temp
+ * tablespaces. */
+} FileSet;
+
+extern void FileSetInit(FileSet *fileset);
+extern File FileSetCreate(FileSet *fileset, const char *name);
+extern File FileSetOpen(FileSet *fileset, const char *name,
+ int mode);
+extern bool FileSetDelete(FileSet *fileset, const char *name,
+ bool error_on_failure);
+extern void FileSetDeleteAll(FileSet *fileset);
+
+#endif
diff --git a/src/include/storage/sharedfileset.h b/src/include/storage/sharedfileset.h
index 09ba121..59becfb 100644
--- a/src/include/storage/sharedfileset.h
+++ b/src/include/storage/sharedfileset.h
@@ -17,6 +17,7 @@
#include "storage/dsm.h"
#include "storage/fd.h"
+#include "storage/fileset.h"
#include "storage/spin.h"
/*
@@ -24,24 +25,13 @@
*/
typedef struct SharedFileSet
{
- pid_t creator_pid; /* PID of the creating process */
- uint32 number; /* per-PID identifier */
+ FileSet fs;
slock_t mutex; /* mutex protecting the reference count */
int refcnt; /* number of attached backends */
- int ntablespaces; /* number of tablespaces to use */
- Oid tablespaces[8]; /* OIDs of tablespaces to use. Assumes that
- * it's rare that there more than temp
- * tablespaces. */
} SharedFileSet;
extern void SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg);
extern void SharedFileSetAttach(SharedFileSet *fileset, dsm_segment *seg);
-extern File SharedFileSetCreate(SharedFileSet *fileset, const char *name);
-extern File SharedFileSetOpen(SharedFileSet *fileset, const char *name,
- int mode);
-extern bool SharedFileSetDelete(SharedFileSet *fileset, const char *name,
- bool error_on_failure);
extern void SharedFileSetDeleteAll(SharedFileSet *fileset);
-extern void SharedFileSetUnregister(SharedFileSet *input_fileset);
#endif
--
1.8.3.1
Attachment: v3-0002-Better-usage-of-fileset-in-apply-worker.patch (text/x-patch, US-ASCII)
From e7494c21da24d856e113a19dcba8b3c9893055fb Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Tue, 24 Aug 2021 14:44:59 +0530
Subject: [PATCH v3 2/2] Better usage of fileset in apply worker
Instead of using a separate fileset for each xid, use just one
fileset for the whole lifetime of the worker. For each xid, create
the buffiles under that same fileset and remove the files whenever
we are done with them. The subxact file only needs to be created
once we see the first subtransaction; to detect whether it exists,
the buffile open and delete interfaces have been extended to
tolerate missing files.
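A rough Python model of the behavior described in the commit message above (hypothetical sketch, not the patch's C code): one long-lived fileset, with per-xid files opened under missing_ok semantics so the caller can fall back to creating them.

```python
# Hypothetical sketch of the missing_ok contract added to BufFileOpenFileSet:
# open() returns None (instead of raising) when the file is absent, letting
# the caller decide whether to create it.
class FileSet:
    def __init__(self):
        self.files = {}

    def create(self, name):
        self.files[name] = bytearray()
        return self.files[name]

    def open(self, name, missing_ok=False):
        if name not in self.files:
            if missing_ok:
                return None
            raise FileNotFoundError(name)
        return self.files[name]

fs = FileSet()  # one fileset for the whole worker lifetime
# subxact_info_write()-style pattern: open if present, else create.
fd = fs.open("1234.subxacts", missing_ok=True)
if fd is None:
    fd = fs.create("1234.subxacts")
assert fs.open("1234.subxacts") is fd
```

This replaces the per-xid hash table: rather than tracking which xids have a subxact fileset, the worker just probes for the file by name.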
---
src/backend/replication/logical/launcher.c | 2 +-
src/backend/replication/logical/worker.c | 234 +++++------------------------
src/backend/storage/file/buffile.c | 23 ++-
src/backend/utils/sort/logtape.c | 2 +-
src/backend/utils/sort/sharedtuplestore.c | 3 +-
src/include/storage/buffile.h | 5 +-
6 files changed, 64 insertions(+), 205 deletions(-)
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 8b1772d..644a9c2 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -648,7 +648,7 @@ logicalrep_worker_onexit(int code, Datum arg)
logicalrep_worker_detach();
- /* Cleanup filesets used for streaming transactions. */
+ /* Cleanup fileset used for streaming transactions. */
logicalrep_worker_cleanupfileset();
ApplyLauncherWakeup();
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 07a2c90..77aaba1 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -221,20 +221,6 @@ typedef struct ApplyExecutionData
PartitionTupleRouting *proute; /* partition routing info */
} ApplyExecutionData;
-/*
- * Stream xid hash entry. Whenever we see a new xid we create this entry in the
- * xidhash and along with it create the streaming file and store the fileset handle.
- * The subxact file is created iff there is any subxact info under this xid. This
- * entry is used on the subsequent streams for the xid to get the corresponding
- * fileset handles, so storing them in hash makes the search faster.
- */
-typedef struct StreamXidHash
-{
- TransactionId xid; /* xid is the hash key and must be first */
- FileSet *stream_fileset; /* file set for stream data */
- FileSet *subxact_fileset; /* file set for subxact info */
-} StreamXidHash;
-
static MemoryContext ApplyMessageContext = NULL;
MemoryContext ApplyContext = NULL;
@@ -255,10 +241,11 @@ static bool in_streamed_transaction = false;
static TransactionId stream_xid = InvalidTransactionId;
/*
- * Hash table for storing the streaming xid information along with filesets
- * for streaming and subxact files.
+ * Fileset for storing the changes and subxact information for the streaming
+ * transactions. We use only one fileset, and for each xid separate
+ * changes and subxact files are created under the same fileset.
*/
-static HTAB *xidhash = NULL;
+static FileSet *xidfileset = NULL;
/* BufFile handle of the current streaming file */
static BufFile *stream_fd = NULL;
@@ -1129,7 +1116,6 @@ static void
apply_handle_stream_start(StringInfo s)
{
bool first_segment;
- HASHCTL hash_ctl;
if (in_streamed_transaction)
ereport(ERROR,
@@ -1157,17 +1143,21 @@ apply_handle_stream_start(StringInfo s)
errmsg_internal("invalid transaction ID in streamed replication transaction")));
/*
- * Initialize the xidhash table if we haven't yet. This will be used for
- * the entire duration of the apply worker so create it in permanent
- * context.
+ * Initialize the xidfileset if we haven't yet. This will be used for the
+ * entire duration of the apply worker, so create it in a permanent context.
*/
- if (xidhash == NULL)
+ if (xidfileset == NULL)
{
- hash_ctl.keysize = sizeof(TransactionId);
- hash_ctl.entrysize = sizeof(StreamXidHash);
- hash_ctl.hcxt = ApplyContext;
- xidhash = hash_create("StreamXidHash", 1024, &hash_ctl,
- HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ MemoryContext oldctx;
+
+ /*
+ * We need to keep the fileset for the worker's lifetime, so we need to
+ * allocate it in a persistent context.
+ */
+ oldctx = MemoryContextSwitchTo(ApplyContext);
+ xidfileset = palloc(sizeof(FileSet));
+ FileSetInit(xidfileset);
+ MemoryContextSwitchTo(oldctx);
}
/* open the spool file for this transaction */
@@ -1258,7 +1248,6 @@ apply_handle_stream_abort(StringInfo s)
BufFile *fd;
bool found = false;
char path[MAXPGPATH];
- StreamXidHash *ent;
subidx = -1;
begin_replication_step();
@@ -1287,19 +1276,9 @@ apply_handle_stream_abort(StringInfo s)
return;
}
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
/* open the changes file */
changes_filename(path, MyLogicalRepWorker->subid, xid);
- fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(xidfileset, path, O_RDWR, false);
/* OK, truncate the file at the right offset */
BufFileTruncateFileSet(fd, subxact_data.subxacts[subidx].fileno,
@@ -1327,7 +1306,6 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
int nchanges;
char path[MAXPGPATH];
char *buffer = NULL;
- StreamXidHash *ent;
MemoryContext oldcxt;
BufFile *fd;
@@ -1345,17 +1323,7 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
changes_filename(path, MyLogicalRepWorker->subid, xid);
elog(DEBUG1, "replaying changes from file \"%s\"", path);
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
- fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(xidfileset, path, O_RDONLY, false);
buffer = palloc(BLCKSZ);
initStringInfo(&s2);
@@ -2509,30 +2477,13 @@ UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
}
/*
- * Cleanup filesets.
+ * Cleanup fileset.
*/
void
logicalrep_worker_cleanupfileset()
{
- HASH_SEQ_STATUS status;
- StreamXidHash *hentry;
-
- /*
- * Scan the xidhash table if created and from each entry delete stream
- * fileset and the subxact fileset.
- */
- if (xidhash)
- {
- hash_seq_init(&status, xidhash);
- while ((hentry = (StreamXidHash *) hash_seq_search(&status)) != NULL)
- {
- FileSetDeleteAll(hentry->stream_fileset);
-
- /* Delete the subxact fileset only if it is created */
- if (hentry->subxact_fileset)
- FileSetDeleteAll(hentry->subxact_fileset);
- }
- }
+ if (xidfileset != NULL)
+ FileSetDeleteAll(xidfileset);
}
/*
@@ -2984,18 +2935,11 @@ subxact_info_write(Oid subid, TransactionId xid)
{
char path[MAXPGPATH];
Size len;
- StreamXidHash *ent;
BufFile *fd;
Assert(TransactionIdIsValid(xid));
- /* Find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- /* By this time we must have created the transaction entry */
- Assert(ent);
+ subxact_filename(path, subid, xid);
/*
* If there is no subtransaction then nothing to do, but if already have
@@ -3003,39 +2947,18 @@ subxact_info_write(Oid subid, TransactionId xid)
*/
if (subxact_data.nsubxacts == 0)
{
- if (ent->subxact_fileset)
- {
- cleanup_subxact_info();
- FileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
+ cleanup_subxact_info();
+ BufFileDeleteFileSet(xidfileset, path, true);
+
return;
}
subxact_filename(path, subid, xid);
- /*
- * Create the subxact file if it not already created, otherwise open the
- * existing file.
- */
- if (ent->subxact_fileset == NULL)
- {
- MemoryContext oldctx;
-
- /*
- * We need to maintain fileset across multiple stream start/stop calls.
- * So, need to allocate it in a persistent context.
- */
- oldctx = MemoryContextSwitchTo(ApplyContext);
- ent->subxact_fileset = palloc(sizeof(FileSet));
- FileSetInit(ent->subxact_fileset);
- MemoryContextSwitchTo(oldctx);
-
- fd = BufFileCreateFileSet(ent->subxact_fileset, path);
- }
- else
- fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDWR);
+ /* Try to open the subxact file; if it doesn't exist, create it. */
+ fd = BufFileOpenFileSet(xidfileset, path, O_RDWR, true);
+ if (fd == NULL)
+ fd = BufFileCreateFileSet(xidfileset, path);
len = sizeof(SubXactInfo) * subxact_data.nsubxacts;
@@ -3062,34 +2985,17 @@ subxact_info_read(Oid subid, TransactionId xid)
char path[MAXPGPATH];
Size len;
BufFile *fd;
- StreamXidHash *ent;
MemoryContext oldctx;
Assert(!subxact_data.subxacts);
Assert(subxact_data.nsubxacts == 0);
Assert(subxact_data.nsubxacts_max == 0);
- /* Find the stream xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
- /*
- * If subxact_fileset is not valid that mean we don't have any subxact
- * info
- */
- if (ent->subxact_fileset == NULL)
- return;
-
subxact_filename(path, subid, xid);
- fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(xidfileset, path, O_RDONLY, true);
+ if (fd == NULL)
+ return;
/* read number of subxact items */
if (BufFileRead(fd, &subxact_data.nsubxacts,
@@ -3231,36 +3137,13 @@ static void
stream_cleanup_files(Oid subid, TransactionId xid)
{
char path[MAXPGPATH];
- StreamXidHash *ent;
-
- /* Find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
/* Delete the change file and release the stream fileset memory */
changes_filename(path, subid, xid);
- FileSetDeleteAll(ent->stream_fileset);
- pfree(ent->stream_fileset);
- ent->stream_fileset = NULL;
-
- /* Delete the subxact file and release the memory, if it exist */
- if (ent->subxact_fileset)
- {
- subxact_filename(path, subid, xid);
- FileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
+ BufFileDeleteFileSet(xidfileset, path, false);
- /* Remove the xid entry from the stream xid hash */
- hash_search(xidhash, (void *) &xid, HASH_REMOVE, NULL);
+ subxact_filename(path, subid, xid);
+ BufFileDeleteFileSet(xidfileset, path, true);
}
/*
@@ -3270,8 +3153,8 @@ stream_cleanup_files(Oid subid, TransactionId xid)
*
* Open a file for streamed changes from a toplevel transaction identified
* by stream_xid (global variable). If it's the first chunk of streamed
- * changes for this transaction, initialize the fileset and create the buffile,
- * otherwise open the previously created file.
+ * changes for this transaction, create the buffile, otherwise open the
+ * previously created file.
*
* This can only be called at the beginning of a "streaming" block, i.e.
* between stream_start/stream_stop messages from the upstream.
@@ -3280,20 +3163,13 @@ static void
stream_open_file(Oid subid, TransactionId xid, bool first_segment)
{
char path[MAXPGPATH];
- bool found;
MemoryContext oldcxt;
- StreamXidHash *ent;
Assert(in_streamed_transaction);
Assert(OidIsValid(subid));
Assert(TransactionIdIsValid(xid));
Assert(stream_fd == NULL);
- /* create or find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_ENTER,
- &found);
changes_filename(path, subid, xid);
elog(DEBUG1, "opening file \"%s\" for streamed changes", path);
@@ -3310,44 +3186,14 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
* writing, in append mode.
*/
if (first_segment)
- {
- MemoryContext savectx;
- FileSet *fileset;
-
- if (found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
-
- /*
- * We need to maintain fileset across multiple stream start/stop calls.
- * So, need to allocate it in a persistent context.
- */
- savectx = MemoryContextSwitchTo(ApplyContext);
- fileset = palloc(sizeof(FileSet));
-
- FileSetInit(fileset);
- MemoryContextSwitchTo(savectx);
-
- stream_fd = BufFileCreateFileSet(fileset, path);
-
- /* Remember the fileset for the next stream of the same transaction */
- ent->xid = xid;
- ent->stream_fileset = fileset;
- ent->subxact_fileset = NULL;
- }
+ stream_fd = BufFileCreateFileSet(xidfileset, path);
else
{
- if (!found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
-
/*
* Open the file and seek to the end of the file because we always
* append the changes file.
*/
- stream_fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
+ stream_fd = BufFileOpenFileSet(xidfileset, path, O_RDWR, false);
BufFileSeek(stream_fd, 0, 0, SEEK_END);
}
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index df3e099..1cf9733 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -278,10 +278,12 @@ BufFileCreateFileSet(FileSet *fileset, const char *name)
* with BufFileCreateFileSet in the same FileSet using the same name.
* The backend that created the file must have called BufFileClose() or
* BufFileExportFileSet() to make sure that it is ready to be opened by other
- * backends and render it read-only.
+ * backends and render it read-only. If missing_ok is true, NULL is returned
+ * when the file doesn't exist; otherwise an error is raised.
*/
BufFile *
-BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
+BufFileOpenFileSet(FileSet *fileset, const char *name, int mode,
+ bool missing_ok)
{
BufFile *file;
char segment_name[MAXPGPATH];
@@ -318,10 +320,18 @@ BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
* name.
*/
if (nfiles == 0)
+ {
+ /* free the memory */
+ pfree(files);
+
+ if (missing_ok)
+ return NULL;
+
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not open temporary file \"%s\" from BufFile \"%s\": %m",
segment_name, name)));
+ }
file = makeBufFileCommon(nfiles);
file->files = files;
@@ -341,10 +351,11 @@ BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
* the FileSet to be cleaned up.
*
* Only one backend should attempt to delete a given name, and should know
- * that it exists and has been exported or closed.
+ * that it exists and has been exported or closed; otherwise, missing_ok
+ * should be passed as true.
*/
void
-BufFileDeleteFileSet(FileSet *fileset, const char *name)
+BufFileDeleteFileSet(FileSet *fileset, const char *name, bool missing_ok)
{
char segment_name[MAXPGPATH];
int segment = 0;
@@ -358,7 +369,7 @@ BufFileDeleteFileSet(FileSet *fileset, const char *name)
for (;;)
{
FileSetSegmentName(segment_name, name, segment);
- if (!FileSetDelete(fileset, segment_name, true))
+ if (!FileSetDelete(fileset, segment_name, !missing_ok))
break;
found = true;
++segment;
@@ -366,7 +377,7 @@ BufFileDeleteFileSet(FileSet *fileset, const char *name)
CHECK_FOR_INTERRUPTS();
}
- if (!found)
+ if (!found && !missing_ok)
elog(ERROR, "could not delete unknown BufFile \"%s\"", name);
}
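The delete loop in the hunk above walks numbered segment files until one is missing, and only errors if nothing was deleted and missing_ok is false. A hedged Python sketch of that control flow (illustrative only; segment naming is an assumption modeled on FileSetSegmentName):

```python
# Hypothetical model of BufFileDeleteFileSet(): a BufFile is stored as
# numbered segments ("name.0", "name.1", ...). Delete them in order until
# one is absent; complain only if nothing was found and missing_ok is false.
def delete_fileset_file(files, name, missing_ok=False):
    segment = 0
    found = False
    while f"{name}.{segment}" in files:
        del files[f"{name}.{segment}"]
        found = True
        segment += 1
    if not found and not missing_ok:
        raise FileNotFoundError(f"could not delete unknown BufFile {name!r}")

files = {"1234.changes.0": b"", "1234.changes.1": b""}
delete_fileset_file(files, "1234.changes")        # removes both segments
assert files == {}
delete_fileset_file(files, "1234.subxacts", missing_ok=True)  # silently a no-op
```

Note that missing_ok also flips the error_on_failure flag passed to FileSetDelete, so a mid-loop unlink failure is tolerated rather than raised.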
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index f7994d7..debf12e1 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -564,7 +564,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
lt = <s->tapes[i];
pg_itoa(i, filename);
- file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY);
+ file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY, false);
filesize = BufFileSize(file);
/*
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 72acd54..8c5135c 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -560,7 +560,8 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
sts_filename(name, accessor, accessor->read_participant);
accessor->read_file =
- BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY);
+ BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY,
+ false);
}
/* Seek and load the chunk header. */
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 032a823..5e9df44 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -49,8 +49,9 @@ extern long BufFileAppend(BufFile *target, BufFile *source);
extern BufFile *BufFileCreateFileSet(FileSet *fileset, const char *name);
extern void BufFileExportFileSet(BufFile *file);
extern BufFile *BufFileOpenFileSet(FileSet *fileset, const char *name,
- int mode);
-extern void BufFileDeleteFileSet(FileSet *fileset, const char *name);
+ int mode, bool missing_ok);
+extern void BufFileDeleteFileSet(FileSet *fileset, const char *name,
+ bool missing_ok);
extern void BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset);
#endif /* BUFFILE_H */
--
1.8.3.1
On Tuesday, August 24, 2021 6:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Tue, Aug 24, 2021 at 12:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I was also thinking about the same, does it make sense to name it
> just "%s/%s%lu.%u.fileset"?
> I think it is reasonable to use .fileset as proposed by you.

Done.

> Few other comments:

Done.
After applying the patch, I tested the impacted features
(parallel hashjoin/btree build) with different settings
(log_temp_files/maintenance_work_mem/parallel_leader_participation/workers)
, and the patch works well.
One thing I noticed is that when log_temp_files is enabled, the pathname in
the log changes from "xxx.sharedfileset" to "xxx.fileset", and some blogs[1]
reference the old path name. I think it doesn't matter, but I am sharing the
information here in case someone has a different opinion.
Some minor thing I noticed in the patch:
1)
It seems we can add "FileSet" to typedefs.list.
2)
The commit message in 0002 used "shared fileset" which should be "fileset",
---
Instead of using a separate shared fileset for each xid, use one shared
fileset for whole lifetime of the worker...
---
[1]: https://blog.dbi-services.com/about-temp_tablespaces-in-postgresql/
Best regards,
Hou zj
On Tue, Aug 24, 2021 at 3:55 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Tue, Aug 24, 2021 at 12:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
The first patch looks good to me. I have made minor changes in the attached
version: fixed a compilation warning, made some comment changes, ran
pgindent, and made a few other cosmetic changes. If you are fine with the
attached, then kindly rebase the second patch atop it.
--
With Regards,
Amit Kapila.
Attachments:
v5-0001-Refactor-sharedfileset.c-to-separate-out-fileset-.patch (application/octet-stream)
From bbf58ee17aab7057b92d4d1da69a5e9503b0f85e Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Wed, 25 Aug 2021 15:00:19 +0530
Subject: [PATCH v5] Refactor sharedfileset.c to separate out fileset
implementation.
Move fileset related implementation out of sharedfileset.c to allow its
usage by backends that don't want to share filesets among different
processes. After this split, fileset infrastructure is used by both
sharedfileset.c and worker.c for the named temporary files that survive
across transactions.
Author: Dilip Kumar, based on suggestion by Andres Freund
Reviewed-by: Hou Zhijie, Masahiko Sawada, Amit Kapila
Discussion: https://postgr.es/m/E1mCC6U-0004Ik-Fs@gemulon.postgresql.org
---
src/backend/replication/logical/launcher.c | 3 +
src/backend/replication/logical/worker.c | 82 ++++++----
src/backend/storage/file/Makefile | 1 +
src/backend/storage/file/buffile.c | 84 +++++-----
src/backend/storage/file/fd.c | 2 +-
src/backend/storage/file/fileset.c | 205 ++++++++++++++++++++++++
src/backend/storage/file/sharedfileset.c | 244 +----------------------------
src/backend/utils/sort/logtape.c | 8 +-
src/backend/utils/sort/sharedtuplestore.c | 5 +-
src/include/replication/worker_internal.h | 1 +
src/include/storage/buffile.h | 14 +-
src/include/storage/fileset.h | 40 +++++
src/include/storage/sharedfileset.h | 14 +-
src/tools/pgindent/typedefs.list | 1 +
14 files changed, 368 insertions(+), 336 deletions(-)
create mode 100644 src/backend/storage/file/fileset.c
create mode 100644 src/include/storage/fileset.h
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index e3b11da..8b1772d 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -648,6 +648,9 @@ logicalrep_worker_onexit(int code, Datum arg)
logicalrep_worker_detach();
+ /* Cleanup filesets used for streaming transactions. */
+ logicalrep_worker_cleanupfileset();
+
ApplyLauncherWakeup();
}
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 38b493e..6adc6ddd 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -39,13 +39,13 @@
* BufFile infrastructure supports temporary files that exceed the OS file size
* limit, (b) provides a way for automatic clean up on the error and (c) provides
* a way to survive these files across local transactions and allow to open and
- * close at stream start and close. We decided to use SharedFileSet
+ * close at stream start and close. We decided to use FileSet
* infrastructure as without that it deletes the files on the closure of the
* file and if we decide to keep stream files open across the start/stop stream
* then it will consume a lot of memory (more than 8K for each BufFile and
* there could be multiple such BufFiles as the subscriber could receive
* multiple start/stop streams for different transactions before getting the
- * commit). Moreover, if we don't use SharedFileSet then we also need to invent
+ * commit). Moreover, if we don't use FileSet then we also need to invent
* a new way to pass filenames to BufFile APIs so that we are allowed to open
* the file we desired across multiple stream-open calls for the same
* transaction.
@@ -231,8 +231,8 @@ typedef struct ApplyExecutionData
typedef struct StreamXidHash
{
TransactionId xid; /* xid is the hash key and must be first */
- SharedFileSet *stream_fileset; /* shared file set for stream data */
- SharedFileSet *subxact_fileset; /* shared file set for subxact info */
+ FileSet *stream_fileset; /* file set for stream data */
+ FileSet *subxact_fileset; /* file set for subxact info */
} StreamXidHash;
static MemoryContext ApplyMessageContext = NULL;
@@ -255,8 +255,8 @@ static bool in_streamed_transaction = false;
static TransactionId stream_xid = InvalidTransactionId;
/*
- * Hash table for storing the streaming xid information along with shared file
- * set for streaming and subxact files.
+ * Hash table for storing the streaming xid information along with filesets
+ * for streaming and subxact files.
*/
static HTAB *xidhash = NULL;
@@ -1299,11 +1299,11 @@ apply_handle_stream_abort(StringInfo s)
/* open the changes file */
changes_filename(path, MyLogicalRepWorker->subid, xid);
- fd = BufFileOpenShared(ent->stream_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
/* OK, truncate the file at the right offset */
- BufFileTruncateShared(fd, subxact_data.subxacts[subidx].fileno,
- subxact_data.subxacts[subidx].offset);
+ BufFileTruncateFileSet(fd, subxact_data.subxacts[subidx].fileno,
+ subxact_data.subxacts[subidx].offset);
BufFileClose(fd);
/* discard the subxacts added later */
@@ -1355,7 +1355,7 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
errmsg_internal("transaction %u not found in stream XID hash table",
xid)));
- fd = BufFileOpenShared(ent->stream_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDONLY);
buffer = palloc(BLCKSZ);
initStringInfo(&s2);
@@ -2509,6 +2509,30 @@ UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
}
/*
+ * Cleanup filesets.
+ */
+void
+logicalrep_worker_cleanupfileset(void)
+{
+ HASH_SEQ_STATUS status;
+ StreamXidHash *hentry;
+
+ /* Remove all the pending stream and subxact filesets. */
+ if (xidhash)
+ {
+ hash_seq_init(&status, xidhash);
+ while ((hentry = (StreamXidHash *) hash_seq_search(&status)) != NULL)
+ {
+ FileSetDeleteAll(hentry->stream_fileset);
+
+ /* Delete the subxact fileset iff it is created. */
+ if (hentry->subxact_fileset)
+ FileSetDeleteAll(hentry->subxact_fileset);
+ }
+ }
+}
+
+/*
* Apply main loop.
*/
static void
@@ -2979,7 +3003,7 @@ subxact_info_write(Oid subid, TransactionId xid)
if (ent->subxact_fileset)
{
cleanup_subxact_info();
- SharedFileSetDeleteAll(ent->subxact_fileset);
+ FileSetDeleteAll(ent->subxact_fileset);
pfree(ent->subxact_fileset);
ent->subxact_fileset = NULL;
}
@@ -2997,18 +3021,18 @@ subxact_info_write(Oid subid, TransactionId xid)
MemoryContext oldctx;
/*
- * We need to maintain shared fileset across multiple stream
- * start/stop calls. So, need to allocate it in a persistent context.
+ * We need to maintain fileset across multiple stream start/stop
+ * calls. So, need to allocate it in a persistent context.
*/
oldctx = MemoryContextSwitchTo(ApplyContext);
- ent->subxact_fileset = palloc(sizeof(SharedFileSet));
- SharedFileSetInit(ent->subxact_fileset, NULL);
+ ent->subxact_fileset = palloc(sizeof(FileSet));
+ FileSetInit(ent->subxact_fileset);
MemoryContextSwitchTo(oldctx);
- fd = BufFileCreateShared(ent->subxact_fileset, path);
+ fd = BufFileCreateFileSet(ent->subxact_fileset, path);
}
else
- fd = BufFileOpenShared(ent->subxact_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDWR);
len = sizeof(SubXactInfo) * subxact_data.nsubxacts;
@@ -3062,7 +3086,7 @@ subxact_info_read(Oid subid, TransactionId xid)
subxact_filename(path, subid, xid);
- fd = BufFileOpenShared(ent->subxact_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDONLY);
/* read number of subxact items */
if (BufFileRead(fd, &subxact_data.nsubxacts,
@@ -3219,7 +3243,7 @@ stream_cleanup_files(Oid subid, TransactionId xid)
/* Delete the change file and release the stream fileset memory */
changes_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->stream_fileset);
+ FileSetDeleteAll(ent->stream_fileset);
pfree(ent->stream_fileset);
ent->stream_fileset = NULL;
@@ -3227,7 +3251,7 @@ stream_cleanup_files(Oid subid, TransactionId xid)
if (ent->subxact_fileset)
{
subxact_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->subxact_fileset);
+ FileSetDeleteAll(ent->subxact_fileset);
pfree(ent->subxact_fileset);
ent->subxact_fileset = NULL;
}
@@ -3243,8 +3267,8 @@ stream_cleanup_files(Oid subid, TransactionId xid)
*
* Open a file for streamed changes from a toplevel transaction identified
* by stream_xid (global variable). If it's the first chunk of streamed
- * changes for this transaction, initialize the shared fileset and create the
- * buffile, otherwise open the previously created file.
+ * changes for this transaction, initialize the fileset and create the buffile,
+ * otherwise open the previously created file.
*
* This can only be called at the beginning of a "streaming" block, i.e.
* between stream_start/stream_stop messages from the upstream.
@@ -3285,7 +3309,7 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
if (first_segment)
{
MemoryContext savectx;
- SharedFileSet *fileset;
+ FileSet *fileset;
if (found)
ereport(ERROR,
@@ -3293,16 +3317,16 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
/*
- * We need to maintain shared fileset across multiple stream
- * start/stop calls. So, need to allocate it in a persistent context.
+ * We need to maintain fileset across multiple stream start/stop
+ * calls. So, need to allocate it in a persistent context.
*/
savectx = MemoryContextSwitchTo(ApplyContext);
- fileset = palloc(sizeof(SharedFileSet));
+ fileset = palloc(sizeof(FileSet));
- SharedFileSetInit(fileset, NULL);
+ FileSetInit(fileset);
MemoryContextSwitchTo(savectx);
- stream_fd = BufFileCreateShared(fileset, path);
+ stream_fd = BufFileCreateFileSet(fileset, path);
/* Remember the fileset for the next stream of the same transaction */
ent->xid = xid;
@@ -3320,7 +3344,7 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
* Open the file and seek to the end of the file because we always
* append the changes file.
*/
- stream_fd = BufFileOpenShared(ent->stream_fileset, path, O_RDWR);
+ stream_fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
BufFileSeek(stream_fd, 0, 0, SEEK_END);
}
diff --git a/src/backend/storage/file/Makefile b/src/backend/storage/file/Makefile
index 5e1291b..660ac51 100644
--- a/src/backend/storage/file/Makefile
+++ b/src/backend/storage/file/Makefile
@@ -16,6 +16,7 @@ OBJS = \
buffile.o \
copydir.o \
fd.o \
+ fileset.o \
reinit.o \
sharedfileset.o
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index a4be5fe..5e5409d 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -39,7 +39,7 @@
* BufFile also supports temporary files that can be used by the single backend
* when the corresponding files need to be survived across the transaction and
* need to be opened and closed multiple times. Such files need to be created
- * as a member of a SharedFileSet.
+ * as a member of a FileSet.
*-------------------------------------------------------------------------
*/
@@ -77,8 +77,8 @@ struct BufFile
bool dirty; /* does buffer need to be written? */
bool readOnly; /* has the file been set to read only? */
- SharedFileSet *fileset; /* space for segment files if shared */
- const char *name; /* name of this BufFile if shared */
+ FileSet *fileset; /* space for fileset based segment files */
+ const char *name; /* name of fileset based BufFile */
/*
* resowner is the ResourceOwner to use for underlying temp files. (We
@@ -104,7 +104,7 @@ static void extendBufFile(BufFile *file);
static void BufFileLoadBuffer(BufFile *file);
static void BufFileDumpBuffer(BufFile *file);
static void BufFileFlush(BufFile *file);
-static File MakeNewSharedSegment(BufFile *file, int segment);
+static File MakeNewFileSetSegment(BufFile *file, int segment);
/*
* Create BufFile and perform the common initialization.
@@ -160,7 +160,7 @@ extendBufFile(BufFile *file)
if (file->fileset == NULL)
pfile = OpenTemporaryFile(file->isInterXact);
else
- pfile = MakeNewSharedSegment(file, file->numFiles);
+ pfile = MakeNewFileSetSegment(file, file->numFiles);
Assert(pfile >= 0);
@@ -214,34 +214,34 @@ BufFileCreateTemp(bool interXact)
* Build the name for a given segment of a given BufFile.
*/
static void
-SharedSegmentName(char *name, const char *buffile_name, int segment)
+FileSetSegmentName(char *name, const char *buffile_name, int segment)
{
snprintf(name, MAXPGPATH, "%s.%d", buffile_name, segment);
}
/*
- * Create a new segment file backing a shared BufFile.
+ * Create a new segment file backing a fileset based BufFile.
*/
static File
-MakeNewSharedSegment(BufFile *buffile, int segment)
+MakeNewFileSetSegment(BufFile *buffile, int segment)
{
char name[MAXPGPATH];
File file;
/*
* It is possible that there are files left over from before a crash
- * restart with the same name. In order for BufFileOpenShared() not to
+ * restart with the same name. In order for BufFileOpenFileSet() not to
* get confused about how many segments there are, we'll unlink the next
* segment number if it already exists.
*/
- SharedSegmentName(name, buffile->name, segment + 1);
- SharedFileSetDelete(buffile->fileset, name, true);
+ FileSetSegmentName(name, buffile->name, segment + 1);
+ FileSetDelete(buffile->fileset, name, true);
/* Create the new segment. */
- SharedSegmentName(name, buffile->name, segment);
- file = SharedFileSetCreate(buffile->fileset, name);
+ FileSetSegmentName(name, buffile->name, segment);
+ file = FileSetCreate(buffile->fileset, name);
- /* SharedFileSetCreate would've errored out */
+ /* FileSetCreate would've errored out */
Assert(file > 0);
return file;
@@ -251,15 +251,15 @@ MakeNewSharedSegment(BufFile *buffile, int segment)
* Create a BufFile that can be discovered and opened read-only by other
* backends that are attached to the same SharedFileSet using the same name.
*
- * The naming scheme for shared BufFiles is left up to the calling code. The
- * name will appear as part of one or more filenames on disk, and might
+ * The naming scheme for fileset based BufFiles is left up to the calling code.
+ * The name will appear as part of one or more filenames on disk, and might
* provide clues to administrators about which subsystem is generating
* temporary file data. Since each SharedFileSet object is backed by one or
* more uniquely named temporary directory, names don't conflict with
* unrelated SharedFileSet objects.
*/
BufFile *
-BufFileCreateShared(SharedFileSet *fileset, const char *name)
+BufFileCreateFileSet(FileSet *fileset, const char *name)
{
BufFile *file;
@@ -267,7 +267,7 @@ BufFileCreateShared(SharedFileSet *fileset, const char *name)
file->fileset = fileset;
file->name = pstrdup(name);
file->files = (File *) palloc(sizeof(File));
- file->files[0] = MakeNewSharedSegment(file, 0);
+ file->files[0] = MakeNewFileSetSegment(file, 0);
file->readOnly = false;
return file;
@@ -275,13 +275,13 @@ BufFileCreateShared(SharedFileSet *fileset, const char *name)
/*
* Open a file that was previously created in another backend (or this one)
- * with BufFileCreateShared in the same SharedFileSet using the same name.
+ * with BufFileCreateFileSet in the same FileSet using the same name.
* The backend that created the file must have called BufFileClose() or
- * BufFileExportShared() to make sure that it is ready to be opened by other
+ * BufFileExportFileSet() to make sure that it is ready to be opened by other
* backends and render it read-only.
*/
BufFile *
-BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
+BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
{
BufFile *file;
char segment_name[MAXPGPATH];
@@ -304,8 +304,8 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
files = repalloc(files, sizeof(File) * capacity);
}
/* Try to load a segment. */
- SharedSegmentName(segment_name, name, nfiles);
- files[nfiles] = SharedFileSetOpen(fileset, segment_name, mode);
+ FileSetSegmentName(segment_name, name, nfiles);
+ files[nfiles] = FileSetOpen(fileset, segment_name, mode);
if (files[nfiles] <= 0)
break;
++nfiles;
@@ -333,18 +333,18 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
}
/*
- * Delete a BufFile that was created by BufFileCreateShared in the given
- * SharedFileSet using the given name.
+ * Delete a BufFile that was created by BufFileCreateFileSet in the given
+ * FileSet using the given name.
*
* It is not necessary to delete files explicitly with this function. It is
* provided only as a way to delete files proactively, rather than waiting for
- * the SharedFileSet to be cleaned up.
+ * the FileSet to be cleaned up.
*
* Only one backend should attempt to delete a given name, and should know
* that it exists and has been exported or closed.
*/
void
-BufFileDeleteShared(SharedFileSet *fileset, const char *name)
+BufFileDeleteFileSet(FileSet *fileset, const char *name)
{
char segment_name[MAXPGPATH];
int segment = 0;
@@ -357,8 +357,8 @@ BufFileDeleteShared(SharedFileSet *fileset, const char *name)
*/
for (;;)
{
- SharedSegmentName(segment_name, name, segment);
- if (!SharedFileSetDelete(fileset, segment_name, true))
+ FileSetSegmentName(segment_name, name, segment);
+ if (!FileSetDelete(fileset, segment_name, true))
break;
found = true;
++segment;
@@ -367,16 +367,16 @@ BufFileDeleteShared(SharedFileSet *fileset, const char *name)
}
if (!found)
- elog(ERROR, "could not delete unknown shared BufFile \"%s\"", name);
+ elog(ERROR, "could not delete unknown BufFile \"%s\"", name);
}
/*
- * BufFileExportShared --- flush and make read-only, in preparation for sharing.
+ * BufFileExportFileSet --- flush and make read-only, in preparation for sharing.
*/
void
-BufFileExportShared(BufFile *file)
+BufFileExportFileSet(BufFile *file)
{
- /* Must be a file belonging to a SharedFileSet. */
+ /* Must be a file belonging to a FileSet. */
Assert(file->fileset != NULL);
/* It's probably a bug if someone calls this twice. */
@@ -785,7 +785,7 @@ BufFileTellBlock(BufFile *file)
#endif
/*
- * Return the current shared BufFile size.
+ * Return the current fileset based BufFile size.
*
* Counts any holes left behind by BufFileAppend as part of the size.
* ereport()s on failure.
@@ -811,8 +811,8 @@ BufFileSize(BufFile *file)
}
/*
- * Append the contents of source file (managed within shared fileset) to
- * end of target file (managed within same shared fileset).
+ * Append the contents of source file (managed within fileset) to
+ * end of target file (managed within same fileset).
*
* Note that operation subsumes ownership of underlying resources from
* "source". Caller should never call BufFileClose against source having
@@ -854,11 +854,11 @@ BufFileAppend(BufFile *target, BufFile *source)
}
/*
- * Truncate a BufFile created by BufFileCreateShared up to the given fileno and
- * the offset.
+ * Truncate a BufFile created by BufFileCreateFileSet up to the given fileno
+ * and the offset.
*/
void
-BufFileTruncateShared(BufFile *file, int fileno, off_t offset)
+BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset)
{
int numFiles = file->numFiles;
int newFile = fileno;
@@ -876,12 +876,12 @@ BufFileTruncateShared(BufFile *file, int fileno, off_t offset)
{
if ((i != fileno || offset == 0) && i != 0)
{
- SharedSegmentName(segment_name, file->name, i);
+ FileSetSegmentName(segment_name, file->name, i);
FileClose(file->files[i]);
- if (!SharedFileSetDelete(file->fileset, segment_name, true))
+ if (!FileSetDelete(file->fileset, segment_name, true))
ereport(ERROR,
(errcode_for_file_access(),
- errmsg("could not delete shared fileset \"%s\": %m",
+ errmsg("could not delete fileset \"%s\": %m",
segment_name)));
numFiles--;
newOffset = MAX_PHYSICAL_FILESIZE;
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index b58b399..433e283 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -1921,7 +1921,7 @@ PathNameDeleteTemporaryFile(const char *path, bool error_on_failure)
/*
* Unlike FileClose's automatic file deletion code, we tolerate
- * non-existence to support BufFileDeleteShared which doesn't know how
+ * non-existence to support BufFileDeleteFileSet which doesn't know how
* many segments it has to delete until it runs out.
*/
if (stat_errno == ENOENT)
diff --git a/src/backend/storage/file/fileset.c b/src/backend/storage/file/fileset.c
new file mode 100644
index 0000000..282ff12
--- /dev/null
+++ b/src/backend/storage/file/fileset.c
@@ -0,0 +1,205 @@
+/*-------------------------------------------------------------------------
+ *
+ * fileset.c
+ * Management of named temporary files.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/storage/file/fileset.c
+ *
+ * FileSets provide a temporary namespace (think directory) so that files can
+ * be discovered by name.
+ *
+ * FileSets can be used by backends when the temporary files need to be
+ * opened/closed multiple times and the underlying files need to survive across
+ * transactions.
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <limits.h>
+
+#include "catalog/pg_tablespace.h"
+#include "commands/tablespace.h"
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "storage/ipc.h"
+#include "storage/fileset.h"
+#include "utils/builtins.h"
+
+static void FileSetPath(char *path, FileSet *fileset, Oid tablespace);
+static void FilePath(char *path, FileSet *fileset, const char *name);
+static Oid ChooseTablespace(const FileSet *fileset, const char *name);
+
+/*
+ * Initialize a space for temporary files. This API can be used by shared
+ * fileset as well as if the temporary files are used only by single backend
+ * but the files need to be opened and closed multiple times and also the
+ * underlying files need to survive across transactions.
+ *
+ * The callers are expected to explicitly remove such files by using
+ * FileSetDelete/FileSetDeleteAll.
+ *
+ * Files will be distributed over the tablespaces configured in
+ * temp_tablespaces.
+ *
+ * Under the covers the set is one or more directories which will eventually
+ * be deleted.
+ */
+void
+FileSetInit(FileSet *fileset)
+{
+ static uint32 counter = 0;
+
+ fileset->creator_pid = MyProcPid;
+ fileset->number = counter;
+ counter = (counter + 1) % INT_MAX;
+
+ /* Capture the tablespace OIDs so that all backends agree on them. */
+ PrepareTempTablespaces();
+ fileset->ntablespaces =
+ GetTempTablespaces(&fileset->tablespaces[0],
+ lengthof(fileset->tablespaces));
+ if (fileset->ntablespaces == 0)
+ {
+ /* If the GUC is empty, use current database's default tablespace */
+ fileset->tablespaces[0] = MyDatabaseTableSpace;
+ fileset->ntablespaces = 1;
+ }
+ else
+ {
+ int i;
+
+ /*
+ * An entry of InvalidOid means use the default tablespace for the
+ * current database. Replace that now, to be sure that all users of
+ * the FileSet agree on what to do.
+ */
+ for (i = 0; i < fileset->ntablespaces; i++)
+ {
+ if (fileset->tablespaces[i] == InvalidOid)
+ fileset->tablespaces[i] = MyDatabaseTableSpace;
+ }
+ }
+}
+
+/*
+ * Create a new file in the given set.
+ */
+File
+FileSetCreate(FileSet *fileset, const char *name)
+{
+ char path[MAXPGPATH];
+ File file;
+
+ FilePath(path, fileset, name);
+ file = PathNameCreateTemporaryFile(path, false);
+
+ /* If we failed, see if we need to create the directory on demand. */
+ if (file <= 0)
+ {
+ char tempdirpath[MAXPGPATH];
+ char filesetpath[MAXPGPATH];
+ Oid tablespace = ChooseTablespace(fileset, name);
+
+ TempTablespacePath(tempdirpath, tablespace);
+ FileSetPath(filesetpath, fileset, tablespace);
+ PathNameCreateTemporaryDir(tempdirpath, filesetpath);
+ file = PathNameCreateTemporaryFile(path, true);
+ }
+
+ return file;
+}
+
+/*
+ * Open a file that was created with FileSetCreate().
+ */
+File
+FileSetOpen(FileSet *fileset, const char *name, int mode)
+{
+ char path[MAXPGPATH];
+ File file;
+
+ FilePath(path, fileset, name);
+ file = PathNameOpenTemporaryFile(path, mode);
+
+ return file;
+}
+
+/*
+ * Delete a file that was created with FileSetCreate().
+ *
+ * Return true if the file existed, false if didn't.
+ */
+bool
+FileSetDelete(FileSet *fileset, const char *name,
+ bool error_on_failure)
+{
+ char path[MAXPGPATH];
+
+ FilePath(path, fileset, name);
+
+ return PathNameDeleteTemporaryFile(path, error_on_failure);
+}
+
+/*
+ * Delete all files in the set.
+ */
+void
+FileSetDeleteAll(FileSet *fileset)
+{
+ char dirpath[MAXPGPATH];
+ int i;
+
+ /*
+ * Delete the directory we created in each tablespace. Doesn't fail
+ * because we use this in error cleanup paths, but can generate LOG
+ * message on IO error.
+ */
+ for (i = 0; i < fileset->ntablespaces; ++i)
+ {
+ FileSetPath(dirpath, fileset, fileset->tablespaces[i]);
+ PathNameDeleteTemporaryDir(dirpath);
+ }
+}
+
+/*
+ * Build the path for the directory holding the files backing a FileSet in a
+ * given tablespace.
+ */
+static void
+FileSetPath(char *path, FileSet *fileset, Oid tablespace)
+{
+ char tempdirpath[MAXPGPATH];
+
+ TempTablespacePath(tempdirpath, tablespace);
+ snprintf(path, MAXPGPATH, "%s/%s%lu.%u.fileset",
+ tempdirpath, PG_TEMP_FILE_PREFIX,
+ (unsigned long) fileset->creator_pid, fileset->number);
+}
+
+/*
+ * Sorting has to determine which tablespace a given temporary file belongs in.
+ */
+static Oid
+ChooseTablespace(const FileSet *fileset, const char *name)
+{
+ uint32 hash = hash_any((const unsigned char *) name, strlen(name));
+
+ return fileset->tablespaces[hash % fileset->ntablespaces];
+}
+
+/*
+ * Compute the full path of a file in a FileSet.
+ */
+static void
+FilePath(char *path, FileSet *fileset, const char *name)
+{
+ char dirpath[MAXPGPATH];
+
+ FileSetPath(dirpath, fileset, ChooseTablespace(fileset, name));
+ snprintf(path, MAXPGPATH, "%s/%s", dirpath, name);
+}
diff --git a/src/backend/storage/file/sharedfileset.c b/src/backend/storage/file/sharedfileset.c
index ed37c94..6a33fac 100644
--- a/src/backend/storage/file/sharedfileset.c
+++ b/src/backend/storage/file/sharedfileset.c
@@ -13,10 +13,6 @@
* files can be discovered by name, and a shared ownership semantics so that
* shared files survive until the last user detaches.
*
- * SharedFileSets can be used by backends when the temporary files need to be
- * opened/closed multiple times and the underlying files need to survive across
- * transactions.
- *
*-------------------------------------------------------------------------
*/
@@ -33,13 +29,7 @@
#include "storage/sharedfileset.h"
#include "utils/builtins.h"
-static List *filesetlist = NIL;
-
static void SharedFileSetOnDetach(dsm_segment *segment, Datum datum);
-static void SharedFileSetDeleteOnProcExit(int status, Datum arg);
-static void SharedFileSetPath(char *path, SharedFileSet *fileset, Oid tablespace);
-static void SharedFilePath(char *path, SharedFileSet *fileset, const char *name);
-static Oid ChooseTablespace(const SharedFileSet *fileset, const char *name);
/*
* Initialize a space for temporary files that can be opened by other backends.
@@ -47,77 +37,22 @@ static Oid ChooseTablespace(const SharedFileSet *fileset, const char *name);
* SharedFileSet with 'seg'. Any contained files will be deleted when the
* last backend detaches.
*
- * We can also use this interface if the temporary files are used only by
- * single backend but the files need to be opened and closed multiple times
- * and also the underlying files need to survive across transactions. For
- * such cases, dsm segment 'seg' should be passed as NULL. Callers are
- * expected to explicitly remove such files by using SharedFileSetDelete/
- * SharedFileSetDeleteAll or we remove such files on proc exit.
- *
- * Files will be distributed over the tablespaces configured in
- * temp_tablespaces.
- *
* Under the covers the set is one or more directories which will eventually
* be deleted.
*/
void
SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg)
{
- static uint32 counter = 0;
-
+ /* Initialize the shared fileset specific members. */
SpinLockInit(&fileset->mutex);
fileset->refcnt = 1;
- fileset->creator_pid = MyProcPid;
- fileset->number = counter;
- counter = (counter + 1) % INT_MAX;
-
- /* Capture the tablespace OIDs so that all backends agree on them. */
- PrepareTempTablespaces();
- fileset->ntablespaces =
- GetTempTablespaces(&fileset->tablespaces[0],
- lengthof(fileset->tablespaces));
- if (fileset->ntablespaces == 0)
- {
- /* If the GUC is empty, use current database's default tablespace */
- fileset->tablespaces[0] = MyDatabaseTableSpace;
- fileset->ntablespaces = 1;
- }
- else
- {
- int i;
- /*
- * An entry of InvalidOid means use the default tablespace for the
- * current database. Replace that now, to be sure that all users of
- * the SharedFileSet agree on what to do.
- */
- for (i = 0; i < fileset->ntablespaces; i++)
- {
- if (fileset->tablespaces[i] == InvalidOid)
- fileset->tablespaces[i] = MyDatabaseTableSpace;
- }
- }
+ /* Initialize the fileset. */
+ FileSetInit(&fileset->fs);
/* Register our cleanup callback. */
if (seg)
on_dsm_detach(seg, SharedFileSetOnDetach, PointerGetDatum(fileset));
- else
- {
- static bool registered_cleanup = false;
-
- if (!registered_cleanup)
- {
- /*
- * We must not have registered any fileset before registering the
- * fileset clean up.
- */
- Assert(filesetlist == NIL);
- on_proc_exit(SharedFileSetDeleteOnProcExit, 0);
- registered_cleanup = true;
- }
-
- filesetlist = lcons((void *) fileset, filesetlist);
- }
}
/*
@@ -148,86 +83,12 @@ SharedFileSetAttach(SharedFileSet *fileset, dsm_segment *seg)
}
/*
- * Create a new file in the given set.
- */
-File
-SharedFileSetCreate(SharedFileSet *fileset, const char *name)
-{
- char path[MAXPGPATH];
- File file;
-
- SharedFilePath(path, fileset, name);
- file = PathNameCreateTemporaryFile(path, false);
-
- /* If we failed, see if we need to create the directory on demand. */
- if (file <= 0)
- {
- char tempdirpath[MAXPGPATH];
- char filesetpath[MAXPGPATH];
- Oid tablespace = ChooseTablespace(fileset, name);
-
- TempTablespacePath(tempdirpath, tablespace);
- SharedFileSetPath(filesetpath, fileset, tablespace);
- PathNameCreateTemporaryDir(tempdirpath, filesetpath);
- file = PathNameCreateTemporaryFile(path, true);
- }
-
- return file;
-}
-
-/*
- * Open a file that was created with SharedFileSetCreate(), possibly in
- * another backend.
- */
-File
-SharedFileSetOpen(SharedFileSet *fileset, const char *name, int mode)
-{
- char path[MAXPGPATH];
- File file;
-
- SharedFilePath(path, fileset, name);
- file = PathNameOpenTemporaryFile(path, mode);
-
- return file;
-}
-
-/*
- * Delete a file that was created with SharedFileSetCreate().
- * Return true if the file existed, false if didn't.
- */
-bool
-SharedFileSetDelete(SharedFileSet *fileset, const char *name,
- bool error_on_failure)
-{
- char path[MAXPGPATH];
-
- SharedFilePath(path, fileset, name);
-
- return PathNameDeleteTemporaryFile(path, error_on_failure);
-}
-
-/*
* Delete all files in the set.
*/
void
SharedFileSetDeleteAll(SharedFileSet *fileset)
{
- char dirpath[MAXPGPATH];
- int i;
-
- /*
- * Delete the directory we created in each tablespace. Doesn't fail
- * because we use this in error cleanup paths, but can generate LOG
- * message on IO error.
- */
- for (i = 0; i < fileset->ntablespaces; ++i)
- {
- SharedFileSetPath(dirpath, fileset, fileset->tablespaces[i]);
- PathNameDeleteTemporaryDir(dirpath);
- }
-
- /* Unregister the shared fileset */
- SharedFileSetUnregister(fileset);
+ FileSetDeleteAll(&fileset->fs);
}
/*
@@ -255,100 +116,5 @@ SharedFileSetOnDetach(dsm_segment *segment, Datum datum)
* this function so we can safely access its data.
*/
if (unlink_all)
- SharedFileSetDeleteAll(fileset);
-}
-
-/*
- * Callback function that will be invoked on the process exit. This will
- * process the list of all the registered sharedfilesets and delete the
- * underlying files.
- */
-static void
-SharedFileSetDeleteOnProcExit(int status, Datum arg)
-{
- /*
- * Remove all the pending shared fileset entries. We don't use foreach()
- * here because SharedFileSetDeleteAll will remove the current element in
- * filesetlist. Though we have used foreach_delete_current() to remove the
- * element from filesetlist it could only fix up the state of one of the
- * loops, see SharedFileSetUnregister.
- */
- while (list_length(filesetlist) > 0)
- {
- SharedFileSet *fileset = (SharedFileSet *) linitial(filesetlist);
-
- SharedFileSetDeleteAll(fileset);
- }
-
- filesetlist = NIL;
-}
-
-/*
- * Unregister the shared fileset entry registered for cleanup on proc exit.
- */
-void
-SharedFileSetUnregister(SharedFileSet *input_fileset)
-{
- ListCell *l;
-
- /*
- * If the caller is following the dsm based cleanup then we don't maintain
- * the filesetlist so return.
- */
- if (filesetlist == NIL)
- return;
-
- foreach(l, filesetlist)
- {
- SharedFileSet *fileset = (SharedFileSet *) lfirst(l);
-
- /* Remove the entry from the list */
- if (input_fileset == fileset)
- {
- filesetlist = foreach_delete_current(filesetlist, l);
- return;
- }
- }
-
- /* Should have found a match */
- Assert(false);
-}
-
-/*
- * Build the path for the directory holding the files backing a SharedFileSet
- * in a given tablespace.
- */
-static void
-SharedFileSetPath(char *path, SharedFileSet *fileset, Oid tablespace)
-{
- char tempdirpath[MAXPGPATH];
-
- TempTablespacePath(tempdirpath, tablespace);
- snprintf(path, MAXPGPATH, "%s/%s%lu.%u.sharedfileset",
- tempdirpath, PG_TEMP_FILE_PREFIX,
- (unsigned long) fileset->creator_pid, fileset->number);
-}
-
-/*
- * Sorting hat to determine which tablespace a given shared temporary file
- * belongs in.
- */
-static Oid
-ChooseTablespace(const SharedFileSet *fileset, const char *name)
-{
- uint32 hash = hash_any((const unsigned char *) name, strlen(name));
-
- return fileset->tablespaces[hash % fileset->ntablespaces];
-}
-
-/*
- * Compute the full path of a file in a SharedFileSet.
- */
-static void
-SharedFilePath(char *path, SharedFileSet *fileset, const char *name)
-{
- char dirpath[MAXPGPATH];
-
- SharedFileSetPath(dirpath, fileset, ChooseTablespace(fileset, name));
- snprintf(path, MAXPGPATH, "%s/%s", dirpath, name);
+ FileSetDeleteAll(&fileset->fs);
}
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index cafc087..f7994d7 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -564,7 +564,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
lt = &lts->tapes[i];
pg_itoa(i, filename);
- file = BufFileOpenShared(fileset, filename, O_RDONLY);
+ file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY);
filesize = BufFileSize(file);
/*
@@ -610,7 +610,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
* offset).
*
* The only thing that currently prevents writing to the leader tape from
- * working is the fact that BufFiles opened using BufFileOpenShared() are
+ * working is the fact that BufFiles opened using BufFileOpenFileSet() are
* read-only by definition, but that could be changed if it seemed
* worthwhile. For now, writing to the leader tape will raise a "Bad file
* descriptor" error, so tuplesort must avoid writing to the leader tape
@@ -722,7 +722,7 @@ LogicalTapeSetCreate(int ntapes, bool preallocate, TapeShare *shared,
char filename[MAXPGPATH];
pg_itoa(worker, filename);
- lts->pfile = BufFileCreateShared(fileset, filename);
+ lts->pfile = BufFileCreateFileSet(&fileset->fs, filename);
}
else
lts->pfile = BufFileCreateTemp(false);
@@ -1096,7 +1096,7 @@ LogicalTapeFreeze(LogicalTapeSet *lts, int tapenum, TapeShare *share)
/* Handle extra steps when caller is to share its tapeset */
if (share)
{
- BufFileExportShared(lts->pfile);
+ BufFileExportFileSet(lts->pfile);
share->firstblocknumber = lt->firstBlockNumber;
}
}
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 57e35db..504ef1c 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -310,7 +310,8 @@ sts_puttuple(SharedTuplestoreAccessor *accessor, void *meta_data,
/* Create one. Only this backend will write into it. */
sts_filename(name, accessor, accessor->participant);
- accessor->write_file = BufFileCreateShared(accessor->fileset, name);
+ accessor->write_file =
+ BufFileCreateFileSet(&accessor->fileset->fs, name);
/* Set up the shared state for this backend's file. */
participant = &accessor->sts->participants[accessor->participant];
@@ -559,7 +560,7 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
sts_filename(name, accessor, accessor->read_participant);
accessor->read_file =
- BufFileOpenShared(accessor->fileset, name, O_RDONLY);
+ BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY);
}
/* Seek and load the chunk header. */
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index 41c7487..a6c9d4e 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -79,6 +79,7 @@ extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
extern void logicalrep_worker_stop(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
+extern void logicalrep_worker_cleanupfileset(void);
extern int logicalrep_sync_worker_count(Oid subid);
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 566523d..143eada 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -26,7 +26,7 @@
#ifndef BUFFILE_H
#define BUFFILE_H
-#include "storage/sharedfileset.h"
+#include "storage/fileset.h"
/* BufFile is an opaque type whose details are not known outside buffile.c. */
@@ -46,11 +46,11 @@ extern int BufFileSeekBlock(BufFile *file, long blknum);
extern int64 BufFileSize(BufFile *file);
extern long BufFileAppend(BufFile *target, BufFile *source);
-extern BufFile *BufFileCreateShared(SharedFileSet *fileset, const char *name);
-extern void BufFileExportShared(BufFile *file);
-extern BufFile *BufFileOpenShared(SharedFileSet *fileset, const char *name,
- int mode);
-extern void BufFileDeleteShared(SharedFileSet *fileset, const char *name);
-extern void BufFileTruncateShared(BufFile *file, int fileno, off_t offset);
+extern BufFile *BufFileCreateFileSet(FileSet *fileset, const char *name);
+extern void BufFileExportFileSet(BufFile *file);
+extern BufFile *BufFileOpenFileSet(FileSet *fileset, const char *name,
+ int mode);
+extern void BufFileDeleteFileSet(FileSet *fileset, const char *name);
+extern void BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset);
#endif /* BUFFILE_H */
diff --git a/src/include/storage/fileset.h b/src/include/storage/fileset.h
new file mode 100644
index 0000000..be0e097
--- /dev/null
+++ b/src/include/storage/fileset.h
@@ -0,0 +1,40 @@
+/*-------------------------------------------------------------------------
+ *
+ * fileset.h
+ * Management of named temporary files.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/fileset.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FILESET_H
+#define FILESET_H
+
+#include "storage/fd.h"
+
+/*
+ * A set of temporary files.
+ */
+typedef struct FileSet
+{
+ pid_t creator_pid; /* PID of the creating process */
+ uint32 number; /* per-PID identifier */
+ int ntablespaces; /* number of tablespaces to use */
+ Oid tablespaces[8]; /* OIDs of tablespaces to use. Assumes that
+ * it's rare that there are more temp
+ * tablespaces than this. */
+} FileSet;
+
+extern void FileSetInit(FileSet *fileset);
+extern File FileSetCreate(FileSet *fileset, const char *name);
+extern File FileSetOpen(FileSet *fileset, const char *name,
+ int mode);
+extern bool FileSetDelete(FileSet *fileset, const char *name,
+ bool error_on_failure);
+extern void FileSetDeleteAll(FileSet *fileset);
+
+#endif
diff --git a/src/include/storage/sharedfileset.h b/src/include/storage/sharedfileset.h
index 09ba121..59becfb 100644
--- a/src/include/storage/sharedfileset.h
+++ b/src/include/storage/sharedfileset.h
@@ -17,6 +17,7 @@
#include "storage/dsm.h"
#include "storage/fd.h"
+#include "storage/fileset.h"
#include "storage/spin.h"
/*
@@ -24,24 +25,13 @@
*/
typedef struct SharedFileSet
{
- pid_t creator_pid; /* PID of the creating process */
- uint32 number; /* per-PID identifier */
+ FileSet fs;
slock_t mutex; /* mutex protecting the reference count */
int refcnt; /* number of attached backends */
- int ntablespaces; /* number of tablespaces to use */
- Oid tablespaces[8]; /* OIDs of tablespaces to use. Assumes that
- * it's rare that there more than temp
- * tablespaces. */
} SharedFileSet;
extern void SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg);
extern void SharedFileSetAttach(SharedFileSet *fileset, dsm_segment *seg);
-extern File SharedFileSetCreate(SharedFileSet *fileset, const char *name);
-extern File SharedFileSetOpen(SharedFileSet *fileset, const char *name,
- int mode);
-extern bool SharedFileSetDelete(SharedFileSet *fileset, const char *name,
- bool error_on_failure);
extern void SharedFileSetDeleteAll(SharedFileSet *fileset);
-extern void SharedFileSetUnregister(SharedFileSet *input_fileset);
#endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 37cf4b2..4770740 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -708,6 +708,7 @@ File
FileFdwExecutionState
FileFdwPlanState
FileNameMap
+FileSet
FileTag
FinalPathExtraData
FindColsContext
--
1.8.3.1
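The patch above renames the segment- and directory-naming helpers but keeps the conventions themselves: each BufFile segment is named "<name>.<n>", and the per-tablespace fileset directory is "<tempdir>/pgsql_tmp<creator_pid>.<number>.sharedfileset" (the suffix used by the removed SharedFileSetPath(); the new FileSetPath() may use a different suffix). A standalone sketch of those two snprintf conventions, re-typed outside the PostgreSQL tree with MAXPGPATH and PG_TEMP_FILE_PREFIX hard-coded, is:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXPGPATH 1024
#define PG_TEMP_FILE_PREFIX "pgsql_tmp"

/* Mirror of FileSetSegmentName(): each BufFile segment is "<name>.<n>". */
static void
fileset_segment_name(char *name, const char *buffile_name, int segment)
{
	snprintf(name, MAXPGPATH, "%s.%d", buffile_name, segment);
}

/*
 * Mirror of the directory name built by the removed SharedFileSetPath():
 * "<tempdir>/pgsql_tmp<creator_pid>.<number>.sharedfileset".
 */
static void
fileset_dir_name(char *path, const char *tempdirpath,
				 unsigned long creator_pid, unsigned number)
{
	snprintf(path, MAXPGPATH, "%s/%s%lu.%u.sharedfileset",
			 tempdirpath, PG_TEMP_FILE_PREFIX, creator_pid, number);
}
```

This is why MakeNewFileSetSegment() can preemptively unlink segment n+1: the next segment's name is fully determined by the BufFile name, so a leftover file from before a crash restart would otherwise confuse BufFileOpenFileSet()'s segment count.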
On Wed, Aug 25, 2021 at 9:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Aug 24, 2021 at 3:55 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Tue, Aug 24, 2021 at 12:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
The first patch looks good to me. I have made minor changes to the
attached patch. The changes include: fixing compilation warning, made
some comment changes, ran pgindent, and few other cosmetic changes. If
you are fine with the attached, then kindly rebase the second patch
atop it.
The patch looks good to me.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
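The removed ChooseTablespace() (carried over into the new fileset.c) spreads a set's files across the configured temp tablespaces by hashing the file name, so every backend maps a given name to the same tablespace without coordination. A standalone sketch of the idea, substituting a simple FNV-1a hash for PostgreSQL's hash_any() (not reproduced here):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for hash_any(): 32-bit FNV-1a over the file name. */
static uint32_t
name_hash(const char *name)
{
	uint32_t	h = 2166136261u;

	for (const char *p = name; *p; p++)
	{
		h ^= (uint8_t) *p;
		h *= 16777619u;
	}
	return h;
}

/* Mirror of ChooseTablespace(): hash the name, take it modulo the count. */
static int
choose_tablespace_index(const char *name, int ntablespaces)
{
	return (int) (name_hash(name) % (uint32_t) ntablespaces);
}
```

The property that matters is determinism, not distribution quality: the same name always lands in the same slot, so SharedFilePath()/FilePath() can recompute a file's tablespace from its name alone when reopening or deleting it.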
On Wed, Aug 25, 2021 at 5:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Aug 24, 2021 at 3:55 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Tue, Aug 24, 2021 at 12:26 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
The first patch looks good to me. I have made minor changes to the
attached patch. The changes include: fixing compilation warning, made
some comment changes, ran pgindent, and few other cosmetic changes. If
you are fine with the attached, then kindly rebase the second patch
atop it.
The patch looks good to me, I have rebased 0002 atop this patch and also
done some cosmetic fixes in 0002.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
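The central layering change in v5-0001 is that SharedFileSet now embeds a FileSet as a member named fs, keeping only the DSM-specific mutex and refcnt for itself, and callers such as sts_puttuple() pass &fileset->fs down to the fileset layer. A minimal standalone sketch of that embedding (field names taken from the patch; slock_t and Oid are simplified to plain ints here):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Simplified mirror of the structs from fileset.h and sharedfileset.h. */
typedef struct FileSet
{
	pid_t		creator_pid;	/* PID of the creating process */
	unsigned	number;			/* per-PID identifier */
	int			ntablespaces;	/* number of tablespaces to use */
	unsigned	tablespaces[8];	/* OIDs of tablespaces to use */
} FileSet;

typedef struct SharedFileSet
{
	FileSet		fs;				/* embedded fileset, first member in the patch */
	int			mutex;			/* stand-in for slock_t */
	int			refcnt;			/* number of attached backends */
} SharedFileSet;
```

Because the call sites name the member explicitly (&sfs->fs rather than a pointer cast), nothing actually depends on fs being the first member; placing it first simply keeps the shared part at the top and lets backends that need no cross-process sharing, such as the logical replication apply worker, allocate a bare FileSet instead.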
Attachments:
v5-0001-Refactor-sharedfileset.c-to-separate-out-fileset-.patch (text/x-patch, US-ASCII)
From 3558cdc8a4f77e3c9092bbaeded451eaaa1feca4 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Wed, 25 Aug 2021 15:00:19 +0530
Subject: [PATCH v5 1/2] Refactor sharedfileset.c to separate out fileset
implementation.
Move fileset related implementation out of sharedfileset.c to allow its
usage by backends that don't want to share filesets among different
processes. After this split, fileset infrastructure is used by both
sharedfileset.c and worker.c for the named temporary files that survive
across transactions.
Author: Dilip Kumar, based on suggestion by Andres Freund
Reviewed-by: Hou Zhijie, Masahiko Sawada, Amit Kapila
Discussion: https://postgr.es/m/E1mCC6U-0004Ik-Fs@gemulon.postgresql.org
---
src/backend/replication/logical/launcher.c | 3 +
src/backend/replication/logical/worker.c | 82 ++++++----
src/backend/storage/file/Makefile | 1 +
src/backend/storage/file/buffile.c | 84 +++++-----
src/backend/storage/file/fd.c | 2 +-
src/backend/storage/file/fileset.c | 205 ++++++++++++++++++++++++
src/backend/storage/file/sharedfileset.c | 244 +----------------------------
src/backend/utils/sort/logtape.c | 8 +-
src/backend/utils/sort/sharedtuplestore.c | 5 +-
src/include/replication/worker_internal.h | 1 +
src/include/storage/buffile.h | 14 +-
src/include/storage/fileset.h | 40 +++++
src/include/storage/sharedfileset.h | 14 +-
src/tools/pgindent/typedefs.list | 1 +
14 files changed, 368 insertions(+), 336 deletions(-)
create mode 100644 src/backend/storage/file/fileset.c
create mode 100644 src/include/storage/fileset.h
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index e3b11da..8b1772d 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -648,6 +648,9 @@ logicalrep_worker_onexit(int code, Datum arg)
logicalrep_worker_detach();
+ /* Cleanup filesets used for streaming transactions. */
+ logicalrep_worker_cleanupfileset();
+
ApplyLauncherWakeup();
}
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 38b493e..6adc6ddd 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -39,13 +39,13 @@
* BufFile infrastructure supports temporary files that exceed the OS file size
* limit, (b) provides a way for automatic clean up on the error and (c) provides
* a way to survive these files across local transactions and allow to open and
- * close at stream start and close. We decided to use SharedFileSet
+ * close at stream start and close. We decided to use FileSet
* infrastructure as without that it deletes the files on the closure of the
* file and if we decide to keep stream files open across the start/stop stream
* then it will consume a lot of memory (more than 8K for each BufFile and
* there could be multiple such BufFiles as the subscriber could receive
* multiple start/stop streams for different transactions before getting the
- * commit). Moreover, if we don't use SharedFileSet then we also need to invent
+ * commit). Moreover, if we don't use FileSet then we also need to invent
* a new way to pass filenames to BufFile APIs so that we are allowed to open
* the file we desired across multiple stream-open calls for the same
* transaction.
@@ -231,8 +231,8 @@ typedef struct ApplyExecutionData
typedef struct StreamXidHash
{
TransactionId xid; /* xid is the hash key and must be first */
- SharedFileSet *stream_fileset; /* shared file set for stream data */
- SharedFileSet *subxact_fileset; /* shared file set for subxact info */
+ FileSet *stream_fileset; /* file set for stream data */
+ FileSet *subxact_fileset; /* file set for subxact info */
} StreamXidHash;
static MemoryContext ApplyMessageContext = NULL;
@@ -255,8 +255,8 @@ static bool in_streamed_transaction = false;
static TransactionId stream_xid = InvalidTransactionId;
/*
- * Hash table for storing the streaming xid information along with shared file
- * set for streaming and subxact files.
+ * Hash table for storing the streaming xid information along with filesets
+ * for streaming and subxact files.
*/
static HTAB *xidhash = NULL;
@@ -1299,11 +1299,11 @@ apply_handle_stream_abort(StringInfo s)
/* open the changes file */
changes_filename(path, MyLogicalRepWorker->subid, xid);
- fd = BufFileOpenShared(ent->stream_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
/* OK, truncate the file at the right offset */
- BufFileTruncateShared(fd, subxact_data.subxacts[subidx].fileno,
- subxact_data.subxacts[subidx].offset);
+ BufFileTruncateFileSet(fd, subxact_data.subxacts[subidx].fileno,
+ subxact_data.subxacts[subidx].offset);
BufFileClose(fd);
/* discard the subxacts added later */
@@ -1355,7 +1355,7 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
errmsg_internal("transaction %u not found in stream XID hash table",
xid)));
- fd = BufFileOpenShared(ent->stream_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDONLY);
buffer = palloc(BLCKSZ);
initStringInfo(&s2);
@@ -2509,6 +2509,30 @@ UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
}
/*
+ * Cleanup filesets.
+ */
+void
+logicalrep_worker_cleanupfileset(void)
+{
+ HASH_SEQ_STATUS status;
+ StreamXidHash *hentry;
+
+ /* Remove all the pending stream and subxact filesets. */
+ if (xidhash)
+ {
+ hash_seq_init(&status, xidhash);
+ while ((hentry = (StreamXidHash *) hash_seq_search(&status)) != NULL)
+ {
+ FileSetDeleteAll(hentry->stream_fileset);
+
+ /* Delete the subxact fileset iff it is created. */
+ if (hentry->subxact_fileset)
+ FileSetDeleteAll(hentry->subxact_fileset);
+ }
+ }
+}
+
+/*
* Apply main loop.
*/
static void
@@ -2979,7 +3003,7 @@ subxact_info_write(Oid subid, TransactionId xid)
if (ent->subxact_fileset)
{
cleanup_subxact_info();
- SharedFileSetDeleteAll(ent->subxact_fileset);
+ FileSetDeleteAll(ent->subxact_fileset);
pfree(ent->subxact_fileset);
ent->subxact_fileset = NULL;
}
@@ -2997,18 +3021,18 @@ subxact_info_write(Oid subid, TransactionId xid)
MemoryContext oldctx;
/*
- * We need to maintain shared fileset across multiple stream
- * start/stop calls. So, need to allocate it in a persistent context.
+ * We need to maintain fileset across multiple stream start/stop
+ * calls. So, need to allocate it in a persistent context.
*/
oldctx = MemoryContextSwitchTo(ApplyContext);
- ent->subxact_fileset = palloc(sizeof(SharedFileSet));
- SharedFileSetInit(ent->subxact_fileset, NULL);
+ ent->subxact_fileset = palloc(sizeof(FileSet));
+ FileSetInit(ent->subxact_fileset);
MemoryContextSwitchTo(oldctx);
- fd = BufFileCreateShared(ent->subxact_fileset, path);
+ fd = BufFileCreateFileSet(ent->subxact_fileset, path);
}
else
- fd = BufFileOpenShared(ent->subxact_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDWR);
len = sizeof(SubXactInfo) * subxact_data.nsubxacts;
@@ -3062,7 +3086,7 @@ subxact_info_read(Oid subid, TransactionId xid)
subxact_filename(path, subid, xid);
- fd = BufFileOpenShared(ent->subxact_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDONLY);
/* read number of subxact items */
if (BufFileRead(fd, &subxact_data.nsubxacts,
@@ -3219,7 +3243,7 @@ stream_cleanup_files(Oid subid, TransactionId xid)
/* Delete the change file and release the stream fileset memory */
changes_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->stream_fileset);
+ FileSetDeleteAll(ent->stream_fileset);
pfree(ent->stream_fileset);
ent->stream_fileset = NULL;
@@ -3227,7 +3251,7 @@ stream_cleanup_files(Oid subid, TransactionId xid)
if (ent->subxact_fileset)
{
subxact_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->subxact_fileset);
+ FileSetDeleteAll(ent->subxact_fileset);
pfree(ent->subxact_fileset);
ent->subxact_fileset = NULL;
}
@@ -3243,8 +3267,8 @@ stream_cleanup_files(Oid subid, TransactionId xid)
*
* Open a file for streamed changes from a toplevel transaction identified
* by stream_xid (global variable). If it's the first chunk of streamed
- * changes for this transaction, initialize the shared fileset and create the
- * buffile, otherwise open the previously created file.
+ * changes for this transaction, initialize the fileset and create the buffile,
+ * otherwise open the previously created file.
*
* This can only be called at the beginning of a "streaming" block, i.e.
* between stream_start/stream_stop messages from the upstream.
@@ -3285,7 +3309,7 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
if (first_segment)
{
MemoryContext savectx;
- SharedFileSet *fileset;
+ FileSet *fileset;
if (found)
ereport(ERROR,
@@ -3293,16 +3317,16 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
/*
- * We need to maintain shared fileset across multiple stream
- * start/stop calls. So, need to allocate it in a persistent context.
+ * We need to maintain fileset across multiple stream start/stop
+ * calls. So, need to allocate it in a persistent context.
*/
savectx = MemoryContextSwitchTo(ApplyContext);
- fileset = palloc(sizeof(SharedFileSet));
+ fileset = palloc(sizeof(FileSet));
- SharedFileSetInit(fileset, NULL);
+ FileSetInit(fileset);
MemoryContextSwitchTo(savectx);
- stream_fd = BufFileCreateShared(fileset, path);
+ stream_fd = BufFileCreateFileSet(fileset, path);
/* Remember the fileset for the next stream of the same transaction */
ent->xid = xid;
@@ -3320,7 +3344,7 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
* Open the file and seek to the end of the file because we always
* append the changes file.
*/
- stream_fd = BufFileOpenShared(ent->stream_fileset, path, O_RDWR);
+ stream_fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
BufFileSeek(stream_fd, 0, 0, SEEK_END);
}
diff --git a/src/backend/storage/file/Makefile b/src/backend/storage/file/Makefile
index 5e1291b..660ac51 100644
--- a/src/backend/storage/file/Makefile
+++ b/src/backend/storage/file/Makefile
@@ -16,6 +16,7 @@ OBJS = \
buffile.o \
copydir.o \
fd.o \
+ fileset.o \
reinit.o \
sharedfileset.o
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index a4be5fe..5e5409d 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -39,7 +39,7 @@
* BufFile also supports temporary files that can be used by the single backend
* when the corresponding files need to be survived across the transaction and
* need to be opened and closed multiple times. Such files need to be created
- * as a member of a SharedFileSet.
+ * as a member of a FileSet.
*-------------------------------------------------------------------------
*/
@@ -77,8 +77,8 @@ struct BufFile
bool dirty; /* does buffer need to be written? */
bool readOnly; /* has the file been set to read only? */
- SharedFileSet *fileset; /* space for segment files if shared */
- const char *name; /* name of this BufFile if shared */
+ FileSet *fileset; /* space for fileset based segment files */
+ const char *name; /* name of fileset based BufFile */
/*
* resowner is the ResourceOwner to use for underlying temp files. (We
@@ -104,7 +104,7 @@ static void extendBufFile(BufFile *file);
static void BufFileLoadBuffer(BufFile *file);
static void BufFileDumpBuffer(BufFile *file);
static void BufFileFlush(BufFile *file);
-static File MakeNewSharedSegment(BufFile *file, int segment);
+static File MakeNewFileSetSegment(BufFile *file, int segment);
/*
* Create BufFile and perform the common initialization.
@@ -160,7 +160,7 @@ extendBufFile(BufFile *file)
if (file->fileset == NULL)
pfile = OpenTemporaryFile(file->isInterXact);
else
- pfile = MakeNewSharedSegment(file, file->numFiles);
+ pfile = MakeNewFileSetSegment(file, file->numFiles);
Assert(pfile >= 0);
@@ -214,34 +214,34 @@ BufFileCreateTemp(bool interXact)
* Build the name for a given segment of a given BufFile.
*/
static void
-SharedSegmentName(char *name, const char *buffile_name, int segment)
+FileSetSegmentName(char *name, const char *buffile_name, int segment)
{
snprintf(name, MAXPGPATH, "%s.%d", buffile_name, segment);
}
/*
- * Create a new segment file backing a shared BufFile.
+ * Create a new segment file backing a fileset based BufFile.
*/
static File
-MakeNewSharedSegment(BufFile *buffile, int segment)
+MakeNewFileSetSegment(BufFile *buffile, int segment)
{
char name[MAXPGPATH];
File file;
/*
* It is possible that there are files left over from before a crash
- * restart with the same name. In order for BufFileOpenShared() not to
+ * restart with the same name. In order for BufFileOpenFileSet() not to
* get confused about how many segments there are, we'll unlink the next
* segment number if it already exists.
*/
- SharedSegmentName(name, buffile->name, segment + 1);
- SharedFileSetDelete(buffile->fileset, name, true);
+ FileSetSegmentName(name, buffile->name, segment + 1);
+ FileSetDelete(buffile->fileset, name, true);
/* Create the new segment. */
- SharedSegmentName(name, buffile->name, segment);
- file = SharedFileSetCreate(buffile->fileset, name);
+ FileSetSegmentName(name, buffile->name, segment);
+ file = FileSetCreate(buffile->fileset, name);
- /* SharedFileSetCreate would've errored out */
+ /* FileSetCreate would've errored out */
Assert(file > 0);
return file;
@@ -251,15 +251,15 @@ MakeNewSharedSegment(BufFile *buffile, int segment)
* Create a BufFile that can be discovered and opened read-only by other
* backends that are attached to the same SharedFileSet using the same name.
*
- * The naming scheme for shared BufFiles is left up to the calling code. The
- * name will appear as part of one or more filenames on disk, and might
+ * The naming scheme for fileset based BufFiles is left up to the calling code.
+ * The name will appear as part of one or more filenames on disk, and might
* provide clues to administrators about which subsystem is generating
* temporary file data. Since each SharedFileSet object is backed by one or
* more uniquely named temporary directory, names don't conflict with
* unrelated SharedFileSet objects.
*/
BufFile *
-BufFileCreateShared(SharedFileSet *fileset, const char *name)
+BufFileCreateFileSet(FileSet *fileset, const char *name)
{
BufFile *file;
@@ -267,7 +267,7 @@ BufFileCreateShared(SharedFileSet *fileset, const char *name)
file->fileset = fileset;
file->name = pstrdup(name);
file->files = (File *) palloc(sizeof(File));
- file->files[0] = MakeNewSharedSegment(file, 0);
+ file->files[0] = MakeNewFileSetSegment(file, 0);
file->readOnly = false;
return file;
@@ -275,13 +275,13 @@ BufFileCreateShared(SharedFileSet *fileset, const char *name)
/*
* Open a file that was previously created in another backend (or this one)
- * with BufFileCreateShared in the same SharedFileSet using the same name.
+ * with BufFileCreateFileSet in the same FileSet using the same name.
* The backend that created the file must have called BufFileClose() or
- * BufFileExportShared() to make sure that it is ready to be opened by other
+ * BufFileExportFileSet() to make sure that it is ready to be opened by other
* backends and render it read-only.
*/
BufFile *
-BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
+BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
{
BufFile *file;
char segment_name[MAXPGPATH];
@@ -304,8 +304,8 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
files = repalloc(files, sizeof(File) * capacity);
}
/* Try to load a segment. */
- SharedSegmentName(segment_name, name, nfiles);
- files[nfiles] = SharedFileSetOpen(fileset, segment_name, mode);
+ FileSetSegmentName(segment_name, name, nfiles);
+ files[nfiles] = FileSetOpen(fileset, segment_name, mode);
if (files[nfiles] <= 0)
break;
++nfiles;
@@ -333,18 +333,18 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
}
/*
- * Delete a BufFile that was created by BufFileCreateShared in the given
- * SharedFileSet using the given name.
+ * Delete a BufFile that was created by BufFileCreateFileSet in the given
+ * FileSet using the given name.
*
* It is not necessary to delete files explicitly with this function. It is
* provided only as a way to delete files proactively, rather than waiting for
- * the SharedFileSet to be cleaned up.
+ * the FileSet to be cleaned up.
*
* Only one backend should attempt to delete a given name, and should know
* that it exists and has been exported or closed.
*/
void
-BufFileDeleteShared(SharedFileSet *fileset, const char *name)
+BufFileDeleteFileSet(FileSet *fileset, const char *name)
{
char segment_name[MAXPGPATH];
int segment = 0;
@@ -357,8 +357,8 @@ BufFileDeleteShared(SharedFileSet *fileset, const char *name)
*/
for (;;)
{
- SharedSegmentName(segment_name, name, segment);
- if (!SharedFileSetDelete(fileset, segment_name, true))
+ FileSetSegmentName(segment_name, name, segment);
+ if (!FileSetDelete(fileset, segment_name, true))
break;
found = true;
++segment;
@@ -367,16 +367,16 @@ BufFileDeleteShared(SharedFileSet *fileset, const char *name)
}
if (!found)
- elog(ERROR, "could not delete unknown shared BufFile \"%s\"", name);
+ elog(ERROR, "could not delete unknown BufFile \"%s\"", name);
}
/*
- * BufFileExportShared --- flush and make read-only, in preparation for sharing.
+ * BufFileExportFileSet --- flush and make read-only, in preparation for sharing.
*/
void
-BufFileExportShared(BufFile *file)
+BufFileExportFileSet(BufFile *file)
{
- /* Must be a file belonging to a SharedFileSet. */
+ /* Must be a file belonging to a FileSet. */
Assert(file->fileset != NULL);
/* It's probably a bug if someone calls this twice. */
@@ -785,7 +785,7 @@ BufFileTellBlock(BufFile *file)
#endif
/*
- * Return the current shared BufFile size.
+ * Return the current fileset based BufFile size.
*
* Counts any holes left behind by BufFileAppend as part of the size.
* ereport()s on failure.
@@ -811,8 +811,8 @@ BufFileSize(BufFile *file)
}
/*
- * Append the contents of source file (managed within shared fileset) to
- * end of target file (managed within same shared fileset).
+ * Append the contents of source file (managed within fileset) to
+ * end of target file (managed within same fileset).
*
* Note that operation subsumes ownership of underlying resources from
* "source". Caller should never call BufFileClose against source having
@@ -854,11 +854,11 @@ BufFileAppend(BufFile *target, BufFile *source)
}
/*
- * Truncate a BufFile created by BufFileCreateShared up to the given fileno and
- * the offset.
+ * Truncate a BufFile created by BufFileCreateFileSet up to the given fileno
+ * and the offset.
*/
void
-BufFileTruncateShared(BufFile *file, int fileno, off_t offset)
+BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset)
{
int numFiles = file->numFiles;
int newFile = fileno;
@@ -876,12 +876,12 @@ BufFileTruncateShared(BufFile *file, int fileno, off_t offset)
{
if ((i != fileno || offset == 0) && i != 0)
{
- SharedSegmentName(segment_name, file->name, i);
+ FileSetSegmentName(segment_name, file->name, i);
FileClose(file->files[i]);
- if (!SharedFileSetDelete(file->fileset, segment_name, true))
+ if (!FileSetDelete(file->fileset, segment_name, true))
ereport(ERROR,
(errcode_for_file_access(),
- errmsg("could not delete shared fileset \"%s\": %m",
+ errmsg("could not delete fileset \"%s\": %m",
segment_name)));
numFiles--;
newOffset = MAX_PHYSICAL_FILESIZE;
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index b58b399..433e283 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -1921,7 +1921,7 @@ PathNameDeleteTemporaryFile(const char *path, bool error_on_failure)
/*
* Unlike FileClose's automatic file deletion code, we tolerate
- * non-existence to support BufFileDeleteShared which doesn't know how
+ * non-existence to support BufFileDeleteFileSet which doesn't know how
* many segments it has to delete until it runs out.
*/
if (stat_errno == ENOENT)
diff --git a/src/backend/storage/file/fileset.c b/src/backend/storage/file/fileset.c
new file mode 100644
index 0000000..282ff12
--- /dev/null
+++ b/src/backend/storage/file/fileset.c
@@ -0,0 +1,205 @@
+/*-------------------------------------------------------------------------
+ *
+ * fileset.c
+ * Management of named temporary files.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/storage/file/fileset.c
+ *
+ * FileSets provide a temporary namespace (think directory) so that files can
+ * be discovered by name.
+ *
+ * FileSets can be used by backends when the temporary files need to be
+ * opened/closed multiple times and the underlying files need to survive across
+ * transactions.
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <limits.h>
+
+#include "catalog/pg_tablespace.h"
+#include "commands/tablespace.h"
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "storage/ipc.h"
+#include "storage/fileset.h"
+#include "utils/builtins.h"
+
+static void FileSetPath(char *path, FileSet *fileset, Oid tablespace);
+static void FilePath(char *path, FileSet *fileset, const char *name);
+static Oid ChooseTablespace(const FileSet *fileset, const char *name);
+
+/*
+ * Initialize a space for temporary files. This API can be used by a shared
+ * fileset, and also by a single backend when its temporary files need to be
+ * opened and closed multiple times and the underlying files need to survive
+ * across transactions.
+ *
+ * The callers are expected to explicitly remove such files by using
+ * FileSetDelete/FileSetDeleteAll.
+ *
+ * Files will be distributed over the tablespaces configured in
+ * temp_tablespaces.
+ *
+ * Under the covers the set is one or more directories which will eventually
+ * be deleted.
+ */
+void
+FileSetInit(FileSet *fileset)
+{
+ static uint32 counter = 0;
+
+ fileset->creator_pid = MyProcPid;
+ fileset->number = counter;
+ counter = (counter + 1) % INT_MAX;
+
+ /* Capture the tablespace OIDs so that all backends agree on them. */
+ PrepareTempTablespaces();
+ fileset->ntablespaces =
+ GetTempTablespaces(&fileset->tablespaces[0],
+ lengthof(fileset->tablespaces));
+ if (fileset->ntablespaces == 0)
+ {
+ /* If the GUC is empty, use current database's default tablespace */
+ fileset->tablespaces[0] = MyDatabaseTableSpace;
+ fileset->ntablespaces = 1;
+ }
+ else
+ {
+ int i;
+
+ /*
+ * An entry of InvalidOid means use the default tablespace for the
+ * current database. Replace that now, to be sure that all users of
+ * the FileSet agree on what to do.
+ */
+ for (i = 0; i < fileset->ntablespaces; i++)
+ {
+ if (fileset->tablespaces[i] == InvalidOid)
+ fileset->tablespaces[i] = MyDatabaseTableSpace;
+ }
+ }
+}
+
+/*
+ * Create a new file in the given set.
+ */
+File
+FileSetCreate(FileSet *fileset, const char *name)
+{
+ char path[MAXPGPATH];
+ File file;
+
+ FilePath(path, fileset, name);
+ file = PathNameCreateTemporaryFile(path, false);
+
+ /* If we failed, see if we need to create the directory on demand. */
+ if (file <= 0)
+ {
+ char tempdirpath[MAXPGPATH];
+ char filesetpath[MAXPGPATH];
+ Oid tablespace = ChooseTablespace(fileset, name);
+
+ TempTablespacePath(tempdirpath, tablespace);
+ FileSetPath(filesetpath, fileset, tablespace);
+ PathNameCreateTemporaryDir(tempdirpath, filesetpath);
+ file = PathNameCreateTemporaryFile(path, true);
+ }
+
+ return file;
+}
+
+/*
+ * Open a file that was created with FileSetCreate(). */
+File
+FileSetOpen(FileSet *fileset, const char *name, int mode)
+{
+ char path[MAXPGPATH];
+ File file;
+
+ FilePath(path, fileset, name);
+ file = PathNameOpenTemporaryFile(path, mode);
+
+ return file;
+}
+
+/*
+ * Delete a file that was created with FileSetCreate().
+ *
+ * Return true if the file existed, false if it didn't.
+ */
+bool
+FileSetDelete(FileSet *fileset, const char *name,
+ bool error_on_failure)
+{
+ char path[MAXPGPATH];
+
+ FilePath(path, fileset, name);
+
+ return PathNameDeleteTemporaryFile(path, error_on_failure);
+}
+
+/*
+ * Delete all files in the set.
+ */
+void
+FileSetDeleteAll(FileSet *fileset)
+{
+ char dirpath[MAXPGPATH];
+ int i;
+
+ /*
+ * Delete the directory we created in each tablespace. Doesn't fail
+ * because we use this in error cleanup paths, but can generate LOG
+ * message on IO error.
+ */
+ for (i = 0; i < fileset->ntablespaces; ++i)
+ {
+ FileSetPath(dirpath, fileset, fileset->tablespaces[i]);
+ PathNameDeleteTemporaryDir(dirpath);
+ }
+}
+
+/*
+ * Build the path for the directory holding the files backing a FileSet in a
+ * given tablespace.
+ */
+static void
+FileSetPath(char *path, FileSet *fileset, Oid tablespace)
+{
+ char tempdirpath[MAXPGPATH];
+
+ TempTablespacePath(tempdirpath, tablespace);
+ snprintf(path, MAXPGPATH, "%s/%s%lu.%u.fileset",
+ tempdirpath, PG_TEMP_FILE_PREFIX,
+ (unsigned long) fileset->creator_pid, fileset->number);
+}
+
+/*
+ * Sorting hat to determine which tablespace a given temporary file belongs in.
+ */
+static Oid
+ChooseTablespace(const FileSet *fileset, const char *name)
+{
+ uint32 hash = hash_any((const unsigned char *) name, strlen(name));
+
+ return fileset->tablespaces[hash % fileset->ntablespaces];
+}
+
+/*
+ * Compute the full path of a file in a FileSet.
+ */
+static void
+FilePath(char *path, FileSet *fileset, const char *name)
+{
+ char dirpath[MAXPGPATH];
+
+ FileSetPath(dirpath, fileset, ChooseTablespace(fileset, name));
+ snprintf(path, MAXPGPATH, "%s/%s", dirpath, name);
+}
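For illustration, the path scheme and the deterministic tablespace choice implemented by FileSetPath() and ChooseTablespace() above can be sketched in standalone C. This is a hedged sketch, not the real code: `toy_hash` is a stand-in for PostgreSQL's `hash_any()`, and the literal `pgsql_tmp` stands in for `PG_TEMP_FILE_PREFIX`:

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for hash_any(); any stable hash works for the illustration. */
static unsigned int toy_hash(const char *name)
{
    unsigned int h = 5381;

    while (*name)
        h = h * 33 + (unsigned char) *name++;
    return h;
}

/* Mirrors ChooseTablespace(): hash the file name modulo the tablespace
 * count, so every user of the FileSet picks the same slot for a name. */
static int choose_slot(const char *name, int ntablespaces)
{
    return (int) (toy_hash(name) % (unsigned int) ntablespaces);
}

/* Mirrors FileSetPath()'s "<tempdir>/<prefix><pid>.<number>.fileset"
 * layout; "pgsql_tmp" stands in for PG_TEMP_FILE_PREFIX. */
static void fileset_dir(char *out, size_t len, const char *tempdir,
                        unsigned long pid, unsigned int number)
{
    snprintf(out, len, "%s/pgsql_tmp%lu.%u.fileset", tempdir, pid, number);
}
```

With a single tablespace, `choose_slot` always returns 0, which matches the fallback in FileSetInit() when the temp_tablespaces GUC is empty.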
diff --git a/src/backend/storage/file/sharedfileset.c b/src/backend/storage/file/sharedfileset.c
index ed37c94..6a33fac 100644
--- a/src/backend/storage/file/sharedfileset.c
+++ b/src/backend/storage/file/sharedfileset.c
@@ -13,10 +13,6 @@
* files can be discovered by name, and a shared ownership semantics so that
* shared files survive until the last user detaches.
*
- * SharedFileSets can be used by backends when the temporary files need to be
- * opened/closed multiple times and the underlying files need to survive across
- * transactions.
- *
*-------------------------------------------------------------------------
*/
@@ -33,13 +29,7 @@
#include "storage/sharedfileset.h"
#include "utils/builtins.h"
-static List *filesetlist = NIL;
-
static void SharedFileSetOnDetach(dsm_segment *segment, Datum datum);
-static void SharedFileSetDeleteOnProcExit(int status, Datum arg);
-static void SharedFileSetPath(char *path, SharedFileSet *fileset, Oid tablespace);
-static void SharedFilePath(char *path, SharedFileSet *fileset, const char *name);
-static Oid ChooseTablespace(const SharedFileSet *fileset, const char *name);
/*
* Initialize a space for temporary files that can be opened by other backends.
@@ -47,77 +37,22 @@ static Oid ChooseTablespace(const SharedFileSet *fileset, const char *name);
* SharedFileSet with 'seg'. Any contained files will be deleted when the
* last backend detaches.
*
- * We can also use this interface if the temporary files are used only by
- * single backend but the files need to be opened and closed multiple times
- * and also the underlying files need to survive across transactions. For
- * such cases, dsm segment 'seg' should be passed as NULL. Callers are
- * expected to explicitly remove such files by using SharedFileSetDelete/
- * SharedFileSetDeleteAll or we remove such files on proc exit.
- *
- * Files will be distributed over the tablespaces configured in
- * temp_tablespaces.
- *
* Under the covers the set is one or more directories which will eventually
* be deleted.
*/
void
SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg)
{
- static uint32 counter = 0;
-
+ /* Initialize the shared fileset specific members. */
SpinLockInit(&fileset->mutex);
fileset->refcnt = 1;
- fileset->creator_pid = MyProcPid;
- fileset->number = counter;
- counter = (counter + 1) % INT_MAX;
-
- /* Capture the tablespace OIDs so that all backends agree on them. */
- PrepareTempTablespaces();
- fileset->ntablespaces =
- GetTempTablespaces(&fileset->tablespaces[0],
- lengthof(fileset->tablespaces));
- if (fileset->ntablespaces == 0)
- {
- /* If the GUC is empty, use current database's default tablespace */
- fileset->tablespaces[0] = MyDatabaseTableSpace;
- fileset->ntablespaces = 1;
- }
- else
- {
- int i;
- /*
- * An entry of InvalidOid means use the default tablespace for the
- * current database. Replace that now, to be sure that all users of
- * the SharedFileSet agree on what to do.
- */
- for (i = 0; i < fileset->ntablespaces; i++)
- {
- if (fileset->tablespaces[i] == InvalidOid)
- fileset->tablespaces[i] = MyDatabaseTableSpace;
- }
- }
+ /* Initialize the fileset. */
+ FileSetInit(&fileset->fs);
/* Register our cleanup callback. */
if (seg)
on_dsm_detach(seg, SharedFileSetOnDetach, PointerGetDatum(fileset));
- else
- {
- static bool registered_cleanup = false;
-
- if (!registered_cleanup)
- {
- /*
- * We must not have registered any fileset before registering the
- * fileset clean up.
- */
- Assert(filesetlist == NIL);
- on_proc_exit(SharedFileSetDeleteOnProcExit, 0);
- registered_cleanup = true;
- }
-
- filesetlist = lcons((void *) fileset, filesetlist);
- }
}
/*
@@ -148,86 +83,12 @@ SharedFileSetAttach(SharedFileSet *fileset, dsm_segment *seg)
}
/*
- * Create a new file in the given set.
- */
-File
-SharedFileSetCreate(SharedFileSet *fileset, const char *name)
-{
- char path[MAXPGPATH];
- File file;
-
- SharedFilePath(path, fileset, name);
- file = PathNameCreateTemporaryFile(path, false);
-
- /* If we failed, see if we need to create the directory on demand. */
- if (file <= 0)
- {
- char tempdirpath[MAXPGPATH];
- char filesetpath[MAXPGPATH];
- Oid tablespace = ChooseTablespace(fileset, name);
-
- TempTablespacePath(tempdirpath, tablespace);
- SharedFileSetPath(filesetpath, fileset, tablespace);
- PathNameCreateTemporaryDir(tempdirpath, filesetpath);
- file = PathNameCreateTemporaryFile(path, true);
- }
-
- return file;
-}
-
-/*
- * Open a file that was created with SharedFileSetCreate(), possibly in
- * another backend.
- */
-File
-SharedFileSetOpen(SharedFileSet *fileset, const char *name, int mode)
-{
- char path[MAXPGPATH];
- File file;
-
- SharedFilePath(path, fileset, name);
- file = PathNameOpenTemporaryFile(path, mode);
-
- return file;
-}
-
-/*
- * Delete a file that was created with SharedFileSetCreate().
- * Return true if the file existed, false if didn't.
- */
-bool
-SharedFileSetDelete(SharedFileSet *fileset, const char *name,
- bool error_on_failure)
-{
- char path[MAXPGPATH];
-
- SharedFilePath(path, fileset, name);
-
- return PathNameDeleteTemporaryFile(path, error_on_failure);
-}
-
-/*
* Delete all files in the set.
*/
void
SharedFileSetDeleteAll(SharedFileSet *fileset)
{
- char dirpath[MAXPGPATH];
- int i;
-
- /*
- * Delete the directory we created in each tablespace. Doesn't fail
- * because we use this in error cleanup paths, but can generate LOG
- * message on IO error.
- */
- for (i = 0; i < fileset->ntablespaces; ++i)
- {
- SharedFileSetPath(dirpath, fileset, fileset->tablespaces[i]);
- PathNameDeleteTemporaryDir(dirpath);
- }
-
- /* Unregister the shared fileset */
- SharedFileSetUnregister(fileset);
+ FileSetDeleteAll(&fileset->fs);
}
/*
@@ -255,100 +116,5 @@ SharedFileSetOnDetach(dsm_segment *segment, Datum datum)
* this function so we can safely access its data.
*/
if (unlink_all)
- SharedFileSetDeleteAll(fileset);
-}
-
-/*
- * Callback function that will be invoked on the process exit. This will
- * process the list of all the registered sharedfilesets and delete the
- * underlying files.
- */
-static void
-SharedFileSetDeleteOnProcExit(int status, Datum arg)
-{
- /*
- * Remove all the pending shared fileset entries. We don't use foreach()
- * here because SharedFileSetDeleteAll will remove the current element in
- * filesetlist. Though we have used foreach_delete_current() to remove the
- * element from filesetlist it could only fix up the state of one of the
- * loops, see SharedFileSetUnregister.
- */
- while (list_length(filesetlist) > 0)
- {
- SharedFileSet *fileset = (SharedFileSet *) linitial(filesetlist);
-
- SharedFileSetDeleteAll(fileset);
- }
-
- filesetlist = NIL;
-}
-
-/*
- * Unregister the shared fileset entry registered for cleanup on proc exit.
- */
-void
-SharedFileSetUnregister(SharedFileSet *input_fileset)
-{
- ListCell *l;
-
- /*
- * If the caller is following the dsm based cleanup then we don't maintain
- * the filesetlist so return.
- */
- if (filesetlist == NIL)
- return;
-
- foreach(l, filesetlist)
- {
- SharedFileSet *fileset = (SharedFileSet *) lfirst(l);
-
- /* Remove the entry from the list */
- if (input_fileset == fileset)
- {
- filesetlist = foreach_delete_current(filesetlist, l);
- return;
- }
- }
-
- /* Should have found a match */
- Assert(false);
-}
-
-/*
- * Build the path for the directory holding the files backing a SharedFileSet
- * in a given tablespace.
- */
-static void
-SharedFileSetPath(char *path, SharedFileSet *fileset, Oid tablespace)
-{
- char tempdirpath[MAXPGPATH];
-
- TempTablespacePath(tempdirpath, tablespace);
- snprintf(path, MAXPGPATH, "%s/%s%lu.%u.sharedfileset",
- tempdirpath, PG_TEMP_FILE_PREFIX,
- (unsigned long) fileset->creator_pid, fileset->number);
-}
-
-/*
- * Sorting hat to determine which tablespace a given shared temporary file
- * belongs in.
- */
-static Oid
-ChooseTablespace(const SharedFileSet *fileset, const char *name)
-{
- uint32 hash = hash_any((const unsigned char *) name, strlen(name));
-
- return fileset->tablespaces[hash % fileset->ntablespaces];
-}
-
-/*
- * Compute the full path of a file in a SharedFileSet.
- */
-static void
-SharedFilePath(char *path, SharedFileSet *fileset, const char *name)
-{
- char dirpath[MAXPGPATH];
-
- SharedFileSetPath(dirpath, fileset, ChooseTablespace(fileset, name));
- snprintf(path, MAXPGPATH, "%s/%s", dirpath, name);
+ FileSetDeleteAll(&fileset->fs);
}
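Structurally, the refactoring above boils down to composition: SharedFileSet now embeds a plain FileSet and keeps only the shared-ownership state. A minimal sketch with simplified stand-in types (field names abbreviated; the real FileSet also carries creator_pid and the tablespace array):

```c
#include <stddef.h>

/* Simplified stand-in for the real FileSet. */
typedef struct FileSetSketch
{
    unsigned int number;        /* per-PID identifier */
} FileSetSketch;

/* Simplified stand-in for the real SharedFileSet. */
typedef struct SharedFileSetSketch
{
    FileSetSketch fs;           /* embedded plain fileset */
    int           refcnt;       /* shared-ownership state stays here */
} SharedFileSetSketch;
```

Because `fs` is the first member, SharedFileSet code can simply pass `&fileset->fs` to the FileSet routines, as the hunks above do in SharedFileSetDeleteAll() and SharedFileSetOnDetach().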
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index cafc087..f7994d7 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -564,7 +564,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
lt = &lts->tapes[i];
pg_itoa(i, filename);
- file = BufFileOpenShared(fileset, filename, O_RDONLY);
+ file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY);
filesize = BufFileSize(file);
/*
@@ -610,7 +610,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
* offset).
*
* The only thing that currently prevents writing to the leader tape from
- * working is the fact that BufFiles opened using BufFileOpenShared() are
+ * working is the fact that BufFiles opened using BufFileOpenFileSet() are
* read-only by definition, but that could be changed if it seemed
* worthwhile. For now, writing to the leader tape will raise a "Bad file
* descriptor" error, so tuplesort must avoid writing to the leader tape
@@ -722,7 +722,7 @@ LogicalTapeSetCreate(int ntapes, bool preallocate, TapeShare *shared,
char filename[MAXPGPATH];
pg_itoa(worker, filename);
- lts->pfile = BufFileCreateShared(fileset, filename);
+ lts->pfile = BufFileCreateFileSet(&fileset->fs, filename);
}
else
lts->pfile = BufFileCreateTemp(false);
@@ -1096,7 +1096,7 @@ LogicalTapeFreeze(LogicalTapeSet *lts, int tapenum, TapeShare *share)
/* Handle extra steps when caller is to share its tapeset */
if (share)
{
- BufFileExportShared(lts->pfile);
+ BufFileExportFileSet(lts->pfile);
share->firstblocknumber = lt->firstBlockNumber;
}
}
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 57e35db..504ef1c 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -310,7 +310,8 @@ sts_puttuple(SharedTuplestoreAccessor *accessor, void *meta_data,
/* Create one. Only this backend will write into it. */
sts_filename(name, accessor, accessor->participant);
- accessor->write_file = BufFileCreateShared(accessor->fileset, name);
+ accessor->write_file =
+ BufFileCreateFileSet(&accessor->fileset->fs, name);
/* Set up the shared state for this backend's file. */
participant = &accessor->sts->participants[accessor->participant];
@@ -559,7 +560,7 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
sts_filename(name, accessor, accessor->read_participant);
accessor->read_file =
- BufFileOpenShared(accessor->fileset, name, O_RDONLY);
+ BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY);
}
/* Seek and load the chunk header. */
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index 41c7487..a6c9d4e 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -79,6 +79,7 @@ extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
extern void logicalrep_worker_stop(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
+extern void logicalrep_worker_cleanupfileset(void);
extern int logicalrep_sync_worker_count(Oid subid);
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 566523d..143eada 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -26,7 +26,7 @@
#ifndef BUFFILE_H
#define BUFFILE_H
-#include "storage/sharedfileset.h"
+#include "storage/fileset.h"
/* BufFile is an opaque type whose details are not known outside buffile.c. */
@@ -46,11 +46,11 @@ extern int BufFileSeekBlock(BufFile *file, long blknum);
extern int64 BufFileSize(BufFile *file);
extern long BufFileAppend(BufFile *target, BufFile *source);
-extern BufFile *BufFileCreateShared(SharedFileSet *fileset, const char *name);
-extern void BufFileExportShared(BufFile *file);
-extern BufFile *BufFileOpenShared(SharedFileSet *fileset, const char *name,
- int mode);
-extern void BufFileDeleteShared(SharedFileSet *fileset, const char *name);
-extern void BufFileTruncateShared(BufFile *file, int fileno, off_t offset);
+extern BufFile *BufFileCreateFileSet(FileSet *fileset, const char *name);
+extern void BufFileExportFileSet(BufFile *file);
+extern BufFile *BufFileOpenFileSet(FileSet *fileset, const char *name,
+ int mode);
+extern void BufFileDeleteFileSet(FileSet *fileset, const char *name);
+extern void BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset);
#endif /* BUFFILE_H */
diff --git a/src/include/storage/fileset.h b/src/include/storage/fileset.h
new file mode 100644
index 0000000..be0e097
--- /dev/null
+++ b/src/include/storage/fileset.h
@@ -0,0 +1,40 @@
+/*-------------------------------------------------------------------------
+ *
+ * fileset.h
+ * Management of named temporary files.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/fileset.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FILESET_H
+#define FILESET_H
+
+#include "storage/fd.h"
+
+/*
+ * A set of temporary files.
+ */
+typedef struct FileSet
+{
+ pid_t creator_pid; /* PID of the creating process */
+ uint32 number; /* per-PID identifier */
+ int ntablespaces; /* number of tablespaces to use */
+ Oid tablespaces[8]; /* OIDs of tablespaces to use. Assumes that
+ * it's rare that there would be more temp
+ * tablespaces than this. */
+} FileSet;
+
+extern void FileSetInit(FileSet *fileset);
+extern File FileSetCreate(FileSet *fileset, const char *name);
+extern File FileSetOpen(FileSet *fileset, const char *name,
+ int mode);
+extern bool FileSetDelete(FileSet *fileset, const char *name,
+ bool error_on_failure);
+extern void FileSetDeleteAll(FileSet *fileset);
+
+#endif
diff --git a/src/include/storage/sharedfileset.h b/src/include/storage/sharedfileset.h
index 09ba121..59becfb 100644
--- a/src/include/storage/sharedfileset.h
+++ b/src/include/storage/sharedfileset.h
@@ -17,6 +17,7 @@
#include "storage/dsm.h"
#include "storage/fd.h"
+#include "storage/fileset.h"
#include "storage/spin.h"
/*
@@ -24,24 +25,13 @@
*/
typedef struct SharedFileSet
{
- pid_t creator_pid; /* PID of the creating process */
- uint32 number; /* per-PID identifier */
+ FileSet fs;
slock_t mutex; /* mutex protecting the reference count */
int refcnt; /* number of attached backends */
- int ntablespaces; /* number of tablespaces to use */
- Oid tablespaces[8]; /* OIDs of tablespaces to use. Assumes that
- * it's rare that there more than temp
- * tablespaces. */
} SharedFileSet;
extern void SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg);
extern void SharedFileSetAttach(SharedFileSet *fileset, dsm_segment *seg);
-extern File SharedFileSetCreate(SharedFileSet *fileset, const char *name);
-extern File SharedFileSetOpen(SharedFileSet *fileset, const char *name,
- int mode);
-extern bool SharedFileSetDelete(SharedFileSet *fileset, const char *name,
- bool error_on_failure);
extern void SharedFileSetDeleteAll(SharedFileSet *fileset);
-extern void SharedFileSetUnregister(SharedFileSet *input_fileset);
#endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 37cf4b2..4770740 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -708,6 +708,7 @@ File
FileFdwExecutionState
FileFdwPlanState
FileNameMap
+FileSet
FileTag
FinalPathExtraData
FindColsContext
--
1.8.3.1
v5-0002-Using-fileset-more-effectively-in-the-apply-worke.patch (text/x-patch)
From d879f4046f5d98d1409fa686e46863ce25c37dd2 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Thu, 26 Aug 2021 09:42:36 +0530
Subject: [PATCH v5 2/2] Using fileset more effectively in the apply worker
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Do not use separate file sets for each xid, instead use one fileset
for the worker's entire lifetime. Now, the changes/subxacts files
for every streaming transaction will be created under the same fileset
and the files will be deleted after the transaction is completed.
The fileset remains until the worker exits.
---
src/backend/replication/logical/launcher.c | 2 +-
src/backend/replication/logical/worker.c | 249 ++++++-----------------------
src/backend/storage/file/buffile.c | 23 ++-
src/backend/utils/sort/logtape.c | 2 +-
src/backend/utils/sort/sharedtuplestore.c | 3 +-
src/include/storage/buffile.h | 5 +-
6 files changed, 75 insertions(+), 209 deletions(-)
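The single worker-lifetime fileset works because file names, not filesets, carry the transaction identity. A hypothetical sketch of (subid, xid)-keyed name builders in the spirit of worker.c's changes_filename() and subxact_filename() (the exact format strings here are an assumption for illustration):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical name builder: with one FileSet per worker, each streamed
 * transaction's changes file just needs a name keyed by (subid, xid). */
static void changes_name(char *out, size_t len,
                         unsigned int subid, unsigned int xid)
{
    snprintf(out, len, "%u-%u.changes", subid, xid);
}

/* Same idea for the per-transaction subxact bookkeeping file. */
static void subxact_name(char *out, size_t len,
                         unsigned int subid, unsigned int xid)
{
    snprintf(out, len, "%u-%u.subxacts", subid, xid);
}
```

Deleting a finished transaction's state then means deleting these two named files from the shared fileset, rather than tearing down a whole per-xid fileset.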
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 8b1772d..644a9c2 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -648,7 +648,7 @@ logicalrep_worker_onexit(int code, Datum arg)
logicalrep_worker_detach();
- /* Cleanup filesets used for streaming transactions. */
+ /* Cleanup fileset used for streaming transactions. */
logicalrep_worker_cleanupfileset();
ApplyLauncherWakeup();
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 6adc6ddd..eda71b6 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -221,20 +221,6 @@ typedef struct ApplyExecutionData
PartitionTupleRouting *proute; /* partition routing info */
} ApplyExecutionData;
-/*
- * Stream xid hash entry. Whenever we see a new xid we create this entry in the
- * xidhash and along with it create the streaming file and store the fileset handle.
- * The subxact file is created iff there is any subxact info under this xid. This
- * entry is used on the subsequent streams for the xid to get the corresponding
- * fileset handles, so storing them in hash makes the search faster.
- */
-typedef struct StreamXidHash
-{
- TransactionId xid; /* xid is the hash key and must be first */
- FileSet *stream_fileset; /* file set for stream data */
- FileSet *subxact_fileset; /* file set for subxact info */
-} StreamXidHash;
-
static MemoryContext ApplyMessageContext = NULL;
MemoryContext ApplyContext = NULL;
@@ -255,10 +241,13 @@ static bool in_streamed_transaction = false;
static TransactionId stream_xid = InvalidTransactionId;
/*
- * Hash table for storing the streaming xid information along with filesets
- * for streaming and subxact files.
+ * The fileset is used by the worker to create the changes and subxact files
+ * for the streaming transaction. Upon the arrival of the first streaming
+ * transaction, the fileset will be initialized, and it will be deleted when
+ * the worker exits. Within it, separate buffiles are created for each
+ * transaction and deleted after the transaction completes.
*/
-static HTAB *xidhash = NULL;
+static FileSet *stream_fileset = NULL;
/* BufFile handle of the current streaming file */
static BufFile *stream_fd = NULL;
@@ -1129,7 +1118,6 @@ static void
apply_handle_stream_start(StringInfo s)
{
bool first_segment;
- HASHCTL hash_ctl;
if (in_streamed_transaction)
ereport(ERROR,
@@ -1157,17 +1145,20 @@ apply_handle_stream_start(StringInfo s)
errmsg_internal("invalid transaction ID in streamed replication transaction")));
/*
- * Initialize the xidhash table if we haven't yet. This will be used for
+ * Initialize the stream_fileset if we haven't yet. This will be used for
* the entire duration of the apply worker so create it in permanent
* context.
*/
- if (xidhash == NULL)
+ if (stream_fileset == NULL)
{
- hash_ctl.keysize = sizeof(TransactionId);
- hash_ctl.entrysize = sizeof(StreamXidHash);
- hash_ctl.hcxt = ApplyContext;
- xidhash = hash_create("StreamXidHash", 1024, &hash_ctl,
- HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ MemoryContext oldctx;
+
+ oldctx = MemoryContextSwitchTo(ApplyContext);
+
+ stream_fileset = palloc(sizeof(FileSet));
+ FileSetInit(stream_fileset);
+
+ MemoryContextSwitchTo(oldctx);
}
/* open the spool file for this transaction */
@@ -1258,7 +1249,6 @@ apply_handle_stream_abort(StringInfo s)
BufFile *fd;
bool found = false;
char path[MAXPGPATH];
- StreamXidHash *ent;
subidx = -1;
begin_replication_step();
@@ -1287,19 +1277,9 @@ apply_handle_stream_abort(StringInfo s)
return;
}
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
/* open the changes file */
changes_filename(path, MyLogicalRepWorker->subid, xid);
- fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(stream_fileset, path, O_RDWR, false);
/* OK, truncate the file at the right offset */
BufFileTruncateFileSet(fd, subxact_data.subxacts[subidx].fileno,
@@ -1327,7 +1307,6 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
int nchanges;
char path[MAXPGPATH];
char *buffer = NULL;
- StreamXidHash *ent;
MemoryContext oldcxt;
BufFile *fd;
@@ -1345,17 +1324,7 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
changes_filename(path, MyLogicalRepWorker->subid, xid);
elog(DEBUG1, "replaying changes from file \"%s\"", path);
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
- fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(stream_fileset, path, O_RDONLY, false);
buffer = palloc(BLCKSZ);
initStringInfo(&s2);
@@ -2509,27 +2478,14 @@ UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
}
/*
- * Cleanup filesets.
+ * Cleanup fileset.
*/
void
-logicalrep_worker_cleanupfileset(void)
+logicalrep_worker_cleanupfileset(void)
{
- HASH_SEQ_STATUS status;
- StreamXidHash *hentry;
-
- /* Remove all the pending stream and subxact filesets. */
- if (xidhash)
- {
- hash_seq_init(&status, xidhash);
- while ((hentry = (StreamXidHash *) hash_seq_search(&status)) != NULL)
- {
- FileSetDeleteAll(hentry->stream_fileset);
-
- /* Delete the subxact fileset iff it is created. */
- if (hentry->subxact_fileset)
- FileSetDeleteAll(hentry->subxact_fileset);
- }
- }
+ /* If the fileset is created, clean the underlying files. */
+ if (stream_fileset != NULL)
+ FileSetDeleteAll(stream_fileset);
}
/*
@@ -2981,58 +2937,29 @@ subxact_info_write(Oid subid, TransactionId xid)
{
char path[MAXPGPATH];
Size len;
- StreamXidHash *ent;
BufFile *fd;
Assert(TransactionIdIsValid(xid));
- /* Find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- /* By this time we must have created the transaction entry */
- Assert(ent);
+ /* Get the subxact filename. */
+ subxact_filename(path, subid, xid);
/*
- * If there is no subtransaction then nothing to do, but if already have
- * subxact file then delete that.
+ * If there are no subtransactions, there is nothing to do, but if a
+ * subxact file already exists, delete it.
*/
if (subxact_data.nsubxacts == 0)
{
- if (ent->subxact_fileset)
- {
- cleanup_subxact_info();
- FileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
+ cleanup_subxact_info();
+ BufFileDeleteFileSet(stream_fileset, path, true);
+
return;
}
- subxact_filename(path, subid, xid);
-
- /*
- * Create the subxact file if it not already created, otherwise open the
- * existing file.
- */
- if (ent->subxact_fileset == NULL)
- {
- MemoryContext oldctx;
-
- /*
- * We need to maintain fileset across multiple stream start/stop
- * calls. So, need to allocate it in a persistent context.
- */
- oldctx = MemoryContextSwitchTo(ApplyContext);
- ent->subxact_fileset = palloc(sizeof(FileSet));
- FileSetInit(ent->subxact_fileset);
- MemoryContextSwitchTo(oldctx);
-
- fd = BufFileCreateFileSet(ent->subxact_fileset, path);
- }
- else
- fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDWR);
+ /* Open the subxact file; if it does not exist, create it. */
+ fd = BufFileOpenFileSet(stream_fileset, path, O_RDWR, true);
+ if (fd == NULL)
+ fd = BufFileCreateFileSet(stream_fileset, path);
len = sizeof(SubXactInfo) * subxact_data.nsubxacts;
@@ -3059,34 +2986,20 @@ subxact_info_read(Oid subid, TransactionId xid)
char path[MAXPGPATH];
Size len;
BufFile *fd;
- StreamXidHash *ent;
MemoryContext oldctx;
Assert(!subxact_data.subxacts);
Assert(subxact_data.nsubxacts == 0);
Assert(subxact_data.nsubxacts_max == 0);
- /* Find the stream xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
/*
- * If subxact_fileset is not valid that mean we don't have any subxact
- * info
+ * Open the subxact file for the input streaming xid; just return if the
+ * file does not exist.
*/
- if (ent->subxact_fileset == NULL)
- return;
-
subxact_filename(path, subid, xid);
-
- fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(stream_fileset, path, O_RDONLY, true);
+ if (fd == NULL)
+ return;
/* read number of subxact items */
if (BufFileRead(fd, &subxact_data.nsubxacts,
@@ -3222,42 +3135,20 @@ changes_filename(char *path, Oid subid, TransactionId xid)
* Cleanup files for a subscription / toplevel transaction.
*
* Remove files with serialized changes and subxact info for a particular
- * toplevel transaction. Each subscription has a separate set of files.
+ * toplevel transaction. Each subscription has a separate files.
*/
static void
stream_cleanup_files(Oid subid, TransactionId xid)
{
char path[MAXPGPATH];
- StreamXidHash *ent;
-
- /* Find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
- /* Delete the change file and release the stream fileset memory */
+ /* Delete the changes file. */
changes_filename(path, subid, xid);
- FileSetDeleteAll(ent->stream_fileset);
- pfree(ent->stream_fileset);
- ent->stream_fileset = NULL;
+ BufFileDeleteFileSet(stream_fileset, path, false);
- /* Delete the subxact file and release the memory, if it exist */
- if (ent->subxact_fileset)
- {
- subxact_filename(path, subid, xid);
- FileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
-
- /* Remove the xid entry from the stream xid hash */
- hash_search(xidhash, (void *) &xid, HASH_REMOVE, NULL);
+	/* Delete the subxact file, if it exists. */
+ subxact_filename(path, subid, xid);
+ BufFileDeleteFileSet(stream_fileset, path, true);
}
/*
@@ -3267,8 +3158,8 @@ stream_cleanup_files(Oid subid, TransactionId xid)
*
* Open a file for streamed changes from a toplevel transaction identified
* by stream_xid (global variable). If it's the first chunk of streamed
- * changes for this transaction, initialize the fileset and create the buffile,
- * otherwise open the previously created file.
+ * changes for this transaction, create the buffile, otherwise open the
+ * previously created file.
*
* This can only be called at the beginning of a "streaming" block, i.e.
* between stream_start/stream_stop messages from the upstream.
@@ -3277,20 +3168,13 @@ static void
stream_open_file(Oid subid, TransactionId xid, bool first_segment)
{
char path[MAXPGPATH];
- bool found;
MemoryContext oldcxt;
- StreamXidHash *ent;
Assert(in_streamed_transaction);
Assert(OidIsValid(subid));
Assert(TransactionIdIsValid(xid));
Assert(stream_fd == NULL);
- /* create or find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_ENTER,
- &found);
changes_filename(path, subid, xid);
elog(DEBUG1, "opening file \"%s\" for streamed changes", path);
@@ -3302,49 +3186,18 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
oldcxt = MemoryContextSwitchTo(LogicalStreamingContext);
/*
- * If this is the first streamed segment, the file must not exist, so make
- * sure we're the ones creating it. Otherwise just open the file for
- * writing, in append mode.
+ * If this is the first streamed segment, create the changes file.
+ * Otherwise, just open it file for writing, in append mode.
*/
if (first_segment)
- {
- MemoryContext savectx;
- FileSet *fileset;
-
- if (found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
-
- /*
- * We need to maintain fileset across multiple stream start/stop
- * calls. So, need to allocate it in a persistent context.
- */
- savectx = MemoryContextSwitchTo(ApplyContext);
- fileset = palloc(sizeof(FileSet));
-
- FileSetInit(fileset);
- MemoryContextSwitchTo(savectx);
-
- stream_fd = BufFileCreateFileSet(fileset, path);
-
- /* Remember the fileset for the next stream of the same transaction */
- ent->xid = xid;
- ent->stream_fileset = fileset;
- ent->subxact_fileset = NULL;
- }
+ stream_fd = BufFileCreateFileSet(stream_fileset, path);
else
{
- if (!found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
-
/*
* Open the file and seek to the end of the file because we always
* append the changes file.
*/
- stream_fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
+ stream_fd = BufFileOpenFileSet(stream_fileset, path, O_RDWR, false);
BufFileSeek(stream_fd, 0, 0, SEEK_END);
}
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 5e5409d..d96b25d 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -278,10 +278,12 @@ BufFileCreateFileSet(FileSet *fileset, const char *name)
* with BufFileCreateFileSet in the same FileSet using the same name.
* The backend that created the file must have called BufFileClose() or
* BufFileExportFileSet() to make sure that it is ready to be opened by other
- * backends and render it read-only.
+ * backends and render it read-only. If missing_ok is true, it will return
+ * NULL if the file does not exist; otherwise, it will throw an error.
*/
BufFile *
-BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
+BufFileOpenFileSet(FileSet *fileset, const char *name, int mode,
+ bool missing_ok)
{
BufFile *file;
char segment_name[MAXPGPATH];
@@ -318,10 +320,18 @@ BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
* name.
*/
if (nfiles == 0)
+ {
+ /* free the memory */
+ pfree(files);
+
+ if (missing_ok)
+ return NULL;
+
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not open temporary file \"%s\" from BufFile \"%s\": %m",
segment_name, name)));
+ }
file = makeBufFileCommon(nfiles);
file->files = files;
@@ -341,10 +351,11 @@ BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
* the FileSet to be cleaned up.
*
* Only one backend should attempt to delete a given name, and should know
- * that it exists and has been exported or closed.
+ * that it exists and has been exported or closed; otherwise, missing_ok
+ * should be passed as true.
*/
void
-BufFileDeleteFileSet(FileSet *fileset, const char *name)
+BufFileDeleteFileSet(FileSet *fileset, const char *name, bool missing_ok)
{
char segment_name[MAXPGPATH];
int segment = 0;
@@ -358,7 +369,7 @@ BufFileDeleteFileSet(FileSet *fileset, const char *name)
for (;;)
{
FileSetSegmentName(segment_name, name, segment);
- if (!FileSetDelete(fileset, segment_name, true))
+ if (!FileSetDelete(fileset, segment_name, !missing_ok))
break;
found = true;
++segment;
@@ -366,7 +377,7 @@ BufFileDeleteFileSet(FileSet *fileset, const char *name)
CHECK_FOR_INTERRUPTS();
}
- if (!found)
+ if (!found && !missing_ok)
elog(ERROR, "could not delete unknown BufFile \"%s\"", name);
}
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index f7994d7..debf12e1 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -564,7 +564,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
lt = <s->tapes[i];
pg_itoa(i, filename);
- file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY);
+ file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY, false);
filesize = BufFileSize(file);
/*
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 504ef1c..033088f 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -560,7 +560,8 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
sts_filename(name, accessor, accessor->read_participant);
accessor->read_file =
- BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY);
+ BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY,
+ false);
}
/* Seek and load the chunk header. */
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 143eada..7ae5ea2 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -49,8 +49,9 @@ extern long BufFileAppend(BufFile *target, BufFile *source);
extern BufFile *BufFileCreateFileSet(FileSet *fileset, const char *name);
extern void BufFileExportFileSet(BufFile *file);
extern BufFile *BufFileOpenFileSet(FileSet *fileset, const char *name,
- int mode);
-extern void BufFileDeleteFileSet(FileSet *fileset, const char *name);
+ int mode, bool missing_ok);
+extern void BufFileDeleteFileSet(FileSet *fileset, const char *name,
+ bool missing_ok);
extern void BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset);
#endif /* BUFFILE_H */
--
1.8.3.1
On Thu, Aug 26, 2021 at 11:47 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Aug 25, 2021 at 5:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Aug 24, 2021 at 3:55 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Tue, Aug 24, 2021 at 12:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
The first patch looks good to me. I have made minor changes to the
attached patch. The changes include: fixing compilation warning, made
some comment changes, ran pgindent, and few other cosmetic changes. If
you are fine with the attached, then kindly rebase the second patch
atop it.
The patch looks good to me,
Thanks, Sawada-San and Dilip for confirmation. I would like to commit
this and the second patch (the second one still needs some more
testing and review) for PG-15 as there is no bug per-se related to
this work in PG-14 but I see an argument to commit this for PG-14 to
keep the code (APIs) consistent. What do you think? Does anybody else
have any opinion on this?
Below is a summary of each of the patches for those who are not
following this closely:
Patch-1: The purpose of this patch is to refactor sharedfileset.c to
separate out fileset implementation. Basically, this moves the fileset
related implementation out of sharedfileset.c to allow its usage by
backends that don't want to share filesets among different processes.
After this split, fileset infrastructure is used by both
sharedfileset.c and worker.c for the named temporary files that
survive across transactions. This is suggested by Andres to have a
cleaner API and its usage.
Patch-2: Allow to use single fileset in worker.c (for the lifetime of
worker) instead of using a separate fileset for each remote
transaction. After this, the files to record changes for each remote
transaction will be created under the same fileset and the files will
be deleted after the transaction is completed. This is suggested by
Thomas to allow better resource usage.
--
With Regards,
Amit Kapila.
On Thu, Aug 26, 2021 at 2:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, Sawada-San and Dilip for confirmation. I would like to commit
this and the second patch (the second one still needs some more
testing and review) for PG-15 as there is no bug per-se related to
this work in PG-14 but I see an argument to commit this for PG-14 to
keep the code (APIs) consistent. What do you think? Does anybody else
have any opinion on this?
IMHO, this is a fair amount of refactoring and this is actually an
improvement patch so we should push it to v15.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Aug 26, 2021 2:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
The patch looks good to me, I have rebased 0002 atop
this patch and also done some cosmetic fixes in 0002.
Here are some comments for the 0002 patch.
1)
- * toplevel transaction. Each subscription has a separate set of files.
+ * toplevel transaction. Each subscription has a separate files.
a separate files => a separate file
2)
+ * Otherwise, just open it file for writing, in append mode.
*/
open it file => open the file
3)
if (subxact_data.nsubxacts == 0)
{
- if (ent->subxact_fileset)
- {
- cleanup_subxact_info();
- FileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
+ cleanup_subxact_info();
+ BufFileDeleteFileSet(stream_fileset, path, true);
+
Before applying the patch, the code only invoked cleanup_subxact_info() when the
file existed. After applying the patch, it will invoke cleanup_subxact_info()
whether the file exists or not. Is this correct?
4)
/*
- * If this is the first streamed segment, the file must not exist, so make
- * sure we're the ones creating it. Otherwise just open the file for
- * writing, in append mode.
+ * If this is the first streamed segment, create the changes file.
+ * Otherwise, just open it file for writing, in append mode.
*/
if (first_segment)
- {
...
- if (found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
...
- }
+ stream_fd = BufFileCreateFileSet(stream_fileset, path);
Since the function BufFileCreateFileSet() doesn't check the file's existence,
the change here seems to remove the file existence check that the old code did.
Best regards,
Hou zj
On Fri, Aug 27, 2021 1:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Aug 26, 2021 at 2:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, Sawada-San and Dilip for confirmation. I would like to commit
this and the second patch (the second one still needs some more
testing and review) for PG-15 as there is no bug per-se related to
this work in PG-14 but I see an argument to commit this for PG-14 to
keep the code (APIs) consistent. What do you think? Does anybody else
have any opinion on this?
IMHO, this is a fair amount of refactoring and this is actually an
improvement patch so we should push it to v15.
+1
Best regards,
Hou zj
On Fri, Aug 27, 2021 at 10:56 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
On Thu, Aug 26, 2021 2:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
The patch looks good to me, I have rebased 0002 atop
this patch and also done some cosmetic fixes in 0002.
Here are some comments for the 0002 patch.
1)
- * toplevel transaction. Each subscription has a separate set of files.
+ * toplevel transaction. Each subscription has a separate files.
a separate files => a separate file
Done
2)
+ * Otherwise, just open it file for writing, in append mode.
*/
open it file => open the file
Done
3)
 if (subxact_data.nsubxacts == 0)
 {
-        if (ent->subxact_fileset)
-        {
-            cleanup_subxact_info();
-            FileSetDeleteAll(ent->subxact_fileset);
-            pfree(ent->subxact_fileset);
-            ent->subxact_fileset = NULL;
-        }
+        cleanup_subxact_info();
+        BufFileDeleteFileSet(stream_fileset, path, true);
+
Before applying the patch, the code only invoked cleanup_subxact_info() when the
file existed. After applying the patch, it will invoke cleanup_subxact_info()
whether the file exists or not. Is this correct?
I think this is just structure resetting at the end of the stream.
Earlier the hash was telling us whether we had ever dirtied that
structure, but now we are not maintaining that hash, so we just reset
it at the end of the stream. I don't think it's at all bad; in fact,
I think this is much cheaper compared to maintaining the hash.
4)
 /*
- * If this is the first streamed segment, the file must not exist, so make
- * sure we're the ones creating it. Otherwise just open the file for
- * writing, in append mode.
+ * If this is the first streamed segment, create the changes file.
+ * Otherwise, just open it file for writing, in append mode.
 */
 if (first_segment)
-    {
...
-        if (found)
-            ereport(ERROR,
-                    (errcode(ERRCODE_PROTOCOL_VIOLATION),
-                     errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
...
-    }
+        stream_fd = BufFileCreateFileSet(stream_fileset, path);
Since the function BufFileCreateFileSet() doesn't check the file's existence,
the change here seems to remove the file existence check that the old code did.
Not really; we were just doing a sanity check of the in-memory hash
entry. Now we don't maintain that, so we don't need to do that check.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v6-0001-Refactor-sharedfileset.c-to-separate-out-fileset-.patch (text/x-patch)
From dfae7d9e3ae875c8db2de792e9a95ef86fef2b00 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Wed, 25 Aug 2021 15:00:19 +0530
Subject: [PATCH v6 1/2] Refactor sharedfileset.c to separate out fileset
implementation.
Move fileset related implementation out of sharedfileset.c to allow its
usage by backends that don't want to share filesets among different
processes. After this split, fileset infrastructure is used by both
sharedfileset.c and worker.c for the named temporary files that survive
across transactions.
Author: Dilip Kumar, based on suggestion by Andres Freund
Reviewed-by: Hou Zhijie, Masahiko Sawada, Amit Kapila
Discussion: https://postgr.es/m/E1mCC6U-0004Ik-Fs@gemulon.postgresql.org
---
src/backend/replication/logical/launcher.c | 3 +
src/backend/replication/logical/worker.c | 82 ++++++----
src/backend/storage/file/Makefile | 1 +
src/backend/storage/file/buffile.c | 84 +++++-----
src/backend/storage/file/fd.c | 2 +-
src/backend/storage/file/fileset.c | 205 ++++++++++++++++++++++++
src/backend/storage/file/sharedfileset.c | 244 +----------------------------
src/backend/utils/sort/logtape.c | 8 +-
src/backend/utils/sort/sharedtuplestore.c | 5 +-
src/include/replication/worker_internal.h | 1 +
src/include/storage/buffile.h | 14 +-
src/include/storage/fileset.h | 40 +++++
src/include/storage/sharedfileset.h | 14 +-
src/tools/pgindent/typedefs.list | 1 +
14 files changed, 368 insertions(+), 336 deletions(-)
create mode 100644 src/backend/storage/file/fileset.c
create mode 100644 src/include/storage/fileset.h
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index e3b11da..8b1772d 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -648,6 +648,9 @@ logicalrep_worker_onexit(int code, Datum arg)
logicalrep_worker_detach();
+ /* Cleanup filesets used for streaming transactions. */
+ logicalrep_worker_cleanupfileset();
+
ApplyLauncherWakeup();
}
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 295b1e0..bfb7d1a 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -39,13 +39,13 @@
* BufFile infrastructure supports temporary files that exceed the OS file size
* limit, (b) provides a way for automatic clean up on the error and (c) provides
* a way to survive these files across local transactions and allow to open and
- * close at stream start and close. We decided to use SharedFileSet
+ * close at stream start and close. We decided to use FileSet
* infrastructure as without that it deletes the files on the closure of the
* file and if we decide to keep stream files open across the start/stop stream
* then it will consume a lot of memory (more than 8K for each BufFile and
* there could be multiple such BufFiles as the subscriber could receive
* multiple start/stop streams for different transactions before getting the
- * commit). Moreover, if we don't use SharedFileSet then we also need to invent
+ * commit). Moreover, if we don't use FileSet then we also need to invent
* a new way to pass filenames to BufFile APIs so that we are allowed to open
* the file we desired across multiple stream-open calls for the same
* transaction.
@@ -246,8 +246,8 @@ static ApplyErrorCallbackArg apply_error_callback_arg =
typedef struct StreamXidHash
{
TransactionId xid; /* xid is the hash key and must be first */
- SharedFileSet *stream_fileset; /* shared file set for stream data */
- SharedFileSet *subxact_fileset; /* shared file set for subxact info */
+ FileSet *stream_fileset; /* file set for stream data */
+ FileSet *subxact_fileset; /* file set for subxact info */
} StreamXidHash;
static MemoryContext ApplyMessageContext = NULL;
@@ -270,8 +270,8 @@ static bool in_streamed_transaction = false;
static TransactionId stream_xid = InvalidTransactionId;
/*
- * Hash table for storing the streaming xid information along with shared file
- * set for streaming and subxact files.
+ * Hash table for storing the streaming xid information along with filesets
+ * for streaming and subxact files.
*/
static HTAB *xidhash = NULL;
@@ -1297,11 +1297,11 @@ apply_handle_stream_abort(StringInfo s)
/* open the changes file */
changes_filename(path, MyLogicalRepWorker->subid, xid);
- fd = BufFileOpenShared(ent->stream_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
/* OK, truncate the file at the right offset */
- BufFileTruncateShared(fd, subxact_data.subxacts[subidx].fileno,
- subxact_data.subxacts[subidx].offset);
+ BufFileTruncateFileSet(fd, subxact_data.subxacts[subidx].fileno,
+ subxact_data.subxacts[subidx].offset);
BufFileClose(fd);
/* discard the subxacts added later */
@@ -1355,7 +1355,7 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
errmsg_internal("transaction %u not found in stream XID hash table",
xid)));
- fd = BufFileOpenShared(ent->stream_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDONLY);
buffer = palloc(BLCKSZ);
initStringInfo(&s2);
@@ -2542,6 +2542,30 @@ UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
}
/*
+ * Cleanup filesets.
+ */
+void
+logicalrep_worker_cleanupfileset(void)
+{
+ HASH_SEQ_STATUS status;
+ StreamXidHash *hentry;
+
+ /* Remove all the pending stream and subxact filesets. */
+ if (xidhash)
+ {
+ hash_seq_init(&status, xidhash);
+ while ((hentry = (StreamXidHash *) hash_seq_search(&status)) != NULL)
+ {
+ FileSetDeleteAll(hentry->stream_fileset);
+
+ /* Delete the subxact fileset iff it is created. */
+ if (hentry->subxact_fileset)
+ FileSetDeleteAll(hentry->subxact_fileset);
+ }
+ }
+}
+
+/*
* Apply main loop.
*/
static void
@@ -3024,7 +3048,7 @@ subxact_info_write(Oid subid, TransactionId xid)
if (ent->subxact_fileset)
{
cleanup_subxact_info();
- SharedFileSetDeleteAll(ent->subxact_fileset);
+ FileSetDeleteAll(ent->subxact_fileset);
pfree(ent->subxact_fileset);
ent->subxact_fileset = NULL;
}
@@ -3042,18 +3066,18 @@ subxact_info_write(Oid subid, TransactionId xid)
MemoryContext oldctx;
/*
- * We need to maintain shared fileset across multiple stream
- * start/stop calls. So, need to allocate it in a persistent context.
+ * We need to maintain fileset across multiple stream start/stop
+ * calls. So, need to allocate it in a persistent context.
*/
oldctx = MemoryContextSwitchTo(ApplyContext);
- ent->subxact_fileset = palloc(sizeof(SharedFileSet));
- SharedFileSetInit(ent->subxact_fileset, NULL);
+ ent->subxact_fileset = palloc(sizeof(FileSet));
+ FileSetInit(ent->subxact_fileset);
MemoryContextSwitchTo(oldctx);
- fd = BufFileCreateShared(ent->subxact_fileset, path);
+ fd = BufFileCreateFileSet(ent->subxact_fileset, path);
}
else
- fd = BufFileOpenShared(ent->subxact_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDWR);
len = sizeof(SubXactInfo) * subxact_data.nsubxacts;
@@ -3107,7 +3131,7 @@ subxact_info_read(Oid subid, TransactionId xid)
subxact_filename(path, subid, xid);
- fd = BufFileOpenShared(ent->subxact_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDONLY);
/* read number of subxact items */
if (BufFileRead(fd, &subxact_data.nsubxacts,
@@ -3264,7 +3288,7 @@ stream_cleanup_files(Oid subid, TransactionId xid)
/* Delete the change file and release the stream fileset memory */
changes_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->stream_fileset);
+ FileSetDeleteAll(ent->stream_fileset);
pfree(ent->stream_fileset);
ent->stream_fileset = NULL;
@@ -3272,7 +3296,7 @@ stream_cleanup_files(Oid subid, TransactionId xid)
if (ent->subxact_fileset)
{
subxact_filename(path, subid, xid);
- SharedFileSetDeleteAll(ent->subxact_fileset);
+ FileSetDeleteAll(ent->subxact_fileset);
pfree(ent->subxact_fileset);
ent->subxact_fileset = NULL;
}
@@ -3288,8 +3312,8 @@ stream_cleanup_files(Oid subid, TransactionId xid)
*
* Open a file for streamed changes from a toplevel transaction identified
* by stream_xid (global variable). If it's the first chunk of streamed
- * changes for this transaction, initialize the shared fileset and create the
- * buffile, otherwise open the previously created file.
+ * changes for this transaction, initialize the fileset and create the buffile,
+ * otherwise open the previously created file.
*
* This can only be called at the beginning of a "streaming" block, i.e.
* between stream_start/stream_stop messages from the upstream.
@@ -3330,7 +3354,7 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
if (first_segment)
{
MemoryContext savectx;
- SharedFileSet *fileset;
+ FileSet *fileset;
if (found)
ereport(ERROR,
@@ -3338,16 +3362,16 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
/*
- * We need to maintain shared fileset across multiple stream
- * start/stop calls. So, need to allocate it in a persistent context.
+ * We need to maintain fileset across multiple stream start/stop
+ * calls. So, need to allocate it in a persistent context.
*/
savectx = MemoryContextSwitchTo(ApplyContext);
- fileset = palloc(sizeof(SharedFileSet));
+ fileset = palloc(sizeof(FileSet));
- SharedFileSetInit(fileset, NULL);
+ FileSetInit(fileset);
MemoryContextSwitchTo(savectx);
- stream_fd = BufFileCreateShared(fileset, path);
+ stream_fd = BufFileCreateFileSet(fileset, path);
/* Remember the fileset for the next stream of the same transaction */
ent->xid = xid;
@@ -3365,7 +3389,7 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
* Open the file and seek to the end of the file because we always
* append the changes file.
*/
- stream_fd = BufFileOpenShared(ent->stream_fileset, path, O_RDWR);
+ stream_fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
BufFileSeek(stream_fd, 0, 0, SEEK_END);
}
diff --git a/src/backend/storage/file/Makefile b/src/backend/storage/file/Makefile
index 5e1291b..660ac51 100644
--- a/src/backend/storage/file/Makefile
+++ b/src/backend/storage/file/Makefile
@@ -16,6 +16,7 @@ OBJS = \
buffile.o \
copydir.o \
fd.o \
+ fileset.o \
reinit.o \
sharedfileset.o
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index a4be5fe..5e5409d 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -39,7 +39,7 @@
* BufFile also supports temporary files that can be used by the single backend
* when the corresponding files need to be survived across the transaction and
* need to be opened and closed multiple times. Such files need to be created
- * as a member of a SharedFileSet.
+ * as a member of a FileSet.
*-------------------------------------------------------------------------
*/
@@ -77,8 +77,8 @@ struct BufFile
bool dirty; /* does buffer need to be written? */
bool readOnly; /* has the file been set to read only? */
- SharedFileSet *fileset; /* space for segment files if shared */
- const char *name; /* name of this BufFile if shared */
+ FileSet *fileset; /* space for fileset based segment files */
+ const char *name; /* name of fileset based BufFile */
/*
* resowner is the ResourceOwner to use for underlying temp files. (We
@@ -104,7 +104,7 @@ static void extendBufFile(BufFile *file);
static void BufFileLoadBuffer(BufFile *file);
static void BufFileDumpBuffer(BufFile *file);
static void BufFileFlush(BufFile *file);
-static File MakeNewSharedSegment(BufFile *file, int segment);
+static File MakeNewFileSetSegment(BufFile *file, int segment);
/*
* Create BufFile and perform the common initialization.
@@ -160,7 +160,7 @@ extendBufFile(BufFile *file)
if (file->fileset == NULL)
pfile = OpenTemporaryFile(file->isInterXact);
else
- pfile = MakeNewSharedSegment(file, file->numFiles);
+ pfile = MakeNewFileSetSegment(file, file->numFiles);
Assert(pfile >= 0);
@@ -214,34 +214,34 @@ BufFileCreateTemp(bool interXact)
* Build the name for a given segment of a given BufFile.
*/
static void
-SharedSegmentName(char *name, const char *buffile_name, int segment)
+FileSetSegmentName(char *name, const char *buffile_name, int segment)
{
snprintf(name, MAXPGPATH, "%s.%d", buffile_name, segment);
}
/*
- * Create a new segment file backing a shared BufFile.
+ * Create a new segment file backing a fileset based BufFile.
*/
static File
-MakeNewSharedSegment(BufFile *buffile, int segment)
+MakeNewFileSetSegment(BufFile *buffile, int segment)
{
char name[MAXPGPATH];
File file;
/*
* It is possible that there are files left over from before a crash
- * restart with the same name. In order for BufFileOpenShared() not to
+ * restart with the same name. In order for BufFileOpenFileSet() not to
* get confused about how many segments there are, we'll unlink the next
* segment number if it already exists.
*/
- SharedSegmentName(name, buffile->name, segment + 1);
- SharedFileSetDelete(buffile->fileset, name, true);
+ FileSetSegmentName(name, buffile->name, segment + 1);
+ FileSetDelete(buffile->fileset, name, true);
/* Create the new segment. */
- SharedSegmentName(name, buffile->name, segment);
- file = SharedFileSetCreate(buffile->fileset, name);
+ FileSetSegmentName(name, buffile->name, segment);
+ file = FileSetCreate(buffile->fileset, name);
- /* SharedFileSetCreate would've errored out */
+ /* FileSetCreate would've errored out */
Assert(file > 0);
return file;
@@ -251,15 +251,15 @@ MakeNewSharedSegment(BufFile *buffile, int segment)
* Create a BufFile that can be discovered and opened read-only by other
* backends that are attached to the same SharedFileSet using the same name.
*
- * The naming scheme for shared BufFiles is left up to the calling code. The
- * name will appear as part of one or more filenames on disk, and might
+ * The naming scheme for fileset based BufFiles is left up to the calling code.
+ * The name will appear as part of one or more filenames on disk, and might
* provide clues to administrators about which subsystem is generating
* temporary file data. Since each SharedFileSet object is backed by one or
* more uniquely named temporary directory, names don't conflict with
* unrelated SharedFileSet objects.
*/
BufFile *
-BufFileCreateShared(SharedFileSet *fileset, const char *name)
+BufFileCreateFileSet(FileSet *fileset, const char *name)
{
BufFile *file;
@@ -267,7 +267,7 @@ BufFileCreateShared(SharedFileSet *fileset, const char *name)
file->fileset = fileset;
file->name = pstrdup(name);
file->files = (File *) palloc(sizeof(File));
- file->files[0] = MakeNewSharedSegment(file, 0);
+ file->files[0] = MakeNewFileSetSegment(file, 0);
file->readOnly = false;
return file;
@@ -275,13 +275,13 @@ BufFileCreateShared(SharedFileSet *fileset, const char *name)
/*
* Open a file that was previously created in another backend (or this one)
- * with BufFileCreateShared in the same SharedFileSet using the same name.
+ * with BufFileCreateFileSet in the same FileSet using the same name.
* The backend that created the file must have called BufFileClose() or
- * BufFileExportShared() to make sure that it is ready to be opened by other
+ * BufFileExportFileSet() to make sure that it is ready to be opened by other
* backends and render it read-only.
*/
BufFile *
-BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
+BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
{
BufFile *file;
char segment_name[MAXPGPATH];
@@ -304,8 +304,8 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
files = repalloc(files, sizeof(File) * capacity);
}
/* Try to load a segment. */
- SharedSegmentName(segment_name, name, nfiles);
- files[nfiles] = SharedFileSetOpen(fileset, segment_name, mode);
+ FileSetSegmentName(segment_name, name, nfiles);
+ files[nfiles] = FileSetOpen(fileset, segment_name, mode);
if (files[nfiles] <= 0)
break;
++nfiles;
@@ -333,18 +333,18 @@ BufFileOpenShared(SharedFileSet *fileset, const char *name, int mode)
}
/*
- * Delete a BufFile that was created by BufFileCreateShared in the given
- * SharedFileSet using the given name.
+ * Delete a BufFile that was created by BufFileCreateFileSet in the given
+ * FileSet using the given name.
*
* It is not necessary to delete files explicitly with this function. It is
* provided only as a way to delete files proactively, rather than waiting for
- * the SharedFileSet to be cleaned up.
+ * the FileSet to be cleaned up.
*
* Only one backend should attempt to delete a given name, and should know
* that it exists and has been exported or closed.
*/
void
-BufFileDeleteShared(SharedFileSet *fileset, const char *name)
+BufFileDeleteFileSet(FileSet *fileset, const char *name)
{
char segment_name[MAXPGPATH];
int segment = 0;
@@ -357,8 +357,8 @@ BufFileDeleteShared(SharedFileSet *fileset, const char *name)
*/
for (;;)
{
- SharedSegmentName(segment_name, name, segment);
- if (!SharedFileSetDelete(fileset, segment_name, true))
+ FileSetSegmentName(segment_name, name, segment);
+ if (!FileSetDelete(fileset, segment_name, true))
break;
found = true;
++segment;
@@ -367,16 +367,16 @@ BufFileDeleteShared(SharedFileSet *fileset, const char *name)
}
if (!found)
- elog(ERROR, "could not delete unknown shared BufFile \"%s\"", name);
+ elog(ERROR, "could not delete unknown BufFile \"%s\"", name);
}
/*
- * BufFileExportShared --- flush and make read-only, in preparation for sharing.
+ * BufFileExportFileSet --- flush and make read-only, in preparation for sharing.
*/
void
-BufFileExportShared(BufFile *file)
+BufFileExportFileSet(BufFile *file)
{
- /* Must be a file belonging to a SharedFileSet. */
+ /* Must be a file belonging to a FileSet. */
Assert(file->fileset != NULL);
/* It's probably a bug if someone calls this twice. */
@@ -785,7 +785,7 @@ BufFileTellBlock(BufFile *file)
#endif
/*
- * Return the current shared BufFile size.
+ * Return the current fileset based BufFile size.
*
* Counts any holes left behind by BufFileAppend as part of the size.
* ereport()s on failure.
@@ -811,8 +811,8 @@ BufFileSize(BufFile *file)
}
/*
- * Append the contents of source file (managed within shared fileset) to
- * end of target file (managed within same shared fileset).
+ * Append the contents of source file (managed within fileset) to
+ * end of target file (managed within same fileset).
*
* Note that operation subsumes ownership of underlying resources from
* "source". Caller should never call BufFileClose against source having
@@ -854,11 +854,11 @@ BufFileAppend(BufFile *target, BufFile *source)
}
/*
- * Truncate a BufFile created by BufFileCreateShared up to the given fileno and
- * the offset.
+ * Truncate a BufFile created by BufFileCreateFileSet up to the given fileno
+ * and the offset.
*/
void
-BufFileTruncateShared(BufFile *file, int fileno, off_t offset)
+BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset)
{
int numFiles = file->numFiles;
int newFile = fileno;
@@ -876,12 +876,12 @@ BufFileTruncateShared(BufFile *file, int fileno, off_t offset)
{
if ((i != fileno || offset == 0) && i != 0)
{
- SharedSegmentName(segment_name, file->name, i);
+ FileSetSegmentName(segment_name, file->name, i);
FileClose(file->files[i]);
- if (!SharedFileSetDelete(file->fileset, segment_name, true))
+ if (!FileSetDelete(file->fileset, segment_name, true))
ereport(ERROR,
(errcode_for_file_access(),
- errmsg("could not delete shared fileset \"%s\": %m",
+ errmsg("could not delete fileset \"%s\": %m",
segment_name)));
numFiles--;
newOffset = MAX_PHYSICAL_FILESIZE;
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index b58b399..433e283 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -1921,7 +1921,7 @@ PathNameDeleteTemporaryFile(const char *path, bool error_on_failure)
/*
* Unlike FileClose's automatic file deletion code, we tolerate
- * non-existence to support BufFileDeleteShared which doesn't know how
+ * non-existence to support BufFileDeleteFileSet which doesn't know how
* many segments it has to delete until it runs out.
*/
if (stat_errno == ENOENT)
diff --git a/src/backend/storage/file/fileset.c b/src/backend/storage/file/fileset.c
new file mode 100644
index 0000000..282ff12
--- /dev/null
+++ b/src/backend/storage/file/fileset.c
@@ -0,0 +1,205 @@
+/*-------------------------------------------------------------------------
+ *
+ * fileset.c
+ * Management of named temporary files.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/storage/file/fileset.c
+ *
+ * FileSets provide a temporary namespace (think directory) so that files can
+ * be discovered by name.
+ *
+ * FileSets can be used by backends when the temporary files need to be
+ * opened/closed multiple times and the underlying files need to survive across
+ * transactions.
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <limits.h>
+
+#include "catalog/pg_tablespace.h"
+#include "commands/tablespace.h"
+#include "common/hashfn.h"
+#include "miscadmin.h"
+#include "storage/ipc.h"
+#include "storage/fileset.h"
+#include "utils/builtins.h"
+
+static void FileSetPath(char *path, FileSet *fileset, Oid tablespace);
+static void FilePath(char *path, FileSet *fileset, const char *name);
+static Oid ChooseTablespace(const FileSet *fileset, const char *name);
+
+/*
+ * Initialize a space for temporary files. This API can be used by a shared
+ * fileset, or by a single backend when the temporary files need to be opened
+ * and closed multiple times and the underlying files need to survive across
+ * transactions.
+ *
+ * The callers are expected to explicitly remove such files by using
+ * FileSetDelete/FileSetDeleteAll.
+ *
+ * Files will be distributed over the tablespaces configured in
+ * temp_tablespaces.
+ *
+ * Under the covers the set is one or more directories which will eventually
+ * be deleted.
+ */
+void
+FileSetInit(FileSet *fileset)
+{
+ static uint32 counter = 0;
+
+ fileset->creator_pid = MyProcPid;
+ fileset->number = counter;
+ counter = (counter + 1) % INT_MAX;
+
+ /* Capture the tablespace OIDs so that all backends agree on them. */
+ PrepareTempTablespaces();
+ fileset->ntablespaces =
+ GetTempTablespaces(&fileset->tablespaces[0],
+ lengthof(fileset->tablespaces));
+ if (fileset->ntablespaces == 0)
+ {
+ /* If the GUC is empty, use current database's default tablespace */
+ fileset->tablespaces[0] = MyDatabaseTableSpace;
+ fileset->ntablespaces = 1;
+ }
+ else
+ {
+ int i;
+
+ /*
+ * An entry of InvalidOid means use the default tablespace for the
+ * current database. Replace that now, to be sure that all users of
+ * the FileSet agree on what to do.
+ */
+ for (i = 0; i < fileset->ntablespaces; i++)
+ {
+ if (fileset->tablespaces[i] == InvalidOid)
+ fileset->tablespaces[i] = MyDatabaseTableSpace;
+ }
+ }
+}
+
+/*
+ * Create a new file in the given set.
+ */
+File
+FileSetCreate(FileSet *fileset, const char *name)
+{
+ char path[MAXPGPATH];
+ File file;
+
+ FilePath(path, fileset, name);
+ file = PathNameCreateTemporaryFile(path, false);
+
+ /* If we failed, see if we need to create the directory on demand. */
+ if (file <= 0)
+ {
+ char tempdirpath[MAXPGPATH];
+ char filesetpath[MAXPGPATH];
+ Oid tablespace = ChooseTablespace(fileset, name);
+
+ TempTablespacePath(tempdirpath, tablespace);
+ FileSetPath(filesetpath, fileset, tablespace);
+ PathNameCreateTemporaryDir(tempdirpath, filesetpath);
+ file = PathNameCreateTemporaryFile(path, true);
+ }
+
+ return file;
+}
+
+/*
+ * Open a file that was created with FileSetCreate().
+ */
+File
+FileSetOpen(FileSet *fileset, const char *name, int mode)
+{
+ char path[MAXPGPATH];
+ File file;
+
+ FilePath(path, fileset, name);
+ file = PathNameOpenTemporaryFile(path, mode);
+
+ return file;
+}
+
+/*
+ * Delete a file that was created with FileSetCreate().
+ *
+ * Return true if the file existed, false if it didn't.
+ */
+bool
+FileSetDelete(FileSet *fileset, const char *name,
+ bool error_on_failure)
+{
+ char path[MAXPGPATH];
+
+ FilePath(path, fileset, name);
+
+ return PathNameDeleteTemporaryFile(path, error_on_failure);
+}
+
+/*
+ * Delete all files in the set.
+ */
+void
+FileSetDeleteAll(FileSet *fileset)
+{
+ char dirpath[MAXPGPATH];
+ int i;
+
+ /*
+ * Delete the directory we created in each tablespace. Doesn't fail
+ * because we use this in error cleanup paths, but can generate a LOG
+ * message on IO error.
+ */
+ for (i = 0; i < fileset->ntablespaces; ++i)
+ {
+ FileSetPath(dirpath, fileset, fileset->tablespaces[i]);
+ PathNameDeleteTemporaryDir(dirpath);
+ }
+}
+
+/*
+ * Build the path for the directory holding the files backing a FileSet in a
+ * given tablespace.
+ */
+static void
+FileSetPath(char *path, FileSet *fileset, Oid tablespace)
+{
+ char tempdirpath[MAXPGPATH];
+
+ TempTablespacePath(tempdirpath, tablespace);
+ snprintf(path, MAXPGPATH, "%s/%s%lu.%u.fileset",
+ tempdirpath, PG_TEMP_FILE_PREFIX,
+ (unsigned long) fileset->creator_pid, fileset->number);
+}
+
+/*
+ * Sorting hat to determine which tablespace a given temporary file belongs in.
+ */
+static Oid
+ChooseTablespace(const FileSet *fileset, const char *name)
+{
+ uint32 hash = hash_any((const unsigned char *) name, strlen(name));
+
+ return fileset->tablespaces[hash % fileset->ntablespaces];
+}
+
+/*
+ * Compute the full path of a file in a FileSet.
+ */
+static void
+FilePath(char *path, FileSet *fileset, const char *name)
+{
+ char dirpath[MAXPGPATH];
+
+ FileSetPath(dirpath, fileset, ChooseTablespace(fileset, name));
+ snprintf(path, MAXPGPATH, "%s/%s", dirpath, name);
+}
diff --git a/src/backend/storage/file/sharedfileset.c b/src/backend/storage/file/sharedfileset.c
index ed37c94..6a33fac 100644
--- a/src/backend/storage/file/sharedfileset.c
+++ b/src/backend/storage/file/sharedfileset.c
@@ -13,10 +13,6 @@
* files can be discovered by name, and a shared ownership semantics so that
* shared files survive until the last user detaches.
*
- * SharedFileSets can be used by backends when the temporary files need to be
- * opened/closed multiple times and the underlying files need to survive across
- * transactions.
- *
*-------------------------------------------------------------------------
*/
@@ -33,13 +29,7 @@
#include "storage/sharedfileset.h"
#include "utils/builtins.h"
-static List *filesetlist = NIL;
-
static void SharedFileSetOnDetach(dsm_segment *segment, Datum datum);
-static void SharedFileSetDeleteOnProcExit(int status, Datum arg);
-static void SharedFileSetPath(char *path, SharedFileSet *fileset, Oid tablespace);
-static void SharedFilePath(char *path, SharedFileSet *fileset, const char *name);
-static Oid ChooseTablespace(const SharedFileSet *fileset, const char *name);
/*
* Initialize a space for temporary files that can be opened by other backends.
@@ -47,77 +37,22 @@ static Oid ChooseTablespace(const SharedFileSet *fileset, const char *name);
* SharedFileSet with 'seg'. Any contained files will be deleted when the
* last backend detaches.
*
- * We can also use this interface if the temporary files are used only by
- * single backend but the files need to be opened and closed multiple times
- * and also the underlying files need to survive across transactions. For
- * such cases, dsm segment 'seg' should be passed as NULL. Callers are
- * expected to explicitly remove such files by using SharedFileSetDelete/
- * SharedFileSetDeleteAll or we remove such files on proc exit.
- *
- * Files will be distributed over the tablespaces configured in
- * temp_tablespaces.
- *
* Under the covers the set is one or more directories which will eventually
* be deleted.
*/
void
SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg)
{
- static uint32 counter = 0;
-
+ /* Initialize the shared fileset specific members. */
SpinLockInit(&fileset->mutex);
fileset->refcnt = 1;
- fileset->creator_pid = MyProcPid;
- fileset->number = counter;
- counter = (counter + 1) % INT_MAX;
-
- /* Capture the tablespace OIDs so that all backends agree on them. */
- PrepareTempTablespaces();
- fileset->ntablespaces =
- GetTempTablespaces(&fileset->tablespaces[0],
- lengthof(fileset->tablespaces));
- if (fileset->ntablespaces == 0)
- {
- /* If the GUC is empty, use current database's default tablespace */
- fileset->tablespaces[0] = MyDatabaseTableSpace;
- fileset->ntablespaces = 1;
- }
- else
- {
- int i;
- /*
- * An entry of InvalidOid means use the default tablespace for the
- * current database. Replace that now, to be sure that all users of
- * the SharedFileSet agree on what to do.
- */
- for (i = 0; i < fileset->ntablespaces; i++)
- {
- if (fileset->tablespaces[i] == InvalidOid)
- fileset->tablespaces[i] = MyDatabaseTableSpace;
- }
- }
+ /* Initialize the fileset. */
+ FileSetInit(&fileset->fs);
/* Register our cleanup callback. */
if (seg)
on_dsm_detach(seg, SharedFileSetOnDetach, PointerGetDatum(fileset));
- else
- {
- static bool registered_cleanup = false;
-
- if (!registered_cleanup)
- {
- /*
- * We must not have registered any fileset before registering the
- * fileset clean up.
- */
- Assert(filesetlist == NIL);
- on_proc_exit(SharedFileSetDeleteOnProcExit, 0);
- registered_cleanup = true;
- }
-
- filesetlist = lcons((void *) fileset, filesetlist);
- }
}
/*
@@ -148,86 +83,12 @@ SharedFileSetAttach(SharedFileSet *fileset, dsm_segment *seg)
}
/*
- * Create a new file in the given set.
- */
-File
-SharedFileSetCreate(SharedFileSet *fileset, const char *name)
-{
- char path[MAXPGPATH];
- File file;
-
- SharedFilePath(path, fileset, name);
- file = PathNameCreateTemporaryFile(path, false);
-
- /* If we failed, see if we need to create the directory on demand. */
- if (file <= 0)
- {
- char tempdirpath[MAXPGPATH];
- char filesetpath[MAXPGPATH];
- Oid tablespace = ChooseTablespace(fileset, name);
-
- TempTablespacePath(tempdirpath, tablespace);
- SharedFileSetPath(filesetpath, fileset, tablespace);
- PathNameCreateTemporaryDir(tempdirpath, filesetpath);
- file = PathNameCreateTemporaryFile(path, true);
- }
-
- return file;
-}
-
-/*
- * Open a file that was created with SharedFileSetCreate(), possibly in
- * another backend.
- */
-File
-SharedFileSetOpen(SharedFileSet *fileset, const char *name, int mode)
-{
- char path[MAXPGPATH];
- File file;
-
- SharedFilePath(path, fileset, name);
- file = PathNameOpenTemporaryFile(path, mode);
-
- return file;
-}
-
-/*
- * Delete a file that was created with SharedFileSetCreate().
- * Return true if the file existed, false if didn't.
- */
-bool
-SharedFileSetDelete(SharedFileSet *fileset, const char *name,
- bool error_on_failure)
-{
- char path[MAXPGPATH];
-
- SharedFilePath(path, fileset, name);
-
- return PathNameDeleteTemporaryFile(path, error_on_failure);
-}
-
-/*
* Delete all files in the set.
*/
void
SharedFileSetDeleteAll(SharedFileSet *fileset)
{
- char dirpath[MAXPGPATH];
- int i;
-
- /*
- * Delete the directory we created in each tablespace. Doesn't fail
- * because we use this in error cleanup paths, but can generate LOG
- * message on IO error.
- */
- for (i = 0; i < fileset->ntablespaces; ++i)
- {
- SharedFileSetPath(dirpath, fileset, fileset->tablespaces[i]);
- PathNameDeleteTemporaryDir(dirpath);
- }
-
- /* Unregister the shared fileset */
- SharedFileSetUnregister(fileset);
+ FileSetDeleteAll(&fileset->fs);
}
/*
@@ -255,100 +116,5 @@ SharedFileSetOnDetach(dsm_segment *segment, Datum datum)
* this function so we can safely access its data.
*/
if (unlink_all)
- SharedFileSetDeleteAll(fileset);
-}
-
-/*
- * Callback function that will be invoked on the process exit. This will
- * process the list of all the registered sharedfilesets and delete the
- * underlying files.
- */
-static void
-SharedFileSetDeleteOnProcExit(int status, Datum arg)
-{
- /*
- * Remove all the pending shared fileset entries. We don't use foreach()
- * here because SharedFileSetDeleteAll will remove the current element in
- * filesetlist. Though we have used foreach_delete_current() to remove the
- * element from filesetlist it could only fix up the state of one of the
- * loops, see SharedFileSetUnregister.
- */
- while (list_length(filesetlist) > 0)
- {
- SharedFileSet *fileset = (SharedFileSet *) linitial(filesetlist);
-
- SharedFileSetDeleteAll(fileset);
- }
-
- filesetlist = NIL;
-}
-
-/*
- * Unregister the shared fileset entry registered for cleanup on proc exit.
- */
-void
-SharedFileSetUnregister(SharedFileSet *input_fileset)
-{
- ListCell *l;
-
- /*
- * If the caller is following the dsm based cleanup then we don't maintain
- * the filesetlist so return.
- */
- if (filesetlist == NIL)
- return;
-
- foreach(l, filesetlist)
- {
- SharedFileSet *fileset = (SharedFileSet *) lfirst(l);
-
- /* Remove the entry from the list */
- if (input_fileset == fileset)
- {
- filesetlist = foreach_delete_current(filesetlist, l);
- return;
- }
- }
-
- /* Should have found a match */
- Assert(false);
-}
-
-/*
- * Build the path for the directory holding the files backing a SharedFileSet
- * in a given tablespace.
- */
-static void
-SharedFileSetPath(char *path, SharedFileSet *fileset, Oid tablespace)
-{
- char tempdirpath[MAXPGPATH];
-
- TempTablespacePath(tempdirpath, tablespace);
- snprintf(path, MAXPGPATH, "%s/%s%lu.%u.sharedfileset",
- tempdirpath, PG_TEMP_FILE_PREFIX,
- (unsigned long) fileset->creator_pid, fileset->number);
-}
-
-/*
- * Sorting hat to determine which tablespace a given shared temporary file
- * belongs in.
- */
-static Oid
-ChooseTablespace(const SharedFileSet *fileset, const char *name)
-{
- uint32 hash = hash_any((const unsigned char *) name, strlen(name));
-
- return fileset->tablespaces[hash % fileset->ntablespaces];
-}
-
-/*
- * Compute the full path of a file in a SharedFileSet.
- */
-static void
-SharedFilePath(char *path, SharedFileSet *fileset, const char *name)
-{
- char dirpath[MAXPGPATH];
-
- SharedFileSetPath(dirpath, fileset, ChooseTablespace(fileset, name));
- snprintf(path, MAXPGPATH, "%s/%s", dirpath, name);
+ FileSetDeleteAll(&fileset->fs);
}
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index cafc087..f7994d7 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -564,7 +564,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
lt = <s->tapes[i];
pg_itoa(i, filename);
- file = BufFileOpenShared(fileset, filename, O_RDONLY);
+ file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY);
filesize = BufFileSize(file);
/*
@@ -610,7 +610,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
* offset).
*
* The only thing that currently prevents writing to the leader tape from
- * working is the fact that BufFiles opened using BufFileOpenShared() are
+ * working is the fact that BufFiles opened using BufFileOpenFileSet() are
* read-only by definition, but that could be changed if it seemed
* worthwhile. For now, writing to the leader tape will raise a "Bad file
* descriptor" error, so tuplesort must avoid writing to the leader tape
@@ -722,7 +722,7 @@ LogicalTapeSetCreate(int ntapes, bool preallocate, TapeShare *shared,
char filename[MAXPGPATH];
pg_itoa(worker, filename);
- lts->pfile = BufFileCreateShared(fileset, filename);
+ lts->pfile = BufFileCreateFileSet(&fileset->fs, filename);
}
else
lts->pfile = BufFileCreateTemp(false);
@@ -1096,7 +1096,7 @@ LogicalTapeFreeze(LogicalTapeSet *lts, int tapenum, TapeShare *share)
/* Handle extra steps when caller is to share its tapeset */
if (share)
{
- BufFileExportShared(lts->pfile);
+ BufFileExportFileSet(lts->pfile);
share->firstblocknumber = lt->firstBlockNumber;
}
}
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 57e35db..504ef1c 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -310,7 +310,8 @@ sts_puttuple(SharedTuplestoreAccessor *accessor, void *meta_data,
/* Create one. Only this backend will write into it. */
sts_filename(name, accessor, accessor->participant);
- accessor->write_file = BufFileCreateShared(accessor->fileset, name);
+ accessor->write_file =
+ BufFileCreateFileSet(&accessor->fileset->fs, name);
/* Set up the shared state for this backend's file. */
participant = &accessor->sts->participants[accessor->participant];
@@ -559,7 +560,7 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
sts_filename(name, accessor, accessor->read_participant);
accessor->read_file =
- BufFileOpenShared(accessor->fileset, name, O_RDONLY);
+ BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY);
}
/* Seek and load the chunk header. */
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index 41c7487..a6c9d4e 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -79,6 +79,7 @@ extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
extern void logicalrep_worker_stop(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
+extern void logicalrep_worker_cleanupfileset(void);
extern int logicalrep_sync_worker_count(Oid subid);
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 566523d..143eada 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -26,7 +26,7 @@
#ifndef BUFFILE_H
#define BUFFILE_H
-#include "storage/sharedfileset.h"
+#include "storage/fileset.h"
/* BufFile is an opaque type whose details are not known outside buffile.c. */
@@ -46,11 +46,11 @@ extern int BufFileSeekBlock(BufFile *file, long blknum);
extern int64 BufFileSize(BufFile *file);
extern long BufFileAppend(BufFile *target, BufFile *source);
-extern BufFile *BufFileCreateShared(SharedFileSet *fileset, const char *name);
-extern void BufFileExportShared(BufFile *file);
-extern BufFile *BufFileOpenShared(SharedFileSet *fileset, const char *name,
- int mode);
-extern void BufFileDeleteShared(SharedFileSet *fileset, const char *name);
-extern void BufFileTruncateShared(BufFile *file, int fileno, off_t offset);
+extern BufFile *BufFileCreateFileSet(FileSet *fileset, const char *name);
+extern void BufFileExportFileSet(BufFile *file);
+extern BufFile *BufFileOpenFileSet(FileSet *fileset, const char *name,
+ int mode);
+extern void BufFileDeleteFileSet(FileSet *fileset, const char *name);
+extern void BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset);
#endif /* BUFFILE_H */
diff --git a/src/include/storage/fileset.h b/src/include/storage/fileset.h
new file mode 100644
index 0000000..be0e097
--- /dev/null
+++ b/src/include/storage/fileset.h
@@ -0,0 +1,40 @@
+/*-------------------------------------------------------------------------
+ *
+ * fileset.h
+ * Management of named temporary files.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/fileset.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FILESET_H
+#define FILESET_H
+
+#include "storage/fd.h"
+
+/*
+ * A set of temporary files.
+ */
+typedef struct FileSet
+{
+ pid_t creator_pid; /* PID of the creating process */
+ uint32 number; /* per-PID identifier */
+ int ntablespaces; /* number of tablespaces to use */
+ Oid tablespaces[8]; /* OIDs of tablespaces to use. Assumes that
+ * it's rare that there are more than 8 temp
+ * tablespaces. */
+} FileSet;
+
+extern void FileSetInit(FileSet *fileset);
+extern File FileSetCreate(FileSet *fileset, const char *name);
+extern File FileSetOpen(FileSet *fileset, const char *name,
+ int mode);
+extern bool FileSetDelete(FileSet *fileset, const char *name,
+ bool error_on_failure);
+extern void FileSetDeleteAll(FileSet *fileset);
+
+#endif
diff --git a/src/include/storage/sharedfileset.h b/src/include/storage/sharedfileset.h
index 09ba121..59becfb 100644
--- a/src/include/storage/sharedfileset.h
+++ b/src/include/storage/sharedfileset.h
@@ -17,6 +17,7 @@
#include "storage/dsm.h"
#include "storage/fd.h"
+#include "storage/fileset.h"
#include "storage/spin.h"
/*
@@ -24,24 +25,13 @@
*/
typedef struct SharedFileSet
{
- pid_t creator_pid; /* PID of the creating process */
- uint32 number; /* per-PID identifier */
+ FileSet fs;
slock_t mutex; /* mutex protecting the reference count */
int refcnt; /* number of attached backends */
- int ntablespaces; /* number of tablespaces to use */
- Oid tablespaces[8]; /* OIDs of tablespaces to use. Assumes that
- * it's rare that there more than temp
- * tablespaces. */
} SharedFileSet;
extern void SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg);
extern void SharedFileSetAttach(SharedFileSet *fileset, dsm_segment *seg);
-extern File SharedFileSetCreate(SharedFileSet *fileset, const char *name);
-extern File SharedFileSetOpen(SharedFileSet *fileset, const char *name,
- int mode);
-extern bool SharedFileSetDelete(SharedFileSet *fileset, const char *name,
- bool error_on_failure);
extern void SharedFileSetDeleteAll(SharedFileSet *fileset);
-extern void SharedFileSetUnregister(SharedFileSet *input_fileset);
#endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 621d0cb..f31a1e4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -709,6 +709,7 @@ File
FileFdwExecutionState
FileFdwPlanState
FileNameMap
+FileSet
FileTag
FinalPathExtraData
FindColsContext
--
1.8.3.1
Attachment: v6-0002-Using-fileset-more-effectively-in-the-apply-worke.patch (text/x-patch, US-ASCII)
From 99a71be71a9db8b118890b71aab3797d0904285a Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Fri, 27 Aug 2021 11:49:12 +0530
Subject: [PATCH v6 2/2] Using fileset more effectively in the apply worker
Do not use a separate fileset for each xid; instead use one fileset
for the worker's entire lifetime. Now the changes/subxacts files
for every streaming transaction are created under the same fileset,
and the files are deleted after the transaction is completed.
The fileset remains until the worker exits.
---
src/backend/replication/logical/launcher.c | 2 +-
src/backend/replication/logical/worker.c | 249 ++++++-----------------------
src/backend/storage/file/buffile.c | 23 ++-
src/backend/utils/sort/logtape.c | 2 +-
src/backend/utils/sort/sharedtuplestore.c | 3 +-
src/include/storage/buffile.h | 5 +-
6 files changed, 75 insertions(+), 209 deletions(-)
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 8b1772d..644a9c2 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -648,7 +648,7 @@ logicalrep_worker_onexit(int code, Datum arg)
logicalrep_worker_detach();
- /* Cleanup filesets used for streaming transactions. */
+ /* Cleanup fileset used for streaming transactions. */
logicalrep_worker_cleanupfileset();
ApplyLauncherWakeup();
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index bfb7d1a..c3e4114 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -236,20 +236,6 @@ static ApplyErrorCallbackArg apply_error_callback_arg =
.ts = 0,
};
-/*
- * Stream xid hash entry. Whenever we see a new xid we create this entry in the
- * xidhash and along with it create the streaming file and store the fileset handle.
- * The subxact file is created iff there is any subxact info under this xid. This
- * entry is used on the subsequent streams for the xid to get the corresponding
- * fileset handles, so storing them in hash makes the search faster.
- */
-typedef struct StreamXidHash
-{
- TransactionId xid; /* xid is the hash key and must be first */
- FileSet *stream_fileset; /* file set for stream data */
- FileSet *subxact_fileset; /* file set for subxact info */
-} StreamXidHash;
-
static MemoryContext ApplyMessageContext = NULL;
MemoryContext ApplyContext = NULL;
@@ -270,10 +256,13 @@ static bool in_streamed_transaction = false;
static TransactionId stream_xid = InvalidTransactionId;
/*
- * Hash table for storing the streaming xid information along with filesets
- * for streaming and subxact files.
+ * The fileset is used by the worker to create the changes and subxact files
+ * for the streaming transaction. Upon the arrival of the first streaming
+ * transaction, the fileset will be initialized, and it will be deleted when
+ * the worker exits. Within this fileset, a separate buffile is created for
+ * each transaction and is deleted after the transaction is completed.
*/
-static HTAB *xidhash = NULL;
+static FileSet *stream_fileset = NULL;
/* BufFile handle of the current streaming file */
static BufFile *stream_fd = NULL;
@@ -1118,7 +1107,6 @@ static void
apply_handle_stream_start(StringInfo s)
{
bool first_segment;
- HASHCTL hash_ctl;
if (in_streamed_transaction)
ereport(ERROR,
@@ -1148,17 +1136,20 @@ apply_handle_stream_start(StringInfo s)
set_apply_error_context_xact(stream_xid, 0);
/*
- * Initialize the xidhash table if we haven't yet. This will be used for
+ * Initialize the stream_fileset if we haven't yet. This will be used for
* the entire duration of the apply worker so create it in permanent
* context.
*/
- if (xidhash == NULL)
+ if (stream_fileset == NULL)
{
- hash_ctl.keysize = sizeof(TransactionId);
- hash_ctl.entrysize = sizeof(StreamXidHash);
- hash_ctl.hcxt = ApplyContext;
- xidhash = hash_create("StreamXidHash", 1024, &hash_ctl,
- HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ MemoryContext oldctx;
+
+ oldctx = MemoryContextSwitchTo(ApplyContext);
+
+ stream_fileset = palloc(sizeof(FileSet));
+ FileSetInit(stream_fileset);
+
+ MemoryContextSwitchTo(oldctx);
}
/* open the spool file for this transaction */
@@ -1253,7 +1244,6 @@ apply_handle_stream_abort(StringInfo s)
BufFile *fd;
bool found = false;
char path[MAXPGPATH];
- StreamXidHash *ent;
set_apply_error_context_xact(subxid, 0);
@@ -1285,19 +1275,9 @@ apply_handle_stream_abort(StringInfo s)
return;
}
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
/* open the changes file */
changes_filename(path, MyLogicalRepWorker->subid, xid);
- fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(stream_fileset, path, O_RDWR, false);
/* OK, truncate the file at the right offset */
BufFileTruncateFileSet(fd, subxact_data.subxacts[subidx].fileno,
@@ -1327,7 +1307,6 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
int nchanges;
char path[MAXPGPATH];
char *buffer = NULL;
- StreamXidHash *ent;
MemoryContext oldcxt;
BufFile *fd;
@@ -1345,17 +1324,7 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
changes_filename(path, MyLogicalRepWorker->subid, xid);
elog(DEBUG1, "replaying changes from file \"%s\"", path);
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
- fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(stream_fileset, path, O_RDONLY, false);
buffer = palloc(BLCKSZ);
initStringInfo(&s2);
@@ -2542,27 +2511,14 @@ UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
}
/*
- * Cleanup filesets.
+ * Cleanup fileset.
*/
void
-logicalrep_worker_cleanupfileset(void)
+logicalrep_worker_cleanupfileset()
{
- HASH_SEQ_STATUS status;
- StreamXidHash *hentry;
-
- /* Remove all the pending stream and subxact filesets. */
- if (xidhash)
- {
- hash_seq_init(&status, xidhash);
- while ((hentry = (StreamXidHash *) hash_seq_search(&status)) != NULL)
- {
- FileSetDeleteAll(hentry->stream_fileset);
-
- /* Delete the subxact fileset iff it is created. */
- if (hentry->subxact_fileset)
- FileSetDeleteAll(hentry->subxact_fileset);
- }
- }
+ /* If the fileset is created, clean the underlying files. */
+ if (stream_fileset != NULL)
+ FileSetDeleteAll(stream_fileset);
}
/*
@@ -3026,58 +2982,29 @@ subxact_info_write(Oid subid, TransactionId xid)
{
char path[MAXPGPATH];
Size len;
- StreamXidHash *ent;
BufFile *fd;
Assert(TransactionIdIsValid(xid));
- /* Find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- /* By this time we must have created the transaction entry */
- Assert(ent);
+ /* Get the subxact filename. */
+ subxact_filename(path, subid, xid);
/*
- * If there is no subtransaction then nothing to do, but if already have
- * subxact file then delete that.
+ * If there are no subtransactions, there is nothing to be done, but if
+ * subxacts already exist, delete it.
*/
if (subxact_data.nsubxacts == 0)
{
- if (ent->subxact_fileset)
- {
- cleanup_subxact_info();
- FileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
+ cleanup_subxact_info();
+ BufFileDeleteFileSet(stream_fileset, path, true);
+
return;
}
- subxact_filename(path, subid, xid);
-
- /*
- * Create the subxact file if it not already created, otherwise open the
- * existing file.
- */
- if (ent->subxact_fileset == NULL)
- {
- MemoryContext oldctx;
-
- /*
- * We need to maintain fileset across multiple stream start/stop
- * calls. So, need to allocate it in a persistent context.
- */
- oldctx = MemoryContextSwitchTo(ApplyContext);
- ent->subxact_fileset = palloc(sizeof(FileSet));
- FileSetInit(ent->subxact_fileset);
- MemoryContextSwitchTo(oldctx);
-
- fd = BufFileCreateFileSet(ent->subxact_fileset, path);
- }
- else
- fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDWR);
+ /* Open the subxact file, if it does not exist, create it. */
+ fd = BufFileOpenFileSet(stream_fileset, path, O_RDWR, true);
+ if (fd == NULL)
+ fd = BufFileCreateFileSet(stream_fileset, path);
len = sizeof(SubXactInfo) * subxact_data.nsubxacts;
@@ -3104,34 +3031,20 @@ subxact_info_read(Oid subid, TransactionId xid)
char path[MAXPGPATH];
Size len;
BufFile *fd;
- StreamXidHash *ent;
MemoryContext oldctx;
Assert(!subxact_data.subxacts);
Assert(subxact_data.nsubxacts == 0);
Assert(subxact_data.nsubxacts_max == 0);
- /* Find the stream xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
/*
- * If subxact_fileset is not valid that mean we don't have any subxact
- * info
+ * Open the subxact file for the input streaming xid, just return if the
+ * file does not exist.
*/
- if (ent->subxact_fileset == NULL)
- return;
-
subxact_filename(path, subid, xid);
-
- fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(stream_fileset, path, O_RDONLY, true);
+ if (fd == NULL)
+ return;
/* read number of subxact items */
if (BufFileRead(fd, &subxact_data.nsubxacts,
@@ -3267,42 +3180,20 @@ changes_filename(char *path, Oid subid, TransactionId xid)
* Cleanup files for a subscription / toplevel transaction.
*
* Remove files with serialized changes and subxact info for a particular
- * toplevel transaction. Each subscription has a separate set of files.
+ * toplevel transaction. Each subscription has a separate file.
*/
static void
stream_cleanup_files(Oid subid, TransactionId xid)
{
char path[MAXPGPATH];
- StreamXidHash *ent;
-
- /* Find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
- /* Delete the change file and release the stream fileset memory */
+ /* Delete the changes file. */
changes_filename(path, subid, xid);
- FileSetDeleteAll(ent->stream_fileset);
- pfree(ent->stream_fileset);
- ent->stream_fileset = NULL;
-
- /* Delete the subxact file and release the memory, if it exist */
- if (ent->subxact_fileset)
- {
- subxact_filename(path, subid, xid);
- FileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
+ BufFileDeleteFileSet(stream_fileset, path, false);
- /* Remove the xid entry from the stream xid hash */
- hash_search(xidhash, (void *) &xid, HASH_REMOVE, NULL);
+ /* Delete the subxact file, if it exist. */
+ subxact_filename(path, subid, xid);
+ BufFileDeleteFileSet(stream_fileset, path, true);
}
/*
@@ -3312,8 +3203,8 @@ stream_cleanup_files(Oid subid, TransactionId xid)
*
* Open a file for streamed changes from a toplevel transaction identified
* by stream_xid (global variable). If it's the first chunk of streamed
- * changes for this transaction, initialize the fileset and create the buffile,
- * otherwise open the previously created file.
+ * changes for this transaction, create the buffile, otherwise open the
+ * previously created file.
*
* This can only be called at the beginning of a "streaming" block, i.e.
* between stream_start/stream_stop messages from the upstream.
@@ -3322,20 +3213,13 @@ static void
stream_open_file(Oid subid, TransactionId xid, bool first_segment)
{
char path[MAXPGPATH];
- bool found;
MemoryContext oldcxt;
- StreamXidHash *ent;
Assert(in_streamed_transaction);
Assert(OidIsValid(subid));
Assert(TransactionIdIsValid(xid));
Assert(stream_fd == NULL);
- /* create or find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_ENTER,
- &found);
changes_filename(path, subid, xid);
elog(DEBUG1, "opening file \"%s\" for streamed changes", path);
@@ -3347,49 +3231,18 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
oldcxt = MemoryContextSwitchTo(LogicalStreamingContext);
/*
- * If this is the first streamed segment, the file must not exist, so make
- * sure we're the ones creating it. Otherwise just open the file for
- * writing, in append mode.
+ * If this is the first streamed segment, create the changes file.
+ * Otherwise, just open the file for writing, in append mode.
*/
if (first_segment)
- {
- MemoryContext savectx;
- FileSet *fileset;
-
- if (found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
-
- /*
- * We need to maintain fileset across multiple stream start/stop
- * calls. So, need to allocate it in a persistent context.
- */
- savectx = MemoryContextSwitchTo(ApplyContext);
- fileset = palloc(sizeof(FileSet));
-
- FileSetInit(fileset);
- MemoryContextSwitchTo(savectx);
-
- stream_fd = BufFileCreateFileSet(fileset, path);
-
- /* Remember the fileset for the next stream of the same transaction */
- ent->xid = xid;
- ent->stream_fileset = fileset;
- ent->subxact_fileset = NULL;
- }
+ stream_fd = BufFileCreateFileSet(stream_fileset, path);
else
{
- if (!found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
-
/*
* Open the file and seek to the end of the file because we always
* append the changes file.
*/
- stream_fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
+ stream_fd = BufFileOpenFileSet(stream_fileset, path, O_RDWR, false);
BufFileSeek(stream_fd, 0, 0, SEEK_END);
}
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 5e5409d..d96b25d 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -278,10 +278,12 @@ BufFileCreateFileSet(FileSet *fileset, const char *name)
* with BufFileCreateFileSet in the same FileSet using the same name.
* The backend that created the file must have called BufFileClose() or
* BufFileExportFileSet() to make sure that it is ready to be opened by other
- * backends and render it read-only.
+ * backends and render it read-only. If missing_ok is true, it will return
+ * NULL if the file does not exist otherwise, it will throw an error.
*/
BufFile *
-BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
+BufFileOpenFileSet(FileSet *fileset, const char *name, int mode,
+ bool missing_ok)
{
BufFile *file;
char segment_name[MAXPGPATH];
@@ -318,10 +320,18 @@ BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
* name.
*/
if (nfiles == 0)
+ {
+ /* free the memory */
+ pfree(files);
+
+ if (missing_ok)
+ return NULL;
+
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not open temporary file \"%s\" from BufFile \"%s\": %m",
segment_name, name)));
+ }
file = makeBufFileCommon(nfiles);
file->files = files;
@@ -341,10 +351,11 @@ BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
* the FileSet to be cleaned up.
*
* Only one backend should attempt to delete a given name, and should know
- * that it exists and has been exported or closed.
+ * that it exists and has been exported or closed otherwise missing_ok should
+ * be passed true.
*/
void
-BufFileDeleteFileSet(FileSet *fileset, const char *name)
+BufFileDeleteFileSet(FileSet *fileset, const char *name, bool missing_ok)
{
char segment_name[MAXPGPATH];
int segment = 0;
@@ -358,7 +369,7 @@ BufFileDeleteFileSet(FileSet *fileset, const char *name)
for (;;)
{
FileSetSegmentName(segment_name, name, segment);
- if (!FileSetDelete(fileset, segment_name, true))
+ if (!FileSetDelete(fileset, segment_name, !missing_ok))
break;
found = true;
++segment;
@@ -366,7 +377,7 @@ BufFileDeleteFileSet(FileSet *fileset, const char *name)
CHECK_FOR_INTERRUPTS();
}
- if (!found)
+ if (!found && !missing_ok)
elog(ERROR, "could not delete unknown BufFile \"%s\"", name);
}
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index f7994d7..debf12e1 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -564,7 +564,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
lt = <s->tapes[i];
pg_itoa(i, filename);
- file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY);
+ file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY, false);
filesize = BufFileSize(file);
/*
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 504ef1c..033088f 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -560,7 +560,8 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
sts_filename(name, accessor, accessor->read_participant);
accessor->read_file =
- BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY);
+ BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY,
+ false);
}
/* Seek and load the chunk header. */
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 143eada..7ae5ea2 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -49,8 +49,9 @@ extern long BufFileAppend(BufFile *target, BufFile *source);
extern BufFile *BufFileCreateFileSet(FileSet *fileset, const char *name);
extern void BufFileExportFileSet(BufFile *file);
extern BufFile *BufFileOpenFileSet(FileSet *fileset, const char *name,
- int mode);
-extern void BufFileDeleteFileSet(FileSet *fileset, const char *name);
+ int mode, bool missing_ok);
+extern void BufFileDeleteFileSet(FileSet *fileset, const char *name,
+ bool missing_ok);
extern void BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset);
#endif /* BUFFILE_H */
--
1.8.3.1
On Fri, Aug 27, 2021 at 2:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Aug 26, 2021 at 2:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, Sawada-San and Dilip for confirmation. I would like to commit
this and the second patch (the second one still needs some more
testing and review) for PG-15 as there is no bug per-se related to
this work in PG-14 but I see an argument to commit this for PG-14 to
keep the code (APIs) consistent. What do you think? Does anybody else
have any opinion on this?

IMHO, this is a fair amount of refactoring and this is actually an improvement patch so we should push it to v15.
I think so too.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Fri, Aug 27, 2021 at 3:34 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Aug 27, 2021 at 10:56 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Thu, Aug 26, 2021 2:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

The patch looks good to me, I have rebased 0002 atop
this patch and also done some cosmetic fixes in 0002.

Here are some comments for the 0002 patch.
1)
- * toplevel transaction. Each subscription has a separate set of files.
+ * toplevel transaction. Each subscription has a separate files.

a separate files => a separate file
Done
2)
+ * Otherwise, just open it file for writing, in append mode.
 */

open it file => open the file
Done
3)
 if (subxact_data.nsubxacts == 0)
 {
- if (ent->subxact_fileset)
- {
- cleanup_subxact_info();
- FileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
+ cleanup_subxact_info();
+ BufFileDeleteFileSet(stream_fileset, path, true);
+

Before applying the patch, the code only invokes cleanup_subxact_info() when the
file exists. After applying the patch, it will invoke cleanup_subxact_info()
whether the file exists or not. Is it correct?

I think this is just structure resetting at the end of the stream.
Earlier the hash was telling us whether we have ever dirtied that
structure or not, but now we are not maintaining that hash, so we just
reset it at the end of the stream. I don't think it's any bad; in fact
I think this is much cheaper compared to maintaining the hash.

4)
 /*
- * If this is the first streamed segment, the file must not exist, so make
- * sure we're the ones creating it. Otherwise just open the file for
- * writing, in append mode.
+ * If this is the first streamed segment, create the changes file.
+ * Otherwise, just open it file for writing, in append mode.
  */
 if (first_segment)
- {
...
- if (found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
...
- }
+ stream_fd = BufFileCreateFileSet(stream_fileset, path);

Since the function BufFileCreateFileSet() doesn't check the file's existence,
the change here seems to remove the file existence check which the old code did.

Not really, we were just doing a sanity check of the in-memory hash
entry; now we don't maintain that, so we don't need to do that.
Thank you for updating the patch!
The patch looks good to me except for the below comment:
+ /* Delete the subxact file, if it exist. */
+ subxact_filename(path, subid, xid);
+ BufFileDeleteFileSet(stream_fileset, path, true);
s/it exist/it exists/
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Aug 30, 2021 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Aug 27, 2021 at 2:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Aug 26, 2021 at 2:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, Sawada-San and Dilip for confirmation. I would like to commit
this and the second patch (the second one still needs some more
testing and review) for PG-15 as there is no bug per-se related to
this work in PG-14 but I see an argument to commit this for PG-14 to
keep the code (APIs) consistent. What do you think? Does anybody else
have any opinion on this?

IMHO, this is a fair amount of refactoring and this is actually an improvement patch so we should push it to v15.
I think so too.
I have pushed the first patch in this series.
--
With Regards,
Amit Kapila.
From Mon, Aug 30, 2021 2:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Aug 27, 2021 at 3:34 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Aug 27, 2021 at 10:56 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
On Thu, Aug 26, 2021 2:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
The patch looks good to me, I have rebased 0002 atop this patch
and also done some cosmetic fixes in 0002.

Thank you for updating the patch!
The patch looks good to me except for the below comment:
+ /* Delete the subxact file, if it exist. */
+ subxact_filename(path, subid, xid);
+ BufFileDeleteFileSet(stream_fileset, path, true);

s/it exist/it exists/
Except for Sawada-san's comment, the v6-0002 patch looks good to me.
Best regards,
Hou zj
On Tue, Aug 31, 2021 at 7:39 AM houzj.fnst@fujitsu.com <
houzj.fnst@fujitsu.com> wrote:
From Mon, Aug 30, 2021 2:15 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Fri, Aug 27, 2021 at 3:34 PM Dilip Kumar <dilipbalaut@gmail.com>
wrote:
On Fri, Aug 27, 2021 at 10:56 AM houzj.fnst@fujitsu.com <
houzj.fnst@fujitsu.com> wrote:
On Thu, Aug 26, 2021 2:18 PM Dilip Kumar <dilipbalaut@gmail.com>
wrote:
The patch looks good to me, I have rebased 0002 atop this patch
and also done some cosmetic fixes in 0002.

Thank you for updating the patch!
The patch looks good to me except for the below comment:
+ /* Delete the subxact file, if it exist. */
+ subxact_filename(path, subid, xid);
+ BufFileDeleteFileSet(stream_fileset, path, true);

s/it exist/it exists/
Except for Sawada-san's comment, the v6-0002 patch looks good to me.
Thanks, I will wait for a day to see if there are any other comments on
this, after that I will fix this one issue and post the updated patch.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Aug 27, 2021 at 12:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Few comments on v6-0002*
=========================
1.
-BufFileDeleteFileSet(FileSet *fileset, const char *name)
+BufFileDeleteFileSet(FileSet *fileset, const char *name, bool missing_ok)
{
char segment_name[MAXPGPATH];
int segment = 0;
@@ -358,7 +369,7 @@ BufFileDeleteFileSet(FileSet *fileset, const char *name)
for (;;)
{
FileSetSegmentName(segment_name, name, segment);
- if (!FileSetDelete(fileset, segment_name, true))
+ if (!FileSetDelete(fileset, segment_name, !missing_ok))
I don't think the usage of missing_ok is correct here. If you see
FileSetDelete->PathNameDeleteTemporaryFile, it already tolerates that
the file doesn't exist but gives an error only when it is unable to
link. So, with this missing_ok users (say like worker.c) won't even
get errors when they are not able to remove files whereas I think the
need for the patch is to not get an error when the file doesn't exist.
I think you don't need to change anything in the way we invoke
FileSetDelete.
2.
-static HTAB *xidhash = NULL;
+static FileSet *stream_fileset = NULL;
Can we keep this in LogicalRepWorker and initialize it accordingly?
3.
+ /* Open the subxact file, if it does not exist, create it. */
+ fd = BufFileOpenFileSet(stream_fileset, path, O_RDWR, true);
+ if (fd == NULL)
+ fd = BufFileCreateFileSet(stream_fileset, path);
I think retaining the existing comment: "Create the subxact file if it
not already created, otherwise open the existing file." seems better
here.
4.
/*
- * If there is no subtransaction then nothing to do, but if already have
- * subxact file then delete that.
+ * If there are no subtransactions, there is nothing to be done, but if
+ * subxacts already exist, delete it.
*/
How about changing the above comment to something like: "Delete the
subxacts file, if exists"?
5. Can we slightly change the commit message as:
Optimize fileset usage in apply worker.
Use one fileset for the entire worker lifetime instead of using
separate filesets for each streaming transaction. Now, the
changes/subxacts files for every streaming transaction will be created
under the same fileset and the files will be deleted after the
transaction is completed.
This patch extends the BufFileOpenFileSet and BufFileDeleteFileSet
APIs to allow users to specify whether to give an error on missing
files.
--
With Regards,
Amit Kapila.
On Tue, Aug 31, 2021 at 3:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Aug 27, 2021 at 12:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Few comments on v6-0002*
=========================
1.
-BufFileDeleteFileSet(FileSet *fileset, const char *name)
+BufFileDeleteFileSet(FileSet *fileset, const char *name, bool missing_ok)
{
char segment_name[MAXPGPATH];
int segment = 0;
@@ -358,7 +369,7 @@ BufFileDeleteFileSet(FileSet *fileset, const char *name)
for (;;)
{
FileSetSegmentName(segment_name, name, segment);
- if (!FileSetDelete(fileset, segment_name, true))
+ if (!FileSetDelete(fileset, segment_name, !missing_ok))

I don't think the usage of missing_ok is correct here. If you see
FileSetDelete->PathNameDeleteTemporaryFile, it already tolerates that
the file doesn't exist but gives an error only when it is unable to
link. So, with this missing_ok users (say like worker.c) won't even
get errors when they are not able to remove files whereas I think the
need for the patch is to not get an error when the file doesn't exist.
I think you don't need to change anything in the way we invoke
FileSetDelete.
Right, fixed.
2.
-static HTAB *xidhash = NULL;
+static FileSet *stream_fileset = NULL;

Can we keep this in LogicalRepWorker and initialize it accordingly?
Done
3.
+ /* Open the subxact file, if it does not exist, create it. */
+ fd = BufFileOpenFileSet(stream_fileset, path, O_RDWR, true);
+ if (fd == NULL)
+ fd = BufFileCreateFileSet(stream_fileset, path);

I think retaining the existing comment: "Create the subxact file if it
not already created, otherwise open the existing file." seems better
here.
Done
4.
 /*
- * If there is no subtransaction then nothing to do, but if already have
- * subxact file then delete that.
+ * If there are no subtransactions, there is nothing to be done, but if
+ * subxacts already exist, delete it.
  */

How about changing the above comment to something like: "Delete the
subxacts file, if exists"?
Done
5. Can we slightly change the commit message as:
Optimize fileset usage in apply worker.

Use one fileset for the entire worker lifetime instead of using
separate filesets for each streaming transaction. Now, the
changes/subxacts files for every streaming transaction will be created
under the same fileset and the files will be deleted after the
transaction is completed.

This patch extends the BufFileOpenFileSet and BufFileDeleteFileSet
APIs to allow users to specify whether to give an error on missing
files.
Done
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v7-0001-Optimize-fileset-usage-in-apply-worker.patchtext/x-patch; charset=US-ASCII; name=v7-0001-Optimize-fileset-usage-in-apply-worker.patchDownload
From fade8ae2779e8aeda0d1fee3902a93eaedb0187f Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Fri, 27 Aug 2021 11:49:12 +0530
Subject: [PATCH v7] Optimize fileset usage in apply worker
Use one fileset for the entire worker lifetime instead of using
separate filesets for each streaming transaction. Now, the
changes/subxacts files for every streaming transaction will be
created under the same fileset and the files will be deleted
after the transaction is completed.
This patch extends the BufFileOpenFileSet and BufFileDeleteFileSet
APIs to allow users to specify whether to give an error on missing
files.
---
src/backend/replication/logical/launcher.c | 6 +-
src/backend/replication/logical/worker.c | 249 +++++------------------------
src/backend/storage/file/buffile.c | 21 ++-
src/backend/utils/sort/logtape.c | 2 +-
src/backend/utils/sort/sharedtuplestore.c | 3 +-
src/include/replication/worker_internal.h | 11 +-
src/include/storage/buffile.h | 5 +-
7 files changed, 80 insertions(+), 217 deletions(-)
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 8b1772d..3fb4caa 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -379,6 +379,7 @@ retry:
worker->relid = relid;
worker->relstate = SUBREL_STATE_UNKNOWN;
worker->relstate_lsn = InvalidXLogRecPtr;
+ worker->stream_fileset = NULL;
worker->last_lsn = InvalidXLogRecPtr;
TIMESTAMP_NOBEGIN(worker->last_send_time);
TIMESTAMP_NOBEGIN(worker->last_recv_time);
@@ -648,8 +649,9 @@ logicalrep_worker_onexit(int code, Datum arg)
logicalrep_worker_detach();
- /* Cleanup filesets used for streaming transactions. */
- logicalrep_worker_cleanupfileset();
+ /* Cleanup fileset used for streaming transactions. */
+ if (MyLogicalRepWorker->stream_fileset != NULL)
+ FileSetDeleteAll(MyLogicalRepWorker->stream_fileset);
ApplyLauncherWakeup();
}
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index bfb7d1a..a222cb3 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -236,20 +236,6 @@ static ApplyErrorCallbackArg apply_error_callback_arg =
.ts = 0,
};
-/*
- * Stream xid hash entry. Whenever we see a new xid we create this entry in the
- * xidhash and along with it create the streaming file and store the fileset handle.
- * The subxact file is created iff there is any subxact info under this xid. This
- * entry is used on the subsequent streams for the xid to get the corresponding
- * fileset handles, so storing them in hash makes the search faster.
- */
-typedef struct StreamXidHash
-{
- TransactionId xid; /* xid is the hash key and must be first */
- FileSet *stream_fileset; /* file set for stream data */
- FileSet *subxact_fileset; /* file set for subxact info */
-} StreamXidHash;
-
static MemoryContext ApplyMessageContext = NULL;
MemoryContext ApplyContext = NULL;
@@ -269,12 +255,6 @@ static bool in_streamed_transaction = false;
static TransactionId stream_xid = InvalidTransactionId;
-/*
- * Hash table for storing the streaming xid information along with filesets
- * for streaming and subxact files.
- */
-static HTAB *xidhash = NULL;
-
/* BufFile handle of the current streaming file */
static BufFile *stream_fd = NULL;
@@ -1118,7 +1098,6 @@ static void
apply_handle_stream_start(StringInfo s)
{
bool first_segment;
- HASHCTL hash_ctl;
if (in_streamed_transaction)
ereport(ERROR,
@@ -1148,17 +1127,20 @@ apply_handle_stream_start(StringInfo s)
set_apply_error_context_xact(stream_xid, 0);
/*
- * Initialize the xidhash table if we haven't yet. This will be used for
+ * Initialize the stream_fileset if we haven't yet. This will be used for
* the entire duration of the apply worker so create it in permanent
* context.
*/
- if (xidhash == NULL)
+ if (MyLogicalRepWorker->stream_fileset == NULL)
{
- hash_ctl.keysize = sizeof(TransactionId);
- hash_ctl.entrysize = sizeof(StreamXidHash);
- hash_ctl.hcxt = ApplyContext;
- xidhash = hash_create("StreamXidHash", 1024, &hash_ctl,
- HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ MemoryContext oldctx;
+
+ oldctx = MemoryContextSwitchTo(ApplyContext);
+
+ MyLogicalRepWorker->stream_fileset = palloc(sizeof(FileSet));
+ FileSetInit(MyLogicalRepWorker->stream_fileset);
+
+ MemoryContextSwitchTo(oldctx);
}
/* open the spool file for this transaction */
@@ -1253,7 +1235,6 @@ apply_handle_stream_abort(StringInfo s)
BufFile *fd;
bool found = false;
char path[MAXPGPATH];
- StreamXidHash *ent;
set_apply_error_context_xact(subxid, 0);
@@ -1285,19 +1266,10 @@ apply_handle_stream_abort(StringInfo s)
return;
}
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
/* open the changes file */
changes_filename(path, MyLogicalRepWorker->subid, xid);
- fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path,
+ O_RDWR, false);
/* OK, truncate the file at the right offset */
BufFileTruncateFileSet(fd, subxact_data.subxacts[subidx].fileno,
@@ -1327,7 +1299,6 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
int nchanges;
char path[MAXPGPATH];
char *buffer = NULL;
- StreamXidHash *ent;
MemoryContext oldcxt;
BufFile *fd;
@@ -1345,17 +1316,8 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
changes_filename(path, MyLogicalRepWorker->subid, xid);
elog(DEBUG1, "replaying changes from file \"%s\"", path);
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
- fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path, O_RDONLY,
+ false);
buffer = palloc(BLCKSZ);
initStringInfo(&s2);
@@ -2542,30 +2504,6 @@ UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
}
/*
- * Cleanup filesets.
- */
-void
-logicalrep_worker_cleanupfileset(void)
-{
- HASH_SEQ_STATUS status;
- StreamXidHash *hentry;
-
- /* Remove all the pending stream and subxact filesets. */
- if (xidhash)
- {
- hash_seq_init(&status, xidhash);
- while ((hentry = (StreamXidHash *) hash_seq_search(&status)) != NULL)
- {
- FileSetDeleteAll(hentry->stream_fileset);
-
- /* Delete the subxact fileset iff it is created. */
- if (hentry->subxact_fileset)
- FileSetDeleteAll(hentry->subxact_fileset);
- }
- }
-}
-
-/*
* Apply main loop.
*/
static void
@@ -3026,58 +2964,30 @@ subxact_info_write(Oid subid, TransactionId xid)
{
char path[MAXPGPATH];
Size len;
- StreamXidHash *ent;
BufFile *fd;
Assert(TransactionIdIsValid(xid));
- /* Find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- /* By this time we must have created the transaction entry */
- Assert(ent);
+ /* Get the subxact filename. */
+ subxact_filename(path, subid, xid);
- /*
- * If there is no subtransaction then nothing to do, but if already have
- * subxact file then delete that.
- */
+ /* Delete the subxacts file, if exists. */
if (subxact_data.nsubxacts == 0)
{
- if (ent->subxact_fileset)
- {
- cleanup_subxact_info();
- FileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
+ cleanup_subxact_info();
+ BufFileDeleteFileSet(MyLogicalRepWorker->stream_fileset, path, true);
+
return;
}
- subxact_filename(path, subid, xid);
-
/*
* Create the subxact file if it not already created, otherwise open the
* existing file.
*/
- if (ent->subxact_fileset == NULL)
- {
- MemoryContext oldctx;
-
- /*
- * We need to maintain fileset across multiple stream start/stop
- * calls. So, need to allocate it in a persistent context.
- */
- oldctx = MemoryContextSwitchTo(ApplyContext);
- ent->subxact_fileset = palloc(sizeof(FileSet));
- FileSetInit(ent->subxact_fileset);
- MemoryContextSwitchTo(oldctx);
-
- fd = BufFileCreateFileSet(ent->subxact_fileset, path);
- }
- else
- fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path, O_RDWR,
+ true);
+ if (fd == NULL)
+ fd = BufFileCreateFileSet(MyLogicalRepWorker->stream_fileset, path);
len = sizeof(SubXactInfo) * subxact_data.nsubxacts;
@@ -3104,34 +3014,21 @@ subxact_info_read(Oid subid, TransactionId xid)
char path[MAXPGPATH];
Size len;
BufFile *fd;
- StreamXidHash *ent;
MemoryContext oldctx;
Assert(!subxact_data.subxacts);
Assert(subxact_data.nsubxacts == 0);
Assert(subxact_data.nsubxacts_max == 0);
- /* Find the stream xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
/*
- * If subxact_fileset is not valid that mean we don't have any subxact
- * info
+ * Open the subxact file for the input streaming xid, just return if the
+ * file does not exist.
*/
- if (ent->subxact_fileset == NULL)
- return;
-
subxact_filename(path, subid, xid);
-
- fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path, O_RDONLY,
+ true);
+ if (fd == NULL)
+ return;
/* read number of subxact items */
if (BufFileRead(fd, &subxact_data.nsubxacts,
@@ -3267,42 +3164,20 @@ changes_filename(char *path, Oid subid, TransactionId xid)
* Cleanup files for a subscription / toplevel transaction.
*
* Remove files with serialized changes and subxact info for a particular
- * toplevel transaction. Each subscription has a separate set of files.
+ * toplevel transaction. Each subscription has a separate file.
*/
static void
stream_cleanup_files(Oid subid, TransactionId xid)
{
char path[MAXPGPATH];
- StreamXidHash *ent;
-
- /* Find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
- /* Delete the change file and release the stream fileset memory */
+ /* Delete the changes file. */
changes_filename(path, subid, xid);
- FileSetDeleteAll(ent->stream_fileset);
- pfree(ent->stream_fileset);
- ent->stream_fileset = NULL;
+ BufFileDeleteFileSet(MyLogicalRepWorker->stream_fileset, path, false);
- /* Delete the subxact file and release the memory, if it exist */
- if (ent->subxact_fileset)
- {
- subxact_filename(path, subid, xid);
- FileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
-
- /* Remove the xid entry from the stream xid hash */
- hash_search(xidhash, (void *) &xid, HASH_REMOVE, NULL);
+ /* Delete the subxact file, if it exists. */
+ subxact_filename(path, subid, xid);
+ BufFileDeleteFileSet(MyLogicalRepWorker->stream_fileset, path, true);
}
/*
@@ -3312,8 +3187,8 @@ stream_cleanup_files(Oid subid, TransactionId xid)
*
* Open a file for streamed changes from a toplevel transaction identified
* by stream_xid (global variable). If it's the first chunk of streamed
- * changes for this transaction, initialize the fileset and create the buffile,
- * otherwise open the previously created file.
+ * changes for this transaction, create the buffile, otherwise open the
+ * previously created file.
*
* This can only be called at the beginning of a "streaming" block, i.e.
* between stream_start/stream_stop messages from the upstream.
@@ -3322,20 +3197,13 @@ static void
stream_open_file(Oid subid, TransactionId xid, bool first_segment)
{
char path[MAXPGPATH];
- bool found;
MemoryContext oldcxt;
- StreamXidHash *ent;
Assert(in_streamed_transaction);
Assert(OidIsValid(subid));
Assert(TransactionIdIsValid(xid));
Assert(stream_fd == NULL);
- /* create or find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_ENTER,
- &found);
changes_filename(path, subid, xid);
elog(DEBUG1, "opening file \"%s\" for streamed changes", path);
@@ -3347,49 +3215,20 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
oldcxt = MemoryContextSwitchTo(LogicalStreamingContext);
/*
- * If this is the first streamed segment, the file must not exist, so make
- * sure we're the ones creating it. Otherwise just open the file for
- * writing, in append mode.
+ * If this is the first streamed segment, create the changes file.
+ * Otherwise, just open the file for writing, in append mode.
*/
if (first_segment)
- {
- MemoryContext savectx;
- FileSet *fileset;
-
- if (found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
-
- /*
- * We need to maintain fileset across multiple stream start/stop
- * calls. So, need to allocate it in a persistent context.
- */
- savectx = MemoryContextSwitchTo(ApplyContext);
- fileset = palloc(sizeof(FileSet));
-
- FileSetInit(fileset);
- MemoryContextSwitchTo(savectx);
-
- stream_fd = BufFileCreateFileSet(fileset, path);
-
- /* Remember the fileset for the next stream of the same transaction */
- ent->xid = xid;
- ent->stream_fileset = fileset;
- ent->subxact_fileset = NULL;
- }
+ stream_fd = BufFileCreateFileSet(MyLogicalRepWorker->stream_fileset,
+ path);
else
{
- if (!found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
-
/*
* Open the file and seek to the end of the file because we always
* append the changes file.
*/
- stream_fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
+ stream_fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset,
+ path, O_RDWR, false);
BufFileSeek(stream_fd, 0, 0, SEEK_END);
}
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 5e5409d..5b5a6e1 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -278,10 +278,12 @@ BufFileCreateFileSet(FileSet *fileset, const char *name)
* with BufFileCreateFileSet in the same FileSet using the same name.
* The backend that created the file must have called BufFileClose() or
* BufFileExportFileSet() to make sure that it is ready to be opened by other
- * backends and render it read-only.
+ * backends and render it read-only. If missing_ok is true, it will return
+ * NULL if the file does not exist otherwise, it will throw an error.
*/
BufFile *
-BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
+BufFileOpenFileSet(FileSet *fileset, const char *name, int mode,
+ bool missing_ok)
{
BufFile *file;
char segment_name[MAXPGPATH];
@@ -318,10 +320,18 @@ BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
* name.
*/
if (nfiles == 0)
+ {
+ /* free the memory */
+ pfree(files);
+
+ if (missing_ok)
+ return NULL;
+
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not open temporary file \"%s\" from BufFile \"%s\": %m",
segment_name, name)));
+ }
file = makeBufFileCommon(nfiles);
file->files = files;
@@ -341,10 +351,11 @@ BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
* the FileSet to be cleaned up.
*
* Only one backend should attempt to delete a given name, and should know
- * that it exists and has been exported or closed.
+ * that it exists and has been exported or closed otherwise missing_ok should
+ * be passed true.
*/
void
-BufFileDeleteFileSet(FileSet *fileset, const char *name)
+BufFileDeleteFileSet(FileSet *fileset, const char *name, bool missing_ok)
{
char segment_name[MAXPGPATH];
int segment = 0;
@@ -366,7 +377,7 @@ BufFileDeleteFileSet(FileSet *fileset, const char *name)
CHECK_FOR_INTERRUPTS();
}
- if (!found)
+ if (!found && !missing_ok)
elog(ERROR, "could not delete unknown BufFile \"%s\"", name);
}
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index f7994d7..debf12e1 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -564,7 +564,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
lt = &lts->tapes[i];
pg_itoa(i, filename);
- file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY);
+ file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY, false);
filesize = BufFileSize(file);
/*
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 504ef1c..033088f 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -560,7 +560,8 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
sts_filename(name, accessor, accessor->read_participant);
accessor->read_file =
- BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY);
+ BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY,
+ false);
}
/* Seek and load the chunk header. */
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index a6c9d4e..1a2437a 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -50,6 +50,16 @@ typedef struct LogicalRepWorker
XLogRecPtr relstate_lsn;
slock_t relmutex;
+ /*
+ * The fileset is used by the worker to create the changes and subxact
+ * files for the streaming transaction. Upon the arrival of the first
+ * streaming transaction, the fileset will be initialized, and it will be
+ * deleted when the worker exits. Under this, separate buffiles would be
+ * created for each transaction and would be deleted after the transaction
+ * is completed.
+ */
+ FileSet *stream_fileset;
+
/* Stats. */
XLogRecPtr last_lsn;
TimestampTz last_send_time;
@@ -79,7 +89,6 @@ extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
extern void logicalrep_worker_stop(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
-extern void logicalrep_worker_cleanupfileset(void);
extern int logicalrep_sync_worker_count(Oid subid);
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 143eada..7ae5ea2 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -49,8 +49,9 @@ extern long BufFileAppend(BufFile *target, BufFile *source);
extern BufFile *BufFileCreateFileSet(FileSet *fileset, const char *name);
extern void BufFileExportFileSet(BufFile *file);
extern BufFile *BufFileOpenFileSet(FileSet *fileset, const char *name,
- int mode);
-extern void BufFileDeleteFileSet(FileSet *fileset, const char *name);
+ int mode, bool missing_ok);
+extern void BufFileDeleteFileSet(FileSet *fileset, const char *name,
+ bool missing_ok);
extern void BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset);
#endif /* BUFFILE_H */
--
1.8.3.1
On Wed, Sep 1, 2021 at 1:53 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
The latest patch looks good to me. I have made some changes in the
comments, see attached. I am planning to push this tomorrow unless you
or others have any comments on it.
--
With Regards,
Amit Kapila.
Attachments:
v8-0001-Optimize-fileset-usage-in-apply-worker.patch (application/octet-stream)
From 5c9712f210e966940448792e667c3496f78a642f Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Fri, 27 Aug 2021 11:49:12 +0530
Subject: [PATCH v8] Optimize fileset usage in apply worker.
Use one fileset for the entire worker lifetime instead of using
separate filesets for each streaming transaction. Now, the
changes/subxacts files for every streaming transaction will be
created under the same fileset and the files will be deleted
after the transaction is completed.
This patch extends the BufFileOpenFileSet and BufFileDeleteFileSet
APIs to allow users to specify whether to give an error on missing
files.
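A minimal sketch (not PostgreSQL code) of the "open, else create" pattern that the new missing_ok flag enables, modeled here with plain stdio files as a stand-in. In the real patch, BufFileOpenFileSet(fileset, name, mode, missing_ok) returns NULL instead of raising an error when missing_ok is true and the named BufFile is absent, so subxact_info_write() no longer needs the per-xid hash entry to remember whether the file was ever created. The function names below are illustrative assumptions.

```c
#include <stdio.h>

/*
 * Stand-in for BufFileOpenFileSet(..., missing_ok = true):
 * returns NULL when the file does not exist instead of erroring.
 */
static FILE *
open_missing_ok(const char *path)
{
	return fopen(path, "r+");	/* NULL if the file is absent */
}

/*
 * Stand-in for the subxact_info_write() flow: try to open the existing
 * file first; on NULL, fall back to creating it (the
 * BufFileCreateFileSet() branch).  No separate existence bookkeeping
 * is needed.
 */
static FILE *
open_or_create(const char *path)
{
	FILE	   *fd = open_missing_ok(path);

	if (fd == NULL)
		fd = fopen(path, "w+");	/* create the file */
	return fd;
}
```

The same flag drives BufFileDeleteFileSet(..., missing_ok = true), which silently ignores a missing file rather than raising "could not delete unknown BufFile".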
Author: Dilip Kumar, based on suggestion by Thomas Munro
Reviewed-by: Hou Zhijie, Masahiko Sawada, Amit Kapila
Discussion: https://postgr.es/m/E1mCC6U-0004Ik-Fs@gemulon.postgresql.org
---
src/backend/replication/logical/launcher.c | 6 +-
src/backend/replication/logical/worker.c | 257 ++++++-----------------------
src/backend/storage/file/buffile.c | 22 ++-
src/backend/utils/sort/logtape.c | 2 +-
src/backend/utils/sort/sharedtuplestore.c | 3 +-
src/include/replication/worker_internal.h | 10 +-
src/include/storage/buffile.h | 5 +-
7 files changed, 86 insertions(+), 219 deletions(-)
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 8b1772d..3fb4caa 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -379,6 +379,7 @@ retry:
worker->relid = relid;
worker->relstate = SUBREL_STATE_UNKNOWN;
worker->relstate_lsn = InvalidXLogRecPtr;
+ worker->stream_fileset = NULL;
worker->last_lsn = InvalidXLogRecPtr;
TIMESTAMP_NOBEGIN(worker->last_send_time);
TIMESTAMP_NOBEGIN(worker->last_recv_time);
@@ -648,8 +649,9 @@ logicalrep_worker_onexit(int code, Datum arg)
logicalrep_worker_detach();
- /* Cleanup filesets used for streaming transactions. */
- logicalrep_worker_cleanupfileset();
+ /* Cleanup fileset used for streaming transactions. */
+ if (MyLogicalRepWorker->stream_fileset != NULL)
+ FileSetDeleteAll(MyLogicalRepWorker->stream_fileset);
ApplyLauncherWakeup();
}
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index bfb7d1a..8d96c92 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -236,20 +236,6 @@ static ApplyErrorCallbackArg apply_error_callback_arg =
.ts = 0,
};
-/*
- * Stream xid hash entry. Whenever we see a new xid we create this entry in the
- * xidhash and along with it create the streaming file and store the fileset handle.
- * The subxact file is created iff there is any subxact info under this xid. This
- * entry is used on the subsequent streams for the xid to get the corresponding
- * fileset handles, so storing them in hash makes the search faster.
- */
-typedef struct StreamXidHash
-{
- TransactionId xid; /* xid is the hash key and must be first */
- FileSet *stream_fileset; /* file set for stream data */
- FileSet *subxact_fileset; /* file set for subxact info */
-} StreamXidHash;
-
static MemoryContext ApplyMessageContext = NULL;
MemoryContext ApplyContext = NULL;
@@ -269,12 +255,6 @@ static bool in_streamed_transaction = false;
static TransactionId stream_xid = InvalidTransactionId;
-/*
- * Hash table for storing the streaming xid information along with filesets
- * for streaming and subxact files.
- */
-static HTAB *xidhash = NULL;
-
/* BufFile handle of the current streaming file */
static BufFile *stream_fd = NULL;
@@ -1118,7 +1098,6 @@ static void
apply_handle_stream_start(StringInfo s)
{
bool first_segment;
- HASHCTL hash_ctl;
if (in_streamed_transaction)
ereport(ERROR,
@@ -1148,17 +1127,23 @@ apply_handle_stream_start(StringInfo s)
set_apply_error_context_xact(stream_xid, 0);
/*
- * Initialize the xidhash table if we haven't yet. This will be used for
- * the entire duration of the apply worker so create it in permanent
- * context.
+ * Initialize the worker's stream_fileset if we haven't yet. This will be
+ * used for the entire duration of the worker so create it in a permanent
+ * context. We create this on the very first streaming message from any
+ * transaction and then use it for this and other streaming transactions.
+ * Now, we could create a fileset at the start of the worker as well but
+ * then we won't be sure that it will ever be used.
*/
- if (xidhash == NULL)
+ if (MyLogicalRepWorker->stream_fileset == NULL)
{
- hash_ctl.keysize = sizeof(TransactionId);
- hash_ctl.entrysize = sizeof(StreamXidHash);
- hash_ctl.hcxt = ApplyContext;
- xidhash = hash_create("StreamXidHash", 1024, &hash_ctl,
- HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ MemoryContext oldctx;
+
+ oldctx = MemoryContextSwitchTo(ApplyContext);
+
+ MyLogicalRepWorker->stream_fileset = palloc(sizeof(FileSet));
+ FileSetInit(MyLogicalRepWorker->stream_fileset);
+
+ MemoryContextSwitchTo(oldctx);
}
/* open the spool file for this transaction */
@@ -1253,7 +1238,6 @@ apply_handle_stream_abort(StringInfo s)
BufFile *fd;
bool found = false;
char path[MAXPGPATH];
- StreamXidHash *ent;
set_apply_error_context_xact(subxid, 0);
@@ -1285,19 +1269,10 @@ apply_handle_stream_abort(StringInfo s)
return;
}
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
/* open the changes file */
changes_filename(path, MyLogicalRepWorker->subid, xid);
- fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path,
+ O_RDWR, false);
/* OK, truncate the file at the right offset */
BufFileTruncateFileSet(fd, subxact_data.subxacts[subidx].fileno,
@@ -1327,7 +1302,6 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
int nchanges;
char path[MAXPGPATH];
char *buffer = NULL;
- StreamXidHash *ent;
MemoryContext oldcxt;
BufFile *fd;
@@ -1345,17 +1319,8 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
changes_filename(path, MyLogicalRepWorker->subid, xid);
elog(DEBUG1, "replaying changes from file \"%s\"", path);
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
- fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path, O_RDONLY,
+ false);
buffer = palloc(BLCKSZ);
initStringInfo(&s2);
@@ -2542,30 +2507,6 @@ UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
}
/*
- * Cleanup filesets.
- */
-void
-logicalrep_worker_cleanupfileset(void)
-{
- HASH_SEQ_STATUS status;
- StreamXidHash *hentry;
-
- /* Remove all the pending stream and subxact filesets. */
- if (xidhash)
- {
- hash_seq_init(&status, xidhash);
- while ((hentry = (StreamXidHash *) hash_seq_search(&status)) != NULL)
- {
- FileSetDeleteAll(hentry->stream_fileset);
-
- /* Delete the subxact fileset iff it is created. */
- if (hentry->subxact_fileset)
- FileSetDeleteAll(hentry->subxact_fileset);
- }
- }
-}
-
-/*
* Apply main loop.
*/
static void
@@ -3026,58 +2967,30 @@ subxact_info_write(Oid subid, TransactionId xid)
{
char path[MAXPGPATH];
Size len;
- StreamXidHash *ent;
BufFile *fd;
Assert(TransactionIdIsValid(xid));
- /* Find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- /* By this time we must have created the transaction entry */
- Assert(ent);
+ /* construct the subxact filename */
+ subxact_filename(path, subid, xid);
- /*
- * If there is no subtransaction then nothing to do, but if already have
- * subxact file then delete that.
- */
+ /* Delete the subxacts file, if exists. */
if (subxact_data.nsubxacts == 0)
{
- if (ent->subxact_fileset)
- {
- cleanup_subxact_info();
- FileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
+ cleanup_subxact_info();
+ BufFileDeleteFileSet(MyLogicalRepWorker->stream_fileset, path, true);
+
return;
}
- subxact_filename(path, subid, xid);
-
/*
* Create the subxact file if it not already created, otherwise open the
* existing file.
*/
- if (ent->subxact_fileset == NULL)
- {
- MemoryContext oldctx;
-
- /*
- * We need to maintain fileset across multiple stream start/stop
- * calls. So, need to allocate it in a persistent context.
- */
- oldctx = MemoryContextSwitchTo(ApplyContext);
- ent->subxact_fileset = palloc(sizeof(FileSet));
- FileSetInit(ent->subxact_fileset);
- MemoryContextSwitchTo(oldctx);
-
- fd = BufFileCreateFileSet(ent->subxact_fileset, path);
- }
- else
- fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDWR);
+ fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path, O_RDWR,
+ true);
+ if (fd == NULL)
+ fd = BufFileCreateFileSet(MyLogicalRepWorker->stream_fileset, path);
len = sizeof(SubXactInfo) * subxact_data.nsubxacts;
@@ -3104,34 +3017,21 @@ subxact_info_read(Oid subid, TransactionId xid)
char path[MAXPGPATH];
Size len;
BufFile *fd;
- StreamXidHash *ent;
MemoryContext oldctx;
Assert(!subxact_data.subxacts);
Assert(subxact_data.nsubxacts == 0);
Assert(subxact_data.nsubxacts_max == 0);
- /* Find the stream xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
-
/*
- * If subxact_fileset is not valid that mean we don't have any subxact
- * info
+ * If the subxact file doesn't exist that means we don't have any subxact
+ * info.
*/
- if (ent->subxact_fileset == NULL)
- return;
-
subxact_filename(path, subid, xid);
-
- fd = BufFileOpenFileSet(ent->subxact_fileset, path, O_RDONLY);
+ fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path, O_RDONLY,
+ true);
+ if (fd == NULL)
+ return;
/* read number of subxact items */
if (BufFileRead(fd, &subxact_data.nsubxacts,
@@ -3267,42 +3167,21 @@ changes_filename(char *path, Oid subid, TransactionId xid)
* Cleanup files for a subscription / toplevel transaction.
*
* Remove files with serialized changes and subxact info for a particular
- * toplevel transaction. Each subscription has a separate set of files.
+ * toplevel transaction. Each subscription has a separate set of files
+ * for any toplevel transaction.
*/
static void
stream_cleanup_files(Oid subid, TransactionId xid)
{
char path[MAXPGPATH];
- StreamXidHash *ent;
-
- /* Find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_FIND,
- NULL);
- if (!ent)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("transaction %u not found in stream XID hash table",
- xid)));
- /* Delete the change file and release the stream fileset memory */
+ /* Delete the changes file. */
changes_filename(path, subid, xid);
- FileSetDeleteAll(ent->stream_fileset);
- pfree(ent->stream_fileset);
- ent->stream_fileset = NULL;
+ BufFileDeleteFileSet(MyLogicalRepWorker->stream_fileset, path, false);
- /* Delete the subxact file and release the memory, if it exist */
- if (ent->subxact_fileset)
- {
- subxact_filename(path, subid, xid);
- FileSetDeleteAll(ent->subxact_fileset);
- pfree(ent->subxact_fileset);
- ent->subxact_fileset = NULL;
- }
-
- /* Remove the xid entry from the stream xid hash */
- hash_search(xidhash, (void *) &xid, HASH_REMOVE, NULL);
+ /* Delete the subxact file, if it exists. */
+ subxact_filename(path, subid, xid);
+ BufFileDeleteFileSet(MyLogicalRepWorker->stream_fileset, path, true);
}
/*
@@ -3312,8 +3191,8 @@ stream_cleanup_files(Oid subid, TransactionId xid)
*
* Open a file for streamed changes from a toplevel transaction identified
* by stream_xid (global variable). If it's the first chunk of streamed
- * changes for this transaction, initialize the fileset and create the buffile,
- * otherwise open the previously created file.
+ * changes for this transaction, create the buffile, otherwise open the
+ * previously created file.
*
* This can only be called at the beginning of a "streaming" block, i.e.
* between stream_start/stream_stop messages from the upstream.
@@ -3322,20 +3201,13 @@ static void
stream_open_file(Oid subid, TransactionId xid, bool first_segment)
{
char path[MAXPGPATH];
- bool found;
MemoryContext oldcxt;
- StreamXidHash *ent;
Assert(in_streamed_transaction);
Assert(OidIsValid(subid));
Assert(TransactionIdIsValid(xid));
Assert(stream_fd == NULL);
- /* create or find the xid entry in the xidhash */
- ent = (StreamXidHash *) hash_search(xidhash,
- (void *) &xid,
- HASH_ENTER,
- &found);
changes_filename(path, subid, xid);
elog(DEBUG1, "opening file \"%s\" for streamed changes", path);
@@ -3347,49 +3219,20 @@ stream_open_file(Oid subid, TransactionId xid, bool first_segment)
oldcxt = MemoryContextSwitchTo(LogicalStreamingContext);
/*
- * If this is the first streamed segment, the file must not exist, so make
- * sure we're the ones creating it. Otherwise just open the file for
- * writing, in append mode.
+ * If this is the first streamed segment, create the changes file.
+ * Otherwise, just open the file for writing, in append mode.
*/
if (first_segment)
- {
- MemoryContext savectx;
- FileSet *fileset;
-
- if (found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
-
- /*
- * We need to maintain fileset across multiple stream start/stop
- * calls. So, need to allocate it in a persistent context.
- */
- savectx = MemoryContextSwitchTo(ApplyContext);
- fileset = palloc(sizeof(FileSet));
-
- FileSetInit(fileset);
- MemoryContextSwitchTo(savectx);
-
- stream_fd = BufFileCreateFileSet(fileset, path);
-
- /* Remember the fileset for the next stream of the same transaction */
- ent->xid = xid;
- ent->stream_fileset = fileset;
- ent->subxact_fileset = NULL;
- }
+ stream_fd = BufFileCreateFileSet(MyLogicalRepWorker->stream_fileset,
+ path);
else
{
- if (!found)
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg_internal("incorrect first-segment flag for streamed replication transaction")));
-
/*
* Open the file and seek to the end of the file because we always
* append the changes file.
*/
- stream_fd = BufFileOpenFileSet(ent->stream_fileset, path, O_RDWR);
+ stream_fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset,
+ path, O_RDWR, false);
BufFileSeek(stream_fd, 0, 0, SEEK_END);
}
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 5e5409d..ff3aa67 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -278,10 +278,13 @@ BufFileCreateFileSet(FileSet *fileset, const char *name)
* with BufFileCreateFileSet in the same FileSet using the same name.
* The backend that created the file must have called BufFileClose() or
* BufFileExportFileSet() to make sure that it is ready to be opened by other
- * backends and render it read-only.
+ * backends and render it read-only. If missing_ok is true, which indicates
+ * that missing files can be safely ignored, then return NULL if the BufFile
+ * with the given name is not found, otherwise, throw an error.
*/
BufFile *
-BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
+BufFileOpenFileSet(FileSet *fileset, const char *name, int mode,
+ bool missing_ok)
{
BufFile *file;
char segment_name[MAXPGPATH];
@@ -318,10 +321,18 @@ BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
* name.
*/
if (nfiles == 0)
+ {
+ /* free the memory */
+ pfree(files);
+
+ if (missing_ok)
+ return NULL;
+
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not open temporary file \"%s\" from BufFile \"%s\": %m",
segment_name, name)));
+ }
file = makeBufFileCommon(nfiles);
file->files = files;
@@ -341,10 +352,11 @@ BufFileOpenFileSet(FileSet *fileset, const char *name, int mode)
* the FileSet to be cleaned up.
*
* Only one backend should attempt to delete a given name, and should know
- * that it exists and has been exported or closed.
+ * that it exists and has been exported or closed otherwise missing_ok should
+ * be passed true.
*/
void
-BufFileDeleteFileSet(FileSet *fileset, const char *name)
+BufFileDeleteFileSet(FileSet *fileset, const char *name, bool missing_ok)
{
char segment_name[MAXPGPATH];
int segment = 0;
@@ -366,7 +378,7 @@ BufFileDeleteFileSet(FileSet *fileset, const char *name)
CHECK_FOR_INTERRUPTS();
}
- if (!found)
+ if (!found && !missing_ok)
elog(ERROR, "could not delete unknown BufFile \"%s\"", name);
}
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index f7994d7..debf12e1 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -564,7 +564,7 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
lt = &lts->tapes[i];
pg_itoa(i, filename);
- file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY);
+ file = BufFileOpenFileSet(&fileset->fs, filename, O_RDONLY, false);
filesize = BufFileSize(file);
/*
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 504ef1c..033088f 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -560,7 +560,8 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
sts_filename(name, accessor, accessor->read_participant);
accessor->read_file =
- BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY);
+ BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY,
+ false);
}
/* Seek and load the chunk header. */
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index a6c9d4e..c00be2a 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -50,6 +50,15 @@ typedef struct LogicalRepWorker
XLogRecPtr relstate_lsn;
slock_t relmutex;
+ /*
+ * Used to create the changes and subxact files for the streaming
+ * transactions. Upon the arrival of the first streaming transaction, the
+ * fileset will be initialized, and it will be deleted when the worker
+ * exits. Under this, separate buffiles would be created for each
+ * transaction which will be deleted after the transaction is finished.
+ */
+ FileSet *stream_fileset;
+
/* Stats. */
XLogRecPtr last_lsn;
TimestampTz last_send_time;
@@ -79,7 +88,6 @@ extern void logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname,
extern void logicalrep_worker_stop(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
-extern void logicalrep_worker_cleanupfileset(void);
extern int logicalrep_sync_worker_count(Oid subid);
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 143eada..7ae5ea2 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -49,8 +49,9 @@ extern long BufFileAppend(BufFile *target, BufFile *source);
extern BufFile *BufFileCreateFileSet(FileSet *fileset, const char *name);
extern void BufFileExportFileSet(BufFile *file);
extern BufFile *BufFileOpenFileSet(FileSet *fileset, const char *name,
- int mode);
-extern void BufFileDeleteFileSet(FileSet *fileset, const char *name);
+ int mode, bool missing_ok);
+extern void BufFileDeleteFileSet(FileSet *fileset, const char *name,
+ bool missing_ok);
extern void BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset);
#endif /* BUFFILE_H */
--
1.8.3.1
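The patch above makes file names, rather than per-transaction filesets, the unit of separation: one FileSet lives for the whole worker, and each toplevel transaction's changes and subxact files are distinguished by (subid, xid) in their names. A sketch of that naming scheme, with the "%u-%u" format taken from the patch's changes_filename()/subxact_filename() but simplified into a standalone program:

```c
#include <stdio.h>
#include <string.h>

typedef unsigned int Oid;
typedef unsigned int TransactionId;

/*
 * One "changes" file and one "subxacts" file per toplevel xid, all
 * living in the worker's single stream_fileset; uniqueness comes from
 * the (subid, xid) pair embedded in the name.
 */
static void
changes_filename(char *path, size_t len, Oid subid, TransactionId xid)
{
	snprintf(path, len, "%u-%u.changes", subid, xid);
}

static void
subxact_filename(char *path, size_t len, Oid subid, TransactionId xid)
{
	snprintf(path, len, "%u-%u.subxacts", subid, xid);
}
```

Because the names are unique per transaction, stream_cleanup_files() can delete just that transaction's two files after commit or abort, while logicalrep_worker_onexit() drops the entire fileset in one FileSetDeleteAll() call.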
On Wed, Sep 1, 2021 at 5:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 1, 2021 at 1:53 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
The latest patch looks good to me. I have made some changes in the
comments, see attached. I am planning to push this tomorrow unless you
or others have any comments on it.
These comment changes look good to me.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Sep 1, 2021 at 5:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Sep 1, 2021 at 5:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 1, 2021 at 1:53 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
The latest patch looks good to me. I have made some changes in the
comments, see attached. I am planning to push this tomorrow unless you
or others have any comments on it.
These comment changes look good to me.
Pushed.
--
With Regards,
Amit Kapila.