memory leak in pgoutput

Started by by Yang, about 1 year ago, 10 messages
#1 by Yang
mobile.yang@outlook.com
1 attachment(s)

Hello,

I recently noticed an unusually large memory consumption in pgoutput, where
"RelationSyncCache" is maintaining approximately 3 GB of memory while
handling 15,000 tables.

Upon investigating the cause, I found that the "tts_tupleDescriptor" of both
"old_slot" and "new_slot" is not freed before the entry is reused. The
refcount of the tuple descriptor is initialized to -1 in init_tuple_slot(),
which means it is not reference-counted. So, when cleaning up the outdated
slots, ExecDropSingleTupleTableSlot() skips freeing the tuple descriptors.
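To make the mechanism concrete, here is a minimal self-contained sketch. The Demo* types and functions are hypothetical stand-ins, not the real PostgreSQL structs; only the "refcount -1 means not reference-counted, so the drop skips the free" behaviour described above is modeled:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for PostgreSQL's TupleDesc and
 * TupleTableSlot; only the refcount behaviour matters here. */
typedef struct DemoTupleDesc
{
	int			tdrefcount;		/* -1 means "not reference-counted" */
} DemoTupleDesc;

typedef struct DemoSlot
{
	DemoTupleDesc *tts_tupleDescriptor;
} DemoSlot;

static bool desc_was_freed = false;

/* Mimics the relevant part of ExecDropSingleTupleTableSlot(): the
 * descriptor is only released when it is reference-counted. */
static void
demo_drop_slot(DemoSlot *slot)
{
	if (slot->tts_tupleDescriptor->tdrefcount >= 0)
	{
		free(slot->tts_tupleDescriptor);
		desc_was_freed = true;
	}
	free(slot);
}

/* Returns true if the drop released the descriptor.  With
 * tdrefcount == -1 (as set by init_tuple_slot()) it does not,
 * so the descriptor leaks every time an entry is invalidated. */
static bool
demo_descriptor_released(void)
{
	DemoTupleDesc *desc = malloc(sizeof(DemoTupleDesc));
	DemoSlot   *slot = malloc(sizeof(DemoSlot));

	desc->tdrefcount = -1;
	slot->tts_tupleDescriptor = desc;

	desc_was_freed = false;
	demo_drop_slot(slot);
	return desc_was_freed;		/* false: this is the leak */
}
```

With 15,000 tables each invalidation cycle leaks one descriptor per slot, which is how the cache grows to gigabytes.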

To address this issue, calling FreeTupleDesc() to release the tuple descriptor
before ExecDropSingleTupleTableSlot() might be a suitable solution. However,
I am uncertain whether this approach is appropriate.

Reproduction
============
It's easy to reproduce the issue with sysbench:

1. Create empty tables on the primary.
```
sysbench oltp_read_only --db-driver=pgsql \
--pgsql-port=5432 \
--pgsql-db=postgres \
--pgsql-user=postgres \
--tables=15000 --table_size=0 \
--report-interval=1 \
--threads=10 prepare
```

2. Create a standby and promote it.

3. Create publication on primary and subscription on standby.
```
CREATE PUBLICATION pub1 FOR ALL TABLES;
CREATE SUBSCRIPTION sub1
CONNECTION 'port=5432 user=postgres dbname=postgres'
PUBLICATION pub1
WITH (enabled, create_slot, slot_name='pub1_to_sub1', copy_data=false);
```

4. Bench on primary.
```
sysbench oltp_write_only --db-driver=pgsql \
--pgsql-port=5432 \
--pgsql-db=postgres \
--pgsql-user=postgres \
--tables=15000 --table_size=100 \
--report-interval=1 \
--threads=10 run \
--time=180
```

Tested on PostgreSQL 17, the memory consumption of the walsender is 804 MB after
the benchmark, and it decreases to 441 MB after applying the patch.

Thanks for your feedback.

Kind Regards,
Boyu Yang

Attachments:

0001-Fix-memory-leak-in-pgoutput.patch (application/octet-stream)
From a53c9c5a8d9e44998d0d7f1ca7b293ae793bb051 Mon Sep 17 00:00:00 2001
From: "yangboyu.yby" <yangboyu.yby@alibaba-inc.com>
Date: Sat, 16 Nov 2024 16:26:02 +0800
Subject: [PATCH] Fix memory leak in pgoutput

---
 src/backend/replication/pgoutput/pgoutput.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 00e7024563..b0a5884dd4 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -2055,9 +2055,19 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 		 * Tuple slots cleanups. (Will be rebuilt later if needed).
 		 */
 		if (entry->old_slot)
+		{
+			Assert(entry->old_slot->tts_tupleDescriptor->tdrefcount == -1);
+
+			FreeTupleDesc(entry->old_slot->tts_tupleDescriptor);
 			ExecDropSingleTupleTableSlot(entry->old_slot);
+		}
 		if (entry->new_slot)
+		{
+			Assert(entry->new_slot->tts_tupleDescriptor->tdrefcount == -1);
+
+			FreeTupleDesc(entry->new_slot->tts_tupleDescriptor);
 			ExecDropSingleTupleTableSlot(entry->new_slot);
+		}
 
 		entry->old_slot = NULL;
 		entry->new_slot = NULL;
-- 
2.32.0.3.g01195cf9f

#2 Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: by Yang (#1)
RE: memory leak in pgoutput

On Saturday, November 16, 2024 4:37 PM by Yang <mobile.yang@outlook.com> wrote:

> I recently noticed an unusually large memory consumption in pgoutput, where
> "RelationSyncCache" is maintaining approximately 3 GB of memory while
> handling 15,000 tables.
>
> Upon investigating the cause, I found that the "tts_tupleDescriptor" of both
> "old_slot" and "new_slot" is not freed before the entry is reused. The
> refcount of the tuple descriptor is initialized to -1 in init_tuple_slot(),
> which means it is not reference-counted. So, when cleaning up the outdated
> slots, ExecDropSingleTupleTableSlot() skips freeing the tuple descriptors.

Thanks for reporting the issue. I also confirmed that the bug exists and
your analysis is correct.

> To address this issue, calling FreeTupleDesc() to release the tuple descriptor
> before ExecDropSingleTupleTableSlot() might be a suitable solution. However,
> I am uncertain whether this approach is appropriate.

I think the proposed change is reasonable.

I considered another approach, which is to mark the tupledesc reference-counted
instead. But to make that work, we lack a global resource owner, which is
required by IncrTupleDescRefCount/DecrTupleDescRefCount. (pgoutput doesn't
create its own resowner; only the top transaction's resowner is available.)
Also, pgoutput doesn't reference the tupledesc in other places, so it doesn't
seem useful to mark them reference-counted.

But I think there is an issue in the attached patch:

+ FreeTupleDesc(entry->old_slot->tts_tupleDescriptor);
ExecDropSingleTupleTableSlot(entry->old_slot);

Here, after freeing the tupledesc, ExecDropSingleTupleTableSlot() will still
access the freed tupledesc->tdrefcount, which is an illegal memory access.

I think we can do something like below instead:

+			TupleDesc	desc = entry->old_slot->tts_tupleDescriptor;
+
+			Assert(desc->tdrefcount == -1);
+
 			ExecDropSingleTupleTableSlot(entry->old_slot);
+			FreeTupleDesc(desc);
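The ordering concern above can be illustrated with a self-contained sketch of the save-drop-free pattern. The Demo* names are hypothetical stand-ins, not PostgreSQL code; the point is that the drop routine still reads the descriptor, so the descriptor must outlive the drop:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; only the pointer lifetimes are modeled. */
typedef struct DemoTupleDesc
{
	int			tdrefcount;
} DemoTupleDesc;

typedef struct DemoSlot
{
	DemoTupleDesc *tts_tupleDescriptor;
} DemoSlot;

/* Like ExecDropSingleTupleTableSlot(), this reads the descriptor's
 * refcount, so the descriptor must still be valid when it runs. */
static void
demo_drop_slot(DemoSlot *slot)
{
	if (slot->tts_tupleDescriptor != NULL &&
		slot->tts_tupleDescriptor->tdrefcount >= 0)
		free(slot->tts_tupleDescriptor);
	free(slot);
}

/* Save the descriptor pointer first, drop the slot (which still looks
 * at the descriptor), and only then free the descriptor.  Freeing it
 * before the drop would make the drop dereference freed memory. */
static int
demo_safe_cleanup(void)
{
	DemoSlot   *slot = malloc(sizeof(DemoSlot));
	DemoTupleDesc *desc = malloc(sizeof(DemoTupleDesc));
	DemoTupleDesc *saved;

	desc->tdrefcount = -1;
	slot->tts_tupleDescriptor = desc;

	saved = slot->tts_tupleDescriptor;	/* save before the slot goes away */
	demo_drop_slot(slot);				/* reads the still-valid descriptor */
	free(saved);						/* only now release the descriptor */
	return 0;
}
```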

Best Regards,
Hou zj

#3 Michael Paquier
michael@paquier.xyz
In reply to: Zhijie Hou (Fujitsu) (#2)
Re: memory leak in pgoutput

On Mon, Nov 18, 2024 at 02:53:52AM +0000, Zhijie Hou (Fujitsu) wrote:

> But I think there is an issue in the attached patch:
>
> + FreeTupleDesc(entry->old_slot->tts_tupleDescriptor);
> ExecDropSingleTupleTableSlot(entry->old_slot);
>
> Here, after freeing the tupledesc, ExecDropSingleTupleTableSlot() will still
> access the freed tupledesc->tdrefcount, which is an illegal memory access.
>
> I think we can do something like below instead:
>
> +			TupleDesc	desc = entry->old_slot->tts_tupleDescriptor;
> +
> +			Assert(desc->tdrefcount == -1);
> +
>  			ExecDropSingleTupleTableSlot(entry->old_slot);
> +			FreeTupleDesc(desc);

Yep, obviously.

I was first surprised that the DecrTupleDescRefCount() done in
ExecDropSingleTupleTableSlot() would not be enough to free the
TupleDesc.

We do some allocations for dynamically-allocated resources, like what
pgoutput does, in the SRF code, if I recall correctly, as these need to
survive across multiple calls and the query-level memory context would
not do this cleanup.
--
Michael

#4 by Yang
mobile.yang@outlook.com
In reply to: Zhijie Hou (Fujitsu) (#2)
Re: memory leak in pgoutput

> Here, after freeing the tupledesc, ExecDropSingleTupleTableSlot() will still
> access the freed tupledesc->tdrefcount, which is an illegal memory access.

Yes, I overlooked that.

> I think we can do something like below instead:
>
> +			TupleDesc	desc = entry->old_slot->tts_tupleDescriptor;
> +
> +			Assert(desc->tdrefcount == -1);
> +
>  			ExecDropSingleTupleTableSlot(entry->old_slot);
> +			FreeTupleDesc(desc);

It seems a bit odd because "entry->old_slot->tts_tupleDescriptor" is accessed
after "entry->old_slot" has been freed. I think we can avoid this by assigning
"desc" to NULL before ExecDropSingleTupleTableSlot().

```
+			TupleDesc	desc = entry->old_slot->tts_tupleDescriptor;
+
+			Assert(desc->tdrefcount == -1);
+
+			FreeTupleDesc(desc);
+			desc = NULL;
 			ExecDropSingleTupleTableSlot(entry->old_slot);
```

By the way, this issue was introduced in 52e4f0cd472d39d. Therefore, we may need
to backport the patch to v15.

Best Regards,
Boyu Yang

#5 Michael Paquier
michael@paquier.xyz
In reply to: by Yang (#4)
Re: memory leak in pgoutput

On Mon, Nov 18, 2024 at 07:00:57AM +0000, by Yang wrote:

> By the way, this issue was introduced in 52e4f0cd472d39d. Therefore, we may need
> to backport the patch to v15.

Yes. Note that nothing can happen on stable branches for a few days
as a release is planned for this week. See here:
/messages/by-id/cd54868c-3162-41f9-bba6-16e6b1449ff3@postgresql.org
--
Michael

#6 Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#5)
Re: memory leak in pgoutput

On Mon, Nov 18, 2024 at 04:58:38PM +0900, Michael Paquier wrote:

> On Mon, Nov 18, 2024 at 07:00:57AM +0000, by Yang wrote:
>
>> By the way, this issue was introduced in 52e4f0cd472d39d. Therefore, we may need
>> to backport the patch to v15.
>
> Yes. Note that nothing can happen on stable branches for a few days
> as a release is planned for this week. See here:
> /messages/by-id/cd54868c-3162-41f9-bba6-16e6b1449ff3@postgresql.org

By the way, if possible, could you send an updated version of the
patch to show what you have in mind?
--
Michael

#7 by Yang
mobile.yang@outlook.com
In reply to: Michael Paquier (#6)
1 attachment(s)
Re: memory leak in pgoutput

> By the way, if possible, could you send an updated version of the
> patch to show what you have in mind?

Yeah, here is the new patch:

I have verified that this patch works for REL_[15-17]_STABLE and master.
The memory consumption of the "logical replication cache context" remains
consistently at 112 MB during the benchmark mentioned above:

> sysbench oltp_write_only --db-driver=pgsql \
> --pgsql-port=5432 \
> --pgsql-db=postgres \
> --pgsql-user=postgres \
> --tables=15000 --table_size=100 \
> --report-interval=1 \
> --threads=10 run \
> --time=180

Attachments:

v2-0001-Fix-memory-leak-in-pgoutput.patch (application/octet-stream)
From 3e6d9d1abf18fb36805eac25f765b3cd388b44f6 Mon Sep 17 00:00:00 2001
From: "yangboyu.yby" <yangboyu.yby@alibaba-inc.com>
Date: Sat, 16 Nov 2024 16:26:02 +0800
Subject: [PATCH] Fix memory leak in pgoutput

---
 src/backend/replication/pgoutput/pgoutput.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 00e7024563..390f41494e 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -2055,9 +2055,25 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 		 * Tuple slots cleanups. (Will be rebuilt later if needed).
 		 */
 		if (entry->old_slot)
+		{
+			TupleDesc       desc = entry->old_slot->tts_tupleDescriptor;
+
+			Assert(desc->tdrefcount == -1);
+
+			FreeTupleDesc(desc);
+			desc = NULL;
 			ExecDropSingleTupleTableSlot(entry->old_slot);
+		}
 		if (entry->new_slot)
+		{
+			TupleDesc       desc = entry->new_slot->tts_tupleDescriptor;
+
+			Assert(desc->tdrefcount == -1);
+
+			FreeTupleDesc(desc);
+			desc = NULL;
 			ExecDropSingleTupleTableSlot(entry->new_slot);
+		}
 
 		entry->old_slot = NULL;
 		entry->new_slot = NULL;
-- 
2.32.0.3.g01195cf9f

#8 Michael Paquier
michael@paquier.xyz
In reply to: by Yang (#7)
Re: memory leak in pgoutput

On Tue, Nov 19, 2024 at 07:08:26AM +0000, by Yang wrote:

> I have verified that this patch works for REL_[15-17]_STABLE and master.
> The memory consumption of the "logical replication cache context" remains
> consistently at 112 MB during the benchmark mentioned above.

You should be more careful with the amount of tests you are doing
here. This fails while waiting for some changes to be streamed when
creating a subscription:
cd src/test/subscription/ && PROVE_TESTS=t/017_stream_ddl.pl make check
--
Michael

#9 by Yang
mobile.yang@outlook.com
In reply to: Michael Paquier (#8)
1 attachment(s)
Re: memory leak in pgoutput

> You should be more careful with the amount of tests you are doing
> here. This fails while waiting for some changes to be streamed when
> creating a subscription:
> cd src/test/subscription/ && PROVE_TESTS=t/017_stream_ddl.pl make check

I apologize for the obvious error in the previous patch. I have corrected it
in the new patch (v3), which passes the regression tests.

Attachments:

v3-0001-Fix-memory-leak-in-pgoutput.patch (application/octet-stream)
From 242bff0c2fecf6b777bfd72ac077810822d8015d Mon Sep 17 00:00:00 2001
From: "yangboyu.yby" <yangboyu.yby@alibaba-inc.com>
Date: Wed, 20 Nov 2024 11:14:09 +0800
Subject: [PATCH] Fix memory leak in pgoutput

---
 src/backend/replication/pgoutput/pgoutput.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 00e7024563e..36b2c5fbbfe 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -2055,9 +2055,25 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 		 * Tuple slots cleanups. (Will be rebuilt later if needed).
 		 */
 		if (entry->old_slot)
+		{
+			TupleDesc       *desc = &entry->old_slot->tts_tupleDescriptor;
+
+			Assert((*desc)->tdrefcount == -1);
+
+			FreeTupleDesc(*desc);
+			*desc = NULL;
 			ExecDropSingleTupleTableSlot(entry->old_slot);
+		}
 		if (entry->new_slot)
+		{
+			TupleDesc       *desc = &entry->new_slot->tts_tupleDescriptor;
+
+			Assert((*desc)->tdrefcount == -1);
+
+			FreeTupleDesc(*desc);
+			*desc = NULL;
 			ExecDropSingleTupleTableSlot(entry->new_slot);
+		}
 
 		entry->old_slot = NULL;
 		entry->new_slot = NULL;
-- 
2.32.0.3.g01195cf9f

#10 Michael Paquier
michael@paquier.xyz
In reply to: by Yang (#9)
Re: memory leak in pgoutput

On Wed, Nov 20, 2024 at 10:41:50AM +0000, by Yang wrote:

> I apologize for the obvious error in the previous patch. I have corrected it
> in the new patch (v3), which passes the regression tests.

It took me quite a bit of time to evaluate the amount of the damage,
and indeed sysbench has been quite good at showing the problem. It is
not that much, though, depending on the number of tables. With my
laptop and 500 tables, the cleanup of the slots showed up in sudden
bursts that increased the memory footprint of the WAL sender.

I think that there is at least one more leak, which is even smaller
than the one you have reported here. It takes a longer run time to
show up with sysbench, but it's here with the WAL sender memory
growing slowly over time. We should be much more careful with this
area of the code in terms of memory handling. Perhaps with a broader
memory context associated with the data of the cached entries, reset
each time an entry is validated? This current coding style is quite
dangerous to rely on.
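
The reset-per-validation idea can be sketched with a toy arena. This is a loose, hypothetical analogue of a PostgreSQL memory context, not the real MemoryContext API; it only shows why a single reset point makes per-allocation cleanup impossible to forget:

```c
#include <stdlib.h>

/* A toy arena loosely mimicking the reset behaviour of a memory
 * context: every allocation is tracked, and one reset releases them
 * all, so individual frees cannot be missed. */
#define ARENA_MAX 64

typedef struct DemoArena
{
	void	   *chunks[ARENA_MAX];
	int			nchunks;
} DemoArena;

static void *
arena_alloc(DemoArena *arena, size_t size)
{
	void	   *p;

	if (arena->nchunks >= ARENA_MAX)
		return NULL;			/* toy limit; real contexts grow */
	p = malloc(size);
	arena->chunks[arena->nchunks++] = p;
	return p;
}

/* Analogous to resetting the context each time a cache entry is
 * (re)validated: everything allocated for the entry goes away at once. */
static void
arena_reset(DemoArena *arena)
{
	for (int i = 0; i < arena->nchunks; i++)
		free(arena->chunks[i]);
	arena->nchunks = 0;
}

/* Simulate revalidating an entry several times; memory does not
 * accumulate because each cycle starts from a reset arena. */
static int
demo_entry_cycles(void)
{
	DemoArena	arena = {0};

	for (int cycle = 0; cycle < 3; cycle++)
	{
		arena_alloc(&arena, 128);	/* e.g. slots, descriptors */
		arena_alloc(&arena, 256);
		arena_reset(&arena);		/* one call frees it all */
	}
	return arena.nchunks;			/* 0 after the final reset */
}
```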

Anyway, this patch fixes a portion of the damage and Hou's method was
a bit cleaner, so I have used it and applied it down to v15.
--
Michael