[PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Started by Önder Kalacı · almost 4 years ago · 196 messages · pgsql-hackers
#1 Önder Kalacı
onderkalaci@gmail.com

Hi hackers,

It is often not feasible to use `REPLICA IDENTITY FULL` on the
publication, because it leads to a full table scan per tuple change on
the subscriber. This makes `REPLICA IDENTITY FULL` impractical for all
but a small number of use cases.

With this patch, I'm proposing the following change: if there is an
index on the subscriber, use the index as long as the planner
sub-module picks any index over a sequential scan.

Most of the logic on the subscriber side already exists in the code.
The subscriber is already capable of doing (unique) index scans. With
this patch, we allow the index scan to iterate over the tuples fetched
and only act when tuples are equal. Those familiar with this part of
the code will recognize that the sequential scan code on the
subscriber already uses the `tuples_equal()` function.

In short, the changes on the subscriber mostly combine parts of the
(unique) index scan and sequential scan code.
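To make the combined behavior concrete, here is a small Python model (purely illustrative, not the patch's C code) of the apply-side lookup: a non-unique index may return several candidate tuples for a key, and the worker acts only on the candidate that is equal on all columns, just as the sequential-scan path already relies on `tuples_equal()`. All names and data here are invented.

```python
def tuples_equal(a, b):
    # REPLICA IDENTITY FULL ships every column, so we can compare the
    # whole tuple, as the sequential-scan code already does.
    return a == b

def find_matching_tuple(index, key, searched_tuple):
    # A non-unique index can return several candidates for `key`; only
    # a tuple equal on all columns is the one to update/delete.
    for candidate in index.get(key, []):
        if tuples_equal(candidate, searched_tuple):
            return candidate
    return None

# Toy "index" on a low-cardinality column, mapping key -> heap tuples.
index_on_bid = {
    1: [(100, 1, 50), (101, 1, 75)],
    2: [(200, 2, 10)],
}

match = find_matching_tuple(index_on_bid, 1, (101, 1, 75))
```

Even on a low-cardinality column, this narrows the candidates from the whole table to one bucket, which is where the speedup in the pgbench example below comes from.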

The decision on whether to use an index (and which one) is mostly
derived from the planner infrastructure. The idea is that on the
subscriber we have all the columns, so we construct `Path`s with
restrictions on all columns, such as `col_1 = $1 AND col_2 = $2 ...
AND col_n = $N`. Then we let the planner sub-module --
`create_index_paths()` -- give us the relevant index `Path`s, add the
sequential scan `Path` as well, and finally pick the cheapest `Path`
among them.
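The selection step can be sketched as follows; this is a hedged illustration with made-up `Path` objects and cost numbers, whereas in the patch the candidate paths and their costs come from `create_index_paths()` and the planner's costing code.

```python
from dataclasses import dataclass

@dataclass
class Path:
    description: str
    total_cost: float

def pick_cheapest_path(index_paths, seqscan_path):
    # Add the sequential-scan path to the candidates, then keep the
    # path with the lowest estimated total cost.
    return min(index_paths + [seqscan_path], key=lambda p: p.total_cost)

index_paths = [
    Path("Index Scan using i2 on pgbench_accounts", 35.0),
    Path("Index Scan using i3 on pgbench_accounts", 90.0),
]
seqscan = Path("Seq Scan on pgbench_accounts", 15000.0)

cheapest = pick_cheapest_path(index_paths, seqscan)
```

Note that when no index path exists, the sequential scan wins by default, which preserves today's behavior.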

From a performance point of view, there are a few things to note.
First, the patch aims not to change the behavior when a PRIMARY KEY or
UNIQUE INDEX is used. Second, when REPLICA IDENTITY is FULL on the
publisher and an index is used on the subscriber, the difference mostly
comes down to `index scan` vs `sequential scan`. That's why it is hard
to claim a specific amount of improvement; it mostly depends on the
data size, the index, and the data distribution.

Still, below I try to showcase the potential improvement using an
index on the subscriber, `pgbench_accounts(bid)`. With the index, the
replication catches up in around ~5 seconds. When the index is
dropped, the replication takes around ~300 seconds.

// init source db
pgbench -i -s 100 -p 5432 postgres
psql -c "ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey;" -p 5432 postgres
psql -c "CREATE INDEX i1 ON pgbench_accounts(aid);" -p 5432 postgres
psql -c "ALTER TABLE pgbench_accounts REPLICA IDENTITY FULL;" -p 5432 postgres
psql -c "CREATE PUBLICATION pub_test_1 FOR TABLE pgbench_accounts;" -p 5432 postgres

// init target db, drop existing primary key
pgbench -i -p 9700 postgres
psql -c "TRUNCATE pgbench_accounts;" -p 9700 postgres
psql -c "ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey;" -p 9700 postgres
psql -c "CREATE SUBSCRIPTION sub_test_1 CONNECTION 'host=localhost port=5432 user=onderkalaci dbname=postgres' PUBLICATION pub_test_1;" -p 9700 postgres

// create one index, even on a low-cardinality column
psql -c "CREATE INDEX i2 ON pgbench_accounts(bid);" -p 9700 postgres

// now, run some pgbench tests and observe replication
pgbench -t 500 -b tpcb-like -p 5432 postgres

What do hackers think about this change?

Thanks,

Onder Kalaci
Developing the Citus extension for PostgreSQL

Attachment: 0001_use_index_on_subs_when_pub_rep_ident_full.patch (+800, -47 lines)
#2 Amit Kapila
amit.kapila16@gmail.com
In reply to: Önder Kalacı (#1)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

On Tue, Jul 12, 2022 at 7:07 PM Önder Kalacı <onderkalaci@gmail.com> wrote:

Hi hackers,

It is often not feasible to use `REPLICA IDENTITY FULL` on the publication, because it leads to full table scan

per tuple change on the subscription. This makes `REPLICA IDENTITY FULL` impracticable -- probably other

than some small number of use cases.

IIUC, this proposal is to optimize cases where users can't have a
unique/primary key for a relation on the subscriber and those
relations receive lots of updates or deletes?

With this patch, I'm proposing the following change: If there is an index on the subscriber, use the index

as long as the planner sub-modules picks any index over sequential scan.

Majority of the logic on the subscriber side has already existed in the code. The subscriber is already

capable of doing (unique) index scans. With this patch, we are allowing the index to iterate over the

tuples fetched and only act when tuples are equal. The ones familiar with this part of the code could

realize that the sequential scan code on the subscriber already implements the `tuples_equal()` function.

In short, the changes on the subscriber are mostly combining parts of (unique) index scan and

sequential scan codes.

The decision on whether to use an index (or which index) is mostly derived from planner infrastructure.

The idea is that on the subscriber we have all the columns. So, construct all the `Path`s with the

restrictions on all columns, such as `col_1 = $1 AND col_2 = $2 ... AND col_n = $N`. Finally, let

the planner sub-module -- `create_index_paths()` -- to give us the relevant index `Path`s. On top of

that adds the sequential scan `Path` as well. Finally, pick the cheapest `Path` among.

From the performance point of view, there are few things to note. First, the patch aims not to
change the behavior when PRIMARY KEY or UNIQUE INDEX is used. Second, when REPLICA IDENTITY
IS FULL on the publisher and an index is used on the subscriber, the difference mostly comes down
to `index scan` vs `sequential scan`. That's why it is hard to claim a certain number of improvements.
It mostly depends on the data size, index and the data distribution.

It seems that in favorable cases it will improve performance, but we
should consider unfavorable cases as well. Two things that come to
mind in that regard are (a) while choosing index/seq. scan paths, the
patch doesn't account for the cost of tuples_equal(), which needs to
be performed for index scans; (b) it appears to me that the patch
decides which index to use the first time it opens the rel (or when
the rel gets invalidated) on the subscriber, and then for all
subsequent operations it uses the same index. It is quite possible
that after some more operations on the table, using the same index
will actually be costlier than a sequential scan or some other index
scan.

--
With Regards,
Amit Kapila.

#3 Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#2)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

On Mon, Jul 18, 2022 at 11:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jul 12, 2022 at 7:07 PM Önder Kalacı <onderkalaci@gmail.com> wrote:

Hi hackers,

It is often not feasible to use `REPLICA IDENTITY FULL` on the publication, because it leads to full table scan

per tuple change on the subscription. This makes `REPLICA IDENTITY FULL` impracticable -- probably other

than some small number of use cases.

IIUC, this proposal is to optimize cases where users can't have a
unique/primary key for a relation on the subscriber and those
relations receive lots of updates or deletes?

With this patch, I'm proposing the following change: If there is an index on the subscriber, use the index

as long as the planner sub-modules picks any index over sequential scan.

Majority of the logic on the subscriber side has already existed in the code. The subscriber is already

capable of doing (unique) index scans. With this patch, we are allowing the index to iterate over the

tuples fetched and only act when tuples are equal. The ones familiar with this part of the code could

realize that the sequential scan code on the subscriber already implements the `tuples_equal()` function.

In short, the changes on the subscriber are mostly combining parts of (unique) index scan and

sequential scan codes.

The decision on whether to use an index (or which index) is mostly derived from planner infrastructure.

The idea is that on the subscriber we have all the columns. So, construct all the `Path`s with the

restrictions on all columns, such as `col_1 = $1 AND col_2 = $2 ... AND col_n = $N`. Finally, let

the planner sub-module -- `create_index_paths()` -- to give us the relevant index `Path`s. On top of

that adds the sequential scan `Path` as well. Finally, pick the cheapest `Path` among.

From the performance point of view, there are few things to note. First, the patch aims not to
change the behavior when PRIMARY KEY or UNIQUE INDEX is used. Second, when REPLICA IDENTITY
IS FULL on the publisher and an index is used on the subscriber, the difference mostly comes down
to `index scan` vs `sequential scan`. That's why it is hard to claim a certain number of improvements.
It mostly depends on the data size, index and the data distribution.

It seems that in favorable cases it will improve performance but we
should consider unfavorable cases as well. Two things that come to
mind in that regard are (a) while choosing index/seq. scan paths, the
patch doesn't account for cost for tuples_equal() which needs to be
performed for index scans, (b) it appears to me that the patch decides
which index to use the first time it opens the rel (or if the rel gets
invalidated) on subscriber and then for all consecutive operations it
uses the same index. It is quite possible that after some more
operations on the table, using the same index will actually be
costlier than a sequence scan or some other index scan.

Point (a) won't matter because we perform tuples_equal() for both
sequential and index scans. So, we can ignore point (a).

--
With Regards,
Amit Kapila.

#4 Önder Kalacı
onderkalaci@gmail.com
In reply to: Amit Kapila (#2)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Hi, thanks for your reply.

Amit Kapila <amit.kapila16@gmail.com>, 18 Tem 2022 Pzt, 08:29 tarihinde
şunu yazdı:

On Tue, Jul 12, 2022 at 7:07 PM Önder Kalacı <onderkalaci@gmail.com>
wrote:

Hi hackers,

It is often not feasible to use `REPLICA IDENTITY FULL` on the

publication, because it leads to full table scan

per tuple change on the subscription. This makes `REPLICA IDENTITY FULL`

impracticable -- probably other

than some small number of use cases.

IIUC, this proposal is to optimize cases where users can't have a
unique/primary key for a relation on the subscriber and those
relations receive lots of updates or deletes?

Yes, that is right.

From a similar perspective, I see this patch as useful for relaxing
the "use a primary key/unique index" requirement to "use any index"
for reasonably performant logical replication with updates/deletes.

It seems that in favorable cases it will improve performance but we
should consider unfavorable cases as well. Two things that come to
mind in that regard are (a) while choosing index/seq. scan paths, the
patch doesn't account for cost for tuples_equal() which needs to be
performed for index scans, (b) it appears to me that the patch decides
which index to use the first time it opens the rel (or if the rel gets
invalidated) on subscriber and then for all consecutive operations it
uses the same index. It is quite possible that after some more
operations on the table, using the same index will actually be
costlier than a sequence scan or some other index scan

Regarding (b), yes, that is a concern I share. And I was actually
considering sending another patch regarding this.

Currently, I can see two options, and I'm happy to hear your take on
these (or maybe another idea?):

- Add a new class of invalidation callbacks: Today, if we do ALTER TABLE or
CREATE INDEX on a table, CacheRegisterRelcacheCallback helps us
re-create the cache entries. In this case, as far as I can see, we need a
callback that is called when the table is ANALYZEd, because that is when
the statistics change, and that is the time when picking a new index makes
sense. However, that seems like adding another dimension to this patch,
which I can try, but I can also see that committing becomes even harder.
So, please see the next idea as well.

- Ask users to manually pick the index they want to use: Currently, the
main complexity of the patch comes from the planner-related code. In fact,
if you look at the logical-replication-related changes, those are
relatively modest. If we drop the feature where Postgres picks the index,
and instead provide a user interface to set the index per table in the
subscription, we can probably have an easier patch to review & test. For
example, we could add an `ALTER SUBSCRIPTION sub ALTER TABLE t USE INDEX i`
type of API. This also needs some coding, but it is probably much simpler
than the current code. And, obviously, this raises the question of whether
users can pick the right index. Probably not always, but at least that
seems like a good starting point for using this performance improvement.

Thoughts?

Thanks,
Onder Kalaci

#5 Amit Kapila
amit.kapila16@gmail.com
In reply to: Önder Kalacı (#4)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

On Tue, Jul 19, 2022 at 1:46 PM Önder Kalacı <onderkalaci@gmail.com> wrote:

Hi, thanks for your reply.

Amit Kapila <amit.kapila16@gmail.com>, 18 Tem 2022 Pzt, 08:29 tarihinde şunu yazdı:

IIUC, this proposal is to optimize cases where users can't have a
unique/primary key for a relation on the subscriber and those
relations receive lots of updates or deletes?

Yes, that is right.

In a similar perspective, I see this patch useful for reducing the "use primary key/unique index" requirement to "use any index" for a reasonably performant logical replication with updates/deletes.

Agreed. BTW, have you seen any such requirements from users where this
will be useful for them?

It seems that in favorable cases it will improve performance but we
should consider unfavorable cases as well. Two things that come to
mind in that regard are (a) while choosing index/seq. scan paths, the
patch doesn't account for cost for tuples_equal() which needs to be
performed for index scans, (b) it appears to me that the patch decides
which index to use the first time it opens the rel (or if the rel gets
invalidated) on subscriber and then for all consecutive operations it
uses the same index. It is quite possible that after some more
operations on the table, using the same index will actually be
costlier than a sequence scan or some other index scan

Regarding (b), yes that is a concern I share. And, I was actually considering sending another patch regarding this.

Currently, I can see two options and happy to hear your take on these (or maybe another idea?)

- Add a new class of invalidation callbacks: Today, if we do ALTER TABLE or CREATE INDEX on a table, the CacheRegisterRelcacheCallback helps us to re-create the cache entries. In this case, as far as I can see, we need a callback that is called when table "ANALYZE"d, because that is when the statistics change. That is the time picking a new index makes sense.
However, that seems like adding another dimension to this patch, which I can try but also see that committing becomes even harder.

This idea sounds worth investigating. I see that this will require
more work, but OTOH, we can't allow the existing system to regress,
especially because depending on the workload it might regress badly. We
can create a patch for this atop the base patch for easier review/testing,
but I feel we need some way to address this point.

So, please see the next idea as well.

- Ask users to manually pick the index they want to use: Currently, the main complexity of the patch comes with the planner related code. In fact, if you look into the logical replication related changes, those are relatively modest changes. If we can drop the feature that Postgres picks the index, and provide a user interface to set the indexes per table in the subscription, we can probably have an easier patch to review & test. For example, we could add `ALTER SUBSCRIPTION sub ALTER TABLE t USE INDEX i` type of an API. This also needs some coding, but probably much simpler than the current code. And, obviously, this pops up the question of can users pick the right index?

I think picking the right index is one point; another is what happens
if the subscription has many tables (say 10K or more): doing it for
individual tables per subscription won't be fun. Also, users need to
identify which tables belong to a particular subscription; users can
find that via pg_subscription_rel or some other way, but doing so
won't be straightforward. So, my inclination would be to pick the
right index automatically rather than getting input from the user.

Now, your point related to planner code in the patch bothers me as
well but I haven't studied the patch in detail to provide any
alternatives at this stage. Do you have any other ideas to make it
simpler or solve this problem in some other way?

--
With Regards,
Amit Kapila.

#6 Marco Slot
marco.slot@gmail.com
In reply to: Amit Kapila (#2)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

On Mon, Jul 18, 2022 at 8:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

IIUC, this proposal is to optimize cases where users can't have a
unique/primary key for a relation on the subscriber and those
relations receive lots of updates or deletes?

I think this patch optimizes for all non-trivial cases of update/delete
replication (e.g. >1000 rows in the table, >1000 rows per hour updated)
without a primary key. For instance, it's quite common to have a large
append-mostly events table without a primary key (e.g. because of
partitioning, or insertion speed), which will still have occasional batch
updates/deletes.

Imagine an update of a table or partition with 1 million rows and a typical
scan speed of 1M rows/sec. An update on the whole table takes maybe 1-2
seconds. Replicating the update using a sequential scan per row can take on
the order of ~12 days ≈ 1M seconds.
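The arithmetic behind those figures, spelled out:

```python
rows = 1_000_000          # rows in the updated table/partition
scan_speed = 1_000_000    # rows scanned per second

# Each replicated row change triggers one full sequential scan.
seconds_per_change = rows / scan_speed       # 1 second per row
total_seconds = rows * seconds_per_change    # 1,000,000 seconds
total_days = total_seconds / 86_400          # roughly 11.6 days
```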

The current implementation makes using REPLICA IDENTITY FULL a huge
liability and impractical for scenarios where you want to replicate an
arbitrary set of user-defined tables, such as upgrades, migrations, and
shard moves. We generally recommend that users tolerate update/delete
errors in such scenarios.

If the apply worker can use an index, the data migration tool can
tactically create one on a high cardinality column, which would practically
always be better than doing a sequential scan for non-trivial workloads.

cheers,
Marco

#7 Önder Kalacı
onderkalaci@gmail.com
In reply to: Amit Kapila (#5)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Hi,

- Add a new class of invalidation callbacks: Today, if we do ALTER TABLE

or CREATE INDEX on a table, the CacheRegisterRelcacheCallback helps us to
re-create the cache entries. In this case, as far as I can see, we need a
callback that is called when table "ANALYZE"d, because that is when the
statistics change. That is the time picking a new index makes sense.

However, that seems like adding another dimension to this patch, which I

can try but also see that committing becomes even harder.

This idea sounds worth investigating. I see that this will require
more work but OTOH, we can't allow the existing system to regress
especially because depending on workload it might regress badly.

Just to note, if that is not clear: this patch avoids (or at least aims
to avoid, assuming no bugs) changing the behavior of existing systems
with a PRIMARY KEY or UNIQUE index. In those cases, we still use the
relevant indexes.

We
can create a patch for this atop the base patch for easier review/test
but I feel we need some way to address this point.

Another idea could be to re-calculate the index, say after every *N*
updates/deletes on the table. We may consider using a subscription
parameter for setting N -- with a good default -- or even hard-coding
it. I think the cost of re-calculating should be pretty small compared
to the other things happening during logical replication. So, a sane
default might work?
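As a sketch of this idea (the class, the threshold name, and the recompute hook are all invented for illustration; nothing like this exists in the posted patch), the apply worker could count applied changes and re-run the index choice once the counter passes N:

```python
class CachedIndexChoice:
    """Re-run the index selection after every N applied changes."""

    def __init__(self, pick_index, recheck_threshold=10_000):
        self._pick_index = pick_index        # callable re-running path selection
        self._threshold = recheck_threshold  # the "N" discussed above
        self._changes_since_pick = 0
        self._current = pick_index()

    def note_change(self):
        # Call once per applied update/delete on the table.
        self._changes_since_pick += 1
        if self._changes_since_pick >= self._threshold:
            self._current = self._pick_index()  # statistics may have shifted
            self._changes_since_pick = 0

    @property
    def index(self):
        return self._current

# Simulate the planner's choice changing as the table evolves.
picks = iter(["idx_a", "idx_a", "idx_b"])
cache = CachedIndexChoice(lambda: next(picks), recheck_threshold=2)
```

The open question, of course, is what a sane default threshold would be.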

If you think the above doesn't work, I can try to work on a separate patch
which adds something like "analyze invalidation callback".

- Ask users to manually pick the index they want to use: Currently, the

main complexity of the patch comes with the planner related code. In fact,
if you look into the logical replication related changes, those are
relatively modest changes. If we can drop the feature that Postgres picks
the index, and provide a user interface to set the indexes per table in the
subscription, we can probably have an easier patch to review & test. For
example, we could add `ALTER SUBSCRIPTION sub ALTER TABLE t USE INDEX i`
type of an API. This also needs some coding, but probably much simpler than
the current code. And, obviously, this pops up the question of can users
pick the right index?

I think picking the right index is one point and another is what if
the subscription has many tables (say 10K or more), doing it for
individual tables per subscription won't be fun. Also, users need to
identify which tables belong to a particular subscription, now, users
can find the same via pg_subscription_rel or some other way but doing
this won't be straightforward for users. So, my inclination would be
to pick the right index automatically rather than getting the input
from the user.

Yes, all makes sense.

Now, your point related to planner code in the patch bothers me as
well but I haven't studied the patch in detail to provide any
alternatives at this stage. Do you have any other ideas to make it
simpler or solve this problem in some other way?

One idea I tried earlier was to go over the existing indexes on the
table and get the IndexInfo via BuildIndexInfo(), and then try to find
a good heuristic to pick an index. In the end, I felt that approach
does a sub-optimal job, requires a similar amount of code to the
current patch, and still uses similar infrastructure.

My conclusion for that route was that I should either use a very
simple heuristic (like picking the index with the most columns) and
accept a suboptimal index pick, OR use a complex heuristic with a
reasonable index pick. And the latter approach converged on the
planner code in the patch. Do you think the former approach is
acceptable?

Thanks,
Onder

#8 Amit Kapila
amit.kapila16@gmail.com
In reply to: Önder Kalacı (#7)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

On Wed, Jul 20, 2022 at 8:19 PM Önder Kalacı <onderkalaci@gmail.com> wrote:

- Add a new class of invalidation callbacks: Today, if we do ALTER TABLE or CREATE INDEX on a table, the CacheRegisterRelcacheCallback helps us to re-create the cache entries. In this case, as far as I can see, we need a callback that is called when table "ANALYZE"d, because that is when the statistics change. That is the time picking a new index makes sense.
However, that seems like adding another dimension to this patch, which I can try but also see that committing becomes even harder.

This idea sounds worth investigating. I see that this will require
more work but OTOH, we can't allow the existing system to regress
especially because depending on workload it might regress badly.

Just to note if that is not clear: This patch avoids (or at least aims to avoid assuming no bugs) changing the behavior of the existing systems with PRIMARY KEY or UNIQUE index. In that case, we still use the relevant indexes.

We
can create a patch for this atop the base patch for easier review/test
but I feel we need some way to address this point.

One another idea could be to re-calculate the index, say after N updates/deletes for the table. We may consider using subscription_parameter for getting N -- with a good default, or even hard-code into the code. I think the cost of re-calculating should really be pretty small compared to the other things happening during logical replication. So, a sane default might work?

One difficulty in deciding the value of N for the user, or choosing a
default, is that we probably need to consider local DML operations on
the table as well.

If you think the above doesn't work, I can try to work on a separate patch which adds something like "analyze invalidation callback".

I suggest we should give this a try and if this turns out to be
problematic or complex then we can think of using some heuristic as
you are suggesting above.

Now, your point related to planner code in the patch bothers me as
well but I haven't studied the patch in detail to provide any
alternatives at this stage. Do you have any other ideas to make it
simpler or solve this problem in some other way?

One idea I tried earlier was to go over the existing indexes and on the table, then get the IndexInfo via BuildIndexInfo(). And then, try to find a good heuristic to pick an index. In the end, I felt like that is doing a sub-optimal job, requiring a similar amount of code of the current patch, and still using the similar infrastructure.

My conclusion for that route was I should either use a very simple heuristic (like pick the index with the most columns) and have a suboptimal index pick,

Not only that, but say all indexes have the same number of columns;
then we probably need to either pick the first such index or use some
other heuristic.

OR use a complex heuristic with a reasonable index pick. And, the latter approach converged to the planner code in the patch. Do you think the former approach is acceptable?

In this regard, I was thinking about the cases in which a sequential
scan can be better than an index scan (considering one is available).
I think if a certain column has a lot of duplicates (for example, a
boolean column), then doing a sequential scan is probably better.
Considering this, even though your other approach sounds simpler, it
could lead to unpredictable results. So, I think the latter approach
is preferable.

BTW, do we want to consider partial indexes for the scan in this
context? I mean, a partial index may not cover all rows, so how would
it be usable?

Few comments:
===============
1.
static List *
+CreateReplicaIdentityFullPaths(Relation localrel)
{
...
+ /*
+ * Rather than doing all the pushups that would be needed to use
+ * set_baserel_size_estimates, just do a quick hack for rows and width.
+ */
+ rel->rows = rel->tuples;

Is it a good idea to set rows without any selectivity estimation?
Won't this always use the entire row count of the relation? Also, if
we don't want to use set_baserel_size_estimates(), how will we compute
baserestrictcost, which is later used in the costing of paths (for
example, the costing of the seqscan path (cost_seqscan) uses it)?

In general, I think it will be better to consider calling some
top-level planner functions even for paths. Can we consider using
make_one_rel() instead of building individual paths? On similar lines,
in function PickCheapestIndexPathIfExists(), can we use
set_cheapest()?

2.
@@ -57,9 +60,6 @@ build_replindex_scan_key(ScanKey skey, Relation rel,
Relation idxrel,
int2vector *indkey = &idxrel->rd_index->indkey;
bool hasnulls = false;

- Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) ||
- RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel));

You have removed this assertion but there is a comment ("This is not
generic routine, it expects the idxrel to be replication identity of a
rel and meet all limitations associated with that.") atop this
function which either needs to be changed/removed and probably we
should think if the function needs some change after removing that
restriction.

--
With Regards,
Amit Kapila.

#9 Önder Kalacı
onderkalaci@gmail.com
In reply to: Amit Kapila (#8)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Hi,

One another idea could be to re-calculate the index, say after N

updates/deletes for the table. We may consider using subscription_parameter
for getting N -- with a good default, or even hard-code into the code. I
think the cost of re-calculating should really be pretty small compared to
the other things happening during logical replication. So, a sane default
might work?

One difficulty in deciding the value of N for the user or choosing a
default would be that we need to probably consider the local DML
operations on the table as well.

Fair enough, it is not easy to find a good default.

If you think the above doesn't work, I can try to work on a separate

patch which adds something like "analyze invalidation callback".

I suggest we should give this a try and if this turns out to be
problematic or complex then we can think of using some heuristic as
you are suggesting above.

Alright, I'll try this and respond shortly back.

OR use a complex heuristic with a reasonable index pick. And, the latter

approach converged to the planner code in the patch. Do you think the
former approach is acceptable?

In this regard, I was thinking in which cases a sequence scan can be
better than the index scan (considering one is available). I think if
a certain column has a lot of duplicates (for example, a column has a
boolean value) then probably doing a sequence scan is better. Now,
considering this even though your other approach sounds simpler but
could lead to unpredictable results. So, I think the latter approach
is preferable.

Yes, it makes sense. I also considered this during the development of
the patch, but forgot to mention it :)

BTW, do we want to consider partial indexes for the scan in this
context? I mean it may not have data of all rows so how that would be
usable?

As far as I can see, check_index_predicates() never picks a partial index
for the baserestrictinfos we create in CreateReplicaIdentityFullPaths().
The reason is that we have roughly the following call stack:

-check_index_predicates
--predicate_implied_by
---predicate_implied_by_recurse
----predicate_implied_by_simple_clause
-----operator_predicate_proof

And, inside operator_predicate_proof(), there is never going to be a
proven equality, because we push `Param`s into the baserestrictinfos
whereas the index predicates are always `Const`s.
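A toy model of why the proof never succeeds (illustrative Python, not the real prover): implication can only be established between known constants, and the restrictions built here always carry `Param`s whose values are unknown at planning time.

```python
class Const:
    def __init__(self, value):
        self.value = value

class Param:
    # Value is only known at execution time, so nothing can be proven
    # about it at planning time.
    pass

def restriction_implies_predicate(restriction_rhs, predicate_rhs):
    # Prove that `col = restriction_rhs` implies `col = predicate_rhs`.
    # That is only possible when both sides are known, equal constants.
    if isinstance(restriction_rhs, Const) and isinstance(predicate_rhs, Const):
        return restriction_rhs.value == predicate_rhs.value
    return False

# With two Consts the proof can succeed...
const_case = restriction_implies_predicate(Const(5), Const(5))
# ...but the apply worker's restrictions use Params, so it never does,
# and partial indexes are therefore never picked.
param_case = restriction_implies_predicate(Param(), Const(5))
```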

If we want to make it even more explicit, I can filter out `Path`s with
partial indexes. But that seems redundant to me. For now, I pushed the
commit with an assertion that we never pick partial indexes and also added
a test.

If you think it is better to explicitly filter out partial indexes, I can
do that as well.

Few comments:
===============
1.
static List *
+CreateReplicaIdentityFullPaths(Relation localrel)
{
...
+ /*
+ * Rather than doing all the pushups that would be needed to use
+ * set_baserel_size_estimates, just do a quick hack for rows and width.
+ */
+ rel->rows = rel->tuples;

Is it a good idea to set rows without any selectivity estimation?
Won't this always set the entire rows in a relation? Also, if we don't
want to use set_baserel_size_estimates(), how will we compute
baserestrictcost which will later be used in the costing of paths (for
example, costing of seqscan path (cost_seqscan) uses it)?

In general, I think it will be better to consider calling some
top-level planner functions even for paths. Can we consider using
make_one_rel() instead of building individual paths?

Thanks, this looks like a good suggestion/simplification. I wanted to
use the least amount of code possible, and make_one_rel() does either
exactly what I need or slightly more, which is great.

Note that make_one_rel() also follows the same call stack that I noted
above. So, I cannot spot any problems with partial indexes. Maybe I am
missing something here?

On similar lines,
in function PickCheapestIndexPathIfExists(), can we use
set_cheapest()?

Yes, make_one_rel() + set_cheapest() sounds better. Changed.

2.
@@ -57,9 +60,6 @@ build_replindex_scan_key(ScanKey skey, Relation rel,
Relation idxrel,
int2vector *indkey = &idxrel->rd_index->indkey;
bool hasnulls = false;

- Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) ||
- RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel));

You have removed this assertion but there is a comment ("This is not
generic routine, it expects the idxrel to be replication identity of a
rel and meet all limitations associated with that.") atop this
function which either needs to be changed/removed and probably we
should think if the function needs some change after removing that
restriction.

Ack, I can see your point. I think, for example, we should skip index
attributes that are not simple column references. And probably whatever
other restrictions a PRIMARY KEY has should apply here as well.

I'll read some more Postgres code & test before pushing a revision for this
part. In the meantime, if you have any suggestions/pointers for me to look
into, please note here.

Attached v2 of the patch, addressing some of the comments you had. I'll
work on the remaining ones shortly.

Thanks,
Onder

Attachments:

v2_0001_use_index_on_subs_when_pub_rep_ident_full.patch (+867 -47)
#10 Önder Kalacı
onderkalaci@gmail.com
In reply to: Önder Kalacı (#9)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Hi,

2.

@@ -57,9 +60,6 @@ build_replindex_scan_key(ScanKey skey, Relation rel,
Relation idxrel,
int2vector *indkey = &idxrel->rd_index->indkey;
bool hasnulls = false;

- Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) ||
- RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel));

You have removed this assertion but there is a comment ("This is not
generic routine, it expects the idxrel to be replication identity of a
rel and meet all limitations associated with that.") atop this
function which either needs to be changed/removed and probably we
should think if the function needs some change after removing that
restriction.

Ack, I can see your point. I think, for example, we should skip index
attributes that are not simple column references. And probably whatever
other restrictions a PRIMARY KEY has should apply here as well.

Primary keys require:
- Unique: We don't need uniqueness; that's the point of this patch.
- Valid index: Should not be an issue in this case, because the planner
would not pick an invalid index anyway.
- Non-partial index: As discussed earlier in this thread, I really don't
see any problems with partial indexes for this use-case. Please let me know
if there is anything I miss.
- Deferrable/immediate: As far as I can see, there is no such concept for
regular indexes, so this does not apply here.
- Indexes with no expressions: This is the point where we require some
minor changes inside/around `build_replindex_scan_key`. Previously,
indexes on expressions could not be replica indexes, and with this patch
they can. However, the expressions cannot be used for filtering the tuples
because of the way we create the restrictinfos. We essentially create
`WHERE col_1 = $1 AND col_2 = $2 .. col_n = $n` for the columns with
equality operators available. In the case of indexes on expressions, the
planner would never pick such indexes given these restrictions. I changed
`build_replindex_scan_key` to reflect that, added a new assert, pushed
tests with the following schema, and made sure the code behaves as
expected:

CREATE TABLE people (firstname text, lastname text);
CREATE INDEX people_names_expr_only ON people ((firstname || ' ' ||
lastname));
CREATE INDEX people_names_expr_and_columns ON people ((firstname || ' ' ||
lastname), firstname, lastname);

I also did similar tests with indexes on jsonb fields. Does that help
address your concerns regarding indexes with expressions?

I'll work on one of the other open items in the thread (e.g., analyze
invalidation callback) separately.

Thanks,
Onder KALACI

Attachments:

v3_0001_use_index_on_subs_when_pub_rep_ident_full.patch (+1023 -68)
#11 Önder Kalacı
onderkalaci@gmail.com
In reply to: Amit Kapila (#5)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Hi,

As far as I can see, the following is the answer to the only remaining open
discussion in this thread. Let me know if anything is missed.

(b) it appears to me that the patch decides

which index to use the first time it opens the rel (or if the rel gets
invalidated) on subscriber and then for all consecutive operations it
uses the same index. It is quite possible that after some more
operations on the table, using the same index will actually be
costlier than a sequence scan or some other index scan

Regarding (b), yes that is a concern I share. And, I was actually
considering sending another patch regarding this.

Currently, I can see two options, and am happy to hear your take on these
(or maybe another idea?)

- Add a new class of invalidation callbacks: Today, if we do ALTER TABLE
or CREATE INDEX on a table, the CacheRegisterRelcacheCallback helps us to
re-create the cache entries. In this case, as far as I can see, we need a
callback that is called when the table is ANALYZEd, because that is when
the statistics change. That is the time when picking a new index makes
sense.

However, that seems like adding another dimension to this patch, which I
can try, but I also see that committing becomes even harder.

This idea sounds worth investigating. I see that this will require
more work but OTOH, we can't allow the existing system to regress
especially because depending on workload it might regress badly. We
can create a patch for this atop the base patch for easier review/test
but I feel we need some way to address this point.

It turns out that we already invalidate the relevant entries
in LogicalRepRelMap/LogicalRepPartMap when "ANALYZE" (or VACUUM) updates
any of the statistics in pg_class.

The call-stack for analyze is roughly:
do_analyze_rel()
-> vac_update_relstats()
-> heap_inplace_update()
-> if needs to apply any statistical change
-> CacheInvalidateHeapTuple()

And, we register for those invalidations already:
logicalrep_relmap_init() / logicalrep_partmap_init()
-> CacheRegisterRelcacheCallback()

Added a test which triggers this behavior. The test is as follows:
- Create two indexes on the target, on column_a and column_b
- Initially load data such that the column_a has a high cardinality
- Show that we use the index on column_a
- Load more data such that the column_b has higher cardinality
- ANALYZE on the target table
- Show that we use the index on column_b afterwards
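
As an illustrative sketch of the scenario above (table and index names here
are hypothetical, not taken from the actual test file):

```sql
-- Subscriber side (illustrative; names are hypothetical)
CREATE TABLE target (column_a int, column_b int);
CREATE INDEX idx_on_a ON target (column_a);
CREATE INDEX idx_on_b ON target (column_b);

-- Phase 1: replicate rows such that column_a has high cardinality.
-- Replicated UPDATE/DELETEs should then pick idx_on_a (observable
-- via pg_stat_all_indexes.idx_scan on the subscriber).

-- Phase 2: replicate more rows so column_b has the higher cardinality.
ANALYZE target;
-- The pg_class statistics update fires the registered relcache callback,
-- the cached entry is rebuilt, and subsequent changes should use idx_on_b.
```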

Thanks,
Onder KALACI

Attachments:

v4_0001_use_index_on_subs_when_pub_rep_ident_full.patch (+1165 -68)
#12 Amit Kapila
amit.kapila16@gmail.com
In reply to: Önder Kalacı (#11)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

On Mon, Aug 1, 2022 at 9:52 PM Önder Kalacı <onderkalaci@gmail.com> wrote:

Hi,

As far as I can see, the following is the answer to the only remaining open discussion in this thread. Let me know if anything is missed.

(b) it appears to me that the patch decides

which index to use the first time it opens the rel (or if the rel gets
invalidated) on subscriber and then for all consecutive operations it
uses the same index. It is quite possible that after some more
operations on the table, using the same index will actually be
costlier than a sequence scan or some other index scan

Regarding (b), yes that is a concern I share. And, I was actually considering sending another patch regarding this.

Currently, I can see two options and happy to hear your take on these (or maybe another idea?)

- Add a new class of invalidation callbacks: Today, if we do ALTER TABLE or CREATE INDEX on a table, the CacheRegisterRelcacheCallback helps us to re-create the cache entries. In this case, as far as I can see, we need a callback that is called when table "ANALYZE"d, because that is when the statistics change. That is the time picking a new index makes sense.
However, that seems like adding another dimension to this patch, which I can try but also see that committing becomes even harder.

This idea sounds worth investigating. I see that this will require
more work but OTOH, we can't allow the existing system to regress
especially because depending on workload it might regress badly. We
can create a patch for this atop the base patch for easier review/test
but I feel we need some way to address this point.

It turns out that we already invalidate the relevant entries in LogicalRepRelMap/LogicalRepPartMap when "ANALYZE" (or VACUUM) updates any of the statistics in pg_class.

The call-stack for analyze is roughly:
do_analyze_rel()
-> vac_update_relstats()
-> heap_inplace_update()
-> if needs to apply any statistical change
-> CacheInvalidateHeapTuple()

Yeah, it appears that this will work, but I see that we don't update
here for inherited stats; how does it work for such cases?

--
With Regards,
Amit Kapila.

#13 Amit Kapila
amit.kapila16@gmail.com
In reply to: Önder Kalacı (#9)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

On Fri, Jul 22, 2022 at 9:45 PM Önder Kalacı <onderkalaci@gmail.com> wrote:

BTW, do we want to consider partial indexes for the scan in this
context? I mean it may not have data of all rows so how that would be
usable?

As far as I can see, check_index_predicates() never picks a partial index for the baserestrictinfos we create in CreateReplicaIdentityFullPaths(). The reason is that we have roughly the following call stack:

-check_index_predicates
--predicate_implied_by
---predicate_implied_by_recurse
----predicate_implied_by_simple_clause
-----operator_predicate_proof

And, inside operator_predicate_proof(), there is never going to be an equality. Because, we push `Param`s to the baserestrictinfos whereas the index predicates are always `Const`.

I agree that, the way baserestrictinfos are currently formed by the patch,
it won't select a partial-index path, and chances are that will remain
true in the future as well, but I think it is better to be explicit in
this case to avoid creating a dependency between the two code paths.

Few other comments:
==================
1. Why is it a good idea to choose the index selected even for the
bitmap path (T_BitmapHeapScan or T_BitmapIndexScan)? We use index scan
during update/delete, so not sure how we can conclude to use index for
bitmap paths.

2. The index info is built even on insert, so for a workload where there
are no updates/deletes, or where those are not published, this index
selection work will go to waste. Will it be better to do it at the first
update/delete? One can say that it is not worth the hassle as anyway
it will be built the first time we perform an operation on the
relation or after the relation gets invalidated. If we think so, then
probably adding a comment could be useful.

3.
+my $synced_query =
+  "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT
IN ('r', 's');";
...
...
+# wait for initial table synchronization to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+  or die "Timed out while waiting for subscriber to synchronize data";

You can avoid such instances in the test by using the new
infrastructure added in commit 0c20dd33db.

4.
LogicalRepRelation *remoterel = &root->remoterel;
+
Oid partOid = RelationGetRelid(partrel);

Spurious line addition.

--
With Regards,
Amit Kapila.

#14 Önder Kalacı
onderkalaci@gmail.com
In reply to: Amit Kapila (#13)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Hi,

Thanks for the feedback, see my reply below.

It turns out that we already invalidate the relevant entries in
LogicalRepRelMap/LogicalRepPartMap when "ANALYZE" (or VACUUM) updates any
of the statistics in pg_class.

The call-stack for analyze is roughly:
do_analyze_rel()
-> vac_update_relstats()
-> heap_inplace_update()
-> if needs to apply any statistical change
-> CacheInvalidateHeapTuple()

Yeah, it appears that this will work, but I see that we don't update
here for inherited stats; how does it work for such cases?

There, the expansion of the relation list to partitions happens one level
above on the call stack. So, the call stack looks like the following:

autovacuum_do_vac_analyze() (or ExecVacuum)
-> vacuum()
-> expand_vacuum_rel()
-> rel_list=parent+children partitions
-> for rel in rel_list
->analyze_rel()
->do_analyze_rel
... (and the same call stack as above)

I also added a variation, for partitioned tables, of the similar test that
I earlier added for non-partitioned tables:

Added a test which triggers this behavior. The test is as follows:

- Create two indexes on the target, on column_a and column_b
- Initially load data such that the column_a has a high cardinality
- Show that we use the index on column_a on a *child* table
- Load more data such that the column_b has higher cardinality
- ANALYZE on the *parent* table
- Show that we use the index on column_b afterwards on the *child* table

My answer for the above assumes that your question is regarding what
happens if you ANALYZE on a partitioned table. If your question is
something different, please let me know.

BTW, do we want to consider partial indexes for the scan in this
context? I mean it may not have data of all rows so how that would be
usable?

As far as I can see, check_index_predicates() never picks a partial
index for the baserestrictinfos we create in
CreateReplicaIdentityFullPaths(). The reason is that we have roughly the
following call stack:

-check_index_predicates
--predicate_implied_by
---predicate_implied_by_recurse
----predicate_implied_by_simple_clause
-----operator_predicate_proof

And, inside operator_predicate_proof(), there is never going to be an
equality match, because we push `Param`s into the baserestrictinfos whereas
the index predicates are always `Const`s.

I agree that the way currently baserestrictinfos are formed by patch,
it won't select the partial path, and chances are that that will be
true in future as well but I think it is better to be explicit in this
case to avoid creating a dependency between two code paths.

Yes, it makes sense. So, I changed the Assert into a function that filters
out partial indexes and expression-only indexes, so that we do not create
such dependencies between the planner and this code.

If one day the planner supports matching column values to indexes on
expressions, this code would simply not use the optimization until we make
some improvements in this code path. I think that is a fair trade-off
for now.

Few other comments:

==================
1. Why is it a good idea to choose the index selected even for the
bitmap path (T_BitmapHeapScan or T_BitmapIndexScan)? We use index scan
during update/delete, so not sure how we can conclude to use index for
bitmap paths.

In our case, during update/delete we are searching for a single tuple on
the target, and it seems like using an index is probably going to be
cheaper for finding that single tuple. In general, I thought we should use
an index if the planner ever decides to use it with the given restrictions.

Also, for the majority of the use-cases, I think we'd probably expect an
index on a column with high cardinality -- hence an index scan. So, bitmap
index scans are probably not going to be that common.

Still, I don't see a problem with using such indexes. Of course, it is
possible that I might be missing something. Do you have any specific
concerns in this area?

2. The index info is built even on insert, so workload, where there
are no updates/deletes or those are not published then this index
selection work will go waste. Will it be better to do it at first
update/delete? One can say that it is not worth the hassle as anyway
it will be built the first time we perform an operation on the
relation or after the relation gets invalidated.

With the current approach, the index (re)calculation is coupled with
(in)validation of the relevant cache entries. So, I'd argue that, for the
simplicity of the code, we can afford this small overhead. According to my
local measurements, especially for large tables, the index oid calculation
is mostly insignificant compared to the rest of the steps. Does that sound
OK to you?

If we think so, then
probably adding a comment could be useful.

Yes, that is useful if you are OK with the above, added.

3.
+my $synced_query =
+  "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT
IN ('r', 's');";
...
...
+# wait for initial table synchronization to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+  or die "Timed out while waiting for subscriber to synchronize data";

You can avoid such instances in the test by using the new
infrastructure added in commit 0c20dd33db.

Cool, applied changes.

4.
LogicalRepRelation *remoterel = &root->remoterel;
+
Oid partOid = RelationGetRelid(partrel);

Spurious line addition.

Fixed; I went over the code and couldn't find others.

Attaching v5 of the patch, which reflects the review in this email, plus a
few minor test improvements.

Thanks,
Onder

Attachments:

v5_0001_use_index_on_subs_when_pub_rep_ident_full.patch (+1235 -68)
#15 osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Önder Kalacı (#14)
RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

On Tuesday, August 9, 2022 12:59 AM Önder Kalacı <onderkalaci@gmail.com> wrote:

Attaching v5 of the patch, which reflects the review in this email, plus a
few minor test improvements.

Hi,

Thank you for the updated patch.
FYI, I noticed that v5 causes a cfbot failure in [1].
Could you please fix it in the next version?

[19:44:38.420] execReplication.c: In function ‘RelationFindReplTupleByIndex’:
[19:44:38.420] execReplication.c:186:24: error: ‘eq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
[19:44:38.420] 186 | if (!indisunique && !tuples_equal(outslot, searchslot, eq))
[19:44:38.420] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[19:44:38.420] cc1: all warnings being treated as errors

[1]: https://cirrus-ci.com/task/6544573026533376

Best Regards,
Takamichi Osumi

#16 Önder Kalacı
onderkalaci@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#15)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Hi,

FYI, I noticed that v5 causes a cfbot failure in [1].
Could you please fix it in the next version?

Thanks for letting me know!

[19:44:38.420] execReplication.c: In function
‘RelationFindReplTupleByIndex’:
[19:44:38.420] execReplication.c:186:24: error: ‘eq’ may be used
uninitialized in this function [-Werror=maybe-uninitialized]
[19:44:38.420] 186 | if (!indisunique && !tuples_equal(outslot,
searchslot, eq))
[19:44:38.420] |
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[19:44:38.420] cc1: all warnings being treated as errors

It is kind of interesting that the compiler cannot see that `eq` is
only used when `!indisunique`. Anyway, I've now sent v6, where a slight
refactor avoids the compile warning.

Thanks,
Onder KALACI

Attachments:

v6_0001_use_index_on_subs_when_pub_rep_ident_full.patch (+1241 -68)
#17 Amit Kapila
amit.kapila16@gmail.com
In reply to: Önder Kalacı (#14)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

On Mon, Aug 8, 2022 at 9:29 PM Önder Kalacı <onderkalaci@gmail.com> wrote:

Hi,

Thanks for the feedback, see my reply below.

It turns out that we already invalidate the relevant entries in LogicalRepRelMap/LogicalRepPartMap when "ANALYZE" (or VACUUM) updates any of the statistics in pg_class.

The call-stack for analyze is roughly:
do_analyze_rel()
-> vac_update_relstats()
-> heap_inplace_update()
-> if needs to apply any statistical change
-> CacheInvalidateHeapTuple()

Yeah, it appears that this will work but I see that we don't update
here for inherited stats, how does it work for such cases?

There, the expansion of the relation list to partitions happens one level above on the call stack. So, the call stack looks like the following:

autovacuum_do_vac_analyze() (or ExecVacuum)
-> vacuum()
-> expand_vacuum_rel()
-> rel_list=parent+children partitions
-> for rel in rel_list
->analyze_rel()
->do_analyze_rel
... (and the same call stack as above)

I also added one variation of a similar test for partitioned tables, which I earlier added for non-partitioned tables as well:

Added a test which triggers this behavior. The test is as follows:
- Create two indexes on the target, on column_a and column_b
- Initially load data such that the column_a has a high cardinality
- Show that we use the index on column_a on a child table
- Load more data such that the column_b has higher cardinality
- ANALYZE on the parent table
- Show that we use the index on column_b afterwards on the child table

My answer for the above assumes that your question is regarding what happens if you ANALYZE on a partitioned table. If your question is something different, please let me know.

I was talking about inheritance cases, something like:
create table tbl1 (a int);
create table tbl1_part1 (b int) inherits (tbl1);
create table tbl1_part2 (c int) inherits (tbl1);

What we do in such cases is documented as: "if the table being
analyzed has inheritance children, ANALYZE gathers two sets of
statistics: one on the rows of the parent table only, and a second
including rows of both the parent table and all of its children. This
second set of statistics is needed when planning queries that process
the inheritance tree as a whole. The child tables themselves are not
individually analyzed in this case."

Now, the point I was worried about was what if the changes in child
tables (*_part1, *_part2) are much more than in tbl1? In such cases,
we may not invalidate child rel entries, so how will logical
replication behave for updates/deletes on child tables? There may not
be any problem here but it is better to do some analysis of such cases
to see how it behaves.

BTW, do we want to consider partial indexes for the scan in this
context? I mean it may not have data of all rows so how that would be
usable?

Few other comments:
==================
1. Why is it a good idea to choose the index selected even for the
bitmap path (T_BitmapHeapScan or T_BitmapIndexScan)? We use index scan
during update/delete, so not sure how we can conclude to use index for
bitmap paths.

In our case, during update/delete we are searching for a single tuple on the target. And, it seems like using an index is probably going to be cheaper for finding the single tuple. In general, I thought we should use an index if the planner ever decides to use it with the given restrictions.

What about the case where the index has a lot of duplicate values? We
may need to retrieve multiple tuples in such cases.

Also, for the majority of the use-cases, I think we'd probably expect an index on a column with high cardinality -- hence use index scan. So, bitmap index scans are probably not going to be that much common.

You are probably right here but I don't think we can make such
assumptions. I think the safest way to avoid any regression here is to
choose an index when the planner selects an index scan. We can always
extend it later to bitmap scans if required. We can add a comment
indicating the same.

Still, I don't see a problem with using such indexes. Of course, it is possible that I might be missing something. Do you have any specific concerns in this area?

2. The index info is built even on insert, so workload, where there
are no updates/deletes or those are not published then this index
selection work will go waste. Will it be better to do it at first
update/delete? One can say that it is not worth the hassle as anyway
it will be built the first time we perform an operation on the
relation or after the relation gets invalidated.

With the current approach, the index (re)-calculation is coupled with (in)validation of the relevant cache entries. So, I'd argue for the simplicity of the code, we could afford to waste this small overhead? According to my local measurements, especially for large tables, the index oid calculation is mostly insignificant compared to the rest of the steps. Does that sound OK to you?

If we think so, then
probably adding a comment could be useful.

Yes, that is useful if you are OK with the above, added.

*
+ /*
+ * For insert-only workloads, calculating the index is not necessary.
+ * As the calculation is not expensive, we are fine to do here (instead
+ * of during first update/delete processing).
+ */

I think that here, instead of talking about cost, we should mention that it
is quite an infrequent operation, i.e., performed only the first time we
perform an operation on the relation or after invalidation. This is
because I think the cost is relative.

*
+
+ /*
+ * Although currently it is not possible for planner to pick a
+ * partial index or indexes only on expressions,

It may be better to expand this comment by describing a bit why it is
not possible in our case. You might want to give the function
reference where it is decided.

--
With Regards,
Amit Kapila.

#18 Önder Kalacı
onderkalaci@gmail.com
In reply to: Amit Kapila (#17)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Hi,

I'm a little late to catch up with your comments, but here are my replies:

My answer for the above assumes that your question is regarding what
happens if you ANALYZE on a partitioned table. If your question is
something different, please let me know.

I was talking about inheritance cases, something like:
create table tbl1 (a int);
create table tbl1_part1 (b int) inherits (tbl1);
create table tbl1_part2 (c int) inherits (tbl1);

What we do in such cases is documented as: "if the table being
analyzed has inheritance children, ANALYZE gathers two sets of
statistics: one on the rows of the parent table only, and a second
including rows of both the parent table and all of its children. This
second set of statistics is needed when planning queries that process
the inheritance tree as a whole. The child tables themselves are not
individually analyzed in this case."

Oh, I hadn't considered inherited tables. That seems right: the
statistics of the children are not updated when the parent is analyzed.

Now, the point I was worried about was what if the changes in child
tables (*_part1, *_part2) are much more than in tbl1? In such cases,
we may not invalidate child rel entries, so how will logical
replication behave for updates/deletes on child tables? There may not
be any problem here but it is better to do some analysis of such cases
to see how it behaves.

I also haven't observed any specific issues. In the end, when the user (or
autovacuum) runs ANALYZE on the child, that is when the statistics are
updated for the child. Although I do not have much experience with
inherited tables, this sounds like the expected behavior?

I also pushed a test covering inherited tables. First, a basic test on the
parent. Then, it shows that updates on the parent can also use indexes of
the children. Also, after an ANALYZE on the child, we re-calculate the
index choice and use the index on the higher-cardinality column.

Also, for the majority of the use-cases, I think we'd probably expect an
index on a column with high cardinality -- hence an index scan. So, bitmap
index scans are probably not going to be that common.

You are probably right here but I don't think we can make such
assumptions. I think the safest way to avoid any regression here is to
choose an index when the planner selects an index scan. We can always
extend it later to bitmap scans if required. We can add a comment
indicating the same.

Alright, I got rid of the bitmap scans.

Though, it caused a few of the new tests to fail. I think because of the
data size/distribution, the planner picks bitmap scans. To make the tests
consistent and small, I set `enable_bitmapscan` to off for this new test
file. Does that sound OK to you? Or, should we change the tests to make
sure they genuinely use index scans?

*

+ /*
+ * For insert-only workloads, calculating the index is not necessary.
+ * As the calculation is not expensive, we are fine to do here (instead
+ * of during first update/delete processing).
+ */

I think that here, instead of talking about cost, we should mention that it
is quite an infrequent operation, i.e., performed only the first time we
perform an operation on the relation or after invalidation. This is
because I think the cost is relative.

Changed, does that look better?

+

+ /*
+ * Although currently it is not possible for planner to pick a
+ * partial index or indexes only on expressions,

It may be better to expand this comment by describing a bit why it is
not possible in our case. You might want to give the function
reference where it is decided.

Makes sense, added some more information.

Thanks,
Onder

Attachments:

v7_0001_use_index_on_subs_when_pub_rep_ident_full.patch (+1359 -68)
#19 shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Önder Kalacı (#18)
RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

On Sat, Aug 20, 2022 7:02 PM Önder Kalacı <onderkalaci@gmail.com> wrote:

Hi,

I'm a little late to catch up with your comments, but here are my replies:

Thanks for your patch. Here are some comments.

1.
In FilterOutNotSuitablePathsForReplIdentFull(), is "nonPartialIndexPathList"
a good name for the list? Indexes on only expressions are also filtered.

+static List *
+FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)
+{
+	ListCell   *lc;
+	List *nonPartialIndexPathList = NIL;
2.
+typedef struct LogicalRepPartMapEntry
+{
+	Oid			partoid;		/* LogicalRepPartMap's key */
+	LogicalRepRelMapEntry relmapentry;
+	Oid			usableIndexOid; /* which index to use? (Invalid when no index
+								 * used) */
+} LogicalRepPartMapEntry;

For partition tables, is it possible to use relmapentry->usableIndexOid to
mark which index to use? That way we wouldn't need to add "usableIndexOid"
to LogicalRepPartMapEntry.

3.
It looks like we should change the comment for FindReplTupleInLocalRel() in
this patch.

/*
* Try to find a tuple received from the publication side (in 'remoteslot') in
* the corresponding local relation using either replica identity index,
* primary key or if needed, sequential scan.
*
* Local tuple, if found, is returned in '*localslot'.
*/
static bool
FindReplTupleInLocalRel(EState *estate, Relation localrel,

4.
@@ -2030,16 +2017,19 @@ apply_handle_delete_internal(ApplyExecutionData *edata,
 {
 	EState	   *estate = edata->estate;
 	Relation	localrel = relinfo->ri_RelationDesc;
-	LogicalRepRelation *remoterel = &edata->targetRel->remoterel;
+	LogicalRepRelMapEntry *targetRel = edata->targetRel;
+	LogicalRepRelation *remoterel = &targetRel->remoterel;
 	EPQState	epqstate;
 	TupleTableSlot *localslot;

Do we need this change? I didn't see any place that uses the variable
targetRel afterwards.

5.
+		if (!AttributeNumberIsValid(mainattno))
+		{
+			/*
+			 * There are two cases to consider. First, if the index is a primary or
+			 * unique key, we cannot have any indexes with expressions. So, at this
+			 * point we are sure that the index we deal is not these.
+			 */
+			Assert(RelationGetReplicaIndex(rel) != RelationGetRelid(idxrel) &&
+				   RelationGetPrimaryKeyIndex(rel) != RelationGetRelid(idxrel));
+
+			/*
+			 * For a non-primary/unique index with an expression, we are sure that
+			 * the expression cannot be used for replication index search. The
+			 * reason is that we create relevant index paths by providing column
+			 * equalities. And, the planner does not pick expression indexes via
+			 * column equality restrictions in the query.
+			 */
+			continue;
+		}

Is it possible that it is a usable index with an expression? I think indexes
with expressions have been filtered out in
FilterOutNotSuitablePathsForReplIdentFull(). If a usable index can't have an
expression, maybe we shouldn't use "continue" here.

6.
In the following case, I got a result which is different from HEAD, could you
please look into it?

-- publisher
CREATE TABLE test_replica_id_full (x int);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;

-- subscriber
CREATE TABLE test_replica_id_full (x int, y int);
CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x,y);
CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres port=5432' PUBLICATION tap_pub_rep_full;

-- publisher
INSERT INTO test_replica_id_full VALUES (1);
UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;

The data in subscriber:
on HEAD:
postgres=# select * from test_replica_id_full ;
x | y
---+---
2 |
(1 row)

After applying the patch:
postgres=# select * from test_replica_id_full ;
x | y
---+---
1 |
(1 row)

Regards,
Shi yu

#20 Önder Kalacı
onderkalaci@gmail.com
In reply to: shiy.fnst@fujitsu.com (#19)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Hi,

Thanks for the review!

1.
In FilterOutNotSuitablePathsForReplIdentFull(), is "nonPartialIndexPathList" a
good name for the list? Indexes only on expressions are also filtered.

+static List *
+FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)
+{
+       ListCell   *lc;
+       List *nonPartialIndexPathList = NIL;

Yes, true. We only filtered out the non-partial ones at first. Now
changed to *suitableIndexList*; does that look right?

2.
+typedef struct LogicalRepPartMapEntry
+{
+       Oid                     partoid;                /* LogicalRepPartMap's key */
+       LogicalRepRelMapEntry relmapentry;
+       Oid                     usableIndexOid; /* which index to use? (Invalid when no index
+                                                                * used) */
+} LogicalRepPartMapEntry;

For partition tables, is it possible to use relmapentry->usableIndexOid to mark
which index to use? That way we wouldn't need to add "usableIndexOid" to
LogicalRepPartMapEntry.

My intention was to make this explicit so that it is clear that partitions
can explicitly own indexes.

But I tried your suggested refactor, which looks good. So, I changed it.

Also, I realized that I do not have a test where the partition has an index
(not inherited from the parent), which I also added now.
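
For illustration, a minimal C sketch of the refactor discussed above, using
hypothetical, heavily simplified stand-in structs (the real entries carry many
more fields): the point is that usableIndexOid lives in the relation-map entry,
so the partition-map entry reuses it instead of keeping its own copy.

```c
#include <assert.h>

/* Hypothetical stand-ins for the structs discussed above; the real
 * definitions carry many more fields. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

typedef struct RelMapEntrySketch
{
    Oid localreloid;    /* local relation's OID */
    Oid usableIndexOid; /* index to use for lookups; InvalidOid if none */
} RelMapEntrySketch;

typedef struct PartMapEntrySketch
{
    Oid partoid;                   /* hash key: the partition's OID */
    RelMapEntrySketch relmapentry; /* carries usableIndexOid for us */
} PartMapEntrySketch;
```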

3.
It looks like we should update the comment for FindReplTupleInLocalRel() in this
patch.

/*
* Try to find a tuple received from the publication side (in 'remoteslot') in
* the corresponding local relation using either replica identity index,
* primary key or if needed, sequential scan.
*
* Local tuple, if found, is returned in '*localslot'.
*/
static bool
FindReplTupleInLocalRel(EState *estate, Relation localrel,

I made a small change, just adding "index". Do you expect a larger change?

4.
@@ -2030,16 +2017,19 @@ apply_handle_delete_internal(ApplyExecutionData *edata,
{
EState     *estate = edata->estate;
Relation        localrel = relinfo->ri_RelationDesc;
-       LogicalRepRelation *remoterel = &edata->targetRel->remoterel;
+       LogicalRepRelMapEntry *targetRel = edata->targetRel;
+       LogicalRepRelation *remoterel = &targetRel->remoterel;
EPQState        epqstate;
TupleTableSlot *localslot;

Do we need this change? I didn't see any place that uses the variable
targetRel afterwards.

Seems so, changed it back.

5.
+               if (!AttributeNumberIsValid(mainattno))
+               {
+                       /*
+                        * There are two cases to consider. First, if the index is a
+                        * primary or unique key, we cannot have any indexes with
+                        * expressions. So, at this point we are sure that the index
+                        * we deal is not these.
+                        */
+                       Assert(RelationGetReplicaIndex(rel) != RelationGetRelid(idxrel) &&
+                                  RelationGetPrimaryKeyIndex(rel) != RelationGetRelid(idxrel));
+
+                       /*
+                        * For a non-primary/unique index with an expression, we are
+                        * sure that the expression cannot be used for replication
+                        * index search. The reason is that we create relevant index
+                        * paths by providing column equalities. And, the planner
+                        * does not pick expression indexes via column equality
+                        * restrictions in the query.
+                        */
+                       continue;
+               }

Is it possible that it is a usable index with an expression? I think indexes
with expressions have been filtered out in
FilterOutNotSuitablePathsForReplIdentFull(). If a usable index can't have an
expression, maybe we shouldn't use "continue" here.

Ok, I think there were some confusing comments in the code, which I have
updated. Also added one more explicit Assert to make the code a little more
readable.

We can support indexes involving expressions, but not indexes consisting only
of expressions. FilterOutNotSuitablePathsForReplIdentFull() filters out the
latter; see IndexOnlyOnExpression().
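
To make the distinction concrete, here is a small, self-contained C sketch of
that rule. This is a hypothetical helper, not the patch's actual
IndexOnlyOnExpression(); it relies on the fact that in pg_index.indkey an
expression key column is recorded as attribute number 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: an index is rejected under this scheme only
 * when *every* key column is an expression (attribute number 0). */
static bool
index_only_on_expressions(const int *indkey, int nkeys)
{
    for (int i = 0; i < nkeys; i++)
    {
        if (indkey[i] != 0)
            return false;   /* found a simple column reference */
    }
    return true;            /* all key columns are expressions */
}
```

So an index like (firstname, lastname, (id || '_' || sub_id)) survives the
filter, while an index on a lone expression does not.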

So, for example, if we have an index as below, we are skipping the
expression while building the index scan keys:

CREATE INDEX people_names ON people (firstname, lastname, (id || '_' ||
sub_id));

We could consider removing `continue`, but that would mean also adjusting the
following code block to handle indexprs. To me, that seems like an edge case to
implement at this point, given that such an index is probably not common. Do
you think I should try to use the indexprs as well while building the scan key?

I'm mostly trying to keep the complexity small. If you think this limitation
should be lifted, I can give it a shot. The limitation I'm leaving here can be
stated in a single sentence: *The index on the subscriber can only use simple
column references.*

6.
In the following case, I got a result which is different from HEAD, could you
please look into it?

-- publisher
CREATE TABLE test_replica_id_full (x int);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;

-- subscriber
CREATE TABLE test_replica_id_full (x int, y int);
CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x,y);
CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres port=5432' PUBLICATION tap_pub_rep_full;

-- publisher
INSERT INTO test_replica_id_full VALUES (1);
UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;

The data in subscriber:
on HEAD:
postgres=# select * from test_replica_id_full ;
x | y
---+---
2 |
(1 row)

After applying the patch:
postgres=# select * from test_replica_id_full ;
x | y
---+---
1 |
(1 row)

Oops, good catch. It seems we forgot to have:

skey[scankey_attoff].sk_flags |= SK_SEARCHNULL;

On HEAD, the index used for this purpose could only be the primary key or a
unique key on NOT NULL columns. Now we do allow NULL values and need to
search for them. Added that (and your test) to the updated patch.
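
A minimal sketch of the behavior being fixed, using simplified stand-ins for
the scan-key machinery (the real flags live in access/skey.h, and the patch's
actual code builds ScanKeyData; this is only an illustration): when the column
value arriving from the publisher is NULL, the scan key must be flagged to
search for NULL index entries instead of comparing against a datum.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for PostgreSQL's SK_SEARCHNULL flag. */
#define SK_SEARCHNULL 0x0040

typedef struct ScanKeySketch
{
    int  sk_attno;    /* index column being constrained */
    int  sk_flags;    /* flag bits */
    long sk_argument; /* comparison value when not NULL */
} ScanKeySketch;

/* Build one scan key from the remote tuple's value for a column.
 * A NULL value turns into a "search for NULLs" key. */
static void
build_scan_key(ScanKeySketch *key, int attno, bool isnull, long value)
{
    key->sk_attno = attno;
    key->sk_flags = 0;
    key->sk_argument = 0;

    if (isnull)
        key->sk_flags |= SK_SEARCHNULL; /* match NULL index entries */
    else
        key->sk_argument = value;
}
```

Without the NULL branch, the key in Shi Yu's repro (y is NULL) never matches
the stored index entry, which is why the row was not found.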

As a semi-related note, tuples_equal() returns `true` for (NULL = NULL). I
have not changed that, and it seems right in this context. Do you see any
issues with that?
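
A sketch of those NULL semantics, not the actual tuples_equal(): when matching
a replicated tuple against a local one, two NULLs are treated as equal, unlike
SQL's `NULL = NULL` (which yields NULL, i.e. not a match).

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of a single attribute value. */
typedef struct AttrSketch
{
    bool isnull;
    long value;
} AttrSketch;

/* NULL-aware match: NULL matches only NULL; otherwise compare values. */
static bool
attrs_match(AttrSketch a, AttrSketch b)
{
    if (a.isnull || b.isnull)
        return a.isnull && b.isnull;
    return a.value == b.value;
}
```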

Also, I realized that the functions in execReplication.c expect only btree
indexes, so I skip other index types as well. If that makes sense, I can work
on a follow-up patch after this one is merged to remove some of the
limitations mentioned here.

Thanks,
Onder

Attachments:

v8_0001_use_index_on_subs_when_pub_rep_ident_full.patch (application/x-patch, +1499 -71)