doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

Started by Masahiko Sawadaabout 3 years ago37 messageshackers

sawada.mshk@gmail.com

about 3 years ago

Hi,

Commit 89e46da5e5 allowed us to use indexes for searching on REPLICA
IDENTITY FULL tables. The documentation explains:

When replica identity <quote>full</quote> is specified,
indexes can be used on the subscriber side for searching the rows. Candidate
indexes must be btree, non-partial, and have at least one column reference
(i.e. cannot consist of only expressions).

To be exact, IIUC the column reference must be on the leftmost column
of indexes. Does it make sense to mention that? I've attached the
patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Amit Kapila

amit.kapila16@gmail.com

about 3 years ago

In reply to: Masahiko Sawada (#1)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

On Mon, Jul 3, 2023 at 7:45 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Commit 89e46da5e5 allowed us to use indexes for searching on REPLICA
IDENTITY FULL tables. The documentation explains:

When replica identity <quote>full</quote> is specified,
indexes can be used on the subscriber side for searching the rows. Candidate
indexes must be btree, non-partial, and have at least one column reference
(i.e. cannot consist of only expressions).

To be exact, IIUC the column reference must be on the leftmost column
of indexes. Does it make sense to mention that?

Yeah, I think it is good to mention that. Accordingly, the comments
atop build_replindex_scan_key(),
FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
should also be adjusted.

--
With Regards,
Amit Kapila.

Peter Smith

smithpb2250@gmail.com

about 3 years ago

In reply to: Masahiko Sawada (#1)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

Hi. Here are some review comments for this patch.

+1 for the patch idea.

------

I wasn't sure about the code comment adjustments suggested by Amit [1]Amit suggestions - /messages/by-id/CAA4eK1LZ_A-UmC_P+_hLi+Pbwyqak+vRKemZ7imzk2puVTpHOA@mail.gmail.com:
"Accordingly, the comments atop build_replindex_scan_key(),
FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
should also be adjusted."

Actually, I thought the FindUsableIndexForReplicaIdentityFull()
function comment is *already* describing the limitation about the
leftmost column (see fragment below), so IIUC the Sawada-san patch is
only trying to expose that same information in the PG docs.

[FindUsableIndexForReplicaIdentityFull comment fragment]
* We also skip indexes if the remote relation does not contain the leftmost
* column of the index. This is because in most such cases sequential scan is
* favorable over index scan.

OTOH, it may be better if these limitation rule details were not
scattered in the code. e.g. build_replindex_scan_key() function
comment can be simplified:

CURRENT:
* This is not generic routine, it expects the idxrel to be a btree, non-partial
* and have at least one column reference (i.e. cannot consist of only
* expressions).

SUGGESTION:
This is not a generic routine. It expects the 'idxrel' to be an index
deemed "usable" by the function
FindUsableIndexForReplicaIdentityFull().

------
doc/src/sgml/logical-replication.sgml

1.
    the key.  When replica identity <literal>FULL</literal> is specified,
    indexes can be used on the subscriber side for searching the rows.
Candidate
    indexes must be btree, non-partial, and have at least one column reference
-   (i.e. cannot consist of only expressions).  These restrictions
-   on the non-unique index properties adhere to some of the restrictions that
-   are enforced for primary keys.  If there are no such suitable indexes,
+   at the leftmost column indexes (i.e. cannot consist of only
expressions).  These
+   restrictions on the non-unique index properties adhere to some of
the restrictions
+   that are enforced for primary keys.  If there are no such suitable indexes,
    the search on the subscriber side can be very inefficient, therefore
    replica identity <literal>FULL</literal> should only be used as a
    fallback if no other solution is possible.  If a replica identity other

Isn't this using the word "indexes" with different meanings in the
same sentence? e.g. IIUC "leftmost column indexes" is referring to the
ordinal number of the index fields. TBH, I am not sure the patch
wording is even describing the limitation in quite the same way as
what the code is actually doing.

HEAD (code comment):
* We also skip indexes if the remote relation does not contain the leftmost
* column of the index. This is because in most such cases sequential scan is
* favorable over index scan.

HEAD (rendered docs)
Candidate indexes must be btree, non-partial, and have at least one
column reference (i.e. cannot consist of only expressions). These
restrictions on the non-unique index properties adhere to some of the
restrictions that are enforced for primary keys.

PATCHED (rendered docs)
Candidate indexes must be btree, non-partial, and have at least one
column reference at the leftmost column indexes (i.e. cannot consist
of only expressions). These restrictions on the non-unique index
properties adhere to some of the restrictions that are enforced for
primary keys.

MY SUGGESTION:
Candidate indexes must be btree, non-partial, and have at least one
column reference (i.e. cannot consist of only expressions).
Furthermore, the leftmost field of the candidate index must be a
column of the published table. These restrictions on the non-unique
index properties adhere to some of the restrictions that are enforced
for primary keys.

------
[1]: Amit suggestions - /messages/by-id/CAA4eK1LZ_A-UmC_P+_hLi+Pbwyqak+vRKemZ7imzk2puVTpHOA@mail.gmail.com
/messages/by-id/CAA4eK1LZ_A-UmC_P+_hLi+Pbwyqak+vRKemZ7imzk2puVTpHOA@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Amit Kapila

amit.kapila16@gmail.com

about 3 years ago

In reply to: Peter Smith (#3)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

On Wed, Jul 5, 2023 at 9:01 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hi. Here are some review comments for this patch.

+1 for the patch idea.

------

I wasn't sure about the code comment adjustments suggested by Amit [1]:
"Accordingly, the comments atop build_replindex_scan_key(),
FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
should also be adjusted."

Actually, I thought the FindUsableIndexForReplicaIdentityFull()
function comment is *already* describing the limitation about the
leftmost column (see fragment below), so IIUC the Sawada-san patch is
only trying to expose that same information in the PG docs.

[FindUsableIndexForReplicaIdentityFull comment fragment]
* We also skip indexes if the remote relation does not contain the leftmost
* column of the index. This is because in most such cases sequential scan is
* favorable over index scan.

This implies that the leftmost column of the index must be
non-expression but I feel what the patch intends to say in docs is
more straightforward and it doesn't match what the proposed docs says.

~

OTOH, it may be better if these limitation rule details were not
scattered in the code. e.g. build_replindex_scan_key() function
comment can be simplified:

CURRENT:
* This is not generic routine, it expects the idxrel to be a btree, non-partial
* and have at least one column reference (i.e. cannot consist of only
* expressions).

SUGGESTION:
This is not a generic routine. It expects the 'idxrel' to be an index
deemed "usable" by the function
FindUsableIndexForReplicaIdentityFull().

Note that for PK/ReplicaIdentity, we don't even call
FindUsableIndexForReplicaIdentityFull() but build_replindex_scan_key()
would still be called for such index. So, I am not sure your proposed
wording is an improvement.

------
doc/src/sgml/logical-replication.sgml
1.
the key.  When replica identity <literal>FULL</literal> is specified,
indexes can be used on the subscriber side for searching the rows.
Candidate
indexes must be btree, non-partial, and have at least one column reference
-   (i.e. cannot consist of only expressions).  These restrictions
-   on the non-unique index properties adhere to some of the restrictions that
-   are enforced for primary keys.  If there are no such suitable indexes,
+   at the leftmost column indexes (i.e. cannot consist of only
expressions).  These
+   restrictions on the non-unique index properties adhere to some of
the restrictions
+   that are enforced for primary keys.  If there are no such suitable indexes,
the search on the subscriber side can be very inefficient, therefore
replica identity <literal>FULL</literal> should only be used as a
fallback if no other solution is possible.  If a replica identity other
Isn't this using the word "indexes" with different meanings in the
same sentence? e.g. IIUC "leftmost column indexes" is referring to the
ordinal number of the index fields. TBH, I am not sure the patch
wording is even describing the limitation in quite the same way as
what the code is actually doing.

HEAD (code comment):
* We also skip indexes if the remote relation does not contain the leftmost
* column of the index. This is because in most such cases sequential scan is
* favorable over index scan.

HEAD (rendered docs)
Candidate indexes must be btree, non-partial, and have at least one
column reference (i.e. cannot consist of only expressions). These
restrictions on the non-unique index properties adhere to some of the
restrictions that are enforced for primary keys.

PATCHED (rendered docs)
Candidate indexes must be btree, non-partial, and have at least one
column reference at the leftmost column indexes (i.e. cannot consist
of only expressions). These restrictions on the non-unique index
properties adhere to some of the restrictions that are enforced for
primary keys.

MY SUGGESTION:
Candidate indexes must be btree, non-partial, and have at least one
column reference (i.e. cannot consist of only expressions).
Furthermore, the leftmost field of the candidate index must be a
column of the published table. These restrictions on the non-unique
index properties adhere to some of the restrictions that are enforced
for primary keys.

I don't know if this suggestion is what the code is actually doing. In
function RemoteRelContainsLeftMostColumnOnIdx(), we have the following
checks:
==========
keycol = indexInfo->ii_IndexAttrNumbers[0];
if (!AttributeNumberIsValid(keycol))
return false;

if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol))
return false;

return attrmap->attnums[AttrNumberGetAttrOffset(keycol)] >= 0;
==========

The first of these checks indicates that the leftmost column of the
index should be non-expression, second and third indicates what you
suggest in your wording. We can also think that what you wrote in a
way is a superset of "leftmost index column is a non-expression" and
"leftmost index column should be present in remote rel" but I feel it
would be better to explicit about the first part as it is easy to
understand for users at least in docs.

--
With Regards,
Amit Kapila.

Masahiko Sawada

sawada.mshk@gmail.com

about 3 years ago

In reply to: Amit Kapila (#4)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

On Wed, Jul 5, 2023 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 5, 2023 at 9:01 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hi. Here are some review comments for this patch.

+1 for the patch idea.

------

I wasn't sure about the code comment adjustments suggested by Amit [1]:
"Accordingly, the comments atop build_replindex_scan_key(),
FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
should also be adjusted."

As for IsIndexOnlyOnExpression(), what part do you think we need to
adjust? It says:

/*
* Returns true if the given index consists only of expressions such as:
* CREATE INDEX idx ON table(foo(col));
*
* Returns false even if there is one column reference:
* CREATE INDEX idx ON table(foo(col), col_2);
*/

and it seems to me that the function doesn't check if the leftmost
index column is a non-expression.

doc/src/sgml/logical-replication.sgml

1.
the key.  When replica identity <literal>FULL</literal> is specified,
indexes can be used on the subscriber side for searching the rows.
Candidate
indexes must be btree, non-partial, and have at least one column reference
-   (i.e. cannot consist of only expressions).  These restrictions
-   on the non-unique index properties adhere to some of the restrictions that
-   are enforced for primary keys.  If there are no such suitable indexes,
+   at the leftmost column indexes (i.e. cannot consist of only
expressions).  These
+   restrictions on the non-unique index properties adhere to some of
the restrictions
+   that are enforced for primary keys.  If there are no such suitable indexes,
the search on the subscriber side can be very inefficient, therefore
replica identity <literal>FULL</literal> should only be used as a
fallback if no other solution is possible.  If a replica identity other

Isn't this using the word "indexes" with different meanings in the
same sentence? e.g. IIUC "leftmost column indexes" is referring to the
ordinal number of the index fields.

That was my mistake, it should be " at the leftmost column".

I don't know if this suggestion is what the code is actually doing. In
function RemoteRelContainsLeftMostColumnOnIdx(), we have the following
checks:
==========
keycol = indexInfo->ii_IndexAttrNumbers[0];
if (!AttributeNumberIsValid(keycol))
return false;

if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol))
return false;

return attrmap->attnums[AttrNumberGetAttrOffset(keycol)] >= 0;
==========

The first of these checks indicates that the leftmost column of the
index should be non-expression, second and third indicates what you
suggest in your wording. We can also think that what you wrote in a
way is a superset of "leftmost index column is a non-expression" and
"leftmost index column should be present in remote rel" but I feel it
would be better to explicit about the first part as it is easy to
understand for users at least in docs.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Amit Kapila

amit.kapila16@gmail.com

about 3 years ago

In reply to: Masahiko Sawada (#5)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

On Wed, Jul 5, 2023 at 12:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jul 5, 2023 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 5, 2023 at 9:01 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hi. Here are some review comments for this patch.

+1 for the patch idea.

------

I wasn't sure about the code comment adjustments suggested by Amit [1]:
"Accordingly, the comments atop build_replindex_scan_key(),
FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
should also be adjusted."

As for IsIndexOnlyOnExpression(), what part do you think we need to
adjust? It says:

/*
* Returns true if the given index consists only of expressions such as:
* CREATE INDEX idx ON table(foo(col));
*
* Returns false even if there is one column reference:
* CREATE INDEX idx ON table(foo(col), col_2);
*/

and it seems to me that the function doesn't check if the leftmost
index column is a non-expression.

Right, so, we can leave this as is.

--
With Regards,
Amit Kapila.

Peter Smith

smithpb2250@gmail.com

about 3 years ago

In reply to: Masahiko Sawada (#5)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

On Wed, Jul 5, 2023 at 4:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jul 5, 2023 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 5, 2023 at 9:01 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hi. Here are some review comments for this patch.

+1 for the patch idea.

------

I wasn't sure about the code comment adjustments suggested by Amit [1]:
"Accordingly, the comments atop build_replindex_scan_key(),
FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
should also be adjusted."

As for IsIndexOnlyOnExpression(), what part do you think we need to
adjust? It says:

/*
* Returns true if the given index consists only of expressions such as:
* CREATE INDEX idx ON table(foo(col));
*
* Returns false even if there is one column reference:
* CREATE INDEX idx ON table(foo(col), col_2);
*/

and it seems to me that the function doesn't check if the leftmost
index column is a non-expression.

TBH, this IsIndexOnlyOnExpression() function seemed redundant to me,
otherwise, there can be some indexes that are firstly considered
"useable" but then fail the subsequent leftmost check. It does not
seem right.

doc/src/sgml/logical-replication.sgml

1.
the key.  When replica identity <literal>FULL</literal> is specified,
indexes can be used on the subscriber side for searching the rows.
Candidate
indexes must be btree, non-partial, and have at least one column reference
-   (i.e. cannot consist of only expressions).  These restrictions
-   on the non-unique index properties adhere to some of the restrictions that
-   are enforced for primary keys.  If there are no such suitable indexes,
+   at the leftmost column indexes (i.e. cannot consist of only
expressions).  These
+   restrictions on the non-unique index properties adhere to some of
the restrictions
+   that are enforced for primary keys.  If there are no such suitable indexes,
the search on the subscriber side can be very inefficient, therefore
replica identity <literal>FULL</literal> should only be used as a
fallback if no other solution is possible.  If a replica identity other

Isn't this using the word "indexes" with different meanings in the
same sentence? e.g. IIUC "leftmost column indexes" is referring to the
ordinal number of the index fields.

That was my mistake, it should be " at the leftmost column".

IIUC the subscriber-side table can have more columns than the
publisher-side table, so just describing in the docs that the leftmost
INDEX field must be a column may not be quite enough; it also needs to
say that column has to exist on the publisher-table, doesn't it?

Also, after you document this 'leftmost field restriction' that
already implies there *must* be a non-expression in the INDEX. So I
thought we can just omit the "(i.e. cannot consist of only
expressions)" part.

Anyway, I will wait to see the wording of the updated patch before
commenting further.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Masahiko Sawada

sawada.mshk@gmail.com

about 3 years ago

In reply to: Peter Smith (#7)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

On Thu, Jul 6, 2023 at 7:58 AM Peter Smith <smithpb2250@gmail.com> wrote:

On Wed, Jul 5, 2023 at 4:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jul 5, 2023 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 5, 2023 at 9:01 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hi. Here are some review comments for this patch.

+1 for the patch idea.

------

I wasn't sure about the code comment adjustments suggested by Amit [1]:
"Accordingly, the comments atop build_replindex_scan_key(),
FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
should also be adjusted."

As for IsIndexOnlyOnExpression(), what part do you think we need to
adjust? It says:

/*
* Returns true if the given index consists only of expressions such as:
* CREATE INDEX idx ON table(foo(col));
*
* Returns false even if there is one column reference:
* CREATE INDEX idx ON table(foo(col), col_2);
*/

and it seems to me that the function doesn't check if the leftmost
index column is a non-expression.

TBH, this IsIndexOnlyOnExpression() function seemed redundant to me,
otherwise, there can be some indexes that are firstly considered
"useable" but then fail the subsequent leftmost check. It does not
seem right.

I see your point. IsIndexUsableForReplicaIdentityFull(), the sole user
of IsIndexOnlyOnExpression(), is also called by
RelationFindReplTupleByIndex() in an assertion build. I thought this
is the reason why we have separate IsIndexOnlyOnExpression() ( and
IsIndexUsableForReplicaIdentityFull()). But this assertion doesn't
check if the leftmost index column exists on the remote relation. What
are we doing this check for? If it's not necessary, we can remove this
assertion and merge both IsIndexOnlyOnExpression() and
IsIndexUsableForReplicaIdentityFull() into
FindUsableIndexForReplicaIdentityFull().

doc/src/sgml/logical-replication.sgml
1.
the key.  When replica identity <literal>FULL</literal> is specified,
indexes can be used on the subscriber side for searching the rows.
Candidate
indexes must be btree, non-partial, and have at least one column reference
-   (i.e. cannot consist of only expressions).  These restrictions
-   on the non-unique index properties adhere to some of the restrictions that
-   are enforced for primary keys.  If there are no such suitable indexes,
+   at the leftmost column indexes (i.e. cannot consist of only
expressions).  These
+   restrictions on the non-unique index properties adhere to some of
the restrictions
+   that are enforced for primary keys.  If there are no such suitable indexes,
the search on the subscriber side can be very inefficient, therefore
replica identity <literal>FULL</literal> should only be used as a
fallback if no other solution is possible.  If a replica identity other
Isn't this using the word "indexes" with different meanings in the
same sentence? e.g. IIUC "leftmost column indexes" is referring to the
ordinal number of the index fields.
That was my mistake, it should be " at the leftmost column".
IIUC the subscriber-side table can have more columns than the
publisher-side table, so just describing in the docs that the leftmost
INDEX field must be a column may not be quite enough; it also needs to
say that column has to exist on the publisher-table, doesn't it?

Right. I've updated the patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Peter Smith

smithpb2250@gmail.com

about 3 years ago

In reply to: Masahiko Sawada (#8)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

Hi, Here are my review comments for patch v2.

======
1. doc/src/sgml/logical-replication.sgml

Candidate indexes must be btree, non-partial, and have at least one
column reference to the published table column at the leftmost column
(i.e. cannot consist of only expressions).

There is only one column which can be the leftmost one, so the wording
"At least one ... at the leftmost" seemed a bit strange to me.
Personally, I would phrase it something like below:

SUGGESTION #1
Candidate indexes must be btree, non-partial, and the leftmost index
column must reference a published table column (i.e. the index cannot
consist of only expressions).

SUGGESTION #2 (same as above, but omitting the "only expressions"
part, which I think is implied by the "leftmost" rule anyway)
Candidate indexes must be btree, non-partial, and the leftmost index
column must reference a published table column.

======
2. src/backend/replication/logical/relation.c

  * Returns the oid of an index that can be used by the apply worker to scan
  * the relation. The index must be btree, non-partial, and have at least
- * one column reference (i.e. cannot consist of only expressions). These
- * limitations help to keep the index scan similar to PK/RI index scans.
+ * one column reference to the remote relation's column at the leftmost column
+ * (i.e. cannot consist of only expressions). These limitations help
to keep the
+ * index scan similar to PK/RI index scans.

This comment text is similar to the docs change, so refer to the same
suggestions as #1 above.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#10

Masahiko Sawada

sawada.mshk@gmail.com

about 3 years ago

In reply to: Peter Smith (#9)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

On Fri, Jul 7, 2023 at 10:55 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hi, Here are my review comments for patch v2.

======
1. doc/src/sgml/logical-replication.sgml

Candidate indexes must be btree, non-partial, and have at least one
column reference to the published table column at the leftmost column
(i.e. cannot consist of only expressions).

~

There is only one column which can be the leftmost one, so the wording
"At least one ... at the leftmost" seemed a bit strange to me.
Personally, I would phrase it something like below:

SUGGESTION #1
Candidate indexes must be btree, non-partial, and the leftmost index
column must reference a published table column (i.e. the index cannot
consist of only expressions).

I prefer the first suggestion. I've attached the updated patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#11

Amit Kapila

amit.kapila16@gmail.com

about 3 years ago

In reply to: Masahiko Sawada (#10)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

On Fri, Jul 7, 2023 at 1:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I prefer the first suggestion. I've attached the updated patch.

This looks mostly good to me but I think it would be better if we can
also add the information that the leftmost index column must be a
non-expression. So, how about: "Candidate indexes must be btree,
non-partial, and the leftmost index column must be a non-expression
and reference to a published table column (i.e. cannot consist of only
expressions)."?

--
With Regards,
Amit Kapila.

#12

Peter Smith

smithpb2250@gmail.com

about 3 years ago

In reply to: Amit Kapila (#11)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

On Sat, Jul 8, 2023 at 1:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jul 7, 2023 at 1:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I prefer the first suggestion. I've attached the updated patch.

This looks mostly good to me but I think it would be better if we can
also add the information that the leftmost index column must be a
non-expression. So, how about: "Candidate indexes must be btree,
non-partial, and the leftmost index column must be a non-expression
and reference to a published table column (i.e. cannot consist of only
expressions)."?

That part in parentheses ought to say "the index ..." because it is
referring to the full INDEX, not to the leftmost column. I think this
was missed when Sawada-san took my previous suggestion [1]my review v2 - /messages/by-id/CAHut+PsFdTZJ7DG1jyu7BpA_1d4hwEd-Q+mQAPWcj1ZLD_X5Dw@mail.gmail.com.

IMO it doesn't sound right to say the "index column must be a
non-expression". It is already a non-expression because it is a
column. So I think it would be better to refer to this as an INDEX
"field" instead of an INDEX column. Note that "field" is the same
terminology used in the docs for CREATE INDEX [2]create index - https://www.postgresql.org/docs/current/sql-createindex.html.

SUGGESTION
Candidate indexes must be btree, non-partial, and the leftmost index
field must be a column that references a published table column (i.e.
the index cannot consist of only expressions).

~~~~

What happened to the earlier idea of removing/merging the redundant
(?) function IsIndexOnlyOnExpression()?
- Something wrong with that?
- Chose not to do it?
- Will do it raised in another thread?

------
[1]: my review v2 - /messages/by-id/CAHut+PsFdTZJ7DG1jyu7BpA_1d4hwEd-Q+mQAPWcj1ZLD_X5Dw@mail.gmail.com
/messages/by-id/CAHut+PsFdTZJ7DG1jyu7BpA_1d4hwEd-Q+mQAPWcj1ZLD_X5Dw@mail.gmail.com
[2]: create index - https://www.postgresql.org/docs/current/sql-createindex.html

Kind Regards,
Peter Smith.
Fujitsu Australia

#13

Amit Kapila

amit.kapila16@gmail.com

about 3 years ago

In reply to: Peter Smith (#12)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

On Mon, Jul 10, 2023 at 7:55 AM Peter Smith <smithpb2250@gmail.com> wrote:

On Sat, Jul 8, 2023 at 1:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jul 7, 2023 at 1:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I prefer the first suggestion. I've attached the updated patch.

This looks mostly good to me but I think it would be better if we can
also add the information that the leftmost index column must be a
non-expression. So, how about: "Candidate indexes must be btree,
non-partial, and the leftmost index column must be a non-expression
and reference to a published table column (i.e. cannot consist of only
expressions)."?

That part in parentheses ought to say "the index ..." because it is
referring to the full INDEX, not to the leftmost column. I think this
was missed when Sawada-san took my previous suggestion [1].

IMO it doesn't sound right to say the "index column must be a
non-expression". It is already a non-expression because it is a
column. So I think it would be better to refer to this as an INDEX
"field" instead of an INDEX column. Note that "field" is the same
terminology used in the docs for CREATE INDEX [2].

I thought it would be better to be explicit for this case but I am
fine if Sawada-San and you prefer some other way to document it.

--
With Regards,
Amit Kapila.

#14

Peter Smith

smithpb2250@gmail.com

about 3 years ago

In reply to: Amit Kapila (#13)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

On Mon, Jul 10, 2023 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jul 10, 2023 at 7:55 AM Peter Smith <smithpb2250@gmail.com> wrote:

On Sat, Jul 8, 2023 at 1:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jul 7, 2023 at 1:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I prefer the first suggestion. I've attached the updated patch.

This looks mostly good to me but I think it would be better if we can
also add the information that the leftmost index column must be a
non-expression. So, how about: "Candidate indexes must be btree,
non-partial, and the leftmost index column must be a non-expression
and reference to a published table column (i.e. cannot consist of only
expressions)."?

That part in parentheses ought to say "the index ..." because it is
referring to the full INDEX, not to the leftmost column. I think this
was missed when Sawada-san took my previous suggestion [1].

IMO it doesn't sound right to say the "index column must be a
non-expression". It is already a non-expression because it is a
column. So I think it would be better to refer to this as an INDEX
"field" instead of an INDEX column. Note that "field" is the same
terminology used in the docs for CREATE INDEX [2].

I thought it would be better to be explicit for this case but I am
fine if Sawada-San and you prefer some other way to document it.

I see. How about just moving the parenthesized part to explicitly
refer only to the leftmost field?

SUGGESTION
Candidate indexes must be btree, non-partial, and the leftmost index
field must be a column (not an expression) that references a published
table column.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#15

Amit Kapila

amit.kapila16@gmail.com

about 3 years ago

In reply to: Peter Smith (#14)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

On Tue, Jul 11, 2023 at 4:54 AM Peter Smith <smithpb2250@gmail.com> wrote:

On Mon, Jul 10, 2023 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jul 10, 2023 at 7:55 AM Peter Smith <smithpb2250@gmail.com> wrote:

On Sat, Jul 8, 2023 at 1:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jul 7, 2023 at 1:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I prefer the first suggestion. I've attached the updated patch.

This looks mostly good to me but I think it would be better if we can
also add the information that the leftmost index column must be a
non-expression. So, how about: "Candidate indexes must be btree,
non-partial, and the leftmost index column must be a non-expression
and reference to a published table column (i.e. cannot consist of only
expressions)."?

That part in parentheses ought to say "the index ..." because it is
referring to the full INDEX, not to the leftmost column. I think this
was missed when Sawada-san took my previous suggestion [1].

IMO it doesn't sound right to say the "index column must be a
non-expression". It is already a non-expression because it is a
column. So I think it would be better to refer to this as an INDEX
"field" instead of an INDEX column. Note that "field" is the same
terminology used in the docs for CREATE INDEX [2].

I thought it would be better to be explicit for this case but I am
fine if Sawada-San and you prefer some other way to document it.

I see. How about just moving the parenthesized part to explicitly
refer only to the leftmost field?

SUGGESTION
Candidate indexes must be btree, non-partial, and the leftmost index
field must be a column (not an expression) that references a published
table column.

Yeah, something like that works for me.

--
With Regards,
Amit Kapila.

#16

Masahiko Sawada

sawada.mshk@gmail.com

about 3 years ago

In reply to: Amit Kapila (#15)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

On Tue, Jul 11, 2023 at 1:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jul 11, 2023 at 4:54 AM Peter Smith <smithpb2250@gmail.com> wrote:

On Mon, Jul 10, 2023 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jul 10, 2023 at 7:55 AM Peter Smith <smithpb2250@gmail.com> wrote:

On Sat, Jul 8, 2023 at 1:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jul 7, 2023 at 1:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I prefer the first suggestion. I've attached the updated patch.

This looks mostly good to me but I think it would be better if we can
also add the information that the leftmost index column must be a
non-expression. So, how about: "Candidate indexes must be btree,
non-partial, and the leftmost index column must be a non-expression
and reference to a published table column (i.e. cannot consist of only
expressions)."?

That part in parentheses ought to say "the index ..." because it is
referring to the full INDEX, not to the leftmost column. I think this
was missed when Sawada-san took my previous suggestion [1].

IMO it doesn't sound right to say the "index column must be a
non-expression". It is already a non-expression because it is a
column. So I think it would be better to refer to this as an INDEX
"field" instead of an INDEX column. Note that "field" is the same
terminology used in the docs for CREATE INDEX [2].

I thought it would be better to be explicit for this case but I am
fine if Sawada-San and you prefer some other way to document it.

I see. How about just moving the parenthesized part to explicitly
refer only to the leftmost field?

SUGGESTION
Candidate indexes must be btree, non-partial, and the leftmost index
field must be a column (not an expression) that references a published
table column.

Yeah, something like that works for me.

Looks good to me. I've attached the updated patch. In the comment in
RemoteRelContainsLeftMostColumnOnIdx(), I used "remote relation"
instead of "published table" as it's more consistent with surrounding
comments. Also, I've removed the comment starting with "We also skip
indedes..." as the new comment now covers it. Please review it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#17

Peter Smith

smithpb2250@gmail.com

about 3 years ago

In reply to: Masahiko Sawada (#16)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

Here are my comments for v4.

======

Docs/Comments:

All the docs and updated comments LTGM, except I felt one sentence
might be written differently to avoid nested parentheses.

BEFORE
...used for REPLICA IDENTITY FULL table (see
FindUsableIndexForReplicaIdentityFull() for details).

AFTER
...used for REPLICA IDENTITY FULL table. See
FindUsableIndexForReplicaIdentityFull() for details.

====

Logic:

What was the decision about the earlier question [1]/messages/by-id/CAHut+PuGhGHp9Uq8-Wk7uBiirAHF5quDY_1Z6WDoUKRZqkn+rg@mail.gmail.com of
removing/merging the function IsIndexOnlyOnExpression()?

------
[1]: /messages/by-id/CAHut+PuGhGHp9Uq8-Wk7uBiirAHF5quDY_1Z6WDoUKRZqkn+rg@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#18

Masahiko Sawada

sawada.mshk@gmail.com

about 3 years ago

In reply to: Peter Smith (#17)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

On Tue, Jul 11, 2023 at 5:31 PM Peter Smith <smithpb2250@gmail.com> wrote:

Here are my comments for v4.

======

Docs/Comments:

All the docs and updated comments LTGM, except I felt one sentence
might be written differently to avoid nested parentheses.

BEFORE
...used for REPLICA IDENTITY FULL table (see
FindUsableIndexForReplicaIdentityFull() for details).

AFTER
...used for REPLICA IDENTITY FULL table. See
FindUsableIndexForReplicaIdentityFull() for details.

====

Agreed. I've attached the updated patch. I'll push it barring any objections.

Logic:

What was the decision about the earlier question [1] of
removing/merging the function IsIndexOnlyOnExpression()?

I don't think we have concluded any action for it. I agree that
IsIndexOnlyOnExpression() is redundant. We don't need to check *all*
index fields actually. I've attached a draft patch. It removes
IsIndexOnlyOnExpression() and merges
RemoteRelContainsLeftMostColumnOnIdx() to
FindUsableIndexForReplicaIdentityFull. One concern is that we no
longer do the assertion check with
IsIndexUsableForReplicaIdentityFull(). What do you think?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#19

Amit Kapila

amit.kapila16@gmail.com

about 3 years ago

In reply to: Masahiko Sawada (#18)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

On Wed, Jul 12, 2023 at 12:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jul 11, 2023 at 5:31 PM Peter Smith <smithpb2250@gmail.com> wrote:

I don't think we have concluded any action for it. I agree that
IsIndexOnlyOnExpression() is redundant. We don't need to check *all*
index fields actually. I've attached a draft patch. It removes
IsIndexOnlyOnExpression() and merges
RemoteRelContainsLeftMostColumnOnIdx() to
FindUsableIndexForReplicaIdentityFull. One concern is that we no
longer do the assertion check with
IsIndexUsableForReplicaIdentityFull(). What do you think?

I think this is a valid concern. Can't we move all the checks
(including the remote attrs check) inside
IsIndexUsableForReplicaIdentityFull() and then call it from both
places? Won't we have attrmap information available in the callers of
FindReplTupleInLocalRel() via ApplyExecutionData?

--
With Regards,
Amit Kapila.

#20

Masahiko Sawada

sawada.mshk@gmail.com

about 3 years ago

In reply to: Amit Kapila (#19)

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

On Wed, Jul 12, 2023 at 7:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 12, 2023 at 12:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jul 11, 2023 at 5:31 PM Peter Smith <smithpb2250@gmail.com> wrote:

I don't think we have concluded any action for it. I agree that
IsIndexOnlyOnExpression() is redundant. We don't need to check *all*
index fields actually. I've attached a draft patch. It removes
IsIndexOnlyOnExpression() and merges
RemoteRelContainsLeftMostColumnOnIdx() to
FindUsableIndexForReplicaIdentityFull. One concern is that we no
longer do the assertion check with
IsIndexUsableForReplicaIdentityFull(). What do you think?

I think this is a valid concern. Can't we move all the checks
(including the remote attrs check) inside
IsIndexUsableForReplicaIdentityFull() and then call it from both
places? Won't we have attrmap information available in the callers of
FindReplTupleInLocalRel() via ApplyExecutionData?

You mean to pass ApplyExecutionData or attrmap down to
RelationFindReplTupleByIndex()? I think it would be better to call it
from FindReplTupleInLocalRel() instead.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#21