Proposal: Global Index for PostgreSQL

Started by Dilip Kumar · 7 months ago · 19 messages
#1 Dilip Kumar
dilipbalaut@gmail.com

PostgreSQL’s table partitioning allows large tables to be broken into
smaller, more manageable pieces for better performance. However, a key
limitation currently is the absence of global indexes, which restricts
using partitioned tables, especially when you need unique constraints
on columns that aren't part of the partition key. It's true that
global indexes might seem to go against the core idea of partitioning.
However, they become essential when business needs dictate
partitioning data on one key while also enforcing uniqueness on a
different column. Without global indexes, users simply couldn't
leverage partitioning at all in such scenarios.

I've been developing a design and a POC for global indexes to address
this. I'd like to share my work, including the challenges we still
face and my proposed solutions.

Storage
=======
Since global indexes cover tuples from multiple partitions, TIDs are
not sufficient to uniquely identify a tuple. To address this I have
introduced a partition identifier, a 4-byte integer (we may argue
further over whether this should be variable length to save space).
I will describe this partition identifier in more detail in a later
part of the email, along with the reason for choosing it instead of
just a relation Oid, which would seem much more straightforward.

The partition identifier is stored as part of the index tuple, right
after the last key column. Storing it there makes a lot of things
work in a very straightforward way without any special handling for
this column. For example, btree tuples are arranged in key order,
and duplicate keys are arranged in TID order; for a global index we
want duplicate keys to be arranged in partition identifier order,
and, when the partition identifier is also the same, in TID order.
By storing the identifier after the last key column and treating that
field as part of the extended key, tuples will automatically be
arranged in (keys, partition identifier, TID) order, as we desire.
Similarly, suffix truncation and deduplication will be simpler with
this design choice.
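
To make the desired ordering concrete, here is a minimal standalone
sketch (plain C, not the actual nbtree code; the struct and all names
are hypothetical, and the TID is flattened to one integer) of
comparing entries in (keys, partition identifier, TID) order:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Hypothetical flattened index entry: user key, then the 4-byte
 * partition identifier stored after the last key column, then TID. */
typedef struct IndexEntry
{
	int32_t		key;		/* the only user key column here */
	uint32_t	partid;		/* partition identifier */
	uint64_t	tid;		/* heap TID, flattened for the sketch */
} IndexEntry;

/* Compare in (key, partition identifier, TID) order. */
static int
entry_cmp(const void *a, const void *b)
{
	const IndexEntry *x = a;
	const IndexEntry *y = b;

	if (x->key != y->key)
		return (x->key < y->key) ? -1 : 1;
	if (x->partid != y->partid)	/* duplicate keys: partid order */
		return (x->partid < y->partid) ? -1 : 1;
	if (x->tid != y->tid)		/* same partition: TID order */
		return (x->tid < y->tid) ? -1 : 1;
	return 0;
}

int
main(void)
{
	IndexEntry	entries[] = {
		{42, 7, 100}, {42, 3, 900}, {42, 3, 200}, {10, 9, 1},
	};

	qsort(entries, 4, sizeof(IndexEntry), entry_cmp);
	for (int i = 0; i < 4; i++)
		printf("key=%d partid=%u tid=%llu\n", entries[i].key,
			   entries[i].partid, (unsigned long long) entries[i].tid);
	return 0;
}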

Partition Identifier
==============
Before discussing anything about Create Index/Attach/Detach
partition, I think we need to discuss the partition identifier. The
main reason for not using the relation Oid as the partition
identifier is that doing so would force us to remove all tuples of a
detached partition from the global index; otherwise it would be very
hard to identify the tuples that are no longer valid and should not
be scanned. For example, if the same partition is detached and
reattached, we would not be able to distinguish between the old
tuples (which should be ignored) and the new ones; similarly, an Oid
could get reassigned to some other partition after the old partition
is dropped and the Oid counter wraps around.

To solve this, we've introduced a partition identifier, a 4-byte
integer. We'll use a new catalog called pg_index_partitions to store a
mapping from (partition identifier, global index OID) to relation OID.

The mapping isn't a direct one from partition identifier to relation
ID because each partition will have a different partition identifier
for each global index it's associated with. This is crucial for
correctly handling global indexes in multi-level partition
hierarchies.

Consider this example:
- T is the top-level parent and has global index g.
- T1 and T2 are children of T. T2 is itself a partitioned table and
also has global index g2.
- T22 is a child of T2.

Consider a scenario in which we assign a single partition identifier
to T22 and create a one-to-one mapping with its relation ID. What
happens if we detach T2 from T? (While we haven't delved into the
detach process yet, our current thought is to invalidate the
partition identifier entry so that during a scan, when we look up the
partition identifier to find the corresponding reloid, we'd simply
ignore it.)

However, this approach presents a problem. We can't simply invalidate
the entry for T22 entirely, because while its partition identifier
should be considered invalid when scanning global index 'g' (since T2
is detached from T), it should still be valid when scanning global
index 'g2' (which belongs to T2).

To address this, each leaf partition needs to be assigned a separate
partition identifier for every global index it's associated with.
While a separate partition identifier for each level in the partition
hierarchy could suffice, we've opted for the simpler approach of
assigning a distinct identifier per global index. We made this choice
assuming users will be judicious in creating global indexes within
their partition hierarchies, meaning we won't have to worry about
rapidly consuming a large number of partition IDs. Additionally, the
partition identifier counter can be maintained per global index rather
than globally, which will keep the counter from growing rapidly.
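
To illustrate, here is a minimal standalone sketch (plain C; the Oids
and names are hypothetical, and the real design uses a catalog plus a
relcache-level cache rather than a flat array) of resolving a
(global index OID, partition identifier) pair to a reloid, with
detached partitions already invalidated:

#include <stdio.h>
#include <stdint.h>

#define InvalidOid	0
#define NMAPPINGS	3

typedef uint32_t Oid;
typedef uint32_t PartitionId;

/* One pg_index_partitions-style row. */
typedef struct IndexPartitionEntry
{
	Oid			indexoid;	/* global index */
	PartitionId	partid;		/* identifier stored in index tuples */
	Oid			reloid;		/* InvalidOid once the partition detaches */
} IndexPartitionEntry;

/* Using the T/T2/T22 example: T22 is invalid for g after detaching
 * T2 from T, but still valid for g2, because each global index
 * assigns its own identifier. */
static IndexPartitionEntry mappings[NMAPPINGS] = {
	{5001 /* g */, 1, 16401 /* T1 */},
	{5001 /* g */, 2, InvalidOid /* T22, invalidated */},
	{5002 /* g2 */, 1, 16422 /* T22, still valid */},
};

/* InvalidOid tells the scan to ignore the index tuple. */
static Oid
lookup_reloid(Oid indexoid, PartitionId partid)
{
	for (int i = 0; i < NMAPPINGS; i++)
		if (mappings[i].indexoid == indexoid &&
			mappings[i].partid == partid)
			return mappings[i].reloid;
	return InvalidOid;
}

int
main(void)
{
	printf("g, partid 2 -> %u (0 means skip)\n",
		   (unsigned) lookup_reloid(5001, 2));
	printf("g2, partid 1 -> %u\n", (unsigned) lookup_reloid(5002, 1));
	return 0;
}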

Creating a global Index
=================
While creating an index, if the index is marked as GLOBAL, there are
some differences compared to a partitioned index:
1) An additional internal partition identifier key column is added.
2) A partition id is allocated to each leaf partition that exists
under the top-level partition on which we are creating the global
index, and the mapping is stored in the pg_index_partitions table.
3) Unlike a partitioned index, a global index is not created on each
child; it is created only on the partitioned relation on which the
user chooses to create it.
4) The index build is modified to scan through all the leaf
partitions and create a single sort space for inserting into the
global index.

Attach Partition
=============
While attaching a partition, if this is a leaf partition, we need to
assign a new partition id with respect to each global index present on
its parent and ancestors and insert the mapping in the
pg_index_partitions table. If the partition being attached is itself
a partitioned table then this step has to be done for every leaf
partition present in the partition hierarchy under the partitioned
table being attached.
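
As a toy illustration of that bookkeeping (plain C with hypothetical
Oids and counters; the real patch allocates identifiers from
per-index counters and inserts catalog rows), attaching one leaf
under a two-level hierarchy assigns one identifier per covering
global index:

#include <stdio.h>

#define NGLOBALIDX	2

typedef unsigned int Oid;
typedef unsigned int PartitionId;

static Oid			global_indexes[NGLOBALIDX] = {5001, 5002}; /* g, g2 */
static PartitionId	counters[NGLOBALIDX];	/* kept per global index */

/* Assign a fresh partition id for each global index covering the leaf. */
static void
attach_leaf(Oid leafoid)
{
	for (int i = 0; i < NGLOBALIDX; i++)
	{
		PartitionId partid = ++counters[i];

		printf("pg_index_partitions: (index %u, partid %u) -> rel %u\n",
			   global_indexes[i], partid, leafoid);
	}
}

int
main(void)
{
	/* T22 attached under T2 is covered by g (on ancestor T) and by
	 * g2 (on its parent T2), so it gets two mapping entries. */
	attach_leaf(16422);
	return 0;
}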

Currently we reindex the global indexes present on the parent and
ancestors to which we are attaching the partition(s), but this can be
optimized: for example, we may choose to selectively insert the
tuples of the partition being attached. In some cases that could be
costly, though, if the attached partition itself has a lot of tuples
compared to the partitions that are already attached. So this could
be a mixed approach, decided based on the number of tuples already in
the index versus the number of incoming tuples; as of now I am
listing this as an item open for suggestions.

Detach Partition
=============
When detaching or dropping a partition, we just need to invalidate the
corresponding mapping in pg_index_partitions. Specifically, for each
leaf partition being detached, we'll mark its reloid as invalid within
the (indexoid, partition id) entry. This ensures that during an index
scan, when the system attempts to convert a partition ID to its
reloid, it will recognize this entry as invalid. We'll also need to
consider a mechanism to clean up these invalidated entries at some
point.

Global Index Scan
==============
In short, the global index scan will be similar to a normal index scan.
The key difference is that we'll also need to track heapOid alongside
heapTid within BTScanPosItem. Although we store the partition ID
directly in the index tuple, we can readily convert it to a heapOid by
referencing the pg_index_partitions table. For performance, we'll
maintain a cache within the global index's Relcache entry; this cache
will simply store the mapping of partitions associated with that
specific global index.
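
For illustration, here is a simplified stand-in for that extra
per-item state (plain C; the field names follow nbtree loosely, but
this is not the real declaration from src/include/access/nbtree.h):

#include <stdint.h>

typedef uint32_t Oid;

/* Simplified stand-in for ItemPointerData (block number + offset). */
typedef struct ItemPointerData
{
	uint32_t	ip_blkid;
	uint16_t	ip_posid;
} ItemPointerData;

/* Simplified BTScanPosItem: for a global index we also remember
 * which heap (partition) the TID belongs to, resolved from the index
 * tuple's partition identifier via the cached mapping. */
typedef struct BTScanPosItem
{
	ItemPointerData	heapTid;	/* TID read from the index tuple */
	Oid				heapOid;	/* new: owning partition's reloid */
	uint16_t		indexOffset;	/* index item's location in page */
} BTScanPosItem;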

Furthermore, we must account for new index access paths when global
indexes are present. Currently, set_append_rel_size() generates index
restriction information for each append relation. Now, this
restriction information must also be generated for the partitioned
table itself. Additionally, within set_append_rel_pathlist(), we'll
need to invoke create_index_paths() for the partitioned table. This
ensures that we consider index paths for the partitioned table, which,
in the case of partitioned tables, will be global index scan paths.

Locking consideration
=================
Currently, when we perform DML operations on a top-level partitioned
table, we acquire a lock on the entire partition hierarchy beneath it,
which works well. However, if we're directly operating on a leaf
relation, say, inserting a tuple, we only lock that specific relation.
This isn't sufficient for global indexes.

Since a global index can reside on a parent table, inserting a tuple
into a leaf relation also necessitates an insertion into the global
index. This means we also need to lock the parent table on which the
global index is created.

However, even that isn't enough. We actually need to lock the entire
partition hierarchy under the parent table that has the global index.
Here's why: when you insert a tuple into any leaf partition, you'll
also insert it into the global index. If it's a unique index, a
conflict might arise with a key belonging to another partition. To
validate whether the tuple is live or not, we might need to access
data in other partitions.

Based on our chosen design, we identify all these necessary locks
during the planning phase. We then acquire these locks during planning
and store them within the planned statement. This way, if the
statement is prepared and executed later, these locks can be
re-acquired during AcquireExecutorLocks.

Open Problems/Need suggestions
==========================
Here is a list of the top few problems which would be good to discuss
sooner rather than later:

Vacuum for global indexes
-------------------------
I think this has been discussed multiple times in the past whenever we
talked about the global index. The core issue is that, by default,
global indexes are vacuumed with each individual partition. This
becomes incredibly inefficient and costly when dealing with millions
of partitions, as it leads to repeated, expensive scans of a large
global index.

Ideally, global indexes should only be vacuumed once, after all
partitions have been processed. This is because a global index keeps
data from all partitions, meaning a single vacuum operation after all
partition vacuums would be sufficient and far more efficient. This
approach would significantly reduce the overhead associated with
maintaining global indexes in large, partitioned datasets.

Our testing highlights a significant performance bottleneck when
vacuuming with global indexes. In a scenario with 1,000 partitions and
a total of 50 million tuples, vacuuming without global indexes took a
mere 15 seconds. However, after introducing global indexes, the vacuum
time skyrocketed to 45 minutes, roughly a 180-fold increase! This
dramatic
slowdown is expected, as the global index, containing data from all
partitions, is effectively vacuumed with each of the 1,000 individual
partition vacuums.

We've made a quick but significant optimization: we now vacuum each
partition and its local indexes first, deferring the global index
vacuum and the second pass of heap vacuuming. Instead, we just keep
the dead-TID stores (dead tuples) for each partition. Once all
partitions are done, we vacuum the global index once and perform that
second heap pass across all partitions.
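
A toy illustration of that two-phase scheme (plain C; the names and
numbers are made up, and the real logic lives in the vacuum code
paths) looks like this:

#include <stdio.h>

#define NPARTS	3

/* Toy stand-in for a per-partition dead-TID store. */
typedef struct DeadTidStore
{
	int			ndead;
} DeadTidStore;

static void
vacuum_partition_phase1(int part, DeadTidStore *store)
{
	store->ndead = 10 * (part + 1);	/* pretend we found dead tuples */
	printf("partition %d: heap pruned, %d dead TIDs kept aside, "
		   "local indexes vacuumed\n", part, store->ndead);
}

static void
vacuum_partition_phase2(int part)
{
	printf("partition %d: deferred second heap pass\n", part);
}

int
main(void)
{
	DeadTidStore stores[NPARTS];
	int			total = 0;

	/* Phase one: per-partition work; the global index vacuum and the
	 * second heap pass are deferred. */
	for (int p = 0; p < NPARTS; p++)
		vacuum_partition_phase1(p, &stores[p]);

	/* Phase two: vacuum the global index once for all partitions,
	 * then do the deferred heap passes. */
	for (int p = 0; p < NPARTS; p++)
		total += stores[p].ndead;
	printf("global index vacuumed once for %d dead TIDs\n", total);

	for (int p = 0; p < NPARTS; p++)
		vacuum_partition_phase2(p);
	return 0;
}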

This change slashed our vacuuming time from 45 minutes down to just 40
seconds! This clearly shows we have plenty of room for optimization,
and this challenge isn't a showstopper. I'll share more details on
other solutions I'm testing in an upcoming email to keep this one
concise.

Global Index rewrite
--------------------
One particular class of things that needs careful consideration is
table-rewriting operations. We have a few of those: CLUSTER, VACUUM
FULL, ALTER TABLE, etc. When such an operation occurs, all the TIDs
potentially change, so tablecmds.c arranges to rebuild indexes. When
there's a global index involved, we also need to rebuild the global
indexes. Or, as a possible alternative strategy, we could allocate a
new partition ID and just leave the old index entries to be cleaned
out eventually, but that seems to risk a lot of bloat. A key point
here is that we don't want to be rebuilding the same global indexes
over and over: if someone does a table-rewriting operation across a
whole partitioning hierarchy, we don't want to rebuild the same
global indexes multiple times.

We also need to consider what happens when relations are truncated.
Here, the two possible strategies are: (1) assign new partition IDs
and leave the old index entries around, (2) truncate the entire index
and then reinsert entries for any untruncated partitions. As above (1)
risks bloat, but it also makes TRUNCATE fast. Of course, if the whole
partitioning hierarchy is truncated all at once, then (2) is clearly
better.

I'm aiming to submit the first WIP patch set before the July
commitfest. It won't have all the issues ironed out yet, but the main
design will be functional.

Thanks, Robert, for many of the key design ideas and regular
discussion throughout designing this. I'd also like to thank Joe,
Peter Geoghegan, Alvaro, and Masahiko Sawada for discussing the
independent issues with me offlist.

--
Regards,
Dilip Kumar
Google

#2 wenhui qiu
qiuwenhuifx@gmail.com
In reply to: Dilip Kumar (#1)
Re: Proposal: Global Index for PostgreSQL

Hi Dilip Kumar,
Thank you for working on this. I remember six years ago there was
talk about a global index; you can check whether this mailing list
thread has any references:
(/messages/by-id/CALtqXTcurqy1PKXzP9XO=ofLLA5wBSo77BnUnYVEZpmcA3V0ag@mail.gmail.com)

Thanks

#3 Dilip Kumar
dilipbalaut@gmail.com
In reply to: wenhui qiu (#2)
Re: Proposal: Global Index for PostgreSQL

On Fri, Jun 6, 2025 at 1:01 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:

Hi Dilip Kumar
Thank you for working on this. I remember six years ago there was talk about a global index; you can check whether this mailing list thread has any references: (/messages/by-id/CALtqXTcurqy1PKXzP9XO=ofLLA5wBSo77BnUnYVEZpmcA3V0ag@mail.gmail.com)

Sure, thanks.

--
Regards,
Dilip Kumar
Google

#4 Nikita Malakhov
hukutoc@gmail.com
In reply to: Dilip Kumar (#3)
Re: Proposal: Global Index for PostgreSQL

Hi Dilip!

Global indexes are a very interesting functionality with both
significant advantages and drawbacks, and the community seems not
ready to accept it without very strong motivation.
There was a more recent approach to the global index problem [1];
please check it out.

I've read your proposal and have several questions:
1) The new catalog table for global index partitions would
immediately affect interaction with user tables that have global
indexes, because of the corresponding locks that must be taken for
[in]validation and attach/detach operations; this should be
investigated.
2) Changing relation OIDs (by, say, VACUUM FULL) would immediately
result in index inconsistency; what do you propose to do about
internal processes that could change relation OIDs?
3) Would a single sort space be enough for the more typical case
where we have hundreds of partitions with hundreds of millions of
records in each? That is a normal production case for partitioned
tables.
4) Update-heavy partitioned tables need to run vacuum frequently; a
significant vacuum slowdown would push them beyond their SLAs without
correspondingly significant improvements.

[1] /messages/by-id/184879c5306.12490ea581628934.7312528450011769010@highgo.ca

Thank you!

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/

#5 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Nikita Malakhov (#4)
Re: Proposal: Global Index for PostgreSQL

On Mon, Jun 9, 2025 at 2:03 PM Nikita Malakhov <hukutoc@gmail.com> wrote:

Hi Dilip!

Thanks Nikita for your response and reading my proposal.

Global indexes are a very interesting functionality with both significant
advantages and drawbacks, and the community seems not ready to accept it
without very strong motivation.

I understand that this is a hard problem and needs changes in many
critical modules. I don't think there should be a problem with the
motivation of this work, but I believe the main issue lies in the
project's complexity.

There was a more recent approach to the global index problem [1]; please check it out.

I've reviewed the proposal, and I understand it aims to address
uniqueness on partitioned tables for non-partition key columns.
However, I'm concerned about the scalability of its basic design.
I believe it won't scale effectively beyond a relatively small number
of partitions, and this limitation will be quite surprising to users.
Specifically, checking uniqueness during inserts/updates across all
indexes on each partition (since there's no global index) will become
a significant bottleneck.

I've read your proposal and have several questions:

1) The new catalog table for global index partitions would immediately
affect interaction with user tables that have global indexes, because of
the corresponding locks that must be taken for [in]validation and
attach/detach operations; this should be investigated.

Yeah, that's right, but the attach/detach operations are DDL and not
the most frequent operations, so are you worried about performance
due to locking?

2) Changing relation OIDs (by, say, VACUUM FULL) would immediately result
in index inconsistency; what do you propose to do about internal processes
that could change relation OIDs?

I want to clarify that we don't store relation OIDs directly in the
global index. Instead, the global index holds partition IDs, and the
mapping from partition ID to relation OID is managed in a new catalog
table, pg_index_partition. It's important to note that VACUUM FULL
operations only alter the relfilenumber (the disk file OID), not the
relation OID itself, which remains constant for the lifetime of the
relation.

What does change during a VACUUM FULL are the TIDs (tuple IDs).
Because of this, all indexes, including global indexes, are reindexed.
As I mentioned in my proposal, there's a significant opportunity for
optimization here. Reindexing large global indexes is a costly
operation, and I've proposed some ideas to improve this process.

3) Would a single sort space be enough for the more typical case where we
have hundreds of partitions with hundreds of millions of records in each?
That is a normal production case for partitioned tables.

That's a good point for consideration. In general, users ideally
wouldn't use a global index everywhere; it really comes down to their
specific use case, and they should only opt for a global index when
they can't effectively partition their data without one. Global
indexes shouldn't be a default choice for every use case: they're
most beneficial when a user's data access patterns inherently prevent
effective partitioning without them. In such scenarios, the amount of
data in the sort space would essentially be the same as if the table
weren't partitioned at all.

4) Update-heavy partitioned tables need to run vacuum frequently; a
significant vacuum slowdown would push them beyond their SLAs without
correspondingly significant improvements.

You've got it. I'm on board with prioritizing a VACUUM optimization
solution for partitioned tables with global indexes. My initial
proposal touched on a proof-of-concept experiment, which indicated no
significant performance hit with global index after the optimization.
I'll share the detailed VACUUM optimization proposal in this thread
within the next couple of days.

--
Regards,
Dilip Kumar
Google

#6 Bruce Momjian
bruce@momjian.us
In reply to: Dilip Kumar (#5)
Re: Proposal: Global Index for PostgreSQL

On Mon, Jun 9, 2025 at 03:28:38PM +0530, Dilip Kumar wrote:

On Mon, Jun 9, 2025 at 2:03 PM Nikita Malakhov <hukutoc@gmail.com> wrote:

Global indexes are a very interesting functionality with both significant
advantages and drawbacks, and the community seems not ready to accept it
without very strong motivation.

I understand that this is a hard problem and needs changes in many
critical modules. I don't think there should be a problem with the
motivation of this work, but I believe the main issue lies in the
project's complexity.

...

That's a good point for consideration. In general, users ideally
wouldn't use a global index everywhere; it really comes down to their
specific use case, and they should only opt for a global index when
they can't effectively partition their data without one. Global
indexes shouldn't be a default choice for every use case: they're
most beneficial when a user's data access patterns inherently prevent
effective partitioning without them. In such scenarios, the amount of
data in the sort space would essentially be the same as if the table
weren't partitioned at all.

There are certainly use cases where this would be helpful, but I think
the big question is whether it would have so many negatives that most
people who try to use it would eventually remove it. I have heard that
happened to other relational systems who support global indexes, so I
think we have to consider that possibility. The problem is you might
need to actually write the patch to find out.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#7 Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#6)
Re: Proposal: Global Index for PostgreSQL

On Mon, Jun 9, 2025 at 05:51:25PM -0400, Bruce Momjian wrote:

On Mon, Jun 9, 2025 at 03:28:38PM +0530, Dilip Kumar wrote:
There are certainly use cases where this would be helpful, but I think
the big question is whether it would have so many negatives that most
people who try to use it would eventually remove it. I have heard that
happened to other relational systems that support global indexes, so I
think we have to consider that possibility. The problem is you might
need to actually write the patch to find out.

FYI, I wrote a blog about global indexes in 2020:

https://momjian.us/main/blogs/pgblog/2020.html#July_1_2020

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#8 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Bruce Momjian (#6)
Re: Proposal: Global Index for PostgreSQL

On Tue, Jun 10, 2025 at 3:21 AM Bruce Momjian <bruce@momjian.us> wrote:

Thanks Bruce for your thoughts on this.

On Mon, Jun 9, 2025 at 03:28:38PM +0530, Dilip Kumar wrote:

On Mon, Jun 9, 2025 at 2:03 PM Nikita Malakhov <hukutoc@gmail.com> wrote:

Global indexes are a very interesting functionality with both significant
advantages and drawbacks, and the community seems not ready to accept it
without very strong motivation.

I understand that this is a hard problem and needs changes in many
critical modules. I don't think there should be a problem with the
motivation of this work, but I believe the main issue lies in the
project's complexity.

...

That's a good point for consideration. In general, users ideally
wouldn't use a global index everywhere; it really comes down to their
specific use case, and they should only opt for a global index when
they can't effectively partition their data without one. Global
indexes shouldn't be a default choice for every use case: they're
most beneficial when a user's data access patterns inherently prevent
effective partitioning without them. In such scenarios, the amount of
data in the sort space would essentially be the same as if the table
weren't partitioned at all.

There are certainly use cases where this would be helpful, but I think
the big question is whether it would have so many negatives that most
people who try to use it would eventually remove it.

Yeah that's a very valid point.

I have heard that happened to other relational systems that support
global indexes, so I think we have to consider that possibility. The
problem is you might need to actually write the patch to find out.

I've actually drafted the patch, and while it still has open issues to
tackle, I believe it's ready for some interesting experimentation. For
instance, I've observed significant performance gains during index
scans on non-partition key columns, where scanning a single global
index outperforms appending scan results from thousands of local
indexes. However, I've also noted performance regression in
insert/update cases when a global index is present, which is expected
as we need to insert into a large index, so we need to evaluate what
is acceptable and what not.

My plan is to submit the patch in the next commitfest, including
performance data. Before submission, I'll clean up the code further
and add TODO comments wherever additional work is required.

--
Regards,
Dilip Kumar
Google

#9 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Bruce Momjian (#7)
Re: Proposal: Global Index for PostgreSQL

On Wed, Jun 11, 2025 at 1:08 AM Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Jun 9, 2025 at 05:51:25PM -0400, Bruce Momjian wrote:

On Mon, Jun 9, 2025 at 03:28:38PM +0530, Dilip Kumar wrote:
There are certainly use cases where this would be helpful, but I think
the big question is whether it would have so many negatives that most
people who try to use it would eventually remove it. I have heard that
happened to other relational systems who support global indexes, so I
think we have to consider that possibility. The problem is you might
need to actually write the patch to find out.

FYI, I wrote a blog about global indexes in 2020:

Oh interesting, I would be happy to have a look at it, Thanks.

--
Regards,
Dilip Kumar
Google

#10 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#5)
Re: Proposal: Global Index for PostgreSQL

On Mon, Jun 9, 2025 at 3:28 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Jun 9, 2025 at 2:03 PM Nikita Malakhov <hukutoc@gmail.com> wrote:

4) Update-heavy partitioned tables need to run vacuum frequently; a
significant vacuum slowdown would push them beyond their SLAs without
correspondingly significant improvements.

You've got it. I'm on board with prioritizing a VACUUM optimization
solution for partitioned tables with global indexes. My initial
proposal touched on a proof-of-concept experiment, which indicated no
significant performance hit with global index after the optimization.
I'll share the detailed VACUUM optimization proposal in this thread
within the next couple of days.

As discussed earlier, one of the main problems is that global indexes
are vacuumed along with each partition, whereas ideally a global
index should be vacuumed only once, after all the partitions covered
by it have been vacuumed.

So my proposal is that we make some infrastructure changes in the
vacuum APIs such that there is an option to tell heap_vacuum_rel() to
skip the global index vacuum and return the deadtid_store instead.
The vacuum layer will keep these deadtid_stores, hashed by reloid,
until it has vacuumed all the partitions that are covered by a global
index. Once that is done it will vacuum the global index. We also
need to modify vac_tid_reaped() so that it takes an additional reloid
input, finds the appropriate deadtid_store by looking into the hash,
and then looks up the dead TID in the respective deadtid_store.
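
A minimal sketch of the modified check (plain C; the types and the
lookup are stand-ins, and the real vac_tid_reaped() works against the
vacuum's TID store): the reloid decoded from the index tuple's
partition identifier selects the right per-partition store, which is
then probed for the TID.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NSTORES		2
#define MAXDEAD		4

typedef uint32_t Oid;
typedef uint64_t TidKey;	/* flattened TID for this sketch */

/* Per-partition dead-TID store, hashed by reloid in the proposal. */
typedef struct DeadTidStore
{
	Oid			reloid;
	TidKey		dead[MAXDEAD];
	int			ndead;
} DeadTidStore;

static DeadTidStore stores[NSTORES] = {
	{16401, {11, 42}, 2},
	{16402, {7}, 1},
};

static DeadTidStore *
lookup_store(Oid reloid)	/* stand-in for the hash lookup */
{
	for (int i = 0; i < NSTORES; i++)
		if (stores[i].reloid == reloid)
			return &stores[i];
	return NULL;
}

/* vac_tid_reaped-style check, now taking the reloid as well. */
static bool
global_tid_reaped(Oid reloid, TidKey tid)
{
	DeadTidStore *s = lookup_store(reloid);

	if (s == NULL)
		return false;
	for (int i = 0; i < s->ndead; i++)
		if (s->dead[i] == tid)
			return true;
	return false;
}

int
main(void)
{
	printf("%d\n", global_tid_reaped(16401, 42));	/* 1: reaped */
	printf("%d\n", global_tid_reaped(16402, 42));	/* 0: still live */
	return 0;
}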

We also need to do something for autovacuum, because currently
autovacuum workers scan pg_class, identify the relations that need to
be vacuumed, and vacuum them one at a time. Once we have taught the
vacuum machinery to first vacuum all the partitions and then perform
the global index vacuum, the autovacuum worker should know which
partitions are supposed to be vacuumed together, so that it can pass
the list of all those partitions to the vacuum machinery at once.

I believe this enhancement can be implemented in autovacuum without
significant difficulty. Currently, autovacuum scans pg_class to
generate a list of relations requiring vacuuming. For partitioned
tables that have a global index, we can extend this process by
additionally maintaining a list of all their leaf relations within the
parent table's entry.

An autovacuum worker would then process all the leaf relations in this
complete hierarchy. To effectively prevent other workers from
attempting to vacuum the same hierarchy concurrently, the worker would
publish the top-most partitioned relation ID (which identifies the
table with the global index) as MyWorkerInfo->wi_tableoid. This
mechanism ensures that if one worker is processing a partitioned
table, the entire set of child relations under that top-level ID is
automatically skipped by other workers.

While this approach offers benefits, it's important to acknowledge
certain potential drawbacks. One concern is a possible impact on
parallelism, as currently each partition might be vacuumed by a
separate worker, but with this change, all partitions covered by the
same global index would have to be processed by a single worker.
Another significant challenge lies in effectively choosing which
partitions to vacuum; for instance, if a table with a global index has
1000 partitions and only 10 meet the vacuum threshold in a given
cycle, this method might not be very efficient. Although we wouldn't
vacuum the global index once for every partition, we could still end
up vacuuming it numerous times by the time all partitions are
eventually processed. To mitigate this, an alternative strategy could
involve proactively vacuuming partitions that haven't fully met their
vacuum threshold but are close (e.g., reached 50% of the threshold).
This would allow us to combine more partitions for a single vacuum
operation, thereby reducing the number of times the global index needs
to be vacuumed.
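
A toy sketch of that batching heuristic (plain C; the thresholds and
counts are made up) pulls near-threshold partitions into the batch
whenever any partition is actually due:

#include <stdio.h>

#define NPARTS	6

static int	dead[NPARTS] = {900, 120, 450, 80, 700, 520};
static int	threshold[NPARTS] = {800, 800, 800, 800, 800, 800};

int
main(void)
{
	int			any_due = 0;

	for (int p = 0; p < NPARTS; p++)
		if (dead[p] >= threshold[p])
			any_due = 1;

	/* If any partition crossed its threshold, also vacuum the ones
	 * that are at least 50% of the way there, so a single global
	 * index pass covers as many partitions as possible. */
	if (any_due)
		for (int p = 0; p < NPARTS; p++)
			if (2 * dead[p] >= threshold[p])
				printf("vacuum partition %d (dead=%d)\n", p, dead[p]);
	return 0;
}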

--
Regards,
Dilip Kumar
Google

#11 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Dilip Kumar (#10)
Re: Proposal: Global Index for PostgreSQL

On Sat, Jun 14, 2025 at 2:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Jun 9, 2025 at 3:28 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Jun 9, 2025 at 2:03 PM Nikita Malakhov <hukutoc@gmail.com> wrote:

4) Update-heavy partitioned tables need to run vacuum frequently; a
significant vacuum slowdown would push them beyond their SLAs without
correspondingly significant improvements.

You've got it. I'm on board with prioritizing a VACUUM optimization
solution for partitioned tables with global indexes. My initial
proposal touched on a proof-of-concept experiment, which indicated no
significant performance hit with global index after the optimization.
I'll share the detailed VACUUM optimization proposal in this thread
within the next couple of days.

As discussed earlier the one of main problems is that the global
indexes are vacuumed along with each partition whereas logically it
should be vacuumed only once when all the partitions are vacuum in an
ideal world.

So my proposal is that we make some infrastructure change in vacuum
api's such that there is option to tell heap_vacuum_rel() to skip the
global index vacuum and also return back the deadtid_store, vacuum
layer will store these deadtid_store, hash it by the reloid until it
vacuum all the partitions which are covered by a global index. Once
that is done it will vacuum the global index, we also need to modify
the vac_tid_reaped() so that it can take additional input of reloid so
that it can find the appropriate deadtid_store by looking into the
hash and then look up the dead tid in respective deadtid_store.

Does it need to keep holding dead TIDs for each partition until it
completes vacuuming all partitions that are covered by the global
index? If so, it would end up holding a huge amount of memory in cases
where there are many partitions. How does maintenance_work_mem (or
autovacuum_work_mem) work in this context? Also, what if the
autovacuum worker that is processing the partitioned table with global
indexes gets cancelled? I guess that we would need to scan the
partitions again in order to collect dead TIDs to vacuum the global
index but it would be very expensive.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#12 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Masahiko Sawada (#11)
Re: Proposal: Global Index for PostgreSQL

On Wed, Jun 18, 2025 at 4:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Jun 14, 2025 at 2:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Thanks for your opinion, Sawada-San.

Does it need to keep holding dead TIDs for each partition until it
completes vacuuming all partitions that are covered by the global
index? If so, it would end up holding a huge amount of memory in cases
where there are many partitions. How does maintenance_work_mem (or
autovacuum_work_mem) work in this context?

So it will keep holding the dead TIDs until we have vacuumed all the
partitions or we run out of maintenance_work_mem. Consider this
similar to the case where there is no concept of a global index: to
support a unique constraint on a non-partition-key column, the user
would have to keep one giant table without partitioning it, and every
time we ran out of maintenance_work_mem we would need to vacuum its
indexes. The same theory applies to the global index.

Also, what if the autovacuum worker that is processing the
partitioned table with global indexes gets cancelled? I guess that we
would need to scan the partitions again in order to collect dead TIDs
to vacuum the global index, but it would be very expensive.

You're right, but that's comparable to the cost of managing a single,
unpartitioned giant table. I believe the primary value of global
indexes lies in enabling users to partition tables that otherwise
couldn't be. So, while certain maintenance tasks might incur similar
costs to a single large table, you'll gain significant advantages
during many other DML operations due to partitioning that wouldn't be
possible without global indexes.

--
Regards,
Dilip Kumar
Google

#13 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#12)
4 attachment(s)
Re: Proposal: Global Index for PostgreSQL

On Wed, Jun 18, 2025 at 4:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Jun 18, 2025 at 4:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Here is the first WIP version of the patch set. The commit message
in each patch explains in detail what exactly it does, so I am not
adding the descriptions in the email. The patches apply on the
latest head (732061150b004385810e522f8629f5bf91d977b7).

Open issues yet to be handled:
=========================
1) The vacuum optimization I described above is still a highly
unstable patch, so it is not attached here. With this patch set, if
you create a global index on a large table with a lot of partitions,
there will be a huge regression in vacuum performance, which should
be resolved by the vacuum optimization patch I am planning to post
this month.
2) Attaching a partition which has a different column order than the
parent and creating a global index on that does not work yet; still
working on it.
3) For unique checking in btree, a global index may find a
conflicting tuple that belongs to a different partition, so the
relation open is done on the fly. Ideally that should be done in the
executor and passed down to btree, but I am still identifying how to
do that with minimal changes to the AM interfaces, or whether it can
be done without AM changes.
4) Need to write more test cases for REINDEX TABLE and TRUNCATE
TABLE, and need to tighten up the global index rebuild so that it
doesn't get rebuilt multiple times or skip rebuilding when needed.
5) ALTER COLUMN SET TYPE, which triggers a global index rewrite, is
not handled properly. This needs some more work in the alter-table
machinery where we identify the list of indexes to rebuild.
6) Need to perform performance tests for the SELECT/UPDATE/INSERT
cases; we already know the VACUUM performance.
7) global_index.sql takes a long time to execute as I have kept a
high data load, so that needs to be simplified.

Note: the patches are still WIP and might have many loose ends; for
now, make check-world is passing, including the global index test cases.

Credit: I already mentioned this while sending the first design email
but missed some people, so adding it again here.

1. Robert: for many of the key design ideas and regular discussion
throughout designing this.
2. Joe: also had regular discussions on this and made many
suggestions, especially related to vacuum.
3. Peter Geoghegan, Alvaro, and Masahiko Sawada for discussing the
independent issues with me offlist.
4. I got the idea for the syntax for creating global indexes, and for
keeping a cache of the mapping from reloid to relation descriptor
during the global index scan, from some very old threads related to
global indexes (not able to find the thread now).

--
Regards,
Dilip Kumar
Google

Attachments:

v1-0001-Catalog-for-globalindexid-reloid-to-PartitionId-t.patch
From 0c48cf2daad6fba92623c4b9b7be60a60e1e3fab Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Tue, 23 Jul 2024 15:46:38 +0530
Subject: [PATCH v1 1/4] Catalog for (globalindexid, reloid) to PartitionId to
 mapping

This is a base patch required for global indexes.  Basically, global
indexes store tuples from multiple partitions of a partitioned relation,
so TIDs alone are insufficient for uniquely identifying tuples. We might
consider storing the relation OID along with TID, as the combination
of relation OID and TID is the most straightforward way to uniquely
identify a heap tuple.

However, this approach has its drawbacks. For instance, if a partition
is detached, the global index would not know how to ignore the tuples
from that detached partition unless it cleans out all tuples from that
partition in the global index.

Therefore, we need an identifier for each partition that is only valid
while the partition is attached and becomes invalid once the partition
is detached. We call this identifier a partition ID. However, when
accessing tuples from the index, we still need to convert the partition
ID to the heap OID at some point.

One might assume a 1-to-1 mapping between partition IDs and relation
OIDs, but this isn't feasible. In a multi-level partition hierarchy,
detaching a partitioned table from a higher-level partitioned table
invalidates the underlying leaf relation's partition ID for the global
indexes of the parent from which it was detached. However, that partition
ID should remain valid for the global indexes present at the lower level
partitioned tables where the leaf partition is still attached. Therefore,
for simplicity, we maintain a partition ID for each global index and leaf
relation OID pair. This way, detaching a partition only requires invalidating
the specific global index and partition ID combinations from which the leaf
partition is being detached.

This patch provides a catalog for storing these mappings and a cache for
faster access. It also includes mechanisms for allocating partition IDs
and managing insertions and deletions in the catalog.
---
 src/backend/catalog/Makefile              |   4 +-
 src/backend/catalog/pg_index_partitions.c | 342 ++++++++++++++++++++++
 src/include/c.h                           |  16 +
 src/include/catalog/Makefile              |   3 +-
 src/include/catalog/pg_index_partitions.h |  84 ++++++
 src/include/postgres.h                    |  19 ++
 src/include/utils/rel.h                   |   7 +
 src/test/regress/expected/oidjoins.out    |   2 +
 8 files changed, 474 insertions(+), 3 deletions(-)
 create mode 100644 src/backend/catalog/pg_index_partitions.c
 create mode 100644 src/include/catalog/pg_index_partitions.h

diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index c090094ed0..275910eda6 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -46,8 +46,8 @@ OBJS = \
 	pg_subscription.o \
 	pg_type.o \
 	storage.o \
-	toasting.o
-
+	toasting.o \
+	pg_index_partitions.o
 include $(top_srcdir)/src/backend/common.mk
 
 .PHONY: install-data
diff --git a/src/backend/catalog/pg_index_partitions.c b/src/backend/catalog/pg_index_partitions.c
new file mode 100644
index 0000000000..e637feb453
--- /dev/null
+++ b/src/backend/catalog/pg_index_partitions.c
@@ -0,0 +1,342 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_index_partitions.c
+ *
+ *	  routines to support manipulation of the pg_index_partitions relation and
+ *	  also provide cache over this relation for faster mapping from partition
+ *	  id to reloid for a global index.
+ *
+ * Notes(why we need pg_index_partitions relation):
+ *
+ * This mapping is required for global indexes.  Basically, global indexes
+ * store tuples from multiple partitions of a partitioned relation so TIDs
+ * alone are insufficient for uniquely identifying tuples. We might consider
+ * storing the relation OID along with TID, as the combination of relation OID
+ * and TID is the most straightforward way to uniquely identify a heap tuple.
+ *
+ * However, this approach has its drawbacks. For instance, if a partition is
+ * detached, the global index would not know how to ignore the tuples from that
+ * detached partition unless it cleans out all tuples from that partition in
+ * the global index.
+ *
+ * Therefore, we need an identifier for each partition that is only valid while
+ * the partition is attached and becomes invalid once the partition is
+ * detached. We call this identifier a partition ID. However, when accessing
+ * tuples from the index, we still need to convert the partition ID to the heap
+ * OID at some point.  One might assume a 1-to-1 mapping between partition IDs
+ * and relation OIDs, but this isn't feasible. In a multi-level partition
+ * hierarchy, detaching a partitioned table from a higher-level partitioned
+ * table invalidates the underlying leaf relation's partition ID for the global
+ * indexes of the parent from which it was detached. However, that partition
+ * ID should remain valid for the global indexes present at the lower level
+ * partitioned tables where the leaf partition is still attached. Therefore,
+ * for simplicity, we maintain a partition ID for each global index and leaf
+ * relation OID pair. This way, detaching a partition only requires
+ * invalidating the specific global index and partition ID combinations from
+ * which the leaf partition is being detached.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/catalog/pg_index_partitions.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/htup_details.h"
+#include "access/stratnum.h"
+#include "access/table.h"
+#include "catalog/indexing.h"
+#include "catalog/catalog.h"
+#include "catalog/pg_index_partitions.h"
+#include "partitioning/partdesc.h"
+#include "utils/fmgroids.h"
+#include "utils/inval.h"
+#include "utils/rel.h"
+
+/*
+ * InsertIndexPartitionEntry - Insert partition id to reloid mapping
+ *
+ * Insert (indexoid, partid) to reloid mapping into pg_index_partitions table.
+ */
+void
+InsertIndexPartitionEntry(Relation irel, Oid reloid, PartitionId partid)
+{
+	Datum		values[Natts_pg_index_partitions];
+	bool		nulls[Natts_pg_index_partitions];
+	HeapTuple	tuple;
+	Relation	rel;
+	Oid			indexoid = RelationGetRelid(irel);
+
+	rel = table_open(IndexPartitionsRelationId, RowExclusiveLock);
+
+	/* Make the pg_index_partitions entry. */
+	values[Anum_pg_index_partitions_indexoid - 1] = ObjectIdGetDatum(indexoid);
+	values[Anum_pg_index_partitions_reloid - 1] = ObjectIdGetDatum(reloid);
+	values[Anum_pg_index_partitions_partid - 1] = PartitionIdGetDatum(partid);
+
+	memset(nulls, 0, sizeof(nulls));
+
+	tuple = heap_form_tuple(RelationGetDescr(rel), values, nulls);
+
+	CatalogTupleInsert(rel, tuple);
+
+	heap_freetuple(tuple);
+
+	table_close(rel, RowExclusiveLock);
+}
+
+/*
+ * DeleteIndexPartitionEntries - Delete all index partition entries.
+ *
+ * This deletes all the entries for the given global index OID from the
+ * pg_index_partitions table.  It should only be called when the global index
+ * is being dropped.
+ */
+void
+DeleteIndexPartitionEntries(Oid indrelid)
+{
+	Relation	catalogRelation;
+	ScanKeyData key;
+	SysScanDesc scan;
+	HeapTuple	tuple;
+
+	/* Find pg_index_partitions entries by indrelid. */
+	catalogRelation = table_open(IndexPartitionsRelationId, RowExclusiveLock);
+	ScanKeyInit(&key,
+				Anum_pg_index_partitions_indexoid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(indrelid));
+
+	scan = systable_beginscan(catalogRelation, IndexPartitionsIndexId, true,
+							  NULL, 1, &key);
+	while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+		CatalogTupleDelete(catalogRelation, &tuple->t_self);
+
+	/* Done */
+	systable_endscan(scan);
+	table_close(catalogRelation, RowExclusiveLock);
+}
+
+/*
+ * BuildIndexPartitionInfo - Cache for partition id to reloid mapping
+ *
+ * Build a cache for faster access to the mapping from partition id to the
+ * relation oid.  For more detail on this mapping refer to the comments in
+ * pg_index_partitions.h and also atop the PartitionId declaration in c.h.
+ */
+void
+BuildIndexPartitionInfo(Relation relation, MemoryContext context)
+{
+	SysScanDesc scan;
+	ScanKeyData key;
+	HeapTuple	tuple;
+	Relation	rel;
+	PartitionId	maxpartid = InvalidPartitionId;
+	IndexPartitionInfo	map;
+	MemoryContext oldcontext;
+	HASHCTL		ctl;
+
+	/*
+	 * Open the pg_index_partitions table to get the partition id to reloid
+	 * mapping for the input index relation.
+	 */
+	rel = table_open(IndexPartitionsRelationId, AccessShareLock);
+
+	oldcontext = MemoryContextSwitchTo(context);
+	map = (IndexPartitionInfoData *) palloc0(sizeof(IndexPartitionInfoData));
+	map->context = context;
+
+	/* Make a new hash table for the cache */
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(IndexPartitionInfoEntry);
+	ctl.hcxt = context;
+
+	map->pdir_hash = hash_create("index partition directory", 256, &ctl,
+								  HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+	MemoryContextSwitchTo(oldcontext);
+
+	ScanKeyInit(&key,
+				Anum_pg_index_partitions_indexoid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(RelationGetRelid(relation)));
+
+	scan = systable_beginscan(rel, IndexPartitionsIndexId, true,
+							  NULL, 1, &key);
+
+	while ((tuple = systable_getnext(scan)) != NULL)
+	{
+		Form_pg_index_partitions form = (Form_pg_index_partitions) GETSTRUCT(tuple);
+		IndexPartitionInfoEntry *entry;
+		bool		found;
+
+		/*
+		 * We need to consider the partition ids of detached partitions as
+		 * well while computing maxpartid, so that we do not reuse their
+		 * values.
+		 */
+		if (form->partid > maxpartid)
+			maxpartid = form->partid;
+
+		if (!OidIsValid(form->reloid))
+			continue;
+
+		entry = hash_search(map->pdir_hash, &form->partid, HASH_ENTER, &found);
+		Assert(!found);
+		entry->reloid = form->reloid;
+	}
+
+	map->max_partid = maxpartid;
+	relation->rd_indexpartinfo = map;
+	systable_endscan(scan);
+
+	table_close(rel, AccessShareLock);
+}
+
+/*
+ * IndexGetRelationPartitionId - Get partition id for the reloid
+ *
+ * Get the partition ID for the given partition relation OID for the specified
+ * global index relation.
+ */
+PartitionId
+IndexGetRelationPartitionId(Relation irel, Oid reloid)
+{
+	IndexPartitionInfo	map;
+	HASH_SEQ_STATUS		hash_seq;
+	PartitionId			partid = InvalidPartitionId;
+	IndexPartitionInfoEntry *entry;
+
+	if (irel->rd_indexpartinfo == NULL)
+		BuildIndexPartitionInfo(irel, CurrentMemoryContext);
+
+	map = irel->rd_indexpartinfo;
+
+	hash_seq_init(&hash_seq, map->pdir_hash);
+
+	while ((entry = hash_seq_search(&hash_seq)) != NULL)
+	{
+		if (entry->reloid == reloid)
+		{
+			partid = entry->partid;
+			hash_seq_term(&hash_seq);
+			break;
+		}
+	}
+
+	return partid;
+}
+
+/*
+ * IndexGetPartitionReloid - Get relation oid for the paritionid
+ *
+ * Get the relation OID for the given partition ID for the specified global
+ * index relation.
+ */
+Oid
+IndexGetPartitionReloid(Relation irel, PartitionId partid)
+{
+	IndexPartitionInfo	map;
+	IndexPartitionInfoEntry *entry;
+	bool		found;
+
+	/* Build the partition id to reloid cache if it is not built yet. */
+	if (irel->rd_indexpartinfo == NULL)
+		BuildIndexPartitionInfo(irel, CurrentMemoryContext);
+
+	map = irel->rd_indexpartinfo;
+
+	entry = hash_search(map->pdir_hash, &partid, HASH_FIND, &found);
+	if (!found)
+		return InvalidOid;
+
+	return entry->reloid;
+}
+
+/*
+ * InvalidateIndexPartitionEntries - Invalidate pg_index_partitions entries
+ *
+ * Set reloid to InvalidOid in the pg_index_partitions entries of the given
+ * global index whose reloid matches one of the relation OIDs in the given
+ * list.
+ */
+void
+InvalidateIndexPartitionEntries(List *reloids, Oid indexoid)
+{
+	Relation	catalogRelation;
+	SysScanDesc scan;
+	ScanKeyData key;
+	HeapTuple	tuple;
+
+	/*
+	 * Find the pg_index_partitions entries for the given global index so
+	 * that we can invalidate the ones that map to the given relations.
+	 */
+	catalogRelation = table_open(IndexPartitionsRelationId, RowExclusiveLock);
+
+	ScanKeyInit(&key,
+				Anum_pg_index_partitions_indexoid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(indexoid));
+
+	scan = systable_beginscan(catalogRelation, IndexPartitionsIndexId, true,
+							  NULL, 1, &key);
+
+	while ((tuple = systable_getnext(scan)) != NULL)
+	{
+		Form_pg_index_partitions form = (Form_pg_index_partitions) GETSTRUCT(tuple);
+		HeapTuple	newtup;
+
+		if (!list_member_oid(reloids, form->reloid))
+			continue;
+
+		newtup = heap_copytuple(tuple);
+		((Form_pg_index_partitions) GETSTRUCT(newtup))->reloid = InvalidOid;
+
+		CatalogTupleUpdate(catalogRelation,
+						   &tuple->t_self,
+						   newtup);
+		heap_freetuple(newtup);
+	}
+
+	/* Done */
+	systable_endscan(scan);
+	table_close(catalogRelation, RowExclusiveLock);
+}
+
+/*
+ * IndexGetNextPartitionID - Get the next partition ID of the global index
+ *
+ * Obtain the next partition ID to be allocated for the specified global index
+ * relation. Also update this value in the cache for the next allocation.
+ */
+PartitionId
+IndexGetNextPartitionID(Relation irel)
+{
+	PartitionId partid;
+
+	/*
+	 * If the cache is not already built, build it first so that we know the
+	 * maximum partition ID value used so far and can generate the next
+	 * value.
+	 */
+	if (irel->rd_indexpartinfo == NULL)
+		BuildIndexPartitionInfo(irel, CurrentMemoryContext);
+
+	/* Use the max_partid + 1 value as the next partition id. */
+	partid = irel->rd_indexpartinfo->max_partid + 1;
+
+	/*
+	 * If the partition ID has wrapped around, raise an error.
+	 * XXX here we might consider reusing the unused partition IDs.
+	 */
+	if (!PartIdIsValid(partid))
+		elog(ERROR, "could not allocate new PartitionID because limit is exhausted");
+
+	/*
+	 * Store the new value in the cache.  If the cache is invalidated, we
+	 * will recompute the max value from the system catalog, so there is no
+	 * correctness issue.
+	 */
+	irel->rd_indexpartinfo->max_partid = partid;
+
+	return partid;
+}
diff --git a/src/include/c.h b/src/include/c.h
index 8cdc16a0f4..797891abb6 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -636,6 +636,22 @@ typedef uint32 MultiXactOffset;
 
 typedef uint32 CommandId;
 
+/*
+ * This is a new type used for global indexes.  Global indexes store data
+ * from multiple partitions, so along with the heap TID we also need a new
+ * identifier to identify the heap.  We could store the reloid as well, but
+ * if the partition is detached it becomes very hard to know that we no
+ * longer need to access data from that particular relation.  It becomes
+ * even more problematic if the partition is dropped and its reloid is
+ * later reused by another relation, making it difficult to distinguish
+ * whether a particular index tuple belongs to the old relation or the new
+ * one.  To handle this, we use a new identifier called the partition id.
+ * Whenever a relation is detached from the global index, its partition id
+ * is invalidated, allowing us to easily identify that we don't need to
+ * access the tuples of that partition id.
+ */
+typedef uint32 PartitionId;
+
 #define FirstCommandId	((CommandId) 0)
 #define InvalidCommandId	(~(CommandId)0)
 
diff --git a/src/include/catalog/Makefile b/src/include/catalog/Makefile
index 2bbc7805fe..2b76dcdc16 100644
--- a/src/include/catalog/Makefile
+++ b/src/include/catalog/Makefile
@@ -81,7 +81,8 @@ CATALOG_HEADERS := \
 	pg_publication_namespace.h \
 	pg_publication_rel.h \
 	pg_subscription.h \
-	pg_subscription_rel.h
+	pg_subscription_rel.h \
+	pg_index_partitions.h
 
 GENERATED_HEADERS := $(CATALOG_HEADERS:%.h=%_d.h)
 
diff --git a/src/include/catalog/pg_index_partitions.h b/src/include/catalog/pg_index_partitions.h
new file mode 100644
index 0000000000..2dcc8ca3fc
--- /dev/null
+++ b/src/include/catalog/pg_index_partitions.h
@@ -0,0 +1,84 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_index_partitions.h
+ *	  definition of the system catalog (pg_index_partitions)
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_index_partitions.h
+ *
+ * NOTES
+ *	  The Catalog.pm module reads this file and derives schema
+ *	  information.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_INDEX_PARTITIONS_H
+#define PG_INDEX_PARTITIONS_H
+
+#include "catalog/genbki.h"
+#include "catalog/pg_attribute.h"
+#include "catalog/pg_index_partitions_d.h"
+#include "catalog/pg_type_d.h"
+#include "utils/hsearch.h"
+#include "utils/relcache.h"
+
+/* ----------------
+ *		pg_index_partitions definition.  cpp turns this into
+ *		typedef struct FormData_pg_index_partitions
+ * ----------------
+ */
+CATALOG(pg_index_partitions,6015,IndexPartitionsRelationId)
+{
+	Oid			indexoid BKI_LOOKUP(pg_class);
+	Oid			reloid BKI_LOOKUP(pg_class);
+	int32		partid;
+} FormData_pg_index_partitions;
+
+/* ----------------
+ *		Form_pg_index_partitions corresponds to a pointer to a tuple with
+ *		the format of pg_index_partitions relation.
+ * ----------------
+ */
+typedef FormData_pg_index_partitions *Form_pg_index_partitions;
+
+DECLARE_UNIQUE_INDEX_PKEY(pg_index_partitions_indexoid_partid_index, 6018, IndexPartitionsIndexId, pg_index_partitions, btree(indexoid oid_ops, partid int4_ops));
+
+/*
+ * Map over the pg_index_partitions table for a particular global index.  This
+ * will be used for faster lookup of the next partid to be used for this global
+ * index and also for finding out the partition relation for a given partid of
+ * a global index.
+ */
+typedef struct IndexPartitionInfoData
+{
+	MemoryContext	context;	/* memory context for storing the cache data */
+	PartitionId		max_partid;	/* max value of used partid */
+	HTAB		   *pdir_hash;	/* partid to reloid lookup hash */
+} IndexPartitionInfoData;
+
+typedef IndexPartitionInfoData *IndexPartitionInfo;
+
+/*
+ * TODO: we might consider storing the RelationDesc along with the reloid.
+ */
+typedef struct IndexPartitionInfoEntry
+{
+	PartitionId	partid;		/* key */
+	Oid			reloid;		/* payload */
+} IndexPartitionInfoEntry;
+
+#define		InvalidPartitionId			0
+#define		FirstValidPartitionId		1
+#define		PartIdIsValid(partid)	((bool) ((partid) != InvalidPartitionId))
+
+extern void BuildIndexPartitionInfo(Relation relation, MemoryContext context);
+extern PartitionId IndexGetRelationPartitionId(Relation irel, Oid reloid);
+extern Oid IndexGetPartitionReloid(Relation irel, PartitionId partid);
+extern PartitionId IndexGetNextPartitionID(Relation irel);
+extern void DeleteIndexPartitionEntries(Oid indrelid);
+extern void InsertIndexPartitionEntry(Relation irel, Oid reloid, PartitionId partid);
+extern void InvalidateIndexPartitionEntries(List *reloids, Oid indexoid);
+#endif							/* PG_INDEX_PARTITIONS_H */
diff --git a/src/include/postgres.h b/src/include/postgres.h
index 8a41a66868..a0a24e2bca 100644
--- a/src/include/postgres.h
+++ b/src/include/postgres.h
@@ -536,6 +536,25 @@ Float8GetDatum(float8 X)
 extern Datum Float8GetDatum(float8 X);
 #endif
 
+/*
+ * DatumGetPartitionId
+ *		Returns partition identifier value of a datum.
+ */
+static inline PartitionId
+DatumGetPartitionId(Datum X)
+{
+	return (PartitionId) X;
+}
+
+/*
+ * PartitionIdGetDatum
+ *		Returns datum representation for a partition identifier.
+ */
+static inline Datum
+PartitionIdGetDatum(PartitionId X)
+{
+	return (Datum) X;
+}
 
 /*
  * Int64GetDatumFast
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index b552359915..35270fdc05 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -19,6 +19,7 @@
 #include "catalog/catalog.h"
 #include "catalog/pg_class.h"
 #include "catalog/pg_index.h"
+#include "catalog/pg_index_partitions.h"
 #include "catalog/pg_publication.h"
 #include "nodes/bitmapset.h"
 #include "partitioning/partdefs.h"
@@ -217,6 +218,12 @@ typedef struct RelationData
 	Oid		   *rd_indcollation;	/* OIDs of index collations */
 	bytea	  **rd_opcoptions;	/* parsed opclass-specific options */
 
+	/*
+	 * Cache over pg_index_partitions mapping partition ids to reloids for a
+	 * global index.  For details see the comments in pg_index_partitions.h.
+	 */
+	IndexPartitionInfo	rd_indexpartinfo;
+
 	/*
 	 * rd_amcache is available for index and table AMs to cache private data
 	 * about the relation.  This must be just a cache since it may get reset
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 215eb899be..7588e23d6a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -266,3 +266,5 @@ NOTICE:  checking pg_subscription {subdbid} => pg_database {oid}
 NOTICE:  checking pg_subscription {subowner} => pg_authid {oid}
 NOTICE:  checking pg_subscription_rel {srsubid} => pg_subscription {oid}
 NOTICE:  checking pg_subscription_rel {srrelid} => pg_class {oid}
+NOTICE:  checking pg_index_partitions {indexoid} => pg_class {oid}
+NOTICE:  checking pg_index_partitions {reloid} => pg_class {oid}
-- 
2.49.0

Attachment: v1-0004-Test-cases-for-global-index.patch (application/octet-stream)
From 6b22d9d5fe875119fbd6199eb45993473c1ec496 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumarb@google.com>
Date: Sun, 22 Jun 2025 04:17:36 +0000
Subject: [PATCH v1 4/4] Test cases for global index

---
 src/test/regress/expected/global_index.out | 349 +++++++++++++++++++++
 src/test/regress/parallel_schedule         |   3 +
 src/test/regress/sql/global_index.sql      | 158 ++++++++++
 3 files changed, 510 insertions(+)
 create mode 100644 src/test/regress/expected/global_index.out
 create mode 100644 src/test/regress/sql/global_index.sql

diff --git a/src/test/regress/expected/global_index.out b/src/test/regress/expected/global_index.out
new file mode 100644
index 0000000000..f044ea9483
--- /dev/null
+++ b/src/test/regress/expected/global_index.out
@@ -0,0 +1,349 @@
+--
+-- GLOBAL index tests
+--
+CREATE TABLE range_parted (
+	a int,
+	b int
+) PARTITION BY RANGE (a);
+--create some partitions and insert data
+CREATE TABLE range_parted_1 PARTITION OF range_parted FOR VALUES FROM (1) TO (100000);
+CREATE TABLE range_parted_2 PARTITION OF range_parted FOR VALUES FROM (100000) TO (200000);
+CREATE TABLE range_parted_3 PARTITION OF range_parted FOR VALUES FROM (200000) TO (300000);
+INSERT INTO range_parted SELECT i,i%100 FROM generate_series(1,299999) AS i;
+--Create global index
+CREATE INDEX global_idx ON range_parted(b) global;
+INSERT INTO range_parted SELECT i,i%200 FROM generate_series(1,299999) AS i;
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 7;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Global Index Scan using global_idx on range_parted
+   Index Cond: (b = 7)
+(2 rows)
+
+SELECT * FROM range_parted WHERE b = 7 LIMIT 10;
+  a  | b 
+-----+---
+   7 | 7
+ 107 | 7
+ 207 | 7
+ 307 | 7
+ 407 | 7
+ 507 | 7
+ 607 | 7
+ 707 | 7
+ 807 | 7
+ 907 | 7
+(10 rows)
+
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 110;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Global Index Scan using global_idx on range_parted
+   Index Cond: (b = 110)
+(2 rows)
+
+SELECT * FROM range_parted WHERE b = 110 LIMIT 10;
+  a   |  b  
+------+-----
+  110 | 110
+  310 | 110
+  510 | 110
+  710 | 110
+  910 | 110
+ 1110 | 110
+ 1310 | 110
+ 1510 | 110
+ 1710 | 110
+ 1910 | 110
+(10 rows)
+
+SELECT * FROM range_parted WHERE b = 250 LIMIT 10;
+ a | b 
+---+---
+(0 rows)
+
+UPDATE range_parted SET b=b+100;
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 250;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Global Index Scan using global_idx on range_parted
+   Index Cond: (b = 250)
+(2 rows)
+
+SELECT * FROM range_parted WHERE b = 250 LIMIT 10;
+  a   |  b  
+------+-----
+  150 | 250
+  350 | 250
+  550 | 250
+  750 | 250
+  950 | 250
+ 1150 | 250
+ 1350 | 250
+ 1550 | 250
+ 1750 | 250
+ 1950 | 250
+(10 rows)
+
+--attach partition
+CREATE TABLE range_parted_4(a int, b int);
+INSERT INTO range_parted_4 SELECT i,i%300 + 100 FROM generate_series(300000,300300) as i;
+ALTER TABLE range_parted ATTACH PARTITION range_parted_4 FOR VALUES FROM (300000) to (400000);
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 350;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Global Index Scan using global_idx on range_parted
+   Index Cond: (b = 350)
+(2 rows)
+
+SELECT * FROM range_parted WHERE b = 350 LIMIT 10;
+   a    |  b  
+--------+-----
+ 300250 | 350
+(1 row)
+
+CREATE TABLE range_parted_5(a int, b int) PARTITION BY RANGE (a);
+CREATE TABLE range_parted_5_1 PARTITION OF range_parted_5 FOR VALUES FROM (400000) TO (450000);
+CREATE TABLE range_parted_5_2 PARTITION OF range_parted_5 FOR VALUES FROM (450000) TO (460000);
+INSERT INTO range_parted_5 SELECT i,i%100 + 500 FROM generate_series(400000,459999) AS i;
+CREATE INDEX global_idx_1 ON range_parted_5(b) global;
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted_5 WHERE b = 550;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Global Index Scan using global_idx_1 on range_parted_5
+   Index Cond: (b = 550)
+(2 rows)
+
+SELECT * FROM range_parted_5 WHERE b = 550 LIMIT 5;
+   a    |  b  
+--------+-----
+ 400050 | 550
+ 400150 | 550
+ 400250 | 550
+ 400350 | 550
+ 400450 | 550
+(5 rows)
+
+--attach to the top partition
+ALTER TABLE range_parted ATTACH PARTITION range_parted_5 FOR VALUES FROM (400000) to (500000);
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 550;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Global Index Scan using global_idx on range_parted
+   Index Cond: (b = 550)
+(2 rows)
+
+SELECT * FROM range_parted WHERE b = 550 LIMIT 5;
+   a    |  b  
+--------+-----
+ 400050 | 550
+ 400150 | 550
+ 400250 | 550
+ 400350 | 550
+ 400450 | 550
+(5 rows)
+
+-- Check index only scan
+EXPLAIN (COSTS OFF) SELECT b FROM range_parted WHERE b = 550 LIMIT 5;
+                          QUERY PLAN                           
+---------------------------------------------------------------
+ Limit
+   ->  Global Index Only Scan using global_idx on range_parted
+         Index Cond: (b = 550)
+(3 rows)
+
+SELECT b FROM range_parted WHERE b = 550 LIMIT 5;
+  b  
+-----
+ 550
+ 550
+ 550
+ 550
+ 550
+(5 rows)
+
+-- Attach to level-1 partition (test with multi level global index)
+CREATE TABLE range_parted_6(a int, b int);
+INSERT INTO range_parted_6 SELECT i,i%100 + 600 FROM generate_series(460000,490000) AS i;
+ALTER TABLE range_parted_5 ATTACH PARTITION range_parted_6 FOR VALUES FROM (460000) to (500000);
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 650;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Global Index Scan using global_idx on range_parted
+   Index Cond: (b = 650)
+(2 rows)
+
+SELECT * FROM range_parted WHERE b = 650 LIMIT 5;
+   a    |  b  
+--------+-----
+ 460050 | 650
+ 460150 | 650
+ 460250 | 650
+ 460350 | 650
+ 460450 | 650
+(5 rows)
+
+-- Update the leaf and check that the updates are reflected in the multi-level global indexes
+UPDATE range_parted SET b=b+1000;
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 1650;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Global Index Scan using global_idx on range_parted
+   Index Cond: (b = 1650)
+(2 rows)
+
+--SELECT * FROM range_parted WHERE b = 1650 LIMIT 5;
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted_5 WHERE b = 1650;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Global Index Scan using global_idx_1 on range_parted_5
+   Index Cond: (b = 1650)
+(2 rows)
+
+--SELECT * FROM range_parted_5 WHERE b = 1650 LIMIT 5;
+-- Conditional update using global index
+EXPLAIN (COSTS OFF) UPDATE range_parted SET b=b+1000 where b = 1650;
+                        QUERY PLAN                        
+----------------------------------------------------------
+ Update on range_parted
+   Update on range_parted_1 range_parted
+   Update on range_parted_2 range_parted
+   Update on range_parted_3 range_parted
+   Update on range_parted_4 range_parted
+   Update on range_parted_5_1 range_parted
+   Update on range_parted_5_2 range_parted
+   Update on range_parted_6 range_parted
+   ->  Global Index Scan using global_idx on range_parted
+         Index Cond: (b = 1650)
+(10 rows)
+
+UPDATE range_parted SET b=b+1000 where b = 1650;
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted_5 WHERE b = 2650;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Global Index Scan using global_idx_1 on range_parted_5
+   Index Cond: (b = 2650)
+(2 rows)
+
+--SELECT * FROM range_parted_5 WHERE b = 2650 LIMIT 5;
+--Detach partition
+ALTER TABLE range_parted DETACH PARTITION range_parted_5;
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 550;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Global Index Scan using global_idx on range_parted
+   Index Cond: (b = 550)
+(2 rows)
+
+SELECT * FROM range_parted WHERE b = 550 LIMIT 5;
+ a | b 
+---+---
+(0 rows)
+
+--Reattach the partition
+ALTER TABLE range_parted ATTACH PARTITION range_parted_5 FOR VALUES FROM (400000) to (500000);
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 550;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Global Index Scan using global_idx on range_parted
+   Index Cond: (b = 550)
+(2 rows)
+
+SELECT * FROM range_parted WHERE b = 550 LIMIT 5;
+ a | b 
+---+---
+(0 rows)
+
+--Drop the partitioned table
+DROP TABLE range_parted_5;
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 550;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Global Index Scan using global_idx on range_parted
+   Index Cond: (b = 550)
+(2 rows)
+
+SELECT * FROM range_parted WHERE b = 550 LIMIT 5;
+ a | b 
+---+---
+(0 rows)
+
+-- Test unique global index
+TRUNCATE TABLE range_parted;
+INSERT INTO range_parted VALUES(1,2);
+INSERT INTO range_parted VALUES(2,2);
+CREATE UNIQUE INDEX global_idx_unique ON range_parted(b) global; -- Fail with duplicate
+ERROR:  could not create unique index "global_idx_unique"
+DETAIL:  Key (b)=(2) is duplicated.
+TRUNCATE TABLE range_parted;
+INSERT INTO range_parted VALUES(1,2);
+CREATE UNIQUE INDEX global_idx_unique ON range_parted(b) global;
+INSERT INTO range_parted VALUES(1,2); -- Fail with duplicate
+ERROR:  duplicate key value violates unique constraint "global_idx_unique"
+DETAIL:  Key (b)=(2) already exists.
+DROP INDEX global_idx_unique;
+INSERT INTO range_parted VALUES(1,2); -- Now this should pass
+-- multiple level multiple type partitions
+CREATE TABLE parent_table (
+    id INT,
+    category TEXT,
+    sub_category TEXT,
+    value INT
+) PARTITION BY RANGE (id);
+DO $$
+DECLARE
+    range_start INT;
+    range_end INT;
+    range_partition_name TEXT;
+    list_partition_name TEXT;
+    hash_partition_name TEXT;
+    i INT;
+    j INT;
+    k INT;
+BEGIN
+    -- Create range partitions
+    FOR i IN 0..10 LOOP
+        range_start := i * 1000;
+        range_end := (i + 1) * 1000;
+        range_partition_name := format('parent_table_%s', i);
+        EXECUTE format('CREATE TABLE %I PARTITION OF parent_table FOR VALUES FROM (%s) TO (%s) PARTITION BY LIST(category)', range_partition_name, range_start, range_end);
+
+        -- Create list partitions within each range partition
+        FOR j IN 1..10 LOOP
+            list_partition_name := format('%s_list_%s', range_partition_name, j);
+            EXECUTE format('CREATE TABLE %I PARTITION OF %I FOR VALUES IN (''%s'') PARTITION BY HASH (id)', list_partition_name, range_partition_name, j);
+
+            -- Create hash partitions within each list partition
+            FOR k IN 0..4 LOOP
+                hash_partition_name := format('%s_hash_%s', list_partition_name, k);
+                EXECUTE format('CREATE TABLE %I PARTITION OF %I FOR VALUES WITH (MODULUS 5, REMAINDER %s)', hash_partition_name, list_partition_name, k);
+            END LOOP;
+        END LOOP;
+    END LOOP;
+END $$;
+DO $$
+DECLARE
+    i INT := 1;
+BEGIN
+    WHILE i <= 10000 LOOP
+        INSERT INTO parent_table (id, category, sub_category, value)
+        VALUES (i, '' || (i % 10 + 1), '' || (i % 10 + 1), i);
+        i := i + 1;
+    END LOOP;
+END $$;
+CREATE INDEX global_index_v ON parent_table(value) global;
+EXPLAIN (COSTS OFF) SELECT * FROM parent_table WHERE value = 9000;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Global Index Scan using global_index_v on parent_table
+   Index Cond: (value = 9000)
+(2 rows)
+
+SELECT * FROM parent_table WHERE value = 9000;
+  id  | category | sub_category | value 
+------+----------+--------------+-------
+ 9000 | 1        | 1            |  9000
+(1 row)
+
+-- Cleanup
+DROP TABLE range_parted;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index a424be2a6b..712a19cd4e 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -137,6 +137,9 @@ test: event_trigger_login
 # this test also uses event triggers, so likewise run it by itself
 test: fast_default
 
+# test global index
+test: global_index
+
 # run tablespace test at the end because it drops the tablespace created during
 # setup that other tests may use.
 test: tablespace
diff --git a/src/test/regress/sql/global_index.sql b/src/test/regress/sql/global_index.sql
new file mode 100644
index 0000000000..4a2b91ff3a
--- /dev/null
+++ b/src/test/regress/sql/global_index.sql
@@ -0,0 +1,158 @@
+--
+-- GLOBAL index tests
+--
+CREATE TABLE range_parted (
+	a int,
+	b int
+) PARTITION BY RANGE (a);
+
+--create some partitions and insert data
+CREATE TABLE range_parted_1 PARTITION OF range_parted FOR VALUES FROM (1) TO (100000);
+CREATE TABLE range_parted_2 PARTITION OF range_parted FOR VALUES FROM (100000) TO (200000);
+CREATE TABLE range_parted_3 PARTITION OF range_parted FOR VALUES FROM (200000) TO (300000);
+INSERT INTO range_parted SELECT i,i%100 FROM generate_series(1,299999) AS i;
+
+--Create global index
+CREATE INDEX global_idx ON range_parted(b) global;
+INSERT INTO range_parted SELECT i,i%200 FROM generate_series(1,299999) AS i;
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 7;
+SELECT * FROM range_parted WHERE b = 7 LIMIT 10;
+
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 110;
+SELECT * FROM range_parted WHERE b = 110 LIMIT 10;
+SELECT * FROM range_parted WHERE b = 250 LIMIT 10;
+
+UPDATE range_parted SET b=b+100;
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 250;
+SELECT * FROM range_parted WHERE b = 250 LIMIT 10;
+
+--attach partition
+CREATE TABLE range_parted_4(a int, b int);
+INSERT INTO range_parted_4 SELECT i,i%300 + 100 FROM generate_series(300000,300300) as i;
+ALTER TABLE range_parted ATTACH PARTITION range_parted_4 FOR VALUES FROM (300000) to (400000);
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 350;
+SELECT * FROM range_parted WHERE b = 350 LIMIT 10;
+
+CREATE TABLE range_parted_5(a int, b int) PARTITION BY RANGE (a);
+CREATE TABLE range_parted_5_1 PARTITION OF range_parted_5 FOR VALUES FROM (400000) TO (450000);
+CREATE TABLE range_parted_5_2 PARTITION OF range_parted_5 FOR VALUES FROM (450000) TO (460000);
+INSERT INTO range_parted_5 SELECT i,i%100 + 500 FROM generate_series(400000,459999) AS i;
+CREATE INDEX global_idx_1 ON range_parted_5(b) global;
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted_5 WHERE b = 550;
+SELECT * FROM range_parted_5 WHERE b = 550 LIMIT 5;
+
+--attach to the top partition
+ALTER TABLE range_parted ATTACH PARTITION range_parted_5 FOR VALUES FROM (400000) to (500000);
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 550;
+SELECT * FROM range_parted WHERE b = 550 LIMIT 5;
+
+-- Check index only scan
+EXPLAIN (COSTS OFF) SELECT b FROM range_parted WHERE b = 550 LIMIT 5;
+SELECT b FROM range_parted WHERE b = 550 LIMIT 5;
+
+-- Attach to level-1 partition (test with multi level global index)
+CREATE TABLE range_parted_6(a int, b int);
+INSERT INTO range_parted_6 SELECT i,i%100 + 600 FROM generate_series(460000,490000) AS i;
+ALTER TABLE range_parted_5 ATTACH PARTITION range_parted_6 FOR VALUES FROM (460000) to (500000);
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 650;
+SELECT * FROM range_parted WHERE b = 650 LIMIT 5;
+
+-- Update the leaf and check that the updates are reflected in the multi-level global indexes
+UPDATE range_parted SET b=b+1000;
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 1650;
+--SELECT * FROM range_parted WHERE b = 1650 LIMIT 5;
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted_5 WHERE b = 1650;
+--SELECT * FROM range_parted_5 WHERE b = 1650 LIMIT 5;
+
+-- Conditional update using global index
+EXPLAIN (COSTS OFF) UPDATE range_parted SET b=b+1000 where b = 1650;
+UPDATE range_parted SET b=b+1000 where b = 1650;
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted_5 WHERE b = 2650;
+--SELECT * FROM range_parted_5 WHERE b = 2650 LIMIT 5;
+
+--Detach partition
+ALTER TABLE range_parted DETACH PARTITION range_parted_5;
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 550;
+SELECT * FROM range_parted WHERE b = 550 LIMIT 5;
+
+--Reattach the partition
+ALTER TABLE range_parted ATTACH PARTITION range_parted_5 FOR VALUES FROM (400000) to (500000);
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 550;
+SELECT * FROM range_parted WHERE b = 550 LIMIT 5;
+
+--Drop the partitioned table
+DROP TABLE range_parted_5;
+EXPLAIN (COSTS OFF) SELECT * FROM range_parted WHERE b = 550;
+SELECT * FROM range_parted WHERE b = 550 LIMIT 5;
+
+-- Test unique global index
+TRUNCATE TABLE range_parted;
+INSERT INTO range_parted VALUES(1,2);
+INSERT INTO range_parted VALUES(2,2);
+CREATE UNIQUE INDEX global_idx_unique ON range_parted(b) global; -- Fail with duplicate
+TRUNCATE TABLE range_parted;
+INSERT INTO range_parted VALUES(1,2);
+CREATE UNIQUE INDEX global_idx_unique ON range_parted(b) global;
+INSERT INTO range_parted VALUES(1,2); -- Fail with duplicate
+DROP INDEX global_idx_unique;
+INSERT INTO range_parted VALUES(1,2); -- Now this should pass
+
+-- multiple level multiple type partitions
+CREATE TABLE parent_table (
+    id INT,
+    category TEXT,
+    sub_category TEXT,
+    value INT
+) PARTITION BY RANGE (id);
+
+
+DO $$
+DECLARE
+    range_start INT;
+    range_end INT;
+    range_partition_name TEXT;
+    list_partition_name TEXT;
+    hash_partition_name TEXT;
+    i INT;
+    j INT;
+    k INT;
+BEGIN
+    -- Create range partitions
+    FOR i IN 0..10 LOOP
+        range_start := i * 1000;
+        range_end := (i + 1) * 1000;
+        range_partition_name := format('parent_table_%s', i);
+        EXECUTE format('CREATE TABLE %I PARTITION OF parent_table FOR VALUES FROM (%s) TO (%s) PARTITION BY LIST(category)', range_partition_name, range_start, range_end);
+
+        -- Create list partitions within each range partition
+        FOR j IN 1..10 LOOP
+            list_partition_name := format('%s_list_%s', range_partition_name, j);
+            EXECUTE format('CREATE TABLE %I PARTITION OF %I FOR VALUES IN (''%s'') PARTITION BY HASH (id)', list_partition_name, range_partition_name, j);
+
+            -- Create hash partitions within each list partition
+            FOR k IN 0..4 LOOP
+                hash_partition_name := format('%s_hash_%s', list_partition_name, k);
+                EXECUTE format('CREATE TABLE %I PARTITION OF %I FOR VALUES WITH (MODULUS 5, REMAINDER %s)', hash_partition_name, list_partition_name, k);
+            END LOOP;
+        END LOOP;
+    END LOOP;
+END $$;
+
+
+DO $$
+DECLARE
+    i INT := 1;
+BEGIN
+    WHILE i <= 10000 LOOP
+        INSERT INTO parent_table (id, category, sub_category, value)
+        VALUES (i, '' || (i % 10 + 1), '' || (i % 10 + 1), i);
+        i := i + 1;
+    END LOOP;
+END $$;
+
+CREATE INDEX global_index_v ON parent_table(value) global;
+EXPLAIN (COSTS OFF) SELECT * FROM parent_table WHERE value = 9000;
+SELECT * FROM parent_table WHERE value = 9000;
+
+-- Cleanup
+DROP TABLE range_parted;
-- 
2.49.0

Attachment: v1-0002-Provide-Support-for-creating-global-indexes-and-o.patch (application/octet-stream)
From 58e7cba46540340bb598661fb023e9ad8192518a Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@Dilip.local>
Date: Thu, 15 May 2025 16:59:31 +0530
Subject: [PATCH v1 2/4] Provide Support for creating global indexes and other
 DDL on global indexes

Syntax: CREATE INDEX name ON table (column_list) GLOBAL;

As described in the commit message of the previous patch, TIDs alone are
insufficient for uniquely identifying tuples in global indexes because
they include tuples from multiple partitions. To uniquely identify heap
tuples, we append a partitionID. For detailed information, refer to
the comments in the patch 0001 commit message.

In this design, the partitionID is included as the last key column of the
index. The rationale behind storing it as the last key column is that in
various scenarios, this column is treated as an extended index key column.
For example, in B-tree indexes, each index tuple must be uniquely identified.
If we encounter duplicate keys, we use heap TID as a tiebreaker. However,
for global indexes, relying solely on heap TID isn't adequate; we also
require the partition identifier. Including this as an additional key
column simplifies the process, as index tuples are arranged in key order,
and this column will automatically be part of that order.
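
For example (an illustration, not actual patch output), duplicate
values of an indexed column b would be laid out in (key, partitionID,
TID) order:

    (b=42, partid=1, tid=(10,3))
    (b=42, partid=1, tid=(10,7))
    (b=42, partid=2, tid=(4,1))

so all entries for a given key that belong to one partition form a
contiguous run, and the tiebreaker rules fall out of ordinary key
comparison.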

Whenever a global index is created, this patch utilizes the interfaces from
the previous patch to assign a partitionID to each leaf relation of the
partitioned table on which the global index is being created. It then inserts
the (indexid, partitionID) -> reloid mapping into the pg_index_partitions table
for each leaf relation.

Additionally, we need to create this mapping whenever a partition is attached.
Specifically, we must create mappings for all leaf partitions of the table being
attached, corresponding to the global indexes on the parent and all its ancestors
under which the new partition is being attached.

Similarly, when a partition is detached, we must invalidate the mappings for all
leaf partitions under the detached partition related to the global indexes on all
ancestors from which the partition is being detached.
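
As an illustration (again a sketch, not patch code; the helper name is
hypothetical and the walk over the ancestor hierarchy is omitted),
invalidating the mappings for one ancestor's global index could look
like:

    /* Invalidate the mappings of the detached leaves for one global index. */
    static void
    detach_invalidate_mappings(Relation globalindex, List *leaf_reloids)
    {
        InvalidateIndexPartitionEntries(leaf_reloids,
                                        RelationGetRelid(globalindex));
    }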

Open Items:
- Rebuilding the global indexes when truncating or reindexing the relation:
  currently global indexes are reindexed once per partition, whereas it should
  be done only once.

- Currently the partition id column is treated as a special column; Robert
  suggested implementing it as an expression instead.

- Vacuum is vacuuming global indexes multiple times, once per leaf partition;
  this needs to be optimized.

- RelationGetIndexList is not following the locking order, i.e. parent to
  child, as the child might already be locked; this is causing a deadlock and
  needs to be fixed.
---
 contrib/pg_overexplain/pg_overexplain.c   |   3 +
 src/backend/access/common/reloptions.c    |   1 +
 src/backend/access/heap/heapam.c          |   6 +-
 src/backend/access/index/genam.c          |   9 +-
 src/backend/access/index/indexam.c        |   3 +-
 src/backend/access/nbtree/nbtdedup.c      |  45 ++-
 src/backend/access/nbtree/nbtinsert.c     | 190 ++++++++--
 src/backend/access/nbtree/nbtpage.c       |  84 ++++-
 src/backend/access/nbtree/nbtree.c        |  18 +
 src/backend/access/nbtree/nbtsort.c       |  84 ++++-
 src/backend/access/table/table.c          |   1 +
 src/backend/bootstrap/bootparse.y         |   2 +
 src/backend/catalog/aclchk.c              |   5 +-
 src/backend/catalog/dependency.c          |   3 +-
 src/backend/catalog/heap.c                |  25 +-
 src/backend/catalog/index.c               | 219 ++++++++---
 src/backend/catalog/namespace.c           |   1 +
 src/backend/catalog/objectaddress.c       |  11 +-
 src/backend/catalog/partition.c           |   6 +-
 src/backend/catalog/pg_class.c            |   2 +
 src/backend/catalog/pg_index_partitions.c |  48 ++-
 src/backend/catalog/toasting.c            |   2 +-
 src/backend/commands/analyze.c            |  55 ++-
 src/backend/commands/cluster.c            |   5 +
 src/backend/commands/indexcmds.c          | 197 ++++++++--
 src/backend/commands/tablecmds.c          | 430 ++++++++++++++++++++--
 src/backend/commands/vacuum.c             |   9 +-
 src/backend/executor/execIndexing.c       |  15 +-
 src/backend/optimizer/util/plancat.c      |   9 +
 src/backend/parser/gram.y                 |  21 +-
 src/backend/parser/parse_utilcmd.c        |  10 +
 src/backend/statistics/stat_utils.c       |   5 +
 src/backend/tcop/utility.c                |  12 +-
 src/backend/utils/adt/amutils.c           |   3 +-
 src/backend/utils/adt/ruleutils.c         |  11 +
 src/backend/utils/cache/lsyscache.c       |  21 ++
 src/backend/utils/cache/relcache.c        |  37 +-
 src/bin/pg_dump/pg_dump.c                 |   1 +
 src/bin/psql/describe.c                   |  15 +-
 src/include/access/nbtree.h               |  64 +++-
 src/include/access/tableam.h              |  20 +
 src/include/catalog/index.h               |  13 +-
 src/include/catalog/pg_class.h            |  10 +-
 src/include/catalog/pg_index_partitions.h |  24 ++
 src/include/commands/defrem.h             |   1 +
 src/include/commands/tablecmds.h          |   3 +-
 src/include/nodes/execnodes.h             |  10 +
 src/include/nodes/parsenodes.h            |   1 +
 src/include/utils/lsyscache.h             |   1 +
 src/include/utils/rel.h                   |  17 +-
 50 files changed, 1575 insertions(+), 213 deletions(-)

diff --git a/contrib/pg_overexplain/pg_overexplain.c b/contrib/pg_overexplain/pg_overexplain.c
index de824566f8..39c502d721 100644
--- a/contrib/pg_overexplain/pg_overexplain.c
+++ b/contrib/pg_overexplain/pg_overexplain.c
@@ -522,6 +522,9 @@ overexplain_range_table(PlannedStmt *plannedstmt, ExplainState *es)
 			case RELKIND_PARTITIONED_INDEX:
 				relkind = "partitioned_index";
 				break;
+			case RELKIND_GLOBAL_INDEX:
+				relkind = "global_index";
+				break;
 			case '\0':
 				relkind = NULL;
 				break;
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 50747c1639..fb4ace1bb6 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -1429,6 +1429,7 @@ extractRelOptions(HeapTuple tuple, TupleDesc tupdesc,
 			break;
 		case RELKIND_INDEX:
 		case RELKIND_PARTITIONED_INDEX:
+		case RELKIND_GLOBAL_INDEX:
 			options = index_reloptions(amoptions, datum, false);
 			break;
 		case RELKIND_FOREIGN_TABLE:
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817..e8cabea93a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4259,7 +4259,8 @@ check_lock_if_inplace_updateable_rel(Relation relation,
 				else
 					dbid = MyDatabaseId;
 
-				if (classForm->relkind == RELKIND_INDEX)
+				if (classForm->relkind == RELKIND_INDEX ||
+					classForm->relkind == RELKIND_GLOBAL_INDEX)
 				{
 					Relation	irel = index_open(relid, AccessShareLock);
 
@@ -4313,7 +4314,8 @@ check_inplace_rel_lock(HeapTuple oldtup)
 	else
 		dbid = MyDatabaseId;
 
-	if (classForm->relkind == RELKIND_INDEX)
+	if (classForm->relkind == RELKIND_INDEX ||
+		classForm->relkind == RELKIND_GLOBAL_INDEX)
 	{
 		Relation	irel = index_open(relid, AccessShareLock);
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 0cb27af131..c2b80669aa 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -187,7 +187,14 @@ BuildIndexValueDescription(Relation indexRelation,
 	Oid			indrelid;
 	AclResult	aclresult;
 
-	indnkeyatts = IndexRelationGetNumberOfKeyAttributes(indexRelation);
+	/*
+	 * For a global index, skip the partitionID attribute while describing
+	 * the index values.
+	 */
+	if (RelationIsGlobalIndex(indexRelation))
+		indnkeyatts = IndexRelationGetNumberOfKeyAttributes(indexRelation) - 1;
+	else
+		indnkeyatts = IndexRelationGetNumberOfKeyAttributes(indexRelation);
 
 	/*
 	 * Check permissions- if the user does not have access to view all of the
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 219df1971d..3aa1fc92df 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -197,7 +197,8 @@ static inline void
 validate_relation_kind(Relation r)
 {
 	if (r->rd_rel->relkind != RELKIND_INDEX &&
-		r->rd_rel->relkind != RELKIND_PARTITIONED_INDEX)
+		r->rd_rel->relkind != RELKIND_PARTITIONED_INDEX &&
+		r->rd_rel->relkind != RELKIND_GLOBAL_INDEX)
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 				 errmsg("\"%s\" is not an index",
diff --git a/src/backend/access/nbtree/nbtdedup.c b/src/backend/access/nbtree/nbtdedup.c
index 08884116ae..bdd085c8cc 100644
--- a/src/backend/access/nbtree/nbtdedup.c
+++ b/src/backend/access/nbtree/nbtdedup.c
@@ -20,8 +20,10 @@
 #include "miscadmin.h"
 #include "utils/rel.h"
 
-static void _bt_bottomupdel_finish_pending(Page page, BTDedupState state,
-										   TM_IndexDeleteOp *delstate);
+static void _bt_bottomupdel_finish_pending(Relation rel, Page page,
+										   BTDedupState state,
+										   TM_IndexDeleteOp *delstate,
+										   PartidDeltidMapping *mapping);
 static bool _bt_do_singleval(Relation rel, Page page, BTDedupState state,
 							 OffsetNumber minoff, IndexTuple newitem);
 static void _bt_singleval_fillfactor(Page page, BTDedupState state,
@@ -315,6 +317,7 @@ _bt_bottomupdel_pass(Relation rel, Buffer buf, Relation heapRel,
 	BTDedupState state;
 	TM_IndexDeleteOp delstate;
 	bool		neverdedup;
+	PartidDeltidMapping *mapping;
 	int			nkeyatts = IndexRelationGetNumberOfKeyAttributes(rel);
 
 	/* Passed-in newitemsz is MAXALIGNED but does not include line pointer */
@@ -334,6 +337,9 @@ _bt_bottomupdel_pass(Relation rel, Buffer buf, Relation heapRel,
 	state->phystupsize = 0;
 	state->nintervals = 0;
 
+	/* Allocate memory for the partition id to deleted TID array mapping. */
+	mapping = palloc(MaxTIDsPerBTreePage * sizeof(PartidDeltidMapping));
+
 	/*
 	 * Initialize tableam state that describes bottom-up index deletion
 	 * operation.
@@ -382,14 +388,15 @@ _bt_bottomupdel_pass(Relation rel, Buffer buf, Relation heapRel,
 		else
 		{
 			/* Finalize interval -- move its TIDs to delete state */
-			_bt_bottomupdel_finish_pending(page, state, &delstate);
+			_bt_bottomupdel_finish_pending(rel, page, state, &delstate,
+										   mapping);
 
 			/* itup starts new pending interval */
 			_bt_dedup_start_pending(state, itup, offnum);
 		}
 	}
 	/* Finalize final interval -- move its TIDs to delete state */
-	_bt_bottomupdel_finish_pending(page, state, &delstate);
+	_bt_bottomupdel_finish_pending(rel, page, state, &delstate, mapping);
 
 	/*
 	 * We don't give up now in the event of having few (or even zero)
@@ -407,7 +414,7 @@ _bt_bottomupdel_pass(Relation rel, Buffer buf, Relation heapRel,
 	pfree(state);
 
 	/* Ask tableam which TIDs are deletable, then physically delete them */
-	_bt_delitems_delete_check(rel, buf, heapRel, &delstate);
+	_bt_delitems_delete_check(rel, buf, heapRel, &delstate, mapping);
 
 	pfree(delstate.deltids);
 	pfree(delstate.status);
@@ -645,10 +652,12 @@ _bt_dedup_finish_pending(Page newpage, BTDedupState state)
  * deletion operations.
  */
 static void
-_bt_bottomupdel_finish_pending(Page page, BTDedupState state,
-							   TM_IndexDeleteOp *delstate)
+_bt_bottomupdel_finish_pending(Relation rel, Page page, BTDedupState state,
+							   TM_IndexDeleteOp *delstate,
+							   PartidDeltidMapping *mapping)
 {
 	bool		dupinterval = (state->nitems > 1);
+	PartitionId	partid = InvalidPartitionId;
 
 	Assert(state->nitems > 0);
 	Assert(state->nitems <= state->nhtids);
@@ -662,6 +671,20 @@ _bt_bottomupdel_finish_pending(Page page, BTDedupState state,
 		TM_IndexDelete *ideltid = &delstate->deltids[delstate->ndeltids];
 		TM_IndexStatus *istatus = &delstate->status[delstate->ndeltids];
 
+		/*
+		 * A global index stores TIDs from multiple partitions, so we also
+		 * need the reloid along with the TID to uniquely identify the tuple.
+		 * We don't need to convert the partitionID to a reloid for every
+		 * item because we do not deduplicate across partitionIDs, i.e. all
+		 * items in BTDedupState must belong to the same partitionID.
+		 */
+		if (!PartIdIsValid(partid) && RelationIsGlobalIndex(rel))
+			partid = BTreeTupleGetPartitionId(rel, itup);
+
+		/* All IndexTuples in the state must have the same partitionID */
+		Assert(!RelationIsGlobalIndex(rel) ||
+			   partid == BTreeTupleGetPartitionId(rel, itup));
+
 		if (!BTreeTupleIsPosting(itup))
 		{
 			/* Simple case: A plain non-pivot tuple */
@@ -672,6 +695,9 @@ _bt_bottomupdel_finish_pending(Page page, BTDedupState state,
 			istatus->promising = dupinterval;	/* simple rule */
 			istatus->freespace = ItemIdGetLength(itemid) + sizeof(ItemIdData);
 
+			/* Create mapping entry. */
+			mapping[delstate->ndeltids].partid = partid;
+			mapping[delstate->ndeltids].idx = delstate->ndeltids;
 			delstate->ndeltids++;
 		}
 		else
@@ -735,6 +761,11 @@ _bt_bottomupdel_finish_pending(Page page, BTDedupState state,
 
 				ideltid++;
 				istatus++;
+
+				/* Create mapping entry. */
+				mapping[delstate->ndeltids].partid = partid;
+				mapping[delstate->ndeltids].idx = delstate->ndeltids;
+
 				delstate->ndeltids++;
 			}
 		}
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index aa82cede30..94baad3eee 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -17,6 +17,8 @@
 
 #include "access/nbtree.h"
 #include "access/nbtxlog.h"
+#include "access/relation.h"
+#include "access/table.h"
 #include "access/transam.h"
 #include "access/xloginsert.h"
 #include "common/int.h"
@@ -29,6 +31,17 @@
 /* Minimum tree height for application of fastpath optimization */
 #define BTREE_FASTPATH_MIN_LEVEL	2
 
+/*
+ * Table block information pointed to by LP_DEAD-set tuples in the index.
+ * For a global index, we also need the PartitionId along with the BlockNumber
+ * to determine which partition the block belongs to. This information is used
+ * during the simple deletion pass.
+ */
+typedef struct BTHeapBlockInfo
+{
+	PartitionId	partid;
+	BlockNumber	blockno;
+} BTHeapBlockInfo;
 
 static BTStack _bt_search_insert(Relation rel, Relation heaprel,
 								 BTInsertState insertstate);
@@ -70,10 +83,12 @@ static void _bt_simpledel_pass(Relation rel, Buffer buffer, Relation heapRel,
 							   OffsetNumber *deletable, int ndeletable,
 							   IndexTuple newitem, OffsetNumber minoff,
 							   OffsetNumber maxoff);
-static BlockNumber *_bt_deadblocks(Page page, OffsetNumber *deletable,
-								   int ndeletable, IndexTuple newitem,
-								   int *nblocks);
+static BTHeapBlockInfo *_bt_deadblocks(Relation rel, Page page,
+									   OffsetNumber *deletable,
+									   int ndeletable, IndexTuple newitem,
+									   int *nblocks);
 static inline int _bt_blk_cmp(const void *arg1, const void *arg2);
+static inline int _bt_indexdel_cmp(const void *arg1, const void *arg2);
 
 /*
  *	_bt_doinsert() -- Handle insertion of a single index tuple in the tree.
@@ -135,6 +150,13 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 			Assert(checkUnique != UNIQUE_CHECK_EXISTING);
 			is_unique = true;
 		}
+
+		/*
+		 * For global indexes, ignore the PartitionId attribute until
+		 * uniqueness is established.
+		 */
+		if (RelationIsGlobalIndex(rel))
+			itup_key->keysz--;
 	}
 
 	/*
@@ -235,6 +257,13 @@ search:
 		/* Uniqueness is established -- restore heap tid as scantid */
 		if (itup_key->heapkeyspace)
 			itup_key->scantid = &itup->t_tid;
+
+		/*
+		 * Uniqueness is established -- restore the PartitionId as part of
+		 * the key, as we do for the heap TID above.
+		 */
+		if (RelationIsGlobalIndex(rel))
+			itup_key->keysz++;
 	}
 
 	if (checkUnique != UNIQUE_CHECK_EXISTING)
@@ -418,11 +447,13 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
 	OffsetNumber maxoff;
 	Page		page;
 	BTPageOpaque opaque;
+	Relation	partrel = heapRel;
 	Buffer		nbuf = InvalidBuffer;
 	bool		found = false;
 	bool		inposting = false;
 	bool		prevalldead = true;
 	int			curposti = 0;
+	Oid			heapoid = RelationGetRelid(heapRel);
 
 	/* Assume unique until we find a duplicate */
 	*is_unique = true;
@@ -540,6 +571,28 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
 					htid = *BTreeTupleGetPostingN(curitup, curposti);
 				}
 
+				/*
+				 * For a global index, we need to obtain the exact partition
+				 * heap relation corresponding to the partition ID stored
+				 * inside the index tuple.
+				 */
+				if (RelationIsGlobalIndex(rel))
+				{
+					Oid	curheapoid = BTreeTupleGetPartitionRelid(rel, curitup);
+
+					if (heapoid != curheapoid)
+					{
+						if (heapoid != RelationGetRelid(heapRel))
+						{
+							Assert(partrel != NULL);
+							relation_close(partrel, NoLock);
+						}
+
+						partrel = relation_open(curheapoid, NoLock);
+						heapoid = curheapoid;
+					}
+				}
+
 				/*
 				 * If we are doing a recheck, we expect to find the tuple we
 				 * are rechecking.  It's not a duplicate, but we have to keep
@@ -557,7 +610,7 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
 				 * with optimizations like heap's HOT, we have just a single
 				 * index entry for the entire chain.
 				 */
-				else if (table_index_fetch_tuple_check(heapRel, &htid,
+				else if (table_index_fetch_tuple_check(partrel, &htid,
 													   &SnapshotDirty,
 													   &all_dead))
 				{
@@ -576,6 +629,15 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
 						if (nbuf != InvalidBuffer)
 							_bt_relbuf(rel, nbuf);
 						*is_unique = false;
+
+						/*
+					 * Close the partrel if it is not the same as the heapRel
+					 * passed by the caller.  The caller is responsible for
+						 * closing the input heapRel.
+						 */
+						if (partrel && partrel != heapRel)
+							table_close(partrel, NoLock);
+
 						return InvalidTransactionId;
 					}
 
@@ -594,6 +656,15 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
 						*speculativeToken = SnapshotDirty.speculativeToken;
 						/* Caller releases lock on buf immediately */
 						insertstate->bounds_valid = false;
+
+						/*
+						 * Close the partrel if it is not the same as the heapRel
+						 * passed by the caller.  The caller is responsible for
+						 * closing the input heapRel.
+						 */
+						if (partrel && partrel != heapRel)
+							table_close(partrel, NoLock);
+
 						return xwait;
 					}
 
@@ -669,7 +740,7 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
 										RelationGetRelationName(rel)),
 								 key_desc ? errdetail("Key %s already exists.",
 													  key_desc) : 0,
-								 errtableconstraint(heapRel,
+								 errtableconstraint(partrel,
 													RelationGetRelationName(rel))));
 					}
 				}
@@ -751,6 +822,13 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
 		}
 	}
 
+	/*
+	 * Close the partrel if it is not the same as the heapRel passed by the
+	 * caller.  The caller is responsible for closing the input heapRel.
+	 */
+	if (partrel && partrel != heapRel)
+		table_close(partrel, NoLock);
+
 	/*
 	 * If we are doing a recheck then we should have found the tuple we are
 	 * checking.  Otherwise there's something very wrong --- probably, the
@@ -762,7 +840,7 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
 				 errmsg("failed to re-find tuple within index \"%s\"",
 						RelationGetRelationName(rel)),
 				 errhint("This may be because of a non-immutable index expression."),
-				 errtableconstraint(heapRel,
+				 errtableconstraint(partrel,
 									RelationGetRelationName(rel))));
 
 	if (nbuf != InvalidBuffer)
@@ -2814,13 +2892,14 @@ _bt_simpledel_pass(Relation rel, Buffer buffer, Relation heapRel,
 				   OffsetNumber minoff, OffsetNumber maxoff)
 {
 	Page		page = BufferGetPage(buffer);
-	BlockNumber *deadblocks;
+	BTHeapBlockInfo *deadblocks;
 	int			ndeadblocks;
 	TM_IndexDeleteOp delstate;
 	OffsetNumber offnum;
+	PartidDeltidMapping *mapping;
 
 	/* Get array of table blocks pointed to by LP_DEAD-set tuples */
-	deadblocks = _bt_deadblocks(page, deletable, ndeletable, newitem,
+	deadblocks = _bt_deadblocks(rel, page, deletable, ndeletable, newitem,
 								&ndeadblocks);
 
 	/* Initialize tableam state that describes index deletion operation */
@@ -2832,6 +2911,9 @@ _bt_simpledel_pass(Relation rel, Buffer buffer, Relation heapRel,
 	delstate.deltids = palloc(MaxTIDsPerBTreePage * sizeof(TM_IndexDelete));
 	delstate.status = palloc(MaxTIDsPerBTreePage * sizeof(TM_IndexStatus));
 
+	/* Allocate memory for the partition id to deleted TID array mapping. */
+	mapping = palloc(MaxTIDsPerBTreePage * sizeof(PartidDeltidMapping));
+
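+	/*
+	 * NB: one mapping entry is filled for each deltid below, in the same
+	 * order, so the partition id of every deltid can be recovered after the
+	 * mapping array is sorted.
+	 */
+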
 	for (offnum = minoff;
 		 offnum <= maxoff;
 		 offnum = OffsetNumberNext(offnum))
@@ -2840,14 +2922,16 @@ _bt_simpledel_pass(Relation rel, Buffer buffer, Relation heapRel,
 		IndexTuple	itup = (IndexTuple) PageGetItem(page, itemid);
 		TM_IndexDelete *odeltid = &delstate.deltids[delstate.ndeltids];
 		TM_IndexStatus *ostatus = &delstate.status[delstate.ndeltids];
-		BlockNumber tidblock;
+		BTHeapBlockInfo tidblock;
 		void	   *match;
 
 		if (!BTreeTupleIsPosting(itup))
 		{
-			tidblock = ItemPointerGetBlockNumber(&itup->t_tid);
+			tidblock.blockno = ItemPointerGetBlockNumber(&itup->t_tid);
+			tidblock.partid = (RelationIsGlobalIndex(rel)) ?
+						BTreeTupleGetPartitionId(rel, itup) : InvalidPartitionId;
 			match = bsearch(&tidblock, deadblocks, ndeadblocks,
-							sizeof(BlockNumber), _bt_blk_cmp);
+							sizeof(BTHeapBlockInfo), _bt_blk_cmp);
 
 			if (!match)
 			{
@@ -2866,19 +2950,26 @@ _bt_simpledel_pass(Relation rel, Buffer buffer, Relation heapRel,
 			ostatus->promising = false; /* unused */
 			ostatus->freespace = 0; /* unused */
 
+			/* Create mapping entry. */
+			mapping[delstate.ndeltids].partid = tidblock.partid;
+			mapping[delstate.ndeltids].idx = delstate.ndeltids;
 			delstate.ndeltids++;
 		}
 		else
 		{
 			int			nitem = BTreeTupleGetNPosting(itup);
+			PartitionId	partid = (RelationIsGlobalIndex(rel)) ?
+						BTreeTupleGetPartitionId(rel, itup) : InvalidPartitionId;
 
 			for (int p = 0; p < nitem; p++)
 			{
 				ItemPointer tid = BTreeTupleGetPostingN(itup, p);
 
-				tidblock = ItemPointerGetBlockNumber(tid);
+				tidblock.blockno = ItemPointerGetBlockNumber(tid);
+				tidblock.partid = partid;
+
 				match = bsearch(&tidblock, deadblocks, ndeadblocks,
-								sizeof(BlockNumber), _bt_blk_cmp);
+								sizeof(BTHeapBlockInfo), _bt_blk_cmp);
 
 				if (!match)
 				{
@@ -2899,6 +2990,10 @@ _bt_simpledel_pass(Relation rel, Buffer buffer, Relation heapRel,
 
 				odeltid++;
 				ostatus++;
+
+				/* Create mapping entry. */
+				mapping[delstate.ndeltids].partid = tidblock.partid;
+				mapping[delstate.ndeltids].idx = delstate.ndeltids;
 				delstate.ndeltids++;
 			}
 		}
@@ -2909,7 +3004,7 @@ _bt_simpledel_pass(Relation rel, Buffer buffer, Relation heapRel,
 	Assert(delstate.ndeltids >= ndeletable);
 
 	/* Physically delete LP_DEAD tuples (plus any delete-safe extra TIDs) */
-	_bt_delitems_delete_check(rel, buffer, heapRel, &delstate);
+	_bt_delitems_delete_check(rel, buffer, heapRel, &delstate, mapping);
 
 	pfree(delstate.deltids);
 	pfree(delstate.status);
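+	/* Also free the partition-id mapping allocated above */
+	pfree(mapping);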
@@ -2923,6 +3018,16 @@ _bt_simpledel_pass(Relation rel, Buffer buffer, Relation heapRel,
  * block from incoming newitem just in case it isn't among the LP_DEAD-related
  * table blocks.
  *
+ * For global indexes, we need the relation OID along with the block number to
+ * uniquely identify a block. Therefore, we return the output as partition ID
+ * and block number pairs, and convert the partition ID to the relation OID
+ * only when we actually need to access the heap. While we could convert to
+ * the relation OID here and store it directly, that conversion might have to
+ * be done multiple times. Before accessing the heap, we first sort the
+ * entries in partition ID order, so the conversion from partition ID to
+ * relation OID only needs to be done once per partition.
+ *
  * Always counting the newitem's table block as an LP_DEAD related block makes
  * sense because the cost is consistently low; it is practically certain that
  * the table block will not incur a buffer miss in tableam.  On the other hand
@@ -2934,13 +3039,14 @@ _bt_simpledel_pass(Relation rel, Buffer buffer, Relation heapRel,
  *
  * Returns final array, and sets *nblocks to its final size for caller.
  */
-static BlockNumber *
-_bt_deadblocks(Page page, OffsetNumber *deletable, int ndeletable,
-			   IndexTuple newitem, int *nblocks)
+static BTHeapBlockInfo *
+_bt_deadblocks(Relation rel, Page page, OffsetNumber *deletable,
+			   int ndeletable, IndexTuple newitem, int *nblocks)
 {
 	int			spacentids,
 				ntids;
-	BlockNumber *tidblocks;
+	bool		isglobalidx = RelationIsGlobalIndex(rel);
+	BTHeapBlockInfo *tidblocks;
 
 	/*
 	 * Accumulate each TID's block in array whose initial size has space for
@@ -2950,7 +3056,7 @@ _bt_deadblocks(Page page, OffsetNumber *deletable, int ndeletable,
 	 */
 	spacentids = ndeletable + 1;
 	ntids = 0;
-	tidblocks = (BlockNumber *) palloc(sizeof(BlockNumber) * spacentids);
+	tidblocks = (BTHeapBlockInfo *) palloc(sizeof(BTHeapBlockInfo) * spacentids);
 
 	/*
 	 * First add the table block for the incoming newitem.  This is the one
@@ -2958,7 +3064,15 @@ _bt_deadblocks(Page page, OffsetNumber *deletable, int ndeletable,
 	 * any known deletable items.
 	 */
 	Assert(!BTreeTupleIsPosting(newitem) && !BTreeTupleIsPivot(newitem));
-	tidblocks[ntids++] = ItemPointerGetBlockNumber(&newitem->t_tid);
+
+	/*
+	 * Store PartitionId and BlockNumber of the incoming newitem. For non-global
+	 * indexes, just store InvalidPartitionId, as it is never going to be
+	 * accessed.
+	 */
+	tidblocks[ntids].partid = isglobalidx ?
+				BTreeTupleGetPartitionId(rel, newitem) : InvalidPartitionId;
+	tidblocks[ntids++].blockno = ItemPointerGetBlockNumber(&newitem->t_tid);
 
 	for (int i = 0; i < ndeletable; i++)
 	{
@@ -2972,34 +3086,41 @@ _bt_deadblocks(Page page, OffsetNumber *deletable, int ndeletable,
 			if (ntids + 1 > spacentids)
 			{
 				spacentids *= 2;
-				tidblocks = (BlockNumber *)
-					repalloc(tidblocks, sizeof(BlockNumber) * spacentids);
+				tidblocks = (BTHeapBlockInfo *)
+					repalloc(tidblocks, sizeof(BTHeapBlockInfo) * spacentids);
 			}
 
-			tidblocks[ntids++] = ItemPointerGetBlockNumber(&itup->t_tid);
+			/* Store PartitionId and BlockNumber of the deletable item. */
+			tidblocks[ntids].partid = isglobalidx ?
+					BTreeTupleGetPartitionId(rel, itup) : InvalidPartitionId;
+			tidblocks[ntids++].blockno =
+					ItemPointerGetBlockNumber(&itup->t_tid);
 		}
 		else
 		{
 			int			nposting = BTreeTupleGetNPosting(itup);
+			PartitionId	partid = isglobalidx ?
+					BTreeTupleGetPartitionId(rel, itup) : InvalidPartitionId;
 
 			if (ntids + nposting > spacentids)
 			{
 				spacentids = Max(spacentids * 2, ntids + nposting);
-				tidblocks = (BlockNumber *)
-					repalloc(tidblocks, sizeof(BlockNumber) * spacentids);
+				tidblocks = (BTHeapBlockInfo *)
+					repalloc(tidblocks, sizeof(BTHeapBlockInfo) * spacentids);
 			}
 
 			for (int j = 0; j < nposting; j++)
 			{
 				ItemPointer tid = BTreeTupleGetPostingN(itup, j);
 
-				tidblocks[ntids++] = ItemPointerGetBlockNumber(tid);
+				tidblocks[ntids].partid = partid;
+				tidblocks[ntids++].blockno = ItemPointerGetBlockNumber(tid);
 			}
 		}
 	}
 
-	qsort(tidblocks, ntids, sizeof(BlockNumber), _bt_blk_cmp);
-	*nblocks = qunique(tidblocks, ntids, sizeof(BlockNumber), _bt_blk_cmp);
+	qsort(tidblocks, ntids, sizeof(BTHeapBlockInfo), _bt_blk_cmp);
+	*nblocks = qunique(tidblocks, ntids, sizeof(BTHeapBlockInfo), _bt_blk_cmp);
 
 	return tidblocks;
 }
@@ -3010,8 +3131,15 @@ _bt_deadblocks(Page page, OffsetNumber *deletable, int ndeletable,
 static inline int
 _bt_blk_cmp(const void *arg1, const void *arg2)
 {
-	BlockNumber b1 = *((BlockNumber *) arg1);
-	BlockNumber b2 = *((BlockNumber *) arg2);
+	BTHeapBlockInfo *b1 = (BTHeapBlockInfo *) arg1;
+	BTHeapBlockInfo *b2 = (BTHeapBlockInfo *) arg2;
+	int	res;
 
-	return pg_cmp_u32(b1, b2);
+	/*
+	 * First compare the partids; if they are the same, compare the block numbers.
+	 */
+	res = pg_cmp_u32(b1->partid, b2->partid);
+	if (res == 0)
+		res = pg_cmp_u32(b1->blockno, b2->blockno);
+	return res;
 }
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index c79dd38ee1..08505cd262 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -24,6 +24,7 @@
 
 #include "access/nbtree.h"
 #include "access/nbtxlog.h"
+#include "access/table.h"
 #include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xlog.h"
@@ -1509,9 +1510,9 @@ _bt_delitems_cmp(const void *a, const void *b)
  * field (tableam will sort deltids for its own reasons, so we'll need to put
  * it back in leaf-page-wise order afterwards).
  */
-void
-_bt_delitems_delete_check(Relation rel, Buffer buf, Relation heapRel,
-						  TM_IndexDeleteOp *delstate)
+static void
+_bt_delitems_delete_check_guts(Relation rel, Buffer buf, Relation heapRel,
+							   TM_IndexDeleteOp *delstate)
 {
 	Page		page = BufferGetPage(buf);
 	TransactionId snapshotConflictHorizon;
@@ -1678,6 +1679,83 @@ _bt_delitems_delete_check(Relation rel, Buffer buf, Relation heapRel,
 		pfree(updatable[i]);
 }
 
+/*
+ * Try to delete item(s) from a btree leaf page during single-page cleanup.
+ *
+ * Refer to the detailed comments in '_bt_delitems_delete_check_guts' for more
+ * information. This function serves as a wrapper to handle the case of a
+ * global index, where we might have TIDs from multiple partitions. It calls
+ * the core functionality for each heap relation corresponding to each
+ * partition.
+ */
+void
+_bt_delitems_delete_check(Relation rel, Buffer buf, Relation heapRel,
+						  TM_IndexDeleteOp *delstate,
+						  PartidDeltidMapping *mapping)
+{
+	/*
+	 * For a global index we need to delete the items for each partition
+	 * separately.
+	 */
+	if (RelationIsGlobalIndex(rel))
+	{
+		int		ndeltid;
+		int		starttid = 0;
+		PartitionId	prevpartid = InvalidPartitionId;
+		TM_IndexDeleteOp partdelstate = *delstate;
+
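+		/*
+		 * partdelstate is a shallow copy of the caller's delstate; its
+		 * deltids pointer and count are redirected below to each partition's
+		 * slice of the sorted array.
+		 */
+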
+		/*
+		 * Sort the mapping array in partition id order so that we avoid
+		 * calling the table AM for the same relation multiple times.
+		 */
+		qsort(mapping, delstate->ndeltids, sizeof(PartidDeltidMapping),
+			  _bt_indexdel_cmp);
+
+		for (ndeltid = 0; ndeltid <= delstate->ndeltids; ndeltid++)
+		{
+			/*
+			 * If ndeltid is not the same as the index recorded in the
+			 * mapping then swap the deltid into its sorted position.
+			 */
+			if (ndeltid < delstate->ndeltids && mapping[ndeltid].idx != ndeltid)
+			{
+				int				idx = mapping[ndeltid].idx;
+				TM_IndexDelete	tmp = delstate->deltids[idx];
+
+				delstate->deltids[idx] = delstate->deltids[ndeltid];
+				delstate->deltids[ndeltid] = tmp;
+			}
+
+			/*
+			 * If this item belongs to a different PartitionId, or we have
+			 * gone past the last item, process the deletes for all the
+			 * items of the previous PartitionId.
+			 */
+			if (PartIdIsValid(prevpartid) &&
+				(ndeltid == delstate->ndeltids ||
+				 mapping[ndeltid].partid != prevpartid))
+			{
+				Oid			reloid = IndexGetPartitionReloid(rel, prevpartid);
+				Relation	childRel = table_open(reloid, AccessShareLock);
+
+				partdelstate.deltids = &delstate->deltids[starttid];
+				partdelstate.ndeltids = ndeltid - starttid;
+
+				_bt_delitems_delete_check_guts(rel, buf, childRel,
+											   &partdelstate);
+				starttid = ndeltid;
+				table_close(childRel, AccessShareLock);
+			}
+
+			if (ndeltid < delstate->ndeltids)
+				prevpartid = mapping[ndeltid].partid;
+		}
+	}
+	else
+		_bt_delitems_delete_check_guts(rel, buf, heapRel, delstate);
+}
+
 /*
  * Check that leftsib page (the btpo_prev of target page) is not marked with
  * INCOMPLETE_SPLIT flag.  Used during page deletion.
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index fdff960c13..c3960784eb 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -1505,6 +1505,16 @@ backtrack:
 		nhtidslive = 0;
 		if (callback)
 		{
+			PartitionId		partid = InvalidPartitionId;
+
+			/*
+			 * If this is a global index then get the partition id for the
+			 * heap relation being vacuumed so that we only call the callback
+			 * function for the index tuples which belong to this partition.
+			 */
+			if (RelationIsGlobalIndex(rel))
+				partid = IndexGetRelationPartitionId(rel, RelationGetRelid(heaprel));
+
 			/* btbulkdelete callback tells us what to delete (or update) */
 			for (offnum = minoff;
 				 offnum <= maxoff;
@@ -1515,6 +1525,14 @@ backtrack:
 				itup = (IndexTuple) PageGetItem(page,
 												PageGetItemId(page, offnum));
 
+				/*
+				 * For a global index, only call the callback for tuples of
+				 * the heap relation which is being vacuumed.
+				 */
+				if (RelationIsGlobalIndex(rel) &&
+					BTreeTupleGetPartitionId(rel, itup) != partid)
+					continue;
+
 				Assert(!BTreeTupleIsPivot(itup));
 				if (!BTreeTupleIsPosting(itup))
 				{
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 9d70e89c1f..bba47dc969 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -46,9 +46,11 @@
 #include "access/table.h"
 #include "access/xact.h"
 #include "catalog/index.h"
+#include "catalog/pg_inherits.h"
 #include "commands/progress.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
+#include "partitioning/partdesc.h"
 #include "pgstat.h"
 #include "storage/bulk_write.h"
 #include "tcop/tcopprot.h"
@@ -286,7 +288,9 @@ static void _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 									   BTShared *btshared, Sharedsort *sharedsort,
 									   Sharedsort *sharedsort2, int sortmem,
 									   bool progress);
-
+static double _bt_spool_scan_partitions(IndexInfo *indexInfo, Relation rel,
+										BTBuildState *buildstate,
+										Relation irel);
 
 /*
  *	btbuild() -- build a new btree index.
@@ -350,6 +354,68 @@ btbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	return result;
 }
 
+/*
+ * This is a wrapper function to call table_index_build_scan() for each leaf
+ * partition while building a global index.
+ */
+static double
+_bt_spool_scan_partitions(IndexInfo *indexInfo, Relation rel,
+						  BTBuildState *buildstate, Relation irel)
+{
+	double		reltuples = 0;
+	List	   *tableIds;
+
+	Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+	Assert(RelationIsGlobalIndex(irel));
+
+	/*
+	 * To retrieve the tuples from all leaf relations, we need to obtain a list
+	 * of all inheritor relations. This operation should only be performed
+	 * during the creation of an index, reindexing, or truncating a relation.
+	 * In these cases, when a global index is involved, the caller already
+	 * holds the necessary locks on all inheritor relations. Therefore, we can
+	 * safely proceed with NoLock in this context.
+	 */
+	tableIds = find_all_inheritors(RelationGetRelid(rel), NoLock, NULL);
+
+	foreach_oid(tableOid, tableIds)
+	{
+		Relation	childrel = table_open(tableOid, NoLock);
+		double		curreltuples;
+
+		/*
+		 * Only leaf relations hold data, so we can ignore other inheritors.
+		 */
+		if (childrel->rd_rel->relkind != RELKIND_RELATION)
+		{
+			table_close(childrel, NoLock);
+			continue;
+		}
+
+		/*
+		 * Get partition id of this partition with respect to the global
+		 * index.
+		 */
+		indexInfo->ii_partid = IndexGetRelationPartitionId(irel, tableOid);
+		curreltuples = table_index_build_scan(childrel, irel, indexInfo, true,
+											  true, _bt_build_callback,
+											  (void *) buildstate, NULL);
+		reltuples += curreltuples;
+
+		/*
+		 * This is the right place to update the relation stats while building
+		 * the global index because at this point we know the individual
+		 * the global index because at this point we know the individual
+		 * tuple counts for each partition.
+		index_update_stats(childrel, true, true, curreltuples);
+		table_close(childrel, NoLock);
+	}
+
+	list_free(tableIds);
+
+	return reltuples;
+}
+
 /*
  * Create and initialize one or two spool structures, and save them in caller's
  * buildstate argument.  May also fill-in fields within indexInfo used by index
@@ -474,9 +540,19 @@ _bt_spools_heapscan(Relation heap, Relation index, BTBuildState *buildstate,
 
 	/* Fill spool using either serial or parallel heap scan */
 	if (!buildstate->btleader)
-		reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
-										   _bt_build_callback, buildstate,
-										   NULL);
+	{
+		/*
+		 * If we are building a global index then we need to scan all the
+		 * child partitions and insert into the global index.
+		 */
+		if (RelationIsGlobalIndex(index))
+			reltuples = _bt_spool_scan_partitions(indexInfo, heap,
+												  buildstate, index);
+		else
+			reltuples = table_index_build_scan(heap, index, indexInfo, true,
+											   true, _bt_build_callback,
+											   buildstate, NULL);
+	}
 	else
 		reltuples = _bt_parallel_heapscan(buildstate,
 										  &indexInfo->ii_BrokenHotChain);
diff --git a/src/backend/access/table/table.c b/src/backend/access/table/table.c
index be698bba0e..8243a565f9 100644
--- a/src/backend/access/table/table.c
+++ b/src/backend/access/table/table.c
@@ -139,6 +139,7 @@ validate_relation_kind(Relation r)
 {
 	if (r->rd_rel->relkind == RELKIND_INDEX ||
 		r->rd_rel->relkind == RELKIND_PARTITIONED_INDEX ||
+		r->rd_rel->relkind == RELKIND_GLOBAL_INDEX ||
 		r->rd_rel->relkind == RELKIND_COMPOSITE_TYPE)
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index 9833f52c1b..2393ed45ff 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -313,6 +313,7 @@ Boot_DeclareIndexStmt:
 								$4,
 								InvalidOid,
 								InvalidOid,
+								NIL,
 								-1,
 								false,
 								false,
@@ -366,6 +367,7 @@ Boot_DeclareUniqueIndexStmt:
 								$5,
 								InvalidOid,
 								InvalidOid,
+								NIL,
 								-1,
 								false,
 								false,
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 9ca8a88dc9..6584202525 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -1800,7 +1800,8 @@ ExecGrant_Relation(InternalGrant *istmt)
 
 		/* Not sensible to grant on an index */
 		if (pg_class_tuple->relkind == RELKIND_INDEX ||
-			pg_class_tuple->relkind == RELKIND_PARTITIONED_INDEX)
+			pg_class_tuple->relkind == RELKIND_PARTITIONED_INDEX ||
+			pg_class_tuple->relkind == RELKIND_GLOBAL_INDEX)
 			ereport(ERROR,
 					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 					 errmsg("\"%s\" is an index",
@@ -4364,6 +4365,7 @@ recordExtObjInitPriv(Oid objoid, Oid classoid)
 		 */
 		if (pg_class_tuple->relkind == RELKIND_INDEX ||
 			pg_class_tuple->relkind == RELKIND_PARTITIONED_INDEX ||
+			pg_class_tuple->relkind == RELKIND_GLOBAL_INDEX ||
 			pg_class_tuple->relkind == RELKIND_COMPOSITE_TYPE)
 		{
 			ReleaseSysCache(tuple);
@@ -4524,6 +4526,7 @@ removeExtObjInitPriv(Oid objoid, Oid classoid)
 		 */
 		if (pg_class_tuple->relkind == RELKIND_INDEX ||
 			pg_class_tuple->relkind == RELKIND_PARTITIONED_INDEX ||
+			pg_class_tuple->relkind == RELKIND_GLOBAL_INDEX ||
 			pg_class_tuple->relkind == RELKIND_COMPOSITE_TYPE)
 		{
 			ReleaseSysCache(tuple);
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 7dded634eb..2db9d847e0 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1358,7 +1358,8 @@ doDeletion(const ObjectAddress *object, int flags)
 				char		relKind = get_rel_relkind(object->objectId);
 
 				if (relKind == RELKIND_INDEX ||
-					relKind == RELKIND_PARTITIONED_INDEX)
+					relKind == RELKIND_PARTITIONED_INDEX ||
+					relKind == RELKIND_GLOBAL_INDEX)
 				{
 					bool		concurrent = ((flags & PERFORM_DELETION_CONCURRENTLY) != 0);
 					bool		concurrent_lock_mode = ((flags & PERFORM_DELETION_CONCURRENT_LOCK) != 0);
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index fd6537567e..19f27f8f2b 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1221,6 +1221,7 @@ heap_create_with_catalog(const char *relname,
 			 */
 			Assert(relkind != RELKIND_INDEX);
 			Assert(relkind != RELKIND_PARTITIONED_INDEX);
+			Assert(relkind != RELKIND_GLOBAL_INDEX);
 
 			if (relkind == RELKIND_TOASTVALUE)
 			{
@@ -1338,7 +1339,8 @@ heap_create_with_catalog(const char *relname,
 	if (!(relkind == RELKIND_SEQUENCE ||
 		  relkind == RELKIND_TOASTVALUE ||
 		  relkind == RELKIND_INDEX ||
-		  relkind == RELKIND_PARTITIONED_INDEX))
+		  relkind == RELKIND_PARTITIONED_INDEX ||
+		  relkind == RELKIND_GLOBAL_INDEX))
 	{
 		Oid			new_array_oid;
 		ObjectAddress new_type_addr;
@@ -1875,6 +1877,24 @@ heap_drop_with_catalog(Oid relid)
 	if (relid == defaultPartOid)
 		update_default_partition_oid(parentOid, InvalidOid);
 
+	/*
+	 * If a leaf relation of a partitioned table is being dropped then detach
+	 * it from the global indexes, i.e. remove all its mappings from the
+	 * pg_index_partitions relation.  We don't have any mappings for non-leaf
+	 * relations, so there is nothing to do for them.
+	 */
+	if (get_rel_relispartition(relid))
+	{
+		List	*indexids = RelationGetIndexList(rel);
+		List	*reloids = list_make1_oid(relid);
+
+		/* Detach the reloid from the global indexes. */
+		DetachFromGlobalIndexes(indexids, reloids);
+
+		list_free(indexids);
+		list_free(reloids);
+	}
+
 	/*
 	 * Schedule unlinking of the relation's physical files at commit.
 	 */
@@ -3534,6 +3554,9 @@ RemoveStatistics(Oid relid, AttrNumber attnum)
  *
  * The routine will truncate and then reconstruct the indexes on
  * the specified relation.  Caller must hold exclusive lock on rel.
+ *
+ * TODO: Handle global indexes; they should not be rebuilt while truncating
+ * each leaf relation.
  */
 static void
 RelationTruncateIndexes(Relation heapRelation)
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index aa216683b7..2bf9b59c9d 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -44,6 +44,7 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_description.h"
+#include "catalog/pg_index_partitions.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_opclass.h"
 #include "catalog/pg_operator.h"
@@ -62,6 +63,7 @@
 #include "nodes/nodeFuncs.h"
 #include "optimizer/optimizer.h"
 #include "parser/parser.h"
+#include "partitioning/partdesc.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "rewrite/rewriteManip.h"
@@ -120,9 +122,6 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
 								bool immediate,
 								bool isvalid,
 								bool isready);
-static void index_update_stats(Relation rel,
-							   bool hasindex,
-							   double reltuples);
 static void IndexCheckExclusion(Relation heapRelation,
 								Relation indexRelation,
 								IndexInfo *indexInfo);
@@ -342,12 +341,24 @@ ConstructTupleDescriptor(Relation heapRelation,
 			/* Simple index column */
 			const FormData_pg_attribute *from;
 
-			Assert(atnum > 0);	/* should've been caught above */
+			/*
+			 * For global indexes, besides a positive attribute number we can
+			 * also get the PartitionIdAttributeNumber here.
+			 */
+			Assert(atnum > 0 || atnum == PartitionIdAttributeNumber);
 
 			if (atnum > natts)	/* safety check */
 				elog(ERROR, "invalid column number %d", atnum);
-			from = TupleDescAttr(heapTupDesc,
-								 AttrNumberGetAttrOffset(atnum));
+
+			/*
+			 * If the attribute number is PartitionIdAttributeNumber then
+			 * take it directly from the predefined partitionid_attr constant.
+			 */
+			if (atnum == PartitionIdAttributeNumber)
+				from = &partitionid_attr;
+			else
+				from = TupleDescAttr(heapTupDesc,
+									 AttrNumberGetAttrOffset(atnum));
 
 			to->atttypid = from->atttypid;
 			to->attlen = from->attlen;
@@ -719,6 +730,7 @@ UpdateIndexRelation(Oid indexoid,
  * allow_system_table_mods: allow table to be a system catalog
  * is_internal: if true, post creation hook for new index
  * constraintId: if not NULL, receives OID of created constraint
+ * inheritors: if not NIL, OIDs of all inheritors (for global index creation)
  *
  * Returns the OID of the created index.
  */
@@ -743,7 +755,8 @@ index_create(Relation heapRelation,
 			 bits16 constr_flags,
 			 bool allow_system_table_mods,
 			 bool is_internal,
-			 Oid *constraintId)
+			 Oid *constraintId,
+			 List *inheritors)
 {
 	Oid			heapRelationId = RelationGetRelid(heapRelation);
 	Relation	pg_class;
@@ -759,6 +772,7 @@ index_create(Relation heapRelation,
 	bool		invalid = (flags & INDEX_CREATE_INVALID) != 0;
 	bool		concurrent = (flags & INDEX_CREATE_CONCURRENT) != 0;
 	bool		partitioned = (flags & INDEX_CREATE_PARTITIONED) != 0;
+	bool		global_index = (flags & INDEX_CREATE_GLOBAL) != 0;
 	char		relkind;
 	TransactionId relfrozenxid;
 	MultiXactId relminmxid;
@@ -770,7 +784,13 @@ index_create(Relation heapRelation,
 	/* partitioned indexes must never be "built" by themselves */
 	Assert(!partitioned || (flags & INDEX_CREATE_SKIP_BUILD));
 
-	relkind = partitioned ? RELKIND_PARTITIONED_INDEX : RELKIND_INDEX;
+	if (global_index)
+		relkind = RELKIND_GLOBAL_INDEX;
+	else if (partitioned)
+		relkind = RELKIND_PARTITIONED_INDEX;
+	else
+		relkind = RELKIND_INDEX;
+
 	is_exclusion = (indexInfo->ii_ExclusionOps != NULL);
 
 	pg_class = table_open(RelationRelationId, RowExclusiveLock);
@@ -1051,10 +1071,34 @@ index_create(Relation heapRelation,
 						!concurrent);
 
 	/*
-	 * Register relcache invalidation on the indexes' heap relation, to
-	 * maintain consistency of its index list
+	 * Create the mapping in the pg_index_partitions table, and register relcache
+	 * invalidation on the indexes' heap relation, to maintain consistency of
+	 * its index list.  If we are creating a global index then invalidate the
+	 * relcache of all the inheritors as well.
 	 */
-	CacheInvalidateRelcache(heapRelation);
+	if (global_index)
+	{
+		Assert(inheritors != NIL);
+		AttachParittionsToGlobalIndex(indexRelation, inheritors);
+		foreach_oid(tableOid, inheritors)
+		{
+			Relation	childrel = table_open(tableOid, NoLock);
+
+			CacheInvalidateRelcache(childrel);
+			table_close(childrel, NoLock);
+		}
+
+		/*
+		 * The IndexPartitionInfo cache may have been built while we were
+		 * still inserting tuples into the system table, so it might be
+		 * incomplete; reset it here and let it be rebuilt whenever needed.
+		 *
+		 * FIXME recheck whether we really need to do this?
+		 */
+		indexRelation->rd_indexpartinfo = NULL;
+	}
+	else
+		CacheInvalidateRelcache(heapRelation);
 
 	/* update pg_inherits and the parent's relhassubclass, if needed */
 	if (OidIsValid(parentIndexRelid))
@@ -1268,7 +1312,7 @@ index_create(Relation heapRelation,
 		 * having an index.
 		 */
 		index_update_stats(heapRelation,
-						   true,
+						   true, false,
 						   -1.0);
 		/* Make the above update visible */
 		CommandCounterIncrement();
@@ -1462,7 +1506,8 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 							  0,
 							  true, /* allow table to be a system catalog? */
 							  false,	/* is_internal? */
-							  NULL);
+							  NULL,
+							  NIL);
 
 	/* Close the relations used and clean up */
 	index_close(indexRelation, NoLock);
@@ -2127,6 +2172,7 @@ index_drop(Oid indexId, bool concurrent, bool concurrent_lock_mode)
 	Relation	indexRelation;
 	HeapTuple	tuple;
 	bool		hasexprs;
+	bool		isglobal;
 	LockRelId	heaprelid,
 				indexrelid;
 	LOCKTAG		heaplocktag;
@@ -2319,6 +2365,9 @@ index_drop(Oid indexId, bool concurrent, bool concurrent_lock_mode)
 		TransferPredicateLocksToHeapRelation(userIndexRelation);
 	}
 
+	/* Remember whether it is a global index. */
+	isglobal = RelationIsGlobalIndex(userIndexRelation);
+
 	/*
 	 * Schedule physical removal of the files (if any)
 	 */
@@ -2384,15 +2433,36 @@ index_drop(Oid indexId, bool concurrent, bool concurrent_lock_mode)
 	 */
 	DeleteInheritsTuple(indexId, InvalidOid, false, NULL);
 
+	/*
+	 * Remove all the mappings present in the pg_index_partitions table for this
+	 * global index.
+	 */
+	if (isglobal)
+		DeleteIndexPartitionEntries(indexId);
+
 	/*
 	 * We are presently too lazy to attempt to compute the new correct value
 	 * of relhasindex (the next VACUUM will fix it if necessary). So there is
 	 * no need to update the pg_class tuple for the owning relation. But we
 	 * must send out a shared-cache-inval notice on the owning relation to
 	 * ensure other backends update their relcache lists of indexes.  (In the
-	 * concurrent case, this is redundant but harmless.)
+	 * concurrent case, this is redundant but harmless.)  If we are dropping a
+	 * global index then invalidate the relcache of all the inheritors as well.
 	 */
-	CacheInvalidateRelcache(userHeapRelation);
+	if (isglobal)
+	{
+		/*
+		 * Pass lockmode as NoLock because caller should already hold the lock
+		 * on all the partitions.  Check code in RemoveRelations().
+		 */
+		List *tableIds = find_all_inheritors(heapId, NoLock, NULL);
+
+		foreach_oid(tableOid, tableIds)
+			CacheInvalidateRelcacheByRelid(tableOid);
+		list_free(tableIds);
+	}
+	else
+		CacheInvalidateRelcache(userHeapRelation);
 
 	/*
 	 * Close owning rel, but keep lock
@@ -2753,7 +2823,16 @@ FormIndexDatum(IndexInfo *indexInfo,
 		Datum		iDatum;
 		bool		isNull;
 
-		if (keycol < 0)
+		/*
+		 * If the attribute number is PartitionIdAttributeNumber then directly
+		 * assign the value stored in indexInfo->ii_partid.
+		 */
+		if (keycol == PartitionIdAttributeNumber)
+		{
+			iDatum = indexInfo->ii_partid;
+			isNull = false;
+		}
+		else if (keycol < 0)
 			iDatum = slot_getsysattr(slot, keycol, &isNull);
 		else if (keycol != 0)
 		{
@@ -2805,9 +2884,10 @@ FormIndexDatum(IndexInfo *indexInfo,
  * index.  When updating an index, it's important because some index AMs
  * expect a relcache flush to occur after REINDEX.
  */
-static void
+void
 index_update_stats(Relation rel,
 				   bool hasindex,
+				   bool hasglobalindex,
 				   double reltuples)
 {
 	bool		update_stats;
@@ -2930,6 +3010,18 @@ index_update_stats(Relation rel,
 		dirty = true;
 	}
 
+	/*
+	 * Set relhasglobalindex to true if we have created a global index and it
+	 * is not already set.  If we later create some other index, the input
+	 * hasglobalindex will be false, so we don't need to do anything in that
+	 * case.
+	 */
+	if (hasglobalindex && !rd_rel->relhasglobalindex)
+	{
+		rd_rel->relhasglobalindex = hasglobalindex;
+		dirty = true;
+	}
+
 	if (update_stats)
 	{
 		if (rd_rel->relpages != (int32) relpages)
@@ -2981,7 +3073,6 @@ index_update_stats(Relation rel,
 	table_close(pg_class, RowExclusiveLock);
 }
 
-
 /*
  * index_build - invoke access-method-specific index build procedure
  *
@@ -3010,6 +3101,10 @@ index_build(Relation heapRelation,
 	int			save_sec_context;
 	int			save_nestlevel;
 
+	/* XXX Currently parallel build is not supported for global indexes. */
+	if (RelationIsGlobalIndex(indexRelation))
+		parallel = false;
+
 	/*
 	 * sanity checks
 	 */
@@ -3150,14 +3245,26 @@ index_build(Relation heapRelation,
 	}
 
 	/*
-	 * Update heap and index pg_class rows
+	 * Update the pg_class rows for the heap and index.  If this is a
+	 * partitioned relation we are building a global index, so just mark the
+	 * relation as having an index.  We have already updated the heap tuple
+	 * stats for each leaf relation while processing the partitions inside
+	 * _bt_spool_scan_partitions().
+	 *
+	 * TODO: We might choose to change the ambuild function to return an
+	 * array of stats so that we get separate stats for each partition.  Then,
+	 * instead of setting stats in _bt_spool_scan_partitions(), we could do
+	 * that here, which would be a cleaner interface.
 	 */
-	index_update_stats(heapRelation,
-					   true,
-					   stats->heap_tuples);
+	if (heapRelation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		index_update_stats(heapRelation, true, true, -1.0);
+	else
+		index_update_stats(heapRelation,
+						   true, false,
+						   stats->heap_tuples);
 
 	index_update_stats(indexRelation,
-					   false,
+					   false, false,
 					   stats->index_tuples);
 
 	/* Make the updated catalog row versions visible */
@@ -3607,10 +3714,9 @@ IndexGetRelation(Oid indexId, bool missing_ok)
 void
 reindex_index(const ReindexStmt *stmt, Oid indexId,
 			  bool skip_constraint_checks, char persistence,
-			  const ReindexParams *params)
+			  const ReindexParams *params, Relation heapRelation)
 {
-	Relation	iRel,
-				heapRelation;
+	Relation	iRel;
 	Oid			heapId;
 	Oid			save_userid;
 	int			save_sec_context;
@@ -3620,27 +3726,44 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 	PGRUsage	ru0;
 	bool		progress = ((params->options & REINDEXOPT_REPORT_PROGRESS) != 0);
 	bool		set_tablespace = false;
+	bool		close_rel = false;
 
 	pg_rusage_init(&ru0);
 
 	/*
-	 * Open and lock the parent heap relation.  ShareLock is sufficient since
-	 * we only need to be sure no schema or data changes are going on.
+	 * Open and lock the parent heap relation if not done by caller.  ShareLock
+	 * is sufficient since we only need to be sure no schema or data changes
+	 * are going on.
 	 */
-	heapId = IndexGetRelation(indexId,
-							  (params->options & REINDEXOPT_MISSING_OK) != 0);
-	/* if relation is missing, leave */
-	if (!OidIsValid(heapId))
-		return;
+	if (!heapRelation)
+	{
+		heapId = IndexGetRelation(indexId,
+								(params->options & REINDEXOPT_MISSING_OK) != 0);
+		/* if relation is missing, leave */
+		if (!OidIsValid(heapId))
+			return;
+
+		if ((params->options & REINDEXOPT_MISSING_OK) != 0)
+			heapRelation = try_table_open(heapId, ShareLock);
+		else
+			heapRelation = table_open(heapId, ShareLock);
 
-	if ((params->options & REINDEXOPT_MISSING_OK) != 0)
-		heapRelation = try_table_open(heapId, ShareLock);
+		/* if relation is gone, leave */
+		if (!heapRelation)
+			return;
+		close_rel = true;
+	}
 	else
-		heapRelation = table_open(heapId, ShareLock);
+		heapId = RelationGetRelid(heapRelation);
 
-	/* if relation is gone, leave */
-	if (!heapRelation)
-		return;
+	/*
+	 * If we are reindexing a global index then lock all the inheritors,
+	 * because we are going to access all of them while rebuilding the
+	 * global index.  ShareLock is enough to prevent schema modifications.
+	 */
+	if (get_rel_relkind(indexId) == RELKIND_GLOBAL_INDEX)
+		(void) find_all_inheritors(heapId, ShareLock, NULL);
 
 	/*
 	 * Switch to the table owner's userid, so that any index functions are run
@@ -3903,7 +4026,10 @@ reindex_index(const ReindexStmt *stmt, Oid indexId,
 
 	/* Close rels, but keep locks */
 	index_close(iRel, NoLock);
-	table_close(heapRelation, NoLock);
+
+	/* Do not close the rel if it is passed by the caller. */
+	if (close_rel)
+		table_close(heapRelation, NoLock);
 
 	if (progress)
 		pgstat_progress_end_command();
@@ -4070,8 +4196,19 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
 			continue;
 		}
 
+		/*
+		 * Skip global indexes when reindexing individual relations, as the
+		 * caller will handle them separately. This prevents redundant
+		 * reindexing and ensures that global indexes are processed only once.
+		 */
+		if (get_rel_relkind(indexOid) == RELKIND_GLOBAL_INDEX)
+		{
+			RemoveReindexPending(indexOid);
+			continue;
+		}
+
 		reindex_index(stmt, indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
-					  persistence, params);
+					  persistence, params, NULL);
 
 		CommandCounterIncrement();
 
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index d97d632a7e..f3c9d977e7 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -26,6 +26,7 @@
 #include "catalog/dependency.h"
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
+#include "catalog/partition.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_conversion.h"
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index b63fd57dc0..4a79ade8e4 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -1355,7 +1355,8 @@ get_relation_by_qualified_name(ObjectType objtype, List *object,
 	{
 		case OBJECT_INDEX:
 			if (relation->rd_rel->relkind != RELKIND_INDEX &&
-				relation->rd_rel->relkind != RELKIND_PARTITIONED_INDEX)
+				relation->rd_rel->relkind != RELKIND_PARTITIONED_INDEX &&
+				relation->rd_rel->relkind != RELKIND_GLOBAL_INDEX)
 				ereport(ERROR,
 						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 						 errmsg("\"%s\" is not an index",
@@ -4137,6 +4138,10 @@ getRelationDescription(StringInfo buffer, Oid relid, bool missing_ok)
 			appendStringInfo(buffer, _("index %s"),
 							 relname);
 			break;
+		case RELKIND_GLOBAL_INDEX:
+			appendStringInfo(buffer, _("global index %s"),
+							 relname);
+			break;
 		case RELKIND_SEQUENCE:
 			appendStringInfo(buffer, _("sequence %s"),
 							 relname);
@@ -4713,6 +4718,9 @@ getRelationTypeDescription(StringInfo buffer, Oid relid, int32 objectSubId,
 		case RELKIND_PARTITIONED_INDEX:
 			appendStringInfoString(buffer, "index");
 			break;
+		case RELKIND_GLOBAL_INDEX:
+			appendStringInfoString(buffer, "global index");
+			break;
 		case RELKIND_SEQUENCE:
 			appendStringInfoString(buffer, "sequence");
 			break;
@@ -6192,6 +6200,7 @@ get_relkind_objtype(char relkind)
 			return OBJECT_TABLE;
 		case RELKIND_INDEX:
 		case RELKIND_PARTITIONED_INDEX:
+		case RELKIND_GLOBAL_INDEX:
 			return OBJECT_INDEX;
 		case RELKIND_SEQUENCE:
 			return OBJECT_SEQUENCE;
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 93d72157a4..472a096206 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -60,9 +60,11 @@ get_partition_parent(Oid relid, bool even_if_detached)
 
 	result = get_partition_parent_worker(catalogRelation, relid,
 										 &detach_pending);
-
 	if (!OidIsValid(result))
-		elog(ERROR, "could not find tuple for parent of relation %u", relid);
+	{
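+		/* No parent found; return InvalidOid and let the caller handle it. */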
+		table_close(catalogRelation, AccessShareLock);
+		return InvalidOid;
+	}
 
 	if (detach_pending && !even_if_detached)
 		elog(ERROR, "relation %u has no parent because it's being detached",
diff --git a/src/backend/catalog/pg_class.c b/src/backend/catalog/pg_class.c
index 18eecbdfc0..3519ed9ae0 100644
--- a/src/backend/catalog/pg_class.c
+++ b/src/backend/catalog/pg_class.c
@@ -45,6 +45,8 @@ errdetail_relkind_not_supported(char relkind)
 			return errdetail("This operation is not supported for partitioned tables.");
 		case RELKIND_PARTITIONED_INDEX:
 			return errdetail("This operation is not supported for partitioned indexes.");
+		case RELKIND_GLOBAL_INDEX:
+			return errdetail("This operation is not supported for global indexes.");
 		default:
 			elog(ERROR, "unrecognized relkind: '%c'", relkind);
 			return 0;
diff --git a/src/backend/catalog/pg_index_partitions.c b/src/backend/catalog/pg_index_partitions.c
index e637feb453..c03fcd45e5 100644
--- a/src/backend/catalog/pg_index_partitions.c
+++ b/src/backend/catalog/pg_index_partitions.c
@@ -254,9 +254,8 @@ IndexGetPartitionReloid(Relation irel, PartitionId partid)
  * InvalidateIndexPartitionEntries - Invalidate pg_index_partitions entries
  *
  * Set reloid as Invalid in pg_index_partitions entries with respect to the
- * given reloid.  If a valid global indexoids list is given then only
- * invalidate the reloid entires which are related to the input global index
- * oids.
+ * given reloid.  If a valid reloids list is given then only invalidate the
+ * reloid entries which are related to the input reloids.
  */
 void
 InvalidateIndexPartitionEntries(List *reloids, Oid indexoid)
@@ -340,3 +339,46 @@ IndexGetNextPartitionID(Relation irel)
 
 	return partid;
 }
+
+/*
+ * IndexPartitionRelidGetGlobalIndexOids - Get global index list for a reloid
+ *
+ * Get a list of all the global indexes for the given relation OID.
+ */
+List *
+IndexPartitionRelidGetGlobalIndexOids(Oid reloid)
+{
+	Relation	catalogRelation;
+	SysScanDesc scan;
+	ScanKeyData key;
+	HeapTuple	tuple;
+	List	   *globalindexoids = NIL;
+
+	/*
+	 * Scan pg_index_partitions by reloid to collect all the global indexes
+	 * this relation is attached to.
+	 */
+	catalogRelation = table_open(IndexPartitionsRelationId, RowExclusiveLock);
+
+	ScanKeyInit(&key,
+				Anum_pg_index_partitions_reloid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(reloid));
+
+	scan = systable_beginscan(catalogRelation, IndexPartitionsReloidIndexId,
+							  true, NULL, 1, &key);
+
+	while ((tuple = systable_getnext(scan)) != NULL)
+	{
+		Form_pg_index_partitions form = (Form_pg_index_partitions) GETSTRUCT(tuple);
+
+		Assert(form->reloid == reloid);
+		globalindexoids = lappend_oid(globalindexoids, form->indexoid);
+	}
+
+	/* Done */
+	systable_endscan(scan);
+	table_close(catalogRelation, RowExclusiveLock);
+
+	return globalindexoids;
+}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 874a8fc89a..a534bae291 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -325,7 +325,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 				 BTREE_AM_OID,
 				 rel->rd_rel->reltablespace,
 				 collationIds, opclassIds, NULL, coloptions, NULL, (Datum) 0,
-				 INDEX_CREATE_IS_PRIMARY, 0, true, true, NULL);
+				 INDEX_CREATE_IS_PRIMARY, 0, true, true, NULL, NIL);
 
 	table_close(toast_rel, NoLock);
 
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 7111d5d533..4417e927e6 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -421,19 +421,10 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
 	 * an explicit column list in the ANALYZE command, however.
 	 *
 	 * If we are doing a recursive scan, we don't want to touch the parent's
-	 * indexes at all.  If we're processing a partitioned table, we need to
-	 * know if there are any indexes, but we don't want to process them.
+	 * indexes at all.  A partitioned table can also have global indexes, so
+	 * we need to open the indexes for partitioned tables as well.
 	 */
-	if (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-	{
-		List	   *idxs = RelationGetIndexList(onerel);
-
-		Irel = NULL;
-		nindexes = 0;
-		hasindex = idxs != NIL;
-		list_free(idxs);
-	}
-	else if (!inh)
+	if (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE || !inh)
 	{
 		vac_open_indexes(onerel, AccessShareLock, &nindexes, &Irel);
 		hasindex = nindexes > 0;
@@ -651,24 +642,6 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
 							InvalidMultiXactId,
 							NULL, NULL,
 							in_outer_xact);
-
-		/* Same for indexes */
-		for (ind = 0; ind < nindexes; ind++)
-		{
-			AnlIndexData *thisdata = &indexdata[ind];
-			double		totalindexrows;
-
-			totalindexrows = ceil(thisdata->tupleFract * totalrows);
-			vac_update_relstats(Irel[ind],
-								RelationGetNumberOfBlocks(Irel[ind]),
-								totalindexrows,
-								0, 0,
-								false,
-								InvalidTransactionId,
-								InvalidMultiXactId,
-								NULL, NULL,
-								in_outer_xact);
-		}
 	}
 	else if (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 	{
@@ -684,6 +657,28 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
 							in_outer_xact);
 	}
 
+	/* Same for indexes */
+	for (ind = 0; ind < nindexes; ind++)
+	{
+		AnlIndexData *thisdata = &indexdata[ind];
+		double		totalindexrows;
+
+		/* Nothing to be done for the partitioned indexes. */
+		if (Irel[ind]->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
+			continue;
+
+		totalindexrows = ceil(thisdata->tupleFract * totalrows);
+		vac_update_relstats(Irel[ind],
+							RelationGetNumberOfBlocks(Irel[ind]),
+							totalindexrows,
+							0, 0,
+							false,
+							InvalidTransactionId,
+							InvalidMultiXactId,
+							NULL, NULL,
+							in_outer_xact);
+	}
+
 	/*
 	 * Now report ANALYZE to the cumulative stats system.  For regular tables,
 	 * we do it only if not doing inherited stats.  For partitioned tables, we
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b55221d44c..692f441851 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -189,6 +189,11 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 						(errcode(ERRCODE_UNDEFINED_OBJECT),
 						 errmsg("index \"%s\" for table \"%s\" does not exist",
 								stmt->indexname, stmt->relation->relname)));
+			if (get_rel_relkind(indexOid) == RELKIND_GLOBAL_INDEX)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("cannot cluster using global index \"%s\"",
+								stmt->indexname)));
 		}
 
 		/* For non-partitioned tables, do what we came here to do. */
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 6f753ab6d7..0bfd2e7dbf 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -24,6 +24,7 @@
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
+#include "catalog/heap.h"
 #include "catalog/index.h"
 #include "catalog/indexing.h"
 #include "catalog/namespace.h"
@@ -32,6 +33,7 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_database.h"
+#include "catalog/pg_index_partitions.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_opclass.h"
@@ -95,7 +97,7 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
 							  int *ddl_save_nestlevel);
 static char *ChooseIndexName(const char *tabname, Oid namespaceId,
 							 const List *colnames, const List *exclusionOpNames,
-							 bool primary, bool isconstraint);
+							 bool primary, bool isconstraint, bool global);
 static char *ChooseIndexNameAddition(const List *colnames);
 static List *ChooseIndexColumnNames(const List *indexElems);
 static void ReindexIndex(const ReindexStmt *stmt, const ReindexParams *params,
@@ -109,6 +111,8 @@ static void ReindexMultipleTables(const ReindexStmt *stmt,
 static void reindex_error_callback(void *arg);
 static void ReindexPartitions(const ReindexStmt *stmt, Oid relid,
 							  const ReindexParams *params, bool isTopLevel);
+static void ReindexPartitionedRelation(List *reloids,
+										const ReindexParams *params);
 static void ReindexMultipleInternal(const ReindexStmt *stmt, const List *relids,
 									const ReindexParams *params);
 static bool ReindexRelationConcurrently(const ReindexStmt *stmt,
@@ -276,7 +280,13 @@ CheckIndexCompatible(Oid oldId,
 	}
 
 	/* Any change in operator class or collation breaks compatibility. */
-	old_natts = indexForm->indnkeyatts;
+
+	/* For a global index, ignore the partition ID attribute. */
+	if (get_rel_relkind(oldId) == RELKIND_GLOBAL_INDEX)
+		old_natts = indexForm->indnkeyatts - 1;
+	else
+		old_natts = indexForm->indnkeyatts;
+
 	Assert(old_natts == numberOfAttributes);
 
 	d = SysCacheGetAttrNotNull(INDEXRELID, tuple, Anum_pg_index_indcollation);
@@ -525,6 +535,7 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
  *		of a partitioned index.
  * 'parentConstraintId': the OID of the parent constraint; InvalidOid if not
  *		the child of a constraint (only used when recursing)
+ * 'inheritors': list of OIDs of all inheritors if this is a partitioned relation
  * 'total_parts': total number of direct and indirect partitions of relation;
  *		pass -1 if not known or rel is not partitioned.
  * 'is_alter_table': this is due to an ALTER rather than a CREATE operation.
@@ -544,6 +555,7 @@ DefineIndex(Oid tableId,
 			Oid indexRelationId,
 			Oid parentIndexId,
 			Oid parentConstraintId,
+			List *inheritors,
 			int total_parts,
 			bool is_alter_table,
 			bool check_rights,
@@ -636,6 +648,27 @@ DefineIndex(Oid tableId,
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_INDEX_OID,
 								 InvalidOid);
 
+	/*
+	 * If this is a global index, we must append a partition identifier to
+	 * uniquely identify the heap tuple.  Therefore, in this design, we have
+	 * opted to include the partition-id as the last key column.
+	 *
+	 * The rationale behind storing it as the last key column is that in
+	 * various scenarios, we would treat this column as an extended
+	 * index key column.  Essentially, each index tuple must be uniquely
+	 * identified. Therefore, if we encounter duplicate keys, we utilize heap
+	 * tid as a tiebreaker.  However, for global indexes, relying solely on
+	 * heap tid isn't adequate; we also require the partition identifier.
+	 */
+	if (stmt->global)
+	{
+		IndexElem	*newparam = makeNode(IndexElem);
+
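+		/*
+		 * Leaving both name and expr NULL marks this IndexElem as the
+		 * implicit partition-id key column; ComputeIndexAttrs() recognizes
+		 * this combination and maps it to partitionid_attr.
+		 */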
+		newparam->name = NULL;
+		newparam->expr = NULL;
+		stmt->indexParams = lappend(stmt->indexParams, newparam);
+	}
+
 	/*
 	 * count key attributes in index
 	 */
@@ -738,6 +771,11 @@ DefineIndex(Oid tableId,
 					 errmsg("cannot create index on partitioned table \"%s\" concurrently",
 							RelationGetRelationName(rel))));
 	}
+	else if (stmt->global)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot create global index on non-partitioned table \"%s\"",
+						RelationGetRelationName(rel))));
 
 	/*
 	 * Don't try to CREATE INDEX on temp tables of other backends.
@@ -832,7 +870,8 @@ DefineIndex(Oid tableId,
 											indexColNames,
 											stmt->excludeOpNames,
 											stmt->primary,
-											stmt->isconstraint);
+											stmt->isconstraint,
+											stmt->global);
 
 	/*
 	 * look up the access method, verify it can handle the requested features
@@ -891,6 +930,11 @@ DefineIndex(Oid tableId,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("access method \"%s\" does not support WITHOUT OVERLAPS constraints",
 						accessMethodName)));
+	if (stmt->global && strcmp(accessMethodName, "btree") != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("access method \"%s\" does not support global indexes",
+						accessMethodName)));
 
 	amcanorder = amRoutine->amcanorder;
 	amoptions = amRoutine->amoptions;
@@ -957,10 +1001,9 @@ DefineIndex(Oid tableId,
 	 * violate uniqueness by putting values that ought to be unique in
 	 * different partitions.
 	 *
-	 * We could lift this limitation if we had global indexes, but those have
-	 * their own problems, so this is a useful feature combination.
+	 * If we are creating a global index then we do not have this problem.
 	 */
-	if (partitioned && (stmt->unique || exclusion))
+	if (partitioned && !stmt->global && (stmt->unique || exclusion))
 	{
 		PartitionKey key = RelationGetPartitionKey(rel);
 		const char *constraint_type;
@@ -1110,7 +1153,7 @@ DefineIndex(Oid tableId,
 	{
 		AttrNumber	attno = indexInfo->ii_IndexAttrNumbers[i];
 
-		if (attno < 0)
+		if (attno < 0 && attno != PartitionIdAttributeNumber)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("index creation on system columns is not supported")));
@@ -1212,16 +1255,18 @@ DefineIndex(Oid tableId,
 	flags = constr_flags = 0;
 	if (stmt->isconstraint)
 		flags |= INDEX_CREATE_ADD_CONSTRAINT;
-	if (skip_build || concurrent || partitioned)
+	if (skip_build || concurrent || (partitioned && !stmt->global))
 		flags |= INDEX_CREATE_SKIP_BUILD;
 	if (stmt->if_not_exists)
 		flags |= INDEX_CREATE_IF_NOT_EXISTS;
 	if (concurrent)
 		flags |= INDEX_CREATE_CONCURRENT;
-	if (partitioned)
+	if (partitioned && !stmt->global)
 		flags |= INDEX_CREATE_PARTITIONED;
 	if (stmt->primary)
 		flags |= INDEX_CREATE_IS_PRIMARY;
+	if (stmt->global)
+		flags |= INDEX_CREATE_GLOBAL;
 
 	/*
 	 * If the table is partitioned, and recursion was declined but partitions
@@ -1251,7 +1296,7 @@ DefineIndex(Oid tableId,
 					 coloptions, NULL, reloptions,
 					 flags, constr_flags,
 					 allowSystemTableMods, !check_rights,
-					 &createdConstraintId);
+					 &createdConstraintId, inheritors);
 
 	ObjectAddressSet(address, RelationRelationId, indexRelationId);
 
@@ -1289,7 +1334,13 @@ DefineIndex(Oid tableId,
 		CreateComments(indexRelationId, RelationRelationId, 0,
 					   stmt->idxcomment);
 
-	if (partitioned)
+	/*
+	 * If the table is partitioned then create the index on each partition.
+	 * But if we are building a global index we don't need to create it on
+	 * each partition; there will be just one global index holding the data
+	 * from all the children.
+	 */
+	if (partitioned && !stmt->global)
 	{
 		PartitionDesc partdesc;
 
@@ -1523,6 +1574,7 @@ DefineIndex(Oid tableId,
 									InvalidOid, /* no predefined OID */
 									indexRelationId,	/* this is our child */
 									createdConstraintId,
+									NIL,
 									-1,
 									is_alter_table, check_rights,
 									check_not_in_use,
@@ -1935,9 +1987,21 @@ ComputeIndexAttrs(IndexInfo *indexInfo,
 		Oid			attcollation;
 
 		/*
-		 * Process the column-or-expression to be indexed.
+		 * Process the column-or-expression to be indexed.  For the partition
+		 * ID attribute, both name and expr are set to NULL, and we can
+		 * directly point to the predefined FormData_pg_attribute for the
+		 * partition id attribute.
 		 */
-		if (attribute->name != NULL)
+		if ((attribute->name == NULL) && (attribute->expr == NULL))
+		{
+			const FormData_pg_attribute *attform;
+
+			attform = &partitionid_attr;
+			indexInfo->ii_IndexAttrNumbers[attn] = attform->attnum;
+			atttype = attform->atttypid;
+			attcollation = attform->attcollation;
+		}
+		else if (attribute->name != NULL)
 		{
 			/* Simple index attribute */
 			HeapTuple	atttuple;
@@ -2673,7 +2737,7 @@ ChooseRelationName(const char *name1, const char *name2,
 static char *
 ChooseIndexName(const char *tabname, Oid namespaceId,
 				const List *colnames, const List *exclusionOpNames,
-				bool primary, bool isconstraint)
+				bool primary, bool isconstraint, bool global)
 {
 	char	   *indexname;
 
@@ -2737,6 +2801,9 @@ ChooseIndexNameAddition(const List *colnames)
 	{
 		const char *name = (const char *) lfirst(lc);
 
+		/* Skip the implicit partition-id key column of a global index. */
+		if (strcmp(name, "partid") == 0)
+			continue;
+
 		if (buflen > 0)
 			buf[buflen++] = '_';	/* insert _ between names */
 
@@ -2778,8 +2845,10 @@ ChooseIndexColumnNames(const List *indexElems)
 			origname = ielem->indexcolname; /* caller-specified name */
 		else if (ielem->name)
 			origname = ielem->name; /* simple column reference */
-		else
+		else if (ielem->expr)
 			origname = "expr";	/* default name for expression */
+		else
+			origname = "partid";	/* implicit partition-id column */
 
 		/* If it conflicts with any previous column, tweak it */
 		curname = origname;
@@ -2960,7 +3029,7 @@ ReindexIndex(const ReindexStmt *stmt, const ReindexParams *params, bool isTopLev
 		ReindexParams newparams = *params;
 
 		newparams.options |= REINDEXOPT_REPORT_PROGRESS;
-		reindex_index(stmt, indOid, false, persistence, &newparams);
+		reindex_index(stmt, indOid, false, persistence, &newparams, NULL);
 	}
 }
 
@@ -3010,7 +3079,8 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
 	if (!relkind)
 		return;
 	if (relkind != RELKIND_INDEX &&
-		relkind != RELKIND_PARTITIONED_INDEX)
+		relkind != RELKIND_PARTITIONED_INDEX &&
+		relkind != RELKIND_GLOBAL_INDEX)
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 				 errmsg("\"%s\" is not an index", relation->relname)));
@@ -3353,6 +3423,7 @@ ReindexPartitions(const ReindexStmt *stmt, Oid relid, const ReindexParams *param
 	char	   *relnamespace = get_namespace_name(get_rel_namespace(relid));
 	MemoryContext reindex_context;
 	List	   *inhoids;
+	List	   *parentreloids = NIL;
 	ListCell   *lc;
 	ErrorContextCallback errcallback;
 	ReindexErrorInfo errinfo;
@@ -3403,10 +3474,16 @@ ReindexPartitions(const ReindexStmt *stmt, Oid relid, const ReindexParams *param
 
 		/*
 		 * This discards partitioned tables, partitioned indexes and foreign
-		 * tables.
+		 * tables.  However, we remember the OIDs of the partitioned tables
+		 * and reindex them later, as they can also have global indexes.
 		 */
 		if (!RELKIND_HAS_STORAGE(partkind))
+		{
+			if (partkind == RELKIND_PARTITIONED_TABLE)
+				parentreloids = lappend_oid(parentreloids, partoid);
+
 			continue;
+		}
 
 		Assert(partkind == RELKIND_INDEX ||
 			   partkind == RELKIND_RELATION);
@@ -3423,6 +3500,9 @@ ReindexPartitions(const ReindexStmt *stmt, Oid relid, const ReindexParams *param
 	 */
 	ReindexMultipleInternal(stmt, partitions, params);
 
+	/* Reindex the global indexes. */
+	ReindexPartitionedRelation(parentreloids, params);
+
 	/*
 	 * Clean up working storage --- note we must do this after
 	 * StartTransactionCommand, else we might be trying to delete the active
@@ -3431,6 +3511,78 @@ ReindexPartitions(const ReindexStmt *stmt, Oid relid, const ReindexParams *param
 	MemoryContextDelete(reindex_context);
 }
 
+/*
+ * ReindexPartitionedRelation
+ *
+ * Reindex the list of partitioned relations.  Partitioned relations can have
+ * global indexes, so this reindexes the global indexes defined directly on
+ * each partitioned relation.
+ */
+static void
+ReindexPartitionedRelation(List *reloids, const ReindexParams *params)
+{
+	Relation	rel;
+
+	foreach_oid(relid, reloids)
+	{
+		List *indexoids;
+
+		/*
+		 * Open and lock the relation.  ShareLock is sufficient since we only
+		 * need to prevent schema and data changes in it.  The lock level used
+		 * here should match ReindexTable().
+		 */
+		if ((params->options & REINDEXOPT_MISSING_OK) != 0)
+			rel = try_table_open(relid, ShareLock);
+		else
+			rel = table_open(relid, ShareLock);
+
+		/* if relation is gone, leave */
+		if (!rel)
+			continue;
+
+	/* Only partitioned tables should get here. */
+		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+
+	/* If the relation doesn't have a global index, skip it. */
+		if (!rel->rd_rel->relhasglobalindex)
+		{
+			/* Close rel, but continue to hold the lock. */
+			table_close(rel, NoLock);
+			continue;
+		}
+
+		/*
+		 * Get the list of indexes of the relation.  Loop through all the
+		 * indexes and reindex the indexes directly defined on the relation.
+		 */
+		indexoids = RelationGetIndexList(rel);
+		foreach_oid(indexoid, indexoids)
+		{
+			Oid	heapId = IndexGetRelation(indexoid,
+										  (params->options &
+										   REINDEXOPT_MISSING_OK) != 0);
+			ReindexParams newparams = *params;
+
+			/*
+			 * If the relation is missing, or the index is not defined
+			 * directly on this relation, skip it; we do not want to reindex
+			 * global indexes defined on a parent relation.
+			 */
+			if (!OidIsValid(heapId) || heapId != relid)
+				continue;
+
+			/* Partitioned relation can have only global indexes. */
+			Assert(get_rel_relkind(indexoid) == RELKIND_GLOBAL_INDEX);
+			reindex_index(NULL, indexoid, false, rel->rd_rel->relpersistence,
+						  &newparams, rel);
+		}
+
+		/* Close rel, but continue to hold the lock. */
+		table_close(rel, NoLock);
+	}
+}
+
 /*
  * ReindexMultipleInternal
  *
@@ -3493,7 +3645,8 @@ ReindexMultipleInternal(const ReindexStmt *stmt, const List *relids, const Reind
 		Assert(!RELKIND_HAS_PARTITIONS(relkind));
 
 		if ((params->options & REINDEXOPT_CONCURRENTLY) != 0 &&
-			relpersistence != RELPERSISTENCE_TEMP)
+			relpersistence != RELPERSISTENCE_TEMP &&
+			!get_rel_has_globalindex(relid))
 		{
 			ReindexParams newparams = *params;
 
@@ -3509,7 +3662,7 @@ ReindexMultipleInternal(const ReindexStmt *stmt, const List *relids, const Reind
 
 			newparams.options |=
 				REINDEXOPT_REPORT_PROGRESS | REINDEXOPT_MISSING_OK;
-			reindex_index(stmt, relid, false, relpersistence, &newparams);
+			reindex_index(stmt, relid, false, relpersistence, &newparams, NULL);
 			PopActiveSnapshot();
 			/* reindex_index() does the verbose output */
 		}
@@ -3838,6 +3991,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 
 		case RELKIND_PARTITIONED_TABLE:
 		case RELKIND_PARTITIONED_INDEX:
+		case RELKIND_GLOBAL_INDEX:
 		default:
 			/* Return error if type of relation is not supported */
 			ereport(ERROR,
@@ -4451,7 +4605,8 @@ IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
 
 	/* Make sure this is an index */
 	Assert(partitionIdx->rd_rel->relkind == RELKIND_INDEX ||
-		   partitionIdx->rd_rel->relkind == RELKIND_PARTITIONED_INDEX);
+		   partitionIdx->rd_rel->relkind == RELKIND_PARTITIONED_INDEX ||
+		   partitionIdx->rd_rel->relkind == RELKIND_GLOBAL_INDEX);
 
 	/*
 	 * Scan pg_inherits for rows linking our index to some parent.
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b8837f26cb..875163aa28 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -101,6 +101,7 @@
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/partcache.h"
+#include "utils/rel.h"
 #include "utils/relcache.h"
 #include "utils/ruleutils.h"
 #include "utils/snapmgr.h"
@@ -207,6 +208,9 @@ typedef struct AlteredTableInfo
 	char	   *clusterOnIndex; /* index to use for CLUSTER */
 	List	   *changedStatisticsOids;	/* OIDs of statistics to rebuild */
 	List	   *changedStatisticsDefs;	/* string definitions of same */
+	List	   *globalindexoids; /* OIDs of the global indexes from ancestors */
+	List	   *partids; /* Partition IDs, one for each global index OID
+							in globalindexoids */
 } AlteredTableInfo;
 
 /* Struct describing one new constraint to check in Phase 3 scan */
@@ -307,6 +311,12 @@ static const struct dropmsgstrings dropmsgstringarray[] = {
 		gettext_noop("index \"%s\" does not exist, skipping"),
 		gettext_noop("\"%s\" is not an index"),
 	gettext_noop("Use DROP INDEX to remove an index.")},
+	{RELKIND_GLOBAL_INDEX,
+		ERRCODE_UNDEFINED_OBJECT,
+		gettext_noop("index \"%s\" does not exist"),
+		gettext_noop("index \"%s\" does not exist, skipping"),
+		gettext_noop("\"%s\" is not an index"),
+	gettext_noop("Use DROP INDEX to remove an index.")},
 	{'\0', 0, NULL, NULL, NULL, NULL}
 };
 
@@ -739,6 +749,7 @@ static List *GetParentedForeignKeyRefs(Relation partition);
 static void ATDetachCheckNoForeignKeyRefs(Relation partition);
 static char GetAttributeCompression(Oid atttypid, const char *compression);
 static char GetAttributeStorage(Oid atttypid, const char *storagemode);
+static void LockPartitionsForGlobalIndex(Relation rel, LOCKMODE lockmode);
 
 
 /* ----------------------------------------------------------------
@@ -1277,13 +1288,15 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 
 			if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
 			{
-				if (idxRel->rd_index->indisunique)
+				if (idxRel->rd_index->indisunique ||
+					RelationIsGlobalIndex(idxRel))
 					ereport(ERROR,
 							(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 							 errmsg("cannot create foreign partition of partitioned table \"%s\"",
 									RelationGetRelationName(parent)),
-							 errdetail("Table \"%s\" contains indexes that are unique.",
-									   RelationGetRelationName(parent))));
+							 errdetail("Table \"%s\" contains indexes that are %s.",
+									   RelationGetRelationName(parent),
+									   idxRel->rd_index->indisunique ? "unique" : "global")));
 				else
 				{
 					index_close(idxRel, AccessShareLock);
@@ -1291,6 +1304,26 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 				}
 			}
 
+			/*
+			 * A global index exists only on the partitioned table on which
+			 * it is created, so we don't need to copy it to the child
+			 * relation.  However, we do need to attach this partition to the
+			 * global index, which internally assigns a partition ID and
+			 * inserts the mapping into the pg_index_partitions table.  We
+			 * also update the stats to record that the relation has an index.
+			 */
+			if (RelationIsGlobalIndex(idxRel))
+			{
+				List *inheritor = list_make1_oid(relationId);
+
+				AttachPartitionsToGlobalIndex(idxRel, inheritor);
+
+				/* Update the stats to record that the relation has an index. */
+				index_update_stats(rel, true, true, -1.0);
+				index_close(idxRel, AccessShareLock);
+				continue;
+			}
+
 			attmap = build_attrmap_by_name(RelationGetDescr(rel),
 										   RelationGetDescr(parent),
 										   false);
@@ -1302,6 +1335,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 						InvalidOid,
 						RelationGetRelid(idxRel),
 						constraintOid,
+						NIL,
 						-1,
 						false, false, false, false, false);
 
@@ -1647,24 +1681,28 @@ RemoveRelations(DropStmt *drop)
 		}
 
 		/*
-		 * Concurrent index drop cannot be used with partitioned indexes,
-		 * either.
+		 * Concurrent index drop cannot be used with partitioned indexes or
+		 * global indexes.
 		 */
 		if ((flags & PERFORM_DELETION_CONCURRENTLY) != 0 &&
-			state.actual_relkind == RELKIND_PARTITIONED_INDEX)
+			(state.actual_relkind == RELKIND_PARTITIONED_INDEX ||
+			 state.actual_relkind == RELKIND_GLOBAL_INDEX))
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot drop partitioned index \"%s\" concurrently",
-							rel->relname)));
+					 errmsg("cannot drop %s index \"%s\" concurrently",
+							(state.actual_relkind == RELKIND_GLOBAL_INDEX) ?
+							"global" : "partitioned", rel->relname)));
 
 		/*
-		 * If we're told to drop a partitioned index, we must acquire lock on
-		 * all the children of its parent partitioned table before proceeding.
-		 * Otherwise we'd try to lock the child index partitions before their
-		 * tables, leading to potential deadlock against other sessions that
-		 * will lock those objects in the other order.
+		 * If we're told to drop a partitioned index or a global index, we must
+		 * acquire lock on all the children of its parent partitioned table
+		 * before proceeding.  Otherwise we'd try to lock the child index
+		 * partitions before their tables, leading to potential deadlock
+		 * against other sessions that will lock those objects in the other
+		 * order.
 		 */
-		if (state.actual_relkind == RELKIND_PARTITIONED_INDEX)
+		if (state.actual_relkind == RELKIND_PARTITIONED_INDEX ||
+			state.actual_relkind == RELKIND_GLOBAL_INDEX)
 			(void) find_all_inheritors(state.heapOid,
 									   state.heap_lockmode,
 									   NULL);
@@ -1751,6 +1789,8 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
 		expected_relkind = RELKIND_RELATION;
 	else if (classform->relkind == RELKIND_PARTITIONED_INDEX)
 		expected_relkind = RELKIND_INDEX;
+	else if (classform->relkind == RELKIND_GLOBAL_INDEX)
+		expected_relkind = RELKIND_INDEX;
 	else
 		expected_relkind = classform->relkind;
 
@@ -1877,6 +1917,12 @@ ExecuteTruncate(TruncateStmt *stmt)
 		/* open the relation, we already hold a lock on it */
 		rel = table_open(myrelid, NoLock);
 
+		/*
+		 * Lock the top-level parent of each global index covering this
+		 * relation, along with all its inheritors.
+		 */
+		LockPartitionsForGlobalIndex(rel, lockmode);
+
 		/*
 		 * RangeVarGetRelidExtended() has done most checks with its callback,
 		 * but other checks with the now-opened Relation remain.
@@ -1980,6 +2026,7 @@ ExecuteTruncateGuts(List *explicit_rels,
 {
 	List	   *rels;
 	List	   *seq_relids = NIL;
+	List	   *index_oids = NIL;
 	HTAB	   *ft_htab = NULL;
 	EState	   *estate;
 	ResultRelInfo *resultRelInfos;
@@ -2043,6 +2090,28 @@ ExecuteTruncateGuts(List *explicit_rels,
 		heap_truncate_check_FKs(rels, false);
 #endif
 
+	/*
+	 * Process the list of relations and collect all the index OIDs.  This is
+	 * required so that, after truncating all the relations, we can reindex
+	 * each global index just once.
+	 */
+	foreach(cell, rels)
+	{
+		Relation	rel = (Relation) lfirst(cell);
+		List	   *oids;
+
+		if (!rel->rd_rel->relhasglobalindex)
+			continue;
+		oids = RelationGetIndexList(rel);
+
+		/*
+		 * We need to use a unique concatenation, as there may be duplicate
+		 * indexes across different partitions. This can happen if multiple
+		 * partitions inherit the same global indexes from a common ancestor.
+		 */
+		index_oids = list_concat_unique_oid(index_oids, oids);
+	}
+
 	/*
 	 * If we are asked to restart sequences, find all the sequences, lock them
 	 * (we need AccessExclusiveLock for ResetSequence), and check permissions.
@@ -2243,6 +2312,18 @@ ExecuteTruncateGuts(List *explicit_rels,
 		pgstat_count_truncate(rel);
 	}
 
+	/* Reindex global indexes */
+	foreach_oid(indexoid, index_oids)
+	{
+		ReindexParams reindex_params = {0};
+
+		if (get_rel_relkind(indexoid) != RELKIND_GLOBAL_INDEX)
+			continue;
+
+		reindex_index(NULL, indexoid, false, get_rel_persistence(indexoid),
+					  &reindex_params, NULL);
+	}
+
 	/* Now go through the hash table, and truncate foreign tables */
 	if (ft_htab)
 	{
@@ -3804,6 +3885,7 @@ renameatt_check(Oid myrelid, Form_pg_class classform, bool recursing)
 		relkind != RELKIND_COMPOSITE_TYPE &&
 		relkind != RELKIND_INDEX &&
 		relkind != RELKIND_PARTITIONED_INDEX &&
+		relkind != RELKIND_GLOBAL_INDEX &&
 		relkind != RELKIND_FOREIGN_TABLE &&
 		relkind != RELKIND_PARTITIONED_TABLE)
 		ereport(ERROR,
@@ -4237,7 +4319,8 @@ RenameRelation(RenameStmt *stmt)
 		 */
 		relkind = get_rel_relkind(relid);
 		obj_is_index = (relkind == RELKIND_INDEX ||
-						relkind == RELKIND_PARTITIONED_INDEX);
+						relkind == RELKIND_PARTITIONED_INDEX ||
+						relkind == RELKIND_GLOBAL_INDEX);
 		if (obj_is_index || is_index_stmt == obj_is_index)
 			break;
 
@@ -4304,7 +4387,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 */
 	Assert(!is_index ||
 		   is_index == (targetrelation->rd_rel->relkind == RELKIND_INDEX ||
-						targetrelation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX));
+						targetrelation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX ||
+						targetrelation->rd_rel->relkind == RELKIND_GLOBAL_INDEX));
 
 	/*
 	 * Update pg_class tuple with new relname.  (Scribbling on reltup is OK
@@ -4332,7 +4416,8 @@ RenameRelationInternal(Oid myrelid, const char *newrelname, bool is_internal, bo
 	 * Also rename the associated constraint, if any.
 	 */
 	if (targetrelation->rd_rel->relkind == RELKIND_INDEX ||
-		targetrelation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
+		targetrelation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX ||
+		targetrelation->rd_rel->relkind == RELKIND_GLOBAL_INDEX)
 	{
 		Oid			constraintId = get_index_constraint(myrelid);
 
@@ -4417,6 +4502,7 @@ CheckTableNotInUse(Relation rel, const char *stmt)
 
 	if (rel->rd_rel->relkind != RELKIND_INDEX &&
 		rel->rd_rel->relkind != RELKIND_PARTITIONED_INDEX &&
+		rel->rd_rel->relkind != RELKIND_GLOBAL_INDEX &&
 		AfterTriggerPendingOnRel(RelationGetRelid(rel)))
 		ereport(ERROR,
 				(errcode(ERRCODE_OBJECT_IN_USE),
@@ -6038,6 +6124,28 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode,
 		}
 	}
 
+	/*
+	 * Rebuild global indexes.  Each AlteredTableInfo contains the list of
+	 * global index OIDs of ancestors which need to be rebuilt after this
+	 * partition has been attached.
+	 */
+	foreach(ltab, *wqueue)
+	{
+		AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab);
+		ReindexParams reindex_params = {0};
+
+		if (tab->globalindexoids == NIL)
+			continue;
+
+		Assert(tab->relkind == RELKIND_PARTITIONED_TABLE);
+
+		foreach_oid(indexoid, tab->globalindexoids)
+		{
+			reindex_index(NULL, indexoid, false,
+						  get_rel_persistence(indexoid), &reindex_params, NULL);
+		}
+	}
+
 	/*
 	 * Foreign key constraints are checked in a final pass, since (a) it's
 	 * generally best to examine each one separately, and (b) it's at least
@@ -6745,6 +6853,7 @@ ATSimplePermissions(AlterTableType cmdtype, Relation rel, int allowed_targets)
 			actual_target = ATT_MATVIEW;
 			break;
 		case RELKIND_INDEX:
+		case RELKIND_GLOBAL_INDEX:
 			actual_target = ATT_INDEX;
 			break;
 		case RELKIND_PARTITIONED_INDEX:
@@ -8888,6 +8997,7 @@ ATExecSetStatistics(Relation rel, const char *colName, int16 colNum, Node *newVa
 	 */
 	if (rel->rd_rel->relkind != RELKIND_INDEX &&
 		rel->rd_rel->relkind != RELKIND_PARTITIONED_INDEX &&
+		rel->rd_rel->relkind != RELKIND_GLOBAL_INDEX &&
 		!colName)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -8967,7 +9077,8 @@ ATExecSetStatistics(Relation rel, const char *colName, int16 colNum, Node *newVa
 						colName)));
 
 	if (rel->rd_rel->relkind == RELKIND_INDEX ||
-		rel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
+		rel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX ||
+		rel->rd_rel->relkind == RELKIND_GLOBAL_INDEX)
 	{
 		if (attnum > rel->rd_index->indnkeyatts)
 			ereport(ERROR,
@@ -9588,6 +9699,7 @@ ATExecAddIndex(AlteredTableInfo *tab, Relation rel,
 	bool		check_rights;
 	bool		skip_build;
 	bool		quiet;
+	List	   *inheritors = NIL;
 	ObjectAddress address;
 
 	Assert(IsA(stmt, IndexStmt));
@@ -9603,11 +9715,15 @@ ATExecAddIndex(AlteredTableInfo *tab, Relation rel,
 	/* suppress notices when rebuilding existing index */
 	quiet = is_rebuild;
 
+	if (stmt->global)
+		inheritors = find_all_inheritors(RelationGetRelid(rel), NoLock, NULL);
+
 	address = DefineIndex(RelationGetRelid(rel),
 						  stmt,
 						  InvalidOid,	/* no predefined OID */
 						  InvalidOid,	/* no parent index */
 						  InvalidOid,	/* no parent constraint */
+						  inheritors,
 						  -1,	/* total_parts unknown */
 						  true, /* is_alter_table */
 						  check_rights,
@@ -9634,6 +9750,9 @@ ATExecAddIndex(AlteredTableInfo *tab, Relation rel,
 		index_close(irel, NoLock);
 	}
 
+	if (inheritors)
+		list_free(inheritors);
+
 	return address;
 }
 
@@ -15049,7 +15168,8 @@ RememberAllDependentForRebuilding(AlteredTableInfo *tab, AlterTableType subtype,
 					char		relKind = get_rel_relkind(foundObject.objectId);
 
 					if (relKind == RELKIND_INDEX ||
-						relKind == RELKIND_PARTITIONED_INDEX)
+						relKind == RELKIND_PARTITIONED_INDEX ||
+						relKind == RELKIND_GLOBAL_INDEX)
 					{
 						Assert(foundObject.objectSubId == 0);
 						RememberIndexForRebuilding(foundObject.objectId, tab);
@@ -16197,6 +16317,7 @@ ATExecChangeOwner(Oid relationOid, Oid newOwnerId, bool recursing, LOCKMODE lock
 		if (tuple_class->relkind != RELKIND_COMPOSITE_TYPE &&
 			tuple_class->relkind != RELKIND_INDEX &&
 			tuple_class->relkind != RELKIND_PARTITIONED_INDEX &&
+			tuple_class->relkind != RELKIND_GLOBAL_INDEX &&
 			tuple_class->relkind != RELKIND_TOASTVALUE)
 			changeDependencyOnOwner(RelationRelationId, relationOid,
 									newOwnerId);
@@ -16648,6 +16769,7 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 			break;
 		case RELKIND_INDEX:
 		case RELKIND_PARTITIONED_INDEX:
+		case RELKIND_GLOBAL_INDEX:
 			(void) index_reloptions(rel->rd_indam->amoptions, newOptions, true);
 			break;
 		case RELKIND_TOASTVALUE:
@@ -16839,7 +16961,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 	newrlocator.spcOid = newTableSpace;
 
 	/* hand off to AM to actually create new rel storage and copy the data */
-	if (rel->rd_rel->relkind == RELKIND_INDEX)
+	if (rel->rd_rel->relkind == RELKIND_INDEX ||
+		rel->rd_rel->relkind == RELKIND_GLOBAL_INDEX)
 	{
 		index_copy_data(rel, newrlocator);
 	}
@@ -17022,7 +17145,8 @@ AlterTableMoveAll(AlterTableMoveAllStmt *stmt)
 			 relForm->relkind != RELKIND_PARTITIONED_TABLE) ||
 			(stmt->objtype == OBJECT_INDEX &&
 			 relForm->relkind != RELKIND_INDEX &&
-			 relForm->relkind != RELKIND_PARTITIONED_INDEX) ||
+			 relForm->relkind != RELKIND_PARTITIONED_INDEX &&
+			 relForm->relkind != RELKIND_GLOBAL_INDEX) ||
 			(stmt->objtype == OBJECT_MATVIEW &&
 			 relForm->relkind != RELKIND_MATVIEW))
 			continue;
@@ -19612,8 +19736,8 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,
 				 errmsg("\"%s\" is not a composite type", rv->relname)));
 
 	if (reltype == OBJECT_INDEX && relkind != RELKIND_INDEX &&
-		relkind != RELKIND_PARTITIONED_INDEX
-		&& !IsA(stmt, RenameStmt))
+		relkind != RELKIND_PARTITIONED_INDEX &&
+		relkind != RELKIND_GLOBAL_INDEX && !IsA(stmt, RenameStmt))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 				 errmsg("\"%s\" is not an index", rv->relname)));
@@ -19636,7 +19760,8 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,
 	 */
 	if (IsA(stmt, AlterObjectSchemaStmt))
 	{
-		if (relkind == RELKIND_INDEX || relkind == RELKIND_PARTITIONED_INDEX)
+		if (relkind == RELKIND_INDEX || relkind == RELKIND_PARTITIONED_INDEX ||
+			relkind == RELKIND_GLOBAL_INDEX)
 			ereport(ERROR,
 					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 					 errmsg("cannot change schema of index \"%s\"",
@@ -20174,6 +20299,160 @@ QueuePartitionConstraintValidation(List **wqueue, Relation scanrel,
 	}
 }
 
+/*
+ * AttachPartitionsToGlobalIndex - Attach relation OIDs to the global index
+ *
+ * This creates the mapping from each of the input 'reloids' to the global
+ * index pointed to by 'irel'.
+ */
+void
+AttachPartitionsToGlobalIndex(Relation irel, List *reloids)
+{
+	/*
+	 * Loop through each relation OID and attach it to the global index.
+	 */
+	foreach_oid(childoid, reloids)
+	{
+		/* Caller should be holding the lock for all the children. */
+		Relation	childrel = table_open(childoid, NoLock);
+		PartitionId	partid;
+
+		/*
+		 * Allocate the partition ID for this partition with respect to the
+		 * global index and insert the mapping into the index partitions
+		 * table.
+		 */
+		partid = IndexGetNextPartitionID(irel);
+		InsertIndexPartitionEntry(irel, childoid, partid);
+
+		table_close(childrel, NoLock);
+	}
+}
+
+/*
+ * AttachToGlobalIndexes - Attach base relation(s) to ancestor's global indexes
+ *
+ * This function creates the mapping for the attached relation to all the
+ * global indexes present on the partitioned table we are attaching to, as well
+ * as all its ancestors. The process involves assigning a partition ID for each
+ * global index on the ancestors and making an entry in the pg_index_partitions
+ * table.  If the relation being attached is also partitioned, the function
+ * recursively traverses all its children to create mappings for the base
+ * relations. Note that this mapping is required only for the base relations,
+ * as they are the ones that can contain tuples.
+ */
+static void
+AttachToGlobalIndexes(List **wqueue, Relation rel, List *reloids)
+{
+	List	   *indexoids;
+	List	   *globalindexoids = NIL;
+	bool		hasglobalindex = false;
+	AlteredTableInfo   *tab;
+
+	/*
+	 * Retrieve the list of all indexes from the parent to which we are
+	 * attaching.  The parent relation's index list will also include all the
+	 * global indexes of its ancestors.
+	 */
+	indexoids = RelationGetIndexList(rel);
+
+	/* Quick exit if there is no index on the parent relation. */
+	if (indexoids == NIL)
+		return;
+
+	/*
+	 * Loop through each index OID and, if it is a global index, create the
+	 * mapping for all the reloids.
+	 */
+	foreach_oid(indexoid, indexoids)
+	{
+		Relation	irel = index_open(indexoid, RowExclusiveLock);
+
+		/* We don't need to do anything if this is not a global index. */
+		if (!RelationIsGlobalIndex(irel))
+		{
+			index_close(irel, RowExclusiveLock);
+			continue;
+		}
+
+		globalindexoids = lappend_oid(globalindexoids, indexoid);
+
+		/* Flag to indicate that we have at least one global index. */
+		hasglobalindex = true;
+
+		/* Attach reloids to the global index. */
+		AttachPartitionsToGlobalIndex(irel, reloids);
+
+		/*
+		 * Invalidate the relcache entry of the global index so that it gets
+		 * rebuilt and the newly attached partitions are reflected in the
+		 * cache.
+		 */
+		CacheInvalidateRelcache(irel);
+
+		/* Close the index relation; keep the lock till end of transaction. */
+		index_close(irel, NoLock);
+	}
+
+	/*
+	 * Loop through each partition and update the stats to record that the
+	 * relation has an index.
+	 */
+	if (hasglobalindex)
+	{
+		foreach_oid(childoid, reloids)
+		{
+			Relation	childrel;
+
+			/* Lock already held by caller. */
+			childrel = table_open(childoid, NoLock);
+			index_update_stats(childrel, true, true, -1.0);
+			table_close(childrel, NoLock);
+		}
+	}
+
+	tab = ATGetQueueEntry(wqueue, rel);
+	tab->globalindexoids = globalindexoids;
+
+	/* Free the indexoids list memory. */
+	list_free(indexoids);
+}
+
+/*
+ * DetachFromGlobalIndexes - Detach reloids from all global indexes
+ *
+ * Invalidate the mapping in pg_index_partitions for the input 'indexoids'
+ * and 'reloids'.
+ */
+void
+DetachFromGlobalIndexes(List *indexoids, List *reloids)
+{
+	foreach_oid(indexoid, indexoids)
+	{
+		Relation	irel = index_open(indexoid, AccessExclusiveLock);
+
+		/*
+		 * There will not be any mapping if this is not a global index so
+		 * There will not be any mapping if this is not a global index, so
+		 * continue with the next entry.
+		if (!RelationIsGlobalIndex(irel))
+		{
+			index_close(irel, AccessExclusiveLock);
+			continue;
+		}
+
+		/* Invalidate the mapping for the global index to reloids. */
+		InvalidateIndexPartitionEntries(reloids, indexoid);
+
+		/*
+		 * Invalidate the relcache entry of the index so that the removed
+		 * mapping is reflected in the cache.
+		 */
+		CacheInvalidateRelcache(irel);
+		index_close(irel, AccessExclusiveLock);
+	}
+}
+
 /*
  * ALTER TABLE <name> ATTACH PARTITION <partition-name> FOR VALUES
  *
@@ -20200,6 +20479,12 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd,
 
 	pstate->p_sourcetext = context->queryString;
 
+	/*
+	 * If the relation to which we are attaching is marked as having a global
+	 * index, lock all the inheritors covered by the global index.
+	 */
+	LockPartitionsForGlobalIndex(rel, AccessExclusiveLock);
+
 	/*
 	 * We must lock the default partition if one exists, because attaching a
 	 * new partition will change its partition constraint.
@@ -20472,6 +20757,9 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd,
 
 	ObjectAddressSet(address, RelationRelationId, RelationGetRelid(attachrel));
 
+	/* Attach partition to the global indexes. */
+	AttachToGlobalIndexes(wqueue, rel, attachrel_children);
+
 	/*
 	 * If the partition we just attached is partitioned itself, invalidate
 	 * relcache for all descendent partitions too to ensure that their
@@ -20546,14 +20834,16 @@ AttachPartitionEnsureIndexes(List **wqueue, Relation rel, Relation attachrel)
 			Relation	idxRel = index_open(idx, AccessShareLock);
 
 			if (idxRel->rd_index->indisunique ||
-				idxRel->rd_index->indisprimary)
+				idxRel->rd_index->indisprimary ||
+				RelationIsGlobalIndex(idxRel))
 				ereport(ERROR,
 						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 						 errmsg("cannot attach foreign table \"%s\" as partition of partitioned table \"%s\"",
 								RelationGetRelationName(attachrel),
 								RelationGetRelationName(rel)),
-						 errdetail("Partitioned table \"%s\" contains unique indexes.",
-								   RelationGetRelationName(rel))));
+						 errdetail("Partitioned table \"%s\" contains %s indexes.",
+								   RelationGetRelationName(rel),
+								   RelationIsGlobalIndex(idxRel) ? "global" : "unique")));
 			index_close(idxRel, AccessShareLock);
 		}
 
@@ -20664,6 +20954,7 @@ AttachPartitionEnsureIndexes(List **wqueue, Relation rel, Relation attachrel)
 			DefineIndex(RelationGetRelid(attachrel), stmt, InvalidOid,
 						RelationGetRelid(idxRel),
 						conOid,
+						NIL,
 						-1,
 						true, false, false, false, false);
 		}
@@ -20879,6 +21170,15 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		LockRelationOid(defaultPartOid, AccessExclusiveLock);
 	}
 
+	/*
+	 * Lock the top-level parent of each global index covering this relation,
+	 * along with all its inheritors.
+	 *
+	 * XXX do we need AccessExclusiveLock on all the tables under the global
+	 * index or only on the partitions which are being detached?
+	 */
+	LockPartitionsForGlobalIndex(rel, AccessExclusiveLock);
+
 	/*
 	 * In concurrent mode, the partition is locked with share-update-exclusive
 	 * in the first transaction.  This allows concurrent transactions to be
@@ -21035,6 +21335,7 @@ DetachPartitionFinalize(Relation rel, Relation partRel, bool concurrent,
 				newtuple;
 	Relation	trigrel = NULL;
 	List	   *fkoids = NIL;
+	List       *children = NIL;
 
 	if (concurrent)
 	{
@@ -21322,6 +21623,30 @@ DetachPartitionFinalize(Relation rel, Relation partRel, bool concurrent,
 			CacheInvalidateRelcacheByRelid(defaultPartOid);
 	}
 
+	/*
+	 * When detaching a table, we also need to detach it (i.e., invalidate
+	 * its mapping in the pg_index_partitions catalog) from all the global
+	 * indexes on the parent table and its ancestors.  If the table being
+	 * detached is itself partitioned, we must retrieve the list of all its
+	 * inheritors and detach all the leaf tables under this partition from
+	 * the global indexes of all ancestors.
+	 */
+	indexes = RelationGetIndexList(rel);
+	if (partRel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+	{
+		/*
+		 * All inheritors are already locked, so we don't need to lock them here.
+		 */
+		children = find_all_inheritors(RelationGetRelid(partRel),
+									   NoLock, NULL);
+	}
+	else
+		children = list_make1_oid(RelationGetRelid(partRel));
+
+	/* Detach the relation and its children from the ancestors' global indexes. */
+	DetachFromGlobalIndexes(indexes, children);
+	list_free(indexes);
+
 	/*
 	 * Invalidate the parent's relcache so that the partition is no longer
 	 * included in its partition descriptor.
@@ -21337,15 +21662,13 @@ DetachPartitionFinalize(Relation rel, Relation partRel, bool concurrent,
 	 */
 	if (partRel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 	{
-		List	   *children;
-
-		children = find_all_inheritors(RelationGetRelid(partRel),
-									   AccessExclusiveLock, NULL);
 		foreach(cell, children)
 		{
 			CacheInvalidateRelcacheByRelid(lfirst_oid(cell));
 		}
 	}
+
+	list_free(children);
 }
 
 /*
@@ -21541,7 +21864,8 @@ RangeVarCallbackForAttachIndex(const RangeVar *rv, Oid relOid, Oid oldRelOid,
 		return;					/* concurrently dropped, so nothing to do */
 	classform = (Form_pg_class) GETSTRUCT(tuple);
 	if (classform->relkind != RELKIND_PARTITIONED_INDEX &&
-		classform->relkind != RELKIND_INDEX)
+		classform->relkind != RELKIND_INDEX &&
+		classform->relkind != RELKIND_GLOBAL_INDEX)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 				 errmsg("\"%s\" is not an index", rv->relname)));
@@ -22040,3 +22364,45 @@ GetAttributeStorage(Oid atttypid, const char *storagemode)
 
 	return cstorage;
 }
+
+/*
+ * Lock the top-level parent of each global index covering the relation,
+ * along with all its inheritors.
+ */
+static void
+LockPartitionsForGlobalIndex(Relation rel, LOCKMODE lockmode)
+{
+	List	   *indexoidlist;
+	List	   *parentreloids = NIL;
+
+	/* Nothing to do if relhasglobalindex is false. */
+	if (!rel->rd_rel->relhasglobalindex)
+		return;
+
+	/*
+	 * Loop through all the indexes; for each global index, get the OID of
+	 * the relation on which it is defined and lock that parent along with
+	 * all its inheritors.
+	 */
+	indexoidlist = RelationGetIndexList(rel);
+	foreach_oid(indexid, indexoidlist)
+	{
+		Relation	idxrel;
+		Oid			reloid;
+
+		idxrel = index_open(indexid, AccessShareLock);
+		if (!RelationIsGlobalIndex(idxrel))
+		{
+			index_close(idxrel, AccessShareLock);
+			continue;
+		}
+		reloid = idxrel->rd_index->indrelid;
+		index_close(idxrel, AccessShareLock);
+		if (list_member_oid(parentreloids, reloid))
+			continue;
+
+		parentreloids = lappend_oid(parentreloids, reloid);
+		LockRelationOid(reloid, lockmode);
+		(void) find_all_inheritors(reloid, lockmode, NULL);
+	}
+}
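
To illustrate the attach/detach flow above (the object names here are
made up, and this is only how I expect the POC to behave):

  CREATE TABLE t (a int, b int) PARTITION BY RANGE (a);
  CREATE INDEX g ON t (b) GLOBAL;
  CREATE TABLE t_p1 (LIKE t);
  -- attach assigns a fresh partition ID for t_p1 in pg_index_partitions
  -- and the global index is rebuilt to cover the newly attached rows
  ALTER TABLE t ATTACH PARTITION t_p1 FOR VALUES FROM (1) TO (100);
  -- detach only invalidates t_p1's mapping; g itself is not rewritten
  ALTER TABLE t DETACH PARTITION t_p1;
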
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 733ef40ae7..435ac0d850 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -35,6 +35,7 @@
 #include "access/transam.h"
 #include "access/xact.h"
 #include "catalog/namespace.h"
+#include "catalog/partition.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_inherits.h"
 #include "commands/cluster.h"
@@ -59,6 +60,7 @@
 #include "utils/injection_point.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
+#include "utils/lsyscache.h"
 #include "utils/syscache.h"
 
 /*
@@ -2365,6 +2367,10 @@ vac_open_indexes(Relation relation, LOCKMODE lockmode,
 
 	Assert(lockmode != NoLock);
 
+	/*
+	 * Get the list of all indexes, including the global indexes of all the
+	 * relation's ancestors.
+	 */
 	indexoidlist = RelationGetIndexList(relation);
 
 	/* allocate enough memory for all indexes */
@@ -2383,7 +2389,8 @@ vac_open_indexes(Relation relation, LOCKMODE lockmode,
 		Relation	indrel;
 
 		indrel = index_open(indexoid, lockmode);
-		if (indrel->rd_index->indisready)
+		if (indrel->rd_index->indisready &&
+			indrel->rd_index->indrelid == RelationGetRelid(relation))
 			(*Irel)[i++] = indrel;
 		else
 			index_close(indrel, lockmode);
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index ca33a85427..b543a5d683 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -107,13 +107,16 @@
 #include "postgres.h"
 
 #include "access/genam.h"
+#include "access/relation.h"
 #include "access/relscan.h"
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "catalog/index.h"
+#include "catalog/partition.h"
 #include "executor/executor.h"
 #include "nodes/nodeFuncs.h"
 #include "storage/lmgr.h"
+#include "utils/lsyscache.h"
 #include "utils/multirangetypes.h"
 #include "utils/rangetypes.h"
 #include "utils/snapmgr.h"
@@ -174,9 +177,10 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative)
 		return;
 
 	/*
-	 * Get cached list of index OIDs
+	 * Get the list of all indexes, including the global indexes of all the
+	 * relation's ancestors.
 	 */
 	indexoidlist = RelationGetIndexList(resultRelation);
 	len = list_length(indexoidlist);
 	if (len == 0)
 		return;
@@ -213,6 +218,14 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative)
 		/* extract index key information from the index's pg_index info */
 		ii = BuildIndexInfo(indexDesc);
 
+		/*
+		 * Fetch the partition ID of the relation for this global index.  For
+		 * more details, see the comments atop IndexInfo.
+		 */
+		if (RelationIsGlobalIndex(indexDesc))
+			ii->ii_partid = IndexGetRelationPartitionId(indexDesc,
+														RelationGetRelid(resultRelation));
+
 		/*
 		 * If the indexes are to be used for speculative insertion, add extra
 		 * information required by unique index entries.
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 59233b6473..c716f9a6fe 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -268,6 +268,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 				continue;
 			}
 
+			/*
+			 * TODO: Global index scan paths are not yet supported.
+			 */
+			if (RelationIsGlobalIndex(indexRelation))
+			{
+				index_close(indexRelation, NoLock);
+				continue;
+			}
+
 			/*
 			 * If the index is valid, but cannot yet be used, ignore it; but
 			 * mark the plan we are generating as transient. See
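
So with the POC as it stands, a query that could in principle use the
global index still gets only per-partition paths; roughly (table and
index names as in the illustrative example above):

  EXPLAIN SELECT * FROM t WHERE b = 42;
  -- no "Index Scan using g"; the planner falls back to scanning
  -- the individual partitions
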
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 50f53159d5..1c8db4fde9 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -487,7 +487,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <str>		unicode_normal_form
 
 %type <boolean> opt_instead
-%type <boolean> opt_unique opt_verbose opt_full
+%type <boolean> opt_unique opt_verbose opt_full opt_global
 %type <boolean> opt_freeze opt_analyze opt_default
 %type <defelt>	opt_binary copy_delimiter
 
@@ -8188,7 +8188,7 @@ defacl_privilege_target:
 
 IndexStmt:	CREATE opt_unique INDEX opt_concurrently opt_single_name
 			ON relation_expr access_method_clause '(' index_params ')'
-			opt_include opt_unique_null_treatment opt_reloptions OptTableSpace where_clause
+			opt_include opt_unique_null_treatment opt_reloptions opt_global OptTableSpace where_clause
 				{
 					IndexStmt *n = makeNode(IndexStmt);
 
@@ -8201,8 +8201,9 @@ IndexStmt:	CREATE opt_unique INDEX opt_concurrently opt_single_name
 					n->indexIncludingParams = $12;
 					n->nulls_not_distinct = !$13;
 					n->options = $14;
-					n->tableSpace = $15;
-					n->whereClause = $16;
+					n->global = $15;
+					n->tableSpace = $16;
+					n->whereClause = $17;
 					n->excludeOpNames = NIL;
 					n->idxcomment = NULL;
 					n->indexOid = InvalidOid;
@@ -8220,7 +8221,7 @@ IndexStmt:	CREATE opt_unique INDEX opt_concurrently opt_single_name
 				}
 			| CREATE opt_unique INDEX opt_concurrently IF_P NOT EXISTS name
 			ON relation_expr access_method_clause '(' index_params ')'
-			opt_include opt_unique_null_treatment opt_reloptions OptTableSpace where_clause
+			opt_include opt_unique_null_treatment opt_reloptions opt_global OptTableSpace where_clause
 				{
 					IndexStmt *n = makeNode(IndexStmt);
 
@@ -8233,8 +8234,9 @@ IndexStmt:	CREATE opt_unique INDEX opt_concurrently opt_single_name
 					n->indexIncludingParams = $15;
 					n->nulls_not_distinct = !$16;
 					n->options = $17;
-					n->tableSpace = $18;
-					n->whereClause = $19;
+					n->global = $18;
+					n->tableSpace = $19;
+					n->whereClause = $20;
 					n->excludeOpNames = NIL;
 					n->idxcomment = NULL;
 					n->indexOid = InvalidOid;
@@ -8257,6 +8259,11 @@ opt_unique:
 			| /*EMPTY*/								{ $$ = false; }
 		;
 
+opt_global:
+			GLOBAL									{ $$ = true; }
+			| /*EMPTY*/								{ $$ = false; }
+		;
+
 access_method_clause:
 			USING name								{ $$ = $2; }
 			| /*EMPTY*/								{ $$ = DEFAULT_INDEX_TYPE; }
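
For reference, the grammar accepts GLOBAL between the reloptions and the
tablespace clause, so the full form would look like this (names are
illustrative):

  CREATE UNIQUE INDEX g ON t (b)
      WITH (fillfactor = 90) GLOBAL TABLESPACE myspace;
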
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index afcf54169c..d354f44e66 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -1762,6 +1762,7 @@ generateClonedIndexStmt(RangeVar *heapRel, Relation source_idx,
 	index->unique = idxrec->indisunique;
 	index->nulls_not_distinct = idxrec->indnullsnotdistinct;
 	index->primary = idxrec->indisprimary;
+	index->global = (idxrelrec->relkind == RELKIND_GLOBAL_INDEX);
 	index->iswithoutoverlaps = (idxrec->indisprimary || idxrec->indisunique) && idxrec->indisexclusion;
 	index->transformed = true;	/* don't need transformIndexStmt */
 	index->concurrent = false;
@@ -1880,6 +1881,13 @@ generateClonedIndexStmt(RangeVar *heapRel, Relation source_idx,
 											   keyno);
 		int16		opt = source_idx->rd_indoption[keyno];
 
+		/*
+		 * We don't need to copy PartitionIdAttributeNumber as this will be
+		 * internally added by DefineIndex while creating a global index.
+		 */
+		if (attnum == PartitionIdAttributeNumber)
+			continue;
+
 		iparam = makeNode(IndexElem);
 
 		if (AttributeNumberIsValid(attnum))
@@ -2528,6 +2536,8 @@ transformIndexConstraint(Constraint *constraint, CreateStmtContext *cxt)
 				Assert(attnum <= heap_rel->rd_att->natts);
 				attform = TupleDescAttr(heap_rel->rd_att, attnum - 1);
 			}
+			else if (attnum == PartitionIdAttributeNumber)
+				continue;
 			else
 				attform = SystemAttributeDefinition(attnum);
 			attname = pstrdup(NameStr(attform->attname));
diff --git a/src/backend/statistics/stat_utils.c b/src/backend/statistics/stat_utils.c
index a9a3224efe..8564a19cdb 100644
--- a/src/backend/statistics/stat_utils.c
+++ b/src/backend/statistics/stat_utils.c
@@ -148,7 +148,12 @@ stats_lock_check_privileges(Oid reloid)
 	 */
 	switch (get_rel_relkind(reloid))
 	{
+		/*
+		 * FIXME: revalidate the correct lock type for global indexes and
+		 * update the comments in README.tuplock and other relevant places.
+		 */
 		case RELKIND_INDEX:
+		case RELKIND_GLOBAL_INDEX:
 			index_oid = reloid;
 			table_oid = IndexGetRelation(index_oid, false);
 			index_lockmode = AccessShareLock;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 25fe3d5801..a3a84c57f0 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1459,6 +1459,7 @@ ProcessUtilitySlow(ParseState *pstate,
 					LOCKMODE	lockmode;
 					int			nparts = -1;
 					bool		is_alter_table;
+					List	   *inheritors = NIL;
 
 					if (stmt->concurrent)
 						PreventInTransactionBlock(isTopLevel,
@@ -1496,7 +1497,6 @@ ProcessUtilitySlow(ParseState *pstate,
 						get_rel_relkind(relid) == RELKIND_PARTITIONED_TABLE)
 					{
 						ListCell   *lc;
-						List	   *inheritors = NIL;
 
 						inheritors = find_all_inheritors(relid, lockmode, NULL);
 						foreach(lc, inheritors)
@@ -1512,17 +1512,16 @@ ProcessUtilitySlow(ParseState *pstate,
 									 relkind, stmt->relation->relname);
 
 							if (relkind == RELKIND_FOREIGN_TABLE &&
-								(stmt->unique || stmt->primary))
+								(stmt->unique || stmt->primary || stmt->global))
 								ereport(ERROR,
 										(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-										 errmsg("cannot create unique index on partitioned table \"%s\"",
-												stmt->relation->relname),
+										 errmsg("cannot create %s index on partitioned table \"%s\"",
+												stmt->global ? "global" : "unique", stmt->relation->relname),
 										 errdetail("Table \"%s\" contains partitions that are foreign tables.",
 												   stmt->relation->relname)));
 						}
 						/* count direct and indirect children, but not rel */
 						nparts = list_length(inheritors) - 1;
-						list_free(inheritors);
 					}
 
 					/*
@@ -1547,6 +1546,7 @@ ProcessUtilitySlow(ParseState *pstate,
 									InvalidOid, /* no predefined OID */
 									InvalidOid, /* no parent index */
 									InvalidOid, /* no parent constraint */
+									inheritors, /* list of inheritor OIDs */
 									nparts, /* # of partitions, or -1 */
 									is_alter_table,
 									true,	/* check_rights */
@@ -1554,6 +1554,8 @@ ProcessUtilitySlow(ParseState *pstate,
 									false,	/* skip_build */
 									false); /* quiet */
 
+					list_free(inheritors);
+
 					/*
 					 * Add the CREATE INDEX node itself to stash right away;
 					 * if there were any commands stashed in the ALTER TABLE
diff --git a/src/backend/utils/adt/amutils.c b/src/backend/utils/adt/amutils.c
index 0af26d6acf..54b3a0d6b9 100644
--- a/src/backend/utils/adt/amutils.c
+++ b/src/backend/utils/adt/amutils.c
@@ -173,7 +173,8 @@ indexam_property(FunctionCallInfo fcinfo,
 			PG_RETURN_NULL();
 		rd_rel = (Form_pg_class) GETSTRUCT(tuple);
 		if (rd_rel->relkind != RELKIND_INDEX &&
-			rd_rel->relkind != RELKIND_PARTITIONED_INDEX)
+			rd_rel->relkind != RELKIND_PARTITIONED_INDEX &&
+			rd_rel->relkind != RELKIND_GLOBAL_INDEX)
 		{
 			ReleaseSysCache(tuple);
 			PG_RETURN_NULL();
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 3d6e6bdbfd..92ef04533c 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -1414,6 +1414,10 @@ pg_get_indexdef_worker(Oid indexrelid, int colno,
 			sep = "";
 		}
 
+		/* Ignore internal PartitionIdAttributeNumber. */
+		if (attnum == PartitionIdAttributeNumber)
+			continue;
+
 		if (!colno)
 			appendStringInfoString(&buf, sep);
 		sep = ", ";
@@ -1514,6 +1519,12 @@ pg_get_indexdef_worker(Oid indexrelid, int colno,
 		if (idxrec->indnullsnotdistinct)
 			appendStringInfoString(&buf, " NULLS NOT DISTINCT");
 
+		/*
+		 * If this is a global index, append "GLOBAL"
+		 */
+		if (idxrelrec->relkind == RELKIND_GLOBAL_INDEX)
+			appendStringInfoString(&buf, " GLOBAL");
+
 		/*
 		 * If it has options, append "WITH (options)"
 		 */
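
With this, the deparsed definition carries the trailing GLOBAL while the
internal partid key column is skipped; the output shape would be roughly
(illustrative, for a global index g on t(b)):

  SELECT pg_get_indexdef('g'::regclass);
  -- CREATE INDEX g ON t USING btree (b) GLOBAL
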
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index c460a72b75..ac99d8e608 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2253,6 +2253,27 @@ get_rel_relam(Oid relid)
 	return result;
 }
 
+/*
+ * get_rel_has_globalindex
+ *
+ *		Returns whether the relation has (or has had) a global index.
+ */
+bool
+get_rel_has_globalindex(Oid relid)
+{
+	HeapTuple	tp;
+	Form_pg_class reltup;
+	bool		result;
+
+	tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for relation %u", relid);
+	reltup = (Form_pg_class) GETSTRUCT(tp);
+	result = reltup->relhasglobalindex;
+	ReleaseSysCache(tp);
+
+	return result;
+}
 
 /*				---------- TRANSFORM CACHE ----------						 */
 
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 559ba9cdb2..485c3fd223 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -33,6 +33,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/parallel.h"
+#include "access/relation.h"
 #include "access/reloptions.h"
 #include "access/sysattr.h"
 #include "access/table.h"
@@ -51,6 +52,7 @@
 #include "catalog/pg_authid.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_database.h"
+#include "catalog/pg_inherits.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_opclass.h"
 #include "catalog/pg_proc.h"
@@ -487,6 +489,7 @@ RelationParseRelOptions(Relation relation, HeapTuple tuple)
 			break;
 		case RELKIND_INDEX:
 		case RELKIND_PARTITIONED_INDEX:
+		case RELKIND_GLOBAL_INDEX:
 			amoptsfn = relation->rd_indam->amoptions;
 			break;
 		default:
@@ -1223,7 +1226,8 @@ retry:
 	 * initialize access method information
 	 */
 	if (relation->rd_rel->relkind == RELKIND_INDEX ||
-		relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
+		relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX ||
+		relation->rd_rel->relkind == RELKIND_GLOBAL_INDEX)
 		RelationInitIndexAccessInfo(relation);
 	else if (RELKIND_HAS_TABLE_AM(relation->rd_rel->relkind) ||
 			 relation->rd_rel->relkind == RELKIND_SEQUENCE)
@@ -1588,6 +1592,14 @@ RelationInitIndexAccessInfo(Relation relation)
 
 	(void) RelationGetIndexAttOptions(relation, false);
 
+	/*
+	 * If this is a global index, also build the cache for the partition ID
+	 * to relation OID mapping.  For more details about this mapping, see the
+	 * comments atop IndexPartitionInfoData.
+	 */
+	if (RelationIsGlobalIndex(relation))
+		BuildIndexPartitionInfo(relation, indexcxt);
+
 	/*
 	 * expressions, predicate, exclusion caches will be filled later
 	 */
@@ -2281,7 +2293,8 @@ RelationReloadIndexInfo(Relation relation)
 
 	/* Should be called only for invalidated, live indexes */
 	Assert((relation->rd_rel->relkind == RELKIND_INDEX ||
-			relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) &&
+			relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX ||
+			relation->rd_rel->relkind == RELKIND_GLOBAL_INDEX) &&
 		   !relation->rd_isvalid &&
 		   relation->rd_droppedSubid == InvalidSubTransactionId);
 
@@ -4926,6 +4939,18 @@ RelationGetIndexList(Relation relation)
 
 	systable_endscan(indscan);
 
+	/*
+	 * If this relation potentially has global indexes on itself or on any of
+	 * its ancestors, retrieve the list of all global indexes.
+	 */
+	if (RelationGetForm(relation)->relhasglobalindex)
+	{
+		List *globalindexoids = IndexPartitionRelidGetGlobalIndexOids(
+												RelationGetRelid(relation));
+
+		result = list_concat_unique_oid(result, globalindexoids);
+	}
+
 	table_close(indrel, AccessShareLock);
 
 	/* Sort the result list into OID order, per API spec. */
@@ -5339,7 +5364,8 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
 		return NULL;
 
 	/*
-	 * Get cached list of index OIDs. If we have to start over, we do so here.
+	 * Get the list of all indexes, including the global indexes of all the
+	 * relation's ancestors. If we have to start over, we do so here.
 	 */
 restart:
 	indexoidlist = RelationGetIndexList(relation);
@@ -5451,8 +5477,11 @@ restart:
 			 * Obviously, non-key columns couldn't be referenced by foreign
 			 * key or identity key. Hence we do not include them into
 			 * uindexattrs, pkindexattrs and idindexattrs bitmaps.
+			 *
+			 * Also ignore the partition ID attribute, as this is an internal
+			 * attribute added for global indexes.
 			 */
-			if (attrnum != 0)
+			if (attrnum != 0 && attrnum != PartitionIdAttributeNumber)
 			{
 				*attrs = bms_add_member(*attrs,
 										attrnum - FirstLowInvalidHeapAttributeNumber);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 1937997ea6..7d378facef 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -5688,6 +5688,7 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
 						 "\n-- For binary upgrade, must preserve pg_class oids and relfilenodes\n");
 
 	if (entry->relkind != RELKIND_INDEX &&
+		entry->relkind != RELKIND_GLOBAL_INDEX &&
 		entry->relkind != RELKIND_PARTITIONED_INDEX)
 	{
 		appendPQExpBuffer(upgrade_buffer,
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index dd25d2fe7b..778ec2815c 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1989,7 +1989,12 @@ describeOneTableDetails(const char *schemaname,
 	}
 
 	appendPQExpBufferStr(&buf, "\nFROM pg_catalog.pg_attribute a");
-	appendPQExpBuffer(&buf, "\nWHERE a.attrelid = '%s' AND a.attnum > 0 AND NOT a.attisdropped", oid);
+
+	/*
+	 * FIXME: the partid column should be hidden only for global indexes.
+	 * Also, should we restrict the use of "partid" as a column name?
+	 */
+	appendPQExpBuffer(&buf, "\nWHERE a.attrelid = '%s' AND a.attnum > 0 AND NOT a.attisdropped AND a.attname != 'partid'", oid);
 	appendPQExpBufferStr(&buf, "\nORDER BY a.attnum;");
 
 	res = PSQLexec(buf.data);
@@ -2052,6 +2057,14 @@ describeOneTableDetails(const char *schemaname,
 				printfPQExpBuffer(&title, _("Partitioned table \"%s.%s\""),
 								  schemaname, relationname);
 			break;
+		case RELKIND_GLOBAL_INDEX:
+			if (tableinfo.relpersistence == 'u')
+				printfPQExpBuffer(&title, _("Unlogged global index \"%s.%s\""),
+								  schemaname, relationname);
+			else
+				printfPQExpBuffer(&title, _("Global index \"%s.%s\""),
+								  schemaname, relationname);
+			break;
 		default:
 			/* untranslated unknown relkind */
 			printfPQExpBuffer(&title, "?%c? \"%s.%s\"",
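
As the FIXME above says, psql simply filters out any attribute named
"partid" for now; the column is still visible through a direct catalog
query, e.g. (illustrative):

  SELECT attname, attnum FROM pg_attribute
  WHERE attrelid = 'g'::regclass AND attnum > 0;
  -- lists the key column(s) plus the internal partid column
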
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index e709d2e0af..cf7ddb0131 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -21,6 +21,7 @@
 #include "access/xlogreader.h"
 #include "catalog/pg_am_d.h"
 #include "catalog/pg_index.h"
+#include "common/int.h"
 #include "lib/stringinfo.h"
 #include "storage/bufmgr.h"
 #include "storage/shm_toc.h"
@@ -655,6 +656,47 @@ BTreeTupleGetHeapTID(IndexTuple itup)
 	return &itup->t_tid;
 }
 
+/*
+ * Fetch the partition ID stored in the index tuple.
+ *
+ * For global indexes, the partition ID is stored as an additional last key
+ * column in order to identify which partition the index tuple belongs to.
+ */
+static inline PartitionId
+BTreeTupleGetPartitionId(Relation index, IndexTuple itup)
+{
+	bool		is_null;
+	Datum		datum;
+	int 		partidattno = IndexRelationGetNumberOfKeyAttributes(index);
+	TupleDesc	tupleDesc = RelationGetDescr(index);
+
+	Assert(RelationIsGlobalIndex(index));
+
+	/*
+	 * If this is a pivot tuple and the tiebreaker partition ID attribute is
+	 * not present in it, return InvalidPartitionId.
+	 */
+	if (BTreeTupleIsPivot(itup) && BTreeTupleGetNAtts(itup, index) <=
+		IndexRelationGetNumberOfKeyAttributes(index))
+		return InvalidPartitionId;
+
+	/* Fetch partition id attribute from index tuple. */
+	datum = index_getattr(itup, partidattno, tupleDesc, &is_null);
+	Assert(!is_null);
+
+	return DatumGetPartitionId(datum);
+}
+
+/*
+ * Get relation OID with respect to the partition ID stored in the IndexTuple.
+ */
+static inline Oid
+BTreeTupleGetPartitionRelid(Relation index, IndexTuple itup)
+{
+	return IndexGetPartitionReloid(index,
+								   BTreeTupleGetPartitionId(index, itup));
+}
+
 /*
  * Get maximum heap TID attribute, which could be the only TID in the case of
  * a non-pivot tuple that does not have a posting list.
@@ -676,6 +718,19 @@ BTreeTupleGetMaxHeapTID(IndexTuple itup)
 	return &itup->t_tid;
 }
 
+/*
+ * _bt_indexdel_cmp() -- qsort comparison function for _bt_simpledel_pass() in
+ * order to sort the items in partition ID order.
+ */
+static inline int
+_bt_indexdel_cmp(const void *arg1, const void *arg2)
+{
+	PartidDeltidMapping *b1 = ((PartidDeltidMapping *) arg1);
+	PartidDeltidMapping *b2 = ((PartidDeltidMapping *) arg2);
+
+	return pg_cmp_u32(b1->partid, b2->partid);
+}
+
 /*
  *	Operator strategy numbers for B-tree have been moved to access/stratnum.h,
  *	because many places need to use them in ScanKeyInit() calls.
@@ -1156,7 +1211,8 @@ typedef struct BTOptions
 } BTOptions;
 
 #define BTGetFillFactor(relation) \
-	(AssertMacro(relation->rd_rel->relkind == RELKIND_INDEX && \
+	(AssertMacro((relation->rd_rel->relkind == RELKIND_INDEX || \
+				  relation->rd_rel->relkind == RELKIND_GLOBAL_INDEX) && \
 				 relation->rd_rel->relam == BTREE_AM_OID), \
 	 (relation)->rd_options ? \
 	 ((BTOptions *) (relation)->rd_options)->fillfactor : \
@@ -1164,7 +1220,8 @@ typedef struct BTOptions
 #define BTGetTargetPageFreeSpace(relation) \
 	(BLCKSZ * (100 - BTGetFillFactor(relation)) / 100)
 #define BTGetDeduplicateItems(relation) \
-	(AssertMacro(relation->rd_rel->relkind == RELKIND_INDEX && \
+	(AssertMacro((relation->rd_rel->relkind == RELKIND_INDEX || \
+				  relation->rd_rel->relkind == RELKIND_GLOBAL_INDEX) && \
 				 relation->rd_rel->relam == BTREE_AM_OID), \
 	((relation)->rd_options ? \
 	 ((BTOptions *) (relation)->rd_options)->deduplicate_items : true))
@@ -1287,7 +1344,8 @@ extern void _bt_delitems_vacuum(Relation rel, Buffer buf,
 								BTVacuumPosting *updatable, int nupdatable);
 extern void _bt_delitems_delete_check(Relation rel, Buffer buf,
 									  Relation heapRel,
-									  TM_IndexDeleteOp *delstate);
+									  TM_IndexDeleteOp *delstate,
+									  PartidDeltidMapping *mapping);
 extern void _bt_pagedel(Relation rel, Buffer leafbuf, BTVacState *vstate);
 extern void _bt_pendingfsm_init(Relation rel, BTVacState *vstate,
 								bool cleanuponly);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b..cea236b340 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -17,6 +17,7 @@
 #ifndef TABLEAM_H
 #define TABLEAM_H
 
+#include "c.h"
 #include "access/relscan.h"
 #include "access/sdir.h"
 #include "access/xact.h"
@@ -200,6 +201,11 @@ typedef struct TM_FailureData
  * ndeltids is 0 on return from call to tableam, in which case no index tuple
  * deletions are possible.  Simple deletion callers can rely on any entries
  * they know to be deletable appearing in the final array as deletable.
+ *
+ * Note: For global indexes, the TID alone is insufficient to identify the
+ * heap tuple. We also need the partition ID that indicates which partition the
+ * TID belongs to. Later, when accessing the heap, the partition ID can be
+ * converted to the corresponding relation ID.
  */
 typedef struct TM_IndexDelete
 {
@@ -248,6 +254,20 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/*
+ * This maintains an entry for each entry of *deltids in the TM_IndexDeleteOp
+ * structure.  For each entry it keeps the partition ID for that TID and the
+ * index into the *deltids array.  We need this so that we can later sort the
+ * deleted TIDs in partition ID order before calling the table AM method that
+ * checks the status of the deleted TIDs.
+ */
+typedef struct PartidDeltidMapping
+{
+	PartitionId		partid;	/* Partition ID of the entry in deltids array
+							   in TM_IndexDeleteOp. */
+	int				idx;	/* Index in deltids array in TM_IndexDeleteOp */
+} PartidDeltidMapping;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 4daa8bef5e..da11c32179 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -64,7 +64,8 @@ extern void index_check_primary_key(Relation heapRel,
 #define	INDEX_CREATE_CONCURRENT				(1 << 3)
 #define	INDEX_CREATE_IF_NOT_EXISTS			(1 << 4)
 #define	INDEX_CREATE_PARTITIONED			(1 << 5)
-#define INDEX_CREATE_INVALID				(1 << 6)
+#define INDEX_CREATE_GLOBAL					(1 << 6)
+#define INDEX_CREATE_INVALID				(1 << 7)
 
 extern Oid	index_create(Relation heapRelation,
 						 const char *indexRelationName,
@@ -86,7 +87,8 @@ extern Oid	index_create(Relation heapRelation,
 						 bits16 constr_flags,
 						 bool allow_system_table_mods,
 						 bool is_internal,
-						 Oid *constraintId);
+						 Oid *constraintId,
+						 List *inheritors);
 
 #define	INDEX_CONSTR_CREATE_MARK_AS_PRIMARY	(1 << 0)
 #define	INDEX_CONSTR_CREATE_DEFERRABLE		(1 << 1)
@@ -144,7 +146,10 @@ extern void index_build(Relation heapRelation,
 						IndexInfo *indexInfo,
 						bool isreindex,
 						bool parallel);
-
+extern void index_update_stats(Relation rel,
+							   bool hasindex,
+							   bool hasglobalindex,
+							   double reltuples);
 extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
 
 extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
@@ -153,7 +158,7 @@ extern Oid	IndexGetRelation(Oid indexId, bool missing_ok);
 
 extern void reindex_index(const ReindexStmt *stmt, Oid indexId,
 						  bool skip_constraint_checks, char persistence,
-						  const ReindexParams *params);
+						  const ReindexParams *params, Relation heapRelation);
 
 /* Flag bits for reindex_relation(): */
 #define REINDEX_REL_PROCESS_TOAST			0x01
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 07d182da79..1115963360 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -77,6 +77,12 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* T if has (or has had) any indexes */
 	bool		relhasindex BKI_DEFAULT(f);
 
+	/*
+	 * T if this rel or any of its ancestors has (or has had) any global
+	 * indexes.
+	 */
+	bool		relhasglobalindex BKI_DEFAULT(f);
+
 	/* T if shared across databases */
 	bool		relisshared BKI_DEFAULT(f);
 
@@ -174,6 +180,7 @@ MAKE_SYSCACHE(RELNAMENSP, pg_class_relname_nsp_index, 128);
 #define		  RELKIND_FOREIGN_TABLE   'f'	/* foreign table */
 #define		  RELKIND_PARTITIONED_TABLE 'p' /* partitioned table */
 #define		  RELKIND_PARTITIONED_INDEX 'I' /* partitioned index */
+#define		  RELKIND_GLOBAL_INDEX		'g' /* global index */
 
 #define		  RELPERSISTENCE_PERMANENT	'p' /* regular table */
 #define		  RELPERSISTENCE_UNLOGGED	'u' /* unlogged permanent table */
@@ -202,7 +209,8 @@ MAKE_SYSCACHE(RELNAMENSP, pg_class_relname_nsp_index, 128);
 	 (relkind) == RELKIND_INDEX || \
 	 (relkind) == RELKIND_SEQUENCE || \
 	 (relkind) == RELKIND_TOASTVALUE || \
-	 (relkind) == RELKIND_MATVIEW)
+	 (relkind) == RELKIND_MATVIEW || \
+	 (relkind) == RELKIND_GLOBAL_INDEX)
 
 #define RELKIND_HAS_PARTITIONS(relkind) \
 	((relkind) == RELKIND_PARTITIONED_TABLE || \
diff --git a/src/include/catalog/pg_index_partitions.h b/src/include/catalog/pg_index_partitions.h
index 2dcc8ca3fc..c2d952ef9b 100644
--- a/src/include/catalog/pg_index_partitions.h
+++ b/src/include/catalog/pg_index_partitions.h
@@ -45,6 +45,7 @@ CATALOG(pg_index_partitions,6015,IndexPartitionsRelationId)
 typedef FormData_pg_index_partitions *Form_pg_index_partitions;
 
 DECLARE_UNIQUE_INDEX_PKEY(pg_index_partitions_indexoid_partid_index, 6018, IndexPartitionsIndexId, pg_index_partitions, btree(indexoid oid_ops, partid int4_ops));
+DECLARE_INDEX(pg_index_partitions_reloid_index, 6019, IndexPartitionsReloidIndexId, pg_index_partitions, btree(reloid oid_ops));
 
 /*
  * Map over the pg_index_partitions table for a particular global index.  This
@@ -74,6 +75,28 @@ typedef struct IndexPartitionInfoEntry
 #define		FirstValidPartitionId		1
 #define		PartIdIsValid(partid)	((bool) ((partid) != InvalidPartitionId))
 
+/*
+ * The "partitionid" is a special-purpose attribute that has no entry in the
+ * pg_attribute table.  This constant is only used for getting a
+ * FormData_pg_attribute entry for the partition id attribute.
+ *
+ * TODO: We need to find some better way than doing this.
+ */
+#define PartitionIdAttributeNumber				(-100)
+
+static const FormData_pg_attribute partitionid_attr = {
+	.attname = {""},
+	.atttypid = INT4OID,
+	.attlen = sizeof(int32),
+	.attnum = PartitionIdAttributeNumber,
+	.atttypmod = -1,
+	.attbyval = true,
+	.attalign = TYPALIGN_INT,
+	.attstorage = TYPSTORAGE_PLAIN,
+	.attnotnull = true,
+	.attislocal = true,
+};
+
 extern void BuildIndexPartitionInfo(Relation relation, MemoryContext context);
 extern PartitionId IndexGetRelationPartitionId(Relation irel, Oid reloid);
 extern Oid IndexGetPartitionReloid(Relation irel, PartitionId partid);
@@ -81,4 +104,5 @@ extern PartitionId IndexGetNextPartitionID(Relation irel);
 extern void DeleteIndexPartitionEntries(Oid indrelid);
 extern void InsertIndexPartitionEntry(Relation irel, Oid reloid, PartitionId partid);
 extern void InvalidateIndexPartitionEntries(List *reloids, Oid indexoid);
+extern List *IndexPartitionRelidGetGlobalIndexOids(Oid reloid);
 #endif							/* PG_INDEX_PARTITIONS_H */
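To make the intended use of these helpers concrete, here is a small
hypothetical sketch (illustration only, not part of the patch):

/*
 * Hypothetical helper: translate the partition id stored in a global-index
 * tuple into the partition's relation OID.  Returns false when the mapping
 * has been invalidated (e.g. the partition was detached), in which case the
 * index tuple should be ignored.
 */
static bool
resolve_partition_reloid(Relation globalIndexRel, PartitionId partid,
						 Oid *reloid)
{
	Assert(PartIdIsValid(partid));
	*reloid = IndexGetPartitionReloid(globalIndexRel, partid);
	return OidIsValid(*reloid);
}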
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index dd22b5efdf..158ccc1773 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -30,6 +30,7 @@ extern ObjectAddress DefineIndex(Oid tableId,
 								 Oid indexRelationId,
 								 Oid parentIndexId,
 								 Oid parentConstraintId,
+								 List *inheritors,
 								 int total_parts,
 								 bool is_alter_table,
 								 bool check_rights,
diff --git a/src/include/commands/tablecmds.h b/src/include/commands/tablecmds.h
index 6832470d38..956f617fbd 100644
--- a/src/include/commands/tablecmds.h
+++ b/src/include/commands/tablecmds.h
@@ -106,5 +106,6 @@ extern void RangeVarCallbackOwnsRelation(const RangeVar *relation,
 										 Oid relId, Oid oldRelId, void *arg);
 extern bool PartConstraintImpliedByRelConstraint(Relation scanrel,
 												 List *partConstraint);
-
+extern void AttachParittionsToGlobalIndex(Relation irel, List *reloids);
+extern void DetachFromGlobalIndexes(List *indexoids, List *reloids);
 #endif							/* TABLECMDS_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 2492282213..b7cec7b6fc 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -187,6 +187,15 @@ typedef struct ExprState
  *
  * ii_Concurrent, ii_BrokenHotChain, and ii_ParallelWorkers are used only
  * during index build; they're conventionally zeroed otherwise.
+ *
+ * ii_partid is only used while inserting an index tuple or building an
+ * index.  It holds the partition ID of the leaf partition for which we are
+ * currently inserting the tuple into the global index.
+ *
+ * XXX this is stored by the caller, which knows which partition the tuple
+ * being indexed belongs to, and is read by FormIndexDatum().  We may
+ * consider passing this as a parameter to FormIndexDatum() or computing it
+ * some other way.
  * ----------------
  */
 typedef struct IndexInfo
@@ -216,6 +225,7 @@ typedef struct IndexInfo
 	bool		ii_WithoutOverlaps;
 	int			ii_ParallelWorkers;
 	Oid			ii_Am;
+	PartitionId	ii_partid;
 	void	   *ii_AmCache;
 	MemoryContext ii_Context;
 } IndexInfo;
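A hypothetical call-site sketch for ii_partid (the relation variables below
are placeholders; only ii_partid, IndexGetRelationPartitionId() and
FormIndexDatum() come from the patch):

/*
 * Sketch: before forming index datums for a global index, the caller stashes
 * the leaf partition's id so FormIndexDatum() can emit the partition id
 * column.  "globalIndexRel" and "leafRel" are placeholders.
 */
indexInfo->ii_partid =
	IndexGetRelationPartitionId(globalIndexRel, RelationGetRelid(leafRel));
FormIndexDatum(indexInfo, slot, estate, values, isnull);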
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index ba12678d1c..425d6b1386 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3466,6 +3466,7 @@ typedef struct IndexStmt
 	bool		unique;			/* is index unique? */
 	bool		nulls_not_distinct; /* null treatment for UNIQUE constraints */
 	bool		primary;		/* is index a primary key? */
+	bool		global;			/* is index a global index? */
 	bool		isconstraint;	/* is it for a pkey/unique constraint? */
 	bool		iswithoutoverlaps;	/* is the constraint WITHOUT OVERLAPS? */
 	bool		deferrable;		/* is the constraint DEFERRABLE? */
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index fa7c7e0323..8d2804bdcf 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -147,6 +147,7 @@ extern bool get_rel_relispartition(Oid relid);
 extern Oid	get_rel_tablespace(Oid relid);
 extern char get_rel_persistence(Oid relid);
 extern Oid	get_rel_relam(Oid relid);
+extern bool get_rel_has_globalindex(Oid relid);
 extern Oid	get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
 extern Oid	get_transform_tosql(Oid typid, Oid langid, List *trftypes);
 extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 35270fdc05..1117f352e4 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -150,7 +150,16 @@ typedef struct RelationData
 	MemoryContext rd_partcheckcxt;	/* private cxt for rd_partcheck, if any */
 
 	/* data managed by RelationGetIndexList: */
-	List	   *rd_indexlist;	/* list of OIDs of indexes on relation */
+
+	/*
+	 * List of OIDs of indexes on the relation, including the global indexes
+	 * of all its ancestors.  We include the ancestors' global indexes
+	 * because any operation performed on this relation, such as an insert
+	 * or update, also affects the ancestors' global indexes, so wherever we
+	 * need to fetch the relation's indexes we must fetch the ancestors'
+	 * global indexes as well.
+	 */
+	List	   *rd_indexlist;
 	Oid			rd_pkindex;		/* OID of (deferrable?) primary key, if any */
 	bool		rd_ispkdeferrable;	/* is rd_pkindex a deferrable PK? */
 	Oid			rd_replidindex; /* OID of replica identity index, if any */
@@ -722,6 +731,12 @@ RelationCloseSmgr(Relation relation)
 	 (relation)->rd_rel->relkind != RELKIND_FOREIGN_TABLE &&	\
 	 !IsCatalogRelation(relation))
 
+/*
+ * Check whether the input relation is a global index or not.
+ */
+#define RelationIsGlobalIndex(relation) \
+	((relation)->rd_rel->relkind == RELKIND_GLOBAL_INDEX)
+
 /* routines in utils/cache/relcache.c */
 extern void RelationIncrementReferenceCount(Relation rel);
 extern void RelationDecrementReferenceCount(Relation rel);
-- 
2.49.0

Attachment: v1-0003-Provide-support-for-global-Index-Scan-Path.patch (application/octet-stream)
From 1f962d31d9f71fafe729b5a25396cdce112b7646 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@Dilip.local>
Date: Thu, 15 May 2025 17:39:58 +0530
Subject: [PATCH v1 3/4] Provide support for global Index Scan Path

Previous patches added support for creating global indexes.  This patch adds
planner support for choosing global index scan and index-only scan paths for
the append rel.

Currently we do not support selecting a bitmap scan using a global index.  We
may do that in the future; it would require executor changes so that we can
build a separate tidbitmap for each leaf relation while scanning the global
index, and then do the bitmap heap scan one partition at a time based on each
bitmap.

We also do not yet support parallel index scans using a global index.  There
is nothing blocking this as such, but it is still a TODO.

Open Items
- In table_slot_callbacks(), a partitioned table can now produce tuples via a
  global index scan, so we need a proper slot instead of just assigning a
  virtual slot.  Perhaps this should be handled through an AM callback?
---
 src/backend/access/index/genam.c         |  19 ++
 src/backend/access/index/indexam.c       | 245 ++++++++++++++++++++++-
 src/backend/access/nbtree/nbtree.c       |  10 +-
 src/backend/access/nbtree/nbtsearch.c    |  71 +++++--
 src/backend/catalog/partition.c          |   4 -
 src/backend/commands/explain.c           |  12 +-
 src/backend/executor/nodeIndexonlyscan.c |  25 ++-
 src/backend/executor/nodeIndexscan.c     |  16 +-
 src/backend/optimizer/path/allpaths.c    |  12 ++
 src/backend/optimizer/path/indxpath.c    |  40 +++-
 src/backend/optimizer/plan/planmain.c    |   4 +-
 src/backend/optimizer/plan/planner.c     | 139 ++++++++++++-
 src/backend/optimizer/util/appendinfo.c  |  60 +++++-
 src/backend/optimizer/util/plancat.c     |  43 ++--
 src/backend/optimizer/util/var.c         |   1 +
 src/backend/parser/parse_utilcmd.c       |   1 +
 src/backend/utils/adt/selfuncs.c         |   4 +
 src/backend/utils/cache/plancache.c      |  15 ++
 src/bin/psql/describe.c                  |  15 +-
 src/include/access/genam.h               |   6 +
 src/include/access/nbtree.h              |   3 +
 src/include/access/relscan.h             |   8 +-
 src/include/nodes/pathnodes.h            |  21 ++
 src/include/nodes/plannodes.h            |   3 +
 src/include/optimizer/appendinfo.h       |   2 +
 25 files changed, 725 insertions(+), 54 deletions(-)

diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index c2b80669aa..13bd1e90b7 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -126,6 +126,25 @@ RelationGetIndexScan(Relation indexRelation, int nkeys, int norderbys)
 	scan->xs_hitup = NULL;
 	scan->xs_hitupdesc = NULL;
 
+	/*
+	 * Set a flag to indicate a global index scan and create a cache for
+	 * partition ID to relation OID lookup. This is necessary because a global
+	 * index stores the partition ID along with each tuple, and when fetching a
+	 * tuple, we need to convert that partition ID into a relation OID. For
+	 * more details, refer to the comments above the PartitionId typedef.
+	 */
+	if (RelationIsGlobalIndex(indexRelation))
+	{
+		scan->xs_global_index = true;
+		scan->xs_global_index_cache =
+			create_globalindex_partition_cache(CurrentMemoryContext);
+	}
+	else
+	{
+		scan->xs_global_index = false;
+		scan->xs_global_index_cache = NULL;
+	}
+
 	return scan;
 }
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 3aa1fc92df..4e18d8150d 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -104,11 +104,35 @@ do { \
 			 CppAsString(pname), RelationGetRelationName(scan->indexRelation)); \
 } while(0)
 
+/*
+ * Lookup table from relation OID to the relation descriptor and
+ * IndexFetchTableData structure.  table_index_fetch_begin() should be called
+ * only once for each partition, but scan->xs_heapfetch is overwritten with
+ * the current partition's handle; so if we come back to a partition that was
+ * already scanned once, we should reuse the same xs_heapfetch, which we can
+ * get from this cache.
+ */
+typedef struct GlobalIndexPartitionCacheData
+{
+	MemoryContext pdir_mcxt;
+	HTAB	*pdir_hash;
+} GlobalIndexPartitionCacheData;
+
+typedef struct GlobalIndexPartitionCacheEntry
+{
+	Oid 		reloid;
+	Relation	relation;
+	IndexFetchTableData *heapfetch;
+} GlobalIndexPartitionCacheEntry;
+
 static IndexScanDesc index_beginscan_internal(Relation indexRelation,
 											  int nkeys, int norderbys, Snapshot snapshot,
 											  ParallelIndexScanDesc pscan, bool temp_snap);
 static inline void validate_relation_kind(Relation r);
-
+static GlobalIndexPartitionCacheEntry *globalindex_partition_entry_lookup(
+											GlobalIndexPartitionCache pdir,
+											Oid relid);
+static void globalindex_partition_cache_reset(GlobalIndexPartitionCache pdir);
 
 /* ----------------------------------------------------------------
  *				   index_ interface functions
@@ -270,12 +294,29 @@ index_beginscan(Relation heapRelation,
 	 * Save additional parameters into the scandesc.  Everything else was set
 	 * up by RelationGetIndexScan.
 	 */
-	scan->heapRelation = heapRelation;
 	scan->xs_snapshot = snapshot;
 	scan->instrument = instrument;
 
-	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	/*
+	 * For a global index, do not set heapRelation and xs_heapfetch here:
+	 * while scanning the index we might get TIDs belonging to different
+	 * partitions, so we initialize these fields only when we actually fetch
+	 * a TID from the index, since at that point we know the relation OID
+	 * from which the tuple must be fetched.
+	 */
+	if (scan->xs_global_index)
+	{
+		scan->heapRelation = NULL;
+		scan->xs_heapfetch = NULL;
+	}
+	else
+	{
+		scan->heapRelation = heapRelation;
+
+		/* prepare to fetch index matches from table */
+		scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	}
+
 
 	return scan;
 }
@@ -365,7 +406,23 @@ index_rescan(IndexScanDesc scan,
 	Assert(norderbys == scan->numberOfOrderBys);
 
 	/* Release resources (like buffer pins) from table accesses */
-	if (scan->xs_heapfetch)
+	if (scan->xs_global_index)
+	{
+		/*
+		 * For a global index, also reset the xs_global_index_cache.
+		 * Essentially, a global index scan has one xs_heapfetch entry per
+		 * partition; those entries are reset inside
+		 * globalindex_partition_cache_reset().  Here we can simply set
+		 * xs_heapfetch and heapRelation to NULL in the scan descriptor.
+		 * For more details, refer to the comments inside
+		 * index_beginscan().
+		 */
+		scan->heapRelation = NULL;
+		scan->xs_heapfetch = NULL;
+		if (scan->xs_global_index_cache)
+			globalindex_partition_cache_reset(scan->xs_global_index_cache);
+	}
+	else if (scan->xs_heapfetch)
 		table_index_fetch_reset(scan->xs_heapfetch);
 
 	scan->kill_prior_tuple = false; /* for safety */
@@ -386,7 +443,18 @@ index_endscan(IndexScanDesc scan)
 	CHECK_SCAN_PROCEDURE(amendscan);
 
 	/* Release resources (like buffer pins) from table accesses */
-	if (scan->xs_heapfetch)
+	if (scan->xs_global_index)
+	{
+		/*
+		 * For a global index, also destroy the cache; internally this will
+		 * deallocate the index fetch handle for each partition.
+		 */
+		if (scan->xs_global_index_cache)
+			globalindex_partition_cache_destroy(scan->xs_global_index_cache);
+		scan->heapRelation = NULL;
+		scan->xs_heapfetch = NULL;
+	}
+	else if (scan->xs_heapfetch)
 	{
 		table_index_fetch_end(scan->xs_heapfetch);
 		scan->xs_heapfetch = NULL;
@@ -442,7 +510,18 @@ index_restrpos(IndexScanDesc scan)
 	CHECK_SCAN_PROCEDURE(amrestrpos);
 
 	/* release resources (like buffer pins) from table accesses */
-	if (scan->xs_heapfetch)
+	if (scan->xs_global_index)
+	{
+		/*
+		 * For a global index, also reset the cache; internally this will
+		 * reset the index fetch handle for each partition.
+		 */
+		if (scan->xs_global_index_cache)
+			globalindex_partition_cache_reset(scan->xs_global_index_cache);
+		scan->heapRelation = NULL;
+		scan->xs_heapfetch = NULL;
+	}
+	else if (scan->xs_heapfetch)
 		table_index_fetch_reset(scan->xs_heapfetch);
 
 	scan->kill_prior_tuple = false; /* for safety */
@@ -742,6 +821,15 @@ index_getnext_slot(IndexScanDesc scan, ScanDirection direction, TupleTableSlot *
 		 * the index.
 		 */
 		Assert(ItemPointerIsValid(&scan->xs_heaptid));
+
+		/*
+		 * For a global index we need to get the heap OID of the partition
+		 * relation, stored in the scan descriptor by the index scan, and
+		 * fetch the tuple from that relation.
+		 */
+		if (scan->xs_global_index)
+			global_indexscan_setup_partrel(scan);
+
 		if (index_fetch_heap(scan, slot))
 			return true;
 	}
@@ -1085,3 +1173,146 @@ index_opclass_options(Relation indrel, AttrNumber attnum, Datum attoptions,
 
 	return build_local_reloptions(&relopts, attoptions, validate);
 }
+
+/*
+ * Helper function for index_getnext_slot() and IndexOnlyNext() to set up
+ * the proper scan->heapRelation and scan->xs_heapfetch during a global index
+ * scan, since a global index returns TIDs belonging to different partitions.
+ */
+void
+global_indexscan_setup_partrel(IndexScanDesc scan)
+{
+	Oid		relid;
+	GlobalIndexPartitionCacheEntry *entry;
+
+	relid = scan->xs_heapoid;
+
+	/*
+	 * During a global index scan, we might encounter index entries that belong
+	 * to different partitions, which could be interleaved.  Each time we get
+	 * a new index tuple, we need to check whether scan->heapRelation matches
+	 * the relid of that tuple.  If it does not, we fetch the corresponding
+	 * entry from the cache and store it in the scan descriptor.
+	 */
+	if (scan->heapRelation == NULL)
+	{
+		entry = globalindex_partition_entry_lookup(
+								scan->xs_global_index_cache, relid);
+
+		scan->heapRelation = entry->relation;
+		scan->xs_heapfetch = entry->heapfetch;
+	}
+	else if (scan->heapRelation &&
+				relid != RelationGetRelid(scan->heapRelation))
+	{
+		table_index_fetch_reset(scan->xs_heapfetch);
+
+		entry = globalindex_partition_entry_lookup(
+								scan->xs_global_index_cache, relid);
+		scan->heapRelation = entry->relation;
+		scan->xs_heapfetch = entry->heapfetch;
+	}
+}
+
+/*
+ * create_globalindex_partition_cache - Create index scan partition cache
+ *
+ * For more details about this cache, refer to the comments atop the
+ * GlobalIndexPartitionCacheData structure.
+ */
+GlobalIndexPartitionCache
+create_globalindex_partition_cache(MemoryContext mcxt)
+{
+	MemoryContext oldcontext = MemoryContextSwitchTo(mcxt);
+	GlobalIndexPartitionCache pdir;
+	HASHCTL		ctl;
+
+	MemSet(&ctl, 0, sizeof(HASHCTL));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(GlobalIndexPartitionCacheEntry);
+	ctl.hcxt = mcxt;
+
+	pdir = palloc(sizeof(GlobalIndexPartitionCacheData));
+	pdir->pdir_mcxt = mcxt;
+	pdir->pdir_hash = hash_create("globalIndex partitionId cache", 256, &ctl,
+								  HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+	MemoryContextSwitchTo(oldcontext);
+	return pdir;
+}
+
+/*
+ * globalindex_partition_entry_lookup
+ *
+ * Lookup the relation descriptor and index heap fetch handle for the given
+ * relid.  If the entry is not found, it will open the relation, initialize the
+ * index fetch on that relation, and store it in the cache for subsequent
+ * references.
+ */
+static GlobalIndexPartitionCacheEntry *
+globalindex_partition_entry_lookup(GlobalIndexPartitionCache pdir, Oid relid)
+{
+	GlobalIndexPartitionCacheEntry *pde;
+	bool		found;
+	Relation	part_rel;
+
+	Assert(OidIsValid(relid));
+	Assert(pdir);
+	pde = hash_search(pdir->pdir_hash, &relid, HASH_FIND, &found);
+	if (found)
+		return pde;
+	else
+	{
+		pde = hash_search(pdir->pdir_hash, &relid, HASH_ENTER, &found);
+		part_rel = relation_open(relid, AccessShareLock);
+		pde->relation = part_rel;
+		pde->heapfetch = table_index_fetch_begin(part_rel);
+	}
+
+	return pde;
+}
+
+/*
+ * globalindex_partition_cache_destroy - destroy the cache
+ *
+ * This destroys the GlobalIndexPartitionCache and also ends the index
+ * fetch for each cache entry wherever it was initialized.
+ */
+void
+globalindex_partition_cache_destroy(GlobalIndexPartitionCache pdir)
+{
+	HASH_SEQ_STATUS status;
+	GlobalIndexPartitionCacheEntry *pde;
+
+	hash_seq_init(&status, pdir->pdir_hash);
+	while ((pde = hash_seq_search(&status)) != NULL)
+	{
+		if (pde->heapfetch)
+		{
+			table_index_fetch_end(pde->heapfetch);
+			pde->heapfetch = NULL;
+		}
+
+		relation_close(pde->relation, NoLock);
+	}
+}
+
+/*
+ * globalindex_partition_cache_reset - reset the cache
+ *
+ * This will reset the GlobalIndexPartitionCache and also reset the index
+ * fetch for each cache entry if it was initialized.
+ */
+static void
+globalindex_partition_cache_reset(GlobalIndexPartitionCache pdir)
+{
+	HASH_SEQ_STATUS status;
+	GlobalIndexPartitionCacheEntry *entry;
+
+	hash_seq_init(&status, pdir->pdir_hash);
+	while ((entry = hash_seq_search(&status)))
+	{
+		if (entry->heapfetch)
+			table_index_fetch_reset(entry->heapfetch);
+	}
+}
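Condensed, the per-tuple flow of a global index scan looks roughly like this
(a sketch of the logic above, not verbatim code):

/* Sketch of the fetch loop (cf. index_getnext_slot). */
while (index_getnext_tid(scan, direction) != NULL)
{
	/*
	 * For a global index, swap scan->heapRelation / scan->xs_heapfetch to
	 * the partition that owns this TID, using the partition cache.
	 */
	if (scan->xs_global_index)
		global_indexscan_setup_partrel(scan);

	if (index_fetch_heap(scan, slot))
		break;					/* got a visible tuple */
}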
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index c3960784eb..e310ddcea6 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -228,7 +228,15 @@ btgettuple(IndexScanDesc scan, ScanDirection dir)
 	BTScanOpaque so = (BTScanOpaque) scan->opaque;
 	bool		res;
 
-	Assert(scan->heapRelation != NULL);
+	/*
+	 * When working with global indexes, the scan's heap relation
+	 * (scan->heapRelation) is not set beforehand. Instead, it's populated by
+	 * the index scan interfaces, dynamically determined based on the TID being
+	 * processed. This is because global index tuples explicitly carry the heap
+	 * OID (along with the TID) to identify the originating heap relation.
+	 */
+	Assert(RelationIsGlobalIndex(scan->indexRelation) ||
+		   scan->heapRelation != NULL);
 
 	/* btree indexes are never lossy */
 	scan->xs_recheck = false;
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 36544ecfd5..44841394df 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -35,13 +35,14 @@ static int	_bt_binsrch_posting(BTScanInsert key, Page page,
 static bool _bt_readpage(IndexScanDesc scan, ScanDirection dir,
 						 OffsetNumber offnum, bool firstpage);
 static void _bt_saveitem(BTScanOpaque so, int itemIndex,
-						 OffsetNumber offnum, IndexTuple itup);
+						 OffsetNumber offnum, IndexTuple itup, Oid heapOid);
 static int	_bt_setuppostingitems(BTScanOpaque so, int itemIndex,
 								  OffsetNumber offnum, ItemPointer heapTid,
-								  IndexTuple itup);
+								  IndexTuple itup, Oid heapOid);
 static inline void _bt_savepostingitem(BTScanOpaque so, int itemIndex,
 									   OffsetNumber offnum,
-									   ItemPointer heapTid, int tupleOffset);
+									   ItemPointer heapTid, int tupleOffset,
+									   Oid heapOid);
 static inline void _bt_returnitem(IndexScanDesc scan, BTScanOpaque so);
 static bool _bt_steppage(IndexScanDesc scan, ScanDirection dir);
 static bool _bt_readfirstpage(IndexScanDesc scan, OffsetNumber offnum,
@@ -1608,6 +1609,7 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
 	bool		arrayKeys;
 	int			itemIndex,
 				indnatts;
+	Oid			heapOid;
 
 	/* save the page/buffer block number, along with its sibling links */
 	page = BufferGetPage(so->currPos.buf);
@@ -1718,6 +1720,27 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
 			itup = (IndexTuple) PageGetItem(page, iid);
 			Assert(!BTreeTupleIsPivot(itup));
 
+			/*
+			 * For a global index we also need to fetch the relation OID in
+			 * order to know from which relation the tuple must be fetched.
+			 */
+			if (RelationIsGlobalIndex(scan->indexRelation))
+			{
+				heapOid = BTreeTupleGetPartitionRelid(scan->indexRelation, itup);
+
+				/*
+				 * If the partition has already been detached then we will
+				 * get InvalidOid, so ignore such tuples.
+				 */
+				if (!OidIsValid(heapOid))
+				{
+					offnum = OffsetNumberNext(offnum);
+					continue;
+				}
+			}
+			else
+				heapOid = InvalidOid;
+
 			pstate.offnum = offnum;
 			passes_quals = _bt_checkkeys(scan, &pstate, arrayKeys,
 										 itup, indnatts);
@@ -1743,7 +1766,7 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
 				if (!BTreeTupleIsPosting(itup))
 				{
 					/* Remember it */
-					_bt_saveitem(so, itemIndex, offnum, itup);
+					_bt_saveitem(so, itemIndex, offnum, itup, heapOid);
 					itemIndex++;
 				}
 				else
@@ -1757,14 +1780,14 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
 					tupleOffset =
 						_bt_setuppostingitems(so, itemIndex, offnum,
 											  BTreeTupleGetPostingN(itup, 0),
-											  itup);
+											  itup, heapOid);
 					itemIndex++;
 					/* Remember additional TIDs */
 					for (int i = 1; i < BTreeTupleGetNPosting(itup); i++)
 					{
 						_bt_savepostingitem(so, itemIndex, offnum,
 											BTreeTupleGetPostingN(itup, i),
-											tupleOffset);
+											tupleOffset, heapOid);
 						itemIndex++;
 					}
 				}
@@ -1883,6 +1906,24 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
 			itup = (IndexTuple) PageGetItem(page, iid);
 			Assert(!BTreeTupleIsPivot(itup));
 
+			/*
+			 * For a global index we also need to fetch the partition id in
+			 * order to know from which relation the tuple must be fetched.
+			 * We might get InvalidOid if the partition has already been
+			 * detached, so ignore such tuples.
+			 */
+			if (RelationIsGlobalIndex(scan->indexRelation))
+			{
+				heapOid = BTreeTupleGetPartitionRelid(scan->indexRelation, itup);
+				if (!OidIsValid(heapOid))
+				{
+					offnum = OffsetNumberNext(offnum);
+					continue;
+				}
+			}
+			else
+				heapOid = InvalidOid;
+
 			pstate.offnum = offnum;
 			if (arrayKeys && offnum == minoff && pstate.forcenonrequired)
 			{
@@ -1931,7 +1972,7 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
 				{
 					/* Remember it */
 					itemIndex--;
-					_bt_saveitem(so, itemIndex, offnum, itup);
+					_bt_saveitem(so, itemIndex, offnum, itup, heapOid);
 				}
 				else
 				{
@@ -1951,14 +1992,14 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
 					tupleOffset =
 						_bt_setuppostingitems(so, itemIndex, offnum,
 											  BTreeTupleGetPostingN(itup, 0),
-											  itup);
+											  itup, heapOid);
 					/* Remember additional TIDs */
 					for (int i = 1; i < BTreeTupleGetNPosting(itup); i++)
 					{
 						itemIndex--;
 						_bt_savepostingitem(so, itemIndex, offnum,
 											BTreeTupleGetPostingN(itup, i),
-											tupleOffset);
+											tupleOffset, heapOid);
 					}
 				}
 			}
@@ -2002,12 +2043,13 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
 /* Save an index item into so->currPos.items[itemIndex] */
 static void
 _bt_saveitem(BTScanOpaque so, int itemIndex,
-			 OffsetNumber offnum, IndexTuple itup)
+			 OffsetNumber offnum, IndexTuple itup, Oid heapOid)
 {
 	BTScanPosItem *currItem = &so->currPos.items[itemIndex];
 
 	Assert(!BTreeTupleIsPivot(itup) && !BTreeTupleIsPosting(itup));
 
+	currItem->heapOid = heapOid;
 	currItem->heapTid = itup->t_tid;
 	currItem->indexOffset = offnum;
 	if (so->currTuples)
@@ -2032,12 +2074,13 @@ _bt_saveitem(BTScanOpaque so, int itemIndex,
  */
 static int
 _bt_setuppostingitems(BTScanOpaque so, int itemIndex, OffsetNumber offnum,
-					  ItemPointer heapTid, IndexTuple itup)
+					  ItemPointer heapTid, IndexTuple itup, Oid heapOid)
 {
 	BTScanPosItem *currItem = &so->currPos.items[itemIndex];
 
 	Assert(BTreeTupleIsPosting(itup));
 
+	currItem->heapOid = heapOid;
 	currItem->heapTid = *heapTid;
 	currItem->indexOffset = offnum;
 	if (so->currTuples)
@@ -2070,10 +2113,11 @@ _bt_setuppostingitems(BTScanOpaque so, int itemIndex, OffsetNumber offnum,
  */
 static inline void
 _bt_savepostingitem(BTScanOpaque so, int itemIndex, OffsetNumber offnum,
-					ItemPointer heapTid, int tupleOffset)
+					ItemPointer heapTid, int tupleOffset, Oid heapOid)
 {
 	BTScanPosItem *currItem = &so->currPos.items[itemIndex];
 
+	currItem->heapOid = heapOid;
 	currItem->heapTid = *heapTid;
 	currItem->indexOffset = offnum;
 
@@ -2100,6 +2144,9 @@ _bt_returnitem(IndexScanDesc scan, BTScanOpaque so)
 	Assert(so->currPos.itemIndex <= so->currPos.lastItem);
 
 	/* Return next item, per amgettuple contract */
+	/* For global index we must have a valid heap oid. */
+	Assert(!scan->xs_global_index || OidIsValid(currItem->heapOid));
+	scan->xs_heapoid = currItem->heapOid;
 	scan->xs_heaptid = currItem->heapTid;
 	if (so->currTuples)
 		scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 472a096206..48bd2066a1 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -44,10 +44,6 @@ static void get_partition_ancestors_worker(Relation inhRel, Oid relid,
  *
  * If the partition is in the process of being detached, an error is thrown,
  * unless even_if_detached is passed as true.
- *
- * Note: Because this function assumes that the relation whose OID is passed
- * as an argument will have precisely one parent, it should only be called
- * when it is known that the relation is a partition.
  */
 Oid
 get_partition_parent(Oid relid, bool even_if_detached)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7e2792ead7..0721135200 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1442,10 +1442,18 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			pname = sname = "Gather Merge";
 			break;
 		case T_IndexScan:
-			pname = sname = "Index Scan";
+			if (get_rel_relkind(((IndexScan *) plan)->indexid) ==
+				RELKIND_GLOBAL_INDEX)
+				pname = sname = "Global Index Scan";
+			else
+				pname = sname = "Index Scan";
 			break;
 		case T_IndexOnlyScan:
-			pname = sname = "Index Only Scan";
+			if (get_rel_relkind(((IndexOnlyScan *) plan)->indexid) ==
+				RELKIND_GLOBAL_INDEX)
+				pname = sname = "Global Index Only Scan";
+			else
+				pname = sname = "Index Only Scan";
 			break;
 		case T_BitmapIndexScan:
 			pname = sname = "Bitmap Index Scan";
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f464cca950..f85962b88a 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -43,6 +43,7 @@
 #include "storage/bufmgr.h"
 #include "storage/predicate.h"
 #include "utils/builtins.h"
+#include "utils/lsyscache.h"
 #include "utils/rel.h"
 
 
@@ -124,6 +125,14 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 		CHECK_FOR_INTERRUPTS();
 
+		/*
+		 * For a global index we need to get the heap OID of the partition
+		 * relation, stored in the scan descriptor by the index scan, in
+		 * order to check the visibility map of that relation.
+		 */
+		if (scandesc->xs_global_index)
+			global_indexscan_setup_partrel(scandesc);
+
 		/*
 		 * We can skip the heap fetch if the TID references a heap page on
 		 * which all tuples are known visible to everybody.  In any case,
@@ -534,6 +543,7 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	TupleDesc	tupDesc;
 	int			indnkeyatts;
 	int			namecount;
+	const TupleTableSlotOps *tts_cb;
 
 	/*
 	 * create state structure
@@ -569,14 +579,25 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &indexstate->ss, tupDesc,
 						  &TTSOpsVirtual);
 
+	/*
+	 * FIXME: Global index scans on partitioned tables require
+	 * TTSOpsBufferHeapTuple, but partitioned tables normally get TTSOpsVirtual
+	 * (no TableAM).  We currently hack this by assuming partitions with global
+	 * indexes are Heap AM.  Proper TableAM integration for partitioned tables
+	 * is needed for slot allocation.
+	 */
+	if (get_rel_relkind(node->indexid) == RELKIND_GLOBAL_INDEX)
+		tts_cb = &TTSOpsBufferHeapTuple;
+	else
+		tts_cb = table_slot_callbacks(currentRelation);
+
 	/*
 	 * We need another slot, in a format that's suitable for the table AM, for
 	 * when we need to fetch a tuple from the table for rechecking visibility.
 	 */
 	indexstate->ioss_TableSlot =
 		ExecAllocTableSlot(&estate->es_tupleTable,
-						   RelationGetDescr(currentRelation),
-						   table_slot_callbacks(currentRelation));
+						   RelationGetDescr(currentRelation), tts_cb);
 
 	/*
 	 * Initialize result type and projection info.  The node's targetlist will
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe6..6cd041330d 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -911,6 +911,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	IndexScanState *indexstate;
 	Relation	currentRelation;
 	LOCKMODE	lockmode;
+	const TupleTableSlotOps *tts_cb;
 
 	/*
 	 * create state structure
@@ -935,12 +936,23 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	indexstate->ss.ss_currentRelation = currentRelation;
 	indexstate->ss.ss_currentScanDesc = NULL;	/* no heap scan here */
 
+	/*
+	 * FIXME: Global index scans on partitioned tables require
+	 * TTSOpsBufferHeapTuple, but partitioned tables normally get TTSOpsVirtual
+	 * (no TableAM).  We currently hack this by assuming partitions with global
+	 * indexes are Heap AM.  Proper TableAM integration for partitioned tables
+	 * is needed for slot allocation.
+	 */
+	if (get_rel_relkind(node->indexid) == RELKIND_GLOBAL_INDEX)
+		tts_cb = &TTSOpsBufferHeapTuple;
+	else
+		tts_cb = table_slot_callbacks(currentRelation);
+
 	/*
 	 * get the scan type from the relation descriptor.
 	 */
 	ExecInitScanTupleSlot(estate, &indexstate->ss,
-						  RelationGetDescr(currentRelation),
-						  table_slot_callbacks(currentRelation));
+						  RelationGetDescr(currentRelation), tts_cb);
 
 	/*
 	 * Initialize result type and projection.
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6cc6966b06..230a98f221 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1211,6 +1211,12 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		}
 	}
 
+	/*
+	 * We need to check the index predicate for the parent relation, as the
+	 * parent relation may have global index scan paths.
+	 */
+	check_index_predicates(root, rel);
+
 	if (has_live_children)
 	{
 		/*
@@ -1303,6 +1309,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 
 	/* Add paths to the append relation. */
 	add_paths_to_append_rel(root, rel, live_childrels);
+
+	/*
+	 * A partitioned relation may have global indexes, so let's consider
+	 * index scan paths.
+	 */
+	create_index_paths(root, rel);
 }
 
 
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 601354ea3e..8fef652d4a 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -21,6 +21,7 @@
 #include "access/sysattr.h"
 #include "catalog/pg_am.h"
 #include "catalog/pg_amop.h"
+#include "catalog/pg_index_partitions.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_type.h"
@@ -246,6 +247,7 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel)
 	IndexClauseSet jclauseset;
 	IndexClauseSet eclauseset;
 	ListCell   *lc;
+	bool		ispartitioned = IS_PARTITIONED_REL(rel);
 
 	/* Skip the whole mess if no indexes */
 	if (rel->indexlist == NIL)
@@ -259,6 +261,22 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel)
 	{
 		IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
 
+		/*
+		 * For partitioned relations, we can only consider global index scan
+		 * paths.  And for non-partitioned relations, ignore the indirect
+		 * global indexes.
+		 */
+		if ((ispartitioned && index->idxkind != INDEX_GLOBAL_DIRECT) ||
+			(!ispartitioned && index->idxkind != INDEX_LOCAL))
+			continue;
+
+		/*
+		 * For a non-partitioned table we should never see a direct global
+		 * index here.  Check the comments in get_relation_info() where we add
+		 * IndexOptInfo nodes.
+		 */
+		Assert(ispartitioned || index->idxkind != INDEX_GLOBAL_DIRECT);
+
 		/* Protect limited-size array in IndexClauseSets */
 		Assert(index->nkeycolumns <= INDEX_MAX_KEYS);
 
@@ -2228,6 +2246,7 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index)
 {
 	bool		result;
 	Bitmapset  *attrs_used = NULL;
+	Bitmapset  *rowidvar = NULL;
 	Bitmapset  *index_canreturn_attrs = NULL;
 	ListCell   *lc;
 	int			i;
@@ -2248,6 +2267,21 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index)
 	 */
 	pull_varattnos((Node *) rel->reltarget->exprs, rel->relid, &attrs_used);
 
+	/*
+	 * FIXME: Ugly hack to avoid a global index-only scan during update/delete.
+	 * In the normal case it is avoided because the reltarget will contain a
+	 * junk attribute that does not match index_canreturn_attrs.  But with a
+	 * global index we are creating this scan on the parent table, so we would
+	 * have an extra ROWID_VAR that is not caught when calling pull_varattnos
+	 * with rel->relid; hence we search here specifically for ROWID_VAR.
+	 */
+	if (rel->nparts != 0)
+	{
+		pull_varattnos((Node *) rel->reltarget->exprs, ROWID_VAR, &rowidvar);
+		if (rowidvar != NULL)
+			return false;
+	}
+
 	/*
 	 * Add all the attributes used by restriction clauses; but consider only
 	 * those clauses not implied by the index predicate, since ones that are
@@ -2276,9 +2310,11 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index)
 
 		/*
 		 * For the moment, we just ignore index expressions.  It might be nice
-		 * to do something with them, later.
+		 * to do something with them, later.  For a global index we also add
+		 * an internal partition id attribute; ignore it too, since we don't
+		 * need to return that attribute from the index.
 		 */
-		if (attno == 0)
+		if (attno == 0 || attno == PartitionIdAttributeNumber)
 			continue;
 
 		if (index->canreturn[i])
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5467e094ca..922b938f0b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -20,6 +20,7 @@
  */
 #include "postgres.h"
 
+#include "catalog/pg_inherits.h"
 #include "optimizer/appendinfo.h"
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
@@ -28,7 +29,8 @@
 #include "optimizer/paths.h"
 #include "optimizer/placeholder.h"
 #include "optimizer/planmain.h"
-
+#include "storage/lmgr.h"
+#include "storage/lockdefs.h"
 
 /*
  * query_planner
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 549aedcfa9..b63e9c47c1 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -22,6 +22,7 @@
 #include "access/parallel.h"
 #include "access/sysattr.h"
 #include "access/table.h"
+#include "catalog/partition.h"
 #include "catalog/pg_aggregate.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_proc.h"
@@ -58,6 +59,7 @@
 #include "parser/parsetree.h"
 #include "partitioning/partdesc.h"
 #include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
 #include "utils/backend_status.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
@@ -267,7 +269,7 @@ static bool group_by_has_partkey(RelOptInfo *input_rel,
 static int	common_prefix_cmp(const void *a, const void *b);
 static List *generate_setop_child_grouplist(SetOperationStmt *op,
 											List *targetlist);
-
+static void lock_additional_rel(PlannerInfo *root);
 
 /*****************************************************************************
  *
@@ -581,6 +583,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->utilityStmt = parse->utilityStmt;
 	result->stmt_location = parse->stmt_location;
 	result->stmt_len = parse->stmt_len;
+	result->lockrelOids = glob->lockRelOids;
 
 	result->jitFlags = PGJIT_NONE;
 	if (jit_enabled && jit_above_cost >= 0 &&
@@ -1176,6 +1179,13 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	 */
 	SS_identify_outer_params(root);
 
+	/*
+	 * Prepare a list of additional relation OIDs to be locked if there is any
+	 * global index on the result relation, and lock those relations.  For
+	 * more details refer to the function header comments.
+	 */
+	lock_additional_rel(root);
+
 	/*
 	 * If any initPlans were created in this query level, adjust the surviving
 	 * Paths' costs and parallel-safety flags to account for them.  The
@@ -7748,12 +7758,13 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
 	bool		rel_is_partitioned = IS_PARTITIONED_REL(rel);
 	PathTarget *scanjoin_target;
 	ListCell   *lc;
+	List	   *global_index_path_list = NIL;
 
 	/* This recurses, so be paranoid. */
 	check_stack_depth();
 
 	/*
-	 * If the rel is partitioned, we want to drop its existing paths and
+	 * If the rel is partitioned, we want to drop its existing append paths and
 	 * generate new ones.  This function would still be correct if we kept the
 	 * existing paths: we'd modify them to generate the correct target above
 	 * the partitioning Append, and then they'd compete on cost with paths
@@ -7770,9 +7781,57 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
 	 * stanza.  Hence, zap the main pathlist here, then allow
 	 * generate_useful_gather_paths to add path(s) to the main list, and
 	 * finally zap the partial pathlist.
+	 *
+	 * Note: All the partitioned rel paths that are built by appending child
+	 * rel paths will be rebuilt, so we need to preserve the global index
+	 * paths that are created directly on the partitioned relation.
 	 */
 	if (rel_is_partitioned)
+	{
+		List	*newtarget = NIL;
+		PathTarget *index_scanjoin_target;
+
+		/*
+		 * Preprocess the scanjoin_targets and replace ROWID_VAR with the
+		 * partitioned rel's varno.  TODO: explain the reasoning here.
+		 */
+		foreach(lc, scanjoin_targets)
+		{
+			PathTarget *target = lfirst_node(PathTarget, lc);
+
+			target = copy_pathtarget(target);
+			target->exprs = (List *)
+				adjust_appendrel_rowid_vars(root, (Node *) target->exprs,
+											rel->relid);
+			newtarget = lappend(newtarget, target);
+		}
+		/* Extract SRF-free scan/join target. */
+		index_scanjoin_target = linitial_node(PathTarget, newtarget);
+
+		/*
+		 * As explained in the comments above, skip all paths other than the
+		 * global index paths, as the others will be built again.  So process
+		 * the global index paths and apply the index_scanjoin_target to them.
+		 */
+		foreach(lc, rel->pathlist)
+		{
+			Path	*path = (Path *) lfirst(lc);
+			Path	*newpath;
+
+			if (nodeTag(path) != T_IndexPath)
+				continue;
+
+			newpath = (Path *) create_projection_path(root, rel, path,
+													  index_scanjoin_target);
+			global_index_path_list = lappend(global_index_path_list, newpath);
+		}
+
+		/*
+		 * For now set rel->pathlist to NIL; once we have regenerated the
+		 * append paths, add the preserved global index paths back to the list.
+		 */
 		rel->pathlist = NIL;
+	}
 
 	/*
 	 * If the scan/join target is not parallel-safe, partial paths cannot
@@ -7935,6 +7994,9 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
 
 		/* Build new paths for this relation by appending child paths. */
 		add_paths_to_append_rel(root, rel, live_children);
+
+		if (global_index_path_list)
+			rel->pathlist = list_concat(rel->pathlist, global_index_path_list);
 	}
 
 	/*
@@ -8248,3 +8310,76 @@ generate_setop_child_grouplist(SetOperationStmt *op, List *targetlist)
 
 	return grouplist;
 }
+
+
+/*
+ * lock_additional_rel
+ *	Lock additional relations in the presence of a global index, and record
+ *	those OIDs in PlannerGlobal so they can be re-locked at plan reuse.
+ *
+ * During DML operations on tables with global indexes, it's necessary to
+ * lock the entire partition tree of the partitioned relation that holds
+ * the global index.
+ */
+static void
+lock_additional_rel(PlannerInfo *root)
+{
+	Query	   *parse = root->parse;
+	RelOptInfo *rel;
+	ListCell   *lc;
+	List	   *lockreloids = NIL;
+
+	/* Nothing to do if there is no result relation. */
+	if (parse->resultRelation <= 0)
+		return;
+
+	/*
+	 * Fetch the RelOptInfo of the result relation.  If we haven't built it
+	 * already then do it now.
+	 */
+	rel = find_base_rel_noerr(root, parse->resultRelation);
+	if (rel == NULL)
+	{
+		RangeTblEntry  *rte = root->simple_rte_array[parse->resultRelation];
+
+		/*
+		 * If we don't have a global index on the result relation then we
+		 * don't need to do anything.
+		 */
+		if (!get_rel_has_globalindex(rte->relid))
+			return;
+
+		rel = build_simple_rel(root, parse->resultRelation, NULL);
+	}
+
+	/*
+	 * Loop through all the indexes of the result relation; for each global
+	 * index, lock all the inheritors under the relation on which that
+	 * global index is created.  Also store the list of all the OIDs
+	 * in PlannerGlobal.
+	 */
+	foreach(lc, rel->indexlist)
+	{
+		IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
+		List		 *childrel = NIL;
+
+		if (index->idxkind == INDEX_LOCAL)
+			continue;
+
+		if (list_member_oid(lockreloids, index->indrelid))
+			continue;
+
+		/*
+		 * Acquire lock on top level parent on which the global index is
+		 * created and also lock all its inheritors.
+		 */
+		LockRelationOid(index->indrelid, RowExclusiveLock);
+		lockreloids = lappend_oid(lockreloids, index->indrelid);
+		childrel = find_all_inheritors(index->indrelid, RowExclusiveLock,
+										NULL);
+		lockreloids = list_concat(lockreloids, childrel);
+	}
+
+	root->glob->lockRelOids =
+				list_concat_unique_oid(root->glob->lockRelOids, lockreloids);
+}
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 5b3dc0d865..2ad52cb497 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -32,6 +32,7 @@ typedef struct
 {
 	PlannerInfo *root;
 	int			nappinfos;
+	int			varno;
 	AppendRelInfo **appinfos;
 } adjust_appendrel_attrs_context;
 
@@ -41,7 +42,8 @@ static void make_inh_translation_list(Relation oldrelation,
 									  AppendRelInfo *appinfo);
 static Node *adjust_appendrel_attrs_mutator(Node *node,
 											adjust_appendrel_attrs_context *context);
-
+static Node *adjust_appendrel_rowid_vars_mutator(Node *node,
+								adjust_appendrel_attrs_context *context);
 
 /*
  * make_append_rel_info
@@ -529,6 +531,62 @@ adjust_appendrel_attrs_mutator(Node *node,
 	return expression_tree_mutator(node, adjust_appendrel_attrs_mutator, context);
 }
 
+/*
+ * Replace ROWID_VAR with the varno.
+ *
+ * This is similar to adjust_appendrel_attrs(), except that here, instead of
+ * preparing the scan target for the appendrel, we are preparing it for the
+ * partitioned rel; the varno of the partitioned rel is passed as input and
+ * we need to replace the ROWID_VAR with that varno.
+ */
+Node *
+adjust_appendrel_rowid_vars(PlannerInfo *root, Node *node, int varno)
+{
+	adjust_appendrel_attrs_context context;
+
+	context.root = root;
+	context.nappinfos = 0;
+	context.varno = varno;
+
+	/* Should never be translating a Query tree. */
+	Assert(node == NULL || !IsA(node, Query));
+
+	return adjust_appendrel_rowid_vars_mutator(node, &context);
+}
+
+static Node *
+adjust_appendrel_rowid_vars_mutator(Node *node,
+									adjust_appendrel_attrs_context *context)
+{
+	if (node == NULL)
+		return NULL;
+	if (IsA(node, Var))
+	{
+		Var		   *var = (Var *) copyObject(node);
+
+		if (var->varno == ROWID_VAR)
+		{
+			RowIdentityVarInfo *ridinfo = (RowIdentityVarInfo *)
+						list_nth(context->root->row_identity_vars, var->varattno - 1);
+
+			/* Substitute the Var given in the RowIdentityVarInfo */
+			var = copyObject(ridinfo->rowidvar);
+
+			/* Replace the ROWID_VAR with the varno of the partitioned rel. */
+			var->varno = context->varno;
+			/* identity vars shouldn't have nulling rels */
+			Assert(var->varnullingrels == NULL);
+			/* varnosyn in the RowIdentityVarInfo is probably wrong */
+			var->varnosyn = 0;
+			var->varattnosyn = 0;
+		}
+
+		return (Node *) var;
+	}
+	return expression_tree_mutator(node, adjust_appendrel_rowid_vars_mutator,
+								   (void *) context);
+}
+
 /*
  * adjust_appendrel_attrs_multilevel
  *	  Apply Var translations from an appendrel parent down to a child.
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index c716f9a6fe..576a7f97f4 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -35,6 +35,7 @@
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/pathnodes.h"
 #include "nodes/supportnodes.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
@@ -268,15 +269,6 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 				continue;
 			}
 
-			/*
-			 * TODO: Global index scan paths are not yet supported.
-			 */
-			if (RelationIsGlobalIndex(indexRelation))
-			{
-				index_close(indexRelation, NoLock);
-				continue;
-			}
-
 			/*
 			 * If the index is valid, but cannot yet be used, ignore it; but
 			 * mark the plan we are generating as transient. See
@@ -293,7 +285,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 			info = makeNode(IndexOptInfo);
 
+			/* Set a flag to indicate this is a global index. */
+			if (RelationIsGlobalIndex(indexRelation))
+				info->idxkind = (index->indrelid == relationObjectId) ?
+								INDEX_GLOBAL_DIRECT : INDEX_GLOBAL_INDIRECT;
+
 			info->indexoid = index->indexrelid;
+			info->indrelid = index->indrelid;
 			info->reltablespace =
 				RelationGetForm(indexRelation)->reltablespace;
 			info->rel = rel;
@@ -333,15 +331,28 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 				info->amoptionalkey = amroutine->amoptionalkey;
 				info->amsearcharray = amroutine->amsearcharray;
 				info->amsearchnulls = amroutine->amsearchnulls;
-				info->amcanparallel = amroutine->amcanparallel;
 				info->amhasgettuple = (amroutine->amgettuple != NULL);
-				info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
-					relation->rd_tableam->scan_bitmap_next_tuple != NULL;
 				info->amcanmarkpos = (amroutine->ammarkpos != NULL &&
 									  amroutine->amrestrpos != NULL);
 				info->amcostestimate = amroutine->amcostestimate;
 				Assert(info->amcostestimate != NULL);
 
+				/*
+				 * TODO: Currently parallel and bitmap scans are not supported
+				 * for global indexes.
+				 */
+				if (info->idxkind != INDEX_LOCAL)
+				{
+					info->amcanparallel = false;
+					info->amhasgetbitmap = false;
+				}
+				else
+				{
+					info->amcanparallel = amroutine->amcanparallel;
+					info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
+					relation->rd_tableam->scan_bitmap_next_tuple != NULL;
+				}
+
 				/* Fetch index opclass options */
 				info->opclassoptions = RelationGetIndexAttOptions(indexRelation, true);
 
@@ -1932,7 +1943,13 @@ build_index_tlist(PlannerInfo *root, IndexOptInfo *index,
 			/* simple column */
 			const FormData_pg_attribute *att_tup;
 
-			if (indexkey < 0)
+			/*
+			 * If the attribute number is PartitionIdAttributeNumber then
+			 * use the predefined partitionid_attr constant directly.
+			 */
+			if (indexkey == PartitionIdAttributeNumber)
+				att_tup = &partitionid_attr;
+			else if (indexkey < 0)
 				att_tup = SystemAttributeDefinition(indexkey);
 			else
 				att_tup = TupleDescAttr(heapRelation->rd_att, indexkey - 1);
diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index 8065237a18..3fd7bc949f 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -21,6 +21,7 @@
 #include "postgres.h"
 
 #include "access/sysattr.h"
+#include "catalog/pg_index_partitions.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index d354f44e66..1dc7fd2ae4 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -4266,6 +4266,7 @@ transformPartitionCmd(CreateStmtContext *cxt, PartitionCmd *cmd)
 							RelationGetRelationName(parentRel))));
 			break;
 		case RELKIND_INDEX:
+		case RELKIND_GLOBAL_INDEX:
 			/* the index must be partitioned */
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index ce6a626eba..7d3082a54b 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -6500,6 +6500,8 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 		/* Ignore non-ordering indexes */
 		if (index->sortopfamily == NULL)
 			continue;
+		if (index->idxkind != INDEX_LOCAL)
+			continue;
 
 		/*
 		 * Ignore partial indexes --- we only want stats that cover the entire
@@ -6720,6 +6722,8 @@ get_actual_variable_endpoint(Relation heapRel,
 	InitNonVacuumableSnapshot(SnapshotNonVacuumable,
 							  GlobalVisTestFor(heapRel));
 
+	Assert(!RelationIsGlobalIndex(indexRel));
+
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
 								 1, 0);
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 89a1c79e98..412628872c 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1928,6 +1928,21 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
 			else
 				UnlockRelationOid(rte->relid, rte->rellockmode);
 		}
+
+		/*
+		 * Loop through the lockrelOids derived from the result relations and
+		 * acquire a lock on each of those relations.  We could store the
+		 * lockmode along with each OID, but we can directly use
+		 * RowExclusiveLock because these are derived from result relations,
+		 * and result relations are locked in that mode.
+		 */
+		foreach_oid(relid, plannedstmt->lockrelOids)
+		{
+			if (acquire)
+				LockRelationOid(relid, RowExclusiveLock);
+			else
+				UnlockRelationOid(relid, RowExclusiveLock);
+		}
 	}
 }
 
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 778ec2815c..8624ece5d7 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1923,7 +1923,8 @@ describeOneTableDetails(const char *schemaname,
 		attgenerated_col = cols++;
 	}
 	if (tableinfo.relkind == RELKIND_INDEX ||
-		tableinfo.relkind == RELKIND_PARTITIONED_INDEX)
+		tableinfo.relkind == RELKIND_PARTITIONED_INDEX ||
+		tableinfo.relkind == RELKIND_GLOBAL_INDEX)
 	{
 		if (pset.sversion >= 110000)
 		{
@@ -2308,7 +2309,8 @@ describeOneTableDetails(const char *schemaname,
 	}
 
 	if (tableinfo.relkind == RELKIND_INDEX ||
-		tableinfo.relkind == RELKIND_PARTITIONED_INDEX)
+		tableinfo.relkind == RELKIND_PARTITIONED_INDEX ||
+		tableinfo.relkind == RELKIND_GLOBAL_INDEX)
 	{
 		/* Footer information about an index */
 		PGresult   *result;
@@ -2412,7 +2414,8 @@ describeOneTableDetails(const char *schemaname,
 			/*
 			 * If it's a partitioned index, we'll print the tablespace below
 			 */
-			if (tableinfo.relkind == RELKIND_INDEX)
+			if (tableinfo.relkind == RELKIND_INDEX ||
+				tableinfo.relkind == RELKIND_GLOBAL_INDEX)
 				add_tablespace_footer(&cont, tableinfo.relkind,
 									  tableinfo.tablespace, true);
 		}
@@ -3666,6 +3669,7 @@ add_tablespace_footer(printTableContent *const cont, char relkind,
 		relkind == RELKIND_INDEX ||
 		relkind == RELKIND_PARTITIONED_TABLE ||
 		relkind == RELKIND_PARTITIONED_INDEX ||
+		relkind == RELKIND_GLOBAL_INDEX ||
 		relkind == RELKIND_TOASTVALUE)
 	{
 		/*
@@ -4055,6 +4059,7 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys
 					  " WHEN " CppAsString2(RELKIND_FOREIGN_TABLE) " THEN '%s'"
 					  " WHEN " CppAsString2(RELKIND_PARTITIONED_TABLE) " THEN '%s'"
 					  " WHEN " CppAsString2(RELKIND_PARTITIONED_INDEX) " THEN '%s'"
+					  " WHEN " CppAsString2(RELKIND_GLOBAL_INDEX) " THEN '%s'"
 					  " END as \"%s\",\n"
 					  "  pg_catalog.pg_get_userbyid(c.relowner) as \"%s\"",
 					  gettext_noop("Schema"),
@@ -4068,6 +4073,7 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys
 					  gettext_noop("foreign table"),
 					  gettext_noop("partitioned table"),
 					  gettext_noop("partitioned index"),
+					  gettext_noop("global index"),
 					  gettext_noop("Type"),
 					  gettext_noop("Owner"));
 	cols_so_far = 4;
@@ -4148,7 +4154,8 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys
 		appendPQExpBufferStr(&buf, CppAsString2(RELKIND_MATVIEW) ",");
 	if (showIndexes)
 		appendPQExpBufferStr(&buf, CppAsString2(RELKIND_INDEX) ","
-							 CppAsString2(RELKIND_PARTITIONED_INDEX) ",");
+							 CppAsString2(RELKIND_PARTITIONED_INDEX) ","
+							 CppAsString2(RELKIND_GLOBAL_INDEX) ",");
 	if (showSeq)
 		appendPQExpBufferStr(&buf, CppAsString2(RELKIND_SEQUENCE) ",");
 	if (showSystem || pattern)
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5..ec032ceda6 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -15,6 +15,8 @@
 #define GENAM_H
 
 #include "access/htup.h"
+#include "access/itup.h"
+#include "access/relscan.h"
 #include "access/sdir.h"
 #include "access/skey.h"
 #include "nodes/tidbitmap.h"
@@ -265,6 +267,10 @@ extern SysScanDesc systable_beginscan_ordered(Relation heapRelation,
 extern HeapTuple systable_getnext_ordered(SysScanDesc sysscan,
 										  ScanDirection direction);
 extern void systable_endscan_ordered(SysScanDesc sysscan);
+extern Relation globalindex_partition_rel_lookup(GlobalIndexPartitionCache pdir, Oid relid);
+extern void globalindex_partition_cache_destroy(GlobalIndexPartitionCache pdir);
+extern GlobalIndexPartitionCache create_globalindex_partition_cache(MemoryContext mcxt);
+extern void global_indexscan_setup_partrel(IndexScanDesc scan);
 extern void systable_inplace_update_begin(Relation relation,
 										  Oid indexId,
 										  bool indexOK,
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index cf7ddb0131..435a74749a 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -1009,6 +1009,9 @@ typedef BTVacuumPostingData *BTVacuumPosting;
 
 typedef struct BTScanPosItem	/* what we remember about each match */
 {
+	Oid		heapOid;	/* Oid of the partition relation; only valid for
+						   global indexes, since a global index can hold
+						   tuples from multiple partitions */
 	ItemPointerData heapTid;	/* TID of referenced heap item */
 	OffsetNumber indexOffset;	/* index item's location within page */
 	LocationIndex tupleOffset;	/* IndexTuple's offset in workspace, if any */
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c..8d0925d504 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -125,6 +125,8 @@ typedef struct IndexFetchTableData
 
 struct IndexScanInstrumentation;
 
+typedef struct GlobalIndexPartitionCacheData *GlobalIndexPartitionCache;
+
 /*
  * We use the same IndexScanDescData structure for both amgettuple-based
  * and amgetbitmap-based index scans.  Some fields are only relevant in
@@ -168,7 +170,9 @@ typedef struct IndexScanDescData
 	struct TupleDescData *xs_itupdesc;	/* rowtype descriptor of xs_itup */
 	HeapTuple	xs_hitup;		/* index data returned by AM, as HeapTuple */
 	struct TupleDescData *xs_hitupdesc; /* rowtype descriptor of xs_hitup */
-
+	Oid			xs_heapoid;		/* Oid of the partition relation; only valid
+								   for global indexes, since a global index
+								   can hold tuples from multiple partitions */
 	ItemPointerData xs_heaptid; /* result */
 	bool		xs_heap_continue;	/* T if must keep walking, potential
 									 * further results */
@@ -189,6 +193,8 @@ typedef struct IndexScanDescData
 
 	/* parallel index scan information, in shared memory */
 	struct ParallelIndexScanDescData *parallel_scan;
+	bool		xs_global_index;
+	GlobalIndexPartitionCache xs_global_index_cache;
 }			IndexScanDescData;
 
 /* Generic structure for parallel scans */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6567759595..fbae020a4c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -153,6 +153,9 @@ typedef struct PlannerGlobal
 	/* type OIDs for PARAM_EXEC Params */
 	List	   *paramExecTypes;
 
+	/* additional relation OIDs to be locked for global index */
+	List	   *lockRelOids;
+
 	/* highest PlaceHolderVar ID assigned */
 	Index		lastPHId;
 
@@ -856,6 +859,13 @@ typedef enum RelOptKind
 	RELOPT_OTHER_UPPER_REL,
 } RelOptKind;
 
+typedef enum IndexKind
+{
+	INDEX_LOCAL,
+	INDEX_GLOBAL_DIRECT,
+	INDEX_GLOBAL_INDIRECT
+} IndexKind;
+
 /*
  * Is the given relation a simple relation i.e a base or "other" member
  * relation?
@@ -1143,6 +1153,14 @@ struct IndexOptInfo
 	Oid			indexoid;
 	/* tablespace of index (not table) */
 	Oid			reltablespace;
+
+	/*
+	 * OID of the relation on which the index is defined.  For a normal
+	 * index the RelOptInfo back-link identifies that relation, but for a
+	 * global index we need this explicitly, since a global index may be
+	 * defined on an upper-level parent relation.
+	 */
+	Oid			indrelid;
 	/* back-link to index's table; don't print, else infinite recursion */
 	RelOptInfo *rel pg_node_attr(read_write_ignore);
 
@@ -1206,6 +1224,9 @@ struct IndexOptInfo
 	 */
 	List	   *indrestrictinfo;
 
+	/* whether the index is local, direct global, or indirect global */
+	IndexKind	idxkind;
+
 	/* true if index predicate matches query */
 	bool		predOK;
 	/* true if a unique index */
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 4f59e30d62..c07a8f14fc 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -122,6 +122,9 @@ typedef struct PlannedStmt
 	/* OIDs of relations the plan depends on */
 	List	   *relationOids;
 
+	/* OIDs of relations to be locked */
+	List       *lockrelOids;
+
 	/* other dependencies, as PlanInvalItems */
 	List	   *invalItems;
 
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index d06f93b726..f8fd66c657 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -22,6 +22,8 @@ extern AppendRelInfo *make_append_rel_info(Relation parentrel,
 										   Index parentRTindex, Index childRTindex);
 extern Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
 									int nappinfos, AppendRelInfo **appinfos);
+extern Node *adjust_appendrel_rowid_vars(PlannerInfo *root, Node *node,
+										 int varno);
 extern Node *adjust_appendrel_attrs_multilevel(PlannerInfo *root, Node *node,
 											   RelOptInfo *childrel,
 											   RelOptInfo *parentrel);
-- 
2.49.0

#14Amit Langote
amitlangote09@gmail.com
In reply to: Dilip Kumar (#13)
Re: Proposal: Global Index for PostgreSQL

Hi Dilip,

Happy to see you working on this. It’s clear a lot of thought has
gone into the design.

On Tue, Jul 1, 2025 at 6:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

6) Need to perform performance tests for the SELECT/UPDATE/INSERT
cases; we already know the VACUUM performance.

One point I want to check my understanding of is around the locking
implications of global index scans, especially in prepared statement
scenarios.

I’ve been working on improving how we handle partition locking during
execution of generic plans. Specifically, I committed a patch to defer
locking of partitions until after pruning during ExecutorStart(), so
we avoid taking locks on partitions that aren’t actually needed --
even when the plan contains scans on all partitions. That patch was
later reverted, as Tom pointed out that the plan invalidation logic
wasn't cleanly handled. But the goal remains: to avoid locking
unnecessary partitions, particularly in high-partition-count OLTP
setups that use PREPARE/EXECUTE.

The proposed global index design, IIUC, requires locking all leaf
partitions up front during planning, and I guess during
AcquireExecutorLocks() when using a cached plan, because the index
scan could return tuples from any partition. This seems to directly
undercut that effort: we'd be back to generic plans causing broad
locking regardless of actual runtime needs.

I understand that this is currently necessary, given that a global
index scan is a single node without per-partition awareness. But it
might be worth considering whether the scan could opportunistically
defer heap relation locking until it returns a tuple that actually
belongs to a particular partition -- similar to how inserts into
partitioned tables only lock the target partition at execution time.
Or did I miss that inserts also need to lock all partitions up front
when global indexes are present, due to cross-partition uniqueness
checks?

Let me know if I’ve misunderstood the design.

--
Thanks, Amit Langote

#15Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Langote (#14)
Re: Proposal: Global Index for PostgreSQL

On Tue, Jul 1, 2025 at 7:12 PM Amit Langote <amitlangote09@gmail.com> wrote:

Hi Dilip,

Happy to see you working on this. It’s clear a lot of thought has
gone into the design.

Thanks, Amit. And thanks for your comment.

On Tue, Jul 1, 2025 at 6:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

6) Need to perform performance tests for the SELECT/UPDATE/INSERT
cases; we already know the VACUUM performance.

One point I want to check my understanding of is around the locking
implications of global index scans, especially in prepared statement
scenarios.

Sure

I’ve been working on improving how we handle partition locking during
execution of generic plans. Specifically, I committed a patch to defer
locking of partitions until after pruning during ExecutorStart(), so
we avoid taking locks on partitions that aren’t actually needed --
even when the plan contains scans on all partitions. That patch was
later reverted, as Tom pointed out that the plan invalidation logic
wasn't cleanly handled.

Yes, I was following that thread, and while working on locking for
global indexes I had in mind that I would need to revisit the locking
once that patch was in. Unfortunately it got reverted, and since then
I haven't put any effort into reconsidering how locking is handled
for global indexes.

But the goal remains: to avoid locking unnecessary partitions,
particularly in high-partition-count OLTP setups that use
PREPARE/EXECUTE.

That makes sense.

The proposed global index design, IIUC, requires locking all leaf
partitions up front during planning, and I guess during
AcquireExecutorLocks() when using a cached plan, because the index
scan could return tuples from any partition. This seems to directly
undercut that effort: we'd be back to generic plans causing broad
locking regardless of actual runtime needs.

Just trying to understand the locking difference with and without a
global index: suppose we have a normal partitioned index on a
non-partition-key column. If we issue a scan on the partitioned
table, then internally expand_partitioned_rtentry() will take a lock
on all the partitions, because we cannot prune any of them.
Similarly, with a global index, all the child partitions under the
top partition on which the global index is defined will be locked. So
in this case there is no difference.
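
To make that concrete, here is a rough sketch (mine, not code from
the patch) of the plan-time behavior in both cases; every leaf under
the parent ends up locked, because find_all_inheritors() locks each
child as it walks the hierarchy:

#include "postgres.h"
#include "catalog/pg_inherits.h"
#include "nodes/pg_list.h"

/*
 * Sketch only: with or without a global index, a scan that cannot
 * prune locks every leaf under the parent at planning time.
 */
static void
lock_all_leaves(Oid parentrelid, LOCKMODE lockmode)
{
	/* find_all_inheritors() takes 'lockmode' on each relation it visits */
	List	   *all_rels = find_all_inheritors(parentrelid, lockmode, NULL);

	list_free(all_rels);
}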

I understand that this is currently necessary, given that a global
index scan is a single node without per-partition awareness. But it
might be worth considering whether the scan could opportunistically
defer heap relation locking until it returns a tuple that actually
belongs to a particular partition -- similar to how inserts into
partitioned tables only lock the target partition at execution time.
Or did I miss that inserts also need to lock all partitions up front
when global indexes are present, due to cross-partition uniqueness
checks?

Let me know if I’ve misunderstood the design.

So the difference is in the cases where we are operating directly on
the leaf table. For example, if you insert directly into the leaf
relation, currently we just need to lock that partition, but if there
is a global index we need to lock the other siblings as well (in
short, all the leaves under the parent that has the global index),
because if the global index is unique we might need to check for
unique conflicts in the other leaves too. I believe that when a table
is partitioned, operating directly on a leaf may not be the preferred
way to use it, and only this case, DML issued directly on a leaf, is
impacted by a global index. I am not sure how much we can delay the
locking in this case, because e.g. for an insert we only identify
which partition has a duplicate key while inserting into the btree.
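
To illustrate the leaf-DML case, here is a minimal sketch (mine, not
the patch's code; get_globalindex_parent() is a hypothetical helper
for finding the parent that owns the global index):

#include "postgres.h"
#include "catalog/pg_inherits.h"
#include "nodes/pg_list.h"

/* hypothetical helper, not in the patch */
extern Oid	get_globalindex_parent(Oid leafrelid);

/*
 * Sketch only: DML on one leaf covered by a unique global index has
 * to lock all sibling leaves up front, since the unique-conflict
 * check may land in any of them.
 */
static void
lock_siblings_for_unique_check(Oid leafrelid, LOCKMODE lockmode)
{
	Oid			parentrelid = get_globalindex_parent(leafrelid);
	List	   *all_rels = find_all_inheritors(parentrelid, lockmode, NULL);

	list_free(all_rels);
}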

--
Regards,
Dilip Kumar
Google

#16Amit Langote
amitlangote09@gmail.com
In reply to: Dilip Kumar (#15)
Re: Proposal: Global Index for PostgreSQL

On Wed, Jul 2, 2025 at 1:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jul 1, 2025 at 7:12 PM Amit Langote <amitlangote09@gmail.com> wrote:

I’ve been working on improving how we handle partition locking during
execution of generic plans. Specifically, I committed a patch to defer
locking of partitions until after pruning during ExecutorStart(), so
we avoid taking locks on partitions that aren’t actually needed --
even when the plan contains scans on all partitions. That patch was
later reverted, as Tom pointed out that the plan invalidation logic
wasn't cleanly handled.

Yes, I was following that thread, and while working on locking for
global indexes I had in mind that I would need to revisit the locking
once that patch was in. Unfortunately it got reverted, and since then
I haven't put any effort into reconsidering how locking is handled
for global indexes.

...

But the goal remains: to avoid locking unnecessary partitions,
particularly in high-partition-count OLTP setups that use
PREPARE/EXECUTE.

That makes sense.

Ok, good to know you were keeping tabs on it.

The proposed global index design, IIUC, requires locking all leaf
partitions up front during planning, and I guess during
AcquireExecutorLocks() when using a cached plan, because the index
scan could return tuples from any partition. This seems to directly
undercut that effort: we'd be back to generic plans causing broad
locking regardless of actual runtime needs.

Just trying to understand the locking difference with and without a
global index: suppose we have a normal partitioned index on a
non-partition-key column. If we issue a scan on the partitioned
table, then internally expand_partitioned_rtentry() will take a lock
on all the partitions, because we cannot prune any of them.
Similarly, with a global index, all the child partitions under the
top partition on which the global index is defined will be locked. So
in this case there is no difference.

Just to clarify -- I was hoping that, at least for SELECTs, we
wouldn’t need to lock all leaf partitions up front.

One of the potential selling points of global indexes (compared to
partitioned indexes) is that we can avoid expand_partitioned_rtentry()
and the full scan path setup per partition, though that's admittedly
quite an undertaking. So I was imagining we could just lock the
parent and the global index during planning, and only lock individual
heap relations at execution time -- once we know which partition the
returned tuple belongs to.

Locking isn’t cheap -- and in workloads with thousands of partitions,
it becomes a major source of overhead, especially when the query
doesn't contain pruning quals and ends up touching only a few
partitions in practice. So I think it’s worth seeing if we can avoid
planning-time locking of all partitions in at least the SELECT case,
even if INSERTs may require broader locking due to uniqueness checks,
but see below...

I understand that this is currently necessary, given that a global
index scan is a single node without per-partition awareness. But it
might be worth considering whether the scan could opportunistically
defer heap relation locking until it returns a tuple that actually
belongs to a particular partition -- similar to how inserts into
partitioned tables only lock the target partition at execution time.
Or did I miss that inserts also need to lock all partitions up front
when global indexes are present, due to cross-partition uniqueness
checks?

Let me know if I’ve misunderstood the design.

So the difference is in the cases where we are operating directly on
the leaf table. For example, if you insert directly into the leaf
relation, currently we just need to lock that partition, but if there
is a global index we need to lock the other siblings as well (in
short, all the leaves under the parent that has the global index),
because if the global index is unique we might need to check for
unique conflicts in the other leaves too. I believe that when a table
is partitioned, operating directly on a leaf may not be the preferred
way to use it, and only this case, DML issued directly on a leaf, is
impacted by a global index. I am not sure how much we can delay the
locking in this case, because e.g. for an insert we only identify
which partition has a duplicate key while inserting into the btree.

Hmm, it’s my understanding that with a true global index, meaning a
single btree structure spanning all partitions, uniqueness conflicts
are detected by probing the index after inserting the tuple into the
heap. So unless we find a matching key in the index, there is no need
to consult any other partitions.

Even if a match is found, we only need to access the heap page for
that specific TID to check visibility, and that would involve just one
partition.

Why then do we need to lock all leaf partitions in advance? It seems
like we could defer locking until the uniqueness check identifies a
partition that actually contains a conflicting tuple, and only then
lock that one heap.
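
Roughly, I am imagining something like this sketch (mine, not code
from the patch; it assumes the AM has already resolved the matched
tuple's partition identifier into the patch's xs_heapoid field):

#include "postgres.h"
#include "access/genam.h"
#include "access/relscan.h"
#include "access/table.h"
#include "storage/lmgr.h"

/*
 * Sketch only: after the unique check finds a conflicting match,
 * lock and open just the one partition the match points into,
 * instead of having locked every leaf in advance.
 */
static Relation
open_conflicting_partition(IndexScanDesc scan)
{
	/* lock on the fly; cheap if this backend already holds the lock */
	LockRelationOid(scan->xs_heapoid, RowExclusiveLock);

	/* NoLock: the lock was just acquired above */
	return table_open(scan->xs_heapoid, NoLock);
}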

I understand that in some earlier floated ideas for enforcing global
uniqueness (perhaps only briefly mentioned in past discussions), the
approach was to scan all per-partition indexes, which would have
required locking everything up front. But with a unified global index,
that overhead seems avoidable.

Is there something subtle that I am missing that still requires
locking all heaps in advance?

--
Thanks, Amit Langote

#17Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Langote (#16)
Re: Proposal: Global Index for PostgreSQL

On Wed, Jul 2, 2025 at 7:18 PM Amit Langote <amitlangote09@gmail.com> wrote:

Just to clarify -- I was hoping that, at least for SELECTs, we
wouldn’t need to lock all leaf partitions up front.

One of the potential selling points of global indexes (compared to
partitioned indexes) is that we can avoid expand_partitioned_rtentry()
and the full scan path setup per partition, though that's admittedly
quite an undertaking. So I was imagining we could just lock the
parent and the global index during planning, and only lock individual
heap relations at execution time -- once we know which partition the
returned tuple belongs to.

Locking isn’t cheap -- and in workloads with thousands of partitions,
it becomes a major source of overhead, especially when the query
doesn't contain pruning quals and ends up touching only a few
partitions in practice. So I think it’s worth seeing if we can avoid
planning-time locking of all partitions in at least the SELECT case,
even if INSERTs may require broader locking due to uniqueness checks,
but see below...

...

Hmm, it’s my understanding that with a true global index, meaning a
single btree structure spanning all partitions, uniqueness conflicts
are detected by probing the index after inserting the tuple into the
heap. So unless we find a matching key in the index, there is no need
to consult any other partitions.

Even if a match is found, we only need to access the heap page for
that specific TID to check visibility, and that would involve just one
partition.

Why then do we need to lock all leaf partitions in advance? It seems
like we could defer locking until the uniqueness check identifies a
partition that actually contains a conflicting tuple, and only then
lock that one heap.

I understand that in some earlier floated ideas for enforcing global
uniqueness (perhaps only briefly mentioned in past discussions), the
approach was to scan all per-partition indexes, which would have
required locking everything up front. But with a unified global index,
that overhead seems avoidable.

Is there something subtle that I am missing that still requires
locking all heaps in advance?

Thanks Amit for raising this question. I understand your point, and I
think this is mainly about whether to lock all the tables upfront or
to delay locking until we actually need it. In the very first patch,
at least for the unique-checking case, I implemented locking the
table on the fly whenever we found a duplicate tuple, and I had a
discussion with Robert about exactly this.

Here's what I remember our thinking was at the time, though Robert
might recall it differently. We concluded that the code was designed
so that the planner would decide upfront which tables the plan
needed, lock them, and keep them in the range table; for a prepared
plan, AcquireExecutorLocks() would handle locking all of them before
execution began. So I changed the logic from locking on the fly to
locking everything upfront.

I'm now inclined to agree with your point. When dealing with many
partitions, especially when it's uncertain whether all of them will
even be needed, the traditional approach of locking every potential
partition upfront isn't the most efficient strategy. But one problem
is that we only discover which partitions to lock (whether during a
scan, or while checking uniqueness on insert into the index) once we
are in the btree code, so as of now I am not convinced it is
strategically correct to lock and build the relation descriptor right
then and there. I don't have a concrete reason why it would be bad,
but one concern is that this could be the first code to do it, and
another is that if we have to lock a huge number of partitions (in
case a scan finds matching tuples from many partitions), locking all
of them on the fly from index AM code might not be the best idea.
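
Just to show what "locking on the fly from the index AM" would look
like, here is a sketch (mine, not what the patch does) using the
heapOid that the patch stores in each BTScanPosItem:

#include "postgres.h"
#include "access/nbtree.h"
#include "storage/lmgr.h"

/*
 * Sketch only: lock the matched tuple's heap just before handing the
 * TID back to the executor.  A real version would have to remember
 * which partitions are already locked, e.g. in the patch's
 * GlobalIndexPartitionCache, to avoid hammering the lock manager.
 */
static void
return_item_with_lock(IndexScanDesc scan, BTScanPosItem *item)
{
	if (scan->xs_global_index)
		LockRelationOid(item->heapOid, AccessShareLock);

	scan->xs_heapoid = item->heapOid;
	scan->xs_heaptid = item->heapTid;
}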

What's your thought on this? I would appreciate it if Robert Haas
could share his thoughts as well.

--
Regards,
Dilip Kumar
Google

#18Amit Langote
amitlangote09@gmail.com
In reply to: Dilip Kumar (#17)
Re: Proposal: Global Index for PostgreSQL

Hi Dilip,

Sorry for the late reply.

On Thu, Jul 3, 2025 at 1:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

...

Thanks Amit for raising this question. I understand your point, and I
think this is mainly about whether to lock all the tables upfront or
to delay locking until we actually need it. In the very first patch,
at least for the unique-checking case, I implemented locking the
table on the fly whenever we found a duplicate tuple, and I had a
discussion with Robert about exactly this.

Here's what I remember our thinking was at the time, though Robert
might recall it differently. We concluded that the code was designed
so that the planner would decide upfront which tables the plan
needed, lock them, and keep them in the range table; for a prepared
plan, AcquireExecutorLocks() would handle locking all of them before
execution began. So I changed the logic from locking on the fly to
locking everything upfront.

I'm now inclined to agree with your point. When dealing with many
partitions, especially when it's uncertain whether all of them will
even be needed, the traditional approach of locking every potential
partition upfront isn't the most efficient strategy. But one problem
is that we only discover which partitions to lock (whether during a
scan, or while checking uniqueness on insert into the index) once we
are in the btree code, so as of now I am not convinced it is
strategically correct to lock and build the relation descriptor right
then and there. I don't have a concrete reason why it would be bad,
but one concern is that this could be the first code to do it, and
another is that if we have to lock a huge number of partitions (in
case a scan finds matching tuples from many partitions), locking all
of them on the fly from index AM code might not be the best idea.

What's your thought on this? I would appreciate it if Robert Haas
could share his thoughts as well.

Thanks, I see that you've thought this through and opted for the safe
route of locking all possible partitions during planning, both for
SELECT and INSERT.

That seems like the only viable option today, given how the executor
assumes that each index scan targets a single heap relation which has
already been opened and locked before execution begins. But even so, I
think we should not assume locking is cheap. Even with improvements
like fast-path locking or a higher max_locks_per_transaction, locking
thousands of partitions still adds up. This can become noticeable even
in regular query execution, since one of the motivations for global
indexes is to reduce planning effort, for example by avoiding
per-partition scan node generation. In such cases, the relative cost
of locking can become a dominant part of query startup time.

I do not have a better solution right now, but it is worth keeping
this tradeoff in mind.

--
Thanks, Amit Langote

#19Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Langote (#18)
Re: Proposal: Global Index for PostgreSQL

On Fri, Jul 11, 2025 at 12:22 PM Amit Langote <amitlangote09@gmail.com> wrote:

Thanks, I see that you've thought this through and opted for the safe
route of locking all possible partitions during planning, both for
SELECT and INSERT.

That seems like the only viable option today, given how the executor
assumes that each index scan targets a single heap relation which has
already been opened and locked before execution begins. But even so, I
think we should not assume locking is cheap. Even with improvements
like fast-path locking or a higher max_locks_per_transaction, locking
thousands of partitions still adds up. This can become noticeable even
in regular query execution, since one of the motivations for global
indexes is to reduce planning effort, for example by avoiding
per-partition scan node generation. In such cases, the relative cost
of locking can become a dominant part of query startup time.

That's right.

I do not have a better solution right now, but it is worth keeping
this tradeoff in mind.

I agree. Thanks for your opinion on this.

--
Regards,
Dilip Kumar
Google