[WIP]Vertical Clustered Index (columnar store extension) - take2

Started by Aya Iwata (Fujitsu)over 1 year ago141 messages

iwata.aya@fujitsu.com

over 1 year ago

Hi All,

Suggestions
==========

When analyzing real-time data collected by PostgreSQL,
it can be difficult to tune the current PostgreSQL server for satisfactory performance.
Therefore, we propose Vertical Clustered Indexing (VCI), an in-memory column store function that holds data in a state suitable for business analysis and is also expected to improve analysis performance.
With VCI, you can also expect to run analysis 7.8 times faster. This is achieved by the analytics engine, which optimizes parallel processing of column-oriented data, in addition to the fact that VCI stores data in a columnar format, enabling efficient retrieval of the columns needed for analysis.

Similar Features
============

One column store feature available with postgres is Citus Columnar Table.
If you introduces the citus extension, which allows columnar tables to be used using the columnar access method.
This function is intended to analyze the accumulated data. Therefore, you cannot update or delete data.
VCI supports data updates and deletions. This enables you to analyze not only the accumulated data but also the data that occurs in real time.

Implementing VCI
============

To introduce an updatable column store, we explain how updates to row-oriented data are propagated to column-oriented data.

VCI has two storage areas.

- Write Optimized Storage (WOS)
- Read Optimized Storage (ROS)

Describes WOS.
The WOS stores data for all columns in the VCI in a row-oriented format.
All newly added data is stored in the WOS relation along with the transaction information.
Using WOS to delete and update newly added data has no significant performance impact compared to deleting from columnar storage.

ROS is the storage area where all column data is stored.

When inserting/updating/deleting, data is written synchronously to WOS. It does not compress or index the data.
This avoids the overhead of converting to a columnar while updating the data.
After a certain amount of data accumulates in the WOS, the ROS control daemon converts it to column data asynchronously with updates.
Column data transformation compresses and indexes the data and writes it to ROS.

Describes searching for data.
Since there are two storage formats, the SELECT process needs to convert the WOS data to local ROS to determine whether it is visible or invisible. This conversion cost depends on the number of tuples present in the WOS file. This may introduce some performance overhead.
Obtain search results by referencing the local ROS and referencing the ROS in parallel.

These implementation ideas are also posted on Fujitsu's blog for your reference. [1]https://www.postgresql.fastware.com/blog/improve-data-analysis-performance-without-impacting-business-transactions-with-vertical-clustered-index

Past discussions
===========

We've proposed features before. [2]/messages/by-id/CAJrrPGfaC7WC9NK6PTTy6YN-NN+hCy8xOLAh2doYhVg5d6HsAA@mail.gmail.com
This thread also explains the details, so please check it.

In a previous thread, we suggested implementing a modification to the PostgreSQL backend code.
Based on the FB we received at that time, we think we need to re-implement this feature in pluggable storage using the table access method API.
I also got a FB of the features I needed from a PostgreSQLPro member. We believe it is necessary to deal with these issues in stages.
- Need to provide vector processing for nodes (filter, grand aggregate, aggregation with group by...) to speed up computation
- Requires parallel processing support such as scanning

It is assumed that the re-implementation will also require additional functionality to the current Table Access Method API.
It is useful not only for VCI but also for other access methods.
Therefore, we decided to propose the VCI feature to the community and proceed with development.

Request matter
===========

Are members of the PostgreSQL hackers interested in VCI features?
We welcome your comments and suggestions on this feature.
In particular, if you have any questions, required features, or implementations, please let me know.

[1]: https://www.postgresql.fastware.com/blog/improve-data-analysis-performance-without-impacting-business-transactions-with-vertical-clustered-index

[2]: /messages/by-id/CAJrrPGfaC7WC9NK6PTTy6YN-NN+hCy8xOLAh2doYhVg5d6HsAA@mail.gmail.com

Regards,
Aya Iwata

FUJITSU LIMITED

Alvaro Herrera

alvherre@2ndquadrant.com

about 1 year ago

In reply to: Aya Iwata (Fujitsu) (#1)

Re: [WIP]Vertical Clustered Index (columnar store extension) - take2

Hello,

I came across this email by chance while looking for something else.

On 2024-Oct-07, Aya Iwata (Fujitsu) wrote:

Therefore, we propose Vertical Clustered Indexing (VCI), an in-memory
column store function that holds data in a state suitable for business
analysis and is also expected to improve analysis performance.

With VCI, you can also expect to run analysis 7.8 times faster. This
is achieved by the analytics engine, which optimizes parallel
processing of column-oriented data, in addition to the fact that VCI
stores data in a columnar format, enabling efficient retrieval of the
columns needed for analysis.

Wow.

Request matter
===========

Are members of the PostgreSQL hackers interested in VCI features?
We welcome your comments and suggestions on this feature.
In particular, if you have any questions, required features, or implementations, please let me know.

I think this is definitely an important and welcome development.
I look forward to your patches in this area.

Thank you,

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Essentially, you're proposing Kevlar shoes as a solution for the problem
that you want to walk around carrying a loaded gun aimed at your foot.
(Tom Lane)

Yura Sokolov

y.sokolov@postgrespro.ru

about 1 year ago

In reply to: Aya Iwata (Fujitsu) (#1)

Re: [WIP]Vertical Clustered Index (columnar store extension) - take2

07.10.2024 17:53, Aya Iwata (Fujitsu) wrote:

Hi All,

Suggestions

==========

When analyzing real-time data collected by PostgreSQL,

it can be difficult to tune the current PostgreSQL server for
satisfactory performance.

Therefore, we propose Vertical Clustered Indexing (VCI), an in-memory
column store function that holds data in a state suitable for business
analysis and is also expected to improve analysis performance.

I just don't get, why it should be "in-memory"? All the same things you
describe further, but storing in paged index on-disk with caching
through shared_buffers - why this way it wouldn't work?

Aya Iwata (Fujitsu)

iwata.aya@fujitsu.com

11 months ago

In reply to: Alvaro Herrera (#2)

RE: [WIP]Vertical Clustered Index (columnar store extension) - take2

Hi Alvaro san,

I am sorry for my late reply. I continue to work on proposing VCI feature to the community.

I think this is definitely an important and welcome development.
I'm looking forward to patches in this area.

Thank you!
I am currently preparing to share VCI designs with PGConf.dev.
I look forward to sharing more about VCI with you.

Best regards,
Aya Iwata
FUJITSU LIMITED

Aya Iwata (Fujitsu)

iwata.aya@fujitsu.com

11 months ago

In reply to: Yura Sokolov (#3)

RE: [WIP]Vertical Clustered Index (columnar store extension) - take2

Hi Yura san,

I just don't get, why it should be "in-memory"? All the same things you
describe further, but storing in paged index on-disk with caching
through shared_buffers - why this way it wouldn't work?

We make the columnar store resident in memory for maximum search performance.
But I'm not very particular about this. Comments are welcome.

Best regards,
Aya Iwata
FUJITSU LIMITED

Show quoted text

-----Original Message-----
From: Yura Sokolov <y.sokolov@postgrespro.ru>
Sent: Wednesday, January 15, 2025 11:44 PM
To: pgsql-hackers@lists.postgresql.org
Subject: Re: [WIP]Vertical Clustered Index (columnar store extension) - take2

07.10.2024 17:53, Aya Iwata (Fujitsu) wrote:

Hi All,

Suggestions

==========

When analyzing real-time data collected by PostgreSQL,

it can be difficult to tune the current PostgreSQL server for
satisfactory performance.

Therefore, we propose Vertical Clustered Indexing (VCI), an in-memory
column store function that holds data in a state suitable for business
analysis and is also expected to improve analysis performance.

I just don't get, why it should be "in-memory"? All the same things you
describe further, but storing in paged index on-disk with caching
through shared_buffers - why this way it wouldn't work?

Yura Sokolov

y.sokolov@postgrespro.ru

11 months ago

In reply to: Aya Iwata (Fujitsu) (#5)

Re: [WIP]Vertical Clustered Index (columnar store extension) - take2

08.04.2025 13:29, Aya Iwata (Fujitsu) wrote:

Hi Yura san,

I just don't get, why it should be "in-memory"? All the same things you
describe further, but storing in paged index on-disk with caching
through shared_buffers - why this way it wouldn't work?

We make the columnar store resident in memory for maximum search performance.
But I'm not very particular about this. Comments are welcome.

I just wanted to say: there is no need to be super fast.
There is the need to be remarkably faster than it is now.

ClickHouse, DuckDB, Vertica - they are not in-memory, they are disk based.
But they are very fast.
If PostgreSQL will be just as twice slower as ClickHouse, it will be very
great! Most of users will not setup ClickHouse at all then, because twice
slower is still very fast.

Databases could be very huge. Even when they are in "columnar" format,
which usually consumes less space. And memory is still costs more than disk
space.

Certainly there are users who think they need "in-memory". But the truth is
very few of them really need "in-memory".

All of this is just my opinion. I could be wrong.

-----Original Message-----
From: Yura Sokolov <y.sokolov@postgrespro.ru>
Sent: Wednesday, January 15, 2025 11:44 PM
To: pgsql-hackers@lists.postgresql.org
Subject: Re: [WIP]Vertical Clustered Index (columnar store extension) - take2

07.10.2024 17:53, Aya Iwata (Fujitsu) wrote:

Hi All,

Suggestions

==========

When analyzing real-time data collected by PostgreSQL,

it can be difficult to tune the current PostgreSQL server for
satisfactory performance.

Therefore, we propose Vertical Clustered Indexing (VCI), an in-memory
column store function that holds data in a state suitable for business
analysis and is also expected to improve analysis performance.

I just don't get, why it should be "in-memory"? All the same things you
describe further, but storing in paged index on-disk with caching
through shared_buffers - why this way it wouldn't work?

--
regards
Yura Sokolov aka funny-falcon

Aya Iwata (Fujitsu)

iwata.aya@fujitsu.com

10 months ago

In reply to: Yura Sokolov (#6)

RE: [WIP]Vertical Clustered Index (columnar store extension) - take2

Hello,

We created VCI patch.
This patch is a prototype. The build and postgres regression tests have passed,
but there are still some bugs.

This patch includes the following VCI implementations:

- Changes to PostgreSQL core code
- Conversion of WOS to ROS
-- The source code for the conversion part exists. However, there is a slight
source code problem. To avoid this problem,
the vci.enable_ros_control_damon parameter now defaults to false. This change
prevents the ROS control daemon from starting at instance startup.
Background workers that convert WOS to ROS are not created because there is
no ROS control daemon.
- Convert to Local ROS
- Custom Plan VCI Scan
- Data transformation by background workers
- Minimum test
- Minimum documentation

This patch provides the following steps to use VCI:

1. Create Table
2. Insert data into the table
3. Create VCI
4. Run VCI Scan

The functions we will be developing are described in the wiki,
so please refer to it.
https://wiki.postgresql.org/wiki/Vertical_clustered_index

Regards,
Aya Iwata
Fujitsu Limited

Aya Iwata (Fujitsu)

iwata.aya@fujitsu.com

10 months ago

In reply to: Aya Iwata (Fujitsu) (#7)

RE: [WIP]Vertical Clustered Index (columnar store extension) - take2

Hello,

I found some dead codes in 0001 patches. I removed.
Here are new patches.

Regards,
Aya Iwata
Fujitsu Limited

Aya Iwata (Fujitsu)

iwata.aya@fujitsu.com

10 months ago

In reply to: Aya Iwata (Fujitsu) (#8)

Re: [WIP]Vertical Clustered Index (columnar store extension) - take2

Hi hackers,

I will share the notes on the discussions in PGConf.dev. Thanks all for participation.

Feedback on the Advanced Patch Feedback Session;

- A Basic idea that allows both OLTP/OLAP workloads on the same table is OK
- Buffering mechanism has already been used by existing code GIN, IIUC
- IAM seems a more proper approach than TAM
- One reason is that VCI can only optimize the data lookup stuff
- TAM needs all possible queries; it's too much
- Divide the patch more and more
- to allow committers to consider pros and cons and push one by one
- 15 patches is the maximum amount
- Separate features to some committable group

Sawada san suggested several points;
1. Find parts that are useful not only for the VCI and dispatching new threads
2. Develop codes incrementally. E.g., VCI has a custom scan, but we may
be able to give up on the first version

Also, I have questions and advice below on May 16th.

- How to handle visibility?
- What if transactions that insert tuples are aborted?
- VCI vs Index Only Scan
- VCI seems the same as Index Only Scan
- VCI has compression. So, is the amount of size better than the Index?
- How about the efficiency for storage? How much data can we store?
- using TAM (Alvalo's suggestion)
- Most of the code can be ported from the heap. ROS and WOS can exist as forks
of main files, like .vm and .fm. API can be like:

```
CREATE TABLE foo (id, product, price) USING vci WITH (to_be_indexed=price);
```

Again, I really appreciate your efforts.

Regards,
Aya Iwata
Fujitsu Limited

#10

Tomas Vondra

tomas.vondra@2ndquadrant.com

10 months ago

In reply to: Aya Iwata (Fujitsu) (#9)

Re: [WIP]Vertical Clustered Index (columnar store extension) - take2

Hello Iwata-san,

Thanks for presenting the patch at pgconf.dev last week, and for the
discussions at the APFS session. As promised I did a quick review of the
patch this week, trying to understand the design, the trade-offs etc.

I'm still trying to understand the patch, so please correct me when I
got something wrong.

Before I get to the review comments (mostly about the first part) and
questions, let me comment on a couple items from your earlier response:

On 5/17/25 18:31, Aya Iwata (Fujitsu) wrote:

Hi hackers,

I will share the notes on the discussions in PGConf.dev. Thanks all for participation.

Feedback on the Advanced Patch Feedback Session;

- A Basic idea that allows both OLTP/OLAP workloads on the same table is OK
- Buffering mechanism has already been used by existing code GIN, IIUC

Yes. GIN has a "fastupdate" feature, which allows accumulating entries
in a buffer, and merging them into the index later (as a batch, which
makes it more efficient). This is similar to the WOS/ROS architecture.

AFAIK some other columnar databases use a similar WOS/ROS architecture,
with similar goal (to move the index updates into a background job, and
make it more efficient).

I expect there to be a trade-off between the WOS size and query
performance. Larger WOS allows larger updates / better efficiency, but
it likely also means slower queries (that also have to scan the WOS).

- IAM seems a more proper approach than TAM
- One reason is that VCI can only optimize the data lookup stuff
- TAM needs all possible queries; it's too much

I believe this (IAM vs. TAM) is very much an open question.

At the conference the goal of the patch was explained as a way to
improve performance with analytical queries, while maintaining OLTP
performance and features. From this point of view, it makes sense to
consider VCI to be an "accelerator" feature. Which is what indexes do,
i.e. it's a way to speed-up queries, but we can do without them if
that's faster.

If VCI got implemented as TAM, i.e. serving as the primary storage, we
wouldn't have this option - we'd have to do VCI even for OLTP workloads
(e.g. random inserts / selects), and that sounds quite inefficient. Not
to mention the VCI would need to support everything heap does, and it
supports a lot. VCI doesn't even support arbitrary data types, etc.

But this hinges on the assumption that the VCI can be implemented as an
IAM, right? And that IAM is a good fit for the stuff VCI needs to do.
Because if it can't be implemented as IAM, then the simplicity argument
is quite moot. I'm not saying it can't be done as IAM, I don't have a
clear opinion on that yet.

Another reason to think about this is that after the talk, someone in
the audience pointed out this patch originally started on Postgres 9.5,
and that back then there was no TAM interface. So when implementing it
there was no IAM vs. TAM choice, IAM was the only way to do something
like this, and thus it was used. Maybe if TAM was available back then,
it'd be done differently? Not sure.

Also, Alvaro seemed to think TAM is the way to go, and in order to keep
the OLTP performance he suggested to use both heap and VCI at the same
time, in different "forks". I'm not sure how would that work, or if we
can already do that - AFAIK we can't, because ForkNumber does not allow
adding custom forks. We'd have to relax that, or invent some sort of
federated TAM (that just multiplexes it to two TAMs). Maybe.

But it's not like the IAM approach doesn't need to do this. The first
patch had to add stuff to a lot of random places to make this work. And
some of the places touch stuff that we don't expect indexes to worry
about, like ALTER TABLE, etc.

- Divide the patch more and more
- to allow committers to consider pros and cons and push one by one
- 15 patches is the maximum amount
- Separate features to some committable group

Yes. Both parts need to be divided into small (but meaningful) pieces. I
don't recall where the "15 patches" came from, I don't think there's
some sort of maximum number of patches. All I care about is that the
patches need to be small-ish, and still make sense. Ideally, each part
should be committable on it's own. This makes it easier to test/review,
but also easier to agree on individual parts - in which case we may
commit the "preparatory" pieces and not have to rebase them forever.

The 0001 part is ~83K, which is not entirely crazy, but I can easily
imagine it split into ~10 parts, each addressing a different subset. For
0002 it's even more important, because no one can review a 1.3MB patch.

Sawada san suggested several points;
1. Find parts that are useful not only for the VCI and dispatching new threads
2. Develop codes incrementally. E.g., VCI has a custom scan, but we may
be able to give up on the first version

Yes, I think this makes perfect sense.

I'd probably keep it in a single thread (essentially as a single patch
series), because it's very useful to see the "bigger picture" why the
parts are important. But if you find something that can stand on it's
own, then sure. But you'll still need to keep the part in the main patch
series (it should not be necessary for people to collect patches from
different threads, and it would also break the CI testing commitfests
patches).

Breaking 0002 into smaller parts makes perfect sense. Something like:

- minimum working patch, with only serial sequential scan
- add parallel sequential scan
- add other custom scans, one by one ...
- ...

Also, I have questions and advice below on May 16th.

- How to handle visibility?
- What if transactions that insert tuples are aborted?
- VCI vs Index Only Scan
- VCI seems the same as Index Only Scan
- VCI has compression. So, is the amount of size better than the Index?
- How about the efficiency for storage? How much data can we store?
- using TAM (Alvalo's suggestion)
- Most of the code can be ported from the heap. ROS and WOS can exist as forks
of main files, like .vm and .fm. API can be like:

```
CREATE TABLE foo (id, product, price) USING vci WITH (to_be_indexed=price);
```

Yes, I think those questions are important. Not only for understanding
the current patch, but also to understand what trade-offs the solution
can make (e.g. about the consistency/visibility model), etc.

What I really miss in the current patch is some sort of READMEs with

- high-level design of the VCI indexes

- description of the consistency/visibility model (does it behave the
same way as querying the heap, can it be out of sync for, ...)

- WOS/ROS architecture (when are rows inserted into WOS, promoted into
ROS, what triggers that, ...)

- what's the in-memory / on-disk format

- some places in the patch mention "internal VCI tables" but I have no
idea what that is

- how does the execution work? compression? crucial points to consider
for optimal performance, etc.

- limitations (temporary - can be relaxed in the future, permanent -
inherent to the columnar design) and trade-offs

- what are the various custom scan executor nodes

- what "background" processes happen (custom workers, ...)

- anything else substantial for understanding the design

Maybe there's some of this in the 0002 patch, but I haven't stumbled
over it so far.

------------------------------------------------------------------------

Finally, some review comments about the 0001 patch - it's mostly in the
order as appearing in git diff, file by file. Some of the points repeat,
because multiple places made me think about the same thing. And some of
it was mentioned above.

- It seems the VCI is implemented as an index AM, but it seems to be
trying to do some TAM stuff, e.g. with visibility info etc. For example,
I don't think indexes generally care about XIDs that deleted rows, etc.
Similarly, I can't think of another index AM that'd try to hook into
tablecmds.c or execExpr.

- Not sure what's the right choice (makes sense to have it as index on
top of heap, so that we do regular stuff for OLTP queries, and columnar
stuff using the index for analytics). But it seems it does stuff
expected from table AM and not index AM.

- But making a table AM would make it much more demanding (it'd have to
support storing all data types, ...) and would hurt OLTP performance.

- I very much support the idea of implementing this as an extension (so
minimal infrastructure changes to core + extension in contrib or
entirely outside).

- But the patch is not there, not even with 0001, because there's a
number of places mentioning "vci". There should be no reference to VCI
in core, it should be generic extensibility infrastructure. It's a good
indication of places that may need some extensibility.

relation_open
------------------
- Why is this change (disabling assert enforcing proper locking) needed?
There's a comment in lock.c which mentions VCI parallel workers, but
it's suspicious.

reloptions.c
------------------
- Changes that should be moved to the contrib module Why should in-core
reloptions know about this? See for example how "contrib/bloom" defined
reloptions.

heapam.c
------------------
- add_index_delete_hook, not sure I understand the purpose of this,
likely related to the consistency model, needs explanation. Why is it
prefixed with "add_"?

- If we need such DELETE hook, heapam.c is not where to call it, it
should be called from the same "layer" adding tuples to indexes (which
certainly does not happen in heap_insert).

- Furthermore, if we need this, why should it be a hook and not a new
callback in IndexAmRoutine, just like aminsert()?

initscan
------------------
- Isn't it using keep_startblock in a wrong way? it certainly is not
meant to indicate "keep the old number of blocks"

- Why is this even needed? This is in heapam, so why would this even
matter for VCI, which is an index? Shouldn't that be mostly independent
from which TAM is used by the table?

heapam_handler.c
------------------
- Why does this need to preserve the IndexHeapTuple at all? And why
should it be done in heapam_handler? Again, shouldn't IAM be mostly
independent of this?

- Using a global variable does not seem great. We probably can't have
concurrent calls (only a single heapam_index_build_range_scan call at a
time), but globals are dubious.

- If VCI really needs to remember the tuple, there should be a "proper"
way to explicitly pass it (adjust the IndexAmRoutine callback to have a
new argument or something like that).

heapam_visibility.c
-------------------
- It's probably sensible to make visibility snapshots extensible, i.e.
to allow defining custom snapshot types etc. But the patch still does
that by hard-coding stuff in core, even if the actual "visibility" code
is in extension.

- But this doesn't really allow custom snapshot types, those are still
hardcoded SNAPSHOT_VCI_WOS2ROS / SNAPSHOT_VCI_LOCALROS.

- The add_snapshot_satisfies_hook does not seem like the way to make
this extensible - what if two extensions want to do this. First
HeapTupleSatisfiesVisibility would have to know about all the
extensions, and it could easily lead to infinite loop. Because we call
the first hook, and it calls HeapTupleSatisfiesVisibility again.

- This is probably a good example of VCI trying to do TAM stuff. TAM
interface has tuple_satisfies_snapshot, which seems to have the same
purpose as the hook. Maybe not (it still assumes regular snapshots).

- But I still don't understand why should a new index AM require changes
to how heap AM does visibility checks (or any visibility checks in
general, as indexes don't have visibility info in PG). And why would it
make sense for heap to know about this. IAM is the layer *above* TAM, in
a way.

xlogfuncs.c
------------------
- RECOVERY_VCI_PAUSE_REQUESTED - what's that for? why do we need a
separate state from RECOVERY_PAUSE_REQUESTED?

xlogrecovery.c
------------------
- I don't see why we should not log that the recovery is paused because
of VCI. I sure users would still like to know why it's paused:

/* If pause requested by VCI, the log is not output. */
else if (GetRecoveryPauseState() != RECOVERY_VCI_PAUSE_REQUESTED)

dependency.c
------------------
- add_drop_relation_hook raises mostly the same questions as
add_index_delete_hook

- Don't we already have event triggers to do this kind of stuff in an
extensible way?

- Why should an index even care about the table getting dropped? I mean,
it'll get dropped too, before the table. So why this?

index.c
------------------

- Why do we need add_reindex_index_hook at all? we already have ambuild
callback, right?

- It seems really suspicious ConstructTupleDescriptor gains the ability
to index system attributes, why is that needed? surely the VCI does not
need to index system attributes, right? Or why is that needed?

- It seems add_reindex_index_hook is used to "skip" rebuilding the
index, but why would it be useful/necessary, and why should it be done here?

- Why couldn't ambuild() just return? How do I rebuild the index if
something goes wrong (e.g. it gets corrupted). Or to remove bloat, if
that applies to VCI.

explain.c
------------------
- T_CustomPlanMarkPos - so this introduces a new type of custom plan
node? How is that different from CustomScan? We don't even have MarkPos
plan in core, right? So what does it do?

- It even appears to be exactly the same as CustomScan in most places,
but it also introduces some new callbacks (ExplainCustomPlanTargetRel
and SetBoundCustomScan).

- If useful, this should be a separate patch (custom scan improvements).
And it definitely needs to update the CustomScan docs in SGML.

- I don't understand why we need ExplainPropertySortGroupKeys, does not
align with the rest of the API (ExplainProperty is for different data
types - float, text, ...). Maybe expose show_sort_group_keys? Or just
copy show_sort_group_keys into the extension ...

indexcmds.c
------------------
- Yet another place referencing "vci" explicitly (not extensible, should
not be in core)

- Not sure I understand the limitations mentioned there.

tablecmds.c
------------------
- add_alter_tablespace_hook, add_alter_table_change_owner_hook,
add_alter_table_change_schema_hook

- Why does VCI this even need to handle these actions?

- And we have event triggers, so why have these new hooks?

- Not sure what the code in ATExecSetRelOptions does (another example
that handles "vci" AM explicitly, in a non-extensible way).

- Is it maybe rebuilding some reloptions? Maybe it's about the attnums?

vacuum.c
------------------
- add_skip_vacuum_hook is used to "disable" vacuum for internal VCI tables.

- Why. And what is "internal table"?

execAmi.c
------------------
- what's T_CustomPlanMarkPos?

execExpr.c
------------------
- Why does this expose ExecReadyExpr, ExecCreateExprSetupSteps?

- Isn't this a bit too fragile?

- Probably needed because VciExecInitExpr needs a new argument? But why
does it need that?

execExprInterp.c
------------------
- Why does this need ExprEvalVar_hook and ExprEvalParam_hook?

- Doesn't even check if the hooks are != NULL

- CASE_EEOP_VCI_VAR, CASE_EEOP_VCI_PARAM_EXEC - not extensible, should
not reference VCI explicitly

- VciExprEvalVarHook (for llvmjit?)

execGrouping.c
------------------
- LookupTupleHashEntry_internal removes comment, why?

execProcnode.c, execScan.c
------------------
- New MarkPos node, apparetly, but why do we need that? Why can't we
just build the Materialize when constructing the plan?

execUtils.c
------------------
- again, disables locking check, very suspicious

nodeCustom.c
------------------
- Isn't it strange that ExecInitCustomScan short-circuits the init,
skipping most of the initialization for selected scans?

- If anything, it should not hard-code the scan names like this. Why is
this necessary? is the rest of the initialization irrelevant, or what?
Maybe this is sign it should be table AM rather than index AM?

- But IOS scans would also do this initialization, so ...

nodeModifytable.c
------------------
- add_should_index_insert_hook another strange hook

- Doesn't the hook apply to all relations all the time, not just those
with VCI indexes etc.?

gen_node_support.c
------------------
- So is CustomPlanMarkPos a proper node, or why do we need it? why is
CustomPlan not enough?

- What's SmcAllocSetContext for? Doesn't seem to be mentioned anywhere.

params.c
------------------
- ExecInitParam_hook seems unused, no?

allpaths.c
------------------
- set_plain_rel_pathlist - I don't understand why this checks the VCI
vs. parallelism like this, what it's trying to prevent. Maybe if you
don't want certain plan shape, don't generate the partial paths in that
case from the CustomScan planner callback?

- It seems this tries to allow parallel seqscan, but not other more
complex parallel scans (like parallel agg/join ...), but that could be
done in callback, no?

- Again, we should not reference the index AM this explicitly, it's not
extensible.

createplan.c
------------------
- I don't think we want to make copy_plan_costsize part of API. Just a
detail, though.

autovacuum.c
------------------
- Why does this need a custom autovacuum launcher, or what's the purpose
of this code? is there a special autovacuum launcher for VCI indexes?

- Again, explicit references to "VCI", should not be done like this.

bgworker.c
------------------
- I have no idea what's the point of these changes. seems to be enabling
canceling bgworkers, but that seems like a generic improvement. In which
case it should be a separate patch.

- No idea if it's correct etc.

- Seems to be done to allow CountOtherDBBackends() to cancel backends,
so that it can be dropped etc.

- There apparently are bgworkers doing cleanup for VCI, and we want to
treat them like autovacuum workers (which get interrupted in various
cases instead of waiting forever).

- But this will cancel all bgworkers, no? Not just the VCI ones ...

lock.c
------------------
- this seems a bit strange, surely this should use the same locking
approach as other parallel queries

/*
* Don't lock for VCI parallel workers and other type of workers should go
* in normal flow, In case if there is any change in background worker
* name for VCI parallel workers, the following code also needs an update.
* FIXME: Try to use the community parallelism code, so that we don't need
* our own VCI parallel infrastructure.
*/
if (AmBackgroundWorkerProcess() && strstr(MyBgworkerEntry->bgw_name,
"backend="))

- Presumably this is why some of the Asserts on locking were commented
out earlier.

lwlock.c
------------------
- Vci_lwlock_kind seems unused.

timestamp.c
------------------
- I believe the IntervalAggState is internal on purpose, why should it
be made external? It also means timestamp.h has to worry about FRONTEND.

relcache.c
------------------
- add_skip_vci_index_hook - yet another hook, same questions as for the
other hooks earlier

- It seems suspicious we have to skip VCI indexes like this during
planning in RelationGetIndexAttrBitmap, not sure why that's needed? Why
should't we cost the index and maybe reject it based on that?

- Again, references VCI explicitly, so not extensible.

mctx.c
------------------
- why MemoryContextMethods shouldn't be "const"?

common.c
pg_dump.c
pg_dump.h
------------------
- isvciview breaks extensibility, tied to a particular IAM / extension.
Needs to be abstracted / made extensible.

- What is "vci view table" for, actually?

reloptions.h
------------------
- Why does this need to add an entry into the relopt_kind enum? It
should be extensible, see the "contrib/bloom".

xlogrecovery.h
------------------
- Why is RECOVERY_VCI_PAUSE_REQUESTED separate from
RECOVERY_PAUSE_REQUESTED?

timestamp.h
------------------
- The FRONTEND stuff is a sign making IntervalAggState public may not be
a good idea.

executor.h
------------------
- How come it's suddenly OK to return entry->additional without the if
conditions in TupleHashEntryGetAdditional?

- Ah, it's now stored as a separate palloc chunk, not right after the
tuple? Well, maybe that's a separately useful improvement? should be a
separate patch, then. No opinion on whether it's a good idea.

- What's with the struct LimitState; ? Why is it needed?

extensible.h
------------------
- Adds SetBoundCustomScan / ExplainCustomPlanTargetRel, but then it
should also add them to the SGML docs.

memnodes.h
------------------
- What's SmcAllocSetContext? It's not defined anywhere.

params.h
------------------
- Adds noNeedLoadFromMain, loadedFromMain to ParamExecData, but it
doesn't seem to be used anywhere (in either patch).

plannodes.h
------------------
- Adds plan_no to Plan, but it's not set anywhere in core, only in the
VCI extension - which seems wrong (e.g. what if there are two extensions
that want to set it, and they disagree?)

- Either if we want/need this, it should be initialized by core during
planning, in a non-ambiguous way.

- But also how is this different from the existing plan_node_id?

autovacuum.h
------------------
- It seems a bit strage to add "built-in" bgworkers for an extension. If
the extension really needs some sort of workers, it should be defined in
the extension.

bgworker.h
------------------
- So what's the problem with canceling bgworkers? Should this be done
as a separate patch?

itemptr.h
------------------
- Why do we need this SizeOfIptrData definition? How is it different
from sizeof(ItemPointerData)?

lwlock.h
------------------
- LWLockKind / Vci_lwlock_kind seems unused

numeric.h
------------------
- Why does this expose NumericVar, when the second patch redefines the
struct anyway? also, isn't that fragile? although, we're unlikely to
tweak the NumericVar definition.

rel.h
------------------
- Again, not extensible way to do stuff in an extension.

- We should not have pieces specific to AM in StdRdOptions.

- Doesn't AM have custom reloptions now? Like contrib/bloom?

I also tried to build 0002, but unfortunately that fails with:

executor/vci_executor.c: In function ‘vci_setup_executor_hook’:
executor/vci_executor.c:115:28: error: assignment to
‘ExecutorStart_hook_type’ {aka ‘void (*)(QueryDesc *, int)’} from
incompatible pointer type ‘_Bool (*)(QueryDesc *, int)’
[-Wincompatible-pointer-types]
115 | ExecutorStart_hook = vci_executor_start_routine;
| ^
executor/vci_executor.c: In function ‘vci_executor_start_routine’:
executor/vci_executor.c:161:28: error: void value not ignored as it
ought to be
161 | plan_valid = executor_start_prev(queryDesc, eflags);
| ^
executor/vci_executor.c:163:28: error: void value not ignored as it
ought to be
163 | plan_valid = standard_ExecutorStart(queryDesc,
eflags);
| ^
make: *** [../../src/Makefile.global:973: executor/vci_executor.o] Error 1

The extension is not added to contrib/Makefile, so "make world" does not
trigger this failure.

regards

--
Tomas Vondra

[WIP]Vertical Clustered Index (columnar store extension) - take2

Attachments:

Attachments:

Attachments:

Attachments:

Attachments: