[WIP]Vertical Clustered Index (columnar store extension) - take2
Hi All,
Suggestions
==========
When analyzing real-time data collected by PostgreSQL,
it can be difficult to tune the current PostgreSQL server for satisfactory performance.
Therefore, we propose Vertical Clustered Indexing (VCI), an in-memory column store function that holds data in a state suitable for business analysis and is also expected to improve analysis performance.
With VCI, you can also expect to run analysis 7.8 times faster. This is achieved by the analytics engine, which optimizes parallel processing of column-oriented data, in addition to the fact that VCI stores data in a columnar format, enabling efficient retrieval of the columns needed for analysis.
Similar Features
============
One column store feature available with postgres is Citus Columnar Table.
If you introduces the citus extension, which allows columnar tables to be used using the columnar access method.
This function is intended to analyze the accumulated data. Therefore, you cannot update or delete data.
VCI supports data updates and deletions. This enables you to analyze not only the accumulated data but also the data that occurs in real time.
Implementing VCI
============
To introduce an updatable column store, we explain how updates to row-oriented data are propagated to column-oriented data.
VCI has two storage areas.
- Write Optimized Storage (WOS)
- Read Optimized Storage (ROS)
Describes WOS.
The WOS stores data for all columns in the VCI in a row-oriented format.
All newly added data is stored in the WOS relation along with the transaction information.
Using WOS to delete and update newly added data has no significant performance impact compared to deleting from columnar storage.
ROS is the storage area where all column data is stored.
When inserting/updating/deleting, data is written synchronously to WOS. It does not compress or index the data.
This avoids the overhead of converting to a columnar while updating the data.
After a certain amount of data accumulates in the WOS, the ROS control daemon converts it to column data asynchronously with updates.
Column data transformation compresses and indexes the data and writes it to ROS.
Describes searching for data.
Since there are two storage formats, the SELECT process needs to convert the WOS data to local ROS to determine whether it is visible or invisible. This conversion cost depends on the number of tuples present in the WOS file. This may introduce some performance overhead.
Obtain search results by referencing the local ROS and referencing the ROS in parallel.
These implementation ideas are also posted on Fujitsu's blog for your reference. [1]https://www.postgresql.fastware.com/blog/improve-data-analysis-performance-without-impacting-business-transactions-with-vertical-clustered-index
Past discussions
===========
We've proposed features before. [2]/messages/by-id/CAJrrPGfaC7WC9NK6PTTy6YN-NN+hCy8xOLAh2doYhVg5d6HsAA@mail.gmail.com
This thread also explains the details, so please check it.
In a previous thread, we suggested implementing a modification to the PostgreSQL backend code.
Based on the FB we received at that time, we think we need to re-implement this feature in pluggable storage using the table access method API.
I also got a FB of the features I needed from a PostgreSQLPro member. We believe it is necessary to deal with these issues in stages.
- Need to provide vector processing for nodes (filter, grand aggregate, aggregation with group by...) to speed up computation
- Requires parallel processing support such as scanning
It is assumed that the re-implementation will also require additional functionality to the current Table Access Method API.
It is useful not only for VCI but also for other access methods.
Therefore, we decided to propose the VCI feature to the community and proceed with development.
Request matter
===========
Are members of the PostgreSQL hackers interested in VCI features?
We welcome your comments and suggestions on this feature.
In particular, if you have any questions, required features, or implementations, please let me know.
[2]: /messages/by-id/CAJrrPGfaC7WC9NK6PTTy6YN-NN+hCy8xOLAh2doYhVg5d6HsAA@mail.gmail.com
Regards,
Aya Iwata
FUJITSU LIMITED
Hello,
I came across this email by chance while looking for something else.
On 2024-Oct-07, Aya Iwata (Fujitsu) wrote:
Therefore, we propose Vertical Clustered Indexing (VCI), an in-memory
column store function that holds data in a state suitable for business
analysis and is also expected to improve analysis performance.
With VCI, you can also expect to run analysis 7.8 times faster. This
is achieved by the analytics engine, which optimizes parallel
processing of column-oriented data, in addition to the fact that VCI
stores data in a columnar format, enabling efficient retrieval of the
columns needed for analysis.
Wow.
Request matter
===========Are members of the PostgreSQL hackers interested in VCI features?
We welcome your comments and suggestions on this feature.
In particular, if you have any questions, required features, or implementations, please let me know.
I think this is definitely an important and welcome development.
I look forward to your patches in this area.
Thank you,
--
Ãlvaro Herrera Breisgau, Deutschland â https://www.EnterpriseDB.com/
Essentially, you're proposing Kevlar shoes as a solution for the problem
that you want to walk around carrying a loaded gun aimed at your foot.
(Tom Lane)
07.10.2024 17:53, Aya Iwata (Fujitsu) wrote:
Hi All,
Suggestions
==========
When analyzing real-time data collected by PostgreSQL,
it can be difficult to tune the current PostgreSQL server for
satisfactory performance.Therefore, we propose Vertical Clustered Indexing (VCI), an in-memory
column store function that holds data in a state suitable for business
analysis and is also expected to improve analysis performance.
I just don't get, why it should be "in-memory"? All the same things you
describe further, but storing in paged index on-disk with caching
through shared_buffers - why this way it wouldn't work?
Hi Alvaro san,
I am sorry for my late reply. I continue to work on proposing VCI feature to the community.
I think this is definitely an important and welcome development.
I'm looking forward to patches in this area.
Thank you!
I am currently preparing to share VCI designs with PGConf.dev.
I look forward to sharing more about VCI with you.
Best regards,
Aya Iwata
FUJITSU LIMITED
Hi Yura san,
I just don't get, why it should be "in-memory"? All the same things you
describe further, but storing in paged index on-disk with caching
through shared_buffers - why this way it wouldn't work?
We make the columnar store resident in memory for maximum search performance.
But I'm not very particular about this. Comments are welcome.
Best regards,
Aya Iwata
FUJITSU LIMITED
Show quoted text
-----Original Message-----
From: Yura Sokolov <y.sokolov@postgrespro.ru>
Sent: Wednesday, January 15, 2025 11:44 PM
To: pgsql-hackers@lists.postgresql.org
Subject: Re: [WIP]Vertical Clustered Index (columnar store extension) - take207.10.2024 17:53, Aya Iwata (Fujitsu) wrote:
Hi All,
Suggestions
==========
When analyzing real-time data collected by PostgreSQL,
it can be difficult to tune the current PostgreSQL server for
satisfactory performance.Therefore, we propose Vertical Clustered Indexing (VCI), an in-memory
column store function that holds data in a state suitable for business
analysis and is also expected to improve analysis performance.I just don't get, why it should be "in-memory"? All the same things you
describe further, but storing in paged index on-disk with caching
through shared_buffers - why this way it wouldn't work?
08.04.2025 13:29, Aya Iwata (Fujitsu) wrote:
Hi Yura san,
I just don't get, why it should be "in-memory"? All the same things you
describe further, but storing in paged index on-disk with caching
through shared_buffers - why this way it wouldn't work?We make the columnar store resident in memory for maximum search performance.
But I'm not very particular about this. Comments are welcome.
I just wanted to say: there is no need to be super fast.
There is the need to be remarkably faster than it is now.
ClickHouse, DuckDB, Vertica - they are not in-memory, they are disk based.
But they are very fast.
If PostgreSQL will be just as twice slower as ClickHouse, it will be very
great! Most of users will not setup ClickHouse at all then, because twice
slower is still very fast.
Databases could be very huge. Even when they are in "columnar" format,
which usually consumes less space. And memory is still costs more than disk
space.
Certainly there are users who think they need "in-memory". But the truth is
very few of them really need "in-memory".
All of this is just my opinion. I could be wrong.
-----Original Message-----
From: Yura Sokolov <y.sokolov@postgrespro.ru>
Sent: Wednesday, January 15, 2025 11:44 PM
To: pgsql-hackers@lists.postgresql.org
Subject: Re: [WIP]Vertical Clustered Index (columnar store extension) - take207.10.2024 17:53, Aya Iwata (Fujitsu) wrote:
Hi All,
Suggestions
==========
When analyzing real-time data collected by PostgreSQL,
it can be difficult to tune the current PostgreSQL server for
satisfactory performance.Therefore, we propose Vertical Clustered Indexing (VCI), an in-memory
column store function that holds data in a state suitable for business
analysis and is also expected to improve analysis performance.I just don't get, why it should be "in-memory"? All the same things you
describe further, but storing in paged index on-disk with caching
through shared_buffers - why this way it wouldn't work?
--
regards
Yura Sokolov aka funny-falcon
Hello,
We created VCI patch.
This patch is a prototype. The build and postgres regression tests have passed,
but there are still some bugs.
This patch includes the following VCI implementations:
- Changes to PostgreSQL core code
- Conversion of WOS to ROS
-- The source code for the conversion part exists. However, there is a slight
source code problem. To avoid this problem,
the vci.enable_ros_control_damon parameter now defaults to false. This change
prevents the ROS control daemon from starting at instance startup.
Background workers that convert WOS to ROS are not created because there is
no ROS control daemon.
- Convert to Local ROS
- Custom Plan VCI Scan
- Data transformation by background workers
- Minimum test
- Minimum documentation
This patch provides the following steps to use VCI:
1. Create Table
2. Insert data into the table
3. Create VCI
4. Run VCI Scan
The functions we will be developing are described in the wiki,
so please refer to it.
https://wiki.postgresql.org/wiki/Vertical_clustered_index
Regards,
Aya Iwata
Fujitsu Limited
Attachments:
Hello,
I found some dead codes in 0001 patches. I removed.
Here are new patches.
Regards,
Aya Iwata
Fujitsu Limited
Attachments:
Hi hackers,
I will share the notes on the discussions in PGConf.dev. Thanks all for participation.
Feedback on the Advanced Patch Feedback Session;
- A Basic idea that allows both OLTP/OLAP workloads on the same table is OK
- Buffering mechanism has already been used by existing code GIN, IIUC
- IAM seems a more proper approach than TAM
- One reason is that VCI can only optimize the data lookup stuff
- TAM needs all possible queries; it's too much
- Divide the patch more and more
- to allow committers to consider pros and cons and push one by one
- 15 patches is the maximum amount
- Separate features to some committable group
Sawada san suggested several points;
1. Find parts that are useful not only for the VCI and dispatching new threads
2. Develop codes incrementally. E.g., VCI has a custom scan, but we may
be able to give up on the first version
Also, I have questions and advice below on May 16th.
- How to handle visibility?
- What if transactions that insert tuples are aborted?
- VCI vs Index Only Scan
- VCI seems the same as Index Only Scan
- VCI has compression. So, is the amount of size better than the Index?
- How about the efficiency for storage? How much data can we store?
- using TAM (Alvalo's suggestion)
- Most of the code can be ported from the heap. ROS and WOS can exist as forks
of main files, like .vm and .fm. API can be like:
```
CREATE TABLE foo (id, product, price) USING vci WITH (to_be_indexed=price);
```
Again, I really appreciate your efforts.
Regards,
Aya Iwata
Fujitsu Limited
Hello Iwata-san,
Thanks for presenting the patch at pgconf.dev last week, and for the
discussions at the APFS session. As promised I did a quick review of the
patch this week, trying to understand the design, the trade-offs etc.
I'm still trying to understand the patch, so please correct me when I
got something wrong.
Before I get to the review comments (mostly about the first part) and
questions, let me comment on a couple items from your earlier response:
On 5/17/25 18:31, Aya Iwata (Fujitsu) wrote:
Hi hackers,
I will share the notes on the discussions in PGConf.dev. Thanks all for participation.
Feedback on the Advanced Patch Feedback Session;
- A Basic idea that allows both OLTP/OLAP workloads on the same table is OK
- Buffering mechanism has already been used by existing code GIN, IIUC
Yes. GIN has a "fastupdate" feature, which allows accumulating entries
in a buffer, and merging them into the index later (as a batch, which
makes it more efficient). This is similar to the WOS/ROS architecture.
AFAIK some other columnar databases use a similar WOS/ROS architecture,
with similar goal (to move the index updates into a background job, and
make it more efficient).
I expect there to be a trade-off between the WOS size and query
performance. Larger WOS allows larger updates / better efficiency, but
it likely also means slower queries (that also have to scan the WOS).
- IAM seems a more proper approach than TAM
- One reason is that VCI can only optimize the data lookup stuff
- TAM needs all possible queries; it's too much
I believe this (IAM vs. TAM) is very much an open question.
At the conference the goal of the patch was explained as a way to
improve performance with analytical queries, while maintaining OLTP
performance and features. From this point of view, it makes sense to
consider VCI to be an "accelerator" feature. Which is what indexes do,
i.e. it's a way to speed-up queries, but we can do without them if
that's faster.
If VCI got implemented as TAM, i.e. serving as the primary storage, we
wouldn't have this option - we'd have to do VCI even for OLTP workloads
(e.g. random inserts / selects), and that sounds quite inefficient. Not
to mention the VCI would need to support everything heap does, and it
supports a lot. VCI doesn't even support arbitrary data types, etc.
But this hinges on the assumption that the VCI can be implemented as an
IAM, right? And that IAM is a good fit for the stuff VCI needs to do.
Because if it can't be implemented as IAM, then the simplicity argument
is quite moot. I'm not saying it can't be done as IAM, I don't have a
clear opinion on that yet.
Another reason to think about this is that after the talk, someone in
the audience pointed out this patch originally started on Postgres 9.5,
and that back then there was no TAM interface. So when implementing it
there was no IAM vs. TAM choice, IAM was the only way to do something
like this, and thus it was used. Maybe if TAM was available back then,
it'd be done differently? Not sure.
Also, Alvaro seemed to think TAM is the way to go, and in order to keep
the OLTP performance he suggested to use both heap and VCI at the same
time, in different "forks". I'm not sure how would that work, or if we
can already do that - AFAIK we can't, because ForkNumber does not allow
adding custom forks. We'd have to relax that, or invent some sort of
federated TAM (that just multiplexes it to two TAMs). Maybe.
But it's not like the IAM approach doesn't need to do this. The first
patch had to add stuff to a lot of random places to make this work. And
some of the places touch stuff that we don't expect indexes to worry
about, like ALTER TABLE, etc.
- Divide the patch more and more
- to allow committers to consider pros and cons and push one by one
- 15 patches is the maximum amount
- Separate features to some committable group
Yes. Both parts need to be divided into small (but meaningful) pieces. I
don't recall where the "15 patches" came from, I don't think there's
some sort of maximum number of patches. All I care about is that the
patches need to be small-ish, and still make sense. Ideally, each part
should be committable on it's own. This makes it easier to test/review,
but also easier to agree on individual parts - in which case we may
commit the "preparatory" pieces and not have to rebase them forever.
The 0001 part is ~83K, which is not entirely crazy, but I can easily
imagine it split into ~10 parts, each addressing a different subset. For
0002 it's even more important, because no one can review a 1.3MB patch.
Sawada san suggested several points;
1. Find parts that are useful not only for the VCI and dispatching new threads
2. Develop codes incrementally. E.g., VCI has a custom scan, but we may
be able to give up on the first version
Yes, I think this makes perfect sense.
I'd probably keep it in a single thread (essentially as a single patch
series), because it's very useful to see the "bigger picture" why the
parts are important. But if you find something that can stand on it's
own, then sure. But you'll still need to keep the part in the main patch
series (it should not be necessary for people to collect patches from
different threads, and it would also break the CI testing commitfests
patches).
Breaking 0002 into smaller parts makes perfect sense. Something like:
- minimum working patch, with only serial sequential scan
- add parallel sequential scan
- add other custom scans, one by one ...
- ...
Also, I have questions and advice below on May 16th.
- How to handle visibility?
- What if transactions that insert tuples are aborted?
- VCI vs Index Only Scan
- VCI seems the same as Index Only Scan
- VCI has compression. So, is the amount of size better than the Index?
- How about the efficiency for storage? How much data can we store?
- using TAM (Alvalo's suggestion)
- Most of the code can be ported from the heap. ROS and WOS can exist as forks
of main files, like .vm and .fm. API can be like:```
CREATE TABLE foo (id, product, price) USING vci WITH (to_be_indexed=price);
```
Yes, I think those questions are important. Not only for understanding
the current patch, but also to understand what trade-offs the solution
can make (e.g. about the consistency/visibility model), etc.
What I really miss in the current patch is some sort of READMEs with
- high-level design of the VCI indexes
- description of the consistency/visibility model (does it behave the
same way as querying the heap, can it be out of sync for, ...)
- WOS/ROS architecture (when are rows inserted into WOS, promoted into
ROS, what triggers that, ...)
- what's the in-memory / on-disk format
- some places in the patch mention "internal VCI tables" but I have no
idea what that is
- how does the execution work? compression? crucial points to consider
for optimal performance, etc.
- limitations (temporary - can be relaxed in the future, permanent -
inherent to the columnar design) and trade-offs
- what are the various custom scan executor nodes
- what "background" processes happen (custom workers, ...)
- anything else substantial for understanding the design
Maybe there's some of this in the 0002 patch, but I haven't stumbled
over it so far.
------------------------------------------------------------------------
Finally, some review comments about the 0001 patch - it's mostly in the
order as appearing in git diff, file by file. Some of the points repeat,
because multiple places made me think about the same thing. And some of
it was mentioned above.
- It seems the VCI is implemented as an index AM, but it seems to be
trying to do some TAM stuff, e.g. with visibility info etc. For example,
I don't think indexes generally care about XIDs that deleted rows, etc.
Similarly, I can't think of another index AM that'd try to hook into
tablecmds.c or execExpr.
- Not sure what's the right choice (makes sense to have it as index on
top of heap, so that we do regular stuff for OLTP queries, and columnar
stuff using the index for analytics). But it seems it does stuff
expected from table AM and not index AM.
- But making a table AM would make it much more demanding (it'd have to
support storing all data types, ...) and would hurt OLTP performance.
- I very much support the idea of implementing this as an extension (so
minimal infrastructure changes to core + extension in contrib or
entirely outside).
- But the patch is not there, not even with 0001, because there's a
number of places mentioning "vci". There should be no reference to VCI
in core, it should be generic extensibility infrastructure. It's a good
indication of places that may need some extensibility.
relation_open
------------------
- Why is this change (disabling assert enforcing proper locking) needed?
There's a comment in lock.c which mentions VCI parallel workers, but
it's suspicious.
reloptions.c
------------------
- Changes that should be moved to the contrib module Why should in-core
reloptions know about this? See for example how "contrib/bloom" defined
reloptions.
heapam.c
------------------
- add_index_delete_hook, not sure I understand the purpose of this,
likely related to the consistency model, needs explanation. Why is it
prefixed with "add_"?
- If we need such DELETE hook, heapam.c is not where to call it, it
should be called from the same "layer" adding tuples to indexes (which
certainly does not happen in heap_insert).
- Furthermore, if we need this, why should it be a hook and not a new
callback in IndexAmRoutine, just like aminsert()?
initscan
------------------
- Isn't it using keep_startblock in a wrong way? it certainly is not
meant to indicate "keep the old number of blocks"
- Why is this even needed? This is in heapam, so why would this even
matter for VCI, which is an index? Shouldn't that be mostly independent
from which TAM is used by the table?
heapam_handler.c
------------------
- Why does this need to preserve the IndexHeapTuple at all? And why
should it be done in heapam_handler? Again, shouldn't IAM be mostly
independent of this?
- Using a global variable does not seem great. We probably can't have
concurrent calls (only a single heapam_index_build_range_scan call at a
time), but globals are dubious.
- If VCI really needs to remember the tuple, there should be a "proper"
way to explicitly pass it (adjust the IndexAmRoutine callback to have a
new argument or something like that).
heapam_visibility.c
-------------------
- It's probably sensible to make visibility snapshots extensible, i.e.
to allow defining custom snapshot types etc. But the patch still does
that by hard-coding stuff in core, even if the actual "visibility" code
is in extension.
- But this doesn't really allow custom snapshot types, those are still
hardcoded SNAPSHOT_VCI_WOS2ROS / SNAPSHOT_VCI_LOCALROS.
- The add_snapshot_satisfies_hook does not seem like the way to make
this extensible - what if two extensions want to do this. First
HeapTupleSatisfiesVisibility would have to know about all the
extensions, and it could easily lead to infinite loop. Because we call
the first hook, and it calls HeapTupleSatisfiesVisibility again.
- This is probably a good example of VCI trying to do TAM stuff. TAM
interface has tuple_satisfies_snapshot, which seems to have the same
purpose as the hook. Maybe not (it still assumes regular snapshots).
- But I still don't understand why should a new index AM require changes
to how heap AM does visibility checks (or any visibility checks in
general, as indexes don't have visibility info in PG). And why would it
make sense for heap to know about this. IAM is the layer *above* TAM, in
a way.
xlogfuncs.c
------------------
- RECOVERY_VCI_PAUSE_REQUESTED - what's that for? why do we need a
separate state from RECOVERY_PAUSE_REQUESTED?
xlogrecovery.c
------------------
- I don't see why we should not log that the recovery is paused because
of VCI. I sure users would still like to know why it's paused:
/* If pause requested by VCI, the log is not output. */
else if (GetRecoveryPauseState() != RECOVERY_VCI_PAUSE_REQUESTED)
dependency.c
------------------
- add_drop_relation_hook raises mostly the same questions as
add_index_delete_hook
- Don't we already have event triggers to do this kind of stuff in an
extensible way?
- Why should an index even care about the table getting dropped? I mean,
it'll get dropped too, before the table. So why this?
index.c
------------------
- Why do we need add_reindex_index_hook at all? we already have ambuild
callback, right?
- It seems really suspicious ConstructTupleDescriptor gains the ability
to index system attributes, why is that needed? surely the VCI does not
need to index system attributes, right? Or why is that needed?
- It seems add_reindex_index_hook is used to "skip" rebuilding the
index, but why would it be useful/necessary, and why should it be done here?
- Why couldn't ambuild() just return? How do I rebuild the index if
something goes wrong (e.g. it gets corrupted). Or to remove bloat, if
that applies to VCI.
explain.c
------------------
- T_CustomPlanMarkPos - so this introduces a new type of custom plan
node? How is that different from CustomScan? We don't even have MarkPos
plan in core, right? So what does it do?
- It even appears to be exactly the same as CustomScan in most places,
but it also introduces some new callbacks (ExplainCustomPlanTargetRel
and SetBoundCustomScan).
- If useful, this should be a separate patch (custom scan improvements).
And it definitely needs to update the CustomScan docs in SGML.
- I don't understand why we need ExplainPropertySortGroupKeys, does not
align with the rest of the API (ExplainProperty is for different data
types - float, text, ...). Maybe expose show_sort_group_keys? Or just
copy show_sort_group_keys into the extension ...
indexcmds.c
------------------
- Yet another place referencing "vci" explicitly (not extensible, should
not be in core)
- Not sure I understand the limitations mentioned there.
tablecmds.c
------------------
- add_alter_tablespace_hook, add_alter_table_change_owner_hook,
add_alter_table_change_schema_hook
- Why does VCI this even need to handle these actions?
- And we have event triggers, so why have these new hooks?
- Not sure what the code in ATExecSetRelOptions does (another example
that handles "vci" AM explicitly, in a non-extensible way).
- Is it maybe rebuilding some reloptions? Maybe it's about the attnums?
vacuum.c
------------------
- add_skip_vacuum_hook is used to "disable" vacuum for internal VCI tables.
- Why. And what is "internal table"?
execAmi.c
------------------
- what's T_CustomPlanMarkPos?
execExpr.c
------------------
- Why does this expose ExecReadyExpr, ExecCreateExprSetupSteps?
- Isn't this a bit too fragile?
- Probably needed because VciExecInitExpr needs a new argument? But why
does it need that?
execExprInterp.c
------------------
- Why does this need ExprEvalVar_hook and ExprEvalParam_hook?
- Doesn't even check if the hooks are != NULL
- CASE_EEOP_VCI_VAR, CASE_EEOP_VCI_PARAM_EXEC - not extensible, should
not reference VCI explicitly
- VciExprEvalVarHook (for llvmjit?)
execGrouping.c
------------------
- LookupTupleHashEntry_internal removes comment, why?
execProcnode.c, execScan.c
------------------
- New MarkPos node, apparetly, but why do we need that? Why can't we
just build the Materialize when constructing the plan?
execUtils.c
------------------
- again, disables locking check, very suspicious
nodeCustom.c
------------------
- Isn't it strange that ExecInitCustomScan short-circuits the init,
skipping most of the initialization for selected scans?
- If anything, it should not hard-code the scan names like this. Why is
this necessary? is the rest of the initialization irrelevant, or what?
Maybe this is sign it should be table AM rather than index AM?
- But IOS scans would also do this initialization, so ...
nodeModifytable.c
------------------
- add_should_index_insert_hook another strange hook
- Doesn't the hook apply to all relations all the time, not just those
with VCI indexes etc.?
gen_node_support.c
------------------
- So is CustomPlanMarkPos a proper node, or why do we need it? why is
CustomPlan not enough?
- What's SmcAllocSetContext for? Doesn't seem to be mentioned anywhere.
params.c
------------------
- ExecInitParam_hook seems unused, no?
allpaths.c
------------------
- set_plain_rel_pathlist - I don't understand why this checks the VCI
vs. parallelism like this, what it's trying to prevent. Maybe if you
don't want certain plan shape, don't generate the partial paths in that
case from the CustomScan planner callback?
- It seems this tries to allow parallel seqscan, but not other more
complex parallel scans (like parallel agg/join ...), but that could be
done in callback, no?
- Again, we should not reference the index AM this explicitly, it's not
extensible.
createplan.c
------------------
- I don't think we want to make copy_plan_costsize part of API. Just a
detail, though.
autovacuum.c
------------------
- Why does this need a custom autovacuum launcher, or what's the purpose
of this code? is there a special autovacuum launcher for VCI indexes?
- Again, explicit references to "VCI", should not be done like this.
bgworker.c
------------------
- I have no idea what's the point of these changes. seems to be enabling
canceling bgworkers, but that seems like a generic improvement. In which
case it should be a separate patch.
- No idea if it's correct etc.
- Seems to be done to allow CountOtherDBBackends() to cancel backends,
so that it can be dropped etc.
- There apparently are bgworkers doing cleanup for VCI, and we want to
treat them like autovacuum workers (which get interrupted in various
cases instead of waiting forever).
- But this will cancel all bgworkers, no? Not just the VCI ones ...
lock.c
------------------
- this seems a bit strange, surely this should use the same locking
approach as other parallel queries
/*
* Don't lock for VCI parallel workers and other type of workers should go
* in normal flow, In case if there is any change in background worker
* name for VCI parallel workers, the following code also needs an update.
* FIXME: Try to use the community parallelism code, so that we don't need
* our own VCI parallel infrastructure.
*/
if (AmBackgroundWorkerProcess() && strstr(MyBgworkerEntry->bgw_name,
"backend="))
- Presumably this is why some of the Asserts on locking were commented
out earlier.
lwlock.c
------------------
- Vci_lwlock_kind seems unused.
timestamp.c
------------------
- I believe the IntervalAggState is internal on purpose, why should it
be made external? It also means timestamp.h has to worry about FRONTEND.
relcache.c
------------------
- add_skip_vci_index_hook - yet another hook, same questions as for the
other hooks earlier
- It seems suspicious we have to skip VCI indexes like this during
planning in RelationGetIndexAttrBitmap, not sure why that's needed? Why
should't we cost the index and maybe reject it based on that?
- Again, references VCI explicitly, so not extensible.
mctx.c
------------------
- why MemoryContextMethods shouldn't be "const"?
common.c
pg_dump.c
pg_dump.h
------------------
- isvciview breaks extensibility, tied to a particular IAM / extension.
Needs to be abstracted / made extensible.
- What is "vci view table" for, actually?
reloptions.h
------------------
- Why does this need to add an entry into the relopt_kind enum? It
should be extensible, see the "contrib/bloom".
xlogrecovery.h
------------------
- Why is RECOVERY_VCI_PAUSE_REQUESTED separate from
RECOVERY_PAUSE_REQUESTED?
timestamp.h
------------------
- The FRONTEND stuff is a sign making IntervalAggState public may not be
a good idea.
executor.h
------------------
- How come it's suddenly OK to return entry->additional without the if
conditions in TupleHashEntryGetAdditional?
- Ah, it's now stored as a separate palloc chunk, not right after the
tuple? Well, maybe that's a separately useful improvement? should be a
separate patch, then. No opinion on whether it's a good idea.
- What's with the struct LimitState; ? Why is it needed?
extensible.h
------------------
- Adds SetBoundCustomScan / ExplainCustomPlanTargetRel, but then it
should also add them to the SGML docs.
memnodes.h
------------------
- What's SmcAllocSetContext? It's not defined anywhere.
params.h
------------------
- Adds noNeedLoadFromMain, loadedFromMain to ParamExecData, but it
doesn't seem to be used anywhere (in either patch).
plannodes.h
------------------
- Adds plan_no to Plan, but it's not set anywhere in core, only in the
VCI extension - which seems wrong (e.g. what if there are two extensions
that want to set it, and they disagree?)
- Either if we want/need this, it should be initialized by core during
planning, in a non-ambiguous way.
- But also how is this different from the existing plan_node_id?
autovacuum.h
------------------
- It seems a bit strage to add "built-in" bgworkers for an extension. If
the extension really needs some sort of workers, it should be defined in
the extension.
bgworker.h
------------------
- So what's the problem with canceling bgworkers? Should this be done
as a separate patch?
itemptr.h
------------------
- Why do we need this SizeOfIptrData definition? How is it different
from sizeof(ItemPointerData)?
lwlock.h
------------------
- LWLockKind / Vci_lwlock_kind seems unused
numeric.h
------------------
- Why does this expose NumericVar, when the second patch redefines the
struct anyway? also, isn't that fragile? although, we're unlikely to
tweak the NumericVar definition.
rel.h
------------------
- Again, not extensible way to do stuff in an extension.
- We should not have pieces specific to AM in StdRdOptions.
- Doesn't AM have custom reloptions now? Like contrib/bloom?
I also tried to build 0002, but unfortunately that fails with:
executor/vci_executor.c: In function âvci_setup_executor_hookâ:
executor/vci_executor.c:115:28: error: assignment to
âExecutorStart_hook_typeâ {aka âvoid (*)(QueryDesc *, int)â} from
incompatible pointer type â_Bool (*)(QueryDesc *, int)â
[-Wincompatible-pointer-types]
115 | ExecutorStart_hook = vci_executor_start_routine;
| ^
executor/vci_executor.c: In function âvci_executor_start_routineâ:
executor/vci_executor.c:161:28: error: void value not ignored as it
ought to be
161 | plan_valid = executor_start_prev(queryDesc, eflags);
| ^
executor/vci_executor.c:163:28: error: void value not ignored as it
ought to be
163 | plan_valid = standard_ExecutorStart(queryDesc,
eflags);
| ^
make: *** [../../src/Makefile.global:973: executor/vci_executor.o] Error 1
The extension is not added to contrib/Makefile, so "make world" does not
trigger this failure.
regards
--
Tomas Vondra
Hi Tomas.
Thankyou for your extremely valuable post [1]/messages/by-id/a748aa6b-c7e6-4d02-a590-ab404d590448@vondra.me. We are analysing your
review comments and feedback, and will respond later.
Meanwhile, I am attaching updated patches to address the reported
build problems:
I also tried to build 0002, but unfortunately that fails with:
executor/vci_executor.c: In function ‘vci_setup_executor_hook’:
executor/vci_executor.c:115:28: error: assignment to
‘ExecutorStart_hook_type’ {aka ‘void (*)(QueryDesc *, int)’} from
incompatible pointer type ‘_Bool (*)(QueryDesc *, int)’
[-Wincompatible-pointer-types]
115 | ExecutorStart_hook = vci_executor_start_routine;
| ^
executor/vci_executor.c: In function ‘vci_executor_start_routine’:
executor/vci_executor.c:161:28: error: void value not ignored as it
ought to be
161 | plan_valid = executor_start_prev(queryDesc, eflags);
| ^
executor/vci_executor.c:163:28: error: void value not ignored as it
ought to be
163 | plan_valid = standard_ExecutorStart(queryDesc,
eflags);
| ^
make: *** [../../src/Makefile.global:973: executor/vci_executor.o] Error 1The extension is not added to contrib/Makefile, so "make world" does not
trigger this failure.
patch v3-0001.
- Only some whitespace issues are addressed
patch v3-0002.
- Fixed the contrib/Makefile to include vci
- Fixed a build error (due to recent commit [2]https://github.com/postgres/postgres/commit/1722d5eb05d8e5d2e064cd1798abcae4f296ca9d)
- The vci.sgml file is removed (now in patch 0003)
patch v3-0003
- New patch just for the docs.
======
[1]: /messages/by-id/a748aa6b-c7e6-4d02-a590-ab404d590448@vondra.me
[2]: https://github.com/postgres/postgres/commit/1722d5eb05d8e5d2e064cd1798abcae4f296ca9d
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
Hi Tomas.
What I really miss in the current patch is some sort of READMEs with
- high-level design of the VCI indexes
- description of the consistency/visibility model (does it behave the
same way as querying the heap, can it be out of sync for, ...)- WOS/ROS architecture (when are rows inserted into WOS, promoted into
ROS, what triggers that, ...)- what's the in-memory / on-disk format
- some places in the patch mention "internal VCI tables" but I have no
idea what that is- how does the execution work? compression? crucial points to consider
for optimal performance, etc.- limitations (temporary - can be relaxed in the future, permanent -
inherent to the columnar design) and trade-offs- what are the various custom scan executor nodes
- what "background" processes happen (custom workers, ...)
- anything else substantial for understanding the design
Maybe there's some of this in the 0002 patch, but I haven't stumbled
over it so far.
Acknowledged. We are working now on creating a draft README and will
post it here ASAP.
~~~
For patch 0001 you reported some items as being "unused". You were
correct on all counts; those are all removed in the latest patch set.
gen_node_support.c
------------------
- What's SmcAllocSetContext for? Doesn't seem to be mentioned anywhere.
Removed.
params.c
------------------
- ExecInitParam_hook seems unused, no?
Removed.
lwlock.c
------------------
- Vci_lwlock_kind seems unused.
Removed.
memnodes.h
------------------
- What's SmcAllocSetContext? It's not defined anywhere.
Removed.
params.h
------------------
- Adds noNeedLoadFromMain, loadedFromMain to ParamExecData, but it
doesn't seem to be used anywhere (in either patch).
Removed.
lwlock.h
------------------
- LWLockKind / Vci_lwlock_kind seems unused
Removed
======
Kind Regards,
Peter Smith.
Fujitsu Australia.
Attachments:
Hi Peter,
I've noticed there are changes in Postgres code v4 patch that rollback
the commit [1]/messages/by-id/attachment/174581/v9-0004-Remove-additional-pointer-from-TupleHashEntryData.patch. That commit optimizes TupleHashEntryData struct size
and amount of memory allocations which improves performance (see
discussion [2]/messages/by-id/817d244237878cebdff0bc363718feaf49a1ea7d.camel@j-davis.com).
Can we use leave TupleHashEntryData as is and make new VCI-specific
struct that contains TupleHashEntryData member and an additional
pointer or make VCI use TupleHashEntryGetAdditional()?
[1]: /messages/by-id/attachment/174581/v9-0004-Remove-additional-pointer-from-TupleHashEntryData.patch
/messages/by-id/attachment/174581/v9-0004-Remove-additional-pointer-from-TupleHashEntryData.patch
[2]: /messages/by-id/817d244237878cebdff0bc363718feaf49a1ea7d.camel@j-davis.com
/messages/by-id/817d244237878cebdff0bc363718feaf49a1ea7d.camel@j-davis.com
-------
Regards,
Timur Magomedov
Hi Timur.
On Thu, May 29, 2025 at 11:30 PM Timur Magomedov
<t.magomedov@postgrespro.ru> wrote:
Hi Peter,
I've noticed there are changes in Postgres code v4 patch that rollback
the commit [1]. That commit optimizes TupleHashEntryData struct size
and amount of memory allocations which improves performance (see
discussion [2]).
Can we use leave TupleHashEntryData as is and make new VCI-specific
struct that contains TupleHashEntryData member and an additional
pointer or make VCI use TupleHashEntryGetAdditional()?[1]
/messages/by-id/attachment/174581/v9-0004-Remove-additional-pointer-from-TupleHashEntryData.patch
[2]
/messages/by-id/817d244237878cebdff0bc363718feaf49a1ea7d.camel@j-davis.com
Thank you for noticing and reporting this!
It was not intentional to roll back changes to core PostgreSQL. These
VCI patches originated from an older forked source, so it seems this
reversion was inadvertently introduced during the rebasing process.
We’ll aim to correct this in a future patch.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Hello Peter,
I've tried running VCI, the idea looks great to me. Thank you and
Iwata-san for working on this feature. Looked at the source code of VCI
module, the patch is huge but here are some ideas I hope could be
useful for community. Made some patches on top of v4 that can be
squashed into future iteration of VCI module.
0001-Removed-unnecessary-preload-libraries-checks.patch
Please correct me if I'm wrong but there is no need for adding VCI to
"session_preload_libraries" in case it is in "shared_preload_libraries"
already.
0002-Meson-fixes.patch
Fixed some ghost files in Meson build, added VCI to
contrib/meson.build.
0003-Removed-unused-OS-specific-CAS-functions.patch
Having own VCI-specific CAS looks scary to me, fortunately looks like
it was unused and can be removed.
0004-Fixed-segfault-in-case-there-is-no-hottable.patch
The most exciting one, faced it while playing around with queries from
ClickBench benchmark. Segfault was caused by accessing
aggstate->hottable even in case aggstate->hottable is NULL, using
aggstate->hashtable instead fixes this.
Also there are some ideas of making the VCI module patch smaller and
simpler for review:
First of all, we have this hottable optimization which is enabled but
surrounded by "#ifdef VCI_ENABLE_HOT_HASH_TABLE ... #endif". I think
this could be a separate patch on top of VCI module patch, say "Hot
hashtable optimization".
Second, there is some OS-specific code that looks like used only in
case ENABLE_WOSROS_TRANS_DEFERRAL is defined. This feature is unclear
to me, probably it can be better reviewed if separated to another
"Deferral of WOS->ROS transforamtion" patch that would introduce new
GUC and some OS-specific code. So we can be sure other VCI code has no
OS-specific parts.
--
Regards,
Timur Magomedov
Attachments:
On Fri, May 23, 2025 at 4:29 PM Tomas Vondra <tomas@vondra.me> wrote:
Also, Alvaro seemed to think TAM is the way to go, and in order to keep
the OLTP performance he suggested to use both heap and VCI at the same
time, in different "forks". I'm not sure how would that work, or if we
can already do that - AFAIK we can't, because ForkNumber does not allow
adding custom forks. We'd have to relax that, or invent some sort of
federated TAM (that just multiplexes it to two TAMs). Maybe.But it's not like the IAM approach doesn't need to do this. The first
patch had to add stuff to a lot of random places to make this work. And
some of the places touch stuff that we don't expect indexes to worry
about, like ALTER TABLE, etc.
I suspect another option would be to handle this with table inheritance:
have one child that is heap-based, a second that's VCI, and a background
job to move data from heap to VCI (and vice-versa for updates and maybe
deletes).
Note that you could actually implement all that in user-space. Personally
I'd much rather have a way to do pure VCI / column-store sooner and manage
it myself than have to wait another release (or more) to get a complete
solution...
On 6/4/25 19:59, Jim Nasby wrote:
On Fri, May 23, 2025 at 4:29â¯PM Tomas Vondra <tomas@vondra.me
<mailto:tomas@vondra.me>> wrote:Also, Alvaro seemed to think TAM is the way to go, and in order to keep
the OLTP performance he suggested to use both heap and VCI at the same
time, in different "forks". I'm not sure how would that work, or if we
can already do that - AFAIK we can't, because ForkNumber does not allow
adding custom forks. We'd have to relax that, or invent some sort of
federated TAM (that just multiplexes it to two TAMs). Maybe.But it's not like the IAM approach doesn't need to do this. The first
patch had to add stuff to a lot of random places to make this work. And
some of the places touch stuff that we don't expect indexes to worry
about, like ALTER TABLE, etc.I suspect another option would be to handle this with table inheritance:
have one child that is heap-based, a second that's VCI, and a background
job to move data from heap to VCI (and vice-versa for updates and maybe
deletes).Note that you could actually implement all that in user-space.
Personally I'd much rather have a way to do pure VCI / column-store
sooner and manage it myself than have to wait another release (or more)
to get a complete solution... Â
I don't see how could this ever work with the optimizer, which assumes
scanning an inheritance hierarchy means scanning all parts. But this
would require making planner "smarter" to know it should scan only one
of the child relations. And I believe it's not possible to do that while
constructing scans for the heap/VCI parts, those places are not aware of
what other parts are being scanned etc.
Sure, you could do this in "user-space" by constructing queries that
reference either the heap or VCI part. But then why put that into
inheritance tree at all? It certainly does not help with moving data
between the parts.
I may be missing something, of course. But it's not clear to me how is
this supposed to work ...
What I can imagine is "VCI" as a "proxy" TAM on top of heap, keeping the
columnar format in a separate fork. And using either that from custom
scans, or the heap as a fallback for cases not supported by VCI.
regars
--
Tomas Vondra
On Wed, Jun 4, 2025 at 1:16 PM Tomas Vondra <tomas@vondra.me> wrote:
On 6/4/25 19:59, Jim Nasby wrote:
On Fri, May 23, 2025 at 4:29 PM Tomas Vondra <tomas@vondra.me
<mailto:tomas@vondra.me>> wrote:Also, Alvaro seemed to think TAM is the way to go, and in order to
keep
the OLTP performance he suggested to use both heap and VCI at the
same
time, in different "forks". I'm not sure how would that work, or if
we
can already do that - AFAIK we can't, because ForkNumber does not
allow
adding custom forks. We'd have to relax that, or invent some sort of
federated TAM (that just multiplexes it to two TAMs). Maybe.But it's not like the IAM approach doesn't need to do this. The first
patch had to add stuff to a lot of random places to make this work.And
some of the places touch stuff that we don't expect indexes to worry
about, like ALTER TABLE, etc.I suspect another option would be to handle this with table inheritance:
have one child that is heap-based, a second that's VCI, and a background
job to move data from heap to VCI (and vice-versa for updates and maybe
deletes).Note that you could actually implement all that in user-space.
Personally I'd much rather have a way to do pure VCI / column-store
sooner and manage it myself than have to wait another release (or more)
to get a complete solution...I don't see how could this ever work with the optimizer, which assumes
scanning an inheritance hierarchy means scanning all parts. But this
would require making planner "smarter" to know it should scan only one
of the child relations. And I believe it's not possible to do that while
constructing scans for the heap/VCI parts, those places are not aware of
what other parts are being scanned etc.
Right; I was envisioning that one child would be a conventional heap that
stored very recent data and another child would be columnar in nature. So
you'd definitely want to always look at both children.
I am making an assumption (based on the comment about multiple forks) that
we'd have some way to handle VCI without having an actual heap.
Sure, you could do this in "user-space" by constructing queries that
reference either the heap or VCI part. But then why put that into
inheritance tree at all? It certainly does not help with moving data
between the parts.
Right; I only brought it up because just having a working column-store
would be a big win, even if you had to code something to deal with any DML
that wasn't already batch up. Of course it would be better if it just did
the RightThing(TM) out of the box... but the perfect can be the enemy of
the good.
What I can imagine is "VCI" as a "proxy" TAM on top of heap, keeping the
columnar format in a separate fork. And using either that from custom
scans, or the heap as a fallback for cases not supported by VCI.
Yeah, there'd definitely need to be some kind of proxy... I'm just
suggesting that we don't *have* to do that as a separate fork...
Of course I could also just be missing something :)
On Wed, Jun 4, 2025 at 5:19 PM Jim Nasby <jnasby@upgrade.com> wrote:
What I can imagine is "VCI" as a "proxy" TAM on top of heap, keeping the
columnar format in a separate fork. And using either that from custom
scans, or the heap as a fallback for cases not supported by VCI.Yeah, there'd definitely need to be some kind of proxy... I'm just
suggesting that we don't *have* to do that as a separate fork...
(tl;dr: there are some key things that can only be implemented in the
engine that would enable much more complex features to be added at the SQL
level, without requiring tons of C code to implement the larger feature)
Oh, one other thing worth mentioning... it's actually not terribly hard to
build a column-store in userspace today: just turn every column of a table
into an array and set the TOAST target low enough so that it all gets
toasted. I tested that many years ago, and even though I couldn't set the
toast target back then saw some really encouraging results... provided that
I constructed my queries carefully (I also had range fields for each column
that stored the min/max of each array, so the planner could completely skip
de-toasting anything that would not contain values of interest.)
The reason I never went anywhere with this concept is it'd be very hard for
most folks to write queries that performed well. The transform from column
back to row-based was actually hidden behind a view (a bunch of unnest()'s)
- but if you didn't make use of the range fields in your query you lost a
lot (but not all[1]In my testing (which used the taxicab database) there was still a performance gain from storing the data as arrays, even if the queries to access it took no special efforts to eliminate unnecessary data. The reason is that TOAST meant that the base data was being compressed. In fact, testing showed that there was a win even if you didn't treat each individual column as an array; you could simply store an array of a composite type and still see a win.) of the performance gain. I know that I could have used
the hooks to teach the planner how to do this, but it would have been a
huge amount of work (at least for me) to do so.
It did occur to me recently that a generic system for teaching the
optimizer additional transforms it could make would be generally useful. By
far the biggest example would be a way to teach it that
WHERE timestamp_field :: date = '2025-6-4'
is the same thing as
WHERE timestamp_field >= '2025-6-4' AND timestamp_field < '2025-6-5'
That would be extremely helpful in a lot of environments. There are
definitely other cases where you can apply the same kinds of logic. In
particular, such a feature (if generic enough) would make it possible to
write simple queries against a view that transformed columnar data (stored
as arrays) back into a row format *and* apply additional predicates that
would make those queries highly efficient - all done via pure SQL.
[1]: In my testing (which used the taxicab database) there was still a performance gain from storing the data as arrays, even if the queries to access it took no special efforts to eliminate unnecessary data. The reason is that TOAST meant that the base data was being compressed. In fact, testing showed that there was a win even if you didn't treat each individual column as an array; you could simply store an array of a composite type and still see a win.
performance gain from storing the data as arrays, even if the queries to
access it took no special efforts to eliminate unnecessary data. The reason
is that TOAST meant that the base data was being compressed. In fact,
testing showed that there was a win even if you didn't treat each
individual column as an array; you could simply store an array of a
composite type and still see a win.
On Wed, Jun 4, 2025 at 11:47 PM Timur Magomedov
<t.magomedov@postgrespro.ru> wrote:
I've tried running VCI, the idea looks great to me. Thank you and
Iwata-san for working on this feature. Looked at the source code of VCI
module, the patch is huge but here are some ideas I hope could be
useful for community. Made some patches on top of v4 that can be
squashed into future iteration of VCI module.
Hi Timur,
Thank you for the top-up patches and fixes, and also for the ideas to
split the 0002 patch into smaller parts. We will try to incorporate
some/all of these when future patches versions are next posted.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Hi Timur.
On Wed, Jun 4, 2025 at 11:47 PM Timur Magomedov
<t.magomedov@postgrespro.ru> wrote:
Hello Peter,
I've tried running VCI, the idea looks great to me. Thank you and
Iwata-san for working on this feature. Looked at the source code of VCI
module, the patch is huge but here are some ideas I hope could be
useful for community. Made some patches on top of v4 that can be
squashed into future iteration of VCI module.0001-Removed-unnecessary-preload-libraries-checks.patch
Please correct me if I'm wrong but there is no need for adding VCI to
"session_preload_libraries" in case it is in "shared_preload_libraries"
already.
I think you are correct, but I am still checking the history of this
code before making this change.
0002-Meson-fixes.patch
Fixed some ghost files in Meson build, added VCI to
contrib/meson.build.
Done.
0003-Removed-unused-OS-specific-CAS-functions.patch
Having own VCI-specific CAS looks scary to me, fortunately looks like
it was unused and can be removed.
Done.
0004-Fixed-segfault-in-case-there-is-no-hottable.patch
The most exciting one, faced it while playing around with queries from
ClickBench benchmark. Segfault was caused by accessing
aggstate->hottable even in case aggstate->hottable is NULL, using
aggstate->hashtable instead fixes this.
Done.
Also there are some ideas of making the VCI module patch smaller and
simpler for review:First of all, we have this hottable optimization which is enabled but
surrounded by "#ifdef VCI_ENABLE_HOT_HASH_TABLE ... #endif". I think
this could be a separate patch on top of VCI module patch, say "Hot
hashtable optimization".
Done.
Second, there is some OS-specific code that looks like used only in
case ENABLE_WOSROS_TRANS_DEFERRAL is defined. This feature is unclear
to me, probably it can be better reviewed if separated to another
"Deferral of WOS->ROS transforamtion" patch that would introduce new
GUC and some OS-specific code. So we can be sure other VCI code has no
OS-specific parts.
Done.
~~~
PSA v5 patches. These incorporate most of Timur's top-up patches and
patch splitting ideas [1]/messages/by-id/c4e314ef0a58050c8b7847ac1852555876674a69.camel@postgrespro.ru, plus other minor changes:
v5-0001
- same as v4-0001
v5-0002
- Lots of comment prefixes are modified -- e.g. "/**<" to "/*"
- Includes top-up patch 0002 (meson fixes)
- Includes top-up patch 0003 (removes some unused OS-specific functions)
- Includes top-up patch 0004 (fixed segfault)
v5-0003
All the VCI_ENABLE_HOT_HASH_TABLE code is moved from the previous patch 0002.
v5-0004
All the ENABLE_WOSROS_TRANS_DEFERAL code is moved from the previous patch 0002.
v5-0003
- same as v4-0003
======
[1]: /messages/by-id/c4e314ef0a58050c8b7847ac1852555876674a69.camel@postgrespro.ru
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Sat, May 24, 2025 at 7:29 AM Tomas Vondra <tomas@vondra.me> wrote:
...
What I really miss in the current patch is some sort of READMEs with
- high-level design of the VCI indexes
- description of the consistency/visibility model (does it behave the
same way as querying the heap, can it be out of sync for, ...)- WOS/ROS architecture (when are rows inserted into WOS, promoted into
ROS, what triggers that, ...)- what's the in-memory / on-disk format
- some places in the patch mention "internal VCI tables" but I have no
idea what that is- how does the execution work? compression? crucial points to consider
for optimal performance, etc.- limitations (temporary - can be relaxed in the future, permanent -
inherent to the columnar design) and trade-offs- what are the various custom scan executor nodes
- what "background" processes happen (custom workers, ...)
- anything else substantial for understanding the design
Maybe there's some of this in the 0002 patch, but I haven't stumbled
over it so far.
Hi Tomas.
Attached is the first version of the README, intended to address the
points above.
For convenience, I’ve included it as a separate file, but the plan is
to integrate it into the 0002 patch in the next update.
Please let me know if you have any feedback or suggestions for
improving the content.
======
Kind Regards,
Peter Smith.
Fujitsu Australia.
Attachments:
Here are the v6 patches. The changes are as follows:
0001 (postgres core)
- now this addresses the rollback of "additional" optimization change
on master -- see report by Timur [1]/messages/by-id/1a7e91a87816ee2a9aa4d5559453baf2f3da9115.camel@postgrespro.ru.
- minor cleanups
0002 (VCI module)
- updated with calls to TupleHashEntryGetAdditional() to support the 0001 fix
- build warnings fixed
- includes the contrib/vci/README
- minor meson changes
0003 (VCI_ENABLE_HOT_HASH_TABLE)
- minor meson changes
0004 (ENABLE_WOSROS_TRANS_DEFERRAL)
- rebased
0005 (docs)
- (unchanged)
======
[1]: /messages/by-id/1a7e91a87816ee2a9aa4d5559453baf2f3da9115.camel@postgrespro.ru
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Thu, May 29, 2025 at 11:30 PM Timur Magomedov
<t.magomedov@postgrespro.ru> wrote:
Hi Peter,
I've noticed there are changes in Postgres code v4 patch that rollback
the commit [1]. That commit optimizes TupleHashEntryData struct size
and amount of memory allocations which improves performance (see
discussion [2]).
Can we use leave TupleHashEntryData as is and make new VCI-specific
struct that contains TupleHashEntryData member and an additional
pointer or make VCI use TupleHashEntryGetAdditional()?
Thanks for the report. This is fixed in v6.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
A rebase was necessary. Here are the v7* patches.
In passing, there were some other changes:
- Removed some dead code from patch 0002.
- Moved some code that was unused in patch 0002 to where it is needed
in patch 0004.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
Here are the v8 patches. The main changes are as follows:
v8-0001 VCI - changes to postgres core
- same
v8-0002 VCI module - main
- extracted "compression" related code from this main patch
- applied Timur's top-up patch [1]/messages/by-id/c4e314ef0a58050c8b7847ac1852555876674a69.camel@postgrespro.ru re "session_preload_libraries"
- removed some dead code
v8-0003 VCI module - documentation
- removed mentions about compression
~~~
To avoid too much noise, other extracted patches (below) will be
maintained off-list.
0004 VCI module - compression
0005 VCI module - hothash
0006 VCI module - defer WOS2ROS
======
[1]: /messages/by-id/c4e314ef0a58050c8b7847ac1852555876674a69.camel@postgrespro.ru
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Tue, Jun 10, 2025 at 6:02 PM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Timur.
On Wed, Jun 4, 2025 at 11:47 PM Timur Magomedov
<t.magomedov@postgrespro.ru> wrote:Hello Peter,
I've tried running VCI, the idea looks great to me. Thank you and
Iwata-san for working on this feature. Looked at the source code of VCI
module, the patch is huge but here are some ideas I hope could be
useful for community. Made some patches on top of v4 that can be
squashed into future iteration of VCI module.0001-Removed-unnecessary-preload-libraries-checks.patch
Please correct me if I'm wrong but there is no need for adding VCI to
"session_preload_libraries" in case it is in "shared_preload_libraries"
already.I think you are correct, but I am still checking the history of this
code before making this change.
Hi Timur.
Done. This is applied in the latest v8-0002.
FYI, there was some reason for requiring both parameters to be set,
related to issues when using vci in single-user mode. However, being
such a niche problem, it was judged better to remove such code
complications from the current patch 0002; we can revisit this in
future if necessary.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, 2025-06-19 at 14:14 +1000, Peter Smith wrote:
Here are the v8 patches. The main changes are as follows:
v8-0001 VCI - changes to postgres core
- samev8-0002 VCI module - main
- extracted "compression" related code from this main patch
- applied Timur's top-up patch [1] re "session_preload_libraries"
- removed some dead codev8-0003 VCI module - documentation
- removed mentions about compression
Hello Peter!
Thank you for working on VCI updates.
Here are some proposals for small improvements:
Since hothash feature lives in separate patch now, vci_hothash.o should
be removed from vci/executor/Makefile of VCI-module-main patch.
0001-Avoid-magic-numbers-in-vci_is_supported_type.patch
I've looked at vci_supported_types.c and tried to rewrite it without
using magic constants. Removed call to vci_check_supported_types() from
SQL code since there is no need now to check that OID constants still
match same types.
Using defined constants for OIDs seems more robust than plain numbers.
There were 23 type OIDs supported by VCI, all of them are there in a
new version of code. I've replaced binary search by simple switch-case
since compilers optimize it into nice binary decision tree. Previous
version was binary searching among 87 structs. New version only
searches among 23 integers.
0002-Updated-supported-funcs-SQL-for-recent-PostgreSQL.patch
I wonder if it is possible to rewrite vci_supported_funcs.c in a
similar way. There is a vci_supported_funcs.sql which I updated so it
can be executed at master version of PostgreSQL.
I don't know if it is intentional, but safe_types list from
vci_supported_funcs.sql have some differences to the types list from
vci_supported_types.c. Specifically, it doesn't include varchar, varbit
and uuid.
--
Regards,
Timur Magomedov
Attachments:
Hi.
Here are the latest v9* patches. These have following changes:
0001
- fixes some of the minor quirks reported by Tomas [1]Tomas - /messages/by-id/a748aa6b-c7e6-4d02-a590-ab404d590448@vondra.me.
0002
- fixes to added address some of Timur's feedback/patches [2]Timur - /messages/by-id/6a6058fc089f89561b2545f024953e4daa0b8561.camel@postgrespro.ru
- test code updated to remove unsupported type
- test code updated to remove dynamic costings
======
[1]: Tomas - /messages/by-id/a748aa6b-c7e6-4d02-a590-ab404d590448@vondra.me
[2]: Timur - /messages/by-id/6a6058fc089f89561b2545f024953e4daa0b8561.camel@postgrespro.ru
Kind Regards,
Peter Smith
Fujitsu Australia
Attachments:
On Fri, Jun 20, 2025 at 4:09 AM Timur Magomedov
<t.magomedov@postgrespro.ru> wrote:
Hello Peter!
Thank you for working on VCI updates.
Here are some proposals for small improvements:
Hi Timur, Thanks for your feedback!
The newly posted v9-0002 addresses some of your comments. Details are below.
Since hothash feature lives in separate patch now, vci_hothash.o should
be removed from vci/executor/Makefile of VCI-module-main patch.
Fixed. Thanks. I also fixed a similar problem and removed vci_mp_rle.o
from storage/Makefile.
0001-Avoid-magic-numbers-in-vci_is_supported_type.patch
I've looked at vci_supported_types.c and tried to rewrite it without
using magic constants. Removed call to vci_check_supported_types() from
SQL code since there is no need now to check that OID constants still
match same types.
Using defined constants for OIDs seems more robust than plain numbers.
There were 23 type OIDs supported by VCI, all of them are there in a
new version of code. I've replaced binary search by simple switch-case
since compilers optimize it into nice binary decision tree. Previous
version was binary searching among 87 structs. New version only
searches among 23 integers.
Hmm. IIUC the “supported types” code was written this way for the
following reason. Out of an over-abundance (?) of caution, the
original patch authors wanted to call the function
'vci_check_supported_types' as a compatibility check, to discover
up-front (during CREATE EXTENSION) if any of the current PostgreSQL
OID definitions differ in any way since when the VCI extension was
built.
e.g.
- if vci build supports the type oid, but the oid value in current
PostgreSQL no longer exists then raise an error
- if vci build supports the type oid but the oid now has a different
meaning (typname mismatch) then raise an error
In other words, even though these type OIDs are in a range that is
supposed to be stable/unlikely to change, I think original VCI
developers were deliberately cautious about any chance of OID values
changing in later PostgreSQL versions.
So, for the compatibility checking we still need to keep that function
'vci_check_supported_types', and therefore we also still need to keep
the struct because that is where the known oid/name mapping (at time
of VCI extension build) is held which is used for the checking.
The current code is written so that when building a new VCI you only
need to execute:
"SELECT oid, typname FROM pg_type WHERE typnamespace = 11 AND typrelid
= 0 AND typelem = 0 ORDER BY oid;"
for the current PostgreSQL then substitute in the necessary true/false
for support.
I agree this means the resulting vci_is_supported_type() is not as
efficient as your implementation, but OTOH it is perhaps easier to
maintain and check against new PostgreSQL releases?
Also, you'll encounter all the same problems with the supported
functions logic of vci_supported_funcs.c -- those have similar logic,
so maybe it is better to keep them similar despite inefficiencies?
FYI, I have re-executed the SQL
SELECT oid, typname FROM pg_type WHERE typnamespace = 11 AND typrelid
= 0 AND typelem = 0 ORDER BY oid;
and this exposed a few changes since whenever this
vci_supported_type_table[] was last generated many years ago.
~~
AFAICT we could implement the vci_supported_type_table[] to *only*
include the types where supported is “true”. That would be more
efficient than now because then the entire array would only have 20ish
elements, but comes at a cost of perhaps being less easy to compare
with the executed SQL result.
Thoughts?
Also, while the vci_supported_type_table[] lookup is needed for the
compatibility check logic, I guess we could implement the
vci_is_supported_type() function exactly the same as your patch
switch code, to be called for the runtime checking (during CREATE
INDEX). That would be more efficient at runtime, but comes at a cost
of then having 2 functions to maintain.
Thoughts?
~~~
So, for now, I have only done the following:
- Updated vci_supported_type_table[] according to the current SQL result.
- Notice that “name” (NAMEOID) no longer qualifies as a VCI supported
type (because typelem is not 0 anymore) so the test code was modified
to remove the name column “c04”.
- In passing, I removed the costings from the EXPLAIN test output
0002-Updated-supported-funcs-SQL-for-recent-PostgreSQL.patch
I wonder if it is possible to rewrite vci_supported_funcs.c in a
similar way. There is a vci_supported_funcs.sql which I updated so it
can be executed at master version of PostgreSQL.
OK. I included your changes in v9 and also added some IF EXISTS so the
.sql file can be run repeatedly without error. I also changed the
matching comment in the supported_funcs.c code where the SQL of this
file was described.
I don't know if it is intentional, but safe_types list from
vci_supported_funcs.sql have some differences to the types list from
vci_supported_types.c. Specifically, it doesn't include varchar, varbit
and uuid.
Thanks for reporting this! TBH, I also expected these lists should be the same.
I found multiple inconsistencies:
- Some items are in safe_types table but NOT in the
vci_supported_type_table[]
- Some Items are in the vci_supported_type_table[] but are
NOT in the safe_types table
Furthermore (and maybe partly caused by content of safe_types), the
results of running vci_support_func.sql also differs quite a lot also
from the vci_supported_func_table [] (in vci_supported_funcs.c), which
also does not seem right to me.
- Some items are in the SQL script results but are NOT in the
vci_supported_func_table []
- Some Items are in the vci_supported_func_table [] but are
NOT in the SQL script results
So, this remains an open problem. I am investigating if there is any
history/explanation of these differences. IIUC, then I hope later to
fix the safe_types, and then also fix the vci_supported_func_table[]
based on the result of that SQL script.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, 2025-06-25 at 13:13 +1000, Peter Smith wrote:
Hi Timur, Thanks for your feedback!
Hello Peter!
Hmm. IIUC the “supported types” code was written this way for the
following reason. Out of an over-abundance (?) of caution, the
original patch authors wanted to call the function
'vci_check_supported_types' as a compatibility check, to discover
up-front (during CREATE EXTENSION) if any of the current PostgreSQL
OID definitions differ in any way since when the VCI extension was
built.
e.g.
- if vci build supports the type oid, but the oid value in current
PostgreSQL no longer exists then raise an error
- if vci build supports the type oid but the oid now has a different
meaning (typname mismatch) then raise an errorIn other words, even though these type OIDs are in a range that is
supposed to be stable/unlikely to change, I think original VCI
developers were deliberately cautious about any chance of OID values
changing in later PostgreSQL versions.
This check is really necessary if types are maintained as plain OID
numbers since they theoretically can change. Using preprocessor
constants, for example, BOOLOID instead of "16", make it simpler to
survive actual number change.
So, for the compatibility checking we still need to keep that
function
'vci_check_supported_types', and therefore we also still need to keep
the struct because that is where the known oid/name mapping (at time
of VCI extension build) is held which is used for the checking.The current code is written so that when building a new VCI you only
need to execute:
"SELECT oid, typname FROM pg_type WHERE typnamespace = 11 AND
typrelid
= 0 AND typelem = 0 ORDER BY oid;"
for the current PostgreSQL then substitute in the necessary
true/false
for support.
Using query to get numbers and storing those numbers in struct instead
of using defines like <TYPE_NAME>OID needs some runtime checks indeed.
I agree this means the resulting vci_is_supported_type() is not as
efficient as your implementation, but OTOH it is perhaps easier to
maintain and check against new PostgreSQL releases?Also, you'll encounter all the same problems with the supported
functions logic of vci_supported_funcs.c -- those have similar logic,
so maybe it is better to keep them similar despite inefficiencies?
I agree, having similar logic in both vci_supported_funcs.c and
vci_supported_types.c is good for code readability.
FYI, I have re-executed the SQL
SELECT oid, typname FROM pg_type WHERE typnamespace = 11 AND typrelid
= 0 AND typelem = 0 ORDER BY oid;
and this exposed a few changes since whenever this
vci_supported_type_table[] was last generated many years ago.~~
AFAICT we could implement the vci_supported_type_table[] to *only*
include the types where supported is “true”. That would be more
efficient than now because then the entire array would only have
20ish
elements, but comes at a cost of perhaps being less easy to compare
with the executed SQL result.Thoughts?
Looks like having both "false" and "true" cases inside structs is done
for better maintainability. I.e. in case we only have "true"-valued
types we can't say if some type is absent because it is decided to be
unsupported or because it was not examined yet.
Also, while the vci_supported_type_table[] lookup is needed for the
compatibility check logic, I guess we could implement the
vci_is_supported_type() function exactly the same as your patch
switch code, to be called for the runtime checking (during CREATE
INDEX). That would be more efficient at runtime, but comes at a cost
of then having 2 functions to maintain.Thoughts?
I think having two functions could make maintainability worse.
Maybe one day those checks will be rewritten using some smart runtime
logic in C, not compile-time list of types that were previously queried
using SQL script.
So I don't insist on my patch, it was just a proposal.
It has less code but probably we can do even better or at least stick
to current code running SQL query to get actual list of types from time
to time.
~~~
So, for now, I have only done the following:
- Updated vci_supported_type_table[] according to the current SQL
result.
- Notice that “name” (NAMEOID) no longer qualifies as a VCI supported
type (because typelem is not 0 anymore) so the test code was modified
to remove the name column “c04”.
- In passing, I removed the costings from the EXPLAIN test output0002-Updated-supported-funcs-SQL-for-recent-PostgreSQL.patch
I wonder if it is possible to rewrite vci_supported_funcs.c in a
similar way. There is a vci_supported_funcs.sql which I updated so
it
can be executed at master version of PostgreSQL.OK. I included your changes in v9 and also added some IF EXISTS so
the
.sql file can be run repeatedly without error. I also changed the
matching comment in the supported_funcs.c code where the SQL of this
file was described.
Thanks!
I don't know if it is intentional, but safe_types list from
vci_supported_funcs.sql have some differences to the types list
from
vci_supported_types.c. Specifically, it doesn't include varchar,
varbit
and uuid.Thanks for reporting this! TBH, I also expected these lists should be
the same.I found multiple inconsistencies:
- Some items are in safe_types table but NOT in the
vci_supported_type_table[]
- Some Items are in the vci_supported_type_table[] but are
NOT in the safe_types tableFurthermore (and maybe partly caused by content of safe_types), the
results of running vci_support_func.sql also differs quite a lot also
from the vci_supported_func_table [] (in vci_supported_funcs.c),
which
also does not seem right to me.
- Some items are in the SQL script results but are NOT in
the
vci_supported_func_table []
- Some Items are in the vci_supported_func_table [] but are
NOT in the SQL script resultsSo, this remains an open problem. I am investigating if there is any
history/explanation of these differences. IIUC, then I hope later to
fix the safe_types, and then also fix the vci_supported_func_table[]
based on the result of that SQL script.
Thanks! Evolutionary updating supported types and supported functions
lists looks good to me for now.
--
Regards,
Timur Magomedov
On Wed, 2025-06-25 at 12:22 +1000, Peter Smith wrote:
Hi.
Here are the latest v9* patches. These have following changes:
0001
- fixes some of the minor quirks reported by Tomas [1].0002
- fixes to added address some of Timur's feedback/patches [2]
- test code updated to remove unsupported type
- test code updated to remove dynamic costings
Hello Peter!
Thanks for updates.
I was trying to move some code from Postgres core patch to contrib/vci.
Started with moving VCI options from StdRdOptions struct and
reloptions.c, reloptions.h files into contrib/vci like Tomas suggested
earlier. Getting new relopt kind using add_reloption_kind() instead of
hardcoded RELOPT_KIND_VCI and so on, just like in contrib/bloom.
During this work I've found that those VCI-specific options,
"vci_column_ids" and "vci_dropped_column_ids", are only read in case
vci_IsExtendedToMoreThan32Columns() is true. And
vci_IsExtendedToMoreThan32Columns() itself checks if vci_column_ids
option is non-empty string to return true. AFAIK no one sets those
options and vci_IsExtendedToMoreThan32Columns() always returns false.
There are comments in code mentioning that one can create VCI index
with more than 32 columns using vci_create(). However, there is no such
a function anywhere. I guess it is now impossible to create VCI index
with more than 32 (INDEX_MAX_KEYS) columns. Maybe the options were used
to store column information for indexes created with vci_create() but
this function has gone.
So, giving the ideas above, can we remove those vci_column_ids,
vci_dropped_column_ids options and all the code that assumes
vci_IsExtendedToMoreThan32Columns() to be true? This would make core
patch a bit shorter, specifically there will be no changes in
reloptions.{c,h} files and no changes in StdRdOptions struct in rel.h.
--
Regards,
Timur Magomedov
On Sat, Jun 28, 2025 at 12:24 AM Timur Magomedov
<t.magomedov@postgrespro.ru> wrote:
...
Thanks for updates.
I was trying to move some code from Postgres core patch to contrib/vci.
Started with moving VCI options from StdRdOptions struct and
reloptions.c, reloptions.h files into contrib/vci like Tomas suggested
earlier. Getting new relopt kind using add_reloption_kind() instead of
hardcoded RELOPT_KIND_VCI and so on, just like in contrib/bloom.During this work I've found that those VCI-specific options,
"vci_column_ids" and "vci_dropped_column_ids", are only read in case
vci_IsExtendedToMoreThan32Columns() is true. And
vci_IsExtendedToMoreThan32Columns() itself checks if vci_column_ids
option is non-empty string to return true. AFAIK no one sets those
options and vci_IsExtendedToMoreThan32Columns() always returns false.There are comments in code mentioning that one can create VCI index
with more than 32 columns using vci_create(). However, there is no such
a function anywhere. I guess it is now impossible to create VCI index
with more than 32 (INDEX_MAX_KEYS) columns. Maybe the options were used
to store column information for indexes created with vci_create() but
this function has gone.So, giving the ideas above, can we remove those vci_column_ids,
vci_dropped_column_ids options and all the code that assumes
vci_IsExtendedToMoreThan32Columns() to be true? This would make core
patch a bit shorter, specifically there will be no changes in
reloptions.{c,h} files and no changes in StdRdOptions struct in rel.h.
Hi Timur.
Thanks for reporting this! Your idea seems correct to me.
AFAICT, those relops were intended for supporting >32 vci indexes,
which is only possible when the VCI index is created using
“vci_create()” function, but that function does not exist in the VCI
implementation posted to OSS. So, I agree with you – since those
relops are not currently needed they can just be removed. I will
confirm this understanding with the original patch devs and, if
correct, I will address this in a future patch version.
======
Kind Regards,
Peter Smith.
Fujitsu Australia.
Hi Tomas-san,
- Divide the patch more and more
- to allow committers to consider pros and cons and push one by one
- 15 patches is the maximum amount
- Separate features to some committable groupYes. Both parts need to be divided into small (but meaningful) pieces. I
don't recall where the "15 patches" came from, I don't think there's
some sort of maximum number of patches. All I care about is that the
patches need to be small-ish, and still make sense. Ideally, each part
should be committable on it's own. This makes it easier to test/review,
but also easier to agree on individual parts - in which case we may
commit the "preparatory" pieces and not have to rebase them forever.
Yes, we are aware the patch 0002 is far too big and are taking steps to
try to address that, firstly by removing any "dead" code, and secondly
by extracting out any "optional" components. This is an ongoing task.
If you have any other ideas how to split the patch please let us know.
relation_open
------------------
- Why is this change (disabling assert enforcing proper locking) needed?
There's a comment in lock.c which mentions VCI parallel workers, but
it's suspicious.
This comment is removed in the current posted patches.
reloptions.c
------------------
- Changes that should be moved to the contrib module Why should in-core
reloptions know about this? See for example how "contrib/bloom" defined
reloptions.
Thanks. We are studying how to replace this with an extensible solution
using "contrib/bloom" as guidance. Also, as Timur (28/Jun) suggests that
this might be unused code anyway. Investigating...
heapam.c
------------------
- add_index_delete_hook, not sure I understand the purpose of this,
likely related to the consistency model, needs explanation. Why is it
prefixed with "add_"?
Correct, this is related to maintaining consistency for the WOS
when a delete occurs. The "add_" prefix has no special meaning.
All of the VCI ad-hoc hooks have a prefix of "add_"
- Furthermore, if we need this, why should it be a hook and not a new
callback in IndexAmRoutine, just like aminsert()?
Agree. We intend later to implement this ad-hoc hook as a callback
in IndexAmRoutine as suggested.
vacuum.c
------------------
- Why. And what is "internal table"?
Please refer to the contrib/vci/README section 2.4 for a description
of VCI "Internal Relations"
execGrouping.c
------------------
- LookupTupleHashEntry_internal removes comment, why?
The latest post patches do not have this code.
execUtils.c
------------------
- again, disables locking check, very suspicious
Fixed in latest posted patches.
nodeModifytable.c
------------------
- add_should_index_insert_hook another strange hook
Agree. We intend later to implement this ad-hoc hook as a callback
in IndexAmRoutine as suggested.
mctx.c
------------------
- why MemoryContextMethods shouldn't be "const"?
Fixed in latest posted patches.
mctx.c
------------------
- why MemoryContextMethods shouldn't be "const"?common.c
pg_dump.c
pg_dump.h
------------------
- What is "vci view table" for, actually?
Fixed in latest posted patches.
reloptions.h
------------------
- Why does this need to add an entry into the relopt_kind enum? It
should be extensible, see the "contrib/bloom".
Thanks. We are studying how to replace this with an extensible
solution using "contrib/bloom" as guidance.
(Timur suggests that this also unused code anyway. Investigating...)
executor.h
------------------
- How come it's suddenly OK to return entry->additional without the if
conditions in TupleHashEntryGetAdditional?
This was an accidental revert of a master commit also reported by Timur.
The latest posted patches no longer have this change.
- What's with the struct LimitState; ? Why is it needed?
Fixed in latest posted patches.
numeric.h
------------------
- Why does this expose NumericVar, when the second patch redefines the
struct anyway? also, isn't that fragile? although, we're unlikely to
tweak the NumericVar definition.
Fixed in latest posted patches.
rel.h
------------------
- Doesn't AM have custom reloptions now? Like contrib/bloom?
Thanks. We are studying how to replace this StdRdOptions change
with an extensible solution using "contrib/bloom" as guidance.
(Timur suggests that this also unused code anyway. Investigating...)
Regards,
Aya Iwata
Fujitsu Limited
Here are the latest v10* patches
The main differences are:
PATCH 0001 (core)
- Rebase was needed due to changes in master.
- Removed all embedded vci relopts (per Timur report [1]/messages/by-id/7327bade206c4973b43056a3fc2ad2e7b5dec919.camel@postgrespro.ru these were
not being used).
PATCH 0002 (vci module)
- Changed extension version 1.0 to instead of 2.0
- Fixed several cfbot reported meson problems + WIN32 syntax errors
- Removed some more dead/unreachable code since "vci_create()"
function is not implemented
- Updated README to remove the "max_index_keys" difference.
======
[1]: /messages/by-id/7327bade206c4973b43056a3fc2ad2e7b5dec919.camel@postgrespro.ru
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Tue, Jul 1, 2025 at 1:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Sat, Jun 28, 2025 at 12:24 AM Timur Magomedov
<t.magomedov@postgrespro.ru> wrote:
...Thanks for updates.
I was trying to move some code from Postgres core patch to contrib/vci.
Started with moving VCI options from StdRdOptions struct and
reloptions.c, reloptions.h files into contrib/vci like Tomas suggested
earlier. Getting new relopt kind using add_reloption_kind() instead of
hardcoded RELOPT_KIND_VCI and so on, just like in contrib/bloom.During this work I've found that those VCI-specific options,
"vci_column_ids" and "vci_dropped_column_ids", are only read in case
vci_IsExtendedToMoreThan32Columns() is true. And
vci_IsExtendedToMoreThan32Columns() itself checks if vci_column_ids
option is non-empty string to return true. AFAIK no one sets those
options and vci_IsExtendedToMoreThan32Columns() always returns false.There are comments in code mentioning that one can create VCI index
with more than 32 columns using vci_create(). However, there is no such
a function anywhere. I guess it is now impossible to create VCI index
with more than 32 (INDEX_MAX_KEYS) columns. Maybe the options were used
to store column information for indexes created with vci_create() but
this function has gone.So, giving the ideas above, can we remove those vci_column_ids,
vci_dropped_column_ids options and all the code that assumes
vci_IsExtendedToMoreThan32Columns() to be true? This would make core
patch a bit shorter, specifically there will be no changes in
reloptions.{c,h} files and no changes in StdRdOptions struct in rel.h.Hi Timur.
Thanks for reporting this! Your idea seems correct to me.
AFAICT, those relops were intended for supporting >32 vci indexes,
which is only possible when the VCI index is created using
“vci_create()” function, but that function does not exist in the VCI
implementation posted to OSS. So, I agree with you – since those
relops are not currently needed they can just be removed. I will
confirm this understanding with the original patch devs and, if
correct, I will address this in a future patch version.
Fixed in v10.
======
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Wed, Jul 2, 2025 at 10:22 AM Aya Iwata (Fujitsu)
<iwata.aya@fujitsu.com> wrote:
...
reloptions.c
------------------
- Changes that should be moved to the contrib module Why should in-core
reloptions know about this? See for example how "contrib/bloom" defined
reloptions.Thanks. We are studying how to replace this with an extensible solution
using "contrib/bloom" as guidance. Also, as Timur (28/Jun) suggests that
this might be unused code anyway. Investigating...
Fixed (removed) in v10.
reloptions.h
------------------
- Why does this need to add an entry into the relopt_kind enum? It
should be extensible, see the "contrib/bloom".Thanks. We are studying how to replace this with an extensible
solution using "contrib/bloom" as guidance.
(Timur suggests that this also unused code anyway. Investigating...)
Fixed (removed) in v10.
rel.h
------------------- Doesn't AM have custom reloptions now? Like contrib/bloom?
Thanks. We are studying how to replace this StdRdOptions change
with an extensible solution using "contrib/bloom" as guidance.
(Timur suggests that this also unused code anyway. Investigating...)
Fixed (removed) in v10.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, 02 Jul 2025 at 16:34, Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Jul 2, 2025 at 10:22 AM Aya Iwata (Fujitsu)
<iwata.aya@fujitsu.com> wrote:...
reloptions.c
------------------
- Changes that should be moved to the contrib module Why should in-core
reloptions know about this? See for example how "contrib/bloom" defined
reloptions.Thanks. We are studying how to replace this with an extensible solution
using "contrib/bloom" as guidance. Also, as Timur (28/Jun) suggests that
this might be unused code anyway. Investigating...Fixed (removed) in v10.
reloptions.h
------------------
- Why does this need to add an entry into the relopt_kind enum? It
should be extensible, see the "contrib/bloom".Thanks. We are studying how to replace this with an extensible
solution using "contrib/bloom" as guidance.
(Timur suggests that this also unused code anyway. Investigating...)Fixed (removed) in v10.
rel.h
------------------- Doesn't AM have custom reloptions now? Like contrib/bloom?
Thanks. We are studying how to replace this StdRdOptions change
with an extensible solution using "contrib/bloom" as guidance.
(Timur suggests that this also unused code anyway. Investigating...)Fixed (removed) in v10.
Hi, Perter
When trying vic, I received the following error:
TRAP: failed Assert("TransactionIdIsNormal(xmax)"), File: "/data/Codes/pg/master/build/../src/backend/access/heap/heapam.c", Line: 7373, PID: 1719347
postgres: japin postgres [local] VACUUM(ExceptionalCondition+0xbb)[0x562674fbc44a]
postgres: japin postgres [local] VACUUM(heap_pre_freeze_checks+0x18f)[0x5626747eba07]
/data/Codes/pg/master/build/pg/lib/postgresql/vci.so(+0x61fca)[0xd355ccbbfca]
/data/Codes/pg/master/build/pg/lib/postgresql/vci.so(+0x6288f)[0xd355ccbc88f]
/data/Codes/pg/master/build/pg/lib/postgresql/vci.so(+0x51673)[0xd355ccab673]
/data/Codes/pg/master/build/pg/lib/postgresql/vci.so(+0x4ec41)[0xd355cca8c41]
postgres: japin postgres [local] VACUUM(index_vacuum_cleanup+0x181)[0x562674810008]
postgres: japin postgres [local] VACUUM(vac_cleanup_one_index+0x27)[0x562674a78dbf]
postgres: japin postgres [local] VACUUM(+0x1bb73b)[0x56267480973b]
postgres: japin postgres [local] VACUUM(+0x1bb489)[0x562674809489]
postgres: japin postgres [local] VACUUM(+0x1b8e64)[0x562674806e64]
postgres: japin postgres [local] VACUUM(heap_vacuum_rel+0x8de)[0x56267480578b]
postgres: japin postgres [local] VACUUM(+0x426ac9)[0x562674a74ac9]
postgres: japin postgres [local] VACUUM(+0x42a5cf)[0x562674a785cf]
postgres: japin postgres [local] VACUUM(vacuum+0x49c)[0x562674a75ec1]
postgres: japin postgres [local] VACUUM(ExecVacuum+0xdf3)[0x562674a759f7]
postgres: japin postgres [local] VACUUM(standard_ProcessUtility+0xa62)[0x562674dac27e]
/data/Codes/pg/master/build/pg/lib/postgresql/vci.so(+0x52c7e)[0xd355ccacc7e]
postgres: japin postgres [local] VACUUM(ProcessUtility+0x109)[0x562674dab7e4]
postgres: japin postgres [local] VACUUM(+0x75c0b7)[0x562674daa0b7]
postgres: japin postgres [local] VACUUM(+0x75c32f)[0x562674daa32f]
postgres: japin postgres [local] VACUUM(PortalRun+0x31d)[0x562674da978f]
postgres: japin postgres [local] VACUUM(+0x753dbe)[0x562674da1dbe]
postgres: japin postgres [local] VACUUM(PostgresMain+0xb43)[0x562674da7603]
postgres: japin postgres [local] VACUUM(+0x74f1da)[0x562674d9d1da]
postgres: japin postgres [local] VACUUM(postmaster_child_launch+0x1b1)[0x562674c97b05]
postgres: japin postgres [local] VACUUM(+0x650859)[0x562674c9e859]
postgres: japin postgres [local] VACUUM(+0x64dd28)[0x562674c9bd28]
postgres: japin postgres [local] VACUUM(PostmasterMain+0x1546)[0x562674c9b5b0]
postgres: japin postgres [local] VACUUM(main+0x38c)[0x562674b359e6]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0xd355b82a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0xd355b82a28b]
postgres: japin postgres [local] VACUUM(_start+0x25)[0x562674769e15]
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost
Here's how to reproduce the issue:
initdb -D demo
cat <<EOF >demo/postgresql.auto.conf
shared_preload_libraries = 'vci'
max_worker_processes = '20'
EOF
pg_ctl -D demo start
cat <<EOF | psql postgres
CREATE EXTENSION vci;
CREATE TABLE t (id int, info text);
CREATE INDEX ON t USING vci (id);
INSERT INTO t SELECT id, md5(random()::text) FROM generate_series(1, 1000) id;
VACUUM t;
EOF
--
Regards,
Japin Li
On Fri, Jul 4, 2025 at 4:03 PM Japin Li <japinli@hotmail.com> wrote:
...
When trying vic, I received the following error:
TRAP: failed Assert("TransactionIdIsNormal(xmax)"), File: "/data/Codes/pg/master/build/../src/backend/access/heap/heapam.c", Line: 7373, PID: 1719347
postgres: japin postgres [local] VACUUM(ExceptionalCondition+0xbb)[0x562674fbc44a]
postgres: japin postgres [local] VACUUM(heap_pre_freeze_checks+0x18f)[0x5626747eba07]
/data/Codes/pg/master/build/pg/lib/postgresql/vci.so(+0x61fca)[0xd355ccbbfca]
/data/Codes/pg/master/build/pg/lib/postgresql/vci.so(+0x6288f)[0xd355ccbc88f]
/data/Codes/pg/master/build/pg/lib/postgresql/vci.so(+0x51673)[0xd355ccab673]
/data/Codes/pg/master/build/pg/lib/postgresql/vci.so(+0x4ec41)[0xd355cca8c41]
postgres: japin postgres [local] VACUUM(index_vacuum_cleanup+0x181)[0x562674810008]
postgres: japin postgres [local] VACUUM(vac_cleanup_one_index+0x27)[0x562674a78dbf]
postgres: japin postgres [local] VACUUM(+0x1bb73b)[0x56267480973b]
postgres: japin postgres [local] VACUUM(+0x1bb489)[0x562674809489]
postgres: japin postgres [local] VACUUM(+0x1b8e64)[0x562674806e64]
postgres: japin postgres [local] VACUUM(heap_vacuum_rel+0x8de)[0x56267480578b]
postgres: japin postgres [local] VACUUM(+0x426ac9)[0x562674a74ac9]
postgres: japin postgres [local] VACUUM(+0x42a5cf)[0x562674a785cf]
postgres: japin postgres [local] VACUUM(vacuum+0x49c)[0x562674a75ec1]
postgres: japin postgres [local] VACUUM(ExecVacuum+0xdf3)[0x562674a759f7]
postgres: japin postgres [local] VACUUM(standard_ProcessUtility+0xa62)[0x562674dac27e]
/data/Codes/pg/master/build/pg/lib/postgresql/vci.so(+0x52c7e)[0xd355ccacc7e]
postgres: japin postgres [local] VACUUM(ProcessUtility+0x109)[0x562674dab7e4]
postgres: japin postgres [local] VACUUM(+0x75c0b7)[0x562674daa0b7]
postgres: japin postgres [local] VACUUM(+0x75c32f)[0x562674daa32f]
postgres: japin postgres [local] VACUUM(PortalRun+0x31d)[0x562674da978f]
postgres: japin postgres [local] VACUUM(+0x753dbe)[0x562674da1dbe]
postgres: japin postgres [local] VACUUM(PostgresMain+0xb43)[0x562674da7603]
postgres: japin postgres [local] VACUUM(+0x74f1da)[0x562674d9d1da]
postgres: japin postgres [local] VACUUM(postmaster_child_launch+0x1b1)[0x562674c97b05]
postgres: japin postgres [local] VACUUM(+0x650859)[0x562674c9e859]
postgres: japin postgres [local] VACUUM(+0x64dd28)[0x562674c9bd28]
postgres: japin postgres [local] VACUUM(PostmasterMain+0x1546)[0x562674c9b5b0]
postgres: japin postgres [local] VACUUM(main+0x38c)[0x562674b359e6]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0xd355b82a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0xd355b82a28b]
postgres: japin postgres [local] VACUUM(_start+0x25)[0x562674769e15]
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lostHere's how to reproduce the issue:
initdb -D demo
cat <<EOF >demo/postgresql.auto.conf
shared_preload_libraries = 'vci'
max_worker_processes = '20'
EOFpg_ctl -D demo start
cat <<EOF | psql postgres
CREATE EXTENSION vci;
CREATE TABLE t (id int, info text);
CREATE INDEX ON t USING vci (id);
INSERT INTO t SELECT id, md5(random()::text) FROM generate_series(1, 1000) id;
VACUUM t;
EOF
Hi Japin,
Thank you for your interest in VCI and for reporting the problem.
We are working to get this patch into a better shape; this vaccum
issue has been added to our list of things to fix. Thanks also for
your script - using it, I reproduced the same TRAP that you reported.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Hi. Here is the latest patch set.
Main differences are:
Patch 0001 (core)
- removed embedded vci code from autovacuum.c / postinit.c (now these
are same as master)
Patch 0002 (vci module)
- a few changes to facilitate the fix to patch 0001
- removed file vci.lib
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Sat, May 24, 2025 at 7:29 AM Tomas Vondra <tomas@vondra.me> wrote:
Hello Iwata-san,
...
autovacuum.c
------------------
- Why does this need a custom autovacuum launcher, or what's the purpose
of this code? is there a special autovacuum launcher for VCI indexes?- Again, explicit references to "VCI", should not be done like this.
Fixed in v11-0001
autovacuum.h
------------------
- It seems a bit strage to add "built-in" bgworkers for an extension. If
the extension really needs some sort of workers, it should be defined in
the extension.
Fixed in v11-0001
======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Tue, 08 Jul 2025 at 10:11, Peter Smith <smithpb2250@gmail.com> wrote:
Hi. Here is the latest patch set.
Main differences are:
Patch 0001 (core)
- removed embedded vci code from autovacuum.c / postinit.c (now these
are same as master)Patch 0002 (vci module)
- a few changes to facilitate the fix to patch 0001
- removed file vci.lib
Hi, Peter
Thanks for updating the patch!
I find another bug when using VCI index as following:
2025-07-10 12:17:15.087 CST [2027312] PANIC: unexpected index access method call : "vci_beginscan"
2025-07-10 12:17:15.087 CST [2027312] STATEMENT: SELECT * FROM t WHERE id = 100;
2025-07-10 12:17:15.228 CST [2027143] LOG: client backend (PID 2027312) was terminated by signal 6: Aborted
2025-07-10 12:17:15.228 CST [2027143] DETAIL: Failed process was running: SELECT * FROM t WHERE id = 100;
2025-07-10 12:17:15.228 CST [2027143] LOG: terminating any other active server processes
2025-07-10 12:17:15.231 CST [2027143] LOG: all server processes terminated; reinitializing
2025-07-10 12:17:15.257 CST [2027317] LOG: database system was interrupted; last known up at 2025-07-10 12:17:07 CST
2025-07-10 12:17:15.333 CST [2027317] LOG: database system was not properly shut down; automatic recovery in progress
2025-07-10 12:17:15.337 CST [2027317] LOG: redo starts at 0/01761A18
2025-07-10 12:17:15.491 CST [2027143] LOG: startup process (PID 2027317) was terminated by signal 11: Segmentation fault
2025-07-10 12:17:15.491 CST [2027143] LOG: terminating any other active server processes
2025-07-10 12:17:15.493 CST [2027143] LOG: shutting down due to startup process failure
2025-07-10 12:17:15.496 CST [2027143] LOG: database system is shut down
Here are the steps:
rm -rf demo/
initdb -D demo
cat <<EOF >demo/postgresql.auto.conf
shared_preload_libraries = 'vci'
max_worker_processes = '20'
EOF
pg_ctl -D demo -l demo.log start
cat <<EOF | psql postgres
CREATE EXTENSION vci;
CREATE TABLE t (id int, info text);
CREATE INDEX t_id_idx ON t USING vci (id);
INSERT INTO t SELECT id, md5(random()::text) FROM generate_series(1, 1000) id;
SET enable_seqscan TO off;
SELECT * FROM t WHERE id = 100;
EOF
I'm still trying to understand the patches.
diff --git a/src/include/storage/itemptr.h b/src/include/storage/itemptr.h
index 74b87a9..d97d1c5 100644
--- a/src/include/storage/itemptr.h
+++ b/src/include/storage/itemptr.h
@@ -46,6 +46,9 @@ typedef struct ItemPointerData
#endif
ItemPointerData;
+#define SizeOfIptrData \
+ (offsetof(ItemPointerData, ip_posid) + sizeof(OffsetNumber))
+
I've noticed this macro is currently defined within core; however, I found it only
used in the VCI extension.
Could you clarify the rationale for its inclusion in the core, and whether it's
genuinely required there, or if it would be better suited within the extension
itself?
--
Regards,
Japin Li
On Thu, Jul 10, 2025 at 4:07 PM Japin Li <japinli@hotmail.com> wrote:
...
Thanks for updating the patch!
I find another bug when using VCI index as following:
2025-07-10 12:17:15.087 CST [2027312] PANIC: unexpected index access method call : "vci_beginscan"
2025-07-10 12:17:15.087 CST [2027312] STATEMENT: SELECT * FROM t WHERE id = 100;
2025-07-10 12:17:15.228 CST [2027143] LOG: client backend (PID 2027312) was terminated by signal 6: Aborted
2025-07-10 12:17:15.228 CST [2027143] DETAIL: Failed process was running: SELECT * FROM t WHERE id = 100;
2025-07-10 12:17:15.228 CST [2027143] LOG: terminating any other active server processes
2025-07-10 12:17:15.231 CST [2027143] LOG: all server processes terminated; reinitializing
2025-07-10 12:17:15.257 CST [2027317] LOG: database system was interrupted; last known up at 2025-07-10 12:17:07 CST
2025-07-10 12:17:15.333 CST [2027317] LOG: database system was not properly shut down; automatic recovery in progress
2025-07-10 12:17:15.337 CST [2027317] LOG: redo starts at 0/01761A18
2025-07-10 12:17:15.491 CST [2027143] LOG: startup process (PID 2027317) was terminated by signal 11: Segmentation fault
2025-07-10 12:17:15.491 CST [2027143] LOG: terminating any other active server processes
2025-07-10 12:17:15.493 CST [2027143] LOG: shutting down due to startup process failure
2025-07-10 12:17:15.496 CST [2027143] LOG: database system is shut downHere are the steps:
rm -rf demo/
initdb -D demo
cat <<EOF >demo/postgresql.auto.conf
shared_preload_libraries = 'vci'
max_worker_processes = '20'
EOFpg_ctl -D demo -l demo.log start
cat <<EOF | psql postgres
CREATE EXTENSION vci;
CREATE TABLE t (id int, info text);
CREATE INDEX t_id_idx ON t USING vci (id);
INSERT INTO t SELECT id, md5(random()::text) FROM generate_series(1, 1000) id;
SET enable_seqscan TO off;
SELECT * FROM t WHERE id = 100;
EOF
Thanks for reporting this. We will look into it as soon as we can.
I'm still trying to understand the patches.
diff --git a/src/include/storage/itemptr.h b/src/include/storage/itemptr.h index 74b87a9..d97d1c5 100644 --- a/src/include/storage/itemptr.h +++ b/src/include/storage/itemptr.h @@ -46,6 +46,9 @@ typedef struct ItemPointerData #endif ItemPointerData;+#define SizeOfIptrData \ + (offsetof(ItemPointerData, ip_posid) + sizeof(OffsetNumber)) +I've noticed this macro is currently defined within core; however, I found it only
used in the VCI extension.Could you clarify the rationale for its inclusion in the core, and whether it's
genuinely required there, or if it would be better suited within the extension
itself?
Right, this had previously also been reported by Tomas [1]/messages/by-id/a748aa6b-c7e6-4d02-a590-ab404d590448@vondra.me.
Upon investigation, I found that this was master code from 10 years
ago (back when this VCI patch was implemented). The master code has
moved on since then and removed this macro [2]https://github.com/postgres/postgres/commit/8023b5827fbada6815ce269db4f3373ac77ec7c3, but this VCI patch did
not...
I'll try to address this for the next patchset.
======
[1]: /messages/by-id/a748aa6b-c7e6-4d02-a590-ab404d590448@vondra.me
[2]: https://github.com/postgres/postgres/commit/8023b5827fbada6815ce269db4f3373ac77ec7c3
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, 10 Jul 2025 at 17:51, Peter Smith <smithpb2250@gmail.com> wrote:
On Thu, Jul 10, 2025 at 4:07 PM Japin Li <japinli@hotmail.com> wrote:
...I'm still trying to understand the patches.
diff --git a/src/include/storage/itemptr.h b/src/include/storage/itemptr.h index 74b87a9..d97d1c5 100644 --- a/src/include/storage/itemptr.h +++ b/src/include/storage/itemptr.h @@ -46,6 +46,9 @@ typedef struct ItemPointerData #endif ItemPointerData;+#define SizeOfIptrData \ + (offsetof(ItemPointerData, ip_posid) + sizeof(OffsetNumber)) +I've noticed this macro is currently defined within core; however, I found it only
used in the VCI extension.Could you clarify the rationale for its inclusion in the core, and whether it's
genuinely required there, or if it would be better suited within the extension
itself?Right, this had previously also been reported by Tomas [1].
Upon investigation, I found that this was master code from 10 years
ago (back when this VCI patch was implemented). The master code has
moved on since then and removed this macro [2], but this VCI patch did
not...I'll try to address this for the next patchset.
======
[1] /messages/by-id/a748aa6b-c7e6-4d02-a590-ab404d590448@vondra.me
[2] https://github.com/postgres/postgres/commit/8023b5827fbada6815ce269db4f3373ac77ec7c3
My apologies, I wasn't following the full email thread.
--
Regards,
Japin Li
Thank You for working on this. I started going through the README and
tried running simple tests, have few concerns:
1)
I am not able to understand section 4.2 'WOS-to-ROS conversion'. When
whiteout-WOS says 'delete 4', what does that mean? 4 is CRID, TXID?
And when does delete-vector X represents? I did not get why ColA-2,
ColA-4 and ColB-2, ColB-4 were removed in resultant data? Is the
diagram complete?
2)
We can make the definition consistent at both places as the first one
gives a feeling that rows are marked for deletion in WOS while the
second one says ROS.
Whiteout WOS = Record of WOS rows marked for deletion
Whiteout WOS -- TID records of WOS rows that are marked for deletion on ROS
3)
It is not part of README. But please help me understand the meaning
and usage of this GUC in VCI context:
vci.max_devices: Sets the maximum device number which can be attached.
4)
I was trying my hands on using VCI, I found that on enabling the VCI
path, autovacuum asserts in between. We can reproduce the scenario
using:
create extension vci;
create table tab1( i int, j int, k int);
create index ivci on tab1 using vci (i);
insert into tab1 SELECT generate_series(1,1000),
generate_series(1,1000), generate_series(1,1000);
insert into tab1 SELECT generate_series(1,1000),
generate_series(1,1000), generate_series(1,1000);
SET enable_seqscan TO off;
VACUUM ANALYZE public.tab1;
It asserts here:
TRAP: failed Assert("TransactionIdIsNormal(xmax)"), File: "heapam.c",
Line: 7373, PID: 79834
postgres: autovacuum worker postgres(ExceptionalCondition+0xbb)[0x55fdf8f3ccc6]
postgres: autovacuum worker
postgres(heap_pre_freeze_checks+0x18f)[0x55fdf877e26f]
postgres: autovacuum worker postgres(index_vacuum_cleanup+0x181)[0x55fdf87a1c06]
postgres: autovacuum worker postgres(vac_cleanup_one_index+0x27)[0x55fdf8a0503a]
-------------
Few typos in README:
a) Each VCI indexed column is stored as an internal relations.
--relations --> relation
b) Records are addresses by CRID (Columnar Record ID) instead of by TID.
--addresses->addressed
c) Extents can be found by ID using offsets in a column "meta-data"
internal relation.
-- by ID using offsets? Do you mean 'by using offsets' alone?
d) EXPLAIN ANALYSE -->EXPLAIN ANALYZE
thanks
Shveta
On Fri, Jul 11, 2025 at 1:46 PM shveta malik <shveta.malik@gmail.com> wrote:
Thank You for working on this. I started going through the README and
tried running simple tests, have few concerns:1)
I am not able to understand section 4.2 'WOS-to-ROS conversion'. When
whiteout-WOS says 'delete 4', what does that mean? 4 is CRID, TXID?
And when does delete-vector X represents? I did not get why ColA-2,
ColA-4 and ColB-2, ColB-4 were removed in resultant data? Is the
diagram complete?2)
We can make the definition consistent at both places as the first one
gives a feeling that rows are marked for deletion in WOS while the
second one says ROS.Whiteout WOS = Record of WOS rows marked for deletion
Whiteout WOS -- TID records of WOS rows that are marked for deletion on ROS3)
It is not part of README. But please help me understand the meaning
and usage of this GUC in VCI context:
vci.max_devices: Sets the maximum device number which can be attached.4)
I was trying my hands on using VCI, I found that on enabling the VCI
path, autovacuum asserts in between. We can reproduce the scenario
using:
create extension vci;
create table tab1( i int, j int, k int);
create index ivci on tab1 using vci (i);
insert into tab1 SELECT generate_series(1,1000),
generate_series(1,1000), generate_series(1,1000);
insert into tab1 SELECT generate_series(1,1000),
generate_series(1,1000), generate_series(1,1000);
SET enable_seqscan TO off;
VACUUM ANALYZE public.tab1;It asserts here:
TRAP: failed Assert("TransactionIdIsNormal(xmax)"), File: "heapam.c",
Line: 7373, PID: 79834
postgres: autovacuum worker postgres(ExceptionalCondition+0xbb)[0x55fdf8f3ccc6]
postgres: autovacuum worker
postgres(heap_pre_freeze_checks+0x18f)[0x55fdf877e26f]
postgres: autovacuum worker postgres(index_vacuum_cleanup+0x181)[0x55fdf87a1c06]
postgres: autovacuum worker postgres(vac_cleanup_one_index+0x27)[0x55fdf8a0503a]-------------
Few typos in README:
a) Each VCI indexed column is stored as an internal relations.
--relations --> relationb) Records are addresses by CRID (Columnar Record ID) instead of by TID.
--addresses->addressedc) Extents can be found by ID using offsets in a column "meta-data"
internal relation.
-- by ID using offsets? Do you mean 'by using offsets' alone?d) EXPLAIN ANALYSE -->EXPLAIN ANALYZE
Hi Shveta.
Thanks for your README questions and typo reports. I will try to
address those soon.
Regarding your reported VACUUM exception, I believe this is the same
issue that was previously reported by Japin Li [1]/messages/by-id/SY8P300MB0442BEC3F5CF432F0121ACC4B642A@SY8P300MB0442.AUSP300.PROD.OUTLOOK.COM.
======
[1]: /messages/by-id/SY8P300MB0442BEC3F5CF432F0121ACC4B642A@SY8P300MB0442.AUSP300.PROD.OUTLOOK.COM
Kind Regards,
Peter Smith.
Fujitsu Australia
Hi. Here is the latest patch set v12*
Main differences are:
Patch 0001 (core)
- removed SizeOfIptrData macro, as reported by Tomas [1]Tomas - /messages/by-id/a748aa6b-c7e6-4d02-a590-ab404d590448@vondra.me and Japin [2]Japin - /messages/by-id/ME0P300MB04457E24CA8965F008FB2CDBB648A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Patch 0002 (vci module)
- Made fixes so the "ROS Control Worker" (for background WOS->ROS
transfer) can now launch ok.
======
[1]: Tomas - /messages/by-id/a748aa6b-c7e6-4d02-a590-ab404d590448@vondra.me
[2]: Japin - /messages/by-id/ME0P300MB04457E24CA8965F008FB2CDBB648A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Kind Regards,
Peter Smith.
Fujitsu Australia.
Attachments:
On Fri, 11 Jul 2025 at 16:31, Peter Smith <smithpb2250@gmail.com> wrote:
Hi. Here is the latest patch set v12*
Main differences are:
Patch 0001 (core)
- removed SizeOfIptrData macro, as reported by Tomas [1] and Japin [2]Patch 0002 (vci module)
- Made fixes so the "ROS Control Worker" (for background WOS->ROS
transfer) can now launch ok.
Hi, Peter
1.
I'm curious if you encountered the following warning during compilation:
/home/japin/Codes/pg/master/build/../contrib/vci/include/vci_ros.h:745:9: warning: result of comparison of constant 65536 with expression of type 'OffsetNumber' (aka 'unsigned short') is always true [-Wtautological-constant-out-of-range-c
ompare]
745 | return vci_MakeUint64FromBlockNumberAndOffset(blockNumber, item->ip_posid);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/japin/Codes/pg/master/build/../contrib/vci/include/vci_ros.h:639:19: note: expanded from macro 'vci_MakeUint64FromBlockNumberAndOffset'
638 | (AssertMacro((0 <= (offset)) \
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
639 | && ((offset) < (1U << (BITS_PER_BYTE * sizeof(OffsetNumber))))), \
| ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/japin/Codes/pg/master/build/../src/include/c.h:868:12: note: expanded from macro 'AssertMacro'
868 | ((void) ((condition) || \
| ^~~~~~~~~
1 warning generated.
Since the offset is unsigned, we can infer it's always non-negative. Did I miss
anything?
2.
I've noticed that InitPageCoreWithoutLock() consistently requires a lock.
Given this, I propose merging it with vci_InitPageCore() and having the caller
handle buffer locking.
3.
In the README, 'TID' seems to have conflicting definitions:
Transaction ID (2.1) vs. tuple physical identifier (2.3.1).
Could you confirm the intended meaning? Suggest using 'XID' for Transaction ID
if my understanding is correct.
4.
-1: TID relation (maps CRID to original TID)
-5: TID-CRID mapping table
I'm trying to understand the distinctions here. Based on the definition in
vci_tidcrid.h, it seems plausible to use just one relation for the mapping,
suggesting a potential redundancy.
/*
* TID-CRID pair used for TIDCRID update list
*/
typedef struct vcis_tidcrid_pair_item
{
ItemPointerData page_item_id; /* TID on the original relation */
vcis_Crid crid; /* CRID */
} vcis_tidcrid_pair_item_t;
How they are different? I see the code in vci_tidcrid.c
5.
Typo in README.
- Each extent can have its own independent compression dictionary or all
extents can share a comon dictionary
--> s/comon/common/g
--
Regards,
Japin Li
Hi Shveta,
Thanks for your README questions.
On Fri, Jul 11, 2025 at 1:46 PM shveta malik <shveta.malik@gmail.com> wrote:
Thank You for working on this. I started going through the README and
tried running simple tests, have few concerns:1)
I am not able to understand section 4.2 'WOS-to-ROS conversion'. When
whiteout-WOS says 'delete 4', what does that mean? 4 is CRID, TXID?
Whiteout WOS remembers the tuple that needs to be deleted on the next
WOS-to-ROS transfer. There is a TID/CRID mapping, so the intended
meaning of “delete 4” in this diagram was “delete the ROS data which
has CRID 4”.
And when does delete-vector X represents?
The delete vector is a bitset for knowing which records of ROS are
marked for deletion. IIUC, the bits of the “delete vector” are what
were previously in the “Whiteout ROS” -- i.e. the bits were set during
the previous WOS-to-ROS transfers.
Updating the delete vector bits is cheap, but it is more expensive to
reconcile with ROS to delete the ROS data, so that happens only
periodically when some threshold is exceeded. See README 2.5.3
“Garbage Collection”. But, the diagram is showing the result of
garbage collection at the same time as the WOS-to-ROS transfer.
The “X” in the diagram was supposed to represent that the bit is set
to mark the CRID 2 columns for deletion. I’ve changed this now to be
0’s and 1’s, which makes it consistent with the other description
about the delete vector in “2.3.3”. e.g. 1 means "marked for delete".
I did not get why ColA-2,
ColA-4 and ColB-2, ColB-4 were removed in resultant data?
The record (of CRID 4) was in the “Whiteout WOS”, so during
WOS-to-ROS transfer, the “delete vector” bit 4 would become set to
mark CRID 4 for deletion in the ROS.
The “garbage collection” (aka “deleted-rows-collection”) happens
according to some threshold; however, this diagram shows what happens
when the threshold is reached at the same time as the WOS-to-ROS
transfer.
e.g. So AFTER case shows result of the garbage collection as well:
The “ColA-2 │ ColB-2” was removed because the delete vector bit 2 was
already set
The “ColA-4 │ ColB-4” was removed because the delete vector bit 4
would become set (from having previously been in the Whiteout WOS)
~
Notice the CRIDs in the before/after are different because they were
renumbered after garbage collection.
Is the diagram complete?
Yes, the diagram was complete, but hopefully it is easier to
understand now that I have made a few minor changes to it. FYI, see
also the PGConf.dev 2025 presentation slides [1]slides - https://www.pgevents.ca/events/pgconfdev2025/sessions/session/292/slides/98/A%20journey%20toward%20the%20columnar%20data%20store%20.pdf – it might help
understanding to see similar information presented slightly
differently.
2)
We can make the definition consistent at both places as the first one
gives a feeling that rows are marked for deletion in WOS while the
second one says ROS.Whiteout WOS = Record of WOS rows marked for deletion
Whiteout WOS -- TID records of WOS rows that are marked for deletion on ROS
Fixed. Both are now using 2nd wording.
3)
It is not part of README. But please help me understand the meaning
and usage of this GUC in VCI context:
vci.max_devices: Sets the maximum device number which can be attached.
The WOS–to–ROS transfer can be done by background workers. IIUC, the
patch 0002 currently includes code for inspecting Linux devices
associated with tablespaces where any VCI indexes reside. The purpose
of this inspection is to discover the IO load so that VCI can
determine the best time to launch the background worker – e.g.
launching may be delayed longer if the system is deemed too busy. The
“vci.max_devices” puts a limit on the number of devices that can be
handled. All this logic is inherited from the product where this VCI
patch originated; I feel some of this may be overly complex for the
OSS patch's first version. We may be able to simplify/remove parts of
this logic – maybe even this GUC.
...
-------------
Few typos in README:
a) Each VCI indexed column is stored as an internal relations.
--relations --> relation
Fixed.
b) Records are addresses by CRID (Columnar Record ID) instead of by TID.
--addresses->addressed
Fixed.
c) Extents can be found by ID using offsets in a column "meta-data"
internal relation.
-- by ID using offsets? Do you mean 'by using offsets' alone?
I was trying to say the appropriate offset can be found using the
extent ID as the offset-array index. I’ve reworded this in the README
to be clearer.
d) EXPLAIN ANALYSE -->EXPLAIN ANALYZE
Fixed.
~~~
Please see the updated README.
======
[1]: slides - https://www.pgevents.ca/events/pgconfdev2025/sessions/session/292/slides/98/A%20journey%20toward%20the%20columnar%20data%20store%20.pdf
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
Hi Japin,
Thanks for your README questions.
On Fri, Jul 11, 2025 at 7:18 PM Japin Li <japinli@hotmail.com> wrote:
...
3.
In the README, 'TID' seems to have conflicting definitions:
Transaction ID (2.1) vs. tuple physical identifier (2.3.1).Could you confirm the intended meaning? Suggest using 'XID' for Transaction ID
if my understanding is correct.
Yes, TID was meant only for the Tuple identifier. Some terms became
muddled. Hopefully, those are fixed now.
4.
-1: TID relation (maps CRID to original TID)
-5: TID-CRID mapping tableI'm trying to understand the distinctions here. Based on the definition in
vci_tidcrid.h, it seems plausible to use just one relation for the mapping,
suggesting a potential redundancy./*
* TID-CRID pair used for TIDCRID update list
*/
typedef struct vcis_tidcrid_pair_item
{
ItemPointerData page_item_id; /* TID on the original relation */
vcis_Crid crid; /* CRID */
} vcis_tidcrid_pair_item_t;How they are different? I see the code in vci_tidcrid.c
AFAIK, the distinction is described by the code comments in vci_columns.h:
+/** Column ID of special column */
+#define VCI_COLUMN_ID_TID (-1)
+#define VCI_COLUMN_ID_NULL (-2)
+#define VCI_COLUMN_ID_DELETE (-3)
So those are all special columns in the ROS data part. In other words,
these internal relations all have data that is indexed by the CRID –
e.g “Delete vector” (2.3.3) and “Null information” (2.3.4). So here,
the TID relation is the mapping from the CRID back to the original
TID.
On the other hand, the other relations...
+/** The data below are not column-stored data.
+ * We prepare them for convenience.
+ */
+#define VCI_COLUMN_ID_TID_CRID (-5)
+#define VCI_COLUMN_ID_TID_CRID_UPDATE (-6)
+#define VCI_COLUMN_ID_TID_CRID_WRITE (-7)
+#define VCI_COLUMN_ID_TID_CRID_CDR (-8)
+#define VCI_COLUMN_ID_DATA_WOS (-9)
+#define VCI_COLUMN_ID_WHITEOUT_WOS (-10)
… are not “column-stored” – In other words, these ones, including the
"TID-CRID mapping table” (-5), are *not* indexed by CRID.
You may be right about a potential redundancy. But right now we're
focused on making these patches ready for open source - removing dead
code to shrink the size, improving the PostgreSQL core interface, and
fixing bugs. Rewriting or optimising the logic will have to wait.
5.
Typo in README.
- Each extent can have its own independent compression dictionary or all
extents can share a comon dictionary
--> s/comon/common/g
Fixed.
~~~
Please see the updated README that I attached in the previous post.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Jul 14, 2025 at 2:08 PM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Shveta,
Thanks for your README questions.
On Fri, Jul 11, 2025 at 1:46 PM shveta malik <shveta.malik@gmail.com> wrote:
Thank You for working on this. I started going through the README and
tried running simple tests, have few concerns:1)
I am not able to understand section 4.2 'WOS-to-ROS conversion'. When
whiteout-WOS says 'delete 4', what does that mean? 4 is CRID, TXID?Whiteout WOS remembers the tuple that needs to be deleted on the next
WOS-to-ROS transfer. There is a TID/CRID mapping, so the intended
meaning of “delete 4” in this diagram was “delete the ROS data which
has CRID 4”.And when does delete-vector X represents?
The delete vector is a bitset for knowing which records of ROS are
marked for deletion. IIUC, the bits of the “delete vector” are what
were previously in the “Whiteout ROS” -- i.e. the bits were set during
the previous WOS-to-ROS transfers.Updating the delete vector bits is cheap, but it is more expensive to
reconcile with ROS to delete the ROS data, so that happens only
periodically when some threshold is exceeded. See README 2.5.3
“Garbage Collection”. But, the diagram is showing the result of
garbage collection at the same time as the WOS-to-ROS transfer.The “X” in the diagram was supposed to represent that the bit is set
to mark the CRID 2 columns for deletion. I’ve changed this now to be
0’s and 1’s, which makes it consistent with the other description
about the delete vector in “2.3.3”. e.g. 1 means "marked for delete".I did not get why ColA-2,
ColA-4 and ColB-2, ColB-4 were removed in resultant data?The record (of CRID 4) was in the “Whiteout WOS”, so during
WOS-to-ROS transfer, the “delete vector” bit 4 would become set to
mark CRID 4 for deletion in the ROS.The “garbage collection” (aka “deleted-rows-collection”) happens
according to some threshold; however, this diagram shows what happens
when the threshold is reached at the same time as the WOS-to-ROS
transfer.e.g. So AFTER case shows result of the garbage collection as well:
The “ColA-2 │ ColB-2” was removed because the delete vector bit 2 was
already set
The “ColA-4 │ ColB-4” was removed because the delete vector bit 4
would become set (from having previously been in the Whiteout WOS)~
Notice the CRIDs in the before/after are different because they were
renumbered after garbage collection.Is the diagram complete?
Yes, the diagram was complete, but hopefully it is easier to
understand now that I have made a few minor changes to it. FYI, see
also the PGConf.dev 2025 presentation slides [1] – it might help
understanding to see similar information presented slightly
differently.
Thanks for improving the diagram. Now the flow is clear.
2)
We can make the definition consistent at both places as the first one
gives a feeling that rows are marked for deletion in WOS while the
second one says ROS.Whiteout WOS = Record of WOS rows marked for deletion
Whiteout WOS -- TID records of WOS rows that are marked for deletion on ROSFixed. Both are now using 2nd wording.
3)
It is not part of README. But please help me understand the meaning
and usage of this GUC in VCI context:
vci.max_devices: Sets the maximum device number which can be attached.The WOS–to–ROS transfer can be done by background workers. IIUC, the
patch 0002 currently includes code for inspecting Linux devices
associated with tablespaces where any VCI indexes reside. The purpose
of this inspection is to discover the IO load so that VCI can
determine the best time to launch the background worker – e.g.
launching may be delayed longer if the system is deemed too busy. The
“vci.max_devices” puts a limit on the number of devices that can be
handled. All this logic is inherited from the product where this VCI
patch originated; I feel some of this may be overly complex for the
OSS patch's first version. We may be able to simplify/remove parts of
this logic – maybe even this GUC.
Thanks for the explanation. The concept is quite tricky though. I will
also look at the code to understand it better.
Show quoted text
...
-------------
Few typos in README:
a) Each VCI indexed column is stored as an internal relations.
--relations --> relationFixed.
b) Records are addresses by CRID (Columnar Record ID) instead of by TID.
--addresses->addressedFixed.
c) Extents can be found by ID using offsets in a column "meta-data"
internal relation.
-- by ID using offsets? Do you mean 'by using offsets' alone?I was trying to say the appropriate offset can be found using the
extent ID as the offset-array index. I’ve reworded this in the README
to be clearer.d) EXPLAIN ANALYSE -->EXPLAIN ANALYZE
Fixed.
~~~
Please see the updated README.
======
[1] slides - https://www.pgevents.ca/events/pgconfdev2025/sessions/session/292/slides/98/A%20journey%20toward%20the%20columnar%20data%20store%20.pdfKind Regards,
Peter Smith.
Fujitsu Australia
On Mon, 14 Jul 2025 at 18:47, Peter Smith <smithpb2250@gmail.com> wrote:
Hi Japin,
Thanks for your README questions.
On Fri, Jul 11, 2025 at 7:18 PM Japin Li <japinli@hotmail.com> wrote:
...3.
In the README, 'TID' seems to have conflicting definitions:
Transaction ID (2.1) vs. tuple physical identifier (2.3.1).Could you confirm the intended meaning? Suggest using 'XID' for Transaction ID
if my understanding is correct.Yes, TID was meant only for the Tuple identifier. Some terms became
muddled. Hopefully, those are fixed now.
Thanks for your confirmation.
4.
-1: TID relation (maps CRID to original TID)
-5: TID-CRID mapping tableI'm trying to understand the distinctions here. Based on the definition in
vci_tidcrid.h, it seems plausible to use just one relation for the mapping,
suggesting a potential redundancy./*
* TID-CRID pair used for TIDCRID update list
*/
typedef struct vcis_tidcrid_pair_item
{
ItemPointerData page_item_id; /* TID on the original relation */
vcis_Crid crid; /* CRID */
} vcis_tidcrid_pair_item_t;How they are different? I see the code in vci_tidcrid.c
AFAIK, the distinction is described by the code comments in vci_columns.h:
+/** Column ID of special column */ +#define VCI_COLUMN_ID_TID (-1) +#define VCI_COLUMN_ID_NULL (-2) +#define VCI_COLUMN_ID_DELETE (-3)So those are all special columns in the ROS data part. In other words,
these internal relations all have data that is indexed by the CRID –
e.g “Delete vector” (2.3.3) and “Null information” (2.3.4). So here,
the TID relation is the mapping from the CRID back to the original
TID.On the other hand, the other relations...
+/** The data below are not column-stored data. + * We prepare them for convenience. + */ +#define VCI_COLUMN_ID_TID_CRID (-5) +#define VCI_COLUMN_ID_TID_CRID_UPDATE (-6) +#define VCI_COLUMN_ID_TID_CRID_WRITE (-7) +#define VCI_COLUMN_ID_TID_CRID_CDR (-8) +#define VCI_COLUMN_ID_DATA_WOS (-9) +#define VCI_COLUMN_ID_WHITEOUT_WOS (-10)… are not “column-stored” – In other words, these ones, including the
"TID-CRID mapping table” (-5), are *not* indexed by CRID.You may be right about a potential redundancy. But right now we're
focused on making these patches ready for open source - removing dead
code to shrink the size, improving the PostgreSQL core interface, and
fixing bugs. Rewriting or optimising the logic will have to wait.
Appreciate the detailed explanation! I'll dive deeper into it.
--
Regards,
Japin Li
On Fri, Jul 11, 2025 at 7:18 PM Japin Li <japinli@hotmail.com> wrote:
...
Hi, Peter
1.
I'm curious if you encountered the following warning during compilation:/home/japin/Codes/pg/master/build/../contrib/vci/include/vci_ros.h:745:9: warning: result of comparison of constant 65536 with expression of type 'OffsetNumber' (aka 'unsigned short') is always true [-Wtautological-constant-out-of-range-c
ompare]
745 | return vci_MakeUint64FromBlockNumberAndOffset(blockNumber, item->ip_posid);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/japin/Codes/pg/master/build/../contrib/vci/include/vci_ros.h:639:19: note: expanded from macro 'vci_MakeUint64FromBlockNumberAndOffset'
638 | (AssertMacro((0 <= (offset)) \
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
639 | && ((offset) < (1U << (BITS_PER_BYTE * sizeof(OffsetNumber))))), \
| ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/japin/Codes/pg/master/build/../src/include/c.h:868:12: note: expanded from macro 'AssertMacro'
868 | ((void) ((condition) || \
| ^~~~~~~~~
1 warning generated.Since the offset is unsigned, we can infer it's always non-negative. Did I miss
anything?
Yeah, I don't see warnings in my build environment, but OTOH we are
aware there are a number of compiler warnings that are reported by the
cfbot [1]https://cirrus-ci.com/task/5631634596167680. That one you cited above is just another one of them.
Addressing all the cfbot compiler warnings in patch 0002 is certainly
on the to-do list.
======
[1]: https://cirrus-ci.com/task/5631634596167680
Kind Regards,
Peter Smith.
Fujitsu Australia
Hi. Here are the latest v13 patches.
Changes include:
PATCH 0002.
- README improvements -- as previously sent separately [1]/messages/by-id/CAHut+PvYQZAHcD-tK5XaobUpWoTf0Gkjx7nAA9eJq_HbPCSxCQ@mail.gmail.com
- Refactor InitPageCoreWithoutLock -- per proposal from Japin [2-#2]
- Changes to eliminate warnings from "headerscheck" and "cpluspluscheck"
- Add missing assignments in vci_handler()
======
[1]: /messages/by-id/CAHut+PvYQZAHcD-tK5XaobUpWoTf0Gkjx7nAA9eJq_HbPCSxCQ@mail.gmail.com
[2]: /messages/by-id/ME0P300MB0445FD473D75F65E8B0A6F5DB64BA@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
Hi.
Here are the latest v14 patches.
Changes include:
PATCH 0002.
- Fixes the enable_seqscan PANIC bug reported by Japin [1]/messages/by-id/ME0P300MB04457E24CA8965F008FB2CDBB648A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
- Adds a new regression test case for this
======
[1]: /messages/by-id/ME0P300MB04457E24CA8965F008FB2CDBB648A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Tue, 22 Jul 2025 at 17:46, Peter Smith <smithpb2250@gmail.com> wrote:
Hi.
Here are the latest v14 patches.
Changes include:
PATCH 0002.
- Fixes the enable_seqscan PANIC bug reported by Japin [1]
- Adds a new regression test case for this======
[1] /messages/by-id/ME0P300MB04457E24CA8965F008FB2CDBB648A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Hi.
Thanks for updating the patchset.
1.
I encountered another crash while checking VCI's internal relations.
rm -rf demo
initdb -D demo
cat <<EOF >demo/postgresql.auto.conf
shared_preload_libraries = 'vci'
max_worker_processes = '20'
EOF
pg_ctl -D demo start
cat <<EOF | psql postgres
CREATE EXTENSION vci;
CREATE TABLE t (id int, info text);
CREATE INDEX ON t USING vci (id);
SELECT relname FROM pg_class WHERE relname ~ '^vci_*' LIMIT 1 \gset
SELECT * FROM :relname;
\d+ :relname
REFRESH MATERIALIZED VIEW :relname;
EOF
VCI's definition of internal relations as materialized views lacks the
corresponding view bodies.
+ /*
+ * @see
+ * https://www.postgresql.jp/document/9.4/html/catalog-pg-rewrite.html
+ */
+ new_rel_reltup->relhasrules = true;
+
+ new_rel->rd_att->tdtypeid = new_type_oid;
+
+ InsertPgClassTuple(pg_class, new_rel, new_oid, (Datum) 0, (Datum) 0);
Given that VCI's internal relations are materialized views, are VCI workers
responsible for their periodic refreshment?
Or is it by design that users are unable to read the internal relations?
2.
+#ifdef WIN32
+ char *dir_name = "base\\" PG_TEMP_FILES_DIR;
+#endif
+
+#ifdef __sparc__
+ char *dir_name = "base/" PG_TEMP_FILES_DIR;
+#endif
I think it's also possible to use the second format on Windows.
And what happens if neither WIN32 nor __sparc__ are defined?
--
Regards,
Japin Li
Hi, Peter
On Tue, 22 Jul 2025 at 17:46, Peter Smith <smithpb2250@gmail.com> wrote:
Hi.
Here are the latest v14 patches.
Changes include:
PATCH 0002.
- Fixes the enable_seqscan PANIC bug reported by Japin [1]
- Adds a new regression test case for this======
[1] /messages/by-id/ME0P300MB04457E24CA8965F008FB2CDBB648A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
After some investigation, I found that the VACUUM assertion failures were
caused by an uninitialized HeapTupleFreeze structure. PFA.
--
Regards,
Japin Li
Attachments:
On Tue, Jul 22, 2025 at 8:12 PM Japin Li <japinli@hotmail.com> wrote:
...
1.
I encountered another crash while checking VCI's internal relations.rm -rf demo
initdb -D demo
cat <<EOF >demo/postgresql.auto.conf
shared_preload_libraries = 'vci'
max_worker_processes = '20'
EOFpg_ctl -D demo start
cat <<EOF | psql postgres
CREATE EXTENSION vci;
CREATE TABLE t (id int, info text);
CREATE INDEX ON t USING vci (id);
SELECT relname FROM pg_class WHERE relname ~ '^vci_*' LIMIT 1 \gset
SELECT * FROM :relname;
\d+ :relname
REFRESH MATERIALIZED VIEW :relname;
EOFVCI's definition of internal relations as materialized views lacks the
corresponding view bodies.+ /* + * @see + * https://www.postgresql.jp/document/9.4/html/catalog-pg-rewrite.html + */ + new_rel_reltup->relhasrules = true; + + new_rel->rd_att->tdtypeid = new_type_oid; + + InsertPgClassTuple(pg_class, new_rel, new_oid, (Datum) 0, (Datum) 0);Given that VCI's internal relations are materialized views, are VCI workers
responsible for their periodic refreshment?Or is it by design that users are unable to read the internal relations?
IIUC, those VCI internal relations (implemented as materialized views)
are entirely managed by VCI logic. Users are not required to be aware
of them, and they definitely are not meant to tamper with them.
The REFRESH that you attempted should have caused a more graceful error, like:
ERROR: extension "vci" prohibits this operation on view
"vci_0000016482_00000_d"
So, thanks for reporting that the ERROR failed. Investigating...
======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Jul 23, 2025 at 1:58 PM Japin Li <japinli@hotmail.com> wrote:
Hi, Peter
On Tue, 22 Jul 2025 at 17:46, Peter Smith <smithpb2250@gmail.com> wrote:
Hi.
Here are the latest v14 patches.
Changes include:
PATCH 0002.
- Fixes the enable_seqscan PANIC bug reported by Japin [1]
- Adds a new regression test case for this======
[1] /messages/by-id/ME0P300MB04457E24CA8965F008FB2CDBB648A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COMAfter some investigation, I found that the VACUUM assertion failures were
caused by an uninitialized HeapTupleFreeze structure. PFA.
Hey, great! Thanks very much for your fix and the new test case.
I will include these when the next version is posted.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, 23 Jul 2025 at 14:07, Peter Smith <smithpb2250@gmail.com> wrote:
On Tue, Jul 22, 2025 at 8:12 PM Japin Li <japinli@hotmail.com> wrote:
...1.
I encountered another crash while checking VCI's internal relations.rm -rf demo
initdb -D demo
cat <<EOF >demo/postgresql.auto.conf
shared_preload_libraries = 'vci'
max_worker_processes = '20'
EOFpg_ctl -D demo start
cat <<EOF | psql postgres
CREATE EXTENSION vci;
CREATE TABLE t (id int, info text);
CREATE INDEX ON t USING vci (id);
SELECT relname FROM pg_class WHERE relname ~ '^vci_*' LIMIT 1 \gset
SELECT * FROM :relname;
\d+ :relname
REFRESH MATERIALIZED VIEW :relname;
EOFVCI's definition of internal relations as materialized views lacks the
corresponding view bodies.+ /* + * @see + * https://www.postgresql.jp/document/9.4/html/catalog-pg-rewrite.html + */ + new_rel_reltup->relhasrules = true; + + new_rel->rd_att->tdtypeid = new_type_oid; + + InsertPgClassTuple(pg_class, new_rel, new_oid, (Datum) 0, (Datum) 0);Given that VCI's internal relations are materialized views, are VCI workers
responsible for their periodic refreshment?Or is it by design that users are unable to read the internal relations?
IIUC, those VCI internal relations (implemented as materialized views)
are entirely managed by VCI logic. Users are not required to be aware
of them, and they definitely are not meant to tamper with them.
Thanks for your explanation!
The REFRESH that you attempted should have caused a more graceful error, like:
ERROR: extension "vci" prohibits this operation on view
"vci_0000016482_00000_d"
So, thanks for reporting that the ERROR failed. Investigating...
I'm considering storing this metadata in heap tables, as Citus Columnar [1]https://github.com/citusdata/citus/blob/main/src/backend/columnar/columnar_metadata.c#L174
and TimescaleDB [2]https://github.com/timescale/timescaledb/blob/main/src/chunk.c#L151 also utilize them for some metadata. Is this a sound
approach? I'm wondering if this is a suitable strategy for VCI?
[1]: https://github.com/citusdata/citus/blob/main/src/backend/columnar/columnar_metadata.c#L174
[2]: https://github.com/timescale/timescaledb/blob/main/src/chunk.c#L151
--
Regards,
Japin Li
Here are the latest v15 patches.
Changes include:
PATCH 0002.
- README now says user should not tamper with VCI internal relations
- fixes/test the VACUUM bug -- fix provided by Japin [1]/messages/by-id/ME0P300MB0445891C69BD82561055F218B65FA@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
- fixes/tests the reported segv for attempted REFRESH of VCI internal
relation -- see [2 comment#1]
- fixes/tests VCI internal relation dependency on the indexed table
- simplifies code for PG_TEMP_FILES_DIR -- see [2 comment#2]
======
[1]: /messages/by-id/ME0P300MB0445891C69BD82561055F218B65FA@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
[2]: /messages/by-id/ME0P300MB0445EBA04D6947DD717074DFB65CA@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Kind Regards,
Peter Smith.
Fujitsu Australia.
Attachments:
On Tue, Jul 22, 2025 at 8:12 PM Japin Li <japinli@hotmail.com> wrote:
...
1.
I encountered another crash while checking VCI's internal relations.rm -rf demo
initdb -D demo
cat <<EOF >demo/postgresql.auto.conf
shared_preload_libraries = 'vci'
max_worker_processes = '20'
EOFpg_ctl -D demo start
cat <<EOF | psql postgres
CREATE EXTENSION vci;
CREATE TABLE t (id int, info text);
CREATE INDEX ON t USING vci (id);
SELECT relname FROM pg_class WHERE relname ~ '^vci_*' LIMIT 1 \gset
SELECT * FROM :relname;
\d+ :relname
REFRESH MATERIALIZED VIEW :relname;
EOFVCI's definition of internal relations as materialized views lacks the
corresponding view bodies.
Fixed in v15.
+ /* + * @see + * https://www.postgresql.jp/document/9.4/html/catalog-pg-rewrite.html + */ + new_rel_reltup->relhasrules = true; + + new_rel->rd_att->tdtypeid = new_type_oid; + + InsertPgClassTuple(pg_class, new_rel, new_oid, (Datum) 0, (Datum) 0);Given that VCI's internal relations are materialized views, are VCI workers
responsible for their periodic refreshment?Or is it by design that users are unable to read the internal relations?
Updated README in v15.
2. +#ifdef WIN32 + char *dir_name = "base\\" PG_TEMP_FILES_DIR; +#endif + +#ifdef __sparc__ + char *dir_name = "base/" PG_TEMP_FILES_DIR; +#endifI think it's also possible to use the second format on Windows.
And what happens if neither WIN32 nor __sparc__ are defined?
Modified in v15.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Jul 23, 2025 at 2:12 PM Peter Smith <smithpb2250@gmail.com> wrote:
On Wed, Jul 23, 2025 at 1:58 PM Japin Li <japinli@hotmail.com> wrote:
After some investigation, I found that the VACUUM assertion failures were
caused by an uninitialized HeapTupleFreeze structure. PFA.Hey, great! Thanks very much for your fix and the new test case.
I will include these when the next version is posted.
Fixed in v15.
======
Kind Regards
Peter Smith.
Fujitsu Australia
On Fri, Jul 25, 2025 at 1:58 PM Japin Li <japinli@hotmail.com> wrote:
On Wed, 23 Jul 2025 at 14:07, Peter Smith <smithpb2250@gmail.com> wrote:
On Tue, Jul 22, 2025 at 8:12 PM Japin Li <japinli@hotmail.com> wrote:
...
...
Or is it by design that users are unable to read the internal relations?
IIUC, those VCI internal relations (implemented as materialized views)
are entirely managed by VCI logic. Users are not required to be aware
of them, and they definitely are not meant to tamper with them.Thanks for your explanation!
The REFRESH that you attempted should have caused a more graceful error, like:
ERROR: extension "vci" prohibits this operation on view
"vci_0000016482_00000_d"
So, thanks for reporting that the ERROR failed. Investigating...I'm considering storing this metadata in heap tables, as Citus Columnar [1]
and TimescaleDB [2] also utilize them for some metadata. Is this a sound
approach? I'm wondering if this is a suitable strategy for VCI?[1] https://github.com/citusdata/citus/blob/main/src/backend/columnar/columnar_metadata.c#L174
[2] https://github.com/timescale/timescaledb/blob/main/src/chunk.c#L151
Hi Japin,
TL;DR;
------
What metadata relations did you have in mind to change, how do you
want to change them, and what is the main motivation?
Metadata
--------
What relations did you mean when you referred to "metadata"? There are
a number of VCI internal relations serving different purposes -- WOS,
NULL vectors, ROS Columnar Data, etc. That relation
("vci_0000016482_00000_d") is not exactly what I would call metadata;
that relation is for VCI ROS data of column 0 (the 'd' suffix means
"data"), so it is real data that VCI maintains in the ROS and keeps in
synch with the original data of the table being indexed.
The relations that VCI considers as "metadata" have an 'm' suffix. e.g.
+/**
+ * Meta Relation
+ */
+#define VCI_RELTYPE_META ('m')
+
All "metadata" files can be seen below with VCI_RELTYPE_META suffix.
Are these the files you had in mind to change?
+ /* TID */
+ oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_TID,
VCI_RELTYPE_DATA), indexRel, indexInfo, VCI_RELTYPE_ROS);
+ vci_SetMainRelVar(vmr_info, vcimrv_tid_data_oid, 0, oid);
+ oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_TID,
VCI_RELTYPE_META), indexRel, indexInfo, VCI_RELTYPE_ROS);
+ vci_SetMainRelVar(vmr_info, vcimrv_tid_meta_oid, 0, oid);
+
+ /* NUll */
+ oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_NULL,
VCI_RELTYPE_DATA), indexRel, indexInfo, VCI_RELTYPE_ROS);
+ vci_SetMainRelVar(vmr_info, vcimrv_null_data_oid, 0, oid);
+ oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_NULL,
VCI_RELTYPE_META), indexRel, indexInfo, VCI_RELTYPE_ROS);
+ vci_SetMainRelVar(vmr_info, vcimrv_null_meta_oid, 0, oid);
+
+ /* Delete Vector */
+ oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_DELETE,
VCI_RELTYPE_DATA), indexRel, indexInfo, VCI_RELTYPE_ROS);
+ vci_SetMainRelVar(vmr_info, vcimrv_delete_data_oid, 0, oid);
+ oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_DELETE,
VCI_RELTYPE_META), indexRel, indexInfo, VCI_RELTYPE_ROS);
+ vci_SetMainRelVar(vmr_info, vcimrv_delete_meta_oid, 0, oid);
...
+ vci_SetMainRelVar(vmr_info, vcimrv_num_columns, 0, tupdesc->natts);
+ for (i = 0; i < tupdesc->natts; i++)
+ {
+ Oid column_store_oid;
+ Oid column_meta_oid;
+ vcis_m_column_t *columnPointer;
+
+ column_store_oid = vci_create_relation(GenRelName(indexRel, i,
VCI_RELTYPE_DATA), indexRel, indexInfo, VCI_RELTYPE_ROS);
+ column_meta_oid = vci_create_relation(GenRelName(indexRel, i,
VCI_RELTYPE_META), indexRel, indexInfo, VCI_RELTYPE_ROS);
Heap Tables
-----------
When you say "storing ... in heap tables", I assumed you meant that
some of the metadata files could be stored as normal row-based
PostgreSQL files -- e.g. like PG catalogs, right?
AFAIK the current VCI internal relations are implemented with a column
of byte arrays that get mapped to VCI data structures in memory --
e.g. the Extents and Offsets and Bit Vectors etc (as described in the
README) are all stored in binary form. This arrangement also helps for
manipulating Extents for compression/defragmentation, and for Delete
Vector garbage collection and CRID renumbering.
Can you please elaborate what is your goal of changing some of these
relations to be normal PostgreSQL row-based tables? e.g. What problems
do you have in mind that using the heap tables will solve?
If your idea was mainly to assist debugging, then for this purpose I
believe that the original VCI implementation (in the commercial
product) had a separate "vci inspection" tool capable of dumping the
content of the VCI internal relations in a human-readable format. But,
we did not bring this tool across to OSS. Users are not supposed to be
aware of these VCI internal relations, and they cannot change them, so
even if they could see the contents of those heap tables it might
appear meaningless.
Current Focus
--------------
We know that the VCI patches still need more work before they might be
acceptable to the OSS community. Currently, we are focusing on
simplifying the existing VCI patches and fixing bugs. We also need to
make the PostgreSQL core interface less intrusive, and add lots more
test cases.
Changing any VCI internal relations to be heap tables may involve lots
of code rewrite, so even if it can be shown advantageous to do it, I'd
prefer to put this kind of work on a TODO list for later -- e.g. it
might be premature to redesign the current logic before all the other
necessary stability work is done first.
======
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Wed, 30 Jul 2025 at 14:49, Peter Smith <smithpb2250@gmail.com> wrote:
On Fri, Jul 25, 2025 at 1:58 PM Japin Li <japinli@hotmail.com> wrote:
On Wed, 23 Jul 2025 at 14:07, Peter Smith <smithpb2250@gmail.com> wrote:
On Tue, Jul 22, 2025 at 8:12 PM Japin Li <japinli@hotmail.com> wrote:
......
Or is it by design that users are unable to read the internal relations?
IIUC, those VCI internal relations (implemented as materialized views)
are entirely managed by VCI logic. Users are not required to be aware
of them, and they definitely are not meant to tamper with them.Thanks for your explanation!
The REFRESH that you attempted should have caused a more graceful error, like:
ERROR: extension "vci" prohibits this operation on view
"vci_0000016482_00000_d"
So, thanks for reporting that the ERROR failed. Investigating...I'm considering storing this metadata in heap tables, as Citus Columnar [1]
and TimescaleDB [2] also utilize them for some metadata. Is this a sound
approach? I'm wondering if this is a suitable strategy for VCI?[1] https://github.com/citusdata/citus/blob/main/src/backend/columnar/columnar_metadata.c#L174
[2] https://github.com/timescale/timescaledb/blob/main/src/chunk.c#L151Hi Japin,
TL;DR;
------
What metadata relations did you have in mind to change, how do you
want to change them, and what is the main motivation?
I've noticed a strange user experience with vci_xxx_{d,m}. Although they are
defined as materialized views, they cannot be queried, leading to an error
that suggests using REFRESH MATERIALIZED VIEW. However, that command is also
rejected. This is my motivation.
Metadata
--------
What relations did you mean when you referred to "metadata"? There are
a number of VCI internal relations serving different purposes -- WOS,
NULL vectors, ROS Columnar Data, etc. That relation
("vci_0000016482_00000_d") is not exactly what I would call metadata;
that relation is for VCI ROS data of column 0 (the 'd' suffix means
"data"), so it is real data that VCI maintains in the ROS and keeps in
synch with the original data of the table being indexed.
My apologies, "metadata" I said might not be the most accurate term here.
I mean the internal relations.
The relations that VCI considers as "metadata" have an 'm' suffix. e.g.
+/** + * Meta Relation + */ +#define VCI_RELTYPE_META ('m') +All "metadata" files can be seen below with VCI_RELTYPE_META suffix.
Are these the files you had in mind to change?+ /* TID */ + oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_TID, VCI_RELTYPE_DATA), indexRel, indexInfo, VCI_RELTYPE_ROS); + vci_SetMainRelVar(vmr_info, vcimrv_tid_data_oid, 0, oid); + oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_TID, VCI_RELTYPE_META), indexRel, indexInfo, VCI_RELTYPE_ROS); + vci_SetMainRelVar(vmr_info, vcimrv_tid_meta_oid, 0, oid); + + /* NUll */ + oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_NULL, VCI_RELTYPE_DATA), indexRel, indexInfo, VCI_RELTYPE_ROS); + vci_SetMainRelVar(vmr_info, vcimrv_null_data_oid, 0, oid); + oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_NULL, VCI_RELTYPE_META), indexRel, indexInfo, VCI_RELTYPE_ROS); + vci_SetMainRelVar(vmr_info, vcimrv_null_meta_oid, 0, oid); + + /* Delete Vector */ + oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_DELETE, VCI_RELTYPE_DATA), indexRel, indexInfo, VCI_RELTYPE_ROS); + vci_SetMainRelVar(vmr_info, vcimrv_delete_data_oid, 0, oid); + oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_DELETE, VCI_RELTYPE_META), indexRel, indexInfo, VCI_RELTYPE_ROS); + vci_SetMainRelVar(vmr_info, vcimrv_delete_meta_oid, 0, oid);...
+ vci_SetMainRelVar(vmr_info, vcimrv_num_columns, 0, tupdesc->natts); + for (i = 0; i < tupdesc->natts; i++) + { + Oid column_store_oid; + Oid column_meta_oid; + vcis_m_column_t *columnPointer; + + column_store_oid = vci_create_relation(GenRelName(indexRel, i, VCI_RELTYPE_DATA), indexRel, indexInfo, VCI_RELTYPE_ROS); + column_meta_oid = vci_create_relation(GenRelName(indexRel, i, VCI_RELTYPE_META), indexRel, indexInfo, VCI_RELTYPE_ROS);Heap Tables
-----------
When you say "storing ... in heap tables", I assumed you meant that
some of the metadata files could be stored as normal row-based
PostgreSQL files -- e.g. like PG catalogs, right?
Yeah, you got me.
AFAIK the current VCI internal relations are implemented with a column
of byte arrays that get mapped to VCI data structures in memory --
e.g. the Extents and Offsets and Bit Vectors etc (as described in the
README) are all stored in binary form. This arrangement also helps for
manipulating Extents for compression/defragmentation, and for Delete
Vector garbage collection and CRID renumbering.
Sorry, I haven't thought about those things yet.
Can you please elaborate what is your goal of changing some of these
relations to be normal PostgreSQL row-based tables? e.g. What problems
do you have in mind that using the heap tables will solve?If your idea was mainly to assist debugging, then for this purpose I
believe that the original VCI implementation (in the commercial
product) had a separate "vci inspection" tool capable of dumping the
content of the VCI internal relations in a human-readable format. But,
we did not bring this tool across to OSS. Users are not supposed to be
aware of these VCI internal relations, and they cannot change them, so
even if they could see the contents of those heap tables it might
appear meaningless.
I agree that the internal relations aren't for end users. However, it's
confusing that users can't query the materialized views.
Current Focus
--------------
We know that the VCI patches still need more work before they might be
acceptable to the OSS community. Currently, we are focusing on
simplifying the existing VCI patches and fixing bugs. We also need to
make the PostgreSQL core interface less intrusive, and add lots more
test cases.Changing any VCI internal relations to be heap tables may involve lots
of code rewrite, so even if it can be shown advantageous to do it, I'd
prefer to put this kind of work on a TODO list for later -- e.g. it
might be premature to redesign the current logic before all the other
necessary stability work is done first.
--
Regards,
Japin Li
On Tue, 29 Jul 2025 at 06:57, Peter Smith <smithpb2250@gmail.com> wrote:
Here are the latest v15 patches.
Changes include:
PATCH 0002.
- README now says user should not tamper with VCI internal relations
- fixes/test the VACUUM bug -- fix provided by Japin [1]
- fixes/tests the reported segv for attempted REFRESH of VCI internal
relation -- see [2 comment#1]
- fixes/tests VCI internal relation dependency on the indexed table
- simplifies code for PG_TEMP_FILES_DIR -- see [2 comment#2]
Hi Peter,
Thanks for updating the patches.
1.
I've identified another TRAP failure. Here are the reproduction steps:
rm -rf demo
initdb -D demo
cat <<EOF >>demo/postgresql.auto.conf
shared_preload_libraries = 'vci'
max_worker_processes = '20'
EOF
pg_ctl -D demo start
cat <<EOF | psql postgres
CREATE EXTENSION vci;
CREATE TABLE t (id int, info text);
CREATE INDEX ON t USING vci (id);
INSERT INTO t SELECT id, md5(random()::text) FROM generate_series(1, 1000) id;
REINDEX TABLE t;
REINDEX TABLE t;
EOF
The current VCI design doesn't support REINDEX, which is expected. But I've
discovered an unexpected issue: running REINDEX on a table a second time causes
an assertion failure.
2.
+Internal Relation Types:
+- -1: TID relation (maps CRID to original TID)
+- -2: NULL vector (bit array for NULL values)
+- -3: Delete vector (bit array for deleted records)
+- -5: TID-CRID mapping table
+- -9: Data WOS (buffered row data)
+- -10: Whiteout WOS (deletion markers)
+- 0-N: ROS column data relations (one per indexed column)
+
+Example:
+For a VCI index on sales(customer_id, amount, date):
+
+Generated relations include:
+vci_0000001234_00000_d → Column 0 data (customer_id)
+vci_0000001234_00001_d → Column 1 data (amount)
+vci_0000001234_00002_d → Column 2 data (date)
+vci_0000001234_65535_d → TID relation
+vci_0000001234_65534_d → NULL vector
+vci_0000001234_65533_d → Delete vector
+vci_0000001234_65531_m → TID-CRID mapping
+vci_0000001234_65527_d → Data WOS
+vci_0000001234_65526_d → Whiteout WOS
The README states that it generates the above relations, but there are
additional internal relations that are not mentioned.
SELECT relname, relkind FROM pg_class WHERE relname ~ 'vci*' ORDER BY relname;
relname | relkind
------------------------+---------
vci_0000016578_00000_d | m
vci_0000016578_00000_m | m
vci_0000016578_00001_d | m
vci_0000016578_00001_m | m
vci_0000016578_65526_d | m
vci_0000016578_65527_d | m
vci_0000016578_65530_0 | m
vci_0000016578_65530_1 | m
vci_0000016578_65531_d | m
vci_0000016578_65531_m | m
vci_0000016578_65533_d | m
vci_0000016578_65533_m | m
vci_0000016578_65534_d | m
vci_0000016578_65534_m | m
vci_0000016578_65535_d | m
vci_0000016578_65535_m | m
Based on the above, are the following materialized views unused, or is their
use just undocumented?
- vci_0000016578_00000_m
- vci_0000016578_00001_m
- vci_0000016491_65530_0
- vci_0000016578_65530_1
- vci_0000016578_65531_d
- vci_0000016578_65534_m
- vci_0000016578_65535_m
What is the purpose of the '0' and '1' suffixes?
3.
I've also found that the VCI index is not working. Is this the expected
behavior?
[local]:3209161 postgres=# \d+ t
Table "public.t"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
info | text | | | | extended | | |
Indexes:
"t_id_idx" vci (id)
Access method: heap
[local]:3209161 postgres=# SET enable_seqscan TO off;
SET
[local]:3209161 postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
QUERY PLAN
-----------------------------------------------------
Seq Scan on t (cost=0.00..2084.00 rows=1 width=37)
Disabled: true
Filter: (id = 1000)
(3 rows)
--
Regards,
Japin Li
On Wed, Jul 30, 2025 at 9:07 PM Japin Li <japinli@hotmail.com> wrote:
...
2. +Internal Relation Types: +- -1: TID relation (maps CRID to original TID) +- -2: NULL vector (bit array for NULL values) +- -3: Delete vector (bit array for deleted records) +- -5: TID-CRID mapping table +- -9: Data WOS (buffered row data) +- -10: Whiteout WOS (deletion markers) +- 0-N: ROS column data relations (one per indexed column) + +Example: +For a VCI index on sales(customer_id, amount, date): + +Generated relations include: +vci_0000001234_00000_d → Column 0 data (customer_id) +vci_0000001234_00001_d → Column 1 data (amount) +vci_0000001234_00002_d → Column 2 data (date) +vci_0000001234_65535_d → TID relation +vci_0000001234_65534_d → NULL vector +vci_0000001234_65533_d → Delete vector +vci_0000001234_65531_m → TID-CRID mapping +vci_0000001234_65527_d → Data WOS +vci_0000001234_65526_d → Whiteout WOSThe README states that it generates the above relations, but there are
additional internal relations that are not mentioned.SELECT relname, relkind FROM pg_class WHERE relname ~ 'vci*' ORDER BY relname;
relname | relkind
------------------------+---------
vci_0000016578_00000_d | m
vci_0000016578_00000_m | m
vci_0000016578_00001_d | m
vci_0000016578_00001_m | m
vci_0000016578_65526_d | m
vci_0000016578_65527_d | m
vci_0000016578_65530_0 | m
vci_0000016578_65530_1 | m
vci_0000016578_65531_d | m
vci_0000016578_65531_m | m
vci_0000016578_65533_d | m
vci_0000016578_65533_m | m
vci_0000016578_65534_d | m
vci_0000016578_65534_m | m
vci_0000016578_65535_d | m
vci_0000016578_65535_m | mBased on the above, are the following materialized views unused, or is their
use just undocumented?- vci_0000016578_00000_m
- vci_0000016578_00001_m
- vci_0000016491_65530_0
- vci_0000016578_65530_1
- vci_0000016578_65531_d
- vci_0000016578_65534_m
- vci_0000016578_65535_mWhat is the purpose of the '0' and '1' suffixes?
Yeah, it was undocumented. I didn't intend for the README to give a
complete list, but in hindsight it may have been clearer if it did.
Those '0' and '1' relations are two more files associated with the
TID-CRID mappings -- those ones are for keeping track of mapping
updates.
+ oid = vci_create_relation(GenRelName(indexRel,
VCI_COLUMN_ID_TID_CRID_UPDATE, '0'), indexRel, indexInfo,
VCI_RELTYPE_TIDCRID);
+ vci_SetMainRelVar(vmr_info, vcimrv_tid_crid_update_oid_0, 0, oid);
+
+ oid = vci_create_relation(GenRelName(indexRel,
VCI_COLUMN_ID_TID_CRID_UPDATE, '1'), indexRel, indexInfo,
VCI_RELTYPE_TIDCRID);
+ vci_SetMainRelVar(vmr_info, vcimrv_tid_crid_update_oid_1, 0, oid);
The README was updated like below; it will be available next time new
patches are posted
------
Internal Relation Types:
- -1: TID relation (maps CRID to original TID)
- -2: NULL vector (bit array for NULL values)
- -3: Delete vector (bit array for deleted records)
- -5: TID-CRID mappings
- -6: TID-CRID mappings (update list)
- -9: Data WOS (buffered row data)
- -10: Whiteout WOS (deletion markers)
- 0-N: ROS column data relations (one per indexed column)
Example:
For a VCI index on sales(customer_id, amount, date):
Generated relations include:
vci_0000012345_00000_d → Column 0 data (customer_id)
vci_0000012345_00000_m ... and metadata
vci_0000012345_00001_d → Column 1 data (amount)
vci_0000012345_00001_m ... and metadata
vci_0000012345_00002_d → Column 2 data (date)
vci_0000012345_00002_m ... and metadata
vci_0000012345_65526_d → Whiteout WOS
vci_0000012345_65527_d → Data WOS
vci_0000012345_65531_d → TID-CRID mappings
vci_0000012345_65531_m ... and metadata
vci_0000012345_65530_0 ... and update list #0
vci_0000012345_65530_1 ... and update list #1
vci_0000012345_65533_d → Delete vector
vci_0000012345_65533_m ... and metadata
vci_0000012345_65534_d → NULL vector
vci_0000012345_65534_m ... and metadata
vci_0000012345_65535_d → TID relation
vci_0000012345_65535_m ... and metadata
------
======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, 30 Jul 2025 at 19:07, Japin Li <japinli@hotmail.com> wrote:
On Tue, 29 Jul 2025 at 06:57, Peter Smith <smithpb2250@gmail.com> wrote:
Here are the latest v15 patches.
Changes include:
PATCH 0002.
- README now says user should not tamper with VCI internal relations
- fixes/test the VACUUM bug -- fix provided by Japin [1]
- fixes/tests the reported segv for attempted REFRESH of VCI internal
relation -- see [2 comment#1]
- fixes/tests VCI internal relation dependency on the indexed table
- simplifies code for PG_TEMP_FILES_DIR -- see [2 comment#2]Hi Peter,
Thanks for updating the patches.
1.
I've identified another TRAP failure. Here are the reproduction steps:rm -rf demo
initdb -D demo
cat <<EOF >>demo/postgresql.auto.conf
shared_preload_libraries = 'vci'
max_worker_processes = '20'
EOFpg_ctl -D demo start
cat <<EOF | psql postgres
CREATE EXTENSION vci;
CREATE TABLE t (id int, info text);
CREATE INDEX ON t USING vci (id);
INSERT INTO t SELECT id, md5(random()::text) FROM generate_series(1, 1000) id;
REINDEX TABLE t;
REINDEX TABLE t;
EOFThe current VCI design doesn't support REINDEX, which is expected. But I've
discovered an unexpected issue: running REINDEX on a table a second time causes
an assertion failure.
It appears VCI doesn't support reindexing; add_reindex_index_hook() returns
false, and reindex_index() exits without restoring the security context.
PFA.
--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
Attachments:
Import Notes
Reply to msg id not found: 87freeos1x.fsf@hotmail.com
On Thu, Jul 31, 2025 at 6:20 PM Japin Li <japinli@hotmail.com> wrote:
...
1.
I've identified another TRAP failure. Here are the reproduction steps:rm -rf demo
initdb -D demo
cat <<EOF >>demo/postgresql.auto.conf
shared_preload_libraries = 'vci'
max_worker_processes = '20'
EOFpg_ctl -D demo start
cat <<EOF | psql postgres
CREATE EXTENSION vci;
CREATE TABLE t (id int, info text);
CREATE INDEX ON t USING vci (id);
INSERT INTO t SELECT id, md5(random()::text) FROM generate_series(1, 1000) id;
REINDEX TABLE t;
REINDEX TABLE t;
EOFThe current VCI design doesn't support REINDEX, which is expected. But I've
discovered an unexpected issue: running REINDEX on a table a second time causes
an assertion failure.It appears VCI doesn't support reindexing; add_reindex_index_hook() returns
false, and reindex_index() exits without restoring the security context.
Thanks for your fix! This will be included in the next v16 patch set.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Jul 30, 2025 at 9:07 PM Japin Li <japinli@hotmail.com> wrote:
...
3.
I've also found that the VCI index is not working. Is this the expected
behavior?[local]:3209161 postgres=# \d+ t
Table "public.t"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
info | text | | | | extended | | |
Indexes:
"t_id_idx" vci (id)
Access method: heap[local]:3209161 postgres=# SET enable_seqscan TO off;
SET
[local]:3209161 postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
QUERY PLAN
-----------------------------------------------------
Seq Scan on t (cost=0.00..2084.00 rows=1 width=37)
Disabled: true
Filter: (id = 1000)
(3 rows)
Hi Japin. Yes, that's expected behaviour.
VCI is used only when the vci index is defined for all the columns of
your query. In your example there was a table with 2 columns ('id' and
'info') but you only have an index on the 'id' column. If you change
the query then you can see VCI getting used.
E.g.
postgres=# EXPLAIN SELECT id FROM t WHERE id = 1000;
QUERY PLAN
----------------------------------------------------------------------------
Custom Scan (VCI Scan) using tidx on t (cost=0.00..209.00 rows=1 width=4)
Filter: (id = 1000)
(2 rows)
postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
QUERY PLAN
----------------------------------------------------
Seq Scan on t (cost=0.00..209.00 rows=1 width=37)
Filter: (id = 1000)
(2 rows)
postgres=# EXPLAIN SELECT id,info FROM t WHERE id = 1000;
QUERY PLAN
----------------------------------------------------
Seq Scan on t (cost=0.00..209.00 rows=1 width=37)
Filter: (id = 1000)
(2 rows)
~~~
You can see this also in the DEBUG logs, from
vci_can_rewrite_custom_scan(), where it checks to see if the attrs are
in the vci index or not.
e.g.
2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: target table
"t"(oid=16384) tuples(rows=10000,extents=0)
2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: don't match
index "tidx"(oid=16469)
2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 1 x
2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 2
======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Fri, Aug 01, 2025 at 05:18:11PM +1000, Peter Smith wrote:
On Wed, Jul 30, 2025 at 9:07â¯PM Japin Li <japinli@hotmail.com> wrote:
...
3.
I've also found that the VCI index is not working. Is this the expected
behavior?[local]:3209161 postgres=# \d+ t
Table "public.t"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
info | text | | | | extended | | |
Indexes:
"t_id_idx" vci (id)
Access method: heap[local]:3209161 postgres=# SET enable_seqscan TO off;
SET
[local]:3209161 postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
QUERY PLAN
-----------------------------------------------------
Seq Scan on t (cost=0.00..2084.00 rows=1 width=37)
Disabled: true
Filter: (id = 1000)
(3 rows)Hi Japin. Yes, that's expected behaviour.
VCI is used only when the vci index is defined for all the columns of
your query. In your example there was a table with 2 columns ('id' and
'info') but you only have an index on the 'id' column. If you change
the query then you can see VCI getting used.E.g.
postgres=# EXPLAIN SELECT id FROM t WHERE id = 1000;
QUERY PLAN
----------------------------------------------------------------------------
Custom Scan (VCI Scan) using tidx on t (cost=0.00..209.00 rows=1 width=4)
Filter: (id = 1000)
(2 rows)postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
QUERY PLAN
----------------------------------------------------
Seq Scan on t (cost=0.00..209.00 rows=1 width=37)
Filter: (id = 1000)
(2 rows)postgres=# EXPLAIN SELECT id,info FROM t WHERE id = 1000;
QUERY PLAN
----------------------------------------------------
Seq Scan on t (cost=0.00..209.00 rows=1 width=37)
Filter: (id = 1000)
(2 rows)~~~
You can see this also in the DEBUG logs, from
vci_can_rewrite_custom_scan(), where it checks to see if the attrs are
in the vci index or not.e.g.
2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: target table
"t"(oid=16384) tuples(rows=10000,extents=0)
2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: don't match
index "tidx"(oid=16469)
2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 1 x
2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 2
Thanks for your explantion! Got it.
Are there any plans to remove this restriction in the future?
--
Best regards,
Japin Li
ChengDu WenWu Information Technology Co., LTD.
Here are the latest v16 patches.
Changes include:
PATCH 0001.
- REINDEX bugfix -- fix provided by Japin [1]/messages/by-id/ME0P300MB04453BEE52F84048683460E4B627A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
PATCH 0002.
- REINDEX bugfix test cases - per [1]/messages/by-id/ME0P300MB04453BEE52F84048683460E4B627A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
- README now list all the internal relations -- per [2]/messages/by-id/CAHut+Pt8naGc7pH0YG_0G8Wu5aqJiHoT6xP+Y81_eJWapg9=DA@mail.gmail.com
- Query of VCI internal relation no longer causes confusing HINT -- per [3]/messages/by-id/ME0P300MB0445307958A2DC0831CEF56DB624A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
- Renamed the VCI internal relations to have a "pg_" prefix
======
[1]: /messages/by-id/ME0P300MB04453BEE52F84048683460E4B627A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
[2]: /messages/by-id/CAHut+Pt8naGc7pH0YG_0G8Wu5aqJiHoT6xP+Y81_eJWapg9=DA@mail.gmail.com
[3]: /messages/by-id/ME0P300MB0445307958A2DC0831CEF56DB624A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Wed, Jul 30, 2025 at 4:39 PM Japin Li <japinli@hotmail.com> wrote:
On Wed, 30 Jul 2025 at 14:49, Peter Smith <smithpb2250@gmail.com> wrote:
On Fri, Jul 25, 2025 at 1:58 PM Japin Li <japinli@hotmail.com> wrote:
On Wed, 23 Jul 2025 at 14:07, Peter Smith <smithpb2250@gmail.com> wrote:
On Tue, Jul 22, 2025 at 8:12 PM Japin Li <japinli@hotmail.com> wrote:
......
Or is it by design that users are unable to read the internal relations?
IIUC, those VCI internal relations (implemented as materialized views)
are entirely managed by VCI logic. Users are not required to be aware
of them, and they definitely are not meant to tamper with them.Thanks for your explanation!
The REFRESH that you attempted should have caused a more graceful error, like:
ERROR: extension "vci" prohibits this operation on view
"vci_0000016482_00000_d"
So, thanks for reporting that the ERROR failed. Investigating...I'm considering storing this metadata in heap tables, as Citus Columnar [1]
and TimescaleDB [2] also utilize them for some metadata. Is this a sound
approach? I'm wondering if this is a suitable strategy for VCI?[1] https://github.com/citusdata/citus/blob/main/src/backend/columnar/columnar_metadata.c#L174
[2] https://github.com/timescale/timescaledb/blob/main/src/chunk.c#L151Hi Japin,
TL;DR;
------
What metadata relations did you have in mind to change, how do you
want to change them, and what is the main motivation?I've noticed a strange user experience with vci_xxx_{d,m}. Although they are
defined as materialized views, they cannot be queried, leading to an error
that suggests using REFRESH MATERIALIZED VIEW. However, that command is also
rejected. This is my motivation.
FYI, the query is made possible now in v16. The result will be
meaningless, but the error and misleading HINT will no longer happen.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Here are the latest v17 patches.
Changes include:
PATCH 0002.
- Rebase to fix compile error, result of recent master change
- Bugfix for an unreported EXPLAIN ANALYZE error
- Bugfix for an unreported double pfree
- Code cleanup (ran pgindent; corrected ~100 typos; removed empty
coverage comments)
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Fri, Aug 08, 2025 at 06:08:27PM +1000, Peter Smith wrote:
Here are the latest v17 patches.
Changes include:
PATCH 0002.
- Rebase to fix compile error, result of recent master change
- Bugfix for an unreported EXPLAIN ANALYZE error
- Bugfix for an unreported double pfree
- Code cleanup (ran pgindent; corrected ~100 typos; removed empty
coverage comments)
1.
+static struct
+{
+ int transfn_oid; /* Transition function's funcoid. Arrays are
+ * sorted in ascending order */
+ Oid transtype; /* Transition data type */
+ PGFunction merge_trans; /* Function pointer set required for parallel
+ * aggregation for each transfn_oid */
+ vci_aggtranstype_kind kind; /* If transtype is INTERNALOID, its details */
+} trans_funcs_table[] = {
+ {F_FLOAT4_ACCUM, 1022, merge_floatX_accum, VCI_AGG_NOT_INTERNAL}, /* 208 */
+ {F_FLOAT8_ACCUM, 1022, merge_floatX_accum, VCI_AGG_NOT_INTERNAL}, /* 222 */
+ {F_INT8INC, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 1833 */
+ {F_NUMERIC_ACCUM, 2281, numeric_combine, VCI_AGG_NUMERIC_AGG_STATE}, /* 1834 */
+ {F_INT2_ACCUM, 2281, numeric_poly_combine, VCI_AGG_POLY_NUM_AGG_STATE}, /* 1836 */
+ {F_INT4_ACCUM, 2281, numeric_poly_combine, VCI_AGG_POLY_NUM_AGG_STATE}, /* 1835 */
+ {F_INT8_ACCUM, 2281, numeric_combine, VCI_AGG_NUMERIC_AGG_STATE}, /* 1836 */
+ {F_INT2_SUM, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 1840 */
+ {F_INT4_SUM, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 1841 */
+ {F_INTERVAL_AVG_COMBINE, 2281, merge_interval_avg_accum, VCI_AGG_NOT_INTERNAL}, /* 3325 */
+ {F_INT2_AVG_ACCUM, 1016, merge_intX_accum, VCI_AGG_NOT_INTERNAL}, /* 1962 */
+ {F_INT4_AVG_ACCUM, 1016, merge_intX_accum, VCI_AGG_NOT_INTERNAL}, /* 1963 */
+ {F_INT8INC_ANY, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 2804 */
+ {F_INT8_AVG_ACCUM, 2281, int8_avg_combine, VCI_AGG_POLY_AVG_NUM_AGG_STATE}, /* 2746 */
+ {F_NUMERIC_AVG_ACCUM, 2281, numeric_avg_combine, VCI_AGG_AVG_NUMERIC_AGG_STATE}, /* 2858 */
+};
The comments state that this is sorted in ascending order, but the code doesn't
follow that rule. While the current linear search works, a future change to
binary search could cause problems.
2.
+static void
+CheckIndexedRelationKind(Relation rel)
+{
+ char relKind = get_rel_relkind(RelationGetRelid(rel));
+
+ if (relKind == RELKIND_MATVIEW)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("access method \"%s\" does not support index on materialized view", VCI_STRING)));
+
+ if (rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("access method \"%s\" does not support index on temporary table", VCI_STRING)));
+}
Would it be possible to use rel->rd_rel->relkind directly? This might avoid
the overhead of a function call.
3.
The discussion on add_index_delete_hook [1]/messages/by-id/OS7PR01MB119642862AA1CE536549D08CFEA40A@OS7PR01MB11964.jpnprd01.prod.outlook.com makes me wonder if an Index Access
Method callback could serve the same purpose. This also raises the question:
would we then need an index update callback as well?
3.
Here are some typos.
a)
@@ -475,7 +477,7 @@ vci_scan_EndCustomPlan(CustomScanState *node)
default:
/* LCOV_EXCL_START */
- elog(PANIC, "Should not reach hare");
+ elog(PANIC, "Should not reach here");
/* LCOV_EXCL_STOP */
break;
}
b)
@@ -543,7 +545,7 @@ vci_create_relation(const char *rel_identifier, Relation indexRel, IndexInfo *in
TupleDescInitEntry(new_tupdesc, (AttrNumber) 1, "bindata", BYTEAOID, -1, 0); /* */
break;
- /* TIC-CRID */
+ /* TID-CRID */
case VCI_RELTYPE_TIDCRID:
natts = 1;
new_tupdesc = CreateTemplateTupleDesc(natts); /* no Oid */
c)
@@ -1065,7 +1065,7 @@ vci_inner_build(Relation heapRel, Relation indexRel, IndexInfo *indexInfo)
/*
* Put or Copy page into INIT_FORK.
* If valid page is given, that page will be putted into INIT_FORK.
- * If Invalid page (NULL pointer) is given, MAIN_FORK page well be copied.
+ * If invalid page (NULL pointer) is given, MAIN_FORK page well be copied.
*/
static void
vci_putInitPage(Oid oid, Page page, BlockNumber blkno)
[1]: /messages/by-id/OS7PR01MB119642862AA1CE536549D08CFEA40A@OS7PR01MB11964.jpnprd01.prod.outlook.com
--
Best regards,
Japin Li
ChengDu WenWu Information Technology Co., LTD.
Attachments:
On Mon, Aug 11, 2025 at 05:39:01PM +0800, Japin Li wrote:
On Fri, Aug 08, 2025 at 06:08:27PM +1000, Peter Smith wrote:
Here are the latest v17 patches.
Changes include:
PATCH 0002.
- Rebase to fix compile error, result of recent master change
- Bugfix for an unreported EXPLAIN ANALYZE error
- Bugfix for an unreported double pfree
- Code cleanup (ran pgindent; corrected ~100 typos; removed empty
coverage comments)3.
Here are some typos.a)
@@ -475,7 +477,7 @@ vci_scan_EndCustomPlan(CustomScanState *node)default: /* LCOV_EXCL_START */ - elog(PANIC, "Should not reach hare"); + elog(PANIC, "Should not reach here"); /* LCOV_EXCL_STOP */ break; } b) @@ -543,7 +545,7 @@ vci_create_relation(const char *rel_identifier, Relation indexRel, IndexInfo *in TupleDescInitEntry(new_tupdesc, (AttrNumber) 1, "bindata", BYTEAOID, -1, 0); /* */ break;- /* TIC-CRID */
+ /* TID-CRID */
case VCI_RELTYPE_TIDCRID:
natts = 1;
new_tupdesc = CreateTemplateTupleDesc(natts); /* no Oid */c) @@ -1065,7 +1065,7 @@ vci_inner_build(Relation heapRel, Relation indexRel, IndexInfo *indexInfo) /* * Put or Copy page into INIT_FORK. * If valid page is given, that page will be putted into INIT_FORK. - * If Invalid page (NULL pointer) is given, MAIN_FORK page well be copied. + * If invalid page (NULL pointer) is given, MAIN_FORK page well be copied. */ static void vci_putInitPage(Oid oid, Page page, BlockNumber blkno)
1.
PFA the other typos.
2.
I found it skip vci query context initialization in vci_intialize_query_context()
if full page writes is disabled, Could you explain why we need full page write
enabled for VCI?
3.
Both vci_ros.h and vci_ros.c have a comment about accessing the VCI main relation
header, but they are slightly different. Could we sync them and keep only one?
It seems the comment is outdated, as the functions vci_KeepReadingMainRelHeader()
and vci_KeepWritingMainRelHeader() do not exist in the current VCI implementation.
4.
+/**
+ * This function is assumed when the VCI index is newly built, and
+ * it converts all the data in the relation of PostgreSQL into ROS.
+ */
+uint64
+vci_ConvertWos2RosForBuild(Relation mainRel,
+ Size workareaSize,
+ IndexInfo *indexInfo)
...
+ result = (uint64) table_index_build_scan(comContext.heapRel,
+ mainRel,
+ indexInfo,
+ true, /* allow syncscan */
+ true,
+ vci_build_callback,
+ (void *) &comContext, NULL)
Perhaps we can use a double return type to avoid type casting since
other places also use double.
--
Best regards,
Japin Li
ChengDu WenWu Information Technology Co., LTD.
Attachments:
On Mon, 2025-08-11 at 17:39 +0800, Japin Li wrote:
1. +static struct +{ + int transfn_oid; /* Transition function's funcoid. Arrays are + * sorted in ascending order */ + Oid transtype; /* Transition data type */ + PGFunction merge_trans; /* Function pointer set required for parallel + * aggregation for each transfn_oid */ + vci_aggtranstype_kind kind; /* If transtype is INTERNALOID, its details */ +} trans_funcs_table[] = { + {F_FLOAT4_ACCUM, 1022, merge_floatX_accum, VCI_AGG_NOT_INTERNAL}, /* 208 */ + {F_FLOAT8_ACCUM, 1022, merge_floatX_accum, VCI_AGG_NOT_INTERNAL}, /* 222 */ + {F_INT8INC, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 1833 */ + {F_NUMERIC_ACCUM, 2281, numeric_combine, VCI_AGG_NUMERIC_AGG_STATE}, /* 1834 */ + {F_INT2_ACCUM, 2281, numeric_poly_combine, VCI_AGG_POLY_NUM_AGG_STATE}, /* 1836 */ + {F_INT4_ACCUM, 2281, numeric_poly_combine, VCI_AGG_POLY_NUM_AGG_STATE}, /* 1835 */ + {F_INT8_ACCUM, 2281, numeric_combine, VCI_AGG_NUMERIC_AGG_STATE}, /* 1836 */ + {F_INT2_SUM, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 1840 */ + {F_INT4_SUM, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 1841 */ + {F_INTERVAL_AVG_COMBINE, 2281, merge_interval_avg_accum, VCI_AGG_NOT_INTERNAL}, /* 3325 */ + {F_INT2_AVG_ACCUM, 1016, merge_intX_accum, VCI_AGG_NOT_INTERNAL}, /* 1962 */ + {F_INT4_AVG_ACCUM, 1016, merge_intX_accum, VCI_AGG_NOT_INTERNAL}, /* 1963 */ + {F_INT8INC_ANY, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 2804 */ + {F_INT8_AVG_ACCUM, 2281, int8_avg_combine, VCI_AGG_POLY_AVG_NUM_AGG_STATE}, /* 2746 */ + {F_NUMERIC_AVG_ACCUM, 2281, numeric_avg_combine, VCI_AGG_AVG_NUMERIC_AGG_STATE}, /* 2858 */ +};The comments state that this is sorted in ascending order, but the
code doesn't
follow that rule. While the current linear search works, a future
change to
binary search could cause problems.
Hi Japin!
I've looked at the code, vci_set_merge_and_copy_trans_funcs() function
is unused and almost all code of vci_aggmergetranstype.c file including
trans_funcs_table[] can be either removed either replaced with simple
switch-case. Only transfn_oid fields of trans_funcs_table[] were
actually used. Here is my patch made on top of v17.
Peter, what do you think? Is it OK to remove those code?
--
Regards,
Timur Magomedov
Attachments:
Here are the latest v18* patches.
Changes include:
PATCH 0002.
Cleaning:
- Fix typos (per Japin patches [1]Japin 11/8. /messages/by-id/ME0P300MB04457AAC931AD1E3D0CE32FBB628A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM and [2]Japin 12/8. /messages/by-id/ME0P300MB04450DF54484C145B77BD0D8B62BA@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM)
- Remove all #if 0 code
- Make all header comments more consistent
- Cleanup Doxygen annotation formatting
- Remove all double blank lines
Fixes:
- trans_funcs_table[] ordering (per Japin patch [1]Japin 11/8. /messages/by-id/ME0P300MB04457AAC931AD1E3D0CE32FBB628A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM)
- Access relkind via member instead of function calls (per Japin patch [1]Japin 11/8. /messages/by-id/ME0P300MB04457AAC931AD1E3D0CE32FBB628A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM)
- Change vci_ConvertWos2RosForBuild return type to reduce casts (per Japin [2]Japin 12/8. /messages/by-id/ME0P300MB04450DF54484C145B77BD0D8B62BA@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM)
======
[1]: Japin 11/8. /messages/by-id/ME0P300MB04457AAC931AD1E3D0CE32FBB628A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
/messages/by-id/ME0P300MB04457AAC931AD1E3D0CE32FBB628A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
[2]: Japin 12/8. /messages/by-id/ME0P300MB04450DF54484C145B77BD0D8B62BA@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
/messages/by-id/ME0P300MB04450DF54484C145B77BD0D8B62BA@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Mon, Aug 11, 2025 at 7:39 PM Japin Li <japinli@hotmail.com> wrote:
On Fri, Aug 08, 2025 at 06:08:27PM +1000, Peter Smith wrote:
Here are the latest v17 patches.
Changes include:
PATCH 0002.
- Rebase to fix compile error, result of recent master change
- Bugfix for an unreported EXPLAIN ANALYZE error
- Bugfix for an unreported double pfree
- Code cleanup (ran pgindent; corrected ~100 typos; removed empty
coverage comments)1. +static struct +{ + int transfn_oid; /* Transition function's funcoid. Arrays are + * sorted in ascending order */ + Oid transtype; /* Transition data type */ + PGFunction merge_trans; /* Function pointer set required for parallel + * aggregation for each transfn_oid */ + vci_aggtranstype_kind kind; /* If transtype is INTERNALOID, its details */ +} trans_funcs_table[] = { + {F_FLOAT4_ACCUM, 1022, merge_floatX_accum, VCI_AGG_NOT_INTERNAL}, /* 208 */ + {F_FLOAT8_ACCUM, 1022, merge_floatX_accum, VCI_AGG_NOT_INTERNAL}, /* 222 */ + {F_INT8INC, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 1833 */ + {F_NUMERIC_ACCUM, 2281, numeric_combine, VCI_AGG_NUMERIC_AGG_STATE}, /* 1834 */ + {F_INT2_ACCUM, 2281, numeric_poly_combine, VCI_AGG_POLY_NUM_AGG_STATE}, /* 1836 */ + {F_INT4_ACCUM, 2281, numeric_poly_combine, VCI_AGG_POLY_NUM_AGG_STATE}, /* 1835 */ + {F_INT8_ACCUM, 2281, numeric_combine, VCI_AGG_NUMERIC_AGG_STATE}, /* 1836 */ + {F_INT2_SUM, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 1840 */ + {F_INT4_SUM, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 1841 */ + {F_INTERVAL_AVG_COMBINE, 2281, merge_interval_avg_accum, VCI_AGG_NOT_INTERNAL}, /* 3325 */ + {F_INT2_AVG_ACCUM, 1016, merge_intX_accum, VCI_AGG_NOT_INTERNAL}, /* 1962 */ + {F_INT4_AVG_ACCUM, 1016, merge_intX_accum, VCI_AGG_NOT_INTERNAL}, /* 1963 */ + {F_INT8INC_ANY, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 2804 */ + {F_INT8_AVG_ACCUM, 2281, int8_avg_combine, VCI_AGG_POLY_AVG_NUM_AGG_STATE}, /* 2746 */ + {F_NUMERIC_AVG_ACCUM, 2281, numeric_avg_combine, VCI_AGG_AVG_NUMERIC_AGG_STATE}, /* 2858 */ +};The comments state that this is sorted in ascending order, but the code doesn't
follow that rule. While the current linear search works, a future change to
binary search could cause problems.
Fixed in v18.
2. +static void +CheckIndexedRelationKind(Relation rel) +{ + char relKind = get_rel_relkind(RelationGetRelid(rel)); + + if (relKind == RELKIND_MATVIEW) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("access method \"%s\" does not support index on materialized view", VCI_STRING))); + + if (rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("access method \"%s\" does not support index on temporary table", VCI_STRING))); +}Would it be possible to use rel->rd_rel->relkind directly? This might avoid
the overhead of a function call.
Fixed in v18.
3.
The discussion on add_index_delete_hook [1] makes me wonder if an Index Access
Method callback could serve the same purpose. This also raises the question:
would we then need an index update callback as well?
Yeah, the README "3.3.1 Ad-hoc hooks" already says that the plan is to
try to see if we can convert these ad-hoc VCI hooks to instead use IAM
callback methods wherever possible.
3.
Here are some typos.a)
@@ -475,7 +477,7 @@ vci_scan_EndCustomPlan(CustomScanState *node)default: /* LCOV_EXCL_START */ - elog(PANIC, "Should not reach hare"); + elog(PANIC, "Should not reach here"); /* LCOV_EXCL_STOP */ break; } b) @@ -543,7 +545,7 @@ vci_create_relation(const char *rel_identifier, Relation indexRel, IndexInfo *in TupleDescInitEntry(new_tupdesc, (AttrNumber) 1, "bindata", BYTEAOID, -1, 0); /* */ break;- /* TIC-CRID */
+ /* TID-CRID */
case VCI_RELTYPE_TIDCRID:
natts = 1;
new_tupdesc = CreateTemplateTupleDesc(natts); /* no Oid */c) @@ -1065,7 +1065,7 @@ vci_inner_build(Relation heapRel, Relation indexRel, IndexInfo *indexInfo) /* * Put or Copy page into INIT_FORK. * If valid page is given, that page will be putted into INIT_FORK. - * If Invalid page (NULL pointer) is given, MAIN_FORK page well be copied. + * If invalid page (NULL pointer) is given, MAIN_FORK page well be copied. */ static void vci_putInitPage(Oid oid, Page page, BlockNumber blkno)
Fixed in v18.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Tue, Aug 12, 2025 at 1:48 PM Japin Li <japinli@hotmail.com> wrote:
...
1.
PFA the other typos.
Fixed in v18.
2.
I found it skip vci query context initialization in vci_intialize_query_context()
if full page writes is disabled, Could you explain why we need full page write
enabled for VCI?
TODO
3.
Both vci_ros.h and vci_ros.c have a comment about accessing the VCI main relation
header, but they are slightly different. Could we sync them and keep only one?It seems the comment is outdated, as the functions vci_KeepReadingMainRelHeader()
and vci_KeepWritingMainRelHeader() do not exist in the current VCI implementation.
TODO. I agree that there should be only one comment, and outdated
function references need to be fixed.
4. +/** + * This function is assumed when the VCI index is newly built, and + * it converts all the data in the relation of PostgreSQL into ROS. + */ +uint64 +vci_ConvertWos2RosForBuild(Relation mainRel, + Size workareaSize, + IndexInfo *indexInfo) ... + result = (uint64) table_index_build_scan(comContext.heapRel, + mainRel, + indexInfo, + true, /* allow syncscan */ + true, + vci_build_callback, + (void *) &comContext, NULL)Perhaps we can use a double return type to avoid type casting since
other places also use double.
Fixed in v18.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, Aug 14, 2025 at 2:28 AM Timur Magomedov
<t.magomedov@postgrespro.ru> wrote:
On Mon, 2025-08-11 at 17:39 +0800, Japin Li wrote:
1. +static struct +{ + int transfn_oid; /* Transition function's funcoid. Arrays are + * sorted in ascending order */ + Oid transtype; /* Transition data type */ + PGFunction merge_trans; /* Function pointer set required for parallel + * aggregation for each transfn_oid */ + vci_aggtranstype_kind kind; /* If transtype is INTERNALOID, its details */ +} trans_funcs_table[] = { + {F_FLOAT4_ACCUM, 1022, merge_floatX_accum, VCI_AGG_NOT_INTERNAL}, /* 208 */ + {F_FLOAT8_ACCUM, 1022, merge_floatX_accum, VCI_AGG_NOT_INTERNAL}, /* 222 */ + {F_INT8INC, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 1833 */ + {F_NUMERIC_ACCUM, 2281, numeric_combine, VCI_AGG_NUMERIC_AGG_STATE}, /* 1834 */ + {F_INT2_ACCUM, 2281, numeric_poly_combine, VCI_AGG_POLY_NUM_AGG_STATE}, /* 1836 */ + {F_INT4_ACCUM, 2281, numeric_poly_combine, VCI_AGG_POLY_NUM_AGG_STATE}, /* 1835 */ + {F_INT8_ACCUM, 2281, numeric_combine, VCI_AGG_NUMERIC_AGG_STATE}, /* 1836 */ + {F_INT2_SUM, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 1840 */ + {F_INT4_SUM, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 1841 */ + {F_INTERVAL_AVG_COMBINE, 2281, merge_interval_avg_accum, VCI_AGG_NOT_INTERNAL}, /* 3325 */ + {F_INT2_AVG_ACCUM, 1016, merge_intX_accum, VCI_AGG_NOT_INTERNAL}, /* 1962 */ + {F_INT4_AVG_ACCUM, 1016, merge_intX_accum, VCI_AGG_NOT_INTERNAL}, /* 1963 */ + {F_INT8INC_ANY, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 2804 */ + {F_INT8_AVG_ACCUM, 2281, int8_avg_combine, VCI_AGG_POLY_AVG_NUM_AGG_STATE}, /* 2746 */ + {F_NUMERIC_AVG_ACCUM, 2281, numeric_avg_combine, VCI_AGG_AVG_NUMERIC_AGG_STATE}, /* 2858 */ +};The comments state that this is sorted in ascending order, but the
code doesn't
follow that rule. While the current linear search works, a future
change to
binary search could cause problems.Hi Japin!
I've looked at the code, vci_set_merge_and_copy_trans_funcs() function
is unused and almost all code of vci_aggmergetranstype.c file including
trans_funcs_table[] can be either removed either replaced with simple
switch-case. Only transfn_oid fields of trans_funcs_table[] were
actually used. Here is my patch made on top of v17.Peter, what do you think? Is it OK to remove those code?
Hi Timur. Thanks for the suggestion/patch.
I had already prepared the v18 patches using Japin's reordering of
that array; I'll study your switch/removal patch for possible
inclusion in v19 (next week?).
======
Kind Regards,
Peter Smith.
Fujitsu Australia.
On Fri, Aug 1, 2025 at 5:43 PM Japin Li <japinli@hotmail.com> wrote:
On Fri, Aug 01, 2025 at 05:18:11PM +1000, Peter Smith wrote:
On Wed, Jul 30, 2025 at 9:07 PM Japin Li <japinli@hotmail.com> wrote:
...
3.
I've also found that the VCI index is not working. Is this the expected
behavior?[local]:3209161 postgres=# \d+ t
Table "public.t"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
info | text | | | | extended | | |
Indexes:
"t_id_idx" vci (id)
Access method: heap[local]:3209161 postgres=# SET enable_seqscan TO off;
SET
[local]:3209161 postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
QUERY PLAN
-----------------------------------------------------
Seq Scan on t (cost=0.00..2084.00 rows=1 width=37)
Disabled: true
Filter: (id = 1000)
(3 rows)Hi Japin. Yes, that's expected behaviour.
VCI is used only when the vci index is defined for all the columns of
your query. In your example there was a table with 2 columns ('id' and
'info') but you only have an index on the 'id' column. If you change
the query then you can see VCI getting used.E.g.
postgres=# EXPLAIN SELECT id FROM t WHERE id = 1000;
QUERY PLAN
----------------------------------------------------------------------------
Custom Scan (VCI Scan) using tidx on t (cost=0.00..209.00 rows=1 width=4)
Filter: (id = 1000)
(2 rows)postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
QUERY PLAN
----------------------------------------------------
Seq Scan on t (cost=0.00..209.00 rows=1 width=37)
Filter: (id = 1000)
(2 rows)postgres=# EXPLAIN SELECT id,info FROM t WHERE id = 1000;
QUERY PLAN
----------------------------------------------------
Seq Scan on t (cost=0.00..209.00 rows=1 width=37)
Filter: (id = 1000)
(2 rows)~~~
You can see this also in the DEBUG logs, from
vci_can_rewrite_custom_scan(), where it checks to see if the attrs are
in the vci index or not.e.g.
2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: target table
"t"(oid=16384) tuples(rows=10000,extents=0)
2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: don't match
index "tidx"(oid=16469)
2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 1 x
2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 2Thanks for your explantion! Got it.
Are there any plans to remove this restriction in the future?
No. There aren't any plans to remove this restriction because it is
not considered to be a "restriction" in the first place; e.g. VCI is
intended more like an accelerator only for those *specified* columns
for which you intend to do your analysis.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Aug 13, 2025 at 07:28:34PM +0300, Timur Magomedov wrote:
On Mon, 2025-08-11 at 17:39 +0800, Japin Li wrote:
1. +static struct +{ +   int        transfn_oid;   /* Transition function's funcoid. Arrays are +                                * sorted in ascending order */ +   Oid        transtype;     /* Transition data type */ +   PGFunction merge_trans;   /* Function pointer set required for parallel +                                * aggregation for each transfn_oid */ +   vci_aggtranstype_kind kind; /* If transtype is INTERNALOID, its details */ +}          trans_funcs_table[] = { +   {F_FLOAT4_ACCUM, 1022, merge_floatX_accum, VCI_AGG_NOT_INTERNAL},  /* 208 */ +   {F_FLOAT8_ACCUM, 1022, merge_floatX_accum, VCI_AGG_NOT_INTERNAL},  /* 222 */ +   {F_INT8INC, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 1833 */ +   {F_NUMERIC_ACCUM, 2281, numeric_combine, VCI_AGG_NUMERIC_AGG_STATE},   /* 1834 */ +   {F_INT2_ACCUM, 2281, numeric_poly_combine, VCI_AGG_POLY_NUM_AGG_STATE}, /* 1836 */ +   {F_INT4_ACCUM, 2281, numeric_poly_combine, VCI_AGG_POLY_NUM_AGG_STATE}, /* 1835 */ +   {F_INT8_ACCUM, 2281, numeric_combine, VCI_AGG_NUMERIC_AGG_STATE},  /* 1836 */ +   {F_INT2_SUM, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 1840 */ +   {F_INT4_SUM, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 1841 */ +   {F_INTERVAL_AVG_COMBINE, 2281, merge_interval_avg_accum, VCI_AGG_NOT_INTERNAL}, /* 3325 */ +   {F_INT2_AVG_ACCUM, 1016, merge_intX_accum, VCI_AGG_NOT_INTERNAL},  /* 1962 */ +   {F_INT4_AVG_ACCUM, 1016, merge_intX_accum, VCI_AGG_NOT_INTERNAL},  /* 1963 */ +   {F_INT8INC_ANY, 20, int8pl, VCI_AGG_NOT_INTERNAL}, /* 2804 */ +   {F_INT8_AVG_ACCUM, 2281, int8_avg_combine, VCI_AGG_POLY_AVG_NUM_AGG_STATE}, /* 2746 */ +   {F_NUMERIC_AVG_ACCUM, 2281, numeric_avg_combine, VCI_AGG_AVG_NUMERIC_AGG_STATE},   /* 2858 */ +};The comments state that this is sorted in ascending order, but the
code doesn't
follow that rule. While the current linear search works, a future
change to
binary search could cause problems.Hi Japin!
I've looked at the code, vci_set_merge_and_copy_trans_funcs() function
is unused and almost all code of vci_aggmergetranstype.c file including
trans_funcs_table[] can be either removed either replaced with simple
switch-case. Only transfn_oid fields of trans_funcs_table[] were
actually used. Here is my patch made on top of v17.
Yeah. The code isn't used in these patches, so it's a good idea to remove it.
--
Best regards,
Japin Li
ChengDu WenWu Information Technology Co., LTD.
On Thu, Aug 14, 2025 at 12:04:12PM +1000, Peter Smith wrote:
On Fri, Aug 1, 2025 at 5:43â¯PM Japin Li <japinli@hotmail.com> wrote:
On Fri, Aug 01, 2025 at 05:18:11PM +1000, Peter Smith wrote:
On Wed, Jul 30, 2025 at 9:07â¯PM Japin Li <japinli@hotmail.com> wrote:
...
3.
I've also found that the VCI index is not working. Is this the expected
behavior?[local]:3209161 postgres=# \d+ t
Table "public.t"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
info | text | | | | extended | | |
Indexes:
"t_id_idx" vci (id)
Access method: heap[local]:3209161 postgres=# SET enable_seqscan TO off;
SET
[local]:3209161 postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
QUERY PLAN
-----------------------------------------------------
Seq Scan on t (cost=0.00..2084.00 rows=1 width=37)
Disabled: true
Filter: (id = 1000)
(3 rows)Hi Japin. Yes, that's expected behaviour.
VCI is used only when the vci index is defined for all the columns of
your query. In your example there was a table with 2 columns ('id' and
'info') but you only have an index on the 'id' column. If you change
the query then you can see VCI getting used.E.g.
postgres=# EXPLAIN SELECT id FROM t WHERE id = 1000;
QUERY PLAN
----------------------------------------------------------------------------
Custom Scan (VCI Scan) using tidx on t (cost=0.00..209.00 rows=1 width=4)
Filter: (id = 1000)
(2 rows)postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
QUERY PLAN
----------------------------------------------------
Seq Scan on t (cost=0.00..209.00 rows=1 width=37)
Filter: (id = 1000)
(2 rows)postgres=# EXPLAIN SELECT id,info FROM t WHERE id = 1000;
QUERY PLAN
----------------------------------------------------
Seq Scan on t (cost=0.00..209.00 rows=1 width=37)
Filter: (id = 1000)
(2 rows)~~~
You can see this also in the DEBUG logs, from
vci_can_rewrite_custom_scan(), where it checks to see if the attrs are
in the vci index or not.e.g.
2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: target table
"t"(oid=16384) tuples(rows=10000,extents=0)
2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: don't match
index "tidx"(oid=16469)
2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 1 x
2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 2Thanks for your explantion! Got it.
Are there any plans to remove this restriction in the future?
No. There aren't any plans to remove this restriction because it is
not considered to be a "restriction" in the first place; e.g. VCI is
intended more like an accelerator only for those *specified* columns
for which you intend to do your analysis.
Thanks for your explantion!
--
Best regards,
Japin Li
ChengDu WenWu Information Technology Co., LTD.
On Thu, 2025-08-14 at 11:23 +1000, Peter Smith wrote:
Here are the latest v18* patches.
Hi Peter!
I've reworked my recent patch [1]/messages/by-id/8beac6e8a01971b22ccf0f2e2a8eb12a78e5a7ac.camel@postgrespro.ru so it is now based on v18 and is
divided into several simpler patches. Here they are plus one additional
patch.
0001-Fixed-comment-and-guard-name-in-vci_pg_copy.h.patch
Looks like vci_pg_copy.h was renamed from vci_numeric.h but file name
comment and define guard name were not updated. Fixed it.
0002-Removed-vci_set_merge_and_copy_trans_funcs.patch
Found that vci_set_merge_and_copy_trans_funcs() is not used anywhere,
removed it alogn with the code that was only called inside it.
trans_funcs_table[] now only contains single transfn_oid field, others
(unused) are removed.
0003-Replaced-linear-search-by-switch-case.patch
Replaced linear search inside trans_funcs_table array to more optimal
switch-case.
0004-Removed-worker-name-check-in-lock.c.patch
This is one I'm not sure about.
Found that changes in Postgres core lock.c file check for "backend="
substring in background worker name. There is also a comment in
vci_ros_daemon.c mentioning bgw_name checks of LockAquire(). Names
don't match however. So as far as I understand the check for "backend="
in name is always false since no code in VCI sets bgw_name to something
similar.
This is either forgotten feature that can be easily fixed by removing
bgw_name checks, either some bug, either my misunderstanding.
For the first case, here is a patch that removes bgw_name checks in
lock.c. It makes core patch a bit smaller and not touching lock.c at
all (Yay!).
[1]: /messages/by-id/8beac6e8a01971b22ccf0f2e2a8eb12a78e5a7ac.camel@postgrespro.ru
/messages/by-id/8beac6e8a01971b22ccf0f2e2a8eb12a78e5a7ac.camel@postgrespro.ru
--
Regards,
Timur Magomedov
Attachments:
Hi Timur.
Thanks for the patches you provided. My replies are inline below.
On Sat, Aug 16, 2025 at 3:45 AM Timur Magomedov
<t.magomedov@postgrespro.ru> wrote:
On Thu, 2025-08-14 at 11:23 +1000, Peter Smith wrote:
Here are the latest v18* patches.
Hi Peter!
I've reworked my recent patch [1] so it is now based on v18 and is
divided into several simpler patches. Here they are plus one additional
patch.0001-Fixed-comment-and-guard-name-in-vci_pg_copy.h.patch
Looks like vci_pg_copy.h was renamed from vci_numeric.h but file name
comment and define guard name were not updated. Fixed it.
In v19 I intend to merge vci_pg_copy.h into postgresql_copy.h, so this is moot.
0002-Removed-vci_set_merge_and_copy_trans_funcs.patch
Found that vci_set_merge_and_copy_trans_funcs() is not used anywhere,
removed it alogn with the code that was only called inside it.
trans_funcs_table[] now only contains single transfn_oid field, others
(unused) are removed.0003-Replaced-linear-search-by-switch-case.patch
Replaced linear search inside trans_funcs_table array to more optimal
switch-case.
Thanks, these 0002 and 0003 changes will be in v19 patches which I'm
hoping to post tomorrow or the day after.
0004-Removed-worker-name-check-in-lock.c.patch
This is one I'm not sure about.
Found that changes in Postgres core lock.c file check for "backend="
substring in background worker name. There is also a comment in
vci_ros_daemon.c mentioning bgw_name checks of LockAquire(). Names
don't match however. So as far as I understand the check for "backend="
in name is always false since no code in VCI sets bgw_name to something
similar.
This is either forgotten feature that can be easily fixed by removing
bgw_name checks, either some bug, either my misunderstanding.
For the first case, here is a patch that removes bgw_name checks in
lock.c. It makes core patch a bit smaller and not touching lock.c at
all (Yay!).
There is code in the function vci_LaunchROSControlWorker() which does
a sprintf to assign the worker.bgw_name, but I also do not see how
LockAcquire can have a bgw_name containing string “backend=”.
Frankly, I expected the patch 0001 code should say more like strstr(…,
“vci:”) because otherwise it does not seem VCI specific. Indeed, if it
was checking “vci:” then the comment in storage/vci_ros_daemon.c would
make sense to me.
So, I agree that the Acquire/ReleaseLock code seems like it might be
unreachable, OTOH, I am reluctant to remove it without understanding
more about what was the intended purpose in the first place. Checking…
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Here are the latest v19* patches.
Changes include:
PATCH 0002.
Cleaning
- Removed unused code (per Timur's patch 0002 [1]Timur's patches. /messages/by-id/07c53a696afb8089d724214dbaeded6fcaa8fc0d.camel@postgrespro.ru)
- Removed vci_pg_copy.h; Combined this into postrgesql_copy.h
Code changes
- Changed linear search to switch (per Timur's patch 0003 [1]Timur's patches. /messages/by-id/07c53a696afb8089d724214dbaeded6fcaa8fc0d.camel@postgrespro.ru)
- Removed UpperUniquePath (fix build issue due to master commit 24225ad)
======
[1]: Timur's patches. /messages/by-id/07c53a696afb8089d724214dbaeded6fcaa8fc0d.camel@postgrespro.ru
/messages/by-id/07c53a696afb8089d724214dbaeded6fcaa8fc0d.camel@postgrespro.ru
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
Here are the latest v20* patches.
Changes include:
PATCH 0002.
- Addressed all compiler warnings. Hopefully, cfbot CI will now report
green for these.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
Here are the latest v21* patches.
Changes include:
PATCH 0002.
- Removed all the unused "port" subfolder files plus associated include files.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
Here are the latest v22* patches.
Changes include:
PATCH 0002.
- Code cleanups -- remove some redundant code related to "devload_t",
because the OSS patch only implements the "unmonitored" device.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
Here are the latest v23* patches.
Changes include:
PATCH 0002.
- Some code/config changes so that cfbot can work now for MacOS and FreeBSD.
- Modify test code to make the ANALYZE expected results more stable
(don't display buffers).
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
Hello Peter!
Thanks for updates. Here is a small fix for clang "variable set but not
used" warnings.
--
Regards,
Timur Magomedov
Attachments:
On Sat, Sep 13, 2025 at 1:58 AM Timur Magomedov
<t.magomedov@postgrespro.ru> wrote:
Hello Peter!
Thanks for updates. Here is a small fix for clang "variable set but not
used" warnings.
Thanks for your fixes. I will include these whenever I post the next version.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Hello Peter!
I've found (using valgrind) some cases of reading random garbage after
allocated memory. Investigation showed this was caused by converting
some nodes to VciScanState* even if they have smaller size allocated
originally. So accessing VciScanState fields was accessing memory after
palloc'ed memory which could be used by any other allocation.
I think converting to VciScanState* is only valid for nodes with tag
T_CustomScanState so here is a patch that adds assertions for this:
0001-Assert-corrrect-node-tags-when-casting-to-VciScanSta.patch
VCI v23 passes the tests with this patch applied.
There are queries that fail unfortunately. I've added one of them to
bugs.sql:
0002-Reproducer-of-invalid-cast-to-VciScanState.patch
Node with tag 420 (T_NestLoopState) is cast to VciScanState* that fails
newly added assertion.
I'm not sure where to look next to fix this. Looking forward for you
comments and ideas.
--
Regards,
Timur Magomedov
Attachments:
Here are the latest v24* patches.
Changes include:
PATCH 0002.
- Some compiler warning fixes, from Timur [1]/messages/by-id/2af90dfaf6004e17782bd6c8cb8444670ab4d82c.camel@postgrespro.ru
- Pre-emptive removal of PointerIsValid() macro to prevent a rebase
- Added sanity Asserts for T_CustomScanState, from Timur [2]/messages/by-id/149d6694c0c5a789b0ee865e80109022002bade5.camel@postgrespro.ru
======
[1]: /messages/by-id/2af90dfaf6004e17782bd6c8cb8444670ab4d82c.camel@postgrespro.ru
[2]: /messages/by-id/149d6694c0c5a789b0ee865e80109022002bade5.camel@postgrespro.ru
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
Hi Timur,
Thanks for your ongoing work for this patch.
On Thu, Sep 18, 2025 at 1:15 AM Timur Magomedov
<t.magomedov@postgrespro.ru> wrote:
...
I've found (using valgrind) some cases of reading random garbage after
allocated memory. Investigation showed this was caused by converting
some nodes to VciScanState* even if they have smaller size allocated
originally. So accessing VciScanState fields was accessing memory after
palloc'ed memory which could be used by any other allocation.I think converting to VciScanState* is only valid for nodes with tag
T_CustomScanState so here is a patch that adds assertions for this:
0001-Assert-corrrect-node-tags-when-casting-to-VciScanSta.patch
What exactly did Valgrind report? For example, you said the
VciScanState points beyond allocated memory. Do you have any more
clues, like where that happened? Did you discover where that (smaller
than it should be) memory was allocated in the first place?
VCI v23 passes the tests with this patch applied.
OK. I am not 100% certain about the asserts, but since the existing
VCI tests are passing, I have merged your patch as-is into v24-0002. I
guess we will find out later if the bug below is due to an old code
cast problem or a new code assert problem.
There are queries that fail unfortunately. I've added one of them to
bugs.sql:
0002-Reproducer-of-invalid-cast-to-VciScanState.patch
Node with tag 420 (T_NestLoopState) is cast to VciScanState* that fails
newly added assertion.I'm not sure where to look next to fix this. Looking forward for you
comments and ideas.
OK. I ran with your 2nd patch applied and reproduced the core-dump
below, where it tripped over one of your new Asserts at
executor/vci_sort.c:89. I can see it is an unexpected value
T_NestLoopState.
(gdb) bt 15
#0 0x00007ff948aa62c7 in raise () from /lib64/libc.so.6
#1 0x00007ff948aa79b8 in abort () from /lib64/libc.so.6
#2 0x0000000000c07977 in ExceptionalCondition
(conditionName=0x7ff940839fa8 "scanstate->vci.css.ss.ps.type ==
T_CustomScanState", fileName=0x7ff940839f90 "executor/vci_sort.c",
lineNumber=89) at assert.c:66
#3 0x00007ff9408084e6 in vci_sort_ExecCustomPlan (node=0x2a862f0) at
executor/vci_sort.c:89
#4 0x000000000079d5bd in ExecCustomScan (pstate=0x2a862f0) at nodeCustom.c:137
#5 0x000000000077f693 in ExecProcNodeInstr (node=0x2a862f0) at
execProcnode.c:486
#6 0x000000000077f664 in ExecProcNodeFirst (node=0x2a862f0) at
execProcnode.c:470
#7 0x0000000000772b72 in ExecProcNode (node=0x2a862f0) at
../../../src/include/executor/executor.h:316
#8 0x0000000000775774 in ExecutePlan (queryDesc=0x2a89100,
operation=CMD_SELECT, sendTuples=true, numberTuples=0,
direction=ForwardScanDirection, dest=0xe5b1a0 <donothingDR>)
at execMain.c:1697
#9 0x000000000077317b in standard_ExecutorRun (queryDesc=0x2a89100,
direction=ForwardScanDirection, count=0) at execMain.c:366
#10 0x00007ff9407f9efd in vci_executor_run_routine
(queryDesc=0x2a89100, direction=ForwardScanDirection, count=0) at
executor/vci_executor.c:178
#11 0x0000000000772ff5 in ExecutorRun (queryDesc=0x2a89100,
direction=ForwardScanDirection, count=0) at execMain.c:301
#12 0x00000000006b7f66 in ExplainOnePlan (plannedstmt=0x2a8a628,
into=0x0, es=0x2a81388,
queryString=0x28b0fd0 "EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS
FALSE, TIMING FALSE, SUMMARY FALSE)\nSELECT *\n FROM main m\n JOIN
secondary s\n\tON m.id = s.main_id\n WHERE s.val in (\n\t\tSELECT
MAX(val)\n\t\t FROM secondary s2\n\t\t W"..., params=0x0,
queryEnv=0x0, planduration=0x7ffe51311320, bufusage=0x0,
mem_counters=0x0) at explain.c:579
#13 0x00000000006b799a in standard_ExplainOneQuery (query=0x2ac21f8,
cursorOptions=2048, into=0x0, es=0x2a81388,
queryString=0x28b0fd0 "EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS
FALSE, TIMING FALSE, SUMMARY FALSE)\nSELECT *\n FROM main m\n JOIN
secondary s\n\tON m.id = s.main_id\n WHERE s.val in (\n\t\tSELECT
MAX(val)\n\t\t FROM secondary s2\n\t\t W"..., params=0x0,
queryEnv=0x0) at explain.c:372
#14 0x00007ff9407f9ff3 in vci_explain_one_query_routine
(queryDesc=0x2ac21f8, cursorOptions=2048, into=0x0, es=0x2a81388,
queryString=0x28b0fd0 "EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS
FALSE, TIMING FALSE, SUMMARY FALSE)\nSELECT *\n FROM main m\n JOIN
secondary s\n\tON m.id = s.main_id\n WHERE s.val in (\n\t\tSELECT
MAX(val)\n\t\t FROM secondary s2\n\t\t W"..., params=0x0,
queryEnv=0x0) at executor/vci_executor.c:224
(More stack frames follow...)
(gdb)
I will keep investigating it...
I have not included your test case in the v24* patches because I
didn't want this known test failure to mask out any other unknown test
problems that might occur.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, 18 Sep 2025 at 15:07, Peter Smith <smithpb2250@gmail.com> wrote:
Hi Timur,
Thanks for your ongoing work for this patch.
On Thu, Sep 18, 2025 at 1:15 AM Timur Magomedov
<t.magomedov@postgrespro.ru> wrote:
...I've found (using valgrind) some cases of reading random garbage after
allocated memory. Investigation showed this was caused by converting
some nodes to VciScanState* even if they have smaller size allocated
originally. So accessing VciScanState fields was accessing memory after
palloc'ed memory which could be used by any other allocation.I think converting to VciScanState* is only valid for nodes with tag
T_CustomScanState so here is a patch that adds assertions for this:
0001-Assert-corrrect-node-tags-when-casting-to-VciScanSta.patchWhat exactly did Valgrind report? For example, you said the
VciScanState points beyond allocated memory. Do you have any more
clues, like where that happened? Did you discover where that (smaller
than it should be) memory was allocated in the first place?VCI v23 passes the tests with this patch applied.
OK. I am not 100% certain about the asserts, but since the existing
VCI tests are passing, I have merged your patch as-is into v24-0002. I
guess we will find out later if the bug below is due to an old code
cast problem or a new code assert problem.There are queries that fail unfortunately. I've added one of them to
bugs.sql:
0002-Reproducer-of-invalid-cast-to-VciScanState.patch
Node with tag 420 (T_NestLoopState) is cast to VciScanState* that fails
newly added assertion.I'm not sure where to look next to fix this. Looking forward for you
comments and ideas.OK. I ran with your 2nd patch applied and reproduced the core-dump
below, where it tripped over one of your new Asserts at
executor/vci_sort.c:89. I can see it is an unexpected value
T_NestLoopState.
I'm curious about the assert here. Why is it that children must be CustomScan?
It seems that there is a JOIN node, and we do not support customizing the JOIN
operation, right?
(gdb) bt 15
#0 0x00007ff948aa62c7 in raise () from /lib64/libc.so.6
#1 0x00007ff948aa79b8 in abort () from /lib64/libc.so.6
#2 0x0000000000c07977 in ExceptionalCondition
(conditionName=0x7ff940839fa8 "scanstate->vci.css.ss.ps.type ==
T_CustomScanState", fileName=0x7ff940839f90 "executor/vci_sort.c",
lineNumber=89) at assert.c:66
#3 0x00007ff9408084e6 in vci_sort_ExecCustomPlan (node=0x2a862f0) at
executor/vci_sort.c:89
#4 0x000000000079d5bd in ExecCustomScan (pstate=0x2a862f0) at nodeCustom.c:137
#5 0x000000000077f693 in ExecProcNodeInstr (node=0x2a862f0) at
execProcnode.c:486
#6 0x000000000077f664 in ExecProcNodeFirst (node=0x2a862f0) at
execProcnode.c:470
#7 0x0000000000772b72 in ExecProcNode (node=0x2a862f0) at
../../../src/include/executor/executor.h:316
#8 0x0000000000775774 in ExecutePlan (queryDesc=0x2a89100,
operation=CMD_SELECT, sendTuples=true, numberTuples=0,
direction=ForwardScanDirection, dest=0xe5b1a0 <donothingDR>)
at execMain.c:1697
#9 0x000000000077317b in standard_ExecutorRun (queryDesc=0x2a89100,
direction=ForwardScanDirection, count=0) at execMain.c:366
#10 0x00007ff9407f9efd in vci_executor_run_routine
(queryDesc=0x2a89100, direction=ForwardScanDirection, count=0) at
executor/vci_executor.c:178
#11 0x0000000000772ff5 in ExecutorRun (queryDesc=0x2a89100,
direction=ForwardScanDirection, count=0) at execMain.c:301
#12 0x00000000006b7f66 in ExplainOnePlan (plannedstmt=0x2a8a628,
into=0x0, es=0x2a81388,
queryString=0x28b0fd0 "EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS
FALSE, TIMING FALSE, SUMMARY FALSE)\nSELECT *\n FROM main m\n JOIN
secondary s\n\tON m.id = s.main_id\n WHERE s.val in (\n\t\tSELECT
MAX(val)\n\t\t FROM secondary s2\n\t\t W"..., params=0x0,
queryEnv=0x0, planduration=0x7ffe51311320, bufusage=0x0,
mem_counters=0x0) at explain.c:579
#13 0x00000000006b799a in standard_ExplainOneQuery (query=0x2ac21f8,
cursorOptions=2048, into=0x0, es=0x2a81388,
queryString=0x28b0fd0 "EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS
FALSE, TIMING FALSE, SUMMARY FALSE)\nSELECT *\n FROM main m\n JOIN
secondary s\n\tON m.id = s.main_id\n WHERE s.val in (\n\t\tSELECT
MAX(val)\n\t\t FROM secondary s2\n\t\t W"..., params=0x0,
queryEnv=0x0) at explain.c:372
#14 0x00007ff9407f9ff3 in vci_explain_one_query_routine
(queryDesc=0x2ac21f8, cursorOptions=2048, into=0x0, es=0x2a81388,
queryString=0x28b0fd0 "EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS
FALSE, TIMING FALSE, SUMMARY FALSE)\nSELECT *\n FROM main m\n JOIN
secondary s\n\tON m.id = s.main_id\n WHERE s.val in (\n\t\tSELECT
MAX(val)\n\t\t FROM secondary s2\n\t\t W"..., params=0x0,
queryEnv=0x0) at executor/vci_executor.c:224
(More stack frames follow...)
(gdb)I will keep investigating it...
I have not included your test case in the v24* patches because I
didn't want this known test failure to mask out any other unknown test
problems that might occur.
--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
Hi Peter!
What exactly did Valgrind report? For example, you said the
VciScanState points beyond allocated memory. Do you have any more
clues, like where that happened? Did you discover where that (smaller
than it should be) memory was allocated in the first place?
Doing some experiments I've faced a segfault on a query joining tables
filled with some amount of data. It was flaky so I used Valgrind.
There is a line in vci_scan.c, exec_custom_plan_enabling_vp():
if (!scanstate->first_fetch || (scanstate->pos.num_fetched_rows <=
scanstate->pos.current_row))
Valgrind reported that line as Invalid read of size 1, 4 and 4. So all
three of the values checked in this line are read from some random
memory, possibly allocated and used by other objects already.
When the expression in exec_custom_plan_enabling_vp() randomly
evaluated to true, the following ExecClearTuple() dereferences NULL in
slot->tts_ops.
The memory was originally allocated in nodeHashJoin.c, in hjstate =
makeNode(HashJoinState) line so it is really smaller than VciScanState.
I did not use any table data for reproducer since asserts helps to
catch the original problem. I also simplified the original query for a
reproducer.
OK. I am not 100% certain about the asserts, but since the existing
VCI tests are passing, I have merged your patch as-is into v24-0002.
I
guess we will find out later if the bug below is due to an old code
cast problem or a new code assert problem.
Thanks for merging asserts. And looks like the problem is related to
VCI join nodes.
There is no VCI hash join or VCI nested loop. There is a code in VCI
planner that still puts VCI Sort or VCI Aggregate nodes on top of
regular join nodes which makes no sense to me. This is the cause of the
problem. VCI Sort and VCI Aggregate then convert outer nodes to VCI
Scan since they know there can't be anything another. This can be fixed
either by implementing VCI joins either by disabling them in a deeper
way. Since we already have developer GUCs for them I would rather set
them to disabled by default instead of removing all useful VCI joins
related code.
Made a patch with a test and a simplest fix (disabling joins in GUCs).
--
Regards,
Timur Magomedov
Attachments:
Hi Japin!
I'm curious about the assert here. Why is it that children must be
CustomScan?
It seems that there is a JOIN node, and we do not support customizing
the JOIN
operation, right?
You are right, looks like we don't have VCI joins and this was the
cause of the problem. Thanks!
--
Regards,
Timur Magomedov
Here are the latest v25* patches.
Changes include:
PATCH 0001.
- A rebase was needed due to a recent commit d4d1fc5.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Sat, Sep 20, 2025 at 12:13 AM Timur Magomedov
<t.magomedov@postgrespro.ru> wrote:
Hi Peter!
What exactly did Valgrind report? For example, you said the
VciScanState points beyond allocated memory. Do you have any more
clues, like where that happened? Did you discover where that (smaller
than it should be) memory was allocated in the first place?Doing some experiments I've faced a segfault on a query joining tables
filled with some amount of data. It was flaky so I used Valgrind.
There is a line in vci_scan.c, exec_custom_plan_enabling_vp():
if (!scanstate->first_fetch || (scanstate->pos.num_fetched_rows <=
scanstate->pos.current_row))
Valgrind reported that line as Invalid read of size 1, 4 and 4. So all
three of the values checked in this line are read from some random
memory, possibly allocated and used by other objects already.When the expression in exec_custom_plan_enabling_vp() randomly
evaluated to true, the following ExecClearTuple() dereferences NULL in
slot->tts_ops.
The memory was originally allocated in nodeHashJoin.c, in hjstate =
makeNode(HashJoinState) line so it is really smaller than VciScanState.I did not use any table data for reproducer since asserts helps to
catch the original problem. I also simplified the original query for a
reproducer.OK. I am not 100% certain about the asserts, but since the existing
VCI tests are passing, I have merged your patch as-is into v24-0002.
I
guess we will find out later if the bug below is due to an old code
cast problem or a new code assert problem.Thanks for merging asserts. And looks like the problem is related to
VCI join nodes.
There is no VCI hash join or VCI nested loop. There is a code in VCI
planner that still puts VCI Sort or VCI Aggregate nodes on top of
regular join nodes which makes no sense to me. This is the cause of the
problem. VCI Sort and VCI Aggregate then convert outer nodes to VCI
Scan since they know there can't be anything another. This can be fixed
either by implementing VCI joins either by disabling them in a deeper
way. Since we already have developer GUCs for them I would rather set
them to disabled by default instead of removing all useful VCI joins
related code.Made a patch with a test and a simplest fix (disabling joins in GUCs).
Hi Timur,
Thanks for the patch! Unfortunately, this is straying into areas with
which I'm not familiar, so I'm taking it on faith that these are good
changes. For now, I'm happy to merge your patch into the next VCI
version, posted unless someone else objects.
~
But, I still have a couple of questions for clarification:
1. What about the original Valgrind issue?
Is that still a problem that needs to be addressed? E.g., is the bad
allocation still lurking, and your sort avoidance patch is simply
preventing the bad allocation from being exposed until some next thing
randomly fails? Or is there no allocation problem anymore to worry
about?
2. What about your added Assert that was previously failing at
executor/vci_sort.c:89?
That Assert is still present in vci_sort.c, but AFAICT the current
tests are not executing that code. Do those patched GUC changes simply
make that code unreachable now? In other words, should that previously
failing Assert be left where it is or not? Should there be another
test case added to execute this Assert?
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Hi Peter!
On Wed, 2025-09-24 at 12:46 +1000, Peter Smith wrote:
On Sat, Sep 20, 2025 at 12:13 AM Timur Magomedov
<t.magomedov@postgrespro.ru> wrote:Hi Peter!
What exactly did Valgrind report? For example, you said the
VciScanState points beyond allocated memory. Do you have any more
clues, like where that happened? Did you discover where that
(smaller
than it should be) memory was allocated in the first place?Doing some experiments I've faced a segfault on a query joining
tables
filled with some amount of data. It was flaky so I used Valgrind.
There is a line in vci_scan.c, exec_custom_plan_enabling_vp():
if (!scanstate->first_fetch || (scanstate->pos.num_fetched_rows <=
scanstate->pos.current_row))
Valgrind reported that line as Invalid read of size 1, 4 and 4. So
all
three of the values checked in this line are read from some random
memory, possibly allocated and used by other objects already.When the expression in exec_custom_plan_enabling_vp() randomly
evaluated to true, the following ExecClearTuple() dereferences NULL
in
slot->tts_ops.
The memory was originally allocated in nodeHashJoin.c, in hjstate =
makeNode(HashJoinState) line so it is really smaller than
VciScanState.I did not use any table data for reproducer since asserts helps to
catch the original problem. I also simplified the original query
for a
reproducer.OK. I am not 100% certain about the asserts, but since the
existing
VCI tests are passing, I have merged your patch as-is into v24-
0002.
I
guess we will find out later if the bug below is due to an old
code
cast problem or a new code assert problem.Thanks for merging asserts. And looks like the problem is related
to
VCI join nodes.
There is no VCI hash join or VCI nested loop. There is a code in
VCI
planner that still puts VCI Sort or VCI Aggregate nodes on top of
regular join nodes which makes no sense to me. This is the cause of
the
problem. VCI Sort and VCI Aggregate then convert outer nodes to VCI
Scan since they know there can't be anything another. This can be
fixed
either by implementing VCI joins either by disabling them in a
deeper
way. Since we already have developer GUCs for them I would rather
set
them to disabled by default instead of removing all useful VCI
joins
related code.Made a patch with a test and a simplest fix (disabling joins in
GUCs).Hi Timur,
Thanks for the patch! Unfortunately, this is straying into areas with
which I'm not familiar, so I'm taking it on faith that these are good
changes. For now, I'm happy to merge your patch into the next VCI
version, posted unless someone else objects.~
But, I still have a couple of questions for clarification:
1. What about the original Valgrind issue?
Is that still a problem that needs to be addressed? E.g., is the bad
allocation still lurking, and your sort avoidance patch is simply
preventing the bad allocation from being exposed until some next
thing
randomly fails? Or is there no allocation problem anymore to worry
about?
Allocations are fine, the problem was using some nodes as nodes of
another type (and bigger size) which leads to crossing boundary of
allocated memory. We are safe now and asserts guard us from repeating
the original bug.
2. What about your added Assert that was previously failing at
executor/vci_sort.c:89?That Assert is still present in vci_sort.c, but AFAICT the current
tests are not executing that code. Do those patched GUC changes
simply
make that code unreachable now? In other words, should that
previously
failing Assert be left where it is or not? Should there be another
test case added to execute this Assert?
Added simple test for running VCI sort node, it executes the assertion
code in vci_sort.c. No assertion fails, so VCI Sort itself is OK. Here
are both two commits on top of v25 version.
--
Regards,
Timur Magomedov
Attachments:
Hi Timur.
On Wed, Sep 24, 2025 at 11:47 PM Timur Magomedov
<t.magomedov@postgrespro.ru> wrote:
...
Thanks for the patch! Unfortunately, this is straying into areas with
which I'm not familiar, so I'm taking it on faith that these are good
changes. For now, I'm happy to merge your patch into the next VCI
version, posted unless someone else objects.~
But, I still have a couple of questions for clarification:
1. What about the original Valgrind issue?
Is that still a problem that needs to be addressed? E.g., is the bad
allocation still lurking, and your sort avoidance patch is simply
preventing the bad allocation from being exposed until some next
thing
randomly fails? Or is there no allocation problem anymore to worry
about?Allocations are fine, the problem was using some nodes as nodes of
another type (and bigger size) which leads to crossing boundary of
allocated memory. We are safe now and asserts guard us from repeating
the original bug.
Thanks for the clarification.
2. What about your added Assert that was previously failing at
executor/vci_sort.c:89?That Assert is still present in vci_sort.c, but AFAICT the current
tests are not executing that code. Do those patched GUC changes
simply
make that code unreachable now? In other words, should that
previously
failing Assert be left where it is or not? Should there be another
test case added to execute this Assert?Added simple test for running VCI sort node, it executes the assertion
code in vci_sort.c. No assertion fails, so VCI Sort itself is OK. Here
are both two commits on top of v25 version.
These have been included in v26. Thanks!
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Here are the latest v26* patches.
Changes include:
PATCH 0002.
- Rebase was needed due to a5b35fc
- Merged the sort/join patches provided by Timur [1]/messages/by-id/fd67d6c54cccaf0d98ec8a19182635067392b928.camel@postgrespro.ru
======
[1]: /messages/by-id/fd67d6c54cccaf0d98ec8a19182635067392b928.camel@postgrespro.ru
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
Hello Peter!
There is a code in vci_ros.c that initializes xl_heap_inplace xlrec.
Comment says this code was taken from src/backend/access/heap/heapam.c.
It was fine for Postgres 17 and earlier however struct xl_heap_inplace
has 6 fields, not one since commit 243e9b40f1b2. So nmsgs field of
xlrec has some random uninitialized value from stack. It goes to WAL
and in case of big nmsgs it can cause segfault during server startup.
Here are backtrace of a segfault while applying WAL on server startup
and a patch that initializes all necessary fields of xlrec to avoid bad
WAL records.
--
Regards,
Timur Magomedov
Attachments:
Here are the latest v27* patches.
Changes include:
PATCH 0002.
- Rebase vci_ros.c was needed due to 243e9b4 (patch provided by Timur [1]/messages/by-id/b0183172fbe8fbf4260d10df50a57127753eba68.camel@postgrespro.ru)
======
[1]: /messages/by-id/b0183172fbe8fbf4260d10df50a57127753eba68.camel@postgrespro.ru
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
Here are the latest v28* patches.
Changes include:
PATCH 0001.
- Remove some code that Iwata-San has already separated out to a
different thread [1]/messages/by-id/OS7PR01MB11964335F36BE41021B62EAE8EAE4A@OS7PR01MB11964.jpnprd01.prod.outlook.com.
PATCH 0002.
- Temp removal of call to the now separated out function.
======
[1]: /messages/by-id/OS7PR01MB11964335F36BE41021B62EAE8EAE4A@OS7PR01MB11964.jpnprd01.prod.outlook.com
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
Here are the latest v29* patches.
Changes include:
PATCH 0001.
- Rebase was needed
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
Here are the latest v30* patches.
Changes include:
PATCH 0002.
- Rebase was needed due to commit add323d
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
Here are the latest patches.
Some rebasing was needed.
Note -- I've changed the versioning -- now date-based instead of v31.
~~~
Changes include:
PATCH 0001.
- Rebase due to commit e1ac846f3d2836dcfa0ad15310e28d0a0b495500
(ItemPointerData)
PATCH 0002.
- Rebase due to commit e1ac846f3d2836dcfa0ad15310e28d0a0b495500
(ItemPointerData)
- Rebase due to commit a13833c (GUC structs)
~~~
KNOWN ISSUES
The test cases are currently failing with a TRAP Assert.
Unfortunately, the CI has failed to apply this patch for the last few
weeks, so any master changes during that time could have impacted VCI
without being noticed. Meanwhile, I am reposting these rebased patches
despite the known test issue, because then at least it may
catch/report any other emerging build problems early, while the cause
of the TRAP is still being investigated.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
Hello Peter!
I've succeeded in making a reproducer for a infrequent bug I've seen
several times with ROS control daemon enabled.
Looks like WAL records produced by ROS control daemon while processing
"vci_rc_update_del_vec" command are not compatible with what
heap_xlog_prune_freeze() function expects to read from WAL. Those
records are produced in cleanUpWos(), specific line looks like "recptr
= XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE_ON_ACCESS);".
The reproducing steps can look tricky, any ideas of improving them are
welcome. This would ideally be a TAP test. For now I just patch code so
that ROS daemon terminates rignt after "update delete vector" command,
kill all postgress processes and next time PostgreSQL is started it
catches assertion inside heap_xlog_prune_freeze() function.
This is the reproduction routine in four steps:
1. Patch VCI using vci_always_fail_update_delete_vector.patch and build
it.
2. Setup VCI in config file (ros_control_daemon enabled):
shared_preload_libraries = 'vci'
max_worker_processes = 20
vci.table_rows_threshold = 0
vci.cost_threshold = 0
vci.enable_ros_control_daemon = true
3. Run reproducer.sh script that runs pgbench on VCI-enabled table and
terminates all postgres processes immediately using "killall -s 9
postgres" after pgbench failed. "pg_ctl stop" can't terminate
PostgreSQL here. "update delete vector" command is usually executed in
less than ten minutes on my system but it needs to wait some time.
4. Here we are with some WAL records on storage that (at least on my
machine) PostgreSQL is unable to apply and fails the assertion:
$ pg_ctl start
...
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: redo starts at 0/017689D8
..TRAP: failed Assert("do_prune || nplans > 0 || vmflags &
VISIBILITYMAP_VALID_BITS"), File: "heapam_xlog.c", Line: 117, PID:
841207
Debugger shows data actually contains some offsets, in order, but the
format and flags combination are unexpected:
#6 0x000055fad895d1f7 in heap_xlog_prune_freeze
(record=0x55faf23f0ce0) at heapam_xlog.c:117
117 Assert(do_prune || nplans > 0 || vmflags &
VISIBILITYMAP_VALID_BITS);
(gdb) print dataptr
$1 = 0x7072b5e18248 "\001"
(gdb) print datalen
$2 = 370
(gdb) print frz_offsets
$3 = (OffsetNumber *) 0x7072b5e18248
(gdb) print *frz_offsets@185
$4 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71,
72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88,
89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
105,
106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133,
134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147,
148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161,
162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175,
176, 177, 178, 179, 180, 181, 182, 183, 184, 185}
(gdb) print frz_offsets==dataptr
$5 = 1
I also attached backtrace from GDB.
I don't understand yet how to fix this and the reproducing is clunky so
any ideas are welcome.
Does this reproduce on your system too? Is it some known problem?
--
Regards,
Timur Magomedov
Attachments:
Hi Timur,
Thank you so much for your continued interest in this patch and for
taking the time to report these bugs! It is really appreciated and
your info has been recorded so it won't be forgotten.
Recent work and focus:
* As you might have seen, there have been many changes to master
lately, so I've been rebasing VCI to keep everything building cleanly.
Good news - the existing tests are now working again (locally) after
failing on -hackers for the past couple of weeks. I'm planning to post
the updated patches before the end of this week.
* Once that's done, my next priority is to split the patch 0002 into
smaller, more manageable pieces for easier review.
In other words, it is going to take a little while before I can circle
back to help investigate intermittent bugs.
Regarding your questions:
On Thu, Nov 13, 2025 at 3:11 AM Timur Magomedov
<t.magomedov@postgrespro.ru> wrote:
...
I don't understand yet how to fix this and the reproducing is clunky so
any ideas are welcome.
Does this reproduce on your system too? Is it some known problem?
I'll follow your reproduction steps when I have a chance and get back to you.
I am not aware that this is a known problem.
~~
Thanks again. Sorry I can't be of more immediate help.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Here are the latest patches v20251114*
Changes include:
PATCH 0002.
- A fix was needed to make VCI code compatible with the recent master
change c106ef0.
VCI tests should now be passing again.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Wed, 12 Nov 2025 at 19:11, Timur Magomedov <t.magomedov@postgrespro.ru> wrote:
Hello Peter!
I've succeeded in making a reproducer for a infrequent bug I've seen
several times with ROS control daemon enabled.
Looks like WAL records produced by ROS control daemon while processing
"vci_rc_update_del_vec" command are not compatible with what
heap_xlog_prune_freeze() function expects to read from WAL. Those
records are produced in cleanUpWos(), specific line looks like "recptr
= XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE_ON_ACCESS);".
The reproducing steps can look tricky, any ideas of improving them are
welcome. This would ideally be a TAP test. For now I just patch code so
that ROS daemon terminates rignt after "update delete vector" command,
kill all postgress processes and next time PostgreSQL is started it
catches assertion inside heap_xlog_prune_freeze() function.This is the reproduction routine in four steps:
1. Patch VCI using vci_always_fail_update_delete_vector.patch and build
it.2. Setup VCI in config file (ros_control_daemon enabled):
shared_preload_libraries = 'vci'
max_worker_processes = 20
vci.table_rows_threshold = 0
vci.cost_threshold = 0
vci.enable_ros_control_daemon = true3. Run reproducer.sh script that runs pgbench on VCI-enabled table and
terminates all postgres processes immediately using "killall -s 9
postgres" after pgbench failed. "pg_ctl stop" can't terminate
PostgreSQL here. "update delete vector" command is usually executed in
less than ten minutes on my system but it needs to wait some time.4. Here we are with some WAL records on storage that (at least on my
machine) PostgreSQL is unable to apply and fails the assertion:$ pg_ctl start
...
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: redo starts at 0/017689D8
..TRAP: failed Assert("do_prune || nplans > 0 || vmflags &
VISIBILITYMAP_VALID_BITS"), File: "heapam_xlog.c", Line: 117, PID:
841207Debugger shows data actually contains some offsets, in order, but the
format and flags combination are unexpected:
#6 0x000055fad895d1f7 in heap_xlog_prune_freeze
(record=0x55faf23f0ce0) at heapam_xlog.c:117
117 Assert(do_prune || nplans > 0 || vmflags &
VISIBILITYMAP_VALID_BITS);
(gdb) print dataptr
$1 = 0x7072b5e18248 "\001"
(gdb) print datalen
$2 = 370
(gdb) print frz_offsets
$3 = (OffsetNumber *) 0x7072b5e18248
(gdb) print *frz_offsets@185
$4 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71,
72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88,
89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
105,
106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133,
134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147,
148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161,
162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175,
176, 177, 178, 179, 180, 181, 182, 183, 184, 185}
(gdb) print frz_offsets==dataptr
$5 = 1I also attached backtrace from GDB.
I don't understand yet how to fix this and the reproducing is clunky so
any ideas are welcome.
Does this reproduce on your system too? Is it some known problem?
Yes, I can reproduce this by using your patch.
--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
Here are the latest patches.
Changes include:
PATCH 0002.
- Address a new compiler warning reported by CI
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Mon, 2025-11-17 at 12:34 +1100, Peter Smith wrote:
Here are the latest patches.
Changes include:
PATCH 0002.
- Address a new compiler warning reported by CI
Hello Peter!
Thanks for updating the patch!
According to the issue I've found recently with incorrect WAL record
[1]: /messages/by-id/36cedffdfcac437afd692442cf9c1d16d7f28b01.camel@postgrespro.ru
call to log_heap_prune_and_freeze() in a way similar to what
freezeWos() and lazy_vacuum_heap_page() do. This fixes reproducer
scenario on my machine.
[1]: /messages/by-id/36cedffdfcac437afd692442cf9c1d16d7f28b01.camel@postgrespro.ru
/messages/by-id/36cedffdfcac437afd692442cf9c1d16d7f28b01.camel@postgrespro.ru
--
Regards,
Timur Magomedov
Attachments:
On Fri, 2025-11-14 at 14:22 +0800, Japin Li wrote:
Yes, I can reproduce this by using your patch.
Hello Japin!
Thanks for reproducing the bug!
I've made a fix on top of recent VCI version v20251117:
/messages/by-id/1de545d00a5c0bb8aaf42f05dd118a2c19030041.camel@postgrespro.ru
That worked for me.
--
Regards,
Timur Magomedov
On Tue, Nov 18, 2025 at 1:32 AM Timur Magomedov
<t.magomedov@postgrespro.ru> wrote:
Hello Peter!
Thanks for updating the patch!
According to the issue I've found recently with incorrect WAL record
[1], I've prepared a patch that replaces manual XLog...() calls by a
call to log_heap_prune_and_freeze() in a way similar to what
freezeWos() and lazy_vacuum_heap_page() do. This fixes reproducer
scenario on my machine.[1]
/messages/by-id/36cedffdfcac437afd692442cf9c1d16d7f28b01.camel@postgrespro.ru
Here are the latest patches.
Changes include:
PATCH 0002.
- Merged the fix provided by Timur [1]/messages/by-id/1de545d00a5c0bb8aaf42f05dd118a2c19030041.camel@postgrespro.ru.
======
[1]: /messages/by-id/1de545d00a5c0bb8aaf42f05dd118a2c19030041.camel@postgrespro.ru
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments:
On Mon, Nov 17, 2025 at 05:42:03PM +0300, Timur Magomedov wrote:
On Fri, 2025-11-14 at 14:22 +0800, Japin Li wrote:
Yes, I can reproduce this by using your patch.
Hello Japin!
Thanks for reproducing the bug!
I've made a fix on top of recent VCI version v20251117:
/messages/by-id/1de545d00a5c0bb8aaf42f05dd118a2c19030041.camel@postgrespro.ru
That worked for me.
Hi, Timur Magomedov
Thanks for working on this. I test this on v20251118, it works for me.
--
Best regards,
Japin Li
ChengDu WenWu Information Technology Co., LTD.
Hi. Here is the latest set of VCI patches.
The previous huge patch 0002 has been split into smaller parts to make
it more manageable. The split parts are buildable, but they are not
separately functional. In other words, nothing will work unless/until
all of the parts have been applied.
Summary of changes:
Patch 0001 (unchanged)
~
Patch 0002 (content unchanged; split into multiple parts)
BEFORE
0002-VCI-module-main
AFTER
0002-VCI-main-part1
0003-VCI-main-part2
0004-VCI-main-part3
0005-VCI-main-part4
0006-VCI-main-part5
0007-VCI-main-part6
0008-VCI-tests
~
Patch 0003 (content unchanged)
BEFORE
0003-VCI-module-documentation
AFTER
0009-VCI-docs
======
Kind Regards,
Peter Smith.
Fujitsu Australia