Can we use parallel workers to create index without active/transaction snapshot?

Started by Hao Zhang, over 1 year ago, 2 messages

zhrt1446384557@gmail.com

over 1 year ago

Hi hackers,
I'm doing work related to creating an index with parallel workers. I found
that SnapshotAny is used in table_beginscan_parallel() when
indexInfo->ii_Concurrent is set to false. So can we avoid passing the
snapshot from the leader (the process that creates the parallel workers)
to the parallel workers, like this:
```
InitializeParallelDSM()
{
    ...

    if (is_concurrent == false)
    {
        /* Serialize the active snapshot. */
        asnapspace = shm_toc_allocate(pcxt->toc, asnaplen);
        SerializeSnapshot(active_snapshot, asnapspace);
        shm_toc_insert(pcxt->toc, PARALLEL_KEY_ACTIVE_SNAPSHOT,
                       asnapspace);
    }

    ...
}

ParallelWorkerMain()
{
    ...

    if (is_concurrent == false)
    {
        asnapspace = shm_toc_lookup(toc, PARALLEL_KEY_ACTIVE_SNAPSHOT,
                                    false);
        tsnapspace = shm_toc_lookup(toc, PARALLEL_KEY_TRANSACTION_SNAPSHOT,
                                    true);
        asnapshot = RestoreSnapshot(asnapspace);
        tsnapshot = tsnapspace ? RestoreSnapshot(tsnapspace) : asnapshot;
        RestoreTransactionSnapshot(tsnapshot,
                                   fps->parallel_leader_pgproc);
        PushActiveSnapshot(asnapshot);
    }

    ...
}
```
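For readers weighing the proposal, here is a toy model of the round trip being skipped: the leader flattens a snapshot into a byte buffer (in PostgreSQL, space handed out by shm_toc_allocate() and filled by SerializeSnapshot()), and each worker rebuilds a private copy from it (RestoreSnapshot()). Every name below (ToySnapshot, toy_serialize_snapshot() and so on) is hypothetical, and the layout is far simpler than PostgreSQL's serialized snapshot, which also carries subtransaction xids, a command ID and other fields; treat it as a sketch of the idea only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified snapshot; NOT PostgreSQL's SnapshotData. */
typedef uint32_t ToyXid;

typedef struct ToySnapshot
{
    ToyXid      xmin;           /* xids below xmin have finished */
    ToyXid      xmax;           /* xids at or above xmax are treated as running */
    uint32_t    xcnt;           /* number of entries in xip */
    ToyXid     *xip;            /* xids in progress when the snapshot was taken */
} ToySnapshot;

/* Hypothetical fixed-size header written in front of the xid array. */
typedef struct ToySerializedHeader
{
    ToyXid      xmin;
    ToyXid      xmax;
    uint32_t    xcnt;
} ToySerializedHeader;

/* Leader: bytes needed for the flat copy (header plus xid array). */
size_t toy_estimate_snapshot_space(const ToySnapshot *snap)
{
    return sizeof(ToySerializedHeader) + (size_t) snap->xcnt * sizeof(ToyXid);
}

/* Leader: flatten snap into buf, which must hold at least
 * toy_estimate_snapshot_space(snap) bytes (in PostgreSQL, DSM space). */
void toy_serialize_snapshot(const ToySnapshot *snap, char *buf)
{
    ToySerializedHeader hdr;

    assert(snap != NULL && buf != NULL);
    hdr.xmin = snap->xmin;
    hdr.xmax = snap->xmax;
    hdr.xcnt = snap->xcnt;
    memcpy(buf, &hdr, sizeof(hdr));
    if (snap->xcnt > 0)
        memcpy(buf + sizeof(hdr), snap->xip, (size_t) snap->xcnt * sizeof(ToyXid));
}

/* Worker: rebuild a private snapshot from the flat copy.
 * Returns NULL if memory runs out; release with toy_free_snapshot(). */
ToySnapshot *toy_restore_snapshot(const char *buf)
{
    ToySerializedHeader hdr;
    ToySnapshot *snap;
    size_t      xipsize;

    assert(buf != NULL);
    memcpy(&hdr, buf, sizeof(hdr));     /* memcpy avoids unaligned reads */
    xipsize = (size_t) hdr.xcnt * sizeof(ToyXid);

    snap = malloc(sizeof(ToySnapshot));
    if (snap == NULL)
        return NULL;
    snap->xmin = hdr.xmin;
    snap->xmax = hdr.xmax;
    snap->xcnt = hdr.xcnt;
    snap->xip = NULL;
    if (hdr.xcnt > 0)
    {
        snap->xip = malloc(xipsize);
        if (snap->xip == NULL)
        {
            free(snap);
            return NULL;
        }
        memcpy(snap->xip, buf + sizeof(hdr), xipsize);
    }
    return snap;
}

/* Worker: free a snapshot built by toy_restore_snapshot(). */
void toy_free_snapshot(ToySnapshot *snap)
{
    if (snap != NULL)
    {
        free(snap->xip);
        free(snap);
    }
}
```

With three in-progress xids the toy buffer is 12 + 3 * 4 = 24 bytes on common platforms. The real serialized form has a larger fixed part, but it likewise grows only with the number of xids it records.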

I would appreciate your help.

With Regards
Hao Zhang

tomas.vondra@enterprisedb.com

over 1 year ago

In reply to: Hao Zhang (#1)

Re: Can we use parallel workers to create index without active/transaction snapshot?

On 7/19/24 09:11, Hao Zhang wrote:

> Hi hackers,
> I'm doing work related to creating an index with parallel workers. I found
> that SnapshotAny is used in table_beginscan_parallel() when
> indexInfo->ii_Concurrent is set to false. So can we not pass the snapshot
> from the parallel worker creator to the parallel worker? like this:

Maybe, but I wonder why you are thinking about doing this. I'm guessing
you're trying to skip "unnecessary" stuff to make parallel workers
faster, or is the goal different? FWIW I doubt this will make a measurable
difference; I'd expect the mere fork() to be way more expensive than
copying the SnapshotAny (which I think is pretty small).

Up to you, but I'd suggest doing some measurements first, to show how
much overhead this actually adds.
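Following the suggestion to measure first, a rough C micro-benchmark could put the two costs side by side: fork() plus waitpid() for a child that exits immediately, versus copying a small buffer. The helpers avg_fork_ns() and avg_copy_ns() are hypothetical, and the 128-byte size used in the example call is an arbitrary placeholder, not the size of a real serialized snapshot (in PostgreSQL, EstimateSnapshotSpace() reports that). Since PostgreSQL starts parallel workers through the postmaster and does much more at worker startup than a bare fork(), this only compares orders of magnitude; a convincing measurement would time parallel CREATE INDEX itself with and without the change.

```c
#define _POSIX_C_SOURCE 200809L     /* clock_gettime(), CLOCK_MONOTONIC */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double) ts.tv_sec * 1e9 + (double) ts.tv_nsec;
}

/*
 * Average cost of fork() + child _exit() + waitpid(), in nanoseconds.
 * Returns -1.0 if process creation fails (some sandboxes forbid it).
 */
double avg_fork_ns(int iters)
{
    double      start;

    assert(iters > 0);
    start = now_ns();
    for (int i = 0; i < iters; i++)
    {
        pid_t       pid = fork();

        if (pid < 0)
            return -1.0;
        if (pid == 0)
            _exit(0);           /* child does nothing and exits */
        if (waitpid(pid, NULL, 0) < 0)
            return -1.0;
    }
    return (now_ns() - start) / iters;
}

/*
 * Average cost of copying 'bytes' bytes, in nanoseconds.  Calling memcpy
 * through a volatile function pointer keeps the compiler from recognizing
 * the calls as memcpy and eliding or merging them.
 */
double avg_copy_ns(size_t bytes, int iters)
{
    static char src[4096];
    static char dst[4096];
    void       *(*volatile copy_fn) (void *, const void *, size_t) = memcpy;
    double      start;

    assert(iters > 0 && bytes > 0 && bytes <= sizeof(src));
    memset(src, 'x', bytes);
    start = now_ns();
    for (int i = 0; i < iters; i++)
        copy_fn(dst, src, bytes);
    return (now_ns() - start) / iters;
}
```

For example, printing avg_fork_ns(20) and avg_copy_ns(128, 100000) from a small main() gives the two per-iteration averages; avg_fork_ns() returns -1.0 if the environment does not allow fork().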

```
InitializeParallelDSM()
{
    ...

    if (is_concurrent == false)
    {
        /* Serialize the active snapshot. */
        asnapspace = shm_toc_allocate(pcxt->toc, asnaplen);
        SerializeSnapshot(active_snapshot, asnapspace);
        shm_toc_insert(pcxt->toc, PARALLEL_KEY_ACTIVE_SNAPSHOT,
                       asnapspace);
    }

    ...
}

ParallelWorkerMain()
{
    ...

    if (is_concurrent == false)
    {
        asnapspace = shm_toc_lookup(toc, PARALLEL_KEY_ACTIVE_SNAPSHOT,
                                    false);
        tsnapspace = shm_toc_lookup(toc, PARALLEL_KEY_TRANSACTION_SNAPSHOT,
                                    true);
        asnapshot = RestoreSnapshot(asnapspace);
        tsnapshot = tsnapspace ? RestoreSnapshot(tsnapspace) : asnapshot;
        RestoreTransactionSnapshot(tsnapshot,
                                   fps->parallel_leader_pgproc);
        PushActiveSnapshot(asnapshot);
    }

    ...
}
```

It's not clear to me where you get the is_concurrent flag in those
places. Also, in ParallelWorkerMain() you probably should not skip
restoring the transaction snapshot.
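On where the is_concurrent flag would come from: InitializeParallelDSM() and ParallelWorkerMain() are generic parallel-infrastructure code, and a worker learns the state of the leader's operation only through the DSM segment (the btree build, for instance, records the flag as isconcurrent in its BTShared struct). The toy sketch below shows that pattern with a MAP_SHARED mapping: the leader publishes a flag before the worker starts, and the worker entry point is handed nothing but a pointer to the shared state. ToyFixedState, toy_worker_main() and toy_run_worker() are hypothetical, and fork() is used only for brevity; real parallel workers are launched by the postmaster rather than forked from the leader, so they inherit none of its local variables.

```c
#define _DEFAULT_SOURCE             /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical fixed state the leader publishes before starting workers. */
typedef struct ToyFixedState
{
    bool        is_concurrent;
} ToyFixedState;

/*
 * Hypothetical worker entry point: like a parallel worker, it is handed
 * only the shared region and takes its decision from that alone.
 */
static int toy_worker_main(const ToyFixedState *fps)
{
    assert(fps != NULL);
    return fps->is_concurrent ? 1 : 0;
}

/*
 * Leader: publish the flag in a MAP_SHARED mapping, start one worker and
 * return what the worker read (1 or 0), or -1 on any failure.
 */
int toy_run_worker(bool is_concurrent)
{
    ToyFixedState *fps;
    pid_t       pid;
    int         status;
    int         result = -1;

    fps = mmap(NULL, sizeof(ToyFixedState), PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (fps == MAP_FAILED)
        return -1;

    fps->is_concurrent = is_concurrent;     /* publish before the worker starts */

    pid = fork();
    if (pid == 0)
        _exit(toy_worker_main(fps));        /* worker reports what it read */
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

    munmap(fps, sizeof(ToyFixedState));
    return result;
}
```

In PostgreSQL terms, the open question is which shared structure would carry such a flag and whether it is available at the point where the snapshots are serialized and restored.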

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company