VOPS-2.0

Started by Konstantin Knizhnik about 7 years ago, 5 messages

k.knizhnik@postgrespro.ru

about 7 years ago

Hi,

I want to introduce a new version of the VOPS extension for Postgres
(vectorized operations), which provides a new, more convenient mode of use:
auto-substitution of projections.

The idea of VOPS is to use special vector (tile) types instead of
standard scalar types, e.g. vops_float8 instead of float8.
Such vector types implement all standard arithmetic operators, which makes
it possible to use them in queries in almost the same way as scalar types.
But each such operator or aggregate function processes on the order of a
hundred values at once, minimizing the interpretation overhead of the
Postgres executor.
This is why VOPS provides a more than tenfold performance improvement
for analytic queries performing filtering and aggregation (like TPC-H
queries Q1 and Q6).
It was already possible to create VOPS tables explicitly, but now VOPS
provides a much more convenient way: VOPS projections.
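To make the tile idea concrete, here is a minimal standalone sketch in plain C. It is not VOPS code: the struct layout, the 64-element tile size (chosen so that one uint64_t can serve as the filter mask) and the function names tile_less_than and tile_sum are all assumptions made for illustration. It shows one filter call and one aggregate call each covering a whole tile of values.

```c
#include <stdint.h>

#define TILE_SIZE 64   /* illustrative size: one mask bit per element */

/* Hypothetical tile: TILE_SIZE values of one column stored together. */
typedef struct {
    double payload[TILE_SIZE];
} tile_float8;

/* Filter: a single call compares the whole tile against a limit and
 * returns a bitmask with bit i set when payload[i] < limit. */
static uint64_t tile_less_than(const tile_float8 *t, double limit)
{
    uint64_t mask = 0;
    for (int i = 0; i < TILE_SIZE; i++)
        if (t->payload[i] < limit)
            mask |= (uint64_t)1 << i;
    return mask;
}

/* Aggregate: a single call sums every element selected by the mask. */
static double tile_sum(const tile_float8 *t, uint64_t mask)
{
    double sum = 0.0;
    for (int i = 0; i < TILE_SIZE; i++)
        if (mask & ((uint64_t)1 << i))
            sum += t->payload[i];
    return sum;
}
```

For a tile holding the values 0..63, tile_sum(&t, tile_less_than(&t, 24.0)) returns 276, which corresponds roughly to "select sum(x) from t where x < 24" evaluated with two calls instead of 64 per-row evaluations.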

It is assumed that data is stored in a normal Postgres table and the user can
define one or more projections of this table (as in Vertica).
Each projection consists of some subset of the attributes of the original
table. Some of these attributes preserve the attribute type of the original
table (they are referred to as "scalar attributes"), while others use the
corresponding VOPS types instead ("vector attributes").
Scalar attributes of a projection can be used for sorting, grouping or
joining. Vector attributes can be used in the where clause and in the target
list (including aggregate functions).

The main advantage of the new version of VOPS is that it can
automatically substitute queries to the original table with queries to one
of its projections.
So the client does not need to know about the presence of a VOPS projection
or change its queries in any way.
VOPS projections are created using the create_projection() function, which
takes the lists of vector and scalar attributes:

select create_projection('vops_lineitem', 'lineitem',
    array['l_shipdate','l_quantity','l_extendedprice','l_discount','l_tax'],
    array['l_returnflag','l_linestatus']);

In the future a "CREATE PROJECTION ..." clause may be supported, but it
requires changes in the PostgreSQL core and can't be implemented at the
extension level.

If the "vops.auto_substitute_projections" option is set to true, then the
parse hook installed by VOPS tries all projections defined for the table,
and if the original table can be substituted by one of these projections,
then the alternative query is executed.
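The substitution decision can be pictured with the following sketch. It is a deliberately simplified model, not the logic of the VOPS parse hook: the projection and query_columns structs and the functions projection_covers and choose_projection are hypothetical. It assumes only the rule stated above, namely that columns used in the where clause or as aggregate arguments must be vector attributes of the projection, and columns used for grouping, sorting or joining must be scalar attributes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one projection of a table. */
typedef struct {
    const char *name;
    const char *const *vector_attrs;   /* stored as VOPS tile types */
    size_t n_vector;
    const char *const *scalar_attrs;   /* keep the original scalar type */
    size_t n_scalar;
} projection;

/* Hypothetical summary of the columns a query touches. */
typedef struct {
    const char *const *computed_cols;  /* where clause, aggregate arguments */
    size_t n_computed;
    const char *const *key_cols;       /* group by, order by, join keys */
    size_t n_key;
} query_columns;

static bool contains(const char *const *set, size_t n, const char *col)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i], col) == 0)
            return true;
    return false;
}

/* A projection can replace the table only if it holds every column the
 * query needs, in the role (vector or scalar) the query needs it in. */
static bool projection_covers(const projection *p, const query_columns *q)
{
    for (size_t i = 0; i < q->n_computed; i++)
        if (!contains(p->vector_attrs, p->n_vector, q->computed_cols[i]))
            return false;
    for (size_t i = 0; i < q->n_key; i++)
        if (!contains(p->scalar_attrs, p->n_scalar, q->key_cols[i]))
            return false;
    return true;
}

/* Try all candidates in order; NULL means "run the original query". */
static const projection *choose_projection(const projection *candidates,
                                           size_t n, const query_columns *q)
{
    for (size_t i = 0; i < n; i++)
        if (projection_covers(&candidates[i], q))
            return &candidates[i];
    return NULL;
}
```

With the vops_lineitem projection defined above, a query that filters on l_shipdate and aggregates l_extendedprice grouped by l_returnflag would be covered in this model, while a query that groups by a column outside the projection would fall back to the original table.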

By default "vops.auto_substitute_projections" is switched off, because
VOPS doesn't enforce consistency of projections with the original table.
As with a materialized view, a projection has to be refreshed
manually by the user. It is certainly possible to do this using triggers,
but it would be very inefficient:
only bulk updates can provide good update and select performance.
create_projection(XXX) generates an XXX_refresh() function, which can be
used to periodically update the projection.
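One way to see why bulk refresh suits tiles better than per-row triggers is the following sketch of packing a scalar column into tiles. This is an illustration under assumptions, not the code that XXX_refresh() runs: the tile layout, the empty_mask field, the 64-element tile size and the name pack_column are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TILE_SIZE 64   /* illustrative size: one mask bit per slot */

/* Hypothetical tile with a mask marking slots that hold no row. */
typedef struct {
    uint64_t empty_mask;           /* bit i set: slot i is unused */
    double   payload[TILE_SIZE];
} tile_float8;

static size_t tiles_needed(size_t n_rows)
{
    return (n_rows + TILE_SIZE - 1) / TILE_SIZE;
}

/* Pack n_rows scalar values into consecutive tiles. The caller must
 * provide room for tiles_needed(n_rows) tiles. Only the last tile can
 * be partially filled; its unused slots are flagged in empty_mask. */
static size_t pack_column(const double *rows, size_t n_rows,
                          tile_float8 *tiles)
{
    size_t n_tiles = tiles_needed(n_rows);
    for (size_t t = 0; t < n_tiles; t++) {
        tiles[t].empty_mask = 0;
        for (size_t i = 0; i < TILE_SIZE; i++) {
            size_t row = t * TILE_SIZE + i;
            if (row < n_rows) {
                tiles[t].payload[i] = rows[row];
            } else {
                tiles[t].payload[i] = 0.0;
                tiles[t].empty_mask |= (uint64_t)1 << i;
            }
        }
    }
    return n_tiles;
}
```

Packed in bulk, 100 rows fill one complete tile and 36 slots of a second one. Inserting the same rows one at a time would either rewrite a tile on every insert or leave many nearly empty tiles, which defeats the purpose of processing many values per call.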

VOPS supports all versions of Postgres starting from 10.0 and can be
downloaded from the GitHub repository:
https://github.com/postgrespro/vops.git

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Bruce Momjian

bruce@momjian.us

about 7 years ago

In reply to: Konstantin Knizhnik (#1)

Re: VOPS-2.0

On Wed, Nov 28, 2018 at 01:01:03PM +0300, Konstantin Knizhnik wrote:

> Hi,
>
> I want to introduce a new version of the VOPS extension for Postgres (vectorized
> operations), which provides a new, more convenient mode of use:
> auto-substitution of projections.

[Announce post moved to hackers.]

I remember the good performance numbers from this feature. How does
this interact with the JIT executor feature, which is also designed to
speed up the executor? Is it something that can be combined with JIT?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Alvaro Herrera

alvherre@2ndquadrant.com

about 7 years ago

In reply to: Bruce Momjian (#2)

Re: VOPS-2.0

On 2018-Nov-28, Bruce Momjian wrote:

> On Wed, Nov 28, 2018 at 01:01:03PM +0300, Konstantin Knizhnik wrote:
>
>> Hi,
>>
>> I want to introduce a new version of the VOPS extension for Postgres (vectorized
>> operations), which provides a new, more convenient mode of use:
>> auto-substitution of projections.
>
> [Announce post moved to hackers.]
>
> I remember the good performance numbers from this feature. How does
> this interact with the JIT executor feature, which is also designed to
> speed up the executor? Is it something that can be combined with JIT?

ISTM that VOPS is a temporary hack that we'll need to include in some
form eventually, but that basing it on top of pluggable storage would be
a better way forward than the current proposal.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 7 years ago

In reply to: Bruce Momjian (#2)

Re: VOPS-2.0

On 28.11.2018 16:18, Bruce Momjian wrote:

> On Wed, Nov 28, 2018 at 01:01:03PM +0300, Konstantin Knizhnik wrote:
>
>> Hi,
>>
>> I want to introduce a new version of the VOPS extension for Postgres (vectorized
>> operations), which provides a new, more convenient mode of use:
>> auto-substitution of projections.
>
> [Announce post moved to hackers.]
>
> I remember the good performance numbers from this feature. How does
> this interact with the JIT executor feature, which is also designed to
> speed up the executor? Is it something that can be combined with JIT?

JIT and vector execution are more or less alternative solutions to the
same problem: eliminating interpretation overhead.
In JIT this is done by replacing interpretation with native code execution.
In VOPS, by processing more elements in each operation.
This is the implementation of binary operations in VOPS:

    PG_FUNCTION_INFO_V1(vops_##TYPE##_##OP);                              \
    Datum vops_##TYPE##_##OP(PG_FUNCTION_ARGS)                            \
    {                                                                     \
        vops_##TYPE* left = (vops_##TYPE*)PG_GETARG_POINTER(0);           \
        vops_##TYPE* right = (vops_##TYPE*)PG_GETARG_POINTER(1);          \
        vops_##TYPE* result = (vops_##TYPE*)palloc(sizeof(vops_##TYPE));  \
        int i;                                                            \
        for (i = 0; i < TILE_SIZE; i++)                                   \
            result->payload[i] = left->payload[i] COP right->payload[i];  \

So it is just a loop over TILE_SIZE elements (128 by default), for which
the compiler generates optimal code.
Interpretation overhead is not eliminated, but it is divided by 128.
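The "divided by 128" effect can be reproduced outside Postgres with a toy model. The sketch below is an assumption-laden illustration, not executor code: an indirect call through a function pointer stands in for one step of expression interpretation, and the names eval_scalar, eval_tiles, scalar_add and tile_add are hypothetical. Each evaluator returns how many such dispatches it performed.

```c
#include <stddef.h>

#define TILE_SIZE 128   /* the default tile size mentioned above */

typedef struct {
    double payload[TILE_SIZE];
} tile_float8;

typedef double (*scalar_op)(double, double);
typedef void (*tile_op)(const tile_float8 *, const tile_float8 *,
                        tile_float8 *);

static double scalar_add(double a, double b)
{
    return a + b;
}

/* One call handles TILE_SIZE values in a tight loop that the compiler
 * is free to unroll or vectorize. */
static void tile_add(const tile_float8 *l, const tile_float8 *r,
                     tile_float8 *out)
{
    for (int i = 0; i < TILE_SIZE; i++)
        out->payload[i] = l->payload[i] + r->payload[i];
}

/* Row-at-a-time evaluation: one dispatch per value. */
static size_t eval_scalar(scalar_op op, const double *a, const double *b,
                          double *out, size_t n)
{
    size_t dispatches = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = op(a[i], b[i]);
        dispatches++;
    }
    return dispatches;
}

/* Tile-at-a-time evaluation: one dispatch per TILE_SIZE values. */
static size_t eval_tiles(tile_op op, const tile_float8 *a,
                         const tile_float8 *b, tile_float8 *out,
                         size_t n_tiles)
{
    size_t dispatches = 0;
    for (size_t t = 0; t < n_tiles; t++) {
        op(&a[t], &b[t], &out[t]);
        dispatches++;
    }
    return dispatches;
}
```

For 1024 values the scalar evaluator performs 1024 dispatches and the tile evaluator performs 8, while both compute the same sums.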

In principle these two approaches can be combined (as was done, for
example, in HyPer).
But in practice, at least for expression evaluation, it does not make
much sense,
because the code generated by the compiler for the vector function shown
above is better (or at least no worse) than the code generated by LLVM.
And interpretation overhead that is already reduced about 100 times is no
longer noticeable.

Concerning results: Postgres with JIT now executes Q1 about 30%
faster than without JIT.
VOPS executes Q1 about 10 times faster.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 7 years ago

In reply to: Alvaro Herrera (#3)

Re: VOPS-2.0

On 28.11.2018 16:45, Alvaro Herrera wrote:

> On 2018-Nov-28, Bruce Momjian wrote:
>
>> On Wed, Nov 28, 2018 at 01:01:03PM +0300, Konstantin Knizhnik wrote:
>>
>>> Hi,
>>>
>>> I want to introduce a new version of the VOPS extension for Postgres (vectorized
>>> operations), which provides a new, more convenient mode of use:
>>> auto-substitution of projections.
>>
>> [Announce post moved to hackers.]
>>
>> I remember the good performance numbers from this feature. How does
>> this interact with the JIT executor feature, which is also designed to
>> speed up the executor? Is it something that can be combined with JIT?
>
> ISTM that VOPS is a temporary hack that we'll need to include in some
> form eventually, but that basing it on top of pluggable storage would be
> a better way forward than the current proposal.

I am not sure that the pluggable storage API will solve all the problems.
Right now there are the following challenges with VOPS:
1. Providing a convenient way for the user to define projections: not
doable now at the extension level.
2. Vector executor: in principle it is possible to reimplement all (or
most) nodes to use vector operations and somehow make the optimizer
generate them. But it requires a huge amount of work. And I am not sure
that it can all be done at the extension level just by defining new custom
nodes.
3. Updating projections. It can be done efficiently only using bulk
inserts, but this means that projections are not up to date with the
original table. Hiding the internals inside pluggable storage doesn't
automatically solve this problem.

Actually the VOPS approach has the fewest problems with the storage layer,
because VOPS projections are stored as normal Postgres tables.
The main challenge is incorporating vector operations into the executor.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company