AXLE Plans for 9.5 and 9.6

Started by Simon Riggs, 14 messages
#1 Simon Riggs
simon@2ndQuadrant.com

I've discussed 2ndQuadrant's involvement in the AXLE project a few
times publicly, but never on this mailing list. The project relates to
innovation and improvement in Business Intelligence for systems based
upon PostgreSQL in the range of 10-100TB.

Our work will span the 9.5 and 9.6 cycles. We're looking to make
measurable improvements in a number of cases; one of those is TPC-H,
since it's a publicly accessible benchmark; another is a more private
benchmark on healthcare data. In brief, this means speeding up the
performance of large queries, data loading and looking at very large
systems issues.

Some areas of R&D are definitely on the roadmap, others are more
flexible. Some of this is in progress, other stuff is not even at the
design stage yet, just a few paragraphs along the lines of "we will
look at these topics". If we have room, it's possible we may
accommodate other topics; this is not carte blanche, but the reason
for posting here is so people know we will take input, following the
normal community process. Detailed in-person discussions at PGCon are
expected and the Wiki pages will be updated for each aspect.

BI-related Indexing
* MinMax indexes
* Bitmap indexes

Large Systems
* Freeze avoidance
* Storage management issues for very large systems

Storage Efficiency
* Compression
* Column Orientation

Optimisation
* Bulk loading speed improvements
* Bulk FK evaluation
* Executor tuning for very large queries

Query tuning
* Approximate queries, sampling
* Materialized Views

...and possibly some other aspects.

2ndQuadrant is also assisting other researchers on GPU and FPGA
topics, which may also yield work of interest to the PostgreSQL project.

A couple of points: the project is time-limited, so if work gets pushed
back beyond its end then we'll lose the opportunity to contribute. Please
support our work with timely objections, assistance in defining the
path forward and limiting the scope to something that avoids wasting
this opportunity. Further funding is possible if we don't squander
this. We are being funded to make best efforts to contribute to open
source PostgreSQL, not pay-for-commit.

AXLE is funded by the EU under FP7 Grant Agreement 318633. Further
details are available here: http://www.axleproject.eu/

(There are also other 2ndQuadrant development projects in progress;
this is just one of the larger ones.)

Best Regards

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Josh Berkus
josh@agliodbs.com
In reply to: Simon Riggs (#1)
Re: AXLE Plans for 9.5 and 9.6

On 04/21/2014 03:41 PM, Simon Riggs wrote:

> Storage Efficiency
> * Compression
> * Column Orientation

You might look at turning this:

http://citusdata.github.io/cstore_fdw/

... into a more integrated part of Postgres.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#3 Jov
amutu@amutu.com
In reply to: Simon Riggs (#1)
Re: AXLE Plans for 9.5 and 9.6

What about runtime code generation using LLVM?
http://blog.cloudera.com/blog/2013/02/inside-cloudera-impala-runtime-code-generation/
http://llvm.org/devmtg/2013-11/slides/Wanderman-Milne-Cloudera.pdf

Jov
blog: http://amutu.com/blog

#4 Simon Riggs
simon@2ndQuadrant.com
In reply to: Josh Berkus (#2)
Re: AXLE Plans for 9.5 and 9.6

On 22 April 2014 00:24, Josh Berkus <josh@agliodbs.com> wrote:

> On 04/21/2014 03:41 PM, Simon Riggs wrote:
>
>> Storage Efficiency
>> * Compression
>> * Column Orientation
>
> You might look at turning this:
>
> http://citusdata.github.io/cstore_fdw/
>
> ... into a more integrated part of Postgres.

Of course I'm aware of that work - credit to them. Certainly, many
people feel that it is now time to do as you suggest and include
column store features within PostgreSQL.

As to turning it into a more integrated part of Postgres, we have a
few problems there:

1. cstore_fdw code has an incompatible licence

2. I don't think FDWs are the right place for complex new
architectures such as column store, massively parallel processing or
sharding. The fact that it is probably the best place to implement it
in user space doesn't mean it transfers well into core code. That's a
shame and I don't know what to do about it, because it would be nice
to simply ask for change of licence and then integrate it, but it
seems more work than that (to me).

cstore_fdw uses ORC, which interestingly stores "lightweight index"
values that look exactly like MinMax indexes, so at least PostgreSQL
should be getting that soon.
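For illustration, the MinMax idea (and ORC's "lightweight index") fits in a few lines of Python. This is a toy in-memory model under stated assumptions, not PostgreSQL's or ORC's on-disk format, and the names `RangeSummary`, `build_minmax` and `scan_between` are hypothetical. Each run of rows keeps only its min and max. A range-restricted scan skips any run whose [min, max] cannot overlap the query bounds, and rechecks rows in the runs it does read.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class RangeSummary:
    """Min/max summary for one contiguous run of rows (a "block range")."""
    first_row: int  # inclusive
    end_row: int    # exclusive
    lo: int
    hi: int


def build_minmax(values: Sequence[int], rows_per_range: int) -> List[RangeSummary]:
    """Summarise every run of `rows_per_range` rows by its min and max value."""
    summaries = []
    for start in range(0, len(values), rows_per_range):
        chunk = values[start:start + rows_per_range]
        summaries.append(RangeSummary(start, start + len(chunk), min(chunk), max(chunk)))
    return summaries


def scan_between(values: Sequence[int], summaries: List[RangeSummary],
                 lo: int, hi: int) -> Tuple[List[int], int]:
    """Return (matching values, number of ranges read) for lo <= v <= hi.

    A range whose [min, max] cannot overlap [lo, hi] is skipped without
    touching its rows. Every other range is read and each row rechecked,
    because the summary is lossy.
    """
    matches: List[int] = []
    ranges_read = 0
    for s in summaries:
        if s.hi < lo or s.lo > hi:
            continue  # the summary proves no row in this range can match
        ranges_read += 1
        matches.extend(v for v in values[s.first_row:s.end_row] if lo <= v <= hi)
    return matches, ranges_read


if __name__ == "__main__":
    # Values correlated with physical order, e.g. an append-only timestamp column.
    values = list(range(1000))
    summaries = build_minmax(values, rows_per_range=100)
    rows, read = scan_between(values, summaries, 250, 260)
    print(len(rows), read)  # 11 rows found, only 1 of 10 ranges read
```

If the data is not correlated with row order (say, shuffled values), nearly every range's [min, max] spans the query bounds and the summary saves little. That is why this kind of index suits append-mostly, naturally ordered columns.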

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#5 Simon Riggs
simon@2ndQuadrant.com
In reply to: Jov (#3)
Re: AXLE Plans for 9.5 and 9.6

On 22 April 2014 10:42, Jov <amutu@amutu.com> wrote:

> What about runtime code generation using LLVM?
> http://blog.cloudera.com/blog/2013/02/inside-cloudera-impala-runtime-code-generation/
> http://llvm.org/devmtg/2013-11/slides/Wanderman-Milne-Cloudera.pdf

Those techniques have been in use for at least 20 years on various platforms.

The main issue PostgreSQL faces is supporting many platforms and
compilers, while at the same time supporting extensible data types.

I believe there is some research work into run-time compilation in
progress, but that seems unlikely to make it into Postgres core.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#6 Hannu Krosing
hannu@2ndQuadrant.com
In reply to: Josh Berkus (#2)
Re: AXLE Plans for 9.5 and 9.6

On 04/22/2014 01:24 AM, Josh Berkus wrote:

> On 04/21/2014 03:41 PM, Simon Riggs wrote:
>
>> Storage Efficiency
>> * Compression
>> * Column Orientation
>
> You might look at turning this:
>
> http://citusdata.github.io/cstore_fdw/
>
> ... into a more integrated part of Postgres.

What would be of more general usefulness is probably
better planning and better performance of the FDW interface.

So instead of integrating one specific FDW, it would make
sense to improve PostgreSQL so that it can use (properly written)
FDWs at native speed.

Regards

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

#7 MauMau
maumau307@gmail.com
In reply to: Simon Riggs (#1)
Re: AXLE Plans for 9.5 and 9.6

From: "Simon Riggs" <simon@2ndQuadrant.com>

> Some areas of R&D are definitely on the roadmap, others are more
> flexible. Some of this is in progress, other stuff is not even at the
> design stage yet, just a few paragraphs along the lines of "we will
> look at these topics". If we have room, it's possible we may
> accommodate other topics; this is not carte blanche, but the reason
> for posting here is so people know we will take input, following the
> normal community process. Detailed in-person discussions at PGCon are
> expected and the Wiki pages will be updated for each aspect.
>
> BI-related Indexing
> * MinMax indexes
> * Bitmap indexes
>
> Large Systems
> * Freeze avoidance
> * Storage management issues for very large systems
>
> Storage Efficiency
> * Compression
> * Column Orientation
>
> Optimisation
> * Bulk loading speed improvements
> * Bulk FK evaluation
> * Executor tuning for very large queries
>
> Query tuning
> * Approximate queries, sampling
> * Materialized Views

Great! I'm looking forward to seeing PostgreSQL evolve as an analytics
database for data warehousing. Is there any reason why in-memory database
and MPP are not included?

Are you planning to include the above features in 9.5 and 9.6? Are you
recommending other developers not implement these features to avoid
duplication of work with AXLE?

Regards
MauMau

#8 Hannu Krosing
hannu@krosing.net
In reply to: Simon Riggs (#4)
Re: AXLE Plans for 9.5 and 9.6

On 04/22/2014 02:04 PM, Simon Riggs wrote:

> On 22 April 2014 00:24, Josh Berkus <josh@agliodbs.com> wrote:
>
>> On 04/21/2014 03:41 PM, Simon Riggs wrote:
>>
>>> Storage Efficiency
>>> * Compression
>>> * Column Orientation
>>
>> You might look at turning this:
>>
>> http://citusdata.github.io/cstore_fdw/
>>
>> ... into a more integrated part of Postgres.
>
> Of course I'm aware of that work - credit to them. Certainly, many
> people feel that it is now time to do as you suggest and include
> column store features within PostgreSQL.
>
> As to turning it into a more integrated part of Postgres, we have a
> few problems there:
>
> 1. cstore_fdw code has an incompatible licence
>
> 2. I don't think FDWs are the right place for complex new
> architectures such as column store, massively parallel processing or
> sharding.

I agree that FDW is not an end-all solution for all these, but it is a
reasonable starting point and it just might be that the extra things
needed could be added to our FDW API instead of sewing it directly
into backend guts.

I recently tried to implement sharding at FDW level and the main
problem I ran into was a missing join type for efficiently using it
for certain queries.

The specific use case was queries of the form

select l.*, r.*
from remotetable r
join localtable l
on l.key1 = r.id and l.n = N;

PostgreSQL offered only two options:

1) full scan on remote table

2) single id=$ selects

Neither of these is what is actually needed: the first performs badly
if there are more than a few rows in the remote table, and the second
performs badly if l.n = N returns more than a few rows.

When I manually rewrote the query to

select l.*, r.*
from remotetable r
join localtable l
on l.key1 = r.id and l.n = N
where r.id = ANY(ARRAY(select key1 from localtable
where n = N));

It ran really well.

Unfortunately this is not something that PostgreSQL considers by itself
while optimising.
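To make the trade-off concrete, here is a toy Python model. It is an illustration under stated assumptions, not how any real FDW or the planner works: `RemoteTable` and `batched_join` are hypothetical names, and an in-memory dict stands in for the remote server so that only round trips and shipped rows are counted. The three `fetch_*` methods correspond to the two plans PostgreSQL offered and to the manual `= ANY(ARRAY(...))` rewrite.

```python
from typing import Dict, Iterable, List, Tuple


class RemoteTable:
    """Toy stand-in for a foreign table: counts round trips and rows shipped."""

    def __init__(self, rows: Dict[int, str]):
        self._rows = rows
        self.round_trips = 0
        self.rows_shipped = 0

    def _ship(self, result: Dict[int, str]) -> Dict[int, str]:
        # Every call models one query sent to, and answered by, the remote side.
        self.round_trips += 1
        self.rows_shipped += len(result)
        return result

    def fetch_all(self) -> Dict[int, str]:
        """Plan 1: full scan of the remote table."""
        return self._ship(dict(self._rows))

    def fetch_one(self, key: int) -> Dict[int, str]:
        """Plan 2: one 'WHERE id = $1' style query per outer row."""
        return self._ship({key: self._rows[key]} if key in self._rows else {})

    def fetch_any(self, keys: Iterable[int]) -> Dict[int, str]:
        """Rewritten plan: one 'WHERE id = ANY(array)' style query for all keys."""
        return self._ship({k: self._rows[k] for k in set(keys) if k in self._rows})


def batched_join(local: List[Tuple[int, int]], remote: RemoteTable,
                 n: int) -> List[Tuple[int, int, str]]:
    """Join local (key1, n) rows having n == `n` to remote rows on key1 = id."""
    outer = [(key1, nn) for key1, nn in local if nn == n]  # apply l.n = N locally
    fetched = remote.fetch_any(key1 for key1, _ in outer)  # single round trip
    return [(key1, nn, fetched[key1]) for key1, nn in outer if key1 in fetched]


if __name__ == "__main__":
    remote = RemoteTable({i: f"r{i}" for i in range(100_000)})
    local = [(i, i % 1000) for i in range(100_000)]
    result = batched_join(local, remote, n=7)
    print(len(result), remote.round_trips, remote.rows_shipped)  # 100 1 100
```

In this model the same 100 matching outer rows would cost 100 round trips with `fetch_one`, and `fetch_all` would ship all 100,000 remote rows. The batched form is bounded by the number of distinct keys instead.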

BTW, this kind of optimisation should also be a win for really large IN
queries, if we could have an indexed IN which would not start each
lookup from the index root, but would rather sort the IN contents and
do an index merge via skipping from the current position.
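The difference between probing from the root each time and merging a sorted IN list can be sketched in Python. This is a toy model under stated assumptions: the index is just a sorted list of keys (leaf level only), the function names are hypothetical, and galloping (exponential) search is used here as one plausible way to "skip from the current position". It does not describe any existing PostgreSQL code.

```python
import bisect
from typing import Iterable, List, Sequence


def lookup_each_from_root(index: Sequence[int], keys: Sequence[int]) -> List[int]:
    """Baseline: every IN-list entry is searched over the whole index."""
    found = []
    for k in keys:
        i = bisect.bisect_left(index, k)
        if i < len(index) and index[i] == k:
            found.append(k)
    return found


def lookup_sorted_merge(index: Sequence[int], keys: Iterable[int]) -> List[int]:
    """Sort and dedupe the IN list, then skip forward from the current position.

    Galloping from `pos` brackets the next key, and a binary search runs only
    inside that bracket, so the cost of each probe grows with the distance
    skipped rather than with the size of the whole index.
    """
    found: List[int] = []
    pos = 0  # everything before `pos` is known to be < the current key
    for k in sorted(set(keys)):
        lo, hi, step = pos, pos, 1
        # Gallop forward: probe pos, pos+1, pos+2, pos+4, ... until index[hi] >= k.
        while hi < len(index) and index[hi] < k:
            lo = hi + 1
            hi = pos + step
            step *= 2
        # Binary search only inside the bracket [lo, hi) found by galloping.
        pos = bisect.bisect_left(index, k, lo, min(hi, len(index)))
        if pos < len(index) and index[pos] == k:
            found.append(k)
    return found


if __name__ == "__main__":
    index = list(range(0, 1000, 2))  # leaf level: even keys 0..998
    in_list = [998, 4, 3, 500, 4, 1001]
    print(lookup_sorted_merge(index, in_list))  # [4, 500, 998]
```

Because the IN list is sorted first, the position reached for one key is a valid starting point for the next. Repeated and missing keys cost at most one short gallop each.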

Cheers

#9 Andrew Dunstan
andrew@dunslane.net
In reply to: MauMau (#7)
Re: AXLE Plans for 9.5 and 9.6

On 04/22/2014 08:15 AM, MauMau wrote:

> Are you planning to include the above features in 9.5 and 9.6? Are you
> recommending other developers not implement these features to avoid
> duplication of work with AXLE?

Without pointing any fingers, I should note that I have learned the hard
way to take such recommendations with a grain of salt. More than once I
have been stopped from working on something because someone else said
they were, only for nothing to appear, and in the interests of full
disclosure I can think of two significant instances when I have been
similarly guilty, although the most serious of those has since been
rectified by someone else.

cheers

andrew

#10 Andrew Dunstan
andrew@dunslane.net
In reply to: Simon Riggs (#4)
Re: AXLE Plans for 9.5 and 9.6

On 04/22/2014 08:04 AM, Simon Riggs wrote:

> On 22 April 2014 00:24, Josh Berkus <josh@agliodbs.com> wrote:
>
>> On 04/21/2014 03:41 PM, Simon Riggs wrote:
>>
>>> Storage Efficiency
>>> * Compression
>>> * Column Orientation
>>
>> You might look at turning this:
>>
>> http://citusdata.github.io/cstore_fdw/
>>
>> ... into a more integrated part of Postgres.
>
> Of course I'm aware of that work - credit to them. Certainly, many
> people feel that it is now time to do as you suggest and include
> column store features within PostgreSQL.
>
> As to turning it into a more integrated part of Postgres, we have a
> few problems there:
>
> 1. cstore_fdw code has an incompatible licence
>
> 2. I don't think FDWs are the right place for complex new
> architectures such as column store, massively parallel processing or
> sharding. The fact that it is probably the best place to implement it
> in user space doesn't mean it transfers well into core code. That's a
> shame and I don't know what to do about it, because it would be nice
> to simply ask for change of licence and then integrate it, but it
> seems more work than that (to me).

I agree, and indeed that was something like my first reaction to hearing
about this development - FDW seems like a very odd way to handle this.
But the notion of builtin columnar storage suggests to me that we really
need first to tackle how various storage engines might be incorporated
into Postgres. I know this has been a bugbear for many years, but maybe
now with serious proposals for alternative storage engines on the
horizon we can no longer afford to put off the evil day when we grapple
with it.

cheers

andrew

#11 Simon Riggs
simon@2ndQuadrant.com
In reply to: MauMau (#7)
Re: AXLE Plans for 9.5 and 9.6

On 22 April 2014 13:15, MauMau <maumau307@gmail.com> wrote:

> Great! I'm looking forward to seeing PostgreSQL evolve as an analytics
> database for data warehousing. Is there any reason why in-memory database
> and MPP are not included?

Those ideas are valid; the features are bounded by resource
constraints of time and money, as well as by the technical skills and
capacities of my fellow developers. My analysis has been that
implementing parallelism has a lower benefit/cost ratio than other
features, as well as requiring more expensive servers (for MPP). I
expect MPP to be an eventual end goal of the BDR project.

> Are you planning to include the above features in 9.5 and 9.6?

Yes

> Are you recommending other developers not implement these features
> to avoid duplication of work with AXLE?

This was more to draw attention to the work so that all interested
parties can participate in producing something useful.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#12 Stephen Frost
sfrost@snowman.net
In reply to: Andrew Dunstan (#10)
Re: AXLE Plans for 9.5 and 9.6

* Andrew Dunstan (andrew@dunslane.net) wrote:

> I agree, and indeed that was something like my first reaction to
> hearing about this development - FDW seems like a very odd way to
> handle this. But the notion of builtin columnar storage suggests to
> me that we really need first to tackle how various storage engines
> might be incorporated into Postgres. I know this has been a bugbear
> for many years, but maybe now with serious proposals for alternative
> storage engines on the horizon we can no longer afford to put off
> the evil day when we grapple with it.

Agreed, and it goes beyond just columnar stores; I could see IOTs
(index-organized tables) being implemented using this notion of a
different 'storage engine'. But calling it a 'storage engine' makes it
sound like we want to change how we access files, and I don't think we
really want to change that, but rather come up with a way to have an
alternative heap. Columnar stores or IOTs would still be page-based and
go through shared buffers, etc., I'd think.

Thanks,

Stephen

#13 Josh Berkus
josh@agliodbs.com
In reply to: Simon Riggs (#1)
Re: AXLE Plans for 9.5 and 9.6

On 04/22/2014 06:39 AM, Andrew Dunstan wrote:

> I agree, and indeed that was something like my first reaction to hearing
> about this development - FDW seems like a very odd way to handle this.
> But the notion of builtin columnar storage suggests to me that we really
> need first to tackle how various storage engines might be incorporated
> into Postgres. I know this has been a bugbear for many years, but maybe
> now with serious proposals for alternative storage engines on the
> horizon we can no longer afford to put off the evil day when we grapple
> with it.

Yes. *IF* PostgreSQL already supported alternate storage, then the
Citus folks might have released their CStore as a storage plugin instead
of an FDW. However, if they'd waited for pluggable storage, they'd
still be waiting.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#14 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Josh Berkus (#13)
Re: AXLE Plans for 9.5 and 9.6

2014-04-22 19:02 GMT+02:00 Josh Berkus <josh@agliodbs.com>:

> On 04/22/2014 06:39 AM, Andrew Dunstan wrote:
>
>> I agree, and indeed that was something like my first reaction to hearing
>> about this development - FDW seems like a very odd way to handle this.
>> But the notion of builtin columnar storage suggests to me that we really
>> need first to tackle how various storage engines might be incorporated
>> into Postgres. I know this has been a bugbear for many years, but maybe
>> now with serious proposals for alternative storage engines on the
>> horizon we can no longer afford to put off the evil day when we grapple
>> with it.
>
> Yes. *IF* PostgreSQL already supported alternate storage, then the
> Citus folks might have released their CStore as a storage plugin instead
> of an FDW. However, if they'd waited for pluggable storage, they'd
> still be waiting.

I am sceptical. From what I know about OLAP column-store databases, they
need a very different planner, so just a new engine or storage layer is
not enough. VectorWise has been trying to merge Ingres with the Monet
engine for more than four years, and still has some issues.

Our extensibility is probably a major barrier to fast OLAP. I see the
most realistic way forward as better partitioning support, moving in
the direction of higher parallelism and distribution, and maybe
map/reduce support.

At GoodData we successfully use Postgres for BI projects of up to 20GB
with fast response times. The most painful gaps are missing MERGE,
missing fault-tolerant COPY, IO-expensive updates of large tables with
lots of indexes, and missing simple massive partitioning. On the other
hand, Postgres works perfectly on thousands of databases with thousands
of tables, without errors, and is very simple to deploy in a cloud
environment.

Regards

Pavel
