Columnar store as default for PostgreSQL 10?

Started by Bráulio Bhavamitra, almost 10 years ago, 18 messages, general
#1Bráulio Bhavamitra
brauliobo@gmail.com

Hi all,

I'm finally having performance issues with PostgreSQL when doing big
analytics queries over almost the entire database of more than 100gb of
data.

And what I keep reading all over the web is many databases switching to
columnar store (RedShift, Cassandra, cstore_fdw, etc) and having great
performance on queries in general and giant boosts with big analytics
queries.

I wonder if there are any plans to move PostgreSQL entirely to a columnar
store (or at least make it an option), maybe for version 10?

The current extensions are rather limited (types support for example) and
require quite some configuration and data migration to work, besides they
don't work in services like AWS RDS.

best regards,
bráulio

#2Francisco Olarte
folarte@peoplecall.com
In reply to: Bráulio Bhavamitra (#1)
Re: Columnar store as default for PostgreSQL 10?

Hi Bráulio:

On Thu, Apr 21, 2016 at 12:08 PM, Bráulio Bhavamitra
<brauliobo@gmail.com> wrote:

And what I keep reading all over the web is many databases switching to
columnar store (RedShift, Cassandra, cstore_fdw, etc) and having great
performance on queries in general and giant boosts with big analytics
queries.

And have you read anything about the drawbacks of columnar? They are
there, but writing about them does not make the headlines.

I wonder if there is any plans to move postgresql entirely to a columnar
store (or at least make it an option), maybe for version 10?

An option may be good ( may, not sure because nothing is free. More
complex code, more bug surface, some time will be eaten managing the
extra complexity, less developer time available for each feature, ...
) , but IMHO a complete move would be bad. Columnar is not that good
for a lot of postgres usages. If columnar were the silver bullet
everybody would be doing it.

Francisco Olarte.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Bráulio Bhavamitra (#1)
Re: Columnar store as default for PostgreSQL 10?

On Thu, Apr 21, 2016 at 3:08 AM, Bráulio Bhavamitra <brauliobo@gmail.com>
wrote:

Hi all,

I'm finally having performance issues with PostgreSQL when doing big
analytics queries over almost the entire database of more than 100gb of
data.

And what I keep reading all over the web is many databases switching to
columnar store (RedShift, Cassandra, cstore_fdw, etc) and having great
performance on queries in general and giant boosts with big analytics
queries.

I wonder if there is any plans to move postgresql entirely to a columnar
store (or at least make it an option), maybe for version 10?

The current extensions are rather limited (types support for example) and
require quite some configuration and data migration to work, besides they
don't work in services like AWS RDS.

I have little experience (and nothing practical) with columnar store but
at a high level I don't see the point. I would hope that anyone interested
in working on a columnar store database would pick an existing one to
improve rather than converting a very successful row store database into
one. And I don't immediately understand how a dual setup would even be
viable - it seems like you'd have to re-write so much of the code the only
thing left would be the SQL parser.


David J.

#4Geoff Winkless
pgsqladmin@geoff.dj
In reply to: David G. Johnston (#3)
Re: Columnar store as default for PostgreSQL 10?

On 21 April 2016 at 17:08, David G. Johnston <david.g.johnston@gmail.com> wrote:

I have little experience (and nothing practical) with columnar store but at
a high level I don't see the point. I would hope that anyone interested in
working on a columnar store database would pick an existing one to improve
rather than converting a very successful row store database into one. And I
don't immediately understand how a dual setup would even be viable - it
seems like you'd have to re-write so much
of the code the only thing left would be the SQL parser.

To be fair, I'd say that this "only thing" would be pretty huge. The
cost of changing databases is often prohibitive (or nearly so) because
the parser isn't _quite_ the same, and if the sort of gains that are
bandied about could really be achieved just by choosing columnar
storage for certain tables without having to rewrite large chunks of
code that would be a very big win.

I certainly agree that changing the store to columnar-only makes
little sense though, because it would alienate a lot (I would suggest
the majority) of users whose data fits far better into a row model.

FWIW, looking at the cstore_fdw extension did get me quite excited
(because I have an inkling that quite a lot of our queries might
benefit from such a feature) until I saw that DELETEs aren't possible,
which would invalidate most of the wins for us because of the
subsequent massive cost of modifying data.

There's also an interesting document from the MonetDB guys about how
the wins to be gained just by using cstore_fdw (rather than moving to
a native column-store) aren't as high as you would hope. I have a
feeling that would remain the case even if the store were integrated.

https://www.monetdb.org/content/citusdb-postgresql-column-store-vs-monetdb-tpc-h-shootout
" the margin by which MonetDB outperforms cstore_ftw shows that only
switching storage models alone is probably not enough"

Geoff
(Disclaimer: I've no connection to MonetDB in any way)


#5Bráulio Bhavamitra
brauliobo@gmail.com
In reply to: Geoff Winkless (#4)
Re: Columnar store as default for PostgreSQL 10?

On Thu, Apr 21, 2016 at 1:39 PM Geoff Winkless <pgsqladmin@geoff.dj> wrote:

On 21 April 2016 at 17:08, David G. Johnston <david.g.johnston@gmail.com>
wrote:

I have little experience (and nothing practical) with columnar store but at
a high level I don't see the point. I would hope that anyone interested in
working on a columnar store database would pick an existing one to improve
rather than converting a very successful row store database into one. And I
don't immediately understand how a dual setup would even be viable - it
seems like you'd have to re-write so much
of the code the only thing left would be the SQL parser.

To be fair, I'd say that this "only thing" would be pretty huge. The
cost of changing databases is often prohibitive (or nearly so) because
the parser isn't _quite_ the same, and if the sort of gains that are
bandied about could really be achieved just by choosing columnar
storage for certain tables without having to rewrite large chunks of
code that would be a very big win.

I certainly agree that changing the store to columnar-only makes
little sense though, because it would alienate a lot (I would suggest
the majority) of users whose data fits far better into a row model.

FWIW, looking at the cstore_fdw extension did get me quite excited
(because I have an inkling that quite a lot of our queries might
benefit from such a feature) until I saw that DELETEs aren't possible,
which would invalidate most of the wins for us because of the
subsequent massive cost of modifying data.

There's also an interesting document from the monet_db guys about how
the wins to be gained just by using cstore_fdw (rather than moving to
a native column-store) aren't as high as you would hope. I have a
feeling that would remain the case even if the store were integrated.

https://www.monetdb.org/content/citusdb-postgresql-column-store-vs-monetdb-tpc-h-shootout
" the margin by which MonetDB outperforms cstore_ftw shows that only
switching storage models alone is probably not enough"

I think the gains are really high, since with big data caching is usually
not possible.
But of course cstore_fdw should perform better when caching is feasible.

Geoff
(Disclaimer: I've no connection to MonetDB in any way)

#6Jonathan Eastgate
jonathan.eastgate@simpro.co
In reply to: Bráulio Bhavamitra (#5)
Re: Columnar store as default for PostgreSQL 10?

An interesting topic we have also discussed in our team.

Realistically - this is more about picking the right software for the job.

PostgreSQL has come so far in its performance for more general workloads
that it is fast becoming a bit of a darling in the world of cloud - being
able to handle lots of DBs and their associated web (fast response
required) queries. So to start to take it down this path I believe would be
detrimental.

There are plenty of DBs out there designed for exactly what you are trying
to do - so maybe a better option is to build yourself some way of having data
stored in another system where you require this sort of data handling.

Some of us are already doing that with great success - so maybe asking how
you accomplish that would be a better question.

Jonathan



#7Merlin Moncure
mmoncure@gmail.com
In reply to: Bráulio Bhavamitra (#1)
Re: Columnar store as default for PostgreSQL 10?

On Thu, Apr 21, 2016 at 5:08 AM, Bráulio Bhavamitra <brauliobo@gmail.com> wrote:

Hi all,

I'm finally having performance issues with PostgreSQL when doing big
analytics queries over almost the entire database of more than 100gb of
data.

And what I keep reading all over the web is many databases switching to
columnar store (RedShift, Cassandra, cstore_fdw, etc) and having great
performance on queries in general and giant boosts with big analytics
queries.

I wonder if there is any plans to move postgresql entirely to a columnar
store (or at least make it an option), maybe for version 10?

The current extensions are rather limited (types support for example) and
require quite some configuration and data migration to work, besides they
don't work in services like AWS RDS.

Column stores are better at one case (selecting a few columns from a
very wide table) and worse at just about every other case. Also,
beware database benchmarks -- as they say, there is no free lunch.
There is a reason why databases store things in rows.

Analytics in traditional postgres tables is definitely possible, but
you have to be smart.

merlin


#8Guyren Howe
guyren@gmail.com
In reply to: Merlin Moncure (#7)
Re: Columnar store as default for PostgreSQL 10?

On Apr 22, 2016, at 15:03 , Merlin Moncure <mmoncure@gmail.com> wrote:

On Thu, Apr 21, 2016 at 5:08 AM, Bráulio Bhavamitra <brauliobo@gmail.com> wrote:

I'm finally having performance issues with PostgreSQL when doing big
analytics queries over almost the entire database of more than 100gb of
data.

And what I keep reading all over the web is many databases switching to
columnar store (RedShift, Cassandra, cstore_fdw, etc) and having great
performance on queries in general and giant boosts with big analytics
queries.

I wonder if there is any plans to move postgresql entirely to a columnar
store (or at least make it an option), maybe for version 10?

The current extensions are rather limited (types support for example) and
require quite some configuration and data migration to work, besides they
don't work in services like AWS RDS.

Column stores are better at one case (selecting a few columns from a
very wide table) and worse at just about every other case. Also,
beware database benchmarks -- as they say, there is no free lunch
There is a reason why databases store things in rows.

Analytics in traditional postgres tables is definitely possible, but
you have to be smart.

There are tradeoffs; a column store is faster at queries that select a subset of columns. The *big* tradeoff is that insert time increases linearly with the number of columns. Queries that pull a large subset of the columns can also be slower.
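To make the tradeoff concrete, here is a minimal Python sketch. It is a toy in-memory model, not how Postgres or any real column store lays out pages; the names `rows`, `columns` and the helper functions are made up for illustration.

```python
# Toy model only: a "row store" is a list of dicts, a "column store"
# is one list per column where position i belongs to row i.
rows = [
    {"id": 1, "name": "a", "price": 10, "qty": 2},
    {"id": 2, "name": "b", "price": 20, "qty": 1},
    {"id": 3, "name": "c", "price": 30, "qty": 5},
]
columns = {key: [r[key] for r in rows] for key in rows[0]}


def sum_price_row_store(rows):
    # Every whole row is visited even though only "price" is needed.
    return sum(r["price"] for r in rows)


def sum_price_column_store(columns):
    # Only the "price" list is touched.
    return sum(columns["price"])


def insert_column_store(columns, new_row):
    # One append per column, so the work grows with the column count.
    writes = 0
    for key, value in new_row.items():
        columns[key].append(value)
        writes += 1
    return writes


row_total = sum_price_row_store(rows)
col_total = sum_price_column_store(columns)
writes = insert_column_store(
    columns, {"id": 4, "name": "d", "price": 40, "qty": 3}
)
```

Both sums give the same answer; the difference is how much data each one has to walk through, and the insert needs one write per column.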

I would quite like to set a table to columnar in Postgres, but really you can achieve much the same thing with multiple tables in a 1:1 relationship, so I don't think this would be worth putting much effort into.
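A rough sketch of the 1:1 split, using Python's built-in sqlite3 module as a stand-in so it runs anywhere (an assumption for illustration; the same idea applies to Postgres tables). The table and column names `item_core` and `item_notes` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One logical table split into two physical tables sharing the same key.
conn.executescript("""
    CREATE TABLE item_core  (id INTEGER PRIMARY KEY, price INTEGER NOT NULL);
    CREATE TABLE item_notes (id INTEGER PRIMARY KEY REFERENCES item_core(id),
                             notes TEXT);
""")
conn.executemany("INSERT INTO item_core VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 30)])
conn.executemany("INSERT INTO item_notes VALUES (?, ?)",
                 [(1, "long text"), (2, "more text"), (3, "even more")])

# The analytics query reads only the narrow table.
total = conn.execute("SELECT sum(price) FROM item_core").fetchone()[0]

# Getting the full logical row back costs a join on the shared key.
full = conn.execute("""
    SELECT c.id, c.price, n.notes
    FROM item_core c JOIN item_notes n ON n.id = c.id
    ORDER BY c.id
""").fetchall()
```

The cost is that every insert now touches two tables and the application has to know which fragment holds which column.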


#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bráulio Bhavamitra (#1)
Re: Columnar store as default for PostgreSQL 10?

Bráulio Bhavamitra wrote:

Hi all,

I'm finally having performance issues with PostgreSQL when doing big
analytics queries over almost the entire database of more than 100gb of
data.

And what I keep reading all over the web is many databases switching to
columnar store (RedShift, Cassandra, cstore_fdw, etc) and having great
performance on queries in general and giant boosts with big analytics
queries.

I wonder if there is any plans to move postgresql entirely to a columnar
store (or at least make it an option), maybe for version 10?

This is a pretty interesting question. I wrote an answer, then thought
it would make a good blog post, so it's at
http://blog.2ndquadrant.com/column-store-plans/
I reproduce it below.

Completely replacing the current row-based store wouldn't be a good
idea: it has served us extremely well and I’m pretty sure that replacing
it entirely with a columnar store would be disastrous performance-wise
for OLTP use cases.

That doesn't mean columnar stores are a bad idea in general -- because
they aren't. They just have a more limited use case than "the whole
database". For analytical queries on append-mostly data, a columnar
store is a much more appropriate representation than the regular
row-based store, but not all databases are analytical.

However, in order to attain interesting performance gains you need to do
a lot more than just change the underlying storage: you need to ensure
that the rest of the system can take advantage of the changed
representation, so that it can execute queries optimally; for instance,
you may want aggregates that operate in a SIMD mode rather than
one-value-at-a-time as it is today. This, in itself, is a large
undertaking, and there are other challenges too.

As it turns out, there's a team at 2ndQuadrant working precisely on
these matters. We posted a patch last year, but it wasn’t terribly
interesting -- it only made a single-digit percentage improvement in
TPC-H scores; not enough to bother the development community with (it
was a fairly invasive patch). We want more than that.

In our design, columnar or not is going to be an option: you're going to
be able to say "Dear server, for this table kindly set up columnar
storage for me, would you? Thank you very much." And then you’re going
to get a table which may be slower for regular usage but which will rock
for analytics. For most of your tables the current row-based store will
still likely be the best option, because row-based storage is much
better suited to the more general cases.

We don’t have a timescale yet. Stay tuned.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#10Bráulio Bhavamitra
brauliobo@gmail.com
In reply to: Alvaro Herrera (#9)
Re: Columnar store as default for PostgreSQL 10?

On Mon, Apr 25, 2016 at 11:20 AM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

Bráulio Bhavamitra wrote:

Hi all,

I'm finally having performance issues with PostgreSQL when doing big
analytics queries over almost the entire database of more than 100gb of
data.

And what I keep reading all over the web is many databases switching to
columnar store (RedShift, Cassandra, cstore_fdw, etc) and having great
performance on queries in general and giant boosts with big analytics
queries.

I wonder if there is any plans to move postgresql entirely to a columnar
store (or at least make it an option), maybe for version 10?

This is a pretty interesting question. I wrote an answer, then thought
it would make a good blog post, so it's at
http://blog.2ndquadrant.com/column-store-plans/
I reproduce it below.

Completely replacing the current row-based store wouldn't be a good
idea: it has served us extremely well and I’m pretty sure that replacing
it entirely with a columnar store would be disastrous performance-wise
for OLTP use cases.

That doesn't mean columnar stores are a bad idea in general -- because
they aren't. They just have a more limited use case than "the whole
database". For analytical queries on append-mostly data, a columnar
store is a much more appropriate representation than the regular
row-based store, but not all databases are analytical.

However, in order to attain interesting performance gains you need to do
a lot more than just change the underlying storage: you need to ensure
that the rest of the system can take advantage of the changed
representation, so that it can execute queries optimally; for instance,
you may want aggregates that operate in a SIMD mode rather than
one-value-at-a-time as it is today. This, in itself, is a large
undertaking, and there are other challenges too.

As it turns out, there's a team at 2ndQuadrant working precisely on
these matters. We posted a patch last year, but it wasn’t terribly
interesting -- it only made a single-digit percentage improvement in
TPC-H scores; not enough to bother the development community with (it
was a fairly invasive patch). We want more than that.

In our design, columnar or not is going to be an option: you're going to
be able to say "Dear server, for this table kindly set up columnar
storage for me, would you? Thank you very much." And then you’re going
to get a table which may be slower for regular usage but which will rock
for analytics. For most of your tables the current row-based store will
still likely be the best option, because row-based storage is much
better suited to the more general cases.

Nice Alvaro, I think that's the right approach.

Wishing you good work on that :)

cheers,
bráulio

We don’t have a timescale yet. Stay tuned.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11Merlin Moncure
mmoncure@gmail.com
In reply to: Alvaro Herrera (#9)
Re: Columnar store as default for PostgreSQL 10?

On Mon, Apr 25, 2016 at 9:20 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Bráulio Bhavamitra wrote:

Hi all,

I'm finally having performance issues with PostgreSQL when doing big
analytics queries over almost the entire database of more than 100gb of
data.

And what I keep reading all over the web is many databases switching to
columnar store (RedShift, Cassandra, cstore_fdw, etc) and having great
performance on queries in general and giant boosts with big analytics
queries.

I wonder if there is any plans to move postgresql entirely to a columnar
store (or at least make it an option), maybe for version 10?

This is a pretty interesting question. I wrote an answer, then thought
it would make a good blog post, so it's at
http://blog.2ndquadrant.com/column-store-plans/
I reproduce it below.

Completely replacing the current row-based store wouldn't be a good
idea: it has served us extremely well and I’m pretty sure that replacing
it entirely with a columnar store would be disastrous performance-wise
for OLTP use cases.

That doesn't mean columnar stores are a bad idea in general -- because
they aren't. They just have a more limited use case than "the whole
database". For analytical queries on append-mostly data, a columnar
store is a much more appropriate representation than the regular
row-based store, but not all databases are analytical.

However, in order to attain interesting performance gains you need to do
a lot more than just change the underlying storage: you need to ensure
that the rest of the system can take advantage of the changed
representation, so that it can execute queries optimally; for instance,
you may want aggregates that operate in a SIMD mode rather than
one-value-at-a-time as it is today. This, in itself, is a large
undertaking, and there are other challenges too.

As it turns out, there's a team at 2ndQuadrant working precisely on
these matters. We posted a patch last year, but it wasn’t terribly
interesting -- it only made a single-digit percentage improvement in
TPC-H scores; not enough to bother the development community with (it
was a fairly invasive patch). We want more than that.

In our design, columnar or not is going to be an option: you're going to
be able to say "Dear server, for this table kindly set up columnar
storage for me, would you? Thank you very much." And then you’re going
to get a table which may be slower for regular usage but which will rock
for analytics. For most of your tables the current row-based store will
still likely be the best option, because row-based storage is much
better suited to the more general cases.

We don’t have a timescale yet. Stay tuned.

Please keep us posted.

merlin


#12George Neuner
gneuner2@comcast.net
In reply to: Bráulio Bhavamitra (#1)
Re: Columnar store as default for PostgreSQL 10?

On Thu, 21 Apr 2016 09:08:22 -0700, "David G. Johnston"
<david.g.johnston@gmail.com> wrote:

I have little experience (and nothing practical) with columnar store but
at a high level I don't see the point.

At the high level, it's about avoiding fetching data you don't need.
In a row store system, in general you must fetch the whole row to
extract any of its columns.

It is not difficult to simulate column store in a row store system if
you're willing to decompose your tables into (what is essentially)
BCNF fragments. It simply is laborious for designers and programmers.

I would hope that anyone interested in working on a columnar store
database would pick an existing one to improve rather than converting
a very successful row store database into one.

+1

And I don't immediately understand how a dual setup would even be
viable - it seems like you'd have to re-write so much
of the code the only thing left would be the SQL parser.

If you are willing to go to BCNF and manage the physical location of
your tables [which any performance system will be doing anyway], then
any decent row store system can mix in "column" tables where desired.

IMO, the only real added value of a dedicated column store system is
to developers: the automagic table fragmentation and the ability to
query virtual tables rather than specify table fragments individually.
Convenient, but not necessary.

YMMV,
George


#13Adam Brusselback
adambrusselback@gmail.com
In reply to: George Neuner (#12)
Re: Columnar store as default for PostgreSQL 10?

It is not difficult to simulate column store in a row store system if
you're willing to decompose your tables into (what is essentially)
BCNF fragments. It simply is laborious for designers and programmers.

I could see a true column store having much better performance than
tricking a row based system into it. Just think of the per-row overhead we
currently have at 28 bytes per row. Breaking up data manually like that
may help a little, but if you don't have a very wide table to begin with,
it could turn out you save next to nothing by doing so. A column store
wouldn't have this issue, and could potentially have much better
performance.

#14George Neuner
gneuner2@comcast.net
In reply to: Bráulio Bhavamitra (#1)
Re: Columnar store as default for PostgreSQL 10?

On Mon, 25 Apr 2016 21:48:44 -0400, Adam Brusselback
<adambrusselback@gmail.com> wrote:

It is not difficult to simulate column store in a row store system if
you're willing to decompose your tables into (what is essentially)
BCNF fragments. It simply is laborious for designers and programmers.

I could see a true column store having much better performance than
tricking a row based system into it. Just think of the per-row overhead we
currently have at 28 bytes per row. Breaking up data manually like that
may help a little, but if you don't have a very wide table to begin with,
it could turn out you save next to nothing by doing so. A column store
wouldn't have this issue, and could potentially have much better
performance.

A column store must be able to distinguish entries in the column
[which may be non-unique] as well as join the columns of the
fragmented virtual table to reconstruct its rows.

These requirements dictate that a "column" be at least a triple:

{ id, table_row, data }

so there is no space saving WRT row store - the opposite in fact:
column store usually requires more space.

Column store enhances performance mainly by not fetching and not
caching unused data. And standard practices like controlling the
physical locations of tables help both row and column store systems.

George


#15Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#9)
Re: Columnar store as default for PostgreSQL 10?

On Mon, Apr 25, 2016 at 11:20:11AM -0300, Alvaro Herrera wrote:

In our design, columnar or not is going to be an option: you're going to
be able to say "Dear server, for this table kindly set up columnar
storage for me, would you? Thank you very much." And then you’re going
to get a table which may be slower for regular usage but which will rock
for analytics. For most of your tables the current row-based store will
still likely be the best option, because row-based storage is much
better suited to the more general cases.

I am coming late to this thread, but one item not discussed about
columnar storage is the use of compression of identical column values
across rows. Existing Postgres storage only compresses single values,
not values across rows.
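As an illustration of what compressing identical values across rows can look like, here is a minimal run-length encoding sketch in Python. RLE is only one possible scheme and is my choice for the example, not something a particular column store is known to use here; `status_column` is made-up data.

```python
from itertools import groupby


def rle_encode(values):
    # Collapse each run of equal adjacent values into (value, run_length).
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    # Expand (value, run_length) pairs back into the original sequence.
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


# A low-cardinality column stored contiguously tends to have long runs.
status_column = ["open"] * 4 + ["closed"] * 3 + ["open"] * 2
runs = rle_encode(status_column)
```

Nine stored values shrink to three pairs here. In a row store the same values are interleaved with the other columns of each row, so runs like these never sit next to each other.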

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +


#16Bráulio Bhavamitra
brauliobo@gmail.com
In reply to: Alvaro Herrera (#9)
Re: Columnar store as default for PostgreSQL 10?

Alvaro, is this related to or dependent on
https://www.pgcon.org/2016/schedule/events/920.en.html ?

On Mon, Apr 25, 2016 at 11:20 AM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

Bráulio Bhavamitra wrote:

Hi all,

I'm finally having performance issues with PostgreSQL when doing big
analytics queries over almost the entire database of more than 100gb of
data.

And what I keep reading all over the web is many databases switching to
columnar store (RedShift, Cassandra, cstore_fdw, etc) and having great
performance on queries in general and giant boosts with big analytics
queries.

I wonder if there is any plans to move postgresql entirely to a columnar
store (or at least make it an option), maybe for version 10?

This is a pretty interesting question. I wrote an answer, then thought
it would make a good blog post, so it's at
http://blog.2ndquadrant.com/column-store-plans/
I reproduce it below.

Completely replacing the current row-based store wouldn't be a good
idea: it has served us extremely well and I’m pretty sure that replacing
it entirely with a columnar store would be disastrous performance-wise
for OLTP use cases.

That doesn't mean columnar stores are a bad idea in general -- because
they aren't. They just have a more limited use case than "the whole
database". For analytical queries on append-mostly data, a columnar
store is a much more appropriate representation than the regular
row-based store, but not all databases are analytical.

However, in order to attain interesting performance gains you need to do
a lot more than just change the underlying storage: you need to ensure
that the rest of the system can take advantage of the changed
representation, so that it can execute queries optimally; for instance,
you may want aggregates that operate in a SIMD mode rather than
one-value-at-a-time as it is today. This, in itself, is a large
undertaking, and there are other challenges too.

As it turns out, there's a team at 2ndQuadrant working precisely on
these matters. We posted a patch last year, but it wasn’t terribly
interesting -- it only made a single-digit percentage improvement in
TPC-H scores; not enough to bother the development community with (it
was a fairly invasive patch). We want more than that.

In our design, columnar or not is going to be an option: you're going to
be able to say "Dear server, for this table kindly set up columnar
storage for me, would you? Thank you very much." And then you’re going
to get a table which may be slower for regular usage but which will rock
for analytics. For most of your tables the current row-based store will
still likely be the best option, because row-based storage is much
better suited to the more general cases.

We don’t have a timescale yet. Stay tuned.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#17Merlin Moncure
mmoncure@gmail.com
In reply to: Adam Brusselback (#13)
Re: Columnar store as default for PostgreSQL 10?

On Mon, Apr 25, 2016 at 8:48 PM, Adam Brusselback
<adambrusselback@gmail.com> wrote:

It is not difficult to simulate column store in a row store system if
you're willing to decompose your tables into (what is essentially)
BCNF fragments. It simply is laborious for designers and programmers.

I could see a true column store having much better performance than tricking
a row based system into it. Just think of the per-row overhead we currently
have at 28 bytes per row. Breaking up data manually like that may help a
little, but if you don't have a very wide table to begin with, it could turn
out you save next to nothing by doing so. A column store wouldn't have this
issue, and could potentially have much better performance.

FYI tuple header is 23 bytes, not 28 bytes
(http://www.postgresql.org/docs/9.5/static/storage-page-layout.html).
Personally I think column stores are a bit overrated. They are faster
at certain things (in some cases much faster) but tend to put pretty
onerous requirements on application design so that they are very much
a special case vehicle.
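On the 23 versus 28 bytes: one reading that reconciles the two numbers, as far as I understand the page-layout docs, is that the 23-byte header is padded up to the alignment boundary and each tuple also has a 4-byte item pointer in the page. The sketch below assumes a typical 64-bit build with 8-byte alignment, and ignores the null bitmap and the page header; the function names are made up.

```python
HEADER_BYTES = 23        # fixed tuple header size, per the docs linked above
MAXALIGN = 8             # assumption: typical 64-bit build
ITEM_POINTER_BYTES = 4   # per-tuple item pointer stored in the page


def align_up(n, alignment):
    # Round n up to the next multiple of alignment.
    return (n + alignment - 1) // alignment * alignment


def per_row_overhead():
    # Padded header plus the item pointer.
    return align_up(HEADER_BYTES, MAXALIGN) + ITEM_POINTER_BYTES


def overhead_fraction(payload_bytes):
    # Share of on-page bytes that is overhead for a row with this payload.
    padded_payload = align_up(payload_bytes, MAXALIGN)
    overhead = per_row_overhead()
    return overhead / (overhead + padded_payload)


narrow = overhead_fraction(8)     # e.g. a fragment holding one bigint
wide = overhead_fraction(400)     # e.g. a fairly wide row
```

Under these assumptions a one-column fragment is mostly overhead while a wide row is mostly payload, which is the point made earlier about splitting narrow tables by hand.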

merlin


#18Edson Richter
edsonrichter@hotmail.com
In reply to: Merlin Moncure (#17)
Re: Columnar store as default for PostgreSQL 10?

On 17/05/2016 11:07, Merlin Moncure wrote:

On Mon, Apr 25, 2016 at 8:48 PM, Adam Brusselback
<adambrusselback@gmail.com> wrote:

It is not difficult to simulate column store in a row store system if
you're willing to decompose your tables into (what is essentially)
BCNF fragments. It simply is laborious for designers and programmers.

I could see a true column store having much better performance than tricking
a row based system into it. Just think of the per-row overhead we currently
have at 28 bytes per row. Breaking up data manually like that may help a
little, but if you don't have a very wide table to begin with, it could turn
out you save next to nothing by doing so. A column store wouldn't have this
issue, and could potentially have much better performance.

FYI tuple header is 23 bytes, not 28 bytes
(http://www.postgresql.org/docs/9.5/static/storage-page-layout.html).
Personally I think column stores are a bit overrated. They are faster
at certain things (in some cases much faster) but tend to put pretty
onerous requirements on application design so that they are very much
a special case vehicle.

merlin

+1 (to not change current defaults).

I would tend to avoid columnar store "as default" because this would
badly affect hundreds of thousands of applications around the world.
Columnar store should have its own niche, but certainly doesn't fit my
needs.

Whether to offer an "option to change the store" is another story.

As I work with objects on the programming side, and ORM works just so well,
it is really a waste of time (and other resources) to change systems
that have been working well for the past 10 or more years.

Just my 2c,

Edson Richter
