VLDB Features
I'm starting work on my next projects for 8.4.
Many applications have the need to store very large data volumes for
both archival and analysis. The analytic databases are commonly known as
Data Warehouses, though there isn't a common term for large archival
data stores. The line between those use cases is often blurred, and many
people see them as a single use case. My initial interest is in the
large archival data stores.
One of the main issues to be faced is simply data maintenance and
management. Loading, deleting, vacuuming data all takes time. Those
issues relate mainly to the size of the data store rather than any
particular workload, so I'm calling that set of required features "Very
Large Database" (or VLDB) features.
VLDB Features I'm expecting to work on are
- Read Only Tables/WORM tables
- Advanced Partitioning
- Compression
plus related performance features
Details of those will be covered in separate mails over the next few weeks
and months. This is just to let everybody know where I'm headed, so
you can see the big picture with me.
I'll be working on other projects as well, many of which I've listed here:
http://developer.postgresql.org/index.php/Simon_Riggs%27_Development_Projects
I expect the list is too long to complete for 8.4, but I'm allowing for
various issues arising during development.
So please keep specific discussion to the other mails as they arrive.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Simon.
VLDB Features I'm expecting to work on are
- Read Only Tables/WORM tables
- Advanced Partitioning
- Compression
plus related performance features
Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load. Unfortunately, I don't know anyone
who's working on it.
--
Josh Berkus
PostgreSQL @ Sun
San Francisco
On Tue, 2007-12-11 at 10:53, Josh Berkus wrote:
Simon.
VLDB Features I'm expecting to work on are
- Read Only Tables/WORM tables
- Advanced Partitioning
- Compression
plus related performance features
Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load.
What do you mean by fault-tolerant here?
Just
COPY ... WITH ERRORS TO ...
or something more advanced, like a bulk load which can be continued after
a crash?
--
Hannu
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote:
Simon.
VLDB Features I'm expecting to work on are
- Read Only Tables/WORM tables
- Advanced Partitioning
- Compression
plus related performance features
Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load. Unfortunately, I don't know anyone
who's working on it.
Not lost sight of it; I have a design, but I have to prioritise also.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Hannu,
COPY ... WITH ERRORS TO ...
Yeah, that's a start.
or something more advanced, like a bulk load which can be continued after
a crash?
Well, we could also use a loader which automatically parallelizes, but that
functionality can be done at the middleware level. WITH ERRORS is the
most critical part.
Here's the other VLDB features we're missing:
Parallel Query
Windowing Functions
Parallel Index Build (not sure how this works exactly, but it speeds Oracle
up considerably)
On-disk Bitmap Index (anyone game to finish GP patch?)
Simon, we should start a VLDB-Postgres developer wiki page.
--
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote:
Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load.
I actually had to cook up a version of this for Truviso recently. I'll
take a look at submitting a cleaned-up implementation for 8.4.
-Neil
On Tue, 2007-12-11 at 15:31 -0800, Josh Berkus wrote:
Here's the other VLDB features we're missing:
Parallel Query
Windowing Functions
Parallel Index Build (not sure how this works exactly, but it speeds Oracle
up considerably)
On-disk Bitmap Index (anyone game to finish GP patch?)
I would call those VLDB Data Warehousing features to differentiate
them from the use of VLDBs for other purposes.
I'd add Materialized View support in the planner, as well as saying it's
more important than parallel query, IMHO. MVs are to DW what indexes are
to OLTP. It's the same as indexes vs. seqscan; you can speed up the seq
scan or you can avoid it. Brute force is cool, but being smarter is even
better.
The reason they don't normally show up high on anybody's feature list is
that the TPC benchmarks specifically disallow them, which as I once
observed is very good support for them being a useful feature in
practice. (Oracle originally brought out MV support as a way of
improving their TPC scores at a time when Teradata was wiping the floor
with its parallel query implementation.)
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
On Tue, 11 Dec 2007, Josh Berkus wrote:
Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load. Unfortunately, I don't know anyone
who's working on it.
I'm curious what you feel is missing that pgloader doesn't fill that
requirement: http://pgfoundry.org/projects/pgloader/
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg,
I'm curious what you feel is missing that pgloader doesn't fill that
requirement: http://pgfoundry.org/projects/pgloader/
Because pgloader is implemented in middleware, it carries a very high overhead
if you have bad rows. As little as 1% bad rows will slow down loading by 20%
due to retries.
--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Hi,
On Wednesday 12 December 2007, Josh Berkus wrote:
I'm curious what you feel is missing that pgloader doesn't fill that
requirement: http://pgfoundry.org/projects/pgloader/
Because pgloader is implemented in middleware, it carries a very high
overhead if you have bad rows. As little as 1% bad rows will slow down
loading by 20% due to retries.
Not that much, in fact, I'd say.
pgloader allows its user to configure how large a COPY buffer to use (a
global parameter as of now; it could easily be a per-section configuration
knob, I just didn't see any need for this yet).
It's the 'copy_every' parameter as seen on the man page here:
http://pgloader.projects.postgresql.org/#toc4
pgloader will obviously prepare an in-memory buffer of copy_every tuples to
give to COPY, and in case of error will cut it in two and retry. Classic
dichotomy approach, from the initial implementation by Jan Wieck.
So you can easily balance the error recovery costs against the COPY bulk size.
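The dichotomy retry Dimitri describes can be sketched roughly like this (a simplified Python illustration, not pgloader's actual code; `copy_fn` stands in for a COPY that aborts when any row in the batch is bad):

```python
def load_with_bisect(copy_fn, rows, reject_log):
    """Try to load a batch in one shot; on failure, bisect the batch to
    isolate and log the bad rows while still loading the good ones.

    copy_fn raises an exception when any row in the batch is bad,
    mimicking COPY aborting on the first error. Returns the number of
    rows successfully loaded."""
    if not rows:
        return 0
    try:
        copy_fn(rows)
        return len(rows)
    except Exception:
        if len(rows) == 1:
            reject_log.append(rows[0])   # a single bad row: log it and move on
            return 0
        mid = len(rows) // 2             # classic dichotomy: split and retry
        return (load_with_bisect(copy_fn, rows[:mid], reject_log)
                + load_with_bisect(copy_fn, rows[mid:], reject_log))
```

With few bad rows the whole batch usually loads in one COPY, which is why the retry overhead depends so strongly on the error rate and the buffer size.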
Note also that the overall loading time with pgloader does not scale simply
with the COPY buffer size; the optimal choice depends on the dataset --- and
the data massaging pgloader has to do on it --- and I've seen the best
results with buffers of 10000 to 15000 tuples so far.
FYI, now that the pgloader topic is on the table, the next items I plan to
develop for it are configurable behavior on error tuples (e.g. loading to
another table on a primary-key error) and some limited DDL-partitioning
support.
I'm playing with the idea for pgloader to be able to read some partitioning
schemes (parsing CHECK constraint on inherited tables) and load directly into
the right partitions.
That would of course be done only when configured this way, and if the
constraints are misread it would only result in many more rejected rows than
expected; you could still retry using your insert trigger instead of
pgloader's buggy smartness.
Comments welcome, regards,
--
dim
On Tue, 2007-12-11 at 15:31 -0800, Josh Berkus wrote:
Simon, we should start a VLDB-Postgres developer wiki page.
http://developer.postgresql.org/index.php/DataWarehousing
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Hi,
Josh Berkus wrote:
Here's the other VLDB features we're missing:
Parallel Query
Uh.. this only makes sense in a distributed database, no? I've thought
about parallel querying on top of Postgres-R. Does it make sense
implementing some form of parallel querying apart from the distribution
or replication engine?
Windowing Functions
Isn't Gavin Sherry working on this? Haven't read anything from him lately...
Parallel Index Build (not sure how this works exactly, but it speeds Oracle
up considerably)
Sounds interesting *turns-away-to-google*
On-disk Bitmap Index (anyone game to finish GP patch?)
Anybody having an idea of what's missing there (besides good use cases,
which some people doubt)? Again: Gavin?
Simon, we should start a VLDB-Postgres developer wiki page.
Thanks, Simon, wiki page looks good!
Regards
Markus
Markus,
Parallel Query
Uh.. this only makes sense in a distributed database, no? I've thought
about parallel querying on top of Postgres-R. Does it make sense
implementing some form of parallel querying apart from the distribution
or replication engine?
Sure. Imagine you have a 5TB database on a machine with 8 cores and only one
concurrent user. You'd like to have 1 core doing I/O, and say 4-5 cores
dividing the scan and join processing into 4-5 chunks.
I'd say implementing a separate I/O worker would be the first step towards
this; if we could avoid doing I/O in the same process/thread where we're
doing row parsing it would speed up large scans by 100%. I know Oracle does
this, and their large-table-I/O is 30-40% faster than ours despite having
less efficient storage.
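A minimal sketch of the split-off I/O worker idea, with a thread and a bounded queue standing in for the separate process (illustrative only; all names are invented):

```python
import queue
import threading

def scan(lines_source, parse, chunk_size=4):
    """Overlap reading (the 'I/O worker') with row parsing: a dedicated
    reader thread fills a bounded queue with chunks of raw lines while
    the consumer parses them, so parsing never waits on each read."""
    q = queue.Queue(maxsize=8)       # bounded, so the reader can't run away
    SENTINEL = object()

    def reader():
        chunk = []
        for line in lines_source:    # stands in for file/disk reads
            chunk.append(line)
            if len(chunk) == chunk_size:
                q.put(chunk)
                chunk = []
        if chunk:
            q.put(chunk)
        q.put(SENTINEL)              # signal end-of-input

    threading.Thread(target=reader, daemon=True).start()
    rows = []
    while (chunk := q.get()) is not SENTINEL:
        rows.extend(parse(line) for line in chunk)
    return rows
```

In a real backend the reader would be a separate process doing the actual disk I/O, but the producer/consumer structure is the same.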
Maybe Greenplum or EnterpriseDB will contribute something. ;-)
Windowing Functions
Isn't Gavin Sherry working on this? Haven't read anything from him
lately...
Me neither. Swallowed by Greenplum and France.
--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Hi Josh,
Josh Berkus wrote:
Sure. Imagine you have a 5TB database on a machine with 8 cores and only one
concurrent user. You'd like to have 1 core doing I/O, and say 4-5 cores
dividing the scan and join processing into 4-5 chunks.
Ah, right, thanks for the enlightenment. Heck, I'm definitely too focused on
replication and distributed databases :-)
However, there's certainly a great deal of intersection between
parallel processing on different machines and parallel processing on
multiple CPUs - especially considering NUMA architectures.
*comes-to-think-again*...
Isn't Gavin Sherry working on this? Haven't read anything from him
lately...
Me neither. Swallowed by Greenplum and France.
Hm.. good for him, I guess!
Regards
Markus
On Wed, Dec 12, 2007 at 08:26:16PM +0100, Markus Schiltknecht wrote:
Isn't Gavin Sherry working on this? Haven't read anything from him
lately...
Me neither. Swallowed by Greenplum and France.
Hm.. good for him, I guess!
Yes, I'm around -- just extremely busy with a big release at Greenplum as
well as other Real Life stuff.
Thanks,
Gavin
Greenplum as well as other Real Life stuff.
For those of us here who have no idea what you are talking about can
you define what "Real Life" is like?
Joshua D. Drake
--
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/
Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'
"Josh Berkus" <josh@agliodbs.com> writes:
Markus,
Parallel Query
Uh.. this only makes sense in a distributed database, no? I've thought
about parallel querying on top of Postgres-R. Does it make sense
implementing some form of parallel querying apart from the distribution
or replication engine?
Yes, but not for the reasons Josh describes.
I'd say implementing a separate I/O worker would be the first step towards
this; if we could avoid doing I/O in the same process/thread where we're
doing row parsing it would speed up large scans by 100%. I know Oracle does
this, and their large-table-I/O is 30-40% faster than ours despite having
less efficient storage.
Oracle is using Direct I/O so they need the reader and writer threads to avoid
blocking on i/o all the time. We count on the OS doing readahead and buffering
our writes so we don't have to. Direct I/O and needing some way to do
asynchronous writes and reads are directly tied.
Where Parallel query is useful is when you have queries that involve a
substantial amount of cpu resources, especially if you have a very fast I/O
system which can saturate the bandwidth to a single cpu.
So for example if you have a merge join which requires sorting both sides of
the query you could easily have subprocesses handle those sorts allowing you
to bring two processors to bear on the problem instead of being limited to a
single processor.
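A toy sketch of that shape (threads standing in for the worker processes Gregory describes; in a real backend the two sorts would run on separate CPUs):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_merge_join(left, right, key=lambda x: x):
    """Sort both join inputs concurrently, then merge-join on equal keys.

    The sorts are handed to two workers so both sides are processed at
    once; the merge phase below handles one-to-one matches only (real
    merge joins need mark/restore logic for duplicate keys)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fl = pool.submit(sorted, left, key=key)
        fr = pool.submit(sorted, right, key=key)
        sl, sr = fl.result(), fr.result()
    out, i, j = [], 0, 0
    while i < len(sl) and j < len(sr):
        kl, kr = key(sl[i]), key(sr[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            out.append((sl[i], sr[j]))
            i += 1
            j += 1
    return out
```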
On Oracle Parallel Query goes great with partitioned tables. Their query
planner will almost always turn the partition scans into parallel scans and
use separate processors to scan different partitions.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
Hello Gregory,
Gregory Stark wrote:
Oracle is using Direct I/O so they need the reader and writer threads to avoid
blocking on i/o all the time. We count on the OS doing readahead and buffering
our writes so we don't have to. Direct I/O and needing some way to do
asynchronous writes and reads are directly tied.
Yeah, except in cases where we can predict non-sequential reads ahead of
time. Which admittedly doesn't come up too frequently and can probably be
handled with posix_fadvise - as you are currently testing.
Where Parallel query is useful is when you have queries that involve a
substantial amount of cpu resources, especially if you have a very fast I/O
system which can saturate the bandwidth to a single cpu.
Full ACK, the very same applies to parallel querying on shared-nothing
clusters. Those can help if the bandwidth to all processing cores
together becomes the bottleneck (and the resulting data is relatively
small compared to the input data).
For example, Sun's UltraSparc T2 features only 8 PCIe lanes for those 8
cores, so you end up with 250 MiB/sec per core or about 32 MiB/sec per
thread on average. To be fair: their 10 Gig Ethernet ports don't go via
PCIe, so you get an additional 2x 1 GiB/sec for the complete chip. And
memory bandwidth looks a lot better: Sun claims 60+ GiB/sec, leaving
almost 8 GiB/sec per core or 1 GiB/sec per thread.
If my calculations for Intel are correct, a Quad Xeon with a 1.33 GHz
FSB has around 21 GiB/sec throughput to main memory, giving 5 GiB/sec
per core. (Why are these numbers so hard to find? It looks like Intel
deliberately obfuscates them with FSB MHz or Giga-transactions per sec
and the like.)
Regards
Markus
On Tue, 2007-12-11 at 15:41, Neil Conway wrote:
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote:
Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load.
I actually had to cook up a version of this for Truviso recently. I'll
take a look at submitting a cleaned-up implementation for 8.4.
How did you do it?
Did you enhance the COPY command or was it something completely new?
--
Hannu
On Fri, 2007-12-14 at 14:48 +0200, Hannu Krosing wrote:
How did you do it?
Did you enhance the COPY command or was it something completely new?
By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
to drop (and log) rows that contain malformed data. That is, rows with
too many or too few columns, rows that result in constraint violations,
and rows containing columns where the data type's input function raises
an error. The last case is the only thing that would be a bit tricky to
implement, I think: you could use PG_TRY() around the InputFunctionCall,
but I guess you'd need a subtransaction to ensure that you reset your
state correctly after catching an error.
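A rough Python analog of the behavior described (not the actual patch; the per-row subtransaction rollback in the backend is simulated here by a simple try/except per row):

```python
def copy_ignore_errors(lines, input_fns, log):
    """Analog of the proposed COPY IGNORE ERRORS: rows with the wrong
    column count, or where a type's input function raises, are logged
    and dropped instead of aborting the whole load. In the real backend
    each failing row would need a subtransaction so state can be reset
    cleanly after the error is caught."""
    good = []
    for lineno, line in enumerate(lines, start=1):
        cols = line.rstrip("\n").split("\t")
        if len(cols) != len(input_fns):
            log.append((lineno, "bad column count"))
            continue
        try:
            # input_fns play the role of the data types' input functions
            good.append(tuple(fn(c) for fn, c in zip(input_fns, cols)))
        except Exception as exc:
            log.append((lineno, str(exc)))
    return good
```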
-Neil