Sorting performance vs. MySQL
I have the exact same table of data in both MySQL and Postgresql. In Postgresql:
tpcc=# \d metarelcloud_transactionlog
Table
"public.metarelcloud_transactionlog"
Column | Type |
Modifiers
---------------------+-----------------------+--------------------------------------------------------------------------
id | integer | not null default
nextval('metarelcloud_transactionlog_id_seq'::regclass)
transactionid | integer | not null
queryid | smallint | not null
tableid | character varying(30) | not null
tupleid | integer | not null
querytype | character varying | not null
graphpartition | smallint |
replicatedpartition | smallint |
justifiedpartition | smallint |
hashpartition | smallint |
nodeid | integer |
manualpartition | smallint |
Indexes:
"metarelcloud_transactionlog_pkey" PRIMARY KEY, btree (id)
Check constraints:
"metarelcloud_transactionlog_graphpartition_check" CHECK
(graphpartition >= 0)
"metarelcloud_transactionlog_hashpartition_check" CHECK (hashpartition >= 0)
"metarelcloud_transactionlog_justifiedpartition_check" CHECK
(justifiedpartition >= 0)
"metarelcloud_transactionlog_manualpartition_check" CHECK
(manualpartition >= 0)
"metarelcloud_transactionlog_querytype_check" CHECK
(querytype::text = ANY (ARRAY['select'::character varying,
'insert'::character varying, 'delete'::character varying,
'update'::character varying]::text[]))
"metarelcloud_transactionlog_replicatedpartition_check" CHECK
(replicatedpartition >= 0)
In MySQL:
CREATE TABLE `metarelcloud_transactionlog` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`transactionid` int(11) NOT NULL,
`queryid` tinyint(4) NOT NULL,
`tableid` varchar(30) NOT NULL,
`tupleid` int(11) NOT NULL,
`querytype` enum('select','insert','delete','update') NOT NULL,
`graphpartition` tinyint(3) unsigned DEFAULT NULL,
`replicatedpartition` tinyint(3) unsigned DEFAULT NULL,
`justifiedpartition` tinyint(3) unsigned DEFAULT NULL,
`hashpartition` tinyint(3) unsigned DEFAULT NULL,
`nodeid` int(11) DEFAULT NULL,
`manualpartition` tinyint(3) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `transactionid` (`transactionid`),
KEY `tableid` (`tableid`,`tupleid`),
KEY `nodeid` (`nodeid`)
) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1
I'm running:
select * from metarelcloud_transactionlog order by transactionid;
It takes MySQL 6 minutes, but Postgresql is still running after 70
minutes. Is there something like a glaring misconfiguration that I'm
overlooking? Thanks in advance.
--
Yang Zhang
http://www.mit.edu/~y_z/
hello
the speed depends on setting of working_memory. Try to increase a working_memory
set working_memory to '10MB';
Regards
Pavel Stehule
2010/2/22 Yang Zhang <yanghatespam@gmail.com>:
Show quoted text
I have the exact same table of data in both MySQL and Postgresql. In Postgresql:
tpcc=# \d metarelcloud_transactionlog
Table
"public.metarelcloud_transactionlog"
Column | Type |
Modifiers
---------------------+-----------------------+--------------------------------------------------------------------------
id | integer | not null default
nextval('metarelcloud_transactionlog_id_seq'::regclass)
transactionid | integer | not null
queryid | smallint | not null
tableid | character varying(30) | not null
tupleid | integer | not null
querytype | character varying | not null
graphpartition | smallint |
replicatedpartition | smallint |
justifiedpartition | smallint |
hashpartition | smallint |
nodeid | integer |
manualpartition | smallint |
Indexes:
"metarelcloud_transactionlog_pkey" PRIMARY KEY, btree (id)
Check constraints:
"metarelcloud_transactionlog_graphpartition_check" CHECK
(graphpartition >= 0)
"metarelcloud_transactionlog_hashpartition_check" CHECK (hashpartition >= 0)
"metarelcloud_transactionlog_justifiedpartition_check" CHECK
(justifiedpartition >= 0)
"metarelcloud_transactionlog_manualpartition_check" CHECK
(manualpartition >= 0)
"metarelcloud_transactionlog_querytype_check" CHECK
(querytype::text = ANY (ARRAY['select'::character varying,
'insert'::character varying, 'delete'::character varying,
'update'::character varying]::text[]))
"metarelcloud_transactionlog_replicatedpartition_check" CHECK
(replicatedpartition >= 0)In MySQL:
CREATE TABLE `metarelcloud_transactionlog` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`transactionid` int(11) NOT NULL,
`queryid` tinyint(4) NOT NULL,
`tableid` varchar(30) NOT NULL,
`tupleid` int(11) NOT NULL,
`querytype` enum('select','insert','delete','update') NOT NULL,
`graphpartition` tinyint(3) unsigned DEFAULT NULL,
`replicatedpartition` tinyint(3) unsigned DEFAULT NULL,
`justifiedpartition` tinyint(3) unsigned DEFAULT NULL,
`hashpartition` tinyint(3) unsigned DEFAULT NULL,
`nodeid` int(11) DEFAULT NULL,
`manualpartition` tinyint(3) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `transactionid` (`transactionid`),
KEY `tableid` (`tableid`,`tupleid`),
KEY `nodeid` (`nodeid`)
) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1I'm running:
select * from metarelcloud_transactionlog order by transactionid;
It takes MySQL 6 minutes, but Postgresql is still running after 70
minutes. Is there something like a glaring misconfiguration that I'm
overlooking? Thanks in advance.
--
Yang Zhang
http://www.mit.edu/~y_z/--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
There is no index on the column transactionid in your PostgreSQL-
table, as there is in your MySQL-table. This explains the difference.
CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
(transactionid);
Op 22 feb 2010, om 19:10 heeft Yang Zhang het volgende geschreven:
I have the exact same table of data in both MySQL and Postgresql. In
Postgresql:tpcc=# \d metarelcloud_transactionlog Table "public.metarelcloud_transactionlog" Column | Type | Modifiers ---------------------+----------------------- + -------------------------------------------------------------------------- id | integer | not null default nextval('metarelcloud_transactionlog_id_seq'::regclass) transactionid | integer | not null queryid | smallint | not null tableid | character varying(30) | not null tupleid | integer | not null querytype | character varying | not null graphpartition | smallint | replicatedpartition | smallint | justifiedpartition | smallint | hashpartition | smallint | nodeid | integer | manualpartition | smallint | Indexes: "metarelcloud_transactionlog_pkey" PRIMARY KEY, btree (id) Check constraints: "metarelcloud_transactionlog_graphpartition_check" CHECK (graphpartition >= 0) "metarelcloud_transactionlog_hashpartition_check" CHECK (hashpartition >= 0) "metarelcloud_transactionlog_justifiedpartition_check" CHECK (justifiedpartition >= 0) "metarelcloud_transactionlog_manualpartition_check" CHECK (manualpartition >= 0) "metarelcloud_transactionlog_querytype_check" CHECK (querytype::text = ANY (ARRAY['select'::character varying, 'insert'::character varying, 'delete'::character varying, 'update'::character varying]::text[])) "metarelcloud_transactionlog_replicatedpartition_check" CHECK (replicatedpartition >= 0)In MySQL:
CREATE TABLE `metarelcloud_transactionlog` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`transactionid` int(11) NOT NULL,
`queryid` tinyint(4) NOT NULL,
`tableid` varchar(30) NOT NULL,
`tupleid` int(11) NOT NULL,
`querytype` enum('select','insert','delete','update') NOT NULL,
`graphpartition` tinyint(3) unsigned DEFAULT NULL,
`replicatedpartition` tinyint(3) unsigned DEFAULT NULL,
`justifiedpartition` tinyint(3) unsigned DEFAULT NULL,
`hashpartition` tinyint(3) unsigned DEFAULT NULL,
`nodeid` int(11) DEFAULT NULL,
`manualpartition` tinyint(3) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `transactionid` (`transactionid`),
KEY `tableid` (`tableid`,`tupleid`),
KEY `nodeid` (`nodeid`)
) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1I'm running:
select * from metarelcloud_transactionlog order by transactionid;
It takes MySQL 6 minutes, but Postgresql is still running after 70
minutes. Is there something like a glaring misconfiguration that I'm
overlooking? Thanks in advance.
--
Yang Zhang
http://www.mit.edu/~y_z/--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Frank Heikens
frankheikens@mac.com
On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens <frankheikens@mac.com> wrote:
There is no index on the column transactionid in your PostgreSQL-table, as
there is in your MySQL-table. This explains the difference.CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
(transactionid);
Does an index help a sort operation in PostgreSQL?
--
Regards,
Richard Broersma Jr.
Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug
On Mon, Feb 22, 2010 at 1:13 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
hello
the speed depends on setting of working_memory. Try to increase a working_memory
set working_memory to '10MB';
It's already at
tpcc=# show work_mem;
work_mem
----------
20000kB
(1 row)
I also wouldn't have imagined an external merge-sort as being very
memory-intensive--wouldn't it only enough buffer space to read 2x and
write 1x in big-enough chunks for mostly-sequential access?
--
Yang Zhang
http://www.mit.edu/~y_z/
On Mon, Feb 22, 2010 at 1:30 PM, Richard Broersma
<richard.broersma@gmail.com> wrote:
On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens <frankheikens@mac.com> wrote:
There is no index on the column transactionid in your PostgreSQL-table, as
there is in your MySQL-table. This explains the difference.CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
(transactionid);Does an index help a sort operation in PostgreSQL?
I also share the same doubt. An external merge-sort needs to make
complete passes over the entire dataset, with no index-directed
accesses.
--
Yang Zhang
http://www.mit.edu/~y_z/
Op 22 feb 2010, om 19:30 heeft Richard Broersma het volgende geschreven:
On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens
<frankheikens@mac.com> wrote:There is no index on the column transactionid in your PostgreSQL-
table, as
there is in your MySQL-table. This explains the difference.CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
(transactionid);Does an index help a sort operation in PostgreSQL?
Yes it does, see the manual: http://www.postgresql.org/docs/8.4/interactive/indexes-ordering.html
Example without index:
"Sort (cost=804.39..829.39 rows=10000 width=4) (actual
time=16.006..17.171 rows=10000 loops=1)"
" Sort Key: bar"
" Sort Method: quicksort Memory: 491kB"
" -> Seq Scan on bla (cost=0.00..140.00 rows=10000 width=4) (actual
time=0.015..2.236 rows=10000 loops=1)"
"Total runtime: 18.098 ms"
Same query with index (btree):
"Index Scan Backward using i_bar on bla (cost=0.00..406.25 rows=10000
width=4) (actual time=0.093..4.408 rows=10000 loops=1)"
"Total runtime: 5.381 ms"
--
Regards,
Richard Broersma Jr.Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug
Regards,
Frank Heikens
On 22 Feb 2010, at 19:35, Yang Zhang wrote:
I also wouldn't have imagined an external merge-sort as being very
Where's that external merge-sort coming from? Can you show an explain analyze?
If your work-mem is too low there's a good chance that Postgres has to use your disks for sorting, which will obviously be quite slow.
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
!DSPAM:737,4b82d18510442035320951!
Yang Zhang escribi�:
I'm running:
select * from metarelcloud_transactionlog order by transactionid;
It takes MySQL 6 minutes, but Postgresql is still running after 70
minutes. Is there something like a glaring misconfiguration that I'm
overlooking? Thanks in advance.
How large is the table, and have you vacuumed it? Did you analyze it?
What Pg version is this?
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys
<dalroi@solfertje.student.utwente.nl> wrote:
On 22 Feb 2010, at 19:35, Yang Zhang wrote:
I also wouldn't have imagined an external merge-sort as being very
Where's that external merge-sort coming from? Can you show an explain analyze?
I just assumed that the "Sort" in the EXPLAIN output meant an external
merge-sort, given that the table has over 50 million tuples and is
over 3GB, *and* there is no index on the sort key:
tpcc=# explain select * from metarelcloud_transactionlog order by transactionid;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Sort (cost=8408637.34..8534662.95 rows=50410244 width=17)
Sort Key: a.transactionid
-> Seq Scan on metarelcloud_transactionlog a
(cost=0.00..925543.44 rows=50410244 width=17)
(3 rows)
Anyway, I added the INDEX as suggested by Frank, but it's been 20
minutes and it's still running. With the index, EXPLAIN says:
tpcc=# explain select * from metarelcloud_transactionlog order by transactionid;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Scan using i_transactionid on metarelcloud_transactionlog
(cost=0.00..4453076.81 rows=50410164 width=44)
(1 row)
If your work-mem is too low there's a good chance that Postgres has to use your disks for sorting, which will obviously be quite slow.
Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
much faster (on the order of several minutes).
--
Yang Zhang
http://www.mit.edu/~y_z/
On Mon, Feb 22, 2010 at 2:03 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
Yang Zhang escribió:
I'm running:
select * from metarelcloud_transactionlog order by transactionid;
It takes MySQL 6 minutes, but Postgresql is still running after 70
minutes. Is there something like a glaring misconfiguration that I'm
overlooking? Thanks in advance.How large is the table, and have you vacuumed it? Did you analyze it?
What Pg version is this?
The table has 50.4M tuples. It's been vacuumed and analyzed. I'm using
PG 8.3.8 on Fedora 10 x86_64.
--
Yang Zhang
http://www.mit.edu/~y_z/
Op 22 feb 2010, om 20:07 heeft Yang Zhang het volgende geschreven:
On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys
<dalroi@solfertje.student.utwente.nl> wrote:On 22 Feb 2010, at 19:35, Yang Zhang wrote:
I also wouldn't have imagined an external merge-sort as being very
Where's that external merge-sort coming from? Can you show an
explain analyze?I just assumed that the "Sort" in the EXPLAIN output meant an external
merge-sort, given that the table has over 50 million tuples and is
over 3GB, *and* there is no index on the sort key:tpcc=# explain select * from metarelcloud_transactionlog order by
transactionid;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Sort (cost=8408637.34..8534662.95 rows=50410244 width=17)
Sort Key: a.transactionid
-> Seq Scan on metarelcloud_transactionlog a
(cost=0.00..925543.44 rows=50410244 width=17)
(3 rows)Anyway, I added the INDEX as suggested by Frank, but it's been 20
minutes and it's still running. With the index, EXPLAIN says:tpcc=# explain select * from metarelcloud_transactionlog order by
transactionid;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Scan using i_transactionid on metarelcloud_transactionlog
(cost=0.00..4453076.81 rows=50410164 width=44)
(1 row)
Use EXPLAIN ANALYZE to see how the query is executed, gives you more
details.
If your work-mem is too low there's a good chance that Postgres has
to use your disks for sorting, which will obviously be quite slow.Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
much faster (on the order of several minutes).
Make sure your index does fit into memory, what's the size of the index?
--
Yang Zhang
http://www.mit.edu/~y_z/--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Frank Heikens
Yang Zhang escribi�:
I have the exact same table of data in both MySQL and Postgresql. In Postgresql:
I just noticed two things:
[snip lots of stuff]
1.
) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1
You're doing a comparison to MyISAM.
2.
select * from metarelcloud_transactionlog order by transactionid;
You're reading the whole table.
This is unlikely to fly very far. I suggest you try some query that's
actually going to be used in the real world.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Mon, Feb 22, 2010 at 2:15 PM, Frank Heikens <frankheikens@mac.com> wrote:
Op 22 feb 2010, om 20:07 heeft Yang Zhang het volgende geschreven:
On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys
<dalroi@solfertje.student.utwente.nl> wrote:On 22 Feb 2010, at 19:35, Yang Zhang wrote:
I also wouldn't have imagined an external merge-sort as being very
Where's that external merge-sort coming from? Can you show an explain
analyze?I just assumed that the "Sort" in the EXPLAIN output meant an external
merge-sort, given that the table has over 50 million tuples and is
over 3GB, *and* there is no index on the sort key:tpcc=# explain select * from metarelcloud_transactionlog order by
transactionid;
QUERY PLAN-----------------------------------------------------------------------------------------------------------------
Sort (cost=8408637.34..8534662.95 rows=50410244 width=17)
Sort Key: a.transactionid
-> Seq Scan on metarelcloud_transactionlog a
(cost=0.00..925543.44 rows=50410244 width=17)
(3 rows)Anyway, I added the INDEX as suggested by Frank, but it's been 20
minutes and it's still running. With the index, EXPLAIN says:tpcc=# explain select * from metarelcloud_transactionlog order by
transactionid;
QUERY PLAN-----------------------------------------------------------------------------------------------------------------
Index Scan using i_transactionid on metarelcloud_transactionlog
(cost=0.00..4453076.81 rows=50410164 width=44)
(1 row)Use EXPLAIN ANALYZE to see how the query is executed, gives you more
details.
As mentioned, this would take a very long time to complete running --
I have not yet seen one successful query completion yet. I'd need to
let it run overnight.
If your work-mem is too low there's a good chance that Postgres has to
use your disks for sorting, which will obviously be quite slow.Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
much faster (on the order of several minutes).Make sure your index does fit into memory, what's the size of the index?
How might I find out the size and whether it's being fit in memory?
--
Yang Zhang
http://www.mit.edu/~y_z/
On Mon, Feb 22, 2010 at 2:27 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
Yang Zhang escribió:
I have the exact same table of data in both MySQL and Postgresql. In Postgresql:
I just noticed two things:
[snip lots of stuff]
1.
) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1
You're doing a comparison to MyISAM.
We've actually been using innodb as well; it exhibits similar
execution times to MyISAM.
2.
select * from metarelcloud_transactionlog order by transactionid;
You're reading the whole table.
This is unlikely to fly very far. I suggest you try some query that's
actually going to be used in the real world.
This isn't some microbenchmark. This is part of our actual analytical
application. We're running large-scale graph partitioning algorithms.
--
Yang Zhang
http://www.mit.edu/~y_z/
On Mon, Feb 22, 2010 at 12:30 PM, Yang Zhang <yanghatespam@gmail.com> wrote:
This isn't some microbenchmark. This is part of our actual analytical
application. We're running large-scale graph partitioning algorithms.
It's important to see how it runs if you can fit more / most of the
data set into memory by cranking up work_mem to something really big
(like a gigabyte or two) and if the query planner can switch to some
sort of hash algorithm.
Also, can you cluster the table on transactionid ?
Op 22 feb 2010, om 20:28 heeft Yang Zhang het volgende geschreven:
If your work-mem is too low there's a good chance that Postgres
has to
use your disks for sorting, which will obviously be quite slow.Relative to the non-terminating 80-minute-so-far sort, Unix sort
runs
much faster (on the order of several minutes).Make sure your index does fit into memory, what's the size of the
index?How might I find out the size and whether it's being fit in memory?
SELECT pg_size_pretty(pg_relation_size('i_transactionid'));
--
Yang Zhang
http://www.mit.edu/~y_z/--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Frank Heikens
On Mon, Feb 22, 2010 at 2:39 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Mon, Feb 22, 2010 at 12:30 PM, Yang Zhang <yanghatespam@gmail.com> wrote:
This isn't some microbenchmark. This is part of our actual analytical
application. We're running large-scale graph partitioning algorithms.It's important to see how it runs if you can fit more / most of the
data set into memory by cranking up work_mem to something really big
(like a gigabyte or two) and if the query planner can switch to some
sort of hash algorithm.
We're actually using a very small dataset right now. Being bounded by
memory capacity is not a scalable approach for our application.
Also, can you cluster the table on transactionid ?
We can, but that's not really addressing the core issue, which matters
to us since the sort itself is only for performing a self merge join
on transactionid, and the *very next step* is a group by a.tableid,
a.tupleid, b.tableid, b.tupleid (i.e. requiring another sort for the
group-agg).
--
Yang Zhang
http://www.mit.edu/~y_z/
On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang <yanghatespam@gmail.com> wrote:
I have the exact same table of data in both MySQL and Postgresql. In Postgresql:
Just wondering, are these on the same exact machine?
On Mon, Feb 22, 2010 at 2:52 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang <yanghatespam@gmail.com> wrote:
I have the exact same table of data in both MySQL and Postgresql. In Postgresql:
Just wondering, are these on the same exact machine?
Yes, on the same disk.
--
Yang Zhang
http://www.mit.edu/~y_z/