Sorting performance vs. MySQL

Started by Yang Zhangabout 16 years ago45 messagesgeneral
Jump to latest
#1Yang Zhang
yanghatespam@gmail.com

I have the exact same table of data in both MySQL and Postgresql. In Postgresql:

tpcc=# \d metarelcloud_transactionlog
Table
"public.metarelcloud_transactionlog"
Column | Type |
Modifiers
---------------------+-----------------------+--------------------------------------------------------------------------
id | integer | not null default
nextval('metarelcloud_transactionlog_id_seq'::regclass)
transactionid | integer | not null
queryid | smallint | not null
tableid | character varying(30) | not null
tupleid | integer | not null
querytype | character varying | not null
graphpartition | smallint |
replicatedpartition | smallint |
justifiedpartition | smallint |
hashpartition | smallint |
nodeid | integer |
manualpartition | smallint |
Indexes:
"metarelcloud_transactionlog_pkey" PRIMARY KEY, btree (id)
Check constraints:
"metarelcloud_transactionlog_graphpartition_check" CHECK
(graphpartition >= 0)
"metarelcloud_transactionlog_hashpartition_check" CHECK (hashpartition >= 0)
"metarelcloud_transactionlog_justifiedpartition_check" CHECK
(justifiedpartition >= 0)
"metarelcloud_transactionlog_manualpartition_check" CHECK
(manualpartition >= 0)
"metarelcloud_transactionlog_querytype_check" CHECK
(querytype::text = ANY (ARRAY['select'::character varying,
'insert'::character varying, 'delete'::character varying,
'update'::character varying]::text[]))
"metarelcloud_transactionlog_replicatedpartition_check" CHECK
(replicatedpartition >= 0)

In MySQL:

CREATE TABLE `metarelcloud_transactionlog` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`transactionid` int(11) NOT NULL,
`queryid` tinyint(4) NOT NULL,
`tableid` varchar(30) NOT NULL,
`tupleid` int(11) NOT NULL,
`querytype` enum('select','insert','delete','update') NOT NULL,
`graphpartition` tinyint(3) unsigned DEFAULT NULL,
`replicatedpartition` tinyint(3) unsigned DEFAULT NULL,
`justifiedpartition` tinyint(3) unsigned DEFAULT NULL,
`hashpartition` tinyint(3) unsigned DEFAULT NULL,
`nodeid` int(11) DEFAULT NULL,
`manualpartition` tinyint(3) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `transactionid` (`transactionid`),
KEY `tableid` (`tableid`,`tupleid`),
KEY `nodeid` (`nodeid`)
) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

I'm running:

select * from metarelcloud_transactionlog order by transactionid;

It takes MySQL 6 minutes, but Postgresql is still running after 70
minutes. Is there something like a glaring misconfiguration that I'm
overlooking? Thanks in advance.
--
Yang Zhang
http://www.mit.edu/~y_z/

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Yang Zhang (#1)
Re: Sorting performance vs. MySQL

hello

the speed depends on setting of working_memory. Try to increase a working_memory

set working_memory to '10MB';

Regards
Pavel Stehule

2010/2/22 Yang Zhang <yanghatespam@gmail.com>:

Show quoted text

I have the exact same table of data in both MySQL and Postgresql. In Postgresql:

tpcc=# \d metarelcloud_transactionlog
                                      Table
"public.metarelcloud_transactionlog"
      Column        |         Type          |
      Modifiers
---------------------+-----------------------+--------------------------------------------------------------------------
 id                  | integer               | not null default
nextval('metarelcloud_transactionlog_id_seq'::regclass)
 transactionid       | integer               | not null
 queryid             | smallint              | not null
 tableid             | character varying(30) | not null
 tupleid             | integer               | not null
 querytype           | character varying     | not null
 graphpartition      | smallint              |
 replicatedpartition | smallint              |
 justifiedpartition  | smallint              |
 hashpartition       | smallint              |
 nodeid              | integer               |
 manualpartition     | smallint              |
Indexes:
   "metarelcloud_transactionlog_pkey" PRIMARY KEY, btree (id)
Check constraints:
   "metarelcloud_transactionlog_graphpartition_check" CHECK
(graphpartition >= 0)
   "metarelcloud_transactionlog_hashpartition_check" CHECK (hashpartition >= 0)
   "metarelcloud_transactionlog_justifiedpartition_check" CHECK
(justifiedpartition >= 0)
   "metarelcloud_transactionlog_manualpartition_check" CHECK
(manualpartition >= 0)
   "metarelcloud_transactionlog_querytype_check" CHECK
(querytype::text = ANY (ARRAY['select'::character varying,
'insert'::character varying, 'delete'::character varying,
'update'::character varying]::text[]))
   "metarelcloud_transactionlog_replicatedpartition_check" CHECK
(replicatedpartition >= 0)

In MySQL:

CREATE TABLE `metarelcloud_transactionlog` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `transactionid` int(11) NOT NULL,
 `queryid` tinyint(4) NOT NULL,
 `tableid` varchar(30) NOT NULL,
 `tupleid` int(11) NOT NULL,
 `querytype` enum('select','insert','delete','update') NOT NULL,
 `graphpartition` tinyint(3) unsigned DEFAULT NULL,
 `replicatedpartition` tinyint(3) unsigned DEFAULT NULL,
 `justifiedpartition` tinyint(3) unsigned DEFAULT NULL,
 `hashpartition` tinyint(3) unsigned DEFAULT NULL,
 `nodeid` int(11) DEFAULT NULL,
 `manualpartition` tinyint(3) unsigned DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `transactionid` (`transactionid`),
 KEY `tableid` (`tableid`,`tupleid`),
 KEY `nodeid` (`nodeid`)
) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

I'm running:

 select * from metarelcloud_transactionlog order by transactionid;

It takes MySQL 6 minutes, but Postgresql is still running after 70
minutes. Is there something like a glaring misconfiguration that I'm
overlooking? Thanks in advance.
--
Yang Zhang
http://www.mit.edu/~y_z/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Frank Heikens
frankheikens@mac.com
In reply to: Yang Zhang (#1)
Re: Sorting performance vs. MySQL

There is no index on the column transactionid in your PostgreSQL-
table, as there is in your MySQL-table. This explains the difference.

CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
(transactionid);

Op 22 feb 2010, om 19:10 heeft Yang Zhang het volgende geschreven:

I have the exact same table of data in both MySQL and Postgresql. In
Postgresql:

tpcc=# \d metarelcloud_transactionlog
Table
"public.metarelcloud_transactionlog"
Column        |         Type          |
Modifiers
---------------------+----------------------- 
+ 
--------------------------------------------------------------------------
id                  | integer               | not null default
nextval('metarelcloud_transactionlog_id_seq'::regclass)
transactionid       | integer               | not null
queryid             | smallint              | not null
tableid             | character varying(30) | not null
tupleid             | integer               | not null
querytype           | character varying     | not null
graphpartition      | smallint              |
replicatedpartition | smallint              |
justifiedpartition  | smallint              |
hashpartition       | smallint              |
nodeid              | integer               |
manualpartition     | smallint              |
Indexes:
"metarelcloud_transactionlog_pkey" PRIMARY KEY, btree (id)
Check constraints:
"metarelcloud_transactionlog_graphpartition_check" CHECK
(graphpartition >= 0)
"metarelcloud_transactionlog_hashpartition_check" CHECK  
(hashpartition >= 0)
"metarelcloud_transactionlog_justifiedpartition_check" CHECK
(justifiedpartition >= 0)
"metarelcloud_transactionlog_manualpartition_check" CHECK
(manualpartition >= 0)
"metarelcloud_transactionlog_querytype_check" CHECK
(querytype::text = ANY (ARRAY['select'::character varying,
'insert'::character varying, 'delete'::character varying,
'update'::character varying]::text[]))
"metarelcloud_transactionlog_replicatedpartition_check" CHECK
(replicatedpartition >= 0)

In MySQL:

CREATE TABLE `metarelcloud_transactionlog` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`transactionid` int(11) NOT NULL,
`queryid` tinyint(4) NOT NULL,
`tableid` varchar(30) NOT NULL,
`tupleid` int(11) NOT NULL,
`querytype` enum('select','insert','delete','update') NOT NULL,
`graphpartition` tinyint(3) unsigned DEFAULT NULL,
`replicatedpartition` tinyint(3) unsigned DEFAULT NULL,
`justifiedpartition` tinyint(3) unsigned DEFAULT NULL,
`hashpartition` tinyint(3) unsigned DEFAULT NULL,
`nodeid` int(11) DEFAULT NULL,
`manualpartition` tinyint(3) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `transactionid` (`transactionid`),
KEY `tableid` (`tableid`,`tupleid`),
KEY `nodeid` (`nodeid`)
) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

I'm running:

select * from metarelcloud_transactionlog order by transactionid;

It takes MySQL 6 minutes, but Postgresql is still running after 70
minutes. Is there something like a glaring misconfiguration that I'm
overlooking? Thanks in advance.
--
Yang Zhang
http://www.mit.edu/~y_z/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Frank Heikens
frankheikens@mac.com

#4Richard Broersma
richard.broersma@gmail.com
In reply to: Frank Heikens (#3)
Re: Sorting performance vs. MySQL

On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens <frankheikens@mac.com> wrote:

There is no index on the column transactionid in your PostgreSQL-table, as
there is in your MySQL-table. This explains the difference.

CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
(transactionid);

Does an index help a sort operation in PostgreSQL?

--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug

#5Yang Zhang
yanghatespam@gmail.com
In reply to: Pavel Stehule (#2)
Re: Sorting performance vs. MySQL

On Mon, Feb 22, 2010 at 1:13 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

hello

the speed depends on setting of working_memory. Try to increase a working_memory

set working_memory to '10MB';

It's already at

tpcc=# show work_mem;
work_mem
----------
20000kB
(1 row)

I also wouldn't have imagined an external merge-sort as being very
memory-intensive--wouldn't it only enough buffer space to read 2x and
write 1x in big-enough chunks for mostly-sequential access?
--
Yang Zhang
http://www.mit.edu/~y_z/

#6Yang Zhang
yanghatespam@gmail.com
In reply to: Richard Broersma (#4)
Re: Sorting performance vs. MySQL

On Mon, Feb 22, 2010 at 1:30 PM, Richard Broersma
<richard.broersma@gmail.com> wrote:

On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens <frankheikens@mac.com> wrote:

There is no index on the column transactionid in your PostgreSQL-table, as
there is in your MySQL-table. This explains the difference.

CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
(transactionid);

Does an index help a sort operation in PostgreSQL?

I also share the same doubt. An external merge-sort needs to make
complete passes over the entire dataset, with no index-directed
accesses.
--
Yang Zhang
http://www.mit.edu/~y_z/

#7Frank Heikens
frankheikens@mac.com
In reply to: Richard Broersma (#4)
Re: Sorting performance vs. MySQL

Op 22 feb 2010, om 19:30 heeft Richard Broersma het volgende geschreven:

On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens
<frankheikens@mac.com> wrote:

There is no index on the column transactionid in your PostgreSQL-
table, as
there is in your MySQL-table. This explains the difference.

CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
(transactionid);

Does an index help a sort operation in PostgreSQL?

Yes it does, see the manual: http://www.postgresql.org/docs/8.4/interactive/indexes-ordering.html

Example without index:
"Sort (cost=804.39..829.39 rows=10000 width=4) (actual
time=16.006..17.171 rows=10000 loops=1)"
" Sort Key: bar"
" Sort Method: quicksort Memory: 491kB"
" -> Seq Scan on bla (cost=0.00..140.00 rows=10000 width=4) (actual
time=0.015..2.236 rows=10000 loops=1)"
"Total runtime: 18.098 ms"

Same query with index (btree):
"Index Scan Backward using i_bar on bla (cost=0.00..406.25 rows=10000
width=4) (actual time=0.093..4.408 rows=10000 loops=1)"
"Total runtime: 5.381 ms"

--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug

Regards,
Frank Heikens

#8Alban Hertroys
dalroi@solfertje.student.utwente.nl
In reply to: Yang Zhang (#5)
Re: Sorting performance vs. MySQL

On 22 Feb 2010, at 19:35, Yang Zhang wrote:

I also wouldn't have imagined an external merge-sort as being very

Where's that external merge-sort coming from? Can you show an explain analyze?

If your work-mem is too low there's a good chance that Postgres has to use your disks for sorting, which will obviously be quite slow.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4b82d18510442035320951!

#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Yang Zhang (#1)
Re: Sorting performance vs. MySQL

Yang Zhang escribi�:

I'm running:

select * from metarelcloud_transactionlog order by transactionid;

It takes MySQL 6 minutes, but Postgresql is still running after 70
minutes. Is there something like a glaring misconfiguration that I'm
overlooking? Thanks in advance.

How large is the table, and have you vacuumed it? Did you analyze it?
What Pg version is this?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#10Yang Zhang
yanghatespam@gmail.com
In reply to: Alban Hertroys (#8)
Re: Sorting performance vs. MySQL

On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys
<dalroi@solfertje.student.utwente.nl> wrote:

On 22 Feb 2010, at 19:35, Yang Zhang wrote:

I also wouldn't have imagined an external merge-sort as being very

Where's that external merge-sort coming from? Can you show an explain analyze?

I just assumed that the "Sort" in the EXPLAIN output meant an external
merge-sort, given that the table has over 50 million tuples and is
over 3GB, *and* there is no index on the sort key:

tpcc=# explain select * from metarelcloud_transactionlog order by transactionid;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Sort (cost=8408637.34..8534662.95 rows=50410244 width=17)
Sort Key: a.transactionid
-> Seq Scan on metarelcloud_transactionlog a
(cost=0.00..925543.44 rows=50410244 width=17)
(3 rows)

Anyway, I added the INDEX as suggested by Frank, but it's been 20
minutes and it's still running. With the index, EXPLAIN says:

tpcc=# explain select * from metarelcloud_transactionlog order by transactionid;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Scan using i_transactionid on metarelcloud_transactionlog
(cost=0.00..4453076.81 rows=50410164 width=44)
(1 row)

If your work-mem is too low there's a good chance that Postgres has to use your disks for sorting, which will obviously be quite slow.

Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
much faster (on the order of several minutes).
--
Yang Zhang
http://www.mit.edu/~y_z/

#11Yang Zhang
yanghatespam@gmail.com
In reply to: Alvaro Herrera (#9)
Re: Sorting performance vs. MySQL

On Mon, Feb 22, 2010 at 2:03 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Yang Zhang escribió:

I'm running:

  select * from metarelcloud_transactionlog order by transactionid;

It takes MySQL 6 minutes, but Postgresql is still running after 70
minutes. Is there something like a glaring misconfiguration that I'm
overlooking? Thanks in advance.

How large is the table, and have you vacuumed it?  Did you analyze it?
What Pg version is this?

The table has 50.4M tuples. It's been vacuumed and analyzed. I'm using
PG 8.3.8 on Fedora 10 x86_64.
--
Yang Zhang
http://www.mit.edu/~y_z/

#12Frank Heikens
frankheikens@mac.com
In reply to: Yang Zhang (#10)
Re: Sorting performance vs. MySQL

Op 22 feb 2010, om 20:07 heeft Yang Zhang het volgende geschreven:

On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys
<dalroi@solfertje.student.utwente.nl> wrote:

On 22 Feb 2010, at 19:35, Yang Zhang wrote:

I also wouldn't have imagined an external merge-sort as being very

Where's that external merge-sort coming from? Can you show an
explain analyze?

I just assumed that the "Sort" in the EXPLAIN output meant an external
merge-sort, given that the table has over 50 million tuples and is
over 3GB, *and* there is no index on the sort key:

tpcc=# explain select * from metarelcloud_transactionlog order by
transactionid;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Sort (cost=8408637.34..8534662.95 rows=50410244 width=17)
Sort Key: a.transactionid
-> Seq Scan on metarelcloud_transactionlog a
(cost=0.00..925543.44 rows=50410244 width=17)
(3 rows)

Anyway, I added the INDEX as suggested by Frank, but it's been 20
minutes and it's still running. With the index, EXPLAIN says:

tpcc=# explain select * from metarelcloud_transactionlog order by
transactionid;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Scan using i_transactionid on metarelcloud_transactionlog
(cost=0.00..4453076.81 rows=50410164 width=44)
(1 row)

Use EXPLAIN ANALYZE to see how the query is executed, gives you more
details.

If your work-mem is too low there's a good chance that Postgres has
to use your disks for sorting, which will obviously be quite slow.

Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
much faster (on the order of several minutes).

Make sure your index does fit into memory, what's the size of the index?

--
Yang Zhang
http://www.mit.edu/~y_z/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Frank Heikens

#13Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Yang Zhang (#1)
Re: Sorting performance vs. MySQL

Yang Zhang escribi�:

I have the exact same table of data in both MySQL and Postgresql. In Postgresql:

I just noticed two things:

[snip lots of stuff]

1.

) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

You're doing a comparison to MyISAM.

2.

select * from metarelcloud_transactionlog order by transactionid;

You're reading the whole table.

This is unlikely to fly very far. I suggest you try some query that's
actually going to be used in the real world.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#14Yang Zhang
yanghatespam@gmail.com
In reply to: Frank Heikens (#12)
Re: Sorting performance vs. MySQL

On Mon, Feb 22, 2010 at 2:15 PM, Frank Heikens <frankheikens@mac.com> wrote:

Op 22 feb 2010, om 20:07 heeft Yang Zhang het volgende geschreven:

On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys
<dalroi@solfertje.student.utwente.nl> wrote:

On 22 Feb 2010, at 19:35, Yang Zhang wrote:

I also wouldn't have imagined an external merge-sort as being very

Where's that external merge-sort coming from? Can you show an explain
analyze?

I just assumed that the "Sort" in the EXPLAIN output meant an external
merge-sort, given that the table has over 50 million tuples and is
over 3GB, *and* there is no index on the sort key:

tpcc=# explain select * from metarelcloud_transactionlog order by
transactionid;
                                                 QUERY PLAN

-----------------------------------------------------------------------------------------------------------------
Sort  (cost=8408637.34..8534662.95 rows=50410244 width=17)
 Sort Key: a.transactionid
 ->  Seq Scan on metarelcloud_transactionlog a
(cost=0.00..925543.44 rows=50410244 width=17)
(3 rows)

Anyway, I added the INDEX as suggested by Frank, but it's been 20
minutes and it's still running. With the index, EXPLAIN says:

tpcc=# explain select * from metarelcloud_transactionlog order by
transactionid;
                                                 QUERY PLAN

-----------------------------------------------------------------------------------------------------------------
Index Scan using i_transactionid on metarelcloud_transactionlog
(cost=0.00..4453076.81 rows=50410164 width=44)
(1 row)

Use EXPLAIN ANALYZE to see how the query is executed, gives you more
details.

As mentioned, this would take a very long time to complete running --
I have not yet seen one successful query completion yet. I'd need to
let it run overnight.

If your work-mem is too low there's a good chance that Postgres has to
use your disks for sorting, which will obviously be quite slow.

Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
much faster (on the order of several minutes).

Make sure your index does fit into memory, what's the size of the index?

How might I find out the size and whether it's being fit in memory?
--
Yang Zhang
http://www.mit.edu/~y_z/

#15Yang Zhang
yanghatespam@gmail.com
In reply to: Alvaro Herrera (#13)
Re: Sorting performance vs. MySQL

On Mon, Feb 22, 2010 at 2:27 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Yang Zhang escribió:

I have the exact same table of data in both MySQL and Postgresql. In Postgresql:

I just noticed two things:

[snip lots of stuff]

1.

) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

You're doing a comparison to MyISAM.

We've actually been using innodb as well; it exhibits similar
execution times to MyISAM.

2.

  select * from metarelcloud_transactionlog order by transactionid;

You're reading the whole table.

This is unlikely to fly very far.  I suggest you try some query that's
actually going to be used in the real world.

This isn't some microbenchmark. This is part of our actual analytical
application. We're running large-scale graph partitioning algorithms.
--
Yang Zhang
http://www.mit.edu/~y_z/

#16Scott Marlowe
scott.marlowe@gmail.com
In reply to: Yang Zhang (#15)
Re: Sorting performance vs. MySQL

On Mon, Feb 22, 2010 at 12:30 PM, Yang Zhang <yanghatespam@gmail.com> wrote:

This isn't some microbenchmark. This is part of our actual analytical
application. We're running large-scale graph partitioning algorithms.

It's important to see how it runs if you can fit more / most of the
data set into memory by cranking up work_mem to something really big
(like a gigabyte or two) and if the query planner can switch to some
sort of hash algorithm.

Also, can you cluster the table on transactionid ?

#17Frank Heikens
frankheikens@mac.com
In reply to: Yang Zhang (#14)
Re: Sorting performance vs. MySQL

Op 22 feb 2010, om 20:28 heeft Yang Zhang het volgende geschreven:

If your work-mem is too low there's a good chance that Postgres
has to
use your disks for sorting, which will obviously be quite slow.

Relative to the non-terminating 80-minute-so-far sort, Unix sort
runs
much faster (on the order of several minutes).

Make sure your index does fit into memory, what's the size of the
index?

How might I find out the size and whether it's being fit in memory?

SELECT pg_size_pretty(pg_relation_size('i_transactionid'));

--
Yang Zhang
http://www.mit.edu/~y_z/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Frank Heikens

#18Yang Zhang
yanghatespam@gmail.com
In reply to: Scott Marlowe (#16)
Re: Sorting performance vs. MySQL

On Mon, Feb 22, 2010 at 2:39 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

On Mon, Feb 22, 2010 at 12:30 PM, Yang Zhang <yanghatespam@gmail.com> wrote:

This isn't some microbenchmark. This is part of our actual analytical
application. We're running large-scale graph partitioning algorithms.

It's important to see how it runs if you can fit more / most of the
data set into memory by cranking up work_mem to something really big
(like a gigabyte or two) and if the query planner can switch to some
sort of hash algorithm.

We're actually using a very small dataset right now. Being bounded by
memory capacity is not a scalable approach for our application.

Also, can you cluster the table on transactionid ?

We can, but that's not really addressing the core issue, which matters
to us since the sort itself is only for performing a self merge join
on transactionid, and the *very next step* is a group by a.tableid,
a.tupleid, b.tableid, b.tupleid (i.e. requiring another sort for the
group-agg).
--
Yang Zhang
http://www.mit.edu/~y_z/

#19Scott Marlowe
scott.marlowe@gmail.com
In reply to: Yang Zhang (#1)
Re: Sorting performance vs. MySQL

On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang <yanghatespam@gmail.com> wrote:

I have the exact same table of data in both MySQL and Postgresql. In Postgresql:

Just wondering, are these on the same exact machine?

#20Yang Zhang
yanghatespam@gmail.com
In reply to: Scott Marlowe (#19)
Re: Sorting performance vs. MySQL

On Mon, Feb 22, 2010 at 2:52 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang <yanghatespam@gmail.com> wrote:

I have the exact same table of data in both MySQL and Postgresql. In Postgresql:

Just wondering, are these on the same exact machine?

Yes, on the same disk.
--
Yang Zhang
http://www.mit.edu/~y_z/

#21Scott Marlowe
scott.marlowe@gmail.com
In reply to: Yang Zhang (#18)
#22Yang Zhang
yanghatespam@gmail.com
In reply to: Frank Heikens (#17)
#23Yeb Havinga
yebhavinga@gmail.com
In reply to: Scott Marlowe (#19)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Yang Zhang (#5)
#25Scott Marlowe
scott.marlowe@gmail.com
In reply to: Yang Zhang (#20)
#26Yang Zhang
yanghatespam@gmail.com
In reply to: Tom Lane (#24)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Yeb Havinga (#23)
#28Igor Neyman
ineyman@perceptron.com
In reply to: Yang Zhang (#6)
#29Yang Zhang
yanghatespam@gmail.com
In reply to: Igor Neyman (#28)
#30Scott Marlowe
scott.marlowe@gmail.com
In reply to: Yang Zhang (#29)
#31Scott Marlowe
scott.marlowe@gmail.com
In reply to: Yang Zhang (#29)
#32Alex Hunsaker
badalex@gmail.com
In reply to: Yang Zhang (#1)
#33Yang Zhang
yanghatespam@gmail.com
In reply to: Scott Marlowe (#25)
#34Yang Zhang
yanghatespam@gmail.com
In reply to: Alex Hunsaker (#32)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Yang Zhang (#34)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Yang Zhang (#33)
#37Scott Marlowe
scott.marlowe@gmail.com
In reply to: Yang Zhang (#33)
#38Alex Hunsaker
badalex@gmail.com
In reply to: Yang Zhang (#33)
#39Yang Zhang
yanghatespam@gmail.com
In reply to: Scott Marlowe (#37)
#40Bruce Momjian
bruce@momjian.us
In reply to: Scott Marlowe (#37)
#41John Gage
jsmgage@numericable.fr
In reply to: Yang Zhang (#33)
#42Yeb Havinga
yebhavinga@gmail.com
In reply to: Bruce Momjian (#40)
#43Bruce Momjian
bruce@momjian.us
In reply to: Yeb Havinga (#42)
#44Alex Hunsaker
badalex@gmail.com
In reply to: Yang Zhang (#39)
#45Baron Schwartz
baron@xaprb.com
In reply to: Yang Zhang (#33)