Large databases, performance

Started by Shridhar Daithankar · over 23 years ago · 70 messages · hackers, general
#1 Shridhar Daithankar
shridhar_daithankar@persistent.co.in
hackers, general

Hi,

Today we concluded our tests for database performance. Attached are the results
and the schema, for those who missed the earlier discussion on this.

We have (almost) decided that we will partition the data across machines. The
scheme is: after every short interval, a burst of data will be entered into a
new table in the database, then indexed and vacuumed. The tables will use
inheritance, so that a query on the base table fetches results from all the
children. The application has to consolidate all the data on a per-node basis.
If the database is not PostgreSQL, the application has to consolidate data
across partitions as well.

Now we need to investigate whether a select on the base table that includes the
children would use the indexes created on the child tables.

It's estimated that when the entire data set is gathered, the total number of
child tables will be around 1K-1.1K across all machines.

This is based on the average rate of data insertion, i.e. 5K records/sec, and
the total data size, estimated at 9 billion rows max, i.e. an estimated
database size of 900 GB. Obviously it's impossible to keep the insertion rate
high on an indexed table as the data grows, so partitioning/inheritance looks
like the better approach.
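
A minimal sketch of the inheritance scheme described above, using the indexed fields from the posted schema (the table names here are hypothetical):

```sql
-- Parent table; in this scheme it holds no data itself.
CREATE TABLE calls (
    esn      char(10)  NOT NULL,
    min      char(10)  NOT NULL,
    datetime timestamp NOT NULL
    -- ... remaining fields from the 'schema' sheet ...
);

-- One child table per burst of data, created at load time.
CREATE TABLE calls_part_001 () INHERITS (calls);

-- Index each child after its bulk load.
CREATE INDEX calls_part_001_idx ON calls_part_001 (esn, min, datetime);

-- A select on the base table automatically scans all children; the open
-- question is whether it uses each child's composite index.
SELECT * FROM calls WHERE esn = '1234567890' AND min = '0987654321';
```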

PostgreSQL is not the final winner as yet; MySQL is in close range. I will keep
you guys posted about the results.

Let me know if you have any comments..

Bye
Shridhar

--
Price's Advice: It's all a game -- play it to have fun.

#2 Charles H. Woloszynski
chw@clearmetrix.com
In reply to: Shridhar Daithankar (#1)
hackers, general
Re: Large databases, performance

Can you comment on the tools you are using to do the insertions (Perl,
Java?), the distribution of data (all random, all static), and the
transaction scope (all inserts in one transaction, each insert as a
single transaction, or some group of inserts per transaction)?
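
For illustration, the transaction scopes asked about look like this in SQL (the `calls` table name is hypothetical):

```sql
-- (a) Each insert as its own transaction: one implicit commit per statement.
INSERT INTO calls (esn, min, datetime)
VALUES ('1234567890', '0987654321', now());

-- (b) A group of inserts per transaction: one commit amortized over the batch.
BEGIN;
INSERT INTO calls (esn, min, datetime)
VALUES ('1111111111', '2222222222', now());
INSERT INTO calls (esn, min, datetime)
VALUES ('3333333333', '4444444444', now());
-- ... more rows ...
COMMIT;
```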

I'd be curious what happens when you submit more queries than you have
processors (you had four concurrent queries and four CPUs), if you care
to run any additional tests. Also, I'd report the query time both in
absolute terms (like you did) and as "time / number of concurrent queries".
This will give you a sense of how the system scales as the workload
increases. Personally I am more concerned about this aspect than the
load time, since I am going to guess that this is where all the time is
spent.

Was the original posting on GENERAL or HACKERS? Is this moving to
PERFORMANCE for follow-up? I'd like to follow this discussion and want
to know if I should join another group.

Thanks,

Charlie

P.S. Anyone want to comment on their expectations for 'commercial'
databases handling this load? I know that we cannot speak about
specific performance metrics on some products (licensing restrictions),
but I'd be curious if folks have seen some of the databases out there
handle these dataset sizes and respond reasonably.

Shridhar Daithankar wrote:

<snip>

------------------------------------------------------------------------

Machine
Compaq ProLiant ML 530 server
Intel Xeon 2.4 GHz processor x 4
4 GB RAM, 5 x 72.8 GB SCSI HDD
RAID 0 (striping) hardware setup, Mandrake Linux 9.0
Cost: $13,500 ($1,350 for each additional 72 GB HDD)

Performance parameter         MySQL 3.23.52       MySQL 3.23.52       PostgreSQL 7.2.2
                              without InnoDB      with InnoDB         (built-in
                              (no transaction     (transaction        transaction
                              support)            support)            support)

Complete data set

Inserts + building a composite index
(40 GB data; 432,000,000 tuples of about 100 bytes each,
schema on the 'schema' sheet; composite index on 3 fields:
esn, min, datetime)
                              3738 secs           18720 secs          20628 secs

Load speed                    115570 tuples/sec   23076 tuples/sec    20942 tuples/sec

Database size on disk         48 GB               87 GB               111 GB

Average per partition

Inserts + building a composite index
(300 MB data; 3,000,000 tuples of about 100 bytes each,
same schema and composite index)
                              28 secs             130 secs            150 secs

Select query                  7 secs              7 secs              6 secs
(equality match on 2 fields, esn and min;
4 concurrent queries running)

Database size on disk         341 MB              619 MB              788 MB

------------------------------------------------------------------------

Field Name Field Type Nullable Indexed
type int no no
esn char (10) no yes
min char (10) no yes
datetime timestamp no yes
opc0 char (3) no no
opc1 char (3) no no
opc2 char (3) no no
dpc0 char (3) no no
dpc1 char (3) no no
dpc2 char (3) no no
npa char (3) no no
nxx char (3) no no
rest char (4) no no
field0 int yes no
field1 char (4) yes no
field2 int yes no
field3 char (4) yes no
field4 int yes no
field5 char (4) yes no
field6 int yes no
field7 char (4) yes no
field8 int yes no
field9 char (4) yes no

------------------------------------------------------------------------

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

--

Charles H. Woloszynski

ClearMetrix, Inc.
115 Research Drive
Bethlehem, PA 18015

tel: 610-419-2210 x400
fax: 240-371-3256
web: www.clearmetrix.com

#3 Nigel J. Andrews
nandrews@investsystems.co.uk
In reply to: Shridhar Daithankar (#1)
hackers, general
Re: Large databases, performance

Shridhar,

It's one hell of a DB you're building. I'm sure I'm not the only one interested,
so to satisfy those of us who are nosey: can you say what the application is?

I'm sure we'll all understand if it's not possible for you to mention such
information.

--
Nigel J. Andrews

On Thu, 3 Oct 2002, Shridhar Daithankar wrote:

<snip>

#4 Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Nigel J. Andrews (#3)
hackers, general
Re: Large databases, performance

On 3 Oct 2002 at 13:56, Nigel J. Andrews wrote:

It's one hell of a DB you're building. I'm sure I'm not the only one interested,
so to satisfy those of us who are nosey: can you say what the application is?

I'm sure we'll all understand if it's not possible for you to mention such
information.

Well, I can't tell everything, but some things I can..

1) This is a system that does not have online capability yet; this is an
attempt to provide one.

2) The goal is to avoid costs like licensing Oracle. I am sure this would make
a great example for OSDB advocacy, whichever database wins..

3) The database size estimates I put earlier, i.e. 9 billion tuples/900 GB of
data, are for a fixed window. The data is generated from some real-time
systems; you can imagine the rate.

4) Furthermore, there are timing restrictions attached to it: 5K inserts/sec,
and 4800 queries per hour with a response time of 10 sec each. It's this aspect
that has forced us into partitioning..

And contrary to my earlier information, this is going to be a live system
rather than a backup one.. An even better win for PostgreSQL.. I hope it makes it.

And BTW, all these results were on reiserfs. We didn't find much difference
in write performance between them, so we stuck with reiserfs. And of course we
got the latest hot-shot Mandrake 9 with kernel 2.4.19-16, which really made a
difference over RHL 7.2..

Bye
Shridhar

--
QOTD: "Do you smell something burning or is it me?" -- Joan of Arc

#5 Charles H. Woloszynski
chw@clearmetrix.com
In reply to: Shridhar Daithankar (#1)
hackers, general
Re: Large databases, performance

Forgive my ignorance, but what about 2.4.19-16 makes it that much faster? Are
we talking about a 2x improvement for your tests? We are currently on
2.4.9 and looking at the performance and wondering... so any comments
are appreciated.

Charlie

Shridhar Daithankar wrote:

And BTW, all these results were on reiserfs. We didn't find much difference
in write performance between them, so we stuck with reiserfs. And of course we
got the latest hot-shot Mandrake 9 with kernel 2.4.19-16, which really made a
difference over RHL 7.2..

Bye
Shridhar

--
QOTD: "Do you smell something burning or is it me?" -- Joan of Arc


--

Charles H. Woloszynski

ClearMetrix, Inc.
115 Research Drive
Bethlehem, PA 18015

tel: 610-419-2210 x400
fax: 240-371-3256
web: www.clearmetrix.com

#6 Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Charles H. Woloszynski (#5)
hackers, general
Re: Large databases, performance

On 3 Oct 2002 at 10:26, Charles H. Woloszynski wrote:

Forgive my ignorance, but what about 2.4.19-16 makes it that much faster? Are
we talking about a 2x improvement for your tests? We are currently on
2.4.9 and looking at the performance and wondering... so any comments
are appreciated.

Well, for one thing, 2.4.19 contains the backported O(1) scheduler patch, which
improves SMP performance by heaps, since the run queue is per-CPU rather than
one per system. I don't think any system routinely runs thousands of processes
unless it's a web/ftp/mail server; in that case improved scheduling would help
as well..

Besides, there were major VM rewrites/changes after 2.4.10 which corrected
almost all the major VM fiascos on Linux. For anything VM-intensive it's
recommended that you run at least 2.4.17.

I would say it's worth going for it.

Bye
Shridhar

--
Sturgeon's Law: 90% of everything is crud.

#7 Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Shridhar Daithankar (#4)
hackers, general
Re: Large databases, performance

On 3 Oct 2002 at 19:33, Shridhar Daithankar wrote:

On 3 Oct 2002 at 13:56, Nigel J. Andrews wrote:

It's one hell of a DB you're building. I'm sure I'm not the only one interested,
so to satisfy those of us who are nosey: can you say what the application is?

I'm sure we'll all understand if it's not possible for you to mention such
information.

Well, I can't tell everything, but some things I can..

1) This is a system that does not have online capability yet. This is an
attempt to provide one.

2) The goal is to avoid costs like licensing Oracle. I am sure this would make
a great example for OSDB advocacy, whichever database wins..

3) The database size estimates I put earlier, i.e. 9 billion tuples/900 GB of
data, are for a fixed window. The data is generated from some real-time
systems; you can imagine the rate.

Read that as a fixed time window..

4) Furthermore, there are timing restrictions attached to it: 5K inserts/sec,
and 4800 queries per hour with a response time of 10 sec each. It's this aspect
that has forced us into partitioning..

And contrary to my earlier information, this is going to be a live system
rather than a backup one.. An even better win for PostgreSQL.. I hope it makes it.

And BTW, all these results were on reiserfs. We didn't find much difference
in write performance between them, so we stuck with reiserfs. And of course we
got the latest hot-shot Mandrake 9 with kernel 2.4.19-16, which really made a
difference over RHL 7.2..

Well, we were comparing ext3 vs. reiserfs. I don't remember the journalling
mode of ext3, but we did a 10 GB write test. Besides, converting the RAID from
RAID-5 to RAID-0 might have something to do with it.

There was a discussion on hackers some time back about which filesystem is
better; I hope this adds something to it..

Bye
Shridhar

--
"What terrible way to die." "There are no good ways." -- Sulu and Kirk, "That
Which Survives", stardate unknown

#8 Robert Treat
xzilla@users.sourceforge.net
In reply to: Shridhar Daithankar (#1)
hackers, general
Re: Large databases, performance

NOTE: Setting follow up to the performance list

Funny that the status quo seems to be: if you need fast selects on data
that has few inserts, pick MySQL; otherwise, if you have a lot of
inserts and don't need super-fast selects, go with PostgreSQL. Yet your
data seems to cut directly against this.

I'm curious: did you happen to run the select tests while also running
the insert tests? IIRC the older MySQL versions have to lock the table
when doing an insert, so select performance goes in the dumper in that
scenario; perhaps that's not an issue with 3.23.52?

It also seems like the vacuum after each insert is unnecessary, unless
you're also deleting/updating data behind it. Perhaps just running an
ANALYZE on the table would suffice while reducing overhead.
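
The suggested alternative, sketched against a hypothetical partition table. After a pure bulk load into a fresh table there are no dead tuples to reclaim, so updating the planner's statistics may be all that's needed:

```sql
-- Updates planner statistics only (available as a standalone command in 7.2):
ANALYZE calls_part_001;

-- The heavier combined form, which also scans for dead tuples:
VACUUM ANALYZE calls_part_001;
```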

Robert Treat

On Thu, 2002-10-03 at 08:36, Shridhar Daithankar wrote:

<snip>

#9 Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Charles H. Woloszynski (#2)
hackers, general
Re: Large databases, performance

On 3 Oct 2002 at 8:54, Charles H. Woloszynski wrote:

Can you comment on the tools you are using to do the insertions (Perl,
Java?), the distribution of data (all random, all static), and the
transaction scope (all inserts in one transaction, each insert as a
single transaction, or some group of inserts per transaction)?

Most probably it's all inserts in one transaction, spread almost uniformly over
around 15-20 tables. Of course there will be a bunch of transactions..

I'd be curious what happens when you submit more queries than you have
processors (you had four concurrent queries and four CPUs), if you care
to run any additional tests. Also, I'd report the query time both in
absolute terms (like you did) and as "time / number of concurrent queries".
This will give you a sense of how the system scales as the workload
increases. Personally I am more concerned about this aspect than the
load time, since I am going to guess that this is where all the time is
spent.

I don't think so, because we plan to allocate enough shared buffers that the
indexes, if not the data, would almost fit in RAM. Besides, the number of
tuples expected per query is not large. So more concurrent queries are not
going to hog anything other than CPU power at most.

Our major concern remains load time, as data is generated in real time and is
expected in the database within a specified time period. We need indexes for
querying, and inserting into an indexed table is one hell of a job. We did
attempt inserting 8 GB of data into an indexed table; it took almost 20 hours
at 1K tuples per second on average.. Though impressive, it's not acceptable for
that load..

Was the original posting on GENERAL or HACKERS? Is this moving to
PERFORMANCE for follow-up? I'd like to follow this discussion and want
to know if I should join another group.

Shall I subscribe to performance? What's the exact list name? Benchmarks? I
don't see anything like a performance mailing list on this page..
http://developer.postgresql.org/mailsub.php?devlp

P.S. Anyone want to comment on their expectations for 'commercial'
databases handling this load? I know that we cannot speak about
specific performance metrics on some products (licensing restrictions),
but I'd be curious if folks have seen some of the databases out there
handle these dataset sizes and respond reasonably.

Well, if something handles this kind of data on a single machine and costs
under USD 20K for the entire setup, I would be willing to recommend it to the
client..

BTW, we are trying the same test on HP-UX. I hope we get some better figures on
64-bit machines..

Bye
Shridhar

--
Clarke's Conclusion: Never let your sense of morals interfere with doing the
right thing.

#10 Justin Clift
justin@postgresql.org
In reply to: Shridhar Daithankar (#9)
hackers, general
Re: [HACKERS] Large databases, performance

Shridhar Daithankar wrote:
<snip>

Was the original posting on GENERAL or HACKERS? Is this moving to
PERFORMANCE for follow-up? I'd like to follow this discussion and want
to know if I should join another group.

Shall I subscribe to performance? What's the exact list name? Benchmarks? I
don't see anything like a performance mailing list on this page..
http://developer.postgresql.org/mailsub.php?devlp

It's a fairly new mailing list. :)

pgsql-performance@postgresql.org

Easiest way to subscribe is by emailing majordomo@postgresql.org with:

subscribe pgsql-performance

as the message body.

:-)

Regards and best wishes,

Justin Clift

<snip>

Bye
Shridhar

--
Clarke's Conclusion: Never let your sense of morals interfere with doing the
right thing.


--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi

#11 Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Robert Treat (#8)
hackers, general
Re: Large databases, performance

On 3 Oct 2002 at 11:57, Robert Treat wrote:

NOTE: Setting follow up to the performance list

Funny that the status quo seems to be if you need fast selects on data
that has few inserts to pick mysql, otherwise if you have a lot of
inserts and don't need super fast selects go with PostgreSQL; yet your
data seems to cut directly against this.

Well, a couple of things..

The number of inserts isn't small: it's 5,000/sec required in the field.
Secondly, I don't know really, but PostgreSQL seems to do pretty fine on
parallel selects. If we use MySQL with transaction support, the numbers are
really close..

Maybe it's time to rewrite the famous myth that PostgreSQL is slow. When
properly tuned, or given enough headroom, it's almost as fast as MySQL..

I'm curious: did you happen to run the select tests while also running
the insert tests? IIRC the older MySQL versions have to lock the table
when doing an insert, so select performance goes in the dumper in that
scenario; perhaps that's not an issue with 3.23.52?

IMO even if it locks tables, that shouldn't affect select performance. It would
be fun to watch when we insert multiple chunks of data and fire queries
concurrently; I would be surprised if MySQL starts slowing down..

It also seems like the vacuum after each insert is unnecessary, unless
you're also deleting/updating data behind it. Perhaps just running an
ANALYZE on the table would suffice while reducing overhead.

I believe that was vacuum analyze only. But it still takes a lot of time. The
good thing is it's not blocking..

Anyway, I don't think such frequent vacuums are going to convince the planner
to choose an index scan over a sequential scan. I am sure it's already
convinced..
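
Whether the planner is in fact convinced can be checked with EXPLAIN (table and index names here are hypothetical):

```sql
EXPLAIN SELECT * FROM calls_part_001
 WHERE esn = '1234567890' AND min = '0987654321';
-- An "Index Scan using ..." node in the output confirms the index is chosen;
-- a "Seq Scan" means the planner still prefers reading the whole table.
```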

Regards,
Shridhar

-----------------------------------------------------------
Shridhar Daithankar
LIMS CPE Team Member, PSPL.
mailto:shridhar_daithankar@persistent.co.in
Phone:- +91-20-5678900 Extn.270
Fax :- +91-20-5678901
-----------------------------------------------------------

#12 Greg Copeland
greg@CopelandConsulting.Net
In reply to: Shridhar Daithankar (#7)
hackers, general
Re: [HACKERS] Large databases, performance

On Thu, 2002-10-03 at 10:56, Shridhar Daithankar wrote:

Well, we were comparing ext3 vs. reiserfs. I don't remember the journalling
mode of ext3, but we did a 10 GB write test. Besides, converting the RAID from
RAID-5 to RAID-0 might have something to do with it.

There was a discussion on hackers some time back about which filesystem is
better; I hope this adds something to it..

Hmm. Reiserfs' claim to fame is its low latency with many, many small
files, and that it's journaled. I've never seen anyone describe it as an
extremely fast file system in a general computing context, nor have I seen
anyone even hint at it as a file system for use in heavy-I/O databases.
Reiserfs is popular with news and squid cache servers because it's almost an
ideal fit there: tons of small files or directories contained within a single
directory. As such, I'm very surprised that reiserfs is even in the running
for your comparison.

Might I point you toward XFS, JFS, or ext3? As I understand it, XFS
and JFS are going to be your preferred file systems for this type of
application, with XFS in the lead, as its tool suite is very rich and
robust. I'm actually lacking JFS experience, but from what I've read
it's a notch or two behind XFS in robustness (assuming we are talking
Linux here). Feel free to read and play to find out for yourself. I'd
recommend that you start playing with XFS to see how the others
compare. After all, XFS's specific claim to fame is high throughput with
low latency on large and very large files. Furthermore, it even has
a real-time mechanism that you can play with further to see how it
affects your throughput and/or latencies.

Greg

#13 Robert Treat
xzilla@users.sourceforge.net
In reply to: Shridhar Daithankar (#11)
hackers, general
Re: Large databases, performance

On Thu, 2002-10-03 at 12:17, Shridhar Daithankar wrote:

On 3 Oct 2002 at 11:57, Robert Treat wrote:
Maybe it's time to rewrite the famous myth that PostgreSQL is slow.

That myth was disproven long ago; it just takes a while for
everyone to catch on ;-)

When properly tuned, or given enough headroom, it's almost as fast as MySQL..

I'm curious: did you happen to run the select tests while also running
the insert tests? IIRC the older MySQL versions have to lock the table
when doing an insert, so select performance goes in the dumper in that
scenario; perhaps that's not an issue with 3.23.52?

IMO even if it locks tables, that shouldn't affect select performance. It would
be fun to watch when we insert multiple chunks of data and fire queries
concurrently; I would be surprised if MySQL starts slowing down..

Hmm... it's been a while since I dug into MySQL internals, but IIRC once the
table was locked, you had to wait for the insert to complete so the
table would be unlocked and the select could go through. (Maybe this is
a myth that I need to get clued in on.)

It also seems like the vacuum after each insert is unnecessary, unless
you're also deleting/updating data behind it. Perhaps just running an
ANALYZE on the table would suffice while reducing overhead.

I believe that was vacuum analyze only. But it still takes a lot of time. The
good thing is it's not blocking..

Anyway, I don't think such frequent vacuums are going to convince the planner
to choose an index scan over a sequential scan. I am sure it's already
convinced..

My thinking was that if you're just doing inserts, you need to update the
statistics but don't need to check for unused tuples.

Robert Treat

#14 Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Greg Copeland (#12)
hackers, general
Re: [HACKERS] Large databases, performance

On 3 Oct 2002 at 11:23, Greg Copeland wrote:

On Thu, 2002-10-03 at 10:56, Shridhar Daithankar wrote:

<snip>

I would try that, once we are through with the tests at hand..

Bye
Shridhar

--
"The combination of a number of things to make existence worthwhile." "Yes,
the philosophy of 'none,' meaning 'all.'" -- Spock and Lincoln, "The Savage
Curtain", stardate 5906.4

#15 Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Robert Treat (#13)
hackers, general
Re: Large databases, performance

On 3 Oct 2002 at 12:26, Robert Treat wrote:

On Thu, 2002-10-03 at 12:17, Shridhar Daithankar wrote:

On 3 Oct 2002 at 11:57, Robert Treat wrote:
Maybe it's time to rewrite the famous myth that PostgreSQL is slow.

That myth was disproven long ago; it just takes a while for
everyone to catch on ;-)

:-)

Hmm... it's been a while since I dug into MySQL internals, but IIRC once the
table was locked, you had to wait for the insert to complete so the
table would be unlocked and the select could go through. (Maybe this is
a myth that I need to get clued in on.)

If that turns out to be true, I guess MySQL will nose-dive out of the window..
Maybe it's time to run a test that's nearer to real-world expectations,
especially in terms of concurrency..

I don't think that will be an issue with MySQL with transaction support. The
vanilla one might suffer.. not the other one.. at least theoretically..

My thinking was that if you're just doing inserts, you need to update the
statistics but don't need to check for unused tuples.

Is there any other way of doing that besides vacuum analyze? I thought that was
the only way..

Bye
Shridhar

--
"Even more amazing was the realization that God has Internet access. Iwonder
if He has a full newsfeed?"(By Matt Welsh)

#16 Manfred Koizar
mkoi-pg@aon.at
In reply to: Shridhar Daithankar (#1)
hackers, general
Re: Large databases, performance

On Thu, 03 Oct 2002 18:06:10 +0530, "Shridhar Daithankar"
<shridhar_daithankar@persistent.co.in> wrote:

Machine
Compaq ProLiant ML 530 server
Intel Xeon 2.4 GHz processor x 4
4 GB RAM, 5 x 72.8 GB SCSI HDD
RAID 0 (striping) hardware setup, Mandrake Linux 9.0

Shridhar,

forgive me if I ask what has been said before: did you run at 100%
CPU, or was I/O bandwidth your limit? And is the answer the same for
all three configurations?

Servus
Manfred

#17 Hans-Jürgen Schönig
postgres@cybertec.at
In reply to: Shridhar Daithankar (#1)
hackers, general
Re: [HACKERS] Large databases, performance

Shridhar Daithankar wrote:

On 3 Oct 2002 at 11:57, Robert Treat wrote:

NOTE: Setting follow up to the performance list

Funny that the status quo seems to be: if you need fast selects on data
that has few inserts, pick MySQL; otherwise, if you have a lot of
inserts and don't need super-fast selects, go with PostgreSQL. Yet your
data seems to cut directly against this.

Well, a couple of things..

The number of inserts isn't small: it's 5,000/sec required in the field.
Secondly, I don't know really, but PostgreSQL seems to do pretty fine on
parallel selects. If we use MySQL with transaction support, the numbers are
really close..

Maybe it's time to rewrite the famous myth that PostgreSQL is slow. When
properly tuned, or given enough headroom, it's almost as fast as MySQL..

In the case of concurrent transactions, MySQL does not do as well due to
very bad locking behaviour. PostgreSQL is far better because it does row-level
locking instead of table locking.
If you have many concurrent transactions, MySQL performs a sort of
"self-denial-of-service". I'd choose PostgreSQL in order to make sure
that the database does not block.

I'm curious: did you happen to run the select tests while also running
the insert tests? IIRC the older MySQL versions have to lock the table
when doing an insert, so select performance goes in the dumper in that
scenario; perhaps that's not an issue with 3.23.52?

IMO even if it locks tables, that shouldn't affect select performance. It would
be fun to watch when we insert multiple chunks of data and fire queries
concurrently; I would be surprised if MySQL starts slowing down..

In the case of concurrent SELECTs and INSERT/UPDATE/DELETE operations
MySQL will slow down for sure. The more concurrent transactions you have
the worse MySQL will be.
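The table-level vs. row-level locking difference described here can be sketched in miniature. This is a hypothetical Python illustration, not MySQL or PostgreSQL code: one shared lock standing in for a table lock serializes all writers, while per-row locks let writers touching different rows proceed in parallel.

```python
# Hypothetical sketch: table-level locking (one lock for the whole table)
# vs. row-level locking (one lock per row) under concurrent writers.
import threading
import time

ROWS = 8
WRITERS = 8
WORK = 0.02  # simulated per-row write time in seconds

def run(locks, pick_lock):
    """Each writer updates one distinct row; return total elapsed time."""
    def writer(row):
        with pick_lock(locks, row):
            time.sleep(WORK)  # hold the lock while "writing"
    threads = [threading.Thread(target=writer, args=(r,)) for r in range(WRITERS)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

# Table-level: every writer contends for the same lock -> fully serialized.
table_time = run(threading.Lock(), lambda lock, row: lock)

# Row-level: writers on different rows take different locks -> parallel.
row_locks = [threading.Lock() for _ in range(ROWS)]
row_time = run(row_locks, lambda locks, row: locks[row])

print(f"table-level: {table_time:.3f}s, row-level: {row_time:.3f}s")
```

With 8 writers the table-level run takes roughly 8x the per-row work, while the row-level run finishes in about one unit of work, which is the effect Hans describes for concurrent transactions.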

It also seems like the vacuum after each insert is unnecessary, unless
your also deleting/updating data behind it. Perhaps just running an
ANALYZE on the table would suffice while reducing overhead.

I believe that was vacuum analyze only. But it still takes a lot of time. The good
thing is that it's not blocking..

Anyway, I don't think such frequent vacuums are going to convince the planner to
choose an index scan over a sequential scan. I am sure it's already convinced..

PostgreSQL allows you to improve execution plans by giving the planner a
hint.
In addition to that: if you need REAL performance and if you are running
similar queries consider using SPI.

Also: 7.3 will support PREPARE/EXECUTE.
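The prepare-once/execute-many idea behind PREPARE/EXECUTE works roughly as sketched below. The PostgreSQL syntax in the comment reflects the SQL-level feature; the runnable part uses SQLite from Python's stdlib purely as a stand-in, since executemany() similarly reuses one compiled statement across parameter sets.

```python
# Sketch of the prepare-once / execute-many idea. In PostgreSQL 7.3 SQL
# it would look roughly like:
#   PREPARE ins (int, text) AS INSERT INTO t VALUES ($1, $2);
#   EXECUTE ins (1, 'a');
# Here SQLite's executemany() parses the INSERT once and binds new
# parameters per row, avoiding a re-parse/re-plan on every insert.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")

rows = [(i, f"row-{i}") for i in range(1000)]
# The SQL string is compiled once; only the parameters change per row.
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT count(*) FROM t").fetchone()[0]
print(count)  # 1000
```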

If you are running MySQL you will not be able to add features to the
database easily.
In the case of PostgreSQL you have a broad range of simple interfaces
which make many things pretty simple (eg. optimized data types in < 50
lines of C code).

PostgreSQL is the database of the future, and it gives you a lot of room
for tuning.
MySQL is a simple frontend to a filesystem, and it is fast as long as you
are doing SELECT 1+1 operations.

Also: Keep in mind that PostgreSQL has a wonderful core team. MySQL is
built on Monty Widenius, and the core team = Monty.
Also: PostgreSQL = ANSI compliant, MySQL = Monty compliant

In the past few years I have seen that there is no database system which
can beat PostgreSQL's flexibility and stability.
I am familiar with various database systems but believe: PostgreSQL is
the best choice.

Hans

Regards,
Shridhar

-----------------------------------------------------------
Shridhar Daithankar
LIMS CPE Team Member, PSPL.
mailto:shridhar_daithankar@persistent.co.in
Phone:- +91-20-5678900 Extn.270
Fax :- +91-20-5678901
-----------------------------------------------------------

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

--
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at <http://www.postgresql.at>, cluster.postgresql.at
<http://cluster.postgresql.at>, www.cybertec.at
<http://www.cybertec.at>, kernel.cybertec.at <http://kernel.cybertec.at>

#18Manfred Koizar
mkoi-pg@aon.at
In reply to: Shridhar Daithankar (#11)
hackersgeneral
Re: [HACKERS] Large databases, performance

On Thu, 03 Oct 2002 21:47:03 +0530, "Shridhar Daithankar"
<shridhar_daithankar@persistent.co.in> wrote:

I believe that was vacuum analyze only.

Well there is

VACUUM [tablename];

and there is

ANALYZE [tablename];

And

VACUUM ANALYZE [tablename];

is VACUUM followed by ANALYZE.
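The distinction between the two commands can be tried out directly. As a stand-in, SQLite (via Python's stdlib) also accepts VACUUM and ANALYZE as separate statements, though its semantics differ from PostgreSQL's; this is an illustration of the command split, not of PostgreSQL internals.

```python
# Sketch: VACUUM (reclaim dead space) vs ANALYZE (refresh planner
# statistics) issued as separate commands. PostgreSQL's VACUUM ANALYZE
# combines both steps; SQLite is used here only because it is in the
# Python stdlib and understands analogous commands.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO t (val) VALUES (?)", [("x",)] * 1000)
conn.execute("DELETE FROM t WHERE id % 2 = 0")  # leave dead space behind
conn.commit()

conn.execute("VACUUM")   # rewrites the database, reclaiming dead space
conn.execute("ANALYZE")  # gathers statistics for the query planner

remaining = conn.execute("SELECT count(*) FROM t").fetchone()[0]
print(remaining)  # 500
```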

Servus
Manfred

#19Ron Johnson
ron.l.johnson@cox.net
In reply to: Shridhar Daithankar (#11)
hackersgeneral
Re: Large databases, performance

On Thu, 2002-10-03 at 11:17, Shridhar Daithankar wrote:

On 3 Oct 2002 at 11:57, Robert Treat wrote:

[snip]

I'm curious, did you happen to run the select tests while also running
the insert tests? IIRC the older mysql versions have to lock the table
when doing the insert, so select performance goes in the dumper in that
scenario, perhaps that's not an issue with 3.23.52?

IMO, even if it locks tables, that shouldn't affect select performance. It would
be fun to watch when we insert multiple chunks of data and fire queries
concurrently. I would be surprised if MySQL starts slowing down..

What kind of lock? Shared lock or exclusive lock? If SELECT
performance tanked when doing simultaneous INSERTs, then maybe there
were exclusive table locks.

-- 
+------------------------------------------------------------+
| Ron Johnson, Jr.     mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA   http://members.cox.net/ron.l.johnson  |
|                                                            |
| "What other evidence do you have that they are terrorists, |
|  other than that they trained in these camps?"             |
|   17-Sep-2002 Katie Couric to an FBI agent regarding the 5 |
|   men arrested near Buffalo NY                             |
+------------------------------------------------------------+
#20Ron Johnson
ron.l.johnson@cox.net
In reply to: Hans-Jürgen Schönig (#17)
hackersgeneral
Re: [HACKERS] Large databases, performance

On Thu, 2002-10-03 at 11:51, Hans-Jürgen Schönig wrote:

Shridhar Daithankar wrote:

On 3 Oct 2002 at 11:57, Robert Treat wrote:

[snip]

PostgreSQL allows you to improve execution plans by giving the planner a
hint.
In addition to that: if you need REAL performance and if you are running
similar queries consider using SPI.

What is SPI?

-- 
+------------------------------------------------------------+
| Ron Johnson, Jr.     mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA   http://members.cox.net/ron.l.johnson  |
|                                                            |
| "What other evidence do you have that they are terrorists, |
|  other than that they trained in these camps?"             |
|   17-Sep-2002 Katie Couric to an FBI agent regarding the 5 |
|   men arrested near Buffalo NY                             |
+------------------------------------------------------------+
#21Andrew Sullivan
andrew@libertyrms.info
In reply to: Hans-Jürgen Schönig (#17)
hackersgeneral
#22Jean-Luc Lachance
jllachan@nsd.ca
In reply to: Shridhar Daithankar (#1)
hackersgeneral
#23Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Manfred Koizar (#18)
hackersgeneral
#24scott.marlowe
scott.marlowe@ihs.com
In reply to: Hans-Jürgen Schönig (#17)
hackersgeneral
#25Hans-Jürgen Schönig
postgres@cybertec.at
In reply to: scott.marlowe (#24)
hackersgeneral
#26Michael Paesold
mpaesold@gmx.at
In reply to: Shridhar Daithankar (#1)
hackersgeneral
#27David Blood
david@matraex.com
In reply to: Michael Paesold (#26)
hackersgeneral
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Blood (#27)
hackersgeneral
#29Curt Sampson
cjs@cynic.net
In reply to: Shridhar Daithankar (#7)
hackersgeneral
#30Curt Sampson
cjs@cynic.net
In reply to: Shridhar Daithankar (#9)
hackersgeneral
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Curt Sampson (#29)
hackersgeneral
#32Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Charles H. Woloszynski (#2)
hackersgeneral
#33Hans-Jürgen Schönig
postgres@cybertec.at
In reply to: Shridhar Daithankar (#32)
hackersgeneral
#34Larry Rosenman
ler@lerctr.org
In reply to: Tom Lane (#31)
hackersgeneral
#35Michael Paesold
mpaesold@gmx.at
In reply to: Curt Sampson (#29)
hackersgeneral
#36Larry Rosenman
ler@lerctr.org
In reply to: Michael Paesold (#35)
hackersgeneral
#37Manfred Koizar
mkoi-pg@aon.at
In reply to: Shridhar Daithankar (#32)
hackersgeneral
#38Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Manfred Koizar (#37)
hackersgeneral
#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Shridhar Daithankar (#32)
hackersgeneral
#40Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Tom Lane (#39)
hackersgeneral
#41Tom Lane
tgl@sss.pgh.pa.us
In reply to: Shridhar Daithankar (#38)
hackersgeneral
#42Manfred Koizar
mkoi-pg@aon.at
In reply to: Shridhar Daithankar (#38)
hackersgeneral
#43Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Manfred Koizar (#42)
hackersgeneral
#44Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Tom Lane (#41)
hackersgeneral
#45Martijn van Oosterhout
kleptog@svana.org
In reply to: Shridhar Daithankar (#44)
hackersgeneral
#46Jan Wieck
JanWieck@Yahoo.com
In reply to: David Blood (#27)
hackersgeneral
#47Ron Johnson
ron.l.johnson@cox.net
In reply to: Martijn van Oosterhout (#45)
hackersgeneral
#48Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ron Johnson (#47)
hackersgeneral
#49Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Tom Lane (#48)
hackersgeneral
#50Ron Johnson
ron.l.johnson@cox.net
In reply to: Tom Lane (#48)
hackersgeneral
#51Josh Berkus
josh@agliodbs.com
In reply to: Ron Johnson (#50)
hackersgeneral
#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ron Johnson (#50)
hackersgeneral
#53Ron Johnson
ron.l.johnson@cox.net
In reply to: Josh Berkus (#51)
hackersgeneral
#54Josh Berkus
josh@agliodbs.com
In reply to: Ron Johnson (#53)
hackersgeneral
#55Martijn van Oosterhout
kleptog@svana.org
In reply to: Zeugswetter Andreas SB SD (#43)
hackersgeneral
#56Josh Berkus
josh@agliodbs.com
In reply to: Josh Berkus (#51)
hackersgeneral
#57Manfred Koizar
mkoi-pg@aon.at
In reply to: Shridhar Daithankar (#32)
hackersgeneral
#58Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Manfred Koizar (#57)
hackersgeneral
#59Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Manfred Koizar (#57)
hackersgeneral
#60Tom Lane
tgl@sss.pgh.pa.us
In reply to: Manfred Koizar (#57)
hackersgeneral
#61Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Tom Lane (#60)
hackersgeneral
#62jose antonio leo
jaleo8@storelandia.com
In reply to: Tom Lane (#60)
hackersgeneral
#63Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: jose antonio leo (#62)
hackersgeneral
#64Manfred Koizar
mkoi-pg@aon.at
In reply to: Tom Lane (#60)
hackersgeneral
#65Manfred Koizar
mkoi-pg@aon.at
In reply to: Manfred Koizar (#57)
hackersgeneral
#66Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Manfred Koizar (#65)
hackersgeneral
#67Giles Lean
giles@nemeton.com.au
In reply to: Shridhar Daithankar (#66)
hackers
#68Tom Lane
tgl@sss.pgh.pa.us
In reply to: Giles Lean (#67)
hackers
#69Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Giles Lean (#67)
hackers
#70Warren Massengill
warrenmassengill@hotmail.com
In reply to: Larry Rosenman (#36)
hackers