Re: your mail

Started by Bruce Momjianalmost 28 years ago9 messages
#1Bruce Momjian
maillist@candle.pha.pa.us

By the way, I have a sugestion I'd like to do about psql:

It would be a good idea, IMHO, that if psql is called alone (without
database nor any parameter), instead of try to connect to database 'user'
starts interactively without connection. Then, simple calling "\c dbname"
would connect the user to the desired database. What you think?
Cheers,

This seems like a good idea. Any comments?

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#2Brett McCormick
brett@work.chicken.org
In reply to: Bruce Momjian (#1)
Re: [HACKERS] Re: your mail

I dunno, I think it's kind of nice the way is really. Perhaps if the
user db doesn't exist, it could say so and start interactively as per
below. Either way I'm not sure it matters though. I think it just
nice for connecting to my random test databases.

On Sat, 7 March 1998, at 11:11:42, Bruce Momjian wrote:

Show quoted text

By the way, I have a sugestion I'd like to do about psql:

It would be a good idea, IMHO, that if psql is called alone (without
database nor any parameter), instead of try to connect to database 'user'
starts interactively without connection. Then, simple calling "\c dbname"
would connect the user to the desired database. What you think?
Cheers,

This seems like a good idea. Any comments?

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
+  If your life is a hard drive,     |  (610) 353-9879(w)
+  Christ can be your backup.        |  (610) 853-3000(h)
#3Noname
fgschwindt@qnx.com
In reply to: Brett McCormick (#2)

It would be a good idea, IMHO, that if psql is called alone (without
database nor any parameter), instead of try to connect to database 'user'
starts interactively without connection. Then, simple calling "\c dbname"
would connect the user to the desired database. What you think?
Cheers,

This seems like a good idea. Any comments?

When I was doing my own changes in the source to achieve this behavior, I've
noticed that in several places exit is called without free'ing the PGconn
pointer, the prompt and other things, specially when malloc fails and when
psql cannot establish the connection to a new database.
It's not a great thing, but it'd be good to fix it.
Cheers,

Federico.

--
Before humanity was born to this world Federico Schwindt
the stars shone in the heavens. fgschwindt@qnx.com
Long after humanity is gone
the stars will continue to shine.

#4Roland B. Roberts
roberts@panix.com
In reply to: Bruce Momjian (#1)
Re: [HACKERS] Re: your mail

"bm" == Bruce Momjian <maillist@candle.pha.pa.us> writes:

By the way, I have a sugestion I'd like to do about psql:

It would be a good idea, IMHO, that if psql is called alone
(without database nor any parameter), instead of try to connect
to database 'user' starts interactively without connection.
Then, simple calling "\c dbname" would connect the user to the
desired database. What you think? Cheers,

bm> This seems like a good idea. Any comments?

I don't mind the current setup, but would like it to not dump me out
if I don't have a database named `roland'. I tend to give my
databases a name that pertains to what they contain, not who uses
them.

Still, having it attempt to connect to `roland' and then leave me at
the psql prompt with a message like "You are not connected to any
database" would be more friendly....

roland
--
Roland B. Roberts, PhD Custom Software Solutions
roberts@panix.com 101 West 15th St #4NN
New York, NY 10011

#5Andrew Martin
martin@biochemistry.ucl.ac.uk
In reply to: Roland B. Roberts (#4)
Re: [HACKERS] Re: your mail

By the way, I have a sugestion I'd like to do about psql:

It would be a good idea, IMHO, that if psql is called alone (without
database nor any parameter), instead of try to connect to database 'user'
starts interactively without connection. Then, simple calling "\c dbname"
would connect the user to the desired database. What you think?
Cheers,

This seems like a good idea. Any comments?

What about having an environment variable to check and if this isn't set either,
start without connecting.

i.e. Priority 1 is the command line
Priority 2 the envvar
Default, start without connection

Andrew

----------------------------------------------------------------------------
Dr. Andrew C.R. Martin University College London
EMAIL: (Work) martin@biochem.ucl.ac.uk (Home) andrew@stagleys.demon.co.uk
URL: http://www.biochem.ucl.ac.uk/~martin
Tel: (Work) +44(0)171 419 3890 (Home) +44(0)1372 275775

#6Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Andrew Martin (#5)

trashing postgresql doing a 'select count(*) from artist_fti'
"select count(*) from artist_fti where string ~ '^lling';"
"select count(*) from artist_fti where string ~ '^tones';"
and the join-statement.

I don't have much knowledge of profiling, but it seems to me that in the
end this is really related to I/O. Dirty disk caches forces postgresql to
read everything from disk again, and this disk is not the fastest in the
world. Still it takes much more time than I would expect. I mean, btree
is designed to have few disk accesses, and then accessing the tables
shouldn't take too long also.

[Marteen is finding word searches on a big table to take a long time.
This is the word slice code we talked about recently as a group. It
rapid searches for words inside of strings. Very useful.]

OK, I meant to reply to this, but forgot to. Let's take the second query:

select count(*) from artist_fti where string ~ '^lling';

Here is the top of the profile output:

---------------------------------------------------------------------------

Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls us/call us/call name
10.20 0.05 0.05 18434 2.71 2.71 sstep
8.16 0.09 0.04 3225 12.40 12.41 RelationGetLRelId
6.12 0.12 0.03 5862 5.12 5.12 SpinAcquire
6.12 0.15 0.03 2750 10.91 10.91 ExecEvalVar
6.12 0.18 0.03 105 285.71 411.52 heapgettup
6.12 0.21 0.03 2 15000.00 54472.81 ExecAgg
4.08 0.23 0.02 12711 1.57 1.57 OrderedSetContains
4.08 0.25 0.02 5501 3.64 9.09 ExecEvalExpr
2.04 0.26 0.01 12711 0.79 2.36 AllocSetContains
2.04 0.27 0.01 7722 1.30 1.30 OrderedElemPushHead
2.04 0.28 0.01 7608 1.31 2.63 hash_search
2.04 0.29 0.01 6395 1.56 1.56 tag_hash
2.04 0.30 0.01 5610 1.78 6.50 PortalHeapMemoryFree
2.04 0.31 0.01 2293 4.36 4.36 fmgr_isbuiltin
2.04 0.32 0.01 1919 5.21 5.21 BufferGetBlockNumber
2.04 0.33 0.01 1912 5.23 9.55 ReleaseBuffer
2.04 0.34 0.01 1572 6.36 6.36 TransactionIdEquals
2.04 0.35 0.01 1571 6.37 12.73 HeapTupleSatisfiesVisibili

---------------------------------------------------------------------------

First 'sstep' is taking a lot of time, and that, I think, is the regex
stuff. Author says the regex is slow and can be speeded up, so we may
get a new release from him soon.

Also strange is the time for ExecAgg, which seems fairly high for a
single function, but only 0.015 seconds. Certainly not significant in
the wallclock time.

Now, seeing as everything else adds up to perhaps 1-2 seconds, why 30
seconds for the query. Certainly looks like I/O is the problem.

Have you tried adding -B buffers. This, I think, would help. If you
don't flush the cache, how long does a second identical query take?
Also, Vadim, when the backend is accessing a btree index, does it walk
down to the page it needs and read all rows off of that, or does it
drill down the btree to get each row. Third, remember that these are
made indexable by adding

string >= 'ling' and
string <= 'ling\377'

Now, if each is taken to evaluate litarally, I am guessing the backend
is getting all >= 'ling' and then all <= 'ling\377' and then finding the
ones that match. This is terrible for performance. I believe this is
the crux of the problem, and why so much I/O is going on.

Vadim, any way to make the optimizer realize that we are looking for a
range of values, and preventing it from pulling all those rows from the
index? Almost some index scheme that would get all >= 'ling' but stop
when they get > 'ling\377'?

With this going on, the system is spinning through the entire btree
index each time to get the data, even though it is a small subset.

As part of a test, would you please run:

select count(*) from artist_fti where string >= 'lling'

and

select count(*) from artist_fti where string <= 'lling\377'

and then try:

select count(*) from artist_fti where string >= 'lling' and
string <= 'lling\377'

Also try:

select count(*) from artist_fti where string ~ '^lling' and
string >= 'lling and
string <= 'lling\377';

Try these with the cache trashing code to get good numbers on the
performance. My guess is that the sum of the first and second execution
times will equal the time for the third, and the third and fourth will
take the same amount of time.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#7Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#1)
Re: [HACKERS] Re: your mail

It would be a good idea, IMHO, that if psql is called alone (without
database nor any parameter), instead of try to connect to database 'user'
starts interactively without connection. Then, simple calling "\c dbname"
would connect the user to the desired database. What you think?
Cheers,

This seems like a good idea. Any comments?

Of course :) I personally like the current default behavior, and I think that
some others find it similarly convenient. If the alternate behavior is
desirable for some, how about implementing a command line switch which would
change the default behavior to "don't open anything". Then, you can alias the
definition of psql to get what you want.

I actually had a patch of some sort which changed the behavior of "\c
unknownDB"; at the moment if a connection fails psql bails out. The patch left
the psql session open and connected to the previous database. That behavior
would be dangerous in some cases so we didn't apply it; going to an
"unconnected state" would be more helpful and less dangerous.

- Tom

#8Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#7)

For each test, I will list 2 numbers here, the first being with the default
number of buffers, the second with 256 buffers (ie. 2048K)

btw. just to give all possible information: I start postgreSQL with the
following arguments:

-i -b /usr/local/pgsql/bin/postgres -S 1024 -o "-o mylogfile -F -d 1"

OK, I meant to reply to this, but forgot to. Let's take the second query:

select count(*) from artist_fti where string ~ '^lling';

Have you tried adding -B buffers. This, I think, would help. If you
don't flush the cache, how long does a second identical query take?

This is 3 times the above query without trashing in between and with
the default number of buffers:

0.030u 0.020s 0:23.41 0.2% 0+0k 0+0io 201pf+0
0.040u 0.010s 0:01.06 4.7% 0+0k 0+0io 184pf+0w
0.030u 0.040s 0:00.70 10.0% 0+0k 0+0io 184pf+0w

OK, I think this actually tells the whole story. The query goes from 22
seconds to 0.75 seconds because all of the btree indexes are in the
buffer. Looks like the optimizer is clearly understanding both parts
of the restriction, which is good. No problems there.

Also seems the index fits easily in the cache.

Now my question is "What is the performance problem?" Doesn't the
shared buffer cache keep these in memory, so the first one is slow, but
the rest are fast, or does the buffer get flushed a lot, and performance
is terrible on the first query after that. Does going after other words
flush the cache for previous words searched?

I must say, I am surprised that the buffer cache causes it to speed up
so much. Are these really slow disks?

In the Ingres case, the first query or two was not as quick, but they
speeded up, and kept pretty fast for the rest of the day. Also, the
indexes where ISAM, which has a less sophisticated/less overhead way of
indexing than btree.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#9Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Bruce Momjian (#8)

Hi,

I have done a little more testing, and the performance bottleneck
seems definitely be memory related. Note that it does not really seems
to be dependend on buffer-settings, but really on disk caches.

additional info:
the index on this table is around 155 Megs big

Now, if I do a count(*) on '^rol', after the second query, this takes
around 1 second, and returns 2528.

On the other hand, if I do a count(*) on '^ric', his takes consequently
around 1:30 mins, no matter how often I run it. This returns 7866.

A search on count(*) of '^lling' and '^tones' takes around 2.5 secs after
running it several times.

Wow, this makes no sense to me.

Running different queries in between affect these times.

My computer has 64 Megs of RAM, and I'm running X (linux 2.0.30)

So, it seems to me that with this amount of memory, my system is usable
only for smaller tables (ie. no 550,000 rows in the main table, and no
4,500,000 rows in the 'index' table). If I want better performance for this
setup, I need faster disks and more (how much more?) memory. Wish I could
test this somehow.... Maybe I can ask my sysop at the university if I may
test this on the dual PPro with 256 megs, but I don't think so....

Maarten

_____________________________________________________________________________
| TU Delft, The Netherlands, Faculty of Information Technology and Systems |
| Department of Electrical Engineering |
| Computer Architecture and Digital Technique section |
| M.Boekhold@et.tudelft.nl |
-----------------------------------------------------------------------------

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)