some points for FAQ

Started by Pavel Stehuleover 18 years ago13 messages
#1Pavel Stehule
pavel.stehule@gmail.com

4.1)

To SELECT a random row, use:
SELECT col
FROM tab
ORDER BY random()
LIMIT 1;

+ On bigger tables this solution is slow. Please, find smarter
solution on network.

4.6)

ILIKE is slow, specially on multibyte encodings. If is possible use
FULLTEXT. LIKE '%some%' is slow always .. thing about FULLTEXT.

4.11.2)

+ Alternatively (on PostgreSQL 8.2.0 and all later releases) you could
RETURNING clause for retrieving used SERIAL value, e.g.,

new_id = execute("SELECT INSERT INTO person(name) VALUES('Blaise
Pascal') RETURNING id");

4.19)

+ most of problems with invalid OIDs in cache are solved in PostgreSQL
8.3. Please remeber, so every replanning of SQL statements needs time.
Write your application, they can exist without cache invalidation.

I am sorry, I am not able create patch via my minimalistic english knowleage.

Regards
Pavel Stehule

p.s. can we create some general F.A.Q XML format and store FAQ there?

WIP Proposal:

<faq name = ..... language = >
<entry number="1.1.1">
<query>....</query>
<ansver>
...
we need some tags from html: <p><br><a><i><b><ul><li><table>

#2Alvaro Herrera
alvherre@commandprompt.com
In reply to: Pavel Stehule (#1)
Re: some points for FAQ

Pavel Stehule escribi�:

p.s. can we create some general F.A.Q XML format and store FAQ there?

WIP Proposal:

<faq name = ..... language = >
<entry number="1.1.1">
<query>....</query>
<ansver>
...
we need some tags from html: <p><br><a><i><b><ul><li><table>

There is a DocBook spec for FAQ lists. Actually a friend of mine was
working on converting our FAQ into that kind of XML.

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"God is real, unless declared as int"

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Alvaro Herrera (#2)
Re: some points for FAQ

2007/10/9, Alvaro Herrera <alvherre@commandprompt.com>:

Pavel Stehule escribió:

p.s. can we create some general F.A.Q XML format and store FAQ there?

WIP Proposal:

<faq name = ..... language = >
<entry number="1.1.1">
<query>....</query>
<ansver>
...
we need some tags from html: <p><br><a><i><b><ul><li><table>

There is a DocBook spec for FAQ lists. Actually a friend of mine was
working on converting our FAQ into that kind of XML.

I'll look on it

Pavel

#4Chris Browne
cbbrowne@acm.org
In reply to: Pavel Stehule (#1)
Re: some points for FAQ

alvherre@commandprompt.com (Alvaro Herrera) writes:

Pavel Stehule escribi�:

p.s. can we create some general F.A.Q XML format and store FAQ there?

WIP Proposal:

<faq name = ..... language = >
<entry number="1.1.1">
<query>....</query>
<ansver>
...
we need some tags from html: <p><br><a><i><b><ul><li><table>

There is a DocBook spec for FAQ lists. Actually a friend of mine was
working on converting our FAQ into that kind of XML.

Yup, the structure is known as a <qandaset>

<http://www.docbook.org/tdg/en/html/qandaset.html&gt;

There is an example of this in the Slony-I docs - the admin guide has
a FAQ defined using qandaset and its children.
--
"cbbrowne","@","acm.org"
http://www3.sympatico.ca/cbbrowne/faq.html
All extremists should be taken out and shot.

#5Bruce Momjian
bruce@momjian.us
In reply to: Pavel Stehule (#1)
Re: some points for FAQ

Pavel Stehule wrote:

4.1)

To SELECT a random row, use:
SELECT col
FROM tab
ORDER BY random()
LIMIT 1;

+ On bigger tables this solution is slow. Please, find smarter
solution on network.

Well, give me a better example that works.

4.6)

ILIKE is slow, specially on multibyte encodings. If is possible use
FULLTEXT. LIKE '%some%' is slow always .. thing about FULLTEXT.

I added a mention of "full text indexing" for word searches.

4.11.2)

+ Alternatively (on PostgreSQL 8.2.0 and all later releases) you could
RETURNING clause for retrieving used SERIAL value, e.g.,

new_id = execute("SELECT INSERT INTO person(name) VALUES('Blaise
Pascal') RETURNING id");

Agreed. I have updated the text to suggest RETURNING be used and
reduced the other examples. The web site should have the updated
content shortly but CVS will have FAQ.html as well soon.

4.19)

+ most of problems with invalid OIDs in cache are solved in PostgreSQL
8.3. Please remeber, so every replanning of SQL statements needs time.
Write your application, they can exist without cache invalidation.

Agreed. Item removed.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Bruce Momjian (#5)
Re: some points for FAQ

2007/10/9, Bruce Momjian <bruce@momjian.us>:

Pavel Stehule wrote:

4.1)

To SELECT a random row, use:
SELECT col
FROM tab
ORDER BY random()
LIMIT 1;

+ On bigger tables this solution is slow. Please, find smarter
solution on network.

Well, give me a better example that works.

Better universal solution doesn't exist. Exists only unelegant
solutions - but mutch faster.

SELECT id, ...
FROM data
WHERE id = ANY(ARRAY(
SELECT (random()*:max_id)::int
FROM generate_series(1,20)))
LIMIT 1;

max_id is host variable ~ real max id + some

-- fast solution if id is PK of data

4.19)

+ most of problems with invalid OIDs in cache are solved in PostgreSQL
8.3. Please remeber, so every replanning of SQL statements needs time.
Write your application, they can exist without cache invalidation.

Agreed. Item removed.

Cache invalidation isn't 100% protection before this error message.
With specific using of EXECUTE statement, you can get this message
too. But all temp tables related problems are solved.

Regards
Pavel Stehule

#7Bruce Momjian
bruce@momjian.us
In reply to: Pavel Stehule (#6)
Re: some points for FAQ

Pavel Stehule wrote:

2007/10/9, Bruce Momjian <bruce@momjian.us>:

Pavel Stehule wrote:

4.1)

To SELECT a random row, use:
SELECT col
FROM tab
ORDER BY random()
LIMIT 1;

+ On bigger tables this solution is slow. Please, find smarter
solution on network.

Well, give me a better example that works.

Better universal solution doesn't exist. Exists only unelegant
solutions - but mutch faster.

SELECT id, ...
FROM data
WHERE id = ANY(ARRAY(
SELECT (random()*:max_id)::int
FROM generate_series(1,20)))
LIMIT 1;

max_id is host variable ~ real max id + some

-- fast solution if id is PK of data

Right. We really only want general solutions in the FAQ.

4.19)

+ most of problems with invalid OIDs in cache are solved in PostgreSQL
8.3. Please remeber, so every replanning of SQL statements needs time.
Write your application, they can exist without cache invalidation.

Agreed. Item removed.

Cache invalidation isn't 100% protection before this error message.
With specific using of EXECUTE statement, you can get this message
too. But all temp tables related problems are solved.

OK, let's see how many bug reports we get and we can always re-add it.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#8Pavel Stehule
pavel.stehule@gmail.com
In reply to: Bruce Momjian (#7)
Re: some points for FAQ

2007/10/9, Bruce Momjian <bruce@momjian.us>:

Pavel Stehule wrote:

2007/10/9, Bruce Momjian <bruce@momjian.us>:

Pavel Stehule wrote:

4.1)

To SELECT a random row, use:
SELECT col
FROM tab
ORDER BY random()
LIMIT 1;

+ On bigger tables this solution is slow. Please, find smarter
solution on network.

Well, give me a better example that works.

Better universal solution doesn't exist. Exists only unelegant
solutions - but mutch faster.

SELECT id, ...
FROM data
WHERE id = ANY(ARRAY(
SELECT (random()*:max_id)::int
FROM generate_series(1,20)))
LIMIT 1;

max_id is host variable ~ real max id + some

-- fast solution if id is PK of data

Right. We really only want general solutions in the FAQ.

ok. I accept it. Can be some note there? Not this strange select.

4.19)

+ most of problems with invalid OIDs in cache are solved in PostgreSQL
8.3. Please remeber, so every replanning of SQL statements needs time.
Write your application, they can exist without cache invalidation.

Agreed. Item removed.

Cache invalidation isn't 100% protection before this error message.
With specific using of EXECUTE statement, you can get this message
too. But all temp tables related problems are solved.

OK, let's see how many bug reports we get and we can always re-add it.

It's true :). You have to try really wild things inside plpgsql procedures.

Pavel

#9Bruce Momjian
bruce@momjian.us
In reply to: Pavel Stehule (#8)
Re: some points for FAQ

Pavel Stehule wrote:

Better universal solution doesn't exist. Exists only unelegant
solutions - but mutch faster.

SELECT id, ...
FROM data
WHERE id = ANY(ARRAY(
SELECT (random()*:max_id)::int
FROM generate_series(1,20)))
LIMIT 1;

max_id is host variable ~ real max id + some

-- fast solution if id is PK of data

Right. We really only want general solutions in the FAQ.

ok. I accept it. Can be some note there? Not this strange select.

Well, with 8.3 having this be faster I am thinking we should wait to see
if the hacks are needed.

Cache invalidation isn't 100% protection before this error message.
With specific using of EXECUTE statement, you can get this message
too. But all temp tables related problems are solved.

OK, let's see how many bug reports we get and we can always re-add it.

It's true :). You have to try really wild things inside plpgsql procedures.

Good.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#10Pavel Stehule
pavel.stehule@gmail.com
In reply to: Bruce Momjian (#9)
Re: some points for FAQ

ok. I accept it. Can be some note there? Not this strange select.

Well, with 8.3 having this be faster I am thinking we should wait to see
if the hacks are needed.

difference, on 10K lines (on small think table)

postgres=# select * from test where i = any(array(select
(random()*10000)::int from generate_series(1,20))) limit 1;
i | v
-----+-----
869 | 113
(1 row)

Time: 3,984 ms

postgres=# select * from test order by random() limit 1;
i | v
------+-----
3687 | 293
(1 row)

Time: 21,978 ms

8.2
postgres=# select * from test order by random() limit 1;
i | v
------+-----
4821 | 608
(1 row)

Time: 51,299 ms

postgres=# select * from test where i = any(array(select
(random()*10000)::int from generate_series(1,20))) limit 1;
i | v
-----+-----
762 | 254
(1 row)

Time: 4,530 ms

Results:

8.3 "fast solution' is 6x faster
8.2 'fast solution' is 11x faster .. it's minimum.

Pavel

for me, it's one from typical beginers mistakes

#11Bruce Momjian
bruce@momjian.us
In reply to: Pavel Stehule (#10)
Re: some points for FAQ

Pavel Stehule wrote:

ok. I accept it. Can be some note there? Not this strange select.

Well, with 8.3 having this be faster I am thinking we should wait to see
if the hacks are needed.

difference, on 10K lines (on small think table)

postgres=# select * from test where i = any(array(select
(random()*10000)::int from generate_series(1,20))) limit 1;
i | v
-----+-----
869 | 113
(1 row)

Time: 3,984 ms

postgres=# select * from test order by random() limit 1;
i | v
------+-----
3687 | 293
(1 row)

Time: 21,978 ms

8.2
postgres=# select * from test order by random() limit 1;
i | v
------+-----
4821 | 608
(1 row)

Time: 51,299 ms

postgres=# select * from test where i = any(array(select
(random()*10000)::int from generate_series(1,20))) limit 1;
i | v
-----+-----
762 | 254
(1 row)

Time: 4,530 ms

Results:

8.3 "fast solution' is 6x faster
8.2 'fast solution' is 11x faster .. it's minimum.

OK, how do we even explain this idea in the FAQ. It pulls 20 random
values from 1 to 10000? That seems pretty hard to code to me. Where do
you get the 10000 number from? How do you know you will hit a match in
20 tries?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#12Pavel Stehule
pavel.stehule@gmail.com
In reply to: Bruce Momjian (#11)
Re: some points for FAQ

OK, how do we even explain this idea in the FAQ. It pulls 20 random
values from 1 to 10000? That seems pretty hard to code to me. Where do
you get the 10000 number from? How do you know you will hit a match in
20 tries?

Number 10000 you have to store in application .. it's magic constant.
It similar our statistics. And sometimes you have to actualise it.
This is stochastic methods, so it's possible so it doesn't return any
value, and you have to repeat it. Using this method expect knowledge
about generating random numbers. This method is far to ideal, but on
databases with big traffic only this is usable.

Pavel

#13Bruce Momjian
bruce@momjian.us
In reply to: Pavel Stehule (#12)
Re: some points for FAQ

Pavel Stehule wrote:

OK, how do we even explain this idea in the FAQ. It pulls 20 random
values from 1 to 10000? That seems pretty hard to code to me. Where do
you get the 10000 number from? How do you know you will hit a match in
20 tries?

Number 10000 you have to store in application .. it's magic constant.
It similar our statistics. And sometimes you have to actualise it.
This is stochastic methods, so it's possible so it doesn't return any
value, and you have to repeat it. Using this method expect knowledge
about generating random numbers. This method is far to ideal, but on
databases with big traffic only this is usable.

OK, but this is clearly something I can't just throw into the FAQ and
expect people to figure it out, and going into major detail to explain
it in the FAQ isn't logical either.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +