Text search dictionary vs. the C locale

Started by twoflower · almost 9 years ago · 11 messages · general
#1twoflower
standa.kurik@gmail.com

I am having problems creating an Ispell-based text search dictionary for
Czech language.

Issuing the following command:

create text search dictionary czech_ispell (
  template = ispell,
  dictfile = czech_ispell,
  affFile = czech_ispell
);

ends with

ERROR: syntax error
CONTEXT: line 252 of configuration file
"/usr/share/postgresql/9.6/tsearch_data/czech_ispell.affix": " . > TŘIA

The dictionary files are in UTF-8. The database cluster was initialized with

initdb --locale=C --encoding=UTF8

When, on the other hand, I initialize it with

initdb --locale=en_US.UTF8

it works.

I was hoping I could combine the C locale with the UTF-8 encoding, but it seems
non-ASCII text search dictionaries are not supported in that case. This is a
shame, as restoring our dumps goes from 1.5 hours (with the C locale) to 9.5
hours (with en_US.UTF8).

--
View this message in context: http://www.postgresql-archive.org/Text-search-dictionary-vs-the-C-locale-tp5969677.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

#2twoflower
standa.kurik@gmail.com
In reply to: twoflower (#1)
Re: Text search dictionary vs. the C locale

Initializing the cluster with

initdb
  --locale=C
  --lc-ctype=en_US.UTF-8
  --lc-messages=en_US.UTF-8
  --lc-monetary=en_US.UTF-8
  --lc-numeric=en_US.UTF-8
  --lc-time=en_US.UTF-8
  --encoding=UTF8

allows me to use my text search dictionary. Now it only remains to be seen
whether index creation will still be fast (I suspect it will) and whether it
has any other unintended consequences (e.g. in pattern matching, which we use
a lot).

--
View this message in context: http://www.postgresql-archive.org/Text-search-dictionary-vs-the-C-locale-tp5969677p5969678.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: twoflower (#1)
Re: Text search dictionary vs. the C locale

twoflower <standa.kurik@gmail.com> writes:

I am having problems creating an Ispell-based text search dictionary for
Czech language.

Issuing the following command:

create text search dictionary czech_ispell (
template = ispell,
dictfile = czech_ispell,
affFile = czech_ispell
);

ends with

ERROR: syntax error
CONTEXT: line 252 of configuration file
"/usr/share/postgresql/9.6/tsearch_data/czech_ispell.affix": " . > TŘIA

The dictionary files are in UTF-8. The database cluster was initialized with
initdb --locale=C --encoding=UTF8

Presumably the problem is that the dictionary file parsing functions
reject anything that doesn't satisfy t_isalpha() (unless it matches
t_isspace()) and in C locale that's not going to accept very much.

I wonder why we're doing it like that. It seems like it'd often be
useful to load dictionary files that don't match the database's
prevailing locale. Do we really need the t_isalpha tests, or would
it be good enough to assume that anything that isn't t_isspace is
part of a word?

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
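Tom's point can be illustrated outside PostgreSQL. The following is a rough Python model of a C-locale isalpha() check (a sketch of the behavior he describes, not PostgreSQL's actual t_isalpha() code): in the C locale only ASCII letters count as alphabetic, so both UTF-8 bytes of 'Ř' (0xC5 0x98) fail, which is consistent with the syntax error on the "TŘIA" affix line.

```python
# Rough model of the C locale's isalpha(): only ASCII A-Z and a-z pass.
# (A sketch of the behavior described above, not PostgreSQL's actual
# t_isalpha() implementation.)
def c_locale_isalpha(byte: int) -> bool:
    return 0x41 <= byte <= 0x5A or 0x61 <= byte <= 0x7A

word = "TŘIA"  # from the failing line of czech_ispell.affix
for b in word.encode("utf-8"):
    # 'Ř' encodes to the two bytes 0xC5 0x98, both rejected here
    print(f"0x{b:02X} alpha={c_locale_isalpha(b)}")
```

With lc_ctype set to en_US.UTF-8 instead, the ctype machinery classifies the full Unicode letter repertoire, which matches the observation that the dictionary loads in that configuration.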

#4Rob Sargent
robjsargent@gmail.com
In reply to: Tom Lane (#3)
Re: Text search dictionary vs. the C locale

Sent from my iPad

On Jul 2, 2017, at 10:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

twoflower <standa.kurik@gmail.com> writes:

I am having problems creating an Ispell-based text search dictionary for
Czech language. [...]

Presumably the problem is that the dictionary file parsing functions
reject anything that doesn't satisfy t_isalpha() (unless it matches
t_isspace()) and in C locale that's not going to accept very much.

I wonder why we're doing it like that. It seems like it'd often be
useful to load dictionary files that don't match the database's
prevailing locale. Do we really need the t_isalpha tests, or would
it be good enough to assume that anything that isn't t_isspace is
part of a word?

regards, tom lane

What about punctuation?


#5Rob Sargent
robjsargent@gmail.com
In reply to: Tom Lane (#3)
Re: Text search dictionary vs. the C locale

On Jul 2, 2017, at 10:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

[...]

Apologies for not truncating the entire body of the replied-to post


#6Jerry Regan
jerry.regan@concertoglobalresources.com
In reply to: Tom Lane (#3)
Using 'WITH SELECT' Results Do Not Match 'SELECT FROM <table>' Results

For reasons beyond my control, I am using PostgreSQL 9.4 on a MacBook Pro (development system). I use pgAdmin III and psql for clients (I tried and didn’t like the pgAdmin 4 UI; pg_dump and pg_restore also seem to be dumbed down).

My question:
I have some performance test results in table cor_duration_report. One column, c_entered_ion, is of type timestamptz; another, c_scenario, is of type text. I want to calculate the difference between successive c_entered_ion rows to learn the rate at which entry events occur. In cor_duration_report, the c_entered_ion column is NOT in ascending sort order.

For a first attempt, I created another table cor_temp_gap as:
CREATE TABLE cor_temp_gap
(
c_id serial NOT NULL,
c_entered_ion timestamp with time zone NOT NULL,
c_scenario text NOT NULL
)
WITH (
OIDS=FALSE
);

and loaded it with:
INSERT into cor_temp_gap (c_entered_ion, c_scenario) SELECT c_entered_ion, c_scenario from cor_duration_report order by c_entered_ion;

The c_id column is loaded with the default value - the next sequence value.

I then generated my report with:
select count( gap ) as gaps, sum(gap) as sum,
mode() within group (order by gap) as mode,
percentile_disc(0.5) within group (order by gap) as median,
avg( gap::integer ) as mean,
min( gap ) as min,
max( gap ) as max
from ( select extract( epoch from ( f.c_entered_ion - s.c_entered_ion)::interval) * 1000 as gap
from cor_temp_gap s, cor_temp_gap f
where s.c_scenario = '20170628tc04'
and s.c_id+1 = f.c_id ) vals;

This seems to give me the results I want:
gaps | sum | mode | median | mean | min | max
------+---------+------+--------+-----------------------+-----+--------
307412 | 6872207 | 1 | 8 | 22.3550381897908995 | 0 | 10846

The min value of zero is accurate. The mode value of 1 is reasonable, as is the median value of 8. Using a totally different method, the mean value is accurate, as is gaps (there are 307,413 rows in the table).
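The logic of the report above can be modeled in a few lines of Python (a toy sketch with made-up timestamps, not the actual cor_duration_report data): sort the entry times, pair each row with its successor, and take the differences in milliseconds.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical entry times, deliberately out of order as in the real table.
times = [datetime(2017, 6, 28, 12, 0, 0, ms * 1000) for ms in (8, 0, 22, 1, 15)]

ordered = sorted(times)  # plays the role of ORDER BY c_entered_ion
# pair each row with its successor; express each gap in milliseconds
gaps = [(b - a) / timedelta(milliseconds=1) for a, b in zip(ordered, ordered[1:])]
print(len(gaps), min(gaps), max(gaps), mean(gaps))
```

Note that the number of gaps is one less than the number of rows, matching the 307,412 gaps reported for 307,413 rows, and that with a properly sorted input no gap can be negative.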

I do know enough SQL to believe my cor_temp_gap table could probably be replaced by a ‘WITH SELECT….’

I attempted this:
with cor_entry_time as ( select nextval('cor_temp_select_c_id_seq') as c_id, c_entered_ion, c_scenario
from cor_duration_report where c_scenario = '20170628tc04' order by c_entered_ion )
select count( gap ) as gaps,
sum(gap::integer) as sum,
mode() within group (order by gap) as mode,
percentile_disc(0.5) within group (order by gap) as median,
avg( gap::integer ) as mean,
min( gap::integer ) as min,
max( gap::integer ) as max
from ( select extract( epoch from ( f.c_entered_ion - s.c_entered_ion)::interval) * 1000 as gap
from cor_entry_time s, cor_entry_time f
where s.c_id+1 = f.c_id ) vals;

which returned:

gaps | sum | mode | median | mean | min | max
--------+---------+------+--------+---------------------+----------+-------
307412 | 6867802 | 0 | 461 | 22.3407088857949592 | -6871881 | 59791

The results should match but obviously they don’t. The ‘mode’, ‘median’, ‘min’ and ‘max’ columns are clearly different. The ‘sum’ and ‘mean’ columns are similar but also differ. Only ‘gaps’ is the same. There should be no negative numbers at all, assuming my c_entered_ion column is in ascending order. Wouldn’t the ‘order by c_entered_ion’ in the ‘WITH’ select do that?

To me, the differences between loading ‘cor_temp_gap’ and my ‘WITH’ select should not cause the discrepancy. The differences are in how ‘c_id’ is generated and the fact that the ‘c_scenario = 20170628tc04’ filter has been moved from the interval calculation into the ‘WITH’ select. I have also tried the ‘WITH SELECT’ approach without moving that test and received the same results.

My suspicion is that in the failing approach, my sequence is being assigned before the sort, whereas when I load ‘cor_temp_gap’ with c_id defaulting to a sequence, c_id is generated AFTER c_entered_ion is put in sort order.

If my suspicion is right, how do I accomplish the same thing in the ‘WITH SELECT’ case?

If it is wrong, what am I doing wrong?
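The suspicion above can be sketched in Python (a toy model with made-up timestamps, not PostgreSQL's executor): if ids are handed out while scanning in storage order and the sort happens separately, consecutive ids no longer mean consecutive timestamps, so the id+1 join subtracts the wrong rows and negative gaps appear.

```python
def gaps_by_id(rows):
    # rows: list of (id, ts); join each row to the row whose id is one higher,
    # mimicking "WHERE s.c_id + 1 = f.c_id"
    by_id = dict(rows)
    return [by_id[i + 1] - ts for i, ts in rows if i + 1 in by_id]

scan_order = [30, 10, 20]  # timestamps in on-disk (unsorted) order

# nextval-style: ids handed out during the scan, before the sort takes effect
nextval_rows = list(enumerate(scan_order, start=1))         # [(1,30),(2,10),(3,20)]
# row_number-style: ids handed out after sorting by timestamp
rownum_rows = list(enumerate(sorted(scan_order), start=1))  # [(1,10),(2,20),(3,30)]

print(gaps_by_id(nextval_rows))  # a negative gap appears: [-20, 10]
print(gaps_by_id(rownum_rows))   # all gaps non-negative: [10, 10]
```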

Thanks for any insights you may be able to provide!

/s/jr
Consultant
Concerto GR
Mobile: 612.208.6601

Concerto - a composition for orchestra and a soloist

#7twoflower
standa.kurik@gmail.com
In reply to: Tom Lane (#3)
Re: Text search dictionary vs. the C locale

Tom Lane-2 wrote

Presumably the problem is that the dictionary file parsing functions reject
anything that doesn't satisfy t_isalpha() (unless it matches t_isspace())
and in C locale that's not going to accept very much.

That's what I also guessed and the fact that setting lc-ctype=en_US.UTF-8
makes it work confirms it, I think.

--
View this message in context: http://www.postgresql-archive.org/Text-search-dictionary-vs-the-C-locale-tp5969677p5969703.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

#8Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Jerry Regan (#6)
Re: Using 'WITH SELECT' Results Do Not Match 'SELECT FROM <table>' Results

On 07/02/2017 10:33 AM, Jerry Regan wrote:

For reasons beyond my control, I am using Postgresql 9.4 on a MacBookPro
(development system). I use pgadminIII and psql for clients (I tried and
didn’t like the pgadmin4 UI; pg_dump, pg_restore also seem to be dumbed
down).

My question:
I have some performance test results in table cor_duration_report. One
column, c_entered_ion is of type timestamptz, another, c_scenario, is of
type text. I want to calculate the difference between succeeding
c_entered_ion rows to learn the rate at which entry events occur. In
cor_duration_report, c_entered_ion columns are NOT in ascending sort order.

For a first attempt, I created another table cor_temp_gap as:

CREATE TABLE cor_temp_gap
(
c_id serial NOT NULL,
c_entered_ion timestamp with time zone NOT NULL,
c_scenario text NOT NULL
)
WITH (
OIDS=FALSE
);

and loaded it with:

INSERT into cor_temp_gap (c_entered_ion, c_scenario) SELECT
c_entered_ion, c_scenario from cor_duration_report order by
c_entered_ion;

The c_id column is loaded with the default value - the next sequence value.

I then generated my report with:

select count( gap ) as gaps, sum(gap) as sum,
mode() within group (order by gap) as mode,
percentile_disc(0.5) within group (order by gap) as median,
avg( gap::integer ) as mean,
min( gap ) as min,
max( gap ) as max
from ( select extract( epoch from ( f.c_entered_ion -
s.c_entered_ion)::interval) * 1000 as gap
from cor_temp_gap s, cor_temp_gap f
where s.c_scenario = '20170628tc04'
and s.c_id+1 = f.c_id ) vals;

This seems to give me the results I want:

  gaps  |   sum   | mode | median |        mean         | min |  max
--------+---------+------+--------+---------------------+-----+--------
 307412 | 6872207 |    1 |      8 | 22.3550381897908995 |   0 |  10846

The min value of zero is accurate. The mode value of 1 is reasonable, as
is the median value of 8. Using a totally different method, the mean
value is accurate, as is gaps (there are 307,413 rows in the table).

I do know enough sql to believe my cor_temp_gap table could probably be
replace by a ‘WITH SELECT….’

I attempted this:

with cor_entry_time as ( select nextval('cor_temp_select_c_id_seq')
as c_id, c_entered_ion, c_scenario
from cor_duration_report where c_scenario = '20170628tc04' order by
c_entered_ion )
select count( gap ) as gaps,
sum(gap::integer) as sum,
mode() within group (order by gap) as mode,
percentile_disc(0.5) within group (order by gap) as median,
avg( gap::integer ) as mean,
min( gap::integer ) as min,
max( gap::integer ) as max
from ( select extract( epoch from ( f.c_entered_ion -
s.c_entered_ion)::interval) * 1000 as gap
from cor_entry_time s, cor_entry_time f
where s.c_id+1 = f.c_id ) vals;

I used this site to reformat the above:

http://sqlformat.darold.net/

WITH cor_entry_time AS (
SELECT
nextval('cor_temp_select_c_id_seq') AS c_id,
c_entered_ion,
c_scenario
FROM
cor_duration_report
WHERE
c_scenario = '20170628tc04'
ORDER BY
c_entered_ion
)
SELECT
count(gap) AS gaps,
sum(gap::INTEGER) AS SUM,
MODE ()
WITHIN
GROUP (
ORDER BY
gap) AS MODE,
percentile_disc (0.5)
WITHIN
GROUP (
ORDER BY
gap) AS median,
avg(gap::INTEGER) AS mean,
min(gap::INTEGER) AS MIN,
max(gap::INTEGER) AS MAX
FROM (
SELECT
extract(EPOCH
FROM (f.c_entered_ion - s.c_entered_ion)::interval) * 1000 AS gap
FROM
cor_entry_time s,
cor_entry_time f
WHERE
s.c_id + 1 = f.c_id) vals;

Still have not figured out everything that is going on above, but it
gave me a fighting chance:)

which returned:

  gaps  |   sum   | mode | median |        mean         |   min    |  max
--------+---------+------+--------+---------------------+----------+-------
 307412 | 6867802 |    0 |    461 | 22.3407088857949592 | -6871881 | 59791

The results should match but obviously they don’t. the ‘mode’, ‘median’,
‘min’ and ‘max’ columns are clearly different. The ‘sum’ and ‘mean’
columns are similar but are also different. Only ‘gaps’ is the same.
There should be no negative numbers at all, assuming my c_entered_ion
column is in ascending order. Wouldn’t the 'order by c_entered_ion’ in
the ‘WITH’ select do that?

I believe you are going to have to move the ORDER BY to outside the
WITH. If I am following correctly:

FROM (
SELECT
extract(EPOCH
FROM (f.c_entered_ion - s.c_entered_ion)::interval) * 1000 AS gap
FROM
cor_entry_time s,
cor_entry_time f
WHERE
s.c_id + 1 = f.c_id
ORDER BY
s.c_entered_ion
) vals;

You can remove the:

ORDER BY
c_entered_ion

in the WITH.

To me, the differences between loading ‘cor_temp_gaps’ and my ‘WITH’
select should not cause the difference. The differences are in how
‘c_id’ is generated and the fact that selecting only ‘c_scenario =
‘20170628tc04’ has been moved from calculating the interval to the
‘WITH’ select. I have also tried the ‘WITH SELECT’ approach without
moving that test and received the same results.

My suspicion is that in the failing approach, my sequence is being
assigned before the sort whereas when I load ‘cor_temp_gap’, and c_id
defaults to a sequence then c_id is generated AFTER c_entered_ion is put
in sort order.

If my suspicion is right, how do accomplish the same thing in the ‘WITH
SELECT’ case?

If it is wrong, what am I doing wrong?

Thanks for any insights you may be able to provide!

/s/jr
Consultant
Concerto GR
Mobile: 612.208.6601

Concerto - a composition for orchestra and a soloist

--
Adrian Klaver
adrian.klaver@aklaver.com


#9Jerry Regan
jerry.regan@concertoglobalresources.com
In reply to: Adrian Klaver (#8)
Re: Using 'WITH SELECT' Results Do Not Match 'SELECT FROM <table>' Results

Adrian,

Thank you for your reply!

I apologize in advance for not being detailed below. Hard to do from my phone.

I did have to move the 'ORDER BY', but not outside the 'WITH'. My first workaround parenthesized the select containing the 'ORDER BY', forcing it to be evaluated before the 'INSERT'. That worked.

But I never liked using a sequence for the c_id column. And using the sequence on my personal workstation was maybe safe, but given that sequences are not guaranteed to be gap-free, that was not very portable.

So I searched a bit and found I could use 'row_number()' instead. That approach allowed me to use the 'ORDER BY' required by 'row_number()'.

That worked and is far more portable to other postgresql instances.
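The row_number() approach described above can be sketched with SQLite's window functions from Python (a toy schema standing in for cor_duration_report; requires SQLite 3.25+, which modern Python builds bundle): row_number() OVER (ORDER BY ts) numbers the rows after the sort, so the id+1 self-join pairs true neighbors.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE report (ts INTEGER)")
# timestamps inserted out of order, as in the original table
con.executemany("INSERT INTO report VALUES (?)", [(30,), (10,), (20,)])

gaps = con.execute("""
    WITH numbered AS (
        SELECT ROW_NUMBER() OVER (ORDER BY ts) AS c_id, ts
        FROM report
    )
    SELECT f.ts - s.ts AS gap
    FROM numbered s JOIN numbered f ON s.c_id + 1 = f.c_id
    ORDER BY gap
""").fetchall()
print(gaps)  # every gap is non-negative
```

Because the numbering is defined by the window's own ORDER BY rather than by evaluation order, this also avoids relying on sequence assignment order, which is the portability concern mentioned above.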

I really do appreciate your response. It is also my nature to continue my research even after asking for help. However I find the answer, one approach validates the other.

Thanks again!

/s/jr
Sent from my iPhone

On Jul 3, 2017, at 18:21, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

[...]

I believe you are going to have to move the ORDER BY to outside the WITH. If I am following correctly:

FROM (
SELECT
extract(EPOCH
FROM (f.c_entered_ion - s.c_entered_ion)::interval) * 1000 AS gap
FROM
cor_entry_time s,
cor_entry_time f
WHERE
s.c_id + 1 = f.c_id
ORDER BY
s.c_entered_ion
) vals;

You can remove the ORDER BY c_entered_ion in the WITH.


#10Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Jerry Regan (#9)
Re: Using 'WITH SELECT' Results Do Not Match 'SELECT FROM <table>' Results

On 07/03/2017 05:20 PM, Jerry Regan wrote:

Adrian,

Thank you for your reply!

I apologize in advance for not being detailed below. Hard to do from my phone.

I did have to move the 'ORDER BY', but not outside the 'WITH'. My first workaround parenthesized the select containing the 'ORDER BY', forcing it to be evaluated before the 'INSERT'. That worked.

Not sure where the INSERT comes into the picture, but glad you got it
working.

But I never liked using a sequence for the c_id column. And using the sequence on my personal workstation was maybe safe, but given that sequences are not guaranteed to be gap-free, that was not very portable.

Yeah, that concerned me also, still I figured one problem at a time.

So I searched a bit and found I could use 'row_number()' instead. That approach allowed me to use the 'ORDER BY' required by 'row_number()'.

That worked and is far more portable to other postgresql instances.

I really do appreciate your response. It is also my nature to continue my research even after asking for help. However I find my answer, one validates the other.

Sometimes it just takes a little push to get out of the rut :)

Thanks again!

/s/jr
Sent from my iPhone

--
Adrian Klaver
adrian.klaver@aklaver.com


#11Jerry Regan
jerry.regan@concertoglobalresources.com
In reply to: Adrian Klaver (#10)
Re: Using 'WITH SELECT' Results Do Not Match 'SELECT FROM <table>' Results

Adrian,

Working from my phone wasn't such a good idea!

When I said 'INSERT' I meant 'WITH'. My excuse is that the 'WITH' statement is building a temporary table (at least logically), so there is at least an implicit 'INSERT' there.

/s/jr
Sent from my iPhone

On Jul 3, 2017, at 23:12, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

[...]

Not sure where the INSERT comes into the picture, but glad you got it working.
