Row ordering after CREATE TABLE AS...SELECT regexp_split_to_table(source_text, regexp) AS new_column
This is a two-part question:
1) I have a source_text that I want to divide into smaller subunits
that will be contained in rows in a column in a new table. Is it
absolutely certain that the initial order of the rows in the resultant
table after this operation:
CREATE TABLE new_table AS SELECT regexp_split_to_table(source_text,
E'regexp') as subunits FROM source_table;
will be the same as the order of these subunits in the original text?
Emphasis *initial order*.
2) I would like to be able to create a serial-type column during
CREATE TABLE AS in the new table that "memorizes" this order so that I
can reconstruct the original text using ORDER BY on that serial
column. However, I am stumped how to do that. I do not see how to
put the name of that column into my SELECT statement which generates
the table, and I do not see where else to put it. Please forgive my
stupidity.
The "work-around" to this problem has been to ALTER my table after its
creation with a new serial-type column. But this assumes that the
answer to Question 1) above is always "Yes".
Thanking you for your understanding,
John
On Wed, Feb 24, 2010 at 07:51:54AM +0100, John Gage wrote:
> This is a two-part question:
> 1) I have a source_text that I want to divide into smaller subunits
> that will be contained in rows in a column in a new table. Is it
> absolutely certain that the initial order of the rows in the
> resultant table after this operation:
> CREATE TABLE new_table AS SELECT regexp_split_to_table(source_text,
> E'regexp') as subunits FROM source_table;
> will be the same as the order of these subunits in the original
> text? Emphasis *initial order*.
I'd put money on not; this is not what databases are designed for.
> 2) I would like to be able to create a serial-type column during
> CREATE TABLE AS in the new table that "memorizes" this order so that
> I can reconstruct the original text using ORDER BY on that serial
> column. However, I am stumped how to do that. I do not see how to
> put the name of that column into my SELECT statement which generates
> the table, and I do not see where else to put it. Please forgive my
> stupidity.
Pre- or append an increasing serial number to the data, and use that
as a column named "initial_order" or something else that will make
it clear to you and other users what it is, and then import.
But if you have the original data, in order, why do you need to be
able to reconstruct it from a database dump? It just looks like
adding a step to add a step, to me.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
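The suggestion above of numbering the data before import could look roughly like the following. This is only a sketch under assumptions: the subunits have already been split and numbered outside the database, the input is a tab-separated file, and the file path and table layout are hypothetical. Only the column name "initial_order" comes from the reply itself.

```sql
-- Hypothetical table layout; "initial_order" is the column name suggested above.
CREATE TABLE new_table (
    initial_order integer PRIMARY KEY,  -- position of the subunit in the original text
    subunits      text NOT NULL
);

-- Assumed input: one "number<TAB>subunit" pair per line, numbered before import.
-- The path is a placeholder. COPY reads a file on the server; psql's \copy
-- reads a file on the client instead.
COPY new_table (initial_order, subunits) FROM '/path/to/numbered_subunits.tsv';

-- The order is now stored as data, so it does not depend on physical row order.
SELECT subunits FROM new_table ORDER BY initial_order;
```

With the position stored explicitly, the question of whether the table preserves insertion order no longer matters for reconstructing the text.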
John Gage <jsmgage@numericable.fr> writes:
> This is a two-part question:
> 1) I have a source_text that I want to divide into smaller subunits
> that will be contained in rows in a column in a new table. Is it
> absolutely certain that the initial order of the rows in the resultant
> table after this operation:
> CREATE TABLE new_table AS SELECT regexp_split_to_table(source_text,
> E'regexp') as subunits FROM source_table;
> will be the same as the order of these subunits in the original text?
If you have a version new enough to have synchronize_seqscans, you'd
need to turn that off. Otherwise should be OK.
> 2) I would like to be able to create a serial-type column during
> CREATE TABLE AS in the new table that "memorizes" this order so that I
> can reconstruct the original text using ORDER BY on that serial
> column. However, I am stumped how to do that.
I think the trick is to get the SRF to be expanded before the serial
values are assigned. There's more than one way to do it, but I think
(too tired to experiment) this would work:
CREATE TABLE new_table (id serial, subunits text);
INSERT INTO new_table(subunits) SELECT regexp_split_to_table(source_text,
E'regexp') FROM source_table;
regards, tom lane
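Once the table has been filled as in the reply above, the original text can be put back together by ordering on id. The first statement below is a sketch that should work on 8.4; the single-space separator is an assumption and should be whatever the regexp consumed. The second statement is an alternative that only applies to PostgreSQL 9.4 or later, which is newer than this thread: WITH ORDINALITY numbers the rows a set-returning function emits, so the order does not rely on how the serial values happen to be assigned. The names new_table_ord, subunit and ord are hypothetical.

```sql
-- Reassemble the text in id order. ARRAY(subquery) keeps the subquery's
-- ORDER BY; ' ' is an assumed separator.
SELECT array_to_string(
           ARRAY(SELECT subunits FROM new_table ORDER BY id),
           ' ');

-- PostgreSQL 9.4+ only: number the pieces as they are produced.
-- Note that ord restarts at 1 for every row of source_table, so with more
-- than one source row a key from source_table would also have to be kept.
CREATE TABLE new_table_ord AS
SELECT t.ord AS id, t.subunit AS subunits
FROM source_table AS s,
     regexp_split_to_table(s.source_text, E'regexp')
         WITH ORDINALITY AS t(subunit, ord);
```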
Thank you very much for this explanation/reply. It precisely answers
my question.
Unfortunately, it prompts a new question. I am using 8.4.2 which I
assume is new enough to trigger a "yes" response to "If you have a
version new enough to have synchronize_seqscans...". I have
absolutely no idea how to turn that off. Perhaps the best thing would
be to direct me to the documentation where turning it off is described
so that I can become more autonomous. However, accompanying that with
explicit directions would be welcome too.
I am in Greenwich +1 timezone, but I fear you are in the 2AM time
zone. Thank you again,
John
John Gage wrote:
> Unfortunately, it prompts a new question. I am using 8.4.2 which I
> assume is new enough to trigger a "yes" response to "If you have a
> version new enough to have synchronize_seqscans...". I have
> absolutely no idea how to turn that off. Perhaps the best thing
> would be to direct me to the documentation where turning it off is
> described so that I can become more autonomous. However,
> accompanying that with explicit directions would be welcome too.
See postgresql.conf, but you probably want to leave it turned on in
general and turn it off only for this specific case, using the SET
command, ALTER ROLE, or ALTER DATABASE.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
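The three options mentioned in the reply above (SET, ALTER ROLE, ALTER DATABASE) might be used as sketched below. The role name "john" and database name "mydb" are hypothetical; the table and INSERT are the ones proposed earlier in the thread. Scoping the change to one transaction with SET LOCAL is probably the least intrusive choice for a one-off load.

```sql
-- Session level: affects only the current connection.
SET synchronize_seqscans = off;

-- Transaction level: reverts automatically at COMMIT or ROLLBACK.
BEGIN;
SET LOCAL synchronize_seqscans = off;
CREATE TABLE new_table (id serial, subunits text);
INSERT INTO new_table(subunits)
SELECT regexp_split_to_table(source_text, E'regexp') FROM source_table;
COMMIT;

-- Per-role or per-database defaults (names are hypothetical); these apply
-- to sessions started after the change.
ALTER ROLE john SET synchronize_seqscans = off;
ALTER DATABASE mydb SET synchronize_seqscans = off;

-- Check the value in effect for the current session.
SHOW synchronize_seqscans;
```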