linked list rewrite
I'd like to wrap up the linked list rewrite, and get the code into CVS
reasonably soon. If you're unfamiliar with the motivations for
redesigning the linked list code, check the archives for previous
discussions, such as:
http://www.mail-archive.com/pgsql-patches@postgresql.org/msg02204.html
http://www.mail-archive.com/pgsql-patches@postgresql.org/msg01320.html
The basic idea is to ditch the Lisp-style linked list representation
(in which a "list" is merely a pointer to the head node), and adopt a
new List struct like so:
typedef struct List
{
NodeTag type; /* T_List, T_IntList, or T_OidList */
int length;
ListCell *head;
ListCell *tail;
} List;
This allows us to implement most of the important list operations
(length, prepend, append and concatenation) with constant time
complexity, and speed up some other operations (such as equality).
The known remaining issues that need to be addressed are:
(1) API naming
In the latest prototype implementation, I've adopted a completely new
naming convention for the functions in the list API. The new API looks
like:
List *list_append(List *list, void *datum);
List *list_prepend(List *list, void *datum);
List *list_conc(List *list1, List *list2);
bool list_member(List *list, void *datum);
List *list_remove(List *list, void *datum);
etc.
Function variants which operate on integer lists (type = T_IntList)
have an "_int" suffix, whereas OID list functions use an "_oid" suffix.
By default, list operations use structural equality (i.e. equal()) --
if a list function uses pointer equality to determine if two list
elements are the same, it has the "_simple" suffix. And so on.
In contrast, the existing List API is a lot more inconsistent (IMHO). I
elaborated on some of the deficiencies of the existing API naming
scheme here:
http://archives.postgresql.org/pgsql-patches/2004-01/msg00188.php
Tom objected to changing the names:
http://archives.postgresql.org/pgsql-patches/2004-01/msg00186.php
http://archives.postgresql.org/pgsql-patches/2004-01/msg00191.php
(I won't summarize Tom's cogent points here so as not to put words in
his mouth; however, please read those posts.)
I think changing the list API names is a good idea for a few reasons.
First and foremost, it improves the readability of the code and the
consistency of the list API. Second, one objection to changing the API
names is that it requires more changes to the rest of the tree. While
that's indisputable, I think we'll realistically need to visit the vast
majority of the List API call sites in order to implement the list
rewrite anyway, so in the long run I don't think changing the API names
will be that much extra work. Third, it has the benefit of cleanly and
totally breaking existing code that uses the List API, so that we don't
miss any List API call sites whose semantics have subtly changed (e.g.
a switch of a particular function from structural equality to pointer
equality).
So, would people prefer that we keep the old API names, or adopt the
new naming conventions?
(2) Remaining implementation work
The code has drifted a fair bit underneath the latest list rewrite
patch (which was incomplete anyway). It's unfortunately been a little
while since I've looked at the patch, but I don't recall there being
any showstopping problems that needed to be resolved --- it's just a
matter of wrapping up remaining loose ends. I'm going to get started on
this on today.
(3) Apply the work to CVS, and update the rest of the tree for the new
API
The amount of integration work that needs to be done partially depends
on the resolution to #1, but either way the list rewrite will require a
lot of (relatively minor) changes scattered throughout the tree. What
is the best way to land these changes in HEAD with minimal disturbance
to other developers?
One possibility is to create a private CVS branch, implement any
necessary changes throughout the source tree and test all those
changes, and then merge the branch into HEAD once it is ready. This
would still break outstanding patches, as Tom has pointed out, but it
might reduce the disturbance to other developers. It doesn't really
matter to me whether this is done or not (I can do my own revision
control locally if there's a need for it), so speak up if you think
this would be worth doing.
Another possibility is to create some wrapper macros / functions that
would insulate the changes in the list API, so that we could make the
changes incrementally. Not sure how viable this is.
That's about it. Does anyone know of anything else that will need to be
tackled before we can merge the list rewrite?
Cheers,
Neil
Re changing APIs or not.
I agree with Tom that an incremental change is easier. More
importantly, it is also easier to test your implementation.
Even if everybody agrees that the API should be changed, IMO a better
way would be to first use your implementation with the old API and go
through some thorough testing to make sure it is AOK. Once we are
certain (or as certain as we can be) about that, you can changeover to
the new API. This way, when bugs show up we won't be left wondering if
its because of the new API interacting with some hidden assumption at
the call-sites or if it is only the implementation.
One step at a time seems safer to me.
I would think a new CVS branch would be the right way to merge your
stuff especially if you are going with the API change route. That way
you won't have to lock the tip and can incrementally work with a
broken branch until it is fine (and keep re-merging).
Which brings me to another question .. has anybody considered using
subversion instead of CVS ?
--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh
Neil Conway <neilc@samurai.com> writes:
The known remaining issues that need to be addressed are:
(1) API naming
I'll keep my mouth shut about this one until some other people have had
a chance to weigh in...
(3) Apply the work to CVS, and update the rest of the tree for the new
API
The amount of integration work that needs to be done partially depends
on the resolution to #1, but either way the list rewrite will require a
lot of (relatively minor) changes scattered throughout the tree. What
is the best way to land these changes in HEAD with minimal disturbance
to other developers?
I'm not especially excited about the CVS-branch idea; I think that
you'll end up with a major merge headache, unless the branch lives for
such a short time that it'd hardly be worth doing anyway. As I said
before, I'm willing to help out with the grunt-labor part of this so
that the change can be completed more swiftly. If I'm doing that
instead of other stuff, that's one large source of code drift removed ;-)
What I would like to do is to set up compatibility macros along the
lines I suggested before, so that existing code will compile and work
as much as possible. I am thinking that we'd actually leave the
compatibility macros in place indefinitely, but protected with a define
("#ifdef OLD_LIST_API" or some such). This would make it easier for
people to deal with porting user-written code. There are plenty of
scenarios where one would like a loadable module's source to compile
as-is against multiple backend versions, and that will be impossible
for anything that uses lists if we don't provide some hack like this.
The number of macros needed obviously depends on whether we rename
functions or not, but in any case the core of the thing would be
versions of lfirst, lnext, and foreach that cast their arguments so that
a pointer declared as List * can be misused as a list iteration variable.
There will be a small number of things that we cannot make work this
way, for instance applying lfirst to something that actually is a List
and not a ListCell, but I think we can make sure that all the commonly
used coding patterns work.
Given that, the work plan would look like this:
1. Finish up list.c and pg_list.h with compatibility macros; for the
moment, #define OLD_LIST_API.
2. Debug until regression tests pass. This will entail finding the
places where the compatibility layer fails and fixing them.
3. Commit to CVS head.
4. Incrementally fix places that need API adjustment (locally turning
off OLD_LIST_API should be sufficient to locate these). Commit as
the work is done, thereby avoiding any code drift problem.
5. Remove default definition of OLD_LIST_API.
regards, tom lane
On Tue, 23 Mar 2004 08:39:21 -0800
Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> wrote:
Which brings me to another question .. has anybody considered using
subversion instead of CVS ?
I for one would love to see more Open Source projects like PostgreSQL
move to using subversion instead of CVS. I've been using it for
awhile now on my little projects and it's a joy to work with.
---------------------------------
Frank Wiles <frank@wiles.org>
http://frank.wiles.org
---------------------------------
On Mar 23, 2004, at 8:39, Sailesh Krishnamurthy wrote:
Which brings me to another question .. has anybody considered using
subversion instead of CVS ?
My guess would be only people who haven't used arch. :)
Seriously, though. Decentralized, disconnected revision control is a
lot nicer to work with in open source projects...especially larger ones
with more developers in more locations.
--
Dustin Sallings
Neil Conway wrote:
(1) API naming
In the latest prototype implementation, I've adopted a completely new
naming convention for the functions in the list API. The new API looks
like:List *list_append(List *list, void *datum);
List *list_prepend(List *list, void *datum);
List *list_conc(List *list1, List *list2);
bool list_member(List *list, void *datum);
List *list_remove(List *list, void *datum);
etc.Function variants which operate on integer lists (type = T_IntList)
have an "_int" suffix, whereas OID list functions use an "_oid" suffix.
By default, list operations use structural equality (i.e. equal()) --
if a list function uses pointer equality to determine if two list
elements are the same, it has the "_simple" suffix. And so on.In contrast, the existing List API is a lot more inconsistent (IMHO). I
elaborated on some of the deficiencies of the existing API naming
scheme here:http://archives.postgresql.org/pgsql-patches/2004-01/msg00188.php
Tom objected to changing the names:
http://archives.postgresql.org/pgsql-patches/2004-01/msg00186.php
http://archives.postgresql.org/pgsql-patches/2004-01/msg00191.php
I agree a renaming of list functions is good. If we had kept the
original Berkeley code as-is, we would have a lot fewer developers
today. :-) Making drastic cleanups is often worthwile. I write
backend code and still can't remember which list function does what, so
clearer naming would help me a lot, and I am sure others too.
Tom had the idea of making compatible names for the old macros, and I
think that is an excellent compromise.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Tue, Mar 23, 2004 at 05:00:14AM -0500, Neil Conway wrote:
[...]
The basic idea is to ditch the Lisp-style linked list representation
(in which a "list" is merely a pointer to the head node), and adopt a
new List struct like so:
I assume you are doing away with the FastList hack too, aren't you?
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
We take risks not to escape from life, but to prevent life escaping from us.
Dustin Sallings <dustin@spy.net> writes:
On Mar 23, 2004, at 8:39, Sailesh Krishnamurthy wrote:
Which brings me to another question .. has anybody considered using
subversion instead of CVS ?
My guess would be only people who haven't used arch. :)
This reminds me quite a lot of a discussion that was just happening on
a Red Hat mailing list. It seems the great thing about revision control
systems is there are so many to choose from ;-) ... and plenty of people
willing to proselytize for their favorite.
AFAICS, though, CVS is not broken for our needs. I don't see an
adequate reason to change.
regards, tom lane
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
I assume you are doing away with the FastList hack too, aren't you?
Yup, we'll get to revert that junk too.
regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Neil Conway wrote:
Tom objected to changing the names:
I agree a renaming of list functions is good. If we had kept the
original Berkeley code as-is, we would have a lot fewer developers
today. :-) Making drastic cleanups is often worthwile.
I would be satisfied if we kept the names of the core,
most-commonly-used functions the same. I would put lfirst, lnext,
lcons, lappend, length, maybe member into the category of names
I don't want to change. Attaching "_int" and "_oid" to those for the
related functions is okay.
If we go in that direction then the common prefix would be just "l"
and not "list_", which seems a good idea to me on grounds of brevity.
Looking over Neil's proposal again, one of the things that bugged me
about it was that the function names were overly verbose. That's okay
for stuff you don't see often, but the common list functions are *all
over* the backend. You can't really claim that developers will be
unfamiliar with them. Making those names longer won't buy us anything
except sooner onset of carpal tunnel syndrome.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Neil Conway wrote:
Tom objected to changing the names:
I agree a renaming of list functions is good. If we had kept the
original Berkeley code as-is, we would have a lot fewer developers
today. :-) Making drastic cleanups is often worthwile.I would be satisfied if we kept the names of the core,
most-commonly-used functions the same. I would put lfirst, lnext,
lcons, lappend, length, maybe member into the category of names
I don't want to change. Attaching "_int" and "_oid" to those for the
related functions is okay.If we go in that direction then the common prefix would be just "l"
and not "list_", which seems a good idea to me on grounds of brevity.
Looking over Neil's proposal again, one of the things that bugged me
about it was that the function names were overly verbose. That's okay
for stuff you don't see often, but the common list functions are *all
over* the backend. You can't really claim that developers will be
unfamiliar with them. Making those names longer won't buy us anything
except sooner onset of carpal tunnel syndrome.
Agreed. Sounds like a plan.
What does the 'n' stand for in ncons? I also felt that lcons
(construct) and nconc(concat) were too similarly named.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
What does the 'n' stand for in ncons? I also felt that lcons
(construct) and nconc(concat) were too similarly named.
I think nconc is a direct copy from the Lisp original; whatever its
origins are, they're back in Lisp prehistory. I don't mind renaming
that one ;-)
Let's see ... fleshing out this idea a bit, here's a rundown of all the
symbols in pg_list.h and suggested new names:
ListCell accessors:
lfirst no change
lfirsti lfirst_int
lfirsto lfirst_oid
lnext no change
foreach no change
List accessors:
length no change
lfirstcell new function to get first cell, or NULL if none
lfirst rewrite as lfirst(lfirstcell()) when applied to a List
lsecond, lthird, lfourth not clear if worth keeping
llastnode llastcell
llast Perhaps rewrite as lfirst(llastcell()) instead of keeping
makeListN list_makeN or possibly list_make_N
makeListiN list_makeN_int or list_make_N_int
makeListoN list_makeN_oid or list_make_N_oid
lcons no change
lconsi lcons_int
lconso lcons_oid
lappend no change
lappendi lappend_int
lappendo lappend_oid
nconc list_concat
We might also need a function to attach some bare ListCells to a List
nth list_nth
list_nth_int
list_nth_oid
Should add list_nth_int, list_nth_oid, even though there are no
corresponding functions at the moment
member list_member
ptrMember list_member_ptr
intMember list_member_int
oidMember list_member_oid
LispRemove list_remove
lremove list_remove_ptr
lremovei list_remove_int
list_remove_oid add, though not currently used
ltruncate list_truncate
set_union list_union
set_ptrUnion list_union_ptr
list_union_int not currently used
set_uniono list_union_oid
set_difference list_difference
set_ptrDifference list_difference_ptr
list_difference_int not currently used
set_differenceo list_difference_oid
equali and equalo become just calls on equal()
freeList list_free (frees only List, not pointed-to objects)
listCopy list_copy (shallow copy of List only)
A couple of notes here: I propose using the suffix "_ptr" for operations
that use pointer equality rather than equal(). Neil proposed "_simple"
but that seems less than clear to me --- which one is "simple" and which
isn't? Also, for the "set" operations I have just "list_union" etc
where Neil proposed "list_set_union".
Neil proposed inventing "list_deep_copy" which as far as I can see is
entirely redundant with copyObject. Similarly I can't see any value in
list_equal when equal() will do the trick.
regards, tom lane
I wrote:
Let's see ... fleshing out this idea a bit, here's a rundown of all the
symbols in pg_list.h and suggested new names:
Sheesh ... I forgot that I intended to do s/list_/l/g on that.
Doing so brings up one problem, which is that the old version of
lremove() conflicts with the new naming convention (it should have
had a name mentioning "ptr"). In the attached I work around this
by using "ldelete" rather than "lremove" as the base name for these
functions, but I'm open to other ideas.
ListCell accessors:
lfirst no change
lfirsti lfirst_int
lfirsto lfirst_oid
lnext no change
foreach no change
List accessors:
length no change
lfirstcell new function to get first cell, or NULL if none
lfirst rewrite as lfirst(lfirstcell()) when applied to a List
lsecond, lthird, lfourth not clear if worth keeping
llastnode llastcell
llast Perhaps rewrite as lfirst(llastcell()) instead of keeping
makeListN lmakeN or possibly lmake_N
makeListiN lmakeN_int or lmake_N_int
makeListoN lmakeN_oid or lmake_N_oid
lcons no change
lconsi lcons_int
lconso lcons_oid
lappend no change
lappendi lappend_int
lappendo lappend_oid
nconc lconcat
We might also need a function to attach some bare ListCells to a List
nth lnth
lnth_int
lnth_oid
Should add lnth_int, lnth_oid, even though there are no
corresponding functions at the moment
member lmember
ptrMember lmember_ptr
intMember lmember_int
oidMember lmember_oid
LispRemove ldelete
lremove ldelete_ptr
lremovei ldelete_int
ldelete_oid add, though not currently used
ltruncate no change
set_union lunion
set_ptrUnion lunion_ptr
lunion_int not currently used
set_uniono lunion_oid
set_difference ldifference
set_ptrDifference ldifference_ptr
ldifference_int not currently used
set_differenceo ldifference_oid
equali and equalo become just calls on equal()
freeList lfree (frees only List, not pointed-to objects)
listCopy lcopy (shallow copy of List only)
regards, tom lane
Tom Lane wrote:
lcons no change
lconsi lcons_int
lconso lcons_oid
Should these be lnew or something clearer than cons-truct?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
What does the 'n' stand for in ncons? I also felt that lcons
(construct) and nconc(concat) were too similarly named.I think nconc is a direct copy from the Lisp original; whatever its
origins are, they're back in Lisp prehistory. I don't mind renaming
that one ;-)
I found out why it is called Nconc:
The name comes from CommonLisp: 'conc' for 'concatenate', prefixed by
the 'n' which signals a dangerous function modifying existing lists.
(Think of as as n-for-nuke.)
from:
http://www.muq.org/~cynbe/muq/mufref_437.html
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Tom Lane wrote:
lcons no change
lconsi lcons_int
lconso lcons_oid
Should these be lnew or something clearer than cons-truct?
No, lcons is one of the names that I think we should stick with on
historical grounds. It's widely used in the backend and it has the
right connotations for anyone who's ever used Lisp. "lnew" conveys
nothing, certainly not the right thing (it doesn't make a new List,
only a new ListCell).
regards, tom lane
On 23-Mar-04, at 7:05 PM, Tom Lane wrote:
No, lcons is one of the names that I think we should stick with on
historical grounds. It's widely used in the backend and it has the
right connotations for anyone who's ever used Lisp.
I think it has exactly the *wrong* connotations: the name suggests that
it creates a new cons cell (along with the ensuing implications about
performance and the internal implementation of the list), which is no
longer the case.
How about lprepend()? That allows for some symmetric with lappend().
-Neil
Neil Conway wrote:
On 23-Mar-04, at 7:05 PM, Tom Lane wrote:
No, lcons is one of the names that I think we should stick with on
historical grounds. It's widely used in the backend and it has the
right connotations for anyone who's ever used Lisp.I think it has exactly the *wrong* connotations: the name suggests that
it creates a new cons cell (along with the ensuing implications about
performance and the internal implementation of the list), which is no
longer the case.How about lprepend()? That allows for some symmetric with lappend().
Wow, I like that one!
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Tue, 23 Mar 2004, Sailesh Krishnamurthy wrote:
Which brings me to another question .. has anybody considered using
subversion instead of CVS ?
Why? not that I'm for a chance from something that isn't broken, but what
advantages does subversion give us over what we already have?
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
Neil Conway <neilc@samurai.com> writes:
On 23-Mar-04, at 7:05 PM, Tom Lane wrote:
No, lcons is one of the names that I think we should stick with on
historical grounds. It's widely used in the backend and it has the
right connotations for anyone who's ever used Lisp.
I think it has exactly the *wrong* connotations: the name suggests that
it creates a new cons cell (along with the ensuing implications about
performance and the internal implementation of the list), which is no
longer the case.
How do you mean it's no longer the case? ListCell looks exactly like a
cons cell to me.
regards, tom lane