More on elog and error codes

Started by Peter Eisentraut almost 25 years ago, 26 messages

peter_e@gmx.net

almost 25 years ago

I've looked at the elog calls in the source, about 1700 in total (only
elog(ERROR)). If we mapped these to the SQL error codes then we'd have
about two dozen calls with an assigned code and the rest being "other".
The way I estimate it (I didn't really look at *each* call, of course) is
that about 2/3 of the calls are internal panic calls ("cache lookup of %s
failed"), 1/6 are SQL-level problems, and the rest are operating system,
storage problems, "not implemented", misconfigurations, etc.

A problem that makes this quite hard to manage is that many errors can be
reported from several places, e.g., the parser, the executor, the access
method. Some of these messages are probably not readily reproducible
because they are caught elsewhere.

Consequently, the most pragmatic approach to assigning error codes
might be to just pick some numbers and give them out gradually. A
hierarchical subsystem+code might be useful, beyond that it really depends
on what we expect from error codes in the first place. Does anyone have
good experiences from other products?

Essentially, I envision making up a new function, say "elogc", which has

elogc(<level>, [<subsys>,?] <code>, message...)

where the code is some macro, the expansion of which is to be determined.
A call to "elogc" would also require a formalized message wording, adding
the error code to the documentation, which also requires having a fairly
good idea how the error can happen and how to handle it. This could
perhaps even be automated to some extent.

All the calls that are not converted yet will be assigned to the generic
"internal error" class; most of them will stay this way.

As for translations, I don't think we have to worry about this right now.
Assuming that we would use gettext or something similar, we can tell it
that all calls to elog (or "elogc" or whatever) contain translatable
strings, so we don't have to uglify it with gettext(...) or _(...) calls
or what else.

So we need some good error numbering scheme. Any ideas?

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

Philip Warner

pjw@rhyme.com.au

almost 25 years ago

In reply to: Peter Eisentraut (#1)

Re: More on elog and error codes

At 23:56 19/03/01 +0100, Peter Eisentraut wrote:

Essentially, I envision making up a new function, say "elogc", which has

elogc(<level>, [<subsys>,?] <code>, message...)

where the code is some macro, the expansion of which is to be determined.
A call to "elogc" would also require a formalized message wording, adding
the error code to the documentation, which also requires having a fairly
good idea how the error can happen and how to handle it. This could
perhaps even be automated to some extent.

All the calls that are not converted yet will be assigned to the generic
"internal error" class; most of them will stay this way.

...

So we need some good error numbering scheme. Any ideas?

FWIW, the VMS scheme has error numbers broken down to include system,
subsystem, error number & severity. These are maintained in an error
message source file. eg. the file system's 'file not found' error message
is something like:

FACILITY RMS (the file system)
...
SEVERITY WARNING
...
FILNFND "File %AS not found"
...

It's a while since I used VMS message files regularly, but this is at least
representative. It has the drawback that severity is often tied to the
message, not the circumstance, but this is a problem only rarely.

In code, the messages are used as external symbols (probably in our case
representing pointers to C format strings). In making extensive use of such
mnemonics, I never really needed to have full text messages. Once a set
of standards is in place for message abbreviations, most people can
read the message codes. This would mean that:

elogc(<level>, [<subsys>,?] <code>, message...)

becomes:

elogc(<code> [, parameter...])

eg.

"cache lookup of %s failed"

might be replaced by:

elog(CACHELOOKUPFAIL, cacheItemThatFailed);

and
"internal error: %s"

becomes

elog(INTERNAL, "could not find the VeryImportantThing");

Unlike VMS, it's probably a good idea to separate the severity from the
error code, since a CACHELOOKUPFAIL in one place may be less significant
than another (eg. severity=debug).

I also think it's important that we get the source file and line number
somewhere in the message, and if we have these, we may not need the subsystem.
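
A minimal sketch of how such message symbols might look in C, assuming each symbol is an external struct that carries its mnemonic and a format string. The names MsgDef, MSG_CACHELOOKUPFAIL, MSG_INTERNAL and msg_format are hypothetical, and severity is passed separately as suggested above; none of this is an existing PostgreSQL interface.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical message symbol: a mnemonic plus the format string it stands for. */
typedef struct MsgDef
{
    const char *name;
    const char *fmt;
} MsgDef;

const MsgDef MSG_CACHELOOKUPFAIL = {"CACHELOOKUPFAIL", "cache lookup of %s failed"};
const MsgDef MSG_INTERNAL = {"INTERNAL", "internal error: %s"};

/*
 * Format "<severity> <mnemonic>: <text>" into buf.
 * Severity is a separate argument, so the same message can be reported
 * at different levels from different call sites.
 * Returns 0 on success, -1 on truncation or formatting failure.
 */
int
msg_format(char *buf, size_t buflen, const char *severity, const MsgDef *msg, ...)
{
    va_list ap;
    int     prefix;
    int     body;

    prefix = snprintf(buf, buflen, "%s %s: ", severity, msg->name);
    if (prefix < 0 || (size_t) prefix >= buflen)
        return -1;

    va_start(ap, msg);
    body = vsnprintf(buf + prefix, buflen - prefix, msg->fmt, ap);
    va_end(ap);

    return (body < 0 || (size_t) body >= buflen - prefix) ? -1 : 0;
}
```

The caller never sees the format string, which is the convenience argued for here and also the source of the type-safety objection raised later in the thread.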

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

Tom Lane

tgl@sss.pgh.pa.us

almost 25 years ago

In reply to: Philip Warner (#2)

Re: More on elog and error codes

Philip Warner <pjw@rhyme.com.au> writes:

I also think it's important that we get the source file and line number
somewhere in the message, and if we have these, we may not need the
subsystem.

I agree that the subsystem concept is not necessary, except possibly as
a means of avoiding collisions in the error-symbol namespace, and for
that it would only be a naming convention (PGERR_subsys_IDENTIFIER).
We probably do not need it considering that we have much less than 1000
distinct error identifiers to assign, judging from Peter's survey.

We do need severity to be distinct from the error code ("internal
errors" are surely not all the same severity, even if we don't bother
to assign formal error codes to each one).

BTW, the symbols used in the source code do need to have a common prefix
(PGERR_CACHELOOKUPFAIL not CACHELOOKUPFAIL) to avoid namespace pollution
problems. We blew this before with "DEBUG" and friends, let's learn
from that mistake.

regards, tom lane

Thomas Lockhart

lockhart@alumni.caltech.edu

almost 25 years ago

In reply to: Peter Eisentraut (#1)

Re: More on elog and error codes

So we need some good error numbering scheme. Any ideas?

SQL9x specifies some error codes, with no particular numbering scheme
other than negative numbers indicate a problem afaicr.

Shouldn't we map to those where possible?

- Thomas

Gunnar Rønning

gunnar@candleweb.no

almost 25 years ago

In reply to: Peter Eisentraut (#1)

Re: More on elog and error codes

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:

So we need some good error numbering scheme. Any ideas?

SQL9x specifies some error codes, with no particular numbering scheme
other than negative numbers indicate a problem afaicr.

Shouldn't we map to those where possible?

Good point, but I guess most of the errors produced are pgsql
specific. If I remember right Sybase had several different SQL types of error
mapped to one of the standard error codes.

Also the JDBC API provides methods to look at the database dependent error
code and standard error code. I've found both useful when working with
Sybase.

cheers,

Gunnar

Zeugswetter Andreas SB

ZeugswetterA@wien.spardat.at

almost 25 years ago

In reply to: Gunnar Rønning (#5)

AW: Re: More on elog and error codes

So we need some good error numbering scheme. Any ideas?

SQL9x specifies some error codes, with no particular numbering scheme
other than negative numbers indicate a problem afaicr.

Shouldn't we map to those where possible?

Yes, it defines at least a few dozen char(5) error codes. These are hierarchical,
grouped into Warnings and Errors, and have room for implementation specific
message codes.
Imho there is no room for inventing something new here, or only in addition.

Andreas

Peter Eisentraut

peter_e@gmx.net

almost 25 years ago

In reply to: Philip Warner (#2)

Re: More on elog and error codes

Philip Warner writes:

elog(CACHELOOKUPFAIL, cacheItemThatFailed);

The disadvantage of this approach, which I tried to explain in a previous
message, is that we might want to have different wordings for different
occurrences of the same class of error.

Additionally, the whole idea behind having error *codes* is that the
client program can easily distinguish errors that it can handle specially.
Thus the codes should be numeric or some other short, fixed scheme. In
the backend they could be replaced by macros.

Example:

#define PGERR_TYPE 1854

/* somewhere... */

elogc(ERROR, PGERR_TYPE, "type %s cannot be created because it already exists", ...)

/* elsewhere... */

elogc(ERROR, PGERR_TYPE, "type %s used as argument %d of function %s doesn't exist", ...)

In fact, this is my proposal. The "1854" can be argued, but I like the
rest.
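
A minimal sketch of what an "elogc" along these lines could do, assuming it only formats the report into a caller-supplied buffer. The function name elogc_format, the SK_* level constants and the "LEVEL code: text" layout are assumptions made for illustration; the real elog also transfers control on ERROR, which is left out here.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

enum { SK_NOTICE = 0, SK_ERROR = 1 };  /* hypothetical severity levels */
#define PGERR_TYPE 1854                /* value from the example; the number itself is arguable */

/*
 * Format one report as "<LEVEL> <code>: <text>".
 * The wording stays at the call site; only the code is shared.
 * Returns 0 on success, -1 on truncation or formatting failure.
 */
int
elogc_format(char *buf, size_t buflen, int level, int code, const char *fmt, ...)
{
    va_list ap;
    int     prefix;
    int     body;

    prefix = snprintf(buf, buflen, "%s %04d: ",
                      level == SK_ERROR ? "ERROR" : "NOTICE", code);
    if (prefix < 0 || (size_t) prefix >= buflen)
        return -1;

    va_start(ap, fmt);
    body = vsnprintf(buf + prefix, buflen - prefix, fmt, ap);
    va_end(ap);

    return (body < 0 || (size_t) body >= buflen - prefix) ? -1 : 0;
}
```

Two call sites can then share PGERR_TYPE while keeping their own wording, as in the example above.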

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

Peter Eisentraut

peter_e@gmx.net

almost 25 years ago

In reply to: Zeugswetter Andreas SB (#6)

Re: AW: Re: More on elog and error codes

Zeugswetter Andreas SB writes:

SQL9x specifies some error codes, with no particular numbering scheme
other than negative numbers indicate a problem afaicr.

Shouldn't we map to those where possible?

Yes, it defines at least a few dozen char(5) error codes. These are hierarchical,
grouped into Warnings and Errors, and have room for implementation specific
message codes.

Let's use those then to start with.

Anyone got a good idea for a client API to this? I think we could just
prefix the actual message with the error code, at least as a start.
Since they're all fixed width the client could take them apart easily. I
recall other RDBMS' (Oracle?) also having an error code before each
message.
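
A minimal sketch of the client side of that idea, assuming the wire format is exactly five code characters, one space, then the message text. That layout and the function name split_error are assumptions; the thread does not fix either.

```c
#include <stddef.h>
#include <string.h>

enum { SQLSTATE_LEN = 5 };  /* char(5) codes, as described above */

/*
 * Split "CCCCC message text" into its code and text parts.
 * Copies the code into 'code' (NUL-terminated) and returns a pointer to
 * the text inside 'raw', or NULL if 'raw' does not have the assumed layout.
 */
const char *
split_error(const char *raw, char code[SQLSTATE_LEN + 1])
{
    if (raw == NULL || strlen(raw) < (size_t) SQLSTATE_LEN + 1 ||
        raw[SQLSTATE_LEN] != ' ')
        return NULL;

    memcpy(code, raw, SQLSTATE_LEN);
    code[SQLSTATE_LEN] = '\0';
    return raw + SQLSTATE_LEN + 1;
}
```

A client that only cares about the code can compare the five characters and ignore the text, whatever its wording or language.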

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

Zeugswetter Andreas SB

ZeugswetterA@wien.spardat.at

almost 25 years ago

In reply to: Peter Eisentraut (#8)

AW: More on elog and error codes

#define PGERR_TYPE 1854

#define PGSQLSTATE_TYPE "S0021" // char(5) SQLSTATE

The standard calls this error variable SQLSTATE
(look up in ESQL standard)

first 2 chars are class next 3 are subclass

"00000" is e.g. Success
"02000" is Data not found
"U0xxx" user defined routine error xxx is user defined

/* somewhere... */

elogc(ERROR, PGERR_TYPE, "type %s cannot be created because it already exists", ...)

PGELOG(ERROR, PGSQLSTATE_TYPE, ("type %s cannot be created because it already exists", ...))

put varargs into parentheses to avoid need for ... macros see Tom's proposal

I also agree, that we can group different text messages into the same SQLSTATE,
if it seems appropriate for the client to handle them alike.
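
A minimal sketch of how a client might use the class/subclass split, assuming the standard's convention that class "00" is success, "01" is a warning, "02" is no data, and every other class is an exception. The type and function names are hypothetical, and "S0021" is only the placeholder value used above.

```c
#include <stddef.h>
#include <string.h>

typedef enum SqlstateKind
{
    SQLSTATE_SUCCESS,    /* class "00" */
    SQLSTATE_WARNING,    /* class "01" */
    SQLSTATE_NO_DATA,    /* class "02" */
    SQLSTATE_EXCEPTION,  /* any other class */
    SQLSTATE_INVALID     /* not a five-character code */
} SqlstateKind;

/* Classify a SQLSTATE by its first two characters (the class). */
SqlstateKind
sqlstate_kind(const char *state)
{
    if (state == NULL || strlen(state) != 5)
        return SQLSTATE_INVALID;
    if (strncmp(state, "00", 2) == 0)
        return SQLSTATE_SUCCESS;
    if (strncmp(state, "01", 2) == 0)
        return SQLSTATE_WARNING;
    if (strncmp(state, "02", 2) == 0)
        return SQLSTATE_NO_DATA;
    return SQLSTATE_EXCEPTION;
}

/* Two codes belong to the same class if their first two characters match. */
int
sqlstate_same_class(const char *a, const char *b)
{
    return strncmp(a, b, 2) == 0;
}
```

Grouping several messages under one SQLSTATE then costs the client nothing: it dispatches on the class or the full code and treats the text as informational.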

Andreas

#10

Larry Rosenman

ler@lerctr.org

almost 25 years ago

In reply to: Peter Eisentraut (#8)

Re: AW: Re: More on elog and error codes

Coming from an IBM Mainframe background, I'm used to ALL OS/Product
messages having a message number, and a fat messages and codes book.

I hope we can do that eventually.
(maybe a database of the error numbers and codes?)

LER

Original Message <<<<<<<<<<<<<<<<<<

On 3/20/01, 10:53:42 AM, Peter Eisentraut <peter_e@gmx.net> wrote regarding
Re: AW: [HACKERS] Re: More on elog and error codes:

Zeugswetter Andreas SB writes:

SQL9x specifies some error codes, with no particular numbering scheme
other than negative numbers indicate a problem afaicr.

Shouldn't we map to those where possible?

Yes, it defines at least a few dozen char(5) error codes. These are hierarchical,
grouped into Warnings and Errors, and have room for implementation specific
message codes.

Let's use those then to start with.

Anyone got a good idea for a client API to this? I think we could just
prefix the actual message with the error code, at least as a start.
Since they're all fixed width the client could take them apart easily. I
recall other RDBMS' (Oracle?) also having an error code before each
message.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

#11

Tom Lane

tgl@sss.pgh.pa.us

almost 25 years ago

In reply to: Zeugswetter Andreas SB (#9)

Re: AW: More on elog and error codes

Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:

PGELOG(ERROR, PGSQLSTATE_TYPE, ("type %s cannot be created because it already exists", ...))

put varargs into parentheses to avoid need for ... macros see Tom's proposal

I'd be inclined to make it

PGELOG((ERROR, PGSQLSTATE_TYPE, "type %s cannot be created because it already exists", ...))

The extra parens are ugly and annoying in any case, but they seem
slightly less so if you just double the parens associated with the
PGELOG call. Takes less thought than adding a paren somewhere in the
middle of the call. IMHO anyway...
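
A minimal sketch of the doubled-parentheses form, assuming the macro exists mainly to record the call site before handing the argument list to an ordinary varargs function. The names pgelog_where, pgelog_impl, pgelog_last and the SK_* levels are hypothetical, and "S0021" is the placeholder value from the message above.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

enum { SK_NOTICE = 0, SK_ERROR = 1 };  /* hypothetical severity levels */
#define PGSQLSTATE_TYPE "S0021"        /* placeholder from the thread */

const char *pgelog_file = "";          /* call site recorded by the macro */
int         pgelog_line = 0;
char        pgelog_last[256];          /* last formatted report */

void
pgelog_where(const char *file, int line)
{
    pgelog_file = file;
    pgelog_line = line;
}

void
pgelog_impl(int level, const char *sqlstate, const char *fmt, ...)
{
    va_list ap;
    int     n;

    n = snprintf(pgelog_last, sizeof pgelog_last, "%s %s: ",
                 level == SK_ERROR ? "ERROR" : "NOTICE", sqlstate);
    if (n < 0 || (size_t) n >= sizeof pgelog_last)
        return;

    va_start(ap, fmt);
    vsnprintf(pgelog_last + n, sizeof pgelog_last - n, fmt, ap);
    va_end(ap);
}

/*
 * The macro takes ONE argument: a complete, parenthesized argument list.
 * No variadic-macro support is needed from the compiler.
 */
#define PGELOG(args) (pgelog_where(__FILE__, __LINE__), pgelog_impl args)
```

A call then reads PGELOG((SK_ERROR, PGSQLSTATE_TYPE, "type %s ...", name)); the inner parentheses are the whole argument list.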

regards, tom lane

#12

Christopher Sawtell

csawtell@xtra.co.nz

almost 25 years ago

In reply to: Peter Eisentraut (#1)

Re: More on elog and error codes

On Tue, 20 Mar 2001 10:56, you wrote:

I've looked at the elog calls in the source, about 1700 in total (only

[ ... ]

So we need some good error numbering scheme. Any ideas?

Just that it might be a good idea to incorporate the version / release
details in some way so that when somebody on the list is squeaking about
an error message it is obvious to the helper that the advice needed is to
upgrade from the Cretaceous Period version to a modern release, and have
another go.

--
Sincerely etc.,

NAME Christopher Sawtell
CELL PHONE 021 257 4451
ICQ UIN 45863470
EMAIL csawtell @ xtra . co . nz
CNOTES ftp://ftp.funet.fi/pub/languages/C/tutorials/sawtell_C.tar.gz

-->> Please refrain from using HTML or WORD attachments in e-mails to me
<<--

#13

Otto A. Hirr, Jr.

otto.hirr@olabinc.com

almost 25 years ago

In reply to: Zeugswetter Andreas SB (#6)

RE: Re: More on elog and error codes

So we need some good error numbering scheme. Any ideas?

I'm a newbie, but have been following dev and have a few comments
and these are thoughts not criticisms:

1) I've seen a huge mixture of "how to implement" to support some
desired feature without first knowing "all" of the features that
are desired. Examination over all of the mailings reveals some
but not all of possible features you may want to include.
2) Define what you want to have without worrying about how to do it.
3) Design something that can implement all of the features.
4) Reconsider design if there are performance issues.

e.g.

Features desired
* system
* subsystem
* function
* file, line, etc
* severity
* user-ability-to-recover
* standards conformance - e.g. SQL std
* default msg statement
* locale msg statement lookup mech, os dep or indep (careful here)
* success/warning/failure
* semantic taxonomy
* syntactic taxonomy
* forced to user, available to api, logging or not, tracing
* concept of level
* reports filtering on some attribute
* interoperation with existing system reports e.g. syslog, event log,...
* system environment snapshot option
(e.g. resource low/empty may/should trigger a log of conn cnt,
sys resource counts, load, etc)
* non-mnemonic internal numbers (mnemonic only to obey stds and then
only as a function call, not by implementation)
* ease of use (i.e. pgsql-dev-hacker use)
* ease of use (i.e. api development use)
* ease of use (i.e. rolling into an existing system, e.g. during
transition both may need to be in use.)
* ease of use (i.e. looking through existing errors to find one
that may "correctly" fit the situation, instead of
creating yet-another-error-message.)
* ease of use (i.e. maybe having each "sub-system" having its own
"error domain" but using the same error mechanism)
* distinction btwn error report, debug report, tracing report, etc
* separate the concepts of
- report creation
- report delivery
- report reception
- report interpretation
* what do others do, others as in os, db, middleware, etc
along with their strong and weak points
... what else do you want... and let's flesh out the meaning of
each of these. Then we can go on to a design...

Sorry if this sounds like a lecture.

With regards to mnemonic things - ugh - this is a database.
I've worked with a LARGE electronics company that had
10 and 12 digit mnemonic part numbers. The mnemonic-ness
begins to break down. (So you have a part number of an eprom,
what is the part number when it is blown - still an eprom?
how about including the version of the sw on the eprom? is it
now an assembly? oops, that tended to mean multiple parts attached
together, hmm, still looks like an eprom?) They have gone through
a huge transition to move away, as has the industry from mnemonic
numbers to simply an id number. You look up the id number in a >database< :-)
to find out what it is.

So why not drop the mnemonic concept and apply a function to a
blackbox dataitem to determine its attribute? But again first
determine what attributes you want, which are mandatory, optional,
system supplied (e.g. __LINE__ etc), is it for erroring, tracing,
debugging, some combo; then the appropriate dataitem can be
designed and functions defined. Functions (macros) for both the
report creation, report distribution, report reception, and
report interpretation. Some other email pointed out that
there are different people doing different things. Each of these
people-groups should identify what they need with regards to
error, debug, tracing reports. Each may have some nuances that
are not needed elsewhere, but the reporting system should be able
to support them all.

Ok, so I've got my flame suit on... but I am really trying to give
an "outsider's" bird's-eye view of what I've been reading, which
hopefully may be helpful.

Best regards,

.. Otto

Otto Hirr
OLAB Inc.
otto.hirr@olabinc.com
503 / 617-6595

#14

Ross J. Reedstrom

reedstrm@rice.edu

almost 25 years ago

In reply to: Christopher Sawtell (#12)

Re: More on elog and error codes

On Wed, Mar 21, 2001 at 09:41:44AM +1200, Christopher Sawtell wrote:

On Tue, 20 Mar 2001 10:56, you wrote:

Just that it might be a good idea to incorporate the version / release
details in some way so that when somebody on the list is squeaking about
an error message it is obvious to the helper that the advice needed is to
upgrade from the Cretaceous Period version to a modern release, and have

ROFL - parsed this as Cretinous period on the first pass.

Ross

#15

Philip Warner

pjw@rhyme.com.au

almost 25 years ago

In reply to: Peter Eisentraut (#7)

Re: More on elog and error codes

At 17:35 20/03/01 +0100, Peter Eisentraut wrote:

Philip Warner writes:

elog(CACHELOOKUPFAIL, cacheItemThatFailed);

The disadvantage of this approach, which I tried to explain in a previous
message, is that we might want to have different wordings for different
occurrences of the same class of error.

Additionally, the whole idea behind having error *codes* is that the
client program can easily distinguish errors that it can handle specially.
Thus the codes should be numeric or some other short, fixed scheme. In
the backend they could be replaced by macros.

This seems to be just an argument for constructing the value of
PGERR_CACHELOOKUPFAIL carefully (which is what the VMS message source files
did). The point is that when they are used by a developer, they are simple.

#define PGERR_TYPE 1854

/* somewhere... */

elogc(ERROR, PGERR_TYPE, "type %s cannot be created because it already exists", ...)

/* elsewhere... */

elogc(ERROR, PGERR_TYPE, "type %s used as argument %d of function %s doesn't exist", ...)

I can appreciate that there may be cases where the same message is reused,
but that is where parameter substitution comes in.

In the specific example above, returning the same error code is not going
to help the client. What if they want to handle "type %s used as argument
%d of function %s doesn't exist" by creating the type, and silently ignore
"type %s cannot be created because it already exists"?

How do you handle "type %s can not be used as a function return type"? Is
this PGERR_FUNC or PGERR_TYPE?

If the motivation behind this is to allow easy translation to SQL error
codes, then I suggest we have an error definition file with explicit
translation:

Code              SQL    Text
PGERR_TYPALREXI   02xxx  "type %s cannot be created because it already exists"
PGERR_FUNCNOTYPE  02xxx  "type %s used as argument %d of function %s doesn't exist"

and if we want a generic 'type does not exist', then:

PGERR_NOSUCHTYPE 02xxx "type %s does not exist - %s"

where the %s might contain 'it can't be used as a function argument'.

then we just have

elogc(ERROR, PGERR_TYPALREXI, ...)

/* elsewhere... */

elogc(ERROR, PGERR_FUNCNOTYPE, ...)

Creating central message files/objects has the added advantage of a much
simpler locale support - they're just resource files, and they're NOT
embedded throughout the code.

Finally, if you do want to have some kind of error classification beyond
the SQL code, it could be encoded in the error message file.
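
A minimal sketch of such an error definition file, written here as a C "X-macro" table so that one list of rows generates both the symbols and the lookup table. This is one possible realization, not something the thread settled on; PG_ERROR_TABLE, PgErrDef and pg_error_lookup are hypothetical names, and "02xxx" is kept as the unfilled placeholder from the table above.

```c
#include <stddef.h>

/* One row per message: symbol, SQL code (placeholder), format text. */
#define PG_ERROR_TABLE(X) \
    X(PGERR_TYPALREXI,  "02xxx", "type %s cannot be created because it already exists") \
    X(PGERR_FUNCNOTYPE, "02xxx", "type %s used as argument %d of function %s doesn't exist") \
    X(PGERR_NOSUCHTYPE, "02xxx", "type %s does not exist - %s")

/* Expansion 1: the symbols developers type. */
#define X_ENUM(sym, sql, text) sym,
typedef enum PgErrCode
{
    PG_ERROR_TABLE(X_ENUM)
    PGERR_COUNT
} PgErrCode;
#undef X_ENUM

/* Expansion 2: the data each symbol resolves to. */
typedef struct PgErrDef
{
    const char *name;      /* mnemonic, e.g. for log searches */
    const char *sqlstate;  /* SQL-level code reported to clients */
    const char *fmt;       /* default message wording */
} PgErrDef;

#define X_ROW(sym, sql, text) {#sym, sql, text},
const PgErrDef pg_error_defs[] = {
    PG_ERROR_TABLE(X_ROW)
};
#undef X_ROW

/* Look up a definition; returns NULL for an out-of-range code. */
const PgErrDef *
pg_error_lookup(PgErrCode code)
{
    if ((unsigned int) code >= (unsigned int) PGERR_COUNT)
        return NULL;
    return &pg_error_defs[code];
}
```

Because both expansions come from the same rows, a symbol cannot exist without a code and a text, which is the consistency a central file is meant to buy.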

#16

Philip Warner

pjw@rhyme.com.au

almost 25 years ago

In reply to: Christopher Sawtell (#12)

Re: More on elog and error codes

At 09:41 21/03/01 +1200, Christopher Sawtell wrote:

Just that it might be a good idea to incorporate the version / release
details in some way so that when somebody on the list is squeaking about
an error message it is obvious to the helper that the advice needed is to
upgrade from the Cretaceous Period version to a modern release, and have
another go.

This is better handled by the bug *reporting* system; the users can easily
get the current version number from PG and send it with their reports. We
don't really want all the error codes changing between releases.

#17

Philip Warner

pjw@rhyme.com.au

almost 25 years ago

In reply to: Philip Warner (#15)

Re: More on elog and error codes

At 09:43 21/03/01 +1100, Philip Warner wrote:

Code              SQL    Text
PGERR_TYPALREXI   02xxx  "type %s cannot be created because it already exists"
PGERR_FUNCNOTYPE  02xxx  "type %s used as argument %d of function %s doesn't exist"

Peter,

Just to clarify, because in a previous email you seemed to believe that I
wanted 'PGERR_TYPALREXI' to resolve to a string. I have no such desire; a
meaningful number is fine, but we should never have to type it. One
possibility is that it is the address of an error-info function (built by
'compiling' the message file). Another possibility is that it could be a
prefix to several external symbols, PGERR_TYPALREXI_msg,
PGERR_TYPALREXI_code, PGERR_TYPALREXI_num, PGERR_TYPALREXI_sqlcode etc,
which are again built by compiling the message file. We can then encode
whatever we like into the message, have flexible text, and ease of use for
developers.

Hope this clarifies things...

#18

Thomas Lockhart

lockhart@alumni.caltech.edu

almost 25 years ago

In reply to: Philip Warner (#2)

Re: More on elog and error codes

Creating central message files/objects has the added advantage of a much
simpler locale support - they're just resource files, and they're NOT
embedded throughout the code.
Finally, if you do want to have some kind of error classification beyond
the SQL code, it could be encoded in the error message file.

We could also (automatically) build a DBMS reference table *from* this
message file (or files), which would allow lookup of messages from codes
for applications which are not "message-aware".

Not a requirement, and it does not meet all needs (e.g. you would have
to be connected to get the messages in that case) but it would be
helpful for some use cases...

- Thomas

#19

Philip Warner

pjw@rhyme.com.au

almost 25 years ago

In reply to: Thomas Lockhart (#18)

Re: More on elog and error codes

At 03:28 21/03/01 +0000, Thomas Lockhart wrote:

Creating central message files/objects has the added advantage of a much
simpler locale support - they're just resource files, and they're NOT
embedded throughout the code.
Finally, if you do want to have some kind of error classification beyond
the SQL code, it could be encoded in the error message file.

We could also (automatically) build a DBMS reference table *from* this
message file (or files), which would allow lookup of messages from codes
for applications which are not "message-aware".

Not a requirement, and it does not meet all needs (e.g. you would have
to be connected to get the messages in that case) but it would be
helpful for some use cases...

If we extended the message definitions to have (optional) description &
user-resolution sections, then we have the possibility of asking psql to
explain the last error, and (broadly) how to fix it. Of course, in the
first pass, these would all be empty.

#20

Peter Eisentraut

peter_e@gmx.net

almost 25 years ago

In reply to: Philip Warner (#15)

Re: More on elog and error codes

Philip Warner writes:

If the motivation behind this is to allow easy translation to SQL error
codes, then I suggest we have an error definition file with explicit
translation:

Code              SQL    Text
PGERR_TYPALREXI   02xxx  "type %s cannot be created because it already exists"
PGERR_FUNCNOTYPE  02xxx  "type %s used as argument %d of function %s doesn't exist"

and if we want a generic 'type does not exist', then:

PGERR_NOSUCHTYPE  02xxx  "type %s does not exist - %s"

where the %s might contain 'it can't be used as a function argument'.

then we just have

elogc(ERROR, PGERR_TYPALREXI, ...)

/* elsewhere... */

elogc(ERROR, PGERR_FUNCNOTYPE, ...)

This is going to be a disaster for the coder. Every time you look at an
elog you don't know what it does? Is the first arg a %s or a %d? What's
the first %s, what the second? How can this be checked against bugs? (I
know GCC can be pretty helpful here, but does it catch all problems?)

Conversely, when you look at the error message you don't know from what
contexts it's called. The error messages will degrade rapidly in quality
because changing one will become a major project.

Creating central message files/objects has the added advantage of a much
simpler locale support - they're just resource files, and they're NOT
embedded throughout the code.

Actually, the fact that the messages are in the code, where they're used,
and not in a catalog file is a reason why gettext is so popular and
catgets gets laughed at.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

#21

Philip Warner

pjw@rhyme.com.au

almost 25 years ago

In reply to: Peter Eisentraut (#20)

Re: More on elog and error codes

At 22:03 21/03/01 +0100, Peter Eisentraut wrote:

Philip Warner writes:

If the motivation behind this is to allow easy translation to SQL error
codes, then I suggest we have an error definition file with explicit
translation:

Code              SQL    Text
PGERR_TYPALREXI   02xxx  "type %s cannot be created because it already exists"
PGERR_FUNCNOTYPE  02xxx  "type %s used as argument %d of function %s doesn't exist"

and if we want a generic 'type does not exist', then:

PGERR_NOSUCHTYPE  02xxx  "type %s does not exist - %s"

where the %s might contain 'it can't be used as a function argument'.

then we just have

elogc(ERROR, PGERR_TYPALREXI, ...)

/* elsewhere... */

elogc(ERROR, PGERR_FUNCNOTYPE, ...)

This is going to be a disaster for the coder. Every time you look at an
elog you don't know what it does? Is the first arg a %s or a %d? What's
the first %s, what the second?

From experience using this sort of system, probably 80% of errors in new
code are new; if you don't know the format of your own errors, then you
have a larger problem. Secondly, most errors have obvious parameters, and
it only ever gets confusing when they have more than one parameter, and
even then it's pretty obvious. This concern was often raised by people new
to the system, but generally turned out to be more FUD than fact.

How can this be checked against bugs?
Conversely, when you look at the error message you don't know from what
contexts it's called.

Am I missing something here? The user gets a message like:

TYPALREXI: Specified type 'fred' already exists.

then we do

glimpse TYPALREXI

It is actually a lot easier than the plain text search we already have to
do, when we have to guess at the words that have been substituted into the
message. Besides, in *both* proposed systems, if we have done things
properly, then the postgres log also contains the module name & line #.

The error messages will degrade rapidly in quality
because changing one will become a major project.

Changing one will be a major project only if it is used everywhere. Most
will be relatively localized. And, with glimpse 'XYZ', it's not really that
big a task. Finally, you would need to ask why it was being changed - would
a new message work better? Tell me where the degradation in quality is in
comparison with text-in-the-source versions, with umpteen dozen slightly
different versions of essentially the same error messages?

Creating central message files/objects has the added advantage of a much
simpler locale support - they're just resource files, and they're NOT
embedded throughout the code.

Actually, the fact that the messages are in the code, where they're used,
and not in a catalog file is a reason why gettext is so popular and
catgets gets laughed at.

Is there a URL for a catgets vs. gettext debate that would help me understand
the reason for the laughter? I can understand laughing at code that looks
like:

elog(ERROR, 123456, typename);

but

elog(ERROR, TYPALREXI, typename);

is a whole lot more readable.

Also, you failed to address the two points below:

#define PGERR_TYPE 1854

/* somewhere... */

elogc(ERROR, PGERR_TYPE, "type %s cannot be created because it already exists", ...)

/* elsewhere... */

elogc(ERROR, PGERR_TYPE, "type %s used as argument %d of function %s doesn't exist", ...)

How do you handle "type %s can not be used as a function return type"? Is
this PGERR_FUNC or PGERR_TYPE?

#22

Tom Lane

tgl@sss.pgh.pa.us

almost 25 years ago

In reply to: Philip Warner (#21)

Re: More on elog and error codes

I've pretty much got to agree with Peter on both of these points.

Philip Warner <pjw@rhyme.com.au> writes:

At 22:03 21/03/01 +0100, Peter Eisentraut wrote:

elogc(ERROR, PGERR_FUNCNOTYPE, ...)

This is going to be a disaster for the coder. Every time you look at an
elog you don't know what it does? Is the first arg a %s or a %d? What's
the first %s, what the second?

From experience using this sort of system, probably 80% of errors in new
code are new; if you don't know the format of your own errors, then you
have a larger problem. Secondly, most errors have obvious parameters, and
it only ever gets confusing when they have more than one parameter, and
even then it's pretty obvious.

The general set of parameters might be pretty obvious, but the exact
type that the format string expects them to be is not so obvious. We
have enough ints, longs, unsigned longs, etc etc running around the
system that care is required. If you look at the existing elog calls
you'll find quite a lot of explicit casts to make certain that the right
thing will happen. If the format strings are not directly visible to
the guy writing an elog call, then errors of that kind will creep in
more easily.
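
A minimal sketch of the compiler help mentioned earlier in the thread, assuming GCC or a compatible compiler: the format(printf, ...) function attribute lets the compiler compare a literal format string against the argument types. The function name report and the macro SK_PRINTF_LIKE are hypothetical. The check only works when the format string is a literal at the call site, which is the visibility being argued for here.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Enable format checking where the compiler supports the attribute. */
#if defined(__GNUC__)
#define SK_PRINTF_LIKE(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define SK_PRINTF_LIKE(fmt_idx, first_arg)
#endif

/* Argument 3 is the format string; checked arguments start at position 4. */
int report(char *buf, size_t buflen, const char *fmt, ...) SK_PRINTF_LIKE(3, 4);

int
report(char *buf, size_t buflen, const char *fmt, ...)
{
    va_list ap;
    int     n;

    va_start(ap, fmt);
    n = vsnprintf(buf, buflen, fmt, ap);
    va_end(ap);
    return n;
}

/*
 * With a literal format the compiler can warn about a mismatch, e.g.
 *     report(buf, len, "block %d", some_unsigned_long);
 * and the explicit cast in
 *     report(buf, len, "block %lu", (unsigned long) blkno);
 * is checked as well. If the format were fetched from a message table at
 * run time, the compiler would see only a const char * and could check nothing.
 */
```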

The error messages will degrade rapidly in quality
because changing one will become a major project.

Changing one will be a major project only if it is used everywhere.

I agree with Peter on this one too. Even having to edit a separate
file will create enough friction that people will tend to use an
existing string if it's even marginally appropriate. What I fear even
more is that people will simply not code error checks, especially for
"can't happen" cases, because it's too much of a pain in the neck to
register the appropriate message.

We must not raise the cost of adding error checks significantly, or we
will lose the marginal checks that sometimes save our bacon by revealing
bugs.

regards, tom lane

#23

Philip Warner

pjw@rhyme.com.au

almost 25 years ago

In reply to: Tom Lane (#22)

Re: More on elog and error codes

At 22:03 21/03/01 +0100, Peter Eisentraut wrote:

This is going to be a disaster for the coder. Every time you look at an
elog you don't know what it does? Is the first arg a %s or a %d? What's
the first %s, what the second?

FWIW, I did a quick scan for elog in PG and found:

- 6856 calls (may include commented-out calls)
- 2528 unique messages
- 1248 have no parameters
- 859 have exactly one argument
- 285 have exactly 2 args
- 136 have 3 or more args

so 83% have one or no arguments, which is probably not going to be very
confusing.

Looking at the actual messages, there is also a great deal of opportunity
to standardize and simplify since many of the messages only differ by their
prefixed function name.

#24

Philip Warner

pjw@rhyme.com.au

almost 25 years ago

In reply to: Tom Lane (#22)

Re: More on elog and error codes

At 23:24 21/03/01 -0500, Tom Lane wrote:

I've pretty much got to agree with Peter on both of these points.

Damn.

Philip Warner <pjw@rhyme.com.au> writes:

At 22:03 21/03/01 +0100, Peter Eisentraut wrote:

elogc(ERROR, PGERR_FUNCNOTYPE, ...)

This is going to be a disaster for the coder. Every time you look at an
elog you don't know what it does? Is the first arg a %s or a %d? What's
the first %s, what the second?

From experience using this sort of system, probably 80% of errors in new
code are new; if you don't know the format of your own errors, then you
have a larger problem. Secondly, most errors have obvious parameters, and
it only ever gets confusing when they have more than one parameter, and
even then it's pretty obvious.

The general set of parameters might be pretty obvious, but the exact
type that the format string expects them to be is not so obvious. We
have enough ints, longs, unsigned longs, etc etc running around the
system that care is required. If you look at the existing elog calls
you'll find quite a lot of explicit casts to make certain that the right
thing will happen. If the format strings are not directly visible to
the guy writing an elog call, then errors of that kind will creep in
more easily.

I agree it's more likely, but most (all?) cases can be caught by the
compiler. It's not ideal, but neither is having eight different versions of
the same message.
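A sketch of how the compiler could catch these cases, assuming GCC or Clang and assuming the message name is a macro that expands to a string literal. PRINTF_LIKE, MSG_FUNC_NO_TYPE and elogc_checked are hypothetical names, not part of the proposal.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* GCC/Clang can type-check printf-style arguments when told to. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, arg_idx) \
    __attribute__((format(printf, fmt_idx, arg_idx)))
#else
#define PRINTF_LIKE(fmt_idx, arg_idx)
#endif

/*
 * Hypothetical message macro.  Because it expands to a literal, the
 * compiler still sees the format string at every call site.
 */
#define MSG_FUNC_NO_TYPE "function \"%s\" has no return type (oid %u)"

static int elogc_checked(char *buf, size_t buflen, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

static int
elogc_checked(char *buf, size_t buflen, const char *fmt, ...)
{
    va_list ap;
    int     n;

    va_start(ap, fmt);
    n = vsnprintf(buf, buflen, fmt, ap);
    va_end(ap);
    return n;
}
```

A call such as elogc_checked(buf, sizeof buf, MSG_FUNC_NO_TYPE, "foo", 42u) compiles cleanly, while passing an integer where %s is expected should draw a -Wformat warning. This only works if the message name resolves to a literal at compile time; a numeric code looked up at run time would not be checked.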

The error messages will degrade rapidly in quality
because changing one will become a major project.

Changing one will be a major project only if it is used everywhere.

I agree with Peter on this one too. Even having to edit a separate
file will create enough friction that people will tend to use an
existing string if it's even marginally appropriate. What I fear even
more is that people will simply not code error checks, especially for
"can't happen" cases, because it's too much of a pain in the neck to
register the appropriate message.

We must not raise the cost of adding error checks significantly, or we
will lose the marginal checks that sometimes save our bacon by revealing
bugs.

This is a problem, I agree - but a procedural one. We need to make
registering messages easy. To do this, rather than having a central message
file, perhaps do the following:

- allow multiple message files (which can be processed to produce .h
files). eg. pg_dump would have its own pg_dump_messages.xxx file.

- define a message that will assume its first arg is really a format
string for use in the "can't happen" classes, and which has the SQLCODE for
'internal error'.

We do need some central control, but by creating module-based message files
we can allocate number ranges easily, and we at least take a step down the
path towards both easy locale handling and a 'big book of error codes'.
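As a rough sketch of what the header generated from such a module message file might contain: every name and number below (MSG_RANGE_PG_DUMP, MSG_INTERNAL, elogc_by_id, the message texts) is made up for illustration, and the real file format is still undecided.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical number ranges, one per module message file. */
enum
{
    MSG_RANGE_BACKEND = 1000,
    MSG_RANGE_PG_DUMP = 2000
};

enum
{
    MSG_INTERNAL = 0,           /* first argument is itself a format string */
    MSG_DUMP_NO_TABLE = MSG_RANGE_PG_DUMP + 1,
    MSG_DUMP_BAD_VERSION = MSG_RANGE_PG_DUMP + 2
};

/* Would be generated from pg_dump_messages.xxx in this sketch. */
static const char *
msg_format(int id)
{
    switch (id)
    {
        case MSG_DUMP_NO_TABLE:
            return "table \"%s\" not found";
        case MSG_DUMP_BAD_VERSION:
            return "unsupported archive version %d";
        default:
            return NULL;
    }
}

static int
elogc_by_id(char *buf, size_t buflen, int id, ...)
{
    va_list     ap;
    const char *fmt;
    int         n;

    va_start(ap, id);
    if (id == MSG_INTERNAL)
        fmt = va_arg(ap, const char *);     /* "can't happen" class */
    else
        fmt = msg_format(id);
    if (fmt == NULL)
        fmt = "unrecognized message id";
    n = vsnprintf(buf, buflen, fmt, ap);
    va_end(ap);
    return n;
}
```

A registered message is then called as elogc_by_id(buf, sizeof buf, MSG_DUMP_NO_TABLE, "foo"), while a "can't happen" check needs no registration: elogc_by_id(buf, sizeof buf, MSG_INTERNAL, "cache lookup of %s failed", "pg_type").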

#25

Tom Lane

tgl@sss.pgh.pa.us

almost 25 years ago

In reply to: Philip Warner (#24)

Re: More on elog and error codes

Philip Warner <pjw@rhyme.com.au> writes:

This is a problem, I agree - but a procedural one. We need to make
registering messages easy. To do this, rather than having a central message
file, perhaps do the following:

- allow multiple message files (which can be processed to produce .h
files). eg. pg_dump would have its own pg_dump_messages.xxx file.

I guess I fail to see why that's better than processing the .c files
to extract the message strings from them.

I agree that the sort of system Peter proposes doesn't have any direct
forcing function to discourage gratuitous variations of what's basically
the same message. The forcing function would have to come from the
translators, who will look at the extracted list of messages and
complain that there are near-duplicates. Then we fix the
near-duplicates. Seems like no big deal.

However, a system that uses multiple message files is also not going to
discourage near-duplicates very effectively. I don't think you can have
it both ways: if you are discouraging near-duplicates, then you are
making it harder for people to create new messages, whether
duplicates or not.

regards, tom lane

#26

Philip Warner

pjw@rhyme.com.au

almost 25 years ago

In reply to: Tom Lane (#25)

Re: More on elog and error codes

At 00:35 22/03/01 -0500, Tom Lane wrote:

Philip Warner <pjw@rhyme.com.au> writes:

This is a problem, I agree - but a procedural one. We need to make
registering messages easy. To do this, rather than having a central message
file, perhaps do the following:

- allow multiple message files (which can be processed to produce .h
files). eg. pg_dump would have its own pg_dump_messages.xxx file.

However, a system that uses multiple message files is also not going to
discourage near-duplicates very effectively. I don't think you can have
it both ways: if you are discouraging near-duplicates, then you are
making it harder for people to create new messages, whether
duplicates or not.

Many of the near duplicates are in the same, or related, code so with local
message files there should be a good chance of reduced duplicates.

Other advantages of a separate definition include:

- Extra fields (eg. description, resolution) which could be used by client
programs.
- Message IDs which can be checked by clients to detect specific errors,
independent of locale.
- SQLCODE set in one place, rather than developers having to code it in
multiple places.
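The advantages listed above could look something like this in a generated table. This is only a sketch: MessageDef, message_table, find_message, the IDs and the SQLCODE values are all invented placeholders.

```c
#include <stddef.h>

/* Hypothetical shape of one entry in a separate message definition. */
typedef struct MessageDef
{
    int         id;             /* stable ID, independent of locale */
    int         sqlcode;        /* set once here, not at each call site */
    const char *format;         /* translatable text */
    const char *description;    /* extra field for client programs */
    const char *resolution;     /* extra field for client programs */
} MessageDef;

/* Placeholder entries; IDs and SQLCODE values are made up. */
static const MessageDef message_table[] = {
    {2001, -1001, "table \"%s\" not found",
     "The named table does not exist in the database.",
     "Check the table name and the current schema."},
    {2002, -1002, "unsupported archive version %d",
     "The archive was written by an incompatible version.",
     "Re-create the archive with a matching pg_dump."}
};

/* Returns the definition for id, or NULL if the ID is not registered. */
static const MessageDef *
find_message(int id)
{
    size_t      i;

    for (i = 0; i < sizeof(message_table) / sizeof(message_table[0]); i++)
    {
        if (message_table[i].id == id)
            return &message_table[i];
    }
    return NULL;
}
```

A client can then compare the ID it receives against 2001 to detect a missing table regardless of which translation of the text was sent, and show the description and resolution fields if it wants to.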

The original proposal also included a 'class' field:

elogc(ERROR, PGERR_TYPE, "type %s cannot be created because it already

ISTM that we will have a similar allocation problem with these. But, more
recent examples have excluded them, so I am not sure about their status in
Peter's plans.