From 7798121992154edab4768d7eab5a89be04730b2f Mon Sep 17 00:00:00 2001
From: "David G. Johnston" <David.G.Johnston@Gmail.com>
Date: Wed, 1 May 2024 07:45:48 -0700
Subject: [PATCH] Document NULL

---
 doc/src/sgml/data.sgml     | 169 +++++++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml |   1 +
 doc/src/sgml/postgres.sgml |   1 +
 3 files changed, 171 insertions(+)
 create mode 100644 doc/src/sgml/data.sgml

diff --git a/doc/src/sgml/data.sgml b/doc/src/sgml/data.sgml
new file mode 100644
index 0000000000..2b09382494
--- /dev/null
+++ b/doc/src/sgml/data.sgml
@@ -0,0 +1,169 @@
+<chapter id="data">
+ <title>Data</title>
+
+ <para>
+  This chapter provides definitions for, and an overview of, data.
+  It discusses the basic design of the metadata related to values
+  and then goes on to describe the special value NULL which typically
+  represents &quot;unknown&quot;
+ </para>
+
+ <sect1 id="data-basics">
+  <title>Data Basics</title>
+  <para>
+   All literals, columns, variables, and expression results in PostgreSQL
+   are typed, which are listed in the next chapter.  Literals and columns
+   must only use one of the concrete types while variables can use
+   either a concrete type or a pseudo-type.  Expression results
+   are limited to concrete types and the pseudo-type record described below.
+  </para>
+  <para>
+   The pseudo-types prefixed with &quot;any&quot; implement polymorphism
+   in PostgreSQL.  Polymorphism allows a single function specification
+   to act on multiple concrete types.  At runtime, the function body
+   associates concrete types to all polymorphic types based upon the
+   conrete argument of its inputs. See ... for more details.
+  </para>
+  <para>
+   The record pseudo-type is also polymorphic in nature but allows
+   the caller of the function to specify the row-like structure of
+   output within the containing query. See ... for more details.
+   The ROW(...) expression (see ...) will also produce a record
+   result comprised of the named columns.
+  </para>
+ </sect1>
+
+  <sect1 id="data-null">
+   <title>Unknown Values (NULL)</title>
+   <para>
+    This section first introduces the meaning of NULL and then goes
+    on to explain how different parts of the system behave when faced
+    with NULL input.
+   </para>
+
+  <sect2 id="data-null-model">
+   <title>NULL in Data Models</title>
+   <para>
+    Generally NULL is assumed to mean "unknown".  However,
+    in practice meaning comes from context and so a model design may state that
+    NULL is to be used to represent "not applicable" - i.e., that a value is not
+    even possible.  SQL has only the single value NULL while there are multiple
+    concepts that people have chosen to apply it to.  In any case the behavior
+    of the system when dealing with NULL is the same regardless of the meaning
+    the given to it in the surrounding context.
+   </para>
+   <para>
+    NULL also takes on a literal meaning of "not found" when produced as the
+    result of an outer join.
+   </para>
+  </sect2>
+
+  <sect2 id="data-null-usage">
+   <title>NULL Usage</title>
+   <para>
+    As NULL is treated as a data value it, like all values, must have
+    a data type.  NULL is a valid value for all data types.
+   </para>
+
+   <para>
+    A NULL literal is written as unquoted NULL.  Its type is unknown but
+    can be cast to any concrete data type.  The [type 'string'] syntax
+    however will not work as there is no way to express NULL using single
+    quotes and unlike.
+   </para>
+
+   <para>
+    The presence of NULL in the system results in three-valued logic.
+    In binary logic every outcome is either true or false.  In
+    three-valued logic unknown, represented using a NULL value, is
+    also an outcome.  Aspects of the system that branch based upon
+    whether a condition variable is true or false thus must also
+    decide how to behave when then condition is NULL.  The remaining
+    sub-sections summarize these decisions.
+   </para>
+  </sect2>
+
+  <sect2 id="data-null-cardinalrule">
+    <title>The Cardinal Rule of NULL</title>
+   <para>
+    The cardinal rule, NULL is never equal or unequal to any non-null
+    value (or itself).  [NULL = anything yields NULL].
+    Checking for NULL has an explicit test documented
+    [here] and additionally there are distinctness tests that treat NULL like
+    a value equal to itself and unequal to any other value.
+   </para>
+   <para>
+    There is also a cardinal warning: when dealing with composite types
+    [see ...] the expressions; composite IS NULL and composite IS NOT NULL
+    are not the opposites of each other in the case where some of the
+    composite's fields are NULL.  Write NOT(composite IS NULL) instead.
+   </para>
+  </sect2>
+
+  <sect2 id="data-null-opertions">
+   <title>NULLs in Operations</title>
+   <para>
+    As a general expectation, operator invocation expressions where one of inputs
+    will result in a NULL output.
+    [add: 1 + NULL yields NULL]
+    [concatenate: 'text' || NULL yields NULL]
+    Operators that behave otherwise should document their deviation from this norm.
+   </para>
+  </sect2>
+
+  <sect2 id="data-null-functions">
+   <title>NULLs in Normal Function Calls</title>
+   <para>
+    Function specifications has a "strictness" attribute that, when set to "strict"
+    (a.k.a. "null on null input") will tell the executor to return NULL for any
+    function call having at least one NULL input value, without executing the
+    function.  Most SQL standard functions behave strictly, and in many cases
+    non-standard functions may exist that do not.
+   </para>
+  </sect2>
+
+  <sect2 id="data-null-aggregates">
+   <title>NULLs in Aggregate and Window Processing</title>
+   <para>  (needs research)
+    When executing an aggregate or window function the state tracking
+    component will remain unchanged even if the underlying processing
+    function returns NULL, whether from being defined strict (see above)
+    or simply returns a NULL value upon execution.
+   </para>
+  </sect2>
+
+  <sect2 id="data-null-filters">
+   <title>NULLs in Filters</title>
+   <para>
+    A WHERE clause that evaluates to NULL for a given row will exclude that row.
+   </para>
+  </sect2>
+
+  <sect2 id="data-null-constraints">
+   <title>NULLs in Constraints</title>
+   <para>
+    It is possible to define validation expressions on
+    tables that ensure only values passing those expressions are inserted.  While
+    this seems like it would behave the same as a filter, the choice here,
+    when an expression evaulates to NULL, is allow the row to be inserted.  See
+    [check constraints] in create table for details.
+   </para>
+   <para>
+    The NOT NULL column constraint is largely syntax sugar for the corresponding
+    column IS NOT NULL check constraint, though there are metadata differences
+    described in create table.
+   </para>
+  </sect2>
+
+  <sect2 id="data-null-grouping">
+   <title>NULLs When Grouping</title>
+   <para>
+    In the context of both DISTINCT and GROUP BY it is necessary that all inputs
+    resolve to being either equal to or not equal to all other values.  These features
+    use distinctness (see ...) instead of simple equality in order to handle
+    NULL like a definite value equal to itself and unequal to all other values.
+   </para>
+  </sect2>
+ </sect1>
+
+</chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 38ec362d8f..2b4fc79b8a 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -14,6 +14,7 @@
 
 <!-- user's guide -->
 <!ENTITY array      SYSTEM "array.sgml">
+<!ENTITY data       SYSTEM "data.sgml">
 <!ENTITY datatype   SYSTEM "datatype.sgml">
 <!ENTITY ddl        SYSTEM "ddl.sgml">
 <!ENTITY dml        SYSTEM "dml.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index ec9f90e283..bd0e4007ee 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -101,6 +101,7 @@ break is not needed in a wider output rendering.
   </partintro>
 
   &syntax;
+  &data;
   &ddl;
   &dml;
   &queries;
-- 
2.34.1

