ASOF join

Started by Konstantin Knizhnikover 8 years ago8 messages

k.knizhnik@postgrespro.ru

over 8 years ago

I wonder if there were some discussion/attempts to add ASOF join to
Postgres (sorry, may be there is better term for it, I am refereeing
KDB definition: http://code.kx.com/wiki/Reference/aj ).
Such kind of join can be useful when we need to associate two
timeseries. It is quite popular in trading:

//join the eq price and size for the op trade time
a::aj[`underlyingSym`time;select time, underlyingSym, sym, putorcall,
ex, cond, price, seqnum, contracts, contractsTraded from t;eqtrades];

...and not only. Below is one example of how people now manually coding
ASOF join:

select
count(*),
count(*)
filter (where timedelta_prev < -30),
count(*)
filter (where ride_prev = ride_next),
... -- many different aggregates
from
(
select
p.provider_id,
p.ts,
(
select extract(epoch from t.ts - p.ts)
from providers_positions t
where p.provider_id = t.provider_id and t.ts < p.ts and
t.source = 'gps'
order by t.ts desc
limit 1
) as timedelta_prev,
(
select extract(epoch from t.ts - p.ts)
from providers_positions t
where p.provider_id = t.provider_id and t.ts > p.ts and
t.source = 'gps'
order by t.ts
limit 1
) as timedelta,
(
select ride_id
from providers_positions t
where p.provider_id = t.provider_id and t.ts < p.ts and
t.source = 'gps'
order by t.ts desc
limit 1
) as ride_prev,
(
select ride_id
from providers_positions t
where p.provider_id = t.provider_id and t.ts > p.ts and
t.source = 'gps'
order by t.ts
limit 1
) as ride_next
from (
select
provider_id,
ts,
event_name
from
lvl2_681_parsed p
) p
where
p.event_name = 'GPS signal restored'
-- offset 0
) z;

Without OFFSET 0 this query generates awful execution plans with
hundreds (!) of subplans corresponding to the subqueries.
Number of subplans (most of them are the same) is equal number of
occurrences of timedelta, timedelta_prev, ... columns in target aggregates.
OFFSET 0 reduce number of subplans to 4. And I expect that using LATERAL
join can reduce it to two and even without "OFFSET 0" trick.
But in any case - it is very complex and unnatural way of expressing
this not so complex query.
With ASOF join is can be written much simpler.

Also Postgres is implementing this query using nested loop with index
scan, which is definitely not the most efficient strategy.
The best way to implement ASOF join is to use something like mergejoin.
Usually there are indexes for both timeseries, so what we need is to
merge two ordered sets using ASOF join rules.
It will require minimal changes in SQL syntax, just adding ASOF keyword
seems to be enough:

select * from Trades ASOF JOIN EqTrades USING (underlyingSym,time);

It seems to me that adding ASOF joins should not require huge amount of
work and can be done with minimal number of changes in executor and
optimizer.
But may be there are some problems/challenges which I do not realize now?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Thomas Munro

thomas.munro@enterprisedb.com

over 8 years ago

In reply to: Konstantin Knizhnik (#1)

Re: ASOF join

On Fri, Jun 16, 2017 at 4:20 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

I wonder if there were some discussion/attempts to add ASOF join to Postgres
(sorry, may be there is better term for it, I am refereeing KDB definition:
http://code.kx.com/wiki/Reference/aj ).

Interesting idea. Also in Pandas:

http://pandas.pydata.org/pandas-docs/version/0.19.0/generated/pandas.merge_asof.html#pandas.merge_asof

I suppose you could write a function that pulls tuples out of a bunch
of cursors and zips them together like this, as a kind of hand-coded
special merge join "except that we match on nearest key rather than
equal keys" (as they put it).

I've written code like this before in a trading context, where we
called that 'previous tick interpolation', and in a scientific context
where other kinds of interpolation were called for (so not really
matching a tuple but synthesising one if no exact match). If you view
the former case as a kind of degenerate case of interpolation then it
doesn't feel like a "join" as we know it, but clearly it is. I had
never considered before that such things might belong inside the
database as a kind of join operator.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

David Fetter

david@fetter.org

over 8 years ago

In reply to: Thomas Munro (#2)

Re: ASOF join

On Fri, Jun 16, 2017 at 11:51:34AM +1200, Thomas Munro wrote:

On Fri, Jun 16, 2017 at 4:20 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

I wonder if there were some discussion/attempts to add ASOF join to Postgres
(sorry, may be there is better term for it, I am refereeing KDB definition:
http://code.kx.com/wiki/Reference/aj ).

Interesting idea. Also in Pandas:

http://pandas.pydata.org/pandas-docs/version/0.19.0/generated/pandas.merge_asof.html#pandas.merge_asof

I suppose you could write a function that pulls tuples out of a bunch
of cursors and zips them together like this, as a kind of hand-coded
special merge join "except that we match on nearest key rather than
equal keys" (as they put it).

I've written code like this before in a trading context, where we
called that 'previous tick interpolation', and in a scientific context
where other kinds of interpolation were called for (so not really
matching a tuple but synthesising one if no exact match). If you view
the former case as a kind of degenerate case of interpolation then it
doesn't feel like a "join" as we know it, but clearly it is. I had
never considered before that such things might belong inside the
database as a kind of join operator.

If you turn your head sideways, it's very similar to the range merge
join Jeff Davis proposed. https://commitfest.postgresql.org/14/1106/

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

over 8 years ago

In reply to: David Fetter (#3)

1 attachment(s)

Re: ASOF join

On 16.06.2017 19:07, David Fetter wrote:

On Fri, Jun 16, 2017 at 11:51:34AM +1200, Thomas Munro wrote:

On Fri, Jun 16, 2017 at 4:20 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

I wonder if there were some discussion/attempts to add ASOF join to Postgres
(sorry, may be there is better term for it, I am refereeing KDB definition:
http://code.kx.com/wiki/Reference/aj ).

Interesting idea. Also in Pandas:

http://pandas.pydata.org/pandas-docs/version/0.19.0/generated/pandas.merge_asof.html#pandas.merge_asof

I attached simple patch adding ASOF join to Postgres. Right now it
support only outer join and requires USING clause (consequently it is
not possible to join two tables which joi keys has different names. May
be it is also possible to support ON clause with condition written like
o.k1 = i.k2 AND o.k2 = i.k2 AND ... AND o.kN >= i.kN
But such notation can be confusing, because join result includes only
one matching inner record with kN smaller or equal than kN of outer
record and not all such records.
As alternative we can add specia

If people fin such construction really useful, I will continue work on it.

I suppose you could write a function that pulls tuples out of a bunch
of cursors and zips them together like this, as a kind of hand-coded
special merge join "except that we match on nearest key rather than
equal keys" (as they put it).

I've written code like this before in a trading context, where we
called that 'previous tick interpolation', and in a scientific context
where other kinds of interpolation were called for (so not really
matching a tuple but synthesising one if no exact match). If you view
the former case as a kind of degenerate case of interpolation then it
doesn't feel like a "join" as we know it, but clearly it is. I had
never considered before that such things might belong inside the
database as a kind of join operator.

If you turn your head sideways, it's very similar to the range merge
join Jeff Davis proposed. https://commitfest.postgresql.org/14/1106/

May be, but I do not understand how to limit result to contain exactly
one (last) inner tuple for each outer tuple.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

asof.patchtext/x-patch; name=asof.patchDownload

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 482a3dd..f7a8f38 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1324,6 +1324,9 @@ get_jointype_name(JoinType jointype)
 		case JOIN_FULL:
 			return "FULL";
 
+		case JOIN_ASOF:
+			return "ASOF";
+
 		default:
 			/* Shouldn't come here, but protect from buggy code. */
 			elog(ERROR, "unsupported join type %d", jointype);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 080cb0a..54cf6c1 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4073,7 +4073,7 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
 	 * Constructing queries representing SEMI and ANTI joins is hard, hence
 	 * not considered right now.
 	 */
-	if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+	if (jointype != JOIN_INNER && jointype != JOIN_LEFT && jointype != JOIN_ASOF && 
 		jointype != JOIN_RIGHT && jointype != JOIN_FULL)
 		return false;
 
@@ -4211,6 +4211,7 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
 			break;
 
 		case JOIN_LEFT:
+		case JOIN_ASOF:
 			fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
 										  list_copy(fpinfo_i->remote_conds));
 			fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 211e4c3..fd3be8c 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -514,6 +514,9 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
          <listitem>
           <para><literal>CROSS JOIN</literal></para>
          </listitem>
+         <listitem>
+          <para><literal>ASOF [ OUTER ] JOIN</literal></para>
+         </listitem>
         </itemizedlist>
 
         For the <literal>INNER</> and <literal>OUTER</> join types, a
@@ -523,7 +526,9 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
         <literal>USING (<replaceable
         class="parameter">join_column</replaceable> [, ...])</literal>.
         See below for the meaning.  For <literal>CROSS JOIN</literal>,
-        none of these clauses can appear.
+        none of these clauses can appear. For <literal>ASOF</literal> join type, a
+        join condition must be <literal>USING (<replaceable
+        class="parameter">join_column</replaceable> [, ...])</literal>.
        </para>
 
        <para>
@@ -571,6 +576,32 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
         on the right), plus one row for each unmatched right-hand row
         (extended with nulls on the left).
        </para>
+
+       <para><literal>ASOF OUTER JOIN</> is similar to <literal>LEFT OUTER JOIN</> but it accepts only
+        <literal>USING (<replaceable class="parameter">join_column_1</replaceable> [, ...], <replaceable class="parameter">join_column_N</replaceable>)</literal> clause
+        where last joined column <replaceable class="parameter">join_column_N</replaceable> is expected to be timestamp 
+        (but actually can have any comparable type) and outer tuple is matched with only one inner tuple with the same values of 1..N-1 
+        join columns and with largest value of <replaceable class="parameter">join_column_N</replaceable> which is smaller or equal than 
+        timesamp of the outer tuple. If there is no such inner tuple, then outer tuple is extended to the full width
+        of the joined table by inserting null values for the inner columns. Example below illustrates work of ASOF join:
+<programlisting>
+create table t(sym varchar,dt time,qty bigint);
+create index ti on t(sym,dt);
+create table q(sym varchar,dt time,px real);
+create index qi on q(sym,dt);
+insert into t values ('msft','10:01:01',100),('msft','10:01:01',101),('ibm','10:01:03',200),('ge','10:01:04',150);
+insert into q values ('ibm','10:01:00',100),('msft','10:01:00',99),('msft','10:01:01',101),('ibm','10:01:02',98); 
+select * from t asof join q using (sym,dt);
+ sym  |    dt    | qty | px  
+------+----------+-----+-----
+ ge   | 10:01:04 | 150 |    
+ ibm  | 10:01:03 | 200 |  98
+ msft | 10:01:01 | 101 | 101
+ msft | 10:01:01 | 100 | 101
+(4 rows)
+</programlisting>
+        </para>
+       
       </listitem>
      </varlistentry>
 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 9359d0a..a285bae 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1208,6 +1208,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 					case JOIN_ANTI:
 						jointype = "Anti";
 						break;
+					case JOIN_ASOF:
+						jointype = "Asof";
+						break;
 					default:
 						jointype = "???";
 						break;
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index f9ab0d6..4e6940a 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -449,6 +449,7 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
 		case JOIN_SEMI:
 			break;
 		case JOIN_LEFT:
+		case JOIN_ASOF:
 		case JOIN_ANTI:
 			hjstate->hj_NullInnerTupleSlot =
 				ExecInitNullTupleSlot(estate,
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 572e9dc..7c34856 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -388,7 +388,7 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot)
  * for the current outer and inner tuples, respectively.
  */
 static int
-MJCompare(MergeJoinState *mergestate)
+MJCompare(MergeJoinState *mergestate, int* diffpos)
 {
 	int			result = 0;
 	bool		nulleqnull = false;
@@ -422,9 +422,13 @@ MJCompare(MergeJoinState *mergestate)
 									 &clause->ssup);
 
 		if (result != 0)
+		{
 			break;
+		}
 	}
 
+	*diffpos = i;
+
 	/*
 	 * If we had any NULL-vs-NULL inputs, we do not want to report that the
 	 * tuples are equal.  Instead, if result is still 0, change it to +1. This
@@ -506,6 +510,21 @@ MJFillInner(MergeJoinState *node)
 	return NULL;
 }
 
+/*
+ * In case of ASOF join we should find last matching tuple: inner tuple which timestamp is smaller or equal than timestapm of outer tuple.
+ * We Remember this tuple in mj_InnerTupleSlot and when find non-matching tuple, perform join with this marked tuple.
+ * mj_InnerTupleSlot will bbe restored by EXEC_MJ_JOINTUPLES using value saved in mj_NextInnerTupleSlot.
+ */
+static void
+MJAsofJoinTuple(MergeJoinState *node)
+{
+	node->mj_NextInnerTupleSlot = node->mj_InnerTupleSlot;
+	node->mj_InnerTupleSlot = node->mj_MarkedTupleSlot;
+	(void) MJEvalInnerValues(node, node->mj_InnerTupleSlot);
+	node->mj_MatchedAsof = false;
+	node->mj_JoinState = EXEC_MJ_JOINTUPLES;
+}
+
 
 /*
  * Check that a qual condition is constant true or constant false.
@@ -609,7 +628,7 @@ ExecMergeJoin(MergeJoinState *node)
 	ExprContext *econtext;
 	bool		doFillOuter;
 	bool		doFillInner;
-
+	int         diffpos;
 	/*
 	 * get information from node
 	 */
@@ -784,7 +803,11 @@ ExecMergeJoin(MergeJoinState *node)
 				econtext->ecxt_outertuple = outerTupleSlot;
 				innerTupleSlot = node->mj_InnerTupleSlot;
 				econtext->ecxt_innertuple = innerTupleSlot;
-
+				if (node->mj_NextInnerTupleSlot != NULL)
+				{
+					node->mj_InnerTupleSlot = node->mj_NextInnerTupleSlot;
+					node->mj_NextInnerTupleSlot = NULL;
+				}
 				qualResult = (joinqual == NULL ||
 							  ExecQual(joinqual, econtext));
 				MJ_DEBUG_QUAL(joinqual, qualResult);
@@ -884,15 +907,33 @@ ExecMergeJoin(MergeJoinState *node)
 						 * If they do not match then advance to next outer
 						 * tuple.
 						 */
-						compareResult = MJCompare(node);
+					    compareResult = MJCompare(node, &diffpos);
 						MJ_DEBUG_COMPARE(compareResult);
 
 						if (compareResult == 0)
-							node->mj_JoinState = EXEC_MJ_JOINTUPLES;
+						{
+							if (node->js.jointype == JOIN_ASOF)
+							{
+								node->mj_JoinState = EXEC_MJ_NEXTINNER;
+								node->mj_MatchedAsof = true;
+								MarkInnerTuple(node->mj_InnerTupleSlot, node);
+							}
+							else
+							{
+								node->mj_JoinState = EXEC_MJ_JOINTUPLES;
+							}
+						}
 						else
 						{
 							Assert(compareResult < 0);
-							node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+							if (node->mj_MatchedAsof)
+							{
+								MJAsofJoinTuple(node);
+							}
+							else
+							{
+								node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+							}
 						}
 						break;
 					case MJEVAL_NONMATCHABLE:
@@ -914,9 +955,16 @@ ExecMergeJoin(MergeJoinState *node)
 						 * because we are not transiting to a state where the
 						 * inner plan is assumed to be exhausted.)
 						 */
-						node->mj_InnerTupleSlot = NULL;
-						node->mj_JoinState = EXEC_MJ_NEXTOUTER;
-						break;
+					     if (node->mj_MatchedAsof)
+						 {
+							 MJAsofJoinTuple(node);
+						 }
+						 else
+						 {
+							 node->mj_InnerTupleSlot = NULL;
+							 node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+						 }
+						 break;
 				}
 				break;
 
@@ -971,7 +1019,7 @@ ExecMergeJoin(MergeJoinState *node)
 				{
 					case MJEVAL_MATCHABLE:
 						/* Go test the new tuple against the marked tuple */
-						node->mj_JoinState = EXEC_MJ_TESTOUTER;
+					    node->mj_JoinState = EXEC_MJ_TESTOUTER;
 						break;
 					case MJEVAL_NONMATCHABLE:
 						/* Can't match, so fetch next outer tuple */
@@ -1041,7 +1089,7 @@ ExecMergeJoin(MergeJoinState *node)
 				innerTupleSlot = node->mj_MarkedTupleSlot;
 				(void) MJEvalInnerValues(node, innerTupleSlot);
 
-				compareResult = MJCompare(node);
+				compareResult = MJCompare(node, &diffpos);
 				MJ_DEBUG_COMPARE(compareResult);
 
 				if (compareResult == 0)
@@ -1129,7 +1177,14 @@ ExecMergeJoin(MergeJoinState *node)
 								 * Need to emit left-join tuples for remaining
 								 * outer tuples.
 								 */
-								node->mj_JoinState = EXEC_MJ_ENDINNER;
+								if (node->mj_MatchedAsof)
+								{
+									MJAsofJoinTuple(node);
+								}
+								else
+								{
+									node->mj_JoinState = EXEC_MJ_ENDINNER;
+								}
 								break;
 							}
 							/* Otherwise we're done. */
@@ -1175,23 +1230,63 @@ ExecMergeJoin(MergeJoinState *node)
 				 * satisfy the mergeclauses.  If they do, then we update the
 				 * marked tuple position and go join them.
 				 */
-				compareResult = MJCompare(node);
+				compareResult = MJCompare(node, &diffpos);
 				MJ_DEBUG_COMPARE(compareResult);
 
 				if (compareResult == 0)
 				{
-					if (!node->mj_SkipMarkRestore)
-						ExecMarkPos(innerPlan);
+					if (node->js.jointype == JOIN_ASOF)
+					{
+						MarkInnerTuple(node->mj_InnerTupleSlot, node);
+						node->mj_MatchedAsof = true;
+						node->mj_JoinState = EXEC_MJ_NEXTINNER;
+					}
+					else
+					{
+						if (!node->mj_SkipMarkRestore)
+							ExecMarkPos(innerPlan);
 
-					MarkInnerTuple(node->mj_InnerTupleSlot, node);
+						MarkInnerTuple(node->mj_InnerTupleSlot, node);
 
-					node->mj_JoinState = EXEC_MJ_JOINTUPLES;
+						node->mj_JoinState = EXEC_MJ_JOINTUPLES;
+					}
 				}
 				else if (compareResult < 0)
-					node->mj_JoinState = EXEC_MJ_SKIPOUTER_ADVANCE;
+				{
+					if (node->mj_MatchedAsof)
+					{
+						MJAsofJoinTuple(node);
+					}
+					else
+					{
+						node->mj_JoinState = EXEC_MJ_SKIPOUTER_ADVANCE;
+					}
+				}
 				else
+				{
 					/* compareResult > 0 */
-					node->mj_JoinState = EXEC_MJ_SKIPINNER_ADVANCE;
+					if (node->js.jointype == JOIN_ASOF)
+					{
+						if (diffpos == node->mj_NumClauses-1)
+						{
+							MarkInnerTuple(node->mj_InnerTupleSlot, node);
+							node->mj_MatchedAsof = true;
+							node->mj_JoinState = EXEC_MJ_SKIPINNER_ADVANCE;
+						}
+						else if (node->mj_MatchedAsof)
+						{
+							MJAsofJoinTuple(node);
+						}
+						else
+						{
+							node->mj_JoinState = EXEC_MJ_SKIPINNER_ADVANCE;
+						}
+					}
+					else
+					{
+						node->mj_JoinState = EXEC_MJ_SKIPINNER_ADVANCE;
+					}
+				}
 				break;
 
 				/*
@@ -1318,7 +1413,14 @@ ExecMergeJoin(MergeJoinState *node)
 							 * Need to emit left-join tuples for remaining
 							 * outer tuples.
 							 */
-							node->mj_JoinState = EXEC_MJ_ENDINNER;
+							if (node->mj_MatchedAsof)
+							{
+								MJAsofJoinTuple(node);
+							}
+							else
+							{
+								node->mj_JoinState = EXEC_MJ_ENDINNER;
+							}
 							break;
 						}
 						/* Otherwise we're done. */
@@ -1518,7 +1620,8 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
 	 * detect whether we need only consider the first matching inner tuple
 	 */
 	mergestate->js.single_match = (node->join.inner_unique ||
-								   node->join.jointype == JOIN_SEMI);
+								   node->join.jointype == JOIN_SEMI ||
+								   node->join.jointype == JOIN_ASOF);
 
 	/* set up null tuples for outer joins, if needed */
 	switch (node->join.jointype)
@@ -1530,6 +1633,7 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
 			break;
 		case JOIN_LEFT:
 		case JOIN_ANTI:
+		case JOIN_ASOF:
 			mergestate->mj_FillOuter = true;
 			mergestate->mj_FillInner = false;
 			mergestate->mj_NullInnerTupleSlot =
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index f062c6b..5aaed57 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2449,6 +2449,7 @@ initial_cost_mergejoin(PlannerInfo *root, JoinCostWorkspace *workspace,
 			innerendsel = cache->leftendsel;
 		}
 		if (jointype == JOIN_LEFT ||
+			jointype == JOIN_ASOF ||
 			jointype == JOIN_ANTI)
 		{
 			outerstartsel = 0.0;
@@ -4254,6 +4255,7 @@ calc_joinrel_size_estimate(PlannerInfo *root,
 			nrows = outer_rows * inner_rows * fkselec * jselec;
 			/* pselec not used */
 			break;
+		case JOIN_ASOF:
 		case JOIN_LEFT:
 			nrows = outer_rows * inner_rows * fkselec * jselec;
 			if (nrows < outer_rows)
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index c130d2f..f354be5 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -1266,6 +1266,7 @@ match_unsorted_outer(PlannerInfo *root,
 			break;
 		case JOIN_RIGHT:
 		case JOIN_FULL:
+		case JOIN_ASOF:
 			nestjoinOK = false;
 			useallclauses = true;
 			break;
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 5a68de3..7c981f1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -782,6 +782,20 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
 								 JOIN_INNER, sjinfo,
 								 restrictlist);
 			break;
+		case JOIN_ASOF:
+			if (is_dummy_rel(rel1) ||
+				restriction_is_constant_false(restrictlist, true))
+			{
+				mark_dummy_rel(joinrel);
+				break;
+			}
+			if (restriction_is_constant_false(restrictlist, false) &&
+				bms_is_subset(rel2->relids, sjinfo->syn_righthand))
+				mark_dummy_rel(rel2);
+			add_paths_to_joinrel(root, joinrel, rel1, rel2,
+								 JOIN_ASOF, sjinfo,
+								 restrictlist);
+			break;
 		case JOIN_LEFT:
 			if (is_dummy_rel(rel1) ||
 				restriction_is_constant_false(restrictlist, true))
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index ebd442a..ea16bc6 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -890,6 +890,7 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
 				/* and it doesn't force anything to null, either */
 				nullable_rels = NULL;
 				break;
+			case JOIN_ASOF:
 			case JOIN_LEFT:
 			case JOIN_ANTI:
 				leftjoinlist = deconstruct_recurse(root, j->larg,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 5cac171..77c24e5 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1679,6 +1679,7 @@ set_join_references(PlannerInfo *root, Join *join, int rtoffset)
 	 */
 	switch (join->jointype)
 	{
+		case JOIN_ASOF:
 		case JOIN_LEFT:
 		case JOIN_SEMI:
 		case JOIN_ANTI:
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 749ea80..f9b1009 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -273,6 +273,7 @@ pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
 														 NULL, NULL);
 				break;
 			case JOIN_LEFT:
+			case JOIN_ASOF:
 				j->quals = pull_up_sublinks_qual_recurse(root, j->quals,
 														 &j->rarg,
 														 rightrelids,
@@ -805,6 +806,7 @@ pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode,
 			case JOIN_LEFT:
 			case JOIN_SEMI:
 			case JOIN_ANTI:
+			case JOIN_ASOF:
 				j->larg = pull_up_subqueries_recurse(root, j->larg,
 													 j,
 												   lowest_nulling_outer_join,
@@ -2626,6 +2628,7 @@ reduce_outer_joins_pass2(Node *jtnode,
 				break;
 			case JOIN_SEMI:
 			case JOIN_ANTI:
+			case JOIN_ASOF:
 
 				/*
 				 * These could only have been introduced by pull_up_sublinks,
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index ada95e5..b3b6edb 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -602,7 +602,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 /* ordinary key words in alphabetical order */
 %token <keyword> ABORT_P ABSOLUTE_P ACCESS ACTION ADD_P ADMIN AFTER
-	AGGREGATE ALL ALSO ALTER ALWAYS ANALYSE ANALYZE AND ANY ARRAY AS ASC
+	AGGREGATE ALL ALSO ALTER ALWAYS ANALYSE ANALYZE AND ANY ARRAY AS ASC ASOF_P
 	ASSERTION ASSIGNMENT ASYMMETRIC AT ATTACH ATTRIBUTE AUTHORIZATION
 
 	BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT
@@ -763,7 +763,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
  * They wouldn't be given a precedence at all, were it not that we need
  * left-associativity among the JOIN rules themselves.
  */
-%left		JOIN CROSS LEFT FULL RIGHT INNER_P NATURAL
+%left		JOIN CROSS LEFT FULL RIGHT INNER_P ASOF_P NATURAL
 /* kluge to keep xml_whitespace_option from causing shift/reduce conflicts */
 %right		PRESERVE STRIP_P
 
@@ -11561,8 +11561,16 @@ joined_table:
 					n->rarg = $4;
 					if ($5 != NULL && IsA($5, List))
 						n->usingClause = (List *) $5; /* USING clause */
-					else
+					else { 
 						n->quals = $5; /* ON clause */
+						if (n->jointype == JOIN_ASOF) 
+						{
+							ereport(ERROR,
+									(errcode(ERRCODE_SYNTAX_ERROR),
+									 errmsg("ASOF join requires USING clause"),
+									 parser_errposition(@2)));
+						}
+					}
 					$$ = n;
 				}
 			| table_ref JOIN table_ref join_qual
@@ -11668,6 +11676,7 @@ join_type:	FULL join_outer							{ $$ = JOIN_FULL; }
 			| LEFT join_outer						{ $$ = JOIN_LEFT; }
 			| RIGHT join_outer						{ $$ = JOIN_RIGHT; }
 			| INNER_P								{ $$ = JOIN_INNER; }
+            | ASOF_P join_outer						{ $$ = JOIN_ASOF; }
 		;
 
 /* OUTER is just noise... */
@@ -14952,7 +14961,8 @@ col_name_keyword:
  * - thomas 2000-11-28
  */
 type_func_name_keyword:
-			  AUTHORIZATION
+              ASOF_P
+			| AUTHORIZATION
 			| BINARY
 			| COLLATION
 			| CONCURRENTLY
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 3d5b208..12e24a5 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -1631,6 +1631,7 @@ buildMergedJoinVar(ParseState *pstate, JoinType jointype,
 				res_node = l_node;
 			break;
 		case JOIN_LEFT:
+		case JOIN_ASOF:
 			/* Always use left var */
 			res_node = l_node;
 			break;
diff --git a/src/backend/parser/parse_cte.c b/src/backend/parser/parse_cte.c
index dfbcaa2..83f30ae 100644
--- a/src/backend/parser/parse_cte.c
+++ b/src/backend/parser/parse_cte.c
@@ -854,6 +854,7 @@ checkWellFormedRecursionWalker(Node *node, CteState *cstate)
 				checkWellFormedRecursionWalker(j->quals, cstate);
 				break;
 			case JOIN_LEFT:
+			case JOIN_ASOF:
 				checkWellFormedRecursionWalker(j->larg, cstate);
 				if (save_context == RECURSION_OK)
 					cstate->context = RECURSION_OUTERJOIN;
diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c
index 1d29ecd..8558cfb 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -212,6 +212,7 @@ networkjoinsel(PG_FUNCTION_ARGS)
 	switch (sjinfo->jointype)
 	{
 		case JOIN_INNER:
+		case JOIN_ASOF:
 		case JOIN_LEFT:
 		case JOIN_FULL:
 
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 6a0d273..3c4a597 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -9938,6 +9938,12 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context)
 									 PRETTYINDENT_STD,
 									 PRETTYINDENT_JOIN);
 				break;
+			case JOIN_ASOF:
+				appendContextKeyword(context, " ASOF JOIN ",
+									 -PRETTYINDENT_STD,
+									 PRETTYINDENT_STD,
+									 PRETTYINDENT_JOIN);
+				break;
 			case JOIN_FULL:
 				appendContextKeyword(context, " FULL JOIN ",
 									 -PRETTYINDENT_STD,
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 22dabf5..f502667 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2218,6 +2218,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
 	switch (sjinfo->jointype)
 	{
 		case JOIN_INNER:
+		case JOIN_ASOF:
 		case JOIN_LEFT:
 		case JOIN_FULL:
 			selec = eqjoinsel_inner(operator, &vardata1, &vardata2);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d33392f..2ab8e4a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1606,9 +1606,11 @@ typedef struct MergeJoinState
 	bool		mj_FillInner;
 	bool		mj_MatchedOuter;
 	bool		mj_MatchedInner;
+	bool        mj_MatchedAsof;
 	TupleTableSlot *mj_OuterTupleSlot;
 	TupleTableSlot *mj_InnerTupleSlot;
 	TupleTableSlot *mj_MarkedTupleSlot;
+	TupleTableSlot *mj_NextInnerTupleSlot;
 	TupleTableSlot *mj_NullOuterTupleSlot;
 	TupleTableSlot *mj_NullInnerTupleSlot;
 	ExprContext *mj_OuterEContext;
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 15de936..2adc2bf 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -697,7 +697,9 @@ typedef enum JoinType
 	 * by the executor (nor, indeed, by most of the planner).
 	 */
 	JOIN_UNIQUE_OUTER,			/* LHS path must be made unique */
-	JOIN_UNIQUE_INNER			/* RHS path must be made unique */
+	JOIN_UNIQUE_INNER,			/* RHS path must be made unique */
+
+	JOIN_ASOF                   /* ASOF join (http://code.kx.com/wiki/Reference/aj) */
 
 	/*
 	 * We might need additional join types someday.
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index f50e45e..74ef76e 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -45,6 +45,7 @@ PG_KEYWORD("any", ANY, RESERVED_KEYWORD)
 PG_KEYWORD("array", ARRAY, RESERVED_KEYWORD)
 PG_KEYWORD("as", AS, RESERVED_KEYWORD)
 PG_KEYWORD("asc", ASC, RESERVED_KEYWORD)
+PG_KEYWORD("asof", ASOF_P, TYPE_FUNC_NAME_KEYWORD)
 PG_KEYWORD("assertion", ASSERTION, UNRESERVED_KEYWORD)
 PG_KEYWORD("assignment", ASSIGNMENT, UNRESERVED_KEYWORD)
 PG_KEYWORD("asymmetric", ASYMMETRIC, RESERVED_KEYWORD)
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9cad4e6..402f6c6 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5685,3 +5685,73 @@ where exists (select 1 from j3
 (13 rows)
 
 drop table j3;
+create table t(sym varchar,dt time,qty bigint);
+create index ti on t(sym,dt);
+create table q(sym varchar,dt time,px real);
+create index qi on q(sym,dt);
+insert into t values ('msft','10:01:01',100),('msft','10:01:01',101),('ibm','10:01:03',200),('ge','10:01:04',150);
+insert into q values ('ibm','10:01:00',100),('msft','10:01:00',99),('msft','10:01:01',101),('ibm','10:01:02',98); 
+select * from t asof join q using (sym,dt);
+ sym  |    dt    | qty | px  
+------+----------+-----+-----
+ ge   | 10:01:04 | 150 |    
+ ibm  | 10:01:03 | 200 |  98
+ msft | 10:01:01 | 101 | 101
+ msft | 10:01:01 | 100 | 101
+(4 rows)
+
+explain select * from t asof join q using (sym,dt);
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Merge Asof Join  (cost=0.30..140.61 rows=1 width=52)
+   Merge Cond: (((t.sym)::text = (q.sym)::text) AND (t.dt = q.dt))
+   ->  Index Scan using ti on t  (cost=0.15..64.20 rows=1070 width=48)
+   ->  Index Scan using qi on q  (cost=0.15..65.10 rows=1130 width=44)
+(4 rows)
+
+drop table t;
+drop table q;
+create table t(sym varchar,dt time,qty bigint);
+create table q(sym varchar,dt time,px real);
+insert into t values ('msft','10:01:01',100),('msft','10:01:01',101),('ibm','10:01:03',200),('ge','10:01:04',150),('ora','10:01:05',250);
+insert into q values ('ibm','10:01:00',100),('msft','10:01:00',99),('msft','10:01:01',101),('ibm','10:01:02',98),('ora','10:01:03',111); 
+select * from t asof join q using (sym,dt);
+ sym  |    dt    | qty | px  
+------+----------+-----+-----
+ ge   | 10:01:04 | 150 |    
+ ibm  | 10:01:03 | 200 |  98
+ msft | 10:01:01 | 100 | 101
+ msft | 10:01:01 | 101 | 101
+ ora  | 10:01:05 | 250 | 111
+(5 rows)
+
+explain select * from t asof join q using (sym,dt);
+                            QUERY PLAN                             
+-------------------------------------------------------------------
+ Merge Asof Join  (cost=153.14..169.94 rows=1 width=52)
+   Merge Cond: (((t.sym)::text = (q.sym)::text) AND (t.dt = q.dt))
+   ->  Sort  (cost=74.54..77.21 rows=1070 width=48)
+         Sort Key: t.sym, t.dt
+         ->  Seq Scan on t  (cost=0.00..20.70 rows=1070 width=48)
+   ->  Sort  (cost=78.60..81.43 rows=1130 width=44)
+         Sort Key: q.sym, q.dt
+         ->  Seq Scan on q  (cost=0.00..21.30 rows=1130 width=44)
+(8 rows)
+
+drop table t;
+drop table q;
+create table t (tid serial primary key, dt time);
+insert into t (dt) values ('10:01'),('10:03'),('10:07'),('10:08');
+create table q (qid serial primary key, dt time);
+insert into q (dt) values ('10:02'),('10:04'),('10:06'),('10:08');
+select * from t asof outer join q using (dt);
+    dt    | tid | qid 
+----------+-----+-----
+ 10:01:00 |   1 |    
+ 10:03:00 |   2 |   1
+ 10:07:00 |   3 |   3
+ 10:08:00 |   4 |   4
+(4 rows)
+
+drop table t;
+drop table q;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index c3994ea..68ea6fc 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1878,3 +1878,31 @@ where exists (select 1 from j3
       and t1.unique1 < 1;
 
 drop table j3;
+
+create table t(sym varchar,dt time,qty bigint);
+create index ti on t(sym,dt);
+create table q(sym varchar,dt time,px real);
+create index qi on q(sym,dt);
+insert into t values ('msft','10:01:01',100),('msft','10:01:01',101),('ibm','10:01:03',200),('ge','10:01:04',150);
+insert into q values ('ibm','10:01:00',100),('msft','10:01:00',99),('msft','10:01:01',101),('ibm','10:01:02',98); 
+select * from t asof join q using (sym,dt);
+explain select * from t asof join q using (sym,dt);
+drop table t;
+drop table q;
+
+create table t(sym varchar,dt time,qty bigint);
+create table q(sym varchar,dt time,px real);
+insert into t values ('msft','10:01:01',100),('msft','10:01:01',101),('ibm','10:01:03',200),('ge','10:01:04',150),('ora','10:01:05',250);
+insert into q values ('ibm','10:01:00',100),('msft','10:01:00',99),('msft','10:01:01',101),('ibm','10:01:02',98),('ora','10:01:03',111); 
+select * from t asof join q using (sym,dt);
+explain select * from t asof join q using (sym,dt);
+drop table t;
+drop table q;
+
+create table t (tid serial primary key, dt time);
+insert into t (dt) values ('10:01'),('10:03'),('10:07'),('10:08');
+create table q (qid serial primary key, dt time);
+insert into q (dt) values ('10:02'),('10:04'),('10:06'),('10:08');
+select * from t asof outer join q using (dt);
+drop table t;
+drop table q;

Thomas Munro

thomas.munro@enterprisedb.com

over 8 years ago

In reply to: Konstantin Knizhnik (#4)

Re: ASOF join

On Mon, Jun 19, 2017 at 11:57 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

I attached simple patch adding ASOF join to Postgres. Right now it support
only outer join and requires USING clause (consequently it is not possible
to join two tables which joi keys has different names. May be it is also
possible to support ON clause with condition written like o.k1 = i.k2 AND
o.k2 = i.k2 AND ... AND o.kN >= i.kN
But such notation can be confusing, because join result includes only one
matching inner record with kN smaller or equal than kN of outer record and
not all such records.
As alternative we can add specia

Hmm. Yeah, I see the notational problem. It's hard to come up with a
new syntax that has SQL nature. What if... we didn't use a new syntax
at all, but recognised existing queries that are executable with this
strategy? Queries like this:

WITH ticks(time, price) AS
(VALUES ('2017-07-20 12:00:00'::timestamptz, 100.00),
('2017-07-21 11:00:00'::timestamptz, 150.00)),
times(time) AS
(VALUES ('2017-07-19 12:00:00'::timestamptz),
('2017-07-20 12:00:00'::timestamptz),
('2017-07-21 12:00:00'::timestamptz),
('2017-07-22 12:00:00'::timestamptz))

SELECT times.time, previous_tick.price
FROM times
LEFT JOIN LATERAL (SELECT * FROM ticks
WHERE ticks.time <= times.time
ORDER BY ticks.time DESC LIMIT 1) previous_tick ON true
ORDER BY times.time;

time | price
------------------------+--------
2017-07-19 12:00:00+12 |
2017-07-20 12:00:00+12 | 100.00
2017-07-21 12:00:00+12 | 150.00
2017-07-22 12:00:00+12 | 150.00
(4 rows)

I haven't used LATERAL much myself but I've noticed that it's often
used to express this type of thing. "Get me the latest ... as of time
...".

It'd a bit like the way we recognise EXISTS (...) as a semi-join and
execute it with a join operator instead of having a SEMI JOIN syntax.
On the other hand it's a bit more long winded, extreme and probably
quite niche.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

over 8 years ago

In reply to: Thomas Munro (#5)

Re: ASOF join

On 21.06.2017 11:00, Thomas Munro wrote:

Hmm. Yeah, I see the notational problem. It's hard to come up with a
new syntax that has SQL nature. What if... we didn't use a new syntax
at all, but recognised existing queries that are executable with this
strategy? Queries like this:

WITH ticks(time, price) AS
(VALUES ('2017-07-20 12:00:00'::timestamptz, 100.00),
('2017-07-21 11:00:00'::timestamptz, 150.00)),
times(time) AS
(VALUES ('2017-07-19 12:00:00'::timestamptz),
('2017-07-20 12:00:00'::timestamptz),
('2017-07-21 12:00:00'::timestamptz),
('2017-07-22 12:00:00'::timestamptz))

SELECT times.time, previous_tick.price
FROM times
LEFT JOIN LATERAL (SELECT * FROM ticks
WHERE ticks.time <= times.time
ORDER BY ticks.time DESC LIMIT 1) previous_tick ON true
ORDER BY times.time;

time | price
------------------------+--------
2017-07-19 12:00:00+12 |
2017-07-20 12:00:00+12 | 100.00
2017-07-21 12:00:00+12 | 150.00
2017-07-22 12:00:00+12 | 150.00
(4 rows)

I haven't used LATERAL much myself but I've noticed that it's often
used to express this type of thing. "Get me the latest ... as of time
...".

It'd a bit like the way we recognise EXISTS (...) as a semi-join and
execute it with a join operator instead of having a SEMI JOIN syntax.
On the other hand it's a bit more long winded, extreme and probably
quite niche.

Thank you for this idea. I agree that it is the best way of implementing
ASOF join - just as optimization of standard SQL query.
But do you think that still it will be good idea to extend SQL syntax
with ASOF JOIN ... USING ... clause? It will significantly simplify
writing queries like above
and IMHO doesn't introduce some confusions with standard SQL syntax. My
primary idea of suggesting ASOF join for Postgres was not just building
more efficient plan (using merge join instead of nested loop) but also
simplifying writing of such queries. Or do you think that nobody will be
interested in non-standard SQL extensions?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Thomas Munro

thomas.munro@enterprisedb.com

over 8 years ago

In reply to: Konstantin Knizhnik (#4)

Re: ASOF join

On Mon, Jun 19, 2017 at 11:57 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

On 16.06.2017 19:07, David Fetter wrote:

If you turn your head sideways, it's very similar to the range merge
join Jeff Davis proposed. https://commitfest.postgresql.org/14/1106/

May be, but I do not understand how to limit result to contain exactly one
(last) inner tuple for each outer tuple.

Yeah, it's somehow related but it's not the same thing. I guess you
can think of the keys in the ASOF case as modelling range starts, with
range ends implied by the record with next higher/lower key.
Concretely, if every 'tick' row from my example in a nearby message
had a time but also an end time to be set when a new tick is inserted
so that each tick row had the complete effective time range for that
tick, then you could rewrite the query as "get me the tick whose
effective time range overlaps with each value of times.time" and get a
nice range merge join with Jeff's patch. That sort of thing might be
very useful for SQL:2011 temporal query-style stuff, but it's not what
Konstantin wants to do: he wants to merge time series where the
effective range is not explicitly stored in every row.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Thomas Munro

thomas.munro@enterprisedb.com

over 8 years ago

In reply to: Konstantin Knizhnik (#6)

Re: ASOF join

On Wed, Jun 21, 2017 at 9:46 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

Thank you for this idea. I agree that it is the best way of implementing
ASOF join - just as optimization of standard SQL query.

Great. I think this part definitely has potential.

But do you think that still it will be good idea to extend SQL syntax with
ASOF JOIN ... USING ... clause? It will significantly simplify writing
queries like above
and IMHO doesn't introduce some confusions with standard SQL syntax. My
primary idea of suggesting ASOF join for Postgres was not just building
more efficient plan (using merge join instead of nested loop) but also
simplifying writing of such queries. Or do you think that nobody will be
interested in non-standard SQL extensions?

I can see the appeal, but I expect it to be difficult to convince the
project to accept a non-standard syntax for a niche use case that can
be expressed already. Q is super terse and designed for time series
data. SQL is neither of those things.

Some first reactions to the syntaxes you mentioned:

1. times LEFT ASOF JOIN ticks ON ticks.time <= times.time
2. times LEFT ASOF JOIN ticks USING (time)
3. times LEFT ASOF JOIN ticks USING (ticks.time, times.time)

The USING ideas don't seem to be general enough, because there is no
place to say whether to use a lower or higher value if there is no
match, or did I miss something? Relying on an ORDER BY clause in the
query to control the meaning of the join seems too weird, and making
it always (for example) <= would be an arbitrary limitation. The
first syntax at least has enough information: when you say one of <,

, <=, >= you also imply the search order. I'm not sure if there are

any problems with that, perhaps when combined with other quals.

The equivalent nearly-standard syntax is definitely quite verbose, but
it has the merit of being absolutely explicit about which row from
'ticks' will be selected:

times LEFT JOIN LATERAL (SELECT * FROM ticks
WHERE ticks.time <= times.time
ORDER BY ticks.time DESC LIMIT 1) x ON true

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers