ASOF join
I wonder if there were some discussion/attempts to add ASOF join to
Postgres (sorry, may be there is better term for it, I am refereeing
KDB definition: http://code.kx.com/wiki/Reference/aj ).
Such kind of join can be useful when we need to associate two
timeseries. It is quite popular in trading:
//join the eq price and size for the op trade time
a::aj[`underlyingSym`time;select time, underlyingSym, sym, putorcall,
ex, cond, price, seqnum, contracts, contractsTraded from t;eqtrades];
...and not only. Below is one example of how people now manually coding
ASOF join:
select
count(*),
count(*)
filter (where timedelta_prev < -30),
count(*)
filter (where ride_prev = ride_next),
... -- many different aggregates
from
(
select
p.provider_id,
p.ts,
(
select extract(epoch from t.ts - p.ts)
from providers_positions t
where p.provider_id = t.provider_id and t.ts < p.ts and
t.source = 'gps'
order by t.ts desc
limit 1
) as timedelta_prev,
(
select extract(epoch from t.ts - p.ts)
from providers_positions t
where p.provider_id = t.provider_id and t.ts > p.ts and
t.source = 'gps'
order by t.ts
limit 1
) as timedelta,
(
select ride_id
from providers_positions t
where p.provider_id = t.provider_id and t.ts < p.ts and
t.source = 'gps'
order by t.ts desc
limit 1
) as ride_prev,
(
select ride_id
from providers_positions t
where p.provider_id = t.provider_id and t.ts > p.ts and
t.source = 'gps'
order by t.ts
limit 1
) as ride_next
from (
select
provider_id,
ts,
event_name
from
lvl2_681_parsed p
) p
where
p.event_name = 'GPS signal restored'
-- offset 0
) z;
Without OFFSET 0 this query generates awful execution plans with
hundreds (!) of subplans corresponding to the subqueries.
Number of subplans (most of them are the same) is equal number of
occurrences of timedelta, timedelta_prev, ... columns in target aggregates.
OFFSET 0 reduce number of subplans to 4. And I expect that using LATERAL
join can reduce it to two and even without "OFFSET 0" trick.
But in any case - it is very complex and unnatural way of expressing
this not so complex query.
With ASOF join is can be written much simpler.
Also Postgres is implementing this query using nested loop with index
scan, which is definitely not the most efficient strategy.
The best way to implement ASOF join is to use something like mergejoin.
Usually there are indexes for both timeseries, so what we need is to
merge two ordered sets using ASOF join rules.
It will require minimal changes in SQL syntax, just adding ASOF keyword
seems to be enough:
select * from Trades ASOF JOIN EqTrades USING (underlyingSym,time);
It seems to me that adding ASOF joins should not require huge amount of
work and can be done with minimal number of changes in executor and
optimizer.
But may be there are some problems/challenges which I do not realize now?
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Fri, Jun 16, 2017 at 4:20 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
I wonder if there were some discussion/attempts to add ASOF join to Postgres
(sorry, may be there is better term for it, I am refereeing KDB definition:
http://code.kx.com/wiki/Reference/aj ).
Interesting idea. Also in Pandas:
I suppose you could write a function that pulls tuples out of a bunch
of cursors and zips them together like this, as a kind of hand-coded
special merge join "except that we match on nearest key rather than
equal keys" (as they put it).
I've written code like this before in a trading context, where we
called that 'previous tick interpolation', and in a scientific context
where other kinds of interpolation were called for (so not really
matching a tuple but synthesising one if no exact match). If you view
the former case as a kind of degenerate case of interpolation then it
doesn't feel like a "join" as we know it, but clearly it is. I had
never considered before that such things might belong inside the
database as a kind of join operator.
--
Thomas Munro
http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jun 16, 2017 at 11:51:34AM +1200, Thomas Munro wrote:
On Fri, Jun 16, 2017 at 4:20 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:I wonder if there were some discussion/attempts to add ASOF join to Postgres
(sorry, may be there is better term for it, I am refereeing KDB definition:
http://code.kx.com/wiki/Reference/aj ).Interesting idea. Also in Pandas:
I suppose you could write a function that pulls tuples out of a bunch
of cursors and zips them together like this, as a kind of hand-coded
special merge join "except that we match on nearest key rather than
equal keys" (as they put it).I've written code like this before in a trading context, where we
called that 'previous tick interpolation', and in a scientific context
where other kinds of interpolation were called for (so not really
matching a tuple but synthesising one if no exact match). If you view
the former case as a kind of degenerate case of interpolation then it
doesn't feel like a "join" as we know it, but clearly it is. I had
never considered before that such things might belong inside the
database as a kind of join operator.
If you turn your head sideways, it's very similar to the range merge
join Jeff Davis proposed. https://commitfest.postgresql.org/14/1106/
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 16.06.2017 19:07, David Fetter wrote:
On Fri, Jun 16, 2017 at 11:51:34AM +1200, Thomas Munro wrote:
On Fri, Jun 16, 2017 at 4:20 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:I wonder if there were some discussion/attempts to add ASOF join to Postgres
(sorry, may be there is better term for it, I am refereeing KDB definition:
http://code.kx.com/wiki/Reference/aj ).Interesting idea. Also in Pandas:
I attached simple patch adding ASOF join to Postgres. Right now it
support only outer join and requires USING clause (consequently it is
not possible to join two tables which joi keys has different names. May
be it is also possible to support ON clause with condition written like
o.k1 = i.k2 AND o.k2 = i.k2 AND ... AND o.kN >= i.kN
But such notation can be confusing, because join result includes only
one matching inner record with kN smaller or equal than kN of outer
record and not all such records.
As alternative we can add specia
If people fin such construction really useful, I will continue work on it.
I suppose you could write a function that pulls tuples out of a bunch
of cursors and zips them together like this, as a kind of hand-coded
special merge join "except that we match on nearest key rather than
equal keys" (as they put it).I've written code like this before in a trading context, where we
called that 'previous tick interpolation', and in a scientific context
where other kinds of interpolation were called for (so not really
matching a tuple but synthesising one if no exact match). If you view
the former case as a kind of degenerate case of interpolation then it
doesn't feel like a "join" as we know it, but clearly it is. I had
never considered before that such things might belong inside the
database as a kind of join operator.If you turn your head sideways, it's very similar to the range merge
join Jeff Davis proposed. https://commitfest.postgresql.org/14/1106/
May be, but I do not understand how to limit result to contain exactly
one (last) inner tuple for each outer tuple.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
asof.patchtext/x-patch; name=asof.patchDownload+316-28
On Mon, Jun 19, 2017 at 11:57 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
I attached simple patch adding ASOF join to Postgres. Right now it support
only outer join and requires USING clause (consequently it is not possible
to join two tables which joi keys has different names. May be it is also
possible to support ON clause with condition written like o.k1 = i.k2 AND
o.k2 = i.k2 AND ... AND o.kN >= i.kN
But such notation can be confusing, because join result includes only one
matching inner record with kN smaller or equal than kN of outer record and
not all such records.
As alternative we can add specia
Hmm. Yeah, I see the notational problem. It's hard to come up with a
new syntax that has SQL nature. What if... we didn't use a new syntax
at all, but recognised existing queries that are executable with this
strategy? Queries like this:
WITH ticks(time, price) AS
(VALUES ('2017-07-20 12:00:00'::timestamptz, 100.00),
('2017-07-21 11:00:00'::timestamptz, 150.00)),
times(time) AS
(VALUES ('2017-07-19 12:00:00'::timestamptz),
('2017-07-20 12:00:00'::timestamptz),
('2017-07-21 12:00:00'::timestamptz),
('2017-07-22 12:00:00'::timestamptz))
SELECT times.time, previous_tick.price
FROM times
LEFT JOIN LATERAL (SELECT * FROM ticks
WHERE ticks.time <= times.time
ORDER BY ticks.time DESC LIMIT 1) previous_tick ON true
ORDER BY times.time;
time | price
------------------------+--------
2017-07-19 12:00:00+12 |
2017-07-20 12:00:00+12 | 100.00
2017-07-21 12:00:00+12 | 150.00
2017-07-22 12:00:00+12 | 150.00
(4 rows)
I haven't used LATERAL much myself but I've noticed that it's often
used to express this type of thing. "Get me the latest ... as of time
...".
It'd a bit like the way we recognise EXISTS (...) as a semi-join and
execute it with a join operator instead of having a SEMI JOIN syntax.
On the other hand it's a bit more long winded, extreme and probably
quite niche.
--
Thomas Munro
http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 21.06.2017 11:00, Thomas Munro wrote:
Hmm. Yeah, I see the notational problem. It's hard to come up with a
new syntax that has SQL nature. What if... we didn't use a new syntax
at all, but recognised existing queries that are executable with this
strategy? Queries like this:WITH ticks(time, price) AS
(VALUES ('2017-07-20 12:00:00'::timestamptz, 100.00),
('2017-07-21 11:00:00'::timestamptz, 150.00)),
times(time) AS
(VALUES ('2017-07-19 12:00:00'::timestamptz),
('2017-07-20 12:00:00'::timestamptz),
('2017-07-21 12:00:00'::timestamptz),
('2017-07-22 12:00:00'::timestamptz))SELECT times.time, previous_tick.price
FROM times
LEFT JOIN LATERAL (SELECT * FROM ticks
WHERE ticks.time <= times.time
ORDER BY ticks.time DESC LIMIT 1) previous_tick ON true
ORDER BY times.time;time | price
------------------------+--------
2017-07-19 12:00:00+12 |
2017-07-20 12:00:00+12 | 100.00
2017-07-21 12:00:00+12 | 150.00
2017-07-22 12:00:00+12 | 150.00
(4 rows)I haven't used LATERAL much myself but I've noticed that it's often
used to express this type of thing. "Get me the latest ... as of time
...".It'd a bit like the way we recognise EXISTS (...) as a semi-join and
execute it with a join operator instead of having a SEMI JOIN syntax.
On the other hand it's a bit more long winded, extreme and probably
quite niche.
Thank you for this idea. I agree that it is the best way of implementing
ASOF join - just as optimization of standard SQL query.
But do you think that still it will be good idea to extend SQL syntax
with ASOF JOIN ... USING ... clause? It will significantly simplify
writing queries like above
and IMHO doesn't introduce some confusions with standard SQL syntax. My
primary idea of suggesting ASOF join for Postgres was not just building
more efficient plan (using merge join instead of nested loop) but also
simplifying writing of such queries. Or do you think that nobody will be
interested in non-standard SQL extensions?
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jun 19, 2017 at 11:57 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
On 16.06.2017 19:07, David Fetter wrote:
If you turn your head sideways, it's very similar to the range merge
join Jeff Davis proposed. https://commitfest.postgresql.org/14/1106/May be, but I do not understand how to limit result to contain exactly one
(last) inner tuple for each outer tuple.
Yeah, it's somehow related but it's not the same thing. I guess you
can think of the keys in the ASOF case as modelling range starts, with
range ends implied by the record with next higher/lower key.
Concretely, if every 'tick' row from my example in a nearby message
had a time but also an end time to be set when a new tick is inserted
so that each tick row had the complete effective time range for that
tick, then you could rewrite the query as "get me the tick whose
effective time range overlaps with each value of times.time" and get a
nice range merge join with Jeff's patch. That sort of thing might be
very useful for SQL:2011 temporal query-style stuff, but it's not what
Konstantin wants to do: he wants to merge time series where the
effective range is not explicitly stored in every row.
--
Thomas Munro
http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Jun 21, 2017 at 9:46 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
Thank you for this idea. I agree that it is the best way of implementing
ASOF join - just as optimization of standard SQL query.
Great. I think this part definitely has potential.
But do you think that still it will be good idea to extend SQL syntax with
ASOF JOIN ... USING ... clause? It will significantly simplify writing
queries like above
and IMHO doesn't introduce some confusions with standard SQL syntax. My
primary idea of suggesting ASOF join for Postgres was not just building
more efficient plan (using merge join instead of nested loop) but also
simplifying writing of such queries. Or do you think that nobody will be
interested in non-standard SQL extensions?
I can see the appeal, but I expect it to be difficult to convince the
project to accept a non-standard syntax for a niche use case that can
be expressed already. Q is super terse and designed for time series
data. SQL is neither of those things.
Some first reactions to the syntaxes you mentioned:
1. times LEFT ASOF JOIN ticks ON ticks.time <= times.time
2. times LEFT ASOF JOIN ticks USING (time)
3. times LEFT ASOF JOIN ticks USING (ticks.time, times.time)
The USING ideas don't seem to be general enough, because there is no
place to say whether to use a lower or higher value if there is no
match, or did I miss something? Relying on an ORDER BY clause in the
query to control the meaning of the join seems too weird, and making
it always (for example) <= would be an arbitrary limitation. The
first syntax at least has enough information: when you say one of <,
, <=, >= you also imply the search order. I'm not sure if there are
any problems with that, perhaps when combined with other quals.
The equivalent nearly-standard syntax is definitely quite verbose, but
it has the merit of being absolutely explicit about which row from
'ticks' will be selected:
times LEFT JOIN LATERAL (SELECT * FROM ticks
WHERE ticks.time <= times.time
ORDER BY ticks.time DESC LIMIT 1) x ON true
--
Thomas Munro
http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers