Do we want a hashset type?
Hi,
I've been working with a social network start-up that uses PostgreSQL as their
only database. Recently, they became interested in graph databases, largely
because of an article [1] suggesting that a SQL database "just chokes" when it
encounters a depth-five friends-of-friends query (for a million users having
50 friends each).
The article didn't provide the complete SQL queries, so I had to buy the
referenced book to get the full picture. It turns out, the query was a simple
self-join, which, of course, isn't very efficient. When we rewrote the query
using a modern RECURSIVE CTE, it worked but still took quite some time.
Of course, there will always be a need for specific databases, and some queries
will run faster on them. But, if PostgreSQL could handle graph queries with a
Big-O runtime similar to graph databases, scalability wouldn't be such a big
worry.
Just like the addition of the JSON type to PostgreSQL helped reduce the hype
around NoSQL, maybe there's something simple that's missing in PostgreSQL that
could help us achieve the same Big-O class performance as graph databases for
some of these types of graph queries?
Looking into the key differences between PostgreSQL and graph databases,
it seems that one is how they store adjacent nodes. In SQL, a graph can be
represented as one table for the Nodes and another table for the Edges.
For a friends-of-friends query, we would need to query Edges to find adjacent
nodes, recursively.
Graph databases, on the other hand, keep adjacent nodes immediately accessible
by storing them with the node itself. This looks like a major difference in
terms of how the data is stored.
Could a hashset type help bridge this gap?
The idea would be to store adjacent nodes as a hashset column in a Nodes table.
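To illustrate the storage difference outside SQL, here is a toy Python sketch (hypothetical data, not PostgreSQL internals): finding neighbors from an edge list requires filtering edges per hop, while adjacency materialized per node is a single lookup.

```python
# Toy graph: the same data as an edge list (relational style) and as
# per-node adjacency sets (graph-database style).
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]

# Relational style: each hop filters the edge list (or uses an index on it).
def neighbors_from_edges(node):
    return {t for (f, t) in edges if f == node}

# Graph-database style: adjacency is stored with the node, one lookup per hop.
adjacency = {}
for f, t in edges:
    adjacency.setdefault(f, set()).add(t)

assert neighbors_from_edges(1) == adjacency[1] == {2, 3}
```

The hashset-column idea amounts to maintaining something like `adjacency` inside a Nodes table.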
Apache AGE is an option for users who really need a new graph query language.
But I believe there are more users who occasionally run into a graph problem and
would be glad if there was an efficient way to solve it in SQL without having
to bring in a new database.
/Joel
[1]: https://neo4j.com/news/how-much-faster-is-a-graph-database-really/
On 5/31/23 16:09, Joel Jacobson wrote:
Hi,
I've been working with a social network start-up that uses PostgreSQL as
their
only database. Recently, they became interested in graph databases, largely
because of an article [1] suggesting that a SQL database "just chokes"
when it
encounters a depth-five friends-of-friends query (for a million users having
50 friends each).
The article didn't provide the complete SQL queries, so I had to buy the
referenced book to get the full picture. It turns out, the query was a
simple
self-join, which, of course, isn't very efficient. When we rewrote the query
using a modern RECURSIVE CTE, it worked but still took quite some time.
Of course, there will always be a need for specific databases, and some
queries
will run faster on them. But, if PostgreSQL could handle graph queries
with a
Big-O runtime similar to graph databases, scalability wouldn't be such a big
worry.
Just like the addition of the JSON type to PostgreSQL helped reduce the hype
around NoSQL, maybe there's something simple that's missing in
PostgreSQL that
could help us achieve the same Big-O class performance as graph
databases for
some of these types of graph queries?
Looking into the key differences between PostgreSQL and graph databases,
it seems that one is how they store adjacent nodes. In SQL, a graph can be
represented as one table for the Nodes and another table for the Edges.
For a friends-of-friends query, we would need to query Edges to find
adjacent
nodes, recursively.
Graph databases, on the other hand, keep adjacent nodes immediately
accessible
by storing them with the node itself. This looks like a major difference in
terms of how the data is stored.
Could a hashset type help bridge this gap?
The idea would be to store adjacent nodes as a hashset column in a Nodes
table.
I think this needs a better explanation - what exactly is a hashset in
this context? Something like an array with a hash for faster lookup of
unique elements, or what?
Presumably it'd store whole adjacent nodes, not just some sort of node
ID. So what if a node is adjacent to many other nodes? What if a node is
added/deleted/modified?
AFAICS the main problem is the lookups of adjacent nodes, generating a
lot of random I/O etc. Presumably it's not that hard to keep the
"relational" schema with table for vertices/edges, and then an auxiliary
table with adjacent nodes grouped by node, possibly maintained by a
couple triggers. A bit like an "aggregated index" except the queries
would have to use it explicitly.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, May 31, 2023, at 16:53, Tomas Vondra wrote:
I think this needs a better explanation - what exactly is a hashset in
this context? Something like an array with a hash for faster lookup of
unique elements, or what?
In this context, by "hashset" I am indeed referring to a data structure similar
to an array, where each element would be unique, and lookups would be faster
than arrays for larger numbers of elements due to hash-based lookups.
This data structure would store identifiers (IDs) of the nodes, not the complete
nodes themselves.
Presumably it'd store whole adjacent nodes, not just some sort of node
ID. So what if a node is adjacent to many other nodes? What if a node is
added/deleted/modified?
That would require updating the hashset, which should be close to O(1) in
practical applications.
AFAICS the main problem is the lookups of adjacent nodes, generating a
lot of random I/O etc. Presumably it's not that hard to keep the
"relational" schema with table for vertices/edges, and then an auxiliary
table with adjacent nodes grouped by node, possibly maintained by a
couple triggers. A bit like an "aggregated index" except the queries
would have to use it explicitly.
Yes, an auxiliary table would be good, since we don't want to duplicate all
node-related data, and would only store the IDs in the adjacent-nodes hashset.
/Joel
On 5/31/23 17:40, Joel Jacobson wrote:
On Wed, May 31, 2023, at 16:53, Tomas Vondra wrote:
I think this needs a better explanation - what exactly is a hashset in
this context? Something like an array with a hash for faster lookup of
unique elements, or what?
In this context, by "hashset" I am indeed referring to a data structure similar
to an array, where each element would be unique, and lookups would be faster
than arrays for larger numbers of elements due to hash-based lookups.
Why would you want hash-based lookups? It should be fairly trivial to
implement as a user-defined data type, but in what cases are you going
to ask "does the hashset contain X"?
This data structure would store identifiers (IDs) of the nodes, not the complete
nodes themselves.
How does storing just the IDs solve anything? Isn't the main challenge
the random I/O when fetching the adjacent nodes? This does not really
improve that, no?
Presumably it'd store whole adjacent nodes, not just some sort of node
ID. So what if a node is adjacent to many other nodes? What if a node is
added/deleted/modified?
That would require updating the hashset, which should be close to O(1) in
practical applications.
But you need to modify hashsets for all the adjacent nodes. Also, O(1)
doesn't say it's cheap. I wonder how expensive it'd be in practice.
AFAICS the main problem is the lookups of adjacent nodes, generating
lot of random I/O etc. Presumably it's not that hard to keep the
"relational" schema with table for vertices/edges, and then an auxiliary
table with adjacent nodes grouped by node, possibly maintained by a
couple triggers. A bit like an "aggregated index" except the queries
would have to use it explicitly.
Yes, an auxiliary table would be good, since we don't want to duplicate all
node-related data, and only store the IDs in the adjacent nodes hashset.
I may be missing something, but as mentioned, I don't quite see how this
would help. What exactly would this save us? If you create an index on
the edge node IDs, you should get the adjacent nodes pretty cheap from
an IOS. Granted, a pre-built hashset is going to be faster, but if the
next step is fetching the node info, that's going to do a lot of random
I/O, dwarfing all of this.
It's entirely possible I'm missing some important aspect. It'd be very
useful to have a couple example queries illustrating the issue, that we
could use to actually test different approaches.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, May 31, 2023, at 18:59, Tomas Vondra wrote:
How does storing just the IDs solve anything? Isn't the main challenge
the random I/O when fetching the adjacent nodes? This does not really
improve that, no?
I'm thinking of a recursive query where most of the time is spent following
all friends-of-friends: the graph traversal is the heavy part, and the final
set of nodes is only fetched at the end.
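The traversal the recursive CTE performs amounts to a level-by-level frontier expansion; a toy Python sketch (hypothetical adjacency data, mirroring what `array_agg(DISTINCT ...)` computes per level):

```python
# Expand the friend frontier level by level, keeping only distinct node IDs
# at each depth - the same computation the recursive CTE performs.
def friends_at_depth(adjacency, start, depth):
    current = {start}
    for _ in range(depth):
        nxt = set()
        for node in current:
            nxt |= adjacency.get(node, set())
        current = nxt  # distinct by construction
    return current

adjacency = {1: {2, 3}, 2: {3, 4}, 3: {4}, 4: {1}}
assert friends_at_depth(adjacency, 1, 2) == {3, 4}
```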
It's entirely possible I'm missing some important aspect. It'd be very
useful to have a couple example queries illustrating the issue, that we
could use to actually test different approaches.
Here is an example using a real anonymised social network.
wget https://snap.stanford.edu/data/soc-pokec-relationships.txt.gz
gunzip soc-pokec-relationships.txt.gz
CREATE TABLE edges (from_node INT, to_node INT);
\copy edges from soc-pokec-relationships.txt;
ALTER TABLE edges ADD PRIMARY KEY (from_node, to_node);
SET work_mem TO '1GB';
CREATE VIEW friends_of_friends AS
WITH RECURSIVE friends_of_friends AS (
SELECT
ARRAY[5867::bigint] AS current,
ARRAY[5867::bigint] AS found,
0 AS depth
UNION ALL
SELECT
new_current,
found || new_current,
friends_of_friends.depth + 1
FROM
friends_of_friends
CROSS JOIN LATERAL (
SELECT
array_agg(DISTINCT edges.to_node) AS new_current
FROM
edges
WHERE
from_node = ANY(friends_of_friends.current)
) q
WHERE
friends_of_friends.depth < 3
)
SELECT
depth,
coalesce(array_length(current, 1), 0)
FROM
friends_of_friends
WHERE
depth = 3;
;
SELECT COUNT(*) FROM edges;
count
----------
30622564
(1 row)
SELECT COUNT(DISTINCT from_node) FROM edges;
count
---------
1432693
(1 row)
-- Most connected user (worst-case) is user 5867 with 8763 friends:
SELECT from_node, COUNT(*) FROM edges GROUP BY from_node ORDER BY COUNT(*) DESC LIMIT 1;
from_node | count
-----------+-------
5867 | 8763
(1 row)
-- Friend-of-friends query exactly at depth three:
EXPLAIN ANALYZE
SELECT * FROM friends_of_friends;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on friends_of_friends (cost=6017516.90..6017517.60 rows=1 width=8) (actual time=2585.881..2586.334 rows=1 loops=1)
Filter: (depth = 3)
Rows Removed by Filter: 3
CTE friends_of_friends
-> Recursive Union (cost=0.00..6017516.90 rows=31 width=68) (actual time=0.005..2581.664 rows=4 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=68) (actual time=0.002..0.002 rows=1 loops=1)
-> Subquery Scan on "*SELECT* 2" (cost=200583.71..601751.66 rows=3 width=68) (actual time=645.036..645.157 rows=1 loops=4)
-> Nested Loop (cost=200583.71..601751.54 rows=3 width=68) (actual time=641.880..641.972 rows=1 loops=4)
-> WorkTable Scan on friends_of_friends friends_of_friends_1 (cost=0.00..0.22 rows=3 width=68) (actual time=0.001..0.002 rows=1 loops=4)
Filter: (depth < 3)
Rows Removed by Filter: 0
-> Aggregate (cost=200583.71..200583.72 rows=1 width=32) (actual time=850.997..850.998 rows=1 loops=3)
-> Bitmap Heap Scan on edges (cost=27656.38..196840.88 rows=1497133 width=4) (actual time=203.239..423.534 rows=3486910 loops=3)
Recheck Cond: (from_node = ANY (friends_of_friends_1.current))
Heap Blocks: exact=117876
-> Bitmap Index Scan on edges_pkey (cost=0.00..27282.10 rows=1497133 width=0) (actual time=198.047..198.047 rows=3486910 loops=3)
Index Cond: (from_node = ANY (friends_of_friends_1.current))
Planning Time: 1.414 ms
Execution Time: 2588.288 ms
(19 rows)
I tested on PostgreSQL 15.2. For some reason I got a different, slower plan on HEAD:
SELECT * FROM friends_of_friends;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on friends_of_friends (cost=6576.67..6577.37 rows=1 width=8) (actual time=6412.693..6413.335 rows=1 loops=1)
Filter: (depth = 3)
Rows Removed by Filter: 3
CTE friends_of_friends
-> Recursive Union (cost=0.00..6576.67 rows=31 width=68) (actual time=0.008..6407.134 rows=4 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=68) (actual time=0.005..0.005 rows=1 loops=1)
-> Subquery Scan on "*SELECT* 2" (cost=219.05..657.64 rows=3 width=68) (actual time=1600.747..1600.934 rows=1 loops=4)
-> Nested Loop (cost=219.05..657.52 rows=3 width=68) (actual time=1594.906..1595.035 rows=1 loops=4)
-> WorkTable Scan on friends_of_friends friends_of_friends_1 (cost=0.00..0.22 rows=3 width=68) (actual time=0.001..0.002 rows=1 loops=4)
Filter: (depth < 3)
Rows Removed by Filter: 0
-> Aggregate (cost=219.05..219.06 rows=1 width=32) (actual time=2118.105..2118.105 rows=1 loops=3)
-> Sort (cost=207.94..213.49 rows=2221 width=4) (actual time=1780.770..1925.853 rows=3486910 loops=3)
Sort Key: edges.to_node
Sort Method: quicksort Memory: 393217kB
-> Index Only Scan using edges_pkey on edges (cost=0.56..84.48 rows=2221 width=4) (actual time=0.077..762.408 rows=3486910 loops=3)
Index Cond: (from_node = ANY (friends_of_friends_1.current))
Heap Fetches: 0
Planning Time: 8.229 ms
Execution Time: 6446.421 ms
(20 rows)
On Thu, Jun 1, 2023, at 09:02, Joel Jacobson wrote:
Here is an example using a real anonymised social network.
I realised the "found" column is not necessary in this particular example,
since we only care about the friends at the exact depth level. Simplified query:
CREATE OR REPLACE VIEW friends_of_friends AS
WITH RECURSIVE friends_of_friends AS (
SELECT
ARRAY[5867::bigint] AS current,
0 AS depth
UNION ALL
SELECT
new_current,
friends_of_friends.depth + 1
FROM
friends_of_friends
CROSS JOIN LATERAL (
SELECT
array_agg(DISTINCT edges.to_node) AS new_current
FROM
edges
WHERE
from_node = ANY(friends_of_friends.current)
) q
WHERE
friends_of_friends.depth < 3
)
SELECT
depth,
coalesce(array_length(current, 1), 0)
FROM
friends_of_friends
WHERE
depth = 3;
;
-- PostgreSQL 15.2:
EXPLAIN ANALYZE SELECT * FROM friends_of_friends;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on friends_of_friends (cost=2687.88..2688.58 rows=1 width=8) (actual time=2076.362..2076.454 rows=1 loops=1)
Filter: (depth = 3)
Rows Removed by Filter: 3
CTE friends_of_friends
-> Recursive Union (cost=0.00..2687.88 rows=31 width=36) (actual time=0.008..2075.073 rows=4 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=36) (actual time=0.002..0.002 rows=1 loops=1)
-> Subquery Scan on "*SELECT* 2" (cost=89.44..268.75 rows=3 width=36) (actual time=518.613..518.622 rows=1 loops=4)
-> Nested Loop (cost=89.44..268.64 rows=3 width=36) (actual time=515.523..515.523 rows=1 loops=4)
-> WorkTable Scan on friends_of_friends friends_of_friends_1 (cost=0.00..0.22 rows=3 width=36) (actual time=0.001..0.001 rows=1 loops=4)
Filter: (depth < 3)
Rows Removed by Filter: 0
-> Aggregate (cost=89.44..89.45 rows=1 width=32) (actual time=687.356..687.356 rows=1 loops=3)
-> Index Only Scan using edges_pkey on edges (cost=0.56..83.96 rows=2191 width=4) (actual time=0.139..290.996 rows=3486910 loops=3)
Index Cond: (from_node = ANY (friends_of_friends_1.current))
Heap Fetches: 0
Planning Time: 0.557 ms
Execution Time: 2076.990 ms
(17 rows)
On 2023-05-31 We 11:40, Joel Jacobson wrote:
On Wed, May 31, 2023, at 16:53, Tomas Vondra wrote:
I think this needs a better explanation - what exactly is a hashset in
this context? Something like an array with a hash for faster lookup of
unique elements, or what?
In this context, by "hashset" I am indeed referring to a data structure similar
to an array, where each element would be unique, and lookups would be faster
than arrays for larger numbers of elements due to hash-based lookups.
Yeah, a fast lookup set type has long been on my "blue sky" wish list.
So +1 for pursuing the idea.
cheers
andrew
--
Andrew Dunstan
EDB:https://www.enterprisedb.com
On Wed, 31 May 2023 at 18:40, Joel Jacobson <joel@compiler.org> wrote:
On Wed, May 31, 2023, at 16:53, Tomas Vondra wrote:
I think this needs a better explanation - what exactly is a hashset in
this context? Something like an array with a hash for faster lookup of
unique elements, or what?
In this context, by "hashset" I am indeed referring to a data structure similar
to an array, where each element would be unique, and lookups would be faster
than arrays for larger numbers of elements due to hash-based lookups.
This data structure would store identifiers (IDs) of the nodes, not the complete
nodes themselves.
Have you looked at roaring bitmaps? There is a pg_roaringbitmap
extension [1] already available that offers very fast unions,
intersections and membership tests over integer sets. I used it to get
some pretty impressive performance results for faceting search on
large document sets. [2]
Depending on the graph fan-outs and operations it might make sense in
the graph use case. For small sets it's probably not too different
from the intarray extension in contrib. But for finding intersections
over large sets (i.e. a join) it's very-very fast. If the workload is
traversal heavy it might make sense to even cache materialized
transitive closures up to some depth (a friend-of-a-friend list).
Roaring bitmaps only support int4 right now, but that is easily
fixable. And they need a relatively dense ID space to get the
performance boost, which seems essential to the approach. The latter
issue means that it can't be easily dropped into GIN or B-tree indexes
for ctid storage.
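The container idea behind roaring bitmaps can be sketched in miniature (a rough Python model, not the pg_roaringbitmap API): 32-bit values are partitioned by their high 16 bits, each container holds the low 16 bits, so dense ID ranges compress well and unions proceed container by container.

```python
from collections import defaultdict

def rb_build(values):
    # Real roaring picks array/bitmap/run containers; a set per container
    # is enough to show the partitioning scheme.
    containers = defaultdict(set)
    for v in values:
        containers[v >> 16].add(v & 0xFFFF)
    return containers

def rb_or(a, b):
    out = defaultdict(set)
    for key in a.keys() | b.keys():
        out[key] = a.get(key, set()) | b.get(key, set())
    return out

def rb_cardinality(rb):
    return sum(len(c) for c in rb.values())

a = rb_build([1, 2, 70000])
b = rb_build([2, 70001])
assert rb_cardinality(rb_or(a, b)) == 4
```

This also shows why a dense ID space matters: sparse IDs scatter across many nearly-empty containers.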
[1]: https://github.com/ChenHuajun/pg_roaringbitmap
[2]: https://github.com/cybertec-postgresql/pgfaceting
--
Ants Aasma
www.cybertec-postgresql.com
On 6/1/23 09:14, Joel Jacobson wrote:
On Thu, Jun 1, 2023, at 09:02, Joel Jacobson wrote:
Here is an example using a real anonymised social network.
I realised the "found" column is not necessary in this particular example,
since we only care about the friends at the exact depth level. Simplified query:
CREATE OR REPLACE VIEW friends_of_friends AS
WITH RECURSIVE friends_of_friends AS (
SELECT
ARRAY[5867::bigint] AS current,
0 AS depth
UNION ALL
SELECT
new_current,
friends_of_friends.depth + 1
FROM
friends_of_friends
CROSS JOIN LATERAL (
SELECT
array_agg(DISTINCT edges.to_node) AS new_current
FROM
edges
WHERE
from_node = ANY(friends_of_friends.current)
) q
WHERE
friends_of_friends.depth < 3
)
SELECT
depth,
coalesce(array_length(current, 1), 0)
FROM
friends_of_friends
WHERE
depth = 3;
;
-- PostgreSQL 15.2:
EXPLAIN ANALYZE SELECT * FROM friends_of_friends;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on friends_of_friends (cost=2687.88..2688.58 rows=1 width=8) (actual time=2076.362..2076.454 rows=1 loops=1)
Filter: (depth = 3)
Rows Removed by Filter: 3
CTE friends_of_friends
-> Recursive Union (cost=0.00..2687.88 rows=31 width=36) (actual time=0.008..2075.073 rows=4 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=36) (actual time=0.002..0.002 rows=1 loops=1)
-> Subquery Scan on "*SELECT* 2" (cost=89.44..268.75 rows=3 width=36) (actual time=518.613..518.622 rows=1 loops=4)
-> Nested Loop (cost=89.44..268.64 rows=3 width=36) (actual time=515.523..515.523 rows=1 loops=4)
-> WorkTable Scan on friends_of_friends friends_of_friends_1 (cost=0.00..0.22 rows=3 width=36) (actual time=0.001..0.001 rows=1 loops=4)
Filter: (depth < 3)
Rows Removed by Filter: 0
-> Aggregate (cost=89.44..89.45 rows=1 width=32) (actual time=687.356..687.356 rows=1 loops=3)
-> Index Only Scan using edges_pkey on edges (cost=0.56..83.96 rows=2191 width=4) (actual time=0.139..290.996 rows=3486910 loops=3)
Index Cond: (from_node = ANY (friends_of_friends_1.current))
Heap Fetches: 0
Planning Time: 0.557 ms
Execution Time: 2076.990 ms
(17 rows)
I've been thinking about this a bit on the way back from pgcon. Per CPU
profile it seems most of the job is actually spent on qsort, calculating
the array_agg(distinct) bit.
We could replace this part by a hash-based set aggregate, which would be
faster, but I doubt it would yield a massive improvement that'd change
the fundamental cost.
I forgot I did something like that a couple years back, implementing a
count_distinct() aggregate that was meant as a faster alternative to
count(distinct). And then I mostly abandoned it because the built-in
sort-based approach improved significantly - it was still slower in some
cases, but the gap got small enough.
Anyway, I hacked together a trivial set backed by an open addressing
hash table:
https://github.com/tvondra/hashset
It's super-basic, providing just some bare minimum of functions, but
hopefully good enough for experiments.
- hashset data type - hash table in varlena
- hashset_init - create new hashset
- hashset_add / hashset_contains - add value / check membership
- hashset_merge - merge two hashsets
- hashset_count - count elements
- hashset_to_array - return the elements as an array
- hashset(int) aggregate
This allows rewriting the recursive query like this:
WITH RECURSIVE friends_of_friends AS (
SELECT
ARRAY[5867::bigint] AS current,
0 AS depth
UNION ALL
SELECT
new_current,
friends_of_friends.depth + 1
FROM
friends_of_friends
CROSS JOIN LATERAL (
SELECT
hashset_to_array(hashset(edges.to_node)) AS new_current
FROM
edges
WHERE
from_node = ANY(friends_of_friends.current)
) q
WHERE
friends_of_friends.depth < 3
)
SELECT
depth,
coalesce(array_length(current, 1), 0)
FROM
friends_of_friends
WHERE
depth = 3;
On my laptop this cuts the timing roughly in half - which is nice, but as I
said, I don't think it's a fundamental speedup. The aggregate can also be
parallelized, but I don't think that changes much.
Furthermore, this has a number of drawbacks too - e.g. it can't spill
data to disk, which might be an issue on more complex queries / larger
data sets.
FWIW I wonder how representative this query is, considering it returns
~1M node IDs, i.e. about 10% of the whole set of node IDs. Seems a bit
on the high side.
I've also experimented with doing stuff from a plpgsql procedure (that's
what the non-aggregate functions are about). I saw this mostly as a way
to do stuff that'd be hard to do in the recursive CTE, but it has a lot
of additional execution overhead due to plpgsql. Maybe if we had some
smart trick to calculate adjacent nodes we could have a SRF written in C
to get rid of the overhead.
Anyway, this leads me to the question what graph databases are doing for
these queries, if they're faster in answering such queries (by how
much?). I'm not very familiar with that stuff, but surely they do
something smart - precalculating the data, some special indexing,
duplicating the data in multiple places?
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Jun 2, 2023, at 10:01, Ants Aasma wrote:
Have you looked at roaring bitmaps? There is a pg_roaringbitmap
extension [1] already available that offers very fast unions,
intersections and membership tests over integer sets. I used it to get
some pretty impressive performance results for faceting search on
large document sets. [2]
Many thanks for the tip!
New benchmark:
From before, we already had:
wget https://snap.stanford.edu/data/soc-pokec-relationships.txt.gz
gunzip soc-pokec-relationships.txt.gz
CREATE TABLE edges (from_node INT, to_node INT);
\copy edges from soc-pokec-relationships.txt;
ALTER TABLE edges ADD PRIMARY KEY (from_node, to_node);
I've created a new `users` table from the `edges` table,
with a new `friends` roaringbitmap column:
CREATE TABLE users AS
SELECT from_node AS id, rb_build_agg(to_node) AS friends FROM edges GROUP BY 1;
ALTER TABLE users ADD PRIMARY KEY (id);
Old query from before:
CREATE OR REPLACE VIEW friends_of_friends AS
WITH RECURSIVE friends_of_friends AS (
SELECT
ARRAY[5867::bigint] AS current,
0 AS depth
UNION ALL
SELECT
new_current,
friends_of_friends.depth + 1
FROM
friends_of_friends
CROSS JOIN LATERAL (
SELECT
array_agg(DISTINCT edges.to_node) AS new_current
FROM
edges
WHERE
from_node = ANY(friends_of_friends.current)
) q
WHERE
friends_of_friends.depth < 3
)
SELECT
coalesce(array_length(current, 1), 0) AS count_friends_at_depth_3
FROM
friends_of_friends
WHERE
depth = 3;
;
New roaringbitmap-based query using users instead:
CREATE OR REPLACE VIEW friends_of_friends_roaringbitmap AS
WITH RECURSIVE friends_of_friends AS
(
SELECT
friends,
1 AS depth
FROM users WHERE id = 5867
UNION ALL
SELECT
new_friends,
friends_of_friends.depth + 1
FROM
friends_of_friends
CROSS JOIN LATERAL (
SELECT
rb_or_agg(users.friends) AS new_friends
FROM
users
WHERE
users.id = ANY(rb_to_array(friends_of_friends.friends))
) q
WHERE
friends_of_friends.depth < 3
)
SELECT
rb_cardinality(friends) AS count_friends_at_depth_3
FROM
friends_of_friends
WHERE
depth = 3
;
Note, depth is 1 at the first level, since we already have user 5867's friends in the friends column of the users table.
Maybe there is a better way to make use of the btree index on users.id,
than to convert the roaringbitmap to an array in order to use = ANY(...),
that is, this part: `users.id = ANY(rb_to_array(friends_of_friends.friends))`?
Or maybe there is some entirely different but equivalent way of writing the query
in a more efficient way?
SELECT * FROM friends_of_friends;
count_friends_at_depth_3
--------------------------
1035293
(1 row)
SELECT * FROM friends_of_friends_roaringbitmap;
count_friends_at_depth_3
--------------------------
1035293
(1 row)
EXPLAIN ANALYZE SELECT * FROM friends_of_friends;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on friends_of_friends (cost=5722.03..5722.73 rows=1 width=4) (actual time=2232.896..2233.289 rows=1 loops=1)
Filter: (depth = 3)
Rows Removed by Filter: 3
CTE friends_of_friends
-> Recursive Union (cost=0.00..5722.03 rows=31 width=36) (actual time=0.003..2228.707 rows=4 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=36) (actual time=0.001..0.001 rows=1 loops=1)
-> Subquery Scan on "*SELECT* 2" (cost=190.59..572.17 rows=3 width=36) (actual time=556.806..556.837 rows=1 loops=4)
-> Nested Loop (cost=190.59..572.06 rows=3 width=36) (actual time=553.748..553.748 rows=1 loops=4)
-> WorkTable Scan on friends_of_friends friends_of_friends_1 (cost=0.00..0.22 rows=3 width=36) (actual time=0.000..0.001 rows=1 loops=4)
Filter: (depth < 3)
Rows Removed by Filter: 0
-> Aggregate (cost=190.59..190.60 rows=1 width=32) (actual time=737.427..737.427 rows=1 loops=3)
-> Sort (cost=179.45..185.02 rows=2227 width=4) (actual time=577.192..649.812 rows=3486910 loops=3)
Sort Key: edges.to_node
Sort Method: quicksort Memory: 393217kB
-> Index Only Scan using edges_pkey on edges (cost=0.56..55.62 rows=2227 width=4) (actual time=0.027..225.609 rows=3486910 loops=3)
Index Cond: (from_node = ANY (friends_of_friends_1.current))
Heap Fetches: 0
Planning Time: 0.294 ms
Execution Time: 2240.446 ms
EXPLAIN ANALYZE SELECT * FROM friends_of_friends_roaringbitmap;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on friends_of_friends (cost=799.30..800.00 rows=1 width=8) (actual time=492.925..492.930 rows=1 loops=1)
Filter: (depth = 3)
Rows Removed by Filter: 2
CTE friends_of_friends
-> Recursive Union (cost=0.43..799.30 rows=31 width=118) (actual time=0.061..492.842 rows=3 loops=1)
-> Index Scan using users_pkey on users (cost=0.43..2.65 rows=1 width=118) (actual time=0.060..0.062 rows=1 loops=1)
Index Cond: (id = 5867)
-> Nested Loop (cost=26.45..79.63 rows=3 width=36) (actual time=164.244..164.244 rows=1 loops=3)
-> WorkTable Scan on friends_of_friends friends_of_friends_1 (cost=0.00..0.22 rows=3 width=36) (actual time=0.000..0.001 rows=1 loops=3)
Filter: (depth < 3)
Rows Removed by Filter: 0
-> Aggregate (cost=26.45..26.46 rows=1 width=32) (actual time=246.359..246.359 rows=1 loops=2)
-> Index Scan using users_pkey on users users_1 (cost=0.43..26.42 rows=10 width=114) (actual time=0.074..132.318 rows=116336 loops=2)
Index Cond: (id = ANY (rb_to_array(friends_of_friends_1.friends)))
Planning Time: 0.257 ms
Execution Time: 493.134 ms
On Mon, Jun 5, 2023, at 01:44, Tomas Vondra wrote:
Anyway, I hacked together a trivial set backed by an open addressing
hash table: https://github.com/tvondra/hashset
It's super-basic, providing just some bare minimum of functions, but
hopefully good enough for experiments.
- hashset data type - hash table in varlena
- hashset_init - create new hashset
- hashset_add / hashset_contains - add value / check membership
- hashset_merge - merge two hashsets
- hashset_count - count elements
- hashset_to_array - return the elements as an array
- hashset(int) aggregate
This allows rewriting the recursive query like this:
WITH RECURSIVE friends_of_friends AS (
...
Nice! I get similar results, 737 ms (hashset) vs 1809 ms (array_agg).
I think if you just add one more hashset function, it will be a win against roaringbitmap, which is 400 ms.
The missing function is an agg that takes hashset as input and returns hashset, similar to roaringbitmap's rb_or_agg().
With such a function, we could add an adjacent-nodes hashset column to the `nodes` table, which would eliminate the need to scan the edges table for graph traversal.
We could then benchmark roaringbitmap against hashset querying the same table:
CREATE TABLE users AS
SELECT
from_node AS id,
rb_build_agg(to_node) AS friends_roaringbitmap,
hashset(to_node) AS friends_hashset
FROM edges
GROUP BY 1;
/Joel
On 6/5/23 21:52, Joel Jacobson wrote:
On Mon, Jun 5, 2023, at 01:44, Tomas Vondra wrote:
Anyway, I hacked together a trivial set backed by an open addressing
hash table: https://github.com/tvondra/hashset
It's super-basic, providing just some bare minimum of functions, but
hopefully good enough for experiments.
- hashset data type - hash table in varlena
- hashset_init - create new hashset
- hashset_add / hashset_contains - add value / check membership
- hashset_merge - merge two hashsets
- hashset_count - count elements
- hashset_to_array - return the elements as an array
- hashset(int) aggregate
This allows rewriting the recursive query like this:
WITH RECURSIVE friends_of_friends AS (
...
Nice! I get similar results, 737 ms (hashset) vs 1809 ms (array_agg).
I think if you just add one more hashset function, it will be a win against roaringbitmap, which is 400 ms.
The missing function is an agg that takes hashset as input and returns hashset, similar to roaringbitmap's rb_or_agg().
With such a function, we could add an adjacent-nodes hashset column to the `nodes` table, which would eliminate the need to scan the edges table for graph traversal.
I added a trivial version of such aggregate hashset(hashset), and if I
rewrite the CTE like this:
WITH RECURSIVE friends_of_friends AS (
SELECT
(SELECT hashset(v) FROM (VALUES (5867)) AS s(v)) AS current,
0 AS depth
UNION ALL
SELECT
new_current,
friends_of_friends.depth + 1
FROM
friends_of_friends
CROSS JOIN LATERAL (
SELECT
hashset(edges.to_node) AS new_current
FROM
edges
WHERE
from_node =
ANY(hashset_to_array(friends_of_friends.current))
) q
WHERE
friends_of_friends.depth < 3
)
SELECT
depth,
hashset_count(current)
FROM
friends_of_friends
WHERE
depth = 3;
it cuts the timing to about 50% on my laptop, so maybe it'll be ~300ms
on your system. There's a bunch of opportunities for more improvements,
as the hash table implementation is pretty naive/silly, the on-disk
format is wasteful and so on.
But before spending more time on that, it'd be interesting to know what
would be a competitive timing. I mean, what would be "good enough"? What
timings are achievable with graph databases?
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Jun 6, 2023, at 13:20, Tomas Vondra wrote:
it cuts the timing to about 50% on my laptop, so maybe it'll be ~300ms
on your system. There's a bunch of opportunities for more improvements,
as the hash table implementation is pretty naive/silly, the on-disk
format is wasteful and so on.
But before spending more time on that, it'd be interesting to know what
would be a competitive timing. I mean, what would be "good enough"? What
timings are achievable with graph databases?
Your hashset is now almost exactly as fast as the corresponding roaringbitmap query, +/- 1 ms on my machine.
I tested Neo4j and the results are surprising; it appears to be significantly *slower*.
However, I've probably misunderstood something, maybe I need to add some index or something.
Even so, it's interesting it's apparently not fast "by default".
The query I tested:
MATCH (user:User {id: '5867'})-[:FRIENDS_WITH*3..3]->(fof)
RETURN COUNT(DISTINCT fof)
Here is how I loaded the data into it:
% pwd
/Users/joel/Library/Application Support/Neo4j Desktop/Application/relate-data/dbmss/dbms-3837aa22-c830-4dcf-8668-ef8e302263c7
% head import/*
==> import/friendships.csv <==
1,13,FRIENDS_WITH
1,11,FRIENDS_WITH
1,6,FRIENDS_WITH
1,3,FRIENDS_WITH
1,4,FRIENDS_WITH
1,5,FRIENDS_WITH
1,15,FRIENDS_WITH
1,14,FRIENDS_WITH
1,7,FRIENDS_WITH
1,8,FRIENDS_WITH
==> import/friendships_header.csv <==
:START_ID(User),:END_ID(User),:TYPE
==> import/users.csv <==
1,User
2,User
3,User
4,User
5,User
6,User
7,User
8,User
9,User
10,User
==> import/users_header.csv <==
id:ID(User),:LABEL
% ./bin/neo4j-admin database import full --overwrite-destination --nodes=User=import/users_header.csv,import/users.csv --relationships=FRIENDS_WITH=import/friendships_header.csv,import/friendships.csv neo4j
/Joel
On 6/7/23 16:21, Joel Jacobson wrote:
On Tue, Jun 6, 2023, at 13:20, Tomas Vondra wrote:
it cuts the timing to about 50% on my laptop, so maybe it'll be ~300ms
on your system. There's a bunch of opportunities for more improvements,
as the hash table implementation is pretty naive/silly, the on-disk
format is wasteful and so on.
But before spending more time on that, it'd be interesting to know what
would be a competitive timing. I mean, what would be "good enough"? What
timings are achievable with graph databases?
Your hashset is now almost exactly as fast as the corresponding roaringbitmap query, +/- 1 ms on my machine.
Interesting, considering how dumb the hash table implementation is.
I tested Neo4j and the results are surprising; it appears to be significantly *slower*.
However, I've probably misunderstood something, maybe I need to add some index or something.
Even so, it's interesting it's apparently not fast "by default".
No idea how to fix that, but it's rather suspicious.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Jun 7, 2023, at 19:37, Tomas Vondra wrote:
Interesting, considering how dumb the hash table implementation is.
That's promising.
I tested Neo4j and the results are surprising; it appears to be significantly *slower*.
However, I've probably misunderstood something, maybe I need to add some index or something.
Even so, it's interesting it's apparently not fast "by default".
No idea how to fix that, but it's rather suspicious.
I've had a graph-db expert review my benchmark, and he suggested adding an index:
CREATE INDEX FOR (n:User) ON (n.id)
This did improve the execution time for Neo4j a bit, down from 819 ms to 528 ms, but PostgreSQL's 299 ms is still a win.
Benchmark here: https://github.com/joelonsql/graph-query-benchmarks
Note, in this benchmark, I only test the naive RECURSIVE CTE approach using array_agg(DISTINCT ...).
And I couldn't even test the most connected user with Neo4j; the query never finishes for some reason,
so I had to test with a less connected user.
The graph expert also said that other more realistic graph use-cases might be "multi-relational",
and pointed me to a link: https://github.com/totogo/awesome-knowledge-graph#knowledge-graph-dataset
No idea how such multi-relational datasets would affect the benchmarks.
I think we already have strong indicators that PostgreSQL with a hashset type will, from a relative
performance perspective, do just fine processing basic graph queries, even with large datasets.
Then there will always be users who primarily write very different graph queries all day long,
and who might prefer a graph query *language*, like SQL/PGQ in SQL:2023, Cypher or Gremlin.
/Joel
On 6/8/23 11:41, Joel Jacobson wrote:
On Wed, Jun 7, 2023, at 19:37, Tomas Vondra wrote:
Interesting, considering how dumb the hash table implementation is.
That's promising.
Yeah, not bad for sleep-deprived on-plane hacking ...
There's a bunch of stuff that needs to be improved to make this properly
usable, like:
1) better hash table implementation
2) input/output functions
3) support for other types (now it only works with int32)
4) I wonder if this might be done as an array-like polymorphic type.
5) more efficient storage format, with versioning etc.
6) regression tests
Would you be interested in helping with / working on some of that? I
don't have immediate need for this stuff, so it's not very high on my
TODO list.
I tested Neo4j and the results are surprising; it appears to be significantly *slower*.
However, I've probably misunderstood something, maybe I need to add some index or something.
Even so, it's interesting it's apparently not fast "by default".
No idea how to fix that, but it's rather suspicious.
I've had a graph-db expert review my benchmark, and he suggested adding an index:
CREATE INDEX FOR (n:User) ON (n.id)
This did improve the execution time for Neo4j a bit, down from 819 ms to 528 ms, but PostgreSQL's 299 ms is still a win.
Benchmark here: https://github.com/joelonsql/graph-query-benchmarks
Note, in this benchmark, I only test the naive RECURSIVE CTE approach using array_agg(DISTINCT ...).
And I couldn't even test the most connected user with Neo4j; the query never finishes for some reason,
so I had to test with a less connected user.
Interesting. I'd have expected the graph db to be much faster.
The graph expert also said that other more realistic graph use-cases might be "multi-relational",
and pointed me to a link: https://github.com/totogo/awesome-knowledge-graph#knowledge-graph-dataset
No idea how such multi-relational datasets would affect the benchmarks.
Not sure either, but I don't have ambition to improve everything at
once. If the hashset improves one practical use case, fine with me.
I think we already have strong indicators that PostgreSQL with a hashset type will, from a relative
performance perspective, do just fine processing basic graph queries, even with large datasets.
Then there will always be users who primarily write very different graph queries all day long,
and who might prefer a graph query *language*, like SQL/PGQ in SQL:2023, Cypher or Gremlin.
Right. IMHO the query language is a separate thing, you still need to
evaluate the query somehow - which is where hashset applies.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jun 8, 2023, at 12:19, Tomas Vondra wrote:
Would you be interested in helping with / working on some of that? I
don't have immediate need for this stuff, so it's not very high on my
TODO list.
Sure, I'm willing to help!
I've attached a patch that works on some of the items on your list,
including some additions to the README.md.
There were a bunch of places where `maxelements / 8` caused bugs
and had to be changed to do proper integer ceiling division:
- values = (int32 *) (set->data + set->maxelements / 8);
+ values = (int32 *) (set->data + (set->maxelements + 7) / 8);
Side note: I wonder if it would be good to add CEIL_DIV and FLOOR_DIV macros
to the PostgreSQL source code in general, since it's easy to make this mistake,
and quite verbose/error-prone to write it out manually everywhere.
Such macros could simplify code in e.g. numeric.c.
There's a bunch of stuff that needs to be improved to make this properly
usable, like:
1) better hash table implementation
TODO
2) input/output functions
I've attempted to implement these.
I thought comma-separated values wrapped in curly braces felt like the most natural format,
for example:
SELECT '{1,2,3}'::hashset;
3) support for other types (now it only works with int32)
TODO
4) I wonder if this might be done as an array-like polymorphic type.
That would be nice!
I guess the workaround would be to store the actual values of non-int types
in a lookup table, and then hash the int-based primary key of that table.
Do you think later implementing polymorphic type support would
mean a more or less complete rewrite, or can we carry on with int32-support
and add it later on?
5) more efficient storage format, with versioning etc.
TODO
6) regression tests
I've added some regression tests.
Right. IMHO the query language is a separate thing, you still need to
evaluate the query somehow - which is where hashset applies.
Good point, I fully agree.
/Joel
Attachments:
hashset-1.0.0-joel-0001.patch
On Fri, Jun 9, 2023 at 6:58 PM Joel Jacobson <joel@compiler.org> wrote:
I've attached a patch that works on some of the items on your list,
including some additions to the README.md.
Hi, I am quite new to C...
I have 3 questions about the following function.
1. 7691, 4201: I assume they are just random prime ints?
2. I don't get the last `return set;`, even though the return type should be bool.
3. I don't understand the 13 in hash = (hash + 13) % set->maxelements;
static bool
hashset_contains_element(hashset_t *set, int32 value)
{
    int     byte;
    int     bit;
    uint32  hash;
    char   *bitmap;
    int32  *values;

    hash = ((uint32) value * 7691 + 4201) % set->maxelements;

    bitmap = set->data;
    values = (int32 *) (set->data + set->maxelements / 8);

    while (true)
    {
        byte = (hash / 8);
        bit = (hash % 8);

        /* found an empty slot, value is not there */
        if ((bitmap[byte] & (0x01 << bit)) == 0)
            return false;

        /* is it the same value? */
        if (values[hash] == value)
            return true;

        /* move to the next element */
        hash = (hash + 13) % set->maxelements;
    }

    return set;
}
On Fri, Jun 9, 2023, at 13:33, jian he wrote:
Hi, I am quite new to C...
I have 3 questions about the following function.
1. 7691,4201, I assume they are just random prime ints?
Yes, 7691 and 4201 are likely chosen as random prime numbers.
In hash functions, prime numbers are often used to help in evenly distributing
the hash values across the range and reduce the chance of collisions.
2. I don't get the last `return set;`, even though the return type should be bool.
Thanks, you found a mistake!
The line
return set;
is actually unreachable and should be removed.
The function will always return either true or false within the while loop and
never reach the final return statement.
I've attached a new incremental patch with this fix.
3. I don't understand the 13 in hash = (hash + 13) % set->maxelements;
The value 13 is used for linear probing [1] in handling hash collisions.
Linear probing sequentially checks the next slot in the array when a collision
occurs. 13, being a small prime number not near a power of 2, helps in uniformly
distributing data and ensuring that all slots are probed, as it's relatively prime
to the hash table size.
Hm, I realise we actually don't ensure the hash table size and step size (13)
are coprime. I've fixed that in the attached patch as well.
[1]: https://en.wikipedia.org/wiki/Linear_probing
/Joel
Attachments:
hashset-1.0.0-joel-0002.patch
On 2023-06-09 Fr 07:56, Joel Jacobson wrote:
Hm, I realise we actually don't ensure the hash table size and step size (13)
are coprime. I've fixed that in the attached patch as well.
Maybe you can post a full patch as well as incremental?
Stylistically I think you should reduce reliance on magic numbers (like
13). Probably need some #define's?
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com