Unlinking Parallel Hash Join inner batch files sooner
Hi,
One complaint about PHJ is that it can, in rare cases, use a
surprising amount of temporary disk space where non-parallel HJ would
not. When it decides that it needs to double the number of batches to
try to fit each inner batch into memory, and then again and again
depending on your level of bad luck, it leaves behind all the earlier
generations of inner batch files to be cleaned up at the end of the
query. That's stupid. Here's a patch to unlink them sooner, as a
small improvement.
The reason I didn't do this earlier is that sharedtuplestore.c
continues the pre-existing tradition where each parallel process
counts what it writes against its own temp_file_limit. At the time I
thought I'd need to have one process unlink all the files, but if a
process were to unlink files that it didn't create, that accounting
system would break. Without some new kind of shared temp_file_limit
mechanism that doesn't currently exist, per-process counters could go
negative, creating free money. In the attached patch, I realised
something that I'd missed before: there is a safe point for each
backend to unlink just the files that it created, and there is no way
for a process that created files not to reach that point.
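To make the accounting hazard concrete, here is a minimal standalone sketch
(made-up names, not the real fd.c/buffile.c code; the real counter is
per-backend and is checked against the temp_file_limit GUC) showing how a
per-process counter goes negative if a process unlinks a file it didn't write:

#include <stdio.h>
#include <stdlib.h>

static long long my_temp_bytes = 0;                    /* this process only */
static long long temp_file_limit_bytes = 1024 * 1024;  /* pretend limit */

static void
account_write(long long nbytes)
{
	my_temp_bytes += nbytes;
	if (my_temp_bytes > temp_file_limit_bytes)
	{
		fprintf(stderr, "temporary file size exceeds temp_file_limit\n");
		exit(1);
	}
}

static void
account_unlink(long long file_bytes)
{
	/*
	 * Only valid if this process wrote those bytes itself; otherwise the
	 * counter goes negative and later writes are effectively unaccounted.
	 */
	my_temp_bytes -= file_bytes;
}

int
main(void)
{
	account_write(512 * 1024);	/* we wrote 512kB ourselves */
	account_unlink(768 * 1024);	/* unlink a file another process wrote */
	printf("counter = %lld\n", my_temp_bytes);	/* negative: free money */
	return 0;
}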
Here's an example query that tries 8, 16 and then 32 batches on my
machine, because reltuples is clobbered with a bogus value.
Pathological cases can try many more rounds than that, but 3 is enough
to demonstrate. Using truss and shell tricks I spat out the list of
create and unlink operations from master and the attached draft/POC
patch. See below.
set work_mem = '1MB';
CREATE TABLE t (i int);
INSERT INTO t SELECT generate_series(1, 1000000);
ANALYZE t;
UPDATE pg_class SET reltuples = reltuples / 4 WHERE relname = 't';
EXPLAIN ANALYZE SELECT COUNT(*) FROM t t1 JOIN t t2 USING (i);
This code is also exercised by the existing "bad" case in join_hash.sql.
This is the second of two experimental patches investigating increased
resource usage in PHJ compared to HJ based on user complaints, this
one being per-batch temp files, and the other[1] being per-batch
buffer memory.
[1]: /messages/by-id/CA+hUKGKCnU9NjFfzO219V-YeyWr8mZe4JRrf=x_uv6qsePBcOw@mail.gmail.com
=====
master:
99861: create i3of8.p0.0
99861: create i6of8.p0.0
99861: create i4of8.p0.0
99861: create i5of8.p0.0
99861: create i7of8.p0.0
99861: create i2of8.p0.0
99861: create i1of8.p0.0
99863: create i2of8.p1.0
99862: create i7of8.p2.0
99863: create i5of8.p1.0
99862: create i1of8.p2.0
99863: create i6of8.p1.0
99862: create i5of8.p2.0
99863: create i1of8.p1.0
99862: create i3of8.p2.0
99863: create i7of8.p1.0
99862: create i4of8.p2.0
99863: create i3of8.p1.0
99862: create i2of8.p2.0
99863: create i4of8.p1.0
99862: create i6of8.p2.0
99863: create i8of16.p1.0
99861: create i8of16.p0.0
99862: create i8of16.p2.0
99863: create i9of16.p1.0
99862: create i1of16.p2.0
99863: create i1of16.p1.0
99862: create i9of16.p2.0
99861: create i9of16.p0.0
99861: create i1of16.p0.0
99862: create i10of16.p2.0
99863: create i2of16.p1.0
99862: create i2of16.p2.0
99863: create i10of16.p1.0
99861: create i2of16.p0.0
99861: create i10of16.p0.0
99862: create i11of16.p2.0
99863: create i3of16.p1.0
99862: create i3of16.p2.0
99861: create i3of16.p0.0
99863: create i11of16.p1.0
99861: create i11of16.p0.0
99863: create i4of16.p1.0
99863: create i12of16.p1.0
99862: create i12of16.p2.0
99862: create i4of16.p2.0
99861: create i12of16.p0.0
99861: create i4of16.p0.0
99863: create i13of16.p1.0
99863: create i5of16.p1.0
99862: create i5of16.p2.0
99862: create i13of16.p2.0
99861: create i5of16.p0.0
99861: create i13of16.p0.0
99862: create i6of16.p2.0
99863: create i6of16.p1.0
99861: create i14of16.p0.0
99862: create i14of16.p2.0
99863: create i14of16.p1.0
99861: create i6of16.p0.0
99863: create i15of16.p1.0
99861: create i7of16.p0.0
99863: create i7of16.p1.0
99862: create i15of16.p2.0
99861: create i15of16.p0.0
99862: create i7of16.p2.0
99863: create i16of32.p1.0
99862: create i16of32.p2.0
99861: create i16of32.p0.0
99863: create i17of32.p1.0
99863: create i1of32.p1.0
99861: create i1of32.p0.0
99862: create i17of32.p2.0
99862: create i1of32.p2.0
99861: create i17of32.p0.0
99863: create i18of32.p1.0
99863: create i2of32.p1.0
99862: create i2of32.p2.0
99862: create i18of32.p2.0
99861: create i2of32.p0.0
99861: create i18of32.p0.0
99862: create i3of32.p2.0
99862: create i19of32.p2.0
99861: create i19of32.p0.0
99861: create i3of32.p0.0
99863: create i19of32.p1.0
99863: create i3of32.p1.0
99863: create i20of32.p1.0
99863: create i4of32.p1.0
99861: create i20of32.p0.0
99861: create i4of32.p0.0
99862: create i20of32.p2.0
99862: create i4of32.p2.0
99861: create i21of32.p0.0
99863: create i21of32.p1.0
99861: create i5of32.p0.0
99863: create i5of32.p1.0
99862: create i5of32.p2.0
99862: create i21of32.p2.0
99863: create i22of32.p1.0
99863: create i6of32.p1.0
99861: create i22of32.p0.0
99862: create i22of32.p2.0
99861: create i6of32.p0.0
99862: create i6of32.p2.0
99863: create i7of32.p1.0
99863: create i23of32.p1.0
99861: create i7of32.p0.0
99862: create i23of32.p2.0
99862: create i7of32.p2.0
99861: create i23of32.p0.0
99862: create i24of32.p2.0
99862: create i8of32.p2.0
99863: create i24of32.p1.0
99863: create i8of32.p1.0
99861: create i24of32.p0.0
99861: create i8of32.p0.0
99863: create i9of32.p1.0
99863: create i25of32.p1.0
99862: create i9of32.p2.0
99862: create i25of32.p2.0
99861: create i9of32.p0.0
99861: create i25of32.p0.0
99861: create i26of32.p0.0
99862: create i26of32.p2.0
99863: create i10of32.p1.0
99862: create i10of32.p2.0
99861: create i10of32.p0.0
99863: create i26of32.p1.0
99862: create i11of32.p2.0
99861: create i11of32.p0.0
99862: create i27of32.p2.0
99861: create i27of32.p0.0
99863: create i27of32.p1.0
99863: create i11of32.p1.0
99862: create i12of32.p2.0
99861: create i28of32.p0.0
99862: create i28of32.p2.0
99861: create i12of32.p0.0
99863: create i12of32.p1.0
99863: create i28of32.p1.0
99863: create i29of32.p1.0
99863: create i13of32.p1.0
99862: create i29of32.p2.0
99862: create i13of32.p2.0
99861: create i13of32.p0.0
99861: create i29of32.p0.0
99863: create i14of32.p1.0
99862: create i14of32.p2.0
99863: create i30of32.p1.0
99861: create i30of32.p0.0
99862: create i30of32.p2.0
99861: create i14of32.p0.0
99863: create i15of32.p1.0
99863: create i31of32.p1.0
99862: create i15of32.p2.0
99861: create i31of32.p0.0
99862: create i31of32.p2.0
99861: create i15of32.p0.0
99863: create o19of32.p1.0
99861: create o19of32.p0.0
99863: create o30of32.p1.0
99861: create o20of32.p0.0
99863: create o28of32.p1.0
99862: create o23of32.p2.0
99861: create o25of32.p0.0
99863: create o21of32.p1.0
99862: create o26of32.p2.0
99861: create o15of32.p0.0
99863: create o8of32.p1.0
99862: create o7of32.p2.0
99863: create o20of32.p1.0
99861: create o2of32.p0.0
99863: create o14of32.p1.0
99862: create o3of32.p2.0
99863: create o12of32.p1.0
99861: create o8of32.p0.0
99862: create o18of32.p2.0
99863: create o7of32.p1.0
99862: create o24of32.p2.0
99861: create o24of32.p0.0
99863: create o24of32.p1.0
99862: create o10of32.p2.0
99863: create o11of32.p1.0
99861: create o23of32.p0.0
99862: create o22of32.p2.0
99863: create o31of32.p1.0
99862: create o12of32.p2.0
99861: create o10of32.p0.0
99863: create o2of32.p1.0
99862: create o30of32.p2.0
99861: create o6of32.p0.0
99863: create o22of32.p1.0
99862: create o14of32.p2.0
99861: create o17of32.p0.0
99863: create o0of32.p1.0
99862: create o29of32.p2.0
99861: create o4of32.p0.0
99863: create o6of32.p1.0
99862: create o8of32.p2.0
99861: create o11of32.p0.0
99863: create o18of32.p1.0
99862: create o15of32.p2.0
99861: create o1of32.p0.0
99863: create o5of32.p1.0
99862: create o2of32.p2.0
99861: create o12of32.p0.0
99863: create o4of32.p1.0
99862: create o28of32.p2.0
99861: create o13of32.p0.0
99863: create o9of32.p1.0
99862: create o31of32.p2.0
99861: create o21of32.p0.0
99863: create o27of32.p1.0
99862: create o0of32.p2.0
99861: create o16of32.p0.0
99863: create o26of32.p1.0
99862: create o13of32.p2.0
99861: create o29of32.p0.0
99863: create o3of32.p1.0
99862: create o5of32.p2.0
99861: create o3of32.p0.0
99863: create o25of32.p1.0
99862: create o21of32.p2.0
99861: create o5of32.p0.0
99863: create o1of32.p1.0
99862: create o20of32.p2.0
99861: create o30of32.p0.0
99863: create o17of32.p1.0
99862: create o1of32.p2.0
99861: create o14of32.p0.0
99863: create o23of32.p1.0
99862: create o16of32.p2.0
99861: create o0of32.p0.0
99863: create o13of32.p1.0
99862: create o19of32.p2.0
99861: create o28of32.p0.0
99863: create o16of32.p1.0
99862: create o6of32.p2.0
99861: create o26of32.p0.0
99863: create o15of32.p1.0
99862: create o9of32.p2.0
99861: create o18of32.p0.0
99863: create o29of32.p1.0
99862: create o11of32.p2.0
99861: create o31of32.p0.0
99862: create o4of32.p2.0
99863: create o10of32.p1.0
99861: create o27of32.p0.0
99862: create o27of32.p2.0
99861: create o7of32.p0.0
99862: create o17of32.p2.0
99861: create o22of32.p0.0
99862: create o25of32.p2.0
99861: create o9of32.p0.0
99861: unlink i20of32.p0.0
99861: unlink o24of32.p0.0
99861: unlink i29of32.p2.0
99861: unlink i7of8.p1.0
99861: unlink i26of32.p2.0
99861: unlink i3of8.p1.0
99861: unlink o7of32.p1.0
99861: unlink o22of32.p2.0
99861: unlink o8of32.p1.0
99861: unlink i30of32.p0.0
99861: unlink i1of32.p2.0
99861: unlink i7of32.p0.0
99861: unlink i18of32.p1.0
99861: unlink i8of32.p0.0
99861: unlink i17of32.p1.0
99861: unlink i6of16.p0.0
99861: unlink o13of32.p1.0
99861: unlink i9of16.p0.0
99861: unlink o0of32.p0.0
99861: unlink o6of32.p2.0
99861: unlink o9of32.p2.0
99861: unlink o23of32.p1.0
99861: unlink i28of32.p1.0
99861: unlink i27of32.p1.0
99861: unlink i1of16.p1.0
99861: unlink i11of16.p0.0
99861: unlink o14of32.p0.0
99861: unlink i10of32.p0.0
99861: unlink o12of32.p2.0
99861: unlink i19of32.p2.0
99861: unlink i16of32.p2.0
99861: unlink i6of8.p2.0
99861: unlink i9of32.p2.0
99861: unlink i6of32.p2.0
99861: unlink i8of16.p2.0
99861: unlink i2of8.p2.0
99861: unlink i7of16.p2.0
99861: unlink i10of32.p1.0
99861: unlink i31of32.p2.0
99861: unlink i11of16.p1.0
99861: unlink o14of32.p1.0
99861: unlink i1of16.p0.0
99861: unlink i27of32.p0.0
99861: unlink i28of32.p0.0
99861: unlink o23of32.p0.0
99861: unlink i21of32.p2.0
99861: unlink o25of32.p2.0
99861: unlink o0of32.p1.0
99861: unlink o13of32.p0.0
99861: unlink i9of16.p1.0
99861: unlink i6of16.p1.0
99861: unlink i8of32.p1.0
99861: unlink i17of32.p0.0
99861: unlink i7of32.p1.0
99861: unlink i18of32.p0.0
99861: unlink o15of32.p2.0
99861: unlink i10of16.p2.0
99861: unlink i11of32.p2.0
99861: unlink i30of32.p1.0
99861: unlink o8of32.p0.0
99861: unlink i3of8.p0.0
99861: unlink o7of32.p0.0
99861: unlink i7of8.p0.0
99861: unlink o24of32.p1.0
99861: unlink o1of32.p2.0
99861: unlink i20of32.p1.0
99861: unlink o1of32.p0.0
99861: unlink i7of8.p2.0
99861: unlink i26of32.p1.0
99861: unlink i29of32.p1.0
99861: unlink o22of32.p1.0
99861: unlink o8of32.p2.0
99861: unlink i3of8.p2.0
99861: unlink o7of32.p2.0
99861: unlink i11of32.p0.0
99861: unlink i1of32.p1.0
99861: unlink o15of32.p0.0
99861: unlink i10of16.p0.0
99861: unlink i17of32.p2.0
99861: unlink i18of32.p2.0
99861: unlink o13of32.p2.0
99861: unlink o25of32.p0.0
99861: unlink i21of32.p0.0
99861: unlink o9of32.p1.0
99861: unlink o23of32.p2.0
99861: unlink o6of32.p1.0
99861: unlink i27of32.p2.0
99861: unlink i28of32.p2.0
99861: unlink i1of16.p2.0
99861: unlink i31of32.p0.0
99861: unlink i8of16.p0.0
99861: unlink o12of32.p1.0
99861: unlink i2of8.p0.0
99861: unlink i7of16.p0.0
99861: unlink i6of8.p0.0
99861: unlink i16of32.p1.0
99861: unlink i9of32.p0.0
99861: unlink i19of32.p1.0
99861: unlink i6of32.p0.0
99861: unlink i19of32.p0.0
99861: unlink i6of32.p1.0
99861: unlink i6of8.p1.0
99861: unlink i16of32.p0.0
99861: unlink i9of32.p1.0
99861: unlink i7of16.p1.0
99861: unlink i2of8.p1.0
99861: unlink i8of16.p1.0
99861: unlink o12of32.p0.0
99861: unlink i31of32.p1.0
99861: unlink i10of32.p2.0
99861: unlink i11of16.p2.0
99861: unlink o14of32.p2.0
99861: unlink o6of32.p0.0
99861: unlink o9of32.p0.0
99861: unlink i21of32.p1.0
99861: unlink o0of32.p2.0
99861: unlink o25of32.p1.0
99861: unlink i6of16.p2.0
99861: unlink i9of16.p2.0
99861: unlink i7of32.p2.0
99861: unlink i8of32.p2.0
99861: unlink o15of32.p1.0
99861: unlink i10of16.p1.0
99861: unlink i30of32.p2.0
99861: unlink i1of32.p0.0
99861: unlink i11of32.p1.0
99861: unlink o22of32.p0.0
99861: unlink i29of32.p0.0
99861: unlink i26of32.p0.0
99861: unlink o1of32.p1.0
99861: unlink o24of32.p2.0
99861: unlink i20of32.p2.0
99861: unlink i3of16.p0.0
99861: unlink o19of32.p1.0
99861: unlink o16of32.p1.0
99861: unlink i13of16.p1.0
99861: unlink i2of32.p0.0
99861: unlink i12of32.p1.0
99861: unlink i5of16.p2.0
99861: unlink o31of32.p0.0
99861: unlink i4of32.p2.0
99861: unlink o2of32.p1.0
99861: unlink o28of32.p2.0
99861: unlink o27of32.p2.0
99861: unlink i23of32.p2.0
99861: unlink i5of8.p2.0
99861: unlink o21of32.p0.0
99861: unlink i25of32.p0.0
99861: unlink i1of8.p2.0
99861: unlink i13of32.p2.0
99861: unlink o18of32.p2.0
99861: unlink i12of16.p2.0
99861: unlink o17of32.p2.0
99861: unlink i5of32.p1.0
99861: unlink i15of32.p0.0
99861: unlink i4of16.p1.0
99861: unlink i4of8.p0.0
99861: unlink i14of16.p0.0
99861: unlink o11of32.p0.0
99861: unlink i22of32.p1.0
99861: unlink o29of32.p1.0
99861: unlink o3of32.p2.0
99861: unlink o26of32.p1.0
99861: unlink o5of32.p0.0
99861: unlink o20of32.p2.0
99861: unlink o5of32.p1.0
99861: unlink i24of32.p2.0
99861: unlink o26of32.p0.0
99861: unlink o29of32.p0.0
99861: unlink i22of32.p0.0
99861: unlink i14of16.p1.0
99861: unlink o11of32.p1.0
99861: unlink o30of32.p2.0
99861: unlink i4of16.p0.0
99861: unlink i4of8.p1.0
99861: unlink i15of32.p1.0
99861: unlink i5of32.p0.0
99861: unlink i2of16.p2.0
99861: unlink i3of32.p2.0
99861: unlink i25of32.p1.0
99861: unlink o21of32.p1.0
99861: unlink o4of32.p2.0
99861: unlink o2of32.p0.0
99861: unlink i14of32.p2.0
99861: unlink o10of32.p2.0
99861: unlink i15of16.p2.0
99861: unlink o31of32.p1.0
99861: unlink i12of32.p0.0
99861: unlink i2of32.p1.0
99861: unlink o16of32.p0.0
99861: unlink i13of16.p0.0
99861: unlink i3of16.p1.0
99861: unlink o19of32.p0.0
99861: unlink o16of32.p2.0
99861: unlink i13of16.p2.0
99861: unlink o19of32.p2.0
99861: unlink i12of32.p2.0
99861: unlink o10of32.p0.0
99861: unlink i15of16.p0.0
99861: unlink i5of16.p1.0
99861: unlink i14of32.p0.0
99861: unlink i4of32.p1.0
99861: unlink o27of32.p1.0
99861: unlink o2of32.p2.0
99861: unlink o28of32.p1.0
99861: unlink i23of32.p1.0
99861: unlink o4of32.p0.0
99861: unlink i5of8.p1.0
99861: unlink i1of8.p1.0
99861: unlink i13of32.p1.0
99861: unlink i3of32.p0.0
99861: unlink i12of16.p1.0
99861: unlink o17of32.p1.0
99861: unlink o18of32.p1.0
99861: unlink i2of16.p0.0
99861: unlink i5of32.p2.0
99861: unlink o30of32.p0.0
99861: unlink i4of16.p2.0
99861: unlink i22of32.p2.0
99861: unlink o26of32.p2.0
99861: unlink o29of32.p2.0
99861: unlink o3of32.p1.0
99861: unlink i24of32.p0.0
99861: unlink o20of32.p0.0
99861: unlink o5of32.p2.0
99861: unlink o20of32.p1.0
99861: unlink i24of32.p1.0
99861: unlink o3of32.p0.0
99861: unlink o30of32.p1.0
99861: unlink i4of8.p2.0
99861: unlink i14of16.p2.0
99861: unlink o11of32.p2.0
99861: unlink i15of32.p2.0
99861: unlink o18of32.p0.0
99861: unlink i2of16.p1.0
99861: unlink i12of16.p0.0
99861: unlink o17of32.p0.0
99861: unlink i3of32.p1.0
99861: unlink i13of32.p0.0
99861: unlink i25of32.p2.0
99861: unlink i1of8.p0.0
99861: unlink o4of32.p1.0
99861: unlink i5of8.p0.0
99861: unlink o21of32.p2.0
99861: unlink i23of32.p0.0
99861: unlink o28of32.p0.0
99861: unlink o27of32.p0.0
99861: unlink i4of32.p0.0
99861: unlink i14of32.p1.0
99861: unlink i5of16.p0.0
99861: unlink o31of32.p2.0
99861: unlink o10of32.p1.0
99861: unlink i15of16.p1.0
99861: unlink i2of32.p2.0
99861: unlink i3of16.p2.0
Patched:
93662: create i3of8.p0.0
93662: create i6of8.p0.0
93662: create i4of8.p0.0
93662: create i5of8.p0.0
93662: create i7of8.p0.0
93662: create i2of8.p0.0
93662: create i1of8.p0.0
93664: create i4of8.p1.0
93663: create i2of8.p2.0
93664: create i6of8.p1.0
93663: create i7of8.p2.0
93664: create i1of8.p1.0
93663: create i3of8.p2.0
93664: create i2of8.p1.0
93663: create i4of8.p2.0
93664: create i7of8.p1.0
93664: create i5of8.p1.0
93663: create i5of8.p2.0
93664: create i3of8.p1.0
93663: create i6of8.p2.0
93663: create i1of8.p2.0
93664: create i8of16.p1.0
93662: create i8of16.p0.0
93663: create i8of16.p2.0
93662: create i9of16.p0.0
93664: create i1of16.p1.0
93663: create i1of16.p2.0
93664: create i9of16.p1.0
93662: create i1of16.p0.0
93663: create i9of16.p2.0
93663: create i10of16.p2.0
93664: create i10of16.p1.0
93663: create i2of16.p2.0
93664: create i2of16.p1.0
93662: create i2of16.p0.0
93662: create i10of16.p0.0
93663: create i11of16.p2.0
93663: create i3of16.p2.0
93664: create i3of16.p1.0
93664: create i11of16.p1.0
93662: create i3of16.p0.0
93662: create i11of16.p0.0
93662: create i12of16.p0.0
93664: create i12of16.p1.0
93663: create i12of16.p2.0
93664: create i4of16.p1.0
93662: create i4of16.p0.0
93663: create i4of16.p2.0
93664: create i5of16.p1.0
93663: create i5of16.p2.0
93662: create i5of16.p0.0
93664: create i13of16.p1.0
93662: create i13of16.p0.0
93663: create i13of16.p2.0
93664: create i6of16.p1.0
93664: create i14of16.p1.0
93663: create i6of16.p2.0
93663: create i14of16.p2.0
93662: create i14of16.p0.0
93662: create i6of16.p0.0
93662: create i7of16.p0.0
93663: create i15of16.p2.0
93662: create i15of16.p0.0
93663: create i7of16.p2.0
93664: create i15of16.p1.0
93664: create i7of16.p1.0
93664: unlink i1of8.p1.0
93663: unlink i1of8.p2.0
93662: unlink i1of8.p0.0
93664: unlink i2of8.p1.0
93663: unlink i2of8.p2.0
93662: unlink i2of8.p0.0
93664: unlink i3of8.p1.0
93663: unlink i3of8.p2.0
93664: unlink i4of8.p1.0
93662: unlink i3of8.p0.0
93664: unlink i5of8.p1.0
93663: unlink i4of8.p2.0
93662: unlink i4of8.p0.0
93664: unlink i6of8.p1.0
93663: unlink i5of8.p2.0
93664: unlink i7of8.p1.0
93662: unlink i5of8.p0.0
93663: unlink i6of8.p2.0
93662: unlink i6of8.p0.0
93663: unlink i7of8.p2.0
93662: unlink i7of8.p0.0
93664: create i16of32.p1.0
93663: create i16of32.p2.0
93662: create i16of32.p0.0
93663: create i1of32.p2.0
93664: create i17of32.p1.0
93663: create i17of32.p2.0
93664: create i1of32.p1.0
93662: create i1of32.p0.0
93662: create i17of32.p0.0
93663: create i18of32.p2.0
93663: create i2of32.p2.0
93664: create i18of32.p1.0
93664: create i2of32.p1.0
93662: create i2of32.p0.0
93662: create i18of32.p0.0
93663: create i3of32.p2.0
93663: create i19of32.p2.0
93662: create i19of32.p0.0
93664: create i19of32.p1.0
93662: create i3of32.p0.0
93664: create i3of32.p1.0
93663: create i4of32.p2.0
93663: create i20of32.p2.0
93664: create i4of32.p1.0
93662: create i20of32.p0.0
93664: create i20of32.p1.0
93662: create i4of32.p0.0
93664: create i5of32.p1.0
93664: create i21of32.p1.0
93662: create i21of32.p0.0
93663: create i21of32.p2.0
93663: create i5of32.p2.0
93662: create i5of32.p0.0
93664: create i22of32.p1.0
93663: create i6of32.p2.0
93664: create i6of32.p1.0
93663: create i22of32.p2.0
93662: create i22of32.p0.0
93662: create i6of32.p0.0
93664: create i7of32.p1.0
93664: create i23of32.p1.0
93663: create i7of32.p2.0
93662: create i7of32.p0.0
93663: create i23of32.p2.0
93662: create i23of32.p0.0
93664: create i24of32.p1.0
93662: create i24of32.p0.0
93664: create i8of32.p1.0
93663: create i8of32.p2.0
93662: create i8of32.p0.0
93663: create i24of32.p2.0
93663: create i9of32.p2.0
93664: create i25of32.p1.0
93663: create i25of32.p2.0
93662: create i9of32.p0.0
93664: create i9of32.p1.0
93662: create i25of32.p0.0
93663: create i26of32.p2.0
93663: create i10of32.p2.0
93664: create i26of32.p1.0
93664: create i10of32.p1.0
93662: create i26of32.p0.0
93662: create i10of32.p0.0
93662: create i11of32.p0.0
93664: create i11of32.p1.0
93662: create i27of32.p0.0
93663: create i27of32.p2.0
93664: create i27of32.p1.0
93663: create i11of32.p2.0
93663: create i28of32.p2.0
93664: create i28of32.p1.0
93663: create i12of32.p2.0
93664: create i12of32.p1.0
93662: create i28of32.p0.0
93662: create i12of32.p0.0
93664: create i29of32.p1.0
93664: create i13of32.p1.0
93663: create i13of32.p2.0
93663: create i29of32.p2.0
93662: create i13of32.p0.0
93662: create i29of32.p0.0
93663: create i30of32.p2.0
93664: create i30of32.p1.0
93663: create i14of32.p2.0
93664: create i14of32.p1.0
93662: create i30of32.p0.0
93662: create i14of32.p0.0
93664: create i31of32.p1.0
93663: create i31of32.p2.0
93664: create i15of32.p1.0
93663: create i15of32.p2.0
93662: create i31of32.p0.0
93662: create i15of32.p0.0
93664: unlink i1of16.p1.0
93663: unlink i1of16.p2.0
93662: unlink i1of16.p0.0
93664: unlink i2of16.p1.0
93663: unlink i2of16.p2.0
93664: unlink i3of16.p1.0
93662: unlink i2of16.p0.0
93663: unlink i3of16.p2.0
93664: unlink i4of16.p1.0
93662: unlink i3of16.p0.0
93663: unlink i4of16.p2.0
93664: unlink i5of16.p1.0
93663: unlink i5of16.p2.0
93664: unlink i6of16.p1.0
93663: unlink i6of16.p2.0
93662: unlink i4of16.p0.0
93664: unlink i7of16.p1.0
93663: unlink i7of16.p2.0
93662: unlink i5of16.p0.0
93664: unlink i8of16.p1.0
93663: unlink i8of16.p2.0
93664: unlink i9of16.p1.0
93662: unlink i6of16.p0.0
93663: unlink i9of16.p2.0
93664: unlink i10of16.p1.0
93662: unlink i7of16.p0.0
93663: unlink i10of16.p2.0
93664: unlink i11of16.p1.0
93663: unlink i11of16.p2.0
93664: unlink i12of16.p1.0
93662: unlink i8of16.p0.0
93664: unlink i13of16.p1.0
93663: unlink i12of16.p2.0
93662: unlink i9of16.p0.0
93664: unlink i14of16.p1.0
93663: unlink i13of16.p2.0
93662: unlink i10of16.p0.0
93664: unlink i15of16.p1.0
93663: unlink i14of16.p2.0
93662: unlink i11of16.p0.0
93663: unlink i15of16.p2.0
93662: unlink i12of16.p0.0
93662: unlink i13of16.p0.0
93662: unlink i14of16.p0.0
93662: unlink i15of16.p0.0
93664: create o19of32.p1.0
93663: create o19of32.p2.0
93662: create o23of32.p0.0
93664: create o30of32.p1.0
93663: create o20of32.p2.0
93662: create o26of32.p0.0
93664: create o28of32.p1.0
93663: create o25of32.p2.0
93664: create o21of32.p1.0
93662: create o7of32.p0.0
93663: create o15of32.p2.0
93664: create o8of32.p1.0
93663: create o2of32.p2.0
93662: create o3of32.p0.0
93664: create o20of32.p1.0
93663: create o8of32.p2.0
93662: create o18of32.p0.0
93664: create o14of32.p1.0
93663: create o24of32.p2.0
93662: create o24of32.p0.0
93664: create o12of32.p1.0
93663: create o23of32.p2.0
93664: create o7of32.p1.0
93662: create o10of32.p0.0
93663: create o10of32.p2.0
93664: create o24of32.p1.0
93662: create o22of32.p0.0
93663: create o6of32.p2.0
93664: create o11of32.p1.0
93663: create o17of32.p2.0
93662: create o12of32.p0.0
93664: create o31of32.p1.0
93663: create o4of32.p2.0
93664: create o2of32.p1.0
93662: create o30of32.p0.0
93663: create o11of32.p2.0
93664: create o22of32.p1.0
93663: create o1of32.p2.0
93662: create o14of32.p0.0
93664: create o0of32.p1.0
93663: create o12of32.p2.0
93662: create o29of32.p0.0
93664: create o6of32.p1.0
93663: create o13of32.p2.0
93664: create o18of32.p1.0
93663: create o21of32.p2.0
93662: create o8of32.p0.0
93664: create o5of32.p1.0
93663: create o16of32.p2.0
93662: create o15of32.p0.0
93664: create o4of32.p1.0
93663: create o29of32.p2.0
93662: create o2of32.p0.0
93664: create o9of32.p1.0
93663: create o3of32.p2.0
93662: create o28of32.p0.0
93664: create o27of32.p1.0
93663: create o5of32.p2.0
93662: create o31of32.p0.0
93664: create o26of32.p1.0
93663: create o30of32.p2.0
93662: create o0of32.p0.0
93664: create o3of32.p1.0
93663: create o14of32.p2.0
93664: create o25of32.p1.0
93663: create o0of32.p2.0
93662: create o13of32.p0.0
93664: create o1of32.p1.0
93663: create o28of32.p2.0
93662: create o5of32.p0.0
93664: create o17of32.p1.0
93663: create o26of32.p2.0
93664: create o23of32.p1.0
93662: create o21of32.p0.0
93663: create o18of32.p2.0
93664: create o13of32.p1.0
93662: create o20of32.p0.0
93663: create o31of32.p2.0
93664: create o16of32.p1.0
93662: create o1of32.p0.0
93663: create o27of32.p2.0
93664: create o15of32.p1.0
93662: create o16of32.p0.0
93663: create o7of32.p2.0
93664: create o29of32.p1.0
93662: create o19of32.p0.0
93663: create o22of32.p2.0
93664: create o10of32.p1.0
93662: create o6of32.p0.0
93663: create o9of32.p2.0
93662: create o9of32.p0.0
93662: create o11of32.p0.0
93662: create o4of32.p0.0
93662: create o27of32.p0.0
93662: create o17of32.p0.0
93662: create o25of32.p0.0
93662: unlink i9of32.p0.0
93662: unlink i6of32.p0.0
93662: unlink o23of32.p0.0
93662: unlink i28of32.p0.0
93662: unlink i27of32.p0.0
93662: unlink o25of32.p2.0
93662: unlink i21of32.p2.0
93662: unlink o9of32.p1.0
93662: unlink o6of32.p1.0
93662: unlink o14of32.p1.0
93662: unlink i31of32.p2.0
93662: unlink i10of32.p1.0
93662: unlink i20of32.p1.0
93662: unlink i1of32.p1.0
93662: unlink o24of32.p1.0
93662: unlink o8of32.p2.0
93662: unlink i18of32.p0.0
93662: unlink o7of32.p2.0
93662: unlink i17of32.p0.0
93662: unlink o13of32.p0.0
93662: unlink i30of32.p1.0
93662: unlink i11of32.p2.0
93662: unlink o1of32.p0.0
93662: unlink o15of32.p2.0
93662: unlink o1of32.p1.0
93662: unlink i30of32.p0.0
93662: unlink o13of32.p1.0
93662: unlink i17of32.p1.0
93662: unlink i18of32.p1.0
93662: unlink i1of32.p0.0
93662: unlink o24of32.p0.0
93662: unlink i20of32.p0.0
93662: unlink o22of32.p2.0
93662: unlink i7of32.p2.0
93662: unlink i8of32.p2.0
93662: unlink i26of32.p2.0
93662: unlink i29of32.p2.0
93662: unlink o0of32.p2.0
93662: unlink i10of32.p0.0
93662: unlink o14of32.p0.0
93662: unlink i16of32.p2.0
93662: unlink o6of32.p0.0
93662: unlink i19of32.p2.0
93662: unlink o9of32.p0.0
93662: unlink o12of32.p2.0
93662: unlink i27of32.p1.0
93662: unlink i28of32.p1.0
93662: unlink i6of32.p1.0
93662: unlink o23of32.p1.0
93662: unlink i9of32.p1.0
93662: unlink o25of32.p1.0
93662: unlink i21of32.p1.0
93662: unlink o12of32.p0.0
93662: unlink i16of32.p0.0
93662: unlink o6of32.p2.0
93662: unlink i19of32.p0.0
93662: unlink o9of32.p2.0
93662: unlink o14of32.p2.0
93662: unlink o0of32.p0.0
93662: unlink i10of32.p2.0
93662: unlink i31of32.p1.0
93662: unlink i26of32.p0.0
93662: unlink i29of32.p0.0
93662: unlink o22of32.p0.0
93662: unlink i7of32.p0.0
93662: unlink i8of32.p0.0
93662: unlink i20of32.p2.0
93662: unlink i1of32.p2.0
93662: unlink o24of32.p2.0
93662: unlink o7of32.p1.0
93662: unlink o8of32.p1.0
93662: unlink i11of32.p1.0
93662: unlink i30of32.p2.0
93662: unlink o15of32.p1.0
93662: unlink o15of32.p0.0
93662: unlink i11of32.p0.0
93662: unlink o1of32.p2.0
93662: unlink o13of32.p2.0
93662: unlink o8of32.p0.0
93662: unlink i18of32.p2.0
93662: unlink o7of32.p0.0
93662: unlink i17of32.p2.0
93662: unlink i8of32.p1.0
93662: unlink o22of32.p1.0
93662: unlink i7of32.p1.0
93662: unlink i29of32.p1.0
93662: unlink i26of32.p1.0
93662: unlink i31of32.p0.0
93662: unlink o0of32.p1.0
93662: unlink i19of32.p1.0
93662: unlink i16of32.p1.0
93662: unlink o12of32.p1.0
93662: unlink i21of32.p0.0
93662: unlink o25of32.p0.0
93662: unlink i28of32.p2.0
93662: unlink i27of32.p2.0
93662: unlink i9of32.p2.0
93662: unlink i6of32.p2.0
93662: unlink o23of32.p2.0
93662: unlink i15of32.p1.0
93662: unlink o30of32.p2.0
93662: unlink o11of32.p1.0
93662: unlink o3of32.p1.0
93662: unlink i24of32.p2.0
93662: unlink i5of32.p2.0
93662: unlink o20of32.p2.0
93662: unlink i22of32.p0.0
93662: unlink o29of32.p0.0
93662: unlink o26of32.p0.0
93662: unlink i3of32.p0.0
93662: unlink o31of32.p1.0
93662: unlink o10of32.p2.0
93662: unlink o4of32.p0.0
93662: unlink i14of32.p2.0
93662: unlink o19of32.p0.0
93662: unlink o16of32.p0.0
93662: unlink i12of32.p0.0
93662: unlink o2of32.p2.0
93662: unlink o21of32.p1.0
93662: unlink i4of32.p1.0
93662: unlink i25of32.p1.0
93662: unlink i23of32.p2.0
93662: unlink i2of32.p2.0
93662: unlink o27of32.p2.0
93662: unlink o28of32.p2.0
93662: unlink i25of32.p0.0
93662: unlink o21of32.p0.0
93662: unlink i4of32.p0.0
93662: unlink i12of32.p1.0
93662: unlink o16of32.p1.0
93662: unlink o19of32.p1.0
93662: unlink o4of32.p1.0
93662: unlink o31of32.p0.0
93662: unlink o26of32.p1.0
93662: unlink i3of32.p1.0
93662: unlink o29of32.p1.0
93662: unlink i22of32.p1.0
93662: unlink o17of32.p2.0
93662: unlink o18of32.p2.0
93662: unlink o3of32.p0.0
93662: unlink i13of32.p2.0
93662: unlink o11of32.p0.0
93662: unlink i15of32.p0.0
93662: unlink o5of32.p2.0
93662: unlink i15of32.p2.0
93662: unlink o5of32.p0.0
93662: unlink o11of32.p2.0
93662: unlink o30of32.p1.0
93662: unlink o3of32.p2.0
93662: unlink i13of32.p0.0
93662: unlink o17of32.p0.0
93662: unlink o18of32.p0.0
93662: unlink i24of32.p1.0
93662: unlink i5of32.p1.0
93662: unlink o20of32.p1.0
93662: unlink o10of32.p1.0
93662: unlink o31of32.p2.0
93662: unlink i14of32.p1.0
93662: unlink o2of32.p1.0
93662: unlink o21of32.p2.0
93662: unlink i4of32.p2.0
93662: unlink i25of32.p2.0
93662: unlink i2of32.p0.0
93662: unlink o27of32.p0.0
93662: unlink o28of32.p0.0
93662: unlink i23of32.p0.0
93662: unlink i23of32.p1.0
93662: unlink o28of32.p1.0
93662: unlink i2of32.p1.0
93662: unlink o27of32.p1.0
93662: unlink i12of32.p2.0
93662: unlink o2of32.p0.0
93662: unlink o19of32.p2.0
93662: unlink o16of32.p2.0
93662: unlink o4of32.p2.0
93662: unlink i14of32.p0.0
93662: unlink o10of32.p0.0
93662: unlink o29of32.p2.0
93662: unlink o26of32.p2.0
93662: unlink i3of32.p2.0
93662: unlink i22of32.p2.0
93662: unlink i5of32.p0.0
93662: unlink o20of32.p0.0
93662: unlink i24of32.p0.0
93662: unlink o18of32.p1.0
93662: unlink o17of32.p1.0
93662: unlink i13of32.p1.0
93662: unlink o30of32.p0.0
93662: unlink o5of32.p1.0
Attachments:
0001-Unlink-PHJ-temporary-files-proactively-when-repartit.patch (text/x-patch)
From 660ee4b9f7ba6c08cc8bc00b18bdbe6c83eb581b Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 6 May 2023 17:13:31 +1200
Subject: [PATCH] Unlink PHJ temporary files proactively when repartitioning.
XXX Draft
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 5fd1c5553b..1baf2be815 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -73,6 +73,7 @@ static void ExecParallelHashJoinSetUpBatches(HashJoinTable hashtable, int nbatch
static void ExecParallelHashEnsureBatchAccessors(HashJoinTable hashtable);
static void ExecParallelHashRepartitionFirst(HashJoinTable hashtable);
static void ExecParallelHashRepartitionRest(HashJoinTable hashtable);
+static void ExecParallelHashDeleteOldPartitions(HashJoinTable hashtable);
static HashMemoryChunk ExecParallelHashPopChunkQueue(HashJoinTable hashtable,
dsa_pointer *shared);
static bool ExecParallelHashTuplePrealloc(HashJoinTable hashtable,
@@ -1217,6 +1218,9 @@ ExecParallelHashIncreaseNumBatches(HashJoinTable hashtable)
/* Wait for the above to be finished. */
BarrierArriveAndWait(&pstate->grow_batches_barrier,
WAIT_EVENT_HASH_GROW_BATCHES_REPARTITION);
+ /* It's now safe to free the previous partitions on disk. */
+ ExecParallelHashDeleteOldPartitions(hashtable);
+
/* Fall through. */
case PHJ_GROW_BATCHES_DECIDE:
@@ -1438,6 +1442,34 @@ ExecParallelHashMergeCounters(HashJoinTable hashtable)
LWLockRelease(&pstate->lock);
}
+/*
+ * After all attached participants have finished repartitioning, it is safe
+ * to unlink the files holding the previous generation of batches.
+ */
+static void
+ExecParallelHashDeleteOldPartitions(HashJoinTable hashtable)
+{
+ ParallelHashJoinState *pstate = hashtable->parallel_state;
+ int old_nbatch = pstate->old_nbatch;
+ ParallelHashJoinBatch *old_batches;
+
+ old_batches = (ParallelHashJoinBatch *)
+ dsa_get_address(hashtable->area, pstate->old_batches);
+ for (int i = 1; i < old_nbatch; ++i)
+ {
+ ParallelHashJoinBatch *shared =
+ NthParallelHashJoinBatch(old_batches, i);
+ SharedTuplestoreAccessor *accessor;
+
+ accessor = sts_attach(ParallelHashJoinBatchInner(shared),
+ ParallelWorkerNumber + 1,
+ &pstate->fileset);
+ sts_dispose(accessor);
+ /* XXX free */
+ }
+
+}
+
/*
* ExecHashIncreaseNumBuckets
* increase the original number of buckets in order to reduce
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 0831249159..ceff3b3313 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -584,6 +584,23 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
return NULL;
}
+/*
+ * Free any disk space consumed by this process. After this, data should not
+ * be read from this SharedTuplestore by any process, so it must be known that
+ * all readers have finished and will not try to read again. The reason we
+ * unlink only files created by this backend is that the file size limit
+ * mechanism is per-process, but the counters could go negative if one process
+ * freed more than it had consumed.
+ */
+void
+sts_dispose(SharedTuplestoreAccessor *accessor)
+{
+ char name[MAXPGPATH];
+
+ sts_filename(name, accessor, accessor->participant);
+ BufFileDeleteFileSet(&accessor->fileset->fs, name, true);
+}
+
/*
* Create the name used for the BufFile that a given participant will write.
*/
diff --git a/src/include/utils/sharedtuplestore.h b/src/include/utils/sharedtuplestore.h
index c7075ad055..34f7d27198 100644
--- a/src/include/utils/sharedtuplestore.h
+++ b/src/include/utils/sharedtuplestore.h
@@ -58,4 +58,6 @@ extern void sts_puttuple(SharedTuplestoreAccessor *accessor,
extern MinimalTuple sts_parallel_scan_next(SharedTuplestoreAccessor *accessor,
void *meta_data);
+extern void sts_dispose(SharedTuplestoreAccessor *accessor);
+
#endif /* SHAREDTUPLESTORE_H */
--
2.40.1
Hi,
Thanks for working on this!
On Wed, 10 May 2023 15:11:20 +1200
Thomas Munro <thomas.munro@gmail.com> wrote:
One complaint about PHJ is that it can, in rare cases, use a
surprising amount of temporary disk space where non-parallel HJ would
not. When it decides that it needs to double the number of batches to
try to fit each inner batch into memory, and then again and again
depending on your level of bad luck, it leaves behind all the earlier
generations of inner batch files to be cleaned up at the end of the
query. That's stupid. Here's a patch to unlink them sooner, as a
small improvement.
This patch can indeed save a decent amount of temporary disk space.
Considering its complexity is (currently?) quite low, it's worth it.
The reason I didn't do this earlier is that sharedtuplestore.c
continues the pre-existing tradition where each parallel process
counts what it writes against its own temp_file_limit. At the time I
thought I'd need to have one process unlink all the files, but if a
process were to unlink files that it didn't create, that accounting
system would break. Without some new kind of shared temp_file_limit
mechanism that doesn't currently exist, per-process counters could go
negative, creating free money. In the attached patch, I realised
something that I'd missed before: there is a safe point for each
backend to unlink just the files that it created, and there is no way
for a process that created files not to reach that point.
Indeed.
For what it's worth, from my new and inexperienced understanding of the
parallel mechanism, waiting for all workers to reach
WAIT_EVENT_HASH_GROW_BATCHES_REPARTITION, after re-dispatching old batches into
new ones, seems like a safe place to instruct each worker to clean up its old
temp files.
Here's an example query that tries 8, 16 and then 32 batches on my
machine, because reltuples is clobbered with a bogus value.
Nice!
Regards,
On 11/05/2023 00:00, Jehan-Guillaume de Rorthais wrote:
On Wed, 10 May 2023 15:11:20 +1200
Thomas Munro <thomas.munro@gmail.com> wrote:
The reason I didn't do this earlier is that sharedtuplestore.c
continues the pre-existing tradition where each parallel process
counts what it writes against its own temp_file_limit. At the time I
thought I'd need to have one process unlink all the files, but if a
process were to unlink files that it didn't create, that accounting
system would break. Without some new kind of shared temp_file_limit
mechanism that doesn't currently exist, per-process counters could go
negative, creating free money. In the attached patch, I realised
something that I'd missed before: there is a safe point for each
backend to unlink just the files that it created, and there is no way
for a process that created files not to reach that point.
Indeed.
For what it's worth, from my new and inexperienced understanding of the
parallel mechanism, waiting for all workers to reach
WAIT_EVENT_HASH_GROW_BATCHES_REPARTITION, after re-dispatching old batches into
new ones, seems like a safe place to instruct each worker to clean up its old
temp files.
Looks good to me too at a quick glance. There's this one "XXX free"
comment though:
for (int i = 1; i < old_nbatch; ++i)
{
ParallelHashJoinBatch *shared =
NthParallelHashJoinBatch(old_batches, i);
SharedTuplestoreAccessor *accessor;

accessor = sts_attach(ParallelHashJoinBatchInner(shared),
ParallelWorkerNumber + 1,
&pstate->fileset);
sts_dispose(accessor);
/* XXX free */
}
I think that's referring to the fact that sts_dispose() doesn't free the
'accessor', or any of the buffers etc. that it contains. That's a
pre-existing problem, though: ExecParallelHashRepartitionRest() already
leaks the SharedTuplestoreAccessor structs and their buffers etc. of the
old batches. I'm a little surprised there isn't already an sts_free()
function.
Another thought is that it's a bit silly to have to call sts_attach()
just to delete the files. Maybe sts_dispose() should take the same three
arguments that sts_attach() does, instead.
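For illustration, a rough sketch of that alternative shape (untested; it
assumes the function lives in sharedtuplestore.c where SharedTuplestore's
fields are visible, and that the per-participant file name is built the same
way sts_filename() builds it):

void
sts_dispose(SharedTuplestore *sts, int my_participant_number,
			SharedFileSet *fileset)
{
	char		name[MAXPGPATH];

	/* Same per-participant file name that sts_filename() would produce. */
	snprintf(name, MAXPGPATH, "%s.p%d", sts->name, my_participant_number);
	BufFileDeleteFileSet(&fileset->fs, name, true);
}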
So that freeing would be nice to tidy up, although the amount of memory
leaked is tiny so might not be worth it, and it's a pre-existing issue.
I'm marking this as Ready for Committer.
--
Heikki Linnakangas
Neon (https://neon.tech)
On Wed, Sep 27, 2023 at 11:42 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Looks good to me too at a quick glance. There's this one "XXX free"
comment though:
for (int i = 1; i < old_nbatch; ++i)
{
ParallelHashJoinBatch *shared =
NthParallelHashJoinBatch(old_batches, i);
SharedTuplestoreAccessor *accessor;

accessor = sts_attach(ParallelHashJoinBatchInner(shared),
ParallelWorkerNumber + 1,
&pstate->fileset);
sts_dispose(accessor);
/* XXX free */
}
I think that's referring to the fact that sts_dispose() doesn't free the
'accessor', or any of the buffers etc. that it contains. That's a
pre-existing problem, though: ExecParallelHashRepartitionRest() already
leaks the SharedTuplestoreAccessor structs and their buffers etc. of the
old batches. I'm a little surprised there isn't already an sts_free()
function.
Another thought is that it's a bit silly to have to call sts_attach()
just to delete the files. Maybe sts_dispose() should take the same three
arguments that sts_attach() does, instead.
So that freeing would be nice to tidy up, although the amount of memory
leaked is tiny so might not be worth it, and it's a pre-existing issue.
I'm marking this as Ready for Committer.
(I thought I'd go around and nudge CF entries where both author and
reviewer are committers.)
Hi Thomas, do you have any additional thoughts on the above?
Hi,
I see in [1] that the reporter mentioned a delay between the error
message in parallel HashJoin and the return of control to psql. Your
patch might reduce this delay.
Also, I have the same complaint from users who processed gigabytes of
data in parallel HashJoin. Presumably, they also got stuck in the unlinking
of tons of temporary files. So, are you going to do something with this
code?
[1]: /messages/by-id/18349-83d33dd3d0c855c3@postgresql.org
--
regards,
Andrei Lepikhov
Postgres Professional
On Wed, Feb 21, 2024 at 7:34 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
I see in [1] that the reporter mentioned a delay between the error
message in parallel HashJoin and the return of control to psql. Your
patch might reduce this delay.
Also, I have the same complaint from users who processed gigabytes of
data in parallel HashJoin. Presumably, they also stuck into the unlink
of tons of temporary files. So, are you going to do something with this
code?
Yeah, right. I will aim to get this into the tree next week. First,
there are a couple of minor issues to resolve around freeing that
Heikki mentioned. Then there is the question of whether we think this
might be a candidate for back-patching, given the complaints you
mention. Opinions?
I would add that the problems you reach when you get to a very large
number of partitions are hard (see several very long threads about
extreme skew for one version of the problem, but even with zero/normal
skewness and perfect estimation of the number of partitions, if you
ask a computer to partition 42TB of data into partitions that fit in a
work_mem suitable for a Commodore 64, it's gonna hurt on several
levels) and this would only slightly improve one symptom. One idea
that might improve just the directory entry and file descriptor
aspect, would be to scatter the partitions into (say) 1MB chunks
within the file, and hope that the file system supports holes (a bit
like logtape.c's multiplexing but I wouldn't do it quite like that).
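A toy standalone sketch of the holes idea (illustration only; nothing like
what logtape.c actually does, and not part of any patch here): write chunks at
fixed 1MB offsets within one file and let a hole-supporting filesystem skip
allocating the untouched ranges.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CHUNK_SIZE (1024 * 1024)

int
main(void)
{
	int		fd = open("scatter_demo.tmp", O_CREAT | O_RDWR | O_TRUNC, 0600);
	char	buf[] = "tuple data";

	/* Pretend partition 0 writes into chunk 0 and partition 3 into chunk 3. */
	pwrite(fd, buf, strlen(buf), (off_t) 0 * CHUNK_SIZE);
	pwrite(fd, buf, strlen(buf), (off_t) 3 * CHUNK_SIZE);

	/*
	 * Apparent size is just over 3MB, but only two blocks are allocated if
	 * the filesystem supports sparse files (compare ls -l with du).
	 */
	printf("apparent size: %lld\n", (long long) lseek(fd, 0, SEEK_END));
	close(fd);
	return 0;
}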
On 22/2/2024 06:42, Thomas Munro wrote:
On Wed, Feb 21, 2024 at 7:34 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
I see in [1] that the reporter mentioned a delay between the error
message in parallel HashJoin and the return of control to psql. Your
patch might reduce this delay.
Also, I have the same complaint from users who processed gigabytes of
data in parallel HashJoin. Presumably, they also got stuck in the unlinking
of tons of temporary files. So, are you going to do something with this
code?
Yeah, right. I will aim to get this into the tree next week. First,
there are a couple of minor issues to resolve around freeing that
Heikki mentioned. Then there is the question of whether we think this
might be a candidate for back-patching, given the complaints you
mention. Opinions?
The code is related to performance, not a bug. Also, it adds one
external function to 'sharedtuplestore.h'. IMO, it isn't worth it
to back-patch.
I would add that the problems you reach when you get to a very large
number of partitions are hard (see several very long threads about
extreme skew for one version of the problem, but even with zero/normal
skewness and perfect estimation of the number of partitions, if you
ask a computer to partition 42TB of data into partitions that fit in a
work_mem suitable for a Commodore 64, it's gonna hurt on several
levels) and this would only slightly improve one symptom. One idea
that might improve just the directory entry and file descriptor
aspect, would be to scatter the partitions into (say) 1MB chunks
within the file, and hope that the file system supports holes (a bit
like logtape.c's multiplexing but I wouldn't do it quite like that).
Thanks, I found in [1] a good entry point to dive into this issue.
[1]: /messages/by-id/CA+hUKGKDbv+5uiJZDdB1wttkMPFs9CDb6=02Qxitq4am-KBM_A@mail.gmail.com
--
regards,
Andrei Lepikhov
Postgres Professional
On Thu, Feb 22, 2024 at 5:37 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
On 22/2/2024 06:42, Thomas Munro wrote:
extreme skew for one version of the problem, but even with zero/normal
skewness and perfect estimation of the number of partitions, if you
Sorry, I meant to write "but even with no duplicates" there (mention
of "normal" was brain fade).
On Wed, Feb 21, 2024 at 6:42 PM Thomas Munro <thomas.munro@gmail.com> wrote:
Yeah, right. I will aim to get this into the tree next week. First,
there are a couple of minor issues to resolve around freeing that
Heikki mentioned. Then there is the question of whether we think this
might be a candidate for back-patching, given the complaints you
mention. Opinions?
It doesn't appear to me that this got committed. On the procedural
question, I would personally treat it as a non-back-patchable bug fix
i.e. master-only but without regard to feature freeze. However, I can
see arguments for either treating it as a back-patchable fix or for
waiting until v18 development opens. What would you like to do?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, May 14, 2024 at 02:56:37PM -0400, Robert Haas wrote:
It doesn't appear to me that this got committed. On the procedural
question, I would personally treat it as a non-back-patchable bug fix
i.e. master-only but without regard to feature freeze. However, I can
see arguments for either treating it as a back-patchable fix or for
waiting until v18 development opens. What would you like to do?
I am not seeing anything committed, either. Thomas, an update?
--
Michael