DBT-5 Stored Procedure Development (2022)
Dear all,
Please review the attached for my jerry-rigged project proposal. I am
seeking to continually refactor the proposal as I can!
Thanks,
Mahesh
Attachments:
GSoC (2022) - Proposal for PostgreSQL-Gouru.pdf (application/pdf)
On Tue, Apr 19, 2022 at 11:02 AM Mahesh Gouru <mahesh.gouru@gmail.com> wrote:
> Please review the attached for my jerry-rigged project proposal. I am seeking to continually refactor the proposal as I can!
I for one see a lot of value in this proposal. I think it would be
great to revive DBT-5, since TPC-E has a number of interesting
bottlenecks that we'd likely learn something from. It's particularly
good at stressing concurrency control, which TPC-C really doesn't do.
It's also a lot easier to run smaller benchmarks that don't require
lots of storage space, but are nevertheless correct according to the
spec.
--
Peter Geoghegan
Hi Mahesh,
On Tue, Apr 19, 2022 at 02:01:54PM -0400, Mahesh Gouru wrote:
> Dear all,
> Please review the attached for my jerry-rigged project proposal. I am
> seeking to continually refactor the proposal as I can!
My comments might be briefer than they should be, but I need to write this
quickly. :)
* The 4 steps in the description aren't needed, they already exist.
* May 20: I think this should be more about reviewing the TPC-E
specification rather than industry research, as we want to try to
follow specification guidelines.
* June 20: Random data generation and scaling are already defined and
provided by the spec.
* Aug 01: A report generator already exists, but I think time could be
allocated to redoing the raw HTML generation with something like
reStructuredText, something that is easier to generate with scripts
and convertible into other formats with other tools
As some of the tasks proposed are already in place, one other task could be
updating egen (the TPC-supplied code). The kit was last developed against
1.12, and 1.14 is current as of this email.
Regards,
Mark
On Tue, Apr 19, 2022 at 11:31 AM Mark Wong <markwkm@gmail.com> wrote:
> As some of the tasks proposed are already in place, one other task could be
> updating egen (the TPC-supplied code). The kit was last developed against
> 1.12, and 1.14 is current as of this email.
As you know, I have had some false starts with using DBT5 on a modern
Linux distribution. Perhaps I gave up too easily at the time, but I'm
definitely still interested. Has there been work on that since?
Thanks
--
Peter Geoghegan
On Tue, Apr 19, 2022 at 05:20:50PM -0700, Peter Geoghegan wrote:
> On Tue, Apr 19, 2022 at 11:31 AM Mark Wong <markwkm@gmail.com> wrote:
> > As some of the tasks proposed are already in place, one other task could be
> > updating egen (the TPC-supplied code). The kit was last developed against
> > 1.12, and 1.14 is current as of this email.
> As you know, I have had some false starts with using DBT5 on a modern
> Linux distribution. Perhaps I gave up too easily at the time, but I'm
> definitely still interested. Has there been work on that since?
I'm afraid not. I'm guessing that pulling in egen 1.14 would address
that. Maybe it would make sense to put that at the top of the todo list if
this project is accepted...
Regards,
Mark
On Tue, Apr 26, 2022 at 10:36 AM Mark Wong <markwkm@gmail.com> wrote:
> I'm afraid not. I'm guessing that pulling in egen 1.14 would address
> that. Maybe it would make sense to put that at the top of the todo list if
> this project is accepted...
Wouldn't it be a prerequisite here? I don't actually have any reason
to prefer the old function-based code to the new stored procedure
based code. Really, all I'm looking for is a credible implementation
of TPC-E that I can use to model some aspects of OLTP performance for
my own purposes.
TPC-C (which I have plenty of experience with) has only two secondary
indexes (in typical configurations), and doesn't really stress
concurrency control at all. Plus there are no low cardinality indexes
in TPC-C, while TPC-E has quite a few. Chances are high that I'd learn
something from TPC-E, which has all of these things -- I'm really
looking for bottlenecks, where Postgres does entirely the wrong thing.
It's especially interesting to me as somebody that focuses on B-Tree
indexing.
--
Peter Geoghegan
On Mon, May 02, 2022 at 07:14:28AM -0700, Mark Wong wrote:
On Tue, Apr 26, 2022, 10:45 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Tue, Apr 26, 2022 at 10:36 AM Mark Wong <markwkm@gmail.com> wrote:
> > I'm afraid not. I'm guessing that pulling in egen 1.14 would address
> > that. Maybe it would make sense to put that at the top of the todo list if
> > this project is accepted...
> Wouldn't it be a prerequisite here? I don't actually have any reason
> to prefer the old function-based code to the new stored procedure
> based code. Really, all I'm looking for is a credible implementation
> of TPC-E that I can use to model some aspects of OLTP performance for
> my own purposes.
> TPC-C (which I have plenty of experience with) has only two secondary
> indexes (in typical configurations), and doesn't really stress
> concurrency control at all. Plus there are no low cardinality indexes
> in TPC-C, while TPC-E has quite a few. Chances are high that I'd learn
> something from TPC-E, which has all of these things -- I'm really
> looking for bottlenecks, where Postgres does entirely the wrong thing.
> It's especially interesting to me as somebody that focuses on B-Tree
> indexing.
I think it could be done in either order.
While it's not ideal that the kit seems to work most reliably as-is on
RHEL/CentOS/etc. 6, I think that could provide some confidence in
getting familiar with something on a working platform. The updates to
the stored functions/procedures would be the same regardless of egen
version.
If we get the project slot, we can talk further about what to actually
tackle first.
Regards,
Mark