GSoC 2017
Hi all!
In 2016 the PostgreSQL project was not accepted into the GSoC program. As I
understand it, the reasons were the following.
1. We submitted our application to GSoC at the last minute.
2. In 2016 the GSoC application form for mentoring organizations was
changed. In particular, it required more detailed information about
possible projects.
As a result, we didn't manage to put together a good enough application that
time, and it was declined. See [1] and [2] for details.
I think the right way to handle this in 2017 is to start collecting the
required information in advance. According to the GSoC 2017 timeline [3],
mentoring organizations can submit their applications from January 19 to
February 9. So now is a good time to start collecting project ideas and to
put out a call for mentors. We also need to decide who will be our admin
this year.
In sum, we have the following questions:
1. What project ideas do we have?
2. Who is going to be a mentor this year?
3. Who is going to be the project admin this year?
BTW, I'm ready to be a mentor this year. I'm also open to being an admin if
needed.
[1]: /messages/by-id/CAA-aLv4p1jfuMpsRaY2jDUQqypkEXUxeb7z8Mp-0mW6M03St7A@mail.gmail.com
[2]: /messages/by-id/CALxAEPuGpAjBSN-PTuxHfuLLqDS47BEbO_ZYxUYQR3ud1nwbww@mail.gmail.com
[3]: https://developers.google.com/open-source/gsoc/timeline
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Count me in as a mentor
2017-01-10 14:53 GMT+05:00 Alexander Korotkov <a.korotkov@postgrespro.ru>:
> 1. What project ideas do we have?
Hi!
I'd like to propose a project on sorting algorithm research. I’m ready
to be a mentor on this project.
===Topic===
Sorting algorithms benchmark and implementation.
===Idea===
Currently PostgreSQL uses an implementation of Hoare’s Quicksort based
on the 1993 work of Bentley and McIlroy [1], while there are newer
algorithms [2], [3], and [4] that are actively used in highly optimized
code such as Java and .NET. Using an optimized sorting algorithm could
probably improve overall system performance. Non-comparison-based
algorithms also deserve attention and benchmarking [5].
===Project details===
The project has four essential parts:
1. Implementation of a benchmark for sorting, making sure that
operations using sorting are represented proportionally to some
“average” use cases.
2. Selection of the algorithms to benchmark. Selection can be based,
for example, on scientific papers or community opinions.
3. Benchmark implementation of the selected algorithms, analysis of
results, and picking a winner.
4. Industrial implementation for pg_qsort(), qsort_arg() and
gen_qsort_tuple.pl. The resulting patch is submitted to a commitfest,
and the student reviews another patch.
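A minimal sketch of what parts 1-3 could look like, written in Python only for brevity (a real benchmark would exercise PostgreSQL's C sort routines). The input shapes, the WEIGHTS table, and all function names below are illustrative assumptions, not part of the proposal.

```python
import random
import time
from typing import Callable, Dict, List

SortFn = Callable[[List[int]], List[int]]


def merge_sort(data: List[int]) -> List[int]:
    """Top-down merge sort; returns a new sorted list."""
    if len(data) <= 1:
        return list(data)
    mid = len(data) // 2
    left, right = merge_sort(data[:mid]), merge_sort(data[mid:])
    out: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def timsort(data: List[int]) -> List[int]:
    """Python's built-in sort, which is a TimSort implementation."""
    return sorted(data)


def make_inputs(n: int, seed: int = 0) -> Dict[str, List[int]]:
    """Input shapes standing in for different workloads (assumed set)."""
    rng = random.Random(seed)
    rand = [rng.randrange(n) for _ in range(n)]
    return {
        "random": rand,
        "sorted": sorted(rand),
        "reversed": sorted(rand, reverse=True),
        "few_distinct": [rng.randrange(4) for _ in range(n)],
    }


# Hypothetical weights: how often each shape occurs in an "average" use case.
WEIGHTS = {"random": 0.4, "sorted": 0.3, "reversed": 0.1, "few_distinct": 0.2}


def run_benchmark(algorithms: Dict[str, SortFn], n: int = 2000,
                  repeats: int = 3) -> Dict[str, Dict[str, float]]:
    """Return the best wall-clock time per algorithm and input shape."""
    results: Dict[str, Dict[str, float]] = {}
    for input_name, data in make_inputs(n).items():
        expected = sorted(data)
        for algo_name, fn in algorithms.items():
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                out = fn(data)
                best = min(best, time.perf_counter() - start)
                if out != expected:
                    raise ValueError(f"{algo_name} produced a wrong result")
            results.setdefault(algo_name, {})[input_name] = best
    return results


def weighted_score(timings: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean time; lower is better."""
    total = sum(weights.values())
    return sum(weights[name] * timings[name] for name in weights) / total
```

Picking a winner then amounts to comparing weighted_score() across the candidates, though the choice of weights is exactly the hard part of step 1.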
[1]: Bentley, Jon L., and M. Douglas McIlroy. "Engineering a sort
function." Software: Practice and Experience 23.11 (1993): 1249-1265.
[2]: Musser, David R. "Introspective sorting and selection algorithms."
Softw., Pract. Exper. 27.8 (1997): 983-993.
[3]: Auger, Nicolas, Cyril Nicaud, and Carine Pivoteau. "Merge
Strategies: from Merge Sort to TimSort." (2015).
[4]: Beniwal, Sonal, and Deepti Grover. "Comparison of various sorting
algorithms: A review." International Journal of Emerging Research in
Management & Technology 2 (2013).
[5]: McIlroy, Peter M., Keith Bostic, and M. Douglas McIlroy.
"Engineering radix sort." Computing Systems 6.1 (1993): 5-27.
Best regards, Andrey Borodin.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2017-01-10 14:53 GMT+05:00 Alexander Korotkov <a.korotkov@postgrespro.ru>:
> 1. What project ideas do we have?
I have one more project of interest which I can mentor.
===Topic===
GiST API advancement
===Idea===
The GiST API was designed in the early 90s to reduce boilerplate
code around data access methods over balanced trees. Now, more than 20
years later, there are some ideas for improving this API.
===Project details===
An opclass developer must specify 4 core operations to make a type GiST-indexable:
1. Split: a function to split a set of datatype instances into two parts.
2. Penalty calculation: a function to measure the penalty for unification
of two keys.
3. Collision check: a function which determines whether two keys may
overlap or do not intersect.
4. Unification: a function to combine two keys into one so that the
combined key collides with both input keys.
Functions 2 and 3 can be improved.
For example, the insertion algorithm of the Revised R*-tree [1] cannot be
expressed in terms of penalty-based algorithms. There were some
attempts to bring in parts of RR*-tree insertion, but they came down to
ugly hacks [2]. The current GiST API, due to its penalty-based insertion
algorithm, does not allow implementing an important feature of the RR*-tree:
overlap optimization. As Norbert Beckmann, author of the RR*-tree, put it
in discussion: “Overlap optimization is one of the main elements, if
not the main query performance tuning element of the RR*-tree. You
would fall back to old R-Tree times if that would be left off.”
The collision check currently returns a binary result:
1. The query may collide with the subtree MBR.
2. The query does not collide with the subtree.
This result may be augmented with a third state: the subtree is totally
within the query. In this case a GiST scan can descend the subtree without key
checks.
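To illustrate the three-state collision check, here is a small Python sketch for rectangle keys and an overlap query. The names Overlap, Rect, Node, and scan are hypothetical and do not correspond to anything in the GiST API; the sketch only shows how a "contained" answer lets the scan skip key checks below that subtree.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Overlap(Enum):
    DISJOINT = 0   # query cannot match anything in the subtree
    MAYBE = 1      # query intersects the MBR; keys below must be checked
    CONTAINED = 2  # MBR lies entirely within the query


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float


@dataclass
class Node:
    mbr: Rect
    children: list  # Node objects for inner nodes, Rect keys for leaves
    leaf: bool


def check_collision(query: Rect, mbr: Rect) -> Overlap:
    """Three-state replacement for a boolean overlap test."""
    if (mbr.xmax < query.xmin or mbr.xmin > query.xmax
            or mbr.ymax < query.ymin or mbr.ymin > query.ymax):
        return Overlap.DISJOINT
    if (query.xmin <= mbr.xmin and mbr.xmax <= query.xmax
            and query.ymin <= mbr.ymin and mbr.ymax <= query.ymax):
        return Overlap.CONTAINED
    return Overlap.MAYBE


def scan(node: Node, query: Rect, out: List[Rect], trusted: bool = False) -> int:
    """Collect keys overlapping the query; return the number of key checks.

    Once a subtree is known to be CONTAINED, everything below it matches,
    so `trusted` is passed down and no further checks are made.
    """
    checks = 0
    if node.leaf:
        for key in node.children:
            if trusted:
                out.append(key)
                continue
            checks += 1
            if check_collision(query, key) is not Overlap.DISJOINT:
                out.append(key)
        return checks
    for child in node.children:
        if trusted:
            checks += scan(child, query, out, True)
            continue
        checks += 1
        state = check_collision(query, child.mbr)
        if state is Overlap.DISJOINT:
            continue
        checks += scan(child, query, out, state is Overlap.CONTAINED)
    return checks
```

How much this saves in practice depends on how often queries fully cover subtrees, which is what the benchmark mentioned in the proposal would have to show.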
The potential effect of these improvements must be benchmarked. Probably,
implementation of these two will spawn more ideas on GiST performance
improvements.
Finally, GiST does not provide an API for bulk loading. Alexander Korotkov
implemented buffered GiST build during GSoC 2011. This index
construction is faster, but yields an index tree with virtually the same
querying performance. There are different algorithms that aim to build a
better index tree by using some knowledge of the data, e.g. [3].
[1]: Beckmann, Norbert, and Bernhard Seeger. "A revised r*-tree in
comparison with related index structures." Proceedings of the 2009 ACM
SIGMOD International Conference on Management of data. ACM, 2009.
[2]: /messages/by-id/CAJEAwVFMo-FXaJ6Lkj8Wtb1br0MtBY48EGMVEJBOodROEGykKg@mail.gmail.com
[3]: Achakeev, Daniar, Bernhard Seeger, and Peter Widmayer. "Sort-based
query-adaptive loading of r-trees." Proceedings of the 21st ACM
international conference on Information and knowledge management. ACM,
2012.
Best regards, Andrey Borodin.
On 1/10/17 1:53 AM, Alexander Korotkov wrote:
> 1. What project ideas do we have?
Perhaps allowing SQL-only extensions without requiring filesystem files
would be a good project.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
2017-01-12 21:21 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
> On 1/10/17 1:53 AM, Alexander Korotkov wrote:
>> 1. What project ideas do we have?
> Perhaps allowing SQL-only extensions without requiring filesystem files
> would be a good project.
Implementing safe evaluation of untrusted PL functions: evaluation under
a different user in a different process.
Regards
Pavel
A new data type, and/or a new index type could both be nicely scoped bits
of work.
--
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt."—Kurt Vonnegut
Jim Nasby wrote:
> On 1/10/17 1:53 AM, Alexander Korotkov wrote:
>> 1. What project ideas do we have?
> Perhaps allowing SQL-only extensions without requiring filesystem files
> would be a good project.
Don't we already have that in patch form? Dimitri submitted it as I
recall.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/13/17 4:08 PM, Alvaro Herrera wrote:
> Jim Nasby wrote:
>> On 1/10/17 1:53 AM, Alexander Korotkov wrote:
>>> 1. What project ideas do we have?
>> Perhaps allowing SQL-only extensions without requiring filesystem files
>> would be a good project.
> Don't we already have that in patch form? Dimitri submitted it as I
> recall.
My recollection is that he tried to boil the ocean and also support
handing compiled C libraries to the database, which was enough to sink
the patch. It might be nice to support that if we could, and maybe it
could be a follow-on project.
I do think the complete lack of support for non-FS extensions is *seriously*
hurting use of the feature thanks to environments like RDS and Heroku.
As Pavel mentioned, untrusted languages are in a similar boat. So maybe
the best way to address these things is to advertise them as "increase
usability in cloud environments" since cloud excites people.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
On 1/13/17 3:09 PM, Peter van Hardenberg wrote:
> A new data type, and/or a new index type could both be nicely scoped
> bits of work.
Did you have any particular data/index types in mind?
Personally I'd love something that worked like a Python dictionary, but
I'm not sure how that'd work without essentially supporting a variant
data type. I've got code for a variant type [1], and I don't think
there are any holes in it, but the casting semantics are rather ugly. IIRC
that problem appeared to be solvable if there was a hook in the current
casting code right before Postgres threw in the towel and said a cast
was impossible.
[1]: https://github.com/BlueTreble/variant/
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
I'm ready to be a mentor.
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
All,
* Alexander Korotkov (a.korotkov@postgrespro.ru) wrote:
> Also, we need to decide who would
> be our admin this year.
I don't see anyone champing at the bit to be the admin (it's not exactly
a fun and exciting job, after all), so, unless someone really wants it
(or someone wishes to object), I volunteer as tribute to be the admin
this year.
As such, we need to get this whole thing moving, and pretty quickly, as
Alexander noted.
The first thing we need is an "Ideas" page which includes:
- Brief descriptions of projects that can be completed in about 12 weeks.
- For each project, a list of prerequisites, description of programming
skills needed and estimation of difficulty level.
- A list of potential mentors.
The GSoC 2016 page was a start on this. I copied that page and updated
it to be a somewhat clearer format, but it could probably use more work.
Here's what Google says about the ideas page:
----------
The best pages include links to more detailed descriptions and related
materials for each project. They might even include actual use cases!
Keep in mind that this page is often the first view of your organization
by Google and potential student applicants. A link to your bug tracker
does not an Ideas Page make. Put your best foot forward. In addition to
a basic list, you might also consider providing links to relevant
resources for mentors and students, particular FAQ entries, the
timeline, etc. You might include a section on communication, giving
specific advice on which mailing lists, channels and emails to use and
how to use them. If your organization puts together an application
template for students, you should include that on your page as well.
Think of your Ideas Page as the GSoC portal to your organization.
----------
Would be great for folks to review what's there, maybe provide actual
use-cases for the existing project suggestions, verify that the projects
listed are still valid and appropriate at this point, and, please:
ADD YOUR PROJECTS.
https://wiki.postgresql.org/wiki/GSoC_2017
More information about what the project definition should look like is
included here:
http://write.flossmanuals.net/gsoc-mentoring/defining-a-project/
Before submitting it to Google, I'm going to either expand or nuke
everything under the 'core' section, so if there's something that
you are really interested in, expand it out so we can have it properly
included in our application to Google.
Also, Google has said that they actually *like* "Umbrella" projects. As
such, I believe we should encourage projects which are closely related
to PostgreSQL to submit projects for consideration. I don't think "just
uses PostgreSQL" would be reasonable, but I do think something like "Add
feature XYZ to the pgconf.eu code base to help PostgreSQL-based
organizations and community conferences" would be.
Let's make this year's PostgreSQL GSoC awesome!
Thanks!
Stephen
A new currency type would be nice, and if kept small in scope, might be
manageable. Bringing Christoph Berg's PostgreSQL-units into core and
extending it could be interesting. Peter E's URL and email types might be
good candidates. What else? Informix Datablades had a media type way back
in the day... That's still a gap in community Postgres.
--
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt."—Kurt Vonnegut
On 1/23/17 3:45 PM, Peter van Hardenberg wrote:
> A new currency type would be nice, and if kept small in scope, might be
> manageable.
I'd be rather nervous about this. My impression of community consensus
on this is that a currency type that doesn't somehow support conversion
between different currencies is pretty useless, and supporting
conversions opens a 55-gallon drum of worms. I could certainly be
mistaken in my impression, but I think there'd need to be some kind of
consensus on what a currency type should do before putting that up for GSoC.
But, speaking of types, I wish we had a timestamp type that stored what
the original timezone was, as well as the relevant TZDATA entry that was
in place for that timestamp when it was created. Since it'd be
completely impractical to store TZDATA as part of the datum, there
would need to be an immutable catalog table that stored the contents of
TZDATA any time it changed, as well as a fast way to find the surrogate
key for the current TZDATA.
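One way to picture the catalog idea is the following Python sketch. It is only an in-memory model under stated assumptions: TzdataCatalog, StampedTimestamp, and make_timestamp are hypothetical names, the TZDATA contents are treated as an opaque byte string, and a content digest is assumed to be an acceptable way to detect that TZDATA changed.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class StampedTimestamp:
    instant_utc: datetime  # the point in time, normalized to UTC
    tz_name: str           # original zone name as entered by the user
    tzdata_key: int        # surrogate key of the TZDATA version in effect


class TzdataCatalog:
    """Append-only catalog: a row is added only when the contents change."""

    def __init__(self) -> None:
        self._rows: List[bytes] = []          # position + 1 is the surrogate key
        self._by_digest: Dict[str, int] = {}  # content digest -> surrogate key
        self._current_key = 0                 # 0 means nothing registered yet

    def register(self, contents: bytes) -> int:
        """Record the TZDATA now in use and return its surrogate key."""
        digest = hashlib.sha256(contents).hexdigest()
        key = self._by_digest.get(digest)
        if key is None:
            self._rows.append(contents)
            key = len(self._rows)
            self._by_digest[digest] = key
        self._current_key = key
        return key

    def current_key(self) -> int:
        """Fast lookup of the key for the current TZDATA."""
        if self._current_key == 0:
            raise LookupError("no TZDATA registered")
        return self._current_key

    def contents(self, key: int) -> bytes:
        """Return the stored TZDATA contents for a surrogate key."""
        return self._rows[key - 1]


def make_timestamp(catalog: TzdataCatalog, local: datetime,
                   tz_name: str) -> StampedTimestamp:
    """Build a timestamp that remembers its zone and TZDATA version.

    `local` must be timezone-aware; it is normalized to UTC for storage.
    """
    if local.tzinfo is None:
        raise ValueError("local must be timezone-aware")
    return StampedTimestamp(local.astimezone(timezone.utc), tz_name,
                            catalog.current_key())
```

A stored value keeps pointing at the TZDATA row that was current when it was created, so later rule changes do not silently alter how it is interpreted.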
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
On Mon, Jan 23, 2017 at 4:12 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 1/23/17 3:45 PM, Peter van Hardenberg wrote:
>> A new currency type would be nice, and if kept small in scope, might be
>> manageable.
> I'd be rather nervous about this. My impression of community consensus on
> this is that a currency type that doesn't somehow support conversion between
> different currencies is pretty useless, and supporting conversions opens a
> 55-gallon drum of worms. I could certainly be mistaken in my impression,
> but I think there'd need to be some kind of consensus on what a currency
> type should do before putting that up for GSoC.
There's a relatively simple solution to the currency conversion problem
which avoids running afoul of the various mistakes some previous
implementations have made. Track currencies separately and always ask for a
conversion chart at operation time.
Let the user specify the values they want at conversion time. That looks
like this:
=> select '1 CAD'::currency + '1 USD'::currency + '1 CHF'::currency
'1.00CAD 1.00USD 1.00CHF'
=> select convert('10.00CAD'::currency, ('USD', '1.25', 'CHF',
'1.50')::array, 'USD')
12.50USD
The basic concept is that the value of a currency type is that it would
allow you to operate in multiple currencies without accidentally adding
them. You'd flatten them to a single type if, when, and how you wanted for
any given operation but could work without fear of losing information.
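The behaviour described above can be modelled in a few lines of Python. This is a sketch under assumptions, not a design for the type: the Wallet class and its methods are hypothetical, literals are assumed to end in a three-letter currency code, and the chart passed to convert() is assumed to map each source currency to the number of target units per one source unit (the chart layout in the SQL example above is left open).

```python
from decimal import Decimal
from typing import Dict, Mapping, Optional


class Wallet:
    """Amounts kept per currency code; addition never mixes currencies."""

    def __init__(self, amounts: Optional[Mapping[str, Decimal]] = None) -> None:
        self.amounts: Dict[str, Decimal] = dict(amounts or {})

    @classmethod
    def parse(cls, text: str) -> "Wallet":
        """Parse a literal such as '1 CAD' or '10.00CAD'."""
        text = text.strip()
        amount, code = text[:-3].strip(), text[-3:]
        return cls({code: Decimal(amount)})

    def __add__(self, other: "Wallet") -> "Wallet":
        total = dict(self.amounts)
        for code, amount in other.amounts.items():
            total[code] = total.get(code, Decimal(0)) + amount
        return Wallet(total)

    def convert(self, chart: Mapping[str, Decimal], target: str) -> Decimal:
        """Flatten to `target` using rates supplied at call time.

        chart[code] is the number of target units per one unit of `code`.
        A missing rate is an error rather than a silent guess.
        """
        total = Decimal(0)
        for code, amount in self.amounts.items():
            if code == target:
                total += amount
            elif code in chart:
                total += amount * chart[code]
            else:
                raise KeyError(f"no conversion rate supplied for {code}")
        # Two decimal places, using Decimal's default rounding mode.
        return total.quantize(Decimal("0.01"))

    def __str__(self) -> str:
        return " ".join(f"{amount:.2f}{code}"
                        for code, amount in self.amounts.items())
```

For example, Wallet.parse("1 CAD") + Wallet.parse("1 USD") keeps both amounts, and only an explicit convert() call with a chart collapses them into one number.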
I have no opinion about the most pleasing notation for the currency
conversion chart, but I imagine it would be reasonable to let users provide
a default set of conversion values somewhere.
There are interesting and worthwhile conversations to have about
non-decimal currencies, but I think it would be totally reasonable not to
support them at all in a first release. As for currency precision, I would
probably consider leaning on numeric under the hood for the actual currency
values themselves but IANAA (though I have done quite a lot of work on
billing systems).
If it would be helpful, I could provide a detailed proposal on the wiki for
others to critique?
--
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt."—Kurt Vonnegut
On 24 January 2017 at 03:42, Peter van Hardenberg <pvh@pvh.ca> wrote:
> The basic concept is that the value of a currency type is that it would
> allow you to operate in multiple currencies without accidentally adding
> them. You'd flatten them to a single type if, when, and how you wanted for any
> given operation but could work without fear of losing information.
I don't think this even needs to be tied to currencies. I've often
thought this would be generally useful for any value with units. This
would prevent you from accidentally adding miles to kilometers or
hours to parsecs which is just as valid as preventing you from adding
CAD to USD.
Then you could imagine having a few entirely optional helper functions
that could automatically provide conversion factors using units.dat or
currency exchange rates. But even if you don't use these helper
functions they would still be useful.
--
greg
Greg Stark <stark@mit.edu> writes:
> On 24 January 2017 at 03:42, Peter van Hardenberg <pvh@pvh.ca> wrote:
>> The basic concept is that the value of a currency type is that it would
>> allow you to operate in multiple currencies without accidentally adding
>> them. You'd flatten them to a single type if, when, and how you wanted for any
>> given operation but could work without fear of losing information.
> I don't think this even needs to be tied to currencies. I've often
> thought this would be generally useful for any value with units.
There already is an extension somewhere for attaching units to numeric
values, which would be a place to start from for this purpose. The
things I think are unique to the currency situation are:
* Time-varying conversion ratios.
* Conventional number of decimal places for any given currency.
* Idiosyncratic I/O formats (symbol to left or right of number,
odd rules for negatives, etc). I think the space here is covered
by the POSIX currency locale rules.
regards, tom lane
On January 27, 2017 07:08, Tom Lane wrote:
> ... The things I think are unique to the currency situation are: ...
Add the potential for regulatory requirements to change at any time, sort of like timezone information. So no hard-coded behavior.
* rounding method/accuracy
* storage precision different than display precision
* conversion method (multiply, divide, triangulate, other)
* use of spot rates (multiple rate sources) rather than/in addition to time-varying rates
Responding to the overall idea of a currency type:
Numeric values with units so that you get a warning/error when you mix different units in calculations? Ability to specify rounding methods and intermediate precisions for calculations?
+1 Good ideas with lots of potential applications.
Built-in currency type?
-1 I suspect this is one of those things that seems like a good idea but really isn't.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Greg Stark wrote
> I don't think this even needs to be tied to currencies. I've often
> thought this would be generally useful for any value with units. This
> would prevent you from accidentally adding miles to kilometers or
> hours to parsecs which is just as valid as preventing you from adding
> CAD to USD.
There is already such a concept - not tied to currencies or units in
general. The SQL standard calls it DISTINCT types. And it can prevent
comparing apples to oranges.
I don't have the exact syntax at hand, but it's something like this:
create distinct type customer_id_type as integer;
create distinct type order_id_type as integer;
create table customers (id customer_id_type primary key);
create table orders (id order_id_type primary key, customer_id
customer_id_type not null);
And because those columns are defined with different types, the database
will refuse to compare customers.id with orders.id (just like it would
refuse to compare an integer with a date).
So an accidental join like this:
select *
from orders o
join customers c using (id);
would throw an error because the data types of the IDs can not be compared.
On 1/27/17 8:17 AM, Brad DeJong wrote:
> Add the potential for regulatory requirements to change at any time - sort of like timezone information. So no hard coded behavior.
Well, I wish we had support for storing those changing requirements as
well. If we had that it would greatly simplify having a timestamp type
that stores the original timezone.
BTW, time itself fits in the multi-unit pattern, since months don't have
a fixed conversion to days (and technically seconds don't have a fixed
conversion to anything thanks to leap seconds).
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
On Fri, Jan 27, 2017 at 2:48 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 1/27/17 8:17 AM, Brad DeJong wrote:
>> Add the potential for regulatory requirements to change at any time -
>> sort of like timezone information. So no hard coded behavior.
> Well, I wish we had support for storing those changing requirements as
> well. If we had that it would greatly simplify having a timestamp type that
> stores the original timezone.
> BTW, time itself fits in the multi-unit pattern, since months don't have a
> fixed conversion to days (and technically seconds don't have a fixed
> conversion to anything thanks to leap seconds).
I agree with Jim here.
I think we don't need to solve all the possible currency problems to have a
useful type. I'll reiterate what I think is the key point here:
A currency type should work like a wallet. If I have 20USD in my wallet and
I put 20EUR in the wallet, I have 20USD and 20EUR in the wallet, not 42USD
(or whatever the conversion rate is these days). If I want to convert those
to a single currency, I need to perform an operation.
If we had this as a basic building block, support for some of the major
currency formats, and a conversion function that a user could call (think of
the way we use justify_interval on sums of intervals to account for the
ambiguities in day lengths and so on), I think we'd have a pretty useful type.
As to Tom's point, conversion rates do not vary only with time; they vary with
time, space, vendor, whether you're buying or selling, in what
quantity, and so on. We can give people the tools to more easily and
accurately execute this math without actually building a whole financial
tool suite in the first release.
I'll also note that in the absence of progress here, users continue to get
bad advice about using the existing MONEY type such as here:
http://stackoverflow.com/questions/15726535/postgresql-which-datatype-should-be-used-for-currency
--
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt."—Kurt Vonnegut
On 27 January 2017 at 14:52, Thomas Kellerer <spam_eater@gmx.net> wrote:
I don't have the exact syntax at hand, but it's something like this:
create distinct type customer_id_type as integer;
create distinct type order_id_type as integer;
create table customers (id customer_id_type primary key);
create table orders (id order_id_type primary key, customer_id
customer_id_type not null);
That seems like a useful thing but it's not exactly the same use case.
Measurements with units and currency amounts both have the property
that you are likely to want to have a single column that uses
different units for different rows. You can aggregate across them
without converting as long as you have an appropriate where clause or
group by clause -- GROUP BY units_of(debit_amount) for example.
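A small sketch of the per-unit aggregation described above, in Python. The sample rows are invented, and units_of is only the name used in the message, implemented here as a hypothetical helper; it is not an existing PostgreSQL function.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical rows: one column holding (amount, unit) pairs with mixed units.
debits = [
    (Decimal("10.00"), "USD"),
    (Decimal("5.50"), "EUR"),
    (Decimal("2.25"), "USD"),
]


def units_of(value):
    """Return the unit part of an (amount, unit) value."""
    return value[1]


def sum_by_unit(values):
    # Similar in spirit to: SELECT units_of(x), sum(x) ... GROUP BY units_of(x)
    totals = defaultdict(Decimal)
    for value in values:
        totals[units_of(value)] += value[0]
    return dict(totals)


print(sum_by_unit(debits))
```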
--
greg
2017-01-10 12:53 GMT+03:00 Alexander Korotkov <a.korotkov@postgrespro.ru>:
1. What project ideas we have?
Hi!
We would like to propose a project on rewriting the PostgreSQL executor from
the traditional Volcano-style [1] to the so-called push-based architecture as
implemented in Hyper [2][3] and VitesseDB [4]. The idea is to reverse the
direction of data flow control: instead of pulling up tuples one-by-one with
ExecProcNode(), we suggest pushing them from below to top until a blocking
operator (e.g. Aggregation) is encountered. There’s a good example and more
detailed explanation for this approach in [2].
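As a rough illustration of the difference in control flow, here is a toy sketch in Python. It is only an analogy: the function and class names are hypothetical, Python generators stand in for per-tuple ExecProcNode() calls, and nothing here models PostgreSQL's actual executor nodes.

```python
def pull_scan(rows):
    """Pull style: every request from the parent resumes the scan for one tuple."""
    for row in rows:
        yield row


def pull_filter(child, predicate):
    # The parent asks its child for tuples one at a time.
    for row in child:
        if predicate(row):
            yield row


def pull_count(child):
    return sum(1 for _ in child)


def push_scan(rows, consume):
    """Push style: a single call drives the whole loop and pushes tuples upward."""
    for row in rows:
        consume(row)


def make_push_filter(predicate, consume):
    def on_row(row):
        if predicate(row):
            consume(row)
    return on_row


class PushCount:
    """Blocking operator: accumulates pushed tuples; the result is read at the end."""

    def __init__(self):
        self.count = 0

    def consume(self, row):
        self.count += 1


def is_even(value):
    return value % 2 == 0


rows = list(range(10))

# Pull: the top node drives execution by requesting tuples from below.
pulled = pull_count(pull_filter(pull_scan(rows), is_even))

# Push: the scan drives execution and hands tuples to the nodes above it.
counter = PushCount()
push_scan(rows, make_push_filter(is_even, counter.consume))
pushed = counter.count

print(pulled, pushed)
```

Both pipelines compute the same count; what changes is which end owns the loop. In the push variant the scan's loop state lives in one stack frame for the whole run instead of being saved and restored per tuple.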
The advantages of this approach:
* It completely avoids the need to load/store the internal state of the
bottommost (scanning) nodes, which will significantly reduce overhead. With
the current pull-based model, we call functions like heapgettup_pagemode()
(and many others) number-of-tuples-to-retrieve times, while in the push-based
model we will call them only once. Currently, we have implemented a prototype
for the SeqScan node and achieved a 2x speedup on the query
“select * from lineitem”;
* The number of memory accesses is minimized; code and data locality are
generally better, and the cache is used more effectively;
* Switching to the push model also makes a good base for building an effective
JIT compiler. Currently we have a working LLVM-based JIT compiler for
expressions [5], as well as a whole-query JIT compiler [6], which speeds up
TPC-H queries up to 4-5 times, but the latter took manually re-implementing
the executor logic with the LLVM API using the push model to get this speedup.
JIT-compiling from the original Postgres C code didn't give significant
improvement because of the inherent inefficiency of the Volcano-style model.
After making a switch to the push model we expect to achieve a speedup
comparable to the stand-alone JIT, but using the same code for both the JIT
and the interpreter.
Also, while working on this project, we are likely to reveal and fix other
weak places of the current query executor. The Volcano-style model is known
to have inadequate performance characteristics [7][8], e.g. function call
overhead, and we should deal with it anyway. We also plan to make relatively
small patches, which will optimize the redundant reload of the internal state
in the current pull model.
Many DB systems with support for full query compilation (e.g. LegoBase [9],
Hekaton [10]) implement it in a push-based manner.
Also we have seen on the mailing list that Kumar Rajeev had been
investigating this idea too, and he reported that the results were
impressive (unfortunately, without specifying more details):
/messages/by-id/BF2827DCCE55594C8D7A8F7FFD3AB77159A9B904@szxeml521-mbs.china.huawei.com
References
[1]: Graefe G.. Volcano — an extensible and parallel query evaluation system. IEEE Trans. Knowl. Data Eng.,6(1): 120–135, 1994.
[2]: Efficiently Compiling Efficient Query Plans for Modern Hardware,
http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
[3]: Compiling Database Queries into Machine Code,
http://sites.computer.org/debull/A14mar/p3.pdf
[4]: https://docs.google.com/presentation/d/1R0po7_Wa9fym5U9Y5qHXGlUi77nSda2LlZXPuAxtd-M/pub?slide=id.g9b338944f_4_131
[5]: PostgreSQL with JIT compiler for expressions,
https://github.com/ispras/postgres
[6]: LLVM Cauldron, slides,
http://llvm.org/devmtg/2016-09/slides/Melnik-PostgreSQLLLVM.pdf
[7]: MonetDB/X100: Hyper-Pipelining Query Execution
http://cidrdb.org/cidr2005/papers/P19.pdf
[8]: Vectorization vs. Compilation in Query Execution,
https://pdfs.semanticscholar.org/dcee/b1e11d3b078b0157325872a581b51402ff66.pdf
[9]: http://www.vldb.org/pvldb/vol7/p853-klonatos.pdf
[10]: https://www.microsoft.com/en-us/research/wp-content/uploads/2013/06/Hekaton-Sigmod2013-final.pdf
--
Best Regards,
Ruben. <ruben@ispras.ru>
ISP RAS.
On 2017/02/06 20:51, Ruben Buchatskiy wrote:
Also we have seen in the mailing list that Kumar Rajeev had been
investigating this idea too, and he reported that the results were
impressive (unfortunately, without specifying more details):
/messages/by-id/BF2827DCCE55594C8D7A8F7FFD3AB77159A9B904@szxeml521-mbs.china.huawei.com
You might also want to take a look at some of the ongoing work in this area:
WIP: Faster Expression Processing and Tuple Deforming (including JIT)
/messages/by-id/20161206034955.bh33paeralxbtluv@alap3.anarazel.de
Thanks,
Amit
Greetings,
* Amit Langote (Langote_Amit_f8@lab.ntt.co.jp) wrote:
On 2017/02/06 20:51, Ruben Buchatskiy wrote:
Also we have seen in the mailing list that Kumar Rajeev had been
investigating this idea too, and he reported that the results were
impressive (unfortunately, without specifying more details):
/messages/by-id/BF2827DCCE55594C8D7A8F7FFD3AB77159A9B904@szxeml521-mbs.china.huawei.com
You might also want to take a look at some of the ongoing work in this area:
WIP: Faster Expression Processing and Tuple Deforming (including JIT)
/messages/by-id/20161206034955.bh33paeralxbtluv@alap3.anarazel.de
Yes, exactly that. Please review what's been currently done and,
ideally, have someone like Andres comment on your plan.
Perhaps you could arrange something with him as the mentor, since it
looked like you didn't have any specific mentors listed in a quick look.
That's definitely something that will be needed to include this project.
Thanks!
Stephen
* Stephen Frost (sfrost@snowman.net) wrote:
* Amit Langote (Langote_Amit_f8@lab.ntt.co.jp) wrote:
On 2017/02/06 20:51, Ruben Buchatskiy wrote:
Also we have seen in the mailing list that Kumar Rajeev had been
investigating this idea too, and he reported that the results were
impressive (unfortunately, without specifying more details):
/messages/by-id/BF2827DCCE55594C8D7A8F7FFD3AB77159A9B904@szxeml521-mbs.china.huawei.com
You might also want to take a look at some of the ongoing work in this area:
WIP: Faster Expression Processing and Tuple Deforming (including JIT)
/messages/by-id/20161206034955.bh33paeralxbtluv@alap3.anarazel.de
Yes, exactly that. Please review what's been currently done and,
ideally, have someone like Andres comment on your plan.
Perhaps you could arrange something with him as the mentor, since it
looked like you didn't have any specific mentors listed in a quick look.
That's definitely something that will be needed to include this project.
Apologies, looks like you do have a couple of mentors listed on the
wiki, so that looks good.
Thanks!
Stephen
Ruben,
* Ruben Buchatskiy (ruben@ispras.ru) wrote:
Difficulty Level
Moderate-level; however, microoptimizations might be hard.
Probably it will also be hard to keep the whole architecture as clean as it is
now.
The above difficulty level looks fine, but doesn't match what's on the
wiki. What's on the wiki looks like a copy/paste from one of the
SSI-related items.
Please fix.
Thanks!
Stephen
On Mon, Feb 6, 2017 at 6:51 AM, Ruben Buchatskiy <ruben@ispras.ru> wrote:
2017-01-10 12:53 GMT+03:00 Alexander Korotkov <a.korotkov@postgrespro.ru>:
1. What project ideas we have?
We would like to propose a project on rewriting the PostgreSQL executor from
the traditional Volcano-style [1] to the so-called push-based architecture as
implemented in Hyper [2][3] and VitesseDB [4]. The idea is to reverse the
direction of data flow control: instead of pulling up tuples one-by-one with
ExecProcNode(), we suggest pushing them from below to top until a blocking
operator (e.g. Aggregation) is encountered. There’s a good example and more
detailed explanation for this approach in [2].
I think this is very possibly a good idea but extremely unlikely to be
something that a college student or graduate student can complete in
one summer. More like an existing expert developer and a year of
doing not much else.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
2017-02-08 17:06 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
On Mon, Feb 6, 2017 at 6:51 AM, Ruben Buchatskiy <ruben@ispras.ru> wrote:
2017-01-10 12:53 GMT+03:00 Alexander Korotkov <a.korotkov@postgrespro.ru>:
1. What project ideas we have?
We would like to propose a project on rewriting the PostgreSQL executor from
the traditional Volcano-style [1] to the so-called push-based architecture as
implemented in Hyper [2][3] and VitesseDB [4]. The idea is to reverse the
direction of data flow control: instead of pulling up tuples one-by-one with
ExecProcNode(), we suggest pushing them from below to top until a blocking
operator (e.g. Aggregation) is encountered. There’s a good example and more
detailed explanation for this approach in [2].
I think this is very possibly a good idea but extremely unlikely to be
something that a college student or graduate student can complete in
one summer. More like an existing expert developer and a year of
doing not much else.
+1
Pavel
The expected result for this work is a push-based executor working for many
types of queries (currently we aim at TPC-H), but it's unlikely to be a
production-ready patch to commit into mainline at that stage. This work is
the actual topic for our student's thesis, so he has already started, and
has working prototypes for very simple plans. Also, he won't be working on
this alone, but rather will make use of the support and experience of our
team (as well as the mentor's help).
So this is not about replacing the current pull executor right away, but
rather about developing a working prototype to find out the benefits of
switching from the pull to the push model (for both the interpreter and the
LLVM JIT).
On Wed, Feb 8, 2017 at 7:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Feb 6, 2017 at 6:51 AM, Ruben Buchatskiy <ruben@ispras.ru> wrote:
2017-01-10 12:53 GMT+03:00 Alexander Korotkov <a.korotkov@postgrespro.ru>:
1. What project ideas we have?
We would like to propose a project on rewriting the PostgreSQL executor from
the traditional Volcano-style [1] to the so-called push-based architecture as
implemented in Hyper [2][3] and VitesseDB [4]. The idea is to reverse the
direction of data flow control: instead of pulling up tuples one-by-one with
ExecProcNode(), we suggest pushing them from below to top until a blocking
operator (e.g. Aggregation) is encountered. There’s a good example and more
detailed explanation for this approach in [2].
I think this is very possibly a good idea but extremely unlikely to be
something that a college student or graduate student can complete in
one summer. More like an existing expert developer and a year of
doing not much else.
--
Best regards,
Dmitry
Hi all!
It seems that PostgreSQL has been accepted as a GSoC mentoring organization
this year!
https://summerofcode.withgoogle.com/organizations/4558465230962688/
Congratulations!
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Feb 28, 2017 at 11:42 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
Hi all!
It seems that PostgreSQL has been accepted as a GSoC mentoring organization
this year!
https://summerofcode.withgoogle.com/organizations/4558465230962688/
Congratulations!
Very cool!
By the way, that page claims that PostgreSQL runs on Irix and Tru64,
which hasn't been true for a few years.
--
Thomas Munro
http://www.enterprisedb.com
On 2/27/17 4:52 PM, Thomas Munro wrote:
By the way, that page claims that PostgreSQL runs on Irix and Tru64,
which hasn't been true for a few years.
There could be a GSoC project to add support for those back in... ;P
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
On Thu, Mar 2, 2017 at 3:45 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 2/27/17 4:52 PM, Thomas Munro wrote:
By the way, that page claims that PostgreSQL runs on Irix and Tru64,
which hasn't been true for a few years.
There could be a GSoC project to add support for those back in... ;P
Greg Stark and Tom Lane did some work to fix problems in our VAX
support a few years ago (try git log --grep=VAX), but I don't think
Greg ever got it fully working. There could be some point to putting
more effort into making PostgreSQL scale to very small systems. We
seem to run pretty well even on very low-end hardware like a Raspberry
Pi, but there's always something lower-end, and having compile or
runtime options that lower our memory footprint would probably be
useful as the natural opposite of the scalability and parallel query
work we've been doing over the last few years. Whether it's also
useful to try to support running the system on unobtainable operating
systems is less clear to me.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Mar 2, 2017 at 3:45 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 2/27/17 4:52 PM, Thomas Munro wrote:
By the way, that page claims that PostgreSQL runs on Irix and Tru64,
which hasn't been true for a few years.
There could be a GSoC project to add support for those back in... ;P
... Whether it's also
useful to try to support running the system on unobtainable operating
systems is less clear to me.
I seriously doubt that we'd take patches to run on non-mainstream OSes
without a concomitant promise to support buildfarm animals running such
OSes for the foreseeable future. Without that we don't know if the
patches still work even a week after they're committed. We killed the
above-mentioned OSes mainly for lack of any such animals, IIRC.
regards, tom lane