mailing list archiver chewing patches
Tim Bunce's recent patch has been mangled apparently by the list
archives. He sent it as an attachment, and that's how I have it in my
mailbox, so why isn't it appearing as such in the web archive so that it
can be nicely downloaded? See
<http://archives.postgresql.org/message-id/20100108124613.GL2505@timac.local>.
It's happened to other people as well:
<http://archives.postgresql.org/message-id/4B02D3E4.1040107@hut.fi>
Reviewers and others shouldn't have to c&p patches from web pages,
especially when it will be horribly line wrapped etc. Can we stop this
happening somehow?
cheers
andrew
Andrew Dunstan wrote:
Tim Bunce's recent patch has been mangled apparently by the list
archives. He sent it as an attachment, and that's how I have it in
my mailbox, so why isn't it appearing as such in the web archive so
that it can be nicely downloaded? See <http://archives.postgresql.org/message-id/20100108124613.GL2505@timac.local>.
It's happened to other people as well:
<http://archives.postgresql.org/message-id/4B02D3E4.1040107@hut.fi>
Reviewers and others shouldn't have to c&p patches from web pages,
especially when it will be horribly line wrapped etc. Can we stop
this happening somehow?
Try this
http://archives.postgresql.org/msgtxt.php?id=20100108124613.GL2505@timac.local
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote:
Andrew Dunstan wrote:
Tim Bunce's recent patch has been mangled apparently by the list
archives. He sent it as an attachment, and that's how I have it in
my mailbox, so why isn't it appearing as such in the web archive so
that it can be nicely downloaded? See <http://archives.postgresql.org/message-id/20100108124613.GL2505@timac.local>.
It's happened to other people as well:
<http://archives.postgresql.org/message-id/4B02D3E4.1040107@hut.fi>
Reviewers and others shouldn't have to c&p patches from web pages,
especially when it will be horribly line wrapped etc. Can we stop
this happening somehow?
Try this
http://archives.postgresql.org/msgtxt.php?id=20100108124613.GL2505@timac.local
This was previously broken for a lot of emails, but I just fixed some of
it, and it seems to work for the vast majority of our emails (and
certainly for all emails that matter).
The other point related to this is that each email should have a link
pointing to its text/plain version. This used to be present, but it got
broken (I think) at the same time that the anti-email-harvesting measure
got broken. I'm going to look at that next.
Let me know if you find something broken with this style of link.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Sat, Jan 09, 2010 at 02:17:27AM -0300, Alvaro Herrera wrote:
Alvaro Herrera wrote:
Andrew Dunstan wrote:
Tim Bunce's recent patch has been mangled apparently by the list
archives. He sent it as an attachment, and that's how I have it in
my mailbox, so why isn't it appearing as such in the web archive so
that it can be nicely downloaded? See <http://archives.postgresql.org/message-id/20100108124613.GL2505@timac.local>.
It's happened to other people as well:
<http://archives.postgresql.org/message-id/4B02D3E4.1040107@hut.fi>
Reviewers and others shouldn't have to c&p patches from web pages,
especially when it will be horribly line wrapped etc. Can we stop
this happening somehow?
Try this
http://archives.postgresql.org/msgtxt.php?id=20100108124613.GL2505@timac.local
That looks like it dumps the raw message. That'll cause problems for any
messages using quoted-printable encoding. I'd hazard a guess it also
won't do the right thing for non-charset=us-ascii emails/attachments.
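To illustrate the concern about raw dumps, here is a minimal sketch using Python's standard `email` package. The sample message, its addresses, and the helper name `decoded_body` are made up for illustration; nothing here is taken from the archive software. It shows that a quoted-printable body has to be decoded (and its charset applied) before a patch in it is usable.

```python
from email import message_from_bytes, policy

# Hypothetical raw message: a quoted-printable, ISO-8859-1 text body.
# In the raw form every "=" in the patch is escaped as "=3D".
RAW = (
    b"From: someone@example.org\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"-    if (a =3D=3D b)\r\n"
    b"+    if (a !=3D b)  /* caf=E9 */\r\n"
)


def decoded_body(raw: bytes) -> str:
    """Return the text/plain body with transfer encoding and charset undone."""
    msg = message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    # get_content() reverses quoted-printable/base64 and decodes the charset.
    return part.get_content()
```

Saving `RAW` as-is would leave `=3D` sequences in the patch, whereas `decoded_body(RAW)` yields text that a tool like `patch` could read.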
This was previously broken for a lot of emails, but I just fixed some of
it, and it seems to work for the vast majority of our emails (and
certainly for all emails that matter).
The other point related to this is that each email should have a link
pointing to its text/plain version. This used to be present, but it got
broken (I think) at the same time that the anti-email-harvesting measure
got broken. I'm going to look at that next.
Let me know if you find something broken with this style of link.
What's needed is a) a download link for each attachment, regardless of the
kind of attachment, and b) the download link should download the content
of the attachment in a way that's directly usable.
For example, see http://archives.postgresql.org/pgsql-hackers/2010-01/msg00589.php
Looking at the raw version of the original message
http://archives.postgresql.org/msgtxt.php?id=757953.70187.qm@web29001.mail.ird.yahoo.com
That message has a patch as an attachment:
Content-Type: application/octet-stream; name="patch_bit.patch"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="patch_bit.patch"
It gets a link in the archive (because it's a non-text content-type I presume):
http://archives.postgresql.org/pgsql-hackers/2010-01/bin5ThVOJC3jI.bin
but the link doesn't work well. The url ends with .bin and the http
response content-type is Content-Type: application/octet-stream so
downloaders get a .bin file instead of the original .patch file.
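As a rough sketch of what "directly usable" could mean for such an attachment, the following uses Python's standard `email` package to recover each attachment together with the filename from its Content-Disposition header. The sample message and the function names are hypothetical; this is not how MHonArc works, only an illustration of the information already present in the message.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Made-up patch content standing in for a real attachment.
PATCH = b"--- a/bit.c\n+++ b/bit.c\n@@ -1 +1 @@\n-old\n+new\n"


def build_sample_message() -> bytes:
    """Build a message shaped like the one above: octet-stream, base64, named."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"
    msg["Subject"] = "patch attached"
    msg.set_content("Patch attached.\n")
    msg.add_attachment(
        PATCH,
        maintype="application",
        subtype="octet-stream",
        filename="patch_bit.patch",
    )
    return msg.as_bytes()


def extract_attachments(raw: bytes):
    """Return (filename, content type, decoded bytes) for each attachment."""
    msg = message_from_bytes(raw, policy=policy.default)
    result = []
    for part in msg.iter_attachments():
        # Fall back to a generic name only when the sender gave none.
        name = part.get_filename() or "attachment.bin"
        result.append((name, part.get_content_type(), part.get_content()))
    return result
```

With the filename available, a download link would not need to fall back to a generic `.bin` name.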
It seems that people wanting to send in a patch have two options: send
it as text/(something) so it's readable on the archive web page but not
copy-n-paste'able because of wordwrapping, or set it as
application/octet-stream so it's downloadable but not readable on the
web page.
Let me know if I've misunderstood anything.
Some suggestions:
- Provide links for all attachments, whether text/* or not.
- For text/* types show the content inline verbatim, don't wrap the text.
- If the attachment has a Content-Disposition with a filename then
append that to the url. It could simply be a fake 'path info':
.../2010-01/bin5ThVOJC3jI.bin/patch_bit.patch
- Instead of "Description: Binary data" on the web page, give the
values of the Content-Type and Content-Disposition headers.
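The "fake path info" suggestion above could look roughly like the sketch below. The helper names (`safe_name`, `download_url`, `response_headers`) are hypothetical, and sending a Content-Disposition header in the HTTP response is an added assumption that goes beyond what the list proposes.

```python
import posixpath
from urllib.parse import quote


def safe_name(filename: str) -> str:
    """Keep only the last path component so a sender cannot inject a path."""
    name = posixpath.basename(filename.replace("\\", "/"))
    return name or "attachment.bin"


def download_url(part_url: str, filename: str) -> str:
    """Append the original filename to the attachment URL as trailing path info."""
    return part_url.rstrip("/") + "/" + quote(safe_name(filename))


def response_headers(content_type: str, filename: str) -> dict:
    """Headers a server might send so the browser saves under the original name."""
    name = safe_name(filename).replace('"', "")
    return {
        "Content-Type": content_type,
        "Content-Disposition": 'attachment; filename="%s"' % name,
    }
```

For the example message, `download_url("http://archives.postgresql.org/pgsql-hackers/2010-01/bin5ThVOJC3jI.bin", "patch_bit.patch")` produces the URL form shown in the suggestion.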
Tim.
p.s. For background... I'm writing an email to the dbi-users &
dbi-announce mailing lists (~2000 & ~5000 users last time I checked)
asking anyone who might be interested to help review the plperl feature
patch and encouraging them to contribute to the commitfest review
process for other patches. It's important that it's *very* easy for
these newcomers to follow simple instructions to get involved.
I was hoping to be able to use an archives.postgresql.org url to the
message with the patch to explain what the patch does _and_ provide a
download link. It seems I'll have to upload the patch somewhere else.
Tim Bunce wrote:
It seems that people wanting to send in a patch have two options: send
it as text/(something) so it's readable on the archive web page but not
copy-n-paste'able because of wordwrapping, or set it as
application/octet-stream so it's downloadable but not readable on the
web page.
That is assuming that the MUA gives you the option of specifying the
attachment MIME type. Many (including mine) do not. It would mean an
extra step - I'd have to gzip each patch or something like that. That
would be unfortunate, as well as imposing extra effort, because it would
make the patch not display inline in many MUAs (again, like mine).
cheers
andrew
Tim Bunce wrote:
Try this
http://archives.postgresql.org/msgtxt.php?id=20100108124613.GL2505@timac.local
That looks like it dumps the raw message. That'll cause problems for any
messages using quoted-printable encoding. I'd hazard a guess it also
won't do the right thing for non-charset=us-ascii emails/attachments.
Yeah. Grab it and open it as an mbox.
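For readers without an MUA that opens mbox files, a minimal sketch with Python's standard `mailbox` module follows. It assumes the saved file really is in mbox format (each message introduced by a "From " line), which has not been verified for the msgtxt.php output; the demo message and all function names are made up.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def list_attachments(mbox_path):
    """Return (Message-ID, filename, decoded bytes) for every named part."""
    found = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            for part in msg.walk():
                name = part.get_filename()
                if name:
                    # decode=True undoes base64 / quoted-printable.
                    found.append((msg["Message-ID"], name, part.get_payload(decode=True)))
    finally:
        box.close()
    return found


def demo():
    """Write a one-message mbox to a temp dir and read the attachment back."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"
    msg["Message-ID"] = "<demo-1@example.org>"
    msg.set_content("see attached\n")
    msg.add_attachment(b"--- a\n+++ b\n", maintype="application",
                       subtype="octet-stream", filename="demo.patch")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.mbox")
        box = mailbox.mbox(path)
        box.add(msg.as_bytes())
        box.flush()
        box.close()
        return list_attachments(path)
```

Calling `list_attachments()` on a downloaded file would give the decoded patch bytes, which can then be written out under the original filename.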
What's needed is a) a download link for each attachment, regardless of the
kind of attachment, and b) the download link should download the content
of the attachment in a way that's directly usable.
Yeah, well, that's a bit outside what I am able to do, unless you can
get a MHonArc expert somewhere who can help us figure out how to
set it up for these requirements.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Andrew Dunstan <andrew@dunslane.net> writes:
That is assuming that the MUA gives you the option of specifying the
attachment MIME type. Many (including mine) do not. It would mean an extra
step - I'd have to gzip each patch or something like that. That would be
unfortunate, as well as imposing extra effort, because it would make the
patch not display inline in many MUAs (again, like mine).
Bad MUA, change MUA, or what they say…
More seriously though, it's not the first time we're having some
difficulties with the MHonArc setup, and I think it's also related to
the poor thread following on the archives website at month boundaries.
MHonArc (http://hydra.nac.uci.edu/indiv/ehood/mhonarc.html) seems to be
about converting the mails into HTML pages and offering a web
interface to browse them, with some indexing and search
facilities.
Are our indexing and searches provided by MHonArc or maintained by the
community? How helpful would it be to consider alternatives, such as AOX
(which runs atop PostgreSQL and would offer an anonymous IMAP facility
over the archives)?
Of course it'll boil down to who's maintaining the current solution and
how much time is allocated to this; the solution research and migration
would have to fit in there I suppose. Same as pgfoundry. But still,
should we talk about it?
Regards,
--
dim
Dimitri Fontaine wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
That is assuming that the MUA gives you the option of specifying the
attachment MIME type. Many (including mine) do not. It would mean an extra
step - I'd have to gzip each patch or something like that. That would be
unfortunate, as well as imposing extra effort, because it would make the
patch not display inline in many MUAs (again, like mine).
Bad MUA, change MUA, or what they say…
More seriously though, it's not the first time we're having some
difficulties with the MHonArc setup, and I think it's also related to
the poor thread following on the archives website at month boundaries.
Absolutely. The month boundary problem boils down to the fact that
Mhonarc does not scale very well, so we can't have mboxes that are too
large. This is why most people split their archives per month, and then
each month is published as an independent Mhonarc output archive. It's
a horrid solution.
Are our indexing and searches provided by MHonArc or maintained by the
community?
Searches are completely external to mhonarc.
How helpful would it be to consider alternatives, such as AOX (which runs
atop PostgreSQL and would offer an anonymous IMAP facility over the
archives)?
Of course it'll boil down to who's maintaining the current solution and
how much time is allocated to this; the solution research and migration
would have to fit in there I suppose. Same as pgfoundry. But still,
should we talk about it?
There's some talk about writing our own archiving system,
database-backed. There have been a few false starts but no concrete
result so far. We need a lot more manpower invested in this problem.
If there's interest, let's talk about it.
My daughter was born yesterday and I'm having a bit of a calm before the
storm because she's not coming home until Tuesday or so (at this time of
the day, that is, because I have to take care of the other daughter).
I'll probably be away for (at least) a week when she does; and I'll
probably have somewhat of a shortage of spare time after that.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
2010/1/11 Alvaro Herrera <alvherre@commandprompt.com>:
Dimitri Fontaine wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
That is assuming that the MUA gives you the option of specifying the
attachment MIME type. Many (including mine) do not. It would mean an extra
step - I'd have to gzip each patch or something like that. That would be
unfortunate, as well as imposing extra effort, because it would make the
patch not display inline in many MUAs (again, like mine).
Bad MUA, change MUA, or what they say…
More seriously though, it's not the first time we're having some
difficulties with the MHonArc setup, and I think it's also related to
the poor thread following on the archives website at month boundaries.
Absolutely. The month boundary problem boils down to the fact that
Mhonarc does not scale very well, so we can't have mboxes that are too
large. This is why most people split their archives per month, and then
each month is published as an independent Mhonarc output archive. It's
a horrid solution.
Yeah.
Are our indexing and searches provided by MHonArc or maintained by the
community?
Searches are completely external to mhonarc.
It is, but it's tied into the format of the URLs and the format of the
actual messages in order to be more efficient. But it should be fairly
easy to adapt it to some other base system if we want.
How helpful would it be to consider alternatives, such as AOX (which runs
atop PostgreSQL and would offer an anonymous IMAP facility over the
archives)?
Of course it'll boil down to who's maintaining the current solution and
how much time is allocated to this; the solution research and migration
would have to fit in there I suppose. Same as pgfoundry. But still,
should we talk about it?
There's some talk about writing our own archiving system,
database-backed. There have been a few false starts but no concrete
result so far. We need a lot more manpower invested in this problem.
If there's interest, let's talk about it.
Yeah, definitely, let's talk about it. Anything that gives us an
efficient backend with a good API is interesting (SQL is a reasonably
good API. Not so sure about IMAP, since it is a bit too focused on
single messages IIRC). Particularly, something that can separate
frontend and backend (can still be on the same machine of course, I'm
talking conceptually) seems to be a lot more flexible, which we'd
like.
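A very rough sketch of what a database-backed archive with an SQL API might look like is given below. It uses Python's `sqlite3` with an in-memory database purely as a stand-in for PostgreSQL, and every table, column, and function name is hypothetical; none of it describes the prototype mentioned in this thread. Because replies are linked by message ID rather than by file, a thread that crosses a month boundary stays in one piece, and attachments keep their original filenames.

```python
import sqlite3

SCHEMA = """
CREATE TABLE messages (
    message_id  TEXT PRIMARY KEY,
    in_reply_to TEXT,
    sent_at     TEXT NOT NULL,
    subject     TEXT NOT NULL
);
CREATE TABLE attachments (
    message_id   TEXT NOT NULL REFERENCES messages(message_id),
    filename     TEXT NOT NULL,
    content_type TEXT NOT NULL,
    data         BLOB NOT NULL
);
"""

# Walk the reply chain from a root message, regardless of month.
THREAD_QUERY = """
WITH RECURSIVE thread(message_id, sent_at) AS (
    SELECT message_id, sent_at FROM messages WHERE message_id = ?
    UNION ALL
    SELECT m.message_id, m.sent_at
    FROM messages m JOIN thread t ON m.in_reply_to = t.message_id
)
SELECT message_id FROM thread ORDER BY sent_at
"""


def open_archive():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def thread_ids(conn, root_id):
    """Message IDs of the root and all of its direct and indirect replies."""
    return [row[0] for row in conn.execute(THREAD_QUERY, (root_id,))]


def attachment_names(conn, message_id):
    rows = conn.execute(
        "SELECT filename FROM attachments WHERE message_id = ? ORDER BY filename",
        (message_id,),
    )
    return [row[0] for row in rows]


def load_demo(conn):
    """Insert a made-up thread that spans a month boundary."""
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?, ?)",
        [
            ("<root@example.org>", None, "2009-12-31T23:50:00", "patch v1"),
            ("<reply@example.org>", "<root@example.org>", "2010-01-01T00:10:00", "Re: patch v1"),
            ("<other@example.org>", None, "2010-01-02T09:00:00", "unrelated"),
        ],
    )
    conn.execute(
        "INSERT INTO attachments VALUES (?, ?, ?, ?)",
        ("<root@example.org>", "demo.patch", "text/x-diff", b"--- a\n+++ b\n"),
    )
```

A web frontend could then be a thin layer over queries like these, which matches the frontend/backend separation described above.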
As for AOX, my understanding is that it is no longer maintained, so
I'd be worried about choosing such a solution for a complex problem.
But it's open for discussion.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Alvaro Herrera <alvherre@commandprompt.com> writes:
Absolutely. The month boundary problem boils down to the fact that
Mhonarc does not scale very well, so we can't have mboxes that are too
large. This is why most people split their archives per month, and then
each month is published as an independent Mhonarc output archive. It's
a horrid solution.
Are our indexing and searches provided by MHonArc or maintained by the
community?
Searches are completely external to mhonarc.
Changing the MHonArc solution would probably mean adapting them, I
guess, or proposing a new solution with compatible output for the
searching to still work…
How helpful would it be to consider alternatives, such as AOX (which runs
atop PostgreSQL and would offer an anonymous IMAP facility over the
archives)?
Of course it'll boil down to who's maintaining the current solution and
how much time is allocated to this; the solution research and migration
would have to fit in there I suppose. Same as pgfoundry. But still,
should we talk about it?
There's some talk about writing our own archiving system,
database-backed. There have been a few false starts but no concrete
result so far. We need a lot more manpower invested in this problem.
If there's interest, let's talk about it.
AOX is already a database backed email solution, offering an archive
page with searching. I believe the searching is backed by tsearch
indexing. That's why I think it'd be suitable.
They already archive and offer search over one of our mailing lists, and
from there it seems like we'd only miss the user interface bits:
http://archives.aox.org/archives/pgsql-announce
I hope the UI bits are not the most time-consuming part.
Is there someone with enough time to install aox somewhere and have it
subscribed to our lists?
My daughter was born yesterday and I'm having a bit of a calm before the
storm because she's not coming home until Tuesday or so (at this time of
the day, that is, because I have to take care of the other daughter).
I'll probably be away for (at least) a week when she does; and I'll
probably have somewhat of a shortage of spare time after that.
Ahaha :)
IME it's not so much the shortage of spare time that ruins you as the
lack of energy when you do get this precious resource back, a few
small pieces at a time.
Regards,
--
dim
Hi,
On 11/01/2010 11:18, Dimitri Fontaine wrote:
AOX is already a database backed email solution, offering an archive
page with searching. I believe the searching is backed by tsearch
indexing. That's why I think it'd be suitable.
They already archive and offer search over one of our mailing lists, and
from there it seems like we'd only miss the user interface bits:
http://archives.aox.org/archives/pgsql-announce
I hope the UI bits are not the most time-consuming part.
Is there someone with enough time to install aox somewhere and have it
subscribed to our lists?
I recall having tried AOX a long time ago but I can't remember the
reason why I was not satisfied. I guess I can give another try by
setting up a test ML archive.
My daughter was born yesterday and I'm having a bit of a calm before the
storm because she's not coming home until Tuesday or so (at this time of
the day, that is, because I have to take care of the other daughter).
I'll probably be away for (at least) a week when she does; and I'll
probably have somewhat of a shortage of spare time after that.
BTW, congrats Alvaro!
Cheers
--
Matteo Beccati
Development & Consulting - http://www.beccati.com/
On Mon, Jan 11, 2010 at 5:23 PM, Matteo Beccati <php@beccati.com> wrote:
Hi,
On 11/01/2010 11:18, Dimitri Fontaine wrote:
AOX is already a database backed email solution, offering an archive
page with searching. I believe the searching is backed by tsearch
indexing. That's why I think it'd be suitable.
They already archive and offer search over one of our mailing lists, and
from there it seems like we'd only miss the user interface bits:
http://archives.aox.org/archives/pgsql-announce
I hope the UI bits are not the most time-consuming part.
Is there someone with enough time to install aox somewhere and have it
subscribed to our lists?
I recall having tried AOX a long time ago but I can't remember the reason
why I was not satisfied. I guess I can give another try by setting up a test
ML archive.
I tried it too, before I started writing the new prototype archiver
from scratch. I too forget why I gave up on it, but it was a strong
enough reason for me to start coding from scratch.
BTW, we only need to replace the archiver/display code. The search
works well already.
--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
On 11/01/2010 12:58, Dave Page wrote:
On Mon, Jan 11, 2010 at 5:23 PM, Matteo Beccati<php@beccati.com> wrote:
I recall having tried AOX a long time ago but I can't remember the reason
why I was not satisfied. I guess I can give another try by setting up a test
ML archive.
I tried it too, before I started writing the new prototype archiver
from scratch. I too forget why I gave up on it, but it was a strong
enough reason for me to start coding from scratch.
BTW, we only need to replace the archiver/display code. The search
works well already.
It took me no more than 10 minutes to set up AOX and hook it up to a
domain. An email account is now subscribed to the hackers ML.
I'll try to estimate how hard it could be to write a web app that
displays the archive from the db, even though I'm not sure that this is
a good way to proceed.
Cheers
--
Matteo Beccati
Development & Consulting - http://www.beccati.com/
Magnus Hagander <magnus@hagander.net> writes:
As for AOX, my understanding is that it is no longer maintained, so
I'd be worried about choosing such a solution for a complex problem.
But it's open for discussion.
Ouch.
--
dim
Dave Page <dpage@pgadmin.org> writes:
I recall having tried AOX a long time ago but I can't remember the reason
why I was not satisfied. I guess I can give another try by setting up a test
ML archive.
I tried it too, before I started writing the new prototype archiver
from scratch. I too forget why I gave up on it, but it was a strong
enough reason for me to start coding from scratch.
BTW, we only need to replace the archiver/display code. The search
works well already.
What does the current archiver look like? A PG database containing the raw
mails and attachments? If that's the case the missing piece would be to
plug a browsing UI atop of that, right?
Regards,
--
dim
2010/1/11 Dimitri Fontaine <dfontaine@hi-media.com>:
Dave Page <dpage@pgadmin.org> writes:
I recall having tried AOX a long time ago but I can't remember the reason
why I was not satisfied. I guess I can give another try by setting up a test
ML archive.
I tried it too, before I started writing the new prototype archiver
from scratch. I too forget why I gave up on it, but it was a strong
enough reason for me to start coding from scratch.
BTW, we only need to replace the archiver/display code. The search
works well already.
What does the current archiver look like? A PG database containing the raw
mails and attachments? If that's the case the missing piece would be to
plug a browsing UI atop of that, right?
No, the current archiver is a set of MBOX files that are processed
incrementally by mhonarc.
(yes, this is why it doesn't scale)
*search* is in a postgresql database, but it doesn't contain the
entire messages - doesn't have attachments for example - only the
parts it has web-scraped off the current archives.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Dimitri Fontaine <dfontaine@hi-media.com> writes:
Magnus Hagander <magnus@hagander.net> writes:
As for AOX, my understanding is that it is no longer maintained, so
I'd be worried about choosing such a solution for a complex problem.
But it's open for discussion.
Ouch.
It seems that the company backing the development is dead, but the
developers are still working on the product in their spare time. New
release ahead.
They're not working on the archive UI part.
--
dim
(Many thanks to Dimitri for bringing this thread to my attention.)
At 2010-01-11 10:46:10 +0100, magnus@hagander.net wrote:
As for AOX, my understanding is that it is no longer maintained, so
I'd be worried about choosing such a solution for a complex problem.
I'll keep this short: Oryx, the company behind Archiveopteryx (aox), is
no longer around, but the software is still maintained. The developers
(myself included) are still interested in keeping it alive. It's been a
while since the last release, but it'll be ready soon. If you're having
any sort of problems with it, write to me, and I'll help you.
(That said, we're not working on the web interface. It did work, in its
limited fashion, but it's not feature complete; and I need to find some
paying work, so it's not a priority. That, and some health problems, are
also why I haven't been active on the pg lists for a while.)
Feel free to write to me off-list for more.
-- ams
Magnus Hagander wrote:
No, the current archiver is a set of MBOX files that are processed
incrementally by mhonarc.
(yes, this is why it doesn't scale)
*search* is in a postgresql database, but it doesn't contain the
entire messages - doesn't have attachments for example - only the
parts it has web-scraped off the current archives.
Fixing this mess and giving us decent archives with guaranteed
downloadable patches and good search would be a nice job for someone who
wants to contribute without having to cut or review core code.
cheers
andrew
On 11/01/2010 15:00, Abhijit Menon-Sen wrote:
I'll keep this short: Oryx, the company behind Archiveopteryx (aox), is
no longer around, but the software is still maintained. The developers
(myself included) are still interested in keeping it alive. It's been a
while since the last release, but it'll be ready soon. If you're having
any sort of problems with it, write to me, and I'll help you.
That's good news indeed for the project, AOX seems to be working fine on
my server. I've had a few IMAP glitches, but it seems to live happily
with my qmail and stores the emails on the db, fulfilling my current needs.
So, I've decided to spend a bit more time on this and here is a proof of
concept web app that displays mailing list archives reading from the AOX
database:
Please take it as an exercise I've made trying to learn how to use
symfony this afternoon. It's not feature complete, nor probably very
scalable, but at least it features attachment download ;)
http://archives.beccati.org/pgsql-hackers/message/37
Cheers
--
Matteo Beccati
Development & Consulting - http://www.beccati.com/