[Jack-Devel] List Archives

Thomas Brand
Hi all,

the list archive
http://lists.jackaudio.org/private.cgi/jack-devel-jackaudio.org/
starts from April 2015.

- Does anybody know about earlier archives?

- Does anybody have (personal) copies of the archives?

Any hints are valuable, thanks

Greetings
Thomas


_______________________________________________
Jack-Devel mailing list
[hidden email]
http://lists.jackaudio.org/listinfo.cgi/jack-devel-jackaudio.org

Re: List Archives

Thomas Brand
On Tue, February 19, 2019 23:43, Adrian Knoth wrote:

> Hi!
>
>
>> the list archive
>> http://lists.jackaudio.org/private.cgi/jack-devel-jackaudio.org
>> starts from April 2015.
>>
>> - Does anybody know about earlier archives?
>> - Does anybody have (personal) copies of the archives?
>>
>
> I've found a copy (including the LAD list) on my machine.
>
>
> Goes back to at least 2011.
>
>
> You can download it here:
>
>
> http://adi.loris.tv/linuxaudio.tar.xz
>
>
> Let me know if you make this publicly available somewhere. You probably
> need to disentangle the multiple lists in this archive by checking some
> header fields.
>

Thank you Adrian! I'll have a look.
Currently the mailing list archive is for members only. Then again
nabble.com seems to have a copy of the whole archive.
Would there be any issue for anybody if all archives are published
publicly (say, in a GitHub repository)? Any thoughts?

Greetings



Re: List Archives

John Rigg
On Wed, Feb 20, 2019 at 12:41:08AM +0100, Thomas Brand wrote:
> Currently the mailing list archive is for members only. Then again
> nabble.com seems to have a copy of the whole archive.
> Would there be any issue for anybody if all archives are published
> publicly (say, in a GitHub repository)? Any thoughts?

If you do publish it please make sure email addresses are
obscured to make automated address harvesting more
difficult, as is done currently on nabble.com and the
official archive.

John

Re: List Archives

Thomas Brand
On Wed, February 20, 2019 10:24, John Rigg wrote:

> On Wed, Feb 20, 2019 at 12:41:08AM +0100, Thomas Brand wrote:
>
>> Currently the mailing list archive is for members only. Then again
>> nabble.com seems to have a copy of the whole archive. Would there be any
>> issue for anybody if all archives are published publicly (say in a
>> GitHub repository)? Any thoughts?
>
> If you do publish it please make sure email addresses are
> obscured to make automated address harvesting more difficult, as is done
> currently on nabble.com and the official archive.
>
> John

Yes, I will do that. So far it looks pretty easy to extract just the mails
for jack-devel. Each mail needs a small parser that picks out the headers
of interest and handles the content encoding (some mails have a base64
body). Obfuscating the mail addresses fits well into that step. I think
nabble just replaces the '@' with ' at ', which is better than nothing.
A flat file ordered by time would be the minimum. A better solution would
respect the In-Reply-To and Message-ID headers in order to follow the
threads. It's good that the data is not lost. In any case, once there is
something ready to put out, I'll first send a sample here.
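
The per-mail parser described above could look roughly like the following
Python sketch. It is not the converter used for the archive: the List-Id
check and the "jack-devel" marker are assumptions about how the lists in
the tarball can be told apart, and parse_mail and the header selection are
hypothetical names chosen for illustration.

```python
from email import message_from_bytes, policy

# Assumption: list mails carry a Mailman-style List-Id header naming the list.
LIST_MARKER = "jack-devel"
# Hypothetical selection of "headers of interest".
HEADERS_OF_INTEREST = ("From", "Date", "Subject", "Message-ID", "In-Reply-To")


def parse_mail(raw: bytes):
    """Return (headers, body) for a jack-devel mail, or None for other lists."""
    msg = message_from_bytes(raw, policy=policy.default)
    if LIST_MARKER not in str(msg.get("List-Id", "")).lower():
        return None
    # Missing headers become empty strings.
    headers = {name: str(msg.get(name, "")) for name in HEADERS_OF_INTEREST}
    # get_body() selects the text/plain part; get_content() undoes base64 or
    # quoted-printable transfer encoding and decodes the declared charset.
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""
    return headers, body
```

Mails from the other lists in the same archive simply yield None, so a
loop over the files in the tarball can skip them.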

Greetings
Thomas



Re: List Archives

Thomas Brand
In reply to this post by John Rigg
On Wed, February 20, 2019 10:24, John Rigg wrote:

> On Wed, Feb 20, 2019 at 12:41:08AM +0100, Thomas Brand wrote:
>
>> Currently the mailing list archive is for members only. Then again
>> nabble.com seems to have a copy of the whole archive. Would there be any
>> issue for anybody if all archives are published publicly (say in a
>> GitHub repository)? Any thoughts?
>
> If you do publish it please make sure email addresses are
> obscured to make automated address harvesting more difficult, as is done
> currently on nabble.com and the official archive.
>
> John

The mail archive covers roughly a decade. While merely testing the
conversion output I stumbled upon mails that are valuable for
understanding the history of the JACK project. It's a concentrated
source of information (without ads!) that can be queried with grep.

Attached to this mail are samples of 3 arbitrarily chosen archived mails.

Each mail is represented as a plain text file and as an HTML file.

The encoding is UTF-8. If the plain text file is viewed in a browser, the
display setting should be "Unicode" for it to display correctly. The
HTML variant should work out of the box, since it declares UTF-8 in its
header. Some mails have garbled text, possibly the result of mail
clients sending text back and forth with small encoding errors. Only very
few mails have that problem; where possible, the text is converted based
on the charset given in the MIME part info (using reformime and iconv).
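
The conversion step can be pictured with the short Python sketch below.
The thread itself uses reformime and iconv; this is only an analogue of
the idea. The function name to_utf8 and the fallback order (declared
charset, then UTF-8, then Latin-1) are assumptions made for illustration.

```python
from typing import Optional


def to_utf8(payload: bytes, declared_charset: Optional[str]) -> bytes:
    """Re-encode a decoded MIME part body as UTF-8 bytes."""
    # Assumed fallback order: trust the declared charset first, then try
    # UTF-8, and finally Latin-1, which accepts any byte sequence.
    for charset in (declared_charset, "utf-8", "latin-1"):
        if not charset:
            continue
        try:
            return payload.decode(charset).encode("utf-8")
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name or bytes that do not fit it: try the next.
            continue
    raise AssertionError("unreachable: latin-1 decodes any byte string")
```

A mail whose declared charset is wrong still produces output this way,
though the text may then be garbled, which matches the small number of
problem mails mentioned above.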

Mail addresses are obfuscated with the pattern
Full Name <[hidden email]>

This happens for all header addresses and all addresses in the mail body
(e.g. "Jon Doe <email here> wrote:" will be replaced). Mail addresses
that are already obfuscated are left as is.
vCard (VCF) attachments and PGP signature attachments are removed.
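
Dropping vCard and PGP signature attachments might be done along the
lines of this Python sketch. It assumes such parts can be recognised by
their MIME type alone; the type list, the function name
publishable_attachments, and the demo message are hypothetical.

```python
from email.message import EmailMessage

# Assumption: vCards and PGP signatures are identified by these MIME types.
DROPPED_TYPES = {"text/vcard", "text/x-vcard", "application/pgp-signature"}


def publishable_attachments(msg):
    """Return the filenames of attachments that would be kept."""
    names = []
    for part in msg.walk():
        if part.is_multipart() or part.get_content_type() in DROPPED_TYPES:
            continue
        filename = part.get_filename()
        if filename:  # body parts normally carry no filename
            names.append(filename)
    return names


# Demo message with one patch, one vCard and one PGP signature attached.
demo = EmailMessage()
demo.set_content("See the attached patch.")
demo.add_attachment(b"--- a\n+++ b\n", maintype="application",
                    subtype="octet-stream", filename="fix.diff")
demo.add_attachment(b"BEGIN:VCARD\nEND:VCARD\n", maintype="text",
                    subtype="vcard", filename="jon.vcf")
demo.add_attachment(b"signature", maintype="application",
                    subtype="pgp-signature", filename="signature.asc")
kept = publishable_attachments(demo)
```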

HTML variant:
-every mail is in its own folder
-index.html links to the available attachments (images, PDF, diff
files, ...), if any, in the same folder
-links to the In-Reply-To and follow-up messages

Text variant:
-no inline attachments
-all text concatenated into one gzip file is around 3.5 MB

I'd like to make this resource available without restriction (e.g. without
having to be a mailing list member) as a source of information that can be
used stand-alone or referenced from other places.

Please have a look at the examples.

If you have written to this list and would like to be excluded from the
archives, please say so (this will be fiddly and will make the archive
less useful, so please think twice before even considering it).

Any other feedback on form and function is welcome!

Greetings
Thomas


Attachment: samples.tgz (7K)

Re: List Archives

Malik Costet
On 2019-02-22 17:27, Thomas Brand wrote:
<snip />

> Mail addresses are obfuscated with the pattern
> Full Name <[hidden email]>

<snip/>

Not a fan. It still looks like an e-mail address.
I'd personally find something like "[hidden] at domain-untouched dot
tld" more aesthetically pleasing.

Commendable initiative, otherwise.

--
Malik.


Re: List Archives

Thomas Brand
On Fri, February 22, 2019 17:41, Malik Costet wrote:

> On 2019-02-22 17:27, Thomas Brand wrote:
> <snip />
>
>
>> Mail addresses are obfuscated with the pattern
>> Full Name <[hidden email]>
>>
>
> <snip/>
>
>
> Not a fan. It still looks like an e-mail address.
> I'd personally find something like "[hidden] at domain-untouched dot
> tld" more aesthetically pleasing.
>
> Commendable initiative, otherwise.
>
Thanks.
Yes, the obfuscation could also replace the @ and the dots.
This is the new regex:

$ echo "[hidden email]" | sed -r 's/\b[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b)/\[hidden\] at \1/g' | sed 's/\./ dot /g'
[hidden] at here dot aaa dot bb
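
One thing to be aware of with the pipeline above: the second sed call
replaces every dot on the line, not only the dots inside the address. The
Python sketch below uses the same address pattern but limits the "dot"
replacement to the matched domain. The function name obfuscate is
hypothetical, and the sketch is an illustration rather than the script
used for the archive.

```python
import re

# Same address pattern as in the sed expression above; group 1 is the domain.
ADDRESS = re.compile(r"\b[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,6})\b")


def obfuscate(text: str) -> str:
    """Replace each address with '[hidden] at domain dot tld'."""
    def hide(match):
        # Only the dots of the matched domain are spelled out.
        return "[hidden] at " + match.group(1).replace(".", " dot ")
    return ADDRESS.sub(hide, text)
```

Text that contains no @, including addresses that were already replaced
by a placeholder, passes through unchanged.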




Re: List Archives

John Rigg
On Fri, Feb 22, 2019 at 05:58:50PM +0100, Thomas Brand wrote:
> [hidden] at here dot aaa dot bb

Looks fine to me. Thanks for your work on this.

John

Re: List Archives

Thomas Brand
On Sat, February 23, 2019 13:16, John Rigg wrote:
> On Fri, Feb 22, 2019 at 05:58:50PM +0100, Thomas Brand wrote:
>
>> [hidden] at here dot aaa dot bb
>>
>
> Looks fine to me. Thanks for your work on this.
>
>
Nice.
I'm not yet sure whether the message IDs should be left plain: the part
before the @ is the only common identifier that allows anybody to match
the archive output with the mails they received individually.
Message-ID <AANLkTik_Lapa6aGxfgT4H_9QhRFz9jLAo=3GjwLyTpBF at mail dot
gmail dot com>
It's not mandatory to show it in the archive. Currently the main ID is the
filename of the mail as found in the tarball, in the directory cur, as
delivered by Adi's mail agent.
I'd opt for keeping the ID part unless there are reasons against it.

Greetings
Thomas



Re: List Archives

Jörn Nettingsmeier
On 2/23/19 7:31 PM, Thomas Brand wrote:

> On Sat, February 23, 2019 13:16, John Rigg wrote:
>> On Fri, Feb 22, 2019 at 05:58:50PM +0100, Thomas Brand wrote:
>>
>>> [hidden] at here dot aaa dot bb
>>>
>>
>> Looks fine to me. Thanks for your work on this.
>>
>>
> Nice,
> I'm yet unsure if the message IDs should be left plain, the part before @
> is the only common identifier that will allow anybody to match the archive
> output with the individually received mails.
> Message-ID <AANLkTik_Lapa6aGxfgT4H_9QhRFz9jLAo=3GjwLyTpBF at mail dot
> gmail dot com>
> It's not mandatory to show it in the archive. Currently the main ID is the
> filename of the mail as is in the tarball, in directory cur, as delivered
> by Adi's mail agent.
> I'd opt-in for keeping the ID part unless other reasons speak against it.
> Greetings
> Thomas


I take it you have found a complete archive? Otherwise I could supply
mails back to June 2008.


--
Jörn Nettingsmeier
Tuinbouwstraat 180, 1097 ZB Amsterdam, Nederland
Tel. +49 177 7937487

Meister für Veranstaltungstechnik (Bühne/Studio), Tonmeister VDT
http://stackingdwarves.net

Re: List Archives

Thomas Brand
On Sat, February 23, 2019 23:15, Jörn Nettingsmeier wrote:
>
> I take it you have found a complete archive? Otherwise I could supply
> mails back to June 2008.
>

Hard to say if it's complete. It starts around 2009, so your archive would
be about half a year "older"!

You could send me a private mail with a link to a tarball, and I'll see
whether it is easier to combine when used in full or only for the starting
year.
I let the conversion run overnight (it takes around 1 sec per mail per
format) and the result doesn't look bad; most of it seems good for
publishing.

New:
-the HTML variant now has "prev" and "next" links
-follow-up and in-reply-to links no longer show IDs but the subject,
including the author
-a little text processing removes empty lines before the first non-empty
line and after the last one
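
How such links can be derived from the Message-ID and In-Reply-To headers
is sketched below in Python. The dictionary layout, the key names and the
function link_threads are hypothetical; the sketch assumes the mails are
already sorted by date and is not the converter's actual code.

```python
from collections import defaultdict


def link_threads(mails):
    """Build prev/next and reply links for a date-sorted list of mails.

    Each mail is a dict with the keys 'id' (Message-ID), 'in_reply_to'
    (may be None), 'subject' and 'author'.
    """
    by_id = {mail["id"]: mail for mail in mails}
    follow_ups = defaultdict(list)
    for mail in mails:
        if mail["in_reply_to"] in by_id:
            follow_ups[mail["in_reply_to"]].append(mail)

    def label(mail):
        # Links show the subject and author instead of the raw ID.
        return f'{mail["subject"]} ({mail["author"]})'

    links = {}
    for index, mail in enumerate(mails):
        parent = by_id.get(mail["in_reply_to"])
        links[mail["id"]] = {
            "prev": mails[index - 1]["id"] if index > 0 else None,
            "next": mails[index + 1]["id"] if index + 1 < len(mails) else None,
            "in_reply_to": label(parent) if parent is not None else None,
            "follow_ups": [label(reply) for reply in follow_ups[mail["id"]]],
        }
    return links
```

A reply whose parent is missing from the archive simply gets no
in-reply-to link, while "prev" and "next" still follow the date order.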

I plan to put it into a throw-away repository soon for testing, before it
goes to github.com/jackaudio/mailing_list_archive.

Greetings
Thomas
