mail_to_news.rb
Posted about 1 year ago in The Gateway.
Required Reading: You need to know what the Gateway is and the rules for suggesting changes before reading this article.
There are two halves to the Ruby Gateway. One half runs as a qmail filter for an email address subscribed to the Ruby Talk mailing list. Every message sent to that address is piped through this filter with a shell script like:
The email arrives on the filter's standard input, and the code is expected to handle the message either by posting it to comp.lang.ruby or by choosing to ignore it. If the filter exits normally, qmail considers the matter handled. A non-zero exit code that qmail treats as a temporary failure (111 is the conventional one) causes qmail to call the filter with that same message again later.
The Code
Let's dive right into the source of this half of the Gateway:
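A minimal sketch of this kind of load-path setup, assuming a hypothetical layout in which the script lives in a bin/ directory and the other project files live in lib/ and config/ (the Gateway's actual layout may differ):

```ruby
# Hypothetical layout: <root>/bin/mail_to_news.rb with shared code in
# <root>/lib and <root>/config. Paths are resolved relative to this file,
# so the requires work no matter which directory the filter starts in.
GATEWAY_ROOT = File.expand_path("..", File.dirname(File.expand_path(__FILE__)))

%w[lib config].each do |dir|
  path = File.join(GATEWAY_ROOT, dir)
  # Prepend so project files are found before same-named installed libraries.
  $LOAD_PATH.unshift(path) unless $LOAD_PATH.include?(path)
end
```

Under that assumed layout, a plain require "servers_config" would then load config/servers_config.rb.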
The code above just sets things up so this script can require some other files in the project normally. Here are those requires:
The last three requires are standard Ruby libraries; the first two are not. The servers_config.rb file holds the Gateway's server configuration. The nntp.rb file is essentially a vendored copy of the net-nntp library. I've made some minor changes for debugging output, but the library functions the same.
We're now ready to initiate logging. Here's the code that starts that process:
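Ruby's standard Logger library is a natural fit for this step. Here is a minimal sketch, assuming the caller chooses where log lines go; the Gateway's real log destination and settings aren't shown, and the "mail_to_news" program name is an illustrative assumption:

```ruby
require "logger"

# Builds a logger for the filter. The destination is a parameter so the
# same code can log to a file in production and to a StringIO in tests.
def build_gateway_log(destination = $stderr)
  log = Logger.new(destination)
  log.progname = "mail_to_news" # printed on every line to identify the source
  log.level    = Logger::INFO   # DEBUG messages are suppressed
  log
end
```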
That just builds the log object. Now it's time to start setting variables in preparation for the coming email parse:
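To illustrate the idea, here is a sketch of a header whitelist and a single Regexp built from it. The header names are a hypothetical example, not the Gateway's actual list:

```ruby
# Hypothetical whitelist: the Gateway's real list is not reproduced here.
PASS_THROUGH_HEADERS = %w[From Subject Date MIME-Version
                          Content-Type Content-Transfer-Encoding].freeze

# One case-insensitive pattern that matches a whitelisted header name at the
# start of a line and captures that name.
PASS_THROUGH_RE = /\A(#{PASS_THROUGH_HEADERS.map { |name| Regexp.escape(name) }.join("|")}):/i

def pass_through?(header_line)
  PASS_THROUGH_RE.match?(header_line)
end
```

Building the pattern from the list keeps the two in sync: adding a header name to the array is enough to start passing it through.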
The Gateway passes only a subset of the email headers through to the Usenet post. The above is the list of those headers and the Regexp that will locate them. Now, there are two types of messages we do not wish to forward: spam, and messages sent to Ruby Talk by the other half of the Gateway (forwarding those would cause an infinite loop). The following code prepares flags for these conditions:
The following code allocates variables to hold key header information parsed from the message:
Note that we start the headers with an X-received-from header explaining our service, and add an X-rubymirror flag that the other half of the Gateway uses to recognize posts created by this half. Now we need to parse the email headers:
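A minimal sketch of a header parse in this style, assuming the email arrives on an IO such as $stdin. The Struct, the whitelist, the placeholder starting-header values, and the use of X-Spam-Flag: YES (the header SpamAssassin adds) as the spam marker are all assumptions for illustration. This version also unfolds continuation lines, since RFC 5322 lets a long header wrap onto lines that start with whitespace; that may go beyond what the Gateway does.

```ruby
# Hypothetical container for the values the parse extracts.
ParsedHeaders = Struct.new(:news_headers, :message_id, :in_reply_to,
                           :references, :spam, :looped)

# Hypothetical whitelist of headers copied into the Usenet post.
KEEP_RE = /\A(?:From|Subject|Date|MIME-Version|Content-Type|Content-Transfer-Encoding):/i

# Reads header lines up to the blank line that ends them. Continuation
# lines (starting with a space or tab) are unfolded onto the previous header.
def read_header_block(io)
  headers = []
  while (line = io.gets)
    line = line.chomp
    break if line.empty?
    if line.start_with?(" ", "\t") && !headers.empty?
      headers.last << line
    else
      headers << line
    end
  end
  headers
end

def parse_email_headers(io)
  # Starting headers; these values are placeholders, not the Gateway's text.
  parsed = ParsedHeaders.new(["X-received-from: example mail-to-news gateway",
                              "X-rubymirror: yes"], nil, nil, nil, false, false)
  read_header_block(io).each do |header|
    case header
    when /\AMessage-ID:\s*(.*)/i  then parsed.message_id  = $1.strip
    when /\AIn-Reply-To:\s*(.*)/i then parsed.in_reply_to = $1.strip
    when /\AReferences:\s*(.*)/i  then parsed.references  = $1.strip
    when /\AX-rubymirror:/i       then parsed.looped      = true # sent by the other half
    when /\AX-Spam-Flag:\s*YES/i  then parsed.spam        = true # assumed spam marker
    end
    parsed.news_headers << header if KEEP_RE.match?(header)
  end
  parsed
end
```

In the filter this would be called as parse_email_headers($stdin), which leaves $stdin positioned at the start of the body.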
There’s nothing too tricky in the above code. We match headers with simple expressions, pulling the information we need into variables. We also set flags as appropriate and add to the headers we have started for the newsgroup post. This code stops reading at the blank line signaling the end of the email headers. The code above didn’t address flagged messages immediately, because we wanted to be able to log the key details about them. We now have those details, so it’s time to address the flags:
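A sketch of that step, assuming spam and loop flags like those described earlier and a Logger-compatible log object; the helper name and log messages are hypothetical:

```ruby
# Hypothetical helper: returns the exit status for a flagged message, or nil
# if the message should go on to Usenet. Status 0 tells qmail the message is
# handled, so it is not redelivered.
def flagged_exit_status(log, message_id, spam:, looped:)
  if spam
    log.info("Ignoring spam #{message_id}")
    0
  elsif looped
    log.info("Ignoring #{message_id}, sent by the other half of the Gateway")
    0
  end
end

# In the filter (sketch):
#   status = flagged_exit_status(log, message_id, spam: spam, looped: looped)
#   exit(status) if status
```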
As you can see, flagged messages are noted in the log and we exit cleanly without further processing. The Gateway does some final header doctoring in an attempt to set a reasonable References header and also includes the Ruby Talk message id for reader reference:
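One way to sketch this doctoring, assuming the parsed References and In-Reply-To values are available as strings. The rule used here (keep the existing References chain and append the In-Reply-To message id if it is missing) and the X-ruby-talk header name are assumptions, not necessarily what the Gateway does:

```ruby
# Matches one message id such as <123@example.com>.
MSG_ID_RE = /<[^<>\s]+>/

# Appends a References header and the mailing list message id (under an
# assumed header name) to the news headers, then returns them.
def doctor_headers(news_headers, message_id:, references: nil, in_reply_to: nil)
  ids = references.to_s.scan(MSG_ID_RE)
  # Some clients add comment text to In-Reply-To, so take only the first id.
  parent = in_reply_to.to_s[MSG_ID_RE]
  ids << parent if parent && !ids.include?(parent)
  news_headers << "References: #{ids.join(' ')}" unless ids.empty?
  news_headers << "X-ruby-talk: #{message_id}" if message_id
  news_headers
end
```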
We are finally ready to construct a complete Usenet post:
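A minimal sketch of assembling the article, assuming the headers are an array of strings and the body is the rest of the email as one string. NNTP sends articles as CRLF-terminated lines, so every line ending is normalized:

```ruby
# Joins the news headers and the remaining email body into one article.
# "\r\n", bare "\n", and bare "\r" are all normalized to "\r\n", and the
# result always ends with a line terminator.
def build_article(news_headers, body)
  text = news_headers.join("\n") + "\n\n" + body
  text = text.gsub(/\r\n|\r|\n/, "\r\n")
  text.end_with?("\r\n") ? text : text + "\r\n"
end
```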
The above code just joins the headers and existing message body, cleans up the newlines and logs our progress. Actually sending the message is a two-step process. First we connect to our Usenet host:
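A sketch of the connection step using Ruby's standard socket library directly, rather than the vendored nntp.rb (whose API isn't shown here). It assumes the server greets with a status line as RFC 3977 describes, where 200 means posting is allowed and 201 means it is not, and it treats any failure as temporary by exiting with 111. The host and port are plain arguments; in the Gateway they would come from its configuration.

```ruby
require "socket"
require "io/wait"

# qmail treats exit status 111 as a temporary failure and retries later.
TEMPFAIL = 111

class NNTPConnectError < StandardError; end

# Opens a TCP connection to the news server and checks the greeting.
# Anything other than 200 is rejected, since this filter needs to post.
def open_news_connection(host, port, timeout: 30)
  socket = Socket.tcp(host, port, connect_timeout: timeout)
  begin
    raise NNTPConnectError, "no greeting" unless socket.wait_readable(timeout)
    greeting = socket.gets.to_s
    unless greeting.start_with?("200")
      raise NNTPConnectError, "unexpected greeting: #{greeting.strip}"
    end
  rescue StandardError
    socket.close
    raise
  end
  socket
end

# Any connection problem is assumed to be temporary, so the filter exits
# with TEMPFAIL and lets qmail redeliver the message later.
def connect_or_exit(host, port, log)
  open_news_connection(host, port)
rescue SystemCallError, SocketError, IOError, NNTPConnectError => e
  log.error("Cannot reach news server: #{e.message}")
  exit TEMPFAIL
end
```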
Above you can see several references to the configuration from servers_config.rb. Note that we exit with an error code if anything goes wrong here, assuming the problem is temporary and allowing qmail to try again later. The final step is to send the message:
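A sketch of the posting step written directly against the POST command from RFC 3977 instead of the vendored library, assuming an already-connected IO-like object. The helper names are hypothetical, and, as with the connection, every failure is treated as temporary:

```ruby
# qmail treats exit status 111 as a temporary failure and retries later.
TEMPFAIL = 111

class NNTPPostError < StandardError; end

# Sends one article with POST. The server answers 340 to invite the article
# and 240 once it is accepted. +article+ must use CRLF line endings and end
# with CRLF, so the terminating "." lands on its own line.
def post_article(io, article)
  io.write("POST\r\n")
  reply = io.gets.to_s
  raise NNTPPostError, "POST refused: #{reply.strip}" unless reply.start_with?("340")

  # Dot-stuffing: a line that begins with "." gets an extra "." so the
  # server does not mistake it for the end-of-article marker.
  io.write(article.gsub(/^\./, ".."))
  io.write(".\r\n")

  reply = io.gets.to_s
  raise NNTPPostError, "article rejected: #{reply.strip}" unless reply.start_with?("240")
  reply
end

def post_or_exit(io, article, log)
  reply = post_article(io, article)
  log.info("Posted: #{reply.strip}")
  reply
rescue SystemCallError, IOError, NNTPPostError => e
  log.error("Post failed: #{e.message}")
  exit TEMPFAIL
end
```

Dot-stuffing is the detail most easily missed: without it, an email body containing a line with a single period would end the article early.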
The above code makes the post and logs what we have accomplished. Again, we exit with an error code if something goes wrong, to signal a retry.
Possible Improvements
Usenet and email are two different worlds with opposing rules. Our Usenet host, like many, does not allow the posting of multipart/alternative messages (used to send HTML email). Some have expressed a desire for the Gateway to convert these messages into a Usenet-safe format. This could possibly be done by using the text/plain variant of the content when one is provided, and stripping the HTML when it is not. This change is of low importance to me, since I don't believe posters should be sending HTML email to Ruby Talk.
In a similar vein, some Usenet hosts reject certain types of multipart/mixed messages (used to send email attachments), generally those with Base64-encoded parts, to keep binary content out. Our host allows such posts, but for these reasons they may not be well circulated on Usenet. Again, we might be able to inline the content of these files, but this could get pretty tricky for some attachments; imagine, for example, a post with a zip archive of files. This problem interests me more than HTML email.
The first step toward either of these changes is probably to switch to a real email parsing library. I imagine the original code didn't use one because the choices weren't convenient when the Gateway was designed. I only cleaned up the code in my rewrite and don't have enough experience with such libraries to select the proper replacement. Odds are the right one could simplify a fair portion of the Gateway code, though. We are looking for a library that:
If you have experience with such a library, please leave a comment below showing how this could be used to simplify the code above.
Thanks for writing this up.
I've been messing around with my own email list to blog gateway. For that I've found TMail to work pretty well for parsing email messages. I believe it supports all your criteria.
I have a brain-dead technique for trying to extract text from multipart emails:
This doesn't always work well (emails that have been cut and pasted from Word into Outlook are particularly painful), but it works most of the time.
Have you considered implementing References or In-Reply-To header support for threading messages? I've worked on that, but only one level deep, since in my gateway a new email is a post and a response becomes a comment on that post. The problem I've found in email is that some people reply to a message but start an entirely new thread, while others compose a brand-new email that is actually a response. Some even put an "Re:" subject line on it to make it look like they hit reply instead of compose.
I've found that by using In-Reply-To, References and Thread-Index (Microsoft clients send this), with a dash of Subject line comparison, I get fairly accurate representations of the threads regardless of whether people hit reply or compose. I haven't tried implementing JWZ's threading algorithm.
Thanks for the tips. I will definitely play with TMail and see if I can use that to make this process easier.
Just FYI, your code checked part.multipart? twice, once in the elsif condition and again in the following if condition.
I did show the code that handled References in my write-up. It's very basic, so that's another area that could be improved.