Diary of a Network Geek

The trials and tribulations of a Certified Novell Engineer who's been stranded in Houston, Texas.

1/10/2007

New Perl Scripts

Filed under: Deep Thoughts,Fun Work,Geek Work,PERL — Posted by the Network Geek during the Hour of the Monkey which is in the late afternoon or 5:59 pm for you boring, normal people.
The moon is a Third Quarter Moon

So, I’ve been writing a bit of Perl again…

It’s kind of a long story that’s really rather boring, I think, but we’re changing e-mail providers at work again. The new provider has a system that tries to reduce spam using challenge-response and a whitelist.
For those not familiar, it works like this: the first time you send an e-mail to their servers, the anti-spam system fires back a verification e-mail to you. That e-mail checks that you’re a human and not a spambot by asking you to click a link. When you click the link, the system adds you to its whitelist and lets your e-mail through from then on. Pretty good system, actually. And it's about the only way to ensure virtually no spam gets through.
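To make the challenge-response flow concrete, here is a minimal Python sketch of how such a filter might behave. It is a toy model under stated assumptions, not the provider's actual system: the class name `ChallengeResponseFilter`, the in-memory dictionaries, and the idea that "clicking the link" means presenting a random token are all hypothetical stand-ins.

```python
import secrets

class ChallengeResponseFilter:
    """Hypothetical model of a challenge-response anti-spam filter."""

    def __init__(self, whitelist=None):
        # Senders already verified as human; their mail goes straight through.
        self.whitelist = {a.lower() for a in (whitelist or [])}
        # Outstanding challenges: one-time token -> sender awaiting verification.
        self.pending = {}
        # Mail held back until its sender answers a challenge.
        self.held = {}

    def receive(self, sender, message):
        sender = sender.lower()
        if sender in self.whitelist:
            return ("deliver", message)
        # First contact: hold the message and issue a one-time token that
        # would be embedded in the verification link mailed back to the sender.
        token = secrets.token_urlsafe(16)
        self.pending[token] = sender
        self.held.setdefault(sender, []).append(message)
        return ("challenge", token)

    def click(self, token):
        # Following the link proves a human is behind the address:
        # whitelist the sender and release everything that was held.
        sender = self.pending.pop(token, None)
        if sender is None:
            return []
        self.whitelist.add(sender)
        return self.held.pop(sender, [])
```

Pre-loading the constructor's `whitelist` argument is the equivalent of the pregenerated list described below: those senders never see a challenge at all.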

Well, to minimize the hassle for our customers, we decided to pregenerate a whitelist of known, good e-mail addresses. Naturally, that task fell to yours truly.
So, I turned to my old pal Perl. Most of the mail is stored in a UNIX mail format called “mbox”, which, luckily for me, is basically a flat file: one giant text file with a lot of extra junk in it that no one but mail programs cares about. So, the first thing I did was dig up some code and modify it to pull all the e-mail addresses out of those mbox files. I called it “emailpull.pl”. That managed to pull all kinds of addresses. In fact, after I culled out the obviously bad addresses and eliminated the duplicates, I had a little over 4000 addresses.
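The original “emailpull.pl” is Perl and isn't reproduced here, but the same idea can be sketched in Python with the standard `mailbox` and `email.utils` modules. Assumptions worth flagging: the choice of headers to scan, and the `PLAUSIBLE` regex used to cull "obviously bad" addresses, are my own illustrative picks, not the original script's logic (and the regex is deliberately loose, nowhere near full RFC 5322).

```python
import mailbox
import re
from email.utils import getaddresses

# Loose sanity check for culling obviously bad addresses (an assumption).
PLAUSIBLE = re.compile(r"^[^@\s]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")

def pull_addresses(mbox_path, headers=("From", "To", "Cc", "Reply-To")):
    """Return the sorted, de-duplicated addresses found in an mbox file's headers."""
    found = set()  # a set eliminates duplicates for free
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            values = []
            for h in headers:
                values.extend(msg.get_all(h, []))
            # getaddresses splits "Name <addr>, other@host" into (name, addr) pairs.
            for _name, addr in getaddresses(values):
                addr = addr.strip().lower()
                if PLAUSIBLE.match(addr):
                    found.add(addr)
    finally:
        box.close()
    return sorted(found)
```

Parsing headers through `mailbox` rather than regex-scanning the raw file means message bodies (quoted signatures, URLs, and so on) can't pollute the list, at the cost of missing addresses that only appear in bodies.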
Well, that was just a few too many for me to dump into a whitelist without some kind of extra verification. So, I hunted around and found a handy CPAN module called “Mail::CheckUser”, which is meant, you guessed it, to help check e-mail users. A little finagling with the code and I put together “emailverify.pl”. That little bad boy takes a list of e-mail addresses, in text-file form, and verifies each one with its alleged mail host. Works like a charm!
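For the curious, the core of this kind of verification is an SMTP probe: connect to the domain's mail server, start a transaction, and see whether it accepts the recipient, without ever sending a message. Below is a rough Python analogue using the standard `smtplib` module. It is not Mail::CheckUser's API; the function names, the `probe_sender` address, and the `connect` hook (there so the logic can be exercised without a network) are hypothetical, and the MX host lookup is assumed to be done elsewhere, since the Python standard library has no DNS MX resolver.

```python
import smtplib

def classify_rcpt(code):
    """Map an SMTP RCPT TO reply code to a verdict."""
    if code in (250, 251):
        return "valid"      # server accepted (or will forward to) the recipient
    if 550 <= code <= 553:
        return "invalid"    # mailbox unavailable / not local / bad syntax
    return "unknown"        # 4xx greylisting, temporary failures, etc.

def check_mailbox(address, mx_host, probe_sender="postmaster@example.com",
                  timeout=30, connect=smtplib.SMTP):
    """Ask mx_host whether it would accept mail for address (hypothetical helper)."""
    try:
        with connect(mx_host, 25, timeout=timeout) as smtp:
            smtp.ehlo()
            smtp.mail(probe_sender)
            code, _msg = smtp.rcpt(address)
            smtp.rset()  # abandon the transaction; no DATA is ever sent
    except OSError:  # smtplib's exceptions subclass OSError in Python 3
        return "unknown"
    return classify_rcpt(code)
```

One caveat: some servers are "catch-alls" that answer 250 for any recipient, and others refuse to reveal anything, so a "valid" verdict is a useful filter rather than proof that a human reads that mailbox.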

Oh, and if you’re a Perl fan/addict/whatever, check the links to the code. They take you to a site called PerlMonks.org. It used to be the place to get code and help and, well, everything Perl-related. But, you know, lately? Not so much. When I was there putting these two snippets of code up, there was a whole big brouhaha going on about membership in some internal, super-secret cabal. And there’s a lot of focus on gaining levels and all sorts of junk like that. Which is ironic, to me, considering that Larry Wall, the guy who wrote Perl, did so in the hope that it would draw people together in harmony and a spirit of helpfulness.
Ah, well, at least I got my task accomplished. Well, at least it will be by morning. That second script was still running when I left the office.

Update: That second script, when it was done running, reduced 4060 e-mail addresses down to 3255 validated e-mail addresses. Hopefully, it culled all the potential spam originators!

