Archive for the 'weblogs' Category

Planet Antispam Update

Friday, November 24th, 2006

Hey, some Planet Antispam updates.
I’ve upgraded to Planet 2.0, and that seems to have solved some of the weirdness
with consuming Atom feeds.

Also, there are two new antispam weblogs added to the subscription list:

Welcome guys!

(btw, if you’re wondering what happened to the music post — I moved it over here, to the mp3 blog where it was supposed to be posted in the first place, duh ;)

This post was written by Justin, source: Planet Antispam Update

Blogorrah

Wednesday, July 5th, 2006

Blurred Keys: Blogorrah.com - the start of empire building with ‘very few overheads’.
Blurred Keys, “an Irish media blog”, brings the revelation that Blogorrah
“copies” Gawker.com.

Honestly, though, this is blatantly obvious — and I’d consider it unfair to
call this “copying”. It’s simply taking a successful format and adapting it to
the local market, and doing so very well indeed if you ask me.

Blogorrah is a hilarious read. If you’re Irish and you’re not subscribed,
you’re really missing out… it’s the funniest thing on the Irish web these
days.

This post was written by Justin, source: Blogorrah

Blog Spam, and a ‘nofollow’ Post-Mortem

Wednesday, May 31st, 2006

An interesting article on blog-spam countermeasures — Google’s
embarrassing mistake. Quote:

I think it’s time we all agreed that the ‘nofollow’ tag has been a complete
failure.

For those of you new to the concept, nofollow is a tag that blogs can add to
hyperlinks in blog comments. The tag tells Google not to use that link in
calculating the PageRank for the linked site. […]

Since its enthusiastic adoption a year and a half ago, by Google, Six Apart,
Wordpress, and of course the eminent Dave Winer, I think we can all agree
that nofollow has done — nothing. Comment spam? Thicker than ever. It’s had
absolutely no effect on the volume of spam. That’s probably because comment
spammers don’t give a crap, because the marginal cost of spamming is so low.
Also, nofollow-tagged links are still links, which means that humans can
still click on them — and if humans can click, there’s a chance somebody might
visit the linked sites after all.

I agree. At the time, I pointed at this comment from Mark Pilgrim:

Spammers have it in their heads now that weblog comments are a vector to
exploit. They don’t look at individual results and tweak their software to
stop bothering individuals. They write generic software that works with
millions of sites and goes after them en masse. So you would end up with
just as much spam, it would just be displayed with unlinked URLs.

Spammers don’t read blogs; they just write to them.

I still think he was spot on.

However, one part of the ‘Google’s embarrassing mistake’ article is a red
herring — I think the chilling effect on “nonspam links” is not to be worried
about; as Jeremy Zawodny said, life’s too short to
worry about dropping links purely in the hopes of giving yourself Page Rank. I
don’t know if I really want links that people are leaving purely for that
reason. ;)

In fact, I wouldn’t be surprised to hear that Google’s crawler starts treating
“nofollow” links as mildly non-spammy in a future revision, due to their wide
use in wikis, blogs etc.

To be honest, though — I don’t see the problem of blog-spam much anymore.
As I said here:

[Weblog] comment spam should be a lot easier to deal with than SMTP spam. …
With weblog comments, you control the protocol entirely, whereas with SMTP
you’re stuck with an existing protocol and very little “wiggle room”.

On my WordPress weblog [ie. here] — which, admittedly, gets only about 1/4 of the
traffic plasticbag.org does — I’ve instituted
a very simple check stolen from Jeremy Zawodny. I simply include a form field
which asks the comment poster for my first name, and if they fail to supply
that, the comment is dropped. In addition, I’ve removed the form fields to
post directly, requiring that all comments are previewed; this has the nice
bonus of increasing comment quality, too.

Those are the only antispam measures I’m using there, and as a result of
those two I get about 1 successful spam posted per week, which is a one-click
moderation task in my email. That’s it.

The key is to not use the same measures as everyone else — if every weblog
has a different set of protocols, with different form fields asking different
simple questions, the only spammers that can beat that are the ones that
write custom code for your site — or use human operators sitting down to an
IE window.
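
For the curious, the two checks described above can be sketched in a few lines of Python. This is an illustration only, not the code running on this weblog (which is WordPress): the field name `author_first_name`, the expected answer, and the `previewed` flag are all assumptions made up for the example, and the whole point is that every site should pick its own.

```python
# Hypothetical per-site settings. Each weblog would choose its own field
# name and question, so generic spam software cannot guess them in advance.
CHALLENGE_FIELD = "author_first_name"
CHALLENGE_ANSWER = "justin"


def accept_comment(form, previewed):
    """Return True if a submitted comment should be accepted.

    form      -- dict of submitted form fields (strings)
    previewed -- True if the submission came from the preview page
    """
    # Check 1: there is no direct-post form, so anything that skipped
    # the preview step is dropped.
    if not previewed:
        return False
    # Check 2: the poster must answer the site-specific question.
    supplied = form.get(CHALLENGE_FIELD, "").strip().lower()
    return supplied == CHALLENGE_ANSWER
```

A comment with the right answer that went through preview is accepted; a missing answer or a direct POST is silently dropped. Anything that still gets through lands in the moderation email as before.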

Trackbacks, however — turn that off. The protocol was designed poorly, with
insufficient thought given to its abuse potential; there’s no point keeping it
around, now that it’s a spam vector.

Finally, a “perfect” solution to blog spam, while allowing comments, is
unachievable. There will always be one guy who’s going to sit down at a real
web browser to hand-type a comment extolling the virtues of some product or
another. The goal is to get it to a level where you get one of those per
week, and it’s a one-click operation to discard them.

This post was written by Justin, source: Blog Spam, and a ‘nofollow’ Post-Mortem

Poll: keep ‘Fixing Email Weblog’ in Planet Antispam?

Tuesday, May 23rd, 2006

I added the Fixing Email weblog to Planet Antispam a while back — however, I’m not entirely sure at this stage that its content (which is primarily news syndication) fits with the “planet” concept (which is primarily intended for first-person posts).

So — quick poll. Let me know what you think, pro or con, Planet readers: should I remove the Fixing Email feed from that site?

This post was written by Justin, source: Poll: keep ‘Fixing Email Weblog’ in Planet Antispam?

Link-blog Networking

Friday, May 12th, 2006

Cool — del.icio.us just added a
feature whereby you can now see who has you in their network, and, of course,
you can further view their networks and see who’s in them.

This’d be great to produce social-network graphs, although I daresay Joshua
mightn’t be so keen on the spidering load. ;) I’ve optimistically requested some form of dump, anyway.

The social networking aspect of link collection and link-blogging via
del.icio.us is emerging nicely; I’m keen to see what’s next in the pipeline.

A few interesting things:

  • Almost everyone who’s using del.icio.us seriously for link collection — ie. applying some quality control thresholds, and bothering to write one-line descriptions, at least — has filled out their ‘network’ by now.

  • It’d be useful to have “groups”, so that we can now assert things like “jm, boogah, n0wak, negatendo, tweebiscuit, leonardr, muckster and torrez form a group”. I’m sure that’d provide useful info, although could probably be inferred anyway. (People are attempting to hack it by using a shared tag on all their postings, like the “irishblogs” tag, but that’s an awful misuse of tagging in my opinion ;)

  • Also, it’ll be interesting to see what’ll happen once Google Co-op figures out a way to incorporate the del.icio.us network data. To be honest, I’m very surprised it wasn’t already in there — it seems like a no-brainer… maybe some Y!/G corporate rivalry is getting in the way.

Anyway, in the meantime it’s producing lots of good fodder for my
SpicyLinks feed.

SpicyLinks is an implementation of something that I mentioned in a comment on
this weblog entry, regarding future
methods of reading weblogs; in essence, it’s an automated blog aggregation summariser. It
reads other people’s link-blogs, so I don’t have to, and reports the stuff that
proves popular in my personal collection of sources.
(Credit where due: HotLinks provided much of the inspiration, but doesn’t support personalisation, hence the reimplementation.)

SpicyLinks is similar to Populicious, but that app
really misses the point, in my opinion. I don’t particularly want to know what
everyone is pointing at; I want to know what a selected set of trusted
sources (with good taste!) are pointing at.

This aggregation is pretty similar to the del.icio.us ‘network’ feed, but with much lower volume, and a higher signal/noise ratio, attained by dropping the ‘one-off’ items that only one person is pointing
at. Initially, that may seem like a major failure, since you miss the ‘fresh
bits’ — but as long as you’ve got the right people in your source network, it
actually works very well.
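
The core of that filtering step is small enough to sketch here. This is not the actual SpicyLinks code, just a minimal Python illustration of the idea; the function name `popular_links`, the `(source, url)` input shape, and the default threshold of two sources are assumptions for the example.

```python
from collections import defaultdict


def popular_links(bookmarks, min_sources=2):
    """Rank URLs by how many distinct trusted sources bookmarked them.

    bookmarks   -- iterable of (source, url) pairs from the link-blogs
    min_sources -- URLs with fewer distinct sources are dropped
    """
    sources_by_url = defaultdict(set)
    for source, url in bookmarks:
        # A set means one person posting the same link twice counts once.
        sources_by_url[url].add(source)

    ranked = [
        (url, len(sources))
        for url, sources in sources_by_url.items()
        if len(sources) >= min_sources  # drop the 'one-off' items
    ]
    # Most widely shared first; ties broken alphabetically by URL.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

Feed it the combined bookmarks of the people in your source network, and what comes out is the short list of links that at least two of them thought were worth keeping.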

It’d be great if this was one of the features implemented in the del.icio.us ‘network’ system…

This post was written by Justin, source: Link-blog Networking

Planet Antispam update

Tuesday, March 28th, 2006

Quick update — I’ve added Ed Falk’s “Spam Diaries” to http://planet.spam.abuse.net/ .

This post was written by Justin, source: Planet Antispam update

Weblog Spam and Adversarial Classification

Monday, January 30th, 2006

Dr. Dave, author of the Spam Karma WordPress antispam plugin, has posted an
interesting article about new weblog-spammer tactics:

These spams do not present most of the idiotic traits of their lower
colleagues: they do not try cramming hundreds of URLs or inserting hundreds
of easily spotted junk keywords in the comment content. Instead, they use
only the dedicated name and homepage fields to sneak in spam URL and
keywords. The comment content is often perfectly innocuous, sometimes even
topical (by copying parts of another comment or a trackbacking post). All in
all, these spams could easily be missed by a human moderator who wouldn’t
look carefully at the contact name and URL.

(Thanks to Kelson Vibber for the pointer
to this.)

In other words, he is noting what we noticed in email anti-spam; that what
works well one year, is likely to degrade over time as the spammers attempt to
evade it, and one has to keep working to keep up.

The best term for this appears to be adversarial classification. Anti-spam
activities fall into this category, and it often means that classic text
classification algorithms aren’t suitable — after all, the Reuters-21578
dataset never tried to evade your classifier ;)

In a similar vein, this MS research paper is interesting:

Previous work on adversarial classification has made the unrealistic
assumption that the attacker has perfect knowledge of the classifier. …. We
present efficient algorithms for reverse engineering linear classifiers with
either continuous or Boolean features and demonstrate their effectiveness
using real data from the domain of spam filtering.

It’s akin to John Graham-Cumming’s work looking into how a spammer could get
past a bayesian filter “from the outside”, but with more techniques, and
examining MS’ MaxEnt algorithm, too. PDF here, well worth a read.
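
To give a flavour of what probing a filter “from the outside” looks like, here is a toy Python sketch. It is emphatically not the algorithm from the paper, nor John Graham-Cumming’s method; it is a simplified illustration under made-up assumptions: a linear filter over Boolean word features, with invented weights and threshold, where the attacker can only ask “is this message spam?”.

```python
# Toy linear filter. The weights and threshold are invented for this
# example and are hidden from the attacker, who may only call is_spam().
_HIDDEN_WEIGHTS = {"viagra": 3.0, "casino": 2.0, "meeting": -1.5,
                   "patch": -2.5, "hello": 0.0}
_THRESHOLD = 1.0


def is_spam(words):
    """Black-box filter: True if the summed word weights exceed the threshold."""
    score = sum(_HIDDEN_WEIGHTS.get(w, 0.0) for w in set(words))
    return score > _THRESHOLD


def find_barely_spam(spam_words, query):
    """Drop words one at a time for as long as the message stays spam."""
    current = list(spam_words)
    for word in list(spam_words):
        trial = [w for w in current if w != word]
        if query(trial):
            current = trial  # still spam without this word, so leave it out
    return current


def find_good_words(spam_words, candidates, query):
    """Return candidate words that flip a barely-spam message to non-spam."""
    base = find_barely_spam(spam_words, query)
    return [w for w in candidates
            if w not in base and not query(base + [w])]
```

Calling `find_good_words(["viagra", "casino", "hello"], ["meeting", "patch", "hello", "viagra"], is_spam)` returns `["meeting", "patch"]` using nothing but yes/no answers from the filter. Note the limitation: this only finds words whose weight is negative enough to cross the remaining gap to the threshold, which is one reason the real algorithms are more involved.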

(By the way, I’m in the process of moving house, so if you send me an email, it
may take a while for me to reply. This situation is likely to prevail for the
next few weeks, for what it’s worth — fun.)

This post was written by Justin, source: Weblog Spam and Adversarial Classification

Planet Antispam: Beta No More

Tuesday, January 17th, 2006

Planet Antispam has been working
pretty nicely for the last couple of weeks — can’t say I’ve noticed any
trouble, and its RSS feed is turning out to be a nice aggregation of anti-spam
news. On top of that, John Levine was kind enough to set up a CNAME for it at a more
appropriate URL — http://planet.spam.abuse.net/.

As a result, it’s now fully-fledged, and fit to lose the ‘beta’ qualifier. Please bookmark,
subscribe to the feeds, and pass on the URL to others you think may be
interested!

This post was written by Justin, source: Planet Antispam: Beta No More

Planet Antispam

Tuesday, January 3rd, 2006

So a few weeks back, I mooted the idea of an anti-spam Planet site,
similar to Planet GNOME, Planet Java, Planet Perl et al.

Here are the results: Planet Antispam.

It’s still got a few rough edges; notably, the URL is not permanent — I’d
prefer something at a more spam-themed domain — and the logo is the
generic “PlanetPlanet” one. But it’s up and running in a beta-ish
fashion.

Feel free to bookmark, subscribe, post the URL on, etc.; and if you’d like
to give it a better home with an A record at a spam-themed domain, drop me
a line.

By the way, it also needs more source feeds. If you know of people with
blogs, working on/writing about anti-spam (of both the weblog and the
email variety), with RSS feeds that work, include the post text, and
permit further redistribution of that text, drop us a line and I’ll add
them.

Finally, here’s a picture of a Starbucks SPAM(r) Sandwich. (shudder)

This post was written by Justin, source: Planet Antispam