URL shorteners, spam and DNS

Discussion:

Martijn Grooten

2011-08-22 16:35:05 UTC

URL shorteners (bit.ly, goo.gl, tinyurl.com etc.) have become popular in recent years for rather obvious reasons. They are being used by spammers for equally obvious reasons - both in email and on other platforms (e.g. Twitter).

A filter that checks URLs/domains against a blacklist will either miss the bad domains hidden behind the shorteners or, if it blacklists the shortener itself, find itself blocking legitimate messages. "Do Not Use (third-party) URL Shorteners" is sound advice to those sending email, but it's not going to stop random users from copying shortened URLs from Twitter or Facebook and pasting them into emails, and shortened URLs are unlikely to stop featuring on Twitter.

Telling those providing shorteners to check URLs against blacklists is also a good idea - and probably necessary for them to stop ending up on blacklists themselves - but if a filter happens to prefer a different blacklist, it doesn't help much. (I also don't know if checks are made every time someone clicks on the link or just when the shortened URL is generated.)

So I was wondering if it would help if shorteners published the URLs in a DNS TXT record. As the path of a shortened URL usually consists of lowercase letters, uppercase letters and digits, and DNS names are compared case-insensitively, the uppercase letters need to be encoded, e.g. by preceding them with an underscore. So for instance to look up the URL behind

http://bit.ly/gkP0H

would require a lookup of the TXT record for

gk_p0_h._short.bit.ly
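
As a rough illustration of the encoding described above, here is a minimal Python sketch. The function names are made up for this example, the `_short` label is simply the one used in the example name, and the scheme assumes the path contains only ASCII letters and digits (a path that already contains an underscore would be ambiguous). No shortener is known to publish such records; this only builds the name that would be queried.

```python
from urllib.parse import urlsplit


def encode_short_path(path):
    """Encode a case-sensitive path into a case-insensitive DNS label.

    Each uppercase ASCII letter becomes an underscore followed by the
    lowercase letter; everything else is copied unchanged.
    """
    out = []
    for ch in path:
        if "A" <= ch <= "Z":
            out.append("_" + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def decode_short_label(label):
    """Reverse encode_short_path (assumes a well-formed label)."""
    out = []
    upper_next = False
    for ch in label:
        if ch == "_":
            upper_next = True
        elif upper_next:
            out.append(ch.upper())
            upper_next = False
        else:
            out.append(ch)
    return "".join(out)


def txt_lookup_name(short_url, zone_label="_short"):
    """Build the hypothetical TXT record name for a shortened URL."""
    parts = urlsplit(short_url)
    label = encode_short_path(parts.path.lstrip("/"))
    return "{}.{}.{}".format(label, zone_label, parts.hostname)


print(txt_lookup_name("http://bit.ly/gkP0H"))  # gk_p0_h._short.bit.ly
```

The decoder is included only to show that the mapping is reversible under the stated assumptions.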

Now I don't know if this is something that would actually help those developing spam/content filters. Doing an HTTP lookup to determine the URL isn't exactly rocket science - though intuitively, it seems more 'natural' to use DNS, especially if that's what is used for the URL blacklist lookup.

Nor do I know if this would be something that would interest those providing shortening services. As it would allow browsers to avoid making an HTTP request to their services, it would mean they would stop having reliable click-through statistics which, I guess, are a source of revenue to them.

But I thought I'd post it here anyway as perhaps it is useful. In which case I'm sure it can be improved upon.

Martijn.

Virus Bulletin Ltd, The Pentagon, Abingdon, OX14 3YP, England.
Company Reg No: 2388295. VAT Reg No: GB 532 5598 33.

Steve Atkins

2011-08-22 16:55:56 UTC


Post by Martijn Grooten
So I was wondering if it would help if shorteners published the URLs in a DNS txt record.

It's not going to happen.

To do this would require a fairly complex, database-backed "stunt" DNS server, hooked fairly intimately into the URL shortener's core application.

URL shorteners are deployed by people who understand webservers and HTTP, not DNS servers. Getting them to deploy anything beyond the basic DNS required to serve a website is a non-starter.

The only thing I'd expect to get any level of traction would be something that can be implemented in a webserver, in much the same way as the URL shortener itself is implemented (and, probably, running in the same webapp as the shortener itself).

There are several obvious ways to do that - HTTP queries to a related hostname (http://destination.wttw.me/q8Qzth), a related path (http://wttw.me/d/q8Qzth), a suffix to the URL, a different MIME type, etc.

The cleanest way would be to use a different request type. Conveniently, that's already implemented. If you send a HEAD request for the URL, you'll get a 301 Moved response, and a Location: header containing the target URL.

And, at least for the URL shortener I use (bit.ly), that HEAD request will not count as a click for statistics tracking.

platter:steve$ telnet wttw.me 80
Trying 168.143.174.97...
Connected to wttw.me.
Escape character is '^]'.
HEAD /q8Qzth HTTP/1.1
Host: wttw.me
Connection: close

HTTP/1.1 301 Moved
Server: nginx
Date: Mon, 22 Aug 2011 16:52:38 GMT
Content-Type: text/html; charset=utf-8
Connection: close
Set-Cookie: _bit=4eXXXXXXXXXXXXXXa8;domain=.wttw.me;expires=Sat Feb 18 11:52:38 2012;path=/; HttpOnly
Cache-control: private; max-age=90
Location: http://irtf.org/
MIME-Version: 1.0
Content-Length: 108

Connection closed by foreign host.
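
The HEAD lookup in the transcript above can also be scripted. The following is a rough sketch using only Python's standard library; the function name `resolve_short_url` and the default timeout are assumptions made for this example. It inspects a single hop only and does not follow chains of redirects.

```python
import http.client
from urllib.parse import urlsplit

# Status codes treated as redirects for the purpose of this sketch.
REDIRECT_CODES = (301, 302, 303, 307, 308)


def resolve_short_url(url, timeout=5.0):
    """Send a HEAD request and return the Location header, or None.

    http.client never follows redirects on its own, so the response
    seen here is the shortener's own answer.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path, headers={"Connection": "close"})
        resp = conn.getresponse()
        if resp.status in REDIRECT_CODES:
            return resp.getheader("Location")
        return None
    finally:
        conn.close()
```

Called as `resolve_short_url("http://wttw.me/q8Qzth")`, it would return whatever Location header the shortener sends at that moment. Whether a HEAD request is excluded from click statistics depends on the shortener.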

Cheers,
Steve

Daniel Feenberg

2011-08-22 18:02:17 UTC


Is there some reason the spam filter can't just send a HEAD request to
bit.ly and find out the destination URL? That wouldn't require any
cooperation among the various parties to this potential transaction, and
cooperation is always in short supply in this field.

Daniel Feenberg

Dotzero

2011-08-22 18:21:19 UTC


Post by Daniel Feenberg

Bit.ly allows you to add a "+" to the end of a shortened URL to see
where it goes and to also see some stats regarding the URL. Picking a
random URL from Google News, try this.... http://bit.ly/ovrbAu+. My
understanding is that a number of other well known shortener providers
have followed suit.

Mike

Martijn Grooten

2011-08-23 16:55:19 UTC


Post by Dotzero
Bit.ly allows you to add a "+" to the end of a shortened URL to see
where it goes and to also see some stats regarding the URL. Picking a
random URL from Google News, try this.... http://bit.ly/ovrbAu+. My
understanding is that a number of other well known shortener providers
have followed suit.

I think they have, and I think it's a great thing if you're a human and you want to check a shortened URL before clicking the link. But if you're a computer it's not very useful, and definitely not easier than sending a HEAD request (which I hadn't thought of and which will of course be faster than GET).

Martijn.


Emanuele Balla (aka Skull)

2011-08-22 19:19:28 UTC


Post by Daniel Feenberg
Is there some reason the spam filter can't just send a head request to
bit.ly and find out the destination URL? That wouldn't require any
cooperation among the various parties to this potential transaction, and
cooperation is always in short supply in this field.

Yeah!
This way smaller redirectors abused by real spammers are simply DDoSed
out of business and stop being an issue!!!
+1! ;-)

--
Paranoia is a disease unto itself. And may I add: the person standing
next to you may not be who they appear to be, so take precaution.
-----------------------------------------------------------------------------
http://bofhskull.wordpress.com/

John Levine

2011-08-23 06:57:52 UTC


Post by Martijn Grooten
So I was wondering if it would help if shorteners published the URLs in
a DNS txt record.

They could publish them in a DNS A record, which would require no
hackery at all to browsers. That is, instead of

http://bit.ly/abcde -> http://abcde.bit.ly

But this would be a big problem for large ISPs, due to DNS caches filling
up with the things, a problem noted when Facebook did something similar.

R's,
John

João Gouveia

2011-08-23 10:25:26 UTC


Hi,


Post by Martijn Grooten
So I was wondering if it would help if shorteners published the URLs in
a DNS txt record.

They could publish them in a DNS A record, which would require no
hackery at all to browsers. That is, instead of
http://bit.ly/abcde -> http://abcde.bit.ly

We're actually doing something similar with Mailspike, not only for shorteners but also for some abused web hosting providers.
The general idea is to track shorteners being used in spam, do the get-final-destination-and-check-if-it-is-bad step on our infrastructure, and then publish "bad" shortener URIs as FQDNs using rbldnsd.
I'm attaching a sample Perl implementation if you want to take a look.

The same approach also works for Yahoo Groups and others.
For example:

http://tech.groups.yahoo.com/group/ebishuartrl/message points to a spammy URI, and the Yahoo entry would map to the FQDN:

78e73102.0d448dd4.db0f6f37.tech.groups.yahoo.com

This allows full URIs to be listed as FQDNs.
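
To make the shape of such a mapping concrete, here is a small Python sketch. It is not the Mailspike algorithm: the hash function and the label order are not stated in this message, so the sketch assumes one 8-hex-digit label per path segment, in path order, with CRC32 as a stand-in hash. The function name `uri_to_fqdn` is likewise made up for this example, and the labels it produces are not expected to match the ones shown above.

```python
import zlib
from urllib.parse import urlsplit


def uri_to_fqdn(uri):
    """Map a URI to a DNS name: hashed path segments plus the host.

    ASSUMPTION: CRC32 per segment, in path order. The real scheme
    may use a different hash and a different ordering.
    """
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    labels = [
        format(zlib.crc32(s.encode("utf-8")) & 0xFFFFFFFF, "08x")
        for s in segments
    ]
    return ".".join(labels + [parts.hostname])


name = uri_to_fqdn("http://tech.groups.yahoo.com/group/ebishuartrl/message")
print(name)  # three 8-hex-digit labels followed by tech.groups.yahoo.com
```

A filter would then query the resulting name under the blacklist's zone, in the same way as any other DNS-based list lookup.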


--
Joao Gouveia
AnubisNetworks
Av. Quinta Grande, 53
Edifício Prime, 5ºA
Alfragide
2610-156 AMADORA
Portugal
Tel. : +351 21 7252110
Mobile : +351 91 9512960
Fax : +351 21 7252119
***@anubisnetworks.com
http://www.anubisnetworks.com