DNSBL caches and IPv6, again

Discussion:

John R. Levine

2012-09-19 21:33:07 UTC

As I've mentioned a few times, I'm trying to figure out the cache behavior
of DNSBLs, so we can try and predict whether IPv6 BLs would make the
DNS melt down.

If I had traces of [IP,timestamp] from some medium sized mail systems, I
could do some cache simulations. Medium is in the range of a million
connections a day. Anyone have access to one of those? Nobody has IPv6
mail at that scale yet, but IPv4 would do fine for this.

I don't need to know whether the connection was for real mail or spam. If
you consider the IPs confidential, hashes or tokens would be fine so long
as the same token consistently corresponds to the same IP.
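A minimal sketch of such consistent tokens, assuming a keyed hash (HMAC-SHA256) with a secret that never leaves the site providing the trace. The function names, the 16-character truncation, and the sample addresses are illustrative assumptions, not something specified in this thread.

```python
import hashlib
import hmac


def tokenize_ip(ip: str, secret: bytes) -> str:
    """Map an IP string to a stable opaque token.

    The same (ip, secret) pair always yields the same token, so cache
    behavior is preserved, while the secret stops outsiders from simply
    hashing the whole IPv4 space to reverse the tokens.
    """
    digest = hmac.new(secret, ip.encode("ascii"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncation length is an arbitrary choice


def anonymize_trace(records, secret: bytes):
    """Replace the IP in each (ip, timestamp) record with its token."""
    return [(tokenize_ip(ip, secret), ts) for ip, ts in records]


if __name__ == "__main__":
    # Hypothetical sample data using documentation addresses.
    sample = [("192.0.2.1", 0), ("192.0.2.2", 5), ("192.0.2.1", 9)]
    for token, ts in anonymize_trace(sample, b"site-local secret"):
        print(token, ts)
```

An unkeyed hash would also be consistent, but anyone could rebuild the mapping for IPv4 by brute force, which is why the sketch uses a secret key.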

R's,
John

Dave Warren

2012-09-19 21:55:04 UTC

Post by John R. Levine
As I've mentioned a few times, I'm trying to figure out the cache
behavior of DNSBLs, so we can try and predict whether IPv6 BLs would
make the DNS melt down.
If I had traces of [IP,timestamp] from some medium sized mail systems,
I could do some cache simulations. Medium is in the range of a
million connections a day. Anyone have access to one of those?
Nobody has IPv6 mail at that scale yet, but IPv4 would do fine for this.
I don't need to know whether the connection was for real mail or
spam. If you consider the IPs confidential, hashes or tokens would be
fine so long as the same token consistently corresponds to the same IP.

Isn't the fear that with IPv6, spammers simply won't use the same
address twice, thereby causing cache meltdown on a scale that isn't
possible in today's IP-scarce IPv4 world?

In other words, the data you get from legitimate mail servers in IPv4
may roughly correspond to the data you'd get from legitimate mail
servers in IPv6, but the data you get from spammers today won't be at
all representative of IPv6 spammers' potential behaviour.

Heck, even the data from legitimate mail might not mean much going
forward. I'd be at least a little tempted to send mail from different
clients from different IPs (or possibly even with more granularity for
clients who send person to person, bulk and transactional mail, but
don't currently send enough to justify wasting IPs to segregate such
traffic), so even legitimate sites might end up using a lot more
outbound IPs, simply because they can.

In other words, as much as I'd love to see some concrete data on this
going forward, I'm not sure that these simulations will apply to future
real-world situations.

--
Dave Warren
http://www.hireahit.com/
http://ca.linkedin.com/in/davejwarren

John Levine

2012-09-19 22:27:08 UTC

Post by Dave Warren
Isn't the fear that with IPv6, spammers simply won't use the same
address twice, thereby causing cache meltdown on a scale that isn't
possible in today's IP-scarce IPv4 world?

That is my concern, but it is not at all clear how well existing DNSBL
queries cache. My current working hypothesis divides the mail world
into three parts:

Large: big mail systems get copies of the BLs out of band, e.g. by
rsync, and run a local rbldnsd on the same LAN as the mail servers.
Since a local rbldnsd can respond as fast as a cache, it uses a TTL of
zero to effectively bypass the cache, so there is no problem.

Medium: mail systems query public BLs and use the local DNS cache.
Cache may or may not help.

Small: like medium, but so little traffic that cache entries all
expire before being reused, so it doesn't matter.

The concern is the medium systems. I have some hints, but nobody
really knows how well their queries cache.
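A rough sketch of the kind of simulation a [IP,timestamp] trace would allow: replay the trace through a single resolver cache with one fixed TTL and count hits and misses. The function name, the TTL values, and the tiny trace are made-up illustrations; a real cache also has size limits and separate negative-answer TTLs, which this ignores.

```python
def simulate_cache(trace, ttl):
    """Replay (ip, timestamp) records, sorted by timestamp, through a
    cache in which every answer lives for `ttl` seconds.

    Returns (hits, misses). A miss models a query that reaches the
    DNSBL's authoritative servers.
    """
    expires = {}  # ip -> time at which its cached answer expires
    hits = misses = 0
    for ip, ts in trace:
        if ip in expires and ts < expires[ip]:
            hits += 1
        else:
            misses += 1
            expires[ip] = ts + ttl  # TTL counts from the fetch, not last use
    return hits, misses


if __name__ == "__main__":
    # Hypothetical trace: two senders, timestamps in seconds.
    trace = [("a", 0), ("a", 10), ("b", 20), ("b", 25), ("a", 400)]
    for ttl in (5, 300, 3600):
        hits, misses = simulate_cache(trace, ttl)
        print(f"ttl={ttl:5d}s hits={hits} misses={misses}")
```

In this model a "small" system corresponds to a trace where repeat visits are further apart than the TTL, and a sender that never reuses an address produces zero hits at any TTL.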

If I had data I could try some experiments. Obvious things include
varying TTLs to see how that affects cache behavior. Slightly less
obvious things include the BL noting how many queries it gets for an
address, and returning a longer TTL for heavily queried addresses.
This would require a hacked server to vary the SOA TTL on negative
answers, but we know how to do that.
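One way the popularity-based TTL idea might look, purely as a sketch: the server counts queries per address and doubles the TTL each time the count crosses another threshold, up to a cap. The class name, the doubling rule, and every number here are assumptions chosen for illustration; wiring the result into the SOA TTL of a negative answer is not shown.

```python
from collections import Counter


class AdaptiveTtl:
    """Hand out longer TTLs for addresses that are queried more often."""

    def __init__(self, base_ttl=300, max_ttl=86400, step=10):
        self.base_ttl = base_ttl  # TTL for rarely queried addresses
        self.max_ttl = max_ttl    # upper bound on any returned TTL
        self.step = step          # queries needed for each doubling
        self.counts = Counter()

    def ttl_for(self, ip):
        """Record one query for `ip` and return the TTL to answer with."""
        self.counts[ip] += 1
        # Cap the exponent so the intermediate value stays small.
        doublings = min(self.counts[ip] // self.step, 32)
        return min(self.base_ttl * 2 ** doublings, self.max_ttl)


if __name__ == "__main__":
    policy = AdaptiveTtl(base_ttl=300, max_ttl=3600, step=2)
    for _ in range(8):
        print(policy.ttl_for("192.0.2.1"))
```

A production version would also need to age the counters, since an address that was busy last week should not keep a long TTL forever.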

So, anyone got server log [IP,timestamp] data they can share?

R's,
John

Matthias Leisi

2012-09-20 05:10:39 UTC

Post by John Levine
So, anyone got server log [IP,timestamp] data they can share?

I have [IP, # of queries] on a daily level from DNS query logs from
dnswl.org, eg in a file named "2012-08-03.aggregate":

| 178.63.223.135 1
| 219.255.134.101 4992
| 156.45.254.31 80

These are the numbers we see at the authoritative servers, ie after
caching by (mostly "medium" in your terminology) local resolvers. We
only collect about a third of the logs (we are only interested in
relative numbers, so that is not an issue for our own purposes).

Despite sanity checks, there are about 1% odd IPs (eg from those who
forget that they should use reverse-nibble notation for the lookups,
funny internal IP addressing schemes leaking out, DNSxLs trying to
look up whole ranges etc).
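For reference, a small sketch of how a client builds a DNSxL query name: IPv4 addresses are reversed octet by octet, IPv6 addresses nibble by nibble, and the list's zone is appended. The function name and the zone `dnsxl.example` are placeholders, not dnswl.org's actual code or zone.

```python
import ipaddress


def dnsxl_query_name(ip: str, zone: str) -> str:
    """Build the name to look up for `ip` in the DNSxL `zone`."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on garbage input
    if addr.version == 4:
        labels = str(addr).split(".")  # four decimal octets
    else:
        # 32 hex nibbles from the fully expanded address
        labels = list(addr.exploded.replace(":", ""))
    return ".".join(reversed(labels)) + "." + zone


if __name__ == "__main__":
    print(dnsxl_query_name("178.63.223.135", "dnsxl.example"))
    print(dnsxl_query_name("2001:db8::1", "dnsxl.example"))
```

Forgetting the reversal yields a name that is still syntactically valid, which is how such lookups end up in the authoritative server's logs as odd addresses.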

We also have data in the same format for the DNS server IPs that
actually query our servers.

We keep this data for about a month (the higher aggregated data, ie
sender magnitudes, top query sources etc are kept in the DB for
longer).

<shameless plug>We do not yet collect data on IPv6. If you want to
help us to change that, see
http://www.dnswl.org/news/archives/26-Do-you-want-to-support-the-dnswl.org-project.html
</shameless plug>

-- Matthias

John Levine

2012-09-20 06:08:09 UTC

Post by Matthias Leisi

Post by John Levine
So, anyone got server log [IP,timestamp] data they can share?

I have [IP, # of queries] on a daily level from DNS query logs from

Thanks, but I really need mail server logs.

I have logs for korea.services.net which is still surprisingly
popular. Perhaps we could compare notes sometime.

R's,
John

Chris Lewis

2012-09-20 01:06:02 UTC

Post by John R. Levine
As I've mentioned a few times, I'm trying to figure out the cache
behavior of DNSBLs, so we can try and predict whether IPv6 BLs would
make the DNS melt down.

Didn't you already do this with data from my logs a year or more ago?

I don't have large production flow anymore but will IP:timestamp from a
trap do? I have several at the million-ish/day level ;-)

John Levine

2012-09-20 05:06:58 UTC

Post by Chris Lewis

Post by John R. Levine
As I've mentioned a few times, I'm trying to figure out the cache
behavior of DNSBLs, so we can try and predict whether IPv6 BLs would
make the DNS melt down.

Didn't you already do this with data from my logs a year or more ago?

No, that was something else.

Post by Chris Lewis
I don't have large production flow anymore but will IP:timestamp from a
trap do? I have several at the million-ish/day level ;-)

It's certainly better than nothing. A real mail server would probably have
large chunks of traffic from big mail hosts that a trap doesn't, but I'll
take what I can get.

By the way, anyone else interested in modelling and simulation? It's fun.

R's,
John