Wednesday, March 27, 2013

How to fix broken SOA serials


Today we will cover the basics of fixing a broken serial number in a zone's SOA record in BIND. For the sake of this example, let's assume we have two nameservers: ns0.example.com is the master, and ns1.example.com is the slave.

What's the scenario?

You want to add a new PTR record to your master (ns0.example.com). After you reload BIND to push your new DNS changes, you notice that you accidentally added an extra digit to the serial number while incrementing it. The new serial number, '20130327002', is one digit longer than the previous one, which makes it roughly ten times larger. In your eagerness to correct the mistake, you remove the last digit and reload the master. Unfortunately, this breaks propagation, because the slave will not accept a zone whose serial number is lower than the one it already holds. The slave still has the incorrect serial number, so simply correcting the serial on the master won't suffice: the slave will ignore those changes.

An easy fix would be to just add an even higher value and reload the master. While that would work, it doesn't fix or undo the original mistake. A better approach would be to reset the serial back to its original 10-digit format, without breaking configuration or propagation.

But how?

First, we need to understand how serial numbers work in BIND. The serial number, and only the serial number, determines whether a change on the master is picked up by its slaves. Serials are maintained by hand, so every change to a zone is really two edits: adding the actual record, and then incrementing the serial number to flag the zone as changed. This process is error-prone: admins sometimes forget to update the serial, increment it by too much, or not by enough. The date-based format (YYYYMMDDnn, the date of the latest change followed by a two-digit counter) is really just a convention. Any number will do, but for administrative purposes it is good to use the date of the latest change. Other DNS server packages might handle serial numbers differently, but for the sake of this article, let's assume the whole world runs BIND.
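To make the date convention concrete, here is a minimal sketch of how the next serial could be derived. The function name next_serial is made up for this illustration and is not part of BIND; the range check at the end is the kind of guard that would have caught the 11-digit typo from the scenario.

```python
from datetime import date

MAX_SERIAL = 2 ** 32 - 1  # 4294967295, the largest value a serial can hold


def next_serial(current, today):
    """Return the next YYYYMMDDnn serial after `current` (hypothetical helper)."""
    base = int(today.strftime("%Y%m%d")) * 100  # e.g. 2013032700
    # The first edit of the day gets counter 01; later edits just add one.
    candidate = max(base + 1, current + 1)
    if candidate > MAX_SERIAL:
        raise ValueError("serial %d does not fit in 32 bits" % candidate)
    return candidate


print(next_serial(2013032601, date(2013, 3, 27)))  # 2013032701
print(next_serial(2013032701, date(2013, 3, 27)))  # 2013032702
```

Passing the faulty 20130327002 as the current serial raises ValueError instead of quietly producing another oversized number.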

The serial in the Start of Authority (SOA) record is an unsigned 32-bit integer, and comparisons between serials wrap around the largest possible 32-bit value. As such, one can increment the serial repeatedly until it rolls over the max and passes through 0. In layman's terms: if you increase the serial enough, it overflows and starts from 0 again. So once the serial sits at (or near) the theoretical max, a small number is treated as the next step up rather than as a step down. After that, you can reset the serial number to its original 10-digit format. Let's go through the steps one by one, for clarity. This guide assumes that today's date is the 27th of March, 2013, which makes the ideal serial for the first edit 2013032701.
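The wrap-around comparison can be sketched in a few lines. This follows the "greater than" rule from RFC 1982 as we read it; it is an illustration, not BIND's actual code, and the name serial_gt is our own.

```python
MOD = 2 ** 32   # serials are taken modulo 4294967296
HALF = 2 ** 31  # 2147483648, half of the serial space


def serial_gt(new, old):
    """True if `new` counts as newer than `old` in serial number arithmetic."""
    diff = (new - old) % MOD
    # A forward distance of 1 .. 2^31 - 1 means "newer". A distance of
    # exactly 2^31 is undefined in the RFC, so we treat it as "not newer".
    return 0 < diff < HALF


print(serial_gt(2013032702, 2013032701))  # True: a plain increment
print(serial_gt(2013032701, 2013032702))  # False: a plain decrease is ignored
print(serial_gt(2013032701, 4294967295))  # True: wrapped past the max
```

The last line is the whole trick of this guide: seen from 4294967295, the 10-digit date serial lies ahead, not behind.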

1: Open the faulty zone file for your IP network. Make sure you replace the network below with your own; the /24 used here is a test network reserved for documentation and will not work for you. The PTR we want to add in this example is 1 86400 IN PTR ptr.example.com. Of course, your PTR should already be in the file, since the only problem in this example is a faulty serial number.
For the rest of the guide to match your situation, we recommend that you turn on notify for your slaves and/or lower the refresh timer, so that the changes propagate faster. This will speed up the process considerably.
sudo vim /var/named/2.0.192.in-addr.arpa

2: Locate the faulty serial (20130327002), and replace it with the max value for an unsigned 32-bit integer. Did it just get really complicated? Don't worry; you're not alone. This is where things get tricky for most people without a calculator. It is not as hard as it seems, though: the max value is (2^32) - 1, which gives a range of 0 to 4294967295. Avoid using 0 itself, as it may have a special meaning in certain DNS implementations. The other number to know is the largest increment a slave will accept in one go, which is (2^31) - 1 = 2147483647. If you add more than that, the new serial wraps so far around that it looks older than the current one, and the slave ignores it. A lot of guides will tell you this part is a two-step process, where you first add the max increment, push the change, and then increment again. However, in most cases you can save some time by setting the serial straight to the max value for an unsigned 32-bit integer. This works as long as 4294967295 is no more than 2147483647 above the serial the slave currently holds, and it saves you the wrap-around arithmetic that is needed whenever the old value plus the increment exceeds the max. Note also that 20130327002 is itself too large for 32 bits, so check which serial the slave actually reports (see step 4) before relying on the shortcut. But, anyway, saving time is good, mkey. So, let's do that.
The new value after the change should be 4294967295.
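If you would rather check the arithmetic than trust it, the following sketch tests whether the shortcut applies and, if it does not, lists the intermediate serials for the multi-step approach. The function names and the slave serials used in the examples (3000000000 and 2013032702) are hypothetical; substitute the serial your slave actually reports.

```python
MOD = 2 ** 32           # serials live in 0 .. 4294967295
MAX_SERIAL = MOD - 1    # 4294967295
MAX_STEP = 2 ** 31 - 1  # 2147483647, largest increment still seen as newer


def is_newer(new, old):
    """Does `new` count as newer than `old` under RFC 1982 rules?"""
    return 0 < (new - old) % MOD < 2 ** 31


def shortcut_works(slave_serial, target):
    """Can we go slave_serial -> 4294967295 -> target, as in this guide?"""
    return is_newer(MAX_SERIAL, slave_serial) and is_newer(target, MAX_SERIAL)


def plan_reset(slave_serial, target):
    """List the serials to publish, in order, using max-size hops."""
    steps = []
    current = slave_serial
    while not is_newer(target, current):
        current = (current + MAX_STEP) % MOD
        if current == 0:          # the guide advises against serial 0
            current = MAX_SERIAL  # one step short of 0, still newer
        steps.append(current)
    steps.append(target)
    return steps


print(shortcut_works(3000000000, 2013032701))  # True
print(shortcut_works(2013032702, 2013032701))  # False
print(plan_reset(3000000000, 2013032701))
print(plan_reset(2013032702, 2013032701))
```

Each serial in the returned list has to reach every slave before you publish the next one, exactly as with the single jump to 4294967295.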

3: Reload BIND (or just the zone file if you prefer that).
/etc/init.d/named reload

4: Make sure the change has now been pushed to all slaves.

4.1: You can do this from the shell with the 'host' program:
host -C example.com. 

This should give you a result similar to the following (but with your data, of course). Note that the first number after the two hostnames in each record is the serial:
        example.com has SOA record sns.dns.icann.org. noc.dns.icann.org. 
        4294967295 7200 3600 1209600 3600
        example.com has SOA record sns.dns.icann.org. noc.dns.icann.org. 
        4294967295 7200 3600 1209600 3600

4.2: You can also use the excellent interface at Robtex.com. It provides a lot more information than the 'host' program does, and presents the data in a more easily readable format. Remember, easy is good, mkey.

No matter what method you choose, make sure both the primary and secondary nameservers have the updated serial number (4294967295). If they do, you can safely assume the PTR changes have propagated across the network of nameservers (you could have a lot more than just two here, so make sure it is the same -everywhere-).
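With more than a couple of nameservers, comparing serials by eye gets tedious. The sketch below pulls the serials out of 'host -C' style output and checks that they all match. It assumes the output looks like the sample in step 4.1 (the serial may be on the same line as the hostnames or wrapped onto the next one); the function names are our own, and the sample text is pasted in rather than fetched live.

```python
import re

# "has SOA record <mname> <rname> <serial>"; \s+ also matches a line break.
SOA_RECORD = re.compile(r"has SOA record\s+\S+\s+\S+\s+(\d+)")

SAMPLE_OUTPUT = """\
example.com has SOA record sns.dns.icann.org. noc.dns.icann.org.
4294967295 7200 3600 1209600 3600
example.com has SOA record sns.dns.icann.org. noc.dns.icann.org.
4294967295 7200 3600 1209600 3600
"""


def soa_serials(host_output):
    """Return every SOA serial found in the given output, in order."""
    return [int(serial) for serial in SOA_RECORD.findall(host_output)]


def all_in_sync(host_output, expected):
    """True if at least one serial was found and all equal `expected`."""
    serials = soa_serials(host_output)
    return bool(serials) and all(serial == expected for serial in serials)


print(soa_serials(SAMPLE_OUTPUT))              # [4294967295, 4294967295]
print(all_in_sync(SAMPLE_OUTPUT, 4294967295))  # True
```

The same check works for step 6; just pass 2013032701 as the expected value.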

5: Open your zone file again, and change the value to the correct serial. Again, we recommend using the date followed by a two-digit counter for maintenance convenience. In this case, it would be 2013032701. There is no need to make it 2013032702: once the slaves hold 4294967295, any serial from 1 up to 2147483646 is past the wrap-around point and counts as newer.

6: Repeat step 4 for the new serial number. If you were not successful, go back and make sure step 2 was followed to the last detail. If you were successful, proceed to the next step.

7: Celebrate by rewarding yourself with a cold drink in the sun. Congrats! You have now fixed the broken DNS propagation in BIND, using Serial Number Arithmetic and Mathematics For Fun and Profit. For more information, check the applicable RFC, available at: http://tools.ietf.org/html/rfc1982.

Thanks for listening!

Best regards,

Johan Boger, Robtex.com

(A shout-out to Mikael Abrahamsson and the rest of #networker for providing valuable feedback)
