How to update SPF records

As currently specified, Sender Policy Framework (SPF) assumes that DNS updates are atomic and that SPF clients always have a consistent view of the DNS. In practice, this is not true. Consequently, SPF records, and DNS records referenced by SPF records, must be changed very carefully; otherwise, mail sent by your customers will temporarily bounce.

The synchronization problem

DNS is a distributed database with an optimistic replication protocol, which trades a temporary lack of consistency for increased availability. Let's look at an example to see what this means. Suppose enyo.de publishes the following DNS records:

enyo.de.	IN TXT	"v=spf1 mx -all"
enyo.de.	IN MX	10 mail.enyo.de.

mail.enyo.de is also the outgoing mail relay for clients sending mail from the enyo.de domain. This means that SPF-aware receiving mail transfer agents (MTAs) accept mail relayed through it.

Now suppose that the enyo.de administrator discovers that he can no longer handle the growing mail load, and therefore migrates incoming mail handling to the ISP's mail server, mail.lf.net. The SPF record is updated accordingly, so that mail.enyo.de can still be used as an outgoing mail relay:

enyo.de.	IN TXT	"v=spf1 a:mail.enyo.de -all"
enyo.de.	IN MX	10 mail.lf.net.

However, when you do this, your customers will soon start to phone and complain to you that they can no longer send mail to some destinations. What happens? Due to the caching nature of DNS, resolvers at some sites still hold the old SPF record in their caches, but not the old MX record. When an SPF-aware MTA receives mail from enyo.de, it tries to obtain the MX record for that domain from its local resolver. The resolver queries one of the authoritative name servers for enyo.de and obtains the new MX record (which already points to mail.lf.net). The cached old SPF record ("v=spf1 mx -all") therefore authorizes only mail.lf.net, not mail.enyo.de, which actually sent the message. As a result, the SPF check fails, and the message bounces.
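The failure can be reproduced with a toy model of the SPF check. This is a minimal Python sketch, not a real SPF implementation: it assumes that only the mx and a:<host> mechanisms appear, and it compares host names instead of resolving them to IP addresses. The function name is made up for illustration.

```python
def authorized_hosts(spf_record: str, mx_hosts: list[str]) -> set[str]:
    """Host names authorized by an SPF record (toy model: names, not IPs)."""
    hosts: set[str] = set()
    for term in spf_record.split()[1:]:  # skip the "v=spf1" version section
        if term == "mx":
            # The mx mechanism authorizes whatever the MX records point to.
            hosts.update(mx_hosts)
        elif term.startswith("a:"):
            # The a:<host> mechanism authorizes the named host.
            hosts.add(term[2:])
        # Other terms (such as "-all") are ignored by this toy model.
    return hosts


sending_host = "mail.enyo.de"  # outgoing relay for enyo.de in both setups

old_txt, old_mx = "v=spf1 mx -all", ["mail.enyo.de"]
new_txt, new_mx = "v=spf1 a:mail.enyo.de -all", ["mail.lf.net"]

# Consistent views of the DNS: the check passes before and after the change.
print(sending_host in authorized_hosts(old_txt, old_mx))  # True
print(sending_host in authorized_hosts(new_txt, new_mx))  # True
# Cached old SPF record combined with the freshly fetched new MX record.
print(sending_host in authorized_hosts(old_txt, new_mx))  # False
```

Both complete states are fine; only the mixed state, which no administrator ever published, makes the check fail.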

This is just one example of how short-term DNS inconsistencies can lead to temporary SPF failures and bouncing of legitimate mail. If SPF records reference out-of-zone records, the synchronization problems get worse, and you may also have to deal with issues that result from crossing an administrative boundary (for example, records in the target zone may be modified without prior consultation).

It gets worse: redirect, include, and dual-type records

The behavior of the redirect modifier and the include mechanism provided by SPF is particularly annoying: because of the synchronization problem, the referenced record may not yet be available. The SPF algorithm yields None, which is translated to PermError, which in turn is historically translated into SoftFail. The specification and existing practice disagree about what SoftFail means (see the separate section below); most sites translate SoftFail to Fail, which results in bounces.
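The chain of translations can be written down as a lookup table. The following Python sketch only models the mappings described in the paragraph above; the table and function names are made up for illustration, and the SoftFail-to-Fail step reflects common receiver practice, not a requirement of the specification.

```python
# Mappings as described in the text. "SoftFail -> Fail" is what most
# receiving sites do in practice, not something SPF itself mandates.
TRANSLATIONS = {
    "None": "PermError",      # include/redirect target not yet visible
    "PermError": "SoftFail",  # historical mapping
    "SoftFail": "Fail",       # common practice at receiving sites
    "Fail": "bounce",         # the message is rejected
}


def outcome_chain(result: str) -> list[str]:
    """Follow the translations from an initial result to the final outcome."""
    chain = [result]
    while chain[-1] in TRANSLATIONS:
        chain.append(TRANSLATIONS[chain[-1]])
    return chain


print(" -> ".join(outcome_chain("None")))
# None -> PermError -> SoftFail -> Fail -> bounce
```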

If you think the redirect/include behavior is truly obnoxious, here's the real howler: if you publish SPF data in both SPF (type 99) and TXT DNS records, you cannot change the SPF data immediately without risking an inconsistency between the SPF and TXT records. Again, the SPF algorithm yields PermError, which eventually leads to bouncing of legitimate mail.
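Before touching either record type, you can check whether the TXT and SPF (type 99) records publish the same policy. The Python sketch below assumes that the record data has already been fetched (for example with `dig enyo.de TXT` and `dig enyo.de TYPE99`) and is passed in as lists of character strings; the function names are hypothetical. It joins multi-string records without adding spaces, as RFC 4408 (section 3.1.3) requires, and keeps only records that start with the "v=spf1" version section.

```python
def spf_policies(records: list[list[str]]) -> set[str]:
    """Join each record's character strings; keep records starting with v=spf1."""
    joined = ("".join(strings) for strings in records)
    # The version section ends at a space or at the end of the record.
    return {r for r in joined if r == "v=spf1" or r.startswith("v=spf1 ")}


def in_conflict(txt_records: list[list[str]], spf_records: list[list[str]]) -> bool:
    """True if both record types carry SPF data and the policies differ."""
    txt, spf = spf_policies(txt_records), spf_policies(spf_records)
    return bool(txt) and bool(spf) and txt != spf


temporary_txt = [["v=spf1 ip4:212.9.189.167 ", "ip4:212.9.189.169 -all"]]
old_spf = [["v=spf1 mx -all"]]

print(in_conflict(temporary_txt, old_spf))  # True: the two types disagree
print(in_conflict(temporary_txt, []))       # False: no type 99 record at all
```

Treating a missing type 99 record as "no conflict" matches the procedure below, which removes that record before the TXT record is changed.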

What to do?

Several rules can help to avoid the synchronization problem:

An example how to apply the last rule for a simple update is given in the next section.

Updating an A record: an example

Suppose that you currently publish the following DNS records for enyo.de:

enyo.de.       172800  IN TXT  "v=spf1 mx -all"
enyo.de.       172800  IN SPF  "v=spf1 mx -all"
enyo.de.       172800  IN MX   10 mail.enyo.de.
mail.enyo.de.  172800  IN A    212.9.189.167

You want to change the IP address of mail.enyo.de from 212.9.189.167 to 212.9.189.169. So the new configuration should look like this:

enyo.de.       172800  IN TXT  "v=spf1 mx -all"
enyo.de.       172800  IN SPF  "v=spf1 mx -all"
enyo.de.       172800  IN MX   10 mail.enyo.de.
mail.enyo.de.  172800  IN A    212.9.189.169

The necessary steps are outlined below. The change procedure begins on Monday morning.

  1. Monday morning, 9 o'clock AM: Lower the TTLs of all records that need to be changed to a low value, say one hour (3600 seconds). Even lower values make no real difference because of overzealous caching. Also drop the SPF record (but only the type 99 version).

    enyo.de.         3600  IN TXT  "v=spf1 mx -all"
    enyo.de.       172800  IN MX   10 mail.enyo.de.
    mail.enyo.de.    3600  IN A    212.9.189.167
    
  2. Monday morning, 9:15 AM: You have manually checked that all name servers for enyo.de publish the TXT and A records with the lower TTL and no longer publish the SPF (type 99) record. No further changes are possible until Wednesday.

  3. Wednesday morning, 9:15 AM: Two days (172800 seconds, the old TTL) have passed, and it is reasonable to assume that the lowered TTL has propagated to all caches and that the SPF (type 99) record has expired everywhere. Change the SPF data in the TXT record to a temporary version that includes both the old and the new IP address:

    enyo.de.         3600  IN TXT  ("v=spf1 ip4:212.9.189.167 "
                                    "ip4:212.9.189.169 -all")
    enyo.de.       172800  IN MX   10 mail.enyo.de.
    mail.enyo.de.    3600  IN A    212.9.189.167
    

    (The parentheses only allow the record to span multiple lines in the zone file. The two quoted strings form a single TXT record, and SPF clients concatenate them.)

  4. Wednesday morning, 9:30 AM: You check that the zone update has successfully propagated to all authoritative name servers.

  5. Wednesday morning, 10:30 AM: At this point, the old SPF record should have expired from most caches, since its TTL is now one hour. However, some resolvers cache for extended periods of time, so it is better to wait two more hours. (This is why setting a TTL below one hour in the first step does not make much sense.)

  6. Wednesday afternoon, 12:30 PM: The new SPF record is available from the majority of resolvers. You can begin to send mail from the 212.9.189.169 address. Change the A record to its final version:

    enyo.de.         3600  IN TXT  ("v=spf1 ip4:212.9.189.167 "
                                    "ip4:212.9.189.169 -all")
    enyo.de.       172800  IN MX   10 mail.enyo.de.
    mail.enyo.de.  172800  IN A    212.9.189.169
    
  7. Wednesday afternoon, 12:45 PM: The zone update has arrived on all authoritative name servers.

  8. Wednesday afternoon, 3:45 PM: The new A RR is available at most resolvers. You can now change the TXT record to its final version:

    enyo.de.       172800  IN TXT  "v=spf1 mx -all"
    enyo.de.       172800  IN MX   10 mail.enyo.de.
    mail.enyo.de.  172800  IN A    212.9.189.169
    
  9. Wednesday afternoon, 4 o'clock PM: The zone update has arrived on all authoritative name servers.

  10. Wednesday evening, 7 o'clock PM: Most resolvers now cache the new version. It is now safe to add the SPF (type 99) record again:

    enyo.de.       172800  IN TXT  "v=spf1 mx -all"
    enyo.de.       172800  IN SPF  "v=spf1 mx -all"
    enyo.de.       172800  IN MX   10 mail.enyo.de.
    mail.enyo.de.  172800  IN A    212.9.189.169
    
  11. Wednesday evening, 7:15 PM: Check that the authoritative name servers serve an updated zone which includes the SPF (type 99) record.
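The verification steps (2, 4, 7, 9 and 11) boil down to comparing what each authoritative name server returns with what you expect. A minimal Python sketch, assuming you have already collected one (TTL, data) answer per server by querying each authoritative server directly, for instance with `dig @<server> enyo.de TXT` (caches report decreasing TTLs, so they are not useful here). The server names below are hypothetical, and None stands for "record not present".

```python
from typing import Optional

# One answer per server: (TTL in seconds, record data), or None if absent.
Answer = Optional[tuple[int, str]]


def lagging_servers(answers: dict[str, Answer], expected: Answer) -> list[str]:
    """Names of the servers that do not yet serve the expected answer."""
    return sorted(name for name, answer in answers.items() if answer != expected)


# Step 2: the A record must already carry the lowered TTL everywhere.
a_answers = {
    "ns1.example.net": (3600, "212.9.189.167"),
    "ns2.example.net": (172800, "212.9.189.167"),  # zone not yet reloaded
}
print(lagging_servers(a_answers, (3600, "212.9.189.167")))  # ['ns2.example.net']

# Step 2: the SPF (type 99) record must be gone everywhere.
spf_answers = {"ns1.example.net": None, "ns2.example.net": None}
print(lagging_servers(spf_answers, None))  # []
```

An empty list means the step is complete and the clock for the next waiting period can start.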

Some sites may cache DNS records for even longer periods of time. If you know from experience that your customers send mail to such sites, you must increase the three-hour delay to a longer period (maybe even a day or two). This means that the record update procedure can take a week to complete.
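The timestamps in the procedure follow from a few durations: a 15-minute allowance for checking the authoritative servers, the old two-day TTL, and a three-hour settling period (the one-hour TTL plus two hours of margin) after each later change. The following Python sketch recomputes the schedule under those assumptions; the names are hypothetical, and the `settle` parameter is the delay you would increase for sites that cache longer. Step 5 is only a waiting period and therefore does not appear as an entry.

```python
from datetime import datetime, timedelta

CHECK = timedelta(minutes=15)        # time to confirm all authoritative servers
OLD_TTL = timedelta(seconds=172800)  # TTL before the change (two days)
NEW_TTL = timedelta(seconds=3600)    # lowered TTL used during the change

CHANGES = [
    "publish temporary TXT record with both addresses",
    "change A record to its final version",
    "publish final TXT record",
    "re-add SPF (type 99) record",
]


def schedule(start: datetime,
             settle: timedelta = NEW_TTL + timedelta(hours=2)) -> list[tuple[datetime, str]]:
    """Return (time, action) pairs for the update procedure."""
    t = start
    steps = [(t, "lower TTLs and drop SPF (type 99) record")]
    t += CHECK
    steps.append((t, "verify authoritative servers"))
    t += OLD_TTL  # wait until records with the old TTL have expired everywhere
    for i, change in enumerate(CHANGES):
        if i > 0:
            t += settle  # let caches pick up the previous change
        steps.append((t, change))
        t += CHECK
        steps.append((t, "verify authoritative servers"))
    return steps


for when, action in schedule(datetime(2024, 1, 1, 9, 0)):  # a Monday
    print(when.strftime("%a %H:%M"), action)
```

With the defaults, the last check lands on Wednesday at 19:15, as in the procedure. With `settle=timedelta(days=2)`, it moves to the Tuesday of the following week, a little more than a week after the start.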

Keep in mind that negative caching of DNS records is common (RFC 2308 <https://www.rfc-editor.org/rfc/rfc2308.txt>). This means that you have to insert appropriate pauses before you can reference new DNS records, directly or indirectly, from SPF records. As described in the RFC, the TTL for negative caching is derived from the SOA record of the zone.
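Per RFC 2308 (section 5), a negative answer is cached for the smaller of the SOA record's own TTL and its MINIMUM field. A small Python sketch that reads both values from a single-line SOA record in zone-file syntax; the SOA values shown are made up for illustration, and the parser assumes an explicit TTL and class and no parentheses.

```python
def negative_ttl(soa_line: str) -> int:
    """Negative-caching TTL (seconds) for a single-line SOA record of the form
    'name ttl class SOA mname rname serial refresh retry expire minimum'."""
    fields = soa_line.split()
    if len(fields) != 11 or fields[3].upper() != "SOA":
        raise ValueError("expected a single-line SOA record with TTL and class")
    soa_ttl, minimum = int(fields[1]), int(fields[10])
    # RFC 2308, section 5: the smaller of the SOA TTL and the MINIMUM field.
    return min(soa_ttl, minimum)


# Hypothetical SOA record for enyo.de.
soa = "enyo.de. 86400 IN SOA ns1.example.net. hostmaster.enyo.de. 2024010101 10800 3600 604800 3600"
print(negative_ttl(soa))  # 3600
```

A resolver that looked up a name before you added it may keep treating it as nonexistent for up to this long, on top of the time the zone update needs to reach all authoritative servers.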

The SoftFail ambiguity

SPF is supposed to be the Sender Policy Framework, with the sender setting the policy. However, if you start using SoftFail (the "~" qualifier), you delegate most of the policy decision to the receiver because the SoftFail specification is highly ambiguous:

A "SoftFail" result should be treated as somewhere between a "Fail" and a "Neutral". The domain believes the host isn't authorized but isn't willing to make that strong of a statement. Receiving software SHOULD NOT reject the message based solely on this result, but MAY subject the message to closer scrutiny than normal.

This sounds like a great way to introduce SPF for your domains without taking on too much responsibility, and many sites use a final "~all" term. However, most receivers treat SoftFail like Fail and bounce the message. This is particularly annoying because, historically, PermError (which, despite its name, is also signaled for temporary errors, as explained above) is automatically mapped to SoftFail.
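Which result a final "all" term produces depends only on its qualifier. The mapping below is the one given in RFC 4408 (section 4.6.2), with "+" as the default when no qualifier is written. The helper function is a hypothetical sketch: it looks only at the all term and ignores the redirect modifier.

```python
from typing import Optional

# Qualifier -> result, per RFC 4408, section 4.6.2.
QUALIFIER_RESULT = {"+": "Pass", "-": "Fail", "~": "SoftFail", "?": "Neutral"}


def all_result(record: str) -> Optional[str]:
    """Result for a host that matches no earlier mechanism, taken from the
    record's "all" term; None if the record has no such term."""
    for term in record.split()[1:]:  # skip "v=spf1"
        if term[0] in QUALIFIER_RESULT:
            qualifier, name = term[0], term[1:]
        else:
            qualifier, name = "+", term  # the qualifier defaults to "+"
        if name.lower() == "all":
            return QUALIFIER_RESULT[qualifier]
    return None


print(all_result("v=spf1 mx ~all"))  # SoftFail
print(all_result("v=spf1 mx -all"))  # Fail
```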

Necessary changes to the SPF specification

In my opinion, the update process described in the example is far too elaborate. The list below contains the necessary changes, and a proposal to resolve the SoftFail ambiguity is included.

After these changes, the example update is somewhat simpler. You still have to lower the TTL on the SPF records, but you need not remove the SPF (type 99) version. The remaining complexity is inherent to the way the Sender Policy Framework works. You can easily avoid it by listing only IP addresses in the SPF records. The proposed changes make this easier because the redirect modifier then works reliably, even without complicated update procedures.
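To see whether a record is free of such dependencies, you can list the terms that trigger further DNS lookups. The Python sketch below uses the set of terms that RFC 4408 (section 10.1) counts against its DNS lookup limit (include, a, mx, ptr, exists and the redirect modifier); the exp modifier is left out because RFC 4408 does not count it either. The function name is hypothetical.

```python
import re

# Terms that RFC 4408, section 10.1, counts as requiring DNS lookups.
DNS_TERMS = {"include", "a", "mx", "ptr", "exists", "redirect"}


def dns_dependent_terms(record: str) -> list[str]:
    """Terms of an SPF record whose evaluation depends on further DNS data."""
    found = []
    for term in record.split()[1:]:  # skip "v=spf1"
        # Strip the qualifier, then cut the name off at ":", "/" or "=".
        name = re.split(r"[:/=]", term.lstrip("+-~?"), maxsplit=1)[0].lower()
        if name in DNS_TERMS:
            found.append(term)
    return found


print(dns_dependent_terms("v=spf1 mx -all"))                 # ['mx']
print(dns_dependent_terms("v=spf1 ip4:212.9.189.169 -all"))  # []
```

An empty result means that the record's outcome depends only on the record itself, so none of the cross-record synchronization issues described above can arise.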

Revisions


Florian Weimer