As currently specified, Sender Policy Framework (SPF) assumes that DNS updates are atomic and that SPF clients always have a consistent view of the DNS. In practice, this is not true. Consequently, a very careful approach is needed when you change SPF records or DNS records that are referenced from SPF records, or mail sent by your customers will temporarily bounce.
DNS is a distributed database with an optimistic replication protocol, which trades temporary lack of consistency for increased availability. Let's look at an example to see what this means. Suppose enyo.de publishes the following DNS records:
enyo.de. IN TXT "v=spf1 mx -all"
enyo.de. IN MX 10 mail.enyo.de.
mail.enyo.de is also the outgoing mail relay for clients sending mail from the enyo.de domain. This means that SPF-aware receiving mail transfer agents (MTAs) accept these messages.
Now suppose that the enyo.de administrator discovers that he cannot handle the growing mail load anymore, and therefore migrates mail handling to the ISP's mail server, mail.lf.net. The SPF record is updated accordingly, so that mail.enyo.de can still be used as an outgoing mail relay:
enyo.de. IN TXT "v=spf1 a:mail.enyo.de -all"
enyo.de. IN MX 10 mail.lf.net.
However, when you do this, your customers will soon start to phone and complain that they cannot send mail to some destinations anymore. What happens? Because of the caching nature of DNS, resolvers at some sites still hold the old SPF record in their caches, but not the MX record. When an SPF-aware MTA at such a site receives mail from enyo.de, it evaluates the cached old policy ("v=spf1 mx -all") and tries to obtain the MX record for that domain from its local resolver. The resolver queries one of the authoritative name servers for enyo.de and obtains the new MX record (which already points to mail.lf.net). The sending host, mail.enyo.de, is no longer listed there. As a result, the SPF check fails, and the message bounces.
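The failure can be reproduced with a toy model. The sketch below is not a real resolver or SPF implementation: the CachingResolver class, the spf_check function and the one-day TTL are assumptions made purely for illustration, and only the two policies from the example are understood.

```python
# Toy model of a caching resolver; no real DNS traffic is involved.
class CachingResolver:
    def __init__(self, authoritative):
        self.authoritative = authoritative  # dict standing in for the name server
        self.cache = {}                     # (name, rtype) -> (value, expiry time)

    def lookup(self, name, rtype, now):
        key = (name, rtype)
        cached = self.cache.get(key)
        if cached is not None and now < cached[1]:
            return cached[0]                # still fresh: serve the cached copy
        value, ttl = self.authoritative[key]
        self.cache[key] = (value, now + ttl)
        return value


def spf_check(resolver, domain, client_host, now):
    """Evaluate the two policies used in the example; returns 'pass' or 'fail'."""
    policy = resolver.lookup(domain, "TXT", now)
    if policy == "v=spf1 mx -all":
        mx_host = resolver.lookup(domain, "MX", now)
        return "pass" if mx_host == client_host else "fail"
    if policy == "v=spf1 a:mail.enyo.de -all":
        return "pass" if client_host == "mail.enyo.de" else "fail"
    raise ValueError("policy not covered by this toy model")


TTL = 86400  # assumed TTL of one day; the example records do not state one
zone = {
    ("enyo.de", "TXT"): ("v=spf1 mx -all", TTL),
    ("enyo.de", "MX"): ("mail.enyo.de", TTL),
}

stale = CachingResolver(zone)
stale.lookup("enyo.de", "TXT", now=0)  # only the old TXT record gets cached

# The administrator updates the zone.
zone[("enyo.de", "TXT")] = ("v=spf1 a:mail.enyo.de -all", TTL)
zone[("enyo.de", "MX")] = ("mail.lf.net", TTL)

# One hour later, mail arrives from mail.enyo.de.
result_stale = spf_check(stale, "enyo.de", "mail.enyo.de", now=3600)
result_fresh = spf_check(CachingResolver(zone), "enyo.de", "mail.enyo.de", now=3600)
print(result_stale, result_fresh)
```

A receiver whose resolver has nothing cached sees a consistent zone and accepts the message. The receiver with the stale TXT record combines the old policy with the new MX record and rejects it.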
This is just one example of how short-term DNS inconsistencies can lead to temporary SPF failures and bouncing of legitimate mail. If SPF records reference out-of-zone records, the synchronization problems get worse, and you may also have to deal with issues which are the result of crossing an administrative boundary (for example, records in the target zone may be modified without prior consultation).
redirect, include, and dual-type records
The behavior of the redirect modifier and the include mechanism provided by SPF is particularly annoying: because of the synchronization problem, the referenced record may not yet be available. The SPF algorithm yields a None, which is translated to a PermError, which in turn is historically translated into a SoftFail. There is some disagreement between the specification and existing practice about what SoftFail means (see the separate section below); most sites translate a SoftFail to a Fail, which results in bounces.
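The chain of translations is easier to follow when written down as lookup tables. This is only a sketch of the behavior described above, not code from any SPF implementation; the table and function names are made up for this illustration.

```python
# Each table models one of the translation steps described in the text.
REFERENCED_RECORD_MISSING = {"None": "PermError"}  # include/redirect target not found
HISTORICAL_MAPPING = {"PermError": "SoftFail"}     # historical handling of PermError
COMMON_RECEIVER_POLICY = {"SoftFail": "Fail"}      # what most sites do in practice


def effective_result(referenced_lookup_result):
    """Apply the three translation steps in order."""
    result = referenced_lookup_result
    for table in (REFERENCED_RECORD_MISSING, HISTORICAL_MAPPING, COMMON_RECEIVER_POLICY):
        result = table.get(result, result)
    return result


def disposition(referenced_lookup_result):
    """Return what the receiving site does with the message."""
    return "bounce" if effective_result(referenced_lookup_result) == "Fail" else "accept"


print(effective_result("None"), disposition("None"))
```

A referenced record that is merely not yet visible therefore ends up being treated exactly like an explicit "-all" match.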
If you think the redirect/include behavior is truly obnoxious, here's the real howler: If you publish SPF data in both SPF (type 99) and TXT DNS records, you cannot immediately change SPF data without risking an inconsistency between the two records of SPF and TXT type. Again, the SPF algorithm yields a PermError, which finally leads to bouncing of legitimate mail.
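A minimal sketch of the dual-type check, assuming the behavior described above (differing contents yield a PermError). The function names are hypothetical, and real implementations differ in how they compare the two record sets.

```python
def spf_policies(records):
    """Keep only the record strings that carry an SPF version 1 policy."""
    return sorted(r for r in records if r == "v=spf1" or r.startswith("v=spf1 "))


def dual_type_result(txt_records, spf_records):
    """Compare the SPF policies seen under the TXT and the SPF (type 99) type."""
    txt = spf_policies(txt_records)
    spf = spf_policies(spf_records)
    if txt and spf and txt != spf:
        return "PermError"
    return "consistent"


# During an update, one type may come from a cache and the other from the new zone.
during_update = dual_type_result(
    txt_records=["v=spf1 ip4:212.9.189.169 -all"],  # already updated
    spf_records=["v=spf1 mx -all"],                 # still cached
)
print(during_update)
```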
Several rules can help to avoid the synchronization problem:
Never publish both SPF (type 99) and TXT record types under a single domain name.
Do not reference any other DNS records (SPF or not) from your SPF records. Only list explicit IPv4 and IPv6 prefixes. When migrating or extending your mail system (changing its IP addresses), change your SPF records to include the new address ranges a few days before you actually implement the changes. Keep the obsolete address ranges for a couple of days after the migration. (This is necessary to deal with DNS propagation delays. Some sites store records longer than the TTL permits.)
When you do not follow the first rule, you can temporarily publish a very permissive SPF record for your domains, for example "v=spf1 all". When you activate this SPF record a few days before and after the migration, mail will not bounce. (Marc Haber kindly offered this solution.) You still must be very careful when you publish SPF records of both SPF (type 99) and TXT type, though.
Do not use the redirect modifier or the include mechanism. The None/PermError/SoftFail/Fail issue makes it too fragile.
If you must reference other DNS records from your SPF record and cannot follow the rules above, never change any DNS records, but create new records under a different domain name. (Obviously, this can only be implemented for A, AAAA and redirected/included SPF records, not for MX records or domain SPF records.) Negative caching (RFC 2308) may still lead to problems when you use this approach, unless you add the new DNS records some time before you change the referencing SPF records.
If you do not follow any of the rules above, you have to lower TTLs before the update and carefully remove, add and replace DNS records so that no temporary inconsistencies are created.
An example of how to apply the last rule to a simple update is given in the next section.
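Whether a record follows the "explicit prefixes only" rule can be checked mechanically. The sketch below is a rough helper written for this article, not a full SPF parser: it treats every term other than "all", "ip4:" and "ip6:" as something that triggers further DNS lookups, which is deliberately conservative.

```python
import ipaddress

QUALIFIERS = "+-~?"


def dns_dependent_terms(spf_record):
    """Return the terms of an SPF record that depend on other DNS records."""
    terms = spf_record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    dependent = []
    for term in terms[1:]:
        name = term.lstrip(QUALIFIERS).lower()
        if name == "all":
            continue
        if name.startswith(("ip4:", "ip6:")):
            # Raises ValueError if the address or prefix is malformed.
            ipaddress.ip_network(name[4:], strict=False)
            continue
        dependent.append(term)  # mx, a, ptr, exists, include, redirect, ...
    return dependent


print(dns_dependent_terms("v=spf1 mx -all"))
print(dns_dependent_terms("v=spf1 ip4:212.9.189.167 ip4:212.9.189.169 -all"))
```

An empty result means the record can be evaluated from the single TXT answer alone, so it cannot be caught in the middle of an update to some other record.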
Suppose that you currently publish the following DNS records for enyo.de:
enyo.de. 172800 IN TXT "v=spf1 mx -all"
enyo.de. 172800 IN SPF "v=spf1 mx -all"
enyo.de. 172800 IN MX 10 mail.enyo.de.
mail.enyo.de. 172800 IN A 212.9.189.167
You want to change the IP address of mail.enyo.de from 212.9.189.167 to 212.9.189.169. So the new configuration should look like this:
enyo.de. 172800 IN TXT "v=spf1 mx -all"
enyo.de. 172800 IN SPF "v=spf1 mx -all"
enyo.de. 172800 IN MX 10 mail.enyo.de.
mail.enyo.de. 172800 IN A 212.9.189.169
The necessary steps are outlined below. The change procedure begins on Monday morning.
Monday morning, 9:00 AM: Lower the TTLs of all records that need to be changed to a low value, say one hour (3600 seconds). Smaller values do not make a real difference, due to overzealous caching. Also drop the SPF record (but only the type 99 version).
enyo.de. 3600 IN TXT "v=spf1 mx -all"
enyo.de. 172800 IN MX 10 mail.enyo.de.
mail.enyo.de. 3600 IN A 212.9.189.167
Monday morning, 9:15 AM: You have manually checked that all name servers for enyo.de publish the TXT and A records with the lower TTL and no longer serve the SPF (type 99) record. No further changes are possible until Wednesday.
Wednesday morning, 9:15 AM: Two days (172800 seconds, the old TTL) have passed, and it's reasonable to assume that the lowered TTL has propagated to all caches and that the SPF (type 99) record has expired everywhere. Change the SPF record to a temporary version which includes both the old and the new IP address:
enyo.de. 3600 IN TXT ("v=spf1 ip4:212.9.189.167 "
                      "ip4:212.9.189.169 -all")
enyo.de. 172800 IN MX 10 mail.enyo.de.
mail.enyo.de. 3600 IN A 212.9.189.167
(The parentheses have no special meaning; they are just there to spread the TXT record across multiple lines.)
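The two quoted strings are separate character strings of a single TXT record, and SPF clients join them without inserting anything in between, which is why the first string ends with a space. A minimal sketch of that joining step follows; the helper name is made up, and escaped quotes inside strings are not handled.

```python
import re


def spf_text_from_rdata(rdata):
    """Join the quoted character strings of one TXT record, as SPF clients do."""
    strings = re.findall(r'"([^"]*)"', rdata)
    return "".join(strings)  # no separator: any space must be inside the quotes


rdata = '("v=spf1 ip4:212.9.189.167 " "ip4:212.9.189.169 -all")'
policy = spf_text_from_rdata(rdata)
print(policy)
```

If the trailing space in the first string were missing, the two terms would run together into one malformed term.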
Wednesday morning, 9:30 AM: You check that the zone update has successfully propagated to all authoritative name servers.
Wednesday morning, 10:30 AM: At this point, the old SPF record should have expired from the caches. However, some resolvers cache for extended periods of time, so it is better to wait two more hours. (This is the reason why setting a TTL lower than one hour in the first step does not make much sense.)
Wednesday afternoon, 12:30 PM: The new SPF record is available from the majority of resolvers. You can begin to send mail from the 212.9.189.169 address. Change the A record to its final version:
enyo.de. 3600 IN TXT ("v=spf1 ip4:212.9.189.167 "
                      "ip4:212.9.189.169 -all")
enyo.de. 172800 IN MX 10 mail.enyo.de.
mail.enyo.de. 172800 IN A 212.9.189.169
Wednesday afternoon, 12:45 PM: The zone update has arrived on all authoritative name servers.
Wednesday afternoon, 3:45 PM: The new A RR is available at most resolvers. You can now change the TXT record to its final version:
enyo.de. 172800 IN TXT "v=spf1 mx -all"
enyo.de. 172800 IN MX 10 mail.enyo.de.
mail.enyo.de. 172800 IN A 212.9.189.169
Wednesday afternoon, 4:00 PM: The zone update has arrived on all authoritative name servers.
Wednesday evening, 7:00 PM: The resolvers cache the new version. It is now safe to add the SPF (type 99) record:
enyo.de. 172800 IN TXT "v=spf1 mx -all"
enyo.de. 172800 IN SPF "v=spf1 mx -all"
enyo.de. 172800 IN MX 10 mail.enyo.de.
mail.enyo.de. 172800 IN A 212.9.189.169
Wednesday evening, 7:15 PM: Check that the authoritative name servers serve an updated zone which includes the SPF (type 99) record.
Some sites may cache DNS records for even longer periods of time. If you know from experience that your customers send mail to such sites, you must increase the three-hour delay to a longer period of time (maybe even a day or two). This means that the record update procedure can take a week before it is complete.
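The timetable follows mechanically from three numbers: the old TTL, the lowered TTL, and the extra slack you grant to overzealous caches. The sketch below recomputes it under the assumptions of the example (15 minutes to verify each zone update; the calendar date is an arbitrary Monday). The function and parameter names are mine, not part of any tool.

```python
from datetime import datetime, timedelta


def migration_schedule(start, old_ttl, low_ttl, slack,
                       sync_check=timedelta(minutes=15)):
    """Return (time, action) pairs for the zone changes of the example."""
    settle = timedelta(seconds=low_ttl) + slack
    steps = [
        # (action, time to wait after the update has been verified)
        ("lower TTLs, drop SPF (type 99) record", timedelta(seconds=old_ttl)),
        ("publish temporary policy with both addresses", settle),
        ("switch A record to the new address", settle),
        ("restore final TXT record", settle),
        ("re-add SPF (type 99) record", timedelta(0)),
    ]
    schedule = []
    when = start
    for action, wait_afterwards in steps:
        schedule.append((when, action))
        when = when + sync_check + wait_afterwards
    return schedule


schedule = migration_schedule(
    start=datetime(2005, 8, 8, 9, 0),  # an arbitrary Monday, 9:00 AM
    old_ttl=172800,
    low_ttl=3600,
    slack=timedelta(hours=2),
)
for when, action in schedule:
    print(when.strftime("%a %H:%M"), action)
```

Raising the slack from two hours to a day or two stretches the procedure accordingly, because the slack is paid three times.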
Keep in mind that negative caching of DNS records is common (RFC 2308). This means that you have to insert appropriate pauses before you can reference new DNS records directly or indirectly from SPF records. As described in the RFC, the TTL for negative caching is encoded in the SOA record of the zone.
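To find out how long such a pause has to be, look at the SOA record: according to RFC 2308, negative answers are cached for the smaller of the SOA record's own TTL and its MINIMUM field. A small sketch follows; the SOA record shown uses hypothetical server names and timer values and is not the real enyo.de record.

```python
def negative_cache_ttl(soa_record):
    """Negative-caching TTL per RFC 2308: min(TTL of the SOA RR, SOA MINIMUM)."""
    fields = soa_record.replace("(", " ").replace(")", " ").split()
    # owner TTL class SOA mname rname serial refresh retry expire minimum
    if len(fields) != 11 or fields[3].upper() != "SOA":
        raise ValueError("expected one SOA record with an explicit TTL")
    soa_ttl = int(fields[1])
    minimum = int(fields[10])
    return min(soa_ttl, minimum)


# Hypothetical SOA record, for illustration only.
soa = ("enyo.de. 172800 IN SOA ns.enyo.de. hostmaster.enyo.de. "
       "( 2005081101 28800 7200 604800 3600 )")
pause = negative_cache_ttl(soa)
print(pause)
```

With these values, a resolver that asked for a not-yet-existing name may keep answering "no such record" for up to an hour, plus whatever extra time overzealous caches add.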
SPF is supposed to be the Sender Policy Framework. However, if you start using SoftFail (the "~" qualifier), you delegate most of the policy decision to the receiver because the SoftFail specification is highly ambiguous:
A "SoftFail" result should be treated as somewhere between a "Fail" and a "Neutral". The domain believes the host isn't authorized but isn't willing to make that strong of a statement. Receiving software SHOULD NOT reject the message based solely on this result, but MAY subject the message to closer scrutiny than normal.
This sounds like a great way to introduce SPF for your domains without taking too much responsibility, and many sites use a final "~all" term. However, most receivers treat a SoftFail like a Fail and bounce the message. This is particularly annoying because historically, PermError (which, despite its name, is also signaled for temporary errors, as explained above) is automatically mapped to SoftFail.
In my opinion, the update process described in the example is far too elaborate. The list below contains the changes to the specification that I consider necessary, including a proposal to resolve the SoftFail ambiguity.
If a zone publishes records of both SPF (type 99) and TXT type, the SPF record overrides the relevant TXT records. No longer signal PermError if their contents do not match.
PermError must only be signaled for syntax errors.
TempError must be used when an error is detected that can be the result of a short-term DNS inconsistency. If these errors persist, implementations may treat these errors as permanent. (Similar to what is current practice with SMTP delivery.) For example, if the include mechanism or the redirect modifier references a nonexistent domain (or a domain without any SPF records), TempError must be signaled instead of PermError.
Consider abolishing SoftFail. SoftFail cannot be used to express sender policy. At the very least, the specification must indicate that most SPF-enabled sites treat SoftFail as Fail, contrary to what the specification says.
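Expressed as code, the first three proposals could look like the sketch below. Everything in it is illustrative: the problem labels, the function names and the retry limit of ten are assumptions of mine, and no existing SPF implementation is being described.

```python
# Proposed classification: only syntax errors are permanent.
PROPOSED_RESULT = {
    "syntax error": "PermError",
    "include target missing": "TempError",
    "redirect target missing": "TempError",
}


def final_result(problem, consecutive_failures=1, limit=10):
    """Classify a problem; persistent temporary errors may become permanent."""
    result = PROPOSED_RESULT[problem]
    if result == "TempError" and consecutive_failures >= limit:
        return "PermError"  # similar to giving up on SMTP delivery after retries
    return result


def select_policy(txt_policy, spf_policy):
    """A type 99 record overrides the TXT record; a mismatch is not an error."""
    return spf_policy if spf_policy is not None else txt_policy


print(final_result("include target missing"))
print(select_policy("v=spf1 mx -all", "v=spf1 ip4:212.9.189.169 -all"))
```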
After these changes, the example update is somewhat simpler. You still have to lower the TTL on the SPF records, but you need not remove the SPF (type 99) version. The remaining complexity is inherent to the way the Sender Policy Framework works. You can easily avoid it by only listing IP addresses in the SPF records. With the proposed changes, this is facilitated because the redirect modifier works reliably, even without complicated update procedures.
2005-08-11 13:00: published
2005-08-11 14:00: Corrected the example so that it is syntactically correct, but it still misses the point (thanks to Scott Kitterman).