SPF Internationalization
For ordinary ASCII mail addresses the I18N considerations for SPF are rather limited. When FAIL results for policies using -all or similar are rejected, as they should be, the HELO identity or MAIL FROM (envelope sender) address was most likely forged, and spammers won't insist on an SMTP reply in their language.
If the FAIL rejection was caused by an erroneous policy for legitimate mail, the policy can offer an explanation with the SPF exp= modifier pointing to a DNS TXT record. The explanation can include a URL (in simple cases it is nothing but a URL) controlled by the publisher of the erroneous policy. In short, if an explanation is or contains a URL, make sure to use the upper case form of SPF macros to get proper URL-encoding. That still buys you nothing with respect to I18N if the URL points to the Why-form on this site.
Similar why-services elsewhere could use HTTP language negotiation and/or offer links to translated explanations for legitimate users of the relevant domain.
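The advice about upper case macros can be illustrated with a minimal sketch. It expands a single macro letter and URL-escapes the result only for the upper case form. The function name expand_simple_macro and the sample sender are assumptions made for this example, and real SPF macro expansion also handles transformers and delimiters, which are left out here.

```python
from urllib.parse import quote


def expand_simple_macro(letter: str, values: dict) -> str:
    """Expand one SPF macro letter, ignoring transformers and delimiters."""
    expansion = values[letter.lower()]
    if letter.isupper():
        # Upper case macro letters ask for URL escaping of the expansion;
        # with safe="" only letters, digits and "-._~" stay unescaped.
        return quote(expansion, safe="")
    return expansion


# Hypothetical sender, used only for illustration.
values = {"s": "user+tag@example.org"}
print(expand_simple_macro("s", values))  # user+tag@example.org
print(expand_simple_macro("S", values))  # user%2Btag%40example.org
```

With the lower case form the "+" and "@" would end up unescaped in the URL, which is why the upper case form is the safer choice inside explanations.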
What you should not do is create a TXT record with an explanation using non-ASCII characters. This won't work as expected for the most important case:
- Your user sends MAIL FROM your domain to somebody.
- The receiver forwards it to a third party without changing MAIL FROM to the receiver's own domain, i.e. keeping MAIL FROM your domain as is.
- The third party checks your policy and gets a FAIL, because your policy doesn't permit any IPs of arguably clueless forwarders.
- The third party rejects the FAIL as it should, optionally adding the explanation offered in your policy with an exp= modifier pointing to your TXT record.
- The forwarder has to create a non-delivery report (bounce) to your user for the rejected mail, and expects to have gotten an ASCII reply. That's a limitation in SMTP: the reply can only be ASCII.
- The SMTP reply from the third party somehow ends up in a part of the non-delivery report, and at that point the only safe bet is ASCII.
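A publisher could check an explanation text before putting it into a TXT record. The following is only a sketch under the assumption that "safe" means printable ASCII (hex. 20..7E); the function name and the sample texts are invented for this example.

```python
def is_ascii_explanation(text: str) -> bool:
    """Return True if the text uses only printable ASCII (hex. 20..7E)."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in text)


# Hypothetical explanation texts for an exp= TXT record.
print(is_ascii_explanation("See http://example.org/why.html"))          # True
print(is_ascii_explanation("Erkl\u00e4rung: http://example.org/why"))  # False
```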
IDNA
Internationalized domains will soon be popular. As far as SPF is concerned the rules are simple: Use the ASCII A-label form of domains, also known as LDH-labels as used in SMTP, in SPF policies. Any valid Unicode U-label has a corresponding A-label starting with xn--, ignoring cases where a U-label happens to be already in LDH-form.
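As a small illustration of the U-label to A-label mapping, the sketch below uses the "idna" codec from the Python standard library. Note that this codec follows the older IDNA2003 rules, so results can differ from IDNA2008 for some labels; the helper names and the sample domain are assumptions for this example.

```python
def to_a_labels(domain: str) -> str:
    """Convert a domain with U-labels into its ASCII A-label form."""
    return domain.encode("idna").decode("ascii")


def to_u_labels(domain: str) -> str:
    """Convert a domain with A-labels back into its Unicode form."""
    return domain.encode("ascii").decode("idna")


print(to_a_labels("b\u00fccher.example"))     # xn--bcher-kva.example
print(to_u_labels("xn--bcher-kva.example"))  # bücher.example
print(to_a_labels("example.org"))            # already LDH: example.org
```

The xn-- form is what belongs into an SPF policy, for instance as the target of an include: or redirect=.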
SPF offers the controversial feature of per-user policies, and similar tricks based on the %{l} macro for the local part of an address. In simple cases local parts are in LDH-form, consisting of one or more dot-separated LDH-labels, but generally local parts can contain other ASCII characters not limited to LDH. This is no serious problem for SPF, because DNS in fact allows any octet (hex. 00..FF), including ASCII (hex. 00..7F).
There are, however, various traps and pitfalls. In theory local parts are case-sensitive; SMTP explicitly makes an exception for the 1024 upper and lower case variants of PoStMaStEr. DNS labels, including A-labels, are in essence case-insensitive, so if the local parts aAa and AaA should get different SPF per-user policies, it won't work.
Local parts can be quoted strings "like this"@example. SPF implementations likely strip the quotes, arriving at a label like this with an embedded space. After that only one thing is sure: DNS permits it, therefore SPF in theory also permits it. From there the situation deteriorates. What about ".some..dots."@example, or quoted pairs as in "\b\a\c\k\s\l\a\s\h"@example ?
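The number 1024 is simply 2 to the power of 10, one choice of case for each of the ten letters. The sketch below enumerates the variants and shows why two local parts differing only in case end up at the same DNS name; the function name is an assumption for this example, and lower-casing stands in for the case-insensitive comparison done by DNS.

```python
from itertools import product


def case_variants(word: str) -> set:
    """Return all upper/lower case spellings of an ASCII word."""
    pairs = [(ch.lower(), ch.upper()) for ch in word]
    return {"".join(combo) for combo in product(*pairs)}


print(len(case_variants("postmaster")))  # 1024

# DNS compares labels without regard to case, modelled here by lower().
print("aAa".lower() == "AaA".lower())    # True
```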
The problem of impossible labels derived from local parts even made it into the RFC 4408 errata, and the SMTP specification offers a clear SHOULD NOT for quoted strings, let alone quoted pairs. Please note that control characters cannot be used in local parts of mail addresses; for SMTP all potential problems are limited to hex. 20..7E including case-sensitive letters.
In theory DNS can deal with embedded dots within a label, as it can deal with any other octet, but admittedly SPF forgot to specify this oddity. Implementations treat dots as label separators, so adjacent dots would be interpreted as a zero-length label. That cannot work: the only zero-length label is the DNS root, corresponding to the often omitted single trailing dot of a fully qualified domain name.
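What stripping quotes and quoted pairs might look like is sketched below. This is not taken from any particular SPF implementation; the function name is an assumption, and the code simply removes the surrounding quotes and then drops the backslash of each quoted pair.

```python
import re


def unquote_local_part(local_part: str) -> str:
    """Strip the quotes of a quoted string and resolve quoted pairs."""
    if len(local_part) >= 2 and local_part[0] == '"' and local_part[-1] == '"':
        inner = local_part[1:-1]
        # A quoted pair is a backslash followed by one character;
        # keep the character and drop the backslash.
        return re.sub(r"\\(.)", r"\1", inner)
    return local_part


print(unquote_local_part('"like this"'))             # like this
print(unquote_local_part(r'"\b\a\c\k\s\l\a\s\h"'))  # backslash
print(unquote_local_part("plain"))                   # plain
```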
Likewise overlong labels with more than 63 octets derived from a long local part cannot work.
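Both failure modes, zero-length labels from stray dots and labels with more than 63 octets, can be detected before a local part is used in a DNS query. The following sketch assumes that dots are treated as label separators, as described above; the function name and the wording of the messages are invented for this example.

```python
def local_part_label_problems(local_part: str) -> list:
    """List the reasons why a local part cannot be used as DNS labels."""
    problems = []
    for label in local_part.split("."):
        octets = label.encode("utf-8")
        if len(octets) == 0:
            # Caused by leading, trailing, or adjacent dots.
            problems.append("zero-length label")
        elif len(octets) > 63:
            problems.append("label longer than 63 octets")
    return problems


print(local_part_label_problems("john.doe"))      # []
print(local_part_label_problems(".some..dots."))  # three zero-length labels
print(local_part_label_problems("x" * 64))        # one overlong label
```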
Email Address Internationalization
The IETF EAI WG started an experiment with I18N for email addresses. For SPF the relevant part of this experiment is the UTF8SMTP extension of SMTP. As noted above SPF ignores local parts unless they are indirectly referenced in (variants of the) %{l} local part macro.
Where local parts are used in SPF they are used "as is", raw octets - stripping any quotes from quoted strings, hopefully also removing backslashes from quoted pairs within quoted strings, and likely exploring dark corners in some SPF implementations.
Local parts for EAI use UTF-8. As far as SPF is concerned octets are octets, and DNS can handle any octet. In other words SPF implementations should be able to handle EAI local parts "as is" with the same limitations as noted above for ASCII local parts: adjacent dots and overlong labels won't work. Please note that 63 octets could correspond to, say, only 21 characters encoded in UTF-8, depending on the Unicode code points used.
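The arithmetic behind the figure of 21 is straightforward: a code point that needs three octets in UTF-8 fits 63 / 3 = 21 times into one label. The sketch below works this out for code points of different encoded lengths; the function name and the sample characters are assumptions for this example.

```python
def max_chars_per_label(ch: str, limit: int = 63) -> int:
    """How often a single character fits into a label of `limit` octets."""
    return limit // len(ch.encode("utf-8"))


print(max_chars_per_label("a"))           # 1 octet each  -> 63
print(max_chars_per_label("\u00e9"))      # 2 octets each -> 31
print(max_chars_per_label("\u65e5"))      # 3 octets each -> 21
print(max_chars_per_label("\U0001F600"))  # 4 octets each -> 15
```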
Unfortunately EAI including UTF8SMTP also allows UTF-8 for the domain part (right hand side) of addresses. Unfortunately for SPF - mail users in parts of the world where the Latin script, let alone its ASCII subset, is unusual might love it when this works.
For SPF implementations this means that "something" has to transform U-labels into A-labels, and that means "punycode" among other things. As soon as the A-label format is clear everything works again "as is". In particular, SPF policies stick to the known format, i.e. LDH-labels or octets hex. 20..2D and 2F..7E. Octet hex. 2E for dots is only used as label separator as noted above. For some special characters such as the space, SPF macros have to be used when they are needed within an SPF policy, but this is rarely if ever the case.
Domains wishing to participate in the EAI experiment can in theory bypass potential SPF problems with U-labels in envelope sender addresses: MSAs could translate U-labels in a MAIL FROM address to the corresponding A-labels on the fly. But for a robust solution SPF implementations should also support this, see the discussion in an SPF EAI Internet Draft.
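The on-the-fly translation by an MSA could look roughly like the sketch below. It leaves the local part untouched and converts only the domain part with the "idna" codec of the Python standard library, which follows IDNA2003. The function name and the sample addresses are assumptions, and a real MSA would need proper address parsing instead of a split at the last "@".

```python
def mail_from_to_a_labels(address: str) -> str:
    """Rewrite the domain part of an envelope sender to A-labels."""
    local_part, at, domain = address.rpartition("@")
    if not at:
        # No "@" found, e.g. the empty reverse path; leave it alone.
        return address
    # Only the domain is converted, the local part stays "as is".
    return local_part + at + domain.encode("idna").decode("ascii")


print(mail_from_to_a_labels("user@b\u00fccher.example"))
# user@xn--bcher-kva.example
print(mail_from_to_a_labels("user@example.org"))
# user@example.org
```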