2022-07-25

Procmail and unicode headers


Procmail is an old tool, and if you're using it today it's probably because you started using it many years ago and switching to something else is not something you want to do. Procmail doesn't know about unicode. Non-ASCII subjects are usually sent as RFC 2047 encoded words such as =?UTF-8?B?...?=, so a rule that looks for the readable text never sees it. This post contains my solution for writing procmail rules that understand unicode subjects.
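To illustrate the problem, here is a small Python sketch using the standard library's email.header module. The subject "Möte idag" is just an example value, and Header().encode() only approximates what a sending mail client produces (real clients may pick B or Q encoding differently), but the effect on a plain text match is the same.

```python
from email.header import Header, decode_header, make_header

subject = "Möte idag"  # example value: what the sender typed

# Non-ASCII header text is encoded as an RFC 2047 "encoded word",
# e.g. =?utf-8?q?...?= or =?utf-8?b?...?=, before it goes on the wire
on_the_wire = Header(subject, "utf-8").encode()
print("raw header value:", on_the_wire)

# A plain substring match on the raw value finds nothing
print("'Möte' in raw value:", "Möte" in on_the_wire)

# Decoding the encoded words gives back the original text
print("decoded:", str(make_header(decode_header(on_the_wire))))
```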

I wanted to sort emails into different mailboxes depending on the subject line. I tried various approaches to handling encoded unicode subjects in procmail, and eventually wrote my own solution.

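#!/usr/bin/env python3
import datetime
import re
import sys
from email.header import decode_header
from email.parser import Parser


def proc_stdin(backup=True):
    mail = Parser().parsestr(sys.stdin.read())
    if backup:
        # Keep a timestamped copy of every message passing through
        ts = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        latest = f"backup_{ts}_latest.eml"
        with open(latest, "w") as fh:
            fh.write(str(mail))
    subject = mail["Subject"]
    if subject is None:
        # No Subject header at all: pass the mail through untouched
        print(mail)
        return
    # Usually a str, but the parser can also return an email.header.Header
    # object, so normalise it before unfolding continuation lines
    subject = str(subject).replace("\n", " ")
    utf = re.compile(r"^(=\?UTF.*)", re.IGNORECASE)
    if utf.match(subject):
        default_charset = "ASCII"
        # decode_header() yields (bytes, charset) pairs when it finds encoded
        # words (charset is None for unencoded parts), or (str, None) if not
        subject = "".join(
            part if isinstance(part, str)
            else part.decode(charset or default_charset)
            for part, charset in decode_header(subject)
        )
        # Just output to stdout: procmail parses it, mail clients ignore it.
        # Raw UTF-8 in a header isn't strictly RFC compliant, but whatever,
        # it's good enough for matching.
        print(f"Subject-Decoded: {subject}")
    print(mail)


if __name__ == "__main__":
    proc_stdin()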

What this script does is fairly trivial: it reads the mail from stdin, saves a timestamped backup copy in the current working directory, checks whether the Subject header starts with an encoded UTF-8 word, decodes it, and prints a new Subject-Decoded header followed by the mail to stdout. You can test it like this:

cat Maildir/cur/the_email | ./headerscript.py
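If you want to exercise the decoding step on its own, without a real message, one option is a small pure function. This is a sketch, not the script above: decode_subject() is a hypothetical helper, and it swaps in the standard library's make_header(decode_header(...)) idiom, which also decodes encoded words that appear after a prefix such as "Re: " and in charsets other than UTF-8. The script's ^=?UTF check skips both of those cases.

```python
from email.header import decode_header, make_header


def decode_subject(raw):
    """Hypothetical helper: return a readable version of a raw Subject value.

    Handles a missing header (None), plain ASCII subjects, and RFC 2047
    encoded words anywhere in the value, in any charset Python knows.
    """
    if raw is None:
        return ""
    # Unfold continuation lines, like the script does
    raw = str(raw).replace("\r", "").replace("\n", " ")
    # decode_header() splits the value into (bytes-or-str, charset) parts;
    # make_header() joins them into a Header whose str() is the decoded text
    return str(make_header(decode_header(raw)))


if __name__ == "__main__":
    for value in [
        "Plain ASCII subject",
        "=?UTF-8?Q?M=C3=B6te_idag?=",
        "Re: =?UTF-8?Q?M=C3=B6te?=",
        "=?iso-8859-1?q?caf=E9?=",
    ]:
        print(repr(value), "->", decode_subject(value))
```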

Once you're satisfied that it works on your existing emails, you can call it from procmail and update your rules as below:

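# Decode unicode subject
# f: use the script as a filter, h: pass it only the header,
# w: wait for it and keep the original header if it fails
:0fhw
| /path/to/headerscript.py

# Sort mail with Important in subject to Important mailbox
:0
* ^Subject[^:]*: .*(Important).*
.Important/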

The rule matches both the Subject and the Subject-Decoded headers; in fact it matches any header whose name starts with Subject, such as Subject-foo. With this approach I could very easily update all my existing rules to be aware of unicode subjects.
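To see why the single condition covers both headers, here is a rough Python emulation of it. Assumptions: Python's re module stands in for procmail's own regex engine, and matching is case-insensitive, as procmail's is by default (without the D flag). procmail_like_match() is a hypothetical helper and the header values are made up.

```python
import re


def procmail_like_match(headers, word):
    """Roughly mimic the procmail condition  * ^Subject[^:]*: .*(word).*

    Assumption: Python's re approximates procmail's regex engine here;
    IGNORECASE mirrors procmail's default case-insensitive matching.
    """
    pattern = r"^Subject[^:]*: .*(" + re.escape(word) + r").*"
    return re.search(pattern, headers, re.MULTILINE | re.IGNORECASE) is not None


# Header block as it arrives: the subject is an encoded word
raw = (
    "From: someone@example.com\n"
    "Subject: =?utf-8?q?M=C3=B6te_idag?=\n"
)
# The same block after the filter has prepended its extra header
filtered = "Subject-Decoded: Möte idag\n" + raw

print(procmail_like_match(raw, "Möte"))
print(procmail_like_match(filtered, "Möte"))
```

`[^:]*` is what lets `^Subject` also accept `Subject-Decoded`, so existing rules pick up the decoded text without being rewritten.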

Edit 2022-10: Procmail is supported upstream again, which is cool: https://github.com/BuGlessRB/procmail

Edit 2023-01: A reader reported UnicodeEncodeError problems; perhaps their system wasn't configured to use a unicode locale. Reportedly, adding PYTHONIOENCODING="utf8" to their ~/.procmailrc fixed the problem.
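A minimal sketch of why the locale matters: print() encodes text using its stream's encoding, so under a non-UTF-8 locale the Subject-Decoded line can fail to encode. Here an io.TextIOWrapper over a BytesIO stands in for sys.stdout (an assumption, not how procmail runs the script), and bytes_written() is a hypothetical helper.

```python
import io


def emit_decoded_subject(stream, subject):
    # Same shape as the script's print(f"Subject-Decoded: ...") call
    print(f"Subject-Decoded: {subject}", file=stream)


def bytes_written(encoding, subject="Möte idag"):
    """Return the bytes that reach the output, or None if encoding fails."""
    buf = io.BytesIO()
    # Stand-in for sys.stdout when the locale selects `encoding`
    out = io.TextIOWrapper(buf, encoding=encoding, newline="\n")
    try:
        emit_decoded_subject(out, subject)
        out.flush()
    except UnicodeEncodeError:
        return None
    return buf.getvalue()


print(bytes_written("ascii"))
print(bytes_written("utf-8"))
```

An in-code alternative to the environment variable might be calling sys.stdout.reconfigure(encoding="utf-8") (Python 3.7+) at the start of proc_stdin(), but that is untested against the reader's setup.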
