Sunday, October 04, 2009

PDF to PS to PDF on OS X: How to Fix Those That Cause Errors

Apple's OS X inserts anti-distillation blurbs into the PostScript files it generates from "encrypted" PDFs. Remember Prohibition, anyone?

The "encrypted", or rather locked-down, PDFs are nearly everything these days: forms that are meant to be fillable, bank account statements that you want to mark up to reconcile accounts, etc. My most recent run-in with this stupidity was Anthem's and CompanionLife's insurance forms. I actually wish we didn't have to fill out, um, modify those, right? And surely it's every insurance company's dream to get the forms back with my dreadful handwriting on them...

So, the PDFs are marked as protected from modification. OS X's otherwise excellent Preview doesn't ignore such marks when you print to PostScript. Thus, the resulting PostScript files throw an error when you try to distill them back into PDF, say using ps2pdf14.

Upon inspection of the PostScript files, you can see the eexec blurb, which can be decoded using Ghostscript's decode.ps. The only useful part of the blurb is cg_md begin.


Thus, if you want to clean up PostScript files printed from "protected" PDFs, you need to replace everything from the line starting with mark currentfile eexec through the line starting with cleartomark with the single line cg_md begin. This can be done using this handy dandy utility:

#!/usr/bin/env python3
# Copy a PostScript file from stdin to stdout, removing
# Apple's ps-to-pdf "protection".
import sys

inside = False  # True while we are inside the eexec blurb
for line in sys.stdin:
    if not inside:
        if line.startswith("mark currentfile eexec"):
            # Start of the blurb: drop this line and everything after it
            inside = True
        else:
            print(line, file=sys.stdout, end="")
    else:
        if line.startswith("cleartomark"):
            # End of the blurb: emit the only useful part in its place
            print("cg_md begin", file=sys.stdout)
            inside = False
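
If you'd rather check the replacement logic before running it on a real file, here is a minimal sketch of the same filter as a generator function. The function name strip_protection and the tiny sample input are made up for illustration; a real file printed from Preview will have a much longer eexec blurb.

```python
import io

START = "mark currentfile eexec"   # line that opens the blurb
END = "cleartomark"                # line that closes the blurb
REPLACEMENT = "cg_md begin"        # the only useful part of the blurb


def strip_protection(lines):
    """Yield lines, replacing each eexec blurb with 'cg_md begin'."""
    inside = False
    for line in lines:
        if not inside:
            if line.startswith(START):
                inside = True      # drop the opening line
            else:
                yield line         # pass everything else through
        elif line.startswith(END):
            yield REPLACEMENT + "\n"
            inside = False
        # lines between START and END are silently dropped


# Hypothetical sample input, not taken from a real Preview printout.
sample = io.StringIO(
    "%!PS-Adobe-3.0\n"
    "mark currentfile eexec\n"
    "0123456789abcdef\n"
    "cleartomark\n"
    "showpage\n"
)
cleaned = "".join(strip_protection(sample))
print(cleaned, end="")
```

Any iterable of lines works as input, so passing sys.stdin or an open file object should behave the same as the script above.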

