Book Production the Hard Way

Over the past few weeks I’ve been working to put a large project to bed. What started when a potential client contacted me about producing an ebook ended with the development of a combined ebook and PDF build system that saved a bunch of time when last-minute changes were needed.

This project built on what I learned from a previous project, where I paired an overly complicated makefile with Pandoc to generate an ebook in French. This time the project was in English, which made it much easier to tailor my build system to the material. Since the final page count of the PDF (6″×9″ trim size) came to just over 400 pages, I needed every shortcut and hack I could come up with to keep the ePub and PDF versions in sync.

Along the way I managed to have Pandoc, Make, Sed, Ruby, and LuaLaTeX all playing nicely together whenever I ran make all.

Client needs

The book under examination is The Oasis of Insanity by Allen Barton (Kindle, Paperback). The first part is a memoir of how Allen became the owner of the Beverly Hills Playhouse acting school. The second part, rounding out the last two thirds, details exactly what an aspiring actor needs to learn in order to become, survive, and prosper as a working professional.

Sometimes being on the production side of a project means working on something that would never cross your mind when contemplating a list of things to write about. This is one of those projects.

In the past I’ve blogged about making ebooks using the Ulysses writing app. One of those posts has a simple contact form for those not inclined to do it themselves.

For simple projects, abusing Markdown and exporting with my template will produce an ePub file that’s better than most of what’s seen on the Kindle store. (Judging from questions posted to the support forums, a majority of those seem to come from a Word DOCX export run through Amazon’s auto-mangler. /shudder)

The downside of auto-conversion shows up when a print version is also needed. If there aren’t any other options, at least the auto-mangler is available to authors, but the output will be sub-optimal at best. There will be words printed on the page, but it’ll look like another home-office print job.

The only way to tell Amazon KDP Print/CreateSpace exactly what’s wanted is to supply a designed PDF, which led me to add LaTeX to my toolchain. (Adobe InDesign was out, as I wanted the same source files to feed both versions.) This also let me programmatically generate both versions on demand.

In this project, the ePub and a PDF were all Allen cared about. (The cover was handled elsewhere.) He handled edits and approvals while I made books. From his end the process was quite opaque: put exported text into the Dropbox and get ePubs/PDFs out of the Dropbox.

The rest of this story is about what happened in the meantime.

The toolchain

One thing that made this project more involved was that we agreed to start production before the final revisions were in. This was to reduce the time needed to get the final versions out the door. My thinking was that the more formatting I could nail down ahead of time, the less Allen would have to wait once he finished editing. I envisioned “pouring” the finalized text into my build process (affectionately nicknamed The Chopper) that had been fine-tuned during the edit phase, and having the final files pop out ready to upload. For the ePub, this worked as planned. The PDF took a lot more fiddling.

I started with the ePub, as it was a one-step conversion to get there from the Markdown that Allen supplied. This was the format we used to get most of the copy editing done. Along the way I did make a non-styled PDF so he’d have something to mark up. But the ePub got most of our attention until the editing was done.

The ePub was built with the help of a makefile that kept track of everything and saved me from having to remember all of the Pandoc options. Pandoc itself did the heavy lifting of converting the Markdown to a valid ePub v3.0 file. Along the way a Ruby script became a preprocessor for Pandoc; it took care of making a few changes to the text that needed to happen before Pandoc could do its magic. To make the PDF I wound up using two scripts like this: one to prep the files for Pandoc and another to clean them up for LuaLaTeX.
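
To give a sense of how the pieces fit together, here’s a rough sketch of the kind of ePub rule the makefile contained. The file names and the exact Pandoc options are illustrative, not the ones from the project:

# Illustrative names and options; the real makefile tracked more dependencies.
prepped.md: book.md prep_epub.rb
	ruby prep_epub.rb book.md > prepped.md

book.epub: prepped.md epub.css metadata.yaml
	pandoc prepped.md --metadata-file=metadata.yaml --css=epub.css --toc -o book.epub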

An overview of the production process.

While Pandoc can be a viable front end for the LaTeX family of tools, it fails when multiple typesetting runs are required to produce an index. So the makefile handed the build over to latexmk in order to ensure things were properly compiled. (The final compilation took four typesetting runs!)
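
A minimal sketch of that hand-off, again with illustrative names:

# Illustrative rule; CHAPTERS would list the converted .tex files.
# latexmk reruns LuaLaTeX (and makeindex) until everything settles.
book.pdf: book.tex $(CHAPTERS)
	latexmk -lualatex book.tex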

While the software was chugging along, I was reduced to a spectator waiting for the new files to fall out the other end.

The process

The original text came to me in the form of a single-file Markdown export from Ulysses. While it was plain UTF-8 text, it wasn’t standardized in any way. To avoid trouble down the line, I had to clean up things like inconsistent spacing, floating tags (Pandoc will get confused if it can’t tell where a tag opens or closes), header spacing, and the “de-education/dumbifying” of the smart punctuation. That last item might seem odd given how much effort is put into making sure the conversion happens in the opposite direction.
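
As an example of the punctuation piece, here’s the sort of substitution the Ruby preprocessor performed. This is a simplified sketch with made-up rules, not the production script:

# Simplified sketch; the real script also handled floating tags,
# header spacing, and other quirks of the Ulysses export.
text = File.read(ARGV[0], encoding: "UTF-8")

text.gsub!(/[\u201C\u201D]/, '"')  # curly double quotes -> straight
text.gsub!(/[\u2018\u2019]/, "'")  # curly single quotes/apostrophes -> straight
text.gsub!("\u2026", "...")        # one-character ellipsis -> three periods
text.gsub!(/[ \t]+$/, "")          # trailing whitespace
text.gsub!(/\n{3,}/, "\n\n")       # runs of blank lines

print text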

With standard plain quotes and ellipses I had control over how they would be converted. To make the ePub, Pandoc swapped them back to curly quotes and one-character ellipses. But for the PDF it wrapped the quoted text in the \enquote{} command from the csquotes package (the ellipses were converted to the \ldots{} command). With curly quotes already in the text there was no way for that second conversion to happen. I went this way so that TeX’s usual double-tick-mark quoting ligatures wouldn’t get separated and so I wouldn’t make the mistake of putting the wrong curly quote in the wrong place (a very real and embarrassing typesetting mistake). Having all straight quotes also makes searching and replacing easier.

Pandoc can convert Markdown into TeX without a template, but to get the proper csquotes output it has to see that the csquotes package is loaded. To make Pandoc behave, I made a simple LaTeX template consisting of only two lines:

\usepackage{csquotes}
$body$

This gets the needed conversion done with a minimum of clutter. It helps that Pandoc’s template handling is mostly brain dead and doesn’t check for valid TeX documents.

I still needed a cleanup script to remove that first line, so the \usepackage call in the converted files wouldn’t choke LuaLaTeX (packages can’t be loaded once the document body has started). The makefile took care of running that post-processing script.
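
Boiled down, the Markdown-to-TeX step looked roughly like this, with placeholder file names and a sed one-liner standing in for the cleanup script:

# Convert one chapter, then strip the \usepackage line the template leaves
# at the top. Names are placeholders; the sed call is a stand-in.
chapters/%.tex: chapters/%.md quotes.latex
	pandoc $< --template=quotes.latex -o $@
	sed -i.bak '1d' $@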

Now that there was a valid path from Markdown to TeX, I only needed to write a TeX file that set up the parameters for the book layout and have it include the converted files in the right order.
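
That top-level file is the usual LaTeX book scaffolding. The sketch below is illustrative rather than the production file; the class options, margins, packages, and chapter names are placeholders:

% Rough sketch of the top-level file; settings are illustrative.
\documentclass[10pt,twoside]{book}
\usepackage[paperwidth=6in,paperheight=9in,
            inner=0.875in,outer=0.625in,
            top=0.75in,bottom=0.75in]{geometry}
\usepackage{fontspec}   % LuaLaTeX font handling
\usepackage{csquotes}   % matches the \enquote{} commands Pandoc emits
\usepackage{titlesec}   % part and chapter heading styles
\usepackage{makeidx}
\makeindex

\begin{document}
\frontmatter
\include{front-matter}
\mainmatter
\include{chapter-01}
\include{chapter-02}
% ...and so on, in reading order
\backmatter
\printindex
\end{document}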

The scripting

One sticking point was formatting the part headers so they’d work with both kinds of output. For this I used a one-line form in the Markdown files:

# PART ONE Some Useful Text

For the ePub, the script split this into:

# PART ONE

Some Useful Text

This made for a nice part break that allowed styling the header and text separately. In LaTeX, the name of the part is added automatically, so the script simply removed the PART ONE from the line. With the titlesec package, styling both the header and the text was relatively straightforward.
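
Sketched out in Ruby, the two transformations look something like this (the regular expressions are illustrative, not the production ones):

# Illustrative only; the real prep scripts handled more cases than this.
text = File.read(ARGV[0], encoding: "UTF-8")

# ePub prep: put the label and the title in separate blocks
epub = text.gsub(/^# (PART [A-Z]+) +(.+)$/) { "# #{$1}\n\n#{$2}" }

# PDF prep: drop the label, since \part{} supplies its own "Part One"
pdf = text.gsub(/^# PART [A-Z]+ +(.+)$/) { "# #{$1}" }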

One thing that didn’t need any adjustment was the index. Pandoc passes raw LaTeX commands through to TeX output and ignores them when converting to formats like ePub. So I was able to index the source files and not have to worry about side effects in the end result.
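
An indexed line in the Markdown source could look something like this (the sentence and the index term are made up):

Cold reading\index{cold reading} is a skill every working actor needs.

Pandoc drops the \index command when building the ePub and passes it through untouched for the PDF, where latexmk and makeindex turn the accumulated entries into the final index.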

Conclusion

This article only touches on the high points. It omits some of the eye-bleedingly tedious aspects of makefile programming and how sometimes only threats of physical violence will make LaTeX cooperate.

As the programming became a bigger part of the project I started to wonder if it was worth the effort. My view shifted around the time the makefile stopped needing adjustments. Then I could focus on edits in the Markdown file, maybe tweak a post-processing script, and see the results show up in the PDF.

Then I knew the time was well spent.

One unexpected benefit was that I could make adjustments to the generated .tex files to verify a change, then add it either to the Markdown or to the post-processing script. Once the change was “upstream,” I would delete the TeX files and make them and the PDF again from scratch. This made sure the changes were correctly baked in. While not strictly necessary, it gave me the peace of mind of knowing no edits would slip through the cracks.

This paid off like a broken slot machine the day after I delivered the final files, when Allen emailed to say he’d found one last typo. It was a simple matter of changing “met” to “meet.” With one correction to the Markdown file, the makefile rebuilt the corresponding .tex file and spat out the corrected ePub and PDF. (Allen only spotted the typo in the PDF, but the Chopper made sure both got fixed.)

Along the way I learned a lot about the tools I was using and how they work together. With a little more work I was able to generalize the Chopper into a more universal system, one that removes most of the configuration hassles and lets me focus on design and writing.

This also means that as long as I have access to my makefile and setup script, along with Pandoc and a TeX Live distribution, I can carry my publishing empire on a flash drive.

In the end, doing it the hard way made the easy way possible.