Kris Coppieters is an accomplished software engineer and coach. He designs and develops innovative software solutions. His forte: simple, stable, maintainable, flexible solutions for complex problems. A pragmatic approach allows him to achieve realistic, results-focused outcomes. This includes helping companies kick-start their automation projects and build their software-development teams. Kris’s teaching skills enable him to explain complex matters in plain language.
Kris Coppieters will join us at Tech Forum 2020 on March 24 for a session called Better problem solving through scripting: How to think through your #eprdctn roadblocks and script your way to efficiency.
If you're involved with restructuring and improving EPUB files, you're bound to perform a lot of global find-and-replace operations in the component HTML and CSS files.
An EPUB file is not much more than a set of files and folders compressed into a single ZIP file, with an .epub filename extension instead of a .zip filename extension.
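To see this for yourself, here is a minimal Python sketch that treats an EPUB as an ordinary ZIP archive and lists what is inside. The function name `list_epub_entries` is made up for this illustration; it only uses Python's standard `zipfile` module.

```python
import zipfile


def list_epub_entries(epub_path):
    """Return the names of the files stored inside an EPUB (a ZIP archive)."""
    if not zipfile.is_zipfile(epub_path):
        raise ValueError(f"{epub_path} is not a ZIP-based file")
    with zipfile.ZipFile(epub_path) as archive:
        return archive.namelist()
```

Calling `list_epub_entries("book.epub")` on one of your own files (the filename here is just an example) should return entries such as `mimetype`, the HTML and CSS files, and the other component files.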
Use advanced editing tools
One approach for search-and-replace is to use advanced editing tools like BBEdit, the Oxygen XML Editor, or something similar, which can directly interact with text files inside an EPUB file, without the need to un-zip and re-zip it.
When it comes to processing multiple EPUB files, you might still find you're performing the same set of search-and-replace operations over and over.
Crack it open
Another common approach is to 'crack open' the EPUB.
First un-zip it into a file-and-folder structure filled with HTML, CSS, and a bunch of other files.
Then perform a global search-and-replace on the unzipped folder structure.
Finally, re-zip the file-and-folder structure into a new, updated EPUB file.
eCanCrusher is one of the many free tools that can be used to un-zip and re-zip an EPUB file: Drag-drop an EPUB file and it un-zips; drag-drop an unzipped EPUB folder and it re-zips.
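If you prefer to script the un-zip and re-zip steps yourself, the sketch below shows one possible way to do it in Python. The function names `unzip_epub` and `rezip_epub` are invented for this example. One detail worth knowing: the EPUB container format expects the `mimetype` file to be the first entry in the ZIP and to be stored uncompressed, so the re-zip step handles that file separately.

```python
import os
import zipfile


def unzip_epub(epub_path, folder):
    """Extract every file in the EPUB into `folder`."""
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(folder)


def rezip_epub(folder, epub_path):
    """Re-zip `folder` into an EPUB, writing `mimetype` first and uncompressed."""
    with zipfile.ZipFile(epub_path, "w") as archive:
        mimetype = os.path.join(folder, "mimetype")
        if os.path.exists(mimetype):
            # The mimetype entry goes first and is stored without compression.
            archive.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        for root, _dirs, files in os.walk(folder):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # ZIP entries always use forward slashes, whatever the OS.
                rel_path = os.path.relpath(full_path, folder).replace(os.sep, "/")
                if rel_path == "mimetype":
                    continue  # already written above
                archive.write(full_path, rel_path, compress_type=zipfile.ZIP_DEFLATED)
```

The search-and-replace step happens between the two calls: un-zip into a working folder, edit the files in that folder, then re-zip to a new EPUB file saved outside the working folder.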
Regular expressions
For efficient search-and-replace, it pays to understand regular expressions (RE) and GREP (the acronym stands for globally search a regular expression and print).
Regular expressions are a 'semi-standard' way to tell the computer to search for occurrences of certain patterns in the text, and replace any matching patterns with something else.
Imagine that you need to search the text for the word 'placeholder', but whoever typed up the text indiscriminately mixed uppercase and lowercase letters: PlaceHolder, PLACEHOLDER, placeHolder... What a mess!
The regular expression /placeholder/i matches every occurrence of the word 'placeholder' in the text, regardless of how it is capitalized. The trailing i is what makes the match case-insensitive.
Without regular expressions, you'd have to separately search-and-replace each and every variant.
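As a small illustration, here is how the same case-insensitive match might look in Python, where the `re.IGNORECASE` flag plays the role of the trailing `i`. The replacement word 'title' is just an assumption made for the example.

```python
import re

text = "PlaceHolder, PLACEHOLDER, placeHolder"

# re.IGNORECASE does the same job as the trailing `i` in /placeholder/i.
pattern = re.compile(r"placeholder", re.IGNORECASE)

# Replace every variant with a single, consistent word.
result = pattern.sub("title", text)
print(result)  # title, title, title
```

One pattern catches all three variants, so a single replace operation cleans up the lot.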
Regular expressions are very powerful, but also very dense and terse. I think of them as 'Write Once, Read Never' things: I find I understand them while building them, but when I come back to my own work a few days later, I always have a hard time figuring out what they do.
Between different environments, you'll find that the features of regular expressions can vary somewhat: InDesign's RE are different from your text editor's RE, which in turn are different from PHP's RE, etc. But the differences are mostly in minor details.
Search-and-replace
There are many ways to do search-and-replace on text files.
You can use a good text editor or you could use some text-processing script files.
If you only need to process one particular EPUB file and perform one particular set of search-and-replace operations, then a good text editor will be pretty efficient. You can do a 'global' search-and-replace and process all text files inside the EPUB folder in one fell swoop.
When handling multiple EPUB files, you'll probably find yourself doing the same find-and-replace operations over and over. In that case, a command-line script makes you much more efficient: A single command can process all relevant files in the EPUB, and can perform a whole host of search-and-replace operations in one go.
Such command-line scripts can be written in many languages, like Python, PHP, Perl, or JavaScript. There is no 'best' here. It mostly depends on what the script developer is comfortable with.
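To give a feel for what such a script could look like, here is a minimal Python sketch that applies a list of search-and-replace rules to every HTML and CSS file in an unzipped EPUB folder. Everything in it is an assumption made for illustration: the rules themselves, the names `RULES`, `apply_rules`, and `process_folder`, the list of file extensions, and the guess that the files are UTF-8 encoded.

```python
import os
import re

# Hypothetical rules: (regular expression, replacement) pairs, applied in order.
RULES = [
    (re.compile(r"placeholder", re.IGNORECASE), "title"),
    (re.compile(r"<b>(.*?)</b>"), r"<strong>\1</strong>"),
]

# Only files with these extensions are touched (an assumption for this sketch).
TEXT_EXTENSIONS = (".html", ".xhtml", ".css")


def apply_rules(text, rules=RULES):
    """Run every rule over the text, one after the other."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


def process_folder(folder, rules=RULES):
    """Apply the rules to each HTML/CSS file under `folder`; return changed paths."""
    changed = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if not name.lower().endswith(TEXT_EXTENSIONS):
                continue
            path = os.path.join(root, name)
            # newline="" keeps the file's original line endings untouched.
            with open(path, encoding="utf-8", newline="") as handle:
                original = handle.read()
            updated = apply_rules(original, rules)
            if updated != original:
                with open(path, "w", encoding="utf-8", newline="") as handle:
                    handle.write(updated)
                changed.append(path)
    return changed
```

Keeping the rules in one list at the top means that adapting the script to a new job is mostly a matter of editing that list, not the code below it.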
DropToScript
Enter DropToScript: This is a small public-domain application published by the BC Libraries Cooperative.
DropToScript handles a lot of the nitty-gritty of EPUB search-and-replace, and allows you to focus on the important bits: the regular expressions and their replacement patterns.
Behind the scenes, DropToScript will handle the un-zip/re-zip of the EPUB and perform a set of search-and-replace operations.
You don't need to know how to script; DropToScript also comes with a ready-made 'template' script that handles all of the search-and-replace code. All you need to add is a list of regular expressions and replacement patterns.
DropToScript then allows you to simply drag-and-drop an EPUB file onto an icon. It will perform all the search-and-replace operations with a few clicks.
Kris Coppieters will be talking more about scripting and ebooks at Tech Forum on March 24, 2020 in Toronto. You can find more details about the conference here, or sign up for the mailing list to get all of the conference updates.
[Editor’s note: Due to the COVID-19 pandemic, this Tech Forum session was delivered online. You can watch the recording below.]