Lies, Damned Lies, and Mime Types
If you’ve ever had the joy of building EPUB files and struggled with the cryptic error messages produced by epubcheck, you might find this interesting. I was having a number of facepalm moments the other day while dealing with an image icon for a Twitter user. My new Twitter friend @longreads is giving me a steady stream of wonderful content to read. And yet, I was having an awful time trying to figure out why epubcheck was rejecting the longreads Twitter icon as having the wrong mime type.
Here is the url for the icon . It has a “.jpg” extension which suggests that its mimetype is “image/jpeg”. If you download the file and open it in an image viewer app like the Preview app for Mac, it can tell you that it is actually a PNG file. In my automated workflow, this little time bomb silently waits until the EPUB is created before it announces itself during epubcheck validation.
I was previously using the file extension to determine the mime type, and that was obviously not going to work in this world of deceptive file names. Moreover, there are image urls that do not have any file extension. So, I tried to get clever and check the “Content-Type” header in the response when downloading files. However, I even found that this was not always correct.
I did some research on existing open source tools (Java and Python-based) and it is surprising how many use the file extension as the main determinant. And the tools that actually read the file header were said to be buggy. So, it dawned on me later that I should just look into the epubcheck source code and find out how it was reading image types.
Here’s the source code for BitmapChecker.java in the epubcheck source code. As is, it is not designed to be used externally so I created a copy and compiled it into a command-line tool like epubcheck. It is called MimeCheck and you can run it like this. It reads the file header returns the mime type.
$ java -jar mimecheck-1.0.jar twitpic.jpg
That only works if the file is in the same directory as the mimecheck-1.0.jar file. You can also provide the absolute path and it works the same. Here’s a download link for the jar file if you are interested in using it.
Let me know if you have any questions or if you need the source code. I think I have a few other EPUB-related tools that I can contribute and perhaps turn this into an open source project on Google code. That depends on whether anyone else finds it useful. I realize that not many people need to do automated builds of EPUB files.