Tuesday, August 12, 2008

Making PDF documents readable on a digital book reader

In a strict sense, PDF is not, never has been and never shall be an eBook format. One of the prime reasons is that a PDF is not reflowable.

So what does it mean for a format to be reflowable? It means that if you open the document on a PDA or another device with a viewing screen, the layout and text of the document/book will automatically adapt to fit the screen. PDF, being a fixed-layout format, doesn't fit (pun intended).

A lot of different hacks have been floating around the MobileRead forums, and some members have even produced their own Python-based scripts. One of the many solutions that got my attention was the simple and elegant algorithm that Huang Ying (AKA "Caritas") implemented:


1. Convert the PDF to images. I use pdftoppm from xpdf, for example:
pdftoppm -r 180 -f 245 -l 245 -gray -aa yes a.pdf a
2. Analyse the generated images and break each page into lines.
3. Divide each line that is long enough into two segments.
4. Rearrange the segments into a new page with half the width.
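
Steps 2 to 4 can be illustrated with a small Python sketch. This is not Caritas' actual code, only my own minimal reading of the idea, and it makes several assumptions: the page is already a grid of pixels (0 for blank paper, 1 for ink), a "line" is any run of rows containing ink, and a long line is cut at the blank column nearest its middle. The function names (find_lines, split_line, reflow) are made up for this example, and the resulting page is only roughly half as wide as the original.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixels: 0 = blank paper, 1 = ink


def find_lines(page: Image) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, one per run of inked rows."""
    bands = []
    start = None
    for y, row in enumerate(page):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            bands.append((start, y))
            start = None
    if start is not None:
        bands.append((start, len(page)))
    return bands


def split_line(line: Image) -> List[Image]:
    """Split one text line into two segments, unless it already fits in half the width."""
    width = len(line[0])
    half = (width + 1) // 2
    inked = [x for x in range(width) if any(row[x] for row in line)]
    if inked[-1] < half:
        # Short line: all ink is in the left half, so keep it as a single segment.
        return [[row[:half] for row in line]]
    # Prefer a blank column (a gap between words) in the central third of the line.
    lo, hi = width // 3, width - width // 3
    blank = [x for x in range(lo, hi) if not any(row[x] for row in line)]
    cut = min(blank, key=lambda x: abs(x - width // 2)) if blank else width // 2
    return [[row[:cut] for row in line], [row[cut:] for row in line]]


def reflow(page: Image, gap: int = 1) -> Image:
    """Stack all segments vertically into a new, narrower page."""
    segments = []
    for top, bottom in find_lines(page):
        segments.extend(split_line(page[top:bottom]))
    if not segments:
        return []
    out_width = max(len(seg[0]) for seg in segments)
    out = []
    for seg in segments:
        for row in seg:
            out.append(row + [0] * (out_width - len(row)))  # pad on the right
        out.extend([[0] * out_width for _ in range(gap)])   # blank rows between segments
    return out


page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 1],  # long line with a word gap at column 3
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],  # short line, left as is
]
narrow = reflow(page)
```

A real tool would first load the images written by pdftoppm and would need to cope with noise, multiple columns and figures, none of which this toy version attempts.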



The latest version at the time of writing is pi v0.6, which slices the document and sets it back together into a single resliced PDF. Quite handy, I daresay. The tools are posted in this thread.

Other solutions take the common approach of cropping the pages so that as little of the screen as possible is wasted on empty margins. PDFs processed this way are often read in landscape orientation with "fit to width" enabled. One such script can be found here, courtesy of "Hanselda".
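
The cropping idea is easy to picture in code as well. The following Python sketch is not Hanselda's script; it only shows the principle under the assumption that a page is a grid of pixels (0 for blank, 1 for ink): find the smallest box that contains all the ink and throw the rest away. The names content_box and crop are invented for this example.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of pixels: 0 = blank paper, 1 = ink


def content_box(page: Image) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) around all ink, right/bottom exclusive."""
    rows = [y for y, row in enumerate(page) if any(row)]
    if not rows:
        return None  # completely blank page
    cols = [x for x in range(len(page[0])) if any(row[x] for row in page)]
    return cols[0], rows[0], cols[-1] + 1, rows[-1] + 1


def crop(page: Image, margin: int = 0) -> Image:
    """Cut away empty borders, optionally keeping `margin` pixels of padding."""
    box = content_box(page)
    if box is None:
        return page
    left, top, right, bottom = box
    left, top = max(0, left - margin), max(0, top - margin)
    right, bottom = min(len(page[0]), right + margin), min(len(page), bottom + margin)
    return [row[left:right] for row in page[top:bottom]]


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
cropped = crop(page)
```

With the margins gone, "fit to width" has more actual text to stretch across the screen, which is the whole point of the trick.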

In retrospect, I must say that you can't blame the different eBook readers for their inability to make PDFs "readable". They should, however, be punished for not having a proper zoom. The PDF format is rigid, so it's not fair to expect a PDF document to reflow once you load it onto your reader. But this is also a key reason why all serious publishers should quit selling PDF versions of books unless they are reflowable.

In conclusion:
Don't buy a PDF eBook before checking its reflowability. If you have loads of PDFs you'd like to read, like papers on Category Theory, Monads or whatever, don't give up! Most of us sitting with technical PDFs know they become grossly distorted when converted to something else, like HTML or whatnot. But there are tools and various methods for making them readable, and I do believe reflowable formats will become increasingly popular. Try out the "Caritas" tool; I'm sure you'll like it.


/Gf
