Friday, August 29, 2008

Project page for the Unified Ebook Search Engine

I have created two pages for the UESE: a main page and a tools page, both of which should receive updates as continuously as the blog.

The idea is to give the project some kind of public "face", so that ebook sellers can find the page and read about it without me having to explain everything each time I contact a seller or a seller contacts me.

First beta version of the Unified Ebook Search Engine client released

The first version has been released and has its own wiki page here. As soon as all the sellers who have shown considerable interest have finished implementing the API, the program will be quite nice to use ;)

I will start working on a search page for users less keen on the command line, and of course continue to perfect the client :)


Tuesday, August 26, 2008

Project layout

You can find a coarse flow layout of the client here.

Friday, August 22, 2008

Unified Search Engine powered by Witsbits

The great guys over at Witsbits agreed to sponsor the Unified Search Engine project by providing storage and site hosting! This matters a lot, since hardware is one of the major concerns for this kind of project.

This shows why cloud computing is the future of on-demand resources.
Well, I will start coding on the client pronto!

Almost forgot: the Java client will have a different name from the online search engine. For those of you interested in the naming process: go here

Thursday, August 21, 2008

API specification complete

The specification is now complete and has been mailed to the interested sellers. Some have not replied yet, but I am keeping tabs on everyone :)

A client program will start taking shape soon, and so will a website. If I am lucky, maybe the kind guys over at will help me out with a little storage for the project and the site. After all, it's a cloud computing system with not only a cheap price but also reliable systems ;)

Hang tight!

Tuesday, August 19, 2008

Unified eBook Search Engine - progress report 2

So, the replies to my mass request continue to pour in, a bit slowly I have to admit, but they show that there is interest among the sellers. Many of them informed me that they are on vacation until the 1st of September, which is perfectly normal, and I didn't even think about it!

Meanwhile, I have finished the specification; there was not that much to it, really. I might start tinkering with a Java/Haskell program and a page, even a C program if I'm in the mood. Maybe I drank too much coffee today, or maybe my wife passed on her cold, but I'm not feeling well.


Friday, August 15, 2008

Unified eBook Search Engine - progress report 1

Well, the name is surely only the project name ;) So far I have received only positive responses from the eBook sellers (hereafter referred to as the "alphas"). I am currently thinking about the API specification; it should be strong, reliable and simple: strong in the sense that a lot can be achieved with it, reliable in the sense that adding functionality doesn't force major changes in the systems using it, and simple in the sense that hooking up a new alpha is no more than a matter of registering an input source and a search source.
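To make the "simple" requirement concrete, here is a minimal Python sketch of what "registering a search source" could mean. Everything in it is hypothetical: the `Registry` and `Hit` names, and the idea that a source is just a function from a query string to a list of hits. It models only the search side (not the input side), and real sources would call each seller's HTTP API instead of the stub lambdas used in the usage example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Hit:
    """One search result from one seller ("alpha"). Hypothetical record type."""
    seller: str
    title: str
    fmt: str


# Hypothetical: a search source maps a query string to a list of hits.
SearchSource = Callable[[str], List[Hit]]


@dataclass
class Registry:
    """Hypothetical registry: hooking up a new alpha is a single register() call."""
    sources: Dict[str, SearchSource] = field(default_factory=dict)

    def register(self, seller: str, source: SearchSource) -> None:
        self.sources[seller] = source

    def search(self, query: str) -> List[Hit]:
        # Ask every registered source and concatenate the results.
        hits: List[Hit] = []
        for source in self.sources.values():
            hits.extend(source(query))
        return hits
```

For example, `registry.register("fictionwise", lambda q: [Hit("fictionwise", q, "Mobipocket")])` adds a stub seller. Because `search` only iterates over whatever has been registered, adding an alpha touches no existing code, which is the "reliable" property described above.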

If this goes well, I might also consider creating a stand-alone Java command-line program, similar to another client program I have written which also uses an HTTP-driven API (including WebDAV). The gain of having such a client program is that people interested in eBooks can easily script their own commands. For example, it could look like this:

zenon@matrix> ebooksearch --mobipocket Thief of time

fictionwise Thief of Time [Discworld #26]
fictionwise A Thief of Time [Navajo Tribal Police Series #8]


zenon@matrix> ebooksearch --format mobipocket adobe --title Automata

ebookmall Automata Theory with Modern Applications { Adobe Reader }
ebookmall Finite Automata { Mobipocket }
ebookmall Identification of Cellular Automata { Mobipocket }
mobipocket Finite Automata { Mobipocket }

There are so many things one could do with such a tool.
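Here is a rough Python sketch of how the two example command lines could be parsed and filtered. It is not the planned Java client: the handling of `--format`, `--mobipocket` and `--title` is an assumption modelled on the examples, and `CATALOGUE` is a hypothetical local list built from the sample output instead of live seller queries. Format names match by prefix, so `adobe` selects "Adobe Reader". The `extend` action needs Python 3.8 or later.

```python
import argparse
from typing import List, NamedTuple


class Hit(NamedTuple):
    seller: str
    title: str
    fmt: str


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ebooksearch")
    # "--format mobipocket adobe" collects one or more format names.
    parser.add_argument("--format", dest="formats", nargs="+", action="extend")
    # Assumed: "--mobipocket" is shorthand for "--format mobipocket".
    parser.add_argument("--mobipocket", dest="formats",
                        action="append_const", const="mobipocket")
    parser.add_argument("--title", nargs="+")
    # Bare words, as in "--mobipocket Thief of time", form the query otherwise.
    parser.add_argument("words", nargs="*")
    return parser


def run(argv: List[str], catalogue: List[Hit]) -> List[str]:
    args = build_parser().parse_args(argv)
    query = " ".join(args.title or args.words).lower()
    formats = [f.lower() for f in (args.formats or [])]
    lines = []
    for hit in catalogue:
        if query not in hit.title.lower():
            continue
        # Prefix match so that "adobe" selects "Adobe Reader".
        if formats and not any(hit.fmt.lower().startswith(f) for f in formats):
            continue
        lines.append(f"{hit.seller} {hit.title} {{ {hit.fmt} }}")
    return lines


# Hypothetical catalogue mirroring the sample output in the post.
CATALOGUE = [
    Hit("fictionwise", "Thief of Time [Discworld #26]", "Mobipocket"),
    Hit("fictionwise", "A Thief of Time [Navajo Tribal Police Series #8]", "Mobipocket"),
    Hit("ebookmall", "Automata Theory with Modern Applications", "Adobe Reader"),
    Hit("ebookmall", "Finite Automata", "Mobipocket"),
    Hit("ebookmall", "Identification of Cellular Automata", "Mobipocket"),
    Hit("mobipocket", "Finite Automata", "Mobipocket"),
]

if __name__ == "__main__":
    for line in run(["--format", "mobipocket", "adobe", "--title", "Automata"], CATALOGUE):
        print(line)
```

Running it prints the four Automata lines from the second example, in catalogue order.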

The general contact progress will mainly be posted here.

Thursday, August 14, 2008

Sustainability of eBook readers discussed

By sheer accident, I managed to focus a discussion on the sustainability issues around eBooks. Many interesting points have been raised; please join in, read and discuss!

Unified Search Engine for eBooks

Failing to find an isomorph of PriceRunner for eBooks, I have started my own pet project on this. I started out by collecting the email addresses of the eBook sellers I found on the Wiki page. Some sellers (of course) had no email address, only rather amusing contact forms with lovely mandatory fields, and some topped this with hilarious mandatory registration before you could contact them.

The idea is basically that each eBook seller creates their own API following some unified guidelines, such as: using the POST method, you should be able to send a lookup request to one of their servers. They reply in a unified XML format, and the replies are stitched together by a simple page (with some bells and whistles).
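A minimal Python sketch of the POST-and-stitch idea. It assumes a made-up reply schema (a `<results seller="...">` root holding `<book>` elements with `<title>`, `<format>` and `<price>`) and a made-up form field `q`; the real unified XML format is whatever the API specification defines. Building the request needs no network access; actually sending it would be a `urllib.request.urlopen(request)` call against a real seller endpoint.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from decimal import Decimal
from typing import List, Tuple

# (price, seller, title, format); sorting these tuples puts the cheapest first.
Offer = Tuple[Decimal, str, str, str]


def build_lookup_request(url: str, query: str) -> urllib.request.Request:
    # Form-encoded POST body; the field name "q" is an assumption.
    body = urllib.parse.urlencode({"q": query}).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


def parse_reply(xml_text: str) -> List[Offer]:
    """Parse one seller's reply in the assumed unified XML format."""
    root = ET.fromstring(xml_text)
    seller = root.get("seller", "unknown")
    offers: List[Offer] = []
    for book in root.iter("book"):
        price = book.findtext("price")
        if price is None:
            continue  # skip entries without a price instead of guessing one
        offers.append((Decimal(price), seller,
                       book.findtext("title", ""), book.findtext("format", "")))
    return offers


def stitch(replies: List[str]) -> List[Offer]:
    """Merge all sellers' replies into one list, cheapest offer first."""
    merged: List[Offer] = []
    for reply in replies:
        merged.extend(parse_reply(reply))
    return sorted(merged)
```

The "simple page" would then only need to render the output of `stitch`; `Decimal` avoids the rounding surprises of binary floats when comparing prices.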

This has just begun, and I am sure a lot of people will have valuable suggestions and comments. Below is a copy of the request/email sent to all the sellers.


Over at we are discussing the possibility of a (free to use) unifying eBook search engine
which would allow a user to search all possible ebook stores and list the results with regard to price and availability.
Something along the lines of price runner etc.

Are you interested in this? Would you consider producing a web-based API for your database which would be usable for
this, so that someone could create a search tool / site for the named purpose? If you are interested, your API should
be fully documented and usable through standard HTTP methods like POST. The replies should be in a well-formed
XML format. I will be the spider in the net, so to say (no pun intended), for now.

This email is not only sent to you; I have also sent it to the others listed below (yes, your
name is there too). If you are not the person who can make this kind of decision,
please make sure it gets to the right person. Also, please reply only to my email address (--not shown here--) with
the subject "Unified search engine"; it will make it easier for me to organize this.
Diesel Ebooks
Book Habit
Taylor & Francis
WHSmith eBooks
BLTC Press
Books On Board
Club Lighthouse Publishing
Ebooks about everything
Powell's Books

P.S. Some of your names may be misspelled; I apologize for this.

I have also announced this on the mobileread forums

For those of you wondering what this has to do with optimization; it's a search optimization for the individual. ;)


Tuesday, August 12, 2008

Academic eBooks

I have bought two academic titles from separate eBook sellers, and feel obliged to share the different sources of academic eBook titles I have found after several hours of press release grinding, link hunting and general coffee-drinking.

Taylor & Francis
Taylor & Francis has quite a healthy supply of books in the Computing & Information Technology section. The formats are Mobipocket, PDF and Microsoft Reader; the prices of the CS books are a bit high.

Cambridge University Press
Cambridge eBooks has a healthy supply of books on Mathematics and Computer Science; however, I must regrettably say that, as far as I have seen, all books are published as PDFs. Prices are good (low); let us hope they start selling the books as Mobipockets too.

Springer Link
Every well-educated computer scientist ought to have a SpringerLink book in their library :), and if you want one in your eBook reader, this is your source. The downside: prices are high, and the format is PDF.

These were the only ones I found interesting enough. There are floods of academic university libraries out there, but they don't let you buy the books. I am hopeful that more will come, to enrich the book readers :)


Making PDF documents readable on a digital book reader

In a strict sense, PDF is not, never has been and never shall be an eBook format. One of the prime reasons for this is that a PDF is not reflowable.

So what does it mean for a format to be reflowable? It means that if you open the document on a PDA or another device with a viewing screen, the layout and text of the document/book automatically fit the screen. PDF, being a fixed format, therefore doesn't fit (pun intended).
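A toy illustration of the difference, using Python's standard `textwrap` module as a stand-in for a reflowable renderer and plain slicing as a stand-in for a fixed layout. The same paragraph is re-wrapped for each screen width, whereas a page laid out once can only be clipped (or shrunk). The widths are arbitrary character counts, not real device dimensions.

```python
import textwrap
from typing import List

TEXT = ("It means that if you open the document on a PDA or another device "
        "with a viewing screen, the text will automatically fit the screen.")


def reflow(text: str, width: int) -> List[str]:
    """Reflowable: re-break the text into lines no wider than the screen."""
    return textwrap.wrap(text, width)


def clip(page_lines: List[str], width: int) -> List[str]:
    """Fixed layout: lines keep their original breaks; overflow is cut off."""
    return [line[:width] for line in page_lines]


if __name__ == "__main__":
    fixed_page = reflow(TEXT, 60)  # a "page" laid out once for 60 columns
    for width in (60, 25):
        print(f"--- reflowed to {width} columns")
        print("\n".join(reflow(TEXT, width)))
        print(f"--- fixed page clipped to {width} columns")
        print("\n".join(clip(fixed_page, width)))
```

At 25 columns the reflowed version keeps every word, while the clipped fixed page loses the end of each line.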

A lot of different hacks have been floating around the mobileread forums; some people have even produced their own Python-based scripts. One of the many solutions that caught my attention was the simple and elegant algorithm which Huang Ying (AKA "Caritas") implemented:

1. Convert the PDF to images. I use pdftoppm from xpdf, for example:
pdftoppm -r 180 -f 245 -l 245 -gray -aa yes a.pdf a
2. Analyse the generated images. Break each page into lines.
3. Divide each sufficiently long line into two segments.
4. Rearrange the segments into a new page with half the width.
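Steps 2-4 can be sketched in Python, assuming step 1 has already produced a grayscale page, represented here as a plain list of pixel rows (0 = black, 255 = white). This is not Caritas' code: the ink threshold, the rule of preferring a blank column near the middle as the cut (a gap between words), and the blank row between segments are all assumptions made for illustration.

```python
from typing import List, Tuple

Page = List[List[int]]  # rows of grayscale pixels, 0 = black, 255 = white
INK = 128    # assumed threshold: pixels darker than this count as ink
WHITE = 255


def text_lines(page: Page) -> List[Tuple[int, int]]:
    """Step 2: (top, bottom) row ranges of consecutive rows containing ink."""
    bands, top = [], None
    for y, row in enumerate(page):
        inked = any(p < INK for p in row)
        if inked and top is None:
            top = y
        elif not inked and top is not None:
            bands.append((top, y))
            top = None
    if top is not None:
        bands.append((top, len(page)))
    return bands


def column_is_blank(page: Page, top: int, bottom: int, x: int) -> bool:
    return all(page[y][x] >= INK for y in range(top, bottom))


def split_band(page: Page, top: int, bottom: int, half: int) -> List[Tuple[int, int]]:
    """Step 3: one or two (left, right) column ranges for a text line."""
    inked = [x for x in range(len(page[0])) if not column_is_blank(page, top, bottom, x)]
    lo, hi = inked[0], inked[-1] + 1
    if hi - lo <= half:
        return [(lo, hi)]  # short line: keep it whole
    middle = (lo + hi) // 2
    # Prefer a blank column (a gap between words) where both halves fit.
    candidates = [x for x in range(lo, hi)
                  if column_is_blank(page, top, bottom, x)
                  and x - lo <= half and hi - x <= half]
    cut = min(candidates, key=lambda x: abs(x - middle)) if candidates else lo + half
    return [(lo, cut), (cut, hi)]


def reslice(page: Page) -> Page:
    """Step 4: stack all segments into a new page of half the original width."""
    half = (len(page[0]) + 1) // 2
    out: Page = []
    for top, bottom in text_lines(page):
        for left, right in split_band(page, top, bottom, half):
            for y in range(top, bottom):
                segment = page[y][left:right]
                out.append(segment + [WHITE] * (half - len(segment)))
            out.append([WHITE] * half)  # blank row between segments
    return out
```

The hard fallback cut (`lo + half`) may slice through a glyph, which is why a blank column is preferred whenever one exists.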

The latest version is pi v0.6, which slices the document and sets it together into a single resliced PDF. Quite handy, I daresay. The tools are posted in this thread.

Other solutions use a common method of cropping the pages so that a minimal amount of useless document area occupies the screen. Such processed PDFs are often read in landscape layout with "Fit to width" enabled. One such script can be found here, courtesy of "Hanselda".
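The cropping approach boils down to finding the bounding box of the ink on a page and cutting away everything outside it. Below is a minimal Python sketch on the same kind of grayscale pixel grid (0 = black, 255 = white), with an assumed ink threshold and an optional `keep` border; it is not Hanselda's script. With "Fit to width" the scale factor is roughly screen width divided by page width, so removing the side margins raises the effective magnification.

```python
from typing import List, Optional, Tuple

Page = List[List[int]]  # rows of grayscale pixels, 0 = black, 255 = white
INK = 128  # assumed threshold: pixels darker than this count as ink


def ink_bbox(page: Page) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, bottom, left, right) of all ink, or None for a blank page."""
    rows = [y for y, row in enumerate(page) if any(p < INK for p in row)]
    if not rows:
        return None
    cols = [x for x in range(len(page[0])) if any(row[x] < INK for row in page)]
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def crop_margins(page: Page, keep: int = 0) -> Page:
    """Cut away white margins, leaving `keep` pixels of border where possible."""
    box = ink_bbox(page)
    if box is None:
        return page  # nothing to crop on an empty page
    top, bottom, left, right = box
    top, left = max(0, top - keep), max(0, left - keep)
    bottom, right = min(len(page), bottom + keep), min(len(page[0]), right + keep)
    return [row[left:right] for row in page[top:bottom]]
```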

In retrospect, I must say that you can't blame the different eBook readers for their inability to make PDFs "readable"; however, they should be punished for not having a proper zoom. The PDF format is rigid, so it's not fair to expect a reflowable document once you download a PDF into your reader. But this is also a key reason why all serious publishers should stop selling PDF versions of books unless they are reflowable.

Don't buy a PDF eBook before checking its reflowability. If you have loads of PDFs you like to read, like papers on Category Theory, Monads or whatever, don't give up! Most of us working with technical PDFs know they become grossly distorted when converted to something else, like HTML or whatnot. But there are tools and various methods for making them readable, and I firmly believe reflowable formats will become increasingly popular. Try out Caritas' tool; I'm sure you'll like it.


Saturday, August 9, 2008

Handy links and firmware upgrade

I upgraded my Cybook Gen 3 firmware from V 1.0 to 1.1 today (just a minute ago), and I must say it was painless. I got the information about the new firmware from the official Bookeen blog, and the new firmware itself from the support page, which requires you to have a login.

I just downloaded the firmware update, put it on the SD card, rebooted the device, followed the very simple instructions and voila. Done.

I hope a lot of firmware upgrades will follow!


Academic eBooks

Having owned the Cybook Gen 3 for a few days, I have fallen completely in love with the ease of buying an eBook. Books not available in my city, which could have taken 2-3 weeks from the time of ordering, are at my fingertips within minutes!

All kinds of books are available in the formats the Gen 3 supports: novels, science fiction, romance, etc. However, the category (no pun intended) of books that I seek remains unseen. I am talking about academic books: books on Category Theory, Support Vector Machines, and other parts of Computer Science. Searching the net desperately, I found some eBooks on Finite Automata and Cellular Automata. However, other sciences seem to be over-represented in the set of eBooks.

By sheer coincidence, I found Taylor & Francis, an eBook store specializing in academic titles. Here, too, I observed that Computer Science is gravely under-represented. In total, the catalogue of academic titles is quite thin and should be expanded.

I will try to contact some of the publishers and ask them if it's possible to buy their books in Secure Mobipocket format.

There are far too few academic titles available in eBook format, and more should be added. Given that academic books tend to run 400-900 pages, keeping these pages digital would be a considerable improvement in terms of sustainability.


Friday, August 8, 2008

PDF support is ... not impressive

After buying "Making Money", I also downloaded a set of different papers on Arrows, Category Theory, Lambda calculus and, of course, Philip Wadler's excellent paper on monads (to honor him) onto the Gen3.

Of course, one expects to be able to zoom if the PDF doesn't fit well. That is what I thought, but alas, no. The PDF in question was not readable without an electron microscope, even with the layout set to "Landscape" and "Fit to width", and the documentation never mentions how to zoom manually. After some time I gave up, accepting the harsh fact that there are no manual zoom levels on the Gen3. What is referred to as "zoom" is actually the three different fits "Fit page", "Fit to width" and "Fit height", which can be combined with "Landscape" or "Portrait", giving a total of 2*3 = 6 different "zooms" (most of them equally unreadable, or worse, with a PDF).
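The count of "zooms" is simply every orientation paired with every fit mode. A trivial Python enumeration, using the mode names as they appear on the Gen3, makes the 2*3 product explicit:

```python
from itertools import product

ORIENTATIONS = ["Landscape", "Portrait"]
FITS = ["Fit page", "Fit to width", "Fit height"]

# Every display mode is one (orientation, fit) pair: 2 * 3 = 6 in total.
MODES = list(product(ORIENTATIONS, FITS))

if __name__ == "__main__":
    for orientation, fit in MODES:
        print(f"{orientation} + {fit}")
```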

A bit shocked by this, I surfed over to the Bookeen page and found the forums. A quick search for "PDF support", or simply "PDF", reveals yet another shocking fact: most readers out there seem to have the same problem! So, the solution?

Cropping. Simply remove all the white edges from your documents. (Another forum post suggested converting the whole PDF into images; I must admit I haven't tested that, since the cropping method worked fine for me.) Of course, you need some program for this, and Adobe Acrobat Pro will suffice perfectly. On Linux, ImageMagick's convert should work as well. There are surely hundreds of different solutions to this, but Acrobat did the trick for me; fortunately I had Adobe Acrobat Pro on Windows Vista, which made this easy.

It sucks not to be able to zoom to different manual levels, and it sucks that almost no portable eBook reader handles PDF in a decent way. Cropping is your answer for the moment; converting the PDFs into another format will only degrade the PDF file into something unreadable. This must be fixed in firmware releases.

Selecting Advanced -> Advanced Editing -> Crop Tool -> "Remove white margins" made the PDFs far more readable with Landscape -> Fit to width.


Thursday, August 7, 2008

Finally got my eBook reader

After quite some waiting, the ebook reader arrived. All specs can be found on the Bookeen page, this entry will only be my own personal experiences of this device.

As promised, the screen does look like paper. I must say it has a more grayish tone than normal cheap pocket paper, but it suffices perfectly: no reflections and no problem reading the characters. When the Gen3 is connected to the computer (I use both Windows Vista and Ubuntu), the device shows that it is connected by displaying a "Connected" text together with an image of the USB connector.

The size is fine: a little larger than a pocket, and smaller than an average paperback novel. For comparison, I shot the Gen3 beside Beta.
The device came with some free books, a manual and some demo books. I also bought a 2GB SD card; it shows up in Windows as a separate device (?) with the same internal directory structure as the device's on-board memory. That aside, I started by purging all the unwanted demo books and demo images. Unfortunately, not all the demo books have a "demo" suffix (the files that do make the job considerably easier). For those wondering whether there is a "Remove book" option in the menu: there is none (it should be added). Thus, after the purge, what remains is:

Cybook Gen3 - User Manual
Hans Christian Andersen - Contes Merveilleux
Bach [musical notes]
Bronte Emily - Les hauts De Hurlevent
Don Quichotte (First book)
Bram Stoker - Dracula
Crime And Punishment
I Robot
Nouvelles Histoires Extraordinaires
The Memoirs of Sherlock Holmes

Needless to say, there were some French texts in there, and they also had to go to make room for the freshly bought "Making Money" by Terry Pratchett. As this is the first eBook I have ever bought, I must say the process was extremely easy. A lot of ebook (re?)sellers can be found on the Mobipocket Ebase 1.0. I settled for "Fictionwise eBooks", which seemed to have a reasonable price. (Amazon would be the first choice, but buying their ebooks in any format other than "Kindle" is obscured beyond recognition; honestly, I don't even know if they offer any format other than "Kindle", and on that matter, why foul things up with yet another format?)

So, disregarding Amazon for their lack of clarity (and because the Kindle is ONLY sold in the US; thank you so much!), I bought the book, which was extremely easy:

1) Choose the book
2) Select format [secure Mobipocket]
3) Pay.

Once the purchase was complete, I entered the "Bookshelf" section of my account. This is where I had to register my Mobipocket Reader PID (only once; several can be registered).

4) Register my Mobipocket PID
5) Download the book to my computer

And this is where Windows started to get annoying (I should have done this in Ubuntu): Vista suddenly would not find my USB devices... A restart later, I dragged and dropped the eBook into the eBooks directory (just to make it more interesting, I actually copied it into the "eBooks" directory on the SD card), disconnected the Gen3, and voila, the book is readable!

All my future books will be digital. Moving my library with me could not be easier. Amazon is obscure, and the market of available eBook readers is surprisingly thin for Europeans.


Tuesday, August 5, 2008

Sustainable reading

E-book reader, digital reader, portable book reader, whatever you choose to call it, it's portable (preferably small and mobile) and should be able to handle at least raw text, PDF files, HTML and some type of e-book format.

I finally bought one (I've been drooling over them ever since I first saw Star Trek, where they have plenty of portable digital book readers to go around), and I'm still waiting for it to arrive.

I'll give a detailed description of how this device turns out, and also write a bit about the current state of affairs of books and sustainability.