January 7, 2010

Interesting Topaz DRM development

by fluffy at 12:48 AM
As I've mentioned in the past, I worked on Kindle. I think I've specifically said I worked on the Topaz format. If not, well, that's what I did on Kindle — I designed the Topaz file format and rendering/layout library, and did a lot of the work and problem-solving on the actual conversion process.

One of the (minor but important) parts of the Topaz format is, of course, the DRM. It's actually a pretty trivial "secret-sauce" algorithm which was implemented under some pretty ridiculous constraints (I had limited time to implement it, wasn't allowed to pull in any external libraries, and had to keep it performing quickly without using much memory on an already-constrained device), which makes it funny that it has somehow eluded being cracked for a bit over two years.

Until now.

Earlier today, someone (who I will of course keep nameless) asked me about a bit of Python code (which I will of course not link to) that he'd found which ostensibly would strip the DRM from a Topaz file as downloaded by the KindleForPC app. I looked at it, and yes, it looked like a plausible DRM stripper; presumably it was developed by someone who had run a disassembler on KindleForPC. It did require being run on the same PC as the Topaz file was provisioned for, however. But of course, this enterprising experimenter did not stop there: he analyzed the (again, pretty trivial) encryption algorithm and found a weakness in it (one which I will not name, but which I was aware of as a possibility even when I wrote it), and after not too much time, he'd written a C++ program which would very quickly brute-force the underlying encryption key and completely strip the file of all DRM.

He said that he wasn't interested in releasing it himself (he mostly did it as an intellectual challenge), and for obvious reasons I won't be releasing it either (or even describing the nature of the exploit), but yes, Topaz DRM has been completely compromised at least once, and it wouldn't surprise me if someone else has also figured out the flaw in the algorithm.

I just want to say ahead of time (before everyone emails me, IMs me, etc.) that I am aware that it's broken (and this is for real, not the many previous iterations of Mobi DRM being cracked that had nothing to do with Topaz) and also state in my defense:

  • I am pretty anti-DRM myself. Not that I went out of my way to make the DRM on Topaz fragile per se, but I was under enormous management and time pressures (as I stated above) and only did a minimum-effort job as a short-term solution with the intention of revisiting it later. I'm actually pretty shocked that nobody at Amazon improved the DRM since I left (or even made any changes to the file format at all, as far as I can tell); they didn't even bother to wrap KindleForPC downloads in an extra layer of DRM like they did for Mobi books (which is what the previous "KINDLE DRM CRACKED!!!" announcement was about).
  • I was expecting it to be cracked within weeks or months of Kindle's release, but it took over two years. Not bad for something that can be brute-forced in a few milliseconds if you know the secret sauce (and considering the filesystem had been dumped within days and people could have run a disassembler on it, why was the sauce a secret this long? IT IS RUSSIAN DRESSING, PEOPLE).
  • The primary weakness in any DRM mechanism is key exchange. Even if I had used a stronger encryption algorithm (which would have prevented the brute-force no-key attack, or at least slowed it down significantly), there's still the issue of keeping the device key secure (which is basically impossible).
  • Cracking Topaz DRM doesn't really make book piracy any easier. For starters, someone needs to get a copy of (i.e. pay for) the book, and Topaz books tend to be of more niche interest. The ones which are of more widespread interest tend to get pirated in other ways, such as teams of book pirates doing their own plain-text conversion from an actual book, long before the Topaz version even becomes available to begin with.
  • It is also a bit of a pain in the ass to put an unencumbered Topaz file onto any arbitrary Kindle-type device (Kindle, PC, iPhone/iPod Touch), and while this helps the various reverse-engineering efforts, the format is still pretty complex.
  • The big boon I see for the removal of DRM is purely in the legitimate customer's interest; people want to back up their Topaz books without having to reprovision them in the event that they change devices. Currently they have no legal way to do this, especially if a book gets retracted by the publisher (as in the famous and overly-ironic 1984 case).
  • People are fundamentally honest and good. They pirate only when the legal approach is too cumbersome. Witness how the iTunes music store and AmazonMP3 and so on continue to have plenty of sales even with illegally-traded pirated mp3s, which have certainly been more prevalent for much longer (and derived from plain CDs) than those services have been DRM-free.
  • There is still a pretty high barrier to people stripping the DRM from their Topaz books; even if someone were to release this app (which neither I nor the curious experimenter is going to do), you still have to do a lot of things to get the actual .tpz/.azw/whatever file off the device, and once you have it, you can't really do much with it.
The most interesting thing I see coming from this is people finally reverse-engineering the format and putting to bed some of the more ridiculous mythological beliefs I've seen spring up around it. For example:
  • It has absolutely no relationship to any other ebook format out there; it is not "mobipocket with fonts," it is not "a proprietary implementation of ePub," or any of the other ridiculous things I've seen written about it from people who have no idea but make these statements as if factual
  • The defects in Topaz books are because of defects inherent to the conversion process (which are the same defects which lead to it being converted in the way it exists to begin with), and not "piracy traps" or the like
  • It was not designed as a "better DRM than Mobipocket." It exists to fill a niche that only Amazon was capable of filling (at the time)
  • The nature of the files means that writing a viewer for other platforms is non-trivial; while it's possible to extract plain text from it, the plain text won't be particularly useful or accurate (and the plaintext representation of the book is only a very very small part of the file).
  • Topaz was not designed as something "only for the big publishers." Amazon does not provide Topaz tools to third parties. Only Amazon has the conversion tools to create Topaz books.
  • Topaz is not "the native format of the Kindle." If anything, I had to fight like crazy to keep it on the device — many, many times various people tried to kill the project because they had absolutely no faith in it or me. It was getting sick of this fight that most directly contributed to me deciding to part ways with Amazon two and a half years ago.
  • The format has problems and it's arguably roundabout and circuitous for getting a worse result than a more obvious approach, but it has very good reasons to exist. If the books that are in Topaz format weren't in Topaz format, they wouldn't be in any legal ebook format. The point of Topaz was to prove to an old and established industry that ebooks are beneficial, and provide them an easy means to make it happen so that in the long term, everything will be available as a well-formatted ebook to begin with.
Given that Kindle has been a success and that this holiday season Amazon sold more Kindle books than physical books, I'd say Topaz has done a very good thing in that regard. Even if something as trivial as a compromised DRM scheme causes Amazon to discontinue the Topaz program wholesale (which would be a pretty big baby to throw out with not a lot of bathwater), Topaz will still have done its not-so-trivial part in reinventing the way people buy books, which is why I am still extremely proud of everything I did as part of the project (even the stupid DRM). No matter what comes of this, I did something that helped to change the world for the better in a big way.

Comments

#12729 01/07/2010 12:23 pm Please, just a link?
Hi, I was wondering if you might be willing to just give a little link for a hand?
Yesterday I went Amazon shopping, because I know how to de-DRM AZWs and PRCs, and I went a little bit overboard and bought 200 dollars' worth of ebooks. Most of these were language sort of books, and said that they were PRCs, but when I tried to convert them, it said they were Topaz. And then it really sucked.

I'll give you my email (don't worry, it's purely a junk email address, so I don't mind it getting spammed), and I was hoping that you could email me the script. That would be wonderful.

Because now I have around 150 dollars of ebooks which I have to glare at my computer screen to read if I want to finish them. I'd prefer them on my PRS-600 (which I'm amazingly excited about because I just got it a few days ago, YEY). Thank you so much for this amazing post, because I had just lost all hope of ever converting them.

My email is: boredgeek@live.co.uk

I know I'm being obnoxious, but I really would like to read these. Thank you.
#12730 01/07/2010 12:38 pm
Seriously?

Anyway, removing the DRM won't help you since there isn't a Topaz reader for the PRS-600. It's a completely different file format, not just a different DRM scheme (this is one of the sillier myths about Topaz to come about). Topaz files can still only be read on Kindles, KindleForPC, or KindleForiPhone (and in the last case I don't know if there's even a way to get any arbitrary file into the app without jailbreaking at the very least).
#12731 01/07/2010 01:07 pm
Really?

Well, that sucks. So even if I do somewhere find a way to get rid of the DRM, I'm still screwed because no one knows how to convert it?

pfft *angry sigh*

Thank you very much for your help. I read about someone emailing Amazon and saying that they didn't like their books in Topaz, and they got a refund; I'll be hoping for this.

Thanks for your help. Email me if you ever get any advances on that front, please.

Thank you
#12732 01/07/2010 01:18 pm
I think you still fail to understand my relationship with the Topaz format. I designed it to begin with, while I worked for Amazon. I am not working on reverse-engineering it or converting it or whatever. The format itself isn't really intended to be convertible to begin with. It's really more of an image format than a text format.

Also, given that I am intimately familiar with how the Topaz format is put together, I would be in a very difficult legal situation if I were to actually assist in any reverse-engineering efforts, as I'd be revealing Amazon trade secrets. So I am not going to do anything to actually assist with any sort of reverse-engineering efforts; however, I will happily cheer them on from the sidelines as I'd love nothing more than for people to analyze my work (and hopefully proclaim it a work of genius, but that might be a bit optimistic).
#12733 01/07/2010 01:21 pm
WAIT! That's okay! Some of those books are just PRCs with Topaz DRM, so if I was able to get rid of that, I might be able to read them! By the way, I most definitely bought these legally off Amazon, so Amazon is getting its money's worth, don't worry.
#12734 01/07/2010 01:26 pm
Very good point about the legal issues; I didn't think about that. By the way, you did do a very good job on Topaz: for thousands of hackers not to be able to crack it for around 3 years, that must be satisfying. Sorry about bugging you. :D
#12735 01/07/2010 01:33 pm
"PRC with topaz DRM" means it's in Topaz format - the Topaz format is the one and only file format that uses Topaz DRM. The file extension is pretty much meaningless, and Amazon has played fast-and-loose with them. Technically, PRC as an extension is Palm Resource Container, which was intended as a file format for distributing PalmOS applications; the Mobipocket format derives from this (since it made it easier for them to distribute ebooks for PalmOS, which was at the time the dominant handheld device OS).

Depending on how you download a book from Amazon, it may come with a .prc, .azw, .azw1, or .tpz extension, but the extension basically has nothing to do with the contents. .azw1 and .tpz are almost always Topaz format, while .prc and .azw can be either Topaz or Mobi. I have a feeling that Kindle for PC just uses .prc extensions for everything because that's the sort of lazy thing that Amazon's client software developers tend to do, and it has probably caused a lot of grief for the dozens (or maybe even hundreds) of people who still use PalmOS devices.
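Since the extension says nothing reliable about the contents, any tool that cares has to look at the bytes instead. Here is a minimal sketch of that idea in Python. The Mobipocket entry relies on the Palm database header layout (type and creator fields at byte 60); the Topaz entry is an assumption added purely for illustration and is not something stated in this thread, and the function names are made up.

```python
from typing import Optional

# (offset, magic bytes, label)
# The Mobipocket entry follows the Palm database header, where the
# type/creator fields sit at byte offset 60.
# The Topaz entry is an ASSUMPTION for illustration only; this thread
# does not document how a Topaz file begins.
SIGNATURES = [
    (60, b"BOOKMOBI", "mobipocket"),
    (0, b"TPZ", "topaz"),
]


def sniff_format(header: bytes) -> Optional[str]:
    """Guess the format from the leading bytes, ignoring the extension."""
    for offset, magic, label in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return label
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read just enough of the file to cover every signature above."""
    with open(path, "rb") as f:
        return sniff_format(f.read(68))
```

The point is only that a file named `book.prc` and a file named `book.azw1` go through exactly the same check; the name never enters into it.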
#12736 01/07/2010 01:41 pm
GoatsonaBoat:
WAIT! That's okay! Some of those books are just PRCs with Topaz DRM, so if I was able to get rid of that, I might be able to read them! By the way, I most definitely bought these legally off Amazon, so Amazon is getting its money's worth, don't worry.

Oh, and I'm not worried at all - like I said, a crack to Topaz DRM doesn't really affect legal book sales (but it does probably make publishers very nervous). This is how DRM has played out time and time again, and the more restrictive DRM gets, the more likely people are going to find other ways to get the content for free.

For a long time, it was the case that pirating TV and movies was easier than getting them legitimately, which only caused studios to clamp down with more and more restrictive DRM (AACS, HDCP, etc.), all of which have been broken time and time again. But then something amazing happened: Hulu and Netflix and iTunes and Amazon VOD and so on made it easy for people to legally get and pay for the content they wanted to watch, and amazingly enough, people happily pay for that privilege (either through subscription fees, per-movie payments, or watching ads). Sure, those things still have DRM on them, but they're convenient enough that only the tiny tiny minority of users even notice, and most of the ones who try to break the DRM are doing so for their own archival purposes, not for redistributing things illegally (since there are much easier ways to get higher-quality redistributed free versions of the same content).

Customers are going to behave in the way that you treat them. Treat them with respect and they act respectfully. Treat them like criminals (e.g. using overly-restrictive and painful DRM) and they'll act like criminals.
#12737 01/07/2010 01:45 pm
Yeah, it's a bit sneaky how they don't tell you what kind of file it actually is... Not very impressed with Amazon, and now I can't even read those books on Kindle for PC, even after I re-downloaded them. (Although I'm rather sure that that is user error; either that, or Amazon is keeping tabs on anyone that dares try to defy them, and after finding out that I was trying to convert, they messed it up. <Serious>)

Hmm, shall write a letter hoping for a refund, and then never use Amazon again, EVER! Until a book that I really want is only on there and nowhere else...

Thank you so much for explaining this to a noob like me, and well done on not yelling at my inability to understand simple concepts such as 'no, I'm not going to give you the script'. Hah, lol.

Well, thank you very much for your help.
#12738 01/07/2010 01:49 pm
Exactly. If Amazon was nice enough to not try to brand everything with the 'Kindle' burning iron, then I'd give them money for ebooks. There is no way I'm getting a Kindle, by the way; I am far too poor, and I already have one ebook reader, so another's overkill to me.
#12739 01/07/2010 01:56 pm
If you have an iPod Touch or iPhone, there's a Kindle app for it which is very good. That's actually the main thing I use my iPod Touch for anymore. I don't even have a Kindle, personally. I seem to be in the distinct minority of people who actually don't mind reading books on a small backlit screen, though (and admittedly, when I do read ebooks, it's only for a few minutes at a time when I'm on the train or whatever).
#12740 01/07/2010 02:09 pm
Sadly, I don't have either of those. But yeah, I'll just have to look very hard to find ebooks that are as cheap and as broad, selection-wise, as Amazon's. They really have done a good job making themselves the best place to buy ebooks, haven't they? Kinda sad how they look to have a sort of monopoly on the whole thing. Oh well, that's big business.

Wow, some of the bookstores turned ebook stores charge about 200 dollars for some ebooks. Just ridiculous.
#12744 01/08/2010 11:11 pm
Hi fluffy,

Thank you for your contribution to ebook technology. I am a fan of the Topaz format because I read academic books which are often only available in Topaz.

Something I have been wondering about is, once the Topaz format is understood, whether it would be feasible for the Kindle community to develop tools to convert image based books into Topaz to take advantage of the reflowable nature of the format. OCRing, formatting and correcting scanned books is a tedious and time consuming process and I fear that many older texts will never be sold as ebooks. Is this a possibility or is the conversion process too intricate?
#12745 01/09/2010 12:12 am
Well, it's certainly possible, but it's also pretty intricate, and takes rather a lot of computing power (and there's still some manual editing involved, although significantly less than in a traditional ebook conversion). It makes sense for Amazon to do it because they have a lot of time, money, and spare CPU cycles to throw at these problems. We spent a lot of time developing general-purpose tools and processes to make it work as well as it does, but if you focus on a single problem domain it doesn't take quite so much work.

Textbooks do fall into a few different categories of layout and design, though, and while some of them are pretty easy to deal with, others are much more difficult. Also, much of the interesting stuff for dealing with overall layout issues isn't really visible in the final file format, but I'm sure there are much more elegant solutions than what we came up with anyway (a lot of our solutions were driven by pressure-to-ship and paving over other bad decisions made along the way, same as any other commercial software project really).

I'm not sure how much legal trouble I'd potentially be in if I were to discuss these things publicly, but a lot of the more interesting things are in the public record via patents (although some of my later work superseded one of the key inventions but we never got around to patenting it, so that would still fall under the nebulous "trade secret" blanket, which is a shame since it's the part of the system as a whole I am most proud of).

In any case, at its most basic, the Topaz file format is pretty simple, and for certain classes of formatting it shouldn't be too hard for others to produce useful Topaz files. The most difficult part of reverse-engineering will probably be figuring out the array format, and perhaps figuring out how the various tables relate to each other.
#12824 Darkreverser fan (unregistered) 01/29/2010 06:30 am Illuminating background
Saw your post on the darkreverser forum, and had to come here and read your story. It is not often that we get to read the "other side" of the story, and your views are very illuminating. Due to the length of time it took to reverse the topaz format, I would say you are quite the programmer!
#12831 02/01/2010 12:12 am
Thanks. I'm actually a bit dismayed at how long it took, since again, the DRM algorithm is pretty trivial and was basically just an afterthought (through most of the development we assumed we'd just be wrapping it up in Mobipocket DRM at the end, but that turned out to not be as easy as previously thought).

It looks like as soon as the DRM was stripped, the format itself was reverse-engineered pretty quickly, which makes sense since I designed the format to be consistent rather than cryptic. I am a bit surprised that the table compression was figured out so quickly though. :)

(It's also a bit dismaying that even now, people at Mobileread seem to think that the DRM is the only point to Topaz. It's not!)
#12864 anonymous333 (unregistered) 02/16/2010 05:58 pm
fluffy:
I'm actually a bit dismayed at how long it took, since again, the DRM algorithm is pretty trivial and was basically just an afterthought (through most of the development we assumed we'd just be wrapping it up in Mobipocket DRM at the end, but that turned out to not be as easy as previously thought).
It looks like as soon as the DRM was stripped, the format itself was reverse-engineered pretty quickly, which makes sense since I designed the format to be consistent rather than cryptic. I am a bit surprised that the table compression was figured out so quickly though. :)

It wasn't broken because there was little interest, so no one really knowledgeable took a try on it until the userbase increased.
I've actually reversed the DRM and bits of the format a couple of years ago. But before I got to release it came Amazon's C&D to Mobileread for kindlepid. So I decided that Kindle users deserve the suffering they get with Topaz books and went to work on other stuff. And I don't buy books from Amazon on principle even though I know I can convert them.
#12865 02/16/2010 06:01 pm
Well, that's a bit more reassuring, then!
#13135 Generally Happy Kndl Usr (unregistered) 05/28/2010 02:38 pm Topaz
fluffy,

I have been using my Kindle DX faithfully since it came out, and I have about a half dozen Topaz formatted books.

They certainly look nice on the Kindle. I'm guessing they give more flexibility in the content that can be delivered.

My one complaint about it is that when you're reading a longer Topaz book, page changes and opening the book take an absurdly long time when you're near the end of the book. The book gets slower the further into it you go.

Is that a weakness of your design or a bad implementation by the Kindle?

I'm also waiting for more textbooks to be made available through it. I've had to download a few from other sources (not paying Amazon) because publishers don't seem to want to make textbooks as ebooks.
#13136 05/28/2010 02:46 pm
It's somewhere in between; some books would sometimes get pretty bad fragmentation problems where the individual images that composed the page would be scattered across the book, and that gets worse towards the end of the book. It was one of those things where you can optimize for size (only have one copy of every image) or for speed (duplicate the images which are used often to keep them always available nearby) but not for both, and we always erred on the side of size since wireless bandwidth is expensive.

Really, the issue comes down to an underlying issue with the defining characteristic of what Topaz does to begin with, and there's not really a lot you can do about that - it's basically a risk you have to take by that particular path of making ebooks easier to convert.

In theory the frequently-used images should just stay in memory all the time, but building up that cache takes a while, which is why it's especially problematic when you open up a book near the end (as opposed to paging toward it in a single session).

If the Kindle had more memory and a faster filesystem, that problem wouldn't be nearly as noticeable, but unfortunately it's a pretty constrained device, and the Topaz renderer doesn't have a lot of memory to work with for caching.

The short version: it's not a problem with the specific Topaz file format as I designed it, but it is an underlying problem with Topaz as a concept with no easy solution.

Oh, but on the plus side, the various Topaz creation tools could certainly be tuned to improve the issue on several fronts - it's really more of a content production issue than a content reader issue, and any improvements made in the tool wouldn't affect the Kindle itself.
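To make the cold-cache effect described above concrete, here is a toy simulation in Python. It is only a sketch under stated assumptions: the class and function names are hypothetical, the cache policy (least-recently-used) is my own choice for illustration, and nothing here reflects how the actual Topaz renderer manages memory. Each page is modeled as a list of image IDs, and a "miss" stands in for a slow read from the filesystem.

```python
from collections import OrderedDict


class LRUImageCache:
    """Tiny least-recently-used cache that only counts misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()
        self.misses = 0

    def get(self, image_id):
        if image_id in self.items:
            # Cache hit: mark the image as most recently used.
            self.items.move_to_end(image_id)
            return
        # Cache miss: this is where a slow filesystem read would happen.
        self.misses += 1
        self.items[image_id] = True
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used


def misses_on_target(pages, start, target, capacity):
    """Render pages[start:target] to warm the cache, then count the
    misses incurred while rendering pages[target] itself."""
    cache = LRUImageCache(capacity)
    for page in pages[start:target]:
        for image_id in page:
            cache.get(image_id)
    before = cache.misses
    for image_id in pages[target]:
        cache.get(image_id)
    return cache.misses - before


# Hypothetical book: each page is a list of shared image IDs.
pages = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
cold = misses_on_target(pages, start=2, target=2, capacity=8)  # open at the end
warm = misses_on_target(pages, start=0, target=2, capacity=8)  # page through
```

With these toy pages, opening directly on the last page misses on every image it uses, while paging up to it misses only on the one image not seen earlier. A smaller `capacity` narrows that gap, which is the constrained-memory side of the same trade-off.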
#13173 joe schmoe (unregistered) 06/13/2010 01:50 pm fluffy, you're a complete idiot
and topaz (if you even did work on it as you claim) is complete garbage and completely inferior (text quality) to other formats. At least Amazon had some ppl with the right thinking to oppose the format. Topaz was cracked before you even realized it and there are multiple exploits out now. I wouldn't buy anything from Amazon in the format and I'm sure they've had quite a number of returns due to the format.
#13174 06/13/2010 04:04 pm
The customer of the format was the publisher, not the kindle user. It was intended to prove to publishers that ebooks are a viable business. It served that purpose well.

I'm sorry you don't like it. I want nothing more than for the format to not be necessary! But the real world hasn't caught up to the internet quite yet. Not everything is electronic.

Also I had been looking regularly for Topaz cracks and I was very surprised it took so long for one to become public. If there was an earlier one you can point to, I'd be interested to see it. Like I said, the DRM was weak and an afterthought and if it had been hacked sooner, great!

(But since you found my blog by searching on "convert topaz ebook" I'll just assume you're full of hot air.)
#13175 06/13/2010 04:19 pm
Oh, and the reason others opposed Topaz wasn't the format itself, but that the Kindle was crashing a lot and they wanted something to blame. The crashes were not caused by Topaz, which I proved time and time again, but they wanted a scapegoat. I suppose I could have been a bit more explicit about that in the original writeup.

Basically, every time a Kindle crashed, the main Kindle engineers blamed Topaz, even when Topaz wasn't even installed. It got old.
#13176 06/13/2010 04:54 pm
See, now this person posting as Idolse gets it:
People have complained about the quality of Topaz in the past, but what they're really complaining about is bad scans. When you think about what the format itself is doing - taking scans of fixed pages and turning them into a transparently reflowable format - it's pretty amazing.
#13218 Kindle DX User (unregistered) 07/02/2010 09:52 am Kindle Reader for Android
fluffy,

I was wondering if you still had any friends in high places at Amazon. The Kindle Reader for Android just came out, and it does not process the Topaz format correctly. The pure text itself shows up fine, but if there is any text in a colored box, the text does not show up.

I emailed Amazon support, but the only response I got back was that I could receive a refund on my book. So I looked at another Topaz book I own and it had the same problem. Is there anyone there you can alert to this issue so that it can be fixed?

I'm curious as to what you mean by 'scans'. If the Topaz format is just a 'scan', then how can you highlight text? Do you overlay (invisible) OCR text on top of the picture?

Thanks
#13219 07/02/2010 10:07 am
I don't know anyone left on the Kindle team, unfortunately. Colored text SHOULD work fine, but of course it's quite possible that someone screwed up when they ported that bit of code to Android (since the method it uses to recolor text is a little non-obvious due to other lower-level issues - basically it does stupid XOR tricks, just like the Good Old Days).

The Topaz file originates from scanned images but it knows about the separate words - otherwise it wouldn't be able to reformat it to the screen. It does contain per-word OCR data for the purpose of search and text clippings, however (with the usual craptastic level of accuracy that implies).
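For anyone wondering what "knows about the separate words" buys you, here is a rough Python sketch of reflowing word images to a given screen width. It is an illustration under assumptions, not the Topaz layout algorithm: the `WordBox` type, the fixed inter-word gap, and the greedy line-filling strategy are all made up for the example. The only ideas taken from the discussion are that each word is its own image and carries an OCR guess used for search and clippings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordBox:
    text: str   # per-word OCR guess; not used for layout at all
    width: int  # width of the word's image, in pixels


def reflow(words: List[WordBox], screen_width: int, space: int = 4) -> List[List[WordBox]]:
    """Greedily pack word images into lines no wider than screen_width."""
    lines: List[List[WordBox]] = []
    current: List[WordBox] = []
    used = 0
    for word in words:
        if current and used + space + word.width > screen_width:
            # The word does not fit on this line; start a new one.
            lines.append(current)
            current = [word]
            used = word.width
        else:
            used = word.width if not current else used + space + word.width
            current.append(word)
    if current:
        lines.append(current)
    return lines


words = [WordBox("the", 30), WordBox("cat", 20), WordBox("sat", 40), WordBox("up", 10)]
narrow = [[w.text for w in line] for line in reflow(words, screen_width=60)]
wide = [[w.text for w in line] for line in reflow(words, screen_width=100)]
```

Note that layout only looks at `width`; the OCR text could be completely wrong and the page would still render the same, which is why bad OCR affects search and clippings rather than what you see on screen.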
#13242 Anonymous 07/17/2010 09:59 pm
Excellent stuff, fluffy. Great to hear an insider's perspective on this.

Quick question: has somebody actually created a converter that changes a TOPAZ formatted ebook to one of the other open formats? It sounds like this would be "non-trivial" just as a viewer would be, and it smells like it'd be pretty lossy if you were moving to something like EPUB ...
#13243 07/18/2010 03:40 am
Yeah, if it were possible to do, I'm pretty sure we would have gone with a standard format as the basis.
#13318 geeklygeek (unregistered) 08/21/2010 01:09 pm
Yes, it has been accomplished - a Topaz file can be de-DRM'ed and converted into HTML. Of course, the result is a bit like a scanned-and-OCR'ed book - multiple misspellings and completely mussed up math formulas. In other words, pretty useless.
#13319 08/21/2010 06:53 pm
Yeah, there's nothing magic about the OCR text in Topaz books. It's only there to support search and excerpts (for clippings, bookmarks, etc.).
#13352 08/29/2010 06:49 am Thank you
It is really interesting to see the insider story for this technology.

Interesting enough for me to register just to say thank you ;)
#13434 10/07/2010 04:04 pm Kindle 3 - Topaz
I have a couple of Topaz books and didn't even know it. I just knew they were formatted really badly and I couldn't imagine what tool had been used to create such a mess. I couldn't manage it with all of the ones available to me! Now I know.

With the Kindle 3, books in Topaz format have the new features grayed out. No switching to condensed or sans serif type, no changing the line spacing, and three or four of the new type sizes do nothing. Of course, we know why. The problem is the publisher has been given a hammer (the Topaz authoring tool) and ALL of its books are a nail, whether they contain the formatting issues that textbooks or technical books would have or are just words. One book I have is set in Fairfield LH type and at some sizes it is so ragged it looks as if I was sitting in front of an Apple II! It could possibly be because of the slight resolution change from the Kindle 1 through 3.

Amazon developed and has targeted the DX for textbooks and it is a fairly simple matter to just publish in the PDF format for DX. Trying to publish such books for the 6-inch Kindles is just plain stupid and makes the publisher and Amazon look bad. Amazon should make this more obvious and make it perfectly clear that the 6-inch Kindle is a READER, a damn good reader, but don't expect it to replace any textbook or technical book.

You did what you were told, but my opinion is it should have never been done. There are a lot of messy books out there because of it. The fact the publishers are so wired into "page appearance" for printed books (I've typeset a few) doesn't help them get on board with readability for ebooks.
#13435 10/07/2010 05:49 pm
If Topaz had never been done, the Kindle likely wouldn't have happened because we wouldn't have been able to get publishers on board. It's not a perfect solution but it's better than nothing.

The whole goal was to make something as a stopgap to get publishers ramped up on actual ebook production, and to provide a back catalog of books that would never be economically-feasible to re-publish digitally. The fact that new books are still being pushed through the Topaz pipeline instead of getting a full end-to-end digital treatment is disappointing, to say the least.
#13436 10/07/2010 06:58 pm
That's a good enough reason to have done it. Any method to get the donkey to move. My comment about publishers being so hard wired into the appearance of a page in printed books is going to be an issue until they realize that ebooks are a different world.

Having typeset books, I know how important page appearance is to a publisher. It is really an art. The hours spent adjusting tracking and image placement to get paragraphs to align on opposing pages without rivers of white space and widows and ...

Tests have shown that reading from ereaders is slower than from paper books, but this is without taking into consideration the ability to adjust type size, words per line, etc. With the default Kindle formatting the reader can find a sweet spot where reading is an effortless scan down the page. It would be nice if Amazon would sponsor some research to provide guidelines for ebook publishing. Given the Kindle sales volumes, this needs to be done sooner instead of later.

In the 1980s, there were two optometrist brothers that developed a way of breaking up text into sentence fragments that would be presented on a computer screen in an indented form with breaks at conjunctions, prepositions, sentences and other points so there were usually only four or five words on a line. They had a university test the reading speed with their formatting versus a standard book format and the reading rates, comprehension and retention were much higher. I worked at LexisNexis at the time and they were trying to get us to offer legal and news research results in this format. We didn't, but I played around with some of their sample books and found it remarkably easy to read compared to arbitrary line formatting. This would be a great thing for Kindle to build into the software.

I think their name was Wagner, or something similar.
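As a rough illustration of the kind of phrase-based line breaking described above, here is a small Python sketch. It is emphatically not the patented method: the word list, the minimum and maximum line lengths, and the function name are all assumptions chosen for the example. It simply starts a new line before common conjunctions and prepositions, after sentence-ending punctuation, and whenever a line reaches five words.

```python
# Hypothetical, deliberately short list of words to break before.
BREAK_BEFORE = {
    "and", "but", "or", "so",
    "in", "on", "at", "of", "with", "for", "to", "from", "by",
}


def phrase_lines(text, max_words=5, min_words=2):
    """Split text into short lines at rough phrase boundaries."""
    lines = []
    current = []
    for word in text.split():
        bare = word.lower().strip(".,;:!?\"'")
        starts_phrase = bare in BREAK_BEFORE
        too_long = len(current) >= max_words
        if current and (too_long or (starts_phrase and len(current) >= min_words)):
            lines.append(" ".join(current))
            current = []
        current.append(word)
        if word[-1] in ".!?":
            # Always end the line at the end of a sentence.
            lines.append(" ".join(current))
            current = []
    if current:
        lines.append(" ".join(current))
    return lines


sample = phrase_lines("The cat sat on the mat and the dog ran.")
```

For the sample sentence this yields three lines: "The cat sat", "on the mat", and "and the dog ran." A real implementation would presumably need actual grammatical analysis and the indentation the original system used; this only shows the general shape of the idea.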
#13437 10/07/2010 07:04 pm
That's pretty interesting - I hadn't heard about that one before but it seems plausible. The OCR data in Topaz might also be accurate enough to get some of that. If I were still at Amazon I'd look into adding it as a format mode.
#13438 10/07/2010 08:32 pm
I'm retired, but I'll see if I can find who they were and if they are still doing anything. I know they patented it. Maybe you would know who to tell about it. It would certainly be worth looking into and offering as a line formatting option. I'll post here if I find it.
#13439 10/07/2010 09:55 pm
I found the patents. The brothers were Walker not Wagner. Randall C. Walker has two patents that you can find on Google Patents:

5802533 & 6279017

Their company is Live Ink: http://www.liveink.com/

I used an early version of their clip reader for about a year. I think it is too expensive at $90 per year, and told them years ago that they would sell tons if they just licensed it for $30 or so. Tests we conducted confirmed that reading was considerably faster and there was seldom "backtracking" -- when you lose what is being said in a sentence and have to scan backwards to the beginning and read it over. I'm convinced Live Ink works, but it has suffered from poor marketing skills -- in my opinion.

The clip reader is good, but a pain to copy what you want to read into it, though you can ctrl-a a huge document and it works just fine. Personally, I think Amazon would be wise to conduct a little research. With default Kindle formatting of a book, they have complete control of the text and could easily make it a formatting option. The "maximum words per line" option is an attempt to accomplish something like Live Ink, but more similar to the Evelyn Wood Speed Reading training books with one or two words per line down the center of each page.

FWIW
#13441 10/08/2010 10:08 am
Unfortunately, I don't actually know anyone who is still in the Kindle group at Amazon. All my former coworkers have left there as well (likely for the same reasons I did). I suspect some of the people who are now there may have seen this blog entry by now, though. :)
#13446 10/10/2010 05:19 pm Essence of the Topaz format
#13447 10/10/2010 06:32 pm
At its core, that pretty much describes Topaz (moving word images around in order to create reflow from a static image), although it does wave its hands when it comes to actually doing the hard parts.

It's not surprising that someone else came up with the basic idea before we did, but there's a lot more to it than just that. Most of the patents granted over Topaz deal with very specific things such as baseline matching, format clustering, compressing the data, image cleanup to improve the contour tracing, and so on.