Interesting Topaz DRM development (geekery, job stuff)
One of the (minor but important) parts of the Topaz format is, of course, the DRM. It's actually a pretty trivial "secret-sauce" algorithm, implemented under some pretty ridiculous constraints (I had limited time to implement it, wasn't allowed to pull in any external libraries, and had to keep it performing quickly without using much memory on an already-constrained device), which makes it funny that it has somehow eluded being cracked for a bit over two years.
Until now.
Earlier today, someone (who I will of course keep nameless) asked me about a bit of Python code (which I will of course not link to) that he'd found which ostensibly would strip the DRM from a Topaz file as downloaded by the KindleForPC app. I looked at it, and yes, it looked like a plausible DRM stripper; presumably it was developed by someone who had run a disassembler on KindleForPC. It did require being run on the same PC as the Topaz file was provisioned for, however. But of course, this enterprising experimenter did not stop there: he analyzed the (again, pretty trivial) encryption algorithm and found a weakness in it (one which I will not name, but which I was aware of as a possibility even when I wrote it), and after not too much time, he'd written a C++ program which would very quickly brute-force the underlying encryption key and completely strip the file of all DRM.
He said that he wasn't interested in releasing it himself (he mostly did it as an intellectual challenge), and for obvious reasons I won't be releasing it either (or even describing the nature of the exploit), but yes, Topaz DRM has been completely compromised at least once, and it wouldn't surprise me if someone else has also figured out the flaw in the algorithm.
I just want to say ahead of time (before everyone emails me, IMs me, etc.) that I am aware that it's broken (and this is for real, unlike the many previous iterations of Mobi DRM being cracked, which had nothing to do with Topaz), and also state in my defense:
- I am pretty anti-DRM myself. Not that I went out of my way to make the DRM on Topaz fragile per se, but I was under enormous management and time pressures (as I stated above) and only did a minimum-effort job as a short-term solution with the intention of revisiting it later. I'm actually pretty shocked that nobody at Amazon improved the DRM since I left (or even made any changes to the file format at all, as far as I can tell); they didn't even bother to wrap KindleForPC downloads in an extra layer of DRM like they did for Mobi books (which is what the previous "KINDLE DRM CRACKED!!!" announcement was about).
- I was expecting it to be cracked within weeks or months of Kindle's release, but it took over two years. Not bad for something that can be brute-forced in a few milliseconds if you know the secret sauce (and considering the filesystem had been dumped within days and people could have run a disassembler on it, why was the sauce a secret this long? IT IS RUSSIAN DRESSING, PEOPLE).
- The primary weakness in any DRM mechanism is key exchange. Even if I had used a stronger encryption algorithm (which would have prevented the brute-force no-key attack, or at least slowed it down significantly), there's still the issue of keeping the device key secure (which is basically impossible).
- Cracking Topaz DRM doesn't really make book piracy any easier. For starters, someone needs to get a copy of (i.e. pay for) the book, and Topaz books tend to be of more niche interest. The ones which are of more widespread interest tend to get pirated in other ways, such as teams of book pirates doing their own plain-text conversion from an actual book, long before the Topaz version even becomes available to begin with.
- It is also a bit of a pain in the ass to put an unencumbered Topaz file onto any arbitrary Kindle-type device (Kindle, PC, iPhone/iPod Touch), and while removing the DRM helps the various reverse-engineering efforts, the format is still pretty complex.
- The big boon I see for the removal of DRM is purely in the legitimate customer's interest; people want to back up their Topaz books without having to reprovision them in the event that they change devices. Currently they have no legal way to do this, especially if a book gets retracted by the publisher (as in the famous and overly-ironic 1984 case).
- People are fundamentally honest and good. They pirate only when the legal approach is too cumbersome. Witness how the iTunes music store and AmazonMP3 and so on continue to have plenty of sales even with illegally-traded pirated mp3s, which have certainly been more prevalent for much longer (and derived from plain CDs) than those services have been DRM-free.
- There is still a pretty high barrier to people stripping the DRM from their Topaz books; even if someone were to release this app (which neither I nor the curious experimenter is going to do), you still have to do a lot of things to get the actual .tpz/.azw/whatever file off the device, and once you have it, you can't really do much with it.
- It has absolutely no relationship to any other ebook format out there; it is not "mobipocket with fonts," it is not "a proprietary implementation of ePub," or any of the other ridiculous things I've seen written about it by people who have no idea but make these statements as if they were fact.
- The defects in Topaz books come from defects inherent to the conversion process (the same ones that lead to a book being converted this way to begin with), and are not "piracy traps" or the like.
- It was not designed as a "better DRM than Mobipocket." It exists to fill a niche that only Amazon was capable of filling (at the time).
- The nature of the files means that writing a viewer for other platforms is non-trivial; while it's possible to extract plain text from it, the plain text won't be particularly useful or accurate (and the plaintext representation of the book is only a very very small part of the file).
- Topaz was not designed as something "only for the big publishers." Amazon does not provide Topaz tools to third parties. Only Amazon has the conversion tools to create Topaz books.
- Topaz is not "the native format of the Kindle." If anything, I had to fight like crazy to keep it on the device — many, many times various people tried to kill the project because they had absolutely no faith in it or me. It was getting sick of this fight that most directly contributed to me deciding to part ways with Amazon two and a half years ago.
- The format has problems and it's arguably roundabout and circuitous for getting a worse result than a more obvious approach, but it has very good reasons to exist. If the books that are in Topaz format weren't in Topaz format, they wouldn't be in any legal ebook format. The point of Topaz was to prove to an old and established industry that ebooks are beneficial, and provide them an easy means to make it happen so that in the long term, everything will be available as a well-formatted ebook to begin with.
Comments
Yesterday I went Amazon shopping, because I know how to de-DRM AZWs and PRCs, and I went a little bit overboard and bought 200 dollars' worth of ebooks. Most of these were language sort of books, and said that they were PRCs, but when I tried to convert them, it said they were Topaz. And then it really sucked.
I'll give you my email (don't worry, it's purely a junk email address, so I don't mind spam), and I was hoping that you could email me the script. That would be wonderful.
Because now I have around 150 dollars of ebooks which I have to glare at my computer screen to read if I want to finish them. I'd prefer them on my PRS-600 (which I'm amazingly excited about because I just got it a few days ago, YEY). Thank you so much for this amazing post, because I had just lost all hope of ever converting them.
My email is: boredgeek@live.co.uk
I know I feel obnoxious, but I really would like to read these, thank you.
Anyway, removing the DRM won't help you since there isn't a Topaz reader for the PRS-600. It's a completely different file format, not just a different DRM scheme (this is one of the sillier myths about Topaz to come about). Topaz files can still only be read on Kindles, KindleForPC, or KindleForiPhone (and in the last case I don't know if there's even a way to get any arbitrary file into the app without jailbreaking at the very least).
Well, that sucks. So even if I do somewhere find a way to get rid of the DRM, I'm still screwed because no one knows how to convert it?
pfft *angry sigh*
Thank you very much for your help. I read about someone emailing Amazon and saying that they didn't like their books in Topaz, and they got a refund; I'll be hoping for this.
Thanks for your help. Email me if you ever get any advances on that front, please.
Thank you
Also, given that I am intimately familiar with how the Topaz format is put together, I would be in a very difficult legal situation if I were to actually assist in any reverse-engineering efforts, as I'd be revealing Amazon trade secrets. So I am not going to do anything to actually assist with any sort of reverse-engineering efforts; however, I will happily cheer them on from the sidelines as I'd love nothing more than for people to analyze my work (and hopefully proclaim it a work of genius, but that might be a bit optimistic).
Depending on how you download a book from Amazon, it may come with a .prc, .azw, .azw1, or .tpz extension, but the extension basically has nothing to do with the contents. .azw1 and .tpz are almost always Topaz format, while .prc and .azw can be either Topaz or Mobi. I have a feeling that Kindle for PC just uses .prc extensions for everything because that's the sort of lazy thing that Amazon's client software developers tend to do, and it has probably caused a lot of grief for the dozens (or maybe even hundreds) of people who still use PalmOS devices.
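Since the extension says so little, the more reliable check is to look at the first bytes of the file. Here is a minimal sketch of that idea. The two signatures are assumptions taken from publicly available tools rather than from anything in this post: Topaz files are widely reported to begin with the ASCII bytes "TPZ", and Mobi files are PalmDB databases with "BOOKMOBI" in the type/creator field at byte offset 60. The function names are made up for illustration.

```python
# Assumed signatures (not stated in the post): Topaz containers are reported
# to start with the ASCII bytes "TPZ"; Mobi files are PalmDB databases whose
# type/creator field at byte offset 60 reads "BOOKMOBI".
TOPAZ_MAGIC = b"TPZ"
MOBI_MAGIC = b"BOOKMOBI"
PALMDB_TYPE_OFFSET = 60
HEADER_BYTES = PALMDB_TYPE_OFFSET + len(MOBI_MAGIC)


def sniff_format(header: bytes) -> str:
    """Classify a book from its leading bytes, ignoring the extension."""
    if header.startswith(TOPAZ_MAGIC):
        return "topaz"
    if header[PALMDB_TYPE_OFFSET:HEADER_BYTES] == MOBI_MAGIC:
        return "mobi"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file to run sniff_format on it."""
    with open(path, "rb") as f:
        return sniff_format(f.read(HEADER_BYTES))
```

Calling `sniff_file` on a downloaded book (whatever its extension) would then report "topaz", "mobi", or "unknown" without relying on the filename at all.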
Oh, and I'm not worried at all - like I said, a crack to Topaz DRM doesn't really affect legal book sales (but it does probably make publishers very nervous). This is how DRM has played out time and time again, and the more restrictive DRM gets, the more likely people are going to find other ways to get the content for free.
For a long time, it was the case that pirating TV and movies was easier than getting them legitimately, which only caused studios to clamp down with more and more restrictive DRM (AACS, HDCP, etc.), all of which have been broken time and time again. But then something amazing happened: Hulu and Netflix and iTunes and Amazon VOD and so on made it easy for people to legally get and pay for the content they wanted to watch, and amazingly enough, people happily pay for that privilege (either through subscription fees, per-movie payments, or watching ads). Sure, those things still have DRM on them, but they're convenient enough that only the tiny tiny minority of users even notice, and most of the ones who try to break the DRM are doing so for their own archival purposes, not for redistributing things illegally (since there are much easier ways to get higher-quality redistributed free versions of the same content).
Customers are going to behave in the way that you treat them. Treat them with respect and they act respectfully. Treat them like criminals (e.g. using overly-restrictive and painful DRM) and they'll act like criminals.
Hmm, I shall write a letter hoping for a refund, and then never use Amazon again, EVER! Until a book that I really want is only on there and nowhere else...
Thank you so much for explaining this to a noob like me, and well done on not yelling at my inability to understand simple concepts such as 'no, I'm not going to give you the script'. Hah, lol
Well, thank you very much for your help.
Wow, some of the bookstores turned ebook stores charge about 200 dollars for some ebooks, just ridiculous.
Thank you for your contribution to ebook technology. I am a fan of the Topaz format because I read academic books which are often only available in Topaz.
Something I have been wondering about is, once the Topaz format is understood, whether it would be feasible for the Kindle community to develop tools to convert image based books into Topaz to take advantage of the reflowable nature of the format. OCRing, formatting and correcting scanned books is a tedious and time consuming process and I fear that many older texts will never be sold as ebooks. Is this a possibility or is the conversion process too intricate?
Textbooks do fall into a few different categories of layout and design, though, and while some of them are pretty easy to deal with, others are much more difficult. Also, much of the interesting stuff for dealing with overall layout issues isn't really visible in the final file format, but I'm sure there are much more elegant solutions than what we came up with anyway (a lot of our solutions were driven by pressure-to-ship and paving over other bad decisions made along the way, same as any other commercial software project really).
I'm not sure how much legal trouble I'd potentially be in if I were to discuss these things publicly, but a lot of the more interesting things are in the public record via patents (although some of my later work superseded one of the key inventions but we never got around to patenting it, so that would still fall under the nebulous "trade secret" blanket, which is a shame since it's the part of the system as a whole I am most proud of).
In any case, at its most basic, the Topaz file format is pretty simple, and for certain classes of formatting it shouldn't be too hard for others to produce useful Topaz files. The most difficult part of reverse-engineering will probably be figuring out the array format, and perhaps figuring out how the various tables relate to each other.
It looks like as soon as the DRM was stripped, the format itself was reverse-engineered pretty quickly, which makes sense since I designed the format to be consistent rather than cryptic. I am a bit surprised that the table compression was figured out so quickly though.
(It's also a bit dismaying that even now, people at Mobileread seem to think that the DRM is the only point to Topaz. It's not!)
It wasn't broken because there was little interest, so no one really knowledgeable had a go at it until the userbase increased.
I actually reversed the DRM and bits of the format a couple of years ago. But before I got around to releasing it, along came Amazon's C&D to Mobileread for kindlepid. So I decided that Kindle users deserve the suffering they get with Topaz books and went to work on other stuff. And I don't buy books from Amazon on principle even though I know I can convert them.
I have been using my Kindle DX faithfully since it came out, and I have about a half dozen Topaz formatted books.
They certainly look nice on the Kindle. I'm guessing they give more flexibility in the content that can be delivered.
My one complaint about it is that when you're reading a longer Topaz book, page changes and opening the book take an absurdly long time when you're near the end of the book. The book gets slower the further into it you go.
Is that a weakness of your design or a bad implementation by the Kindle?
I'm also waiting for more textbooks to be made available through it. I've had to download a few from other sources (not paying Amazon) because publishers don't seem to want to make textbooks as ebooks.
Really, it comes down to an underlying issue with the defining characteristic of what Topaz does to begin with, and there's not really a lot you can do about that - it's basically a risk you have to take by that particular path of making ebooks easier to convert.
In theory the frequently-used images should just stay in memory all the time, but building up that cache takes a while, which is why it's especially problematic when you open up a book near the end (as opposed to paging toward it in a single session).
If the Kindle had more memory and a faster filesystem, that problem wouldn't be nearly as noticeable, but unfortunately it's a pretty constrained device, and the Topaz renderer doesn't have a lot of memory to work with for caching.
The short version: it's not a problem with the specific Topaz file format as I designed it, but it is an underlying problem with Topaz as a concept with no easy solution.
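To make the cold-cache versus warm-cache difference concrete, here is a toy simulation. It is only a sketch under made-up assumptions: the `ImageCache` class, the LRU eviction policy, the capacity of 16, and the "each page shares most of its images with the previous page" book layout are all hypothetical and say nothing about how the real Topaz renderer manages memory.

```python
from collections import OrderedDict


class ImageCache:
    """Tiny LRU cache that counts slow loads from storage (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()
        self.loads = 0  # number of cache misses, i.e. slow filesystem reads

    def get(self, image_id):
        if image_id in self._items:
            self._items.move_to_end(image_id)  # mark as recently used
            return
        self.loads += 1
        self._items[image_id] = True
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


def render_page(cache, image_ids):
    """Fetch every image a page needs; return how many had to be loaded."""
    before = cache.loads
    for image_id in image_ids:
        cache.get(image_id)
    return cache.loads - before


# Made-up book: page p needs images p .. p+9, so neighbours share 9 images.
pages = [list(range(p, p + 10)) for p in range(50)]

# Paging toward the end in one session: the cache is warm for the last page.
warm = ImageCache(capacity=16)
costs = [render_page(warm, page) for page in pages]
warm_cost_last_page = costs[-1]

# Opening the book directly at the last page: everything must be loaded.
cold = ImageCache(capacity=16)
cold_cost_last_page = render_page(cold, pages[-1])
```

In this toy model the last page costs one load when you page up to it and ten loads when you open the book straight to it, which is the same shape of problem described above, just with invented numbers.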
Oh, but on the plus side, the various Topaz creation tools could certainly be tuned to improve the issue on several fronts - it's really more of a content production issue than a content reader issue, and any improvements made in the tool wouldn't affect the Kindle itself.
I'm sorry you don't like it. I want nothing more than for the format to not be necessary! But the real world hasn't caught up to the internet quite yet. Not everything is electronic.
Also, I had been looking regularly for Topaz cracks and I was very surprised it took so long for one to become public. If there was an earlier one you can point to, I'd be interested to see it. Like I said, the DRM was weak and an afterthought, and if it had been hacked sooner, great!
(But since you found my blog by searching on "convert topaz ebook" I'll just assume you're full of hot air.)
Basically, every time a Kindle crashed, the main Kindle engineers blamed Topaz, even when Topaz wasn't even installed. It got old.
I was wondering if you still had any friends in high places at Amazon. The Kindle Reader for Android just came out, and it does not process the Topaz format correctly. The pure text itself shows up fine, but if there is any text in a colored box, the text does not show up.
I emailed Amazon support, but the only response I got back was that I could receive a refund on my book. So I looked at another Topaz book I own and it had the same problem. Is there anyone there you can alert to this issue so that it can be fixed?
I'm curious as to what you mean by 'scans'. If the Topaz format is just a 'scan', then how can you highlight text? Do you overlay (invisible) OCR text on top of the picture?
Thanks
The Topaz file originates from scanned images but it knows about the separate words - otherwise it wouldn't be able to reformat it to the screen. It does contain per-word OCR data for the purpose of search and text clippings, however (with the usual craptastic level of accuracy that implies).
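As a rough illustration of what "knows about the separate words" buys you, here is a sketch of reflowing word images to a screen width and searching their OCR text. Everything in it is hypothetical: the `Word` record, the greedy line-filling, and the pixel values are invented for the example and are not the actual Topaz data layout or layout algorithm.

```python
from dataclasses import dataclass


@dataclass
class Word:
    width: int     # pixel width of the scanned word image (made-up values)
    ocr_text: str  # best-guess OCR text for this word, possibly wrong


def reflow(words, screen_width, gap=4):
    """Greedily pack word images into lines no wider than screen_width."""
    lines, current, used = [], [], 0
    for word in words:
        needed = word.width if not current else used + gap + word.width
        if current and needed > screen_width:
            lines.append(current)  # line is full, start a new one
            current, needed = [], word.width
        current.append(word)
        used = needed
    if current:
        lines.append(current)
    return lines


def search(words, query):
    """Return indexes of words whose OCR text contains the query."""
    q = query.lower()
    return [i for i, word in enumerate(words) if q in word.ocr_text.lower()]


words = [Word(30, "The"), Word(30, "qu1ck"), Word(30, "brown"), Word(50, "fox")]
lines = reflow(words, screen_width=70)
```

The images themselves are what gets drawn, so a misread like "qu1ck" still displays correctly; it only hurts when you search for "quick" and the word fails to match, which is the accuracy caveat mentioned above.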
Quick question: has somebody actually created a converter that changes a TOPAZ formatted ebook to one of the other open formats? It sounds like this would be "non-trivial" just as a viewer would be, and it smells like it'd be pretty lossy if you were moving to something like EPUB ...
Interesting enough for me to register just to say thank you