Ask Husi -- book format conversions
By johnny (Wed Feb 10, 2010 at 02:36:28 PM EST)
So after nearly a decade of giving away PDFs of my first two books, I've decided to sell them as ebooks in different formats.

The technical hassles in so doing are bigger than they should be, although most of the problems are perhaps more in my head than in the format-conversion technology.

Mainly, I'm trying to convert PDF versions of my book to MS Word .doc format. 

Your help in making me un-stupid in this process would be much appreciated.


So I have three books out -- Acts of the Apostles, Cheap Complex Devices, and The Pains. I make Acts and CCD available in PDF; The Pains in html. These books are released under a Creative Commons license. (There are other versions out on the net -- people other than me have done format conversions & make them available for free. I kinda consider that a violation of the "No Derivative" clause of the license, but not everybody agrees with me. That's a topic for another day.)

With the help of some friends, I made a Kindle version of The Pains some while ago, and it sells on Amazon. I understand that there are some formatting glitches in it, but nothing terrible.


ANYWAY, after spending way too much time investigating this whole "e-book" topic and learning actually very little other than that the age of the Ebook has arrived at last, this time fer sher, I decided to convert my first (& best selling of the three) book Acts of the Apostles. The site Smashwords provides a free service to self-publishing dudes like me.  Give them your book in .doc format & they put it through their "meatgrinder" translator program and spit out a bunch of different versions of the book, including EPUB and Mobipocket (Kindle).

So I'm going to do this Smashwords thing, create the different versions, list my books with the Smashwords premium service, and also offer the differently-formatted versions on my site for $5 or so. I'll put the mobipocket version on Amazon also.

Then, after seeing how that goes, I'll explore other avenues, like the iphone app store, etc.

So good, at least I've made a decision.

Let teh stupid begin: Creating an MS Word version of Acts of the Apostles

So I need an MS Word version of Acts. My first attempt was to take the output of a freeware program that does just that -- converts PDF to Word. The output was not bad, but it contained literally hundreds of small formatting glitches.  Places where the font size (or font itself) changed for no apparent reason, unreadable characters, places where font scale & compression changed, images that just got dropped on the floor, etc.

So, after spending about a day fooling around trying to clean that up (I am a very unskilled user of Word) I decided to try another approach.

My pal Gary, who helps me with formatting my books, has the book in InDesign source format; these are the sources that were used to produce the PDF. So why not simply use InDesign to emit .doc instead of PDF?

So Gary did that for me, and the result was OK, but the InDesign-generated version had formatting errors too, only different ones than the freeware PDF-to-Word version. For example, consider page numbers and headers. The Word version from InDesign contains intelligence about these things, but in the PDF-to-Word version they're just images. So, since I don't want headers or page numbers in my source for Smashwords, it should be easier to strip them out in Word using the InDesign version. But the other formatting issues in the InDesign version were worse.

After spending about a day fooling around with the InDesign version, I came to the conclusion that the first source was actually easier to work with than the second.  So that's what I'm in the middle of doing now, cleaning up the source generated by the PDF-to-Word program.

Is it incompetence or merely stupidity?

After having spent a lot of time worrying about things like how to handle headers, page numbers, images, and so forth, I finally got around to reading the Smashwords Style Guide, which very clearly explains that you have to get rid of headers & footers & page numbers (I could have figured that out if I had spent more than about 3 seconds thinking about it), and you also have to get rid of all pictures unless you absolutely need them.

(If you're wondering why I didn't read the Style Guide before I started the whole undertaking, it's because I'm an idiot and I don't have a brain in my head. I trust that answers the question.)

So now what I want to do is get rid of

-- page breaks
-- headers (left hand headers say "Acts of the Apostles", right hand says the name of the book section, of which there are seven, for example "Angel" "Small Miracles" "Conversion" "A Certain Centurian")
-- page numbers
-- little glyphs that I use to demarcate sections within chapters.

Can anybody tell me how to:

-- use Word's search & replace to get rid of page breaks?
-- use Word's search & replace to get rid of headers? (Remember, in this version they're just text, not headers.)
-- get rid of all hard line feeds (carriage returns)?
-- etc?

Or Maybe I should just brute force it?

Maybe I should just take a Word source (either one?), save the damn thing as text-only, no markup, and use BBEdit to clean up everything as text, and then bring that back into Word and reformat with typefaces and paragraph attributes?
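
For what it's worth, the text-only route could also be scripted rather than done by hand in an editor. Here is a minimal sketch in Python, under assumptions I have not checked against the real file: that the text-only export marks page breaks with form feeds, that each running header sits alone on its own line, and that page numbers are lines containing nothing but digits. The function name `strip_page_furniture` and the header set are made up for illustration; the header strings are copied from the list above.

```python
import re

# Hypothetical header set: the book title plus the section names listed above.
RUNNING_HEADERS = {
    "Acts of the Apostles",
    "Angel",
    "Small Miracles",
    "Conversion",
    "A Certain Centurian",
}

def strip_page_furniture(text, headers=RUNNING_HEADERS):
    """Drop form feeds, running-header lines, and bare page-number lines."""
    kept = []
    # Treat each form feed (page break) as an ordinary line boundary.
    for line in text.replace("\f", "\n").split("\n"):
        stripped = line.strip()
        if stripped in headers:
            continue  # running header sitting alone on a line
        if re.fullmatch(r"\d+", stripped):
            continue  # line that is nothing but a page number
        kept.append(line)
    return "\n".join(kept)
```

The obvious risk is false positives: a chapter title that matches a header string, or a line of body text that happens to be only digits, would be dropped too, so the result would still need a read-through against the PDF.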

Any suggestions welcome. If there are any programming maestros out there with nothing better to do, let me know & I'll send you the doc versions.

Ask Husi -- book format conversions | 30 comments (30 topical, 0 hidden) | Trackback
is kellerin the editor ? by sasquatchan (2.00 / 0) #1 Wed Feb 10, 2010 at 03:46:54 PM EST
TF maybe from grant writing..

But layout and plumbing the depths of formatting word is something I know zilch about, and avoid if I have to..

Kellnerin edited Pains and is working by johnny (2.00 / 0) #3 Wed Feb 10, 2010 at 05:00:33 PM EST
with me on book-in-progress "Creation Science".

I wouldn't trouble her with this kind of stuff; I need her much more valuable skills in shaping the prose & stories.

I'm sure if I knew what I were doing this could be short work -- or, if I'm not sure of that, I suspect that.  If no Word Wizard speaks up with suggestions in the next hour or so, I'll just go back to the grindstone.

She has effectively checked out. She's an un-person of her own making. So it falls to me.--ad hoc (in the hole)

[ Parent ]
i know nothing... by ana (2.00 / 0) #2 Wed Feb 10, 2010 at 04:38:04 PM EST
but is there an intermediate format like RTF that you could use to transport it?

"And this ... is a piece of Synergy." --Kellnerin

I've got a full copy of Acrobat by lm (2.00 / 0) #4 Wed Feb 10, 2010 at 05:00:39 PM EST
Supposedly it can save a document as MS Word, RTF, HTML, XML, and etcetera.

I thought I'd give it a whirl for you to see if Acrobat produced any better results. But your PDFs are copy protected so I can't.

There is no more degenerate kind of state than that in which the richest are supposed to be the best.
Cicero, The Republic
First, thank you, and by johnny (2.00 / 0) #13 Wed Feb 10, 2010 at 06:10:19 PM EST
second, you may be onto something.

As explained above, those PDFs were created by InDesign. My friend used the InDesign sources to produce a word doc, and it was buggy.

But maybe if I use the InDesign to produce a non-copyprotected PDF, we can try Acrobat instead.

I'll look into that.

She has effectively checked out. She's an un-person of her own making. So it falls to me.--ad hoc (in the hole)

[ Parent ]
Stupid question by ObviousTroll (2.00 / 0) #5 Wed Feb 10, 2010 at 05:02:31 PM EST
What happened to the original manuscripts?

An Angry and Flatulent Pig, Trying to Tie Balloon Animals
The InDesign sources by johnny (2.00 / 0) #10 Wed Feb 10, 2010 at 05:52:20 PM EST
are definitive. That's what I used to produce the PDF that's on my website.

I wrote the original book using Wordperfect on a PC, mostly - plus some sources that I created using a text editor on an early Mac and incorporated into Wordperfect sources with endless woe.

When I did the first printing of the book, the printer required sources in Quark. So at great expense I rented time on a workstation that had Quark & that's how the first edition was created -- by me importing sources from Wordperfect and laboriously bringing them into Quark (& designing a layout, etc).

It had a bajillion typos & formatting nits, so a year or two after the first edition came out, I imported the Quark sources into InDesign and cleaned up all the nits (or "snots", as the bricklayers call little bits of cement in places you don't want it). (I didn't have to buy InDesign; I just downloaded a one-month free trial and did the work over the next 27 days.)

So anyway, the PDF created by the InDesign sources is the definitive version.

But Smashwords insists on Microsoft Word as its starting point. So somehow I have to get there.

She has effectively checked out. She's an un-person of her own making. So it falls to me.--ad hoc (in the hole)

[ Parent ]
MS Word by toxicfur (4.00 / 2) #6 Wed Feb 10, 2010 at 05:07:22 PM EST
You, my friend, are screwed. Creating a Word doc from any other format is ugly, unless someone has some magic spell I haven't found, of course - I don't pretend to be an expert. I suspect that going the text/no-format route is going to be the least tedious method (and probably the least likely to have glitches when you're done).

That said, you can find and replace formatting things in Word. If you go to "Find and Replace" (under the Edit menu), choose the "Replace" tab in the dialogue box, then at the bottom, you can pick what sort of formatting you want to find. I tried it by searching for "Manual Page Break" and replacing it with a space. I don't know if that'll help, but it might be worth a shot.
The amount of suck that you can put up with can be mind-boggling, but it only really hits you when it then ceases to suck. -- Kellnerin

Have you tried Calibre? by wiredog (2.00 / 0) #7 Wed Feb 10, 2010 at 05:27:17 PM EST
It's an awesome ebook library and conversion tool.


Earth First!
(We can strip mine the rest later.)

Darn it!! by johnny (2.00 / 0) #12 Wed Feb 10, 2010 at 06:02:56 PM EST
I had decided on Smashwords, and now you show me this. I'm all confused again. Thanks a lot, Wiredog.

How I sympathize with poor Michael Corleone: "Just when I think I'm out, they pull me back in!"

She has effectively checked out. She's an un-person of her own making. So it falls to me.--ad hoc (in the hole)

[ Parent ]
I've been using Calibre for a while by wiredog (2.00 / 0) #14 Wed Feb 10, 2010 at 06:15:42 PM EST
It's awesome. I've been able to combine epubs, convert various formats to epub, and sync everything with my Nook.

Earth First!
(We can strip mine the rest later.)

[ Parent ]
Great, just great by johnny (2.00 / 0) #17 Wed Feb 10, 2010 at 07:23:16 PM EST
if Calibre's tools are that good, why will anybody pay me money for other versions, since my PDFs are free?

In conclusion, please keep your damn mouf shut about how much you like Calibre.  At least until I've sold enough ebook versions of my fantabulous novels to get my mortgage current.

She has effectively checked out. She's an un-person of her own making. So it falls to me.--ad hoc (in the hole)

[ Parent ]
Baen Books makes money by wiredog (2.00 / 0) #20 Wed Feb 10, 2010 at 08:26:44 PM EST
They sell DRM free ebooks at WebScription.

Earth First!
(We can strip mine the rest later.)

[ Parent ]
Why not use InDesign instead of a .doc kludge by marvin (2.00 / 0) #8 Wed Feb 10, 2010 at 05:38:56 PM EST
Output epub directly.

Looks like Mobipocket Desktop can convert epub to the mobipocket format. I agree with ObviousTroll - work from your original format and use better tools, rather than trying to convert to .doc and then into something else just because conversion is offered on the website. The important thing for me would be getting the chapters and stuff right, which is something I don't trust the .doc format to not screw up.

Excellent points, but by johnny (2.00 / 0) #15 Wed Feb 10, 2010 at 06:30:33 PM EST
Question: what's the best way for me to market these things?

I was kind of thinking of Smashwords as a marketing site as well as a format-conversion site.

The PDFs of Acts of the Apostles have been on the internet for nearly ten years. It's on Pirate Bay and similar sites. It's all over the place.

I'm trying to figure out how to get my versions in new formats (Epubs, Kindle, etc) before the buying public, and Smashwords seems to do a pretty good job of that.

Amazon of course works for Kindle versions -- although they take an obscene cut -- but that doesn't address what to do about getting my other versions out there & collecting $$ for them without all kinds of stupid work on my own site, which would seem the wrong way to go.

So my document-formatting problem is caused by my marketing approach.

(In this respect, by the way, Calibre, listed by Wiredog, above, is just what I don't need -- an easy way for Joe Random Internet User to convert my books from PDF to whatever format, obviating the need to buy a copy from me.)

She has effectively checked out. She's an un-person of her own making. So it falls to me.--ad hoc (in the hole)

[ Parent ]
I think Joe Random Internet User does want to pay. by wiredog (2.00 / 0) #16 Wed Feb 10, 2010 at 07:12:24 PM EST
Just not $25 for a copy. Maybe not $10 for a copy.

This is a longish essay on the economics of book publishing and pricing.

Earth First!
(We can strip mine the rest later.)

[ Parent ]
Thanks, Wiredog by johnny (2.00 / 0) #18 Wed Feb 10, 2010 at 07:45:32 PM EST
that was the best article on this whole topic I've read to date.

Amazing that that guy is still around and publishing books. He warn't borned yesterday, you know.

She has effectively checked out. She's an un-person of her own making. So it falls to me.--ad hoc (in the hole)

[ Parent ]
I disagree with his politics. by wiredog (2.00 / 0) #19 Wed Feb 10, 2010 at 08:22:12 PM EST
But still read his website every day.

Earth First!
(We can strip mine the rest later.)

[ Parent ]
Wow, I can do something useful today. by ammoniacal (2.00 / 0) #9 Wed Feb 10, 2010 at 05:44:38 PM EST
What version of Word created the document? If it's a .docx file (Word 2007), you can copy and paste the appropriate formatting marks from the document into the search field and the replace field, and that should do the trick.

"To this day that was the most bullshit caesar salad I have every experienced..." - triggerfinger

Thanks for tip; I'll look into it by johnny (4.00 / 1) #11 Wed Feb 10, 2010 at 06:00:11 PM EST
On my machine I have Word 2004 that I upgraded a few years ago to handle docx.

She has effectively checked out. She's an un-person of her own making. So it falls to me.--ad hoc (in the hole)
[ Parent ]
the find box by garlic (4.00 / 1) #30 Sun Feb 14, 2010 at 08:02:01 AM EST
sometimes has a more button that is worth clicking. Somewhere there it'll let you add special characters to the find field. For instance, if you have a triple paragraph break you want to turn into a double, you'd use the paragraph mark search icon ^p, and have it search for ^p^p^p and replace it with ^p^p everywhere.

I know ^t is tab, but I don't know line breaks.

Perhaps a google search for: MS word find special characters
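
As far as I know, the other codes in Word's Find box are ^l for a manual line break and ^m for a manual page break, alongside ^p and ^t, but check that against your version of Word. If the text ends up in a plain-text file instead, the same triple-to-double collapse can be sketched with a regular expression. This is a minimal sketch, assuming each paragraph break in the export is a single newline character; `collapse_blank_lines` is a made-up name.

```python
import re

def collapse_blank_lines(text, keep=2):
    """Shrink any run of more than `keep` newlines down to exactly `keep`."""
    pattern = r"\n{%d,}" % (keep + 1)
    return re.sub(pattern, "\n" * keep, text)
```

One difference from the Word approach: replacing ^p^p^p with ^p^p may need to be repeated until Word reports no more replacements, while the regex handles runs of any length in one pass.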

[ Parent ]
outsource sections to husi by misslake (2.00 / 0) #21 Wed Feb 10, 2010 at 11:03:58 PM EST
send me a chunk of the book, i'll compare it to the paper copy ni has, fix it up, and send it back to you.

can we google wave this? then we can all fiddle with it together until it's perfect, and then you can send it in.

That is a very sweet offer by johnny (2.00 / 0) #22 Wed Feb 10, 2010 at 11:17:07 PM EST
One nit: the reference version is not the printed copy, it's the PDF on my website -- it has many hundreds of small corrections to the text that was used in the first printing -- so if I ever do a second printing, I'll have a clean PDF to start from.

As to the outsourcing idea, I think I'll see what progress I can make tomorrow by myself. I expect that we'll be snowed in, so it will be extra cozy. I'll take puppy for a long walk in the snow, come home & make coffee, and see if I cannot crack this nut.

IF that fails, I may pursue the misslake option.

By the way, I really loved your madhouse diary. I've made similar treks myself, and it truly can be almost as much as a heart can bear. Your story conveyed that wonderfully, and your friend is lucky to have you in his life.

She has effectively checked out. She's an un-person of her own making. So it falls to me.--ad hoc (in the hole)

[ Parent ]
I want to say this: by Kellnerin (4.00 / 5) #25 Thu Feb 11, 2010 at 07:17:18 AM EST
Stop. Do not sink another day into this. Please, why are you not working on Creation Science instead?

If this mechanical work brings you to a sort of Zen state of mind (wax on, wax off) where your consciousness is open to the world and your entire being is infused with cosmic awesomeness that will make the new novel a hundred times better, then that's cool. But I sense that it is not.

I can explain at further length, and my apologies for not speaking up earlier. My excuse is that my day yesterday was stacked with lots of little frustrations and I did not want to bring the bad vibes to your diary. But I still don't think this is a good idea.

"Plans aren't check lists, they are loose frameworks for what's going to go wrong." -- technician

[ Parent ]
Kellnerin is right. by clock (4.00 / 1) #26 Thu Feb 11, 2010 at 07:57:45 AM EST
stop. stop. stop.

see my sig.

I agree with clock entirely --Kellnerin

[ Parent ]
two answers by johnny (2.00 / 0) #28 Thu Feb 11, 2010 at 02:55:05 PM EST
First, I am working on Creation Science & will be sending out an update soon. That remains my top priority. I work on it every day.

Second, it seems to me that if I'm going to be my own publisher -- and I am for my first three books already; who knows if I'll be able to sell Creation Science? -- then at some point I need to figure out this whole ebook thing. I need to at least get my bearings. The whole business is evolving very quickly, and it seems, or seemed, to me that I need to have a strategy for this just as I do for my website.

She has effectively checked out. She's an un-person of her own making. So it falls to me.--ad hoc (in the hole)

[ Parent ]
clarification by Kellnerin (4.00 / 4) #29 Thu Feb 11, 2010 at 09:00:46 PM EST
First, I apologize for the abruptness of my earlier comment, posted on my way out the door to catch the train.

It is good to know that you are continuing work on Creation Science. From your entry, I got the impression that you had been spending solid days on end banging your head against this particular wall, and that did not seem like the best use of your time.

(An update would also be good for backers of the project who don't have a venue such as this to prod you about your work in progress.)

I agree that an eBook strategy is probably a good thing, but I guess I was disagreeing with your tactics. Smashwords may not be the best avenue for you to pursue, given the technical challenges. (A "free" service is not free when it begins to cost a lot of time and effort.) I'm pretty confident in saying that an iPhone app is definitely not something to pursue. (iPad book store, maybe, but who knows what that is going to be like?) And honestly, I don't know if anyone has an "eBook strategy," at least, not one that enables them to make big bucks on the venture. I wouldn't jump into the game just to be in it, because the rules are constantly changing.

In conclusion, I really think the best thing you can do is to concentrate on making Creation Science the best book possible, at least until you are close to a complete draft. I'm just cautioning against becoming too distracted by these side ventures, even if they fall under the general umbrella of your writing/publishing career.

"Plans aren't check lists, they are loose frameworks for what's going to go wrong." -- technician

[ Parent ]
Convert to Word? by Herring (4.00 / 1) #23 Thu Feb 11, 2010 at 04:17:47 AM EST
Are you sure this isn't just an elaborate troll?

Convert to a format where your layout can be screwed if you select a different printer?

I'm not sure anything automatic is going to be 100%. I'd be tempted to convert it to plain text (to lose the page breaks etc.) then start again with the formatting. Or print out the PDF, scan each page and paste each bitmap onto a page of the Word document, then send it to these people and serve them right.
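
The plain-text route mostly comes down to rejoining lines that were broken for layout rather than for meaning. A minimal sketch in Python, assuming paragraphs in the export are separated by blank lines and every other line break is just a hard wrap; `unwrap_paragraphs` is a made-up name, not part of any tool mentioned here.

```python
import re

def unwrap_paragraphs(text):
    """Join hard-wrapped lines into one line per paragraph."""
    # Normalize Windows and old-Mac line endings to "\n" first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A blank line (possibly containing spaces) marks a paragraph boundary.
    paragraphs = re.split(r"\n\s*\n", text)
    # Collapse each paragraph's internal whitespace, newlines included, to single spaces.
    joined = [" ".join(p.split()) for p in paragraphs]
    # Skip empty chunks left over from leading or trailing blank lines.
    return "\n\n".join(p for p in joined if p)
```

This would also merge anything that is meant to keep its line breaks, such as verse or letterheads, and it does nothing about words hyphenated across line ends, so those would have to be fixed by hand afterwards.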

christ, we're all old now - StackyMcRacky

Word is just by johnny (2.00 / 0) #24 Thu Feb 11, 2010 at 07:15:48 AM EST
the starting point for their tool chain.  The only formatting information they want you to incorporate, basically, are font preferences and paragraphs. From that they generate all kinds of different formats.

Sure doesn't look like a troll, and they've been recommended to me by people I trust. . .

She has effectively checked out. She's an un-person of her own making. So it falls to me.--ad hoc (in the hole)

[ Parent ]
I'd be skeptical of any company by muchagecko (2.00 / 0) #27 Thu Feb 11, 2010 at 12:01:57 PM EST
that needs to format Word documents. Word is the worst publishing/formatting program.

Have you contacted someone in their customer service department to see if there are alternatives?

InDesign has become the industry standard for print. What a horror if you have to convert its lovely PDF into a Frankenstein Word doc.

A purpose gives you a reason to wake up every morning.
So a purpose is like a box of powdered donut holes?
My Name is Earl

[ Parent ]