Introduction

By now, we’ve grown used to word processing and desktop publishing applications giving us a lot of control over fonts, spacing, and typographical effects in our digital documents. With the click of a button, you can change the size of your text, set margins, insert tab stops, create effects, and do other things to make your text pretty. These applications have also encouraged some bad keyboarding habits that can cause your text to do unpredictable things when published online.

The point of this piece is to share some tips I’ve gathered from years of editing copy and preparing it for publication, both in print and on the Internet.

The short version is that you’re probably putting too much effort into the way your writing looks.

Although typography for the Internet has come a long way since the invention of Hypertext Markup Language (HTML), it’s still limited compared to what you can do with word processing software. To bridge the divide, many word processing applications now have converters that can generate HTML versions of documents. Some are better than others. The free LibreOffice is very good at producing clean HTML.

Regardless of the word processor you favor, conversion to HTML often introduces a new set of problems, because the generated HTML includes instructions for approximating the format of the original document. Those instructions almost always conflict with the formatting instructions already in place on Internet publishing platforms, whether blogs or static pages like this one.

Consider what Microsoft Word does to generate an HTML version of this sentence:

This sentence is indented, in italic type, and it has been justified to both margins.

Take a look at the HTML that Microsoft Word produced to display that sentence.

Now consider what the same sentence looks like when properly composed for display on the Internet:

<p class="rtejustify rteindent1"><em>This sentence is indented, in italic type, and it has been justified to both margins.</em></p>

Much simpler!

But this piece is not about how to encode your text. It's about how to prepare your text in a way that will facilitate its timely publication. If you want your text to come out well on the Internet, and if you want to win the gratitude of anyone who has to work with your text, please follow these guidelines:

Keep it simple

Focus on content, not aesthetics. If you need to add emphasis to your text, stick to boldface, italics, and underline, since those are universally available in HTML.

For pointers on content, please see the following excellent articles on this site:

Use the return key only to start a new paragraph

It’s worth repeating: use the return key only to start a new paragraph. If you use it for anything else, for example, to create a blank space between two paragraphs, an HTML converter will turn the extra returns into empty paragraphs. That’s bad because the formatting instructions associated with paragraphs often include spacing, so empty paragraphs multiply that spacing. For that reason, many sites, including this one, implement something called an empty paragraph killer, but why not avoid senseless violence by being more sparing with that return key?
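To make the idea of an empty paragraph killer concrete, here is a minimal sketch in Python. It is not the code this site uses. The function name and the sample markup are my own assumptions, and a regular expression is only a rough stand-in for a real HTML parser.

```python
import re

# Matches a paragraph whose body holds only whitespace, &nbsp; entities,
# or <br> tags. The pattern is an illustrative assumption, not a standard.
EMPTY_PARAGRAPH = re.compile(
    r"<p\b[^>]*>(?:\s|&nbsp;|<br\s*/?>)*</p>\s*",
    re.IGNORECASE,
)


def kill_empty_paragraphs(html: str) -> str:
    """Remove paragraphs that contain no visible content."""
    return EMPTY_PARAGRAPH.sub("", html)


# Hypothetical converter output: two extra presses of the return key
# became two empty paragraphs.
converted = (
    "<p>First paragraph.</p>\n"
    "<p>&nbsp;</p>\n"
    "<p></p>\n"
    "<p>Second paragraph.</p>"
)

print(kill_empty_paragraphs(converted))
```

Only the two real paragraphs survive. The tool cannot tell whether you meant the extra space, which is one more reason to leave it out in the first place.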

Consider the paragraphs in this piece. The space between them is part of their formatting instructions. I didn’t press the return key twice to create that space. It’s just part of the way this page is formatted by this site.

For the same reason, don’t press the return key at the end of every line, as you might on a manual typewriter. If you do, your text will end up looking like this:

Use the return key only to start a new paragraph. If you use it for anything else, for

example, to create a blank space between two paragraphs, an HTML converter will

turn the extra returns into empty paragraphs. That’s bad because the formatting

instructions associated with paragraphs often include spacing, so inserting empty

paragraphs multiplies that spacing.

Avoid the tab key

Tabs do not transfer well to Internet publications. That's one of the reasons why paragraphs in online documents are often separated by a blank line instead of an indentation. HTML has no equivalent of a tab stop, so converters typically approximate each tab with a string of “non-breaking space” entities (&nbsp;). Every time you press the tab key, you add handfuls of those codes, and there’s no way to control how they’ll be handled online.

If you need to separate text into columns or use spaces to convey organization, consider using a table instead. Your word processing application should have a tool for making tables. Those have their own problems, but they’ll do a better job than tabs.

Use the space bar only to put a space between words or sentences

This is closely related to the previous tip.

Sometimes people want more control than the tab key gives them for inserting meaningful spaces in their documents, so they use the space bar. In HTML, it doesn’t matter if you press the space bar once, twice, or a dozen times, since a web browser collapses any run of ordinary spaces into a single space. Incidentally, that’s why the debate over whether to put one or two spaces after a period is moot when it comes to publishing online. But if you use the space bar to insert white space for formatting purposes in a word processing document, unpredictable things will happen in the conversion to HTML.
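The following Python sketch approximates what a browser does with runs of ordinary spaces, tabs, and line breaks in normal text. The function name and sample text are hypothetical, and real browsers follow more detailed rules than this one-line substitution. Non-breaking spaces are deliberately left alone, because browsers do not collapse them.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace each run of spaces, tabs, and line breaks with one space."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip()


# Hypothetical text lined up with the space bar and the tab key.
typed = "Column A          Column B.\tTwo spaces after this period.  Next sentence."

print(collapse_whitespace(typed))
```

The carefully aligned columns come out as ordinary running text, and the two spaces after the period become one.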

Use your word processor’s built-in styles

Most word processors have styles for normal paragraphs, heading 1, heading 2, heading 3, etc. Those correspond to structural elements in HTML documents, and they survive the transfer much better than anything that you might do manually to convey the same information.

For example, if you start a new section of your document with text that you have put in 18-point, boldface, all-caps type, those formatting instructions will add lots of extra code that might be stripped out in a conversion process. But if you select “Heading 1” for that text instead, your text is more likely to be displayed appropriately when it appears on the Internet.

The headings in this document were created by using the styles available in all Microsoft Word documents.

This tip also applies to numbered lists (i.e., ordered lists) and bullet lists (i.e., unordered lists). Don't type numbers or bullets when your word processor will do it for you in a way that will transfer over to HTML.
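As a rough illustration of why styles survive conversion, here is a Python sketch of how a converter might map style names to HTML elements. The mapping, the function name, and the sample document are all assumptions made for this example. Real converters are far more elaborate.

```python
from html import escape

# Hypothetical mapping from word processor style names to HTML elements.
STYLE_TO_TAG = {
    "Heading 1": "h1",
    "Heading 2": "h2",
    "Heading 3": "h3",
    "Normal": "p",
}


def to_html(blocks):
    """Render (style name, text) pairs as HTML, defaulting to a paragraph."""
    lines = []
    for style, text in blocks:
        tag = STYLE_TO_TAG.get(style, "p")
        lines.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(lines)


# A document described by its styles rather than by font sizes.
document = [
    ("Heading 1", "Keep it simple"),
    ("Normal", "Focus on content, not aesthetics."),
]

print(to_html(document))
```

A style name carries the structural meaning directly, so the converter has nothing to guess. Text that is merely large and bold offers no such clue.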

Use Unicode fonts

Unicode is a computing industry standard for representing the world’s writing systems consistently and reliably. By now, most classicists have adopted a Unicode font for text in Greek or other alphabets. Fortunately, most modern computers include Unicode support for a variety of alphabets. SCS members can also download GreekKeys for free (https://classicalstudies.org/publications-and-research/about-greekkeys-2015).
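If you are curious what “representing consistently” means in practice, this short Python sketch lists the code point and official Unicode name of each character in a Greek word. The sample word and the helper function are my own choices for illustration. The point is that each character has the same identity no matter which Unicode font displays it.

```python
import unicodedata

# The first word of the Iliad, written with escapes so that the code
# points are explicit: μῆνιν
word = "\u03bc\u1fc6\u03bd\u03b9\u03bd"


def describe(text):
    """Return (code point, official Unicode name) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for code_point, name in describe(word):
    print(code_point, name)
```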

Use a clean-up tool

Blogs and other tools for publishing texts online often have WYSIWYG (What You See Is What You Get) editors that attempt to reproduce the writing environment you know from your word processing application. Many have a “Paste From Word” button that looks like a clipboard with a big, blue “W” on it. It opens a window where you can paste the text of your word processing document. Clicking on OK will strip out much of the extra code added by applications such as Microsoft Word.

You can also use tools such as https://word2cleanhtml.com, http://wordtohtml.net, or http://www.textfixer.com/html/convert-word-to-html.php.
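For a sense of what such clean-up tools do, here is a minimal Python sketch. It is not the method used by any of the tools named above. The function name is hypothetical, the sample markup is hand-written in the style of word processor output rather than copied from a real export, and regular expressions are a blunt instrument for HTML, so treat it as an illustration only.

```python
import re


def clean_word_html(html: str) -> str:
    """Strip common word processor clutter from a fragment of HTML."""
    # Drop comments, which is where conditional blocks usually live.
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Drop namespaced tags such as <o:p> and </o:p>.
    html = re.sub(r"</?[a-z]+:[^>]*>", "", html, flags=re.IGNORECASE)
    # Drop class, style, and lang attributes with double-quoted values.
    html = re.sub(
        r'\s+(?:class|style|lang)\s*=\s*"[^"]*"', "", html, flags=re.IGNORECASE
    )
    # Unwrap <span> tags, which carry nothing once their attributes are gone.
    html = re.sub(r"</?span\b[^>]*>", "", html, flags=re.IGNORECASE)
    return html


# Hand-written sample, not actual output from any application.
messy = (
    '<p class="MsoNormal" style="margin-left:.5in;text-align:justify">'
    '<i><span style="font-family:Calibri">This sentence is indented.</span></i>'
    "<o:p></o:p></p>"
)

print(clean_word_html(messy))
```

Notice that the indentation and justification disappear along with the clutter. That is the trade-off: the clean version leaves the appearance to the site where the text is published.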


Of course, the best way to prepare text for publication on the Internet is to write it in HTML in the first place, but that goes against the trend of facilitating access to digital publishing. The next best thing is to prepare your text in a way that will present the fewest obstacles to timely publication.

On behalf of web editors everywhere, thank you for considering these tips.

—Samuel J. Huskey, SCS Information Architect