Use Microsoft Word Features to Clean Up Copied Text

I sometimes receive Microsoft Word documents that are difficult to read. Text wraps at strange places, the formatting is inconsistent, and a variety of other problems make reading the document an annoying and frustrating experience.

Often, the problems stem from the fact that the document is a compilation of text that's been collected from other sources: primarily email messages and Web pages. Let's look at several ways to clean up copied text so your final document looks slick and professional.

Text Wrapping Problems

When text is copied from another source into a Word document, the text may wrap in the same place it did in the email message or on the Web page. Retaining the original word wrap means there's a return at the end of each line in the Word document (creating a document in which every line is a separate paragraph hugging the left side of the Word window). The cure differs depending on whether the copied return is a soft return or a hard return, and you also have to determine whether the return is next to a space. (If you remove a return that isn't next to a space, the word before the return and the word after the return bunch together likethis.)

To see the type of return, you have to view the formatting marks; performing that task differs depending on the version of Word you're using.

In all versions of Word earlier than 2007, click the Show/Hide icon on the Standard Toolbar.

In Word 2007, click the Office button in the upper left corner of the window and then click Word Options (at the bottom of the window that opens). In the Word Options dialog box, select Display in the left pane, and then select Show All Formatting Marks.

When you reveal the formatting marks, hard returns look like a backwards capital "P", with two vertical lines instead of one (the official name of this symbol is "pilcrow"). Soft returns look like curved arrows.

You don't have to go through the document to remove each return individually; instead, use the Replace function (Ctrl-H). In the Find field, enter one of the following:

^p if you're replacing hard returns (the p must be lower case)
^l if you're replacing soft returns (the l must be lower case)

In the Replace field, enter a space if there's no existing space next to the return. If a space already exists, leave the Replace field empty so the return is simply removed. Click Replace All to complete the task.
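If the copied text is still plain text (before it goes into Word), the same rule can be sketched in a few lines of Python. This is only an illustration of the logic, not something Word does internally; the function name join_wrapped_lines is made up for this example, and it assumes each return is a plain newline character.

```python
import re


def join_wrapped_lines(text: str) -> str:
    """Remove line breaks, adding a space only where none is adjacent.

    Hypothetical helper for illustration; assumes returns are "\n".
    """
    # A break that already touches a space on either side is simply dropped,
    # which mirrors leaving the Replace field empty.
    text = re.sub(r"(?<= )\n|\n(?= )", "", text)
    # Any break that remains has no neighboring space, so it becomes a space,
    # which mirrors entering a space in the Replace field.
    return text.replace("\n", " ")


if __name__ == "__main__":
    print(join_wrapped_lines("words that bunch\ntogether"))
```

Unlike Replace All in Word, this sketch decides return by return, so it also copes with text in which only some returns have a space next to them.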

Some copied text has a hard return at the end of each line, and two hard returns at the end of each paragraph (so you can see where each paragraph ends, but the lines are broken by the end-of-line returns). The following steps fix the entire document.

1. Press Ctrl-H to open the Replace dialog box.
2. In the Find field, enter ^p^p (to identify the new paragraphs).
3. In the Replace field, enter text that wouldn't be found in the document (I use $$).
4. Click Replace All.

The document now has continuous text with no paragraph breaks, but the lines are still broken because the single hard returns still exist. Use the following steps to get rid of the single hard returns:

1. Press Ctrl-H to open the Replace dialog box.
2. In the Find field, enter ^p.
3. If no spaces exist next to the hard returns, enter a space in the Replace field; if a space exists, leave the Replace field empty.
4. Click Replace All.

The document now wraps properly, but there are no individual paragraphs, and you can see $$ (or whatever text you entered) in the text. Let's clean that up:

1. Press Ctrl-H to open the Replace dialog box.
2. In the Find field, enter $$ (or the text you used).
3. In the Replace field, enter ^p^p.
4. Click Replace All.
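The three passes above (protect the paragraph breaks, remove the line breaks, restore the paragraph breaks) can also be sketched in Python for text that hasn't been pasted into Word yet. This is a minimal illustration under a few assumptions: each hard return is a plain newline character, no space sits next to the returns, and the function name rebuild_paragraphs is made up for this example.

```python
def rebuild_paragraphs(text: str, marker: str = "$$") -> str:
    """Join wrapped lines while keeping paragraph breaks.

    Hypothetical helper for illustration; assumes returns are "\n"
    and that no space sits next to them.
    """
    # The marker must not already occur in the text, or pass 3 would
    # create paragraph breaks that were never there.
    if marker in text:
        raise ValueError("marker already appears in the text; pick another")

    # Pass 1: swap each double return (a paragraph break) for the marker.
    text = text.replace("\n\n", marker)
    # Pass 2: every remaining return is an end-of-line break; use a space.
    text = text.replace("\n", " ")
    # Pass 3: turn the marker back into a paragraph break.
    return text.replace(marker, "\n\n")


if __name__ == "__main__":
    sample = "This line wraps\ntoo early.\n\nSo does\nthis one."
    print(rebuild_paragraphs(sample))
```

The check at the top of the function is the reason for choosing text that wouldn't be found in the document: if the marker already appears somewhere, the last pass can't tell the real paragraph breaks from the accidental ones.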


Font Problems

Web sites and email text often use styles that don't exist in your system. After you copy text to a Word document, you may see styles named Normal (web), Strong, Emphasize, Cutesy, MyFont, BobsFavoriteFont, or any other style that the writer created.

You can select the paragraph, or the entire document (using Ctrl-A), and apply one of your own styles from the Style drop-down list in Word (usually, Normal style), but that often fails to correct the problem. Some paragraphs stubbornly refuse to change, or, even more annoying, some individual words in the paragraph fail to change.

You can thwart the stubborn style's effort to remain in your document by removing the stubborn style from the document, which forces the text to revert to the style named Normal.

To remove unwanted styles from a document, open the Organizer:

In versions of Word previous to 2007, choose Tools, Templates and Add-ins, and click the Organizer button.

In Word 2007, click the Office button, click Word Options, select Popular in the left pane, select Show Developer tab in the Ribbon, and click OK. Then select the Developer tab, click Document Template, and click Organizer.

The Organizer dialog box shows the styles available in the current document in the left pane, and the styles available in the Normal template in the right pane (the template is a .dotm file in Word 2007, and a .dot file in previous versions of Word).

In the left pane, hold down the Ctrl key and click each style you want to remove from the document (text that uses a removed style automatically reverts to Normal style), and then click Delete. When the confirmation dialog box appears, click Yes to All to avoid having to confirm each deletion individually. Some styles can't be removed (such as Headings), but it's not difficult to find the styles you need to get rid of.

