Protecting Personal Data in Your Word 2003 Documents
Microsoft® Office Word 2003
Summary: By default, Microsoft Office Word 2003 documents contain hidden data. One measure you can take when sharing documents with others is to remove the information you don't want others to see. This article presents various ways that information is stored in a document and how to remove that information. (12 printed pages)
Types of Information
Viewing Personal or Hidden Information
Removing Personal Information during Document Saves
Manually Removing Personal Information
Manually Removing Personal Summary Information When Connected to a Network
Removing Personal Information Programmatically
Displaying Hidden Items
Removing Styles from Documents
Features That Store Hidden Information
Removing Links from Field Codes
Removing Your Name from Visual Basic Code
General Suggestions about Security
Microsoft® Office Word 2003 stores a lot of information in a document: not only when and how the document was created and what changes it has undergone, but also who created it and who changed it. One relatively easy security measure you can take when sharing documents with others is to remove information that you don't want others to see. For example, you can remove the author, the reviewers, the creation date, and so forth.
This article looks at some features and properties in Word that may contain data and ways you can remove this information. With some understanding of where sensitive information may exist in your document and how it got there, you can remove this information from your document with just a little effort. By using the information discussed in this article, you can prevent sensitive information from falling into the wrong hands.
Privacy and the protection of personal information have become vitally important as hackers, identity thieves, and even competitors continue to employ increasingly sophisticated ways to gain access to and exploit sensitive information about companies and individuals. Product sales and the success of new and legacy products hinge on the ability of software vendors to provide ways for users to identify and safeguard this information.
Most software applications store information (also known as metadata) in the files they use in order to provide and maintain a history of the file, to promote collaboration within the organization, and to keep all of the information relating to the file in one central location. Much of this information is stored as part of a product feature or property setting to aid in collaboration. However, sometimes data is stored without the user being aware of what data is stored or where it is stored. Even seemingly innocuous features and settings in a product can store data that reveals details about you or your company to prying eyes.
For example, Microsoft Office Word 2003 and some other document processing applications allow you to store different versions of a document in the document file as hidden information. Let's say you spent several weeks working with your marketing and editorial staff to create a document outlining the features of a new product. You plan to send this document to your sales staff as part of a new marketing campaign. At the last minute, you decide to remove a couple of features that require more testing but that you definitely intend to include in the next version of the product. You are unaware that the versioning feature is turned on for this document, so each one of your revisions, including the version with the removed product features, is saved with the document file. After you send the document by e-mail to the sales team, a copy of the e-mail attachment falls into the hands of a competitor who, after viewing the different versions of the document, sees the version detailing the removed features and sends that information to their engineering team.
The bottom line is that much of the metadata is stored for collaboration. However, collaboration ends when a user has published a document to the outside world, by posting it to the Internet or sending a completed document to a client. After collaboration is complete, before the document is published, you need to make sure that you remove all of the hidden data that you don't want others to see.
In many different document-processing applications, including Word, there are a number of different types of metadata stored. They can include:
- Your name
- Your initials
- Your company or organization name
- The name of your computer
- The name of the network server or hard disk where you saved the document
- The names of previous document authors
- Document revisions
- Document versions
- Template information
- Hidden text
The use of these types of metadata provides several benefits to the organization and the users of these documents. Storing the information in the document to which it applies ensures that the information is not separated from the document. Storing much of this information by default, such as when saving or revising the document, takes the onus from the user of remembering to track the information. In addition, storing proprietary information can help protect a company or individuals in the event of plagiarism or copyright infringement.
Thus, you can see that metadata definitely has a place when used wisely and maintained accordingly. By adopting a company-wide awareness policy and a desktop programming approach when publishing the document, you can identify and eliminate discrete types of metadata.
Now let's look at some ways of safeguarding and removing sensitive information from Word documents.
Discovering the metadata in a document isn't that difficult. For example, one feature of Word lets you open a document that is corrupt by viewing the text without the formatting. You can also use this feature to view some of the metadata associated with a document. Perform the following steps on a document of your own:
- Start Word.
- On the File menu, click Open.
- In the Files of type drop-down list, click Recover Text from Any File, locate a Word .doc file, and then click Open.
The document opens without any formatting. After scrolling through the document, you may see information such as the name of the author of the document, path of the stored document, and so forth. Note that the information you see may not be in context. You should be careful to note whether the information you are viewing is part of the text of the document or part of the metadata that was added to the document.
Before you provide others with a copy of your document, you should view any "invisible" or hidden information, and decide whether it is appropriate to include. For example, if you use Track Changes from the Tools menu, Versions from the File menu, or the Allow fast saves option from the Save tab of the Options dialog box available from the Tools menu, you should look at removing any hidden or deleted information that might remain in your document. You can do this by following the steps in the following sections.
You can ensure that Word removes personal information when saving your documents. To enable this option, use the following steps:
- On the Tools menu, click Options, and click the Security tab.
- Select the Remove personal information from file properties on save check box under the Privacy options section, and then click OK.
- Save the document.
When you use this option, Word removes the following personal information from your document:
- File properties: Author, Manager, Company, Last saved by.
- Names associated with comments or tracked changes: Word changes names to Author.
Note Tracked changes are marks that show when you make a deletion, insertion, or other editing change in a document.
- Routing slip: The routing slip is removed.
- E-mail message header: The header that is generated with the E-mail toolbar button is removed.
- Versioning: The name under Saved by is changed to Author.
Note The Remove personal information from file properties on save option is not selected by default. In addition, when you select the option, it applies only to the active document and not to any existing or new documents. Therefore, you must select this option explicitly for each document.
Note The Microsoft Download Center contains an add-in that allows you to remove hidden data and collaboration data permanently, such as change tracking and comments, from Word, Microsoft Office Excel 2003, and Microsoft Office PowerPoint 2003 files. The Remove Hidden Data add-in removes personal or hidden data that might not be immediately apparent when you view the document in your Microsoft Office application.
You can manually remove personal information from a document by using one or both of the following procedures.
- Open the document in Word.
- On the File menu, click Properties.
The Summary, Statistics, Contents, and Custom tabs each contain information that you may want to remove.
- On the Tools menu, click Options, and then click the User Information tab.
The following edit boxes appear:
- Name
- Initials
- Mailing Address
- If you do not want any of this information to appear in your documents, type non-identifying strings or spaces in the appropriate edit boxes, and then click OK to accept the changes.
Note Any new documents that you create contain this information instead of the default values provided when you installed Office. However, existing documents may already contain the default information.
From the Microsoft Windows® Explorer:
- In Windows Explorer, right-click the document, and then click Properties. The tabs in the Properties dialog box may contain information you want to remove.
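The same cleanup of the Properties dialog box can be sketched in VBA. The following is a minimal example, not a complete solution: the procedure name ClearSummaryProperties is hypothetical, and it assumes you want to blank only the Author, Manager, and Company fields and to delete every custom property in the active document.

```vba
Sub ClearSummaryProperties()
    ' Sketch only: blanks three Summary fields and deletes all custom
    ' properties in the active document. Add other fields as needed.
    Dim i As Long

    With ActiveDocument
        .BuiltInDocumentProperties(wdPropertyAuthor).Value = ""
        .BuiltInDocumentProperties(wdPropertyManager).Value = ""
        .BuiltInDocumentProperties(wdPropertyCompany).Value = ""

        ' Count down so that deletions do not shift items not yet visited.
        For i = .CustomDocumentProperties.Count To 1 Step -1
            .CustomDocumentProperties(i).Delete
        Next i
    End With
End Sub
```

Save the document after running the procedure, and then check the Properties dialog box to confirm that no other tab still holds information you do not want to distribute.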
If you are logged on to a network, your network user name may appear in the Author box on the Summary tab and in the Last saved by field on the Statistics tab when you save a document. This issue may occur even if you removed all other personal information from your computer.
To remove summary information from a document when you are on a network, follow these steps:
- If the document is stored on a network server, copy it to your local hard disk.
- Start your computer, but do not log on to your network. When you see the network logon dialog box, click Cancel or press ESC.
Note If you cannot start Windows by pressing ESC (for example, your computer is running Microsoft Windows NT), you cannot continue these steps.
- Open the document.
- On the File menu, click Properties.
- On the Summary tab, clear the text boxes for the Author, Manager, Company fields and any other fields that contain information that you do not want to distribute.
- On the Custom tab, delete any properties that contain information that you do not want to distribute.
- Click OK. On the File menu, click Save, and then click Close.
When you log on to the network, do not open the file. If you do, your network user name may be written in the file. However, you can use Windows Explorer to copy the file to either a network server or a floppy disk.
Word also provides the RemovePersonalInformation property which, when set to True, removes all user information from comments, revisions, and the Properties dialog box when the user saves a document. For example, this procedure sets the current document to remove personal information from the document the next time the user saves it:
Sub RemovePersonalInfo()
    ThisDocument.RemovePersonalInformation = True
End Sub
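Because the setting applies to one document at a time, you might want to set it on every document that is currently open. The following sketch does that; the procedure name is hypothetical, and the personal information is removed only when each document is next saved.

```vba
Sub RemovePersonalInfoFromOpenDocuments()
    ' Sketch only: turns on the per-document setting for every open document.
    Dim doc As Document

    For Each doc In Documents
        doc.RemovePersonalInformation = True
    Next doc
End Sub
```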
Sometimes, in order to protect your personal information, you must display the information before you can decide whether to remove it. The following sections explain how to display various items that may contain hidden information.
Display Tracked Changes and Comments
Markup items in a Word document consist of comments and tracked changes such as insertions, deletions, and formatting changes, which are used by writers and editors to annotate a document during the editing process. When you choose to display all markup, all types of markup and all reviewers' names are selected on the Show menu.
Note Before deleting, it is a good idea to print a document with the markup to keep a record of changes made to a document.
To display tracked changes or comments, click Markup on the View menu.
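You can also check for markup from VBA. The following sketch reports how many tracked changes and comments the active document contains and then displays them. A second, separate procedure accepts every change and deletes every comment. Both procedure names are hypothetical, and the second one is destructive, so consider printing the markup first.

```vba
Sub ReportMarkup()
    ' Sketch only: counts markup in the active document, then shows it.
    With ActiveDocument
        MsgBox .Revisions.Count & " tracked changes, " & _
               .Comments.Count & " comments"
    End With
    ActiveWindow.View.ShowRevisionsAndComments = True
End Sub

Sub RemoveAllMarkup()
    ' Assumption: you want to accept every change rather than reject it.
    With ActiveDocument
        If .Revisions.Count > 0 Then .Revisions.AcceptAll
        ' DeleteAllComments fails on a document without comments, so test first.
        If .Comments.Count > 0 Then .DeleteAllComments
    End With
End Sub
```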
Display Hidden Text
Hidden text in a Word document is character formatting that allows you to show or hide specified text. To view hidden text, click Options on the Tools menu, click the View tab, and then select the Hidden text check box under Formatting marks. Word indicates the hidden text by underlining it with a dotted line.
To remove hidden text from a printed document, click Options on the Tools menu, click the Print tab, and then under Include with document, clear the Hidden text check box. If you plan to distribute the document online, just delete the hidden text as you would delete any other text.
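For a long document, deleting hidden text by hand can be tedious. The following is a minimal sketch that uses Find and Replace to delete all text formatted as hidden. The procedure name is hypothetical, and the sketch assumes that searching only the main body of the document is enough; headers, footers, and footnotes are not covered. It displays hidden text first so that the search can match it, and then restores the previous view setting.

```vba
Sub DeleteHiddenText()
    ' Sketch only: deletes hidden text in the main body of the document.
    Dim blnShown As Boolean

    ' Remember the current view setting, then show hidden text.
    blnShown = ActiveWindow.View.ShowHiddenText
    ActiveWindow.View.ShowHiddenText = True

    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Font.Hidden = True          ' match any text formatted as hidden
        .Text = ""
        .Replacement.Text = ""       ' replace with nothing, that is, delete
        .Forward = True
        .Wrap = wdFindStop
        .Format = True
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With

    ActiveWindow.View.ShowHiddenText = blnShown
End Sub
```

Run this on a copy of the document first, because the deletion covers every hidden run, including text you may have hidden on purpose.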
Remove Previous Versions of a Document
You can specify that you want Word to save one or more versions of your document in the same file. Those versions are then saved as hidden information in the document so that you can retrieve them later. Because these hidden versions are available to others and because they do not remain hidden if the document is saved in another format, you may want to remove the versions before you share the document. There are a couple of ways to do this:
To keep the previous versions, save the version you want to share as a separate document, and then distribute only that document:
- On the File menu, click Versions.
- Click the version of the document you want to save as a separate file.
- Click Open.
- On the File menu, click Save As.
- In the File name box, type a name, and then click Save.
To delete the unwanted versions and then distribute the document, do the following:
- On the File menu, click Versions.
- Click the version of the document you want to delete.
- To select more than one version, hold down CTRL as you click each version.
- Click Delete.
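The deletion steps above can be sketched in VBA as follows. The procedure name is hypothetical, and the sketch assumes that you want to delete every saved version and also stop Word from saving a new version automatically when the document closes.

```vba
Sub DeleteAllVersions()
    ' Sketch only: removes every saved version from the active document.
    Dim i As Long

    With ActiveDocument.Versions
        ' Assumption: automatic versioning should be turned off as well.
        .AutoVersion = wdAutoVersionOff

        ' Count down so that deletions do not shift items not yet visited.
        For i = .Count To 1 Step -1
            .Item(i).Delete
        Next i
    End With
End Sub
```

Save the document afterward so that the file on disk no longer contains the deleted versions.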
Documents may include styles that contain metadata. You can remove these styles or rename them. To do this, follow these steps:
- Open the document that contains the styles.
- On the Format menu, click Style.
- Select the style that you want to delete or rename. Click Delete to delete the style, or click Modify to rename it.
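To find candidate styles quickly, you can list the styles that are not built into Word. The following sketch prints their names to the Immediate window of the Visual Basic Editor. The procedure name is hypothetical, and the sketch only lists styles; it leaves the decision to delete or rename them to you.

```vba
Sub ListCustomStyles()
    ' Sketch only: prints the name of each user-defined style.
    Dim sty As Style

    For Each sty In ActiveDocument.Styles
        If Not sty.BuiltIn Then Debug.Print sty.NameLocal
    Next sty
End Sub
```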
Some features in Word store metadata by default. Clearing these features can remove metadata from your documents.
Fast Save Option
If you save a document with the Allow fast saves check box selected, and then open the document as a text file, the document may contain information that you previously deleted. This happens because a "fast save" appends the changes you make to the end of the document; it does not incorporate the changes (including deleted information) into the document itself.
To remove the deleted information from the document completely, do the following:
- If you opened the document as a text file, close the text file and open the document as a regular Word document.
- On the Tools menu, click Options, click the Save tab, and then clear the Allow fast saves check box.
- On the File menu, click Save.
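The same steps can be sketched in VBA. The procedure name is hypothetical. The sketch marks the document as changed before saving, on the assumption that Word might otherwise skip writing a document that has no unsaved edits.

```vba
Sub SaveWithoutFastSave()
    ' Sketch only: turns off fast saves and writes the document in full.
    Options.AllowFastSave = False

    ' Assumption: force a save even if the document has no pending edits.
    ActiveDocument.Saved = False
    ActiveDocument.Save
End Sub
```

Note that AllowFastSave is an application-wide option, so the change affects every document you save afterward, not only the active one.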
Random Numbers Used When Merging Documents
When you compare and merge documents, Word uses randomly generated unique numbers (GUID) to help keep track of related documents. Word keeps a log of each GUID you generate on your local machine; if you receive a document with a GUID that matches one that you already have, Word displays a 'Merge Changes?' message. However, although these numbers are hidden, they could potentially be used to demonstrate that a document was created on your computer, if the person investigating has administrative access. To stop storing random numbers during the merge process, perform the following:
- On the Tools menu, click Options, and then click the Security tab.
- Clear the Store random number to improve merge accuracy check box.
Important If you choose not to store these numbers, the results of merged documents will be less than optimal, meaning that it may be difficult for Word to determine whether two or more documents are related.
Linked images and other objects in Word documents may contain linking information, such as the path of the linked image or object. You can remove linking information from your document by editing the field codes.
To display field codes, follow these steps:
- On the Tools menu, click Options, and then click the View tab.
- Click to select the Field codes check box, and then click OK.
After field codes are visible, you can check to see if any of them contain identifying information.
To remove the linking information from a linked image or other object, follow these steps:
- Select the linked image or object, or select the field code for the image or object if field codes are visible.
- Press CTRL+SHIFT+F9.
The image or object is now unlinked. When an image or object is unlinked, you cannot edit it.
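To unlink every linked image or object at once, you can loop through the fields in the document. The following is a minimal sketch; the procedure name is hypothetical, and it assumes that only INCLUDEPICTURE and LINK fields in the main body of the document need to be unlinked. It prints each field code to the Immediate window first so that you can see which paths were stored.

```vba
Sub UnlinkLinkedObjects()
    ' Sketch only: unlinks INCLUDEPICTURE and LINK fields in the main body.
    Dim i As Long
    Dim fld As Field

    ' Count down because unlinking removes the field from the collection.
    For i = ActiveDocument.Fields.Count To 1 Step -1
        Set fld = ActiveDocument.Fields(i)
        Select Case fld.Type
            Case wdFieldIncludePicture, wdFieldLink
                Debug.Print "Unlinking: " & fld.Code.Text
                fld.Unlink
        End Select
    Next i
End Sub
```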
Routing Slip Information
If you send a document through e-mail by using a routing slip, Word may attach routing information to the document. Routing information includes the e-mail addresses of all of the document recipients, plus any text that you added to the 'introduction' section. To remove this information from the document, you can save the document in a format that does not retain routing slip information, such as Rich Text Format (RTF) or HTML format.
You can also use the following procedure to remove routing slip information:
- Turn off the Allow fast saves option by using the steps in the Fast Save Option section of this article.
- On the File menu, point to Send to, and then click Routing Recipient.
- Click Clear to remove the routing slip, and then click OK.
- On the File menu, click Save.
The document is now saved without any routing slip information.
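A routing slip can also be removed from VBA by setting the HasRoutingSlip property of the document to False. The following sketch does this and then saves the document. The procedure name is hypothetical, and the sketch assumes that you have already turned off the Allow fast saves option.

```vba
Sub RemoveRoutingSlip()
    ' Sketch only: deletes the routing slip, if any, and saves the document.
    If ActiveDocument.HasRoutingSlip Then
        ActiveDocument.HasRoutingSlip = False
        ActiveDocument.Save
    End If
End Sub
```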
Documents may contain hyperlinks to other documents or Web pages on an intranet or the Internet. This information is contained within the document and stays with the document if you share or copy it.
Note Hyperlinks typically appear as blue underlined text strings.
To delete a single hyperlink from a document manually, right-click the hyperlink, point to Hyperlink on the shortcut menu, and then click Remove Hyperlink.
To delete all hyperlinks in a document, you can use a Microsoft Visual Basic® for Applications (VBA) macro. The following sample VBA macro removes all hyperlinks in a document.
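Sub RemoveHyperlinks()
    Dim objStory As Range
    Dim i As Long
    ' Visit each story (main text, headers, footnotes, and so on).
    For Each objStory In ActiveDocument.StoryRanges
        ' Count down: deleting inside a For Each loop can skip hyperlinks.
        For i = objStory.Hyperlinks.Count To 1 Step -1
            objStory.Hyperlinks(i).Delete
        Next i
    Next objStory
End Sub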
After running this macro, only the link is removed. The text of the hyperlink remains in the document.
To remove all traces of both the hyperlink and the text of the hyperlink from the document, you can use the following sample macro instead:
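Sub RemoveAllHyperlinks()
    Dim objStory As Range
    Dim i As Long
    ' Visit each story (main text, headers, footnotes, and so on).
    For Each objStory In ActiveDocument.StoryRanges
        ' Count down: deleting inside a For Each loop can skip hyperlinks.
        For i = objStory.Hyperlinks.Count To 1 Step -1
            ' Deleting the range removes both the link and its text.
            objStory.Hyperlinks(i).Range.Delete
        Next i
    Next objStory
End Sub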
When you record a VBA macro in Word, the recorded macro begins with a header similar to the following:
'Macro1 Macro
'Macro recorded 3/11/1999 by <User Name>
To remove your name from any macros that you record, perform the following:
- Open the document that contains the macros.
- On the Tools menu, point to Macro, and then click Visual Basic Editor.
- In the project window, double-click the module that contains the macros.
- Remove your name from the recorded macro code.
- Press ALT+Q to return to the program and then click Save on the File menu.
Document variables store information in a document. For example, you can use document variables to preserve macro settings in between macro sessions. They can also contain metadata.
You can use the following statement to display the number of variables in the document named "MyDoc.doc."
...
MsgBox Documents("MyDoc.doc").Variables.Count & " variables"
...
You can use the following procedure to display the name and value of each document variable in the active document:
...
For Each myVar In ActiveDocument.Variables
    MsgBox "Name =" & myVar.Name & vbCr & "Value = " & myVar.Value
Next myVar
...
You can use the following statement to delete a particular document variable from a document:
...
ActiveDocument.Variables.Item("MyVar").Delete
...
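If you want to remove every document variable instead of one named variable, you can loop through the collection. The following is a minimal sketch with a hypothetical procedure name. It assumes that no macro or field in the document still depends on the variables.

```vba
Sub DeleteAllDocumentVariables()
    ' Sketch only: deletes every document variable in the active document.
    Dim i As Long

    With ActiveDocument.Variables
        ' Count down so that deletions do not shift items not yet visited.
        For i = .Count To 1 Step -1
            .Item(i).Delete
        Next i
    End With
End Sub
```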
Following are some general suggestions that you can use to help increase the level of security in your computing environment:
- Information rights management (IRM) functionality helps organizations and employees protect and have greater control over digital information, such as confidential planning documents or financial reports. You can set policies that wield greater control over who can open, copy, print, or forward information.
- Whenever you are not at your computer, help to secure it by using a password-protected screen saver, power-on password, or the Windows Standby (lock) feature.
- If your computer has any shared folders, make sure that you apply passwords to them so that only authorized users can access your shares. For even more security, use user-level access control so that you can control exactly who can access your computer's shares.
- When you delete a file, empty the Recycle Bin immediately. You may want to consider a third party utility that completely erases or overwrites files when you delete them.
- When you back up your data, store the backup copies in a secure location, such as a safe, a safe deposit box, or a locked cabinet.
- Make sure that important documents are password-protected so that only authorized users can open them. Store your passwords in a secure, separate location. Just remember that if you cannot recall a password, there is no way to recover the contents of a password-protected document except with some third party utilities. You can further secure the document by using password encryption; however, you should use a strong encryption key instead of the default encryption.
- Do not distribute documents in electronic form. Instead, print them out and maintain physical security on the copies. Do not use identifying elements such as distinctive fonts, watermarks, logos, or special paper, unless you must (for example, for a presentation).
- E-mail is not anonymous. Do not send a document by e-mail if you are concerned about your identity being attached in any way to the document.
- Do not send a document over the Internet by using either Hypertext Transfer Protocol (HTTP) or File Transfer Protocol (FTP) unless you use Internet security and authentication protocol software. Information sent across these protocols is sent in clear text, which means that it is technically possible (however unlikely) for it to be intercepted.
This article reviewed just a few of the ways of dealing with metadata in your document. For more information on other ways to manage this information, see the following references: