In the following steps, you set up the project and add Visual Basic code to it.
First, add a reference to the DocumentFormat.OpenXml assembly, which contains the DocumentFormat.OpenXml.Packaging namespace. This DLL contains the classes and other programming members that you use to work with Open XML Format files.
On the Project menu, click Show All Files.
In the Solution Explorer, right-click the References node and then click Add Reference.
In the Add Reference dialog box, on the .NET tab, select DocumentFormat.OpenXml, and then click OK.
In the Form1.vb designer, double-click the btnGetFiles button to add the btnGetFiles_Click event procedure to the Form1 class.
Add the following code to the btnGetFiles_Click procedure.
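Dim rootdir As String = txtDirectory.Text
' Reset the change counter for this run.
numChanged = 0
SearchFolders(rootdir)
' Show the total, even if no matching files were found.
txtNumChanged.Text = numChanged.ToString()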
This event procedure runs when the user clicks the Search and Replace button. It assigns the starting directory to a variable, resets the change counter, and then calls the SearchFolders procedure.
Next, before the Public Class Form1 statement, add the following Imports statements. Importing these namespaces lets you refer to their members without fully qualifying each name. The DocumentFormat.OpenXml.Packaging namespace contains the members used for working with Open XML Format files.
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml
Imports System.Windows.Forms
Imports DocumentFormat.OpenXml.Packaging
After the Public Class Form1 statement, add the following class fields.
Private fileType() As String = {"*.docx", "*.docm"}
Private numChanged As Integer
Const wordmlNamespace As String = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
Here, you specify the file name patterns to search for. Instead of hard-coding them, you could add these choices to a combo box on the form and let the user select which document types to search. You also declare a constant that holds the WordprocessingML namespace URI.
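If you do move the extensions into a combo box, one option is to store each choice as a semicolon-separated pattern string and split it into the array that SearchFolders loops over. The console sketch below assumes that convention; the ParsePatterns function and the sample item strings are hypothetical, not part of the original sample.

```vb
Imports System
Imports System.Collections.Generic

Module PatternDemo
    ' Hypothetical helper: turns a combo-box entry such as "*.docx;*.docm"
    ' into the kind of pattern array that SearchFolders iterates over.
    Public Function ParsePatterns(ByVal selection As String) As String()
        Dim result As New List(Of String)
        For Each part As String In selection.Split(";"c)
            ' Drop surrounding spaces and skip empty entries.
            Dim pattern As String = part.Trim()
            If pattern.Length > 0 Then result.Add(pattern)
        Next
        Return result.ToArray()
    End Function

    Sub Main()
        ' Assumed combo-box items, for illustration only.
        Dim items() As String = {"*.docx", "*.docm", "*.docx; *.docm"}
        For Each item As String In items
            Console.WriteLine(item & " -> " & String.Join(", ", ParsePatterns(item)))
        Next
    End Sub
End Module
```

With a combo box, the selected item's text would then take the place of the hard-coded fileType array.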
Next, add the SearchFolders procedure to the project. This procedure searches the root folder, and its subfolders if the user requests it, for document files.
Private Sub SearchFolders(ByVal rootdir As String)
    Try
        ' Search the files in the given directory, once per file pattern.
        For i As Integer = 0 To fileType.Length - 1
            For Each file As String In Directory.GetFiles(rootdir, fileType(i))
                Search_Replace(file)
            Next
        Next
        ' Also recursively search subfolders if requested. List the subfolders
        ' of the folder being searched (rootdir) so each call goes one level deeper.
        If ckbIncludeSub.Checked Then
            For Each subdir As String In Directory.GetDirectories(rootdir)
                SearchFolders(subdir)
            Next
        End If
    Catch
        ' Ignore any errors, such as folders the user cannot access.
    End Try
End Sub
This procedure uses the GetFiles method of the Directory class to get the list of matching files in the current directory. The outer loop runs once for each pattern in the fileType array that you declared in the class fields: the first pass finds .docx files and the second finds .docm files. For each file found, the Search_Replace procedure is called.
Next, if the user selected the Search Subfolders check box on the form (it is selected by default), the GetDirectories method of the Directory class returns the subfolders of the current directory. Each subfolder path is assigned to the subdir variable and passed recursively to the SearchFolders procedure, which repeats the search in that folder. Because each call lists the subfolders of the folder it was given, the recursion continues down through every level of subfolders, not only those directly below the root directory.
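As an alternative to writing the recursion yourself, the Directory.GetFiles overload that takes a SearchOption argument can walk every subfolder level in one call. The console sketch below assumes that approach; FindDocuments is a hypothetical helper name, and the temporary folder tree exists only for the demonstration. One trade-off: with SearchOption.AllDirectories, a single subfolder the user cannot access makes the whole GetFiles call throw (typically UnauthorizedAccessException), whereas the hand-written recursion, with its Try...Catch, abandons only the branch that failed.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO

Module FindDocumentsDemo
    ' Hypothetical helper: returns every file under rootdir that matches one of
    ' the patterns. includeSub plays the role of the ckbIncludeSub check box.
    Public Function FindDocuments(ByVal rootdir As String, ByVal patterns() As String,
                                  ByVal includeSub As Boolean) As List(Of String)
        Dim scope As SearchOption = If(includeSub, SearchOption.AllDirectories, SearchOption.TopDirectoryOnly)
        Dim results As New List(Of String)
        For Each pattern As String In patterns
            ' With AllDirectories, GetFiles descends through all subfolder levels itself.
            results.AddRange(Directory.GetFiles(rootdir, pattern, scope))
        Next
        Return results
    End Function

    Sub Main()
        ' Build a small throwaway tree: root\a.docx and root\sub\deeper\b.docm.
        Dim root As String = Path.Combine(Path.GetTempPath(), "OpenXmlSearchDemo")
        Dim deep As String = Path.Combine(root, "sub", "deeper")
        Directory.CreateDirectory(deep)
        File.WriteAllText(Path.Combine(root, "a.docx"), "")
        File.WriteAllText(Path.Combine(deep, "b.docm"), "")

        Dim patterns() As String = {"*.docx", "*.docm"}
        Console.WriteLine("Top folder only: " & FindDocuments(root, patterns, False).Count)
        Console.WriteLine("All subfolders:  " & FindDocuments(root, patterns, True).Count)
    End Sub
End Module
```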
Now add the Search_Replace procedure to the project. This procedure searches the text of each Open XML Format file for the old term and replaces it with the new term.
Private Sub Search_Replace(ByVal file As String)
    Dim oldText As String = txtOldText.Text
    Dim newText As String = txtNewText.Text
    ' String.Replace rejects an empty search term, so there is nothing to do.
    If oldText.Length = 0 Then Return
    ' Open the document with read/write access. Disposing the document
    ' at End Using saves the changes and closes the package.
    Using wdDoc As WordprocessingDocument = WordprocessingDocument.Open(file, True)
        ' Manage namespaces to perform XPath queries.
        Dim nt As NameTable = New NameTable
        Dim nsManager As XmlNamespaceManager = New XmlNamespaceManager(nt)
        nsManager.AddNamespace("w", wordmlNamespace)
        ' Load the XML in the main document part into an XmlDocument instance.
        Dim xdoc As XmlDocument = New XmlDocument(nt)
        Using partStream As Stream = wdDoc.MainDocumentPart.GetStream()
            xdoc.Load(partStream)
        End Using
        ' Get the text nodes in the document.
        Dim nodes As XmlNodeList = xdoc.SelectNodes("//w:t", nsManager)
        Dim nodeText As String
        ' Make the swap.
        For Each node As XmlNode In nodes
            nodeText = node.InnerText
            If (InStr(nodeText, oldText) > 0) Then
                ' Replace returns a new string, so write it back to the w:t element.
                node.InnerText = nodeText.Replace(oldText, newText)
                ' Increment the occurrences counter.
                numChanged += 1
            End If
        Next
        ' Write the changes back to the main document part.
        Using partStream As Stream = wdDoc.MainDocumentPart.GetStream(FileMode.Create)
            xdoc.Save(partStream)
        End Using
    End Using
    ' Display the number of change occurrences.
    txtNumChanged.Text = numChanged.ToString()
End Sub
First, the file name is passed into the procedure. Then, the file is opened as an Open XML Format WordprocessingML document with read and write access. Next, an XmlNamespaceManager maps the w prefix to the WordprocessingML namespace URI. The XPath query uses this prefix to find the text elements in the document.
Then, an XmlDocument instance is created, and the contents of the main document part of the Open XML Format file are loaded into it. In a very simple document, the XML in the main document part has the following structure.
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t> </w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
Where
w:p – Denotes a paragraph.
w:r – Denotes a run, a contiguous region of text within a paragraph that shares the same formatting properties.
w:t – Contains the text.
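Because a paragraph can hold several runs, the text of one paragraph is often split across several w:t elements, for example when part of a sentence has different formatting. The console sketch below joins the w:t text of each w:p to show that structure; the hand-written two-run paragraph is assumed sample input, and GetParagraphTexts is a hypothetical helper name. It also illustrates why Search_Replace, which tests each w:t element on its own, does not find a term that straddles two runs.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module ParagraphTextDemo
    Const wordmlNamespace As String = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

    ' Hypothetical helper: returns the combined text of each w:p element.
    Public Function GetParagraphTexts(ByVal xml As String) As List(Of String)
        Dim xdoc As New XmlDocument()
        xdoc.LoadXml(xml)
        Dim nsManager As New XmlNamespaceManager(xdoc.NameTable)
        nsManager.AddNamespace("w", wordmlNamespace)

        Dim texts As New List(Of String)
        For Each para As XmlNode In xdoc.SelectNodes("//w:p", nsManager)
            ' Relative path: the w:t elements inside this paragraph's runs.
            Dim combined As String = ""
            For Each t As XmlNode In para.SelectNodes("w:r/w:t", nsManager)
                combined &= t.InnerText
            Next
            texts.Add(combined)
        Next
        Return texts
    End Function

    Sub Main()
        ' Assumed sample: one paragraph whose text is split across two runs.
        Dim xml As String = "<w:document xmlns:w=""" & wordmlNamespace & """><w:body>" &
            "<w:p><w:r><w:t>Open</w:t></w:r><w:r><w:t>XML</w:t></w:r></w:p>" &
            "</w:body></w:document>"
        For Each text As String In GetParagraphTexts(xml)
            Console.WriteLine(text)
        Next
    End Sub
End Module
```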
Thus, the following XPath expression selects all of the text (w:t) elements in the main document part, wherever they appear.
nodes = xdoc.SelectNodes("//w:t", nsManager)
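To try the query outside of Word, the console sketch below loads a hand-written fragment like the one above (an assumed sample, with the w prefix declared) into an XmlDocument and runs the same //w:t expression. It also shows why nsManager is passed in: a prefix in an XPath expression is resolved through the XmlNamespaceManager, not through the prefixes declared in the document, so calling SelectNodes("//w:t") without one throws an XPathException. CountTextNodes is a hypothetical helper name.

```vb
Imports System
Imports System.Xml
Imports System.Xml.XPath

Module XPathDemo
    Const wordmlNamespace As String = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

    ' Hypothetical helper: counts the w:t elements in a WordprocessingML string.
    Public Function CountTextNodes(ByVal xml As String) As Integer
        Dim nt As New NameTable()
        Dim nsManager As New XmlNamespaceManager(nt)
        ' Map the "w" prefix used in the XPath query to the WordprocessingML namespace.
        nsManager.AddNamespace("w", wordmlNamespace)
        Dim xdoc As New XmlDocument(nt)
        xdoc.LoadXml(xml)
        Return xdoc.SelectNodes("//w:t", nsManager).Count
    End Function

    Sub Main()
        Dim xml As String = "<w:document xmlns:w=""" & wordmlNamespace & """><w:body>" &
            "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t>World</w:t></w:r></w:p>" &
            "</w:body></w:document>"
        Console.WriteLine("w:t elements: " & CountTextNodes(xml))

        ' Without a namespace manager, the w prefix in the query cannot be resolved.
        Dim plain As New XmlDocument()
        plain.LoadXml(xml)
        Try
            plain.SelectNodes("//w:t")
        Catch ex As XPathException
            Console.WriteLine("XPathException: " & ex.Message)
        End Try
    End Sub
End Module
```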
The next section in the procedure loops through each of the text nodes and uses the InStr function to search for an occurrence of the original term. If the term is found, it is replaced with the new term, and the updated text is written back to the w:t element.
For Each node As XmlNode In nodes
    nodeText = node.InnerText
    If (InStr(nodeText, oldText) > 0) Then
        node.InnerText = nodeText.Replace(oldText, newText)
        numChanged += 1
    End If
Next
Next, a counter of change occurrences is incremented; it counts the text nodes in which a replacement was made, not the individual occurrences of the term. Once all of the nodes have been searched, the updated XML is saved back to the main document part. Finally, the txtNumChanged text box displays the number of changes made.
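The swap-and-count loop can be exercised without a .docx file by running it against an in-memory XmlDocument. In the sketch below, ReplaceInTextNodes is a hypothetical stand-in for the loop in Search_Replace, with the old and new terms passed as parameters instead of being read from txtOldText and txtNewText; the sample XML and terms are assumptions for illustration.

```vb
Imports System
Imports System.Xml

Module ReplaceDemo
    Const wordmlNamespace As String = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

    ' Hypothetical stand-in for the Search_Replace loop: replaces oldText with
    ' newText in every w:t element and returns the number of nodes changed.
    Public Function ReplaceInTextNodes(ByVal xdoc As XmlDocument, ByVal oldText As String,
                                       ByVal newText As String) As Integer
        ' String.Replace throws on an empty search term, so change nothing.
        If String.IsNullOrEmpty(oldText) Then Return 0
        Dim nsManager As New XmlNamespaceManager(xdoc.NameTable)
        nsManager.AddNamespace("w", wordmlNamespace)
        Dim changed As Integer = 0
        For Each node As XmlNode In xdoc.SelectNodes("//w:t", nsManager)
            Dim nodeText As String = node.InnerText
            ' Contains is equivalent to InStr(nodeText, oldText) > 0 here.
            If nodeText.Contains(oldText) Then
                ' Replace returns a new string; assign it back to the element.
                node.InnerText = nodeText.Replace(oldText, newText)
                changed += 1
            End If
        Next
        Return changed
    End Function

    Sub Main()
        Dim xdoc As New XmlDocument()
        xdoc.LoadXml("<w:document xmlns:w=""" & wordmlNamespace & """><w:body>" &
                     "<w:p><w:r><w:t>Contoso and Contoso</w:t></w:r></w:p>" &
                     "<w:p><w:r><w:t>Fabrikam</w:t></w:r></w:p>" &
                     "</w:body></w:document>")
        Console.WriteLine("Nodes changed: " & ReplaceInTextNodes(xdoc, "Contoso", "Northwind"))
        Console.WriteLine(xdoc.OuterXml)
    End Sub
End Module
```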
As you can see, this process is fairly straightforward and is made simple by the classes in the DocumentFormat.OpenXml.Packaging namespace.
Finally, add the following procedure to the project. This event procedure is called when the user clicks the Close Form button.
Private Sub btnClose_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnClose.Click
Close()
End Sub