Skip navigation

Taking Advantage of LINQ and XML in Microsoft Office 2007 - 30 Jun 2009

 

 

Beth Massi, Microsoft Corporation
http://blogs.msdn.com/bethmassi
 
Many applications today need to take advantage of Microsoft Office which provides a rich Component Object Model (COM). However, processing these documents at a large scale often requires manipulation of the file formats directly which can be challenging. With the release of Microsoft Office 2007 developers can now work with a much simpler, standard, XML format called Open XML which opens the door for many types of applications that cannot or do not want to require Microsoft Office be installed to run.  And by harnessing the great language features of Visual Basic 9 in Visual Studio 2008, you can be much more productive when working with XML. This article will show you how to use the Open XML Format SDK and Visual Basic’s powerful and simple implementation of LINQ to XML to work with these new document formats much easier than ever before.
 
LINQ to XML in Visual Basic
Visual Basic 9 in Visual Studio 2008 has a set of language features that allows developers to work with XML in a much more productive way when using LINQ to XML through what’s called XML literals and XML axis properties. These features allow you to use a familiar, convenient syntax for working with XML in your Visual Basic code. LINQ to XML is an in-memory XML programming API specifically designed to leverage the LINQ framework. Even though you can call the LINQ APIs directly, only Visual Basic allows you to declare XML literals and directly access XML axis properties.
 
For instance, you can use an XML literal directly in the Visual Basic editor like any other data type:
 
Dim myXml = <?xml version="1.0"?>
            <root>
                <node>This is XML</node>
            </root>
 
Here, the myXML variable on the left-hand side of the expression is inferred by the compiler as an XDocument object, one of the main objects in the LINQ to XML API that represents an XML document.
 
XML literals alone would not be that interesting without embedded expressions. Embedded expressions allow you to write any Visual Basic code and have it evaluated directly in the literal using the <%= %> syntax. Any Visual Basic code can be used and you are not limited to the number of embedded expressions in your XML literals. You can also nest embedded expressions any number of levels deep. These expressions are compiled, not script, so you can benefit from the compile-time syntax checking and editor experience just like you’re accustomed to when writing programs in Visual Basic.
Take, for example, this LINQ query inside an embedded expression which produces XML that lists all the processes running on the machine:
 
 Dim myXml = <processes>
                <%= From p In System.Diagnostics.Process.GetProcesses() _
                    Select <process id=<%= p.Id %>>
                               <name><%= p.ProcessName %></name>
                               <threads><%= p.Threads.Count %></threads>
                           </process> %>
            </processes>
 
Notice this time I did not explicitly specify the XML declaration so Visual Basic will infer this as an XElement object instead. XML trees are made up of XElement objects which make them the fundamental class of the LINQ to XML API.
 
Visual Basic not only makes it easy to create XML but also to navigate and query it using another language feature called axis properties. Using a special syntax you can easily query the attribute, child and descendants axis. Use the attribute axis property to access the XML attributes in an XML element. Use the child axis property to access just the XML child elements in a given XML element.  Use the descendant axis property to access all XML elements that have a specified name that are contained in a specified XML element regardless of the level.
 
For instance, given the XML output of the previous example, say I wanted to transform the document to show only the process ID’s as elements:
 
Dim myIDs = <IDs>
               <%= From element In myXml.<process> _
                   Select <id><%= element.@id %></id> %>
            </IDs>
 
Here I’m using the child axis property to navigate to the process elements indicated by the dot (.) syntax and then selecting the process ID using the attribute axis as indicated by the @ syntax. The descendants axis property is indicated by a three dot (…) syntax. The child axis performs much better than the descendant axis so it is recommended that you use the child axis unless you really need to search for the element name in all levels of the tree. In the example I could have used the descendants axis property to achieve the same results.
 
Another Visual Basic XML feature is IntelliSense. When an XML schema is imported into the project, Visual Basic will enable IntelliSense over the XML when using axis properties. So when you type the dot (.) or the @ sign on an axis property you get IntelliSense displayed for the XML element based on the schema. Additionally, Visual Basic supports XML Namespaces in the Imports statement so you can work with multiple schema sets in a project.
 
This is just a brief explanation of some very powerful features in Visual Basic in order for you to better understand the code in the rest of this article. For a complete whitepaper on LINQ to XML in Visual Basic from members of the team that built the feature please see Sharpening Your Axis with Visual Basic 9 in the Jul/Aug 2008 issue of CoDe Magazine(http://www.code-magazine.com/Article.aspx?quickid=0807061) or visit the MSDN library.
 
 
The Open XML Format in Office 2007
 
Now that you understand the language syntax for working with XML, let us explore the Open XML Format in Microsoft Office 2007. Starting with the 2007 Microsoft Office system, Excel, Word, and PowerPoint have adopted an XML-based file format called Office Open XML. Using an XML file format not only improves security, file and data management, and data recovery but also makes it much easier to interoperate with line-of-business systems. Any development environment that supports ZIP file access and XML can access and manipulate the new file format directly. Additionally, many applications that traditionally had to work within the Office Component Object Model (COM) to manipulate documents are now able to transform the XML directly without the need to have Microsoft Office installed on the machine to run. Users of previous versions of Office can still work with this new format by installing the Office Compatibility Pack from the Microsoft Download Center (http://www.microsoft.com/downloads/details.aspx?FamilyID=941b3470-3ae9-4aee-8f43-c6bb74cd1466&displaylang=en).
 
Office Open XML is a fully editable file format that uses the Open Packaging Convention (OPC).  OPC defines a structured means to store application data together with related resources based on the simple, parts-based, compressed ZIP file format. Each file is comprised of a collection of any number of parts and it is this collection that defines the Office 2007 document. To easily demonstrate this, take a Word, Excel, or PowerPoint 2007 document and rename it with a .ZIP extension and take a look inside.
 
As you browse the contents you can see that most parts are XML files. These XML files describe application data, metadata, and even your own custom data which is stored inside the ZIP package. Other non-XML parts may also be included within the package, including images or OLE objects embedded in the document.  There are also parts that describe the relationships between other parts. While the parts make up the content of the file, the relationships describe how the pieces of content work together. Depending on the type of file, whether it is Word, Excel, or PowerPoint, the contents of the package are different. For instance, in a Word document the only required part is the main document part, word\document.xml.
 
An easier way to browse the contents of an Office Open XML file, especially if you are a developer already working in Visual Studio, is to install the VSTO Power Tools available from the Microsoft Download Center (http://www.microsoft.com/downloads/details.aspx?FamilyID=46B6BF86-E35D-4870-B214-4D7B72B02BF9&displaylang=en). One of the tools included here is a Visual Studio Add-in called the Open XML Package Editor and it allows you to manipulate Open XML and XPS packages directly in Visual Studio. It allows you to navigate, open, and edit the XML parts in the editor as shown in Figure 1.
 
The Open XML Package Editor provides a nice UI for developers wanting to inspect the packages or make quick updates to files but what a developer normally wants to do is to manipulate these programmatically. Now that you have a basic understanding of Office Open XML and LINQ to XML I will demonstrate how you can easily do this.
 
For more information on the Office Open XML format please read Introducing the Office (2007) Open XML File Formats in the MSDN Library (http://msdn.microsoft.com/en-us/library/aa338205.aspx).
 
To learn more about how to take advantage of the Open Packaging Convention within the .NET Framework see the article A New Standard for Packaging Your Data in the August 2007 issue of MSDN Magazine (http://msdn.microsoft.com/en-us/magazine/cc163372.aspx).
 
Getting Started with the Open XML Format SDK
You can access the parts of an Office Open XML file programmatically using managed code through the System.IO.Packaging classes that were introduced in .NET Framework 3.0. This namespace contains classes that let you work with generic packaging files, however there is no typed access to the specific parts that compose Microsoft Office 2007 documents. This is where the Open XML Format SDK comes in really handy.
 
The SDK, which can be downloaded from the Microsoft Download Center (http://www.microsoft.com/downloads/details.aspx?FamilyId=C6E744E5-36E9-45F5-8D8C-331DF206E0D0&displaylang=en), simplifies the manipulation of Open XML packages by providing an API over the System.IO.Packaging classes. It provides strongly typed part classes to manipulate Open XML documents easily. At the time of this writing version 2.0 has been available since September 2008 as a Community Technology Preview (CTP) but the plan is to release in spring 2009. The main features over version 1.0 are that all XML parts can be accessed through classes as well as providing deeper LINQ support. To support LINQ, version 2.0 of the SDK requires the .NET Framework 3.5.
 
When you install the SDK it places an assembly into the GAC called DocumentFormat.OpenXml.dll. Add a reference to this from your Visual Studio projects in order to use it. When deploying your application make sure you include this assembly in the deployment package. If you are using x-copy deployment, select the reference in the Solution Explorer (show all files) and set the property Copy Local equal to True in order to output the assembly with your application when you build. 
 
Let’s take a look at some examples of using the SDK to access document parts. To access the main document part in a Word document, first import the DocumentFormat.OpenXml .Packaging namespace and then you can write the following:
 
Dim writable As Boolean = False
'Open the WordprocessingDocument as read only
Using wordDoc As WordprocessingDocument = _
      WordprocessingDocument.Open("MyDocument.docx", writable)
    'Access the main document part (word\document.xml)
    Dim mainPart As MainDocumentPart = wordDoc.MainDocumentPart
 
    '...
End Using

Notice that the wordDoc variable is wrapped in a Using statement so that you can be sure that the document is closed when you are finished reading from it. Similarly you can access the PresentationPart and the WorkbookPart of PowerPoint and Excel files. The difference with these types of documents is that sheet and slide parts are not required so to get at these parts you need to enumerate the WorksheetParts collection in the case of Excel, and the SlideParts collection in the case of PowerPoint. For example, to access the first sheet of a workbook you write the following:
 
Dim writable As Boolean = False
'Open the SpreadsheetDocument as read only
Using spreadSheet As SpreadsheetDocument = _
      SpreadsheetDocument.Open("MyWorkbook.xlsx", writable)
 
    Dim workBook As WorkbookPart = spreadSheet.WorkbookPart
 
    'Access the first sheet if one exists using the LINQ extension method
    Dim sheet1 As WorksheetPart = workBook.WorksheetParts.FirstOrDefault()
    If sheet1 IsNot Nothing Then
        Console.WriteLine(sheet1.Uri)
    End If
 
End Using
 
Let’s take another example of working with the object model exposed by the SDK. In the Word document in Figure 1. it shows that there are five linked parts to the main document part as indicated by the links shown under the document.xml node in the tree view. If you want to create an XML document that describes the parts of the main document programmatically you write the following:
 
'Open the WordprocessingDocument as read only
Dim writable = False
Using WordDoc = WordprocessingDocument.Open("MyDocument.docx", writable)
    'Access the main document part (word\document.xml)
    Dim mainPart = WordDoc.MainDocumentPart
 
    'Create an XML document of information about the sub-parts
    Dim partsXML = <parts>
                   <%= From p In mainPart.Parts _
                       Select <part id=<%= p.RelationshipId %>>
                                <uri><%= p.OpenXmlPart.Uri %></uri>
                                <type><%= p.OpenXmlPart.ContentType %></type>
                              </part> %>
                   </parts>
 
    partsXML.Save("parts.xml")
End Using
 
Note in this example I’m taking advantage of the compiler’s ability to infer types for local variables, a feature of Visual Basic 9 (Option Infer ON in addition to Option Explicit ON, Option Strict ON) which means I do not have to explicitly declare the types of variables, the compiler does it automatically. This example uses XML literals and an embedded LINQ expression to create a simple XML document of the object properties of the parts of the main document. You can use this technique to inspect and create XML content from any object model in .NET. However the real value of using the Open XML SDK and LINQ to XML is realized when we start working with the actual XML content of these parts.
 
Manipulating XML in Document Parts
In order to access the actual XML in a part in an Open XML package you use a stream reader and de-serialize the content into an XDocument. You do this by first importing System.IO and System.Xml namespaces and then you can write the following:
 
Dim writable = False
Dim myXML As XDocument
Using wordDoc = WordprocessingDocument.Open("MyDocument.docx", writable)
   Dim mainPart = wordDoc.MainDocumentPart
 
   Using xr As New StreamReader(mainPart.GetStream())
        'Load the MainDocument part's XML
        myXML = XDocument.Load(xr)
    End Using
    '...
End Using
 
However it would be much better to write this as an extension method so that you could easily access the XDocument like any other method on the parts. Additionally, to prevent from unnecessarily de-serializing the XML content, once you have obtained the XDocument you can save it in an annotation on the part. If the annotation already exists, then de-serializing can be avoided and just the annotation is returned. To create an extension method in Visual Basic, first create a Module and write your method. Then attribute it with the System.Runtime.CompilerServices.Extension attribute. The first parameter to an extension method is the type you are extending. For example, here I am creating an extension method called GetXDocument that extends the OpenXmlPart class:
 
Imports System.Runtime.CompilerServices
Imports DocumentFormat.OpenXml.Packaging
Imports System.IO
Imports System.Xml
 
Module MyExtensions
    <Extension()> _
    Function GetXDocument(ByVal part As OpenXmlPart) As XDocument
        ' Create an extension method that first checks the
        '  annotation on the part for the XDocument.
        Dim xdoc As XDocument = part.Annotation(Of XDocument)()
        If (xdoc IsNot Nothing) Then
            Return xdoc
        End If
 
        Using streamReader As New StreamReader(part.GetStream())
            'Load the XML content into the XDocument
            xdoc = XDocument.Load(XmlReader.Create(streamReader))
            'An annotation is created when we first access the stream.
            part.AddAnnotation(xdoc)
        End Using
 
        Return xdoc
    End Function
End Module

You can easily use this extension method in your program to access the XML content as shown in the following example:
 
Dim writable = False
Using wordDoc = WordprocessingDocument.Open("MyDocument.docx", writable)
    'Use an extension method to retrieve the XML content of the part
    Dim myXML = wordDoc.MainDocumentPart.GetXDocument()
    '...
End Using
 
 
Now that there is an easy way to get the XDocument of the parts you can take advantage of the techniques I showed in the beginning of the article in order to query and manipulate XML using axis properties and XML literals. When querying the XML document parts you need to be aware of the XML schemas and namespaces that are being used in the XML. A Word document follows the word processing markup language (WordprocessingML) so you will need to import this XML namespace at the top of your program if you are navigating and querying the XML.
For instance, if you wanted to replace the first line of text in a Word document you would first import the XML namespace:
 
Imports <xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
Imports System.IO
Imports System.Xml
Imports DocumentFormat.OpenXml.Packaging
 
And then you can write the following code:

Dim writable = True
'Open the document for read-write access
Using wordDoc = WordprocessingDocument.Open("MyDocument.docx", writable)
    'Load the MainDocument part's XML
    Dim mainPart = wordDoc.MainDocumentPart
    Dim myXML = mainPart.GetXDocument()
 
    'Find the first line of text in the document:
    Dim element = (From item In myXML...<w:t>).FirstOrDefault()
    If element IsNot Nothing Then
        'Replace the value of the element:
        element.Value = <s>This is <%= Environment.UserName %>'s document</s>.Value
    End If
 
    Using sw As New StreamWriter(mainPart.GetStream(FileMode.Create))
        'Save the modified XML back to the MainDocument part
        myXML.Save(sw)
    End Using
End Using

If you forgot to import the XML namespace in this example then the compiler would have reported an error on the <w:t> element in the LINQ query notifying you that the XML namespace “w” was not defined. Also note that I could have created another extension method for serializing the XML back into the document part instead of doing it here.
 
These are simple examples used to demonstrate how easily accessible XML is from Office documents when using the Open XML Format SDK. Next I will show how you can use what you have learned so far and apply that to your business applications.  
 
Working with Custom XML Data in Documents
As mentioned earlier you can store your own custom XML in Office Open XML packages. This makes it easy to store your own structured data inside documents which may need to interact with line-of-business systems. Let’s take an example where you want to provide a Word 2007 document to users which they can use to collaborate with customers on a purchase order. You want to capture structured order data inside this document and save it back to the database.
 
One way to store data in a Word 2007 document is by using content controls. These allow you to define specific data areas/fields in the document which are then bound to XML that is placed inside the document. When users enter data into these areas of the document the data is stored as a custom XML part. This two-way data binding between the controls and the custom XML also allows you to replace the custom XML parts programmatically, as in generating sales contracts or other types of data-bound documentation, which users can then edit further. You can use Visual Studio to create content controls and map them to XML or you can use Word itself.
 
First create a purchase order template and lay out the content controls on the document. When you open Microsoft Word 2007 select the Developer tab and you will see the Controls section. There you can choose which types of controls to lay out on the document. Click the properties button to assign a friendly title and tag to the control as shown in Figure 2.
 
After the content controls are created you map the custom XML to these controls. For this example I have created a simple XML document that collects the CustomerID, OrderDate, Shipper and up to 10 order detail line items specifying the ProductName and Quantity. It is also very important to declare an XML namespace for your custom XML; you will see why shortly. The custom XML document looks like the following (for clarity, not all OrderDetail elements are shown):
 
<OrderEntry xmlns="urn:microsoft:examples:orderentry">
 <CustomerID />
 <OrderDate />
 <Shipper />
 <OrderDetails>
    <OrderDetail>
      <ProductName />
      <Quantity />
    </OrderDetail>
    <OrderDetail>
      <ProductName />
      <Quantity />
    </OrderDetail>
    <OrderDetail>...
 </OrderDetails>
</OrderEntry>

In order to easily bind the custom XML to the content controls you can use a tool called the Word 2007 Content Control Toolkit that you can download from CodePlex (http://www.codeplex.com/dbe).  This tool lets you drag elements from your custom XML onto the content controls to visually set the data binding as shown in Figure 3.
 
Once you data bind the content controls you can fill out some of the data on the purchase order.  For this example I will use the infamous Northwind Database that you can download from the Microsoft Download Center (http://www.microsoft.com/downloads/details.aspx?FamilyID=06616212-0356-46A0-8DA2-EEBC53A68034&displaylang=en). After you fill out the content controls in Word and save it, open the document in Visual Studio to view the custom XML with the Open XML Package Editor I demonstrated previously. This time you will see a new relationship from the main document part to your custom XML part. If you double-click on the custom XML part it will open in the XML editor and display all the data you entered. Now you will write a program to parse this data and update the database.
 
In order to easily access the data in this example, add a new item to the project and select LINQ to SQL classes. LINQ to SQL makes it easy to work with simple data models where each table in the database maps to a single class in code. From the Server Explorer drag the Customers, Orders, Order Details, Products and Shippers tables onto the designer. This will automatically infer the associations between the classes it generates based on the relations in the database. 
 
Now that the purchase order is created and you have an easy way to access data, you can now import a schema into the project that describes the OrderEntry custom XML. This will enable IntelliSense as you write the LINQ to XML queries. Visual Basic can infer schema sets from XML data using a new item template that was introduced in Visual Studio 2008 Service Pack 1. Add a new item and select “XML to Schema”. This will open a dialog that lets you specify where the XML data resides. You can just copy the OrderEntry custom XML to the clipboard and then paste it into this dialog as shown in Figure 4.
Now you can write the following code to query the OrderEntry data from the custom XML part:
 
Imports DocumentFormat.OpenXml.Packaging
Imports System.IO
Imports <xmlns="urn:microsoft:examples:orderentry">
 
Module Module1
    'This is the DataContext for our LINQ to SQL classes
    ' used to quickly access the Northwind database tables.
    Private db As New NorthwindDataContext
 
    Sub Main()
        Dim msg As String = ""
        Dim docXML As XElement = Nothing
        Try
            Using wordDoc = _
                  WordprocessingDocument.Open("MyTestOrder.docx", False)
 
                'Get the main document part (document.xml)
                Dim mainPart = wordDoc.MainDocumentPart
 
                'Find the order data custom XML part
                For Each part In mainPart.CustomXmlParts
                    'This will only return the customXML document that
                    ' is in the XML namespace we imported above.
                    docXML = part.GetXDocument.<OrderEntry>.FirstOrDefault()
                    If docXML IsNot Nothing Then
                        Exit For
                    End If
                Next
            End Using
 
            If docXML Is Nothing Then
                msg = "This document does not contain order entry data."
                Throw New InvalidOperationException(msg)
            End If
 
            'Grab the order data fields from the XML
            Dim customerID = docXML.<CustomerID>.Value.Trim()
            Dim orderDate = docXML.<OrderDate>.Value.Trim()
            Dim shipperName = docXML.<Shipper>.Value.Trim()
 
            If customerID <> "" AndAlso IsDate(orderDate) Then
                'Find the customer in the database
                Dim customer = (From c In db.Customers _
                                Where c.CustomerID = _
                                      customerID).FirstOrDefault()
 
                If customer IsNot Nothing Then
                    'Add a new order for this customer based
                    ' on the CustomXML data
                    Dim order As New Order With _
                                   {.Customer = customer, _
                                    .OrderDate = CDate(orderDate)}
 
                    Dim shipper = (From s In db.Shippers _
                                   Where s.CompanyName.Trim.ToLower = _
                                      shipperName.ToLower).FirstOrDefault()
 
                    If shipper IsNot Nothing Then
                        order.Shipper = shipper
                    End If
 
                    For Each item In docXML.<OrderDetails>.<OrderDetail>
                        'Grab order details data fields
                        Dim productName = item.<ProductName>.Value.Trim()
                        Dim quantity = item.<Quantity>.Value.Trim()
 
                        If productName <> "" AndAlso IsNumeric(quantity) Then
                            'Add a new Detail for each product found
                            Dim product = (From p In db.Products _
                                        Where p.ProductName.Trim.ToLower = _
                                        productName.ToLower).FirstOrDefault()
 
                            If product IsNot Nothing Then
                                Dim detail As New Order_Detail With _
                                       {.Order = order, _
                                        .Product = product, _
                                        .Quantity = CShort(quantity), _
                                        .UnitPrice = CDec(product.UnitPrice)}
                            End If
                        End If
                    Next
                    'Save the data back to the database
                    db.SubmitChanges()
                    msg = "Document data parsed and saved successfully."
                    Console.WriteLine(msg)
                End If
            End If
 
        Catch ex As Exception
            Console.WriteLine(ex.ToString)
        End Try
        Console.ReadLine()
    End Sub
End Module
 
There are a few important things to point out in this code. Notice that there is an Imports statement that specifies that the default namespace should use the orderentry schema now in the project. This is necessary for the custom part query to be able to return the OrderEnrty custom XML. Additionally, as I wrote the code, Visual Basic provided full IntelliSense on the XML axis properties because a schema was added to the project. Finally, realize that in trying to keep this example concise, I did not write code to check for duplicate submissions.
 
As you can see most of the code is around the data access and not the LINQ to XML queries themselves. This is the beauty of using LINQ to XML with the Open XML SDK; the technologies are easy to work with. Let’s take one final example that is very common in business applications; creating documents from database data.
 
Creating Documents from Database Data
 
Another common requirement in business applications is the ability to generate documents from data in a database, similar to a mail merge. You can do this easily in Microsoft Word but what if you needed to automate this process in a batch job or part of a workflow? What if there was a need to produce documents from data entered from a high volume website? Using Microsoft Word in this case would not be a good choice because Word cannot process the data fast enough and is not multi-threaded. Instead it is much better to create the document formats directly.
 
You can use a similar technique to the one shown previously by creating a template with content controls mapped to a custom XML part and then replacing the XML by an automated process. However in many cases you do not need to collect any additional information nor do you want users to see content controls in their documents as is the case when generating letters to customers. Instead you want to dynamically create the XML for the entire main document part. This technique also gives you the ability to replace any of the text in the document.
 
To illustrate this, the next example will generate letters from the Northwind database for customers where their orders have shipped. The letter notifies them of the delivery as well as the employee contact information that entered the order. I will also include a picture of the employee in the document in order to show how you work with images in an Open XML package.
 
First create a template document that contains the text and a sample picture you want to include in the letter. This time you do not need to use any special controls, just type the text you want and enter placeholders for the database fields as shown in Figure 5.
 
Like the previous example, this one will also use LINQ to SQL classes to easily extract data from the Northiwind database. Add a new item to your project and select LINQ to SQL classes. This time drag Customers, Employees and Orders tables onto the designer to create the object model.
Next create a simple class called Letter that has three properties, CustomerID As String, Document As XDocument, and EmployeePhoto As Byte(). A collection of these classes will be created from a LINQ query. The Document property will hold the XML contents of the main document part, and embedded expressions will be used to merge the database fields into this XML.
First write the query to gather the list of customer orders that have shipped and create the letter objects. Note that I have modified some of the data in Northwind so that the order ship dates are in this century:
Dim db As New NorthwindDataContext
 
'Create a query that will create Letter objects for each
' of the customers in Northwind that have shipping orders:
Dim letters = _
    From Order In db.Orders _
    Join Employee In db.Employees _
        On Order.EmployeeID Equals Employee.EmployeeID _
    Where Order.OrderDate IsNot Nothing AndAlso _
          Order.ShippedDate IsNot Nothing AndAlso _
          Order.ShippedDate.Value.Date >= #1/1/2008# _
    Let OrdDate = Order.OrderDate.Value.ToShortDateString _
    Let ShipDate = Order.ShippedDate.Value.ToShortDateString _
    Select New Letter With { _
          .CustomerID = Order.Customer.CustomerID, _
          .EmployeePhoto = If(Employee.Photo IsNot Nothing, _
                              Employee.Photo.ToArray, _
                              Nothing), _
          .Document = <?xml version="1.0"?>
                      <doc>.....</doc>}
 
Now you need to open the letter template in the Open XML Package Editor as shown previously and copy to the clipboard the XML contents from the word\document.xml part. Then paste it into the query as the value of the Document property (be careful not to delete the last curly brace that closes the With {} clause). Then you can go through the XML literal and embed database fields from the query into any place in the document like so (the entire XML literal is omitted for clarity):
 
<w:r>
   <w:t xml:space="preserve">If you have any issues please contact </w:t>
</w:r>
<w:proofErr w:type="spellStart"/>
<w:r>
   <w:t><%= Employee.FirstName & " " & Employee.LastName %></w:t>
</w:r>
 
The relationship ID of the picture in this example is rId4 which you can see near the bottom of the XML literal. Now you can use the following code to generate the letters into a Letter directory and replace the main document part and the picture. The letters are named after the customer’s ID in this example but any unique identifier could be used.
 
'Place the letters in a separate directory
Dim sourceFile = CurDir() & "\LetterTemplate.docx"
Dim letterDir = CurDir() & "\Letters\"
My.Computer.FileSystem.CreateDirectory(letterDir)
 
For Each letter In letters
    Dim customerFile = letterDir & letter.CustomerID & ".docx"
    My.Computer.FileSystem.CopyFile(sourceFile, customerFile, True)
 
    'Open the Document using the Open XML SDK
    Using wordDoc = WordprocessingDocument.Open(customerFile, True)
        'Get the main document part (document.xml)
        Dim mainPart = wordDoc.MainDocumentPart
 
        'Replace the document part with our XML
        Using sw As New StreamWriter(mainPart.GetStream(FileMode.Create))
            letter.Document.Save(sw)
        End Using
 
        If letter.EmployeePhoto IsNot Nothing Then
            'Replace the photo
            Dim imagePart = mainPart.GetPartById("rId4")
 
            'NOTE: Northwind Photos have an OLE header
            ' on them of 78 bytes that we must strip off
            Using ms As New _
                  System.IO.MemoryStream(letter.EmployeePhoto, 78, _
                  letter.EmployeePhoto.Length - 78)
               
                imagePart.FeedData(ms)
            End Using
        End If
    End Using
Next
 
Figure 6. shows one of the letters that was created after running this code.
As you can see there are many uses of the Open XML SDK and many more for XML Literals in Visual Basic. You can easily work with Office 2007 document formats directly in managed code, completely avoiding COM or having to have Office installed. This opens the door for many types of applications that were difficult or impossible to build previously. I hope that this article has demonstrated some of the power and productivity gains you get using these technologies in tandem in your business applications. Enjoy!
 
About Beth Massi
Beth is a Program Manager on the Visual Studio Community Team at Microsoft and is responsible for producing and managing content for business application developers, driving community features and team participation onto MSDN Developer Centers (http://msdn.com), and helping make Visual Studio one of the best developer tools in the world. She also produces regular content on her blog (http://blogs.msdn.com/bethmassi), Channel 9, and a variety of other developer sites and magazines. As a community champion and a long-time member of the Microsoft developer community she also helps with the San Francisco East Bay .NET user group and is a frequent speaker at various software development events. Before Microsoft, she was a Senior Architect at a health care software product company and a Microsoft Solutions Architect MVP. Over the last decade she has worked on distributed applications and frameworks, web and Windows-based applications using Microsoft development tools in a variety of businesses. She loves teaching, hiking, mountain biking, and driving really fast.

 

Hide comments

Comments

  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
Publish