Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.1k views
in Technique[技术] by (71.8m points)

c# - Merge multiple word documents into one Open Xml

I have around 10 word documents which I generate using open xml and other stuff. Now I would like to create another word document and one by one I would like to join them into this newly created document. I wish to use open xml, any hint would be appreciable. Below is my code:

 private void CreateSampleWordDocument()
    {
        //string sourceFile = Path.Combine("D:\GeneralLetter.dot");
        //string destinationFile = Path.Combine("D:\New.doc");
        string sourceFile = Path.Combine("D:\GeneralWelcomeLetter.docx");
        string destinationFile = Path.Combine("D:\New.docx");
        try
        {
            // Create a copy of the template file and open the copy
            //File.Copy(sourceFile, destinationFile, true);
            using (WordprocessingDocument document = WordprocessingDocument.Open(destinationFile, true))
            {
                // Change the document type to Document
                document.ChangeDocumentType(DocumentFormat.OpenXml.WordprocessingDocumentType.Document);
                //Get the Main Part of the document
                MainDocumentPart mainPart = document.MainDocumentPart;
                mainPart.Document.Save();
            }
        }
        catch
        {
        }
    }

Update( using AltChunks):

using (WordprocessingDocument myDoc = WordprocessingDocument.Open("D:\Test.docx", true))
        {
            string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 2) ;
            MainDocumentPart mainPart = myDoc.MainDocumentPart;
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.WordprocessingML, altChunkId);
            using (FileStream fileStream = File.Open("D:\Test1.docx", FileMode.Open))
                chunk.FeedData(fileStream);
            AltChunk altChunk = new AltChunk();
            altChunk.Id = altChunkId;
            mainPart.Document
                .Body
                .InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
            mainPart.Document.Save();
        } 

Why this code overwrites the content of the last file when I use multiple files? Update 2:

 using (WordprocessingDocument myDoc = WordprocessingDocument.Open("D:\Test.docx", true))
        {

            MainDocumentPart mainPart = myDoc.MainDocumentPart;
            string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 3);
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId);
            using (FileStream fileStream = File.Open("d:\Test1.docx", FileMode.Open))
            {
                chunk.FeedData(fileStream);
                AltChunk altChunk = new AltChunk();
                altChunk.Id = altChunkId;
                mainPart.Document
                    .Body
                    .InsertAfter(altChunk, mainPart.Document.Body
                    .Elements<Paragraph>().Last());
                mainPart.Document.Save();
            }
            using (FileStream fileStream = File.Open("d:\Test2.docx", FileMode.Open))
            {
                chunk.FeedData(fileStream);
                AltChunk altChunk = new AltChunk();
                altChunk.Id = altChunkId;
                mainPart.Document
                    .Body
                    .InsertAfter(altChunk, mainPart.Document.Body
                    .Elements<Paragraph>().Last());
            }
            using (FileStream fileStream = File.Open("d:\Test3.docx", FileMode.Open))
            {
                chunk.FeedData(fileStream);
                AltChunk altChunk = new AltChunk();
                altChunk.Id = altChunkId;
                mainPart.Document
                    .Body
                    .InsertAfter(altChunk, mainPart.Document.Body
                    .Elements<Paragraph>().Last());
            } 
        }

This code is appending the Test2 data twice, in place of Test1 data as well. Means I get:

Test
Test2
Test2

instead of :

Test
Test1
Test2
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Using openXML SDK only, you can use AltChunk element to merge the multiple document into one.

This link the-easy-way-to-assemble-multiple-word-documents and this one How to Use altChunk for Document Assembly provide some samples.

EDIT 1

Based on your code that uses altchunk in the updated question (update#1), here is the VB.Net code I have tested and that works like a charm for me:

Using myDoc = DocumentFormat.OpenXml.Packaging.WordprocessingDocument.Open("D:\Test.docx", True)
        Dim altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 2)
        Dim mainPart = myDoc.MainDocumentPart
        Dim chunk = mainPart.AddAlternativeFormatImportPart(
            DocumentFormat.OpenXml.Packaging.AlternativeFormatImportPartType.WordprocessingML, altChunkId)
        Using fileStream As IO.FileStream = IO.File.Open("D:\Test1.docx", IO.FileMode.Open)
            chunk.FeedData(fileStream)
        End Using
        Dim altChunk = New DocumentFormat.OpenXml.Wordprocessing.AltChunk()
        altChunk.Id = altChunkId
        mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements(Of DocumentFormat.OpenXml.Wordprocessing.Paragraph).Last())
        mainPart.Document.Save()
End Using

EDIT 2

The second issue (update#2)

This code is appending the Test2 data twice, in place of Test1 data as well.

is related to altchunkid.

For each document you want to merge in the main document, you need to:

  1. add an AlternativeFormatImportPart in the mainDocumentPart with an Id which must to be unique. This element contains the Inserted data
  2. add in the body an Altchunk element in which you set the id to reference the previous AlternativeFormatImportPart.

In your code, you are using the same Id for all the AltChunks. It's why you see many time the same text.

I am not sure the altchunkid will be unique with your code: string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 2);

If you don't need to set a specific value, I recommend you to not set explicitly the AltChunkId when you add the AlternativeFormatImportPart. Instead, you get one generated by the SDK like this:

VB.Net

Dim chunk As AlternativeFormatImportPart = mainPart.AddAlternativeFormatImportPart(DocumentFormat.OpenXml.Packaging.AlternativeFormatImportPartType.WordprocessingML)
Dim altchunkid As String = mainPart.GetIdOfPart(chunk)

C#

AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(DocumentFormat.OpenXml.Packaging.AlternativeFormatImportPartType.WordprocessingML);
string altchunkid = mainPart.GetIdOfPart(chunk);

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...