Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.1k views
in Technique[技术] by (71.8m points)

c# - Add HTML String to OpenXML (*.docx) Document

I am trying to use Microsoft's OpenXML 2.5 library to create a OpenXML document. Everything works great, until I try to insert an HTML string into my document. I have scoured the web and here is what I have come up with so far (snipped to just the portion I am having trouble with):

Paragraph paragraph = new Paragraph();
Run run = new Run();

string altChunkId = "id1";
AlternativeFormatImportPart chunk =
       document.MainDocumentPart.AddAlternativeFormatImportPart(
           AlternativeFormatImportPartType.Html, altChunkId);
chunk.FeedData(new MemoryStream(Encoding.UTF8.GetBytes(ioi.Text)));
AltChunk altChunk = new AltChunk { Id = altChunkId };

run.AppendChild(new Break());

paragraph.AppendChild(run);
body.AppendChild(paragraph);

Obviously, I haven't actually added the altChunk in this example, but I have tried appending it everywhere - to the run, paragraph, body, etc. In ever case, I am unable to open up the docx file in Word 2010.

This is making me a little nutty because it seems like it should be straightforward (I will admit that I'm not fully understanding the AltChunk "thing"). Would appreciate any help.

Side Note: One thing I did find that was interesting, and I don't know if it's actually a problem or not, is this response which says AltChunk corrupts the file when working from a MemoryStream. Can anybody confirm that this is/isn't true?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I can reproduce the error "... there is a problem with the content" by using an incomplete HTML document as the content of the alternative format import part. For example if you use the following HTML snippet <h1>HELLO</h1> MS Word is unable to open the document.

The code below shows how to add an AlternativeFormatImportPart to a word document. (I've tested the code with MS Word 2013).

using (WordprocessingDocument doc = WordprocessingDocument.Open(@"test.docx", true))
{
  string altChunkId = "myId";
  MainDocumentPart mainDocPart = doc.MainDocumentPart;

  var run = new Run(new Text("test"));
  var p = new Paragraph(new ParagraphProperties(
       new Justification() { Val = JustificationValues.Center }),
                     run);

  var body = mainDocPart.Document.Body;
  body.Append(p);        

  MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<html><head></head><body><h1>HELLO</h1></body></html>"));

  // Uncomment the following line to create an invalid word document.
  // MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<h1>HELLO</h1>"));

  // Create alternative format import part.
  AlternativeFormatImportPart formatImportPart =
     mainDocPart.AddAlternativeFormatImportPart(
        AlternativeFormatImportPartType.Html, altChunkId);
  //ms.Seek(0, SeekOrigin.Begin);

  // Feed HTML data into format import part (chunk).
  formatImportPart.FeedData(ms);
  AltChunk altChunk = new AltChunk();
  altChunk.Id = altChunkId;

  mainDocPart.Document.Body.Append(altChunk);
}

According to the Office OpenXML specification valid parent elements for the w:altChunk element are body, comment, docPartBody, endnote, footnote, ftr, hdr and tc. So, I've added the w:altChunk to the body element.

For more information on the w:altChunk element see this MSDN link.

EDIT

As pointed out by @user2945722, to make sure that the OpenXml library correctlty interprets the byte array as UTF-8, you should add the UTF-8 preamble. This can be done this way:

MemoryStream ms = new MemoryStream(new UTF8Encoding(true).GetPreamble().Concat(Encoding.UTF8.GetBytes(htmlEncodedString)).ToArray()

This will prevent your é's from being rendered as ??'s, your ?'s as ?¤'s, etc.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...