Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c# - .net document write with mshtml

I am using mshtml for HTML parsing (version 7.0.3300.0, C:\Program Files\Microsoft.NET\Primary Interop Assemblies\Microsoft.mshtml.dll).

HTMLDocumentClass has a write method, so I used it, but it raises a COMException with ErrorCode -2147352571 and the message "Type mismatch". What is the reason for this? If the write method of HTMLDocumentClass is not meant to be used, why is it defined?

    HTMLDocumentClass getHTMLDocument(string html)
    {
        HTMLDocumentClass doc = new HTMLDocumentClass();

        doc.write(new object[] { html }); // raises exception
        doc.close();

        return doc;
    }

    HTMLDocumentClass getHTMLDocument2(string html)
    {
        HTMLDocumentClass doc = new HTMLDocumentClass();
        IHTMLDocument2 doc2 = (IHTMLDocument2)doc;
        doc2.write(new object[] { html });
        doc2.close();

        return doc;
    }

1 Reply

Okay, I found it. This is an interesting failure mode. All of the PIAs for Microsoft.mshtml that I have installed on my machine are outdated. There are no fewer than four of them, all version 7.0.3300.0 with a runtime target of 1.0.3705 (which is quite old).
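
To see which PIA a project actually binds to, the version and runtime target mentioned above can be read through reflection. This is only a sketch: it assumes the project references Microsoft.mshtml, and the class name PiaInfo is made up for illustration.

    using System;
    using System.Reflection;

    class PiaInfo
    {
        static void Main()
        {
            // Any type from the interop assembly will do; IHTMLDocument2 is used here.
            Assembly pia = typeof(mshtml.IHTMLDocument2).Assembly;

            // Assembly version of the PIA (7.0.3300.0 for the ones described above).
            Console.WriteLine(pia.GetName().Version);

            // CLR version the assembly was compiled against (the "runtime target").
            Console.WriteLine(pia.ImageRuntimeVersion);

            // Path of the file that was actually loaded.
            Console.WriteLine(pia.Location);
        }
    }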

The XxxClass interop class generated by the type library importer (HTMLDocumentClass here) is the cause. It is a synthetic class that exists to make events a bit easier to deal with, since events are done very differently in COM. The class is a flattened version of the combined methods of all the interfaces. The current SDK version of the HTMLDocument coclass is declared as follows (from mshtml.idl):

    [
        uuid(25336920-03F9-11cf-8FD0-00AA00686F13)
    ]
    coclass HTMLDocument
    {
        [default]           dispinterface DispHTMLDocument;
        [source, default]   dispinterface HTMLDocumentEvents;
        [source]            dispinterface HTMLDocumentEvents2;
        [source]            dispinterface HTMLDocumentEvents3;
                            interface IHTMLDocument2;
                            interface IHTMLDocument3;
                            interface IHTMLDocument4;
                            interface IHTMLDocument5;
                            interface IHTMLDocument6;
                            interface IHTMLDOMNode;
                            interface IHTMLDOMNode2;
                            interface IDocumentSelector;
                            interface IHTMLDOMConstructor;
    };

If you use Object Browser on the interop library, you'll see that HTMLDocumentClass is missing the interface methods for IHTMLDocument6, IDocumentSelector and IHTMLDOMConstructor. The write() method you are using is past these interfaces.
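
If you would rather check this in code than in Object Browser, something along these lines should work. It is a sketch that assumes a reference to Microsoft.mshtml; the class name CoclassCheck is hypothetical. The interfaces are looked up by name because an outdated PIA may not define them at all.

    using System;
    using System.Reflection;

    class CoclassCheck
    {
        static void Main()
        {
            Type docClass = typeof(mshtml.HTMLDocumentClass);
            Assembly pia = docClass.Assembly;
            string[] names = { "IHTMLDocument6", "IDocumentSelector", "IHTMLDOMConstructor" };

            foreach (string name in names)
            {
                // GetType returns null when the assembly does not define the type.
                Type iface = pia.GetType("mshtml." + name);
                bool defined = iface != null;
                bool implemented = defined && iface.IsAssignableFrom(docClass);
                Console.WriteLine("{0}: defined={1}, on HTMLDocumentClass={2}",
                    name, defined, implemented);
            }
        }
    }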

This means that calling HTMLDocumentClass.write() invokes the wrong method. The exception is raised because whatever method actually gets called rejects the argument it is given, which is hardly surprising.
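
For reference, the error code in the question is the standard Automation HRESULT DISP_E_TYPEMISMATCH (0x80020005), which matches the "Type mismatch" message. A small self-contained check (the class name HResultDemo is only illustrative):

    using System;

    class HResultDemo
    {
        // DISP_E_TYPEMISMATCH as defined in the Windows SDK headers.
        const int DISP_E_TYPEMISMATCH = unchecked((int)0x80020005);

        static void Main()
        {
            int errorCode = -2147352571; // value reported by the COMException

            // Show the same value in the hexadecimal form used by the SDK.
            Console.WriteLine("0x{0:X8}", errorCode);             // prints 0x80020005
            Console.WriteLine(errorCode == DISP_E_TYPEMISMATCH);  // prints True
        }
    }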

This is a nasty failure mode. It came about because Microsoft broke a very hard COM requirement: changing a COM interface or coclass requires a new GUID (the [uuid] attribute in the declaration above). Assigning a new GUID, however, would make new versions of Internet Explorer completely incompatible with old code that uses it. Microsoft was caught between a rock and a hard place, and backward compatibility is quite sacred there. The order of interface implementations in a coclass is not normally a problem for regular COM clients. In .NET, however, a change in that order breaks the layout of the synthetic XxxClass type that tlbimp generates.

I've never seen a case where that synthetic class was actually required, and I never use it myself. You can always obtain the correct interface pointer by casting in C#: the cast calls QueryInterface() and returns the correct pointer regardless of the version. Your second snippet, getHTMLDocument2(), is the proper workaround.
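
A variation on that workaround is to hand back the interface rather than the class, so callers are never tempted to use the flattened members. Treat this as a sketch: it assumes a reference to Microsoft.mshtml and a calling thread suitable for COM (typically STA), and the names HtmlLoader and Parse are hypothetical.

    using System;
    using mshtml;

    static class HtmlLoader
    {
        // Parses an HTML string into an MSHTML document and returns it as
        // IHTMLDocument2 instead of the synthetic HTMLDocumentClass.
        public static IHTMLDocument2 Parse(string html)
        {
            // The cast makes the runtime call QueryInterface() for IHTMLDocument2.
            IHTMLDocument2 doc = (IHTMLDocument2)new HTMLDocumentClass();

            doc.write(new object[] { html });
            doc.close();
            return doc;
        }
    }

Other members are reached the same way, by casting the returned object to the interface that declares them (for example IHTMLDocument3) rather than going through HTMLDocumentClass.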

