Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c# - WebBrowser Html Document to Image

I'm trying to capture a web page as an image, but some pages come out as a blank white page.

In the Registry Editor, browse to HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION and add these values:

  • WindowsFormsApp1.exe with decimal value 11000

  • WindowsFormsApp1.vshost.exe with decimal value 11000

Here is my code:

using System;
using System.Collections.Generic;
using System.Windows.Forms;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

namespace WindowsFormsApp1
{
public partial class Form1 : Form
{
    Dictionary<Uri, Bitmap> browserShots = new Dictionary<Uri, Bitmap>();
    WebBrowser browser = new WebBrowser();
    public Form1()
    {
        InitializeComponent();
        browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(browser_DocumentCompleted);
    }
    //=========================================MADE BY JIMY====================================
    private void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        var browser = sender as WebBrowser;
        if (browser.ReadyState != WebBrowserReadyState.Complete) return;

        var bitmap = WebBrowserExtender.DrawContent(browser);
        if (bitmap != null)
        {
            if (!browserShots.ContainsKey(browser.Url))
                browserShots.Add(browser.Url, bitmap);
            else
            {
                browserShots[browser.Url]?.Dispose();
                browserShots[browser.Url] = bitmap;
            }
            // Show the Bitmap in a  PictureBox control, eventually
            pictureBox1.Image = browserShots[browser.Url];
        }
    }
    public class WebBrowserExtender
    {
        public static Bitmap DrawContent(WebBrowser browser)
        {
            if (browser.Document == null) return null;
            Size docSize = Size.Empty;
            Graphics g = null;
            var hDc = IntPtr.Zero;

            try
            {
                docSize.Height = (int)((dynamic)browser.Document.DomDocument).documentElement.scrollHeight;
                docSize.Width = (int)((dynamic)browser.Document.DomDocument).documentElement.scrollWidth;
                docSize.Height = Math.Max(Math.Min(docSize.Height, 32750), 1);
                docSize.Width = Math.Max(Math.Min(docSize.Width, 32750), 1);

                var previousSize = browser.ClientSize;
                browser.ClientSize = new Size(docSize.Width, docSize.Height);

                var bitmap = new Bitmap(docSize.Width, docSize.Height, PixelFormat.Format32bppArgb);
                g = Graphics.FromImage(bitmap);
                var rect = new RECT(0, 0, bitmap.Width, bitmap.Height);
                hDc = g.GetHdc();
                var view = browser.ActiveXInstance as IViewObject;
                view.Draw(1, -1, IntPtr.Zero, IntPtr.Zero, IntPtr.Zero, hDc, ref rect, IntPtr.Zero, IntPtr.Zero, 0);
                browser.ClientSize = previousSize;
                return bitmap;
            }
            catch
            {
                // This catch block is like this on purpose: nothing to do here
                return null;
            }
            finally
            {
                // IntPtr is a value type and is never null: compare against IntPtr.Zero
                if (hDc != IntPtr.Zero) g?.ReleaseHdc(hDc);
                g?.Dispose();
            }
        }

        [ComImport]
        [Guid("0000010D-0000-0000-C000-000000000046")]
        [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
        interface IViewObject
        {
            void Draw(uint dwAspect, int lindex, IntPtr pvAspect, [In] IntPtr ptd,
                      IntPtr hdcTargetDev, IntPtr hdcDraw, ref RECT lprcBounds,
                      [In] IntPtr lprcWBounds, IntPtr pfnContinue, uint dwContinue);
        }

        [StructLayout(LayoutKind.Sequential, Pack = 4)]
        struct RECT
        {
            public int Left;
            public int Top;
            public int Right;
            public int Bottom;
            public RECT(int left, int top, int width, int height)
            {
                Left = left; Top = top; Right = width; Bottom = height;
            }
        }
    }
    //=========================================MADE BY JIMY====================================}

    private void button1_Click(object sender, EventArgs e)
    {
        browser.Navigate(textBox1.Text, null, null, "User-Agent: User agent");
    }
}
}

Windows Forms Designer

1 Reply

In order to print the Html content of a WebBrowser Control, there are a few points that need to be considered:

  1. We need to use the WebBrowser's DocumentCompleted event to determine when the current Document is loaded and rendered
  2. A single Document may (will) contain more than one sub-Document, usually contained inside Frames/IFrames. Each IFrame contains its own Document: when a Document contained in an IFrame is loaded, the DocumentCompleted event is raised. This means that the event can and will be raised multiple times when the WebBrowser navigates to a URL.

    The notes here explain more: How to get an HtmlElement value inside Frames/IFrames?

  3. The managed properties of the WebBrowser don't always reflect the DOM's real values. For example, the actual dimensions of the Html Document, when the rendering is completed, are not reflected anywhere, so we need to get those measures from the DOM ourselves. The current DOM rendered dimensions are referenced by:

    [WebBrowser].Document.DomDocument.documentElement.scrollHeight;
    [WebBrowser].Document.DomDocument.documentElement.scrollWidth;
    

    See: Measuring Element Dimension and Location with CSSOM in Windows Internet Explorer

  4. The WebBrowser Control's DrawToBitmap() method is inherited from Control, but it is not actually implemented as one might expect. The same applies to other Controls: the RichTextBox is known to print blank content when this method is used.

  5. A Html Document may be larger than the maximum Size supported by a Bitmap. There is also a more subtle memory limit: the Bitmap object needs to store its content in a contiguous memory space, so the limit in Size of a Bitmap is actually hard to pre-determine and may cause exceptions when we might not expect it.
  6. The WebBrowser control's Emulation Feature must be set to Internet Explorer 11. See:
    How can I get the WebBrowser control to show modern contents?
    Web browser control emulation issue (FEATURE_BROWSER_EMULATION)
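
Point 6 can also be applied from code instead of editing the Registry by hand. The following is only a sketch: the BrowserEmulation class and its method names are hypothetical, and it assumes the per-user key (HKEY_CURRENT_USER) is the one you want to write to. It typically needs to run before the WebBrowser control is created, for example at the start of Main().

```csharp
using System;
using System.Diagnostics;
using System.IO;
using Microsoft.Win32;

// Hypothetical helper, not part of the original code.
static class BrowserEmulation
{
    const string FeatureKey =
        @"Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION";

    // Writes "<exeName> = version" as a DWORD value under the emulation key.
    public static void Set(string exeName, int version)
    {
        using (RegistryKey key = Registry.CurrentUser.CreateSubKey(FeatureKey))
        {
            key.SetValue(exeName, version, RegistryValueKind.DWord);
        }
    }

    // Registers the running executable with the value 11000 (IE11 mode).
    // Under the Visual Studio hosting process this resolves to the
    // *.vshost.exe name, so both entries from the question are covered.
    public static void EnableIE11ForCurrentProcess()
    {
        string exeName = Path.GetFileName(
            Process.GetCurrentProcess().MainModule.FileName);
        Set(exeName, 11000);
    }
}
```

Calling BrowserEmulation.EnableIE11ForCurrentProcess() once at startup should be enough; the value persists in the Registry between runs.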

To proceed, first subscribe to DocumentCompleted event of the WebBrowser Control.

A Dictionary<Uri, Bitmap> is used here to store the Bitmap representing the Html content of each URL visited in a session.
When the DocumentCompleted event is raised, we add a new element to the Dictionary if the current URL has not been visited before.
If the Uri is already stored, we dispose of the old Bitmap and replace it, so only the most recent snapshot of an Html Document is present in the collection.

I'm using a support class to handle the Bitmap creation and to declare the native COM Interface used to generate the Bitmap from the current ISurfacePresenter.
The WebBrowser control is forced to use VIEW_OBJECT_COMPOSITION_MODE_LEGACY as the CompositionMode for all sites. In this mode, the internal GetPrintBitmap method calls the Draw() method of the IViewObject Interface, so we do the same.

To print the content (all the content) of the current Html Document, call the DrawContent(WebBrowser browser) static method of the WebBrowserExtender class:

Dictionary<Uri, Bitmap> browserShots = new Dictionary<Uri, Bitmap>();

private void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    var browser = sender as WebBrowser;
    if (browser.ReadyState != WebBrowserReadyState.Complete) return;

    var bitmap = WebBrowserExtender.DrawContent(browser);
    if (bitmap != null) {
        if (!browserShots.ContainsKey(browser.Url)) {
            browserShots.Add(browser.Url, bitmap);
        }
        else {
            browserShots[browser.Url]?.Dispose();
            browserShots[browser.Url] = bitmap;
        }
        // Show the Bitmap in a  PictureBox control, eventually
        [PictureBox].Image = browserShots[browser.Url];
    }
}
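
Since DocumentCompleted is raised once per Frame/IFrame (point 2 above), the handler may run several times for one navigation. The ReadyState check already filters most of these calls. If you want an additional guard, one common approach is to compare the URL carried by the event arguments with the URL of the WebBrowser itself. This is a sketch only: the FrameFilter class is hypothetical, and the comparison can misfire on pages that redirect or that embed frames pointing at the same address.

```csharp
using System;

// Hypothetical helper, not part of the original code.
static class FrameFilter
{
    // Returns true when the document that just completed is the top-level
    // page, i.e. its URL matches the URL the WebBrowser reports.
    // Uri.Equals ignores the fragment part (#...), so in-page anchors
    // do not cause a mismatch.
    public static bool IsTopLevel(Uri completedUrl, Uri browserUrl)
    {
        return completedUrl != null && completedUrl.Equals(browserUrl);
    }
}

// Possible use at the top of browser_DocumentCompleted:
//     if (!FrameFilter.IsTopLevel(e.Url, browser.Url)) return;
```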

The WebBrowserExtender support class:

using System;   // Math, IntPtr
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;
using System.Windows.Forms;

public class WebBrowserExtender
{
    public static Bitmap DrawContent(WebBrowser browser)
    {
        if (browser.Document == null) return null;
        Size docSize = Size.Empty;
        Graphics g = null;
        var hDc = IntPtr.Zero;

        try {
            docSize.Height = (int)((dynamic)browser.Document.DomDocument).documentElement.scrollHeight;
            docSize.Width = (int)((dynamic)browser.Document.DomDocument).documentElement.scrollWidth;

            var screenWidth = Screen.FromHandle(browser.Handle).Bounds.Width;
            docSize.Width = Math.Max(Math.Min(docSize.Width, screenWidth), 1);
            docSize.Height = Math.Max(Math.Min(docSize.Height, 32750), 1);

            var previousSize = browser.ClientSize;
            browser.ClientSize = new Size(docSize.Width, docSize.Height);

            var bitmap = new Bitmap(docSize.Width, docSize.Height, PixelFormat.Format32bppArgb);
            g = Graphics.FromImage(bitmap);
            var rect = new RECT(0, 0, bitmap.Width, bitmap.Height);
            hDc = g.GetHdc();
            var view = browser.ActiveXInstance as IViewObject;
            view.Draw(1, -1, IntPtr.Zero, IntPtr.Zero, IntPtr.Zero, hDc, ref rect, IntPtr.Zero, IntPtr.Zero, 0);
            browser.ClientSize = previousSize;
            return bitmap;
        }
        catch {
            // This catch block is like this on purpose: nothing to do here
            return null;
        }
        finally {
            // IntPtr is a value type and is never null: compare against IntPtr.Zero
            if (hDc != IntPtr.Zero) g?.ReleaseHdc(hDc);
            g?.Dispose();
        }
    }

    [ComImport]
    [Guid("0000010D-0000-0000-C000-000000000046")]
    [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    interface IViewObject
    {
        void Draw(uint dwAspect, int lindex, IntPtr pvAspect, [In] IntPtr ptd, 
                  IntPtr hdcTargetDev, IntPtr hdcDraw, ref RECT lprcBounds, 
                  [In] IntPtr lprcWBounds, IntPtr pfnContinue, uint dwContinue);
    }

    [StructLayout(LayoutKind.Sequential, Pack = 4)]
    struct RECT
    {
        public int Left;
        public int Top;
        public int Right;
        public int Bottom;

        public RECT(int left, int top, int width, int height)
        {
            // Right and Bottom are edge coordinates, not dimensions
            Left = left; Top = top; Right = left + width; Bottom = top + height;
        }
    }
}

This is how it works:

The full Document is captured. Of course, the Bitmap can also be limited to a specific maximum/minimum size, to capture just a section of the Html Document.
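
As a rough sketch of how such a limit could be expressed, the clamping done inside DrawContent can be pulled out into a small helper that takes the maximum size as a parameter. The CaptureSize class and its Clamp method are hypothetical names, not part of the code above.

```csharp
using System;
using System.Drawing;

// Hypothetical helper, not part of the original code.
static class CaptureSize
{
    // Limits the document size to the given maximum and never returns a
    // dimension smaller than 1, since a Bitmap cannot be 0 pixels wide or high.
    public static Size Clamp(Size document, Size maximum)
    {
        int width = Math.Max(Math.Min(document.Width, maximum.Width), 1);
        int height = Math.Max(Math.Min(document.Height, maximum.Height), 1);
        return new Size(width, height);
    }
}
```

With this in place, the two Math.Max(Math.Min(...)) lines in DrawContent could become docSize = CaptureSize.Clamp(docSize, new Size(screenWidth, 32750)); passing a smaller maximum should then capture only the upper-left section of the page.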

WebBrowser ScreenShots:

Sample WinForms Project on Google Drive.
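
If the goal is an image file rather than a Bitmap shown in a PictureBox, the stored snapshots can be written to disk with Bitmap.Save(). A minimal sketch, assuming PNG output and simple numbered file names; the ShotWriter class and the naming scheme are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

// Hypothetical helper, not part of the original code.
static class ShotWriter
{
    // Saves every Bitmap in the collection as a PNG file inside 'folder'
    // and returns the paths that were written.
    public static List<string> SaveAll(IDictionary<Uri, Bitmap> shots, string folder)
    {
        Directory.CreateDirectory(folder);   // no-op if the folder already exists
        var written = new List<string>();
        int index = 0;
        foreach (KeyValuePair<Uri, Bitmap> pair in shots)
        {
            if (pair.Value == null) continue;
            // Numbered names avoid characters from the URL that are not
            // valid in a file name.
            string path = Path.Combine(folder, $"shot_{index++:D3}.png");
            pair.Value.Save(path, ImageFormat.Png);
            written.Add(path);
        }
        return written;
    }
}
```

For example, ShotWriter.SaveAll(browserShots, @"C:\Temp\Shots") would write one PNG per visited URL.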

