Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
99 views
in Technique[技术] by (71.8m points)

javascript - parse and execute JS by C#

i have simple crawler which crawl and search page. but now i have problem how to execute and parse js link from that page. Does anyone have any idea how to parse and execute js page?

example:

  • some_url is simple to parse with webhtmltoolktit
  • JAVASCRIPT:runmeat(1,7,0,2,7,9) is js link which redirect then to some_url2 page and this page i need to crawl then. but problem is how to execute this javascript in C# to get that some_url2 link?
Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

To answer the question title "How to parse and execute JS in C#", here is piece of code that wraps the Windows Script Engines. It supports 32-bit and 64-bit environments.

In your specific case, it means depending on the .JS code, you may have to emulate/implement some HTML DOM element such as 'document', 'window', etc. (using the 'named items' feature, with the MyItem class. that's exactly what Internet Explorer does).

Here are some sample of what you can do with it:

1) Direct expressions evaluation:

Console.WriteLine(ScriptEngine.Eval("jscript", "1+2/3"));

will display 1.66666666666667

2) Function call, with optional arguments:

using (ScriptEngine engine = new ScriptEngine("jscript"))
{
  ParsedScript parsed = engine.Parse("function MyFunc(x){return 1+2+x}");
  Console.WriteLine(parsed.CallMethod("MyFunc", 3));
}

Will display 6

3) Function call with named items, and optional arguments:

using (ScriptEngine engine = new ScriptEngine("jscript"))
{
    ParsedScript parsed = engine.Parse("function MyFunc(x){return 1+2+x+My.Num}");
    MyItem item = new MyItem();
    item.Num = 4;
    engine.SetNamedItem("My", item);
    Console.WriteLine(parsed.CallMethod("MyFunc", 3));
}

[ComVisible(true)] // Script engines are COM components.
public class MyItem
{
    public int Num { get; set; }
}

Will display 10.

Edit: I have added the possibility to use a CLSID instead of a script language name, so we can re-use the new and fast IE9+ "chakra" javascript engine, like this:

using (ScriptEngine engine = new ScriptEngine("{16d51579-a30b-4c8b-a276-0ff4dc41e755}"))
{
    // continue with chakra now
}

Here is the full source:

/// <summary>
/// Represents a Windows Script Engine such as JScript, VBScript, etc.
/// </summary>
public sealed class ScriptEngine : IDisposable
{
    /// <summary>
    /// The name of the function used for simple evaluation.
    /// </summary>
    public const string MethodName = "EvalMethod";

    /// <summary>
    /// The default scripting language name.
    /// </summary>
    public const string DefaultLanguage = JavaScriptLanguage;

    /// <summary>
    /// The JavaScript or jscript scripting language name.
    /// </summary>
    public const string JavaScriptLanguage = "javascript";

    /// <summary>
    /// The javascript or jscript scripting language name.
    /// </summary>
    public const string VBScriptLanguage = "vbscript";

    /// <summary>
    /// The chakra javascript engine CLSID. The value is {16d51579-a30b-4c8b-a276-0ff4dc41e755}.
    /// </summary>
    public const string ChakraClsid = "{16d51579-a30b-4c8b-a276-0ff4dc41e755}";

    private IActiveScript _engine;
    private IActiveScriptParse32 _parse32;
    private IActiveScriptParse64 _parse64;
    internal ScriptSite Site;
    private Version _version;
    private string _name;

    [Guid("BB1A2AE1-A4F9-11cf-8F20-00805F2CD064"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    private interface IActiveScript
    {
        [PreserveSig]
        int SetScriptSite(IActiveScriptSite pass);
        [PreserveSig]
        int GetScriptSite(Guid riid, out IntPtr site);
        [PreserveSig]
        int SetScriptState(ScriptState state);
        [PreserveSig]
        int GetScriptState(out ScriptState scriptState);
        [PreserveSig]
        int Close();
        [PreserveSig]
        int AddNamedItem(string name, ScriptItem flags);
        [PreserveSig]
        int AddTypeLib(Guid typeLib, uint major, uint minor, uint flags);
        [PreserveSig]
        int GetScriptDispatch(string itemName, out IntPtr dispatch);
        [PreserveSig]
        int GetCurrentScriptThreadID(out uint thread);
        [PreserveSig]
        int GetScriptThreadID(uint win32ThreadId, out uint thread);
        [PreserveSig]
        int GetScriptThreadState(uint thread, out ScriptThreadState state);
        [PreserveSig]
        int InterruptScriptThread(uint thread, out System.Runtime.InteropServices.ComTypes.EXCEPINFO exceptionInfo, uint flags);
        [PreserveSig]
        int Clone(out IActiveScript script);
    }

    [Guid("4954E0D0-FBC7-11D1-8410-006008C3FBFC"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    private interface IActiveScriptProperty
    {
        [PreserveSig]
        int GetProperty(int dwProperty, IntPtr pvarIndex, out object pvarValue);
        [PreserveSig]
        int SetProperty(int dwProperty, IntPtr pvarIndex, ref object pvarValue);
    }

    [Guid("DB01A1E3-A42B-11cf-8F20-00805F2CD064"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    private interface IActiveScriptSite
    {
        [PreserveSig]
        int GetLCID(out int lcid);
        [PreserveSig]
        int GetItemInfo(string name, ScriptInfo returnMask, out IntPtr item, IntPtr typeInfo);
        [PreserveSig]
        int GetDocVersionString(out string version);
        [PreserveSig]
        int OnScriptTerminate(object result, System.Runtime.InteropServices.ComTypes.EXCEPINFO exceptionInfo);
        [PreserveSig]
        int OnStateChange(ScriptState scriptState);
        [PreserveSig]
        int OnScriptError(IActiveScriptError scriptError);
        [PreserveSig]
        int OnEnterScript();
        [PreserveSig]
        int OnLeaveScript();
    }

    [Guid("EAE1BA61-A4ED-11cf-8F20-00805F2CD064"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    private interface IActiveScriptError
    {
        [PreserveSig]
        int GetExceptionInfo(out System.Runtime.InteropServices.ComTypes.EXCEPINFO exceptionInfo);
        [PreserveSig]
        int GetSourcePosition(out uint sourceContext, out int lineNumber, out int characterPosition);
        [PreserveSig]
        int GetSourceLineText(out string sourceLine);
    }

    [Guid("BB1A2AE2-A4F9-11cf-8F20-00805F2CD064"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    private interface IActiveScriptParse32
    {
        [PreserveSig]
        int InitNew();
        [PreserveSig]
        int AddScriptlet(string defaultName, string code, string itemName, string subItemName, string eventName, string delimiter, IntPtr sourceContextCookie, uint startingLineNumber, ScriptText flags, out string name, out System.Runtime.InteropServices.ComTypes.EXCEPINFO exceptionInfo);
        [PreserveSig]
        int ParseScriptText(string code, string itemName, IntPtr context, string delimiter, int sourceContextCookie, uint startingLineNumber, ScriptText flags, out object result, out System.Runtime.InteropServices.ComTypes.EXCEPINFO exceptionInfo);
    }

    [Guid("C7EF7658-E1EE-480E-97EA-D52CB4D76D17"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    private interface IActiveScriptParse64
    {
        [PreserveSig]
        int InitNew();
        [PreserveSig]
        int AddScriptlet(string defaultName, string code, string itemName, string subItemName, string eventName, string delimiter, IntPtr sourceContextCookie, uint startingLineNumber, ScriptText flags, out string name, out System.Runtime.InteropServices.ComTypes.EXCEPINFO exceptionInfo);
        [PreserveSig]
        int ParseScriptText(string code, string itemName, IntPtr context, string delimiter, long sourceContextCookie, uint startingLineNumber, ScriptText flags, out object result, out System.Runtime.InteropServices.ComTypes.EXCEPINFO exceptionInfo);
    }

    [Flags]
    private enum ScriptText
    {
        None = 0,
        //DelayExecution = 1,
        //IsVisible = 2,
        IsExpression = 32,
        IsPersistent = 64,
        //HostManageSource = 128
    }

    [Flags]
    private enum ScriptInfo
    {
        //None = 0,
        //IUnknown = 1,
        ITypeInfo = 2
    }

    [Flags]
    private enum ScriptItem
    {
        //None = 0,
        IsVisible = 2,
        IsSource = 4,
        //GlobalMembers = 8,
        //IsPersistent = 64,
        //CodeOnly = 512,
        //NoCode = 1024
    }

    private enum ScriptThreadState
    {
        //NotInScript = 0,
        //Running = 1
    }

    private enum ScriptState
    {
        Uninitialized = 0,
        Started = 1,
        Connected = 2,
        Disconnected = 3,
        Closed = 4,
        Initialized = 5
    }

    private const int TYPE_E_ELEMENTNOTFOUND = unchecked((int)(0x8002802B));
    private const int E_NOTIMPL = -2147467263;

    /// <summary>
    /// Determines if a script engine with the input name exists.
    /// </summary>
    /// <param name="language">The language.</param>
    /// <returns>true if the engine exists; false otherwise.</returns>
    public static Version GetVersion(string language)
    {
        if (language == null)
            throw new ArgumentNullException("language");

        Type engine;
        Guid clsid;
        if (Guid.TryParse(language, out clsid))
        {
            engine = Type.GetTypeFromCLSID(clsid, false);
        }
        else
        {
            engine = Type.GetTypeFromProgID(language, false);
        }
        if (engine == null)
            return null;

        IActiveScript scriptEngine = Activator.CreateInstance(engine) as IActiveScript;
        if (scriptEngine == null)
            return null;

        IActiveScriptProperty scriptProperty = scriptEngine as IActiveScriptProperty;
        if (scriptProperty == null)
            return new Version(1, 0, 0, 0);

        int major = GetProperty(scriptProperty, SCRIPTPROP_MAJORVERSION, 0);
        int minor = GetProperty(scriptProperty, SCRIPTPROP_MINORVERSION, 0);
        int revision = GetProperty(scriptProperty, SCRIPTPROP_BUILDNUMBER, 0);
        Version version = new Version(major, minor, Environment.OSVersion.Version.Build, revision);
        Marshal.ReleaseComObject(scriptProperty);
        Marshal.ReleaseComObject(scriptEngine);
        return version;
    }

    private static T GetProperty<T>(IActiveScriptProperty prop, int index, T defaultValue)
    {
        object value;
        if (prop.GetProperty(index, IntPtr.Zero, out value) != 0)
            return defaultValue;

        try
        {
            return (T)Convert.ChangeType(value, typeof(T));
        }
        catch
        {
            return defaultValue;
        }
    }

    /// <summary> 
    /// Initializes a new instance of the <see cref="ScriptEngine"/> class. 
    /// </summary> 
    /// <param name="language">The scripting language. Standard Windows Script engines names are 'jscript' or 'vbscript'.</param> 
    public ScriptEngine(string language)
    {
        if (language == null)
            throw new ArgumentNullException("language");

        Type engine;
        Guid clsid;
        if (Guid.TryParse(language, out clsid))
        {
            engine = Type.GetTypeFromCLSID(clsid, true);
        }
        else
        {
            engine = Type.GetTypeFromProgID(language, true);
        }
        _engine = Activator.CreateInstance(engine) as IActiveScript;
        if (_engine == null)
            throw new ArgumentException(language + " is not an Windows Script Engine", "language");

        Site = new ScriptSite();
        _engine.SetScriptSite(Site);

        // support 32-bit & 64-bit process 
        if (IntPtr.Size == 4)
        {
            _parse32 = (IActiveScriptParse32)_engin

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...