Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
708 views
in Technique[技术] by (71.8m points)

c# - Parsing HTML to get script variable value

I'm trying to find a method of accessing data between tags returned by a server I am making HTTP requests to. The document has multiple tags, but only one of the tags has JavaScript code between it, the rest are included from files. I want to accesses the code between the script tag.

An example of the code is:

<html>
    // Some HTML

    <script>
        var spect = [['temper', 'init', []],
                    ['fw/lib', 'init', [{staticRoot: '//site.com/js/'}]],
                    ["cap","dm",[{"tackmod":"profile","xMod":"timed"}]]];

    </script>

    // More HTML
</html>

I'm looking for an ideal way to grab the data between 'spect' and parse it. Sometimes there is a space between 'spect' and the '=' and sometimes there isn't. No idea why, but I have no control over the server.

I know this question may have been asked, but the responses suggest using something like HTMLAgilityPack, and I'd rather avoid using a library for this task as I only need to get the JavaScript from the DOM once.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Very simple example of how this could be easy using a HTMLAgilityPack and Jurassic library to evaluate the result:

var html = @"<html>
             // Some HTML
             <script>
               var spect = [['temper', 'init', []],
               ['fw/lib', 'init', [{staticRoot: '//site.com/js/'}]],
               [""cap"",""dm"",[{""tackmod"":""profile"",""xMod"":""timed""}]]];
             </script>
             // More HTML
             </html>";

// Grab the content of the first script element
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
var script = doc.DocumentNode.Descendants()
                             .Where(n => n.Name == "script")
                             .First().InnerText;

// Return the data of spect and stringify it into a proper JSON object
var engine = new Jurassic.ScriptEngine();
var result = engine.Evaluate("(function() { " + script + " return spect; })()");
var json = JSONObject.Stringify(engine, result);

Console.WriteLine(json);
Console.ReadKey();

Output:

[["temper","init",[]],["fw/lib","init",[{"staticRoot":"//site.com/js/"}]],["cap","dm",[{"tackmod":"profile","xMod":"timed"}]]]

Note: I am not accounting for errors or anything else, this merely serves as an example of how to grab the script and evaluate for the value of spect.

There are a few other libraries for executing/evaluating JavaScript as well.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...