Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
696 views
in Technique[技术] by (71.8m points)

java - Get title, meta description content using URL

I am trying to extract title and meta tag's description content from a URL, this is what I have:

fin[] //urls in a string array

for (int f = 0; f < fin.length; f++)
{
Document finaldoc = Jsoup.connect(fin[f]).get(); //fin[f] contains url at each instance
Elements finallink1 = finaldoc.select("title");
out.println(finallink1);
Elements finallink2 = finaldoc.select("meta");
out.println(finallink2.attr("name"));
out.println(fin[f]); //printing url at last
}

but it is not printing the title, and simply prints description as "description" and prints the url.

result :

description plus.google.com generator en.wikipedia.org/wiki/google description earth.google.com

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can use this:

String getMetaTag(Document document, String attr) {
    Elements elements = document.select("meta[name=" + attr + "]");
    for (Element element : elements) {
        final String s = element.attr("content");
        if (s != null) return s;
    }
    elements = document.select("meta[property=" + attr + "]");
    for (Element element : elements) {
        final String s = element.attr("content");
        if (s != null) return s;
    }
    return null;
}

Then:

 String title = document.title();
 String description = getMetaTag(document, "description");
 if (description == null) {
    description = getMetaTag(document, "og:description");
 }
 // and others you need to
 String ogImage = getMetaTag(document, "og:image") 

....


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...