I have a web page whose source contains several similar structures like the one below:
<tr>
<td width="10%" bgcolor="#FFFFFF"><font class="bodytext9">1-Jun-2013</font></td>
<td width="4%" bgcolor="#FFFFFF" align=center><font class="bodytext9">Sat</font></td>
<td width="5%" bgcolor="#FFFFFF" align="center"></td>
<td width="5%" bgcolor="#FFFFFF" align="center"><font class="bodytext9">Another Text</font></td>
<td width="5%" bgcolor="#FFFFFF" align="center"><font class="bodytext9"><img src="img/colors/white.gif"></font></td>
<td width="15%" bgcolor="#FFFFFF" align="center"><a class="black_9" href="link2">Here is also Text</a></td>
<td width="15%" bgcolor="#FFFFFF" align="center"><a href="LINKtoWeb" class=list><u>STRING TO CAPTURE</u></a></td>
<td width="4%" bgcolor="#FFFFFF" align="center"><a target="_new" href="AnotherLink"><img src="img/img2.gif" border="0"></a></td>
</tr>
This kind of structure is repeated many times with different text inside, but I only want to extract this one row, because it is where the text "STRING TO CAPTURE" appears for the first time. How do I use Jsoup to extract only this row, the visible text inside it, and the URL "AnotherLink" from the same row in which "STRING TO CAPTURE" appears?
I am new to Jsoup, so I have only tried this:
Document doc = Jsoup.connect("http://www.website.com").get();
Element link = doc.select("a").first();
String relHref = link.attr("href");
String absHref = link.attr("abs:href");
String text = doc.body().text();
String linkHref = link.attr("href");
String linkText = link.text();
System.out.println("link:" + link);
System.out.println("text:" + text);
but I can't get any further than this toward my goal. Please give me some advice. Thank you!
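One possible way to approach this, as a minimal sketch rather than a definitive answer: walk the links in document order, stop at the first one whose text equals the marker, then climb to its enclosing <tr> and read the text and the other link from that row only. The class name FirstRowDemo, the helper firstRowWithLinkText, and the SAMPLE_HTML markup are hypothetical and exist only for illustration. The selector a[target=_new] assumes the wanted URL is always on the link that opens in a new window, as in the sample row.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class FirstRowDemo {

    // Hypothetical sample modeled on the row in the question. The <table>
    // wrapper matters: the HTML parser drops <tr>/<td> tags outside a table.
    static final String SAMPLE_HTML =
          "<table>\n"
        + "<tr><td>31-May-2013</td>\n"
        + "<td><a href=\"other\"><u>SOMETHING ELSE</u></a></td>\n"
        + "<td><a target=\"_new\" href=\"SkipMe\">x</a></td></tr>\n"
        + "<tr><td>1-Jun-2013</td><td>Sat</td>\n"
        + "<td><a href=\"LINKtoWeb\"><u>STRING TO CAPTURE</u></a></td>\n"
        + "<td><a target=\"_new\" href=\"AnotherLink\">x</a></td></tr>\n"
        + "<tr><td>2-Jun-2013</td>\n"
        + "<td><a href=\"LINKtoWeb\"><u>STRING TO CAPTURE</u></a></td>\n"
        + "<td><a target=\"_new\" href=\"LaterLink\">x</a></td></tr>\n"
        + "</table>";

    /** Returns the first <tr> holding a link whose text equals marker, or null. */
    static Element firstRowWithLinkText(Document doc, String marker) {
        // select() returns matches in document order, so the first hit wins.
        for (Element link : doc.select("a[href]")) {
            if (!link.text().trim().equals(marker)) {
                continue;
            }
            // Climb from the link to its enclosing table row.
            Element node = link;
            while (node != null && !node.tagName().equals("tr")) {
                node = node.parent();
            }
            if (node != null) {
                return node;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        Element row = firstRowWithLinkText(doc, "STRING TO CAPTURE");
        if (row == null) {
            System.out.println("marker not found");
            return;
        }
        // Visible text of this row only, not of the whole page.
        System.out.println("row text: " + row.text());
        // Assumption: the wanted URL sits on the link with target="_new".
        Element target = row.select("a[target=_new]").first();
        if (target != null) {
            System.out.println("url: " + target.attr("href"));
        }
    }
}
```

For the live page, the document would come from Jsoup.connect(...).get() as in the attempt above, and target.absUrl("href") could replace attr("href") if an absolute URL is needed.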