Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

html - How to extract the hyperlink text from an <a> HTML tag?

Given a string containing 'blabla <a href="address">text</a> blabla', I want to extract 'text' from it.

The regexp documentation suggests the expression '<(\w+).*>.*</\1>', but that matches the whole <a> ... </a> element.

Of course, I can follow up with strfind like this:

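line = 'blabla <a href="address">text</a> blabla';
% Match the whole element; \w and \1 need their backslashes
atag = regexp(line,'<(\w+).*>.*</\1>','match', 'once');
from = strfind(atag, '>');   % positions of every '>' in the match
to = strfind(atag, '<');     % positions of every '<' in the match
% The text sits between the first '>' and the second '<'
text = atag((from(1)+1):(to(2)-1))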

But is there a different expression that extracts text in one step?

  Asked by saastn (translated from Stack Overflow)


1 Reply


You can use MATLAB's extractHTMLText function, which is part of Text Analytics Toolbox; see its documentation page for details.

An example that produces the desired output:

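line = 'blabla <a href="address">text</a> blabla';
% Strip the markup, then split the remaining plain text on spaces
l = split(extractHTMLText(line), ' ');
% For this input the link text is the second word
l{2}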
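
Picking the second word only works when you already know where the link text falls in the sentence. A possible alternative, sketched here on the assumption that Text Analytics Toolbox is installed, is to parse the string with htmlTree, select the <a> element with findElement, and extract the text of that element alone. The variable names tree, links and linkText are illustrative only.

```matlab
line = 'blabla <a href="address">text</a> blabla';

% Parse the fragment into an HTML tree (requires Text Analytics Toolbox)
tree = htmlTree(line);

% Select every <a> element in the tree
links = findElement(tree, 'a');

% Extract the text of the selected element(s) only
linkText = extractHTMLText(links)
```

With several links in the input, links holds one element per link and linkText should contain one entry per element.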

If you don't want to use a toolbox function, you could use a regular expression with a capture group, as Nick suggested.

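line = 'blabla <a href="address">text</a> blabla';
% Token 1 is the tag name, token 2 is the text between the tags
[atag,tok] = regexp(line,'<(\w+).*>(.*?)</\1>','match','tokens');
t = tok{1};   % tokens of the first match: {'a', 'text'}
t{2}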

This returns the desired output, 'text'.
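
To get the text in a single call, as the question asks, one option is to capture only the link text and pass the 'tokens' and 'once' options. The sketch below assumes the tag is always a lowercase <a> with no nested tags inside it; the pattern and the variable names pat, tok, linkText and allText are illustrative, not taken from the original answer. Note that [^>]* stops at the end of the opening tag, which avoids the greedy .* running across several tags on the same line.

```matlab
line = 'blabla <a href="address">text</a> blabla';

% <a, optional attributes up to the closing '>', then a lazy capture
% of everything before the first </a>
pat = '<a(?:\s[^>]*)?>(.*?)</a>';

% 'once' returns the tokens of the first match as a cell array
tok = regexp(line, pat, 'tokens', 'once');
linkText = tok{1}

% Without 'once', every link on the line is returned
many = 'x <a href="u1">one</a> y <a href="u2">two</a> z';
toks = regexp(many, pat, 'tokens');
allText = cellfun(@(c) c{1}, toks, 'UniformOutput', false)
```

Regular expressions remain fragile for real-world HTML (nested tags, uppercase tag names, attributes containing '>'), so this is best kept to simple, well-formed strings like the one in the question.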

