Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Extract hashtags from a string in Excel

How can I extract multiple hashtags from a string in Excel? I played around with MID and SEARCH formulas but couldn't come up with anything good.

**Example input:**
Daniel Craig primed for action on SKYFALL (2012) and SPECTRE (2015). #007 #JamesBond #DanielCraig

**Example output:**
#007
#JamesBond
#DanielCraig

1 Reply

If your version of Excel has the FILTERXML function (Excel 2013 or later on Windows; it is not available in Excel for Mac or Excel for the web), you can:

  • Convert the string into XML, using the spaces to split it into one node per word:
    • "<t><s>" & SUBSTITUTE(A$1," ","</s><s>") & "</s></t>"
  • Use an XPath expression to select the nodes that contain a #:
    • "//s[contains(.,'#')]"
  • In the formula, the fragment [" & ROWS($1:1) & "] appends a position predicate to the XPath. ROWS($1:1) evaluates to 1 in the first formula cell and grows by one for each row you fill down, so the formula returns the first, second, ... nth matching node in turn.
  • IFERROR returns an empty string when you fill down past the last hashtag.

=IFERROR(FILTERXML("<t><s>" & SUBSTITUTE(A$1," ","</s><s>") & "</s></t>","//s[contains(.,'#')][" & ROWS($1:1) & "]"),"")

In the example, I placed the formula in A3 and filled down five rows.
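
For readers who want to check the logic outside Excel, here is a minimal Python sketch of the same steps. It is not part of the original answer: the function names extract_hashtags and nth_hashtag are made up for illustration, and the filtering is done in plain Python instead of a real XPath engine. The escape() call is an addition of mine; it keeps characters such as & and < from producing invalid XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def extract_hashtags(text):
    """Return the space-separated words of `text` that contain '#', in order."""
    # Mirrors the SUBSTITUTE step: wrap every space-separated word in <s>...</s>.
    # escape() is an assumption added here so '&' and '<' do not break the XML.
    words = text.split(" ")
    xml = "<t><s>" + "</s><s>".join(escape(w) for w in words) + "</s></t>"
    root = ET.fromstring(xml)
    # Mirrors //s[contains(.,'#')]: keep the nodes whose text contains '#'.
    return [s.text for s in root.findall("s") if s.text and "#" in s.text]


def nth_hashtag(text, n):
    """Mirrors the [n] position predicate plus IFERROR: 1-based, '' if missing."""
    tags = extract_hashtags(text)
    return tags[n - 1] if 1 <= n <= len(tags) else ""


sample = ("Daniel Craig primed for action on SKYFALL (2012) and "
          "SPECTRE (2015). #007 #JamesBond #DanielCraig")
for row in range(1, 6):  # like filling the formula down five rows
    print(repr(nth_hashtag(sample, row)))
```

Like the formula, this keeps any word that contains a #, not only words that start with one, and it does not strip trailing punctuation from a hashtag.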
