Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
79 views
in Technique[技术] by (71.8m points)

How to strip HTML tags from a string in SQL Server?

I've got data in SQL Server 2005 that contains HTML tags and I'd like to strip all that out, leaving just the text between the tags. Ideally also replacing things like &lt; with <, etc.

Is there an easy way to do this or has someone already got some sample T-SQL code?

I don't have the ability to add extended stored procs and the like, so would prefer a pure T-SQL approach (preferably one backwards compatible with SQL 2000).

I just want to retrieve the data with stripped out HTML, not update it, so ideally it would be written as a user-defined function, to make for easy reuse.

So for example converting this:

<B>Some useful text</B>&nbsp;
<A onclick="return openInfo(this)"
   href="http://there.com/3ce984e88d0531bac5349"
   target=globalhelp>
   <IMG title="Source Description" height=15 alt="Source Description" 
        src="/ri/new_info.gif" width=15 align=top border=0>
</A>&gt;&nbsp;<b>more text</b></TD></TR>

to this:

Some useful text > more text
Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

There is a UDF that will do that described here:

User Defined Function to Strip HTML

CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX) AS
BEGIN
    DECLARE @Start INT
    DECLARE @End INT
    DECLARE @Length INT
    SET @Start = CHARINDEX('<',@HTMLText)
    SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
    SET @Length = (@End - @Start) + 1
    WHILE @Start > 0 AND @End > 0 AND @Length > 0
    BEGIN
        SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
        SET @Start = CHARINDEX('<',@HTMLText)
        SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
        SET @Length = (@End - @Start) + 1
    END
    RETURN LTRIM(RTRIM(@HTMLText))
END
GO

Edit: note this is for SQL Server 2005, but if you change the keyword MAX to something like 4000, it will work in SQL Server 2000 as well.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...