c# - How to encode the ampersand if it is not already encoded?

Question

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I need a C# method that encodes ampersands only if they are not already encoded and are not part of another encoded expression.

For example:

"tom & jill" should become "tom &amp; jill"


"tom &amp; jill" should remain "tom &amp; jill"


"tom &euro; jill" should remain "tom &euro; jill"


"tom <&> jill" should become "tom <&amp;> jill"


"tom &quot;&&quot; jill" should become "tom &quot;&amp;&quot; jill"

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:39:37+0000

What you actually want to do is first decode the string and then encode it again. Don't bother trying to patch an encoded string.

Any encoding is only worth its salt if it can be decoded easily, so reuse that logic to make your life easier and your software less bug-prone.

Now, if you are unsure whether the string is encoded or not, the problem is most likely not the string itself but the ecosystem that produced it. Where did you get it from? Who did it pass through before it got to you? Do you trust it?

If you really have to resort to creating a magic-fix-weird-data function, then consider building a table of "encodings" and their corresponding characters:

&amp; -> &
&euro; -> €
&lt; -> <
// etc.

Then decode every encoding you encounter according to the table, and afterwards re-encode the whole string. Sure, you might find more efficient methods by fumbling around without decoding first. But you won't be sane next year. And this is your career, right? You need to stay right in the head! You'll lose your mind if you try to be too clever. And you'll lose your job when you go mad. Sad things happen to people who let maintaining their hacks destroy their minds...
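The table-driven decode step could be sketched roughly like this. The names TableDecoder, EntityTable and DecodeWithTable are made up for illustration, and the table holds only the three entries listed above; a real table would need every entity your input can contain.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TableDecoder
{
    // Hypothetical, deliberately tiny table: entity -> character.
    private static readonly Dictionary<string, char> EntityTable = new Dictionary<string, char>
    {
        { "&amp;", '&' },
        { "&euro;", '\u20AC' },
        { "&lt;", '<' },
    };

    public static string DecodeWithTable(string input)
    {
        var result = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            if (input[i] == '&')
            {
                // Candidate entity: everything from '&' up to the next ';'.
                int end = input.IndexOf(';', i + 1);
                if (end > i && EntityTable.TryGetValue(input.Substring(i, end - i + 1), out char decoded))
                {
                    result.Append(decoded);
                    i = end + 1;
                    continue;
                }
            }
            // Not a known entity: copy the character unchanged.
            result.Append(input[i]);
            i++;
        }
        return result.ToString();
    }
}
```

Scanning once from left to right, rather than calling Replace per table entry, means text such as "&amp;lt;" is decoded only one level, to "&lt;".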

EDIT: Using the .NET library, of course, will save you from madness:

HttpUtility.HtmlDecode(string)
HttpUtility.HtmlEncode(string)

I just tested it, and it seems to have no problems with decoding strings with just ampersands in them. So, go ahead:

string magic(string encodedOrNot)
{
    var decoded = HttpUtility.HtmlDecode(encodedOrNot);
    return HttpUtility.HtmlEncode(decoded);
}

EDIT#2: It turns out that the decoder HttpUtility.HtmlDecode will work for your purpose, but the encoder will not, since you don't want angle brackets (<, >) to be encoded. But writing an encoder is really easy:

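// requires: using System.Collections.Generic; using System.Text;
// encodingTable maps a character to its entity, e.g. '&' -> "&amp;"
static string Encode(string decoded, IDictionary<char, string> encodingTable)
{
    var result = new StringBuilder();
    foreach (char character in decoded)
    {
        if (encodingTable.TryGetValue(character, out var entity))
            result.Append(entity);      // character has an entity: emit it
        else
            result.Append(character);   // everything else is copied unchanged
    }
    return result.ToString();
}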
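Putting both steps together, a minimal sketch might look like the following. AmpersandNormalizer, EncodingTable and Normalize are hypothetical names, and the table covers only the characters that appear in the question's examples. WebUtility.HtmlDecode from System.Net is swapped in for HttpUtility.HtmlDecode purely so the sketch needs no reference to System.Web; as far as I know the two decode these inputs the same way, but treat that as an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

public static class AmpersandNormalizer
{
    // Hypothetical table: only these characters are re-encoded.
    // '<' and '>' are left out on purpose so they pass through untouched.
    private static readonly Dictionary<char, string> EncodingTable = new Dictionary<char, string>
    {
        { '&', "&amp;" },
        { '"', "&quot;" },
        { '\u20AC', "&euro;" },
    };

    public static string Normalize(string encodedOrNot)
    {
        // Step 1: decode whatever entities are already present.
        string decoded = WebUtility.HtmlDecode(encodedOrNot);

        // Step 2: re-encode, using only the characters in the table.
        var result = new StringBuilder(decoded.Length);
        foreach (char c in decoded)
        {
            if (EncodingTable.TryGetValue(c, out var entity))
                result.Append(entity);
            else
                result.Append(c);
        }
        return result.ToString();
    }
}
```

For example, Normalize("tom & jill") returns "tom &amp; jill", while "tom &amp; jill" comes back unchanged. One caveat of this approach: any entity whose character is not in the table (say, "&lt;") is decoded and then left as the literal character, so extend the table if that matters for your data.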
