Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - Replace superscript and subscript characters in a JavaScript string

I want to remove all superscript and subscript characters from a text.

Example: '?'.

I found an example on Stack Overflow, but it only handles superscript digits, not superscript letters or subscripts.

Does anyone know how to do this? One way would be to list every possible superscript and subscript character and replace them one by one, but that is impractical.


He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back into you…

1 Reply


Based on the Unicode reference for the subscript and superscript ranges, and a manual search for "subscript" and "superscript" in the UniView tool, you may use

.replace(/[\u00B0\u00B2\u00B3\u00B9\u02AF\u0670\u0711\u2121\u213B\u2207\u29B5\uFC5B-\uFC5D\uFC63\uFC90\uFCD9\u2070\u2071\u2074-\u208E\u2090-\u209C\u0345\u0656\u17D2\u1D62-\u1D6A\u2A27\u2C7C]+/g, '')

See the regex demo.

The + quantifier (one or more consecutive occurrences) lets the regex engine remove a whole run of adjacent sub/superscript characters in a single match instead of one character at a time.
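A minimal runnable sketch of the removal approach. `SUB_SUP_RE` and `stripSubSup` are hypothetical names. The character class is the one from the answer except that `\u006E` is left out: U+006E is the ordinary Latin letter `n`, so keeping it would delete every `n` from the text.

```javascript
// Hypothetical helper: removes runs of superscript/subscript characters.
// Class taken from the answer, minus \u006E (which is the plain letter "n").
const SUB_SUP_RE = /[\u00B0\u00B2\u00B3\u00B9\u02AF\u0670\u0711\u2121\u213B\u2207\u29B5\uFC5B-\uFC5D\uFC63\uFC90\uFCD9\u2070\u2071\u2074-\u208E\u2090-\u209C\u0345\u0656\u17D2\u1D62-\u1D6A\u2A27\u2C7C]+/g;

function stripSubSup(s) {
  // The g flag removes every run; the + makes each run a single match.
  return s.replace(SUB_SUP_RE, '');
}

// "x² + y₃ = n" loses the ² and the ₃ but keeps the plain "n".
console.log(stripSubSup('x\u00B2 + y\u2083 = n')); // "x + y = n"
```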

Note that ?? are modifier letters and are not formally superscript characters. If you want to remove them as well, use

var res = s.replace(/(?:\uD81A[\uDF40-\uDF43]|\uD81B[\uDF93-\uDF9F\uDFE0]|[\u00B0\u00B2\u00B3\u00B9\u02AF\u0670\u0711\u2121\u213B\u2207\u29B5\uFC5B-\uFC5D\uFC63\uFC90\uFCD9\u2070\u2071\u2074-\u208E\u2090-\u209C\u0345\u0656\u17D2\u1D62-\u1D6A\u2A27\u2C7C\u02B0-\u02C1\u02C6-\u02D1\u02E0-\u02E4\u02EC\u02EE\u0374\u037A\u0559\u0640\u06E5\u06E6\u07F4\u07F5\u07FA\u081A\u0824\u0828\u0971\u0E46\u0EC6\u10FC\u17D7\u1843\u1AA7\u1C78-\u1C7D\u1D2C-\u1D6A\u1D78\u1D9B-\u1DBF\u2071\u207F\u2090-\u209C\u2C7C\u2C7D\u2D6F\u2E2F\u3005\u3031-\u3035\u303B\u309D\u309E\u30FC-\u30FE\uA015\uA4F8-\uA4FD\uA60C\uA67F\uA69C\uA69D\uA717-\uA71F\uA770\uA788\uA7F8\uA7F9\uA9CF\uA9E6\uAA70\uAADD\uAAF3\uAAF4\uAB5C-\uAB5F\uFF70\uFF9E\uFF9F])+/g, '');

See this demo
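Assuming an ES2018-capable engine (current browsers and Node.js qualify), the `u` flag offers two alternatives to hand-written surrogate pairs such as `\uD81A[\uDF40-\uDF43]`: astral code points can be written directly as `\u{16B40}` inside a character class, and the property escape `\p{Lm}` matches the whole Unicode "Modifier letter" category, which the hand-listed modifier letters appear to correspond to. The sketch below checks both. Caveats: `\p{Lm}` follows the engine's Unicode version, so it may match more characters than a fixed list, and it does not cover superscript digits such as `²`, which belong to category `No`. The variable names are illustrative only.

```javascript
// Surrogate-pair form (works without the u flag).
const surrogateRe = /(?:\uD81A[\uDF40-\uDF43])+/g;
// Equivalent code point form (requires the u flag).
const codePointRe = /[\u{16B40}-\u{16B43}]+/gu;
// Every character in the Unicode "Modifier letter" (Lm) category (requires u).
const modifierRe = /\p{Lm}+/gu;

// U+16B41 is stored in the string as the pair \uD81A\uDF41.
const astral = 'a\u{16B41}b';
console.log(astral.replace(surrogateRe, '')); // "ab"
console.log(astral.replace(codePointRe, '')); // "ab"

// U+02B0 (modifier small h) and U+1D43 (modifier small a) are Lm;
// ordinary letters and spaces are left alone.
console.log('\u02B0ello x\u1D43'.replace(modifierRe, '')); // "ello x"
```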

To normalize subscript and superscript digits instead of removing them, use a dictionary and look each match up inside an anonymous function passed as the replacement argument:

var super_sub_script_dict = {'\u2070': '0', '\u00B9': '1', '\u00B2': '2', '\u00B3': '3', '\u2074': '4', '\u2075': '5', '\u2076': '6', '\u2077': '7', '\u2078': '8', '\u2079': '9', '\u2080': '0', '\u2081': '1', '\u2082': '2', '\u2083': '3', '\u2084': '4', '\u2085': '5', '\u2086': '6', '\u2087': '7', '\u2088': '8', '\u2089': '9'};
var test_string = "Subscript: ₀₁₂₃₄₅₆₇₈₉ and superscript: ⁰¹²³⁴⁵⁶⁷⁸⁹";
var regex = new RegExp('[' + Object.keys(super_sub_script_dict).join("") + ']', 'g'); // => /[⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉]/g
// Or, limited to exactly the dictionary keys:
// var regex = /[\u00B9\u00B2\u00B3\u2070\u2074-\u2079\u2080-\u2089]/g;
console.log(test_string.replace(regex, function(x) {
    return super_sub_script_dict[x]; // map the matched character to its plain digit
})); // => "Subscript: 0123456789 and superscript: 0123456789"
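A different technique, not the one used in the answer: Unicode compatibility normalization. `String.prototype.normalize('NFKC')` maps superscript and subscript digits to their plain forms without any dictionary. The catch is that NFKC folds many other compatibility characters too (the ligature `ﬁ` becomes `fi`, full-width letters become ASCII), so it only fits when that broader folding is acceptable. `foldSubSup` is a hypothetical helper name.

```javascript
// Hypothetical helper: NFKC replaces compatibility characters, including
// super/subscript digits, with their plain equivalents.
function foldSubSup(s) {
  return s.normalize('NFKC');
}

const sample = 'Subscript: \u2080\u2081\u2082 and superscript: \u2070\u00B9\u00B2';
console.log(foldSubSup(sample)); // "Subscript: 012 and superscript: 012"

// Side effect to be aware of: the ligature U+FB01 is expanded as well.
console.log(foldSubSup('\uFB01le')); // "file"
```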


...