Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

JavaScript - How can I replace all internal URLs in a string of HTML with their relative external URLs?

Given a string of HTML like the one below:

<html>
<link rel="stylesheet" href="/thing.css">

<body>
    <script src="/nothing.js"></script>
    <link rel="stylesheet" href="/styles.css">
    <a href='#a_hash'>A link</a>
</body>

</html>

I want to be able to get the following:

<html>
<link rel="stylesheet" href="//example.com/thing.css">

<body>
    <script src="//example.com/nothing.js"></script>
    <link rel="stylesheet" href="//example.com/styles.css">
    <a href='//example.com#a_hash'>A link</a>
</body>

</html>

I would prefer to do this in vanilla JavaScript, without a library. Currently I have this regex to find URLs (I'm open to new ones!):

<.+?(?:href|src)=(?:"|')([^"']+)(?:"|').*?>
Question from: https://stackoverflow.com/questions/65904318/how-can-i-replace-all-internal-urls-in-a-string-of-html-with-their-relative-exte

1 Reply

Use

.replace(/\b((?:href|src)=)(?!\/\/example\.com)(["']?)([^"']+)\2/gi,
   (_, x, y, z) => z.charAt(0) === '/' ?
   `${x}${y}//example.com${z}${y}` : `${x}${y}//example.com/${z}${y}`)

See regex proof.

Explanation

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (                        group and capture to 1:
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      href                     'href'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      src                      'src'
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
    =                        '='
--------------------------------------------------------------------------------
  )                        end of 1
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \/                       '/'
--------------------------------------------------------------------------------
    \/                       '/'
--------------------------------------------------------------------------------
    example                  'example'
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    com                      'com'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  (                        group and capture to 2:
--------------------------------------------------------------------------------
    ["']?                    any character of: '"', ''' (optional
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of 2
--------------------------------------------------------------------------------
  (                        group and capture to 3:
--------------------------------------------------------------------------------
    [^"']+                   any character except: '"', ''' (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of 3
--------------------------------------------------------------------------------
  \2                       what was matched by capture 2
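
To see the capture groups from the breakdown above in action, here is a small sketch that lists what each group holds for every match. The sample string and the `listMatches` helper name are assumptions for illustration only.

```javascript
// Same pattern as in the answer, with its escapes written out.
const rx = /\b((?:href|src)=)(?!\/\/example\.com)(["']?)([^"']+)\2/gi;

// Hypothetical helper: report what each capture group holds for every match.
function listMatches(html) {
  return [...html.matchAll(rx)].map(m => ({
    attr: m[1],   // group 1: 'href=' or 'src='
    quote: m[2],  // group 2: the opening quote, or '' if the value is unquoted
    url: m[3],    // group 3: the attribute value
  }));
}

console.log(listMatches(`<script src="/nothing.js"></script><a href='#a_hash'>A link</a>`));
```

Each entry corresponds to one `href` or `src` attribute, so this is a quick way to check what the replacement callback will receive as `x`, `y`, and `z`.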

const string = ' href="nowhere"  src="/nothing.js"';
const rx = /\b((?:href|src)=)(?!\/\/example\.com)(["']?)([^"']+)\2/gi;
// Prefix each value with //example.com, adding a '/' when the value lacks one.
console.log(string.replace(rx, (_, x, y, z) => z.charAt(0) === '/' ?
   `${x}${y}//example.com${z}${y}` : `${x}${y}//example.com/${z}${y}`));
