Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Percent-encoding a URL with Python

When I enter a URL such as https://dl.dropbox.com/u/94943007/file.kml into maps.google.com, it encodes the URL as:

https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml

What is this encoding called, and is there a way to encode a URL like this using Python?
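The string Maps produced is ordinary percent-encoding (also known as URL encoding): a character is replaced by % followed by the two-digit hex value of its byte, so / (0x2F) becomes %2F. A minimal check, sketched here with Python 3's urllib.parse (in Python 2 the same functions live directly in urllib), is to decode the Maps output and compare it with the original URL:

```python
from urllib.parse import unquote

# The string Google Maps produced, copied from the question.
encoded = 'https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml'

# unquote() reverses percent-encoding: each %XX triple becomes the byte 0xXX,
# and the resulting bytes are decoded as UTF-8.
decoded = unquote(encoded)

print(decoded)  # https://dl.dropbox.com/u/94943007/file.kml
```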

I tried this (the process is called URL encoding):

>>> import urllib
>>> urllib.quote('https://dl.dropbox.com/u/94943007/file.kml', '')
'https%3A%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml'

but did not get the expected results:

'https%3A//dl.dropbox.com/u/94943007/file.kml'

What I need is this:

https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml

How do I encode this URL properly?

The documentation here:

https://developers.google.com/maps/documentation/webservices/

states:

All characters to be URL-encoded are encoded using a '%' character and a two-character hex value corresponding to their UTF-8 character. For example, 上海+中國 in UTF-8 would be URL-encoded as %E4%B8%8A%E6%B5%B7%2B%E4%B8%AD%E5%9C%8B. The string ? and the Mysterians would be URL-encoded as %3F+and+the+Mysterians.
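Both examples from that documentation can be reproduced with quote_plus(), whose default safe='' means / and + are encoded too. This is a Python 3 sketch using urllib.parse.quote_plus; the Python 2 equivalent, urllib.quote_plus, works on byte strings, so non-ASCII text would need to be encoded to UTF-8 first.

```python
from urllib.parse import quote_plus

# quote_plus() UTF-8-encodes the text, percent-encodes every byte outside a
# small always-safe set (ASCII letters, digits, '_', '.', '-', '~'), and turns
# spaces into '+'. With the default safe='' even '+' itself becomes %2B.
chinese = quote_plus('上海+中國')
mysterians = quote_plus('? and the Mysterians')

print(chinese)     # %E4%B8%8A%E6%B5%B7%2B%E4%B8%AD%E5%9C%8B
print(mysterians)  # %3F+and+the+Mysterians
```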


He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back…

1 Reply


Use the following (Python 2; in Python 3 the same function is urllib.parse.quote_plus):

urllib.quote_plus(url, safe=':')

Since you don't want the colon encoded, you need to mark it as safe when calling urllib.quote():

>>> import urllib
>>> expected = 'https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml'
>>> url = 'https://dl.dropbox.com/u/94943007/file.kml'
>>> urllib.quote(url, safe=':') == expected
True

urllib.quote() takes a keyword argument safe, which defaults to '/' and lists the characters that are considered safe and therefore are not encoded. In your first example you passed '', so the slashes (and the colon) were encoded. The unexpected output you pasted below it, in which the slashes were not encoded, probably came from an earlier attempt in which you did not set safe at all, so it fell back to the default '/'.

Overriding the default '/' with ':' keeps the colon unencoded while the slashes are encoded, which yields the desired result.
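A Python 3 sketch (urllib.parse.quote is where Python 2's urllib.quote lives in Python 3) that reproduces all three outputs from the question, one per setting of safe:

```python
from urllib.parse import quote

url = 'https://dl.dropbox.com/u/94943007/file.kml'

# safe='': only the always-safe characters survive, so ':' and '/' are both encoded.
all_encoded = quote(url, safe='')
# Default safe='/': slashes are kept and only ':' is encoded.
default_safe = quote(url)
# safe=':': the colon is kept and, because '/' is no longer listed, slashes are encoded.
colon_safe = quote(url, safe=':')

print(all_encoded)   # https%3A%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml
print(default_safe)  # https%3A//dl.dropbox.com/u/94943007/file.kml
print(colon_safe)    # https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml
```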

Edit: The API also expects spaces to be encoded as plus signs, so urllib.quote_plus() should be used instead. Its safe argument defaults to '' rather than '/'; since safe=':' is passed explicitly here, that default does not affect this URL.
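A minimal Python 3 sketch of where the two functions differ. The file name my file.kml is hypothetical, chosen only because it contains a space; for a URL without spaces, such as the one in the question, both calls return the same string.

```python
from urllib.parse import quote, quote_plus

# Hypothetical KML file name containing a space (not from the question),
# used only to show where quote() and quote_plus() differ.
url = 'https://dl.dropbox.com/u/94943007/my file.kml'

# quote() encodes the space as %20.
with_quote = quote(url, safe=':')
# quote_plus() encodes the space as '+', matching the documentation's examples.
with_quote_plus = quote_plus(url, safe=':')

print(with_quote)       # https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Fmy%20file.kml
print(with_quote_plus)  # https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Fmy+file.kml

# Without spaces the two functions agree.
plain = 'https://dl.dropbox.com/u/94943007/file.kml'
same = quote(plain, safe=':') == quote_plus(plain, safe=':')
print(same)  # True
```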

