Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python 3.x - What does sys.intern() do and when should it be used?

I came across this question about memory management of dictionaries, which mentions the intern function. What exactly does it do, and when would it be used?

To give an example: if I have a set called seen that contains tuples of the form (string1, string2), which I use to check for duplicates, would storing (intern(string1), intern(string2)) improve performance with respect to memory or speed?

He who fights with dragons too long becomes a dragon himself; gaze long into the abyss, and the abyss gazes back into you...

From the Python 3 documentation:

sys.intern(string)

Enter string in the table of “interned” strings and return the interned string – which is string itself or a copy. Interning strings is useful to gain a little performance on dictionary lookup – if the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare. Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys.

Interned strings are not immortal; you must keep a reference to the return value of intern() around to benefit from it.
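To make the dictionary case from the documentation concrete, here is a minimal sketch (the names make_key, stored_key, raw_lookup, lookup_key and attrs are hypothetical). The strings are built at runtime with str.join so that each call produces a fresh string object rather than a shared constant; after interning, the lookup key and the key stored in the dictionary are the very same object.

```python
import sys

def make_key(*parts):
    # Hypothetical helper: builds the string at runtime, so every call
    # returns a new str object even when the value is the same.
    return "_".join(parts)

# Store a value under an interned key.
stored_key = sys.intern(make_key("user", "name"))
attrs = {stored_key: "alice"}

# A separately built string is equal in value but is a different object.
raw_lookup = make_key("user", "name")
print(raw_lookup == stored_key, raw_lookup is stored_key)  # True False

# Interning it hands back the object already held in the intern table.
lookup_key = sys.intern(raw_lookup)
print(lookup_key is stored_key)  # True
print(attrs[lookup_key])         # alice
```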

Clarification:

As the documentation suggests, the sys.intern function is intended to be used for performance optimization.

The sys.intern function maintains a table of interned strings. When you attempt to intern a string, the function looks it up in the table and:

  1. If the string does not exist in the table yet (it hasn't been interned), the function saves it in the table and returns it from the interned strings table.

    >>> import sys
    >>> a = sys.intern('why do pangolins dream of quiche')
    >>> a
    'why do pangolins dream of quiche'
    

    In the above example, a holds the interned string. Even though it is not visible, the sys.intern function has saved the 'why do pangolins dream of quiche' string object in the interned strings table.

  2. If the string already exists in the table (it has been interned before), the function returns the object stored in the interned strings table.

    >>> b = sys.intern('why do pangolins dream of quiche')
    >>> b
    'why do pangolins dream of quiche'
    

    Even though it is not immediately visible, because the string 'why do pangolins dream of quiche' was interned earlier, b now holds the same string object as a.

    >>> b is a
    True
    

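    If we create the same string without using intern, we end up with two different string objects that have the same value. (This is what happens in an interactive CPython session. Whether two equal literals share one object is an implementation detail; for example, the compiler may merge equal literals within a single script, so never use is to test strings for equality.)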

    >>> c = 'why do pangolins dream of quiche'
    >>> c is a
    False
    >>> c is b
    False
    

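By passing strings through sys.intern, you ensure that equal strings you hold on to share a single object: when you intern a string whose value is already in the table, you get back a reference to the pre-existing object, and the duplicate you passed in can be freed once nothing else refers to it. This saves memory when a program keeps many copies of the same strings. Comparisons also become cheaper: two interned strings are equal exactly when they are the same object, so they can be compared by memory address (for example with the is operator) instead of character by character. In CPython, == and dictionary lookups check identity before comparing contents, which is where the speed-up for interned dictionary keys comes from.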
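Applied to the set of tuples from the question, a minimal sketch might look like the following (add_pair, incoming and the sample names are hypothetical). Each incoming string is built at runtime to simulate data read from input, and both elements are interned before the tuple is stored.

```python
import sys

seen = set()

def add_pair(first, second):
    """Hypothetical helper: record (first, second); return True if it was new."""
    key = (sys.intern(first), sys.intern(second))
    if key in seen:
        return False
    seen.add(key)
    return True

# Simulate strings arriving from input: each join builds a fresh str object.
incoming = [("".join(["ali", "ce"]), "".join(["b", "ob"])) for _ in range(3)]
incoming.append(("".join(["car", "ol"]), "".join(["b", "ob"])))

results = [add_pair(x, y) for x, y in incoming]
print(results)  # [True, False, False, True]

# Both stored tuples refer to one shared "bob" object.
bobs = {id(pair[1]) for pair in seen}
print(len(seen), len(bobs))  # 2 1
```

The duplicate check itself relies on tuple equality and behaves the same without sys.intern. What interning changes is that the stored tuples share one object per distinct string, which saves memory when there are many repeats and, in CPython, lets element comparisons succeed on an identity check. Whether that is a measurable win depends on how many duplicate strings your data actually contains.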



...