You are getting confused by the Python string representation.
When you print a Python list (or any other standard Python container), the contents are shown in a special representation to make debugging easier; each value is shown as the result of calling the repr() function on that value. For string values, that means the result is a unicode string representation, and that is not the same thing as what you see when the string is printed directly.
Unicode and byte strings, when shown like that, are presented as string literals: quoted values that you can copy and paste straight back into Python code without having to worry about encoding. Anything that is not a printable ASCII character is shown in escaped form. Unicode code points beyond the latin-1 range are shown as '\u....' escape sequences. Characters in the latin-1 range use the '\x..' escape sequence. Many control characters are shown in their 1-letter escape form, such as \n and \t.
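A minimal sketch of the difference between the two forms, assuming Python 3 (the examples in this answer use Python 2, where the literals carry a u prefix and repr() itself escapes non-ASCII characters; in Python 3 the built-in ascii() produces that escaped form). The variable names are only illustrative.

```python
# Assumes Python 3; in Python 2 these literals would be written u'...'.
values = ['\u2036Hello World!\u2033', 'Another\nstring']

# The display form of a list is built from the repr() of each element.
list_display = str(values)
joined_reprs = '[' + ', '.join(repr(v) for v in values) + ']'

# repr() shows the newline as the two characters backslash and n,
# but the string itself still contains one real newline character.
second_repr = repr(values[1])

# ascii() escapes every non-ASCII character, like Python 2's repr() did.
first_ascii = ascii(values[0])

print(first_ascii)   # escaped literal form
print(second_repr)   # quoted, newline shown as an escape
print(values[1])     # printed directly: two lines of output
```

The escapes exist only in the displayed literal; the stored string is unchanged, which is why printing a single element looks different from printing the whole list.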
The Python interactive prompt does the same thing; when you echo a value on the prompt without using print, the value is 'represented', shown in the repr() form:
>>> print u'\u2036Hello World!\u2033'
‶Hello World!″
>>> u'\u2036Hello World!\u2033'
u'\u2036Hello World!\u2033'
>>> [u'\u2036Hello World!\u2033', u'Another\nstring']
[u'\u2036Hello World!\u2033', u'Another\nstring']
>>> print _[1]
Another
string
This is entirely normal behaviour. In other words, your code works, nothing is broken.
To come back to your code, if you want to extract just the 'text' key from the tweet JSON structures, filter while reading the file; don't bother with looping twice:
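import json

with open("file_name.txt") as tweets_file:
    tweets = []
    for line in tweets_file:
        # Each line holds one JSON document; decode the line itself.
        data = json.loads(line)
        # Keep only the tweet text; skip entries that have no 'text' key.
        if 'text' in data:
            tweets.append(data['text'])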
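The same filter can be tried without a file by feeding it a few in-memory lines. This is only a sketch: extract_texts is a hypothetical helper name, the sample lines are made up for illustration, and skipping blank lines is an assumption about how the file is laid out. It is written for Python 3, where ascii() shows the escaped form.

```python
import json


def extract_texts(lines):
    """Return the 'text' value of every JSON line that has one."""
    texts = []
    for line in lines:
        line = line.strip()
        if not line:
            # Assumption: blank separator lines may occur; skip them.
            continue
        data = json.loads(line)
        if 'text' in data:
            texts.append(data['text'])
    return texts


# Made-up sample input: one JSON document per line, as in the file.
sample_lines = [
    '{"text": "\\u2036Hello World!\\u2033", "id": 1}\n',
    '\n',
    '{"delete": {"status": {"id": 2}}}\n',
    '{"text": "Another\\nstring", "id": 3}\n',
]

texts = extract_texts(sample_lines)
print(ascii(texts))  # the list is displayed with escapes
print(texts[1])      # the element itself prints on two lines
```

Any iterable of lines works here, so an open file object can be passed in place of sample_lines.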