Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Trying to strip b' ' from my Numpy array's savetxt() representation

So I have what I feel is a very dumb problem.

I create an array from a file:

A1=np.loadtxt(file, dtype='a100')

I want to write that array after it's done processing to another file:

np.savetxt("Test.txt", A1, fmt='%s', delimiter=',')

Why is it writing out b'string'? I think I understand that it's writing it out as bytes, but for the life of me I can't figure out how to write it out without the b''.

I know this is probably something incredibly easy I'm overlooking!


1 Reply


A1 is loaded as an array of bytestrings. Python 3 uses unicode strings by default, so it displays bytestrings with a b prefix. That's normal with print. I'm a little surprised that it also does so during the file write.

In any case, this seems to do the trick:

A2=np.array([x.decode() for x in A1])
np.savetxt("Test.txt", A2, fmt='%s', delimiter=',')

A2 will have a dtype like dtype='<U100'.


My test array is:

array([b'one.com', b'two.url', b'three.four'], dtype='|S10')

loaded from a simple text file:

one.com
two.url
three.four

.decode is a bytestring method. [x.decode() for x in A1] works for a simple 1d array of bytestrings. If A1 is 2d, the iteration has to be done over all elements, not just the rows. And if A1 is a structured array, it has to be applied to the strings within the elements.
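For the 2d case, here is a minimal sketch of what "iterate over all elements" could look like. The array A_bytes is a made-up example, not taken from the question, and the two options shown are just two of several possible approaches.

```python
import numpy as np

# Hypothetical 2-d array of bytestrings (not from the original question).
A_bytes = np.array([[b'one.com', b'two.url'],
                    [b'three.four', b'five.six']])

# Option 1: decode every element of the flattened array,
# then restore the original shape.
flat = [x.decode() for x in A_bytes.ravel()]
A_str = np.array(flat).reshape(A_bytes.shape)

# Option 2: let NumPy apply the decode element by element.
A_str2 = np.char.decode(A_bytes)

print(A_str.dtype.kind)          # 'U' means a unicode string dtype
print((A_str == A_str2).all())   # both options give the same array
```

Either result can then be passed to savetxt with fmt='%s' and a delimiter.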


Another possibility is to use a converter during load, so you get an array of (unicode) strings:

In [508]: A1=np.loadtxt('urls.txt', dtype='U',
    converters={0:lambda x:x.decode()})
In [509]: A1
Out[509]: 
array(['one.com', 'two.url', 'three.four'], dtype='<U10')
In [510]: np.savetxt('test0.txt',A1,fmt='%s')
In [511]: cat test0.txt
one.com
two.url
three.four

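The module that contains loadtxt has a couple of converter functions: asbytes, asbytes_nested, and asstr. So converters could also be: converters={0:np.lib.npyio.asstr}. (That path is from the NumPy version I tested with; newer releases may not expose these helpers there.)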

genfromtxt handles this without converters:

 A1=np.genfromtxt('urls.txt', dtype='U')
 # array(['one.com', 'two.url', 'three.four'], dtype='<U10')

To understand why savetxt saves unicode strings as we want, but prepends the b for bytestrings, we have to look at its code.

np.savetxt (running on py3) is essentially:

fh = open(fname, 'wb')
X = np.atleast_2d(X).T
# make a 'fmt' that matches the columns of X (with delimiters)
for row in X:
    fh.write(asbytes(format % tuple(row) + newline))

Looking at two sample strings (str and bytestr):

In [617]: asbytes('%s'%tuple(['one.two']))
Out[617]: b'one.two'

In [618]: asbytes('%s'%tuple([b'one.two']))
Out[618]: b"b'one.two'"

Writing to a 'wb' file removes that outer layer of b'', leaving the inner for the bytestring. It also explains why strings ('plain' py3 unicode) are written as 'latin1' strings to the file.
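The same effect can be reproduced end to end. This is a rough sketch assuming a small made-up array and temporary files (raw.txt and fixed.txt are hypothetical names); it writes the bytestring array once as is and once after decoding, then reads both files back.

```python
import os
import tempfile
import numpy as np

# '%s' calls str() on its argument; for a bytes object that is its repr.
from_bytes = '%s' % b'one.two'
from_str = '%s' % 'one.two'
print(from_bytes)
print(from_str)

A_bytes = np.array([b'one.com', b'two.url', b'three.four'])

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, 'raw.txt')      # hypothetical file names
    fixed_path = os.path.join(tmp, 'fixed.txt')

    # Written as is: each element goes through '%s' as a bytestring.
    np.savetxt(raw_path, A_bytes, fmt='%s')
    # Decoded first: each element goes through '%s' as a unicode string.
    np.savetxt(fixed_path, np.char.decode(A_bytes), fmt='%s')

    with open(raw_path) as f:
        raw_lines = f.read().splitlines()
    with open(fixed_path) as f:
        fixed_lines = f.read().splitlines()

print(raw_lines)
print(fixed_lines)
```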


You could write a bytestrings array directly, without savetxt. For example:

A0 = np.array([b'one.com', b'two.url', b'three.four'], dtype='|S10')
with open('test0.txt','wb') as f:
    for x in A0:
        f.write(x+b'\n')

cat test0.txt
    one.com
    two.url
    three.four

Unicode strings can also be written directly, producing the same file:

A1 = np.array(['one.com', 'two.url', 'three.four'], dtype='<U10')
with open('test1.txt','w') as f:
    for x in A1:
        f.write(x+'\n')

The default encoding for such a file is typically UTF-8 (it depends on the platform's locale), the same as used with 'one.com'.encode(). The effect is the same as what savetxt does:

with open('test1.txt','wb') as f:
    for x in A1:
        f.write(x.encode()+b'\n')
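To check that the text-mode and binary-mode writes really produce the same bytes, something like the following sketch could be used. The file names are made up, and encoding='utf-8' and newline='\n' are passed explicitly so the result does not depend on the platform's defaults.

```python
import os
import tempfile
import numpy as np

A1 = np.array(['one.com', 'two.url', 'three.four'])

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, 'text.txt')    # hypothetical file names
    bin_path = os.path.join(tmp, 'binary.txt')

    # Text mode: explicit encoding, and newline='\n' turns off
    # platform-specific newline translation.
    with open(text_path, 'w', encoding='utf-8', newline='\n') as f:
        for x in A1:
            f.write(x + '\n')

    # Binary mode: encode each string by hand.
    with open(bin_path, 'wb') as f:
        for x in A1:
            f.write(x.encode('utf-8') + b'\n')

    with open(text_path, 'rb') as f:
        text_bytes = f.read()
    with open(bin_path, 'rb') as f:
        bin_bytes = f.read()

print(text_bytes == bin_bytes)
```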

np.char has encode and decode functions, which appear to operate iteratively on the elements of an array.

Thus

 np.char.decode(A0)   # convert |S10 to <U10, like [x.decode() for x in A0]
 np.char.encode(A1)   # convert <U10 to |S10

This works with multidimensional arrays:

 np.savetxt('testm.txt',np.char.decode(A_bytes[:,None][:,[0,0]]),
     fmt='%s',delimiter=',  ')

With a structured array, np.char.decode has to be applied individually to each of the char fields.
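For the structured case, one way it might be done is sketched below. The array A_struct and its field names 'url' and 'count' are invented for illustration. The idea is to build a parallel dtype in which each bytestring field becomes a unicode field of the same length, then fill it field by field.

```python
import numpy as np

# Hypothetical structured array: one bytestring field, one integer field.
A_struct = np.array([(b'one.com', 1), (b'two.url', 2)],
                    dtype=[('url', 'S10'), ('count', 'i4')])

# Build a matching dtype where every 'S' field becomes a 'U' field
# with the same number of characters.
new_descr = []
for name in A_struct.dtype.names:
    field_dtype = A_struct.dtype[name]
    if field_dtype.kind == 'S':
        new_descr.append((name, 'U%d' % field_dtype.itemsize))
    else:
        new_descr.append((name, field_dtype))

# Copy field by field, decoding only the bytestring fields.
A_decoded = np.empty(A_struct.shape, dtype=new_descr)
for name in A_struct.dtype.names:
    if A_struct.dtype[name].kind == 'S':
        A_decoded[name] = np.char.decode(A_struct[name])
    else:
        A_decoded[name] = A_struct[name]

print(A_decoded.dtype)
print(A_decoded)
```

This sketch only looks at top-level fields; nested or sub-array fields would need extra handling.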

