Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


pandas - Split a series to several columns based on length in Python

I have a series that looks like this:

01 1ABCD     E    1   4.011   3.952   7.456 -0.3096  1.0132  0.2794

02 1ABCD     F    2   4.088   3.920   7.517  0.3839 -0.5482 -1.3874

...

I want to split it into 10 columns based on length: the first 4 characters (including spaces) = column 1, the second 5 characters = column 2, ..., the last 8 characters = column 10.

The result should be something like this:

column1 column2 column3 .... column10
01 1 ABCD E ..... 0.2794
02 1 ABCD F .... -1.3874
Question from: https://stackoverflow.com/questions/65844594/split-a-series-to-several-columns-based-on-length-in-python


1 Reply


An elegant solution is to:

  • Start with a list of sizes (how many chars should be in each "segment").
  • Create a (compiled) regex pattern with named capturing groups, each capturing the stated number of characters.
  • Use str.extract to extract the required substrings from your Series. Group names will be used as names of output columns.

Assuming that s is the source Series, the code to do it is:

import re

# Define size of each group
sizes = [4, 4, 6, 5, 8, 8, 8, 8, 8, 8]
# Generate the pattern string and compile it
pat = re.compile(''.join([ f'(?P<Column{idx}>.{{{n}}})'
    for idx, n in enumerate(sizes, start=1) ]))
# Generate the result
result = s.str.extract(pat)

The result is:

  Column1 Column2 Column3 Column4   Column5   Column6   Column7   Column8  Column9  Column10
0    01 1    ABCD       E       1     4.011     3.952     7.456   -0.3096   1.0132    0.2794 
1    02 1    ABCD       F       2     4.088     3.920     7.517    0.3839  -0.5482   -1.3874 
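
To try this end to end, here is a minimal sketch that rebuilds s from the two sample rows in the question and prints the generated pattern, which makes the triple braces in the f-string easier to read. The names pat_str and the literal rows are only for illustration; the pattern is passed to str.extract as a plain string here, which should behave the same as the compiled version above.

```python
import pandas as pd

# Sample rows copied from the question; `s` is the name used in the answer
s = pd.Series([
    "01 1ABCD     E    1   4.011   3.952   7.456 -0.3096  1.0132  0.2794",
    "02 1ABCD     F    2   4.088   3.920   7.517  0.3839 -0.5482 -1.3874",
])

# Width of each segment, in characters (same list as in the answer)
sizes = [4, 4, 6, 5, 8, 8, 8, 8, 8, 8]

# In an f-string '{{' and '}}' are literal braces, so '.{{{n}}}' renders as
# e.g. '.{4}': "exactly 4 characters of any kind"
pat_str = ''.join(f'(?P<Column{idx}>.{{{n}}})'
                  for idx, n in enumerate(sizes, start=1))
print(pat_str)

# One output column per named group, values still padded with spaces
result = s.str.extract(pat_str)
print(result)
```

Since the widths add up to 67, every row has to be at least that long; a shorter row does not match the pattern and comes back as NaN in all columns.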

But note that all columns in result are of object type (actually they are strings). So to perform any sensible processing of them, you should probably:

  • strip spaces from each column (both leading and trailing),
  • convert some columns to either int or float.
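
Those two clean-up steps could look roughly like this. It is only a sketch: the small frame below is a hypothetical stand-in for three columns of result, and which columns should become int or float is my assumption based on the sample data.

```python
import pandas as pd

# Hypothetical stand-in for a few columns of `result`, padding included
result = pd.DataFrame({
    "Column3": ["     E", "     F"],
    "Column4": ["    1", "    2"],
    "Column5": ["   4.011", "   4.088"],
})

# Strip leading and trailing spaces from every column (all are strings here)
clean = result.apply(lambda col: col.str.strip())

# Convert selected columns; the int/float choice is an assumption
clean = clean.astype({"Column4": int, "Column5": float})
print(clean.dtypes)
```

Columns left out of the astype mapping (Column3 here) stay as strings.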


...