pandas - Split a series to several columns based on length in Python

Question

posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have a series that looks like this:

01 1ABCD     E    1   4.011   3.952   7.456 -0.3096  1.0132  0.2794

02 1ABCD     F    2   4.088   3.920   7.517  0.3839 -0.5482 -1.3874

...

I want to split it into 10 columns based on length: the first 4 characters (including spaces) = column 1, the next 5 characters = column 2, ..., the last 8 characters = column 10.

The result should be something like this:

column1	column2	column3	...	column10
01 1	ABCD	E	...	0.2794
02 1	ABCD	F	...	-1.3874

Question from: https://stackoverflow.com/questions/65844594/split-a-series-to-several-columns-based-on-length-in-python

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:30:45+0000

An elegant solution is to:

1. Start with a list of sizes (how many chars should be in each "segment").
2. Create a (compiled) regex pattern with named capturing groups, each capturing the stated number of chars.
3. Use str.extract to extract the required substrings from your Series. The group names are used as the names of the output columns.

Assuming that s is the source Series, the code to do it is:

import re

# Define size of each group
sizes = [4, 4, 6, 5, 8, 8, 8, 8, 8, 8]
# Generate the pattern string and compile it
pat = re.compile(''.join([ f'(?P<Column{idx}>.{{{n}}})'
    for idx, n in enumerate(sizes, start=1) ]))
# Generate the result
result = s.str.extract(pat)

The result is:

  Column1 Column2 Column3 Column4   Column5   Column6   Column7   Column8  Column9  Column10
0    01 1    ABCD       E       1     4.011     3.952     7.456   -0.3096   1.0132    0.2794 
1    02 1    ABCD       F       2     4.088     3.920     7.517    0.3839  -0.5482   -1.3874
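
To try this end to end, here is a minimal self-contained sketch. It assumes the two sample rows from the question are the whole Series and that each row is exactly 67 characters wide (the sum of the sizes). The pattern is passed as a plain string here, which str.extract also accepts.

```python
import pandas as pd

# Sample rows copied from the question (assumed to be 67 characters each).
s = pd.Series([
    "01 1ABCD     E    1   4.011   3.952   7.456 -0.3096  1.0132  0.2794",
    "02 1ABCD     F    2   4.088   3.920   7.517  0.3839 -0.5482 -1.3874",
])

# Width of each segment, as in the answer above.
sizes = [4, 4, 6, 5, 8, 8, 8, 8, 8, 8]

# One named group per segment: (?P<Column1>.{4})(?P<Column2>.{4})...
pat = "".join(
    f"(?P<Column{idx}>.{{{n}}})" for idx, n in enumerate(sizes, start=1)
)

# Each named group becomes one output column.
result = s.str.extract(pat)
print(result)
```

A row shorter than the total width would not match the pattern, and str.extract would then return missing values for that row, so it may be worth checking s.str.len() first.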

Note, however, that all columns in result have object dtype (they hold strings, padding included). Before doing any numeric processing on them, you should probably:

strip spaces from each column (both leading and trailing),
convert some columns to either int or float.
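
These two clean-up steps could look roughly like the sketch below. The small DataFrame is a hypothetical stand-in for result (three of its columns, still padded), and the choice of which columns are numeric is an assumption for illustration.

```python
import pandas as pd

# Hypothetical stand-in for `result`: padded strings, as str.extract returns them.
result = pd.DataFrame({
    "Column2": ["ABCD", "ABCD"],
    "Column4": ["    1", "    2"],
    "Column5": ["   4.011", "   4.088"],
})

# Step 1: strip leading and trailing spaces from every column.
cleaned = result.apply(lambda col: col.str.strip())

# Step 2: convert the columns assumed to be numeric.
# pd.to_numeric infers an integer or floating dtype from the values.
for name in ["Column4", "Column5"]:
    cleaned[name] = pd.to_numeric(cleaned[name])

print(cleaned.dtypes)
```

pd.to_numeric raises on values it cannot parse by default; passing errors="coerce" would turn them into NaN instead, if that suits the data better.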
