Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Turning a pandas DataFrame into an array and evaluating a multiple linear regression model

I am trying to evaluate a multiple linear regression model. My data set (posted as a screenshot in the original question) has 157 rows and 54 columns.

I need to predict the ground_truth value from the article columns. My multiple linear model will use 7 article columns as features, from en_Amantadine through en_Common.

I have this code for multiple linear regression:

from sklearn.linear_model import LinearRegression
X = [[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]]  # need to modify for my problem
y = [[7], [9], [13], [17.5], [18]]  # need to modify
model = LinearRegression()
model.fit(X, y)

My problem is that I cannot extract the data for the X and y variables from my DataFrame. In my code, X and y should look like this:

X = [[4984, 94, 2837, 857, 356, 1678, 29901],
     [4428, 101, 4245, 906, 477, 2313, 34176],
      ....
     ]
y = [[3.135999], [2.53356] ....]

I cannot convert the DataFrame to this kind of structure. How can I do this?

Any help is appreciated.



1 Reply


You can turn the DataFrame into a matrix by calling the as_matrix method directly on the DataFrame object. You may need to select the columns you are interested in first: X = df[['x1','x2','x3']].as_matrix(), where the x's are the column names.
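If the seven article columns sit next to each other in the DataFrame (an assumption, since the column order is not shown in the question), a label slice with .loc can select them without listing every name. The sketch below uses a made-up frame: only en_Amantadine, en_Common and ground_truth are names from the question, while the "other" and "en_mid_*" columns and the pairing of values to columns are placeholders.

```python
import pandas as pd

# Hypothetical frame: "other" and "en_mid_*" are placeholder column names.
cols = ["other", "en_Amantadine", "en_mid_1", "en_mid_2", "en_Common", "ground_truth"]
df = pd.DataFrame(
    [[0, 4984, 94, 2837, 29901, 3.135999],
     [1, 4428, 101, 4245, 34176, 2.53356]],
    columns=cols,
)

# A .loc label slice includes BOTH endpoints, unlike positional slicing.
X = df.loc[:, "en_Amantadine":"en_Common"].values

print(X.shape)  # (2, 4): two rows, four selected columns
print(X)
```

If the columns are not contiguous, passing an explicit list of names, as in the line above, is the safer choice.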

For the y variables you can use y = df['ground_truth'].values to get an array.
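Note that df['ground_truth'].values is one-dimensional, while the question writes y as a list of single-element lists. scikit-learn's LinearRegression accepts either shape for y, but if the column form is wanted, selecting with a list of column names or reshaping gives it. A small sketch, using a two-row illustrative frame built from the values quoted in the question:

```python
import pandas as pd

# Illustrative frame with a single feature column and the target.
df = pd.DataFrame({
    "en_Amantadine": [4984, 4428],
    "ground_truth": [3.135999, 2.53356],
})

y_flat = df["ground_truth"].values    # Series -> 1-D array, shape (2,)
y_col = df[["ground_truth"]].values   # one-column DataFrame -> 2-D array, shape (2, 1)
y_reshaped = y_flat.reshape(-1, 1)    # same 2-D layout as y_col

print(y_flat.shape, y_col.shape, y_reshaped.shape)
```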

Here is an example with some randomly generated data:

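import numpy as np
import pandas as pd
# create a 5x5 DataFrame of random integers in [0, 100]
# (randint's upper bound is exclusive; it replaces the deprecated random_integers)
df = pd.DataFrame(np.random.randint(0, 101, (5, 5)), columns=['X1', 'X2', 'X3', 'X4', 'y'])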

Calling as_matrix() on df returns a numpy.ndarray object:

X = df[['X1','X2','X3','X4']].as_matrix()

Calling values returns a numpy.ndarray from a pandas Series:

y = df['y'].values

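Note: On older pandas versions you may get a warning saying: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead. The method was removed in pandas 1.0, so on current versions the call raises an AttributeError.

To fix it, use values instead of as_matrix, as shown below: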

X = df[['X1','X2','X3','X4']].values
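Putting the pieces together, here is a minimal end-to-end sketch of extracting X and y and fitting the model. It is not the asker's data: the frame is randomly generated, the target is built from a known linear rule (coefficients 2, -1, 0.5 and intercept 3, all chosen for illustration) so the fit can be checked, and to_numpy() is used in place of values, which the pandas documentation recommends on pandas 0.24 and later.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real data: 20 rows, 3 feature columns.
feature_cols = ["X1", "X2", "X3"]
df = pd.DataFrame(rng.integers(0, 101, size=(20, 3)), columns=feature_cols)

# Target generated from a known linear rule (illustrative coefficients).
df["ground_truth"] = 2.0 * df["X1"] - 1.0 * df["X2"] + 0.5 * df["X3"] + 3.0

X = df[feature_cols].to_numpy()    # 2-D array, shape (20, 3)
y = df["ground_truth"].to_numpy()  # 1-D array, shape (20,)

model = LinearRegression()
model.fit(X, y)

print(model.coef_, model.intercept_)  # should be close to the rule above
print(model.score(X, y))              # R^2 on the training data
```

For a real evaluation, the score would normally be computed on held-out rows rather than on the training data used here.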
