I am trying to utilize a CSR matrix as a variable to enhance my model. This matrix is derived from analyzing tf-idf metrics from string values in a pandas dataframe.
The series that the CSR matrix is derived from has 7325 records. After the CSR Matrix is generated it has a shape of (7325, 4927). I am not clear on the matrix format or what that 4927 represents.
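A small sketch may help pin down the second dimension. It assumes the matrix comes from `CountVectorizer` followed by `TfidfTransformer`, as in the code further down, and uses a made-up list of three owner names (`docs` is hypothetical). Under that assumption the column count should equal the size of the fitted vocabulary, which would make 4927 the number of distinct tokens found across the 7325 records.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Toy stand-in for the owner-name column (hypothetical data).
docs = ["smith john", "acme holdings llc", "jones mary llc"]

count_vect = CountVectorizer()
counts = count_vect.fit_transform(docs)             # sparse token counts
matrix = TfidfTransformer().fit_transform(counts)   # sparse tf-idf weights

n_rows, n_cols = matrix.shape
vocab_size = len(count_vect.vocabulary_)            # distinct tokens learned

# One row per record, one column per distinct token.
print(matrix.shape, vocab_size)
```

Each row is one record and each column is the tf-idf weight of one token, so most entries are zero, which is why the result is stored in a sparse (CSR) format.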
Basically, I am trying to use the matrix as one variable in a multivariate random forest classification model. I have tried converting the matrix to a dataframe and then combining the matrix dataframe with 2 other series to create a new dataframe representing all my variables to plug into the model:
pd.DataFrame(pd.DataFrame(matrix), df['var1'], df['var2'])
but the resulting dataframe is not what I expect. The matrix data isn't in the table. Furthermore, var2 becomes the x-axis (the columns) and var1 becomes the y-axis (the index). This does not happen if I just join the var1 and var2 series in a separate dataframe.
[![enter image description here][1]][1]
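A possible explanation, sketched on toy data: the second and third positional arguments of `pd.DataFrame` are `index` and `columns`, not extra data. When the first argument is itself a dataframe, pandas reindexes it against those labels, and labels that do not exist in the inner frame come back as NaN. The names `inner`, `var1`, and `var2` below are illustrative stand-ins, not the real data.

```python
import pandas as pd

# Toy stand-ins for the matrix dataframe and the two series (hypothetical).
inner = pd.DataFrame({"a": [1.0, 2.0]})
var1 = pd.Series([10, 20])
var2 = pd.Series(["x", "y"])

# Same call shape as above: the arguments are (data, index, columns).
out = pd.DataFrame(inner, var1, var2)

# var1 ends up as the row labels, var2 as the column labels,
# and none of the original values survive the reindexing.
print(out)
```

If that is what happens here, the two series are never treated as data at all, which would account for both the missing matrix values and the swapped axes.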
I can convert the matrix to a dataframe with a shape of (7325, 1) just fine with:
pd.DataFrame(matrix)
The shape of each of the other series is (7325,). I don't know if this has something to do with it.
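For comparison, here is a hedged sketch of one way to get a dataframe with one column per matrix column instead of a single column: `pd.DataFrame.sparse.from_spmatrix`. The toy matrix, the `tfidf_` column prefix, and the `var1`/`var2` values are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy 3 x 4 sparse matrix standing in for the tf-idf output (hypothetical).
matrix = csr_matrix(np.array([
    [0.0, 0.7, 0.0, 0.7],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]))
df = pd.DataFrame({"var1": [1.0, 2.0, 3.0], "var2": [0, 1, 0]})

# Expand the sparse matrix into sparse-backed columns, one per term.
# String column names avoid mixing int and str labels later on.
tfidf_df = pd.DataFrame.sparse.from_spmatrix(
    matrix, columns=[f"tfidf_{i}" for i in range(matrix.shape[1])]
)

# Column-wise concatenation; both frames share the default 0..n-1 index.
combined = pd.concat([tfidf_df, df[["var1", "var2"]]], axis=1)
print(combined.shape)
```

The concatenation only lines up row for row if both frames carry the same index, so a filtered or re-sorted `df` would need `reset_index(drop=True)` first.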
I generate the matrix via a tf-idf analysis of a string variable of parcel owner names. It involves tokenizing the string variable and assigning a value to every element in the string. I am able to pass the CSR matrix directly to sklearn's RandomForestClassifier model and it works fine. I am now trying to add variables to the model:
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
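stemmer = PorterStemmer()
# Stem each element of every entry in the string column
df['String_variable'] = df['String_variable'].apply(lambda x: [stemmer.stem(y) for y in x])
# Build the token count matrix
count_vect = CountVectorizer()
counts = count_vect.fit_transform(df['String_variable'])
# Convert the counts to tf-idf weights (result is a sparse CSR matrix)
transformer = TfidfTransformer().fit(counts)
matrix = transformer.transform(counts)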
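One commonly used way to add variables without going through a dataframe is to keep everything sparse and append the extra columns with `scipy.sparse.hstack`. The sketch below is a minimal illustration under assumptions: the data is made up, `var1` and `var2` are taken to be numeric, and `label` is a hypothetical target column.

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Toy data; column names other than String_variable/var1/var2 are hypothetical.
df = pd.DataFrame({
    "String_variable": ["smith john", "acme holdings llc",
                        "jones mary", "acme properties llc"],
    "var1": [1.0, 250.0, 2.0, 300.0],
    "var2": [0, 1, 0, 1],
    "label": [0, 1, 0, 1],
})

counts = CountVectorizer().fit_transform(df["String_variable"])
matrix = TfidfTransformer().fit_transform(counts)

# Turn the extra numeric variables into a sparse block with the same row count.
extra = csr_matrix(df[["var1", "var2"]].to_numpy(dtype=float))

# Append the extra columns to the right of the tf-idf columns.
X = hstack([matrix, extra], format="csr")

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, df["label"])
print(X.shape)
```

The rows of `matrix` and `extra` must be in the same order, since `hstack` matches them by position rather than by index. Non-numeric variables would need encoding first (for example one-hot) before they can be stacked this way.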
[1]: https://i.stack.imgur.com/C5eDS.png
question from:
https://stackoverflow.com/questions/65942508/using-a-csr-matrix-in-a-multivariate-random-forest-classification-model