Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How can I concatenate the TF sentence representation with word-based lexicon features and use the result as input to different algorithms?

I have a dataset of Arabic tweets and an emotion lexicon, and I want to detect emotions with machine learning algorithms. I am a beginner in Python, so I am not sure how to do this part.

I have already written the preprocessing step and the other functions shown in the code below. Now I want to apply these steps:

  1. Compute the TF scheme to obtain how frequently each expression (term, word) occurs in a document.

  2. To incorporate the affective lexical features, check for the presence of lexicon terms in each sentence and obtain a vector that represents each emotional category (anger, fear, sadness, and joy).

  3. Finally, to carry out the classification, use the concatenation of the TF sentence representation and the word-based features as input to the different algorithms (SVM, LR, MLP, MultinomialNB).
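The three steps can be sketched as follows. This is a minimal illustration, not the poster's actual data: the lexicon format (a dict mapping each emotion to a set of words) and the names `emotion_lexicon` and `lexicon_features` are assumptions, and a toy English corpus stands in for the Arabic tweets.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer

# toy stand-in corpus (the real input would be the preprocessed tweets)
corpus = ["i feel happy joy", "so much fear and anger today"]

# Step 1: TF representation of each sentence
vectorizer = CountVectorizer()
X_tf = vectorizer.fit_transform(corpus)          # sparse (n_sentences, vocab_size)

# Step 2: one count feature per emotional category
# (hypothetical lexicon format: {category: set of lexicon terms})
emotion_lexicon = {
    "anger":   {"anger", "rage"},
    "fear":    {"fear", "panic"},
    "sadness": {"sad", "grief"},
    "joy":     {"joy", "happy"},
}
categories = sorted(emotion_lexicon)

def lexicon_features(sentence):
    # count how many lexicon terms of each category appear in the sentence
    words = sentence.split()
    return [sum(w in emotion_lexicon[c] for w in words) for c in categories]

X_lex = csr_matrix(np.array([lexicon_features(s) for s in corpus]))

# Step 3: concatenate TF and lexicon features column-wise (sparse-safe),
# then feed X to SVM / LR / MLP / MultinomialNB
X = hstack([X_tf, X_lex]).tocsr()
print(X.shape)   # (2, 13): 9 TF columns + 4 lexicon columns
```

`scipy.sparse.hstack` is used instead of `numpy.hstack` so the TF matrix stays sparse; densifying a real vocabulary-sized matrix would waste memory.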

    import re
    import pandas as pd
    from pandas import DataFrame
    from nltk.tokenize import word_tokenize
    from nltk.stem.isri import ISRIStemmer
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import train_test_split, cross_val_predict, cross_val_score
    from sklearn.metrics import classification_report
    from sklearn.svm import SVC
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.neural_network import MLPClassifier

    df = pd.read_csv("C:/Users/User/Desktop/Dataset with stopword.csv")
    df.shape

    def noramlize(Tweet):
        Tweet = re.sub(r"[إأٱآا]", "ا", Tweet)   # unify alef variants
        Tweet = re.sub(r"ى", "ي", Tweet)         # alef maqsura -> ya
        Tweet = re.sub(r"ؤ", "ء", Tweet)         # waw-hamza -> hamza
        Tweet = re.sub(r"ئ", "ء", Tweet)         # ya-hamza -> hamza
        Tweet = re.sub(r"[^ا-ي ]", "", Tweet)    # keep only Arabic letters and spaces

        noise = re.compile(""" ّ    | # Tashdid
                               َ    | # Fatha
                               ً    | # Tanwin Fath
                               ُ    | # Damma
                               ٌ    | # Tanwin Damm
                               ِ    | # Kasra
                               ٍ    | # Tanwin Kasr
                               ْ    | # Sukun
                               ـ     # Tatwil/Kashida
                           """, re.VERBOSE)
        Tweet = re.sub(noise, "", Tweet)
        return Tweet
    
    def stopWordRmove(Tweet):
        # with-statement closes the file; the original left the handle open
        with open("ar_stop_word_list.txt", "r", encoding="utf8") as ar_stop_list:
            stop_words = ar_stop_list.read().split('\n')
        needed_words = []
        words = word_tokenize(Tweet)
        for w in words:
            if w not in stop_words:
                needed_words.append(w)
        filtered_sentence = " ".join(needed_words)
        return filtered_sentence
    
    def stemming(Tweet):
        st = ISRIStemmer()
        stemmed_words = []
        words = word_tokenize(Tweet)
        for w in words:
            stemmed_words.append(st.stem(w))
        stemmed_sentence = " ".join(stemmed_words)
        return stemmed_sentence
    
    
    def prepareDataSets(df):
        sentences = []
        for index, r in df.iterrows():
            # each step must take the previous step's output; the original
            # passed the raw r['Tweet'] to every step, discarding the
            # normalization and stop-word removal
            Tweet = noramlize(r['Tweet'])
            Tweet = stopWordRmove(Tweet)
            Tweet = stemming(Tweet)

            if r['Affect Dimension'] in ('fear', 'anger', 'joy', 'sadness'):
                sentences.append([Tweet, r['Affect Dimension']])

        df_sentences = DataFrame(sentences, columns=['Tweet', 'Affect Dimension'])
        return df_sentences

    preprocessed_df = prepareDataSets(df)
    preprocessed_df
    
    def featureExtraction(data):
        vectorizer = CountVectorizer()               # term-frequency (TF) counts
        Count_data = vectorizer.fit_transform(data)
        return Count_data
    
    def learning(clf, X, Y):
        X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.33, random_state=0)
        classifier = clf()
        classifier.fit(X_train, Y_train)

        predict = cross_val_predict(classifier, X_test, Y_test, cv=10)
        scores = cross_val_score(classifier, X_test, Y_test, cv=10)

        print(scores)
        print("Accuracy of %s: %0.2f (+/- %0.2f)" % (classifier, scores.mean(), scores.std() * 2))
        print(classification_report(Y_test, predict))

    # build the TF features and labels, then run every classifier;
    # the original called an undefined main() with no features
    X = featureExtraction(preprocessed_df['Tweet'])
    Y = preprocessed_df['Affect Dimension']

    clfs = [SVC, LogisticRegression, MultinomialNB, MLPClassifier]
    for clf in clfs:
        learning(clf, X, Y)
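One caveat about the evaluation in `learning`: it fits on the train split but then cross-validates only on the held-out test split, so most of the data is never used for scoring. A common alternative (a sketch on toy data, not the poster's code) is to cross-validate over the full feature matrix:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

# toy two-class corpus standing in for the labelled tweets
docs = ["good happy joy", "bad sad fear", "joy joy good",
        "fear bad anger", "happy good day", "sad dark fear"]
labels = ["joy", "fear", "joy", "fear", "joy", "fear"]

X = CountVectorizer().fit_transform(docs)

# 3-fold cross-validation over all the data; cross_val_score refits
# a fresh clone of the estimator on each fold
scores = cross_val_score(MultinomialNB(), X, labels, cv=3)
print(scores.mean())
```

For a fully leakage-free setup, the vectorizer itself can be wrapped with the classifier in a `sklearn.pipeline.Pipeline`, so each fold fits its own vocabulary.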
    
question from:https://stackoverflow.com/questions/65922093/how-can-i-concatenation-of-the-tf-sentence-representation-and-the-word-based-fea


1 Reply

Waiting for answers
