I am trying to use GPT-2 for an Arabic text classification task, as follows:
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification
tokenizer = GPT2Tokenizer.from_pretrained(model_path)
model = GPT2ForSequenceClassification.from_pretrained(model_path, num_labels=len(lab2ind))
However, when I use the tokenizer, it converts the Arabic characters to symbols like this: '?ù??aù??±'
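One possible explanation, assuming the tokenizer at model_path is a standard byte-level BPE tokenizer like the original GPT-2 one: the text is first encoded to UTF-8 bytes, and each byte is then displayed as a printable stand-in character. Arabic letters take two bytes each, so the tokens look like unrelated symbols even though nothing is lost. The sketch below reimplements that byte-to-character table in plain Python to show the effect and that it is reversible. The helper names to_byte_level and from_byte_level are made up for illustration and are not part of the transformers API.

```python
def bytes_to_unicode():
    """Build the byte -> printable character table used by GPT-2 style byte-level BPE."""
    # Bytes that are already printable keep their own character.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte gets a stand-in code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, [chr(c) for c in cs]))


byte_encoder = bytes_to_unicode()
byte_decoder = {char: byte for byte, char in byte_encoder.items()}


def to_byte_level(text):
    """Hypothetical helper: show text the way byte-level tokens display it."""
    return "".join(byte_encoder[b] for b in text.encode("utf-8"))


def from_byte_level(symbols):
    """Hypothetical helper: map the displayed symbols back to the original text."""
    return bytearray(byte_decoder[c] for c in symbols).decode("utf-8")


text = "مرحبا"
shown = to_byte_level(text)
restored = from_byte_level(shown)

print(shown)     # looks like unrelated Latin symbols, two per Arabic letter
print(restored)  # the original Arabic text
```

If this is what is happening, the symbols are only the display form of the tokens. Decoding the token ids with tokenizer.decode(...), or joining the tokens with tokenizer.convert_tokens_to_string(...), should return the original Arabic text. If the characters are still wrong after decoding, the cause is more likely the encoding used when reading the input file or printing to the console.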