I'm a beginner in this field and am stuck. I am following this tutorial (https://towardsdatascience.com/multi-label-multi-class-text-classification-with-bert-transformer-and-keras-c6355eccb63a) to build a multi-label text classifier using Hugging Face transformers.
Following is the code I'm using to train my model.
# Imports assumed by the code below
from transformers import BertConfig, BertTokenizerFast, TFBertModel
from tensorflow.keras.layers import Input, Dropout, Dense
from tensorflow.keras.initializers import TruncatedNormal
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import CategoricalAccuracy
from tensorflow.keras.utils import to_categorical
# Name of the BERT model to use (not referenced below; the local PATH checkpoint is loaded instead)
model_name = 'bert-base-uncased'
# Max length of tokens
max_length = 100
PATH = 'uncased_L-12_H-768_A-12/'
# Load transformers config and set output_hidden_states to False
config = BertConfig.from_pretrained(PATH)
config.output_hidden_states = False
# Load BERT tokenizer
tokenizer = BertTokenizerFast.from_pretrained(PATH, local_files_only=True, config = config)
# tokenizer = BertTokenizer.from_pretrained(PATH, local_files_only=True, config = config)
# Load the Transformers BERT model
transformer_model = TFBertModel.from_pretrained(PATH, config = config,from_pt=True)
#######################################
### ------- Build the model ------- ###
# Load the MainLayer
bert = transformer_model.layers[0]
# Build your model input
input_ids = Input(shape=(None,), name='input_ids', dtype='int32')
# attention_mask = Input(shape=(max_length,), name='attention_mask', dtype='int32')
# inputs = {'input_ids': input_ids, 'attention_mask': attention_mask}
inputs = {'input_ids': input_ids}
# Load the Transformers BERT model as a layer in a Keras model
bert_model = bert(inputs)[1]
dropout = Dropout(config.hidden_dropout_prob, name='pooled_output')
pooled_output = dropout(bert_model, training=False)
# Then build your model output
issue = Dense(
    units=len(data.U_label.value_counts()),
    kernel_initializer=TruncatedNormal(stddev=config.initializer_range),
    name='issue')(pooled_output)
outputs = {'issue': issue}
# And combine it all in a model object
model = Model(inputs=inputs, outputs=outputs, name='BERT_MultiLabel_MultiClass')
# Take a look at the model
model.summary()
#######################################
### ------- Train the model ------- ###
# Set an optimizer
optimizer = Adam(
    learning_rate=5e-05,
    epsilon=1e-08,
    decay=0.01,
    clipnorm=1.0)
# Set loss and metrics
loss = {'issue': CategoricalCrossentropy(from_logits = True)}
# loss = {'issue': CategoricalCrossentropy()}
metric = {'issue': CategoricalAccuracy('accuracy')}
# Compile the model
model.compile(
    optimizer=optimizer,
    loss=loss,
    metrics=metric)
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(data['U_H_Label'])
# Ready output data for the model
y_issue = to_categorical(le.transform(data['U_H_Label']))
# Tokenize the input (takes some time)
x = tokenizer(
    text=data['Input_Data'].to_list(),
    add_special_tokens=True,
    max_length=max_length,
    truncation=True,
    padding=True,
    return_tensors='tf',
    return_token_type_ids=False,
    return_attention_mask=True,
    verbose=True)
# Fit the model
history = model.fit(
    # x={'input_ids': x['input_ids'], 'attention_mask': x['attention_mask']},
    x={'input_ids': x['input_ids']},
    y={'issue': y_issue},
    validation_split=0.2,
    batch_size=64,
    epochs=10)
When I use the model.predict() function, I get what I believe are logit scores for each class (the output Dense layer has no activation and the loss is set up with from_logits=True), and I would like to convert them to probability scores ranging from 0 to 1.
I have read in several blog posts that a softmax function is what I need to use, but I can't figure out where and how to apply it. If anyone could tell me what line of code is required, I'd be grateful!
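For reference, this is roughly the kind of post-processing I have in mind. It is only a minimal sketch: it assumes the 'issue' output name from the model above, a hypothetical list of new strings called new_texts, that model.predict() returns a dict keyed by the output name, and that tf.nn.softmax is the right conversion step, which is exactly what I'm unsure about.

import tensorflow as tf

# Tokenize the new texts the same way as the training data (new_texts is a hypothetical list of strings)
test_x = tokenizer(
    text=new_texts,
    add_special_tokens=True,
    max_length=max_length,
    truncation=True,
    padding=True,
    return_tensors='tf',
    return_token_type_ids=False,
    return_attention_mask=True)

# predict() should return a dict keyed by the output name ('issue' here); its values are the raw logits,
# one row per example and one column per class
logits = model.predict(x={'input_ids': test_x['input_ids']})['issue']

# Softmax over the class axis turns each row of logits into probabilities that sum to 1
probs = tf.nn.softmax(logits, axis=-1).numpy()

# Map the most probable class index back to its original label via the fitted LabelEncoder
predicted_labels = le.inverse_transform(probs.argmax(axis=-1))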