It sounds like trg_mask and np_mask are tensors stored on two different devices (cpu and cuda:0). To perform an operation on them, both tensors must be on the same device: either both on cpu or both on cuda:0.
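Based on the information given I'm not sure which variable is on which device, but you can check with var.device. To move a tensor from cuda:0 to cpu, use var = var.cpu(); to go the other way, use var = var.to('cuda:0'). If you also need a NumPy array rather than a tensor, you can do this. Note that the result is no longer a tensor, so it cannot be used directly in tensor operations.
var = var.detach().cpu().numpy()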
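As a rough sketch of the fix, the snippet below moves both masks onto one device before combining them. The tensor shapes, the contents of the masks, and the `&` operation are assumptions made for illustration, since the original code is not shown; only the names trg_mask and np_mask come from the question.

```python
import torch

# Hypothetical stand-ins for the masks in the question; the real shapes are unknown.
trg_mask = torch.ones(1, 4, 4, dtype=torch.bool)
np_mask = torch.tril(torch.ones(1, 4, 4)).bool()

# Use the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Inspect where each tensor currently lives.
print(trg_mask.device, np_mask.device)

# Move both tensors to the same device; .to() returns a new tensor,
# so the result has to be assigned back.
trg_mask = trg_mask.to(device)
np_mask = np_mask.to(device)

# With both masks on one device, the operation no longer raises a device mismatch error.
combined = trg_mask & np_mask
```

Picking the device once and calling .to(device) on every tensor keeps the code working on machines with and without a GPU.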