Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
689 views
in Technique[技术] by (71.8m points)

python - PyTorch / Gensim - How to load pre-trained word embeddings

I want to load a pre-trained word2vec embedding with gensim into a PyTorch embedding layer.

So my question is, how do I get the embedding weights loaded by gensim into the PyTorch embedding layer.

Thanks in Advance!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I just wanted to report my findings about loading a gensim embedding with PyTorch.


  • Solution for PyTorch 0.4.0 and newer:

From v0.4.0 there is a new function from_pretrained() which makes loading an embedding very comfortable. Here is an example from the documentation.

import torch
import torch.nn as nn

# FloatTensor containing pretrained weights
weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]])
embedding = nn.Embedding.from_pretrained(weight)
# Get embeddings for index 1
input = torch.LongTensor([1])
embedding(input)

The weights from gensim can easily be obtained by:

import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('path/to/file')
weights = torch.FloatTensor(model.vectors) # formerly syn0, which is soon deprecated

As noted by @Guglie: in newer gensim versions the weights can be obtained by model.wv:

weights = model.wv

  • Solution for PyTorch version 0.3.1 and older:

I'm using version 0.3.1 and from_pretrained() isn't available in this version.

Therefore I created my own from_pretrained so I can also use it with 0.3.1.

Code for from_pretrained for PyTorch versions 0.3.1 or lower:

def from_pretrained(embeddings, freeze=True):
    assert embeddings.dim() == 2, 
         'Embeddings parameter is expected to be 2-dimensional'
    rows, cols = embeddings.shape
    embedding = torch.nn.Embedding(num_embeddings=rows, embedding_dim=cols)
    embedding.weight = torch.nn.Parameter(embeddings)
    embedding.weight.requires_grad = not freeze
    return embedding

The embedding can be loaded then just like this:

embedding = from_pretrained(weights)

I hope this is helpful for someone.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...