I'm not an expert in distributed systems or CUDA, but there is one really interesting feature that PyTorch supports: nn.DataParallel and nn.DistributedDataParallel. How are they actually implemented? How do they handle a shared embedding and synchronize data across devices?
Here is a basic example of DataParallel.
import numpy as np
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(1000, 10)
        self.rnn = nn.Linear(10, 10)

    def forward(self, x):
        x = self.embedding(x)
        x = self.rnn(x)
        return x

# Wrap the model so the input batch is scattered across all visible GPUs
# and the outputs are gathered back onto the default device.
model = nn.DataParallel(Model()).cuda()
out = model(torch.from_numpy(np.array([1, 2, 3, 4, 5, 6], dtype=np.int64)).cuda()).cpu()
PyTorch can split the input, send the chunks to multiple GPUs, and merge the results back. How does it manage the embeddings and the synchronization for a parallel model or a distributed model?
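From what I can piece together from the Multi-GPU examples tutorial, DataParallel's forward pass is roughly equivalent to chaining the primitives in torch.nn.parallel. This is just a sketch of my current understanding, not the actual source:

import torch.nn as nn

def data_parallel(module, input, device_ids, output_device=None):
    if output_device is None:
        output_device = device_ids[0]
    # Broadcast the module's parameters and buffers onto every device.
    replicas = nn.parallel.replicate(module, device_ids)
    # Split the input along dim 0, one chunk per device.
    inputs = nn.parallel.scatter(input, device_ids)
    replicas = replicas[:len(inputs)]
    # Run each replica on its own chunk (one thread per device).
    outputs = nn.parallel.parallel_apply(replicas, inputs)
    # Concatenate the per-device outputs back on a single device.
    return nn.parallel.gather(outputs, output_device)

If I understand correctly, these primitives are differentiable, so during backward each replica's gradients are accumulated back into the original module's parameters, meaning there is only ever one "real" copy of the embedding. I'd like to confirm that against the actual implementation.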
I have wandered around PyTorch's source code, but it's very hard to see how the fundamentals work.
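For nn.DistributedDataParallel, the pattern I'm asking about looks roughly like this. It's a minimal sketch reusing the Model class above; the one-process-per-GPU layout, the run function, and the env:// rendezvous are my assumptions, not the only way to set it up:

import torch
import torch.distributed as dist
import torch.nn as nn

def run(rank, world_size):
    # One process per GPU; with no init_method given, init_process_group
    # reads MASTER_ADDR / MASTER_PORT from the environment for rendezvous.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    model = Model().cuda(rank)  # each process holds a full copy of the model
    ddp_model = nn.parallel.DistributedDataParallel(model, device_ids=[rank])
    out = ddp_model(torch.tensor([1, 2, 3], device=f"cuda:{rank}"))
    # backward() is where the gradient synchronization (an all-reduce
    # across processes) is triggered.
    out.sum().backward()
    dist.destroy_process_group()

Here there is no scatter/gather of the input at all: each process sees its own data and only gradients are synchronized. What I can't easily tell from the source is how exactly that all-reduce is scheduled during backward.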
question from: https://stackoverflow.com/questions/53375422/how-does-pytorchs-parallel-method-and-distributed-method-work