Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
171 views
in Technique[技术] by (71.8m points)

How can I generate samples from a non-normal multivariable distribution in Python?

I have an input dataframe df_input with 10 variables and 100 rows. This data are not normal distributed. I would like to generate an output dataframe with 10 variables and 10,000 rows, such that the covariance matrix and mean of the new dataframe are the same as those of the original one. The output variables should not be normal distributed, but rather have a distribution similar to the input variables. That is: Cov(df_output) = Cov(df_input) and mean(df_ouput) = mean(df_input) Is there a Python function that does it?

Note: np.random.multivariate_normal(mean_input,Cov_input,10000) does almost this, but the output variables are normal distributed, whereas I need them to have the same (or similar) distribution as the input.

question from:https://stackoverflow.com/questions/65877211/how-can-i-generate-samples-from-a-non-normal-multivariable-distribution-in-pytho

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Have you considered using a GAN (generative adversarial network)? Takes a bit more effort than just using a predefined function, but essentially it does exactly what you are hoping to do. Here's the original paper: https://arxiv.org/abs/1406.2661

There are many PyTorch/Tensorflow codes that you can download and fit to your purposes, for example this one: https://github.com/eriklindernoren/PyTorch-GAN

Here is also a blog post I found quite helpful with an introduction to GANs. https://medium.com/ai-society/gans-from-scratch-1-a-deep-introduction-with-code-in-pytorch-and-tensorflow-cb03cdcdba0f

Maybe GAN is overkill for this problem and there are simpler methods for upscaling the sample size, in which case I'd be interested to learn about them.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...