What is going on when I use something like
KernelDensity(kernel='gaussian', bandwidth=1.0).fit(X)
(cf. https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KernelDensity.html) with X an n-by-d (instances by features) 2D-array?
Is this really a multivariate Gaussian, i.e. is a sample covariance matrix being calculated from X (and scaled by the bandwidth), as described in something like the scipy implementation https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html? If so, how do I retrieve that information? I assume it isn't (otherwise the docs would say so), but it's not clear what the above actually produces.
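One way to probe this empirically is to compare `score_samples` against a hand-written isotropic Gaussian KDE, i.e. one whose kernel covariance is `bandwidth**2 * I` with no reference to the sample covariance. The sketch below is only a check under stated assumptions: the toy data, the query points `Q`, and the variable names are made up for illustration. It also inspects the `covariance` and `factor` attributes of scipy's `gaussian_kde` for contrast.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import gaussian_kde
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
# Toy data (an assumption for illustration): strongly correlated 2-D points
X = rng.multivariate_normal([0.0, 0.0], [[2.0, 1.5], [1.5, 2.0]], size=200)
Q = rng.normal(size=(5, 2))  # query points, also made up
h = 1.0                      # bandwidth

# Log-density as reported by scikit-learn
skl_log_density = KernelDensity(kernel="gaussian", bandwidth=h).fit(X).score_samples(Q)

# Hand-rolled isotropic Gaussian KDE: every kernel has covariance h**2 * I
n, d = X.shape
sq_dist = ((Q[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
manual_log_density = (
    logsumexp(-sq_dist / (2 * h**2), axis=1)
    - np.log(n) - d * np.log(h) - 0.5 * d * np.log(2 * np.pi)
)
matches_isotropic = np.allclose(skl_log_density, manual_log_density)

# scipy, by contrast, exposes a full kernel covariance built from the data
sp = gaussian_kde(X.T)  # scipy expects shape (d, n)
scipy_cov_matches = np.allclose(sp.covariance, sp.factor**2 * np.cov(X.T))

print("sklearn matches isotropic kernel:", matches_isotropic)
print("scipy covariance == factor**2 * sample covariance:", scipy_cov_matches)
```

If the first comparison holds, the bandwidth acts as a single standard deviation shared by all features, so features on very different scales would presumably need standardizing before fitting.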
FWIW, I'm trying to build a probabilistic classifier (Bayesian with kernel density estimation) and thought I'd try to use what's out there instead of starting from scratch (since I know nothing about stats, computers, or machine learning). My starting point is more-or-less https://jakevdp.github.io/PythonDataScienceHandbook/05.13-kernel-density-estimation.html#Example:-Not-So-Naive-Bayes.
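For concreteness, the kind of classifier meant here can be sketched as one `KernelDensity` per class combined with class priors through Bayes' rule. This is a minimal sketch in the spirit of the linked example, not a copy of it: the helper names `fit_kde_classifier` and `predict_proba`, the toy blobs, and the fixed bandwidth are all assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def fit_kde_classifier(X, y, bandwidth=1.0):
    """Fit one Gaussian KDE per class and record the log class priors."""
    classes = np.unique(y)
    models = [
        KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(X[y == c])
        for c in classes
    ]
    log_priors = np.array([np.log(np.mean(y == c)) for c in classes])
    return classes, models, log_priors

def predict_proba(X, models, log_priors):
    """Posterior P(class | x), proportional to p(x | class) * P(class)."""
    log_joint = np.column_stack([m.score_samples(X) for m in models]) + log_priors
    log_joint -= log_joint.max(axis=1, keepdims=True)  # stabilise the exponentials
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

# Toy usage: two well-separated blobs (made-up data)
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)

classes, models, log_priors = fit_kde_classifier(X_train, y_train, bandwidth=1.0)
proba = predict_proba(np.array([[-2.0, -2.0], [2.0, 2.0]]), models, log_priors)
labels = classes[proba.argmax(axis=1)]
```

In practice the bandwidth would be chosen by cross-validation rather than fixed at 1.0.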