mlpack is an intuitive, fast, and flexible C++ machine learning library with
bindings to other languages. It is meant to be a machine learning analog to
LAPACK, and aims to implement a wide array of machine learning methods and
functions as a "swiss army knife" for machine learning researchers. In addition
to its powerful C++ interface, mlpack also provides command-line programs,
Python bindings, Julia bindings, Go bindings and R bindings.
mlpack uses an open governance model and is fiscally
sponsored by NumFOCUS. Consider making a
tax-deductible donation to help the
project pay for developer time, professional services, travel, workshops, and a
variety of other needs.
The mlpack website can be found at https://www.mlpack.org and it contains
numerous tutorials and extensive documentation. This README serves as a guide
for what mlpack is, how to install it, how to run it, and where to find more
documentation. Consult the website for further information.
If you use mlpack in your research or software, please cite mlpack using the
citation below (given in BibTeX format):
@article{mlpack2018,
    title     = {mlpack 3: a fast, flexible machine learning library},
    author    = {Curtin, Ryan R. and Edel, Marcus and Lozhnikov, Mikhail and
                 Mentekidis, Yannis and Ghaisas, Sumedh and Zhang,
                 Shangtong},
    journal   = {Journal of Open Source Software},
    volume    = {3},
    issue     = {26},
    pages     = {726},
    year      = {2018},
    doi       = {10.21105/joss.00726},
    url       = {https://doi.org/10.21105/joss.00726}
}
Citations are beneficial for the growth and improvement of mlpack.
All of mlpack's dependencies should be available in your distribution's package
manager. If not, you will have to compile each of them by hand. See the
documentation for each of those packages for more information.
If you would like to use or build the mlpack Python bindings, make sure that the
following Python packages are installed:
setuptools
cython >= 0.24
numpy
pandas >= 0.15.0
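The Python prerequisites listed above can be checked before configuring the build. The following is only a minimal sketch, not part of mlpack: the helper names (parse_version, is_at_least, check_requirements) are hypothetical, the package names are assumed to match the names the packages were installed under, and the version parsing is deliberately simple (it keeps only the leading integers of each component).

```python
from importlib import metadata

# Minimum versions copied from the list above; None means "any version".
REQUIREMENTS = {
    "setuptools": None,
    "cython": (0, 24),
    "numpy": None,
    "pandas": (0, 15, 0),
}


def parse_version(text):
    """Turn a version string such as '3.0.0a1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(found, minimum):
    """Compare two version tuples after padding the shorter one with zeros."""
    width = max(len(found), len(minimum))
    padded_found = found + (0,) * (width - len(found))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded_found >= padded_minimum


def check_requirements(requirements=REQUIREMENTS):
    """Map each package name to 'ok', 'missing', or 'too old (found X)'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum is not None and not is_at_least(parse_version(found), minimum):
            report[name] = f"too old (found {found})"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```

Any package reported as missing or too old should be installed or upgraded before enabling the Python bindings.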
If you would like to build the Julia bindings, make sure that Julia >= 1.3.0 is
installed.
If you would like to build the Go bindings, make sure that Go >= 1.11.0 is
installed with this package:
Gonum
If you would like to build the R bindings, make sure that R >= 4.0 is
installed along with the R packages that the bindings require.
If the STB library headers are available, image loading support will be
compiled.
If you are compiling Armadillo by hand, ensure that LAPACK and BLAS are enabled.
4. Building mlpack from source
This document discusses how to build mlpack from source. These build directions
will work for any Linux-like shell environment (for example Ubuntu, macOS,
FreeBSD, etc.). However, mlpack is in the repositories of many Linux
distributions, so it may be easier to use the package manager for your system.
For example, on Ubuntu, you can install the mlpack library and command-line
executables (e.g. mlpack_pca, mlpack_kmeans, etc.) with the following command:
$ sudo apt-get install libmlpack-dev mlpack-bin
On Fedora or Red Hat (EPEL):
$ sudo dnf install mlpack-devel mlpack-bin
Note: Older Ubuntu versions may not have the most recent version of mlpack
available. For instance, at the time of this writing, Ubuntu 16.04 only has
mlpack 2.0.1 available. Options include upgrading your Ubuntu version, finding
a PPA or other non-official source, or installing with a manual build.
Note: If you are using RHEL7/CentOS 7, gcc 4.8 is too old to compile mlpack.
One option is to use devtoolset-8, which provides a newer compiler.
mlpack uses CMake as a build system and allows several flexible build
configuration options. You can consult any of the CMake tutorials for
further documentation, but this tutorial should be enough to get mlpack built
and installed.
First, unpack the mlpack source and change into the unpacked directory. Here we
use mlpack-x.y.z where x.y.z is the version.
$ tar -xzf mlpack-x.y.z.tar.gz
$ cd mlpack-x.y.z
Then, make a build directory. The directory can have any name, but 'build' is
sufficient.
$ mkdir build
$ cd build
The next step is to run CMake to configure the project. Running CMake is
equivalent to running ./configure with autotools. If you run CMake with no
options, it will configure the project to build with no debugging symbols and
no profiling information:
$ cmake ../
Options can be specified to compile with debugging information and profiling information:
$ cmake -D DEBUG=ON -D PROFILE=ON ../
Options are specified with the -D flag. The allowed options include:
DEBUG=(ON/OFF): compile with debugging symbols
PROFILE=(ON/OFF): compile with profiling symbols
ARMA_EXTRA_DEBUG=(ON/OFF): compile with extra Armadillo debugging symbols
ARMADILLO_INCLUDE_DIR=(/path/to/armadillo/include/): path to Armadillo headers
ARMADILLO_LIBRARY=(/path/to/armadillo/libarmadillo.so): Armadillo library
BUILD_CLI_EXECUTABLES=(ON/OFF): whether or not to build command-line programs
BUILD_PYTHON_BINDINGS=(ON/OFF): whether or not to build Python bindings
PYTHON_EXECUTABLE=(/path/to/python_version): Path to specific Python executable
PYTHON_INSTALL_PREFIX=(/path/to/python/): Path to root of Python installation
BUILD_JULIA_BINDINGS=(ON/OFF): whether or not to build Julia bindings
JULIA_EXECUTABLE=(/path/to/julia): Path to specific Julia executable
BUILD_GO_BINDINGS=(ON/OFF): whether or not to build Go bindings
GO_EXECUTABLE=(/path/to/go): Path to specific Go executable
BUILD_GO_SHLIB=(ON/OFF): whether or not to build shared libraries required by Go bindings
BUILD_R_BINDINGS=(ON/OFF): whether or not to build R bindings
R_EXECUTABLE=(/path/to/R): Path to specific R executable
BUILD_TESTS=(ON/OFF): whether or not to build tests
BUILD_SHARED_LIBS=(ON/OFF): compile shared libraries and executables as
opposed to static libraries
DISABLE_DOWNLOADS=(ON/OFF): whether to disable all downloads during build
ENSMALLEN_INCLUDE_DIR=(/path/to/ensmallen/include): path to include directory
for ensmallen
STB_IMAGE_INCLUDE_DIR=(/path/to/stb/include): path to include directory for
STB image library
USE_OPENMP=(ON/OFF): whether or not to use OpenMP if available
BUILD_DOCS=(ON/OFF): build Doxygen documentation, if Doxygen is available
(default ON)
For example, to build the mlpack library and CLI bindings statically, use the
following command:
$ cmake -D BUILD_SHARED_LIBS=OFF ../
Other tools can also be used to configure CMake, but those are not documented
here. See the build guide for more details, including a full list of options
and their default values.
By default, command-line programs will be built, and if the Python dependencies
(Cython, setuptools, numpy, pandas) are available, then Python bindings will
also be built. OpenMP will be used for parallelization when possible by
default.
Once CMake is configured, building the library is as simple as typing 'make'.
This will build all library components and bindings.
$ make
If you do not want to build everything in the library, individual components
of the build can be specified:
$ make mlpack_pca mlpack_knn mlpack_kfn
If you want to build the tests, just make the mlpack_test target, and use
ctest to run the tests:
$ make mlpack_test
$ ctest .
If the build fails and you cannot figure out why, register an account on GitHub
and submit an issue at https://github.com/mlpack/mlpack/issues. The mlpack
developers will quickly help you figure it out.
Alternatively, mlpack help can be found on IRC at #mlpack on chat.freenode.net.
If you wish to install mlpack to /usr/local/include/mlpack/, /usr/local/lib/,
and /usr/local/bin/, make sure you have root privileges (or write permissions
to those three directories), and simply type
$ make install
You can now run the executables by name. You can link against mlpack with
-lmlpack, and the mlpack headers are found in /usr/local/include/mlpack/. If
the Python bindings were built, you can access them with the mlpack package in
Python.
If running the programs (e.g. $ mlpack_knn -h) gives an error of the form
error while loading shared libraries: libmlpack.so.2: cannot open shared object file: No such file or directory
then be sure that the runtime linker is searching the directory where
libmlpack.so was installed (probably /usr/local/lib/ unless you set it
manually). One way to do this, on Linux, is to ensure that the
LD_LIBRARY_PATH environment variable includes the directory that contains
libmlpack.so. Using bash, this can be set easily:
$ export LD_LIBRARY_PATH="/usr/local/lib/:$LD_LIBRARY_PATH"
(or whatever directory libmlpack.so is installed in.)
5. Running mlpack programs
After building mlpack, the executables will reside in build/bin/. You can call
them from there, or you can install the library and (depending on system
settings) they should be added to your PATH and you can call them directly. The
documentation below assumes the executables are in your PATH.
Consider the 'mlpack_knn' program, which finds the k nearest neighbors in a
reference dataset of all the points in a query set. That is, we have a query
and a reference dataset. For each point in the query dataset, we wish to know
the k points in the reference dataset which are closest to the given query
point.
Alternately, if the query and reference datasets are the same, the problem can
be stated more simply: for each point in the dataset, we wish to know the k
nearest points to that point.
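To make the problem statement concrete, here is a minimal brute-force sketch in plain Python. It is not how mlpack_knn is implemented and it makes no attempt to be fast; the function name knn_brute_force and the toy data are hypothetical. It assumes Euclidean distance and, when no query set is given, assumes a point is never reported as its own neighbor.

```python
import math


def knn_brute_force(reference, query=None, k=1):
    """Return (neighbors, distances) for each query point.

    neighbors[i] holds the indices (into `reference`) of the k reference
    points closest to query point i, nearest first; distances[i] holds the
    matching Euclidean distances. If `query` is None, the reference set is
    searched against itself and each point skips itself.
    """
    same_set = query is None
    if same_set:
        query = reference
    available = len(reference) - (1 if same_set else 0)
    if k < 1 or k > available:
        raise ValueError("k must be between 1 and the number of candidates")

    neighbors, distances = [], []
    for qi, q in enumerate(query):
        # Distance from this query point to every candidate reference point.
        candidates = [
            (math.dist(q, r), ri)
            for ri, r in enumerate(reference)
            if not (same_set and ri == qi)
        ]
        candidates.sort()
        top = candidates[:k]
        neighbors.append([ri for _, ri in top])
        distances.append([d for d, _ in top])
    return neighbors, distances


# Toy data: four 2-D reference points and one query point.
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
query = [(0.9, 0.1)]

neighbors, distances = knn_brute_force(reference, query, k=2)
print(neighbors)  # indices into `reference`, nearest first

# Same dataset used as both query and reference.
self_neighbors, _ = knn_brute_force(reference, k=1)
print(self_neighbors)
```

This loop compares every query point against every reference point, so its cost grows with the product of the two set sizes; it is meant only to pin down what "the k nearest neighbors" means in the two cases described above.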
Each mlpack program has extensive help documentation which details what the
method does, what each of the parameters is, and how to use them:
$ mlpack_knn --help
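Running mlpack_knn on one dataset (that is, the query and reference
datasets are the same) and finding the 5 nearest neighbors is very simple
(the file names here are placeholders):
$ mlpack_knn -r dataset.csv -n neighbors_out.csv -d distances_out.csv -k 5 -v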
The -v (--verbose) flag is optional; it gives informational output. It is not
unique to mlpack_knn but is available in all mlpack programs. Verbose
output also gives timing output at the end of the program, which can be very
useful.
6. Using mlpack from Python
If mlpack is installed to the system, then the mlpack Python bindings should be
automatically in your PYTHONPATH, and importing mlpack functionality into Python
should be very simple:
>>> from mlpack import knn
Accessing help is easy:
>>> help(knn)
The API is similar to the command-line programs. So, running knn()
(k-nearest-neighbor search) on the numpy matrix dataset and finding the 5
nearest neighbors is very simple:
>>> output = knn(k=5, reference=dataset)
This will store the output neighbors in output['neighbors'] and the output
distances in output['distances']. Other mlpack bindings function similarly,
and the input/output parameters exactly match those of the command-line
programs.
7. Further documentation
The documentation given here is only a fraction of the available documentation
for mlpack. If doxygen is installed, you can type make doc to build the
documentation locally. Alternatively, documentation for current and older
versions of mlpack is available on the mlpack website (https://www.mlpack.org).
If you find a bug in mlpack or have any problems, numerous routes are available
for help.
GitHub is used for bug tracking, and can be found at
https://github.com/mlpack/mlpack/issues.
It is easy to register an account and file a bug there, and the mlpack
development team will try to quickly resolve your issue.
In addition, mailing lists are available. The mlpack discussion list is
available at