OpenSource Name: kvh/ramp
OpenSource URL: https://github.com/kvh/ramp
OpenSource Language: Python 100.0%

Ramp - Rapid Machine Learning Prototyping

Ramp is a Python library for rapid prototyping of machine learning solutions. It is a lightweight, pandas-based machine learning framework pluggable with existing Python machine learning and statistics tools (scikit-learn, rpy2, etc.). Ramp provides a simple, declarative syntax for exploring features, algorithms, and transformations quickly and efficiently.

Documentation: http://ramp.readthedocs.org

Why Ramp?

Features can be declaratively chained, combined, and transformed:
# Chain and combine features
Normalize(Log('x'))
Interactions([Log('x1'), (F('x2') + F('x3')) / 2])

# Reduce feature dimension (the decomposer here is, e.g., scikit-learn's PCA)
DimensionReduction([F('x%d' % i) for i in range(100)], decomposer=PCA(n_components=3))

# Blend in another model's residuals or predictions
Residuals(simple_model_def) + Predictions(complex_model_def)
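To make the semantics concrete, here is a minimal sketch of what a chained expression like Normalize(Log('x')) computes, written against plain pandas and scikit-learn rather than Ramp's API; the DataFrame and its column 'x' are hypothetical stand-ins.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data; 'x' stands in for any numeric column.
df = pd.DataFrame({'x': [1.0, 10.0, 100.0, 1000.0]})
logged = np.log(df[['x']])                            # Log('x')
normalized = StandardScaler().fit_transform(logged)   # Normalize(...)
print(normalized.ravel())  # zero-mean, unit-variance log values

Ramp expresses the same computation as the single declarative feature Normalize(Log('x')), which can then be dropped into a feature set like the ones in the quick start below.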
Quick start

Getting started with Ramp: Classifying insults

Or, the quintessential Iris example:

import pandas
from ramp import *
import urllib2
import sklearn
from sklearn import decomposition
# fetch and clean iris data from UCI
data = pandas.read_csv(urllib2.urlopen(
"http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"))
data = data.drop([149]) # bad line
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
data.columns = columns
# all features
features = [FillMissing(f, 0) for f in columns[:-1]]
# features, log transformed features, and interaction terms
expanded_features = (
features +
[Log(F(f) + 1) for f in features] +
[
F('sepal_width') ** 2,
combo.Interactions(features),
]
)
# Define several models and feature sets to explore,
# run 5 fold cross-validation on each and print the results.
# We define 2 models and 4 feature sets, so this will be
# 4 * 2 = 8 models tested.
shortcuts.cv_factory(
data=data,
target=[AsFactor('class')],
metrics=[
[metrics.GeneralizedMCC()],
],
# report feature importance scores from Random Forest
reporters=[
[reporters.RFImportance()],
],
# Try out two algorithms
model=[
sklearn.ensemble.RandomForestClassifier(
n_estimators=20),
sklearn.linear_model.LogisticRegression(),
],
# and 4 feature sets
features=[
expanded_features,
# Feature selection
[trained.FeatureSelector(
expanded_features,
# use random forest's importance to trim
selectors.RandomForestSelector(classifier=True),
target=AsFactor('class'), # target to use
n_keep=5, # keep top 5 features
)],
# Reduce feature dimension (pointless on this dataset)
[combo.DimensionReduction(expanded_features,
decomposer=decomposition.PCA(n_components=4))],
# Normalized features
[Normalize(f) for f in expanded_features],
]
)

Status

Ramp is currently in alpha, so expect bugs, bug fixes, and API changes.

Requirements
Author

Ken Van Haren. Email with feedback/questions: [email protected] @squaredloss

Contributors