auto_ml

Automated machine learning for analytics & production

Installation

  • pip install auto_ml

Getting started

from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset

df_train, df_test = get_boston_dataset()

column_descriptions = {
    'MEDV': 'output',
    'CHAS': 'categorical'
}

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)

ml_predictor.train(df_train)

ml_predictor.score(df_test, df_test.MEDV)

Show off some more features!

auto_ml is designed for production. Here's an example that includes serializing and loading the trained model, then getting predictions on single dictionaries, roughly the process you'd likely follow to deploy the trained model.

from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset
from auto_ml.utils_models import load_ml_model

# Load data
df_train, df_test = get_boston_dataset()

# Tell auto_ml which column is 'output'
# Also note columns that aren't purely numerical
# Examples include ['nlp', 'date', 'categorical', 'ignore']
column_descriptions = {
  'MEDV': 'output'
  , 'CHAS': 'categorical'
}

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)

ml_predictor.train(df_train)

# Score the model on test data
test_score = ml_predictor.score(df_test, df_test.MEDV)

# auto_ml is specifically tuned for running in production
# It can get predictions on an individual row (passed in as a dictionary)
# A single prediction like this takes ~1 millisecond
# Here we will demonstrate saving the trained model, and loading it again
file_name = ml_predictor.save()

trained_model = load_ml_model(file_name)

# .predict and .predict_proba take in either:
# A pandas DataFrame
# A list of dictionaries
# A single dictionary (optimized for speed in production environments)
predictions = trained_model.predict(df_test)
print(predictions)

3rd Party Packages: Deep Learning with TensorFlow & Keras, XGBoost, LightGBM, CatBoost

auto_ml has all of these awesome libraries integrated! Generally, just pass one of them in for model_names: ml_predictor.train(data, model_names=['DeepLearningClassifier'])

Available options are

  • DeepLearningClassifier and DeepLearningRegressor
  • XGBClassifier and XGBRegressor
  • LGBMClassifier and LGBMRegressor
  • CatBoostClassifier and CatBoostRegressor

All of these integrations are production-ready. Each has prediction times in the 1 millisecond range for a single prediction, and each trained model can be serialized to disk and loaded into a new environment after training.

Depending on your machine, they can occasionally be difficult to install, so they are not included in auto_ml's default installation. You are responsible for installing them yourself. auto_ml will run fine without them installed (we check what's installed before choosing which algorithm to use).
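
For example, here's a minimal sketch of opting into XGBoost on the regression example above, assuming pip install xgboost has already succeeded on your machine:

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)

# model_names controls which algorithm auto_ml trains
ml_predictor.train(df_train, model_names=['XGBRegressor'])

test_score = ml_predictor.score(df_test, df_test.MEDV)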

Feature Responses

Get linear-model-esque interpretations from non-linear models. See the docs for more information and caveats.

Classification

Binary and multiclass classification are both supported. Note that for now, labels must be integers (0 and 1 for binary classification). auto_ml will automatically detect whether it is a binary or multiclass classification problem; you just have to pass in ml_predictor = Predictor(type_of_estimator='classifier', column_descriptions=column_descriptions)
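
Here's a minimal sketch of a binary classifier. The DataFrames and the is_churn and plan_type columns are hypothetical stand-ins for your own data:

from auto_ml import Predictor

# df_train / df_test: your own DataFrames
# The output column must hold integer labels (0 and 1 for binary classification)
column_descriptions = {
    'is_churn': 'output',        # hypothetical integer label column
    'plan_type': 'categorical'   # hypothetical categorical feature
}

ml_predictor = Predictor(type_of_estimator='classifier', column_descriptions=column_descriptions)
ml_predictor.train(df_train)

# .predict returns class labels; .predict_proba returns probabilities
predictions = ml_predictor.predict(df_test)
probabilities = ml_predictor.predict_proba(df_test)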

Feature Learning

Also known as "finally found a way to make this deep learning stuff useful for my business". Deep Learning is great at learning important features from your data. But the way it turns these learned features into a final prediction is relatively basic. Gradient boosting is great at turning features into accurate predictions, but it doesn't do any feature learning.

In auto_ml, you can now automatically use both types of models for what they're great at. If you pass feature_learning=True, fl_data=some_dataframe to .train(), we will do exactly that: train a deep learning model on your fl_data. We won't ask it for predictions (the standard stacking approach); instead, we'll use its penultimate layer to get its 10 most useful features. Then we'll train a gradient boosted model (or any other model of your choice) on those features plus all the original features.

Across some problems, we've witnessed this lead to a 5% gain in accuracy, while still making predictions in 1-4 milliseconds, depending on model complexity.

ml_predictor.train(df_train, feature_learning=True, fl_data=df_fl_data)

This feature only supports regression and binary classification currently. The rest of auto_ml supports multiclass classification.
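
One reasonable way to construct fl_data is to hold out a slice of your training data, so the feature-learning model and the final model don't see the same rows. A minimal sketch of that, continuing the Boston example above:

from sklearn.model_selection import train_test_split

# Hold out part of the training data for the feature-learning model
df_train_core, df_fl_data = train_test_split(df_train, test_size=0.33)

ml_predictor.train(df_train_core, feature_learning=True, fl_data=df_fl_data)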

Categorical Ensembling

Ever wanted to train one model for every store/customer, but didn't want to maintain hundreds of thousands of independent models? With ml_predictor.train_categorical_ensemble(), we will handle that for you. You'll still have just one consistent API, ml_predictor.predict(data), but behind this single API will be one model for each category you included in your training data.

Just tell us which column holds the category you want to split on, and we'll handle the rest. As always, saving the model, loading it in a different environment, and getting speedy predictions live in production is baked right in.

ml_predictor.train_categorical_ensemble(df_train, categorical_column='store_name')
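
Spelled out a little more, here's a sketch where store_name is the category column to split on and sales is a hypothetical output column:

column_descriptions = {
    'sales': 'output',           # hypothetical output column
    'store_name': 'categorical'  # the column we split the ensemble on
}

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)

# Trains one model per distinct store_name, behind a single API
ml_predictor.train_categorical_ensemble(df_train, categorical_column='store_name')

# Each row is routed to the model for its store_name
predictions = ml_predictor.predict(df_test)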

More details are available in the docs:

http://auto-ml.readthedocs.io/en/latest/

Advice

Before you go any further, try running the code. Load up some data (either a DataFrame, or a list of dictionaries, where each dictionary is a row of data). Make a column_descriptions dictionary that tells us which attribute name in each row represents the value we're trying to predict. Pass all that into auto_ml, and see what happens!

Everything else in these docs assumes you have done at least the above. Start there and everything else will build on top. But this part gets you the output you're probably interested in, without unnecessary complexity.
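
If you're starting from a list of dictionaries, a minimal sketch looks like this (the rows and column names are made up for illustration; you'd use far more data in practice):

from auto_ml import Predictor

# Each dictionary is one row of data
data = [
    {'sqft': 1200, 'neighborhood': 'north', 'price': 250000},
    {'sqft': 900, 'neighborhood': 'south', 'price': 180000},
    {'sqft': 1500, 'neighborhood': 'north', 'price': 320000},
]

# Tell auto_ml which attribute holds the value we're trying to predict
column_descriptions = {
    'price': 'output',
    'neighborhood': 'categorical'
}

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
ml_predictor.train(data)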

Docs

The full docs are available at https://auto_ml.readthedocs.io. Again, though, I'd strongly recommend running this on an actual dataset before referencing the docs any further.

What this project does

Automates the whole machine learning process, making it super easy to use both for analytics and for getting real-time predictions in production.

A quick overview, in buzzwords, of what this project automates:

  • Analytics (pass in data, and auto_ml will tell you the relationship of each variable to what it is you're trying to predict).
  • Feature Engineering (particularly around dates, and NLP).
  • Robust Scaling (turning all values into their scaled versions between the range of 0 and 1, in a way that is robust to outliers, and works with sparse data).
  • Feature Selection (picking only the features that actually prove useful).
  • Data formatting (turning a DataFrame or a list of dictionaries into a sparse matrix, one-hot encoding categorical variables, taking the natural log of y for regression problems, etc).
  • Model Selection (which model works best for your problem- we try roughly a dozen apiece for classification and regression problems, including favorites like XGBoost if it's installed on your machine).
  • Hyperparameter Optimization (what hyperparameters work best for that model).
  • Big Data (feed it lots of data- it's fairly efficient with resources).
  • Unicorns (you could conceivably train it to predict what is a unicorn and what is not).
  • Ice Cream (mmm, tasty...).
  • Hugs (this makes it much easier to do your job, hopefully leaving you more time to hug those you care about).

Running the tests

If you've cloned the source code and are making any changes (highly encouraged!), or just want to make sure everything works in your environment, run nosetests -v tests.

CI is also set up, so if you're developing on this, you can just open a PR, and the tests will run automatically on Travis-CI.

The tests are relatively comprehensive, though as with everything with auto_ml, I happily welcome your contributions here!

Analytics

Comments
  • Comparison with other automatic ML libraries?

    First, thank you very much for the hard work and awesome project. I think it will get a lot of use in my workflow.

    I was surveying the landscape of automatic ML solutions, and found your package along with tpot and auto-sklearn. I am trying to figure out what kind of strengths and weaknesses all these packages have. Would you mind discussing what auto_ml does differently and/or better?

    Thanks again.

    opened by sergeyf 12
  • ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape

    When I train with DeepLearningRegressor on a 5k dataset everything works fine, but when I do it on a 50k dataset I get this error.

    Caused by op u'dense_1/random_normal/RandomStandardNormal', defined at:
      File "salary_predict.py", line 38, in <module>
        ml_predictor.train(df_train, model_names=['DeepLearningRegressor'])
      File "/home/ubuntu/deeparted/auto_ml/predictor.py", line 471, in train
        self.trained_final_model = self.train_ml_estimator(estimator_names, self._scorer, X_df, y)
      File "/home/ubuntu/deeparted/auto_ml/predictor.py", line 674, in train_ml_estimator
        trained_final_model = self.fit_single_pipeline(X_df, y, estimator_names[0], feature_learning=feature_learning)
      File "/home/ubuntu/deeparted/auto_ml/predictor.py", line 548, in fit_single_pipeline
        ppl.fit(X_df, y)
      File "/home/ubuntu/deeparted/auto_ml/utils_model_training.py", line 88, in fit
        self.model.fit(X_fit, y, callbacks=[early_stopping])
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/wrappers/scikit_learn.py", line 138, in fit
        self.model = self.build_fn(**self.filter_sk_params(self.build_fn))
      File "/home/ubuntu/deeparted/auto_ml/utils_models.py", line 559, in make_deep_learning_model
        model.add(Dense(hidden_layers[0], input_dim=num_cols, kernel_initializer='normal', kernel_regularizer=regularizers.l2(0.01)))
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/models.py", line 433, in add
        layer(x)
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/engine/topology.py", line 558, in __call__
        self.build(input_shapes[0])
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/layers/core.py", line 827, in build
        constraint=self.kernel_constraint)
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 88, in wrapper
        return func(*args, **kwargs)
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/engine/topology.py", line 391, in add_weight
        weight = K.variable(initializer(shape), dtype=dtype, name=name)
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/initializers.py", line 75, in __call__
        dtype=dtype, seed=self.seed)
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 3356, in random_normal
        dtype=dtype, seed=seed)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/random_ops.py", line 76, in random_normal
        shape_tensor, dtype, seed=seed1, seed2=seed2)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_random_ops.py", line 220, in _random_standard_normal
        name=name)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
        op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2514, in create_op
        original_op=self._default_original_op, op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
        self._traceback = _extract_stack()
    

    ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[47302,1] [[Node: dense_1/random_normal/RandomStandardNormal = RandomStandardNormal[T=DT_INT32, dtype=DT_FLOAT, seed=87654321, seed2=5687716, _device="/job:localhost/replica:0/task:0/gpu:0"]]

    TensorFlow version: 1.1.0, CUDA: 8.0, cuDNN: 5.1.10

    System config: AWS p2.8xlarge with 8 NVIDIA K80 GPUs (192 GB), 64 vCPUs, 732 GiB of host memory

    Training: batch_size: 50, dataset size: 50k, number of columns: 4 (1 output, 2 categorical, 1 float)

    Github Issues: https://github.com/tensorflow/tensorflow/issues/4735 https://github.com/tensorflow/tensorflow/issues/1355 and many more on github

    None of these solved the issue. Can anyone help me with this?

    opened by sameerpallav 12
  • User validation on fl_data

    Do you have an example of using feature learning? I assumed I could just do feature_learning on the training dataset, but I get an error like this when running it on the Boston dataset:

    ml_predictor.train(df_train, feature_learning=True, fl_data=df_train)

    
    Traceback (most recent call last):
      File "/home/data/.local/lib/python3.6/site-packages/pandas/indexes/base.py", line 2134, in get_loc
        return self._engine.get_loc(key)
      File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
      File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
      File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
      File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
    KeyError: 'MEDV'
    
    opened by calz1 11
  • TypeError: cannot perform reduce with flexible type OR AttributeError: 'Predictor' object has no attribute 'grid_search_pipelines'

    Very cool package!

    I am trying out auto_ml with this dataset on SMS spam. I added a header row to the file to give it column names and then do the following:

    import pandas as p  
    import dill  
    from sklearn.model_selection import train_test_split   
    from auto_ml import Predictor 
    
    df = p.read_table('/home/data/auto_ml/sms.txt')
    df_train, df_test = train_test_split(df, test_size=0.5, random_state=42)
    column_descriptions = {
      'spam': 'output'
      , 'text': 'nlp'
    }
    
    ml_predictor = Predictor(type_of_estimator='classifier', column_descriptions=column_descriptions)
    ml_predictor.train(df_train)
    

    You can see it sort of works, because it is telling me about feature importance, but then it gives:

    ....
    nlp_text_txt: 0.0373
    nlp_text_free: 0.0441

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/auto_ml/predictor.py", line 597, in train
        if len(self.grid_search_pipelines) > 1:
    AttributeError: 'Predictor' object has no attribute 'grid_search_pipelines'

    Originally I was trying: ml_predictor.train(df_train, ml_for_analytics=True)

    and got:

    test_score = ml_predictor.score(df_test, df_test.spam)

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/auto_ml/predictor.py", line 1014, in score
        score, probas = self._scorer.score(self.trained_pipeline, X_test, y_test, advanced_scoring=advanced_scoring)
      File "/usr/local/lib/python2.7/dist-packages/auto_ml/utils_scoring.py", line 268, in score
        score = self.scoring_func(y, predictions)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 1884, in brier_score_loss
        pos_label = y_true.max()
      File "/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py", line 26, in _amax
        return umr_maximum(a, axis, None, out, keepdims)
    TypeError: cannot perform reduce with flexible type

    opened by calz1 11
  • error during LGBM predict_proba

    Hi all..

    After long hours of training my model with LightGBM, I ran predict_proba and first hit the data rate limit in Jupyter. I changed that limit and had to train the model again, but this time I ran into another error:

    Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'

    can someone help me please? thanks

    opened by vkocaman 10
  • AttributeError: 'XGBRegressor' object has no attribute 'get_fscore'

    Testing out auto_ml with XGBoost and ran into this issue. This is against a fresh clone of the XGBoost repository so it looks like their API changed.

    predictor.train(x_train, verbose=True, model_names=['XGBRegressor'])

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in _get_xgb_feat_importances(self, clf)
        890             # xgb.XGBClassifier.fit() or xgb.XGBRegressor().fit()
    --> 891             fscore = clf.booster().get_fscore()
        892         except:
    
    TypeError: 'str' object is not callable
    
    During handling of the above exception, another exception occurred:
    
    AttributeError                            Traceback (most recent call last)
    <ipython-input-37-eafdc24b187b> in <module>()
    ----> 1 predictor.train(x_train, verbose=True, model_names=['XGBRegressor'])
    
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in train(self, raw_training_data, user_input_func, optimize_final_model, write_gs_param_results_to_file, perform_feature_selection, verbose, X_test, y_test, ml_for_analytics, take_log_of_y, model_names, perform_feature_scaling, calibrate_final_model, _scorer, scoring, verify_features, training_params, grid_search_params, compare_all_models, cv, feature_learning, fl_data)
        469 
        470         # This is our main logic for how we train the final model
    --> 471         self.trained_final_model = self.train_ml_estimator(estimator_names, self._scorer, X_df, y)
        472 
        473         # Calibrate the probability predictions from our final model
    
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in train_ml_estimator(self, estimator_names, scoring, X_df, y, feature_learning)
        672         # Use Case 1: Super straightforward: just train a single, non-optimized model
        673         if len(estimator_names) == 1 and self.optimize_final_model != True:
    --> 674             trained_final_model = self.fit_single_pipeline(X_df, y, estimator_names[0], feature_learning=feature_learning)
        675 
        676         # Use Case 2: Compare a bunch of models, but don't optimize any of them
    
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in fit_single_pipeline(self, X_df, y, model_name, feature_learning)
        554 
        555         self.trained_final_model = ppl
    --> 556         self.print_results(model_name)
        557 
        558         return ppl
    
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in print_results(self, model_name)
        578 
        579         elif self.ml_for_analytics and model_name in ['RandomForestClassifier', 'RandomForestRegressor', 'XGBClassifier', 'XGBRegressor', 'GradientBoostingRegressor', 'GradientBoostingClassifier', 'LGBMRegressor', 'LGBMClassifier']:
    --> 580             self._print_ml_analytics_results_random_forest()
        581 
        582 
    
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in _print_ml_analytics_results_random_forest(self)
        938         # XGB's Classifier has a proper .feature_importances_ property, while the XGBRegressor does not.
        939         if final_model_obj.model_name in ['XGBRegressor', 'XGBClassifier']:
    --> 940             self._get_xgb_feat_importances(final_model_obj.model)
        941 
        942         else:
    
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in _get_xgb_feat_importances(self, clf)
        893             # Handles case when clf has been created by calling xgb.train.
        894             # Thus, clf is an instance of xgb.Booster.
    --> 895             fscore = clf.get_fscore()
        896 
        897         trained_feature_names = self._get_trained_feature_names()
    
    AttributeError: 'XGBRegressor' object has no attribute 'get_fscore'
    
    opened by volker48 9
  • Error on install - Windows 10

    I have progressed through the install, although I got stuck on not having Visual C++ 14 installed. I now get the following error at the end of the install. Can you please help? What more info do you need?

    Command "c:\users\username\appdata\local\programs\python\python35-32\python.exe -u -c "import setuptools, tokenize;file='C:\Users\username\AppData\Local\Temp\pip-build-j_5l4z6_\scipy\setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record C:\Users\username\AppData\Local\Temp\pip-5r95bpz0-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\username\AppData\Local\Temp\pip-build-j_5l4z6_\scipy\

    opened by bitsam 9
  • 'FinalModelATC' object has no attribute 'feature_ranges'

    I'm trying to run your "Getting Started" example on the numerai training data and getting the following error:

    AttributeError                            Traceback (most recent call last)
    <ipython-input-39-aab5c9ba7e0f> in <module>()
          6 # Can pass in type_of_estimator='regressor' as well
          7 
    ----> 8 ml_predictor.train(df_dict)
          9 # Wait for the machine to learn all the complex and beautiful patterns in your data...
         10 
    
    /Users/alex/anaconda/envs/dsi/lib/python2.7/site-packages/auto_ml/predictor.pyc in train(***failed resolving arguments***)
        553 
        554 
    --> 555         self.perform_grid_search_by_model_names(estimator_names, scoring, X_df, y)
        556 
        557         # If we ran GridSearchCV, we will have to pick the best model
    
    /Users/alex/anaconda/envs/dsi/lib/python2.7/site-packages/auto_ml/predictor.pyc in perform_grid_search_by_model_names(self, estimator_names, scoring, X_df, y)
        671 
        672             if self.ml_for_analytics and model_name in ('LogisticRegression', 'RidgeClassifier', 'LinearRegression', 'Ridge'):
    --> 673                 self._print_ml_analytics_results_regression()
        674             elif self.ml_for_analytics and model_name in ['RandomForestClassifier', 'RandomForestRegressor', 'XGBClassifier', 'XGBRegressor', 'GradientBoostingRegressor', 'GradientBoostingClassifier']:
        675                 self._print_ml_analytics_results_random_forest()
    
    /Users/alex/anaconda/envs/dsi/lib/python2.7/site-packages/auto_ml/predictor.pyc in _print_ml_analytics_results_regression(self)
        770             trained_coefficients = self.trained_pipeline.named_steps['final_model'].model.coef_
        771 
    --> 772         feature_ranges = self.trained_pipeline.named_steps['final_model'].feature_ranges
        773 
        774         # TODO(PRESTON): readability. Can probably do this in a single zip statement.
    
    AttributeError: 'FinalModelATC' object has no attribute 'feature_ranges'
    

    Are you familiar with this type of issue?

    opened by akodate 9
  • far future: take in dataframes or other sparse data structures directly

    Right now, taking in Python dictionaries is awesome for its flexibility and ease of development, but it is killing us on memory, even if it is a super sparse data structure.

    One workaround we could do for this is described in https://github.com/ClimbsRocks/auto_ml/issues/40, though that feels fairly hacky. Taking in a DataFrame seems much more obvious.

    opened by ClimbsRocks 9
  • Fix XGBoost error

    It appears that the current XGBoost package installed with pip does not have the feature_importance_ attribute. Therefore, if you install the xgboost package using pip install xgboost, you will be unable to conduct feature extraction from the XGBClassifier or the XGBRegressor object.

    I made a workaround that checks for feature_importance_: if the newest version of XGBoost is installed from source, feature_importance_ works fine, so it will likely exist in future versions. But the version currently available via pip install xgboost does not provide the attribute.

    opened by a-holm 7
  • Got an unexpected keyword argument 'max_iter' in SGDClassifier

    The example in the README fails to run.

    from auto_ml import Predictor
    from auto_ml.utils import get_boston_dataset
    
    df_train, df_test = get_boston_dataset()
    
    column_descriptions = {
        'MEDV': 'output'
        , 'CHAS': 'categorical'
    }
    
    ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
    
    ml_predictor.train(df_train)
    
    ml_predictor.score(df_test, df_test.MEDV)
    

    And here is the error message.

    ➜ python ./automl_demo.py
    Using TensorFlow backend.
    Welcome to auto_ml! We're about to go through and make sense of your data using machine learning, and give you a production-ready pipeline to get predictions with.
    
    If you have any issues, or new feature ideas, let us know at https://github.com/ClimbsRocks/auto_ml
    Now using the model training_params that you passed in:
    {}
    After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
    {'presort': False, 'warm_start': True, 'learning_rate': 0.1}
    Traceback (most recent call last):
      File "./automl_demo.py", line 13, in <module>
        ml_predictor.train(df_train)
      File "/usr/local/lib/python2.7/site-packages/auto_ml/predictor.py", line 611, in train
        X_df = self.fit_transformation_pipeline(X_df, y, estimator_names)
      File "/usr/local/lib/python2.7/site-packages/auto_ml/predictor.py", line 834, in fit_transformation_pipeline
        ppl = self._construct_pipeline(model_name=model_names[0], keep_cat_features=self.keep_cat_features)
      File "/usr/local/lib/python2.7/site-packages/auto_ml/predictor.py", line 206, in _construct_pipeline
        final_model = utils_models.get_model_from_name(model_name, training_params=params)
      File "/usr/local/lib/python2.7/site-packages/auto_ml/utils_models.py", line 129, in get_model_from_name
        'SGDClassifier': SGDClassifier(max_iter=1000, tol=0.001),
    TypeError: __init__() got an unexpected keyword argument 'max_iter'
    
    opened by tobegit3hub 7
  • get bad score running the sample code

    1. I configured everything and ran the whole script, and I get a negative score on the Boston dataset. Is it normal for this sample to get a bad score?

    2. Does the default use only gradient boosting for classification and regression, rather than automatically choosing the best model for training and prediction?

    opened by Aun0124 0
  • pip install automl gets stuck after installing multiprocess-0.70.7

    The following is the last snippet in the pip install logs before the installation gets stuck indefinitely:

    Collecting multiprocess>=0.70.7
      Using cached multiprocess-0.70.11-py3-none-any.whl (98 kB)
      Using cached multiprocess-0.70.10.zip (2.4 MB)
      Using cached multiprocess-0.70.9.tar.gz (1.6 MB)
      Using cached multiprocess-0.70.8.tar.gz (1.6 MB)
      Using cached multiprocess-0.70.7.tar.gz (1.4 MB)

    Even without using the cached copies, the installation gets stuck at this point.

    Update: One possible reason for this error could be that \sklearn_deap2-0.2.2-py3.8\evolutionary_search\cv.py incorrectly tries to import check_scoring in the following manner:

    from sklearn.metrics.scorer import check_scoring

    instead of this:

    from sklearn.metrics import check_scoring

    opened by akshatpv 2
  • docs: fix simple typo, puncutation -> punctuation

    There is a small typo in docs/source/formatting_data.rst.

    Should read punctuation rather than puncutation.

    Semi-automated pull request generated by https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

    opened by timgates42 1
  • Update DataFrameVectorizer.py

    DeprecationWarning: The module is deprecated in version 0.21 and removed in version 0.23. This module was removed in the latest scikit-learn version; please remove this module.

    opened by karthikreddykuna 1
Releases(v2.7.0)
  • v2.7.0(Sep 12, 2017)

    Ensembling's back for its alpha release, evolutionary algorithms are doing our hyperparameter search now, we've handled a bunch of dependency updates, and made a bunch of smaller performance tweaks.

    Source code(tar.gz)
    Source code(zip)
  • v2.4.0(Jul 14, 2017)

    Using quantile regression, we can now return prediction intervals.

    Another minor change is adding a column of absolute changes for feature_responses.

    Source code(tar.gz)
    Source code(zip)
  • v2.3.5(Jul 9, 2017)

  • v2.2.1(Jun 13, 2017)

    Avoids double training deep learning models, changes how we sort and order features for analytics reporting, and adds a new _all_small_categories category to categorical ensembling.

    Source code(tar.gz)
    Source code(zip)
  • v2.2.0(Jun 6, 2017)

  • 2.1.5(May 18, 2017)

  • 2.1.2(May 3, 2017)

  • 2.1(Apr 19, 2017)

    Feature learning and categorical ensembling are really cool features that each get us 2-5% accuracy gains!

    For full info, check the docs.

    Source code(tar.gz)
    Source code(zip)
  • v2.0.0(Apr 4, 2017)

    Enough incremental improvements have added up that we're now ready to mark a 2.0 release!

    Part of the progress also means deprecating a few unused features that were adding unnecessary complexity and preventing us from implementing new features like ensembling properly.

    New changes for the 2.0 release:

    • Refactored and cleaned up code. Ensembling should now be much easier to add in, and in a way that's fast enough to be used in production (getting predictions from 10 models should take less than 10x as long as getting predictions from 1 model)
    • Deprecated compute_power
    • Deprecated several methods for grid searching over transformation_pipeline hyperparameters (different methods for feature selection, whether or not to do feature scaling, etc.). We just directly made a decision to prioritize the final model hyperparameter search.
    • Deprecated the current implementation of ensembling. It was implemented in such a way that it was not quick enough to make predictions in prod, and thus, did not meet the primary use cases of this project. Part of removing it allows us to reimplement ensembling in a way that is prod-ready.
    • Deprecated X_test and y_test, except for working with calibrate_final_model.
    • Added better documentation on features that were in silent alpha release previously.
    • Improved test coverage!

    Major changes since the 1.0 release:

    • Integrations for deep learning (using TensorFlow and Keras)
    • Integration of Microsoft's LightGBM, which appears to be a possibly better version of XGBoost
    • Quite a bit more user logging, warning, and input validation/input cleaning
    • Quite a few edge case bug fixes and minor performance improvements
    • Fully automated test suite with decent test coverage!
    • Better documentation
    • Support for pandas DataFrames, which are much more space-efficient than lists of dictionaries
    Source code(tar.gz)
    Source code(zip)
    auto_ml-2.0.0-py2.py3-none-any.whl(47.43 KB)
    auto_ml-2.0.0.tar.gz(41.64 KB)
  • v1.12.2(Mar 16, 2017)

    This will be our final release before v2.

    Includes many recent changes: Deep Learning with Keras/TensorFlow, more efficient hyperparameter optimization, Microsoft's LightGBM, more advanced logging for scoring, and quite a few minor usability improvements (like improved logging when input is not as expected).

    Source code(tar.gz)
    Source code(zip)
  • v1.3(Oct 11, 2016)

Owner

Preston Parry
Rock Climber, Biker, Community Builder, Teacher, data scientist & machine learning geek