Spark

June 19, 2018
Feature Selection Using Feature Importance Score - Creating a PySpark Estimator
python spark big-data
Extending PySpark's native MLlib feature selection by using feature importance scores from a trained machine learning model to extract the variables that are likely the most important.
April 8, 2018
Creating a Custom Cross-Validation Function in PySpark
python spark big-data
A custom cross-validation class written in PySpark with support for user-defined fold categories, such as time periods, geographic regions, or consumer segments.

Feature Selection Using Feature Importance Score - Creating a PySpark Estimator