skops - Machine Learning Model Persistence

Persisting ML/AI models (also called model serialization) lets users save models they’ve trained so that they can be reloaded and reused. Model persistence is important because it saves time and allows models to be shared with other users without the need to retrain the model (which can be computationally expensive and time-consuming) or share the training data set (which can be large).

scikit-learn lists a number of ways to persist models and gives an overview of each method (pros, cons, and risks). Of these, skops feels like the best. Compared to ONNX, it is rather simple and preserves the Python object, allowing you to work with the trained model directly. Compared to pickle, joblib, and cloudpickle, it is more secure: when loading a file, only trusted types are loaded, and any untrusted types require user validation before loading. Regarding IO, in my use case skops was faster than ONNX and offered much better compression, with speed and compression comparable to joblib (while being more secure).

One downside to skops was identifying which data types are trusted. The webpage has a section on Supported libraries, but it’s unclear whether Supported means trusted. However, with some digging you can find all of the trusted types by running import skops.io._trusted_types as trusted; you can then tab-complete on trusted to browse the trusted types.

The code below simplifies this and creates a DataFrame of trusted modules (columns) and their types (rows):

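import skops.io._trusted_types as trusted
from inspect import getmembers
import pandas as pd

val = list()  # one list of trusted type names per category
ind = list()  # the matching category names, e.g. CONTAINER_TYPE_NAMES

# Keep only the module-level lists whose name contains 'NAMES'
for name, member in getmembers(trusted):
    if isinstance(member, list) and 'NAMES' in name:
        print(name)
        val.append(member)
        ind.append(name)

# One row per category, transposed so that each category becomes a column;
# shorter lists are padded with missing values
trusted_df = pd.DataFrame(val, index=ind).transpose()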

The first five rows of this DataFrame are below:

| CONTAINER_TYPE_NAMES | NUMPY_DTYPE_TYPE_NAMES | NUMPY_UFUNC_TYPE_NAMES |
|---|---|---|
| builtins.list | numpy.bool_ | numpy.core._multiarray_umath.absolute |
| builtins.set | numpy.bytes_ | numpy.core._multiarray_umath.add |
| builtins.map | numpy.clongdouble | numpy.core._multiarray_umath.arccos |
| builtins.tuple | numpy.complex128 | numpy.core._multiarray_umath.arccosh |
| | numpy.complex64 | numpy.core._multiarray_umath.arcsin |

| SCIPY_UFUNC_TYPE_NAMES | SKLEARN_ESTIMATOR_TYPE_NAMES |
|---|---|
| scipy.special._ufuncs.agm | sklearn.linear_model._bayes.ARDRegression |
| scipy.special._ufuncs.airy | sklearn.ensemble._weight_boosting.AdaBoostClassifier |
| scipy.special._ufuncs.airye | sklearn.ensemble._weight_boosting.AdaBoostRegressor |
| scipy.special._ufuncs.bdtr | sklearn.kernel_approximation.AdditiveChi2Sampler |
| scipy.special._ufuncs.bdtrc | sklearn.cluster._affinity_propagation.AffinityPropagation |
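The empty cell in the fifth row of the container column appears to come from that list having fewer entries than the others: building a DataFrame from lists of unequal length pads the shorter ones with missing values. A small sketch of that behaviour, using two made-up lists (the names SHORT_TYPE_NAMES and LONG_TYPE_NAMES are hypothetical, not part of skops):

```python
import pandas as pd

# Hypothetical stand-ins for two trusted-type lists of different lengths
names = ["SHORT_TYPE_NAMES", "LONG_TYPE_NAMES"]
values = [
    ["builtins.list"],
    ["numpy.bool_", "numpy.bytes_", "numpy.complex64"],
]

# One row per list, then transpose so that each list becomes a column
demo_df = pd.DataFrame(values, index=names).transpose()
print(demo_df)

# The shorter column is padded with missing values
n_missing = int(demo_df["SHORT_TYPE_NAMES"].isna().sum())

# Dropping the padding recovers the original list
recovered = demo_df["SHORT_TYPE_NAMES"].dropna().tolist()
```

If you need a single column back as a clean list, calling `dropna()` on it first, as above, avoids carrying the padding along.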

You can simplify this even more by turning it into a function:

import skops.io._trusted_types as trusted
from skops.io._utils import get_type_name
from inspect import getmembers

# Gather every list of trusted type names defined in the module
val = list()
ind = list()

for name, member in getmembers(trusted):
    if isinstance(member, list) and 'NAMES' in name:
        val.append(member)
        ind.append(name)

# Flatten the per-category lists into a single list of type names
t_type = [x for xs in val for x in xs]

def is_trusted(obj) -> bool:
    # Compare the object's fully qualified type name against the trusted names
    return get_type_name(type(obj)) in t_type

import numpy as np
is_trusted(np.int16())

True

import pandas as pd
is_trusted(pd.DataFrame())

False