References for: doi:10.3233/DS-240059

Full identifier: https://doi.org/10.3233/DS-240059

Nanopublication Part Subject Predicate Object Published By Published On
assertion (http://www.nanopub.org/nschema#hasAssertion)
doi:10.3233/DS-240059
Measuring Data Drift with the Unstable Population Indicator
Tobias Kuhn
2024-02-29T09:56:50.813Z
assertion (http://www.nanopub.org/nschema#hasAssertion)
doi:10.3233/DS-240059
Tobias Kuhn
2024-02-29T09:56:50.813Z
assertion (http://www.nanopub.org/nschema#hasAssertion)
doi:10.3233/DS-240059
2024
Tobias Kuhn
2024-02-29T09:56:50.813Z
assertion (http://www.nanopub.org/nschema#hasAssertion)
doi:10.3233/DS-240059
Measuring data drift is essential in machine learning applications where model scoring (evaluation) is done on data samples that differ from those used in training. The Kullback-Leibler divergence is a common measure of shifted probability distributions, and discretized versions of it have been devised to handle binned or categorical data. We present the Unstable Population Indicator, a robust, flexible and numerically stable discretized implementation of Jeffreys divergence, along with a Python package implementing it that handles continuous, discrete, ordinal and nominal data in a variety of popular data types. We demonstrate its numerical and statistical properties in controlled experiments. We advise against using a single common cut-off to distinguish stable from unstable populations; the cut-off should instead depend on the use case.
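The abstract describes the Unstable Population Indicator as a discretized, numerically stable form of Jeffreys divergence, J(P, Q) = KL(P||Q) + KL(Q||P) = sum_i (p_i - q_i) ln(p_i / q_i), computed over bins or categories. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' package or its API. The function names (jeffreys_divergence, quantile_edges, to_bins), the additive smoothing constant eps that keeps empty bins from producing log(0), and the use of quantile bins for continuous data are all hypothetical choices made for illustration. The paper's own handling of empty bins and binning may differ.

```python
import bisect
import math
from collections import Counter
from typing import Hashable, Iterable, List, Sequence


def jeffreys_divergence(expected: Iterable[Hashable],
                        actual: Iterable[Hashable],
                        eps: float = 1e-6) -> float:
    """Discretized Jeffreys divergence between two samples of bin labels.

    J(P, Q) = sum_i (p_i - q_i) * ln(p_i / q_i) = KL(P||Q) + KL(Q||P).
    `eps` is an assumed smoothing constant (hypothetical): it is added to
    every bin proportion, then renormalized, so that bins missing from one
    sample never cause log(0) or division by zero.
    """
    exp_counts = Counter(expected)
    act_counts = Counter(actual)
    n_exp = sum(exp_counts.values())
    n_act = sum(act_counts.values())
    if n_exp == 0 or n_act == 0:
        raise ValueError("both samples must be non-empty")

    bins = set(exp_counts) | set(act_counts)  # union of observed bins
    norm = 1.0 + len(bins) * eps              # keeps each distribution summing to 1
    total = 0.0
    for b in bins:
        # Counter returns 0 for a bin that is absent from that sample.
        p = (exp_counts[b] / n_exp + eps) / norm
        q = (act_counts[b] / n_act + eps) / norm
        total += (p - q) * math.log(p / q)
    return total


def quantile_edges(sample: Sequence[float], n_bins: int) -> List[float]:
    """Hypothetical helper: interior bin edges at empirical quantiles of `sample`."""
    s = sorted(sample)
    return [s[len(s) * i // n_bins] for i in range(1, n_bins)]


def to_bins(values: Iterable[float], edges: Sequence[float]) -> List[int]:
    """Map continuous values to integer bin indices 0..len(edges)."""
    return [bisect.bisect_right(edges, v) for v in values]


if __name__ == "__main__":
    # Categorical data: the share of "a" rises from 50% to 75%.
    print(jeffreys_divergence(["a", "a", "b", "b"], ["a", "a", "a", "b"]))  # about 0.2747

    # Continuous data: bin both samples with edges taken from the training sample.
    train = [float(x) for x in range(100)]
    score = [float(x) + 10 for x in range(100)]  # shifted scoring sample
    edges = quantile_edges(train, 4)
    print(jeffreys_divergence(to_bins(train, edges), to_bins(score, edges)))
```

Because each term (p_i - q_i) ln(p_i / q_i) is non-negative and unchanged when p and q are swapped, the result is symmetric in the two samples and zero only when the binned distributions coincide, which makes Jeffreys divergence convenient for comparing a training sample with a scoring sample.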
Tobias Kuhn
2024-02-29T09:56:50.813Z
pubinfo (http://www.nanopub.org/nschema#hasPublicationInfo)
doi:10.3233/DS-240059
Tobias Kuhn
2024-02-29T09:56:50.813Z