imbalanced-learn

Python library for handling imbalanced datasets with sampling techniques.

Established · Open Source · Low lock-in

Pricing: Free tier, flat rate

Adoption: Stable

License: Open Source

Data freshness: Aging · Jun 8, 2026

Overview

What is imbalanced-learn?

imbalanced-learn is a Python package that provides under-sampling and over-sampling techniques for imbalanced datasets, where one class heavily outnumbers the others. Such datasets are common in machine learning tasks, and rebalancing the class distribution of the training data can improve model performance on the minority class.
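To make "balancing the class distribution" concrete, here is a minimal standard-library sketch of random over-sampling and random under-sampling. The function names `random_oversample` and `random_undersample` are hypothetical and written for illustration only; this is not imbalanced-learn's implementation, just the underlying idea under simple assumptions (list-based data, exact balancing).

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


# Toy data: 8 majority rows (class 0) and 2 minority rows (class 1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over))   # both classes now have 8 rows
print(Counter(y_under))  # both classes now have 2 rows
```

Over-sampling keeps all original rows at the cost of duplicates, while under-sampling discards majority rows, so the choice usually depends on how much data can be spared.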

Key differentiator

“imbalanced-learn stands out by offering a wide range of sampling techniques directly compatible with scikit-learn pipelines, making it easy to integrate into existing machine learning workflows without significant overhead.”

Honest assessment

Strengths & Weaknesses

↑ Strengths

Various over-sampling techniques such as SMOTE and ADASYN (medium)

Under-sampling methods including Random Under-Sampling and Tomek Links (medium)

Pipeline integration with scikit-learn for seamless use in ML workflows (medium)

↓ Weaknesses

Limited support for advanced techniques beyond SMOTE and ADASYN (high)

The library focuses mainly on well-known methods such as SMOTE, ADASYN, and their variants, with limited coverage of more sophisticated or recent algorithms.

Poor documentation for complex use cases (medium)

Documentation is thorough for basic usage but lacks examples and explanations for advanced configurations and edge cases.

Performance issues with large datasets (high)

Over-sampling techniques like SMOTE can be computationally expensive, leading to slow processing times on large datasets.

Limited integration support outside of scikit-learn ecosystem (medium)

While it integrates well with scikit-learn pipelines, support for other machine learning frameworks or libraries is minimal.
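To illustrate why SMOTE-style over-sampling can become slow on large datasets, here is a simplified standard-library sketch of the interpolation step. The function `smote_like` is hypothetical and uses a brute-force neighbour search; it is not the library's implementation, only a sketch of where the cost comes from.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (brute force)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Brute-force neighbour search: one distance per minority point.
        # This is the step that grows costly as the dataset gets larger.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            [b + gap * (n - b) for b, n in zip(base, neighbour)]
        )
    return synthetic


# Four minority points on the corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(minority, n_new=6)
print(len(new_points))
```

Each synthetic point needs a neighbour lookup, so the work scales with both the number of minority samples and the number of points to generate. On large datasets, under-sampling the majority class or resampling a subset first may be cheaper alternatives.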

Fit analysis

Who is it for?

✓ Best for

Projects dealing with highly imbalanced datasets where minority classes are critical for model performance.

Developers looking to integrate sampling techniques directly into their scikit-learn pipelines.

✕ Not a fit for

Real-time applications that require an immediate response, as the library is designed for batch processing.

Scenarios where computational resources are extremely limited, given that some oversampling methods can be resource-intensive.

Cost structure

Pricing

Free Tier: Available (open source, free to use)

Model: Flat rate

Enterprise: None

Ecosystem

Relationships

Works well with

matplotlib, NumPy, pandas, Seaborn
