Welcome to segmentation_models_pytorch’s documentation!

Python library with Neural Networks for Image Segmentation based on PyTorch

PyPI version Build Status Generic badge

The main features of this library are:

  • High level API (just two lines to create neural network)
  • 5 models architectures for binary and multi class segmentation (including legendary Unet)
  • 46 available encoders for each architecture
  • All encoders have pre-trained weights for faster and better convergence

Quick start

Since the library is built on the PyTorch framework, created segmentation model is just a PyTorch nn.Module, which can be created as easy as:

import segmentation_models_pytorch as smp

model = smp.Unet()

Depending on the task, you can change the network architecture by choosing backbones with fewer or more parameters and use pretrainded weights to initialize it:

model = smp.Unet('resnet34', encoder_weights='imagenet')

Change number of output classes in the model:

model = smp.Unet('resnet34', classes=3, activation='softmax')

All models have pretrained encoders, so you have to prepare your data the same way as during weights pretraining:

from segmentation_models_pytorch.encoders import get_preprocessing_fn

preprocess_input = get_preprocessing_fn('resnet18', pretrained='imagenet')


  • Training model for cars segmentation on CamVid dataset here.
  • Training SMP model with Catalyst (high-level framework for PyTorch), Ttach (TTA library for PyTorch) and Albumentations (fast image augmentation library) - here Open In Colab



class segmentation_models_pytorch.Unet(encoder_name: str = 'resnet34', encoder_depth: int = 5, encoder_weights: str = 'imagenet', decoder_use_batchnorm: bool = True, decoder_channels: List[int] = (256, 128, 64, 32, 16), decoder_attention_type: Optional[str] = None, in_channels: int = 3, classes: int = 1, activation: Union[str, callable, None] = None, aux_params: Optional[dict] = None)

Unet is a fully convolution neural network for image semantic segmentation

  • encoder_name – name of classification model (without last dense layers) used as feature extractor to build segmentation model.
  • encoder_depth (int) – number of stages used in decoder, larger depth - more features are generated. e.g. for depth=3 encoder will generate list of features with following spatial shapes [(H,W), (H/2, W/2), (H/4, W/4), (H/8, W/8)], so in general the deepest feature tensor will have spatial resolution (H/(2^depth), W/(2^depth)]
  • encoder_weights – one of None (random initialization), imagenet (pre-training on ImageNet).
  • decoder_channels – list of numbers of Conv2D layer filters in decoder blocks
  • decoder_use_batchnorm – if True, BatchNormalisation layer between Conv2D and Activation layers is used. If ‘inplace’ InplaceABN will be used, allows to decrease memory consumption. One of [True, False, ‘inplace’]
  • decoder_attention_type – attention module used in decoder of the model One of [None, scse]
  • in_channels – number of input channels for model, default is 3.
  • classes – a number of classes for output (output shape - (batch, classes, h, w)).
  • activation – activation function to apply after final convolution; One of [sigmoid, softmax, logsoftmax, identity, callable, None]
  • aux_params

    if specified model will have additional classification auxiliary output build on top of encoder, supported params:

    • classes (int): number of classes
    • pooling (str): one of ‘max’, ‘avg’. Default is ‘avg’.
    • dropout (float): dropout factor in [0, 1)
    • activation (str): activation function to apply “sigmoid”/”softmax” (could be None to return logits)


Return type:



Encoder Weights Params, M
resnet18 imagenet 11M
resnet34 imagenet 21M
resnet50 imagenet 23M
resnet101 imagenet 42M
resnet152 imagenet 58M
resnext50_32x4d imagenet 22M
resnext101_32x8d imagenetinstagram 86M
resnext101_32x16d instagram 191M
resnext101_32x32d instagram 466M
resnext101_32x48d instagram 826M
dpn68 imagenet 11M
dpn68b imagenet+5k 11M
dpn92 imagenet+5k 34M
dpn98 imagenet 58M
dpn107 imagenet+5k 84M
dpn131 imagenet 76M
vgg11 imagenet 9M
vgg11_bn imagenet 9M
vgg13 imagenet 9M
vgg13_bn imagenet 9M
vgg16 imagenet 14M
vgg16_bn imagenet 14M
vgg19 imagenet 20M
vgg19_bn imagenet 20M
senet154 imagenet 113M
se_resnet50 imagenet 26M
se_resnet101 imagenet 47M
se_resnet152 imagenet 64M
se_resnext50_32x4d imagenet 25M
se_resnext101_32x4d imagenet 46M
densenet121 imagenet 6M
densenet169 imagenet 12M
densenet201 imagenet 18M
densenet161 imagenet 26M
inceptionresnetv2 imagenetimagenet+background 54M
inceptionv4 imagenetimagenet+background 41M
efficientnet-b0 imagenet 4M
efficientnet-b1 imagenet 6M
efficientnet-b2 imagenet 7M
efficientnet-b3 imagenet 10M
efficientnet-b4 imagenet 17M
efficientnet-b5 imagenet 28M
efficientnet-b6 imagenet 40M
efficientnet-b7 imagenet 63M
mobilenet_v2 imagenet 2M
xception imagenet 22M

Models API

  • model.encoder - pretrained backbone to extract features of different spatial resolution
  • model.decoder - depends on models architecture (Unet/Linknet/PSPNet/FPN)
  • model.segmentation_head - last block to produce required number of mask channels (include also optional upsampling and activation)
  • model.classification_head - optional block which create classification head on top of encoder
  • model.forward(x) - sequentially pass x through model`s encoder, decoder and segmentation head (and classification head if specified)

Input channels parameter allow you to create models, which process tensors with arbitrary number of channels. If you use pretrained weights from imagenet - weights of first convolution will be reused for 1- or 2- channels inputs, for input channels > 4 weights of first convolution will be initialized randomly.

model = smp.FPN('resnet34', in_channels=1)
mask = model(torch.ones([1, 1, 64, 64]))

All models support aux_params parameters, which is default set to None. If aux_params = None than classification auxiliary output is not created, else model produce not only mask, but also label output with shape NC. Classification head consist of GlobalPooling->Dropout(optional)->Linear->Activation(optional) layers, which can be configured by aux_params as follows:

    pooling='avg',             # one of 'avg', 'max'
    dropout=0.5,               # dropout ratio, default is None
    activation='sigmoid',      # activation function, default is None
    classes=4,                 # define number of output labels
model = smp.Unet('resnet34', classes=4, aux_params=aux_params)
mask, label = model(x)

Depth parameter specify a number of downsampling operations in encoder, so you can make your model lighted if specify smaller depth.

model = smp.Unet('resnet34', encoder_depth=4)


PyPI version:

$ pip install segmentation-models-pytorch

Latest version from source:

$ pip install git+https://github.com/qubvel/segmentation_models.pytorch

Competitions won with the library

Segmentation Models package is widely used in the image segmentation competitions. Here you can find competitions, names of the winners and links to their solutions.


Project is distributed under MIT License


$ docker build -f docker/Dockerfile.dev -t smp:dev . && docker run --rm smp:dev pytest -p no:cacheprovider
$ docker build -f docker/Dockerfile.dev -t smp:dev . && docker run --rm smp:dev python misc/generate_table.py