Welcome to segmentation_models_pytorch’s documentation!¶

The main features of this library are:
- High-level API (just two lines to create a neural network)
- 5 model architectures for binary and multi-class segmentation (including the legendary Unet)
- 46 available encoders for each architecture
- All encoders have pre-trained weights for faster and better convergence
Quick start¶
Since the library is built on the PyTorch framework, a created segmentation model is just a PyTorch nn.Module, which can be created as easily as:
import segmentation_models_pytorch as smp
model = smp.Unet()
Depending on the task, you can change the network architecture by choosing backbones with fewer or more parameters and use pretrained weights to initialize it:
model = smp.Unet('resnet34', encoder_weights='imagenet')
Change the number of output classes in the model:
model = smp.Unet('resnet34', classes=3, activation='softmax')
All models have pretrained encoders, so you have to prepare your data the same way as during weights pretraining:
from segmentation_models_pytorch.encoders import get_preprocessing_fn
preprocess_input = get_preprocessing_fn('resnet18', pretrained='imagenet')
Examples¶
- Training a model for car segmentation on the CamVid dataset here.
- Training an SMP model with Catalyst (high-level framework for PyTorch), Ttach (TTA library for PyTorch) and Albumentations (fast image augmentation library) - here
Models¶
Architectures¶
- class segmentation_models_pytorch.Unet(encoder_name: str = 'resnet34', encoder_depth: int = 5, encoder_weights: str = 'imagenet', decoder_use_batchnorm: bool = True, decoder_channels: List[int] = (256, 128, 64, 32, 16), decoder_attention_type: Optional[str] = None, in_channels: int = 3, classes: int = 1, activation: Union[str, callable, None] = None, aux_params: Optional[dict] = None)¶

Unet is a fully convolutional neural network for image semantic segmentation.
Parameters:
- encoder_name – name of the classification model (without last dense layers) used as feature extractor to build the segmentation model.
- encoder_depth (int) – number of stages used in the decoder; larger depth - more features are generated. E.g. for depth=3 the encoder will generate a list of features with the following spatial shapes [(H, W), (H/2, W/2), (H/4, W/4), (H/8, W/8)], so in general the deepest feature tensor will have spatial resolution (H/(2^depth), W/(2^depth))
- encoder_weights – one of None (random initialization), imagenet (pre-training on ImageNet).
- decoder_channels – list of numbers of Conv2D layer filters in decoder blocks
- decoder_use_batchnorm – if True, a BatchNormalisation layer between Conv2D and Activation layers is used. If 'inplace', InplaceABN will be used, which allows to decrease memory consumption. One of [True, False, 'inplace']
- decoder_attention_type – attention module used in the decoder of the model. One of [None, scse]
- in_channels – number of input channels for the model, default is 3.
- classes – number of classes for output (output shape - (batch, classes, h, w)).
- activation – activation function to apply after the final convolution; one of [sigmoid, softmax, logsoftmax, identity, callable, None]
- aux_params – if specified, the model will have an additional classification auxiliary output built on top of the encoder; supported params:
  - classes (int): number of classes
  - pooling (str): one of 'max', 'avg'. Default is 'avg'.
  - dropout (float): dropout factor in [0, 1)
  - activation (str): activation function to apply, "sigmoid"/"softmax" (could be None to return logits)

Returns: Unet

Return type: torch.nn.Module
Encoders¶
Encoder | Weights | Params, M
---|---|---
resnet18 | imagenet | 11M
resnet34 | imagenet | 21M
resnet50 | imagenet | 23M
resnet101 | imagenet | 42M
resnet152 | imagenet | 58M
resnext50_32x4d | imagenet | 22M
resnext101_32x8d | imagenet / instagram | 86M
resnext101_32x16d | instagram | 191M
resnext101_32x32d | instagram | 466M
resnext101_32x48d | instagram | 826M
dpn68 | imagenet | 11M
dpn68b | imagenet+5k | 11M
dpn92 | imagenet+5k | 34M
dpn98 | imagenet | 58M
dpn107 | imagenet+5k | 84M
dpn131 | imagenet | 76M
vgg11 | imagenet | 9M
vgg11_bn | imagenet | 9M
vgg13 | imagenet | 9M
vgg13_bn | imagenet | 9M
vgg16 | imagenet | 14M
vgg16_bn | imagenet | 14M
vgg19 | imagenet | 20M
vgg19_bn | imagenet | 20M
senet154 | imagenet | 113M
se_resnet50 | imagenet | 26M
se_resnet101 | imagenet | 47M
se_resnet152 | imagenet | 64M
se_resnext50_32x4d | imagenet | 25M
se_resnext101_32x4d | imagenet | 46M
densenet121 | imagenet | 6M
densenet169 | imagenet | 12M
densenet201 | imagenet | 18M
densenet161 | imagenet | 26M
inceptionresnetv2 | imagenet / imagenet+background | 54M
inceptionv4 | imagenet / imagenet+background | 41M
efficientnet-b0 | imagenet | 4M
efficientnet-b1 | imagenet | 6M
efficientnet-b2 | imagenet | 7M
efficientnet-b3 | imagenet | 10M
efficientnet-b4 | imagenet | 17M
efficientnet-b5 | imagenet | 28M
efficientnet-b6 | imagenet | 40M
efficientnet-b7 | imagenet | 63M
mobilenet_v2 | imagenet | 2M
xception | imagenet | 22M
Models API¶
- model.encoder - pretrained backbone to extract features of different spatial resolutions
- model.decoder - depends on the model's architecture (Unet/Linknet/PSPNet/FPN)
- model.segmentation_head - last block to produce the required number of mask channels (also includes optional upsampling and activation)
- model.classification_head - optional block which creates a classification head on top of the encoder
- model.forward(x) - sequentially pass x through the model's encoder, decoder and segmentation head (and classification head if specified)
The input channels parameter allows you to create models which process tensors with an arbitrary number of channels. If you use pretrained weights from ImageNet, the weights of the first convolution will be reused for 1- or 2-channel inputs; for input channels > 4, the weights of the first convolution will be initialized randomly.
model = smp.FPN('resnet34', in_channels=1)
mask = model(torch.ones([1, 1, 64, 64]))
All models support the aux_params parameter, which is set to None by default. If aux_params = None, the classification auxiliary output is not created; otherwise the model produces not only a mask, but also a label output with shape (N, C). The classification head consists of GlobalPooling->Dropout(optional)->Linear->Activation(optional) layers, which can be configured by aux_params as follows:
aux_params=dict(
pooling='avg', # one of 'avg', 'max'
dropout=0.5, # dropout ratio, default is None
activation='sigmoid', # activation function, default is None
classes=4, # define number of output labels
)
model = smp.Unet('resnet34', classes=4, aux_params=aux_params)
mask, label = model(x)
The depth parameter specifies the number of downsampling operations in the encoder, so you can make your model lighter by specifying a smaller depth.
model = smp.Unet('resnet34', encoder_depth=4)
Installation¶
PyPI version:
$ pip install segmentation-models-pytorch
Latest version from source:
$ pip install git+https://github.com/qubvel/segmentation_models.pytorch
Competitions won with the library¶
Segmentation Models
package is widely used in the image segmentation
competitions.
Here
you can find competitions, names of the winners and links to their
solutions.
License¶
The project is distributed under the MIT License.
Contributing¶
Run tests:
$ docker build -f docker/Dockerfile.dev -t smp:dev . && docker run --rm smp:dev pytest -p no:cacheprovider
Generate the encoders table:
$ docker build -f docker/Dockerfile.dev -t smp:dev . && docker run --rm smp:dev python misc/generate_table.py