API

Unet

class segmentation_models_pytorch.Unet(encoder_name: str = 'resnet34', encoder_depth: int = 5, encoder_weights: str = 'imagenet', decoder_use_batchnorm: bool = True, decoder_channels: List[int] = (256, 128, 64, 32, 16), decoder_attention_type: Optional[str] = None, in_channels: int = 3, classes: int = 1, activation: Union[str, callable, None] = None, aux_params: Optional[dict] = None)

Unet is a fully convolutional neural network for image semantic segmentation.

Parameters:
  • encoder_name – name of the classification model (without the last dense layers) used as a feature extractor to build the segmentation model.
  • encoder_depth (int) – number of stages used in the encoder; a larger depth produces more features. E.g. for depth=3 the encoder generates a list of features with the following spatial shapes: [(H, W), (H/2, W/2), (H/4, W/4), (H/8, W/8)], so in general the deepest feature tensor has spatial resolution (H/2^depth, W/2^depth).
  • encoder_weights – one of None (random initialization) or imagenet (pre-trained on ImageNet).
  • decoder_channels – list of the numbers of Conv2D filters in the decoder blocks.
  • decoder_use_batchnorm – if True, a BatchNorm layer is used between the Conv2D and activation layers. If 'inplace', InplaceABN is used instead, which decreases memory consumption. One of [True, False, 'inplace'].
  • decoder_attention_type – attention module used in the decoder of the model. One of [None, 'scse'].
  • in_channels – number of input channels for the model; default is 3.
  • classes – number of classes in the output (output shape: (batch, classes, h, w)).
  • activation – activation function to apply after the final convolution. One of ['sigmoid', 'softmax', 'logsoftmax', 'identity', callable, None].
  • aux_params

    if specified, the model will have an additional classification auxiliary output built on top of the encoder. Supported params:

    • classes (int): number of classes
    • pooling (str): one of 'max', 'avg'. Default is 'avg'.
    • dropout (float): dropout factor in [0, 1)
    • activation (str): activation function to apply, "sigmoid" or "softmax" (can be None to return logits)
Returns: Unet

Return type: torch.nn.Module
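
A minimal usage sketch (the import alias smp and the parameter values are illustrative assumptions, not part of the signature above; the (mask, label) return pair applies only when aux_params is given):

    import torch
    import segmentation_models_pytorch as smp

    # Unet with an ImageNet-pretrained ResNet-34 encoder and an auxiliary
    # classification head built on top of the encoder (via aux_params).
    model = smp.Unet(
        encoder_name="resnet34",
        encoder_weights="imagenet",
        in_channels=3,
        classes=2,
        aux_params={"classes": 2, "pooling": "avg", "dropout": 0.2},
    )

    x = torch.randn(1, 3, 256, 256)  # H and W should be divisible by 2**encoder_depth
    mask, label = model(x)           # with aux_params the forward pass returns (mask, label)
    print(mask.shape, label.shape)   # torch.Size([1, 2, 256, 256]) torch.Size([1, 2])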

Linknet

class segmentation_models_pytorch.Linknet(encoder_name: str = 'resnet34', encoder_depth: int = 5, encoder_weights: Optional[str] = 'imagenet', decoder_use_batchnorm: bool = True, in_channels: int = 3, classes: int = 1, activation: Union[str, callable, None] = None, aux_params: Optional[dict] = None)

Linknet is a fully convolutional neural network for fast image semantic segmentation.

Note

By default, this implementation has 4 skip connections (the original has 3).

Parameters:
  • encoder_name – name of the classification model (without the last dense layers) used as a feature extractor to build the segmentation model.
  • encoder_depth (int) – number of stages used in the encoder; a larger depth produces more features. E.g. for depth=3 the encoder generates a list of features with the following spatial shapes: [(H, W), (H/2, W/2), (H/4, W/4), (H/8, W/8)], so in general the deepest feature has spatial resolution (H/2^depth, W/2^depth).
  • encoder_weights – one of None (random initialization) or imagenet (pre-trained on ImageNet).
  • decoder_use_batchnorm – if True, a BatchNorm layer is used between the Conv2D and activation layers. If 'inplace', InplaceABN is used instead, which decreases memory consumption. One of [True, False, 'inplace'].
  • in_channels – number of input channels for the model; default is 3.
  • classes – number of classes in the output (output shape: (batch, classes, h, w)).
  • activation – activation function used in the .predict(x) method for inference. One of ['sigmoid', 'softmax', callable, None].
  • aux_params

    if specified, the model will have an additional classification auxiliary output built on top of the encoder. Supported params:

    • classes (int): number of classes
    • pooling (str): one of 'max', 'avg'. Default is 'avg'.
    • dropout (float): dropout factor in [0, 1)
    • activation (str): activation function to apply, "sigmoid" or "softmax" (can be None to return logits)
Returns: Linknet

Return type: torch.nn.Module
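
A minimal sketch of binary segmentation with Linknet (parameter values are illustrative; per the docs above, the sigmoid is applied during inference via .predict):

    import torch
    import segmentation_models_pytorch as smp

    # One output channel for binary segmentation; "sigmoid" maps the mask
    # values into [0, 1] at inference time.
    model = smp.Linknet(
        encoder_name="resnet34",
        encoder_weights="imagenet",
        classes=1,
        activation="sigmoid",
    )

    mask = model.predict(torch.randn(1, 3, 224, 224))  # eval mode, no gradients
    print(mask.shape)  # torch.Size([1, 1, 224, 224])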

FPN

class segmentation_models_pytorch.FPN(encoder_name: str = 'resnet34', encoder_depth: int = 5, encoder_weights: Optional[str] = 'imagenet', decoder_pyramid_channels: int = 256, decoder_segmentation_channels: int = 128, decoder_merge_policy: str = 'add', decoder_dropout: float = 0.2, in_channels: int = 3, classes: int = 1, activation: Optional[str] = None, upsampling: int = 4, aux_params: Optional[dict] = None)

FPN is a fully convolutional neural network for image semantic segmentation.

Parameters:
  • encoder_name – name of the classification model (without the last dense layers) used as a feature extractor to build the segmentation model.
  • encoder_depth – number of stages used in the encoder; a larger depth produces more features. E.g. for depth=3 the encoder generates a list of features with the following spatial shapes: [(H, W), (H/2, W/2), (H/4, W/4), (H/8, W/8)], so in general the deepest feature has spatial resolution (H/2^depth, W/2^depth).
  • encoder_weights – one of None (random initialization) or imagenet (pre-trained on ImageNet).
  • decoder_pyramid_channels – number of convolution filters in the Feature Pyramid of FPN.
  • decoder_segmentation_channels – number of convolution filters in the segmentation head of FPN.
  • decoder_merge_policy – determines how to merge outputs inside FPN. One of ['add', 'cat'].
  • decoder_dropout – spatial dropout rate in the range (0, 1).
  • in_channels – number of input channels for the model; default is 3.
  • classes – number of classes in the output (output shape: (batch, classes, h, w)).
  • activation (str, callable) – activation function used in the .predict(x) method for inference. One of ['sigmoid', 'softmax2d', callable, None].
  • upsampling – optional; final upsampling factor (default is 4, which preserves the input -> output spatial shape).
  • aux_params

    if specified, the model will have an additional classification auxiliary output built on top of the encoder. Supported params:

    • classes (int): number of classes
    • pooling (str): one of 'max', 'avg'. Default is 'avg'.
    • dropout (float): dropout factor in [0, 1)
    • activation (str): activation function to apply, "sigmoid" or "softmax" (can be None to return logits)
Returns: FPN

Return type: torch.nn.Module
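
A minimal sketch exercising the FPN-specific decoder options (the values shown are illustrative assumptions, not recommendations):

    import torch
    import segmentation_models_pytorch as smp

    # "cat" concatenates pyramid features instead of summing them ("add",
    # the default); decoder_dropout applies spatial dropout in the decoder.
    model = smp.FPN(
        encoder_name="resnet34",
        decoder_pyramid_channels=256,
        decoder_segmentation_channels=128,
        decoder_merge_policy="cat",
        decoder_dropout=0.2,
        classes=3,
    )

    mask = model(torch.randn(2, 3, 256, 256))
    print(mask.shape)  # torch.Size([2, 3, 256, 256]) - upsampling=4 restores the input size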

PSPNet

class segmentation_models_pytorch.PSPNet(encoder_name: str = 'resnet34', encoder_weights: Optional[str] = 'imagenet', encoder_depth: int = 3, psp_out_channels: int = 512, psp_use_batchnorm: bool = True, psp_dropout: float = 0.2, in_channels: int = 3, classes: int = 1, activation: Union[str, callable, None] = None, upsampling: int = 8, aux_params: Optional[dict] = None)

PSPNet is a fully convolutional neural network for image semantic segmentation.

Parameters:
  • encoder_name – name of the classification model used as a feature extractor to build the segmentation model.
  • encoder_depth – number of stages used in the encoder; a larger depth produces more features. E.g. for depth=3 the encoder generates a list of features with the following spatial shapes: [(H, W), (H/2, W/2), (H/4, W/4), (H/8, W/8)], so in general the deepest feature has spatial resolution (H/2^depth, W/2^depth).
  • encoder_weights – one of None (random initialization) or imagenet (pre-trained on ImageNet).
  • psp_out_channels – number of filters in the PSP block.
  • psp_use_batchnorm – if True, a BatchNorm layer is used between the Conv2D and activation layers. If 'inplace', InplaceABN is used instead, which decreases memory consumption. One of [True, False, 'inplace'].
  • psp_dropout – spatial dropout rate between 0 and 1.
  • in_channels – number of input channels for the model; default is 3.
  • classes – number of classes in the output (output shape: (batch, classes, h, w)).
  • activation – activation function used in the .predict(x) method for inference. One of ['sigmoid', 'softmax', callable, None].
  • upsampling – optional; final upsampling factor (default is 8 for depth=3, which preserves the input -> output spatial shape).
  • aux_params

    if specified, the model will have an additional classification auxiliary output built on top of the encoder. Supported params:

    • classes (int): number of classes
    • pooling (str): one of 'max', 'avg'. Default is 'avg'.
    • dropout (float): dropout factor in [0, 1)
    • activation (str): activation function to apply, "sigmoid" or "softmax" (can be None to return logits)
Returns: PSPNet

Return type: torch.nn.Module
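
A minimal sketch showing the relationship between encoder_depth and upsampling (the defaults, 3 and 8, keep the output mask at the input resolution; the other values are illustrative assumptions):

    import torch
    import segmentation_models_pytorch as smp

    # With encoder_depth=3 the deepest feature map is (H/8, W/8), so
    # upsampling=8 restores the input resolution in the output mask.
    model = smp.PSPNet(
        encoder_name="resnet34",
        encoder_depth=3,
        psp_out_channels=512,
        psp_dropout=0.2,
        upsampling=8,
        classes=4,
    )

    mask = model(torch.randn(1, 3, 240, 240))
    print(mask.shape)  # torch.Size([1, 4, 240, 240])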

PAN

class segmentation_models_pytorch.pan.model.PAN(encoder_name: str = 'resnet34', encoder_weights: str = 'imagenet', encoder_dilation: bool = True, decoder_channels: int = 32, in_channels: int = 3, classes: int = 1, activation: Union[str, callable, None] = None, upsampling: int = 4, aux_params: Optional[dict] = None)

Implementation of PAN (Pyramid Attention Network). Currently works with input tensors of shape >= [B x C x 128 x 128] for PyTorch <= 1.1.0 and >= [B x C x 256 x 256] for PyTorch == 1.3.1.

Parameters:
  • encoder_name – name of the classification model (without the last dense layers) used as a feature extractor to build the segmentation model.
  • encoder_weights – one of None (random initialization) or imagenet (pre-trained on ImageNet).
  • encoder_dilation – flag to use dilation in the last encoder layer. Does not work with *ception*, vgg*, or densenet* backbones; default is True.
  • decoder_channels – number of Conv2D filters in the decoder blocks.
  • in_channels – number of input channels for the model; default is 3.
  • classes – number of classes in the output (output shape: (batch, classes, h, w)).
  • activation – activation function to apply after the final convolution. One of ['sigmoid', 'softmax', 'logsoftmax', 'identity', callable, None].
  • upsampling – optional; final upsampling factor (default is 4, which preserves the input -> output spatial shape).
  • aux_params

    if specified, the model will have an additional classification auxiliary output built on top of the encoder. Supported params:

    • classes (int): number of classes
    • pooling (str): one of 'max', 'avg'. Default is 'avg'.
    • dropout (float): dropout factor in [0, 1)
    • activation (str): activation function to apply, "sigmoid" or "softmax" (can be None to return logits)
Returns: PAN

Return type: torch.nn.Module
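
A minimal sketch (it is assumed here that PAN is also exposed at the package top level as smp.PAN; mind the minimum input size noted in the class description):

    import torch
    import segmentation_models_pytorch as smp

    # 256 x 256 satisfies the stricter of the two minimum input sizes
    # listed in the class description above.
    model = smp.PAN(
        encoder_name="resnet34",
        decoder_channels=32,
        classes=2,
    )

    mask = model(torch.randn(1, 3, 256, 256))
    print(mask.shape)  # torch.Size([1, 2, 256, 256])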