API

Unet

class segmentation_models_pytorch.Unet(encoder_name: str = 'resnet34', encoder_depth: int = 5, encoder_weights: str = 'imagenet', decoder_use_batchnorm: bool = True, decoder_channels: List[int] = (256, 128, 64, 32, 16), decoder_attention_type: Optional[str] = None, in_channels: int = 3, classes: int = 1, activation: Union[str, callable, None] = None, aux_params: Optional[dict] = None)

Unet is a fully convolutional neural network for image semantic segmentation.

Parameters:
  • encoder_name – name of the classification model (without the last dense layers) used as a feature extractor to build the segmentation model.
  • encoder_depth (int) – number of stages used in the encoder; a larger depth produces more features. E.g. for depth=3 the encoder generates a list of features with the following spatial shapes: [(H, W), (H/2, W/2), (H/4, W/4), (H/8, W/8)], so in general the deepest feature tensor has spatial resolution (H/2^depth, W/2^depth).
  • encoder_weights – one of None (random initialization) or imagenet (pre-trained on ImageNet).
  • decoder_channels – list of the numbers of Conv2D filters in the decoder blocks.
  • decoder_use_batchnorm – if True, a BatchNorm layer is used between the Conv2D and activation layers. If 'inplace', InplaceABN is used instead, which decreases memory consumption. One of [True, False, 'inplace'].
  • decoder_attention_type – attention module used in the decoder of the model. One of [None, 'scse'].
  • in_channels – number of input channels for the model; default is 3.
  • classes – number of classes in the output (output shape: (batch, classes, h, w)).
  • activation – activation function to apply after the final convolution. One of ['sigmoid', 'softmax', 'logsoftmax', 'identity', callable, None].
  • aux_params

    if specified, the model will have an additional classification auxiliary output built on top of the encoder. Supported params:

    • classes (int): number of classes
    • pooling (str): one of 'max', 'avg'. Default is 'avg'.
    • dropout (float): dropout factor in [0, 1)
    • activation (str): activation function to apply, "sigmoid" or "softmax" (can be None to return logits)
Returns: Unet

Return type: torch.nn.Module
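
A minimal usage sketch (the import alias smp and the parameter values are illustrative assumptions, not part of the signature above; the (mask, label) return pair applies only when aux_params is given):

    import torch
    import segmentation_models_pytorch as smp

    # Unet with an ImageNet-pretrained ResNet-34 encoder and an auxiliary
    # classification head built on top of the encoder (via aux_params).
    model = smp.Unet(
        encoder_name="resnet34",
        encoder_weights="imagenet",
        in_channels=3,
        classes=2,
        aux_params={"classes": 2, "pooling": "avg", "dropout": 0.2},
    )

    x = torch.randn(1, 3, 256, 256)  # H and W should be divisible by 2**encoder_depth
    mask, label = model(x)           # with aux_params the forward pass returns (mask, label)
    print(mask.shape, label.shape)   # torch.Size([1, 2, 256, 256]) torch.Size([1, 2])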

Linknet

class segmentation_models_pytorch.Linknet(encoder_name: str = 'resnet34', encoder_depth: int = 5, encoder_weights: Optional[str] = 'imagenet', decoder_use_batchnorm: bool = True, in_channels: int = 3, classes: int = 1, activation: Union[str, callable, None] = None, aux_params: Optional[dict] = None)

Linknet is a fully convolutional neural network for fast image semantic segmentation.

Note

By default, this implementation has 4 skip connections (the original has 3).

Parameters:
  • encoder_name – name of the classification model (without the last dense layers) used as a feature extractor to build the segmentation model.
  • encoder_depth (int) – number of stages used in the encoder; a larger depth produces more features. E.g. for depth=3 the encoder generates a list of features with the following spatial shapes: [(H, W), (H/2, W/2), (H/4, W/4), (H/8, W/8)], so in general the deepest feature has spatial resolution (H/2^depth, W/2^depth).
  • encoder_weights – one of None (random initialization) or imagenet (pre-trained on ImageNet).
  • decoder_use_batchnorm – if True, a BatchNorm layer is used between the Conv2D and activation layers. If 'inplace', InplaceABN is used instead, which decreases memory consumption. One of [True, False, 'inplace'].
  • in_channels – number of input channels for the model; default is 3.
  • classes – number of classes in the output (output shape: (batch, classes, h, w)).
  • activation – activation function used in the .predict(x) method for inference. One of ['sigmoid', 'softmax', callable, None].
  • aux_params

    if specified, the model will have an additional classification auxiliary output built on top of the encoder. Supported params:

    • classes (int): number of classes
    • pooling (str): one of 'max', 'avg'. Default is 'avg'.
    • dropout (float): dropout factor in [0, 1)
    • activation (str): activation function to apply, "sigmoid" or "softmax" (can be None to return logits)
Returns: Linknet

Return type: torch.nn.Module
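
A minimal sketch of binary segmentation with Linknet (parameter values are illustrative; per the docs above, the sigmoid is applied during inference via .predict):

    import torch
    import segmentation_models_pytorch as smp

    # One output channel for binary segmentation; "sigmoid" maps the mask
    # values into [0, 1] at inference time.
    model = smp.Linknet(
        encoder_name="resnet34",
        encoder_weights="imagenet",
        classes=1,
        activation="sigmoid",
    )

    mask = model.predict(torch.randn(1, 3, 224, 224))  # eval mode, no gradients
    print(mask.shape)  # torch.Size([1, 1, 224, 224])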

FPN

class segmentation_models_pytorch.FPN(encoder_name: str = 'resnet34', encoder_depth: int = 5, encoder_weights: Optional[str] = 'imagenet', decoder_pyramid_channels: int = 256, decoder_segmentation_channels: int = 128, decoder_merge_policy: str = 'add', decoder_dropout: float = 0.2, in_channels: int = 3, classes: int = 1, activation: Optional[str] = None, upsampling: int = 4, aux_params: Optional[dict] = None)

FPN is a fully convolutional neural network for image semantic segmentation.

Parameters:
  • encoder_name – name of the classification model (without the last dense layers) used as a feature extractor to build the segmentation model.
  • encoder_depth – number of stages used in the encoder; a larger depth produces more features. E.g. for depth=3 the encoder generates a list of features with the following spatial shapes: [(H, W), (H/2, W/2), (H/4, W/4), (H/8, W/8)], so in general the deepest feature has spatial resolution (H/2^depth, W/2^depth).
  • encoder_weights – one of None (random initialization) or imagenet (pre-trained on ImageNet).
  • decoder_pyramid_channels – number of convolution filters in the Feature Pyramid of FPN.
  • decoder_segmentation_channels – number of convolution filters in the segmentation head of FPN.
  • decoder_merge_policy – determines how to merge outputs inside FPN. One of ['add', 'cat'].
  • decoder_dropout – spatial dropout rate in the range (0, 1).
  • in_channels – number of input channels for the model; default is 3.
  • classes – number of classes in the output (output shape: (batch, classes, h, w)).
  • activation (str, callable) – activation function used in the .predict(x) method for inference. One of ['sigmoid', 'softmax2d', callable, None].
  • upsampling – optional; final upsampling factor (default is 4, which preserves the input -> output spatial shape).
  • aux_params

    if specified, the model will have an additional classification auxiliary output built on top of the encoder. Supported params:

    • classes (int): number of classes
    • pooling (str): one of 'max', 'avg'. Default is 'avg'.
    • dropout (float): dropout factor in [0, 1)
    • activation (str): activation function to apply, "sigmoid" or "softmax" (can be None to return logits)
Returns: FPN

Return type: torch.nn.Module
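
A minimal sketch exercising the FPN-specific decoder options (the values shown are illustrative assumptions, not recommendations):

    import torch
    import segmentation_models_pytorch as smp

    # "cat" concatenates pyramid features instead of summing them ("add",
    # the default); decoder_dropout applies spatial dropout in the decoder.
    model = smp.FPN(
        encoder_name="resnet34",
        decoder_pyramid_channels=256,
        decoder_segmentation_channels=128,
        decoder_merge_policy="cat",
        decoder_dropout=0.2,
        classes=3,
    )

    mask = model(torch.randn(2, 3, 256, 256))
    print(mask.shape)  # torch.Size([2, 3, 256, 256]) - upsampling=4 restores the input size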

PSPNet

class segmentation_models_pytorch.PSPNet(encoder_name: str = 'resnet34', encoder_weights: Optional[str] = 'imagenet', encoder_depth: int = 3, psp_out_channels: int = 512, psp_use_batchnorm: bool = True, psp_dropout: float = 0.2, in_channels: int = 3, classes: int = 1, activation: Union[str, callable, None] = None, upsampling: int = 8, aux_params: Optional[dict] = None)

PSPNet is a fully convolutional neural network for image semantic segmentation.

Parameters:
  • encoder_name – name of the classification model used as a feature extractor to build the segmentation model.
  • encoder_depth – number of stages used in the encoder; a larger depth produces more features. E.g. for depth=3 the encoder generates a list of features with the following spatial shapes: [(H, W), (H/2, W/2), (H/4, W/4), (H/8, W/8)], so in general the deepest feature has spatial resolution (H/2^depth, W/2^depth).
  • encoder_weights – one of None (random initialization) or imagenet (pre-trained on ImageNet).
  • psp_out_channels – number of filters in the PSP block.
  • psp_use_batchnorm – if True, a BatchNorm layer is used between the Conv2D and activation layers. If 'inplace', InplaceABN is used instead, which decreases memory consumption. One of [True, False, 'inplace'].
  • psp_dropout – spatial dropout rate between 0 and 1.
  • in_channels – number of input channels for the model; default is 3.
  • classes – number of classes in the output (output shape: (batch, classes, h, w)).
  • activation – activation function used in the .predict(x) method for inference. One of ['sigmoid', 'softmax', callable, None].
  • upsampling – optional; final upsampling factor (default is 8 for depth=3, which preserves the input -> output spatial shape).
  • aux_params

    if specified, the model will have an additional classification auxiliary output built on top of the encoder. Supported params:

    • classes (int): number of classes
    • pooling (str): one of 'max', 'avg'. Default is 'avg'.
    • dropout (float): dropout factor in [0, 1)
    • activation (str): activation function to apply, "sigmoid" or "softmax" (can be None to return logits)
Returns: PSPNet

Return type: torch.nn.Module
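
A minimal sketch showing the relationship between encoder_depth and upsampling (the defaults, 3 and 8, keep the output mask at the input resolution; the other values are illustrative assumptions):

    import torch
    import segmentation_models_pytorch as smp

    # With encoder_depth=3 the deepest feature map is (H/8, W/8), so
    # upsampling=8 restores the input resolution in the output mask.
    model = smp.PSPNet(
        encoder_name="resnet34",
        encoder_depth=3,
        psp_out_channels=512,
        psp_dropout=0.2,
        upsampling=8,
        classes=4,
    )

    mask = model(torch.randn(1, 3, 240, 240))
    print(mask.shape)  # torch.Size([1, 4, 240, 240])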

PAN

class segmentation_models_pytorch.pan.model.PAN(encoder_name: str = 'resnet34', encoder_weights: str = 'imagenet', encoder_dilation: bool = True, decoder_channels: int = 32, in_channels: int = 3, classes: int = 1, activation: Union[str, callable, None] = None, upsampling: int = 4, aux_params: Optional[dict] = None)

Implementation of PAN (Pyramid Attention Network). Currently works with input tensors of shape >= [B x C x 128 x 128] for PyTorch <= 1.1.0 and >= [B x C x 256 x 256] for PyTorch == 1.3.1.

Parameters:
  • encoder_name – name of the classification model (without the last dense layers) used as a feature extractor to build the segmentation model.
  • encoder_weights – one of None (random initialization) or imagenet (pre-trained on ImageNet).
  • encoder_dilation – flag to use dilation in the last encoder layer. Does not work with *ception*, vgg*, or densenet* backbones; default is True.
  • decoder_channels – number of Conv2D filters in the decoder blocks.
  • in_channels – number of input channels for the model; default is 3.
  • classes – number of classes in the output (output shape: (batch, classes, h, w)).
  • activation – activation function to apply after the final convolution. One of ['sigmoid', 'softmax', 'logsoftmax', 'identity', callable, None].
  • upsampling – optional; final upsampling factor (default is 4, which preserves the input -> output spatial shape).
  • aux_params

    if specified, the model will have an additional classification auxiliary output built on top of the encoder. Supported params:

    • classes (int): number of classes
    • pooling (str): one of 'max', 'avg'. Default is 'avg'.
    • dropout (float): dropout factor in [0, 1)
    • activation (str): activation function to apply, "sigmoid" or "softmax" (can be None to return logits)
Returns: PAN

Return type: torch.nn.Module
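
A minimal sketch (it is assumed here that PAN is also exposed at the package top level as smp.PAN; mind the minimum input size noted in the class description):

    import torch
    import segmentation_models_pytorch as smp

    # 256 x 256 satisfies the stricter of the two minimum input sizes
    # listed in the class description above.
    model = smp.PAN(
        encoder_name="resnet34",
        decoder_channels=32,
        classes=2,
    )

    mask = model(torch.randn(1, 3, 256, 256))
    print(mask.shape)  # torch.Size([1, 2, 256, 256])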