API¶
Unet¶
- class segmentation_models_pytorch.Unet(encoder_name: str = 'resnet34', encoder_depth: int = 5, encoder_weights: str = 'imagenet', decoder_use_batchnorm: bool = True, decoder_channels: List[int] = (256, 128, 64, 32, 16), decoder_attention_type: Optional[str] = None, in_channels: int = 3, classes: int = 1, activation: Union[str, callable, None] = None, aux_params: Optional[dict] = None)¶

  Unet is a fully convolutional neural network for image semantic segmentation.

  Parameters:
  - encoder_name – name of the classification model (without its last dense layers) used as the feature extractor for the segmentation model.
  - encoder_depth (int) – number of stages used in the encoder; a larger depth generates more features. E.g. for depth=3 the encoder produces a list of feature maps with spatial shapes [(H, W), (H/2, W/2), (H/4, W/4), (H/8, W/8)], so in general the deepest feature tensor has spatial resolution (H/2^depth, W/2^depth).
  - encoder_weights – one of None (random initialization) or 'imagenet' (pre-training on ImageNet).
  - decoder_channels – list of the numbers of Conv2D filters in the decoder blocks.
  - decoder_use_batchnorm – if True, a BatchNormalisation layer is used between the Conv2D and Activation layers. If 'inplace', InplaceABN is used instead, which reduces memory consumption. One of [True, False, 'inplace'].
  - decoder_attention_type – attention module used in the decoder of the model. One of [None, 'scse'].
  - in_channels – number of input channels for the model; default is 3.
  - classes – number of classes for the output (output shape: (batch, classes, h, w)).
  - activation – activation function to apply after the final convolution. One of ['sigmoid', 'softmax', 'logsoftmax', 'identity', callable, None].
  - aux_params – if specified, the model has an additional classification auxiliary output built on top of the encoder. Supported params:
    - classes (int): number of classes
    - pooling (str): one of 'max', 'avg'; default is 'avg'
    - dropout (float): dropout factor in [0, 1)
    - activation (str): activation function to apply, 'sigmoid' or 'softmax' (may be None to return logits)

  Returns: Unet

  Return type: torch.nn.Module
Linknet¶
- class segmentation_models_pytorch.Linknet(encoder_name: str = 'resnet34', encoder_depth: int = 5, encoder_weights: Optional[str] = 'imagenet', decoder_use_batchnorm: bool = True, in_channels: int = 3, classes: int = 1, activation: Union[str, callable, None] = None, aux_params: Optional[dict] = None)¶

  Linknet is a fully convolutional neural network for fast image semantic segmentation.

  Note: this implementation has 4 skip connections by default (the original has 3).

  Parameters:
  - encoder_name – name of the classification model (without its last dense layers) used as the feature extractor for the segmentation model.
  - encoder_depth (int) – number of stages used in the encoder; a larger depth generates more features. E.g. for depth=3 the encoder produces a list of feature maps with spatial shapes [(H, W), (H/2, W/2), (H/4, W/4), (H/8, W/8)], so in general the deepest feature tensor has spatial resolution (H/2^depth, W/2^depth).
  - encoder_weights – one of None (random initialization) or 'imagenet' (pre-training on ImageNet).
  - decoder_use_batchnorm – if True, a BatchNormalisation layer is used between the Conv2D and Activation layers. If 'inplace', InplaceABN is used instead, which reduces memory consumption. One of [True, False, 'inplace'].
  - in_channels – number of input channels for the model; default is 3.
  - classes – number of classes for the output (output shape: (batch, classes, h, w)).
  - activation – activation function used in the .predict(x) method for inference. One of ['sigmoid', 'softmax', callable, None].
  - aux_params – if specified, the model has an additional classification auxiliary output built on top of the encoder. Supported params:
    - classes (int): number of classes
    - pooling (str): one of 'max', 'avg'; default is 'avg'
    - dropout (float): dropout factor in [0, 1)
    - activation (str): activation function to apply, 'sigmoid' or 'softmax' (may be None to return logits)

  Returns: Linknet

  Return type: torch.nn.Module
FPN¶
- class segmentation_models_pytorch.FPN(encoder_name: str = 'resnet34', encoder_depth: int = 5, encoder_weights: Optional[str] = 'imagenet', decoder_pyramid_channels: int = 256, decoder_segmentation_channels: int = 128, decoder_merge_policy: str = 'add', decoder_dropout: float = 0.2, in_channels: int = 3, classes: int = 1, activation: Optional[str] = None, upsampling: int = 4, aux_params: Optional[dict] = None)¶

  FPN is a fully convolutional neural network for image semantic segmentation.

  Parameters:
  - encoder_name – name of the classification model (without its last dense layers) used as the feature extractor for the segmentation model.
  - encoder_depth – number of stages used in the encoder; a larger depth generates more features. E.g. for depth=3 the encoder produces a list of feature maps with spatial shapes [(H, W), (H/2, W/2), (H/4, W/4), (H/8, W/8)], so in general the deepest feature tensor has spatial resolution (H/2^depth, W/2^depth).
  - encoder_weights – one of None (random initialization) or 'imagenet' (pre-training on ImageNet).
  - decoder_pyramid_channels – number of convolution filters in the Feature Pyramid of the FPN.
  - decoder_segmentation_channels – number of convolution filters in the segmentation head of the FPN.
  - decoder_merge_policy – determines how outputs are merged inside the FPN. One of ['add', 'cat'].
  - decoder_dropout – spatial dropout rate in the range (0, 1).
  - in_channels – number of input channels for the model; default is 3.
  - classes – number of classes for the output (output shape: (batch, classes, h, w)).
  - activation (str, callable) – activation function used in the .predict(x) method for inference. One of ['sigmoid', 'softmax2d', callable, None].
  - upsampling – optional final upsampling factor (default is 4, which preserves input -> output spatial shape identity).
  - aux_params – if specified, the model has an additional classification auxiliary output built on top of the encoder. Supported params:
    - classes (int): number of classes
    - pooling (str): one of 'max', 'avg'; default is 'avg'
    - dropout (float): dropout factor in [0, 1)
    - activation (str): activation function to apply, 'sigmoid' or 'softmax' (may be None to return logits)

  Returns: FPN

  Return type: torch.nn.Module
PSPNet¶
- class segmentation_models_pytorch.PSPNet(encoder_name: str = 'resnet34', encoder_weights: Optional[str] = 'imagenet', encoder_depth: int = 3, psp_out_channels: int = 512, psp_use_batchnorm: bool = True, psp_dropout: float = 0.2, in_channels: int = 3, classes: int = 1, activation: Union[str, callable, None] = None, upsampling: int = 8, aux_params: Optional[dict] = None)¶

  PSPNet is a fully convolutional neural network for image semantic segmentation.

  Parameters:
  - encoder_name – name of the classification model used as the feature extractor for the segmentation model.
  - encoder_depth – number of stages used in the encoder; a larger depth generates more features. E.g. for depth=3 the encoder produces a list of feature maps with spatial shapes [(H, W), (H/2, W/2), (H/4, W/4), (H/8, W/8)], so in general the deepest feature tensor has spatial resolution (H/2^depth, W/2^depth).
  - encoder_weights – one of None (random initialization) or 'imagenet' (pre-training on ImageNet).
  - psp_out_channels – number of filters in the PSP block.
  - psp_use_batchnorm – if True, a BatchNormalisation layer is used between the Conv2D and Activation layers. If 'inplace', InplaceABN is used instead, which reduces memory consumption. One of [True, False, 'inplace'].
  - psp_dropout – spatial dropout rate between 0 and 1.
  - in_channels – number of input channels for the model; default is 3.
  - classes – number of classes for the output (output shape: (batch, classes, h, w)).
  - activation – activation function used in the .predict(x) method for inference. One of ['sigmoid', 'softmax', callable, None].
  - upsampling – optional final upsampling factor (default is 8, which for depth=3 preserves input -> output spatial shape identity).
  - aux_params – if specified, the model has an additional classification auxiliary output built on top of the encoder. Supported params:
    - classes (int): number of classes
    - pooling (str): one of 'max', 'avg'; default is 'avg'
    - dropout (float): dropout factor in [0, 1)
    - activation (str): activation function to apply, 'sigmoid' or 'softmax' (may be None to return logits)

  Returns: PSPNet

  Return type: torch.nn.Module
PAN¶
- class segmentation_models_pytorch.pan.model.PAN(encoder_name: str = 'resnet34', encoder_weights: str = 'imagenet', encoder_dilation: bool = True, decoder_channels: int = 32, in_channels: int = 3, classes: int = 1, activation: Union[str, callable, None] = None, upsampling: int = 4, aux_params: Optional[dict] = None)¶

  Implementation of PAN (Pyramid Attention Network). Currently works with input tensors of shape >= [B x C x 128 x 128] for pytorch <= 1.1.0 and >= [B x C x 256 x 256] for pytorch == 1.3.1.

  Parameters:
  - encoder_name – name of the classification model (without its last dense layers) used as the feature extractor for the segmentation model.
  - encoder_weights – one of None (random initialization) or 'imagenet' (pre-training on ImageNet).
  - encoder_dilation – flag to use dilation in the last encoder layer. Doesn't work with ['*ception*', 'vgg*', 'densenet*'] backbones; default is True.
  - decoder_channels – number of Conv2D filters in the decoder blocks.
  - in_channels – number of input channels for the model; default is 3.
  - classes – number of classes for the output (output shape: (batch, classes, h, w)).
  - activation – activation function to apply after the final convolution. One of ['sigmoid', 'softmax', 'logsoftmax', 'identity', callable, None].
  - upsampling – optional final upsampling factor (default is 4, which preserves input -> output spatial shape identity).
  - aux_params – if specified, the model has an additional classification auxiliary output built on top of the encoder. Supported params:
    - classes (int): number of classes
    - pooling (str): one of 'max', 'avg'; default is 'avg'
    - dropout (float): dropout factor in [0, 1)
    - activation (str): activation function to apply, 'sigmoid' or 'softmax' (may be None to return logits)

  Returns: PAN

  Return type: torch.nn.Module