Image Segmentation
Image segmentation refers to the process of dividing an image into meaningful and distinct regions or objects at the pixel level. It involves assigning a label or class to each pixel in an image to identify different objects, boundaries, or areas of interest. The goal of image segmentation is to separate and distinguish different objects or regions within an image, enabling a computer or an algorithm to understand and analyze the image at a more detailed level.
Segmentation benefits downstream tasks such as object recognition and tracking, scene understanding, medical image analysis, and robotics, to name a few.
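To make the pixel-level idea concrete, a segmentation mask is just an array with one class index per pixel. A minimal sketch with made-up class ids and labels (the real FloodNet classes are covered below):

```python
import numpy as np

# A hypothetical 4x4 segmentation mask: each entry is the class
# index assigned to that pixel (0 = background, 1 = water, 2 = building).
mask = np.array([
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [2, 2, 0, 1],
    [2, 2, 0, 0],
])

labels = {0: 'background', 1: 'water', 2: 'building'}

# Count pixels per class to see how much of the image each label occupies
counts = {labels[c]: int((mask == c).sum()) for c in labels}
print(counts)  # {'background': 6, 'water': 5, 'building': 5}
```

A segmentation model's job is to predict exactly such an array of class indices from the raw image.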
U-Net
The U-Net is a convolutional neural net (CNN) that was originally developed in 2015 at the Computer Science Department of the University of Freiburg for the task of biomedical image segmentation.
U-Net introduced an encoder-decoder architecture with skip connections. The contracting path captured context and abstract features, while the expansive path recovered spatial resolution using skip connections. U-Net's design made it highly effective for biomedical image segmentation and subsequently gained popularity in other domains.
from fastai.vision.all import *
from fastai.data.all import *
FloodNet
The description below is from the FloodNet GitHub repository:
FloodNet provides high-resolution UAV (Unmanned Aerial Vehicle) imageries with detailed semantic annotation regarding the damages. To advance the damage assessment process for post-disaster scenarios, we present a unique challenge considering classification, semantic segmentation, visual question answering highlighting the UAS imagery-based FloodNet dataset.
Track 1
In this track, participants are required to complete two semi-supervised tasks: image classification and semantic segmentation.
- Semi-Supervised Classification: Classification for the FloodNet dataset requires classifying the images into 'Flooded' and 'Non-Flooded' classes. Only a few of the training images have their labels available, while most of the training images are unlabeled.
- Semi-Supervised Semantic Segmentation: The semantic segmentation labels include: 1) Background, 2) Building Flooded, 3) Building Non-Flooded, 4) Road Flooded, 5) Road Non-Flooded, 6) Water, 7) Tree, 8) Vehicle, 9) Pool, 10) Grass. Only a small portion of the training images have their corresponding masks available.
Links
import numpy as np
import pandas as pd
from pathlib import Path
Segmentation datasets usually consist of image files, mask files, and codes, which are the segmentation pixel labels.
path = Path.cwd()/'floodnet_data'
path
# get labels / codes
col_map = {'Class Index.1': 'class_id', 'Class Name.1': 'label'}
df_codes = pd.read_csv(
    path/'class_mapping.csv',
    header=2
).iloc[:, -2:].rename(columns=col_map)
codes = df_codes.label.values
df_codes.head()
# Get all the files in path with optional extensions
# mask files are PNG so we can exclude these by specifying the extensions
fnames = get_files(path/"train", extensions='.jpg')
fnames[0]
def label_func(fn):
    # map an image file to its mask: <stem>.jpg -> mask/<stem>_lab.png
    return path/'train'/fn.parts[-3]/'mask'/f'{fn.stem}_lab.png'
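To sanity-check the image-to-mask mapping, the same path logic can be applied to a hypothetical filename. The directory and file names here are made up for illustration; only the `mask` sibling layout and the `_lab.png` suffix come from `label_func` above:

```python
from pathlib import Path

def to_mask_path(root, fn):
    # Same logic as label_func: .../train/<subdir>/<imgdir>/<name>.jpg
    # maps to .../train/<subdir>/mask/<name>_lab.png
    return root/'train'/fn.parts[-3]/'mask'/f'{fn.stem}_lab.png'

root = Path('floodnet_data')  # hypothetical dataset root
img = Path('floodnet_data/train/labeled/image/6467.jpg')  # hypothetical image file
print(to_mask_path(root, img))
# floodnet_data/train/labeled/mask/6467_lab.png
```

`fn.parts[-3]` picks out the split directory two levels above the file, so images and masks live in sibling folders under the same split.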
dls = SegmentationDataLoaders.from_label_func(
path,
bs=8,
fnames=fnames,
label_func=label_func,
codes=codes,
item_tfms=Resize(128)
)
dls.show_batch(max_n=4)
Model: U-Net
Traditional convolutional neural networks (CNNs) are effective for various computer vision tasks, such as image classification, object detection, and localization. However, they have limitations when it comes to image segmentation. Reasons for this include:
- Resolution Loss: CNNs typically downsample the input image as they progress through the network to capture higher-level features. This downsampling reduces the resolution of the feature maps, making it challenging to accurately localize and segment small objects or fine details in the image.
- Contextual Information: Segmentation tasks often require capturing contextual information to distinguish between objects with similar appearances or to handle complex object boundaries. Traditional CNNs, with their hierarchical feature extraction, may struggle to capture long-range dependencies and global context, which are crucial for accurate segmentation.
- Limited Localization Accuracy: CNNs designed for classification or localization tasks focus on identifying the presence of objects within an image but do not provide precise information about their boundaries. Segmenting an image requires pixel-level localization accuracy, which is not emphasized in traditional CNNs.
The U-Net is specifically designed for semantic segmentation and addresses the above limitations. It employs a U-shaped architecture, consisting of a contracting path (encoder) and an expansive path (decoder), with skip connections between corresponding encoder and decoder layers. Advantages of using a U-Net include:
- U-shaped Architecture: U-Net's U-shaped design enables the preservation of high-resolution feature maps through skip connections, which helps in localizing objects accurately.
- Context Aggregation: Skip connections in U-Net allow the decoder to receive feature maps from different resolutions, incorporating both local and global contextual information. This aids in better segmentation by capturing fine details and understanding the overall context.
- Dense Feature Propagation: U-Net uses upsampling and concatenation operations during the decoding phase, which helps in recovering the lost spatial resolution. This dense feature propagation aids in precise segmentation by retaining spatial information.
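As a rough illustration of these ideas (not the architecture `unet_learner` builds below, which attaches a decoder to a pretrained ResNet), a single-level encoder/decoder with one skip connection can be sketched in plain PyTorch:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """A minimal one-level U-Net sketch: encoder, bottleneck,
    decoder with a skip connection, and a 1x1 classification head."""
    def __init__(self, in_ch=3, n_classes=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)                        # contracting path: downsample
        self.bottleneck = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # expansive path: upsample
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(16, n_classes, 1)            # per-pixel class logits

    def forward(self, x):
        e = self.enc(x)                    # high-resolution features
        b = self.bottleneck(self.pool(e))  # low-resolution context
        u = self.up(b)                     # recover spatial resolution
        u = torch.cat([u, e], dim=1)       # skip connection: concat encoder features
        return self.head(self.dec(u))

model = TinyUNet()
out = model(torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 10, 64, 64])
```

The output has the same spatial size as the input, with one logit per class at every pixel, which is exactly what a segmentation loss expects.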
model = unet_learner(dls, resnet34)
model.fine_tune(10)
model.show_results(max_n=4, figsize=(7,10))
interp = SegmentationInterpretation.from_learner(model)
# top_losses returns (losses, indices); take the indices of the worst predictions
idxs = interp.top_losses(4, largest=True)[1]
interp.show_results(idxs)
The interpreter shows the model makes some reasonable predictions, but there is still room for improvement!
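One way to quantify that room for improvement, beyond eyeballing predictions, is per-class intersection-over-union (IoU). A small self-contained sketch on toy masks (the helper and arrays are made up for illustration; fastai also ships segmentation metrics such as `DiceMulti` that can be passed to the learner):

```python
import numpy as np

def per_class_iou(pred, targ, n_classes):
    """IoU per class between two integer label masks (hypothetical helper)."""
    ious = []
    for c in range(n_classes):
        p, t = (pred == c), (targ == c)
        union = (p | t).sum()
        # Classes absent from both masks get NaN rather than a misleading score
        ious.append(float((p & t).sum() / union) if union else float('nan'))
    return ious

# Toy 2x2 masks with two classes: class 0 has 1 correct pixel out of 2 in the
# union, class 1 has 2 correct pixels out of 3 in the union
pred = np.array([[0, 0], [1, 1]])
targ = np.array([[0, 1], [1, 1]])
print(per_class_iou(pred, targ, 2))  # [0.5, 0.6666666666666666]
```

Tracking IoU per class is useful here because FloodNet's classes are imbalanced, so a high overall pixel accuracy can hide poor performance on small classes like Vehicle or Pool.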
Bibliography
arXiv preprint arXiv:2012.02951, 2020.