Module 09

So you want to segment with machine learning

This lesson adapts the `1_sklearn_introduction.ipynb` notebook into a guided walkthrough of normalization, unsupervised clustering, segmentation quality, feature engineering, and random-forest-based segmentation.

Estimated time: 60 to 90 minutes
Main tools: NumPy, scikit-image, scikit-learn

Source notebook topics

  1. Image loading and display using `skimage.io.imread` and matplotlib.
  2. Min-max normalization for image intensities.
  3. Unsupervised segmentation with K-means clustering.
  4. Intersection over Union to judge segmentation quality.
  5. Feature engineering for supervised pixel classification.
  6. Random forest as a first supervised segmentation model.

Why this matters

This notebook marks the transition from “opening images and looking at them” to “using measurable pixel features to make segmentation decisions.” It is a useful bridge between classical image analysis and later deep-learning tools.

Suggested learning flow

Part 1: load and normalize

Start with a real microscopy image such as `wt_dna.tif`. Display it, inspect its value range, and implement your own min-max normalization function using NumPy. This reinforces that image data are arrays first and pictures second.

Part 2: unsupervised segmentation

Use K-means with two clusters as a first binary segmentation attempt. The goal is not perfection. The goal is to understand the machine-learning workflow: initialize a model, fit on features, predict, then reshape the output back into image form.

Part 3: judge quality

Compare your predicted segmentation to a reference mask using Intersection over Union. This is the first time many learners see the difference between “the output looks okay” and “the output can be measured objectively.”

Part 4: move to supervised learning

Engineer extra per-pixel features and train a random forest. This helps learners understand that supervised segmentation requires both labels and a useful feature representation. A sketch of this step appears at the end of the code examples below.

Representative code examples

Load and normalize

```python
import numpy as np
from skimage.io import imread
from matplotlib import pyplot as plt

image = imread("../data/images/wt_dna.tif")
# Inspect the raw value range before normalizing.
print(image.dtype, image.min(), image.max())

def custom_normalize(input_image):
    """Rescale intensities to the [0, 1] range with min-max normalization."""
    input_image = input_image.astype(np.float32)
    input_max = np.max(input_image)
    input_min = np.min(input_image)
    output = (input_image - input_min) / (input_max - input_min)
    return output

normalized_image = custom_normalize(image)
plt.imshow(normalized_image, cmap="gray")
plt.show()
```

K-means segmentation

```python
from sklearn.cluster import KMeans

# Each pixel becomes one sample with a single feature: its intensity.
pixels = normalized_image.reshape(-1, 1)
model = KMeans(n_clusters=2, random_state=0)
labels = model.fit_predict(pixels)
# Reshape the flat label vector back into image form.
segmentation = labels.reshape(normalized_image.shape)
```
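K-means assigns its cluster indices arbitrarily, so cluster 1 is not guaranteed to be the foreground. Before comparing against a reference mask, it helps to relabel the output; a minimal sketch, assuming the objects of interest are brighter than the background:

```python
# Cluster indices are arbitrary; pick the cluster with the brighter center
# as foreground, assuming stained objects are brighter than background.
foreground_cluster = np.argmax(model.cluster_centers_)
binary_segmentation = segmentation == foreground_cluster
```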

IoU

```python
def iou_score(prediction, reference):
    """Intersection over Union for two binary masks (1.0 is a perfect match)."""
    intersection = np.logical_and(prediction, reference).sum()
    union = np.logical_or(prediction, reference).sum()
    return intersection / union
```
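
Feature engineering and random forest

The source notebook's exact features are not reproduced here, so the following is a minimal sketch: it assumes a Gaussian-smoothed copy of the image as one engineered feature and a binary `reference_mask` (hypothetical name) as the training labels. In practice the labels might instead come from sparse hand annotations.

```python
from skimage.filters import gaussian
from sklearn.ensemble import RandomForestClassifier

# Stack per-pixel features into an (n_pixels, n_features) matrix:
# raw intensity plus a Gaussian-smoothed copy as one engineered feature.
smoothed = gaussian(normalized_image, sigma=2)
features = np.stack([normalized_image.ravel(), smoothed.ravel()], axis=1)

# reference_mask is a hypothetical stand-in for whatever labels the
# notebook provides (a ground-truth mask or hand-drawn annotations).
targets = reference_mask.ravel()

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(features, targets)

# Predict a class per pixel and reshape back into image form.
rf_segmentation = forest.predict(features).reshape(normalized_image.shape)
```

Comparing `iou_score(rf_segmentation, reference_mask)` against the K-means result turns "does it look better" into a measurable question. Note that predicting on the same pixels used for training gives an optimistic score; that caveat is worth raising alongside exercise 5.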

Important conceptual takeaway

The notebook teaches a pattern that will appear again later: data preparation, model fitting, prediction, reshaping the result, and then evaluating whether the output is biologically useful.

Exercises to keep in the website version

  1. Write your own normalization function using `np.min` and `np.max`.
  2. Run K-means with two clusters and display the segmentation.
  3. Compute IoU against a reference segmentation.
  4. Add at least one new engineered feature and compare the result.
  5. Train a random forest and discuss whether the segmentation improved.

Where it belongs in this site

This module fits naturally after the core Jupyter and napari lessons. It is advanced enough to need notebook confidence, but concrete enough to remain accessible for motivated researchers.