# Module 09: So you want to segment with machine learning
This lesson adapts the `1_sklearn_introduction.ipynb` notebook into a guided walkthrough of normalization, unsupervised clustering, segmentation quality, feature engineering, and random-forest-based segmentation.
## Source notebook topics
- Image loading and display using `skimage.io.imread` and matplotlib.
- Min-max normalization for image intensities.
- Unsupervised segmentation with K-means clustering.
- Intersection over Union to judge segmentation quality.
- Feature engineering for supervised pixel classification.
- Random forest as a first supervised segmentation model.
## Why this matters
This notebook marks the transition from “opening images and looking at them” to “using measurable pixel features to make segmentation decisions.” It is a useful bridge between classical image analysis and later deep-learning tools.
## Suggested learning flow

### Part 1: load and normalize
Start with a real microscopy image such as `wt_dna.tif`. Display it, inspect its value range, and implement your own min-max normalization function using NumPy. This reinforces that image data are arrays first and pictures second.
### Part 2: unsupervised segmentation
Use K-means with two clusters as a first binary segmentation attempt. The goal is not perfection. The goal is to understand the machine-learning workflow: initialize a model, fit on features, predict, then reshape the output back into image form.
### Part 3: judge quality
Compare your predicted segmentation to a reference mask using Intersection over Union. This is the first time many learners see the difference between “the output looks okay” and “the output can be measured objectively.”
### Part 4: move to supervised learning
Engineer extra per-pixel features and train a random forest. This helps learners understand that supervised segmentation requires both labels and a useful feature representation.
## Representative code examples

### Load and normalize

```python
import numpy as np
from skimage.io import imread
from matplotlib import pyplot as plt

image = imread("../data/images/wt_dna.tif")

def custom_normalize(input_image):
    # Rescale intensities linearly to the [0, 1] range.
    input_image = input_image.astype(np.float32)
    input_max = np.max(input_image)
    input_min = np.min(input_image)
    output = (input_image - input_min) / (input_max - input_min)
    return output

normalized_image = custom_normalize(image)
plt.imshow(normalized_image, cmap="gray")
```
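One edge case worth flagging to learners: min-max normalization divides by the intensity range, so a constant image triggers a division by zero. A defensive variant (a sketch for discussion; `custom_normalize_safe` is not part of the source notebook) could look like:

```python
import numpy as np

def custom_normalize_safe(input_image):
    """Min-max normalization that tolerates constant images."""
    input_image = input_image.astype(np.float32)
    input_min = input_image.min()
    input_range = input_image.max() - input_min
    if input_range == 0:
        # A constant image has no contrast to rescale; return zeros
        # instead of dividing by zero.
        return np.zeros_like(input_image)
    return (input_image - input_min) / input_range

# Usage: a simple intensity ramp maps exactly onto [0, 1].
ramp = np.arange(16, dtype=np.uint16).reshape(4, 4)
normalized = custom_normalize_safe(ramp)
```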
### K-means segmentation

```python
from sklearn.cluster import KMeans

# One feature per pixel: its normalized intensity.
pixels = normalized_image.reshape(-1, 1)
model = KMeans(n_clusters=2, random_state=0)
labels = model.fit_predict(pixels)
segmentation = labels.reshape(normalized_image.shape)
```
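A subtlety worth mentioning in the lesson: K-means numbers its clusters arbitrarily, so label `1` is not guaranteed to be the foreground. Assuming the stained structures are brighter than the background (typical for a fluorescence image like `wt_dna.tif`), one way to sketch this is to pick the cluster with the higher center intensity; the synthetic image below stands in for real data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for a normalized image: dark background, one bright blob.
rng = np.random.default_rng(0)
normalized_image = rng.uniform(0.0, 0.2, size=(64, 64))
normalized_image[20:40, 20:40] += 0.7

pixels = normalized_image.reshape(-1, 1)
model = KMeans(n_clusters=2, random_state=0, n_init=10)
labels = model.fit_predict(pixels)

# Pick the cluster whose center is brighter and treat it as foreground,
# so the resulting mask has a consistent meaning across runs.
foreground_cluster = np.argmax(model.cluster_centers_.ravel())
segmentation = (labels == foreground_cluster).reshape(normalized_image.shape)
```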
### IoU

```python
def iou_score(prediction, reference):
    # Both inputs are binary masks; IoU is 1.0 for a perfect match.
    intersection = np.logical_and(prediction, reference).sum()
    union = np.logical_or(prediction, reference).sum()
    return intersection / union
```
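Part 4 (feature engineering plus a random forest) has no representative code above. A minimal sketch of the idea, using a Gaussian-smoothed copy of the image as one engineered neighborhood feature; the synthetic image and mask stand in for `wt_dna.tif` and its reference annotation from the notebook:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a normalized image and its reference mask.
rng = np.random.default_rng(0)
normalized_image = rng.uniform(0.0, 0.3, size=(64, 64))
normalized_image[16:48, 16:48] += 0.6
reference_mask = np.zeros((64, 64), dtype=bool)
reference_mask[16:48, 16:48] = True

# Feature engineering: stack raw intensity with a smoothed copy so each
# pixel also carries information about its neighborhood.
smoothed = gaussian_filter(normalized_image, sigma=2)
features = np.stack([normalized_image.ravel(), smoothed.ravel()], axis=1)

# The familiar workflow: fit on labeled pixels, predict, reshape to image form.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(features, reference_mask.ravel())
prediction = model.predict(features).reshape(normalized_image.shape)
```

In the lesson, learners would evaluate `prediction` against the reference with `iou_score` and then ask whether adding features changed the result.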
## Important conceptual takeaway
The notebook teaches a pattern that will appear again later: data preparation, model fitting, prediction, reshaping the result, and then evaluating whether the output is biologically useful.
## Exercises to keep in the website version
- Write your own normalization function using `np.min` and `np.max`.
- Run K-means with two clusters and display the segmentation.
- Compute IoU against a reference segmentation.
- Add at least one new engineered feature and compare the result.
- Train a random forest and discuss whether the segmentation improved.
## Where it belongs in this site
This module fits naturally after the core Jupyter and napari lessons. It is advanced enough to need notebook confidence, but concrete enough to remain accessible for motivated researchers.