Module 10

So you want to classify single cells

This lesson adapts `2_sklearn_classification.ipynb`, which uses morphological measurements to classify *S. aureus* cells into different cell-cycle phases.

Estimated time: 60 to 90 min
Main tools: pandas, matplotlib, scikit-learn

Core learning arc from the notebook

Load tabular measurements from `morphological_measurements.csv` with pandas.
Explore columns, row previews, histograms, and scatter plots.
Normalize features by standardization (Z-scores).
Train a logistic regression model for cell-cycle classification.
Evaluate with a confusion matrix.
Use k-fold cross-validation and parameter sweeps to refine the model.

Why it is valuable

This notebook shows a very common bioimage-analysis pattern: a segmentation or measurement step produces tabular features, and those features then become the input to a classifier. It helps researchers see how image-derived measurements connect to cell-state inference.

How the website should present it

Part 1: data loading and exploration

Slow down here and have the learner inspect the table before any modeling. The notebook already uses `data.head()`, histograms, and scatter plots; each of these should become an explicit mini-task on the site.
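The inspection mini-tasks might look like the following sketch. Because `morphological_measurements.csv` is not reproduced here, the code builds a small synthetic table as a hypothetical stand-in: the phase labels 1 to 3 and all numeric values are assumptions for illustration, and only the column names `Area`, `Perimeter`, and `Cell Cycle Phase` come from the lesson. The `Agg` backend is selected so the code also runs without a display; inside a notebook that line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real measurements table (values are made up).
rng = np.random.default_rng(0)
phases = np.repeat([1, 2, 3], 100)
area = rng.normal(loc=200.0 + 100.0 * phases, scale=40.0)
perimeter = 2.0 * np.sqrt(np.pi * area) + rng.normal(0.0, 2.0, size=area.size)
data = pd.DataFrame({"Area": area, "Perimeter": perimeter, "Cell Cycle Phase": phases})

# Mini-task 1: look at the table before plotting anything.
print(data.head())
print(data.dtypes)

# Mini-task 2: how many cells are in each class?
class_counts = data["Cell Cycle Phase"].value_counts().sort_index()
print(class_counts)

# Mini-tasks 3 and 4: per-class histogram of one feature, then a scatter plot.
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))
for phase, group in data.groupby("Cell Cycle Phase"):
    ax_hist.hist(group["Area"], bins=20, alpha=0.5, label=f"Phase {phase}")
    ax_scatter.scatter(group["Area"], group["Perimeter"], s=8, label=f"Phase {phase}")
ax_hist.set_xlabel("Area")
ax_hist.set_ylabel("Number of cells")
ax_scatter.set_xlabel("Area")
ax_scatter.set_ylabel("Perimeter")
ax_scatter.legend()
fig.tight_layout()
```

Checking the class counts first matters later: if one phase is rare, overall accuracy can look good even when that phase is rarely predicted correctly.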

Part 2: feature normalization

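Reintroduce standardization carefully. The notebook normalizes all non-label columns to mean 0 and standard deviation 1. This is a good chance to connect numeric preprocessing to model behavior: scikit-learn's logistic regression is regularized by default, so features measured on very different scales would otherwise be penalized unequally, and the solver can take longer to converge.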
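One detail worth surfacing when standardization is reintroduced: pandas `std()` defaults to the sample standard deviation (`ddof=1`), while scikit-learn's `StandardScaler` divides by the population standard deviation (`ddof=0`). The sketch below compares the two on synthetic columns; the values are hypothetical and only the column names come from the lesson.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical feature table (values are made up for illustration).
rng = np.random.default_rng(0)
n_cells = 200
features = pd.DataFrame({
    "Area": rng.normal(400.0, 80.0, n_cells),
    "Perimeter": rng.normal(70.0, 8.0, n_cells),
})

# Manual Z-scores with pandas: std() uses ddof=1 by default.
z_pandas = (features - features.mean()) / features.std()

# The same idea with scikit-learn: StandardScaler uses ddof=0.
z_sklearn = pd.DataFrame(
    StandardScaler().fit_transform(features), columns=features.columns
)

# Both versions are centered on zero.
print(z_pandas.mean().round(6))
print(z_sklearn.mean().round(6))

# Each has unit spread under its own convention.
print(z_pandas.std(ddof=1).round(6))
print(z_sklearn.std(ddof=0).round(6))

# The two results differ only by the constant factor sqrt(n / (n - 1)).
scale_factor = np.sqrt(n_cells / (n_cells - 1))
print(np.allclose(z_sklearn.to_numpy(), z_pandas.to_numpy() * scale_factor))
```

For a table with hundreds of cells the difference is negligible, but it explains why a hand-written helper and `StandardScaler` do not agree to the last digit.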

Part 3: build a first classifier

Logistic regression is a helpful first model because it is relatively transparent: each feature receives a coefficient per class that can be inspected directly. The focus should be on understanding the pipeline rather than treating the model as magic.
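To make the interpretability point concrete, the fitted coefficients can be laid out as a small table. This is a minimal sketch, not the notebook's code: it uses a synthetic table as a hypothetical stand-in for the real measurements (phase labels 1 to 3 and all values are assumptions) and wraps scaling and the model in a scikit-learn `Pipeline` via `make_pipeline`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real measurements table (values are made up).
rng = np.random.default_rng(0)
phases = np.repeat([1, 2, 3], 100)
area = rng.normal(loc=200.0 + 100.0 * phases, scale=40.0)
perimeter = 2.0 * np.sqrt(np.pi * area) + rng.normal(0.0, 2.0, size=area.size)
data = pd.DataFrame({"Area": area, "Perimeter": perimeter, "Cell Cycle Phase": phases})

X = data[["Area", "Perimeter"]]
y = data["Cell Cycle Phase"]

# Scaling and classification are chained so they are always applied together.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X, y)

# make_pipeline names each step after its lowercased class name.
classifier = pipeline.named_steps["logisticregression"]

# One row per class, one column per (standardized) feature.
coefficients = pd.DataFrame(
    classifier.coef_, index=classifier.classes_, columns=X.columns
)
coefficients.index.name = "Cell Cycle Phase"
print(coefficients.round(2))
```

Because the features are standardized, coefficient magnitudes are on a comparable scale. Strongly correlated features such as `Area` and `Perimeter` can share or trade off weight, so individual signs should be read with care.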

Part 4: evaluate and tune

The confusion matrix, cross-validation, and parameter sweeps make this a strong intermediate lesson. These ideas should be preserved because they teach good scientific skepticism about model quality.
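A sketch of k-fold cross-validation followed by one parameter sweep is shown below. The lesson does not say which parameter the notebook sweeps, so the regularization strength `C` is an assumption here, as are the synthetic table (a hypothetical stand-in for the real measurements) and the choice of five stratified folds.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real measurements table (values are made up).
rng = np.random.default_rng(0)
phases = np.repeat([1, 2, 3], 100)
area = rng.normal(loc=200.0 + 100.0 * phases, scale=40.0)
perimeter = 2.0 * np.sqrt(np.pi * area) + rng.normal(0.0, 2.0, size=area.size)
data = pd.DataFrame({"Area": area, "Perimeter": perimeter, "Cell Cycle Phase": phases})

X = data[["Area", "Perimeter"]]
y = data["Cell Cycle Phase"]

# Keeping the scaler inside the pipeline means it is refit on each training fold,
# so no statistics leak from the validation fold.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# k-fold cross-validation: one accuracy value per fold.
scores = cross_val_score(pipeline, X, y, cv=cv)
print(f"Accuracy per fold: {scores.round(3)}")
print(f"Mean {scores.mean():.3f}, standard deviation {scores.std():.3f}")

# Parameter sweep over the regularization strength C (assumed parameter).
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0, 100.0]}
search = GridSearchCV(pipeline, param_grid=param_grid, cv=cv)
search.fit(X, y)

sweep = pd.DataFrame(search.cv_results_)[
    ["param_logisticregression__C", "mean_test_score", "std_test_score"]
]
print(sweep)
print("Best C:", search.best_params_["logisticregression__C"])
```

Comparing the differences in `mean_test_score` against `std_test_score` is one way to judge whether a sweep improved the model meaningfully or only moved it within fold-to-fold noise.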

Representative code examples

Load and inspect the data

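import pandas as pd

# The path is relative to the current working directory.
data = pd.read_csv("../data/morphological_measurements.csv")

# Preview the first rows. In a notebook, run this in its own cell so the table is displayed.
data.head()

# Plot the label distribution to see how many cells fall in each phase.
data["Cell Cycle Phase"].hist()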

Normalize feature columns

def normalize_column(data_column):
    """Return the column as Z-scores: (value - mean) / standard deviation."""
    mean = data_column.mean()
    std = data_column.std()  # pandas default: sample standard deviation (ddof=1)
    return (data_column - mean) / std

normalized_data = data.copy()
# Assumes the label column ("Cell Cycle Phase") is the last column in the table.
feature_columns = normalized_data.columns[:-1]

for column_name in feature_columns:
    normalized_data[column_name] = normalize_column(normalized_data[column_name])

Train a classifier

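from sklearn.linear_model import LogisticRegression

X = normalized_data[feature_columns]
y = normalized_data["Cell Cycle Phase"]

# max_iter is raised above the default of 100 to give the solver room to converge.
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# These are predictions on the training data, so they tend to overestimate accuracy on new cells.
predictions = model.predict(X)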
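The block above predicts on the same rows the model was fitted on. For a fairer estimate, a confusion matrix can be computed on held-out cells, as in the sketch below. The synthetic table is a hypothetical stand-in for the real measurements (phase labels 1 to 3 and all values are assumptions), and the 75/25 split is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the real measurements table (values are made up).
rng = np.random.default_rng(0)
phases = np.repeat([1, 2, 3], 100)
area = rng.normal(loc=200.0 + 100.0 * phases, scale=40.0)
perimeter = 2.0 * np.sqrt(np.pi * area) + rng.normal(0.0, 2.0, size=area.size)
data = pd.DataFrame({"Area": area, "Perimeter": perimeter, "Cell Cycle Phase": phases})

X = data[["Area", "Perimeter"]]
y = data["Cell Cycle Phase"]

# Hold out a quarter of the cells; stratify keeps the phase proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardize using statistics from the training split only.
train_mean = X_train.mean()
train_std = X_train.std()
X_train_z = (X_train - train_mean) / train_std
X_test_z = (X_test - train_mean) / train_std

model = LogisticRegression(max_iter=1000)
model.fit(X_train_z, y_train)
test_predictions = model.predict(X_test_z)

# Rows are true phases, columns are predicted phases.
cm = confusion_matrix(y_test, test_predictions, labels=model.classes_)
cm_table = pd.DataFrame(
    cm,
    index=[f"true {label}" for label in model.classes_],
    columns=[f"predicted {label}" for label in model.classes_],
)
print(cm_table)
print(f"Held-out accuracy: {accuracy_score(y_test, test_predictions):.3f}")
```

Off-diagonal counts show which phases are confused with each other, which is often more informative biologically than a single accuracy number.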

Research-facing framing

The biological goal is not just to “build a classifier.” It is to determine whether the measured features contain enough information to separate meaningful cell states reliably.

Exercises worth carrying over

Plot histograms for multiple features and compare their distributions.
Make a scatter plot of `Area` vs `Perimeter` and discuss separation.
Write a small normalization helper and apply it column by column.
Train logistic regression and inspect the confusion matrix.
Try one parameter sweep and discuss whether the model improved meaningfully.
