Module 06

So you want to use pandas and matplotlib

Good choice. A surprising amount of research progress starts with “load the table, check the columns, and plot the thing before you overthink it.”

Estimated time: 45 to 60 min
Prerequisite: Jupyter notebooks

Load a table

import pandas as pd

# The path is relative to the current working directory.
data = pd.read_csv("../data/morphological_measurements.csv")
data.head()  # show the first five rows

The first thing to do is almost never modeling. It is checking whether the table looks sensible.

Inspect useful columns

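print(data.columns)  # column names
print(data.shape)  # (number of rows, number of columns)
print(data["Area"].mean())
print(data["Perimeter"].max())

# Select several columns at once with a list of names.
data[["Area", "Perimeter"]].head()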
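Column names in exported measurement tables often differ slightly from what a notebook expects, for example through stray spaces or different capitalization. Here is a minimal sketch of a check that could run before any analysis. The small table is a made-up stand-in for the CSV, and `expected` and `missing_columns` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Made-up stand-in for the CSV so the example runs on its own.
data = pd.DataFrame(
    {
        "Area": [120.0, 135.5, 98.2],
        " Perimeter": [42.1, 45.0, 38.7],  # note the stray leading space
        "Cell Cycle Phase": [1, 2, 1],
    }
)

expected = ["Area", "Perimeter", "Cell Cycle Phase"]  # hypothetical list


def missing_columns(table, names):
    """Return the names that are not columns of the table."""
    return [name for name in names if name not in table.columns]


# Before cleaning, "Perimeter" is reported because of the leading space.
print(missing_columns(data, expected))

# Stripping whitespace from the headers fixes this particular mismatch.
data.columns = data.columns.str.strip()
print(missing_columns(data, expected))
```

Failing early on a missing column is usually easier to debug than a `KeyError` halfway through a plot.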

Make quick plots

import matplotlib.pyplot as plt

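# Count cells per phase. A bar chart of counts suits a label column
# better than a histogram, which bins numeric values.
data["Cell Cycle Phase"].value_counts().sort_index().plot(kind="bar")
plt.title("Cell cycle phase counts")
plt.ylabel("Number of cells")
plt.tight_layout()
plt.show()

data.plot.scatter(x="Area", y="Perimeter")
plt.title("Area vs Perimeter")
plt.tight_layout()
plt.show()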
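When two plots belong together, it can help to draw them on explicit axes in one figure. This is a minimal sketch, assuming a made-up table in place of the CSV; the variable names `ax_hist` and `ax_scatter` are illustrative only.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the CSV so the example runs on its own.
data = pd.DataFrame(
    {
        "Area": [120.0, 135.5, 98.2, 150.3, 110.9],
        "Perimeter": [42.1, 45.0, 38.7, 47.9, 40.3],
        "Cell Cycle Phase": [1, 2, 1, 3, 2],
    }
)

# One figure with two side-by-side axes.
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram of a numeric column; bins sets the number of bars.
data["Area"].plot.hist(bins=5, ax=ax_hist)
ax_hist.set_xlabel("Area")
ax_hist.set_title("Area distribution")

# Scatter plot of two measurements on the second axes.
data.plot.scatter(x="Area", y="Perimeter", ax=ax_scatter)
ax_scatter.set_title("Area vs Perimeter")

fig.tight_layout()
```

Passing `ax=` tells pandas where to draw, so each title and label lands on the plot it belongs to.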

Group and summarize

summary = data.groupby("Cell Cycle Phase")[["Area", "Perimeter"]].mean()
print(summary)

summary.plot(kind="bar")
plt.ylabel("Mean value")
plt.tight_layout()
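A mean on its own hides how many cells are in each group and how much they vary. Here is a minimal sketch of a fuller summary using `agg`, with a made-up table standing in for the CSV; `area_summary` is a name chosen for illustration.

```python
import pandas as pd

# Made-up stand-in for the CSV so the example runs on its own.
data = pd.DataFrame(
    {
        "Area": [100.0, 120.0, 140.0, 200.0, 220.0],
        "Perimeter": [40.0, 44.0, 48.0, 56.0, 60.0],
        "Cell Cycle Phase": [1, 1, 1, 2, 2],
    }
)

# Count, mean, and sample standard deviation of Area per group.
area_summary = data.groupby("Cell Cycle Phase")["Area"].agg(["count", "mean", "std"])
print(area_summary)
```

Checking `count` first is a cheap way to notice when a group is too small for its mean to say much.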

The habit worth building

Use pandas and matplotlib to ask the boring, high-value questions first: what is in the table, what is missing, what looks strange, and what changes across conditions?
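The "what is missing" and "what looks strange" questions can be sketched with a few one-liners. The table below is made up, with one missing value and one impossible measurement planted on purpose. The rule that an area must be positive is an assumption about the measurements, not something the dataset guarantees.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the CSV, with problems planted on purpose.
data = pd.DataFrame(
    {
        "Area": [120.0, np.nan, 98.2, -5.0],
        "Perimeter": [42.1, 45.0, 38.7, 40.3],
        "Cell Cycle Phase": [1, 2, 1, 3],
    }
)

# What is missing? Count NaN values per column.
missing_per_column = data.isna().sum()
print(missing_per_column)

# What looks strange? Assumed rule: an area should be positive.
suspicious = data[data["Area"] <= 0]
print(suspicious)

# Are any rows exact duplicates of an earlier row?
print(data.duplicated().sum())
```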

Exercises

Load the CSV file and print the first five rows.
Plot a histogram of one numeric column.
Make a scatter plot of two measurements.
Group by one label column and compute a summary statistic.
Write one sentence on what you think is worth checking next.
