Module 04
So you want to know the scientific Python stack
Before every notebook starts casually importing six packages as if you were born knowing them, it helps to pause and meet the ones you will keep seeing in scientific workflows.
The usual suspects
- NumPy for arrays and numerical work.
- pandas for tables and tabular measurements.
- matplotlib for plots and quick visual inspection.
- scikit-image for image loading, filtering, and segmentation utilities.
- scikit-learn for machine learning and model evaluation.
- SciPy for numerical methods (optimization, statistics, signal processing) and n-dimensional image operations in scipy.ndimage.
- napari for interactive image viewing.
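One practical wrinkle with this list: the name you install is not always the name you import. The sketch below, which assumes nothing beyond the standard library, reports which of these packages can be found in the current environment. The names `STACK` and `check_stack` are made up for illustration.

```python
import importlib.util

# Display name -> import name. Note that scikit-image and scikit-learn
# are imported as `skimage` and `sklearn`.
STACK = {
    "NumPy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "scikit-image": "skimage",
    "scikit-learn": "sklearn",
    "SciPy": "scipy",
    "napari": "napari",
}


def check_stack(stack=STACK):
    """Return {display name: bool}, True when the module can be located."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in stack.items()
    }


# Print a one-line status for each package
for name, found in check_stack().items():
    print(f"{name:<14} {'installed' if found else 'missing'}")
```

A missing package here is not an error; it only means that library has not been installed in this environment yet.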
What each library tends to feel like in practice
NumPy: arrays everywhere
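import numpy as np
# A 2x2 array standing in for a tiny grayscale image
image = np.array([[1, 2], [3, 4]])
print(image.shape)   # (2, 2): two rows, two columns
print(image.mean())  # 2.5: the mean over all four values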
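The "arrays everywhere" idea becomes concrete as soon as you do arithmetic: operations apply to every element at once, and most reductions accept an `axis` argument. Here is a small sketch using the same 2x2 array, with variable names chosen only for illustration.

```python
import numpy as np

image = np.array([[1, 2], [3, 4]])

# Arithmetic applies element by element, no loop needed
doubled = image * 2

# Reductions can run along one axis: axis=0 collapses the rows,
# leaving one value per column
column_means = image.mean(axis=0)

# A comparison produces a boolean array, which can then select values
bright = image[image > 2]

print(doubled)
print(column_means)
print(bright)
```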
pandas: tables with labels
import pandas as pd
# Each dictionary key becomes a column name; each list holds that column's values
data = pd.DataFrame(
    {
        "sample": ["wt", "mutant"],
        "cell_count": [120, 95],
    }
)
print(data)
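The labels are what make a table different from a bare array: you ask for data by column name or row label rather than by position. A minimal sketch with the same two-row table follows; the names `counts`, `low`, `by_sample`, and `wt_count` are only illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "sample": ["wt", "mutant"],
        "cell_count": [120, 95],
    }
)

# Select one column by its label; the result is a pandas Series
counts = data["cell_count"]

# Keep only the rows where a condition on a column holds
low = data[data["cell_count"] < 100]

# Use a column as the row index, then look a value up by label
by_sample = data.set_index("sample")
wt_count = by_sample.loc["wt", "cell_count"]

print(counts.mean())
print(low)
print(wt_count)
```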
matplotlib: just plot the thing
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [2, 5, 4])  # x values first, then y values
plt.title("Quick plot")
plt.show()  # opens the figure window when run as a script
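Once a plot needs more than a title, many people switch to matplotlib's object-oriented interface, where you hold on to the figure and axes explicitly. The sketch below redraws the same line that way. The axis labels are placeholders, not part of the original example.

```python
import matplotlib.pyplot as plt

# plt.subplots() returns a Figure (the whole canvas) and an Axes (one plot area)
fig, ax = plt.subplots()

# Same data as the quick plot, with a marker at each point
ax.plot([1, 2, 3], [2, 5, 4], marker="o")

# Labels are set through methods on the Axes object
ax.set_title("Quick plot")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

Call `plt.show()` afterwards when running this as a script; notebooks usually display the figure automatically.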
Combined example
This is the kind of short mixed example that appears all over scientific Python:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
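# NumPy supplies the numbers, pandas gives them labels
measurements = pd.DataFrame(
    {
        "cell_id": ["c1", "c2", "c3"],
        "mean_intensity": np.array([120, 151, 118]),
    }
)
# pandas draws through matplotlib, so plt calls adjust the same figure
measurements.plot(x="cell_id", y="mean_intensity", kind="bar", legend=False)
plt.ylabel("Mean intensity")
plt.tight_layout()
plt.show()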
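In real work the numbers in such a table usually come out of an array computation. Here is a hedged sketch of that hand-off, using a made-up 3x4 array in which each row stands for the pixel values of one cell. The pixel values and the name `ranked` are invented for illustration and are not real data.

```python
import numpy as np
import pandas as pd

# Hypothetical pixel values: one row per cell, four pixels each
pixels = np.array(
    [
        [118, 122, 119, 121],
        [150, 149, 153, 152],
        [117, 120, 116, 119],
    ]
)

# NumPy does the numerical work: one mean per row (axis=1)
mean_intensity = pixels.mean(axis=1)

# pandas attaches labels to the results
measurements = pd.DataFrame(
    {
        "cell_id": ["c1", "c2", "c3"],
        "mean_intensity": mean_intensity,
    }
)

# Sort so the brightest cell comes first
ranked = measurements.sort_values("mean_intensity", ascending=False)
print(ranked)
```

The division of labor is the point: arrays for computing, tables for keeping track of what each number means.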
The practical rule
You do not need to master every library before you start. You do need to get comfortable with what kind of problem each one is usually solving.
Try this for yourself
- Import NumPy and create a small array.
- Create a tiny pandas table with two columns.
- Plot one of those columns with matplotlib.
- Write one sentence on which library felt most intuitive and why.