Module 04

So you want to know the scientific Python stack

Before notebooks start casually importing half a dozen packages as if you were born knowing them, it helps to pause and meet the libraries you will keep seeing in scientific workflows.

Estimated time: 30 to 45 min
Prerequisite: Python basics

The usual suspects

  • NumPy for arrays and numerical work.
  • pandas for labeled tables of measurements.
  • matplotlib for plots and quick visual inspection.
  • scikit-image for image loading, filtering, and segmentation utilities.
  • scikit-learn for machine learning and model evaluation.
  • SciPy for numerical methods (optimization, statistics, signal processing) and n-dimensional image utilities in scipy.ndimage.
  • napari for interactive image viewing.
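Two of these are imported under a different name than the one you install: scikit-image is imported as skimage, and scikit-learn as sklearn. A minimal sketch, using only the standard library, that reports which of the seven are available in your current environment and at what version. The function name check_stack and the STACK mapping are made up for this example:

```python
import importlib.util
from importlib import metadata

# Import name -> installable (distribution) name; they differ for skimage and sklearn
STACK = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "skimage": "scikit-image",
    "sklearn": "scikit-learn",
    "scipy": "scipy",
    "napari": "napari",
}


def check_stack(packages=STACK):
    """Return {import_name: version string, or None if the module is not found}."""
    report = {}
    for import_name, dist_name in packages.items():
        # find_spec locates a top-level module without importing it
        if importlib.util.find_spec(import_name) is None:
            report[import_name] = None
            continue
        try:
            report[import_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # Importable, but no matching distribution metadata was found
            report[import_name] = "unknown"
    return report


for name, version in check_stack().items():
    print(f"{name:<12}{version or 'not installed'}")
```

Anything reported as not installed can usually be added with pip or conda under its distribution name.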

What each library tends to feel like in practice

NumPy: arrays everywhere

import numpy as np

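# A 2x2 array standing in for a tiny grayscale image
image = np.array([[1, 2], [3, 4]])
print(image.shape)  # (2, 2): rows, columns
print(image.mean())  # 2.5: mean over every element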
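Printing the shape and the mean only scratches the surface. A minimal sketch of three things you will do constantly with arrays, reusing the same 2x2 image: element-wise arithmetic, reductions along one axis, and boolean masks. The threshold of 2 is arbitrary:

```python
import numpy as np

image = np.array([[1, 2], [3, 4]])

# Arithmetic applies to every element at once, no Python loop needed
scaled = image * 10                 # [[10, 20], [30, 40]]

# Reductions can run along one axis: axis=0 collapses rows, axis=1 collapses columns
column_means = image.mean(axis=0)   # [2.0, 3.0]
row_means = image.mean(axis=1)      # [1.5, 3.5]

# A comparison gives a boolean mask that can select elements directly
bright = image > 2                  # [[False, False], [True, True]]
print(image[bright])                # [3 4]
print(bright.sum())                 # 2 elements above the threshold
```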

pandas: tables with labels

import pandas as pd

data = pd.DataFrame(
    {
        "sample": ["wt", "mutant"],
        "cell_count": [120, 95],
    }
)
print(data)
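Where pandas starts paying off is once a table has more rows than you can eyeball. A minimal sketch with two replicates per sample (the second replicate counts are made up for illustration), showing row selection by a condition and a per-group summary:

```python
import pandas as pd

# Hypothetical replicate counts; only the first replicate matches the table above
data = pd.DataFrame(
    {
        "sample": ["wt", "wt", "mutant", "mutant"],
        "replicate": [1, 2, 1, 2],
        "cell_count": [120, 110, 95, 85],
    }
)

# Keep only the rows where a named column meets a condition
mutant_rows = data[data["sample"] == "mutant"]

# Group by a label column and summarize each group (std uses ddof=1 by default)
summary = data.groupby("sample")["cell_count"].agg(["mean", "std"])
print(summary)
```

Because columns are addressed by name rather than position, the same code keeps working if columns are added or reordered.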

matplotlib: just plot the thing

import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [2, 5, 4])
plt.title("Quick plot")
plt.show()
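plt.plot draws on whichever figure is currently active. Once a notebook holds more than one plot, the explicit figure-and-axes style is usually easier to keep straight. A minimal sketch of the same quick plot in that style, saved to a file; the figure size and the filename quick_plot.png are arbitrary choices:

```python
import matplotlib.pyplot as plt

# Create the figure and its axes explicitly, then call methods on them
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([1, 2, 3], [2, 5, 4], marker="o")
ax.set_title("Quick plot")
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.tight_layout()

# Write the plot to disk instead of (or as well as) showing it
fig.savefig("quick_plot.png", dpi=150)
plt.close(fig)  # free the figure once it is saved
```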

Combined example

This is the kind of short mixed example that appears all over scientific Python:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

measurements = pd.DataFrame(
    {
        "cell_id": ["c1", "c2", "c3"],
        "mean_intensity": np.array([120, 151, 118]),
    }
)

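# pandas plots through matplotlib, so plt can adjust and display the result
measurements.plot(x="cell_id", y="mean_intensity", kind="bar", legend=False)
plt.ylabel("Mean intensity")
plt.tight_layout()
plt.show()  # needed in a script; notebooks usually display the plot automatically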
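The mean_intensity values above are typed in by hand. In practice, numbers like these usually come out of an image: threshold it, label the connected objects, then measure each one. A minimal sketch of that pipeline with scipy.ndimage, assuming a tiny made-up image and a fixed threshold of 50 (real data would need a better-chosen threshold, for example from scikit-image's filters.threshold_otsu):

```python
import numpy as np
import pandas as pd
from scipy import ndimage

# Hypothetical 6x6 image: two bright blobs on a dark background
image = np.array([
    [10,  10,  10, 10,  10,  10],
    [10, 120, 130, 10,  10,  10],
    [10, 110, 120, 10,  10,  10],
    [10,  10,  10, 10, 150, 150],
    [10,  10,  10, 10, 160, 140],
    [10,  10,  10, 10,  10,  10],
])

# Threshold (assumed value), then give each connected object its own integer label
mask = image > 50
labels, n_objects = ndimage.label(mask)
ids = np.arange(1, n_objects + 1)  # label 0 is the background

# Per-object measurements: pixel count and mean intensity
areas = np.bincount(labels.ravel())[1:]
means = ndimage.mean(image, labels=labels, index=ids)

measurements = pd.DataFrame(
    {
        "cell_id": [f"c{i}" for i in ids],
        "area": areas,
        "mean_intensity": means,
    }
)
print(measurements)
```

scikit-image's measure.regionprops_table covers the same labeling-and-measuring step with many more properties, which is one reason the two libraries tend to show up together.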

The practical rule

You do not need to master every library before you start. You do need a working sense of which kind of problem each one usually solves, so you know where to look when a new task comes up.

Try this for yourself

  1. Import NumPy and create a small array.
  2. Create a tiny pandas table with two columns.
  3. Plot one of those columns with matplotlib.
  4. Write one sentence on which library felt most intuitive and why.
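If you get stuck on steps 1 to 3, here is one possible answer as a sketch; the column names and values are made up, and any small numbers would do:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Step 1: a small array (values are arbitrary)
values = np.array([3.2, 4.8, 4.1])

# Step 2: a two-column table built around that array
table = pd.DataFrame({"timepoint": [0, 1, 2], "signal": values})

# Step 3: plot one column against the other
fig, ax = plt.subplots()
ax.plot(table["timepoint"], table["signal"], marker="o")
ax.set_xlabel("timepoint")
ax.set_ylabel("signal")
plt.show()
```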