10 Output Quality
How to pick a model?
Starting prompt for this chapter: Chapter 10 addresses how to assess the quality of a model’s output, mentioning Metrics Reloaded. This chapter should address the question: how do I know my model is good enough? It should frame this discussion using the example of a segmentation model and discuss how tools can identify uncertain decisions from a model.
- why do we need metrics? to pick models from different training iterations (to avoid overfitting) and to select between models trained with different hyperparameters (see the sketch below)
- mention train/validate/test splits, discuss the need for “ground truth” data; can call back to Chapter 5
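A minimal sketch of metric-driven model selection, to seed this discussion. `train_one_epoch` and `evaluate_metric` are hypothetical placeholder helpers, and the `state_dict` calls assume a PyTorch-style model:

```python
import copy

def select_best_model(model, train_loader, val_loader, num_epochs=50):
    # Keep the checkpoint with the best validation metric. The training
    # loss keeps improving even after the model starts to overfit; the
    # validation metric tells us which training iteration to keep.
    best_score, best_state = -float("inf"), None
    for epoch in range(num_epochs):
        train_one_epoch(model, train_loader)        # hypothetical helper
        score = evaluate_metric(model, val_loader)  # hypothetical helper
        if score > best_score:
            best_score = score
            best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model

# The test split stays untouched until the very end and is used exactly
# once, to report the final score of the selected model.
```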
10.1 Metrics and Losses
difference between “metric” and “loss”
loss:
- has to be “differentiable”
- used to train the network
- should already be close to the metric you want to use
- can be used to assess model quality on validation data (but a custom metric might be more insightful)
metric:
- an application-specific measure of how close you are to the ground truth
- used to select a model and to measure progress
- does not need to be “differentiable”
- but if it is, that’s great: you can use it directly as a loss
- otherwise, find a loss that is a good proxy, or choose your model based on the metric computed on the validation dataset (see the sketch below)
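To make the distinction concrete, here is a minimal PyTorch sketch (the function names are ours, not from any library) contrasting a differentiable soft Dice loss with the corresponding hard Dice metric:

```python
import torch

def soft_dice_loss(pred, target, eps=1e-7):
    # Differentiable: operates directly on predicted probabilities in
    # [0, 1], so gradients can flow through it during training.
    intersection = (pred * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def dice_metric(pred, target, threshold=0.5, eps=1e-7):
    # Not differentiable: the thresholding step has zero gradient almost
    # everywhere, which is fine for a metric but rules it out as a loss.
    pred_bin = (pred > threshold).float()
    intersection = (pred_bin * target).sum()
    return (2.0 * intersection + eps) / (pred_bin.sum() + target.sum() + eps)

# The soft loss is a close, differentiable proxy for the hard metric.
pred = torch.rand(1, 1, 64, 64, requires_grad=True)
target = (torch.rand(1, 1, 64, 64) > 0.5).float()
loss = soft_dice_loss(pred, target)
loss.backward()  # works, because the loss is differentiable
print(float(loss), float(dice_metric(pred.detach(), target)))
```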
10.2 What Metric to Pick?
- ideally: the metric reflects the time/cost needed to clean up predictions for a particular application
- this can mean different things, and sometimes there is no single number (see Cell Tracking Metrics below)
- Discuss tradeoffs in metric choice
- Interpretation of different metrics, with guidance on how to balance/trade off priorities
- E.g. segmentation for counting (segmentation as object detection) vs. segmentation for size estimation (see the sketch after this list)
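A toy example with synthetic masks (all values made up for illustration) shows how the same prediction can be nearly perfect for size estimation while failing badly for counting, and how a pixel-wise score like Dice hides the difference:

```python
import numpy as np
from scipy import ndimage

# Synthetic ground truth: two 10x10 objects (200 foreground pixels).
gt = np.zeros((64, 64), dtype=bool)
gt[10:20, 10:20] = True
gt[40:50, 40:50] = True

# Synthetic prediction: finds only one object, slightly too large
# (196 pixels), and misses the second object entirely.
pred = np.zeros_like(gt)
pred[8:22, 8:22] = True

# Counting view: 50% of the objects are missed.
print("count:", ndimage.label(gt)[1], "vs.", ndimage.label(pred)[1])  # 2 vs. 1

# Size view: total predicted area is off by only ~2%.
print("area:", int(gt.sum()), "vs.", int(pred.sum()))  # 200 vs. 196

# Pixel-wise Dice gives one middling number that explains neither failure.
dice = 2 * np.logical_and(gt, pred).sum() / (gt.sum() + pred.sum())
print(f"dice: {dice:.2f}")  # ~0.51
```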
10.3 Examples
Note: We are considering picking one example to place in callout boxes throughout the chapter to facilitate an ongoing discussion with a concrete example. For the examples, we will use real data/GT but synthetically generate the predictions, in order to better highlight specific issues.
10.3.1 Segmentation Metrics
- show with examples (see the sketch after this list):
- Dice
- Hausdorff distance
- AP_x (and the instance matching it requires)
- mention MetricsReloaded
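Minimal reference implementations of the three metrics might look like the sketch below. Note that “AP” is defined in several incompatible ways across communities; the TP/(TP+FP+FN) form used here is one common convention in bioimage instance segmentation, and for anything beyond illustration a vetted implementation such as MetricsReloaded is preferable to hand-rolled code:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(gt, pred):
    # Pixel-wise overlap between two binary masks (1 = perfect).
    inter = np.logical_and(gt, pred).sum()
    return 2 * inter / (gt.sum() + pred.sum())

def hausdorff(gt, pred):
    # Symmetric Hausdorff distance between the foreground pixel sets:
    # the worst-case distance from a pixel in one mask to the other
    # mask. Sensitive to boundary outliers that Dice barely notices.
    a, b = np.argwhere(gt), np.argwhere(pred)
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

def ap_at(gt_labels, pred_labels, thresh=0.5):
    # AP_x for instance segmentations: match predicted to ground-truth
    # instances by IoU >= thresh, then score TP / (TP + FP + FN). For
    # thresholds of 0.5 and above the matching is effectively
    # one-to-one, so counting matched GT instances suffices.
    gt_ids = [i for i in np.unique(gt_labels) if i != 0]
    pred_ids = [j for j in np.unique(pred_labels) if j != 0]
    iou = np.zeros((len(gt_ids), len(pred_ids)))
    for gi, g in enumerate(gt_ids):
        for pj, p in enumerate(pred_ids):
            g_mask, p_mask = gt_labels == g, pred_labels == p
            inter = np.logical_and(g_mask, p_mask).sum()
            union = np.logical_or(g_mask, p_mask).sum()
            iou[gi, pj] = inter / union
    tp = int((iou >= thresh).any(axis=1).sum())
    fp, fn = len(pred_ids) - tp, len(gt_ids) - tp
    return tp / (tp + fp + fn)
```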
10.3.2 Cell Tracking Metrics
- usually not a single number: topological correctness vs. positional correctness
- show traccuracy and discuss some of its metrics (see the sketch below)
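A sketch of evaluating a tracking result with traccuracy, following the quickstart in its documentation at the time of writing; the loader, matcher, and metric names below are taken from that quickstart and should be checked against the current release:

```python
import pprint

# All names below come from traccuracy's quickstart and may have
# changed in newer releases; treat this as a pattern, not a recipe.
from traccuracy import run_metrics
from traccuracy.loaders import load_ctc_data
from traccuracy.matchers import CTCMatcher
from traccuracy.metrics import CTCMetrics, DivisionMetrics

# Load ground truth and predicted tracks in Cell Tracking Challenge format.
gt_data = load_ctc_data("path/to/GT/TRA", "path/to/GT/TRA/man_track.txt")
pred_data = load_ctc_data("path/to/RES", "path/to/RES/res_track.txt")

# One matcher, several metric families: CTCMetrics covers the
# topology-oriented TRA/DET scores, DivisionMetrics scores division
# events separately. Deliberately, there is no single summary number.
results = run_metrics(
    gt_data=gt_data,
    pred_data=pred_data,
    matcher=CTCMatcher(),
    metrics=[CTCMetrics(), DivisionMetrics()],
)
pprint.pprint(results)
```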