10 Output Quality
How to pick a model?
Starting prompt for this chapter: Chapter 10 addresses how to assess the quality of a model’s output, mentioning Metrics Reloaded. This chapter should address the question: how do I know my model is good enough? It should frame this discussion using the example of a segmentation model and discuss how tools can identify uncertain decisions from a model.
- why do we need metrics? to pick models from different training iterations (to avoid overfitting) and to select between models trained with different hyperparameters (see the sketch below)
- mention train/validate/test splits, discuss the need for “ground truth” data; can call back to Chapter 5
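A minimal sketch of metric-driven model selection, to seed this discussion. `train_one_epoch` and `evaluate_metric` are hypothetical placeholder helpers, and the `state_dict` calls assume a PyTorch-style model:

```python
import copy

def select_best_model(model, train_loader, val_loader, num_epochs=50):
    # Keep the checkpoint with the best validation metric. The training
    # loss keeps improving even after the model starts to overfit; the
    # validation metric tells us which training iteration to keep.
    best_score, best_state = -float("inf"), None
    for epoch in range(num_epochs):
        train_one_epoch(model, train_loader)        # hypothetical helper
        score = evaluate_metric(model, val_loader)  # hypothetical helper
        if score > best_score:
            best_score = score
            best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model

# The test split stays untouched until the very end and is used exactly
# once, to report the final score of the selected model.
```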
10.1 Metrics and Losses
difference between “metric” and “loss”
loss:
- has to be “differentiable”
- used to train the network
- should already be close to the metric you want to use
- can be used to assess model quality on validation data (but a custom metric might be more insightful)
metric:
- an application-specific measure of how close you are to the ground truth
- used to select a model and to measure progress
- does not need to be “differentiable”
- but if it is, that’s great: you can use it directly as a loss
- otherwise, find a loss that is a good proxy, or choose your model based on the metric computed on the validation dataset (see the sketch below)
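To make the distinction concrete, here is a minimal PyTorch sketch (the function names are ours, not from any library) contrasting a differentiable soft Dice loss with the corresponding hard Dice metric:

```python
import torch

def soft_dice_loss(pred, target, eps=1e-7):
    # Differentiable: operates directly on predicted probabilities in
    # [0, 1], so gradients can flow through it during training.
    intersection = (pred * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def dice_metric(pred, target, threshold=0.5, eps=1e-7):
    # Not differentiable: the thresholding step has zero gradient almost
    # everywhere, which is fine for a metric but rules it out as a loss.
    pred_bin = (pred > threshold).float()
    intersection = (pred_bin * target).sum()
    return (2.0 * intersection + eps) / (pred_bin.sum() + target.sum() + eps)

# The soft loss is a close, differentiable proxy for the hard metric.
pred = torch.rand(1, 1, 64, 64, requires_grad=True)
target = (torch.rand(1, 1, 64, 64) > 0.5).float()
loss = soft_dice_loss(pred, target)
loss.backward()  # works, because the loss is differentiable
print(float(loss), float(dice_metric(pred.detach(), target)))
```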
10.2 What Metric to Pick?
- ideally: the metric reflects the time/cost needed to clean up predictions for a particular application
- this can mean different things, and sometimes there is no single number (see Cell Tracking Metrics below)
- Discuss tradeoffs in metric choice
- Interpretation of different metrics, with guidance on how to balance/trade off priorities
- E.g. segmentation for counting (segmentation as object detection) vs. segmentation for size estimation (see the sketch after this list)
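A toy example with synthetic masks (all values made up for illustration) shows how the same prediction can be nearly perfect for size estimation while failing badly for counting, and how a pixel-wise score like Dice hides the difference:

```python
import numpy as np
from scipy import ndimage

# Synthetic ground truth: two 10x10 objects (200 foreground pixels).
gt = np.zeros((64, 64), dtype=bool)
gt[10:20, 10:20] = True
gt[40:50, 40:50] = True

# Synthetic prediction: finds only one object, slightly too large
# (196 pixels), and misses the second object entirely.
pred = np.zeros_like(gt)
pred[8:22, 8:22] = True

# Counting view: 50% of the objects are missed.
print("count:", ndimage.label(gt)[1], "vs.", ndimage.label(pred)[1])  # 2 vs. 1

# Size view: total predicted area is off by only ~2%.
print("area:", int(gt.sum()), "vs.", int(pred.sum()))  # 200 vs. 196

# Pixel-wise Dice gives one middling number that explains neither failure.
dice = 2 * np.logical_and(gt, pred).sum() / (gt.sum() + pred.sum())
print(f"dice: {dice:.2f}")  # ~0.51
```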
10.3 Examples
Note: We are considering picking one example to place in callout boxes throughout the chapter to facilitate an ongoing discussion with a concrete example. For the examples, we will use real data/GT but synthetically generate the predictions, in order to better highlight specific issues.
10.3.1 Segmentation Metrics
- show with examples (see the sketch after this list):
- Dice
- Hausdorff distance
- AP_x (and the instance matching it requires)
- mention MetricsReloaded
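Minimal reference implementations of the three metrics might look like the sketch below. Note that “AP” is defined in several incompatible ways across communities; the TP/(TP+FP+FN) form used here is one common convention in bioimage instance segmentation, and for anything beyond illustration a vetted implementation such as MetricsReloaded is preferable to hand-rolled code:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(gt, pred):
    # Pixel-wise overlap between two binary masks (1 = perfect).
    inter = np.logical_and(gt, pred).sum()
    return 2 * inter / (gt.sum() + pred.sum())

def hausdorff(gt, pred):
    # Symmetric Hausdorff distance between the foreground pixel sets:
    # the worst-case distance from a pixel in one mask to the other
    # mask. Sensitive to boundary outliers that Dice barely notices.
    a, b = np.argwhere(gt), np.argwhere(pred)
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

def ap_at(gt_labels, pred_labels, thresh=0.5):
    # AP_x for instance segmentations: match predicted to ground-truth
    # instances by IoU >= thresh, then score TP / (TP + FP + FN). For
    # thresholds of 0.5 and above the matching is effectively
    # one-to-one, so counting matched GT instances suffices.
    gt_ids = [i for i in np.unique(gt_labels) if i != 0]
    pred_ids = [j for j in np.unique(pred_labels) if j != 0]
    iou = np.zeros((len(gt_ids), len(pred_ids)))
    for gi, g in enumerate(gt_ids):
        for pj, p in enumerate(pred_ids):
            g_mask, p_mask = gt_labels == g, pred_labels == p
            inter = np.logical_and(g_mask, p_mask).sum()
            union = np.logical_or(g_mask, p_mask).sum()
            iou[gi, pj] = inter / union
    tp = int((iou >= thresh).any(axis=1).sum())
    fp, fn = len(pred_ids) - tp, len(gt_ids) - tp
    return tp / (tp + fp + fn)
```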
10.3.2 Cell Tracking Metrics
- usually not a single number: topological correctness vs. positional correctness
- show traccuracy and discuss some of its metrics (see the sketch below)
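A sketch of evaluating a tracking result with traccuracy, following the quickstart in its documentation at the time of writing; the loader, matcher, and metric names below are taken from that quickstart and should be checked against the current release:

```python
import pprint

# All names below come from traccuracy's quickstart and may have
# changed in newer releases; treat this as a pattern, not a recipe.
from traccuracy import run_metrics
from traccuracy.loaders import load_ctc_data
from traccuracy.matchers import CTCMatcher
from traccuracy.metrics import CTCMetrics, DivisionMetrics

# Load ground truth and predicted tracks in Cell Tracking Challenge format.
gt_data = load_ctc_data("path/to/GT/TRA", "path/to/GT/TRA/man_track.txt")
pred_data = load_ctc_data("path/to/RES", "path/to/RES/res_track.txt")

# One matcher, several metric families: CTCMetrics covers the
# topology-oriented TRA/DET scores, DivisionMetrics scores division
# events separately. Deliberately, there is no single summary number.
results = run_metrics(
    gt_data=gt_data,
    pred_data=pred_data,
    matcher=CTCMatcher(),
    metrics=[CTCMetrics(), DivisionMetrics()],
)
pprint.pprint(results)
```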