Glossary
Aberration
The spreading of light (also called ‘wavefront distortion’) due to imperfections in the optical path or variations in refractive index at the sample, which results in images that are blurrier than the ideal diffraction-limited image we would expect were aberrations absent.
Activation Function
An activation function is a mathematical formula that calculates the output of a node. These functions are nonlinear, which allows the network to solve nontrivial problems. They are so named because the output of the function decides to what extent each node in the network is “activated,” an analogy to the way in which biological neurons can be activated.
Activations
The values of elements in neural network units, often specifically the values of the output of an activation function.
Adaptive Optics
Technology that senses distortions in the wavefront of light and cancels them, thereby suppressing optical aberrations to enhance image clarity.
Autoencoder
A deep learning architecture used to learn efficient coding of unlabeled data. An autoencoder learns two functions: an encoding function that transforms the input data, and a decoding function that recreates the input data from the encoded representation. It is an unsupervised learning version of the more general encoder-decoder architecture.
Auxiliary Variables
Auxiliary variables are variables added to a linear programming problem to simplify the mathematics, improve precision, or handle missing data. An example of an auxiliary variable is distance to the boundary. These or other shape descriptors can be used as part of a loss function.
Backpropagation
The method neural networks use to learn from their predictions. Once a prediction is made, it is compared with the ground truth through a training loss. The resulting error is then propagated backwards through the network to sequentially update the weights, rewarding the network for good predictions and penalizing it for bad ones.
Batch
A small group of data that is processed together at the same time. For example, when training a machine learning model, a batch is a group of data that is given to the model for learning. Batches are commonly used to make the processes more efficient.
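As a minimal sketch of the idea, the snippet below splits a toy dataset into batches. The helper name `iterate_batches` and the batch size of 4 are illustrative assumptions, not part of any particular library.

```python
def iterate_batches(dataset, batch_size):
    """Yield consecutive slices of the dataset, each with at most batch_size items."""
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]

# Toy dataset of 10 samples (hypothetical); the last batch is smaller than the rest.
dataset = list(range(10))
batches = list(iterate_batches(dataset, batch_size=4))
```

Processing every batch once corresponds to one pass over the whole dataset.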
Bayesian Optimization
A strategy that allows the optimization of black-box functions such as deep neural networks. It creates a surrogate model, which is a probabilistic representation of the objective function, using only a few example points.
Bias Vector
The bias vector is a set of values (one for each node in the layer) that is added to the weighted input of the layer before the activation function is calculated. Note that the use of “bias” here is mathematical and not related to any potential biases in the model’s decision making.
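The following sketch shows where the bias vector enters the calculation for one layer. The function name `dense_layer` and the example numbers are assumptions chosen for illustration; the activation function is deliberately left out so that only the weighted input plus bias is visible.

```python
def dense_layer(inputs, weights, bias):
    """Return the weighted input plus bias for each node (no activation applied)."""
    outputs = []
    for node_weights, node_bias in zip(weights, bias):
        weighted_sum = sum(w * x for w, x in zip(node_weights, inputs))
        outputs.append(weighted_sum + node_bias)  # one bias value per node
    return outputs

inputs = [1.0, 2.0]
weights = [[0.5, -1.0], [2.0, 0.0], [1.0, 1.0]]  # three nodes, two inputs each
bias = [0.5, -1.0, 0.0]                           # one entry per node
pre_activation = dense_layer(inputs, weights, bias)
```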
Binary Segmentation
A type of image segmentation where each pixel is classified into one of two categories—typically “foreground” (e.g., cell) or “background.” The output is a binary mask distinguishing objects (set to a value of 1) from their background (0).
Capacity
A model’s capacity reflects how much information a network can store. It corresponds to a network’s ability to model an arbitrary function.
CARE (Content-aware image restoration)
A deep learning-based method for image restoration that leverages content-specific features to enhance degraded images. See https://github.com/CSBDeep/CSBDeep for more information.
Channels
In natural color images there are often three channels (red, green, and blue). In microscopy images, the number of channels can refer to the number of fluorescent labels or image settings used while imaging. The term is used similarly in neural networks where channels refers to the number of layers in an image. These channels can come from convolutional operations, leading to images with hundreds of channels corresponding to each filter used.
Computer Vision
A field of computer science wherein computers extract information from images. It often involves object detection within images and can involve classification of the images and/or objects.
Convolution
A mathematical process where a kernel (small matrix) slides over input data (e.g., images) to compute feature maps, highlighting patterns like edges or textures.
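A minimal sketch of the sliding-kernel operation in plain Python follows. It assumes no padding and a stride of 1, and, as is conventional in deep learning, it does not flip the kernel (strictly speaking this is cross-correlation). The function name `convolve2d` and the toy image are illustrative.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and return the resulting feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel element-wise with the window and sum the result.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map

# Toy image with a vertical edge between the second and third columns.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [[-1, 1]]  # responds to left-to-right intensity increases
feature_map = convolve2d(image, edge_kernel)
```

The feature map is non-zero only where the window straddles the edge.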
Convolutional Neural Networks (CNNs)
A deep learning architecture that applies convolutions to automatically learn features from images for computer vision tasks like classification and detection.
Data Augmentation
A strategy to artificially increase the diversity of a dataset prior to training by applying transformations such as rotation, flipping, or brightness adjustment. It helps improve model robustness and generalisation.
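The three transformations named above can be sketched on a tiny image stored as nested lists. The function names and the 8-bit intensity range are assumptions for illustration; real pipelines typically rely on an array library.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]

def adjust_brightness(image, offset, max_value=255):
    """Add an offset to every pixel and clip to the range [0, max_value]."""
    return [[min(max(p + offset, 0), max_value) for p in row] for row in image]

image = [[10, 20], [30, 250]]
augmented = [
    flip_horizontal(image),
    rotate_90_clockwise(image),
    adjust_brightness(image, 10),
]
```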
Deconvolution
A mathematical process to partially reverse the blurring effect caused by the microscope’s PSF, increasing contrast and resolution over the raw image data if performed carefully.
Domain Randomization
Using simulations or synthetic training data, domain randomization applies random and exaggerated variations to background, lighting, shapes, or textures in the synthetic dataset. This strategy helps the model learn domain-invariant features and is usually used for pretraining a neural network or to enable simulation-to-real transfer.
Downsampling
Downsampling is a technique to reduce the sampling of input data. For an image, this involves decreasing the number of pixels, and thus decreasing the resolution of the image.
Effect Size
How “strong” a phenotype is, or how readily a given population can be distinguished mathematically from the control population.
Embedding Space
Embedding space is also known as a latent space or latent feature space. This space is a set of variables that describe items such that similar items are positioned closer together in the space.
Encoder-Decoder
An encoder-decoder architecture refers to a network with two parts. The “encoder” component of the network is a series of layers that take the input and map it to an abstract representation. The second “decoder” component of the network is trained to recreate input data from the abstract representation. This type of architecture learns an efficient way to represent the input data and is often used for dimensionality reduction.
Epoch
One complete pass through the entire training dataset during the training process.
F1 Score
A classification metric that gives the harmonic mean of precision (proportion of correct true positive predictions across all predicted positive cases) and recall (proportion of true positive predictions against the total positive cases). The harmonic mean is a method to balance both metrics equally. This metric was originally designed for binary classification but can be adapted to multiclass classification by calculating the F1 score per class.
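As a sketch for the binary case, the function below computes the score from counts of true positives, false positives, and false negatives. Returning 0.0 when a denominator is zero is a convention assumed here, and the counts in the example are made up.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, computed from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0  # assumed convention when there are no true positives
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: precision = 8/10 and recall = 8/16.
score = f1_score(tp=8, fp=2, fn=8)
```

Because the harmonic mean is pulled toward the smaller of the two values, the score here sits closer to the recall than a simple average would.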
False Negatives
In a scenario where you have two classes “positive” and “negative”, you try to predict cases as one of those classes. False negatives are the cases that you incorrectly predicted as negative and were really positive.
False Positives
In a scenario where you have two classes “positive” and “negative”, you try to predict cases as one of those classes. False positives are the cases that you incorrectly predicted as positive and were really negative.
FIJI
An image processing platform that comes bundled with many plugins for scientific image analysis. See https://imagej.net/software/fiji/ for more information.
Foundation Model
A foundation model is a model that was trained on a very large and very diverse dataset. Because of this diversity in training data, foundation models can be used for many downstream tasks or can be fine-tuned for specific purposes.
Frequency Domain
The representation of an image as a function of spatial frequency, obtained by transforming an image from the spatial domain using the Fourier transform.
Gaussian Process
A common surrogate model for optimization strategies such as Bayesian Optimization. A Gaussian Process is a non-parametric model of a conditional probability distribution. In the hyperparameter search scenario, the Gaussian Process models the probability of obtaining a given objective function value for a given set of hyperparameters.
Generative Adversarial Networks (GANs)
A deep learning architecture where two neural networks, a generator and a discriminator, are trained in an adversarial process, enabling the generator to create synthetic data, such as realistic images, by learning to deceive the discriminator.
Genetic Algorithms
An optimisation method inspired by the principles of natural selection and genetics. It starts with a population of solutions. These solutions are combined through a process called crossover to produce new solutions (offspring). During this process, random changes or mutations may occur to introduce diversity. After crossover and mutation, a selection step chooses the best solutions from both the parent and offspring populations to form the next generation. This cycle repeats for a set number of generations or until a predefined goal or stopping criterion is met.
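The cycle described above can be sketched on a toy problem. Everything specific here is an assumption for illustration: solutions are bit strings, fitness is the number of ones, crossover is single-point, and selection simply keeps the fittest individuals from parents and offspring combined.

```python
import random

def fitness(solution):
    """Toy objective (an assumption): the number of 1s in the bit string."""
    return sum(solution)

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(solution, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in solution]

def evolve(pop_size=20, length=16, generations=30, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            parent_a, parent_b = rng.sample(population, 2)
            child = crossover(parent_a, parent_b, rng)
            offspring.append(mutate(child, mutation_rate, rng))
        # Selection: keep the fittest from parents and offspring combined.
        combined = population + offspring
        population = sorted(combined, key=fitness, reverse=True)[:pop_size]
        best_per_generation.append(fitness(population[0]))
        if best_per_generation[-1] == length:  # stopping criterion reached
            break
    return population[0], best_per_generation

best, history = evolve()
```

Because parents compete with their offspring during selection, the best fitness in `history` can never decrease from one generation to the next.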
Gradient Descent
Gradient descent computes the gradient of the loss with respect to each weight. Moving the weights in the negative direction of the gradient reduces the loss for the given images or data points over which the loss is computed.
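A minimal sketch with a single weight illustrates the update. The quadratic loss, the starting value, and the learning rate are assumptions chosen so the behaviour is easy to follow; a real network has many weights and a loss computed from data.

```python
def loss(w):
    """Toy loss (an assumption) with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0
learning_rate = 0.1
losses = []
for step in range(50):
    losses.append(loss(w))
    # Move the weight in the negative direction of the gradient.
    w -= learning_rate * gradient(w)
```

With this step size the recorded loss shrinks at every iteration and `w` approaches 3.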
Ground Truth
Accurate data against which a model can be evaluated. Ground truth data is often manually annotated. The data type itself will vary depending on the task and evaluation. For example, instance segmentation may be compared to ground truth object counts or masks.
Hallucinations
Outputs from a model that do not have a basis in the input data and may contain false or misleading information.
Hyperparameters
The options you choose when training a machine learning model that affect the training process or the architecture of the model (e.g., learning rate, batch size, number of layers, training loss, etc.) are called hyperparameters. This term is used to differentiate them from the parameters (also known as weights) of the machine learning model.
Image Classification
A computer vision task where each image is associated with one class and the goal of this task is to correctly predict that class.
Image Restoration
The process of recovering clear, high-quality images from degraded raw data contaminated by blur, noise, or other distortions.
Instance Segmentation
A segmentation task that not only separates objects from the background but also distinguishes between individual objects of the same type (e.g., separating touching cells one by one).
IoU
“Intersection over Union”. A segmentation metric that calculates the area of overlap between two segmentation masks divided by the area of their union.
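As a sketch, Intersection over Union for two binary masks is the number of pixels set in both masks divided by the number set in either. The function name `iou`, the toy masks, and the choice to return 1.0 when both masks are empty are assumptions made for this example.

```python
def iou(mask_a, mask_b):
    """Intersection over Union of two binary masks of equal shape."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0

prediction = [[1, 1, 0],
              [1, 1, 0],
              [0, 0, 0]]
ground_truth = [[0, 1, 1],
                [0, 1, 1],
                [0, 0, 0]]
score = iou(prediction, ground_truth)  # 2 shared pixels, 6 pixels in the union
```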
Kernel
Convolutional kernels are also known as filters. They are small matrices that define the function for taking input pixels and creating an output image. They are often used for tasks like blurring or edge detection.
Linear Layer
Linear layers are also known as fully-connected layers or dense layers. They are a set of input nodes (or neurons) that are each connected to every output node of the layer.
Loss function
A loss function (or cost function) is a formula for quantifying how much a model’s prediction deviates from the actual ground truth value. The loss function returns a single scalar number to quantify the loss.
Manual Annotation
The process of manually labeling specific structures or objects in an image using drawing tools. Typically done in software like Fiji or Napari, this step is essential for creating ground truth data to train or evaluate machine learning models.
Metadata
Any data that provides additional information about other data. In bioimaging, examples include information about sample preparation, the imaging instrument, and image acquisition parameters.
Momentum
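Used alongside the learning rate in gradient-based training, momentum accumulates past gradients so that learning speeds up when the gradient is consistent over multiple iterations. This can also help training move past shallow local minima, although it does not guarantee that they are avoided.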
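The sketch below compares plain gradient descent with a momentum update on a toy one-weight problem. The quadratic loss, the momentum value of 0.9, and the function name `train` are assumptions for illustration; the update shown is the classical form in which a velocity term accumulates past gradients.

```python
def gradient(w):
    """Gradient of a toy loss (w - 3)^2, an assumption for this example."""
    return 2.0 * (w - 3.0)

def train(momentum, steps=20, learning_rate=0.01):
    w, velocity = 0.0, 0.0
    for _ in range(steps):
        # The velocity keeps a decaying memory of previous gradient steps.
        velocity = momentum * velocity - learning_rate * gradient(w)
        w += velocity
    return w

w_plain = train(momentum=0.0)     # ordinary gradient descent
w_momentum = train(momentum=0.9)  # consistent gradients build up speed
```

After the same number of steps the momentum run ends closer to the minimum at 3, although it overshoots along the way, which is the usual trade-off when the momentum value is high.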
Multilayer Perceptron
A sequence of fully-connected layers that are applied in series, each on the output of the previous one, is called a multilayer perceptron (MLP). This is the simplest example of a deep neural network.
N2N (Noise2Noise)
A supervised denoising method that trains a neural network on pairs of independently noisy images of the same scene, requiring no clean reference data but needing paired noisy inputs. See https://github.com/NVlabs/noise2noise for more information.
N2S (Noise2Self)
A self-supervised denoising method that trains a neural network assuming statistically independent noise across the image, requiring only single noisy images without paired clean data. See https://github.com/czbiohub-sf/noise2self for more information.
N2V (Noise2Void)
A self-supervised denoising method that trains a neural network to predict pixel values from noisy images by masking input pixels, requiring only single noisy images without paired clean data. See https://github.com/juglab/n2v for more information.
Natural Image
Natural images are images of the environment. This most commonly refers to color photographs of people, nature, objects, etc. Many machine learning researchers focus on natural images for tasks like self-driving cars, but microscopy images pose additional challenges for AI.
Network Architecture
The architecture of a machine learning model refers to the design and structure of the model, including choices in how individual components (such as preprocessing or feature extraction) are connected to each other.
Neural Network
Neural networks are a type of computational model that were inspired by the way that biological neural networks are structured.
Neural Network Unit
A neural unit is the processing element of a neural network. It is also sometimes called a node or artificial neuron.
Nonlinear
If a system or function is nonlinear, changes in its output are not proportional to the changes that are made to the input. Nonlinear functions are incorporated into machine learning networks to allow for the solution of complicated problems because they allow for nonlinear decision boundaries.
Nonlinear Problem
A mathematical problem where the governing equations or operations are nonlinear, meaning outputs are not linearly proportional to inputs.
Object Detection
A computer vision task that identifies and locates individual objects within an image, typically by drawing bounding boxes around them. It provides both the category (what) and position (where) of each object.
Optimizer
An optimizer is a function that is used to adjust model parameters during training. By adjusting weights and biases, the optimizer attempts to minimize the loss function and thus make the model more accurate.
Padding
Padding involves adding extra layers of pixels around an image to avoid edge effects. Padding can be performed by simply setting the extra pixels to zero, or it can take into account the image itself (such as in mirror padding or replicate padding).
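The three padding styles mentioned above can be sketched on a single row of pixels; for a 2D image the same operation is applied along each axis. The function name `pad_1d` and its `mode` strings are illustrative assumptions, and the mirror mode here reflects about the edge pixel without repeating it, which requires the pad width to be smaller than the row length.

```python
def pad_1d(values, width, mode="zero"):
    """Pad a row of pixels on both sides by `width` elements."""
    if mode == "zero":
        left = [0] * width
        right = [0] * width
    elif mode == "replicate":
        # Repeat the edge pixel.
        left = [values[0]] * width
        right = [values[-1]] * width
    elif mode == "mirror":
        # Reflect about the edge pixel (the edge itself is not duplicated).
        left = values[width:0:-1]
        right = values[-2:-width - 2:-1]
    else:
        raise ValueError(f"unknown padding mode: {mode}")
    return left + values + right

row = [1, 2, 3, 4]
zero_padded = pad_1d(row, 2, "zero")
replicate_padded = pad_1d(row, 2, "replicate")
mirror_padded = pad_1d(row, 2, "mirror")
```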
Panoptic Segmentation
A computer vision technique that is a combination of semantic segmentation and instance segmentation. It separates an image into regions while also detecting individual object instances within those regions.
Patches
In image processing, a patch refers to a small region or subset of pixels that are extracted from the larger image.
Perceptron
Perceptrons are linear weighted sums of inputs followed by a nonlinear activation function. They are used for binary classification, and can be combined to create more complex network architectures. Perceptrons are a specific type of artificial neuron or node.
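A minimal sketch of a single perceptron used as a binary classifier follows. The step activation and the hand-picked weights, which make the unit behave like a logical AND, are assumptions for illustration; in practice the weights would be learned from data.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs followed by a step activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > 0 else 0

# Hand-picked parameters (hypothetical): the output is 1 only if both inputs are 1.
and_weights, and_bias = [1.0, 1.0], -1.5
predictions = [perceptron([a, b], and_weights, and_bias)
               for a in (0, 1) for b in (0, 1)]
```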
Pixel Classifiers
Machine learning models that classify each pixel in an image based on features such as intensity, texture, or local neighborhood. Commonly used in traditional workflows for segmentation or classification tasks.
Point Spread Function (PSF)
A mathematical function that describes how an imaging system blurs a point source.
Pooling Layer
A pooling layer aggregates information from many vectors into fewer vectors, i.e., it downsamples the image. The pooling layer does this by sliding a window across the image and applying a maximum or average operation within each window. This removes redundant information and reduces the size of the image for the following calculations.
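As a sketch, the function below pools a small image with non-overlapping windows. The name `pool2d`, the window size of 2, and the toy image are assumptions; the `reduce` argument switches between maximum and average pooling.

```python
def pool2d(image, size=2, reduce=max):
    """Apply `reduce` to each non-overlapping size x size window."""
    pooled = []
    for i in range(0, len(image) - size + 1, size):
        row = []
        for j in range(0, len(image[0]) - size + 1, size):
            window = [image[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(reduce(window))
        pooled.append(row)
    return pooled

def average(window):
    return sum(window) / len(window)

image = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
max_pooled = pool2d(image, size=2, reduce=max)
avg_pooled = pool2d(image, size=2, reduce=average)
```

Both results are 2 x 2, a quarter of the original number of pixels.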
Probability Vector
A probability vector is composed of values that indicate the probability of a particular variable (e.g., the probability that an image is a dog or cat). The values of a probability vector sum to one.
Quality Control Metric
Any metric that can be used to evaluate quality. It will vary depending on the task and data type. It can be binary (e.g. an image doesn’t have debris) or continuous (e.g. annotated object centroids are within 5 pixels of the ground truth centroids).
RCAN (residual channel attention network)
A deep learning-based method using residual learning and channel attention to improve image restoration tasks. See https://github.com/AiviaCommunity/3D-RCAN for more information.
Receptive Field
The receptive field size of a convolutional layer is the total spatial extent of pixels that influence the activations of the layer. The receptive field of the entire network describes the size of the region in the input image that influences the output image.
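For a stack of layers, the receptive field along one image axis can be computed with a simple recurrence. The sketch below assumes layers without dilation, and the `(kernel_size, stride)` tuple format is a hypothetical input convention chosen for this example.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of a stack of layers.

    `layers` is a list of (kernel_size, stride) pairs ordered from input to output.
    """
    size = 1  # with no layers, one output unit corresponds to one input pixel
    jump = 1  # spacing, in input pixels, between neighbouring units of the current layer
    for kernel_size, stride in layers:
        size += (kernel_size - 1) * jump
        jump *= stride
    return size

three_convs = receptive_field([(3, 1), (3, 1), (3, 1)])
conv_pool_conv = receptive_field([(3, 1), (2, 2), (3, 1)])
```

Strided or pooling layers increase the spacing between units, so each later kernel covers a wider region of the input.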
Regularization
Regularization adds a penalty term to the loss function to prevent overfitting. This term discourages overly complex models.
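One common choice of penalty is the sum of squared weights (L2 regularization), used here as an assumption because the entry does not name a specific penalty. The function names, the mean squared error loss, and the numbers are likewise illustrative.

```python
def mse(predictions, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def l2_penalty(weights, strength):
    """Penalty that grows with the squared magnitude of the weights."""
    return strength * sum(w * w for w in weights)

def regularized_loss(predictions, targets, weights, strength=0.01):
    return mse(predictions, targets) + l2_penalty(weights, strength)

predictions, targets = [1.0, 2.0], [1.0, 3.0]
loss_large_weights = regularized_loss(predictions, targets, weights=[3.0, -4.0])
loss_small_weights = regularized_loss(predictions, targets, weights=[0.3, -0.4])
```

For identical predictions, the model with larger weights receives the higher loss, which is how the penalty discourages overly complex solutions.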
ReLU
An activation function common in deep learning that outputs the input directly if it is positive, and outputs zero otherwise. That is, it is the ramp function where it equals x if x>0 and 0 if x<=0. This characteristic helps introduce non-linearity into the model and mitigate the vanishing gradient problem. ReLU stands for rectified linear unit.
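The definition translates directly into a short function; the name `relu` and the sample inputs are illustrative.

```python
def relu(x):
    """Return x if x is positive, otherwise zero."""
    return x if x > 0 else 0.0

outputs = [relu(x) for x in (-2.0, 0.0, 2.5)]
```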
Self-attention
Attention allows a model to determine the importance of each component of an input sequence. Self-attention is a type of attention mechanism that is commonly used in transformer architectures.
Self-supervised learning
A deep learning method where models generate their own supervisory signals from unlabeled data, often by using pretext tasks, to learn useful representations that can be applied to various downstream tasks.
Semantic Segmentation
A form of segmentation where each pixel in an image is assigned to a class (e.g., nucleus, cytoplasm, background), but it does not distinguish between separate instances of the same class.
Sigmoid Function
An activation function common in deep learning that non-linearly maps real inputs to outputs between 0 and 1, being most sensitive to changes in inputs around zero and increasingly compressing extreme positive or negative inputs as they approach 1 or 0 respectively; this characteristic enables it to model probabilities for binary classification and introduce smooth non-linearity.
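A minimal sketch of the function follows. Splitting the calculation by the sign of the input is an implementation choice assumed here to avoid numerical overflow for large negative inputs; the name `sigmoid` is illustrative.

```python
import math

def sigmoid(x):
    """Map a real number to the interval between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use an equivalent form so that exp() never overflows.
    z = math.exp(x)
    return z / (1.0 + z)

midpoint = sigmoid(0.0)
near_one = sigmoid(10.0)
near_zero = sigmoid(-10.0)
```

Inputs near zero produce outputs near 0.5, while large positive or negative inputs are compressed toward 1 or 0.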
Skip Connection
Skip connections are also sometimes known as residual connections. These connections are made when the output of a layer of the model is added to or concatenated with a later layer in the network while bypassing any layers that may be in between. For example, the skip connections in a U-Net architecture link encoder layers to decoder layers of matching image resolution, bypassing the lower-resolution layers in between.
Spatial Domain
The representation of an image as a function of spatial coordinates.
Star-convex Polygon
A geometric shape used in segmentation algorithms like StarDist. Imagine drawing straight lines (rays) from the centre of an object out toward its edges—if you can see the edge from the centre in all directions, the object is considered star-convex. This method works well for blob-like structures such as nuclei, because their general shape can be captured by measuring how far each ray travels from the centre to the boundary.
Stride
When performing a convolution or cross-correlation, the kernel (aka the filter) moves across the entire image. When the stride is 1, this corresponds to moving the filter across every pixel in the image. For larger values of the stride, the kernel is moved more than one element at a time. For example, a stride of 2 would skip every other pixel in the image.
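The effect of the stride is easiest to see in one dimension. The sketch below is a cross-correlation without padding; the function name `correlate_1d` and the toy signal are assumptions for illustration.

```python
def correlate_1d(signal, kernel, stride=1):
    """Slide the kernel along the signal, advancing `stride` elements per step."""
    k = len(kernel)
    return [
        sum(signal[start + i] * kernel[i] for i in range(k))
        for start in range(0, len(signal) - k + 1, stride)
    ]

signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]
stride_1 = correlate_1d(signal, kernel, stride=1)
stride_2 = correlate_1d(signal, kernel, stride=2)
```

Without padding, the output length is floor((n - k) / stride) + 1 for a signal of length n and a kernel of length k, so a larger stride yields a shorter output.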
Supervised learning
A deep learning method where models learn from labeled data (input-output pairs), enabling them to learn a mapping function for making predictions or decisions on unseen inputs.
Training Data
Data used to train an algorithm to make predictions.
Transfer Learning
A deep learning technique that reuses a model pre-trained on one task as the starting point for a new, related task, leveraging its learned knowledge to improve performance or reduce training requirements. In practice, part of a pretrained neural network (usually the initial layers, responsible for feature extraction) is frozen and reused in a new model. These frozen layers, with the knowledge from a previous dataset, are combined with untrained layers tailored for a specific bioimaging task. During training, only the new layers will be updated, allowing the model to adapt to the new task with limited data.
Transformer Models
A deep learning architecture based on the multi-head attention mechanism; specifically referring to the ‘vision transformer’ architecture. A vision transformer (ViT) is a transformer designed for computer vision. A ViT decomposes an input image into a series of patches (rather than text into tokens), serializes each patch into a vector, and maps it to a smaller dimension with a single matrix multiplication. These vector embeddings are then processed by a transformer encoder as if they were token embeddings.
True Negatives
In a scenario where you have two classes “positive” and “negative”, you try to predict cases as one of those classes. True negatives are the cases that you predicted as negative and were really negative.
True Positives
In a scenario where you have two classes “positive” and “negative”, you try to predict cases as one of those classes. True positives are the cases that you predicted as positive and were really positive.
Upsampling
Upsampling is a technique to increase the sampling of input data. For an image, this involves increasing the number of pixels and interpolating values for the newly sampled pixels.
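The simplest way to fill in the new pixels is nearest-neighbour interpolation, where each original pixel is repeated. This choice, the function name `upsample_nearest`, and the factor of 2 are assumptions for the sketch; smoother schemes such as bilinear interpolation are also common.

```python
def upsample_nearest(image, factor=2):
    """Enlarge an image by repeating every pixel `factor` times along each axis."""
    upsampled = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        for _ in range(factor):
            upsampled.append(list(wide_row))
    return upsampled

image = [[1, 2],
         [3, 4]]
enlarged = upsample_nearest(image, factor=2)
```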
Virtual Machine
On a physical computer, you install an operating system (e.g., Windows or Ubuntu) that you interact with. A virtual machine is a program that simulates a complete computer with its own operating system. This lets you run a “computer inside your computer” (e.g., using Linux inside Windows or the other way around). As this simulated computer is separate from your physical one, it adds an extra layer of security, because unless the user specifically allows it, the virtual machine cannot access or connect to your real computer.
Zernike Modes
A set of orthogonal polynomials used to describe and correct wavefront aberrations in optical systems.