Loss Functions

Understanding the objectives that drive SOEN learning

Loss Functions for SOEN Training

Different tasks require different loss functions to guide the neural network training process. SOEN supports various loss functions tailored to specific learning objectives, from standard classification to margin-based learning that encourages robust decision boundaries.

Why Different Loss Functions Matter

The choice of loss function fundamentally shapes how your SOEN model learns:

  • Classification tasks: Need to assign inputs to discrete categories
  • Margin-based learning: Requires robust separation between classes
  • Sequence modeling: Demands temporal consistency and next-token prediction
  • Regularization: Controls model complexity and prevents overfitting

Each loss function encodes different inductive biases and learning objectives into the training process.

Cross Entropy Loss

Cross entropy is the standard loss function for multi-class classification problems. It measures the difference between predicted probability distributions and the true class labels.

Mathematical Formulation

For a single sample with true class \(y\) and predicted probabilities \(\hat{p}_i\):

$$\mathcal{L}_{CE} = -\log(\hat{p}_y) = -\log\left(\frac{e^{z_y}}{\sum_{j=1}^C e^{z_j}}\right)$$

For a batch of \(N\) samples:

$$\mathcal{L}_{CE} = -\frac{1}{N} \sum_{i=1}^N \log(\hat{p}_{y_i})$$

Where:

  • \(z_j\): Raw logit for class \(j\)
  • \(C\): Total number of classes
  • \(y_i\): True class label for sample \(i\)
  • \(\hat{p}_{y_i}\): Predicted probability for the true class
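
To make the formulas concrete, here is a minimal stand-alone NumPy sketch of batched cross entropy (illustrative only, not SOEN's internal implementation); the row-wise max shift is the usual log-sum-exp trick for numerical stability:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross entropy for a batch.

    logits: (N, C) array of raw class scores z
    labels: (N,) array of integer class indices y
    """
    # Log-sum-exp shift: subtracting the row-wise max keeps exp() from overflowing.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the true class, averaged over the batch.
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 1.5,  0.3]])
labels = np.array([0, 1])
print(cross_entropy(logits, labels))   # ~0.34
```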

How Cross Entropy Works

Probability Interpretation: Cross entropy treats the model output as a probability distribution over classes. The softmax function converts raw logits into probabilities:

$$\hat{p}_j = \frac{e^{z_j}}{\sum_{k=1}^C e^{z_k}}$$

Loss Behavior:

  • When \(\hat{p}_y \to 1\) (confident correct prediction): \(\mathcal{L}_{CE} \to 0\)
  • When \(\hat{p}_y \to 0\) (confident wrong prediction): \(\mathcal{L}_{CE} \to \infty\)
  • The loss falls off steeply as confidence in the correct class increases
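
For example, a confident correct prediction with \(\hat{p}_y = 0.9\) contributes \(-\log 0.9 \approx 0.105\), a hesitant \(\hat{p}_y = 0.5\) contributes \(\approx 0.693\), and a confident mistake with \(\hat{p}_y = 0.1\) contributes \(-\log 0.1 \approx 2.303\), more than twenty times the loss of the confident correct case.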

Visual Intuition: Imagine the loss surface as a steep valley that guides the model toward high confidence in the correct class. The exponential nature creates strong gradients when predictions are wrong, providing clear learning signals.

When to Use Cross Entropy

Ideal for:

  • Standard multi-class classification
  • Tasks where you want probabilistic outputs
  • Balanced datasets with clear class boundaries
  • When interpretability of class probabilities matters

Limitations:

  • Doesn't explicitly enforce margins between classes
  • Can be sensitive to outliers and label noise
  • May lead to overconfident predictions
  • Doesn't directly optimize for robust decision boundaries

Gap Loss (Margin-Based Loss)

Gap loss is a margin-based objective that explicitly encourages separation between the correct class and incorrect classes. Unlike cross entropy, it focuses on the relative ordering of class scores rather than their absolute probabilities.

Mathematical Formulation

For a sample \(i\) with true class \(y_i\) and logits \(z_j\):

$$\mathcal{L}_{\text{gap}} = \frac{1}{N} \sum_{i=1}^N \sum_{j \neq y_i} \max(0, z_{i,j} - z_{i,y_i} + m)$$

Where:

  • \(m\): Margin parameter (typically 0.2-1.0)
  • \(z_{i,y_i}\): Logit for the true class of sample \(i\)
  • \(z_{i,j}\): Logit for incorrect class \(j\) of sample \(i\)
  • \(C\): Number of classes
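
As with cross entropy above, a minimal NumPy sketch (illustrative only, not SOEN's internal implementation) makes the hinge structure explicit:

```python
import numpy as np

def gap_loss(logits, labels, margin=0.5):
    """Mean margin (gap) loss for a batch.

    logits: (N, C) raw class scores
    labels: (N,) integer class indices
    margin: required gap m between the true class and every other class
    """
    n = len(labels)
    true_scores = logits[np.arange(n), labels][:, None]        # z_{i, y_i}
    # Hinge penalty for every incorrect class that sits inside the margin of the true class.
    penalties = np.maximum(0.0, logits - true_scores + margin)
    penalties[np.arange(n), labels] = 0.0                      # exclude j = y_i
    return penalties.sum(axis=1).mean()

logits = np.array([[2.0, 1.7, -1.0],
                   [0.1, 1.5,  1.3]])
labels = np.array([0, 1])
print(gap_loss(logits, labels))   # 0.25
```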

How Gap Loss Works

Margin Enforcement: Gap loss penalizes incorrect classes that are within a margin \(m\) of the correct class score. The loss is zero when the correct class score exceeds all incorrect class scores by at least the margin.

Hinge-like Behavior:

  • When \(z_y - z_j \ge m\) (good separation): the loss contribution is 0
  • When \(z_y - z_j < m\) (insufficient separation): a linear penalty equal to the shortfall \(m - (z_y - z_j)\)
  • The \(\max(0, \cdot)\) creates a "hinge" that activates only when margin is violated

Geometric Interpretation: Gap loss carves out a "safety zone" of width \(m\) around the correct class decision boundary. Any incorrect class that ventures into this zone incurs a penalty.
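
As a worked example with three classes and margin \(m = 0.5\): if the true-class logit is \(2.0\) and the competing logits are \(1.8\) and \(0.1\), the first competitor violates the margin by \(\max(0, 1.8 - 2.0 + 0.5) = 0.3\) while the second does not (\(\max(0, 0.1 - 2.0 + 0.5) = 0\)), so the sample contributes a loss of \(0.3\).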

Gap Loss vs Cross Entropy

| Aspect | Cross Entropy | Gap Loss |
| --- | --- | --- |
| Focus | Probability calibration | Decision boundary margins |
| Output | Well-calibrated probabilities | Robust class separation |
| Gradient | Exponential (soft) | Linear (hard) |
| Outliers | Sensitive to outliers | More robust to outliers |
| Overconfidence | Can promote overconfidence | Encourages modest confidence |

When to Use Gap Loss

Ideal for:

  • Tasks requiring robust classification under state readout noise
  • When decision boundary quality matters more than probability calibration

Considerations:

  • Doesn't provide well-calibrated probabilities
  • Requires tuning the margin parameter \(m\)

Hyperparameter Tuning

Margin (\(m\)) Selection:

  • Small margins (0.1-0.3): Gentle separation, faster convergence
  • Medium margins (0.5-1.0): Balanced robustness and trainability
  • Large margins (1.0+): Strong separation, may slow convergence

Rule of thumb: Start with \(m = 0.5\) and adjust based on validation performance and desired robustness level.

Advanced Loss Combinations

In practice, you often combine multiple loss functions to achieve specific learning objectives by defining multiple loss components in the training configuration. Each component is computed independently, weighted by a factor you provide, and the weighted terms are summed; a minimal sketch of such a combination appears after the list below.

Multi-Objective Training

$$\mathcal{L}_{\text{total}} = w_1 \mathcal{L}_1 + w_2 \mathcal{L}_2 + \dots + w_n \mathcal{L}_n$$

Where the weights \(w_i\) balance different objectives, such as:

  • Classification accuracy (e.g., cross entropy)
  • Decision boundary robustness (e.g., gap loss)
  • Model complexity (e.g., regularization losses like reg_J_loss)
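
As a minimal sketch of this weighted sum (reusing the cross_entropy and gap_loss sketches above; the weights, the connection_weights argument, and the plain L2 term standing in for a regularizer such as reg_J_loss are illustrative assumptions, not SOEN's actual API):

```python
# Hypothetical objective weights; in SOEN these come from the training configuration.
w_ce, w_gap, w_reg = 1.0, 0.5, 1e-4

def total_loss(logits, labels, connection_weights, margin=0.5):
    """Weighted sum of independently computed loss components."""
    l_ce  = cross_entropy(logits, labels)            # classification accuracy
    l_gap = gap_loss(logits, labels, margin=margin)  # decision-boundary robustness
    l_reg = (connection_weights ** 2).sum()          # simple L2 stand-in, not the actual reg_J_loss
    return w_ce * l_ce + w_gap * l_gap + w_reg * l_reg
```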

SOEN-Specific Considerations

Temporal Dynamics: SOEN models evolve in time, so loss functions can be applied in several ways (a sketch follows the list below):

  • At final timestep (static classification)
  • Across multiple timesteps (temporal consistency)
  • With different weights over time (curriculum learning)
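
For example, with per-timestep readout logits stacked as a (T, N, C) array, the three options differ only in how the per-timestep losses are weighted. A minimal sketch reusing the cross_entropy helper from above (the function and its arguments are illustrative, not SOEN's API):

```python
import numpy as np

def temporal_loss(logits_t, labels, time_weights=None):
    """Apply a classification loss over a sequence of timesteps.

    logits_t:     (T, N, C) readout logits at each timestep
    labels:       (N,) integer target classes
    time_weights: optional length-T weights; None means final-timestep only
    """
    if time_weights is None:
        # Static classification: score only the final state of the network.
        return cross_entropy(logits_t[-1], labels)
    # Temporal consistency / curriculum: weight each timestep's loss.
    losses = np.array([cross_entropy(z, labels) for z in logits_t])
    w = np.asarray(time_weights, dtype=float)
    return (w * losses).sum() / w.sum()
```

Uniform weights encourage temporal consistency, while weights that ramp up over time emphasize late readouts, in the spirit of curriculum learning.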

Physical Constraints: SOEN circuit parameters may benefit from:

  • Connection weight regularization
  • State magnitude penalties
  • Dynamic range constraints
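
As an illustration of the last two items, a state-magnitude penalty that discourages values outside a target dynamic range might look like the following sketch (the name and parameters are hypothetical, not part of SOEN's API):

```python
import numpy as np

def state_magnitude_penalty(states, max_abs=1.0, weight=1e-3):
    """Quadratic penalty on state values that exceed the target dynamic range."""
    excess = np.maximum(0.0, np.abs(states) - max_abs)
    return weight * (excess ** 2).mean()
```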

See the Training Configuration page for details on how to combine multiple loss functions in your SOEN experiments.

🔧 Implementation in SOEN

Cross Entropy Configuration:

loss_function: "cross_entropy"
loss_params:   # cross entropy needs no additional parameters

Gap Loss Configuration:

loss_function: "gap_loss"  
loss_params:
  margin: 0.5

Related Topics