Loss Functions

Understanding the objectives that drive SOEN learning

Loss Functions for SOEN Training

Different tasks require different loss functions to guide the neural network training process. SOEN supports various loss functions tailored to specific learning objectives, from standard classification to margin-based learning that encourages robust decision boundaries.

Why Different Loss Functions Matter

The choice of loss function fundamentally shapes how your SOEN model learns:

  • Classification tasks: Need to assign inputs to discrete categories
  • Margin-based learning: Requires robust separation between classes
  • Sequence modeling: Demands temporal consistency and next-token prediction
  • Regularization: Controls model complexity and prevents overfitting

Each loss function encodes different inductive biases and learning objectives into the training process.

Cross Entropy Loss

Cross entropy is the standard loss function for multi-class classification problems. It measures the difference between predicted probability distributions and the true class labels.

Mathematical Formulation

For a single sample with true class \(y\) and predicted probabilities \(\hat{p}_i\):

$$\mathcal{L}_{CE} = -\log(\hat{p}_y) = -\log\left(\frac{e^{z_y}}{\sum_{j=1}^C e^{z_j}}\right)$$

For a batch of \(N\) samples:

$$\mathcal{L}_{CE} = -\frac{1}{N} \sum_{i=1}^N \log(\hat{p}_{y_i})$$

Where:

  • \(z_j\): Raw logit for class \(j\)
  • \(C\): Total number of classes
  • \(y_i\): True class label for sample \(i\)
  • \(\hat{p}_{y_i}\): Predicted probability for the true class
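
To make the formulas concrete, here is a minimal stand-alone NumPy sketch of batched cross entropy (illustrative only, not SOEN's internal implementation); the row-wise max shift is the usual log-sum-exp trick for numerical stability:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross entropy for a batch.

    logits: (N, C) array of raw class scores z
    labels: (N,) array of integer class indices y
    """
    # Log-sum-exp shift: subtracting the row-wise max keeps exp() from overflowing.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the true class, averaged over the batch.
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 1.5,  0.3]])
labels = np.array([0, 1])
print(cross_entropy(logits, labels))   # ~0.34
```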

How Cross Entropy Works

Probability Interpretation: Cross entropy treats the model output as a probability distribution over classes. The softmax function converts raw logits into probabilities:

$$\hat{p}_j = \frac{e^{z_j}}{\sum_{k=1}^C e^{z_k}}$$

Loss Behavior:

  • When \(\hat{p}_y \to 1\) (confident correct prediction): \(\mathcal{L}_{CE} \to 0\)
  • When \(\hat{p}_y \to 0\) (confident wrong prediction): \(\mathcal{L}_{CE} \to \infty\)
  • The loss falls off steeply as confidence in the correct class increases
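
For example, a confident correct prediction with \(\hat{p}_y = 0.9\) contributes \(-\log 0.9 \approx 0.105\), a hesitant \(\hat{p}_y = 0.5\) contributes \(\approx 0.693\), and a confident mistake with \(\hat{p}_y = 0.1\) contributes \(-\log 0.1 \approx 2.303\), more than twenty times the loss of the confident correct case.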

Visual Intuition: Imagine the loss surface as a steep valley that guides the model toward high confidence in the correct class. The exponential nature creates strong gradients when predictions are wrong, providing clear learning signals.

When to Use Cross Entropy

Ideal for:

  • Standard multi-class classification
  • Tasks where you want probabilistic outputs
  • Balanced datasets with clear class boundaries
  • When interpretability of class probabilities matters

Limitations:

  • Doesn't explicitly enforce margins between classes
  • Can be sensitive to outliers and label noise
  • May lead to overconfident predictions
  • Doesn't directly optimize for robust decision boundaries

Gap Loss (Margin-Based Loss)

Gap loss is a margin-based objective that explicitly encourages separation between the correct class and incorrect classes. Unlike cross entropy, it focuses on the relative ordering of class scores rather than their absolute probabilities.

Mathematical Formulation

For a sample \(i\) with true class \(y_i\) and logits \(z_j\):

$$\mathcal{L}_{\text{gap}} = \frac{1}{N} \sum_{i=1}^N \sum_{j \neq y_i} \max(0, z_{i,j} - z_{i,y_i} + m)$$

Where:

  • \(m\): Margin parameter (typically 0.2-1.0)
  • \(z_{i,y_i}\): Logit for the true class of sample \(i\)
  • \(z_{i,j}\): Logit for incorrect class \(j\) of sample \(i\)
  • \(C\): Number of classes
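
As with cross entropy above, a minimal NumPy sketch (illustrative only, not SOEN's internal implementation) makes the hinge structure explicit:

```python
import numpy as np

def gap_loss(logits, labels, margin=0.5):
    """Mean margin (gap) loss for a batch.

    logits: (N, C) raw class scores
    labels: (N,) integer class indices
    margin: required gap m between the true class and every other class
    """
    n = len(labels)
    true_scores = logits[np.arange(n), labels][:, None]        # z_{i, y_i}
    # Hinge penalty for every incorrect class that sits inside the margin of the true class.
    penalties = np.maximum(0.0, logits - true_scores + margin)
    penalties[np.arange(n), labels] = 0.0                      # exclude j = y_i
    return penalties.sum(axis=1).mean()

logits = np.array([[2.0, 1.7, -1.0],
                   [0.1, 1.5,  1.3]])
labels = np.array([0, 1])
print(gap_loss(logits, labels))   # 0.25
```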

How Gap Loss Works

Margin Enforcement: Gap loss penalizes incorrect classes that are within a margin \(m\) of the correct class score. The loss is zero when the correct class score exceeds all incorrect class scores by at least the margin.

Hinge-like Behavior:

  • When \(z_y - z_j \ge m\) (good separation): the loss contribution is 0
  • When \(z_y - z_j < m\) (insufficient separation): a linear penalty equal to the shortfall \(m - (z_y - z_j)\)
  • The \(\max(0, \cdot)\) creates a "hinge" that activates only when margin is violated

Geometric Interpretation: Gap loss carves out a "safety zone" of width \(m\) around the correct class decision boundary. Any incorrect class that ventures into this zone incurs a penalty.
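
As a worked example with three classes and margin \(m = 0.5\): if the true-class logit is \(2.0\) and the competing logits are \(1.8\) and \(0.1\), the first competitor violates the margin by \(\max(0, 1.8 - 2.0 + 0.5) = 0.3\) while the second does not (\(\max(0, 0.1 - 2.0 + 0.5) = 0\)), so the sample contributes a loss of \(0.3\).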

Gap Loss vs Cross Entropy

| Aspect | Cross Entropy | Gap Loss |
| --- | --- | --- |
| Focus | Probability calibration | Decision boundary margins |
| Output | Well-calibrated probabilities | Robust class separation |
| Gradient | Exponential (soft) | Linear (hard) |
| Outliers | Sensitive to outliers | More robust to outliers |
| Overconfidence | Can promote overconfidence | Encourages modest confidence |

When to Use Gap Loss

Ideal for:

  • Tasks requiring robust classification under state readout noise
  • When decision boundary quality matters more than probability calibration

Considerations:

  • Doesn't provide well-calibrated probabilities
  • Requires tuning the margin parameter \(m\)

Hyperparameter Tuning

Margin (\(m\)) Selection:

  • Small margins (0.1-0.3): Gentle separation, faster convergence
  • Medium margins (0.5-1.0): Balanced robustness and trainability
  • Large margins (1.0+): Strong separation, may slow convergence

Rule of thumb: Start with \(m = 0.5\) and adjust based on validation performance and desired robustness level.

Advanced Loss Combinations

In practice, you often combine multiple loss functions to achieve specific learning objectives by defining multiple loss components in the training configuration. Each component is computed independently, weighted by a factor you provide, and the weighted terms are summed; a minimal sketch of such a combination appears after the list below.

Multi-Objective Training

$$\mathcal{L}_{\text{total}} = w_1 \mathcal{L}_1 + w_2 \mathcal{L}_2 + \dots + w_n \mathcal{L}_n$$

Where the weights \(w_i\) balance different objectives, such as:

  • Classification accuracy (e.g., cross entropy)
  • Decision boundary robustness (e.g., gap loss)
  • Model complexity (e.g., regularization losses like reg_J_loss)
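
As a minimal sketch of this weighted sum (reusing the cross_entropy and gap_loss sketches above; the weights, the connection_weights argument, and the plain L2 term standing in for a regularizer such as reg_J_loss are illustrative assumptions, not SOEN's actual API):

```python
# Hypothetical objective weights; in SOEN these come from the training configuration.
w_ce, w_gap, w_reg = 1.0, 0.5, 1e-4

def total_loss(logits, labels, connection_weights, margin=0.5):
    """Weighted sum of independently computed loss components."""
    l_ce  = cross_entropy(logits, labels)            # classification accuracy
    l_gap = gap_loss(logits, labels, margin=margin)  # decision-boundary robustness
    l_reg = (connection_weights ** 2).sum()          # simple L2 stand-in, not the actual reg_J_loss
    return w_ce * l_ce + w_gap * l_gap + w_reg * l_reg
```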

SOEN-Specific Considerations

Temporal Dynamics: SOEN models evolve in time, so loss functions can be applied in several ways (a sketch follows the list below):

  • At final timestep (static classification)
  • Across multiple timesteps (temporal consistency)
  • With different weights over time (curriculum learning)
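
For example, with per-timestep readout logits stacked as a (T, N, C) array, the three options differ only in how the per-timestep losses are weighted. A minimal sketch reusing the cross_entropy helper from above (the function and its arguments are illustrative, not SOEN's API):

```python
import numpy as np

def temporal_loss(logits_t, labels, time_weights=None):
    """Apply a classification loss over a sequence of timesteps.

    logits_t:     (T, N, C) readout logits at each timestep
    labels:       (N,) integer target classes
    time_weights: optional length-T weights; None means final-timestep only
    """
    if time_weights is None:
        # Static classification: score only the final state of the network.
        return cross_entropy(logits_t[-1], labels)
    # Temporal consistency / curriculum: weight each timestep's loss.
    losses = np.array([cross_entropy(z, labels) for z in logits_t])
    w = np.asarray(time_weights, dtype=float)
    return (w * losses).sum() / w.sum()
```

Uniform weights encourage temporal consistency, while weights that ramp up over time emphasize late readouts, in the spirit of curriculum learning.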

Physical Constraints: SOEN circuit parameters may benefit from:

  • Connection weight regularization
  • State magnitude penalties
  • Dynamic range constraints
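
As an illustration of the last two items, a state-magnitude penalty that discourages values outside a target dynamic range might look like the following sketch (the name and parameters are hypothetical, not part of SOEN's API):

```python
import numpy as np

def state_magnitude_penalty(states, max_abs=1.0, weight=1e-3):
    """Quadratic penalty on state values that exceed the target dynamic range."""
    excess = np.maximum(0.0, np.abs(states) - max_abs)
    return weight * (excess ** 2).mean()
```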

See the Training Configuration page for details on how to combine multiple loss functions in your SOEN experiments.

🔧 Implementation in SOEN

Cross Entropy Configuration:

loss_function: "cross_entropy"
loss_params:   # cross entropy needs no additional parameters

Gap Loss Configuration:

loss_function: "gap_loss"  
loss_params:
  margin: 0.5

Related Topics