Previously the trainer saved a new "best" model if either separation
OR loss improved, with loss checked as a fallback. This caused
confusing behavior where models with lower separation could overwrite
better models.
Now only separation (gap between positive and negative similarity) is
used to determine the best model, which is the key metric for
contrastive learning quality.