Usage in framework
All loss functions available in tf.keras.losses can be used in our framework. Just use the class name as the configuration key and pass the arguments as a dict.
YAML Configuration
E.g. for tf.keras.losses.MeanSquaredError without arguments:
loss:
  MeanSquaredError:
or for tf.keras.losses.Huber with arguments:
loss:
  Huber:
    delta: 0.3
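Under the hood, such a configuration can be resolved by looking the class name up in tf.keras.losses and instantiating it with the given arguments. The snippet below is only a minimal sketch of that idea; the helper build_loss and the dict layout are illustrative, not the framework's actual code.
import tensorflow as tf

def build_loss(loss_config):
    # loss_config mirrors the YAML above, e.g. {"Huber": {"delta": 0.3}} or {"MeanSquaredError": None}
    (class_name, kwargs), = loss_config.items()
    loss_cls = getattr(tf.keras.losses, class_name)
    return loss_cls(**(kwargs or {}))

loss = build_loss({"Huber": {"delta": 0.3}})  # equivalent to tf.keras.losses.Huber(delta=0.3)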
Keras loss functions
BinaryCrossentropy class
tensorflow.keras.losses.BinaryCrossentropy(
from_logits=False, label_smoothing=0, reduction="auto", name="binary_crossentropy"
)
Computes the cross-entropy loss between true labels and predicted labels.
Use this cross-entropy loss when there are only two label classes (assumed to be 0 and 1). For each example, there should be a single floating-point value per prediction.
In the snippet below, each of the four examples has only a single floating-point value, and both y_pred and y_true have the shape [batch_size].
Standalone usage:
y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
Using 'auto'/'sum_over_batch_size' reduction type.
bce = tf.keras.losses.BinaryCrossentropy()
bce(y_true, y_pred).numpy()  # 0.815
Calling with 'sample_weight'.
bce(y_true, y_pred, sample_weight=[1, 0]).numpy()  # 0.458
Using 'sum' reduction type.
bce = tf.keras.losses.BinaryCrossentropy(
    reduction=tf.keras.losses.Reduction.SUM)
bce(y_true, y_pred).numpy()  # 1.630
Using 'none' reduction type.
bce = tf.keras.losses.BinaryCrossentropy(
    reduction=tf.keras.losses.Reduction.NONE)
bce(y_true, y_pred).numpy()  # array([0.916, 0.714], dtype=float32)
Usage with the tf.keras API:
model.compile(optimizer='sgd', loss=tf.keras.losses.BinaryCrossentropy())
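If the model outputs raw logits rather than probabilities, pass from_logits=True so the sigmoid is applied inside the loss; a minimal sketch (the logit values below are only illustrative):
bce_logits = tf.keras.losses.BinaryCrossentropy(from_logits=True)
bce_logits([[0., 1.], [0., 0.]], [[-1.2, 0.8], [-0.5, 1.5]]).numpy()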
CategoricalCrossentropy class
tensorflow.keras.losses.CategoricalCrossentropy(
from_logits=False, label_smoothing=0, reduction="auto", name="categorical_crossentropy"
)
Computes the crossentropy loss between the labels and predictions.
Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided in a one_hot representation. If you want to provide labels as integers, please use SparseCategoricalCrossentropy loss. There should be # classes floating point values per feature.
In the snippet below, there are # classes floating point values per example. The shape of both y_pred and y_true is [batch_size, num_classes].
Standalone usage:
y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
Using 'auto'/'sum_over_batch_size' reduction type.
cce = tf.keras.losses.CategoricalCrossentropy()
cce(y_true, y_pred).numpy()  # 1.177
Calling with 'sample_weight'.
cce(y_true, y_pred, sample_weight=tf.constant([0.3, 0.7])).numpy()  # 0.814
Using 'sum' reduction type.
cce = tf.keras.losses.CategoricalCrossentropy(
    reduction=tf.keras.losses.Reduction.SUM)
cce(y_true, y_pred).numpy()  # 2.354
Using 'none' reduction type.
cce = tf.keras.losses.CategoricalCrossentropy(
    reduction=tf.keras.losses.Reduction.NONE)
cce(y_true, y_pred).numpy()  # array([0.0513, 2.303], dtype=float32)
Usage with the compile() API:
model.compile(optimizer='sgd', loss=tf.keras.losses.CategoricalCrossentropy())
CategoricalHinge class
tensorflow.keras.losses.CategoricalHinge(reduction="auto", name="categorical_hinge")
Computes the categorical hinge loss between y_true and y_pred.
loss = maximum(neg - pos + 1, 0)
where neg=maximum((1-y_true)*y_pred) and pos=sum(y_true*y_pred)
Standalone usage:
y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
Using 'auto'/'sum_over_batch_size' reduction type.
h = tf.keras.losses.CategoricalHinge()
h(y_true, y_pred).numpy()  # 1.4
Calling with 'sample_weight'.
h(y_true, y_pred, sample_weight=[1, 0]).numpy()  # 0.6
Using 'sum' reduction type.
h = tf.keras.losses.CategoricalHinge(
    reduction=tf.keras.losses.Reduction.SUM)
h(y_true, y_pred).numpy()  # 2.8
Using 'none' reduction type.
h = tf.keras.losses.CategoricalHinge(
    reduction=tf.keras.losses.Reduction.NONE)
h(y_true, y_pred).numpy()  # array([1.2, 1.6], dtype=float32)
Usage with the compile() API:
model.compile(optimizer='sgd', loss=tf.keras.losses.CategoricalHinge())
CosineSimilarity class
tensorflow.keras.losses.CosineSimilarity(axis=-1, reduction="auto", name="cosine_similarity")
Computes the cosine similarity between labels and predictions.
Note that it is a number between -1 and 1. When it is a negative number between -1 and 0, 0 indicates orthogonality and values closer to -1 indicate greater similarity. Values closer to 1 indicate greater dissimilarity. This makes it usable as a loss function in a setting where you try to maximize the proximity between predictions and targets. If either y_true or y_pred is a zero vector, cosine similarity will be 0 regardless of the proximity between predictions and targets.
loss = -sum(l2_norm(y_true) * l2_norm(y_pred))
Standalone usage:
y_true = [[0., 1.], [1., 1.]]
y_pred = [[1., 0.], [1., 1.]]
Using 'auto'/'sum_over_batch_size' reduction type.
cosine_loss = tf.keras.losses.CosineSimilarity(axis=1)
# l2_norm(y_true) = [[0., 1.], [1./1.414, 1./1.414]]
# l2_norm(y_pred) = [[1., 0.], [1./1.414, 1./1.414]]
# l2_norm(y_true) . l2_norm(y_pred) = [[0., 0.], [0.5, 0.5]]
# loss = -mean(sum(l2_norm(y_true) . l2_norm(y_pred), axis=1))
#      = -((0. + 0.) + (0.5 + 0.5)) / 2
cosine_loss(y_true, y_pred).numpy()  # -0.5
Calling with 'sample_weight'.
cosine_loss(y_true, y_pred, sample_weight=[0.8, 0.2]).numpy()  # -0.0999
Using 'sum' reduction type.
cosine_loss = tf.keras.losses.CosineSimilarity(axis=1,
    reduction=tf.keras.losses.Reduction.SUM)
cosine_loss(y_true, y_pred).numpy()  # -0.999
Using 'none' reduction type.
cosine_loss = tf.keras.losses.CosineSimilarity(axis=1,
    reduction=tf.keras.losses.Reduction.NONE)
cosine_loss(y_true, y_pred).numpy()  # array([-0., -0.999], dtype=float32)
Usage with the compile() API:
model.compile(optimizer='sgd', loss=tf.keras.losses.CosineSimilarity(axis=1))
Args:
axis: (Optional) Defaults to -1. The dimension along which the cosine similarity is computed.
reduction: (Optional) Type of tf.keras.losses.Reduction to apply to loss. Default value is AUTO. AUTO indicates that the reduction option will be determined by the usage context. For almost all cases this defaults to SUM_OVER_BATCH_SIZE. When used with tf.distribute.Strategy, outside of built-in training loops such as tf.keras compile and fit, using AUTO or SUM_OVER_BATCH_SIZE will raise an error. Please see this custom training tutorial (https://www.tensorflow.org/tutorials/distribute/custom_training) for more details.
name: Optional name for the op.
Hinge class
tensorflow.keras.losses.Hinge(reduction="auto", name="hinge")
Computes the hinge loss between y_true and y_pred.
loss = maximum(1 - y_true * y_pred, 0)
y_true values are expected to be -1 or 1. If binary (0 or 1) labels are provided we will convert them to -1 or 1.
Standalone usage:
y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
Using 'auto'/'sum_over_batch_size' reduction type.
h = tf.keras.losses.Hinge()
h(y_true, y_pred).numpy()  # 1.3
Calling with 'sample_weight'.
h(y_true, y_pred, sample_weight=[1, 0]).numpy()  # 0.55
Using 'sum' reduction type.
h = tf.keras.losses.Hinge(
    reduction=tf.keras.losses.Reduction.SUM)
h(y_true, y_pred).numpy()  # 2.6
Using 'none' reduction type.
h = tf.keras.losses.Hinge(
    reduction=tf.keras.losses.Reduction.NONE)
h(y_true, y_pred).numpy()  # array([1.1, 1.5], dtype=float32)
Usage with the compile() API:
model.compile(optimizer='sgd', loss=tf.keras.losses.Hinge())
Huber class
tensorflow.keras.losses.Huber(delta=1.0, reduction="auto", name="huber_loss")
Computes the Huber loss between y_true and y_pred.
For each value x in error = y_true - y_pred:
loss = 0.5 * x^2 if |x| <= d
loss = 0.5 * d^2 + d * (|x| - d) if |x| > d
where d is delta. See: https://en.wikipedia.org/wiki/Huber_loss
Standalone usage:
y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
Using 'auto'/'sum_over_batch_size' reduction type.
h = tf.keras.losses.Huber()
h(y_true, y_pred).numpy()  # 0.155
Calling with 'sample_weight'.
h(y_true, y_pred, sample_weight=[1, 0]).numpy()  # 0.09
Using 'sum' reduction type.
h = tf.keras.losses.Huber(
    reduction=tf.keras.losses.Reduction.SUM)
h(y_true, y_pred).numpy()  # 0.31
Using 'none' reduction type.
h = tf.keras.losses.Huber(
    reduction=tf.keras.losses.Reduction.NONE)
h(y_true, y_pred).numpy()  # array([0.18, 0.13], dtype=float32)
Usage with the compile() API:
model.compile(optimizer='sgd', loss=tf.keras.losses.Huber())
kl_divergence function
tensorflow.keras.losses.KLD(y_true, y_pred)
Computes Kullback-Leibler divergence loss between y_true and y_pred.
loss = y_true * log(y_true / y_pred)
See: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
Standalone usage:
y_true = np.random.randint(0, 2, size=(2, 3)).astype(np.float64)
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.kullback_leibler_divergence(y_true, y_pred)
assert loss.shape == (2,)
y_true = tf.keras.backend.clip(y_true, 1e-7, 1)
y_pred = tf.keras.backend.clip(y_pred, 1e-7, 1)
assert np.array_equal(
    loss.numpy(), np.sum(y_true * np.log(y_true / y_pred), axis=-1))
Args:
y_true: Tensor of true targets.
y_pred: Tensor of predicted targets.
Returns:
A Tensor with loss.
Raises:
TypeError: If y_true cannot be cast to the y_pred.dtype.
KLDivergence class
tensorflow.keras.losses.KLDivergence(reduction="auto", name="kl_divergence")
Computes Kullback-Leibler divergence loss between y_true and y_pred.
loss = y_true * log(y_true / y_pred)
See: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
Standalone usage:
y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
Using 'auto'/'sum_over_batch_size' reduction type.
kl = tf.keras.losses.KLDivergence()
kl(y_true, y_pred).numpy()  # 0.458
Calling with 'sample_weight'.
kl(y_true, y_pred, sample_weight=[0.8, 0.2]).numpy()  # 0.366
Using 'sum' reduction type.
kl = tf.keras.losses.KLDivergence(
    reduction=tf.keras.losses.Reduction.SUM)
kl(y_true, y_pred).numpy()  # 0.916
Using 'none' reduction type.
kl = tf.keras.losses.KLDivergence(
    reduction=tf.keras.losses.Reduction.NONE)
kl(y_true, y_pred).numpy()  # array([0.916, -3.08e-06], dtype=float32)
Usage with the compile() API:
model.compile(optimizer='sgd', loss=tf.keras.losses.KLDivergence())
LogCosh class
tensorflow.keras.losses.LogCosh(reduction="auto", name="log_cosh")
Computes the logarithm of the hyperbolic cosine of the prediction error.
logcosh = log((exp(x) + exp(-x))/2), where x is the error y_pred - y_true.
Standalone usage:
y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [0., 0.]]
Using 'auto'/'sum_over_batch_size' reduction type.
l = tf.keras.losses.LogCosh()
l(y_true, y_pred).numpy()  # 0.108
Calling with 'sample_weight'.
l(y_true, y_pred, sample_weight=[0.8, 0.2]).numpy()  # 0.087
Using 'sum' reduction type.
l = tf.keras.losses.LogCosh(
    reduction=tf.keras.losses.Reduction.SUM)
l(y_true, y_pred).numpy()  # 0.217
Using 'none' reduction type.
l = tf.keras.losses.LogCosh(
    reduction=tf.keras.losses.Reduction.NONE)
l(y_true, y_pred).numpy()  # array([0.217, 0.], dtype=float32)
Usage with the compile() API:
model.compile(optimizer='sgd', loss=tf.keras.losses.LogCosh())
Loss class
tensorflow.keras.losses.Loss(reduction="auto", name=None)
Loss base class.
To be implemented by subclasses:
* call(): Contains the logic for loss calculation using y_true, y_pred.
Example subclass implementation:
class MeanSquaredError(Loss):

    def call(self, y_true, y_pred):
        y_pred = tf.convert_to_tensor(y_pred)
        y_true = tf.cast(y_true, y_pred.dtype)
        return tf.reduce_mean(tf.math.square(y_pred - y_true), axis=-1)
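An instance of such a subclass can then be used like any built-in loss, standalone or with compile(); a minimal sketch (the data values are only illustrative):
mse = MeanSquaredError()
mse([[0., 1.]], [[1., 1.]]).numpy()  # scalar loss after the configured reduction
model.compile(optimizer='sgd', loss=MeanSquaredError())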
When used with tf.distribute.Strategy, outside of built-in training loops such as tf.keras compile and fit, please use 'SUM' or 'NONE' reduction types, and reduce losses explicitly in your training loop. Using 'AUTO' or 'SUM_OVER_BATCH_SIZE' will raise an error. Please see this custom training tutorial for more details on this.
You can implement 'SUM_OVER_BATCH_SIZE' using global batch size like:
with strategy.scope():
    loss_obj = tf.keras.losses.CategoricalCrossentropy(
        reduction=tf.keras.losses.Reduction.NONE)
    ....
    loss = (tf.reduce_sum(loss_obj(labels, predictions)) *
            (1. / global_batch_size))
mean_absolute_error function
tensorflow.keras.losses.MAE(y_true, y_pred)
Computes the mean absolute error between labels and predictions.
loss = mean(abs(y_true - y_pred), axis=-1)
Standalone usage:
y_true = np.random.randint(0, 2, size=(2, 3))
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.mean_absolute_error(y_true, y_pred)
assert loss.shape == (2,)
assert np.array_equal(
    loss.numpy(), np.mean(np.abs(y_true - y_pred), axis=-1))
Args:
y_true: Ground truth values. shape = [batch_size, d0, .. dN]
.
y_pred: The predicted values. shape = [batch_size, d0, .. dN]
.
Returns:
Mean absolute error values. shape = [batch_size, d0, .. dN-1]
.
mean_absolute_percentage_error function
tensorflow.keras.losses.MAPE(y_true, y_pred)
Computes the mean absolute percentage error between y_true and y_pred.
loss = 100 * mean(abs((y_true - y_pred) / y_true), axis=-1)
Standalone usage:
y_true = np.random.random(size=(2, 3))
y_true = np.maximum(y_true, 1e-7)  # Prevent division by zero
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.mean_absolute_percentage_error(y_true, y_pred)
assert loss.shape == (2,)
assert np.array_equal(
    loss.numpy(),
    100. * np.mean(np.abs((y_true - y_pred) / y_true), axis=-1))
Args:
y_true: Ground truth values. shape = [batch_size, d0, .. dN]
.
y_pred: The predicted values. shape = [batch_size, d0, .. dN]
.
Returns:
Mean absolute percentage error values. shape = [batch_size, d0, .. dN-1]
.
mean_squared_error function
tensorflow.keras.losses.MSE(y_true, y_pred)
Computes the mean squared error between labels and predictions.
After computing the squared distance between the inputs, the mean value over the last dimension is returned.
loss = mean(square(y_true - y_pred), axis=-1)
Standalone usage:
y_true = np.random.randint(0, 2, size=(2, 3))
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.mean_squared_error(y_true, y_pred)
assert loss.shape == (2,)
assert np.array_equal(
    loss.numpy(), np.mean(np.square(y_true - y_pred), axis=-1))
Args:
y_true: Ground truth values. shape = [batch_size, d0, .. dN]
.
y_pred: The predicted values. shape = [batch_size, d0, .. dN]
.
Returns:
Mean squared error values. shape = [batch_size, d0, .. dN-1]
.
mean_squared_logarithmic_error function
tensorflow.keras.losses.MSLE(y_true, y_pred)
Computes the mean squared logarithmic error between y_true and y_pred.
loss = mean(square(log(y_true + 1) - log(y_pred + 1)), axis=-1)
Standalone usage:
y_true = np.random.randint(0, 2, size=(2, 3))
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.mean_squared_logarithmic_error(y_true, y_pred)
assert loss.shape == (2,)
y_true = np.maximum(y_true, 1e-7)
y_pred = np.maximum(y_pred, 1e-7)
assert np.allclose(
    loss.numpy(),
    np.mean(
        np.square(np.log(y_true + 1.) - np.log(y_pred + 1.)), axis=-1))
Args:
y_true: Ground truth values. shape = [batch_size, d0, .. dN]
.
y_pred: The predicted values. shape = [batch_size, d0, .. dN]
.
Returns:
Mean squared logarithmic error values. shape = [batch_size, d0, .. dN-1]
.
MeanAbsoluteError class
tensorflow.keras.losses.MeanAbsoluteError(reduction="auto", name="mean_absolute_error")
Computes the mean of absolute difference between labels and predictions.
loss = abs(y_true - y_pred)
Standalone usage:
y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]
Using 'auto'/'sum_over_batch_size' reduction type.
mae = tf.keras.losses.MeanAbsoluteError()
mae(y_true, y_pred).numpy()  # 0.5
Calling with 'sample_weight'.
mae(y_true, y_pred, sample_weight=[0.7, 0.3]).numpy()  # 0.25
Using 'sum' reduction type.
mae = tf.keras.losses.MeanAbsoluteError(
    reduction=tf.keras.losses.Reduction.SUM)
mae(y_true, y_pred).numpy()  # 1.0
Using 'none' reduction type.
mae = tf.keras.losses.MeanAbsoluteError(
    reduction=tf.keras.losses.Reduction.NONE)
mae(y_true, y_pred).numpy()  # array([0.5, 0.5], dtype=float32)
Usage with the compile() API:
model.compile(optimizer='sgd', loss=tf.keras.losses.MeanAbsoluteError())
MeanAbsolutePercentageError class
tensorflow.keras.losses.MeanAbsolutePercentageError(
reduction="auto", name="mean_absolute_percentage_error"
)
Computes the mean absolute percentage error between y_true and y_pred.
loss = 100 * abs(y_true - y_pred) / y_true
Standalone usage:
y_true = [[2., 1.], [2., 3.]]
y_pred = [[1., 1.], [1., 0.]]
Using 'auto'/'sum_over_batch_size' reduction type.
mape = tf.keras.losses.MeanAbsolutePercentageError()
mape(y_true, y_pred).numpy()  # 50.
Calling with 'sample_weight'.
mape(y_true, y_pred, sample_weight=[0.7, 0.3]).numpy()  # 20.
Using 'sum' reduction type.
mape = tf.keras.losses.MeanAbsolutePercentageError(
    reduction=tf.keras.losses.Reduction.SUM)
mape(y_true, y_pred).numpy()  # 100.
Using 'none' reduction type.
mape = tf.keras.losses.MeanAbsolutePercentageError(
    reduction=tf.keras.losses.Reduction.NONE)
mape(y_true, y_pred).numpy()  # array([25., 75.], dtype=float32)
Usage with the compile() API:
model.compile(optimizer='sgd',
              loss=tf.keras.losses.MeanAbsolutePercentageError())
MeanSquaredError class
tensorflow.keras.losses.MeanSquaredError(reduction="auto", name="mean_squared_error")
Computes the mean of squares of errors between labels and predictions.
loss = square(y_true - y_pred)
Standalone usage:
y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]
Using 'auto'/'sum_over_batch_size' reduction type.
mse = tf.keras.losses.MeanSquaredError()
mse(y_true, y_pred).numpy()  # 0.5
Calling with 'sample_weight'.
mse(y_true, y_pred, sample_weight=[0.7, 0.3]).numpy()  # 0.25
Using 'sum' reduction type.
mse = tf.keras.losses.MeanSquaredError(
    reduction=tf.keras.losses.Reduction.SUM)
mse(y_true, y_pred).numpy()  # 1.0
Using 'none' reduction type.
mse = tf.keras.losses.MeanSquaredError(
    reduction=tf.keras.losses.Reduction.NONE)
mse(y_true, y_pred).numpy()  # array([0.5, 0.5], dtype=float32)
Usage with the compile() API:
model.compile(optimizer='sgd', loss=tf.keras.losses.MeanSquaredError())
MeanSquaredLogarithmicError class
tensorflow.keras.losses.MeanSquaredLogarithmicError(
reduction="auto", name="mean_squared_logarithmic_error"
)
Computes the mean squared logarithmic error between y_true and y_pred.
loss = square(log(y_true + 1.) - log(y_pred + 1.))
Standalone usage:
y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]
Using 'auto'/'sum_over_batch_size' reduction type.
msle = tf.keras.losses.MeanSquaredLogarithmicError()
msle(y_true, y_pred).numpy()  # 0.240
Calling with 'sample_weight'.
msle(y_true, y_pred, sample_weight=[0.7, 0.3]).numpy()  # 0.120
Using 'sum' reduction type.
msle = tf.keras.losses.MeanSquaredLogarithmicError(
    reduction=tf.keras.losses.Reduction.SUM)
msle(y_true, y_pred).numpy()  # 0.480
Using 'none' reduction type.
msle = tf.keras.losses.MeanSquaredLogarithmicError(
    reduction=tf.keras.losses.Reduction.NONE)
msle(y_true, y_pred).numpy()  # array([0.240, 0.240], dtype=float32)
Usage with the compile() API:
model.compile(optimizer='sgd',
              loss=tf.keras.losses.MeanSquaredLogarithmicError())
Poisson class
tensorflow.keras.losses.Poisson(reduction="auto", name="poisson")
Computes the Poisson loss between y_true and y_pred.
loss = y_pred - y_true * log(y_pred)
Standalone usage:
y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [0., 0.]]
Using 'auto'/'sum_over_batch_size' reduction type.
p = tf.keras.losses.Poisson()
p(y_true, y_pred).numpy()  # 0.5
Calling with 'sample_weight'.
p(y_true, y_pred, sample_weight=[0.8, 0.2]).numpy()  # 0.4
Using 'sum' reduction type.
p = tf.keras.losses.Poisson(
    reduction=tf.keras.losses.Reduction.SUM)
p(y_true, y_pred).numpy()  # 0.999
Using 'none' reduction type.
p = tf.keras.losses.Poisson(
    reduction=tf.keras.losses.Reduction.NONE)
p(y_true, y_pred).numpy()  # array([0.999, 0.], dtype=float32)
Usage with the compile() API:
model.compile(optimizer='sgd', loss=tf.keras.losses.Poisson())
ReductionV2 class
tensorflow.keras.losses.Reduction(*args, **kwargs)
Types of loss reduction.
Contains the following values:
* AUTO: Indicates that the reduction option will be determined by the usage context. For almost all cases this defaults to SUM_OVER_BATCH_SIZE. When used with tf.distribute.Strategy, outside of built-in training loops such as tf.keras compile and fit, we expect the reduction value to be SUM or NONE. Using AUTO in that case will raise an error.
* NONE: Weighted losses with one dimension reduced (axis=-1, or the axis specified by the loss function). When this reduction type is used with built-in Keras training loops like fit/evaluate, the unreduced vector loss is passed to the optimizer but the reported loss will be a scalar value.
* SUM: Scalar sum of weighted losses.
* SUM_OVER_BATCH_SIZE: Scalar SUM divided by the number of elements in losses. This reduction type is not supported when used with tf.distribute.Strategy outside of built-in training loops like tf.keras compile/fit.
You can implement 'SUM_OVER_BATCH_SIZE' using global batch size like:
with strategy.scope():
    loss_obj = tf.keras.losses.CategoricalCrossentropy(
        reduction=tf.keras.losses.Reduction.NONE)
    ....
    loss = (tf.reduce_sum(loss_obj(labels, predictions)) *
            (1. / global_batch_size))
Please see the custom training tutorial (https://www.tensorflow.org/tutorials/distribute/custom_training) for more details on this.
SparseCategoricalCrossentropy class
tensorflow.keras.losses.SparseCategoricalCrossentropy(
from_logits=False, reduction="auto", name="sparse_categorical_crossentropy"
)
Computes the crossentropy loss between the labels and predictions.
Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided as integers. If you want to provide labels using a one-hot representation, please use CategoricalCrossentropy loss. There should be # classes floating point values per feature for y_pred and a single floating point value per feature for y_true.
In the snippet below, there is a single floating point value per example for y_true and # classes floating point values per example for y_pred. The shape of y_true is [batch_size] and the shape of y_pred is [batch_size, num_classes].
Standalone usage:
y_true = [1, 2]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
Using 'auto'/'sum_over_batch_size' reduction type.
scce = tf.keras.losses.SparseCategoricalCrossentropy()
scce(y_true, y_pred).numpy()  # 1.177
Calling with 'sample_weight'.
scce(y_true, y_pred, sample_weight=tf.constant([0.3, 0.7])).numpy()  # 0.814
Using 'sum' reduction type.
scce = tf.keras.losses.SparseCategoricalCrossentropy(
    reduction=tf.keras.losses.Reduction.SUM)
scce(y_true, y_pred).numpy()  # 2.354
Using 'none' reduction type.
scce = tf.keras.losses.SparseCategoricalCrossentropy(
    reduction=tf.keras.losses.Reduction.NONE)
scce(y_true, y_pred).numpy()  # array([0.0513, 2.303], dtype=float32)
Usage with the compile() API:
model.compile(optimizer='sgd',
              loss=tf.keras.losses.SparseCategoricalCrossentropy())
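If your labels are already one-hot encoded, CategoricalCrossentropy computes the same values; a small sketch of the correspondence, reusing the example above:
y_true = [1, 2]
y_pred = [[0.05, 0.95, 0.], [0.1, 0.8, 0.1]]
scce = tf.keras.losses.SparseCategoricalCrossentropy()
scce(y_true, y_pred).numpy()  # ~1.177 with integer labels
cce = tf.keras.losses.CategoricalCrossentropy()
cce(tf.one_hot(y_true, depth=3), y_pred).numpy()  # same value with one-hot labels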
SquaredHinge class
tensorflow.keras.losses.SquaredHinge(reduction="auto", name="squared_hinge")
Computes the squared hinge loss between y_true and y_pred.
loss = square(maximum(1 - y_true * y_pred, 0))
y_true values are expected to be -1 or 1. If binary (0 or 1) labels are provided we will convert them to -1 or 1.
Standalone usage:
y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
Using 'auto'/'sum_over_batch_size' reduction type.
h = tf.keras.losses.SquaredHinge()
h(y_true, y_pred).numpy()  # 1.86
Calling with 'sample_weight'.
h(y_true, y_pred, sample_weight=[1, 0]).numpy()  # 0.73
Using 'sum' reduction type.
h = tf.keras.losses.SquaredHinge(
    reduction=tf.keras.losses.Reduction.SUM)
h(y_true, y_pred).numpy()  # 3.72
Using 'none' reduction type.
h = tf.keras.losses.SquaredHinge(
    reduction=tf.keras.losses.Reduction.NONE)
h(y_true, y_pred).numpy()  # array([1.46, 2.26], dtype=float32)
Usage with the compile() API:
model.compile(optimizer='sgd', loss=tf.keras.losses.SquaredHinge())
binary_crossentropy function
tensorflow.keras.losses.binary_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0)
Computes the binary crossentropy loss.
Standalone usage:
y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
loss = tf.keras.losses.binary_crossentropy(y_true, y_pred)
assert loss.shape == (2,)
loss.numpy()  # array([0.916, 0.714], dtype=float32)
Args:
y_true: Ground truth values. shape = [batch_size, d0, .. dN].
y_pred: The predicted values. shape = [batch_size, d0, .. dN].
from_logits: Whether y_pred is expected to be a logits tensor. By default, we assume that y_pred encodes a probability distribution.
label_smoothing: Float in [0, 1]. If > 0 then smooth the labels.
Returns:
Binary crossentropy loss value. shape = [batch_size, d0, .. dN-1].
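As a rough illustration of label_smoothing (the exact internal computation may differ between versions), smoothing pulls hard 0/1 targets toward 0.5 before the cross-entropy is computed:
import numpy as np
s = 0.1
y_true = np.array([[0., 1.], [0., 0.]])
y_smooth = y_true * (1. - s) + 0.5 * s  # assumed smoothing rule for binary targets
print(y_smooth)  # [[0.05 0.95] [0.05 0.05]]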
categorical_crossentropy function
tensorflow.keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0)
Computes the categorical crossentropy loss.
Standalone usage:
y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
assert loss.shape == (2,)
loss.numpy()  # array([0.0513, 2.303], dtype=float32)
Args:
y_true: Tensor of one-hot true targets.
y_pred: Tensor of predicted targets.
from_logits: Whether y_pred is expected to be a logits tensor. By default, we assume that y_pred encodes a probability distribution.
label_smoothing: Float in [0, 1]. If > 0 then smooth the labels.
Returns: Categorical crossentropy loss value.
categorical_hinge function
tensorflow.keras.losses.categorical_hinge(y_true, y_pred)
Computes the categorical hinge loss between y_true and y_pred.
loss = maximum(neg - pos + 1, 0)
where neg=maximum((1-y_true)*y_pred) and pos=sum(y_true*y_pred)
Standalone usage:
y_true = np.random.randint(0, 3, size=(2,))
y_true = tf.keras.utils.to_categorical(y_true, num_classes=3)
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.categorical_hinge(y_true, y_pred)
assert loss.shape == (2,)
pos = np.sum(y_true * y_pred, axis=-1)
neg = np.amax((1. - y_true) * y_pred, axis=-1)
assert np.array_equal(loss.numpy(), np.maximum(0., neg - pos + 1.))
Args:
y_true: The ground truth values. y_true values are expected to be 0 or 1.
y_pred: The predicted values.
Returns: Categorical hinge loss values.
cosine_similarity function
tensorflow.keras.losses.cosine_similarity(y_true, y_pred, axis=-1)
Computes the cosine similarity between labels and predictions.
Note that it is a number between -1 and 1. When it is a negative number between -1 and 0, 0 indicates orthogonality and values closer to -1 indicate greater similarity. Values closer to 1 indicate greater dissimilarity. This makes it usable as a loss function in a setting where you try to maximize the proximity between predictions and targets. If either y_true or y_pred is a zero vector, cosine similarity will be 0 regardless of the proximity between predictions and targets.
loss = -sum(l2_norm(y_true) * l2_norm(y_pred))
Standalone usage:
y_true = [[0., 1.], [1., 1.], [1., 1.]]
y_pred = [[1., 0.], [1., 1.], [-1., -1.]]
loss = tf.keras.losses.cosine_similarity(y_true, y_pred, axis=1)
loss.numpy()  # array([-0., -0.999, 0.999], dtype=float32)
Args:
y_true: Tensor of true targets.
y_pred: Tensor of predicted targets.
axis: Axis along which to determine similarity.
Returns: Cosine similarity tensor.
deserialize function
tensorflow.keras.losses.deserialize(name, custom_objects=None)
Deserializes a serialized loss class/function instance.
Arguments:
name: Loss configuration.
custom_objects: Optional dictionary mapping names (strings) to custom objects (classes and functions) to be considered during deserialization.
Returns:
A Keras Loss instance or a loss function.
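A minimal round-trip sketch, assuming the usual Keras serialization format ({"class_name": ..., "config": ...}) for Loss instances:
original = tf.keras.losses.Huber(delta=0.3)
config = tf.keras.losses.serialize(original)    # e.g. {'class_name': 'Huber', 'config': {...}}
restored = tf.keras.losses.deserialize(config)  # a new Huber instance with delta=0.3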
get function
tensorflow.keras.losses.get(identifier)
Retrieves a Keras loss as a function/Loss class instance.
The identifier may be the string name of a loss function or Loss class.
loss = tf.keras.losses.get("categorical_crossentropy")
type(loss)
loss = tf.keras.losses.get("CategoricalCrossentropy")
type(loss)
You can also specify the config of the loss to this function by passing a dict containing class_name and config as an identifier. Also note that the class_name must map to a Loss class.
identifier = {"class_name": "CategoricalCrossentropy",
              "config": {"from_logits": True}}
loss = tf.keras.losses.get(identifier)
type(loss)
Arguments:
identifier: A loss identifier. One of None, the string name of a loss function/class, a loss configuration dictionary, a loss function, or a loss class instance.
Returns:
A Keras loss as a function/Loss class instance.
Raises:
ValueError: If identifier cannot be interpreted.
hinge function
tensorflow.keras.losses.hinge(y_true, y_pred)
Computes the hinge loss between y_true and y_pred.
loss = mean(maximum(1 - y_true * y_pred, 0), axis=-1)
Standalone usage:
y_true = np.random.choice([-1, 1], size=(2, 3))
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.hinge(y_true, y_pred)
assert loss.shape == (2,)
assert np.array_equal(
    loss.numpy(),
    np.mean(np.maximum(1. - y_true * y_pred, 0.), axis=-1))
Args:
y_true: The ground truth values. y_true values are expected to be -1 or 1. If binary (0 or 1) labels are provided they will be converted to -1 or 1. shape = [batch_size, d0, .. dN].
y_pred: The predicted values. shape = [batch_size, d0, .. dN].
Returns:
Hinge loss values. shape = [batch_size, d0, .. dN-1]
.
huber function
tensorflow.keras.losses.huber(y_true, y_pred, delta=1.0)
Computes Huber loss value.
For each value x in error = y_true - y_pred:
loss = 0.5 * x^2 if |x| <= d
loss = 0.5 * d^2 + d * (|x| - d) if |x| > d
where d is delta. See: https://en.wikipedia.org/wiki/Huber_loss
Args:
y_true: tensor of true targets.
y_pred: tensor of predicted targets.
delta: A float, the point where the Huber loss function changes from a quadratic to linear.
Returns:
Tensor with one scalar loss entry per sample.
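The huber function above has no standalone example; for reference, here is a minimal NumPy sketch of the piecewise formula (huber_np is our illustrative helper, not part of Keras), reproducing the per-sample values from the Huber class example:
import numpy as np

def huber_np(y_true, y_pred, delta=1.0):
    # Elementwise Huber loss, then the mean over the last axis.
    x = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    quadratic = 0.5 * np.square(x)
    linear = 0.5 * delta ** 2 + delta * (np.abs(x) - delta)
    return np.mean(np.where(np.abs(x) <= delta, quadratic, linear), axis=-1)

huber_np([[0, 1], [0, 0]], [[0.6, 0.4], [0.4, 0.6]])  # array([0.18, 0.13]), as in the Huber class example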
kl_divergence function
tensorflow.keras.losses.kl_divergence(y_true, y_pred)
Computes Kullback-Leibler divergence loss between y_true and y_pred.
loss = y_true * log(y_true / y_pred)
See: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
Standalone usage:
y_true = np.random.randint(0, 2, size=(2, 3)).astype(np.float64)
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.kullback_leibler_divergence(y_true, y_pred)
assert loss.shape == (2,)
y_true = tf.keras.backend.clip(y_true, 1e-7, 1)
y_pred = tf.keras.backend.clip(y_pred, 1e-7, 1)
assert np.array_equal(
    loss.numpy(), np.sum(y_true * np.log(y_true / y_pred), axis=-1))
Args:
y_true: Tensor of true targets.
y_pred: Tensor of predicted targets.
Returns:
A Tensor with loss.
Raises:
TypeError: If y_true cannot be cast to the y_pred.dtype.
kl_divergence function
tensorflow.keras.losses.kld(y_true, y_pred)
Computes Kullback-Leibler divergence loss between y_true and y_pred.
loss = y_true * log(y_true / y_pred)
See: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
Standalone usage:
y_true = np.random.randint(0, 2, size=(2, 3)).astype(np.float64)
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.kullback_leibler_divergence(y_true, y_pred)
assert loss.shape == (2,)
y_true = tf.keras.backend.clip(y_true, 1e-7, 1)
y_pred = tf.keras.backend.clip(y_pred, 1e-7, 1)
assert np.array_equal(
    loss.numpy(), np.sum(y_true * np.log(y_true / y_pred), axis=-1))
Args:
y_true: Tensor of true targets.
y_pred: Tensor of predicted targets.
Returns:
A Tensor with loss.
Raises:
TypeError: If y_true cannot be cast to the y_pred.dtype.
kl_divergence function
tensorflow.keras.losses.kullback_leibler_divergence(y_true, y_pred)
Computes Kullback-Leibler divergence loss between y_true and y_pred.
loss = y_true * log(y_true / y_pred)
See: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
Standalone usage:
y_true = np.random.randint(0, 2, size=(2, 3)).astype(np.float64)
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.kullback_leibler_divergence(y_true, y_pred)
assert loss.shape == (2,)
y_true = tf.keras.backend.clip(y_true, 1e-7, 1)
y_pred = tf.keras.backend.clip(y_pred, 1e-7, 1)
assert np.array_equal(
    loss.numpy(), np.sum(y_true * np.log(y_true / y_pred), axis=-1))
Args:
y_true: Tensor of true targets.
y_pred: Tensor of predicted targets.
Returns:
A Tensor with loss.
Raises:
TypeError: If y_true cannot be cast to the y_pred.dtype.
log_cosh function
tensorflow.keras.losses.log_cosh(y_true, y_pred)
Logarithm of the hyperbolic cosine of the prediction error.
log(cosh(x)) is approximately equal to (x ** 2) / 2 for small x and to abs(x) - log(2) for large x. This means that 'logcosh' works mostly like the mean squared error, but will not be so strongly affected by the occasional wildly incorrect prediction.
Standalone usage:
y_true = np.random.random(size=(2, 3))
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.logcosh(y_true, y_pred)
assert loss.shape == (2,)
x = y_pred - y_true
assert np.allclose(
    loss.numpy(),
    np.mean(x + np.log(np.exp(-2. * x) + 1.) - np.log(2.), axis=-1),
    atol=1e-5)
Args:
y_true: Ground truth values. shape = [batch_size, d0, .. dN]
.
y_pred: The predicted values. shape = [batch_size, d0, .. dN]
.
Returns:
Logcosh error values. shape = [batch_size, d0, .. dN-1]
.
log_cosh function
tensorflow.keras.losses.logcosh(y_true, y_pred)
Logarithm of the hyperbolic cosine of the prediction error.
log(cosh(x)) is approximately equal to (x ** 2) / 2 for small x and to abs(x) - log(2) for large x. This means that 'logcosh' works mostly like the mean squared error, but will not be so strongly affected by the occasional wildly incorrect prediction.
Standalone usage:
y_true = np.random.random(size=(2, 3))
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.logcosh(y_true, y_pred)
assert loss.shape == (2,)
x = y_pred - y_true
assert np.allclose(
    loss.numpy(),
    np.mean(x + np.log(np.exp(-2. * x) + 1.) - np.log(2.), axis=-1),
    atol=1e-5)
Args:
y_true: Ground truth values. shape = [batch_size, d0, .. dN]
.
y_pred: The predicted values. shape = [batch_size, d0, .. dN]
.
Returns:
Logcosh error values. shape = [batch_size, d0, .. dN-1]
.
mean_absolute_error function
tensorflow.keras.losses.mae(y_true, y_pred)
Computes the mean absolute error between labels and predictions.
loss = mean(abs(y_true - y_pred), axis=-1)
Standalone usage:
y_true = np.random.randint(0, 2, size=(2, 3))
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.mean_absolute_error(y_true, y_pred)
assert loss.shape == (2,)
assert np.array_equal(
    loss.numpy(), np.mean(np.abs(y_true - y_pred), axis=-1))
Args:
y_true: Ground truth values. shape = [batch_size, d0, .. dN]
.
y_pred: The predicted values. shape = [batch_size, d0, .. dN]
.
Returns:
Mean absolute error values. shape = [batch_size, d0, .. dN-1]
.
mean_absolute_percentage_error function
tensorflow.keras.losses.mape(y_true, y_pred)
Computes the mean absolute percentage error between y_true and y_pred.
loss = 100 * mean(abs((y_true - y_pred) / y_true), axis=-1)
Standalone usage:
y_true = np.random.random(size=(2, 3))
y_true = np.maximum(y_true, 1e-7)  # Prevent division by zero
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.mean_absolute_percentage_error(y_true, y_pred)
assert loss.shape == (2,)
assert np.array_equal(
    loss.numpy(),
    100. * np.mean(np.abs((y_true - y_pred) / y_true), axis=-1))
Args:
y_true: Ground truth values. shape = [batch_size, d0, .. dN]
.
y_pred: The predicted values. shape = [batch_size, d0, .. dN]
.
Returns:
Mean absolute percentage error values. shape = [batch_size, d0, .. dN-1]
.
mean_absolute_error function
tensorflow.keras.losses.mean_absolute_error(y_true, y_pred)
Computes the mean absolute error between labels and predictions.
loss = mean(abs(y_true - y_pred), axis=-1)
Standalone usage:
y_true = np.random.randint(0, 2, size=(2, 3))
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.mean_absolute_error(y_true, y_pred)
assert loss.shape == (2,)
assert np.array_equal(
    loss.numpy(), np.mean(np.abs(y_true - y_pred), axis=-1))
Args:
y_true: Ground truth values. shape = [batch_size, d0, .. dN]
.
y_pred: The predicted values. shape = [batch_size, d0, .. dN]
.
Returns:
Mean absolute error values. shape = [batch_size, d0, .. dN-1]
.
mean_absolute_percentage_error function
tensorflow.keras.losses.mean_absolute_percentage_error(y_true, y_pred)
Computes the mean absolute percentage error between y_true and y_pred.
loss = 100 * mean(abs((y_true - y_pred) / y_true), axis=-1)
Standalone usage:
y_true = np.random.random(size=(2, 3))
y_true = np.maximum(y_true, 1e-7)  # Prevent division by zero
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.mean_absolute_percentage_error(y_true, y_pred)
assert loss.shape == (2,)
assert np.array_equal(
    loss.numpy(),
    100. * np.mean(np.abs((y_true - y_pred) / y_true), axis=-1))
Args:
y_true: Ground truth values. shape = [batch_size, d0, .. dN]
.
y_pred: The predicted values. shape = [batch_size, d0, .. dN]
.
Returns:
Mean absolute percentage error values. shape = [batch_size, d0, .. dN-1]
.
mean_squared_error function
tensorflow.keras.losses.mean_squared_error(y_true, y_pred)
Computes the mean squared error between labels and predictions.
After computing the squared distance between the inputs, the mean value over the last dimension is returned.
loss = mean(square(y_true - y_pred), axis=-1)
Standalone usage:
y_true = np.random.randint(0, 2, size=(2, 3))
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.mean_squared_error(y_true, y_pred)
assert loss.shape == (2,)
assert np.array_equal(
    loss.numpy(), np.mean(np.square(y_true - y_pred), axis=-1))
Args:
y_true: Ground truth values. shape = [batch_size, d0, .. dN]
.
y_pred: The predicted values. shape = [batch_size, d0, .. dN]
.
Returns:
Mean squared error values. shape = [batch_size, d0, .. dN-1]
.
mean_squared_logarithmic_error function
tensorflow.keras.losses.mean_squared_logarithmic_error(y_true, y_pred)
Computes the mean squared logarithmic error between y_true and y_pred.
loss = mean(square(log(y_true + 1) - log(y_pred + 1)), axis=-1)
Standalone usage:
y_true = np.random.randint(0, 2, size=(2, 3))
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.mean_squared_logarithmic_error(y_true, y_pred)
assert loss.shape == (2,)
y_true = np.maximum(y_true, 1e-7)
y_pred = np.maximum(y_pred, 1e-7)
assert np.allclose(
    loss.numpy(),
    np.mean(
        np.square(np.log(y_true + 1.) - np.log(y_pred + 1.)), axis=-1))
Args:
y_true: Ground truth values. shape = [batch_size, d0, .. dN]
.
y_pred: The predicted values. shape = [batch_size, d0, .. dN]
.
Returns:
Mean squared logarithmic error values. shape = [batch_size, d0, .. dN-1]
.
mean_squared_error function
tensorflow.keras.losses.mse(y_true, y_pred)
Computes the mean squared error between labels and predictions.
After computing the squared distance between the inputs, the mean value over the last dimension is returned.
loss = mean(square(y_true - y_pred), axis=-1)
Standalone usage:
y_true = np.random.randint(0, 2, size=(2, 3))
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.mean_squared_error(y_true, y_pred)
assert loss.shape == (2,)
assert np.array_equal(
    loss.numpy(), np.mean(np.square(y_true - y_pred), axis=-1))
Args:
y_true: Ground truth values. shape = [batch_size, d0, .. dN]
.
y_pred: The predicted values. shape = [batch_size, d0, .. dN]
.
Returns:
Mean squared error values. shape = [batch_size, d0, .. dN-1]
.
mean_squared_logarithmic_error function
tensorflow.keras.losses.msle(y_true, y_pred)
Computes the mean squared logarithmic error between y_true and y_pred.
loss = mean(square(log(y_true + 1) - log(y_pred + 1)), axis=-1)
Standalone usage:
y_true = np.random.randint(0, 2, size=(2, 3))
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.mean_squared_logarithmic_error(y_true, y_pred)
assert loss.shape == (2,)
y_true = np.maximum(y_true, 1e-7)
y_pred = np.maximum(y_pred, 1e-7)
assert np.allclose(
    loss.numpy(),
    np.mean(
        np.square(np.log(y_true + 1.) - np.log(y_pred + 1.)), axis=-1))
Args:
y_true: Ground truth values. shape = [batch_size, d0, .. dN]
.
y_pred: The predicted values. shape = [batch_size, d0, .. dN]
.
Returns:
Mean squared logarithmic error values. shape = [batch_size, d0, .. dN-1]
.
poisson function
tensorflow.keras.losses.poisson(y_true, y_pred)
Computes the Poisson loss between y_true and y_pred.
The Poisson loss is the mean of the elements of the Tensor y_pred - y_true * log(y_pred).
Standalone usage:
y_true = np.random.randint(0, 2, size=(2, 3))
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.poisson(y_true, y_pred)
assert loss.shape == (2,)
y_pred = y_pred + 1e-7
assert np.allclose(
    loss.numpy(), np.mean(y_pred - y_true * np.log(y_pred), axis=-1),
    atol=1e-5)
Args:
y_true: Ground truth values. shape = [batch_size, d0, .. dN]
.
y_pred: The predicted values. shape = [batch_size, d0, .. dN]
.
Returns:
Poisson loss value. shape = [batch_size, d0, .. dN-1]
.
Raises:
InvalidArgumentError: If y_true and y_pred have incompatible shapes.
serialize function
tensorflow.keras.losses.serialize(loss)
Serializes a loss function or Loss instance.
Arguments:
loss: A Keras Loss instance or a loss function.
Returns:
Loss configuration dictionary.
sparse_categorical_crossentropy function
tensorflow.keras.losses.sparse_categorical_crossentropy(y_true, y_pred, from_logits=False, axis=-1)
Computes the sparse categorical crossentropy loss.
Standalone usage:
y_true = [1, 2]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
assert loss.shape == (2,)
loss.numpy()  # array([0.0513, 2.303], dtype=float32)
Args:
y_true: Ground truth values.
y_pred: The predicted values.
from_logits: Whether y_pred is expected to be a logits tensor. By default, we assume that y_pred encodes a probability distribution.
axis: (Optional) Defaults to -1. The dimension along which the entropy is computed.
Returns: Sparse categorical crossentropy loss value.
squared_hinge function
tensorflow.keras.losses.squared_hinge(y_true, y_pred)
Computes the squared hinge loss between y_true and y_pred.
loss = mean(square(maximum(1 - y_true * y_pred, 0)), axis=-1)
Standalone usage:
y_true = np.random.choice([-1, 1], size=(2, 3))
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.squared_hinge(y_true, y_pred)
assert loss.shape == (2,)
assert np.array_equal(
    loss.numpy(),
    np.mean(np.square(np.maximum(1. - y_true * y_pred, 0.)), axis=-1))
Args:
y_true: The ground truth values. y_true values are expected to be -1 or 1. If binary (0 or 1) labels are provided we will convert them to -1 or 1. shape = [batch_size, d0, .. dN].
y_pred: The predicted values. shape = [batch_size, d0, .. dN].
Returns:
Squared hinge loss values. shape = [batch_size, d0, .. dN-1]
.