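PyTorch Study Notes -6- Loss Functions

Last updated: May 7, 2024, afternoon

A model can only learn if it knows where it currently falls short. The loss function is what gives the optimizer a direction and a distance to move, and this post walks through the loss functions that PyTorch provides.

These notes follow 深入浅出PyTorch and are meant to systematically fill in the fundamentals.

Contents of this section

How the common deep learning loss functions are defined
How to call loss functions in PyTorch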

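Binary cross-entropy loss

`torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')`

Purpose: computes the cross entropy for binary classification, where the labels take values in {0, 1}. The input to this loss must already be a probability. Typically it is the output of a sigmoid layer, or of a softmax.

Main parameters:

weight: a manual rescaling weight applied to the loss of each batch element.

size_average: a bool. When True, the returned loss is the mean; when False, it is the sum of the per-sample losses. This parameter has been superseded by reduction and will be removed in a future version; use reduction instead.

reduce: a bool. When True, the loss is returned as a scalar. Like size_average, it is deprecated in favor of reduction.

Core implementation:

def forward(self, input: Tensor, target: Tensor) -> Tensor:
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)

So in use, the first argument is the input $d$, the second is the target $t$, and the loss is: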
$$
loss=t\log \frac{1}{d}+(1-t)\log\frac{1}{1-d}
$$

import torch
from torch import nn
import numpy as np

loss = nn.BCELoss()
m = nn.Sigmoid()

data = torch.tensor([0.0], requires_grad=True)
target = torch.ones(1)

l = loss(m(data), target)

print(l)
print(np.log(2))

pass


-->
tensor(0.6931, grad_fn=<BinaryCrossEntropyBackward0>)
0.6931471805599453
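
To make the formula concrete, here is a minimal plain-Python sketch (no PyTorch; the helper names `sigmoid` and `bce` are made up for illustration) that evaluates it for the same case as above: a raw score of 0 passed through a sigmoid, with target 1.

```python
import math

def sigmoid(z):
    """Logistic function, mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def bce(d, t):
    """Per-sample binary cross entropy for probability d and label t."""
    return t * math.log(1.0 / d) + (1.0 - t) * math.log(1.0 / (1.0 - d))

d = sigmoid(0.0)          # 0.5, the same input as data = torch.tensor([0.0])
loss_value = bce(d, 1.0)  # only the t * log(1/d) term contributes when t = 1
print(loss_value, math.log(2))
```

This sketch does not guard against d being exactly 0 or 1, so it is only meant for checking the arithmetic by hand.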

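Cross-entropy loss

`torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')`

Purpose: computes the cross-entropy loss.

Main parameters:

weight: a weight assigned to the loss of each class.

size_average: a bool. When True, the returned loss is the mean; when False, it is the sum of the per-sample losses.

ignore_index: a target class whose loss is ignored.

reduce: a bool. When True, the loss is returned as a scalar.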

import torch
import torch.nn as nn

x_input = torch.randn(3, 5)  # random input: 3 samples, 5 classes
print('x_input:\n', x_input)
y_target = torch.tensor([4, 2, 0])  # target class index for each sample

# apply softmax to the input; every row now sums to 1
softmax_func = nn.Softmax(dim=1)
soft_output = softmax_func(x_input)
print('soft_output:\n', soft_output)

# take the log of the softmax output
log_output = torch.log(soft_output)
print('log_output:\n', log_output)

# compare softmax followed by log against nn.LogSoftmax: the two results match
logsoftmax_func = nn.LogSoftmax(dim=1)
logsoftmax_output = logsoftmax_func(x_input)
print('logsoftmax_output:\n', logsoftmax_output)

# NLLLoss (negative log-likelihood loss) defaults to reduction='mean';
# use reduction='none' here to see the per-sample losses
nllloss_func = nn.NLLLoss(reduction="none")
nlloss_output = nllloss_func(logsoftmax_output, y_target)
print('nlloss_output:\n', nlloss_output)

# apply nn.CrossEntropyLoss directly to the raw input and check that it
# matches the LogSoftmax + NLLLoss result
crossentropyloss = nn.CrossEntropyLoss(reduction="none")
crossentropyloss_output = crossentropyloss(x_input, y_target)
print('crossentropyloss_output:\n', crossentropyloss_output)

Output:

x_input:
 tensor([[-1.7327, -0.1885, -0.7649,  0.8701,  0.4981],
        [-2.1903,  0.5137, -0.3262,  0.1239,  0.0126],
        [ 0.8400,  1.4696, -0.2860, -2.8149, -0.3208]])
soft_output:
 tensor([[0.0321, 0.1505, 0.0846, 0.4338, 0.2990],
        [0.0241, 0.3595, 0.1552, 0.2435, 0.2178],
        [0.2825, 0.5302, 0.0916, 0.0073, 0.0885]])
log_output:
 tensor([[-3.4380, -1.8939, -2.4702, -0.8352, -1.2072],
        [-3.7271, -1.0231, -1.8630, -1.4128, -1.5242],
        [-1.2643, -0.6346, -2.3902, -4.9191, -2.4250]])
logsoftmax_output:
 tensor([[-3.4380, -1.8939, -2.4702, -0.8352, -1.2072],
        [-3.7271, -1.0231, -1.8630, -1.4128, -1.5242],
        [-1.2643, -0.6346, -2.3902, -4.9191, -2.4250]])
nlloss_output:
 tensor([1.2072, 1.8630, 1.2643])
crossentropyloss_output:
 tensor([1.2072, 1.8630, 1.2643])
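
The same numbers can be recomputed by hand. The sketch below is plain Python rather than PyTorch, and `cross_entropy` is a hypothetical helper written only for this check; it takes the printed (rounded) `x_input` values and the targets `[4, 2, 0]` and evaluates the negative log-softmax at the target index.

```python
import math

def cross_entropy(logits, target):
    """-log(softmax(logits)[target]), using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]

# values copied from the printed x_input above (rounded to 4 decimals)
x_input = [
    [-1.7327, -0.1885, -0.7649, 0.8701, 0.4981],
    [-2.1903, 0.5137, -0.3262, 0.1239, 0.0126],
    [0.8400, 1.4696, -0.2860, -2.8149, -0.3208],
]
y_target = [4, 2, 0]

losses = [cross_entropy(row, t) for row, t in zip(x_input, y_target)]
print(losses)  # should agree with the per-sample losses above up to rounding
```

Because the inputs are rounded copies of the printed tensor, the results only agree with the PyTorch output to about three decimal places.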

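L1 loss

`torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')`

Purpose: computes the absolute value of the difference between the output y and the ground-truth target.

The reduction argument selects the computation mode. There are three modes: none: computed element by element. sum: all elements are summed and a scalar is returned. mean: all elements are averaged and a scalar is returned. With none, the result has the same shape as the input. The default is mean.

The formula is: $L_{n}=\left|x_{n}-y_{n}\right| $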

import torch
import torch.nn as nn

data = torch.randn([2,4], requires_grad=True)
target = torch.empty([2,4]).random_(2)

print(data)
print(target)

loss = nn.L1Loss(reduction="none")
res = loss(data, target)

print(res)
pass

Output:

tensor([[ 0.7438, -0.7181,  1.7000,  0.2125],
        [-0.8243,  1.0593, -1.5408, -0.9641]], requires_grad=True)
tensor([[0., 1., 1., 1.],
        [1., 0., 1., 1.]])
tensor([[0.7438, 1.7181, 0.7000, 0.7875],
        [1.8243, 1.0593, 2.5408, 1.9641]], grad_fn=<AbsBackward0>)
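
To see how the three reduction modes relate, the following plain-Python sketch takes the first row of the data and target printed above and reduces the element-wise losses by hand. The variable names are illustrative only.

```python
# first row of the printed data and target tensors
data = [0.7438, -0.7181, 1.7000, 0.2125]
target = [0.0, 1.0, 1.0, 1.0]

# reduction='none': one loss value per element
per_element = [abs(x - y) for x, y in zip(data, target)]

# reduction='sum': add everything up into one scalar
total = sum(per_element)

# reduction='mean': divide the sum by the number of elements
mean = total / len(per_element)

print(per_element, total, mean)
```

The per-element values match the first row of the `reduction="none"` result above; `sum` and `mean` are just two ways of collapsing them to a scalar.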

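MSE loss

`torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')`

Purpose: computes the squared difference between the output y and the ground-truth target.

As with L1Loss, the reduction argument of MSELoss selects the computation mode. There are three modes: none: computed element by element. sum: all elements are summed and a scalar is returned. mean: all elements are averaged and a scalar is returned. The default is mean.

Formula: $ l_{n}=\left(x_{n}-y_{n}\right)^{2} $

import torch
import torch.nn as nn

loss = nn.MSELoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()

print('MSE loss result:', output)

-->
MSE loss result: tensor(1.6968, grad_fn=<MseLossBackward>)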

Smooth L1 loss

`torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean', beta=1.0)`

Purpose: a smoothed version of the L1 loss, intended to reduce the influence of outliers.

The reduction argument selects the computation mode. There are three modes: none: computed element by element. sum: all elements are summed and a scalar is returned. mean: all elements are averaged and a scalar is returned. The default is mean.

Note: the reduction argument appears in all of the loss functions that follow, so it is not explained again.

The formula (for the default beta=1.0) is:

$$ \operatorname{loss}(x, y)=\frac{1}{n} \sum_{i=1}^{n} z_{i} $$

where

$$ z_{i}=\left\{\begin{array}{ll}0.5\left(x_{i}-y_{i}\right)^{2}, & \text { if }\left|x_{i}-y_{i}\right|<1 \\ \left|x_{i}-y_{i}\right|-0.5, & \text { otherwise }\end{array}\right. $$

loss = nn.SmoothL1Loss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()

print('SmoothL1Loss result:', output)

-->
SmoothL1Loss result: tensor(0.7808, grad_fn=<SmoothL1LossBackward>)

Smooth L1 compared with L1

Here we plot the two loss curves to see how Smooth L1 and L1 differ.

import torch
import torch.nn as nn
import matplotlib.pyplot as plt

inputs = torch.linspace(-10, 10, steps=5000)
target = torch.zeros_like(inputs)

loss_f_smooth = nn.SmoothL1Loss(reduction='none')
loss_smooth = loss_f_smooth(inputs, target)
loss_f_l1 = nn.L1Loss(reduction='none')
loss_l1 = loss_f_l1(inputs, target)

plt.plot(inputs.numpy(), loss_smooth.numpy(), label='Smooth L1 Loss')
plt.plot(inputs.numpy(), loss_l1.numpy(), label='L1 loss')
plt.xlabel('x_i - y_i')
plt.ylabel('loss value')
plt.legend()
plt.grid()
plt.show()

The plot shows that Smooth L1 replaces the sharp corner that L1 has at 0 with a smooth transition.
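
The difference can also be checked numerically without plotting. This is a plain-Python sketch of the piecewise formula with the `beta` threshold from the signature; `smooth_l1` and `l1` are illustrative helpers, not the PyTorch implementation.

```python
def smooth_l1(diff, beta=1.0):
    """Quadratic for |diff| < beta, linear (shifted by 0.5 * beta) otherwise."""
    a = abs(diff)
    if a < beta:
        return 0.5 * a * a / beta
    return a - 0.5 * beta

def l1(diff):
    """Plain absolute error."""
    return abs(diff)

for diff in (0.0, 0.5, 1.0, 2.0):
    print(diff, smooth_l1(diff), l1(diff))
```

Near zero the smooth version grows quadratically (0.125 at a difference of 0.5, against 0.5 for L1), while for large differences it stays a constant 0.5 below L1, so large errors are still penalized linearly.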

Poisson negative log-likelihood loss

`torch.nn.PoissonNLLLoss(log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='mean')`

Purpose: the negative log-likelihood loss for a Poisson-distributed target, used when the network outputs the Poisson parameter $\lambda$. Because the output is $\lambda$ rather than a probability, it first has to be converted into a probability.

Main parameters:

log_input: whether the input is in log form; this decides which formula is used.

full: whether to compute the full loss; defaults to False. It controls whether the $\log(y_{n}!)$ term is kept in the loss. When it is kept, the term is approximated as follows:

- when $ \mathrm{y}_{\mathrm{n}} \leq 1 $, $ \log \left(\mathrm{y}_{\mathrm{n}} !\right) $ is approximated as 0
- when $ \mathrm{y}_{\mathrm{n}}>1 $, Stirling's formula is used and $ \log \left(\mathrm{y}_{\mathrm{n}} !\right) $ is approximated as $ \mathrm{y}_{\mathrm{n}} * \log \left(\mathrm{y}_{\mathrm{n}}\right)-\mathrm{y}_{\mathrm{n}}+0.5 * \log \left(2 \pi \mathrm{y}_{\mathrm{n}}\right) $

eps: a small correction term that avoids evaluating $\log(0)$ when the input is 0.

How it works:

The Poisson probability formula is:
$$
\mathrm{P}(\mathrm{Y}=\mathrm{k})=\frac{\lambda^{\mathrm{k}}}{\mathrm{k} !} \mathrm{e}^{-\lambda}
$$
For a batch $D ( x , y )$ of $N$ samples, $y$ is the target of each sample and follows a Poisson distribution. $x$ has the same shape as $y$.

Suppose the network outputs the parameter $x_n$ for a sample whose target is $y_n$.

If $x$ is the network output without a log transform, the probability of the target and the loss $l_{n}$ of the $n$-th sample are:

$$ \mathrm{P}\left(\mathrm{Y}=\mathrm{y}_{\mathrm{n}}\right)=\frac{\mathrm{x}_{\mathrm{n}}^{\mathrm{y}_{\mathrm{n}}}}{\mathrm{y}_{\mathrm{n}} !} \mathrm{e}^{-\mathrm{x}_{\mathrm{n}}} $$

$$ \mathrm{l}_{\mathrm{n}}=-\log \mathrm{P}\left(\mathrm{Y}=\mathrm{y}_{\mathrm{n}}\right)=\mathrm{x}_{\mathrm{n}}-\mathrm{y}_{\mathrm{n}} \log \mathrm{x}_{\mathrm{n}}+\log \left(\mathrm{y}_{\mathrm{n}} !\right) $$

If $x$ is the network output in log form, $ \mathrm{x}_{\mathrm{n}} $ is replaced by $ \exp \left(\mathrm{x}_{\mathrm{n}}\right) $, and the loss $l_{n}$ of the $n$-th sample becomes:

$$ \mathrm{l}_{\mathrm{n}}=-\log \mathrm{P}\left(\mathrm{Y}=\mathrm{y}_{\mathrm{n}}\right)=\exp \left(\mathrm{x}_{\mathrm{n}}\right)-\mathrm{y}_{\mathrm{n}} \mathrm{x}_{\mathrm{n}}+\log \left(\mathrm{y}_{\mathrm{n}} !\right) $$

The last term $\log(y_{n}!)$ can be dropped or approximated with Stirling's formula.

Formulas:

When log_input=True: $ \operatorname{loss}\left(x_{n}, y_{n}\right)=e^{x_{n}}-x_{n} \cdot y_{n} $
When log_input=False: $ \operatorname{loss}\left(x_{n}, y_{n}\right)=x_{n}-y_{n} \cdot \log \left(x_{n}+\text{eps}\right) $
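
The two formulas and the optional Stirling term can be written out directly. The sketch below is plain Python for a single sample, under the assumption that it follows the formulas in this section; `poisson_nll` and `stirling_log_factorial` are hypothetical helper names, not PyTorch functions.

```python
import math

def stirling_log_factorial(y):
    """Stirling approximation of log(y!), taken as 0 for y <= 1."""
    if y <= 1:
        return 0.0
    return y * math.log(y) - y + 0.5 * math.log(2 * math.pi * y)

def poisson_nll(x, y, log_input=True, full=False, eps=1e-8):
    """Per-sample Poisson negative log-likelihood for output x and target y."""
    if log_input:
        loss = math.exp(x) - y * x           # x is log(lambda)
    else:
        loss = x - y * math.log(x + eps)     # x is lambda itself
    if full:
        loss += stirling_log_factorial(y)
    return loss

# the same rate lambda = 2, once passed as a log and once passed directly
a = poisson_nll(math.log(2.0), 3.0, log_input=True)
b = poisson_nll(2.0, 3.0, log_input=False)
print(a, b)
print(stirling_log_factorial(5), math.lgamma(6))  # approximation vs. exact log(5!)
```

Both calls describe the same Poisson rate, so the two losses agree up to the tiny eps correction.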

import torch
import torch.nn as nn


loss = nn.PoissonNLLLoss(reduction='none')
log_input = torch.randn(5, 2, requires_grad=True)
target = torch.empty(5, 2).random_(5)

output = loss(log_input, target)


print('PoissonNLLLoss result:', output)

-->
PoissonNLLLoss result: tensor([[1.8573, 2.2177],
        [1.9914, 1.0427],
        [4.5823, 0.8821],
        [4.5176, 1.0008],
        [2.6423, 0.3343]], grad_fn=<SubBackward0>)

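KL divergence

`torch.nn.KLDivLoss(size_average=None, reduce=None, reduction='mean', log_target=False)`

Purpose: computes the KL divergence, also known as relative entropy. It is a distance measure for continuous distributions, and is often useful when performing direct regression over the space of (discretely sampled) continuous output distributions.

Main parameters:

reduction: the computation mode, one of none/sum/mean/batchmean.

none: computed element by element.

sum: all elements are summed and a scalar is returned.

mean: the sum divided by the number of elements, returned as a scalar.

batchmean: the sum divided by the batch size.

Formula: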

$$ \begin{aligned} D_{\mathrm{KL}}(P, Q)=\mathrm{E}_{X \sim P}\left[\log \frac{P(X)}{Q(X)}\right] & =\mathrm{E}_{X \sim P}[\log P(X)-\log Q(X)] \\ & =\sum_{i=1}^{n} P\left(x_{i}\right)\left(\log P\left(x_{i}\right)-\log Q\left(x_{i}\right)\right)\end{aligned} $$

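How to use it:

The loss takes input and target, where target plays the role of $P$ in the formula. target holds probabilities and input holds log-probabilities, so what is actually computed is $\sum target \times (\ln target -input)$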

import torch.nn as nn
import torch
import torch.nn.functional as F

x = torch.randn((1, 8))
y = torch.randn((1, 8))
# convert to probabilities first, then take the log
x_log = F.log_softmax(x,dim=1)
# convert to probabilities only
y = F.softmax(y,dim=1)
kl = nn.KLDivLoss(reduction='batchmean')
out = kl(x_log, y)
print(x)
print(y)
print(out)


-->
tensor([[-0.9543, -0.4117,  0.0377, -0.3320,  0.2467, -0.4887,  0.1111,  1.2274]])
tensor([[0.0630, 0.0266, 0.0735, 0.2664, 0.1959, 0.1449, 0.1859, 0.0438]])
tensor(0.4630)

Verification example:

import torch
import torch.nn as nn
import math

def validate_loss(output, target):
    val = 0
    for li_x, li_y in zip(output, target):
        for i, xy in enumerate(zip(li_x, li_y)):
            x, y = xy
            loss_val = y * (math.log(y, math.e) - x)
            val += loss_val
    return val / output.nelement()

torch.manual_seed(20)
loss = nn.KLDivLoss()
input = torch.Tensor([[-2, -6, -8], [-7, -1, -2], [-1, -9, -2.3], [-1.9, -2.8, -5.4]])
target = torch.Tensor([[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.5, 0.2, 0.3], [0.4, 0.3, 0.3]])
output = loss(input, target)
print("default loss:", output)

output = validate_loss(input, target)
print("validate loss:", output)

loss = nn.KLDivLoss(reduction="batchmean")
output = loss(input, target)
print("batchmean loss:", output)

loss = nn.KLDivLoss(reduction="mean")
output = loss(input, target)
print("mean loss:", output)

loss = nn.KLDivLoss(reduction="none")
output = loss(input, target)
print("none loss:", output)


-->
default loss: tensor(0.6209)
validate loss: tensor(0.6209)
batchmean loss: tensor(1.8626)
mean loss: tensor(0.6209)
none loss: tensor([[1.4215, 0.3697, 0.5697],
        [0.4697, 0.4503, 0.0781],
        [0.1534, 1.4781, 0.3288],
        [0.3935, 0.4788, 1.2588]])
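
The gap between `mean` and `batchmean` in this output comes only from the divisor. A plain-Python sketch using the same `input` and `target` values makes this explicit; the variable names here are illustrative.

```python
import math

# same values as the input (log-probabilities) and target (probabilities) above
inp = [[-2, -6, -8], [-7, -1, -2], [-1, -9, -2.3], [-1.9, -2.8, -5.4]]
tgt = [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.5, 0.2, 0.3], [0.4, 0.3, 0.3]]

# reduction='none': target * (log(target) - input) for every element
pointwise = [
    [t * (math.log(t) - x) for x, t in zip(row_x, row_t)]
    for row_x, row_t in zip(inp, tgt)
]

total = sum(sum(row) for row in pointwise)         # reduction='sum'
n_elements = sum(len(row) for row in pointwise)    # 12 elements
batch_size = len(pointwise)                        # 4 rows

mean = total / n_elements        # reduction='mean'
batchmean = total / batch_size   # reduction='batchmean'
print(mean, batchmean)
```

With 4 rows of 3 elements each, `batchmean` is exactly 3 times `mean`, which is the relation between 1.8626 and 0.6209 in the output above.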

MarginRankingLoss

`torch.nn.MarginRankingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')`

Purpose: computes a ranking loss between two inputs, for ranking tasks. Given $ x_{1} $, $ x_{2} $ and a label $ y $, it measures how far the two sets of values are from the desired ordering: $ y=1 $ means $ x_{1} $ should rank higher than $ x_{2} $, and $ y=-1 $ means the opposite.

Main parameters:

margin: the margin, i.e. the required gap between $ x_{1} $ and $ x_{2} $.

reduction: the computation mode, one of none/sum/mean.

Formula:

$ \operatorname{loss}(x_{1}, x_{2}, y)=\max (0,-y \cdot (x_{1}-x_{2})+\operatorname{margin}) $

loss = nn.MarginRankingLoss()
input1 = torch.randn(3, requires_grad=True)
input2 = torch.randn(3, requires_grad=True)
target = torch.randn(3).sign()
output = loss(input1, input2, target)
output.backward()

print('MarginRankingLoss result:', output)

-->
MarginRankingLoss result: tensor(0.7740, grad_fn=<MeanBackward0>)
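
Since the example above uses random inputs, a small hand-checkable case may help. This plain-Python sketch applies the formula to made-up scores (the numbers and the `margin_ranking` helper are assumptions for illustration only).

```python
def margin_ranking(x1, x2, y, margin=0.0):
    """Element-wise max(0, -y * (x1 - x2) + margin)."""
    return [max(0.0, -t * (a - b) + margin) for a, b, t in zip(x1, x2, y)]

x1 = [0.5, -1.0, 2.0]
x2 = [0.2, 0.3, 1.0]
y = [1, 1, -1]   # 1: x1 should rank higher, -1: x2 should rank higher

losses = margin_ranking(x1, x2, y)
mean_loss = sum(losses) / len(losses)
print(losses, mean_loss)
```

Only the first pair is ordered as its label asks (0.5 > 0.2 with y = 1), so it is the only one with zero loss; the other two are penalized by the size of the violation.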

Multi-label margin loss

`torch.nn.MultiLabelMarginLoss(size_average=None, reduce=None, reduction='mean')`

Purpose: computes the loss for multi-label classification problems.

Main parameters:

reduction: the computation mode, one of none/sum/mean.

Formula: $\operatorname{loss}(x, y)=\sum_{i j} \frac{\max (0,1-(x[y[j]]-x[i]))}{x \cdot \operatorname{size}(0)} $

where $ i=0, \ldots, x \cdot \operatorname{size}(0)-1 $ and $ j=0, \ldots, y \cdot \operatorname{size}(0)-1 $, and for all $ i $ and $ j $ we have $ y[j] \geq 0 $ and $ i \neq y[j] $.

loss = nn.MultiLabelMarginLoss()
x = torch.FloatTensor([[0.9, 0.2, 0.4, 0.8]])
# for target y, only consider labels 3 and 0, not after label -1
y = torch.LongTensor([[3, 0, -1, 1]])  # the true classes are class 3 and class 0
output = loss(x, y)

print('MultiLabelMarginLoss result:', output)

-->
MultiLabelMarginLoss result: tensor(0.4500)
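
The value 0.45 can be reproduced by hand. Below is a plain-Python sketch for a single sample, assuming the margin form $1-(x[y[j]]-x[i])$ and that target labels are read until the first -1; `multilabel_margin` is an illustrative helper.

```python
def multilabel_margin(x, y):
    """Multi-label margin loss for one sample; labels after the first -1 are ignored."""
    targets = []
    for label in y:
        if label < 0:
            break
        targets.append(label)
    total = 0.0
    for j in targets:                 # every true class
        for i in range(len(x)):       # against every non-target class
            if i not in targets:
                total += max(0.0, 1.0 - (x[j] - x[i]))
    return total / len(x)

x = [0.9, 0.2, 0.4, 0.8]
y = [3, 0, -1, 1]
result = multilabel_margin(x, y)
print(result)
```

The four margin terms are 0.4 and 0.6 for class 3 and 0.3 and 0.5 for class 0, which sum to 1.8; dividing by the 4 classes gives 0.45.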

Two-class logistic loss (SoftMarginLoss)

`torch.nn.SoftMarginLoss(size_average=None, reduce=None, reduction='mean')`

Purpose: computes the logistic loss for two-class classification.

Main parameters:

reduction: the computation mode, one of none/sum/mean.

Formula: $ \operatorname{loss}(x, y)=\sum_{i} \frac{\log (1+\exp (-y[i] \cdot x[i]))}{x \cdot \operatorname{nelement}()} $

where $ x . nelement ()$ is the number of elements in the input $ x $. Note that $ y $ only takes the values 1 and -1.

inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])  # two samples, two neurons
target = torch.tensor([[-1, 1], [1, -1]], dtype=torch.float)  # the loss is computed per neuron, so each neuron needs its own label

loss_f = nn.SoftMarginLoss()
output = loss_f(inputs, target)

print('SoftMarginLoss result:', output)


-->
SoftMarginLoss result: tensor(0.6764)
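
As a check on the formula, this plain-Python sketch flattens the same inputs and targets and averages the per-element logistic terms. `soft_margin` is a hypothetical helper written for this check.

```python
import math

def soft_margin(xs, ys):
    """Mean of log(1 + exp(-y * x)) over all elements."""
    terms = [math.log(1.0 + math.exp(-y * x)) for x, y in zip(xs, ys)]
    return sum(terms) / len(terms)

# the 2x2 inputs and targets from above, flattened row by row
inputs = [0.3, 0.7, 0.5, 0.5]
target = [-1, 1, 1, -1]
result = soft_margin(inputs, target)
print(result)
```

Elements whose sign disagrees with the label (0.3 with -1, 0.5 with -1) contribute the larger terms, which is what pulls the mean up to about 0.6764.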

Multi-class hinge loss

`torch.nn.MultiMarginLoss(p=1, margin=1.0, weight=None, size_average=None, reduce=None, reduction='mean')`

Purpose: computes the hinge loss for multi-class classification.

Main parameters:

reduction: the computation mode, one of none/sum/mean.

p: either 1 or 2.

weight: a weight assigned to the loss of each class.

margin: the margin.

Formula: $ \operatorname{loss}(x, y)=\frac{\sum_{i} \max (0, \operatorname{margin}-x[y]+x[i])^{p}}{x \cdot \operatorname{size}(0)} $

where $ x $ is the score vector of one sample, $ y $ is its target class index with $ 0 \leq y \leq x \cdot \operatorname{size}(0)-1 $, and the sum runs over $ i \in\{0, \ldots, x \cdot \operatorname{size}(0)-1\} $ with $ i \neq y $.

inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
target = torch.tensor([0, 1], dtype=torch.long)

loss_f = nn.MultiMarginLoss()
output = loss_f(inputs, target)

print('MultiMarginLoss result:', output)


-->
MultiMarginLoss result: tensor(0.6000)
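
The 0.6 above is the mean of two per-sample losses. A plain-Python sketch of the per-sample formula (without the optional class weights; `multi_margin` is an illustrative helper) shows where each one comes from.

```python
def multi_margin(x, y, p=1, margin=1.0):
    """Multi-class hinge loss for one score vector x and target class index y."""
    total = sum(
        max(0.0, margin - x[y] + x[i]) ** p
        for i in range(len(x))
        if i != y
    )
    return total / len(x)

batch = [[0.3, 0.7], [0.5, 0.5]]
targets = [0, 1]
per_sample = [multi_margin(x, y) for x, y in zip(batch, targets)]
mean_loss = sum(per_sample) / len(per_sample)
print(per_sample, mean_loss)
```

The first sample scores the wrong class higher (0.7 against 0.3), giving 1.4 / 2 = 0.7; the second is a tie, giving 1.0 / 2 = 0.5.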

Triplet loss

`torch.nn.TripletMarginLoss(margin=1.0, p=2.0, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='mean')`

Purpose: computes the triplet loss.

Triplet: a format for storing or using data, <entity 1, relation, entity 2>. In practice it can also be written as <anchor, positive example, negative example>.

With this loss we want the anchor to end up closer to the positive example and farther from the negative example.

Main parameters:

reduction: the computation mode, one of none/sum/mean.

p: the norm degree used for the pairwise distance, typically 1 or 2 (default 2).

margin: the margin.

Formula:

$L(a, p, n)=\max \left\{d\left(a_{i}, p_{i}\right)-d\left(a_{i}, n_{i}\right)+\operatorname{margin}, 0\right\}$ where $ d\left(x_{i}, y_{i}\right)=\left\|\mathbf{x}_{i}-\mathbf{y}_{i}\right\|_{p} $.

triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)
anchor = torch.randn(100, 128, requires_grad=True)
positive = torch.randn(100, 128, requires_grad=True)
negative = torch.randn(100, 128, requires_grad=True)
output = triplet_loss(anchor, positive, negative)
output.backward()
print('TripletMarginLoss result:', output)


-->
TripletMarginLoss result: tensor(1.1667, grad_fn=<MeanBackward0>)
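
With 128-dimensional random vectors the result is hard to interpret, so here is a two-dimensional case that can be checked by hand. It is a plain-Python sketch of the formula; the points and the helper names `distance` and `triplet_margin` are made up, and the small `eps` that the PyTorch signature exposes is ignored.

```python
def distance(u, v, p=2.0):
    """p-norm distance between two vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def triplet_margin(anchor, positive, negative, margin=1.0, p=2.0):
    """max(d(a, p) - d(a, n) + margin, 0) for a single triplet."""
    return max(distance(anchor, positive, p) - distance(anchor, negative, p) + margin, 0.0)

anchor = [0.0, 0.0]
near = [1.0, 0.0]   # distance 1 from the anchor
far = [3.0, 4.0]    # distance 5 from the anchor

well_separated = triplet_margin(anchor, near, far)  # positive is the near point
violated = triplet_margin(anchor, far, near)        # positive is the far point
print(well_separated, violated)
```

When the positive is already closer than the negative by more than the margin, the loss is zero; swapping the roles gives 5 - 1 + 1 = 5.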

HingeEmbeddingLoss

`torch.nn.HingeEmbeddingLoss(margin=1.0, size_average=None, reduce=None, reduction='mean')`

Purpose: computes a hinge loss on the embedding output.

Main parameters:

reduction: the computation mode, one of none/sum/mean.

margin: the margin.

Formula:

$$ l_{n}=\left\{\begin{array}{ll}x_{n}, & \text { if } y_{n}=1 \\ \max \left\{0, \Delta-x_{n}\right\}, & \text { if } y_{n}=-1\end{array}\right. $$

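Note: the input $x$ should be the absolute value of the difference between two inputs.

One way to read the formula: for a positive pair, $y_n=1$, the loss is simply $x_n$; for a negative pair, $y_n=-1$, the loss is the result of a comparison, $\max\{0, \Delta-x_n\}$, where $\Delta$ is the margin.

loss_f = nn.HingeEmbeddingLoss()
inputs = torch.tensor([[1., 0.8, 0.5]])
target = torch.tensor([[1, 1, -1]])
output = loss_f(inputs,target)

print('HingeEmbeddingLoss result:', output)


-->
HingeEmbeddingLoss result: tensor(0.7667)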
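
The 0.7667 above follows directly from the piecewise formula. This plain-Python sketch recomputes it for the same three values (`hinge_embedding` is an illustrative helper).

```python
def hinge_embedding(xs, ys, margin=1.0):
    """Mean of x for y == 1 and max(0, margin - x) for y == -1."""
    terms = [x if y == 1 else max(0.0, margin - x) for x, y in zip(xs, ys)]
    return sum(terms) / len(terms)

inputs = [1.0, 0.8, 0.5]
target = [1, 1, -1]
result = hinge_embedding(inputs, target)
print(result)
```

The three terms are 1.0, 0.8 and max(0, 1 - 0.5) = 0.5, and their mean is 2.3 / 3.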

Cosine embedding loss

`torch.nn.CosineEmbeddingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')`

Purpose: computes a loss from the cosine similarity of two vectors.

Main parameters:

reduction: the computation mode, one of none/sum/mean.

margin: may take values in [-1, 1]; [0, 0.5] is recommended.

Formula:

$$ \operatorname{loss}(x, y)=\left\{\begin{array}{ll}1-\cos \left(x_{1}, x_{2}\right), & \text { if } y=1 \\ \max \left\{0, \cos \left(x_{1}, x_{2}\right)-\operatorname{margin}\right\}, & \text { if } y=-1\end{array}\right. $$

where

$$ \cos (\theta)=\frac{A \cdot B}{\|A\|\|B\|}=\frac{\sum_{i=1}^{n} A_{i} \times B_{i}}{\sqrt{\sum_{i=1}^{n}\left(A_{i}\right)^{2}} \times \sqrt{\sum_{i=1}^{n}\left(B_{i}\right)^{2}}} $$

This is probably the most widely known of these loss functions. It computes the cosine similarity of two vectors and uses it as a distance: for a pair labeled $y=1$, the closer the two vectors are, the smaller the loss, and for a pair labeled $y=-1$ it is the other way around.

loss_f = nn.CosineEmbeddingLoss()
inputs_1 = torch.tensor([[0.3, 0.5, 0.7], [0.3, 0.5, 0.7]])
inputs_2 = torch.tensor([[0.1, 0.3, 0.5], [0.1, 0.3, 0.5]])
target = torch.tensor([1, -1], dtype=torch.float)
output = loss_f(inputs_1,inputs_2,target)

print('CosineEmbeddingLoss result:', output)


-->
CosineEmbeddingLoss result: tensor(0.5000)
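
The result is exactly 0.5 here for a simple reason, which a plain-Python sketch of the formula makes visible (`cosine` and `cosine_embedding` are illustrative helpers, not the PyTorch implementation).

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_embedding(u, v, y, margin=0.0):
    """1 - cos for y == 1, max(0, cos - margin) for y == -1."""
    c = cosine(u, v)
    return 1.0 - c if y == 1 else max(0.0, c - margin)

a = [0.3, 0.5, 0.7]
b = [0.1, 0.3, 0.5]
losses = [cosine_embedding(a, b, 1), cosine_embedding(a, b, -1)]
mean_loss = sum(losses) / len(losses)
print(cosine(a, b), losses, mean_loss)
```

The same pair of vectors is used once with y = 1 and once with y = -1, so with margin 0 the two losses are 1 - cos and cos, and their mean is 0.5 regardless of the actual similarity.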

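CTC loss

`torch.nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)`

Purpose: handles classification of sequential (time-series) data.

It computes the loss between a continuous time series and a target sequence. CTCLoss sums over the probability of the possible alignments of input to target, producing a loss value that is differentiable with respect to each input node. The alignment of input to target is assumed to be "many-to-one", which limits the length of the target sequence: it must be ≤ the input length.

Main parameters:

reduction: the computation mode, one of none/sum/mean.

blank: the blank label.

zero_infinity: whether to set infinite losses and their associated gradients to zero.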
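
To illustrate what "many-to-one" alignment means, here is a minimal plain-Python sketch of the usual CTC collapsing rule: merge consecutive repeats, then drop blanks. The helper `ctc_collapse` and the example paths are assumptions for illustration; the loss itself sums the probabilities of all paths that collapse to the target.

```python
def ctc_collapse(path, blank=0):
    """Merge consecutive repeated labels, then remove blank labels."""
    out = []
    prev = None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# two different frame-level paths of length 8 (0 is the blank label)
path_a = [0, 1, 1, 0, 1, 2, 2, 0]
path_b = [1, 0, 1, 2, 0, 0, 0, 0]

print(ctc_collapse(path_a), ctc_collapse(path_b))
```

Both paths collapse to the same target `[1, 1, 2]`, and the blank between the two 1s is what keeps them from being merged. Since collapsing can only shorten a path, the target can never be longer than the input.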

# Target are to be padded
T = 50      # Input sequence length
C = 20      # Number of classes (including blank)
N = 16      # Batch size
S = 30      # Target sequence length of longest target in batch (padding length)
S_min = 10  # Minimum target length, for demonstration purposes

# Initialize random batch of input vectors, for *size = (T,N,C)
input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()

# Initialize random batch of targets (0 = blank, 1:C = classes)
target = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)

input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
target_lengths = torch.randint(low=S_min, high=S, size=(N,), dtype=torch.long)
ctc_loss = nn.CTCLoss()
loss = ctc_loss(input, target, input_lengths, target_lengths)
loss.backward()


# Target are to be un-padded
T = 50      # Input sequence length
C = 20      # Number of classes (including blank)
N = 16      # Batch size

# Initialize random batch of input vectors, for *size = (T,N,C)
input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)

# Initialize random batch of targets (0 = blank, 1:C = classes)
target_lengths = torch.randint(low=1, high=T, size=(N,), dtype=torch.long)
target = torch.randint(low=1, high=C, size=(sum(target_lengths),), dtype=torch.long)
ctc_loss = nn.CTCLoss()
loss = ctc_loss(input, target, input_lengths, target_lengths)
loss.backward()

print('CTCLoss result:', loss)

-->
CTCLoss result: tensor(16.0885, grad_fn=<MeanBackward0>)