When deploying a PyTorch model via ONNX, you may hit an export error because the adaptive_avg_pool operator is not supported. This post records several workarounds.
Introduction
Adaptive average pooling (adaptive_avg_pool for short) is a common operation in deep learning architectures. Instead of taking a fixed kernel size and stride, it takes a target output size and pools the tensor to that size, whatever the input size is.
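As a quick illustration of what the built-in operator does (the shapes below are arbitrary and only for demonstration):

```python
import torch
import torch.nn as nn

# AdaptiveAvgPool1d only takes the target length; the kernel size and stride
# are derived internally from the actual input length at run time
pool = nn.AdaptiveAvgPool1d(output_size=7)
x = torch.randn(2, 16, 50)   # (batch, channels, length)
print(pool(x).shape)         # torch.Size([2, 16, 7])
```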
Reproducing the Problem
Sometimes the export to ONNX fails with the following error:
```
Unsupported: ONNX export of operator adaptive_avg_pool1d, output size that are not factor of input size. Please feel free to request support or submit a pull request on PyTorch GitHub.
```
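A minimal way to trigger this error looks roughly like the following (a sketch: the sizes are arbitrary, chosen so that the output size 7 is not a factor of the input length 50; the exact behavior may vary with the torch / opset version):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.AdaptiveAvgPool1d(7))
dummy = torch.randn(1, 16, 50)   # 7 is not a factor of 50

# raises the "Unsupported: ONNX export of operator adaptive_avg_pool1d ..." error
torch.onnx.export(model, dummy, "pool.onnx", opset_version=11)
```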
The PyTorch repository has an open issue for this (for the 2D operator):
https://github.com/pytorch/pytorch/issues/42653
Solutions
The common idea is to rewrite the operator using plain torch operations that ONNX can export.
Option 1
In the issue above, someone proposed the following workaround (2D):
```python
import numpy as np
import torch
import torch.nn as nn

class AdaptiveAvgPool2dCustom(nn.Module):
    def __init__(self, output_size):
        super(AdaptiveAvgPool2dCustom, self).__init__()
        self.output_size = np.array(output_size)

    def forward(self, x: torch.Tensor):
        # derive a fixed stride / kernel size from the input and target spatial sizes,
        # then emulate the adaptive pooling with a regular AvgPool2d
        stride_size = np.floor(np.array(x.shape[-2:]) / self.output_size).astype(np.int32)
        kernel_size = np.array(x.shape[-2:]) - (self.output_size - 1) * stride_size
        avg = nn.AvgPool2d(kernel_size=list(kernel_size), stride=list(stride_size))
        x = avg(x)
        return x
```
The idea is to derive a fixed stride and kernel size from the input size and the target size, and then emulate the adaptive pooling with an ordinary AvgPool2d, which ONNX can export.
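Concretely, for each pooled dimension the code above derives

$$
stride = \left\lfloor \frac{L_{in}}{L_{out}} \right\rfloor ,\qquad
kernel = L_{in} - (L_{out} - 1) \cdot stride
$$

For example, $L_{in}=50$ and $L_{out}=7$ give $stride=7$ and $kernel=8$: seven windows of length 8, spaced 7 apart, which exactly cover the 50 input positions.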
Following this code, I adapted a 1D version of the operator:
```python
class AdaptiveAvgPool1dCustom(nn.Module):
    def __init__(self, output_size):
        super(AdaptiveAvgPool1dCustom, self).__init__()
        self.output_size = np.array(output_size)

    def forward(self, x: torch.Tensor):
        cur_shape = np.array(x.shape[-1])
        if cur_shape < self.output_size:
            # upsampling cannot be expressed with a plain AvgPool1d
            raise RuntimeError(
                f"AdaptiveAvgPool1dCustom is converting {cur_shape} feature to {self.output_size} "
                f"by avgpool which is not supported, suggestion is to change outputsize "
                f"to input shape {cur_shape}.")
        stride_size = np.floor(cur_shape / self.output_size).astype(np.int32)
        kernel_size = cur_shape - (self.output_size - 1) * stride_size
        # cast to plain ints so AvgPool1d accepts them regardless of how output_size was passed
        avg = nn.AvgPool1d(kernel_size=int(kernel_size), stride=int(stride_size))
        x = avg(x)
        return x
```
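With this module swapped in for nn.AdaptiveAvgPool1d, the export goes through (a sketch; the file name and shapes are placeholders):

```python
pool = AdaptiveAvgPool1dCustom(7)
dummy = torch.randn(1, 16, 50)
print(pool(dummy).shape)   # torch.Size([1, 16, 7])

# exports an AveragePool node instead of the unsupported adaptive op
torch.onnx.export(pool, dummy, "pool_custom.onnx", opset_version=11)
```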
However, the ONNX model exported this way sometimes fails at inference time in ONNX Runtime, probably because stride_size becomes 0 when the target output size is larger than the input size.
Option 2
The code above runs fine when it reduces the length, but it cannot handle the case where the target size is larger than the input, and even when it runs its output does not always match the original adaptive pooling operator. The reason is that the original operator computes each output element differently:
$$
l_{start}(i) = \left\lfloor \frac{i \cdot L_{in}}{L_{out}} \right\rfloor
$$

$$
l_{end}(i) = \left\lceil \frac{(i+1) \cdot L_{in}}{L_{out}} \right\rceil
$$

$$
Output(i) = \frac{\operatorname{sum}\!\left(Input[l_{start}(i) : l_{end}(i)]\right)}{l_{end}(i) - l_{start}(i)}
$$
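For example, with $L_{in}=10$ and $L_{out}=4$ the adaptive operator averages the windows $[0,3)$, $[2,5)$, $[5,8)$ and $[7,10)$, whereas Option 1 ($stride=2$, $kernel=4$) averages $[0,4)$, $[2,6)$, $[4,8)$ and $[6,10)$, so the two outputs generally differ.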
An implementation following this formula was also posted in the same issue, and a blog post describes a similar method:
```python
def torch_pool(inputs, target_size):
    # start / end index of every adaptive pooling window
    start_points = (torch.arange(target_size, dtype=torch.float32) * (inputs.size(-1) / target_size)).long()
    end_points = ((torch.arange(target_size, dtype=torch.float32) + 1) * (inputs.size(-1) / target_size)).ceil().long()
    pooled = []
    for idx in range(target_size):
        # keepdim=True keeps the length axis so the window means can be concatenated along it
        pooled.append(torch.mean(inputs[:, :, start_points[idx]:end_points[idx]], dim=-1, keepdim=True))
    pooled = torch.cat(pooled, -1)
    return pooled
```
The principle looks right, but I have not run this code myself.
Still, all of these implementations rely on a Python for loop, which I find inelegant, so I wrote the following version. It runs correctly and exports to ONNX; it is given here for reference:
```python
class AdaptiveAvgPool1dCustomPlus(nn.Module):
    def __init__(self, output_size):
        super(AdaptiveAvgPool1dCustomPlus, self).__init__()
        self.output_size = int(output_size)
        assert self.output_size > 0

    def forward(self, x: torch.Tensor):
        L = x.shape[-1]
        # prefix sums along the length axis, padded with a leading zero so that
        # cum_res[..., k] is the sum of the first k elements
        cum_res = torch.cumsum(x, dim=-1)
        cum_res = torch.cat((torch.zeros(*x.shape[:-1], 1).to(x.device), cum_res), dim=-1)
        # start / end index of every adaptive pooling window, following the formula above
        indexs = torch.arange(0, self.output_size) * L / self.output_size
        indexs_larger = indexs + L / self.output_size
        lstart_t = torch.floor(indexs).to(torch.long).to(x.device)
        lend_t = torch.ceil(indexs_larger).to(torch.long).to(x.device)
        # window sum = difference of two prefix sums, divided by the window length
        output = (cum_res[..., lend_t] - cum_res[..., lstart_t]) / (lend_t - lstart_t)
        return output
```
The underlying formula is the same; the only change is that the per-window sums are read off a cumulative sum at the precomputed start and end indices, which gets rid of the for loop.
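A quick sanity check against the built-in operator could look like this (a sketch; shapes and tolerance are arbitrary):

```python
x = torch.randn(2, 16, 50)
ref = nn.AdaptiveAvgPool1d(7)(x)
out = AdaptiveAvgPool1dCustomPlus(7)(x)
print(out.shape)                             # torch.Size([2, 16, 7])
print(torch.allclose(ref, out, atol=1e-5))   # expected True: same windowing formula
```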
Option 3
The Option 2 version converts to ONNX successfully and runs correctly, but it becomes very slow after conversion to TensorRT. As an alternative, you can try using the F.interpolate operator directly:
```python
import torch.nn.functional as F

class AdaptiveAvgPool1dVersion3_d3(nn.Module):
    def __init__(self, output_size):
        super(AdaptiveAvgPool1dVersion3_d3, self).__init__()
        self.output_size = int(output_size)
        assert self.output_size > 0

    def forward(self, x):
        # resample the last dimension to the target length by linear interpolation
        x = F.interpolate(x, self.output_size, mode='linear')
        return x
```
It runs correctly and is somewhat faster; the only caveat is that it is not exactly equivalent to the original function, since it resamples by interpolation rather than averaging.
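A rough comparison against the built-in operator makes this difference visible (a sketch with arbitrary shapes):

```python
x = torch.randn(2, 16, 50)
ref = nn.AdaptiveAvgPool1d(7)(x)
approx = AdaptiveAvgPool1dVersion3_d3(7)(x)
print(approx.shape == ref.shape)     # True: shapes match
print((approx - ref).abs().max())    # generally non-zero: interpolation, not averaging
```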