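Halcon Parallel Performance

Last updated: the evening of April 15, 2023

This post records the results of Halcon performance experiments: a performance analysis of a simulated inspection flow running in parallel on the algorithm CPU server.

Performance analysis report: simulated inspection flow with parallel computation on the algorithm CPU server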

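Background

At present, the data used while the algorithm runs is coupled through content dependencies, which makes parallel computation hard to achieve.
Once we switch to the path-planning capture scheme, the content of each FOV group will be relatively independent and stable, so the data source itself makes parallel computation feasible.

In addition, there are plans to raise the camera frame rate, so we measure the algorithm's parallel data throughput on routine defect inspection.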

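Goals

  1. Implement data-parallel computation by calling Halcon from C++
  2. Measure algorithm throughput and server CPU usage under different AOP (Halcon's automatic operator parallelization) strategies, numbers of CPU cores given to the algorithm process, thread counts, thread-to-CPU binding strategies, and numbers of CPUs bound per thread
  3. Determine the best AOP, CPU, and thread scheduling strategy
  4. Measure algorithm throughput for different model sizes, fine-alignment search-region sizes, and inspection-region sizes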

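Experiment design

Experiment conditions

  1. The experiment measures algorithm throughput under routine inspection, so input bandwidth is assumed to be unlimited; that is, input data is always available to the system
  2. Image size: 4096 × 3072
  3. Server specification:
Item                     Value
Operating system         Windows
Processor                Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz 3.40 GHz (2 processors), 64 logical cores
Discrete GPU             (not specified)
Memory                   128 GB
Disk                     500 GB SSD, 4 TB HDD
Visual Studio version    2017
Halcon version           22.11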

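Key metric

To keep the results easy to compare and understand, the algorithm flow processes the same data 1000 times.
Unless an experiment requires it, the core processing flow is not modified.
For each experiment group, the total time (s) needed to finish the algorithm computation on all data is that group's performance metric.
Because each run processes 1000 items, the same number read in ms is the average processing time per item.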
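The conversion from total run time to a per-item figure can be written out explicitly. The sketch below is illustrative only; `ms_per_item` and `items_per_second` are hypothetical helper names that do not appear in the original source.

```cpp
#include <cassert>

// Average processing time per item, in milliseconds.
// total_seconds: wall-clock time of the whole run; item_count: items processed.
double ms_per_item(double total_seconds, int item_count) {
    assert(item_count > 0);
    return total_seconds * 1000.0 / item_count;
}

// Throughput of the same run, in items per second.
double items_per_second(double total_seconds, int item_count) {
    assert(total_seconds > 0.0);
    return item_count / total_seconds;
}
```

With 1000 items the factor of 1000 cancels, which is why a total of, say, 25 s (a made-up figure, not a measurement from this report) corresponds to 25 ms per item.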

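Framework design

An algorithm parallelism experiment was previously implemented in Halcon to simulate the environment of real parallel computation; this experiment is implemented in C++.

  1. The design follows the producer-consumer pattern
  2. Before data processing starts, configure the number of CPU cores for the process, set the initialization parameters, and load the initialization resources
  3. Bind each thread to a CPU when it starts; each thread runs the algorithm flow independently
  4. join guarantees that all data has been processed. Timing starts when the first thread is created and stops after all threads have finished and exited; the interval is the elapsed time of one experiment group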
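The framework steps above can be sketched without Halcon or any platform-specific calls. This is a minimal illustration, not the code used in the experiments: `run_pipeline` is a hypothetical name, the items are plain integers, shutdown is signalled with a `done` flag instead of the counters used in main.cpp, and timing uses `std::chrono::steady_clock` rather than `clock()`.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Runs one producer and `consumer_count` consumers over `item_count` items.
// Returns the number of items processed; the elapsed wall time is stored in *seconds.
int run_pipeline(int item_count, int consumer_count, double* seconds) {
    assert(item_count >= 0 && consumer_count > 0 && seconds != nullptr);
    std::mutex mtx;                 // protects items, done, processed
    std::condition_variable cond;   // signals "item available" or "producer finished"
    std::queue<int> items;
    bool done = false;
    int processed = 0;

    // Timing starts before the first thread is created.
    auto start = std::chrono::steady_clock::now();

    std::thread producer([&] {
        for (int i = 0; i < item_count; ++i) {
            { std::lock_guard<std::mutex> lock(mtx); items.push(i); }
            cond.notify_one();
        }
        { std::lock_guard<std::mutex> lock(mtx); done = true; }
        cond.notify_all();
    });

    std::vector<std::thread> consumers;
    for (int c = 0; c < consumer_count; ++c) {
        consumers.emplace_back([&] {
            for (;;) {
                std::unique_lock<std::mutex> lock(mtx);
                cond.wait(lock, [&] { return !items.empty() || done; });
                if (items.empty()) return;  // producer finished and queue drained
                items.pop();
                ++processed;                // counted while the lock is held
                lock.unlock();
                // The per-item algorithm would run here, outside the lock.
            }
        });
    }

    // join guarantees that every item has been handled before timing stops.
    producer.join();
    for (auto& t : consumers) t.join();

    *seconds = std::chrono::duration<double>(
                   std::chrono::steady_clock::now() - start).count();
    return processed;
}
```

Waiting on a predicate keeps the consumers correct under spurious wake-ups, and doing the per-item work after `lock.unlock()` is what lets the consumers run in parallel.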

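Core flow

Core processing flow for a single item:

  1. Two image rotations
  2. Flat-field correction of the image
  3. Fine-alignment matching of the image
  4. Image cropping, differencing, and binarization of the result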
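Step 4 can be illustrated on plain pixel buffers. The sketch below mirrors the arithmetic of the source's `SubImage(..., 1, 128)` followed by two `Threshold` calls (bright region `[128 + thre, 255]`, dark region `[0, 128 - thre]`), but it is a simplified stand-in rather than Halcon code; `count_defect_pixels` is a hypothetical name, and clipping of the difference to the 8-bit range is an assumption about the behaviour being modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts pixels whose offset difference (image - golden + 128) falls
// outside the open band (128 - thre, 128 + thre).
std::size_t count_defect_pixels(const std::vector<std::uint8_t>& image,
                                const std::vector<std::uint8_t>& golden,
                                int thre) {
    assert(image.size() == golden.size());
    std::size_t count = 0;
    for (std::size_t i = 0; i < image.size(); ++i) {
        int diff = static_cast<int>(image[i]) - static_cast<int>(golden[i]) + 128;
        if (diff < 0) diff = 0;       // clip to the 8-bit range
        if (diff > 255) diff = 255;
        const bool bright = diff >= 128 + thre;  // brighter than the reference
        const bool dark = diff <= 128 - thre;    // darker than the reference
        if (bright || dark) ++count;
    }
    return count;
}
```

Note that the benchmark in main.cpp subtracts the cropped image from itself, so in the measured runs the difference is a constant 128 and no pixel is flagged; only the cost of the operators matters there.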

C++ source code

main.h
#pragma once
#include <time.h>
#include <json/json.h>
#include <fstream>
#include "HalconCpp.h"
#include<math.h>

using namespace std;
using namespace HalconCpp;


struct DataBus {

HObject ho_DarkImage;
HObject ho_SubImage;
HObject ho_golden_img;
HObject ho_input_img;
HTuple hv_image_path;
HTuple hv_image_height;
HTuple hv_image_width;
HTuple hv_Mean;
HTuple hv_ncc_model;
};

class TicToc
{
public:
clock_t start = 0;
clock_t finish = 0;
clock_t time = 0;

void tic()
{
start = clock();
}
void toc()
{
finish = clock();
time = finish - start;
}
};

DataBus prepare_assets_data(float golden_area_rate, int NumLevels);
void FlatFieldProcessing(HObject* ho_DarkImage, HObject* ho_SubImage, HTuple hv_FLAT_IMAGE_PATH,
HTuple hv_DARK_IMAGE_PATH, HTuple* hv_Mean, HTuple* hv_Deviation);

HObject DistortMap;


HObject ExpGetGlobalVar_DistortMap(void)
{
return DistortMap;
}

void ExpSetGlobalVar_DistortMap(HObject obj)
{
if (!obj.IsInitialized()) {
return;
}
DistortMap = obj;
}


void FlatFieldProcessing(HObject* ho_DarkImage, HObject* ho_SubImage, HTuple hv_FLAT_IMAGE_PATH,
HTuple hv_DARK_IMAGE_PATH, HTuple* hv_Mean, HTuple* hv_Deviation)
{

// Local iconic variables
HObject ho_FlatImage, ho_Domain;

// Local control variables
HTuple hv_FileExists1, hv_FileExists2;

FileExists(hv_FLAT_IMAGE_PATH, &hv_FileExists1);
FileExists(hv_DARK_IMAGE_PATH, &hv_FileExists2);
if (0 != (HTuple(hv_FileExists1.TupleNot()).TupleOr(hv_FileExists2.TupleNot()))) {
// stop(...); only in hdevelop
}
ReadImage(&ho_FlatImage, hv_FLAT_IMAGE_PATH);
ReadImage(&(*ho_DarkImage), hv_DARK_IMAGE_PATH);
SubImage(ho_FlatImage, (*ho_DarkImage), &(*ho_SubImage), 1, 0);
GetDomain((*ho_SubImage), &ho_Domain);
Intensity(ho_Domain, (*ho_SubImage), &(*hv_Mean), &(*hv_Deviation));
return;
}


void ImageProcess(HObject ho_DarkImage, HObject ho_SubImage, HObject ho_input_img,
HObject* ho_ImageRotate, HTuple hv_image_path, HTuple hv_image_height, HTuple hv_image_width,
HTuple hv_mult, HTuple hv_rotate_angle)
{

// Local iconic variables
HObject ho_FovImage, ho_SubFovImage, ho_ImageResult;

//global object DistortMap

//try
//read_sequence (FovImage, 0, image_width, image_height, 0, 0, image_width, image_height, 'byte', 'MSBFirst', 'MSBFirst', 'byte', 1, image_path)
//catch (Exception)
//read_image (FovImage, image_path)
//endtry
ho_FovImage = ho_input_img;

//WaitSeconds(0.05);

RotateImage(ho_FovImage, &ho_FovImage, -90, "constant");
RotateImage(ho_FovImage, &ho_FovImage, 90, "constant");
SubImage(ho_FovImage, ho_DarkImage, &ho_SubFovImage, 1, 0);
DivImage(ho_SubFovImage, ho_SubImage, &ho_ImageResult, hv_mult, 0);
//rotate_image (ImageResult, ImageRotate, rotate_angle, 'constant')
//map_image (ImageRotate, DistortMap, ImageRotate)
(*ho_ImageRotate) = ho_ImageResult;
return;
}
main.cpp
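#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>
#include "main.h"
#include <windows.h> // SetThreadAffinityMask, SetProcessAffinityMask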


using namespace std;
using namespace HalconCpp;

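static const int repository_size = 10; // size of the circular queue (not referenced below)
static const int item_total = 20; // number of items to produce (not referenced below)

std::mutex mtx; // mutex protecting the item queue and the counters
std::condition_variable cond; // signals that items are available

std::queue<HImage> original_image_queue;
vector<TicToc> time_list;

static std::size_t pushed_data_num = 0; // number of items pushed by the producer
static std::size_t popped_data_num = 0; // number of items popped by the consumers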

std::chrono::microseconds t1(1); //a new feature of c++ 11 standard


void generate_data(HImage data)
{
original_image_queue.push(data); // enqueue the item to be processed
}

int pop_data(HImage& data)
{
if (!original_image_queue.empty()) {
data = original_image_queue.front(); // read the item at the front
original_image_queue.pop(); // remove it from the queue
return 0;
} else {
return -1;
}
}

void test_run(HObject ho_DarkImage, HObject ho_SubImage, HObject ho_golden_img,
HObject ho_input_img, HTuple hv_image_path, HTuple hv_image_height, HTuple hv_image_width,
HTuple hv_Mean, HTuple hv_ncc_model, HTuple* hv_result, float search_alpha, float diff_alpha, int NumLevels)
{
// Local iconic variables
HObject ho_ImageRotate, ho_Rectangle, ho_ImageReduced, ho_ROI_0;
HObject ho_ImagePart, ho_ImageSub, ho_brightRegion, ho_darkRegion;

// Local control variables
HTuple hv_Row, hv_Column, hv_Angle, hv_Score;
HTuple hv_gtWidth, hv_gtHeight, hv_top, hv_down, hv_left;
HTuple hv_right, hv_thre, hv_bValue, hv_dValue;

ImageProcess(ho_DarkImage, ho_SubImage, ho_input_img, &ho_ImageRotate, hv_image_path,
hv_image_height, hv_image_width, hv_Mean, 90);

//cout << search_alpha << " " << diff_alpha << endl;

int y1 = 2317;
int x1 = 1824;
int y2 = 2941;
int x2 = 2438;

int cur_area = (y2 - y1 + 1) * (x2 - x1 + 1);
HTuple gd_width, gd_height;
GetImageSize(ho_ImageRotate, &gd_width, &gd_height);

int W = gd_width.I();
int H = gd_height.I();
int max_area = W * H;

float target_area = cur_area + (max_area - cur_area) * search_alpha;

float K1 = H - 1 - y2 + y1;
float K2 = y2 - y1 + 1;
float K3 = W - 1 - x2 + x1;
float K4 = x2 - x1 + 1;

float A = K1 * K3;
float B = K1 * K4 + K2 * K3;
float C = K2 * K4 - target_area;

float alpha = (-B + pow((pow(B, 2) - 4 * A * C), 0.5)) / (2 * A);
float ny1 = y1 - y1 * alpha;
float ny2 = (H - 1 - y2) * alpha + y2;
float nx1 = x1 - x1 * alpha;
float nx2 = (W - 1 - x2) * alpha + x2;

float narea = (ny2 - ny1 + 1) * (nx2 - nx1 + 1);

//cout << alpha << ' ' << narea << ' ' << target_area << " " << ny2 << " " << nx2 << endl;

GenRectangle1(&ho_ROI_0, ny1, nx1, ny2, nx2);
ReduceDomain(ho_ImageRotate, ho_ROI_0, &ho_ImageReduced);


FindNccModel(ho_ImageReduced, hv_ncc_model, 0, 0, 0.5, 1, 0.5, "true", NumLevels, &hv_Row,
&hv_Column, &hv_Angle, &hv_Score);

target_area = cur_area + (max_area - cur_area) * diff_alpha;

K1 = H - 1 - y2 + y1;
K2 = y2 - y1 + 1;
K3 = W - 1 - x2 + x1;
K4 = x2 - x1 + 1;

A = K1 * K3;
B = K1 * K4 + K2 * K3;
C = K2 * K4 - target_area;

alpha = (-B + pow((pow(B, 2) - 4 * A * C), 0.5)) / (2 * A);
ny1 = y1 - y1 * alpha;
ny2 = (H - 1 - y2) * alpha + y2;
nx1 = x1 - x1 * alpha;
nx2 = (W - 1 - x2) * alpha + x2;

/* hv_top = (hv_Row - (hv_gtHeight / 2)).TupleInt();
hv_down = (hv_top + hv_gtHeight) - 1;
hv_left = (hv_Column - (hv_gtWidth / 2)).TupleInt();
hv_right = (hv_left + hv_gtWidth) - 1;*/


// GenRectangle1(&ho_Rectangle, hv_top, hv_left, hv_down, hv_right);
GenRectangle1(&ho_Rectangle, ny1, nx1, ny2, nx2);
ReduceDomain(ho_ImageRotate, ho_Rectangle, &ho_ImageReduced);
CropDomain(ho_ImageReduced, &ho_ImagePart);

SubImage(ho_ImagePart, ho_ImagePart, &ho_ImageSub, 1, 128);

hv_thre = 50;

Threshold(ho_ImageSub, &ho_brightRegion, 128 + hv_thre, 255);
RegionFeatures(ho_brightRegion, "area", &hv_bValue);

Threshold(ho_ImageSub, &ho_darkRegion, 0, 128 - hv_thre);
RegionFeatures(ho_darkRegion, "area", &hv_dValue);

(*hv_result) = hv_bValue + hv_dValue;

return;
}

HTuple consume_item(DataBus assets_data, float search_alpha, float diff_alpha, int NumLevels)
{
HObject ho_DarkImage;
HObject ho_SubImage;
HObject ho_golden_img;
HObject ho_input_img;
HTuple hv_image_path;
HTuple hv_image_height;
HTuple hv_image_width;
HTuple hv_Mean;
HTuple hv_ncc_model;
HTuple hv_result;

ho_DarkImage = assets_data.ho_DarkImage;
ho_SubImage = assets_data.ho_SubImage;
ho_golden_img = assets_data.ho_golden_img;
ho_input_img = assets_data.ho_input_img;
hv_image_path = assets_data.hv_image_path;
hv_image_height = assets_data.hv_image_height;
hv_image_width = assets_data.hv_image_width;
hv_Mean = assets_data.hv_Mean;
hv_ncc_model = assets_data.hv_ncc_model;

// Run the core processing flow
test_run(ho_DarkImage, ho_SubImage, ho_golden_img,
ho_input_img, hv_image_path, hv_image_height, hv_image_width,
hv_Mean, hv_ncc_model, &hv_result, search_alpha, diff_alpha, NumLevels);

return hv_result;
}

void Producer_thread(HImage data, int max_num)
{
bool ready_to_exit = false;

while (1) {
std::this_thread::sleep_for(t1);
std::unique_lock<std::mutex> lck(mtx); // lock the queue

generate_data(data);
pushed_data_num++;

if (pushed_data_num >= max_num) {
ready_to_exit = true;
}
//std::cout << "producer thread " << std::this_thread::get_id()
// << " produced: " << pushed_data_num << std::endl;

lck.unlock(); // unlock
cond.notify_one();

if (ready_to_exit == true) {
cond.notify_all();
break;
}
}

std::cout << "Producer thread " << std::this_thread::get_id()
<< " is exiting..." << std::endl;
}


void Consumer_thread(DataBus assets_data, int max_num, int id, bool affine_cup, float search_alpha, float diff_alpha, int NumLevels)
{
__int64 base = 1;
if (affine_cup) {
SetThreadAffinityMask(GetCurrentThread(), (base << id));
}
bool ready_to_exit = false;
HImage data;

while (1) {
std::this_thread::sleep_for(t1);
std::unique_lock<std::mutex> lck(mtx); // lock the queue

while (original_image_queue.empty() && pushed_data_num < max_num) {
std::cout << "Consumer is waiting for items..." << std::endl;
cond.wait(lck); // Unlock mtx and wait to be notified
}

int mark = pop_data(data);
if (mark == 0) {
++popped_data_num;
}

if (popped_data_num >= max_num) {
ready_to_exit = true;
}

/*std::cout << "consumer thread " << std::this_thread::get_id()
<< " consumed item " << popped_data_num << std::endl;*/

lck.unlock(); // release the queue lock so other producers and consumers can access the queue
if (mark == 0) {
TicToc cur_time;
cur_time.tic();
assets_data.ho_input_img = data;
HTuple resRegion = consume_item(assets_data, search_alpha, diff_alpha, NumLevels);
cur_time.toc();
std::lock_guard<std::mutex> time_lock(mtx); // time_list is shared by all consumers
time_list.push_back(cur_time);
}

if (ready_to_exit == true) {
break;
}
}

std::cout << "Consumer thread " << std::this_thread::get_id()
<< " is exiting..." << std::endl;
}


DataBus prepare_assets_data(float golden_area_rate, int NumLevels)
{

// Local iconic variables
HObject ho_golden_img, ho_ROI_0, ho_ImageReduced;
HObject ExpTmpLocalVar_DistortMap, ho_DarkImage, ho_SubImage;
HObject ho_Image;

// Local control variables
HTuple hv_nccModel, hv_DictHandle, hv_FLAT_IMAGE_PATH;
HTuple hv_DARK_IMAGE_PATH, hv_image_width, hv_image_height;
HTuple hv_image_path, hv_Mean, hv_Deviation;

ReadImage(&ho_golden_img, "assets/GoldenDie.png");

int y1 = 9;
int x1 = 183;
int y2 = 76;
int x2 = 419;

int cur_area = (y2 - y1 + 1) * (x2 - x1 + 1);
HTuple gd_width, gd_height;
GetImageSize(ho_golden_img, &gd_width, &gd_height);

int W = gd_width.I();
int H = gd_height.I();
int max_area = W * H;

float target_area = cur_area + (max_area - cur_area) * golden_area_rate;

// Grow the rectangle (y1, x1, y2, x2) toward the full image by a factor alpha in [0, 1]:
// ny1 = y1 - y1 * alpha
// ny2 = (H - 1 - y2) * alpha + y2
// nx1 = x1 - x1 * alpha
// nx2 = (W - 1 - x2) * alpha + x2
// nh = ny2 - ny1 + 1
//    = (H - 1 - y2 + y1) * alpha + y2 - y1 + 1
// nw = nx2 - nx1 + 1
//    = (W - 1 - x2 + x1) * alpha + x2 - x1 + 1
// K1 = H - 1 - y2 + y1
// K2 = y2 - y1 + 1
// K3 = W - 1 - x2 + x1
// K4 = x2 - x1 + 1
// narea = nh * nw
//       = (K1 * alpha + K2) * (K3 * alpha + K4)
//       = K1K3 * alpha^2 + (K1K4 + K2K3) * alpha + K2K4
// Setting narea = target_area:
// K1K3 * alpha^2 + (K1K4 + K2K3) * alpha + K2K4 - target_area = 0
// A = K1K3
// B = K1K4 + K2K3
// C = K2K4 - target_area
// A * alpha^2 + B * alpha + C = 0
// alpha = (-B +- (B^2 - 4AC)^0.5) / (2A); the code below takes the + root

float K1 = H - 1 - y2 + y1;
float K2 = y2 - y1 + 1;
float K3 = W - 1 - x2 + x1;
float K4 = x2 - x1 + 1;

float A = K1 * K3;
float B = K1 * K4 + K2 * K3;
float C = K2 * K4 - target_area;

float alpha = (-B + pow((pow(B, 2) - 4 * A * C), 0.5)) / (2 * A);
float ny1 = y1 - y1 * alpha;
float ny2 = (H - 1 - y2) * alpha + y2;
float nx1 = x1 - x1 * alpha;
float nx2 = (W - 1 - x2) * alpha + x2;

float narea = (ny2 - ny1 + 1) * (nx2 - nx1 + 1);

//cout << alpha << ' ' << narea << ' ' << target_area << " " << ny2 << " " << nx2 << endl;

GenRectangle1(&ho_ROI_0, ny1, nx1, ny2, nx2);
ReduceDomain(ho_golden_img, ho_ROI_0, &ho_ImageReduced);
if (NumLevels < 1) {
CreateNccModel(ho_ImageReduced, "auto", 0, 0, "auto", "use_polarity", &hv_nccModel);
} else {
CreateNccModel(ho_ImageReduced, NumLevels, 0, 0, "auto", "use_polarity", &hv_nccModel);
}
//global object DistortMap
ReadDict("assets/DistortionMap", HTuple(), HTuple(), &hv_DictHandle);
GetDictObject(&ExpTmpLocalVar_DistortMap, hv_DictHandle, "DistortionMap");
ExpSetGlobalVar_DistortMap(ExpTmpLocalVar_DistortMap);

hv_FLAT_IMAGE_PATH = "assets/2X-Flat.bmp";
hv_DARK_IMAGE_PATH = "assets/2X-Dark.bmp";

hv_image_width = 4096;
hv_image_height = 3072;

hv_image_path = "data/0.png";

FlatFieldProcessing(&ho_DarkImage, &ho_SubImage, hv_FLAT_IMAGE_PATH, hv_DARK_IMAGE_PATH,
&hv_Mean, &hv_Deviation);

RotateImage(ho_DarkImage, &ho_DarkImage, -90, "constant");
RotateImage(ho_SubImage, &ho_SubImage, -90, "constant");

ReadImage(&ho_Image, hv_image_path);

DataBus assets_data;

assets_data.ho_DarkImage = ho_DarkImage;
assets_data.ho_SubImage = ho_SubImage;
assets_data.ho_golden_img = ho_golden_img;
assets_data.hv_image_height = hv_image_height;
assets_data.hv_image_width = hv_image_width;
assets_data.hv_image_path = hv_image_path;
assets_data.hv_Mean = hv_Mean;
assets_data.hv_ncc_model = hv_nccModel;
assets_data.ho_input_img = ho_Image;

return assets_data;
}


int main()
{
//float golden_area_rate = 0.0;
//float search_alpha = 1;
//float diff_alpha = 0;


Json::StyledWriter style_writer;

HImage data;
ReadImage(&data, "data/0.png");

int item_index = 0;

Json::Value root;

std::vector<std::thread> thread_vector1;
std::vector<std::thread> thread_vector2;

clock_t start = -1;
clock_t end = -1;

__int64 base_move = 1;
__int64 dwProcessAffinityMask = 0;


for (int NumLevels(0); NumLevels <= 0; NumLevels++) {
for (float golden_area_rate(0); golden_area_rate <= 1; golden_area_rate += 0.004) {
DataBus assets_data = prepare_assets_data(golden_area_rate, NumLevels);
for (float search_alpha(0); search_alpha <= 0; search_alpha += 0.01) {
for (float diff_alpha(0); diff_alpha <= 0; diff_alpha += 0.01) {
for (int process_cpu_num(57); process_cpu_num <= 57; process_cpu_num += 7) {
dwProcessAffinityMask = 0;

for (int cpu_index(0); cpu_index < process_cpu_num; cpu_index++) {
dwProcessAffinityMask += base_move << cpu_index;
}

int res = SetProcessAffinityMask(GetCurrentProcess(), dwProcessAffinityMask);

for (int aop_index(0); aop_index < 1; aop_index++) {
if (aop_index == 0) {
SetSystem("parallelize_operators", "false");
} else {
SetSystem("parallelize_operators", "true");
}
for (int affine_cpu_index(1); affine_cpu_index < 2; affine_cpu_index++) {
for (int thread_num(50); thread_num <= 50; thread_num += 1) {
int aop_thread_num = -1;
if (aop_index == 2) {

if (64 - thread_num > 0) {
aop_thread_num = 64 - thread_num;
SetSystem("thread_num", aop_thread_num);
}
}
time_list.clear();

thread_vector1.clear();
thread_vector2.clear();

start = -1;
end = -1;
HTuple hNumLevels, hAngleStart, hAngleExtent, hAngleStep, hMetric;
GetNccModelParams(assets_data.hv_ncc_model, &hNumLevels, &hAngleStart, &hAngleExtent, &hAngleStep, &hMetric);

start = clock();

int max_data_num = 10;
pushed_data_num = 0;
popped_data_num = 0;

for (int i = 0; i < 1; ++i) {
thread_vector1.push_back(std::thread(Producer_thread, data, max_data_num)); // create the producer thread
}

for (int i = 0; i < thread_num; ++i) {
//SetThreadAffinityMask(cur_consumer.native_handle, 1);
thread_vector2.push_back(std::thread(Consumer_thread, assets_data, max_data_num, i, affine_cpu_index, search_alpha, diff_alpha, NumLevels)); // create a consumer thread
}

for (auto& thr2 : thread_vector2) {
thr2.join();
}
for (auto& thr1 : thread_vector1) {
thr1.join();
}

end = clock();

Json::Value item;


item["AOP"] = aop_index;

item["Aop Thread Num"] = aop_thread_num;
item["Thread Num"] = thread_num;
item["Cost Time"] = (end - start) * 1.0 / CLOCKS_PER_SEC;
item["Process CPU num"] = process_cpu_num;
item["golden_area_rate"] = golden_area_rate;
item["search_alpha"] = search_alpha;
item["diff_alpha"] = diff_alpha;
item["NumLevels"] = NumLevels;
item["RealNumLevels"] = hNumLevels.I();

cout << "Time Cost: " << (end - start) * 1.0 / CLOCKS_PER_SEC << " s" << endl;

string info_str = style_writer.write(item);
string info_path = "GoldenRate_" + to_string(golden_area_rate) + "#search_alpha_" + to_string(search_alpha) + "#diff_alpha_" + to_string(diff_alpha) + "#NumLevels_" + to_string(NumLevels) + "_res.json";
ofstream ofs(info_path);
ofs << info_str;
ofs.close();

root[item_index++] = item;
}
}
}
}
}
}
}
}
string info_str1 = style_writer.write(root);
string info_path1 = "test.json";
ofstream ofs(info_path1);
ofs << info_str1;
ofs.close();
}
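The region-growing computation that `prepare_assets_data` and `test_run` repeat inline follows the quadratic derived in the comment block of `prepare_assets_data`. It can be isolated into one helper, sketched below. This is not part of the original source: `Rect`, `expand_roi`, and `area` are hypothetical names, the sketch uses `double` where the original uses `float`, and it adds a guard for `A == 0` (a rectangle that already spans the full height or width) that the original code does not have.

```cpp
#include <cassert>
#include <cmath>

struct Rect { double y1, x1, y2, x2; };  // inclusive pixel coordinates

// Area of an inclusive rectangle.
double area(Rect r) { return (r.y2 - r.y1 + 1) * (r.x2 - r.x1 + 1); }

// Grows `r` toward the full W x H image so that its area becomes
// area(r) + rate * (W * H - area(r)), with rate in [0, 1].
Rect expand_roi(Rect r, int W, int H, double rate) {
    assert(rate >= 0.0 && rate <= 1.0);
    const double K1 = H - 1 - r.y2 + r.y1;
    const double K2 = r.y2 - r.y1 + 1;
    const double K3 = W - 1 - r.x2 + r.x1;
    const double K4 = r.x2 - r.x1 + 1;
    const double target = K2 * K4 + (static_cast<double>(W) * H - K2 * K4) * rate;

    // Solve A * alpha^2 + B * alpha + C = 0 for the non-negative root.
    const double A = K1 * K3;
    const double B = K1 * K4 + K2 * K3;
    const double C = K2 * K4 - target;
    double alpha = 0.0;
    if (A != 0.0) {
        alpha = (-B + std::sqrt(B * B - 4 * A * C)) / (2 * A);
    } else if (B != 0.0) {
        alpha = -C / B;  // the equation degenerates to a linear one
    }

    return { r.y1 - r.y1 * alpha,
             r.x1 - r.x1 * alpha,
             (H - 1 - r.y2) * alpha + r.y2,
             (W - 1 - r.x2) * alpha + r.x2 };
}
```

A rate of 0 leaves the rectangle unchanged and a rate of 1 yields the whole image, which matches how `golden_area_rate`, `search_alpha`, and `diff_alpha` are swept between 0 and 1 in `main`.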

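Results and conclusions

Parallel configuration strategy

Experiment configuration
Item                         Value
AOP strategy                 AOP on (WITH-AOP), AOP off (NO-AOP), adaptive AOP (ADAP-AOP)
Thread-to-CPU binding        threads bound to CPUs (BIND-CPU) / threads not bound (FREE-CPU)
CPU cores for the process    64
Thread count                 1 - 64

Experiment results

Conclusions

  1. With many threads, BIND-CPU consistently outperforms FREE-CPU, so BIND-CPU can serve as the greedy default for parallel computation
  2. Among the three AOP strategies under BIND-CPU, disabling AOP performs better when the thread count is high. A plausible reading is that once the CPUs are already fully loaded, AOP adds a small interaction cost between CPU cores, so NO-AOP comes out ahead, although the gain is small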

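Number of CPU cores used by the algorithm process

Experiment configuration

Item                         Value
AOP strategy                 AOP on (WITH-AOP), AOP off (NO-AOP), adaptive AOP (ADAP-AOP)
Thread-to-CPU binding        threads bound to CPUs (BIND-CPU)
CPU cores for the process    1 - 64
Thread count                 1 - 64

Experiment results

Conclusions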

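  1. Overall, the more CPU cores the algorithm process is given, the higher its performance ceiling. However, in all three groups of experiments, using 57 cores outperforms using 64, possibly because some extra compute is needed for scheduling
  2. One odd observation during the experiments: although the C++ program assigned 60 cores to the algorithm, utilization on the remaining 4 cores also rose noticeably
  3. The best-performing configuration across all experiments was 57 process cores with 50 threads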
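For reference, the process affinity mask for a configuration such as 57 cores is simply the lowest 57 bits set. main.cpp builds it by adding one shifted bit per core in a loop; the closed form below is an alternative sketch, with `first_cores_mask` a hypothetical name. The closed form needs a special case for 64 cores, because shifting a 64-bit value by 64 is undefined behaviour in C++.

```cpp
#include <cassert>
#include <cstdint>

// Bit mask selecting logical cores 0 .. core_count - 1 (at most 64 cores).
std::uint64_t first_cores_mask(int core_count) {
    assert(core_count >= 0 && core_count <= 64);
    if (core_count == 64) {
        return ~std::uint64_t{0};  // all bits set; 1 << 64 would be undefined
    }
    return (std::uint64_t{1} << core_count) - 1;
}
```

On Windows a value like this would be passed to `SetProcessAffinityMask`, as main.cpp does with `dwProcessAffinityMask`.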

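Binding each thread to different numbers of CPU cores

Experiment configuration

Item                         Value
AOP strategy                 AOP on (WITH-AOP), AOP off (NO-AOP), adaptive AOP (ADAP-AOP)
Thread-to-CPU binding        threads bound to CPUs (BIND-CPU)
CPU cores for the process    64
Thread count                 1 - 32 / 1 - 16
CPU cores bound per thread   2 / 4

Experiment results

Conclusions

  1. Binding 4 CPUs per thread degrades performance significantly; binding too many CPUs to one thread wastes compute
  2. Binding 2 CPUs per thread brings no clear performance benefit; CPU utilization was also 100% during the run
  3. The line chart for the 2-CPU binding shows that the NO-AOP strategy performs better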
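The published source only shows the one-core-per-thread binding (`base << id`); how the 2-core and 4-core bindings were built is not shown. One plausible scheme, sketched here purely as an assumption, gives thread `i` a block of adjacent cores starting at `i * cores_per_thread`. `thread_group_mask` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Affinity mask binding worker `thread_index` to `cores_per_thread`
// adjacent logical cores, starting at core thread_index * cores_per_thread.
std::uint64_t thread_group_mask(int thread_index, int cores_per_thread) {
    assert(thread_index >= 0 && cores_per_thread >= 1);
    const int first_core = thread_index * cores_per_thread;
    assert(first_core + cores_per_thread <= 64);  // must fit in a 64-bit mask
    const std::uint64_t block =
        (cores_per_thread == 64) ? ~std::uint64_t{0}
                                 : (std::uint64_t{1} << cores_per_thread) - 1;
    return block << first_core;
}
```

With one core per thread this reduces to the single-bit mask used in `Consumer_thread`. With 2 or 4 cores per thread, at most 32 or 16 threads fit into 64 cores, which is consistent with the thread ranges listed in the configuration above.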

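Different inspection areas

Experiment configuration

Item                         Value
AOP strategy                 AOP off (NO-AOP)
Thread-to-CPU binding        threads bound to CPUs (BIND-CPU)
CPU cores for the process    57
Thread count                 50
CPU cores bound per thread   1
NCC model area               3.7e5
Search-region area           1.2e7
Inspection area              3.7e5 - 1.2e7

Experiment results

Conclusions

  1. Processing time is roughly linearly proportional to the inspection area
  2. Growing the inspection area from 600 × 600 to 4096 × 3072 adds about 1.8 ms per image for the parallel algorithm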
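Taking the linear relationship at face value, the two endpoints quoted above imply a marginal cost per megapixel. The helper below only restates that arithmetic; `ms_per_megapixel` is a hypothetical name, and the result is a derived estimate, not an additional measurement.

```cpp
#include <cassert>

// Extra processing time per megapixel implied by two inspection areas
// (in pixels) and the extra time (in ms) observed between them.
double ms_per_megapixel(double area_small_px, double area_large_px, double extra_ms) {
    assert(area_large_px > area_small_px);
    return extra_ms / ((area_large_px - area_small_px) / 1e6);
}
```

For 600 × 600 and 4096 × 3072 pixels with 1.8 ms of extra time, this works out to roughly 0.15 ms per additional megapixel of inspection area.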

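Different model areas / search-region areas

Experiment configuration

Item                         Value
AOP strategy                 AOP off (NO-AOP)
Thread-to-CPU binding        threads bound to CPUs (BIND-CPU)
CPU cores for the process    57
Thread count                 50
CPU cores bound per thread   1
NCC model area               1.5e4 - 3.7e5
Search-region area           3.7e5 - 1.2e7
NCC model pyramid levels     AUTO
Inspection area              3.7e5

Experiment results

Conclusions

  1. Processing time is roughly linearly proportional to the model area and the search-region area
  2. However, because the pyramid-level parameter is set to AUTO, the pyramid level actually used may change as the areas change, so a more detailed experiment is organized per pyramid level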
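To see why the pyramid level interacts with area, it helps to look at how quickly an image shrinks across levels. The sketch below assumes the usual image-pyramid convention in which level 1 is the full-resolution image and each further level halves both sides; it does not reproduce the rule Halcon uses to pick the level count in AUTO mode, which this report does not describe. `pyramid_sizes` is a hypothetical name.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Width and height at each pyramid level, assuming level 1 is the full
// image and every further level halves both sides (integer division).
// Stops early once a side would shrink to zero.
std::vector<std::pair<int, int>> pyramid_sizes(int width, int height, int levels) {
    assert(width > 0 && height > 0 && levels >= 1);
    std::vector<std::pair<int, int>> sizes;
    for (int level = 1; level <= levels && width > 0 && height > 0; ++level) {
        sizes.emplace_back(width, height);
        width /= 2;
        height /= 2;
    }
    return sizes;
}
```

Under this assumption a 4096 × 3072 search image is already down to 512 × 384 at level 4, so a change in the automatically chosen level can shift the matching cost by a large factor, independent of the area being varied.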

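Pyramid levels of the template matching algorithm

Experiment configuration

Item                         Value
AOP strategy                 AOP off (NO-AOP)
Thread-to-CPU binding        threads bound to CPUs (BIND-CPU)
CPU cores for the process    57
Thread count                 50
CPU cores bound per thread   1
NCC model area               1.5e4 - 3.7e5
Search-region area           3.7e5 - 1.2e7
NCC model pyramid levels     1 - 10
Inspection area              3.7e5

Experiment results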