Three Badcase Accuracy-Verification Schemes Explained, with an hbm_infer Deployment Walkthrough
- Data is often scattered across customer servers;
- Some data is generated dynamically and cannot be exported;
- Board-side resources are limited, so neither models nor data can stay resident for long.
**Scheme 1: Local simulation inference**

Pros:

- No development board needed; lightweight to deploy;
- Well suited to fast iteration over multiple model structures;

Cons:

- Without the dedicated board hardware, local simulation inference is comparatively slow.
**Scheme 2: Board-side inference over RPC**

Pros:

- Data stays on the server side and can be scheduled dynamically;
- Inference runs on board-side hardware, so it is fast, and accuracy is evaluated on the real BPU, so the results are reliable;

Cons:

- Network bandwidth limits inference throughput;
- Depends on board-side resources;
**Scheme 3: Fully on-board deployment**

Pros:

- Fastest inference, with no network bottleneck at all;
- Accuracy results match the deployed model exactly;

Cons:

- All test data must be prepared in advance;
- Weak support for dynamic inputs and online debugging;
- Heavily dependent on board-side resources;
# Installing the Core Components

1. hbm_infer depends on the Docker environment released with the algorithm toolchain, so before using hbm_infer you must first set up that Docker environment and then install the hbm_infer component inside the container.
2. Obtain the hbm_infer Python package (provided under NDA), enter the Docker environment, and install it with `pip install`; it is then ready to use.
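The two steps above might look like the following. This is only a sketch: the image tag and wheel filename are placeholders, since the actual artifacts are released under NDA — substitute the ones provided to you.

```shell
# Step 1 (sketch): start the toolchain Docker environment.
# "toolchain_image:latest" is a placeholder for the released image.
docker run -it --rm \
    -v "$(pwd):/workspace" \
    toolchain_image:latest \
    /bin/bash

# Step 2 (inside the container): install the released hbm_infer wheel.
# The wheel filename below is a placeholder.
pip install hbm_infer-<version>-py3-none-any.whl
pip show hbm_infer   # sanity-check the installation
```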
```python
import time

import torch

from hbm_infer.hbm_rpc_session import HbmRpcSession


def test_hbm_infer():
    # Connect to the RPC server on the board and upload the local .hbm model
    hbm_model = HbmRpcSession(
        host="192.168.1.100",   # board-side IP
        local_hbm_path="./model.hbm",
    )
    hbm_model.show_input_output_info()

    # Random NV12-style inputs: separate Y and UV planes
    data = {
        "input_0_y": torch.randint(0, 256, (1, 512, 960, 1), dtype=torch.uint8),
        "input_0_uv": torch.randint(0, 256, (1, 256, 480, 2), dtype=torch.uint8),
    }

    begin = time.time()
    for _ in range(10):
        outputs = hbm_model(data)
        print({k: v.shape for k, v in outputs.items()})
    print(f"Avg time: {round((time.time() - begin) * 1000 / 10, 2)} ms")

    hbm_model.close_server()


if __name__ == "__main__":
    test_hbm_infer()
```

The flexible session API breaks the server and model lifecycles into explicit init/deinit steps:

```python
import time

import torch

from hbm_infer.hbm_rpc_session_flexible import (
    HbmRpcSession, init_server, deinit_server, init_hbm, deinit_hbm
)


def test_flexible():
    # Explicitly manage the server and the model handle
    server = init_server(host="192.168.1.100")
    handle = init_hbm(hbm_rpc_server=server, local_hbm_path="./model.hbm")
    hbm_model = HbmRpcSession(hbm_rpc_server=server, hbm_handle=handle)

    data = {
        "input_0_y": torch.randint(0, 256, (1, 512, 960, 1), dtype=torch.uint8),
        "input_0_uv": torch.randint(0, 256, (1, 256, 480, 2), dtype=torch.uint8),
    }

    begin = time.time()
    for _ in range(10):
        outputs = hbm_model(data)
        print({k: v.shape for k, v in outputs.items()})
    print(f"Avg time: {round((time.time() - begin) * 1000 / 10, 2)} ms")

    hbm_model.close_server()
    deinit_hbm(handle)
    deinit_server(server)


if __name__ == "__main__":
    test_flexible()
```

- Keep the board and the server on the same subnet, or directly connected, to reduce transfer latency;
- For batch inference jobs, load the data in batches ahead of time and send it serially;
- `with_profile=True` is supported for enabling performance-log analysis;
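Whichever scheme you choose, badcase verification ultimately comes down to diffing outputs from two execution paths (e.g. local simulation vs. the board-side BPU). As a minimal sketch — `compare_outputs` is a hypothetical helper, not part of hbm_infer — such a comparison might look like this:

```python
import torch


def compare_outputs(ref, test, atol=1e-3):
    """Per-tensor max-abs-error and cosine similarity between two dicts
    of named output tensors (e.g. reference results vs. board results)."""
    report = {}
    for name, r in ref.items():
        rf = r.to(torch.float32).flatten()
        tf = test[name].to(torch.float32).flatten()
        max_abs = (rf - tf).abs().max().item()
        cos = torch.nn.functional.cosine_similarity(rf, tf, dim=0).item()
        report[name] = {
            "max_abs_err": max_abs,
            "cosine": cos,
            "pass": max_abs <= atol,
        }
    return report
```

Feeding it the output dicts returned by `hbm_model(data)` and by your reference runtime gives a per-output pass/fail report for each badcase.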