Debugging and Load Testing Online Services

This article covers:

Debugging online service code in a development environment;
Load testing an online service with the wrk tool.

Debugging an online service in a development environment

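Before building a model package, you can use a development environment to debug the model package's base image, inference code, and model. The steps are as follows: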

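Create a development environment from the model package base image, an algorithm volume, and a model volume. The base image must already have the required components installed; see "Preparing a TFServing model package base image" and "Preparing a Seldon model package base image";
Log in to the development environment using VSCode, Jupyter, or SSH;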

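Create symbolic links so that the folders containing the algorithm code and the model appear at the locations the online service expects at runtime. For example: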

ln -s /workspace/algorithm /workspace/inference_code
ln -s /workspace/model/private/for-example /workspace/inference_model
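
The linking step above can also be wrapped in a small function so that it is safe to run more than once. This is a minimal sketch, not part of the platform's tooling: `link_runtime_dirs` is a hypothetical helper name, and it assumes the algorithm code lives in an `algorithm` folder under the workspace root, as in the example above.

```shell
# Hypothetical helper: create the runtime symlinks idempotently.
#   $1: workspace root, e.g. /workspace
#   $2: model folder, e.g. /workspace/model/private/for-example
link_runtime_dirs() {
    local root="$1"
    local model_dir="$2"

    # Fail early if a source folder is missing, so no dangling link is created.
    [ -d "$root/algorithm" ] || { echo "missing $root/algorithm" >&2; return 1; }
    [ -d "$model_dir" ] || { echo "missing $model_dir" >&2; return 1; }

    # -f replaces an existing link; -n treats an existing link to a
    # directory as the link itself instead of creating a link inside it.
    ln -sfn "$root/algorithm" "$root/inference_code"
    ln -sfn "$model_dir" "$root/inference_model"

    # Print each link target for a quick visual check.
    readlink "$root/inference_code"
    readlink "$root/inference_model"
}
```

Under these assumptions it would be called as `link_runtime_dirs /workspace /workspace/model/private/for-example`.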

Start the online service with its startup code. For example, to start a Seldon service:

cd /workspace/inference_code && seldon-core-microservice inference_service

Use curl to access the service and check that it is running correctly:

curl 'http://example.com:30080/api/v1.0/predictions' \
   -H 'Accept: application/json, text/plain, */*' \
   -H 'Content-Type: application/json' \
   --data @test.json \
   --insecure
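
When this check is repeated many times during debugging, it can help to reduce the response to a pass or fail result. The sketch below is one possible way to do that, assuming any 2xx status counts as healthy; `is_success_status` and `check_service` are hypothetical helper names introduced for illustration.

```shell
# Hypothetical helper: succeed only for 2xx HTTP status codes.
is_success_status() {
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: POST a JSON payload file and report the status code.
#   $1: prediction URL
#   $2: path to the JSON payload file
check_service() {
    local url="$1"
    local payload="$2"
    local code

    [ -f "$payload" ] || { echo "payload file not found: $payload" >&2; return 2; }

    # -s hides the progress meter, -o discards the response body,
    # -w prints only the HTTP status code (000 if no response was received).
    code="$(curl -s -o /dev/null -w '%{http_code}' \
        -H 'Content-Type: application/json' \
        --data @"$payload" "$url")"

    echo "HTTP $code"
    is_success_status "$code"
}
```

With the URL and payload from the example above, the call would be `check_service http://example.com:30080/api/v1.0/predictions test.json`; the exit status is zero only when the service answers with a 2xx code.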

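If the service does not behave correctly, debug the inference code in inference_code and the model in inference_model. Repeat steps 4 and 5 (start the service, then check it with curl) until the service runs correctly.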

Load testing an online service

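Load testing an online service means using a load testing tool to simulate many users accessing the service at the same time, in order to measure the service's performance. Many such tools exist, for example Apache Bench, wrk, and JMeter. This article uses wrk.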

Create a post.lua file, for example:

-- post.lua
wrk.method = "POST"
wrk.headers["Content-Type"] = "application/json"

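-- Read the request body from test.json (path is relative to where wrk is started)
local file_path = "test.json"
local file = assert(io.open(file_path, "r"), "cannot open " .. file_path)
local file_content = file:read("*all")
file:close()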

wrk.body = file_content

Run wrk to measure performance:
```
wrk -t12 -c400 -d30s -s post.lua http://example.com:30080/api/v1.0/predictions
```
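Where:
- -t12 uses 12 threads.
- -c400 keeps 400 connections open in total, shared across the threads.
- -d30s runs the test for 30 seconds.
- -s post.lua uses the post.lua file as the test script.
- http://example.com:30080/api/v1.0/predictions is the URL under test.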
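
A single run at 400 connections gives one data point. To see how the service behaves as concurrency grows, the same command can be repeated at several connection counts. The sketch below only prints the commands so they can be reviewed first; `print_wrk_sweep` is a hypothetical helper, the connection counts are arbitrary examples, and `--latency` is added to request wrk's detailed latency statistics.

```shell
# Hypothetical helper: print one wrk command per connection count.
# It only prints the commands; it does not run them.
#   $1: URL under test
#   remaining arguments: connection counts to try
print_wrk_sweep() {
    local url="$1"
    shift
    local max_threads=12
    local conns threads

    for conns in "$@"; do
        # wrk refuses to start when connections < threads, so cap the threads.
        threads="$max_threads"
        if [ "$conns" -lt "$threads" ]; then
            threads="$conns"
        fi
        echo "wrk -t$threads -c$conns -d30s --latency -s post.lua $url"
    done
}
```

For example, `print_wrk_sweep http://example.com:30080/api/v1.0/predictions 50 100 200 400` prints four commands; piping that output to `sh` would run them one after another.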