Author: Chen Qiukai (Qiusuo)
Preface
KubeDL is Alibaba's open-source, Kubernetes-based AI workload management framework; the name is short for "Kubernetes Deep Learning". It aims to feed Alibaba's experience in scheduling and managing large-scale machine learning jobs back to the community. KubeDL has now been accepted into the CNCF Sandbox, and we will continue to explore best practices for cloud-native AI and help algorithm scientists innovate simply and efficiently.
In the latest KubeDL release, version 0.4.0, we bring the ability to manage model versions. AI scientists can track, tag, and store model versions as easily as they manage container images. More importantly, in the classical machine learning pipeline, the "training" and "inference" stages are relatively independent; from an algorithm scientist's perspective, the "training -> model -> inference" pipeline lacks a connecting link, and the "model", as the intermediate artifact of the two stages, is exactly what bridges them.
Github: https://github.com/kubedl-io/kubedl
Website: https://kubedl.io/model/intro/
The current state of model management
Model files are the product of distributed training: they are the essence of an algorithm, preserved after full iteration and search. In industry, algorithm models have become valuable digital assets. Different distributed frameworks output model files in different formats: TensorFlow training jobs typically output CheckPoint (.ckpt), GraphDef (.pb), or SavedModel files, while PyTorch models usually carry the .pth suffix. When loading a model, each framework parses the runtime data flow graph, runtime parameters, and weights from these files. To the file system they are simply files (or groups of files) in a special format, just as JPEG and PNG are special formats of image files.
The typical management approach is therefore to treat them as files and host them in unified object storage (such as Alibaba Cloud OSS or AWS S3). Each tenant/team is assigned a directory, its members store their model files in the corresponding subdirectories, and SRE manages the read and write permissions centrally, as sketched below:
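For illustration only (the bucket, team, and file names below are hypothetical and not part of KubeDL), such a layout usually looks like a per-team directory tree in object storage:
```
oss://company-models/              # or s3://company-models/
├── team-recommendation/           # one directory per tenant/team
│   ├── user1/
│   │   ├── resnet50-v1/saved_model.pb
│   │   └── resnet50-v2/saved_model.pb
│   └── user2/
│       └── bert-base/model.pth
└── team-search/
    └── user3/
        └── ranking/model.ckpt
```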
The advantages and disadvantages of this approach are obvious:
- The advantage is that it preserves users' existing habits: they specify their own directory as the output path in the training code, and then mount the corresponding cloud storage directory into the inference service's container to load the model;
- However, it places higher demands on SRE: unreasonable read/write permission grants or misoperations can leak file permissions or even cause large-scale accidental deletion. File-based management also makes model versioning hard; users typically have to encode versions in file names, or an upper-layer platform has to absorb the complexity of version management. In addition, the correspondence between a model file and its algorithm code / training parameters cannot be mapped directly, and the same file may even be overwritten by multiple training runs, making its history difficult to trace.
Given the above, KubeDL draws on the strengths of Docker image management and introduces a set of image-based model management APIs, which ties distributed training and inference services together more closely and naturally, and greatly reduces the complexity of model management.
Starting from the image
The image is the soul of Docker and the core infrastructure of the container era. An image is itself a layered, immutable file system, and model files can naturally be packaged as an independent image layer. Combining the two creates further possibilities:
- Instead of managing model files directly, users work with the ModelVersion API provided by KubeDL; training and inference services are bridged through ModelVersion;
- Like images, models can be tagged for version traceability and pushed to a unified image registry with access control; the registry's storage backend can also be replaced with the user's own OSS/S3, allowing a smooth transition;
- Once a model image is built, it becomes a read-only template that can no longer be overwritten or modified, practicing the "immutable infrastructure" philosophy of Serverless;
- Image layers reduce the cost of storing model files and speed up distribution through compression and hash-based deduplication;
On top of "model images", we can also combine open-source image management components to maximize the advantages that images bring:
- In large-scale inference scale-out scenarios, image distribution can be accelerated with Dragonfly; when facing sudden traffic, stateless inference instances can be spun up quickly while avoiding the throttling that may occur when many instances concurrently read from a mounted cloud storage volume;
- For day-to-day inference deployments, the model image can be preheated on nodes in advance through OpenKruise's ImagePullJob to improve the efficiency of scale-out and release, as sketched below.
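As a minimal sketch of such preheating (assuming OpenKruise is installed in the cluster and reusing the illustrative modelhub/resnet:v0.1 repository that appears in the ModelVersion example later in this article), an ImagePullJob that pre-pulls the model image onto nodes might look like this:
```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: preheat-model-image
spec:
  # The model image to pre-pull on nodes (illustrative repository and tag).
  image: modelhub/resnet:v0.1
  # How many nodes pull the image concurrently.
  parallelism: 10
  completionPolicy:
    type: Always
    # Clean up the job shortly after it finishes.
    ttlSecondsAfterFinished: 300
  pullPolicy:
    backoffLimit: 3
    timeoutSeconds: 600
```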
Model and ModelVersion
KubeDL's model management introduces two resource objects: Model and ModelVersion. A Model represents a specific model, a ModelVersion represents a specific version in that model's iteration history, and a group of ModelVersions derive from the same Model. The following is an example:
```yaml
apiVersion: model.kubedl.io/v1alpha1
kind: ModelVersion
metadata:
  name: my-mv
  namespace: default
spec:
  # The model name for the model version
  modelName: model1
  # The entity (user or training job) that creates the model
  createdBy: user1
  # The image repo to push the generated model
  imageRepo: modelhub/resnet
  imageTag: v0.1
  # The storage will be mounted at /kubedl-model inside the training container.
  # Therefore, the training code should export the model at /kubedl-model path.
  storage:
    # The local storage to store the model
    localStorage:
      # The local host path to export the model
      path: /foo
      # The node where the chief worker run to export the model
      nodeName: kind-control-plane
    # The remote NAS to store the model
    nfs:
      # The NFS server address
      server: ***.cn-beijing.nas.aliyuncs.com
      # The path under which the model is stored
      path: /foo
      # The mounted path inside the container
      mountPath: /kubedl/models
---
apiVersion: model.kubedl.io/v1alpha1
kind: Model
metadata:
  name: model1
spec:
  description: "this is my model"
status:
  latestVersion:
    imageName: modelhub/resnet:v1c072
    modelVersion: mv-3
```
The Model resource itself only describes a certain type of model and tracks the latest version and its image name to inform the user. Users mainly define the model configuration through ModelVersion:
- modelName: points to the corresponding Model;
- createdBy: the entity that created the ModelVersion, used to trace the upstream producer, usually a distributed training job;
- imageRepo: the address of the image registry; once the model image is built, it is pushed to this address;
- storage: the storage carrier of the model files. NAS, AWS EFS, and LocalStorage are currently supported, with more mainstream storage backends to come. The example above shows two output modes (a local storage volume and an NAS volume) for illustration, but normally only one storage mode may be specified, as in the minimal sketch below.
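For example, a minimal ModelVersion that specifies only a single storage backend (an NFS volume here; all names are illustrative and the server address is elided as in the example above) might look like this:
```yaml
apiVersion: model.kubedl.io/v1alpha1
kind: ModelVersion
metadata:
  name: my-mv-nfs
  namespace: default
spec:
  modelName: model1
  createdBy: user1
  imageRepo: modelhub/resnet
  storage:
    # Exactly one storage backend is given.
    nfs:
      server: ***.cn-beijing.nas.aliyuncs.com
      path: /foo
      mountPath: /kubedl/models
```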
When KubeDL observes the creation of a ModelVersion, it triggers the model build workflow:
- Listen for the ModelVersion event and initiate a model build;
- Create the corresponding PV and PVC according to the storage type and wait for the volume to become ready;
- Create a Model Builder to build the image in userspace. For the Model Builder we adopt kaniko, whose build process and image format are fully consistent with standard Docker, yet everything happens in userspace without depending on any Docker daemon on the host;
- The Builder copies the model file (a single file or a directory) from the corresponding path in the volume and uses it as an independent image layer to build a complete model image;
- Push the resulting model image to the image registry specified in the ModelVersion object;
- Finish the whole build process;
At this point, the model of that ModelVersion is solidified in the image registry and can be distributed to downstream inference services for consumption. The build can be followed with kubectl, as sketched below.
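A rough sketch of how one might observe the build (the builder pod follows the image-build-* naming shown in the output later in this article; my-mv is the illustrative ModelVersion name from above):
```
# Watch the userspace (kaniko) builder pod created for the ModelVersion.
% kubectl get pods | grep image-build

# Once the build finishes, the ModelVersion reports the pushed image and finish time.
% kubectl get modelversion my-mv
```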
From training to model
Although a ModelVersion can be created and built independently, we prefer to trigger the model build automatically after a distributed training job completes successfully, naturally chaining the two into a pipeline.
KubeDL supports this style of submission. Taking a TFJob as an example: when launching the distributed training, specify the output path of the model file and the registry to push to. When the job completes successfully, a ModelVersion object is created automatically with createdBy pointing to the upstream job name; if the job fails or terminates early, no ModelVersion is created.
The following is an example of distributed MNIST training. It outputs the model file to the /models/model-example-v1 path on the local node and triggers a model build after the job succeeds:
apiVersion: "training.kubedl.io/v1alpha1" kind: "TFJob" metadata: name: "tf-mnist-estimator" spec: cleanPodPolicy: None # modelVersion defines the location where the model is stored. modelVersion: modelName: mnist-model-demo # The dockerhub repo to push the generated image imageRepo: simoncqk/models storage: localStorage: path: /models/model-example-v1 mountPath: /kubedl-model nodeName: kind-control-plane tfReplicaSpecs: Worker: replicas: 3 restartPolicy: Never template: spec: containers: - name: tensorflow image: kubedl/tf-mnist-estimator-api:v0.1 imagePullPolicy: Always command: - "python" - "/keras_model_to_estimator.py" - "/tmp/tfkeras_example/" # model checkpoint dir - "/kubedl-model" # export dir for the saved_model format
```
% kubectl get tfjob
NAME                 STATE       AGE     MAX-LIFETIME   MODEL-VERSION
tf-mnist-estimator   Succeeded   10min                  mnist-model-demo-e7d65

% kubectl get modelversion
NAME                     MODEL                    IMAGE                    CREATED-BY           FINISH-TIME
mnist-model-demo-e7d65   tf-mnist-model-example   simoncqk/models:v19a00   tf-mnist-estimator   2021-09-19T15:20:42Z

% kubectl get po
NAME                                    READY   STATUS      RESTARTS   AGE
image-build-tf-mnist-estimator-v19a00   0/1     Completed   0          9min
```
Through this mechanism, other "artifacts that are produced only when the job succeeds" can also be solidified into the image and consumed in subsequent stages.
From model to inference
With the foundations above, deploying an inference service only requires referencing the built ModelVersion: the corresponding model is loaded and inference can be served directly. At this point, the stages of the model lifecycle (code -> training -> model -> online serving) are connected through the model-related APIs.
When deploying an inference service through the Inference resource object provided by KubeDL, you only need to fill in the corresponding ModelVersion name in a predictor template. When the predictor is created, the Inference Controller injects a Model Loader, which pulls the image carrying the model file to the node and mounts the model file into the main container through a volume shared between containers, thereby loading the model. As mentioned above, combined with OpenKruise's ImagePullJob, the model image can easily be preheated to speed up model loading. For consistency of user perception, the model mount path of the inference service defaults to the model output path of the distributed training job.
```yaml
apiVersion: serving.kubedl.io/v1alpha1
kind: Inference
metadata:
  name: hello-inference
spec:
  framework: TFServing
  predictors:
    - name: model-predictor
      # model built in previous stage.
      modelVersion: mnist-model-demo-abcde
      replicas: 3
      batching:
        batchSize: 32
      template:
        spec:
          containers:
            - name: tensorflow
              args:
                - --port=9000
                - --rest_api_port=8500
                - --model_name=mnist
                - --model_base_path=/kubedl-model/
              command:
                - /usr/bin/tensorflow_model_server
              image: tensorflow/serving:1.11.1
              imagePullPolicy: IfNotPresent
              ports:
                - containerPort: 9000
                - containerPort: 8500
              resources:
                limits:
                  cpu: 2048m
                  memory: 2Gi
                requests:
                  cpu: 1024m
                  memory: 1Gi
```
A complete inference service may serve several predictors with different model versions at the same time. For example, in common search and recommendation scenarios, you may want to compare the effects of multiple model iterations through A/B testing, which is easy to do with Inference + ModelVersion: reference a different model version in each predictor and assign traffic with appropriate weights, so that a single inference service serves multiple model versions and their effects can be compared in a canary fashion:
```yaml
apiVersion: serving.kubedl.io/v1alpha1
kind: Inference
metadata:
  name: hello-inference-multi-versions
spec:
  framework: TFServing
  predictors:
    - name: model-a-predictor-1
      modelVersion: model-a-version1
      replicas: 3
      trafficWeight: 30  # 30% traffic will be routed to this predictor.
      batching:
        batchSize: 32
      template:
        spec:
          containers:
            - name: tensorflow
              # ...
    - name: model-a-predictor-2
      modelVersion: model-version2
      replicas: 3
      trafficWeight: 50  # 50% traffic will be routed to this predictor.
      batching:
        batchSize: 32
      template:
        spec:
          containers:
            - name: tensorflow
              # ...
    - name: model-a-predictor-3
      modelVersion: model-version3
      replicas: 3
      trafficWeight: 20  # 20% traffic will be routed to this predictor.
      batching:
        batchSize: 32
      template:
        spec:
          containers:
            - name: tensorflow
              # ...
```
Summary
KubeDL introduces the Model and ModelVersion resource objects and, combined with standard container images, provides model building, tagging, version traceability, immutable storage, and distribution, replacing the previously ad-hoc management of model files. Images can also be combined with other excellent open-source projects to accelerate image distribution and preheat model images, improving the efficiency of model deployment. At the same time, the model management API connects the previously separate stages of distributed training and inference serving, significantly improving the automation of the machine learning pipeline as well as the experience and efficiency of algorithm scientists in bringing models online and comparing experiments. We welcome more users to try KubeDL and give us valuable feedback, and we look forward to more developers following and joining the KubeDL community!
KubeDL Github address:
https://github.com/kubedl-io/kubedl
Learn more about the KubeDL project now!