Using KServe Modelcar for Model Storage
TOC
- Overview
- Benefits of Using OCI Containers for Model Storage
- Prerequisites
- Packaging Model as OCI Image
  - Option 1: Using Busybox Base Image (Alauda AI Recommendation)
  - Option 2: Using UBI Micro Base Image (Red Hat Recommendation)
  - Building and Pushing the Model Image
- Deploying Model from OCI Image
  - Prerequisites for Deployment
  - Creating the InferenceService
  - Applying the InferenceService
  - Verifying the Deployment
- Best Practices
- Troubleshooting
  - Common Issues
  - Debugging Steps
- Conclusion

Overview
KServe Modelcar, also known as OCI container-based model storage, is a powerful approach for deploying models in cloud-native environments. By packaging models as OCI container images, you can leverage container runtime capabilities to achieve faster startup times and more efficient resource utilization.
Benefits of Using OCI Containers for Model Storage
- Reduced startup times: Avoid downloading the same model multiple times
- Lower disk space usage: Reduce the number of models downloaded locally
- Improved model performance: Images can be pre-fetched onto nodes for faster model loading
- Offline environment support: Ideal for environments with limited internet access
- Simplified model distribution: Use enterprise internal registries like Quay or Harbor
Prerequisites
- Alauda AI platform installed and running
- Model files ready for packaging
- Access to a container registry (e.g., Harbor, Quay)
- Podman or Docker installed on your local machine
Packaging Model as OCI Image
Option 1: Using Busybox Base Image (Alauda AI Recommendation)
Create a Containerfile with the following content:
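A minimal sketch of such a Containerfile, assuming your model files have been prepared in a local `models/` directory next to the Containerfile (the base image tag and ownership settings are illustrative and should be adapted to your platform's conventions):

```dockerfile
# Sketch: package model files into a minimal busybox-based image
FROM busybox:latest

# Copy the prepared model files into /models inside the image.
# The 1001:0 ownership is an assumption matching common non-root
# container conventions; adjust to your platform's requirements.
COPY --chown=1001:0 models /models

# Run as a non-root user
USER 1001
```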
Option 2: Using UBI Micro Base Image (Red Hat Recommendation)
Create a Containerfile with the following content:
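A comparable sketch using UBI Micro, again assuming a local `models/` directory (the registry path, tag, and ownership are illustrative):

```dockerfile
# Sketch: package model files on a Red Hat UBI Micro base image
FROM registry.access.redhat.com/ubi9/ubi-micro:latest

# Copy the prepared model files into /models inside the image
COPY --chown=1001:0 models /models

# Run as a non-root user
USER 1001
```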
Building and Pushing the Model Image
1. Create a temporary directory for storing the model and support files.

2. Create a models folder (and optionally a version subdirectory for frameworks like OpenVINO).

3. Copy your model files to the appropriate directory:
   - For most frameworks: `cp -r your-model-folder/* models/`
   - For OpenVINO: `cp -r your-model-folder/* models/1/`

4. Build the OCI container image.

5. Push the image to your container registry.
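The steps above can be sketched as a command sequence; the directory names, model path, image name, and registry are illustrative and should match your environment:

```shell
# 1. Create a temporary working directory
mkdir modelcar-build && cd modelcar-build

# 2. Create the models folder (add a "1" subdirectory for OpenVINO)
mkdir -p models

# 3. Copy your model files into it (source path is illustrative)
cp -r /path/to/your-model-folder/* models/

# 4. Build the OCI container image from the Containerfile created earlier
podman build -t build-harbor.alauda.cn/test/qwen-oci:v1.0.0 -f Containerfile .

# 5. Push the image to your container registry
podman push build-harbor.alauda.cn/test/qwen-oci:v1.0.0
```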
Note: If your repository is private, ensure that you are authenticated to the registry before pushing your container image.
Deploying Model from OCI Image
Prerequisites for Deployment
No additional prerequisites are required beyond the general prerequisites listed above.
Creating the InferenceService
Create an InferenceService YAML file with the following content:
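A sketch of such a file, reconstructed from the field notes below; apart from the annotation, runtime name, and storageUri called out there, the resource name, model format, and remaining fields are assumptions to adapt to your environment:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  # Kubernetes resource names must be lowercase; adjust to your model name
  name: qwen2-5-0-5b-instruct
  annotations:
    aml.cpaas.io/runtime-type: vllm
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM   # assumption; must match a format supported by the runtime
      runtime: aml-vllm-0.11.2-cpu
      storageUri: oci://build-harbor.alauda.cn/test/qwen-oci:v1.0.0
```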
- Replace `Qwen2.5-0.5B-Instruct` with your actual model name.
- The annotation `aml.cpaas.io/runtime-type: vllm` specifies the runtime type. For more information about custom inference runtimes, see Extend Inference Runtimes.
- Replace `aml-vllm-0.11.2-cpu` with the runtime name that is already installed on your platform (corresponding to a ClusterServingRuntime CRD instance).
- `storageUri: oci://build-harbor.alauda.cn/test/qwen-oci:v1.0.0` specifies the OCI image URI, with tag, where the model is stored.
Applying the InferenceService
Use kubectl to apply the InferenceService configuration:
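For example, assuming the file was saved as `inferenceservice.yaml` (filename and namespace are illustrative):

```shell
kubectl apply -f inferenceservice.yaml -n <your-namespace>
```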
Verifying the Deployment
Check the status of the InferenceService:
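For example (the resource name and namespace are illustrative and should match your InferenceService):

```shell
kubectl get inferenceservice qwen2-5-0-5b-instruct -n <your-namespace>
```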
You should see the service in the Ready state once the deployment succeeds.
Best Practices
- Model Versioning: Use tags in your container images to version your models
- Image Size Optimization: Use lightweight base images and only include necessary model files
- Registry Management: Use private registries with proper access control
- Security: Follow container security best practices, including regular vulnerability scans
- Caching: Leverage container registry caching to improve pull times
Troubleshooting
Common Issues
- Permission Errors: Ensure the model files in the image have proper permissions
- Registry Authentication: Verify that the cluster has access to the container registry
Debugging Steps
1. Check the InferenceService events.

2. Check the predictor pod logs.

3. Verify the model image can be pulled.
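The debugging steps above can be sketched as follows; the resource name, namespace, and image reference are illustrative, and the pod label used is the one KServe applies to predictor pods:

```shell
# 1. Check the InferenceService events
kubectl describe inferenceservice qwen2-5-0-5b-instruct -n <your-namespace>

# 2. Check the predictor pod logs
kubectl logs -n <your-namespace> \
  -l serving.kserve.io/inferenceservice=qwen2-5-0-5b-instruct \
  -c kserve-container

# 3. Verify the model image can be pulled from your workstation
podman pull build-harbor.alauda.cn/test/qwen-oci:v1.0.0
```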
Conclusion
Using KServe Modelcar (OCI container-based model storage) provides an efficient way to deploy models on the Alauda AI platform. By following the steps outlined in this guide, you can package your models as OCI images and deploy them with faster startup times and improved resource utilization.