Using KServe Modelcar for Model Storage

Overview

KServe Modelcar, also known as OCI container-based model storage, is a powerful approach for deploying models in cloud-native environments. By packaging models as OCI container images, you can leverage container runtime capabilities to achieve faster startup times and more efficient resource utilization.

Benefits of Using OCI Containers for Model Storage

  • Reduced startup times: avoid downloading the same model multiple times
  • Lower disk space usage: fewer model copies are stored on each node
  • Improved loading performance: images can be pre-fetched onto nodes so models load faster
  • Offline environment support: ideal for environments with limited internet access
  • Simplified model distribution: use enterprise internal registries such as Quay or Harbor

Prerequisites

  • Alauda AI platform installed and running
  • Model files ready for packaging
  • Access to a container registry (e.g., Harbor, Quay)
  • Podman or Docker installed on your local machine

Packaging Model as OCI Image

Option 1: Using Busybox Base Image (Alauda AI Recommendation)

Create a Containerfile with the following content:

# Use the lightweight busybox image as the base
FROM busybox

# Create the model directory and set its permissions
RUN mkdir -p /models && chmod 775 /models

# Copy the contents of the local models folder into /models in the image
COPY models/ /models/

# By KServe convention, the model loader only needs the image layers;
# the container does not need to keep running, but you can add a CMD for debugging

Option 2: Using UBI Micro Base Image (Red Hat Recommendation)

Create a Containerfile with the following content:

FROM registry.access.redhat.com/ubi9/ubi-micro:latest
COPY --chown=0:0 models /models
RUN chmod -R a=rX /models

# nobody user
USER 65534

Building and Pushing the Model Image

  1. Create a temporary directory for storing the model and support files:

    cd $(mktemp -d)
  2. Create a models folder (and optionally a version subdirectory for frameworks like OpenVINO):

    mkdir -p models/1
  3. Copy your model files to the appropriate directory:

    • For most frameworks: cp -r your-model-folder/* models/
    • For OpenVINO: cp -r your-model-folder/* models/1/
  4. Build the OCI container image:

    # Using Podman
    podman build --format=oci -t <registry>/<repository>:<tag> .
    
    # Using Docker
    docker build -t <registry>/<repository>:<tag> .
  5. Push the image to your container registry:

    # Using Podman
    podman push <registry>/<repository>:<tag>
    
    # Using Docker
    docker push <registry>/<repository>:<tag>

    Note: If your repository is private, ensure that you are authenticated to the registry before pushing your container image.
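
The five steps above can be collected into a single sketch script. The registry, repository, and model paths are placeholder assumptions you must replace, and the build and push commands are printed rather than executed so you can review and adapt them first:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder values -- replace with your own registry, repository, and tag.
REGISTRY=registry.example.com
REPOSITORY=models/qwen-oci
TAG=v1.0.0
IMAGE="${REGISTRY}/${REPOSITORY}:${TAG}"

# Steps 1-3: stage the model files in a temporary build context.
BUILD_DIR=$(mktemp -d)
cd "${BUILD_DIR}"
mkdir -p models/1
# cp -r /path/to/your-model/* models/     # most frameworks
# cp -r /path/to/your-model/* models/1/   # OpenVINO

# Write the Option 1 Containerfile into the build context.
cat > Containerfile <<'EOF'
FROM busybox
RUN mkdir -p /models && chmod 775 /models
COPY models/ /models/
EOF

# Steps 4-5: build and push once the model files are in place.
echo "Build context ready in ${BUILD_DIR}"
echo "Next: podman build --format=oci -t ${IMAGE} ."
echo "Then: podman push ${IMAGE}"
```

Using one script keeps the build context ephemeral, so stale model files from a previous build cannot leak into the image.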

Deploying Model from OCI Image

Prerequisites for Deployment

No additional prerequisites are required beyond the general prerequisites listed above.

Creating the InferenceService

Create an InferenceService YAML file with the following content:

kind: InferenceService
apiVersion: serving.kserve.io/v1beta1
metadata:
  annotations:
    aml-model-repo: Qwen2.5-0.5B-Instruct
    aml-pipeline-tag: text-generation
    serving.kserve.io/deploymentMode: Standard
  labels:
    aml-pipeline-tag: text-generation
    aml.cpaas.io/runtime-type: vllm
  name: oci-demo
  namespace: demo-space
spec:
  predictor:
    maxReplicas: 1
    minReplicas: 1
    model:
      modelFormat:
        name: transformers
      protocolVersion: v2
      resources:
        limits:
          cpu: '2'
          ephemeral-storage: 10Gi
          memory: 8Gi
        requests:
          cpu: '2'
          memory: 4Gi
      runtime: aml-vllm-0.11.2-cpu
      storageUri: oci://build-harbor.alauda.cn/test/qwen-oci:v1.0.0
    securityContext:
      seccompProfile:
        type: RuntimeDefault
  1. Replace Qwen2.5-0.5B-Instruct with your actual model name.
  2. aml.cpaas.io/runtime-type: vllm specifies the inference runtime type. For more information about custom inference runtimes, see Extend Inference Runtimes.
  3. Replace aml-vllm-0.11.2-cpu with the runtime name that is already installed in your platform (corresponding to a ClusterServingRuntime CRD instance).
  4. storageUri: oci://build-harbor.alauda.cn/test/qwen-oci:v1.0.0 specifies the OCI image URI with tag where the model is stored.

Applying the InferenceService

Use kubectl to apply the InferenceService configuration:

kubectl apply -f oci-inference-service.yaml

Verifying the Deployment

Check the status of the InferenceService:

kubectl get inferenceservices -n demo-space

You should see the service in the Ready state once the deployment succeeds.
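
If you prefer to block until the service becomes ready, you can poll the Ready condition with kubectl wait. This is a sketch: the name oci-demo and namespace demo-space match the example above, and the call is skipped when kubectl is unavailable.

```shell
#!/usr/bin/env bash
set -euo pipefail

NAMESPACE=demo-space
NAME=oci-demo

if command -v kubectl >/dev/null 2>&1; then
  # Block for up to 10 minutes until the Ready condition becomes True.
  kubectl wait --for=condition=Ready "inferenceservice/${NAME}" \
    -n "${NAMESPACE}" --timeout=600s \
    || echo "InferenceService did not become Ready within the timeout"
else
  echo "kubectl not found; would wait on inferenceservice/${NAME} in ${NAMESPACE}"
fi
```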

Best Practices

  1. Model Versioning: Use tags in your container images to version your models
  2. Image Size Optimization: Use lightweight base images and only include necessary model files
  3. Registry Management: Use private registries with proper access control
  4. Security: Follow container security best practices, including regular vulnerability scans
  5. Caching: Leverage container registry caching to improve pull times

Troubleshooting

Common Issues

  1. Permission Errors: Ensure the model files in the image have proper permissions
  2. Registry Authentication: Verify that the cluster has access to the container registry
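
For the registry-authentication case, one common remedy is to create an image pull secret in the model's namespace and attach it to the service account used by the predictor pods. This is a sketch: the secret name oci-registry-creds, the registry host, the credentials, and the use of the default service account are all assumptions, and your platform may manage registry credentials differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

NAMESPACE=demo-space
SECRET_NAME=oci-registry-creds   # hypothetical secret name

if command -v kubectl >/dev/null 2>&1; then
  # Create a docker-registry secret holding your registry credentials.
  kubectl create secret docker-registry "${SECRET_NAME}" \
    --docker-server=registry.example.com \
    --docker-username="<username>" \
    --docker-password="<password>" \
    -n "${NAMESPACE}" || echo "secret may already exist"
  # Attach the secret to the service account so predictor pods can pull the image.
  kubectl patch serviceaccount default -n "${NAMESPACE}" \
    -p "{\"imagePullSecrets\": [{\"name\": \"${SECRET_NAME}\"}]}" \
    || echo "could not patch service account"
else
  echo "kubectl not found; create secret ${SECRET_NAME} in ${NAMESPACE} manually"
fi
```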

Debugging Steps

  1. Check the InferenceService events:

    kubectl describe inferenceservice oci-demo -n demo-space
  2. Check the predictor pod logs:

    kubectl logs -n demo-space -l serving.kserve.io/inferenceservice=oci-demo
  3. Verify the model image can be pulled:

    # On a node in the cluster
    crictl pull <registry>/<repository>:<tag>

Conclusion

Using KServe Modelcar (OCI container-based model storage) provides an efficient way to deploy models on the Alauda AI platform. By following the steps in this guide, you can package your models as OCI images and deploy them with faster startup times and improved resource utilization.