AltCLIP

Classification Base Model

What is AltCLIP?

AltCLIP is a multimodal vision-language model. With AltCLIP, you can measure the similarity between text and an image, or between two images. AltCLIP was trained on multilingual text-image pairs, so it can be used for zero-shot classification with text prompts in different languages. Read the AltCLIP paper for more information.

The Autodistill AltCLIP module enables you to use AltCLIP for zero-shot classification.
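
To illustrate the idea behind zero-shot classification by similarity, here is a minimal sketch in plain Python. It does not call AltCLIP or Autodistill: the three-dimensional vectors are made-up stand-ins for the embeddings a model like AltCLIP would produce, and `cosine_similarity` and `zero_shot_classify` are hypothetical helper names chosen for this example. Cosine similarity is assumed here as the comparison measure.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, prompt_embeddings):
    """Score every prompt against the image and return the best prompt plus all scores."""
    scores = {
        prompt: cosine_similarity(image_embedding, embedding)
        for prompt, embedding in prompt_embeddings.items()
    }
    best_prompt = max(scores, key=scores.get)
    return best_prompt, scores


# Made-up embeddings for illustration only; real embeddings have hundreds of dimensions.
image_embedding = [0.9, 0.1, 0.2]
prompt_embeddings = {
    "person": [0.1, 0.9, 0.1],
    "a forklift": [0.8, 0.2, 0.3],
}

best, scores = zero_shot_classify(image_embedding, prompt_embeddings)
print(best)
```

With these toy vectors, the image embedding points in nearly the same direction as the "a forklift" prompt, so that prompt receives the highest score. No training on forklift images is involved, which is what makes the approach zero-shot.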

Installation

To use AltCLIP with Autodistill, install the following package:

pip3 install autodistill-altclip

Quickstart

from autodistill_altclip import AltCLIP
from autodistill.detection import CaptionOntology

# define an ontology to map class names to our AltCLIP prompt
# the ontology dictionary has the format {caption: class}
# where caption is the prompt sent to the base model, and class is the label that will
# be saved for that caption in the generated results
# then, load the model
base_model = AltCLIP(
    ontology=CaptionOntology(
        {
            "person": "person",
            "a forklift": "forklift"
        }
    )
)

results = base_model.predict("construction.jpg")

print(results)
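
The caption-to-class mapping described in the comments above can be sketched in plain Python. This is an illustration of the concept only: it does not use the Autodistill API, the scores are invented, and `top_class` is a hypothetical helper. The actual structure of `results` is not shown here, so inspect the printed output to see what the module returns.

```python
# The same {caption: class} mapping used in the Quickstart ontology.
ontology = {
    "person": "person",
    "a forklift": "forklift",
}

# Hypothetical per-caption scores, standing in for model output.
caption_scores = {
    "person": 0.12,
    "a forklift": 0.88,
}


def top_class(ontology, caption_scores):
    """Pick the highest-scoring caption and return its class label and score."""
    best_caption = max(caption_scores, key=caption_scores.get)
    return ontology[best_caption], caption_scores[best_caption]


label, confidence = top_class(ontology, caption_scores)
print(label, confidence)
```

The caption ("a forklift") is what gets compared with the image, while the class ("forklift") is the label that ends up in the saved results. Keeping the two separate lets you tune the prompt wording without changing your label names.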

License

The AltCLIP model is licensed under the Apache 2.0 license. See the model README for more information.

🏆 Contributing

We love your input! Please see the core Autodistill contributing guide to get started. Thank you 🙏 to all our contributors!