Combine Models
You can combine detection, segmentation, and classification models to leverage the strengths of each model.
For example, consider a scenario where you want to build a logo detection model that identifies popular logos. You could use a detection model to identify logos (i.e. Grounding DINO), then a classification model to classify between the logos (i.e. Microsoft, Apple, etc.).
To combine models, you need to choose:
- Either a detection or a segmentation model, and;
- A classification model.
Let's walk through an example of using a combination of Grounding DINO and SAM (GroundedSAM), and CLIP for logo classification.
from autodistill_clip import CLIP
from autodistill.detection import CaptionOntology
from autodistill_grounded_sam import GroundedSAM
import supervision as sv
from autodistill.core.custom_detection_model import CustomDetectionModel
import cv2
classes = ["McDonalds", "Burger King"]
SAMCLIP = CustomDetectionModel(
detection_model=GroundedSAM(
CaptionOntology({"logo": "logo"})
),
classification_model=CLIP(
CaptionOntology({k: k for k in classes})
)
)
IMAGE = "logo.jpg"
results = SAMCLIP.predict(IMAGE)
image = cv2.imread(IMAGE)
annotator = sv.MaskAnnotator()
label_annotator = sv.LabelAnnotator()
labels = [
f"{classes[class_id]} {confidence:0.2f}"
for _, _, confidence, class_id, _ in results
]
annotated_frame = annotator.annotate(
scene=image.copy(), detections=results
)
annotated_frame = label_annotator.annotate(
scene=annotated_frame, labels=labels, detections=results
)
sv.plot_image(annotated_frame, size=(8, 8))
Here are the results:
See Also¶
- Automatically Label Product SKUs with Autodistill : Uses a combination of Grounding DINO and CLIP to label product SKUs.
Code Reference¶
Bases: DetectionBaseModel
Run inference with a detection model then run inference with a classification model on the detected regions.
Source code in autodistill/core/composed_detection_model.py
11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 |
|
predict(image)
¶
Run inference with a detection model then run inference with a classification model on the detected regions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
image |
str
|
The image to run inference on |
required |
annotator |
The annotator to use to annotate the image |
required |
Returns:
Type | Description |
---|---|
sv.Detections
|
detections (sv.Detections) |
Source code in autodistill/core/composed_detection_model.py
29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 |
|