Results show that most of the selected images share similar semantics, color, and texture with the query.
For example, for queries where the main model's prediction is correct:
For queries where the main model's prediction is incorrect:
This information is useful for diagnosing the model. For example, we can infer that the misprediction on query #31 is caused by its grayscale coloring, which is rare in the training dataset. Similarly, the mispredictions on #0, #1 and #15 are likely due to blur and overexposure. For #39, the misprediction is caused by a dog image with a similar environment, posture and color.
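The section does not say how the grayscale, blur and overexposure attributes were identified. One way to check such hypotheses is to score images with simple heuristics and compare a query against the training set. The sketch below assumes uint8 RGB arrays of shape HxWx3. It treats an image as grayscale when its channels (nearly) coincide, measures sharpness as the variance of a 4-neighbour Laplacian (a common blur heuristic, where lower means blurrier), and measures overexposure as the share of near-saturated pixels. All function names and thresholds (`tol=2.0`, `threshold=250`) are hypothetical choices for illustration and are not part of the original analysis.

```python
import numpy as np

def is_grayscale(img: np.ndarray, tol: float = 2.0) -> bool:
    """True if the image has one channel or (nearly) identical R, G, B channels."""
    img = img.astype(np.float64)
    if img.ndim == 2:
        return True
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    # Largest per-pixel channel disagreement; `tol` is an assumed threshold.
    return bool(max(np.abs(r - g).max(), np.abs(g - b).max()) <= tol)

def laplacian_variance(img: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian response; low values suggest blur."""
    gray = img.astype(np.float64)
    if gray.ndim == 3:
        gray = gray.mean(axis=2)  # collapse RGB to a single intensity channel
    lap = (gray[:-2, 1:-1] + gray[2:, 1:-1] + gray[1:-1, :-2] + gray[1:-1, 2:]
           - 4.0 * gray[1:-1, 1:-1])
    return float(lap.var())

def overexposed_fraction(img: np.ndarray, threshold: int = 250) -> float:
    """Fraction of pixels whose brightest channel is at or above `threshold`."""
    brightest = img if img.ndim == 2 else img.max(axis=2)
    return float((brightest >= threshold).mean())

def grayscale_rate(images) -> float:
    """Share of grayscale images in a collection, e.g. the training set."""
    flags = [is_grayscale(im) for im in images]
    return sum(flags) / len(flags) if flags else 0.0
```

For instance, a very low `grayscale_rate` over the training images, combined with `is_grayscale` returning `True` for a mispredicted query, would be consistent with the explanation given for #31. The blur and overexposure scores only become meaningful relative to their distribution over the training set, since absolute thresholds depend on the data.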