Note: the "Random results" serve as a baseline. For each query, 5 training datapoints with the same label as the query are selected at random.
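
The baseline described above can be sketched as follows. This is a minimal illustration, not the code used to produce the figures: the function name `random_same_label_baseline`, its parameters, and the use of plain label lists are assumptions made for the example. It also assumes that "query label" means whichever label is associated with the query (the text does not say whether that is the ground-truth or the predicted label).

```python
import random
from collections import defaultdict


def random_same_label_baseline(train_labels, query_labels, k=5, seed=0):
    """Hypothetical helper: for each query, pick up to k random training
    indices whose label equals the query's label."""
    rng = random.Random(seed)  # fixed seed so the baseline is reproducible

    # Group training indices by label once, so each query is a cheap lookup.
    indices_by_label = defaultdict(list)
    for idx, label in enumerate(train_labels):
        indices_by_label[label].append(idx)

    results = []
    for label in query_labels:
        candidates = indices_by_label.get(label, [])
        # Sample without replacement; fall back to fewer than k if the
        # class has fewer than k training points.
        results.append(rng.sample(candidates, min(k, len(candidates))))
    return results


# Toy usage: 20 training points alternating between two labels.
train_labels = ["cat", "dog"] * 10
picks = random_same_label_baseline(train_labels, ["dog", "cat"])
```

Because the selection ignores image content entirely, any visual similarity between a query and its random picks comes only from sharing a class, which is what makes it a useful reference point for the non-random results.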

The results show that most of the selected images resemble the query in semantics, color, texture, etc.

For example, for queries with correct main model prediction:

query-#4: similar semantics (men in red on horses)
query-#18: similar color & texture
query-#22: similar semantics
query-#25: similar color & posture

For queries with incorrect predictions:

query-#0, query-#1 & query-#15: blur & overexposure
query-#8: similar color
query-#18: similar color & posture
query-#21: similar color, texture and shape
query-#31: grayscale
query-#39: similar semantics (animal on the grass), color and posture

This information is useful for diagnosing the model. For example, the misprediction of #31 is likely caused by the query being grayscale, which is rare in the training dataset. Similarly, for #0, #1 and #15, the misprediction is likely due to blur and overexposure. For #39, the misprediction is likely caused by a selected dog image with a similar environment, posture and color.

Our results on queries from correct predictions

Our results on queries from wrong predictions

Random results on queries from correct predictions

Random results on queries from wrong predictions
