It would be wonderful to implement a perceptual/visual image hashing/fingerprinting mechanism in MediaWiki for detecting non-exact duplicate uploads.
This came up a few times recently in relation to copyvio (copyright violation) uploads, in particular on T120867 and in the on-wiki discussion of T120453. In this context, it would be critical to be able to find similar images cross-wiki.
I'm pretty sure this is still an area of active academic research… but then, that means there are probably papers written by smart people available for us to use! Searching the topic turns up quite a lot of content with big words like transforms and wavelets.
It's clearly possible, even on a large set of images, as evidenced by Google Images and TinEye doing it. (Their algorithms actually look pretty different: Google Images often produces results that are similar in the sense of containing the same kind of object, while TinEye seems to return images that were actually derived from the same source image, whether cropped, resized, or otherwise modified.) There's a Windows freeware tool called Visipics which also does this (I found it pretty good; closed source, unfortunately) and a Linux one called GQview (primarily an image viewer, but it can detect duplicates).
The last one, GQview, is actually open source (http://gqview.sourceforge.net/view-down.html) and uses a pretty simple algorithm (src/similar.c): it essentially just resizes all images to 32x32px and uses that as a fingerprint to compare them. Images are considered similar if the fingerprints differ by no more than 5%. Not sure if we would be able to do that kind of comparison in a SQL query (but the power of SQL keeps surprising me).
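For illustration, here is a rough Python sketch (using Pillow) of that kind of fingerprint comparison. It is not a port of src/similar.c — only the 32x32 size and the 5% threshold come from the description above; the function names and the per-byte averaging are my own assumptions.

```
from PIL import Image

FINGERPRINT_SIZE = (32, 32)
SIMILARITY_THRESHOLD = 0.05  # "differ by no more than 5%"

def fingerprint(path):
    # Downscale to a 32x32 RGB thumbnail and use its raw bytes as the fingerprint.
    with Image.open(path) as im:
        return im.convert("RGB").resize(FINGERPRINT_SIZE).tobytes()

def difference(fp_a, fp_b):
    # Mean absolute per-byte difference, normalised to the 0..1 range.
    total = sum(abs(a - b) for a, b in zip(fp_a, fp_b))
    return total / (255 * len(fp_a))

def are_similar(path_a, path_b):
    return difference(fingerprint(path_a), fingerprint(path_b)) <= SIMILARITY_THRESHOLD
```

Comparing raw 32x32 fingerprints pairwise like this is cheap; the harder part is indexing them so that a lookup across millions of files (or a SQL query) doesn't have to touch every row.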
In terms of specific implementations of this goal: T167947 (a "first step") aims to make image hashes available through an API (preferred) or a database query, T251026 is about checking existing files for duplicates before a new one is uploaded, and this task itself covers planning a systematic check of all already-uploaded files for duplicates.
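To make the upload-time check concrete, here is a hedged sketch of what comparing a new file against previously computed hashes could look like. The dHash algorithm, the stored_hashes mapping, and the 5-bit distance threshold are illustrative assumptions, not anything specified in T167947 or T251026.

```
from PIL import Image

def dhash(path, hash_size=8):
    # 64-bit "difference hash": downscale to 9x8 grayscale and compare each
    # pixel to its right-hand neighbour, one bit per comparison.
    with Image.open(path) as im:
        gray = im.convert("L").resize((hash_size + 1, hash_size))
    px = list(gray.getdata())
    bits = 0
    for row in range(hash_size):
        for col in range(hash_size):
            left = px[row * (hash_size + 1) + col]
            right = px[row * (hash_size + 1) + col + 1]
            bits = (bits << 1) | (1 if left < right else 0)
    return bits

def hamming(a, b):
    # Number of differing bits between two hashes.
    return bin(a ^ b).count("1")

def find_near_duplicates(new_file, stored_hashes, max_distance=5):
    # stored_hashes: {file_title: 64-bit hash} previously computed for existing uploads.
    new_hash = dhash(new_file)
    return [title for title, h in stored_hashes.items()
            if hamming(new_hash, h) <= max_distance]
```

A fixed 64-bit hash like this is compact enough to store per file, and the Hamming-distance check is a simple integer operation, which is roughly the kind of comparison an API as envisioned in T167947 could expose and a bulk check of already-uploaded files (the subject of this task) could build on.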