
Euclidean distance query mistakenly identifies many images as identical

I have a small database that contains hashes of photos. I am trying to find photos similar to the one below:

Its hash, calculated with the average_hash() method, is "0f3f2764ecc482c2".

The system finds a very large number of collisions. Below is an example of photos that were identified as completely identical:

The table in which I store hashes of photos:

Adding hash:

SQL Query by which I calculate the Euclidean distance between my hash and stored hashes:

The photos table contains 7889 photos, and this query mistakenly reports 959 of them as completely identical (Euclidean distance of 0). I have been unable to solve this problem for about a week. Can someone please help?


Answer

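You need to convert the hex strings to integers before doing the XOR operations, because every string whose first character is not 1-9 is implicitly converted to 0. Two hashes that both become 0 then XOR to 0, so they are reported as identical no matter what they actually contain.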
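To make the effect concrete, here is a minimal Python sketch rather than the original query. It assumes the implicit cast keeps only the leading decimal digits of a string, which is a simplification of what a database may actually do. The helper names `naive_cast` and `hamming_distance` and the second hash are made up for illustration; only the first hash comes from the question.

```python
def hamming_distance(hex_a: str, hex_b: str) -> int:
    """Count the differing bits between two hex-encoded hashes."""
    a = int(hex_a, 16)  # parse the hex string as an integer first
    b = int(hex_b, 16)
    return bin(a ^ b).count("1")


def naive_cast(text: str) -> int:
    """Hypothetical model of an implicit string-to-number cast:
    keep only the leading decimal digits, or return 0 if there are none."""
    digits = ""
    for ch in text:
        if ch not in "0123456789":
            break
        digits += ch
    return int(digits) if digits else 0


query = "0f3f2764ecc482c2"  # hash from the question
other = "ffab000000000000"  # made-up hash, for illustration only

# Both strings collapse to 0 under the naive cast, so the XOR is 0
# and the two hashes look identical.
print(naive_cast(query) ^ naive_cast(other))

# Parsing as hex first gives a real, nonzero bit distance.
print(hamming_distance(query, other))
```

The same idea applies inside the SQL query: convert each stored hex string to an unsigned integer with whatever conversion function your database provides, and only then apply XOR and count the set bits.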

User contributions licensed under: CC BY-SA