The Puzzle library
The Puzzle library is designed to quickly find visually similar images (GIF, PNG, JPG), even if they have been resized, recompressed, recolored or slightly modified. The library is free, lightweight yet very fast, configurable and easy to use, and it has been designed with security in mind.
This is a C library, but it also comes with a command-line tool and PHP bindings.
LIBPUZZLE 0.11 HAS BEEN RELEASED ON MARCH 24, 2009
The library is a free implementation of the algorithm described in the paper “An image signature for any kind of image” by H. Chi Wong, Marshall Bern and David Goldberg.
The first step automatically crops featureless borders, then splits the bitmap picture into blocks. These blocks are a “summary” of the picture.
The relationships between adjacent blocks are encoded into a vector (PuzzleCvec), which is the signature of the picture.
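The block-and-neighbor idea can be sketched in a few lines of C. This is a simplified illustration, not the library's code: it skips the border cropping, works on an 8-bit grayscale buffer, compares each block only with its right and bottom neighbors, and uses three levels (-1, 0, +1). The names sketch_signature, block_mean, compare_blocks and SKETCH_MAX_GRID, as well as the grid and tolerance parameters, are hypothetical.

```c
#include <stddef.h>

#define SKETCH_MAX_GRID 16 /* hypothetical upper bound for this sketch */

/* Average intensity of one block of a row-major, 8-bit grayscale image. */
static double block_mean(const unsigned char *pixels, size_t width,
                         size_t x0, size_t y0, size_t bw, size_t bh)
{
    unsigned long sum = 0;
    for (size_t y = y0; y < y0 + bh; y++) {
        for (size_t x = x0; x < x0 + bw; x++) {
            sum += pixels[y * width + x];
        }
    }
    return (double) sum / (double) (bw * bh);
}

/* +1 if a is clearly brighter than b, -1 if clearly darker, 0 otherwise. */
static signed char compare_blocks(double a, double b, double tolerance)
{
    if (a - b > tolerance) return 1;
    if (b - a > tolerance) return -1;
    return 0;
}

/*
 * Split the image into grid x grid blocks and record how each block relates
 * to its right and bottom neighbors. Pixels left over when the image size is
 * not a multiple of grid are ignored.
 * `out` needs room for 2 * grid * (grid - 1) values.
 * Returns the number of values written, or 0 on invalid arguments.
 */
size_t sketch_signature(const unsigned char *pixels, size_t width, size_t height,
                        size_t grid, double tolerance, signed char *out)
{
    double means[SKETCH_MAX_GRID][SKETCH_MAX_GRID];
    size_t n = 0;

    if (grid < 2 || grid > SKETCH_MAX_GRID || width < grid || height < grid) {
        return 0;
    }
    const size_t bw = width / grid;
    const size_t bh = height / grid;

    for (size_t gy = 0; gy < grid; gy++) {
        for (size_t gx = 0; gx < grid; gx++) {
            means[gy][gx] = block_mean(pixels, width, gx * bw, gy * bh, bw, bh);
        }
    }
    for (size_t gy = 0; gy < grid; gy++) {
        for (size_t gx = 0; gx < grid; gx++) {
            if (gx + 1 < grid) {
                out[n++] = compare_blocks(means[gy][gx], means[gy][gx + 1], tolerance);
            }
            if (gy + 1 < grid) {
                out[n++] = compare_blocks(means[gy][gx], means[gy + 1][gx], tolerance);
            }
        }
    }
    return n;
}
```

Because only relative brightness between neighbors is stored, a uniformly brightened or resized copy of a picture tends to produce the same values, which is the property the signature relies on.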
The similarity between two pictures can be characterized as the normalized distance between two PuzzleCvec vectors.
A typical image signature only requires 182 bytes, using the built-in compression/decompression functions.
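To give an idea of how such a compact size is possible, the sketch below packs small values three per byte. It assumes each signature value is one of five levels (-2 to +2), so three values fit in one byte as base-5 digits (5 × 5 × 5 = 125 combinations). Both the five-level assumption and the names sketch_pack and sketch_unpack are hypothetical. The library's built-in compression functions may work differently.

```c
#include <stddef.h>

/*
 * Pack n values in [-2, 2] three per byte, as base-5 digits.
 * `out` needs room for (n + 2) / 3 bytes.
 * Returns the number of bytes written, or 0 if a value is out of range
 * (0 is also returned for n == 0).
 */
size_t sketch_pack(const signed char *values, size_t n, unsigned char *out)
{
    size_t written = 0;

    for (size_t i = 0; i < n; i += 3) {
        unsigned int byte = 0;
        unsigned int scale = 1;
        for (size_t j = i; j < i + 3 && j < n; j++) {
            if (values[j] < -2 || values[j] > 2) {
                return 0;
            }
            byte += (unsigned int) (values[j] + 2) * scale;
            scale *= 5;
        }
        out[written++] = (unsigned char) byte; /* at most 124 */
    }
    return written;
}

/* Reverse of sketch_pack: recover n values from the packed bytes. */
void sketch_unpack(const unsigned char *in, size_t n, signed char *values)
{
    for (size_t i = 0; i < n; i++) {
        unsigned int byte = in[i / 3];
        for (size_t k = 0; k < i % 3; k++) {
            byte /= 5; /* skip the lower base-5 digits */
        }
        values[i] = (signed char) ((int) (byte % 5) - 2);
    }
}
```

With this scheme, a vector of 544 values (an illustrative figure, not taken from the text above) would occupy 182 bytes, since 544 / 3 rounded up is 182.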
Similar signatures share identical “words”, i.e. identical sequences of values at the same positions. By using compound indexes (word + position), the set of possible similar vectors is dramatically reduced, and in most cases no vector distance actually needs to be computed.
Indexing through words and positions also makes it easy to split the data into multiple tables and servers.
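The word + position idea can be sketched as follows. The sketch cuts a signature into non-overlapping words of a fixed length and turns each (position, word) pair into a single integer key that could be stored in an indexed database column. The word length, the non-overlapping layout, the five-level value range (-2 to +2) and the names sketch_word_key and sketch_shared_words are all assumptions made for illustration, not the library's actual scheme.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Build a key from the word of word_len values starting at `position`.
 * Values are assumed to be in [-2, 2]. For a fixed word_len, the key is
 * unique per (position, word) pair. Keep word_len small (20 or less) so
 * the key fits in 64 bits.
 */
uint64_t sketch_word_key(const signed char *sig, size_t position, size_t word_len)
{
    uint64_t key = (uint64_t) position;

    for (size_t i = 0; i < word_len; i++) {
        key = key * 5 + (uint64_t) (sig[position + i] + 2);
    }
    return key;
}

/*
 * Count the words that two signatures of n values share at the same
 * positions. A non-zero count marks the pair as a candidate match that
 * deserves a full distance computation.
 */
size_t sketch_shared_words(const signed char *a, const signed char *b,
                           size_t n, size_t word_len)
{
    size_t shared = 0;

    if (word_len == 0) {
        return 0;
    }
    for (size_t pos = 0; pos + word_len <= n; pos += word_len) {
        if (sketch_word_key(a, pos, word_len) == sketch_word_key(b, pos, word_len)) {
            shared++;
        }
    }
    return shared;
}
```

In a real index, the keys of each stored signature would be inserted once, and a lookup would fetch only the signatures sharing at least one key with the query. Since each key is independent of the others, keys can also be distributed across tables or servers, for instance by position.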
So yes, the Puzzle library can certainly be used in projects that need to index millions of pictures.
The Puzzle library is free. It is released under the 2-clause BSD license.
Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the original copyright appear in all copies.
The library relies on the GD Library in order to load bitmap pictures.
It is known to compile and work on recent Linux and OpenBSD systems (amd64 and i386), and it probably also works on other operating systems with minor tweaks, if any.