For the security of the hash visualization, Random Artneeds to satisfy the near-one-way
requirements listed in section 2.2. We can achieve the hash function one-way
properties by hashing the input string with a cryptographically secure hash function, such as SHA-1
[10], to seed a cryptographically secure random number generator, such as Blum-Blum-Shub
[5]. Hence, we can achieve the pre-image-resistance
property. But the difficulty is to achieve collision-free, due to the properties of the image
generation. First, it is easy to see that many different trees generate identical images. As an
example we mention the commutative property of addition or multiplication, where the image remains
invariant when swapping sub-trees of those operators. In addition, certain constructs can
``destroy'' randomness: As an example we consider a construct such as If x 0.999 then A
else B. Since , the condition is always true if the image is smaller than 1000x1000
pixels, hence the subtree in the else case is never evaluated. In addition, the hash
visualization properties take the human factor into account and therefore two images and
collide if they are near . Due to these issues, we could not formally prove that
the images generated by Random Artsatisfy the hash visualization requirements.
Instead, we propose to perform experiments to empirically estimate the difficulty to generate an
image collision. With the assumption that all images produced by Random Artare equally
likely, our approach is to pick random seeds and count how many perceptually different images can
be generated. To perform the actual counting, we make use of a statistical estimation method, based
on the birthday paradox. The birthday paradox basically expresses that if we draw random samples out
of a uniform distribution with distinct values, we start seeing duplicates after approximately
drawings. The name birthday paradox comes from the fact that we expect with a 50%
probability, to have two people with the same birthday as soon as we have more than 24 people.
We can apply this technique to estimate the total number of images in the following way. First, we
assume that every image is equally likely. We can then compute the probability that a certain number
of collisions were encountered, given the total number of different images , the number of images
generated , and the number of collisions . An upper bound for the probability is: . But since the user study will reveal
for each sample whether it is new or a collision, we can compute the
precise probability:
Since only and are known from the user study, we can compute through a maximum-likelihood estimation: which value of maximizes the probability? It is generally known as a good rule of thumb that squaring the number of samples after the first collision, is a good estimate for .
Notice that Random Artis only a prototypical solution, and hence, might not be the final answer for hash visualization.