from dataclasses import dataclass
from typing import Callable, Optional, Tuple
import random
Zoombinis - Allergic Cliffs
Growing up, I played a lot of Zoombinis. Zoombinis is well-known for teaching maths, but I think most people talk more about the “high-concept” maths that it teaches, i.e. set theory. I haven’t seen much discussion on the algorithmics that it teaches.
The topic of this blog post is the first level of Zoombinis. In it, you have a set of Zoombinis and two paths (cliffs) in front of you. Each Zoombini can travel across exactly one path - but you don’t know which one until you try it! Which path they can traverse is based on the features of the Zoombinis - on the easiest difficulty, one of the cliffs is “allergic” to Zoombinis with exactly one type of trait, and the other is allergic to all other Zoombinis.
Solution
The bottom cliff only wants one-eyed Zoombinis.
We can talk about how it is teaching basic set theory (complements and unions - depending on difficulty level), but that does not interest me that much. Rather, I’m interested in how to efficiently solve the level. There are multiple ways to define efficiency:
- Runtime
- Memory use
- Number of attempts to cross the cliff
- Called “Output Queries”
- Number of times we search our set of Zoombinis for a specific feature/set of features
- Called “Input Queries”
- For humans, this can be the part that takes the most real-life time
Since the problem size for Zoombinis is fixed (at most 16 Zoombinis per level), considering asymptotic complexity is not of much interest to me currently. Also, I’m most interested in algorithms humans would actually implement, in which case the time sink is the number of input and output queries.
Problem Definition
A “Zoombini” can be defined as follows:
@dataclass
class Zoombini:
int
hair: int
eyes: int
nose: int feet:
Where there are 5 possible values for each attribute of the Zoombini.
We could, of course, generalize what it means to be a Zoombini:
# Number of values per attribute
dict[str, int] = {
possible_values: "Head": 5,
"Eyes": 5,
"Nose": 5,
"Feet": 5
}
@dataclass
class GeneralizedZoombini:
dict[str, int]
attributes:
def __hash__(self) -> int:
"""
This is necessary to allow
our zoombini to be in a set
"""
return hash(
tuple(
self.attributes[k])
(k, for k
in sorted(self.attributes.keys())
) )
The input then consists of a set of Zoombinis and an oracle to query:
set[GeneralizedZoombini]
zoombinis: bool] cliff_oracle: Callable[GeneralizedZoombini,
This setup will allow us to design algorithms built around minimizing calls to cliff_oracle
- if we wanted to measure input queries, we’d have to store the zoombinis in a more restricted data structure.
# This is a decorator to measure function calls
# If you've never used decorators before, just
# ignore this.
dict = {}
expensive_operations: def expensive(name: str) -> "Decorator":
def out(function: Callable) -> Callable:
def func(*args, **kwargs):
if name in expensive_operations:
+= 1
expensive_operations[name] else:
= 1
expensive_operations[name] return function(*args, **kwargs)
return func
return out
class ZoombiniSet:
set[GeneralizedZoombini]
zoombinis:
def __init__(self, z: set[GeneralizedZoombini]):
self.zoombinis = z
def _remove(self, zoombini: GeneralizedZoombini):
"""
Private method; you're not allowed to use this.
"""
self.zoombinis.remove(zoombini)
def size(self):
return len(self.zoombinis)
def arbitrary_zoombini(self) -> GeneralizedZoombini:
return random.choice(list(self.zoombinis))
@expensive("Input Query")
def get_zoombini_with_features(
self,
dict[str, int],
features: -> Optional[GeneralizedZoombini]:
) for zoombini in self.zoombinis:
= True
zoombini_works for feature, value in features.items():
if zoombini.attributes[feature] != value:
= False
zoombini_works break
if zoombini_works:
return zoombini
return None
We have leeway in how we define get_zoombini_with_features
; how complicated are we allowed to make our input query? If we allow it to be too complicated, it would take “too long” for a human to go through - but in theory we should allow all valid predicate logic statements as queries, since a human could do this.
To have a viable playground to experiment with this problem, we should have a way of creating random instances of the problem. Generating a random ZoombiniSet
is easy:
expensive_operations
{'Input Query': 52}
def random_zoombini_set(
int,
size: dict[str, int]
possible_values: -> ZoombiniSet:
)
return ZoombiniSet({
GeneralizedZoombini(={
attributes0, val)
key: random.randint(for key, val
in possible_values.items()
}
)for _ in range(size)
})
We also want a random oracle. For now we’ll just assume that our cliffs are at difficultly level one in Zoombinis - i.e. they just care about one feature.
def random_oracle(
dict[str, int]
possible_values: -> Callable[GeneralizedZoombini, bool]:
) # Decide whether the cliff only accepts Zoombinis
# with a certain trait,
# Or only rejects Zoombinis with a certain trait:
= True if random.randint(0, 2) == 0 else False
accepter
# Decide what trait to look for
= random.choice(list(possible_values.keys()))
trait_name = random.randint(0, possible_values[trait_name])
trait_value
@expensive("Output Query")
def to_return(zoombini: GeneralizedZoombini) -> bool:
= zoombini.attributes[trait_name] == trait_value
has_trait if accepter:
return has_trait
else:
return not has_trait
return to_return
def send_zoombini_across_bridge(
zoombini: GeneralizedZoombini,
zoombinis: ZoombiniSet,bool],
oracle: Callable[GeneralizedZoombini, bool
pred_bridge: -> bool:
) """
Returns whether the bridge was correct or not
"""
= oracle(zoombini)
true_bridge = (pred_bridge == true_bridge)
can_cross if can_cross:
zoombinis._remove(zoombini)return can_cross
Solutions to Easy Difficulty
Here’s a naïve solution - we always send a Zoombini into the last bridge that worked.
def naive_algorithm(
zoombinis: ZoombiniSet,dict[str, int],
possible_values:
oracle
):= True
pred_bridge while zoombinis.size() > 0:
= zoombinis.arbitrary_zoombini()
zoombini if not send_zoombini_across_bridge(
zoombini,
zoombinis,
oracle,
pred_bridge
):= not pred_bridge
pred_bridge
send_zoombini_across_bridge(
zoombini,
zoombinis,
oracle,
pred_bridge )
dict[str, int] = {
possible_values: "Head": 5,
"Eyes": 5,
"Nose": 5,
"Feet": 5
}
def problem_loop(
algorithm,dict[str, int],
possible_values: int = 10000,
num_runs: int = 16,
zoombini_count: str = "Not So Easy"
level: -> dict:
) global expensive_operations
= {}
expensive_operations
for i in range(0, num_runs):
= random_zoombini_set(zoombini_count, possible_values)
zoombinis: ZoombiniSet = random_oracle(possible_values, level=level)
oracle
algorithm(zoombinis, possible_values, oracle)
return expensive_operations
= problem_loop(naive_algorithm, possible_values)
expensive_operations
print("Mean Expensive Operations: ")
:= {
display(mean_eo / num_runs
key: value for key, value
in expensive_operations.items()
})
Mean Expensive Operations:
{'Output Query': 20.4189}
In Zoombinis, you loose if you make 6 failed attempts - i.e. 16+6=22 output queries. Thus, the naïve solution actually works on the easy difficulty on average!
We can be more advanced by keeping track of who has been accepted and always sending the Zoombini across the path that it has the most in common with the Zoombinis that already passed it.
def attributes_as_set(zoombini) -> set[str]:
"""
This is a helper function to allow us to
do set maths on Zoombinis
"""
return {
+ str(value)
key for key, value
in zoombini.attributes.items()
}
def all_attributes_as_set(possible_values: dict[str, int]) -> set[str]:
"""
Get a set containing all possible zoombini attributes
"""
= set({})
to_return for key, value in possible_values.items():
|= {
to_return + str(v)
key for v in range(value)
}return to_return
dict[str, int] = {
possible_values: "Head": 5,
"Eyes": 5,
"Nose": 5,
"Feet": 5
}
def similarity_algorithm(
zoombinis: ZoombiniSet,dict[str, int],
possible_values:
oracle
):dict[bool, set["Features"]]
accepted: = {
accepted True: all_attributes_as_set(possible_values),
False: all_attributes_as_set(possible_values)
}= True
pred_bridge while zoombinis.size() > 0:
= zoombinis.arbitrary_zoombini()
zoombini = attributes_as_set(zoombini)
attributes
# Always send across the bridge
# with more attributes matching
= len(accepted[pred_bridge] & attributes)
pred_match = len(accepted[not pred_bridge] & attributes)
unpred_match if pred_match == 0:
if unpred_match == 0:
# Both empty set, want to go
# to the less exclusive side
# For heuristic reasons, and
# because this is the correct
# choice in the limit
# The less exclusive side has a
# smaller intersection.
if len(accepted[not pred_bridge]) < len(accepted[pred_bridge]):
= not pred_bridge
pred_bridge # If we've matched in one but not the other, we should
# go to the one we've matched with, because that's probably
# the exclusive side.
if pred_match == 0:
if unpred_match > 0:
= not pred_bridge
pred_bridge # If we've matched in both, go to the less exclusive side
if len(accepted[not pred_bridge]) < len(accepted[pred_bridge]):
= not pred_bridge
pred_bridge
# Try to send across the bridge
if not send_zoombini_across_bridge(
zoombini,
zoombinis,
oracle,
pred_bridge
):= not pred_bridge
pred_bridge
send_zoombini_across_bridge(
zoombini,
zoombinis,
oracle,
pred_bridge
)
# Update our list of accepted things
&= attributes accepted[pred_bridge]
= problem_loop(similarity_algorithm, possible_values)
expensive_operations
print("Mean Expensive Operations: ")
:= {
display(mean_eo / num_runs
key: value for key, value
in expensive_operations.items()
})
Mean Expensive Operations:
{'Output Query': 19.2147}
So it seems like being smart awards us one less guess. If we consider the case when there are very many Zoombinis, we gain a larger speedup:
dict[str, int] = {
possible_values: "Head": 5,
"Eyes": 5,
"Nose": 5,
"Feet": 5
}= problem_loop(
expensive_operations
naive_algorithm,
possible_values,=100
zoombini_count
)
print("Mean Expensive Operations: ")
:= {
display(mean_eo / num_runs
key: value for key, value
in expensive_operations.items()
})
Mean Expensive Operations:
{'Output Query': 123.0532}
dict[str, int] = {
possible_values: "Head": 5,
"Eyes": 5,
"Nose": 5,
"Feet": 5
}= problem_loop(
expensive_operations
similarity_algorithm,
possible_values,=100
zoombini_count
)
print("Mean Expensive Operations: ")
:= {
display(mean_eo / num_runs
key: value for key, value
in expensive_operations.items()
})
Mean Expensive Operations:
{'Output Query': 114.5253}
So it seems that being ‘smart’ doesn’t gain too much benefit on the easy difficulty. This makes sense; if a cliff only accepts a single feature, most Zoombinis will be on one side. The naïve solution does well because the chance of two Zoombinis being accepted by the same cliff is rather high, since the same cliff accepts most Zoombinis. If we increase the difficulty to level two or higher, the Zoombinis should hopefully be a more even split, allowing the smarter algorithm to take the lead.
Difficulties
Here is a brief explanation of all the difficulties.
Level 1 - Not So Easy
One cliff accepts all Zoombinis with a specific attribute (such as sunglasses), and the other cliff rejects them all.
Level 2 - Oh So Hard
One cliff accepts a Zoombini if it has one of two types of the same attribute (i.e. one cliff might accept either curly hair or a hat, but you won’t get a mix of accepting curly hair or sunglasses). It rejects all others.
Level 3 - Very Hard
One cliff accepts a Zoombini if it has one of two features, where the features are from different attributes (i.e. curly hair or sunglasses). It rejects all others.
Level 4 - Very, Very Hard
Same as before, but with three features instead of two. In the original game, it is apparently impossible to guarantee you will deduce the rule within the amount of guesses alloted to you - although I don’t know the details of this fact.
Higher Difficulty Comparison
def random_oracle(
dict[str, int],
possible_values: str = "Not So Easy"
level: -> Callable[GeneralizedZoombini, bool]:
) # Decide whether the cliff only accepts Zoombinis
# with a certain trait,
# Or only rejects Zoombinis with a certain trait:
= True if random.randint(0, 2) == 0 else False
accepter
if level == "Not So Easy":
# Decide what trait to look for
= random.choice(list(possible_values.keys()))
trait_name = random.randint(0, possible_values[trait_name])
trait_value
@expensive("Output Query")
def to_return(zoombini: GeneralizedZoombini) -> bool:
= zoombini.attributes[trait_name] == trait_value
has_trait if accepter:
return has_trait
else:
return not has_trait
return to_return
elif level == "Oh So Hard":
# Decide what trait to look for
= random.choice(list(possible_values.keys()))
trait_name = random.sample(
trait_values list(range(0, possible_values[trait_name])),
2
)
@expensive("Output Query")
def to_return(zoombini: GeneralizedZoombini) -> bool:
= zoombini.attributes[trait_name] in trait_values
has_trait if accepter:
return has_trait
else:
return not has_trait
return to_return
elif level == "Very Hard":
# Decide what trait to look for
= random.sample(list(possible_values.keys()), 2)
trait_names = {
trait_values 0, possible_values[trait])
trait: random.randint(for trait in trait_names
}
@expensive("Output Query")
def to_return(zoombini: GeneralizedZoombini) -> bool:
for trait in trait_names:
= zoombini.attributes[trait] == trait_values[trait]
has_trait if accepter:
return has_trait
else:
return not has_trait
return to_return
elif level == "Very, Very Hard":
# Decide what trait to look for
= random.sample(list(possible_values.keys()), 3)
trait_names = {
trait_values 0, possible_values[trait])
trait: random.randint(for trait in trait_names
}
@expensive("Output Query")
def to_return(zoombini: GeneralizedZoombini) -> bool:
for trait in trait_names:
= zoombini.attributes[trait] == trait_values[trait]
has_trait if accepter:
return has_trait
else:
return not has_trait
return to_return
else:
raise NotImplementedError(f"No such level {level}")
for level in ["Not So Easy", "Oh So Hard", "Very Hard", "Very, Very Hard"]:
= problem_loop(
expensive_operations
naive_algorithm,
possible_values,=level,
level=16
zoombini_count
)print(f"Mean Expensive Operations for {level}: ")
:= {
display(mean_eo / num_runs
key: value for key, value
in expensive_operations.items()
})
Mean Expensive Operations for Not So Easy:
Mean Expensive Operations for Oh So Hard:
Mean Expensive Operations for Very Hard:
Mean Expensive Operations for Very, Very Hard:
{'Output Query': 20.3887}
{'Output Query': 23.0109}
{'Output Query': 20.4337}
{'Output Query': 20.4269}
for level in ["Not So Easy", "Oh So Hard", "Very Hard", "Very, Very Hard"]:
= problem_loop(
expensive_operations
similarity_algorithm,
possible_values,=level,
level=16
zoombini_count
)print(f"Mean Expensive Operations for {level}: ")
:= {
display(mean_eo / num_runs
key: value for key, value
in expensive_operations.items()
})
Mean Expensive Operations for Not So Easy:
Mean Expensive Operations for Oh So Hard:
Mean Expensive Operations for Very Hard:
Mean Expensive Operations for Very, Very Hard:
{'Output Query': 19.2308}
{'Output Query': 22.5889}
{'Output Query': 19.1853}
{'Output Query': 19.2089}
The smarter algorithm does notably worse on the second difficulty level. However, this should not be surprising - it was designed for the Not So Easy level, whereas for Oh So Hard we can’t take a simple set intersection because the oracle involves a choice between two attributes. Programming a solution to Oh So Hard is more difficult than the latter two, which involve different features. For the latter two, instead of doing an intersection of all features, we just take an intersection for each feature individually. It is surprising that the naive algorithm also does so comparatively well, which implies that the even split hypothesis was incorrect.
However, I’ve spent enough time in Zoombini Land, so that’s all for today. Algorithm could be improved by allowing it to search the set of Zoombinis for interesting Zoombinis that are useful for hypothesis deduction.