# Optimization by puzzle

Given a `query` routine that takes a name and may return several, write a routine that takes a single name and returns a set of names for which each of the following is true:

1. For each name in the set, `query` has been called exactly once.
2. All the results from the calls to `query` are included in the set
3. the parameter to the routine is not included in the set

You may assume the following:

1. Calls to `query` are idempotent1.
2. There is a finite number of values for names.
3. Names are less-than-comparable value-types (i.e. you can store them in an `std::set`) and are not expensive to copy
4. `query` results never contain their argument2

This is almost exactly the problem I had to solve recently: a tool was taking several minutes to perform a routine task that, in my opinion, should take milliseconds. Several other issues were involved as well, but this one has the bonus of being fun.

I should make this an interview question.

The way this ends up working is as follows:

1. We create three sets: one for the `results`, one for the things we’ve `checked` and one for the things that remain `to_check`.
2. We insert the value we got as a parameter in the `to_check` set.
3. As long as there are things left to check:
1. run `query` for each value in `to_check`
2. insert the results from the query in the `results` set
3. After iterating over each of the values, insert the values from `to_check` into the `checked` set,
4. clear the `to_check` set
5. fill `to_check` with the set difference between the `results` and the `checked` sets

Or, in C++:

```30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 template < typename T, typename F > set< T > foo(T t, F query) { set< T > results; set< T > checked; set< T > to_check; to_check.insert(t);   do { for (typename set< T >::const_iterator check(to_check.begin()); check != to_check.end(); ++check) { typename F::result_type query_results(query(*check)); results.insert(query_results.begin(), query_results.end()); } checked.insert(to_check.begin(), to_check.end()); to_check.clear(); set_difference(results.begin(), results.end(), checked.begin(), checked.end(), inserter(to_check, to_check.end())); } while (!to_check.empty()); return results; }```

Insertion into a set is $O(\lg{n})$ so lines 43 and 45 are both $O(n\lg{n})$. Line 46 should be $O(c)$ but is probably $O(n)$. Line 47 is $O(n)$ so the whole things boils down to $O(n\lg{n})$ complexity.

In order to play with the code a bit, I put it on GitHub as a Gist, with a test case (Query fails if you call it more than once with the same value):

1. so you really do need to call them only once []
2. i.e. for the case at hand, we’re querying a directed acyclic graph, so our first argument will never be seen in any of the `query` results, although any given value may appear more than once in `query` results []
 Tweet