Optimization by puzzle
Given a query
routine that takes a name and may return several, write a routine that takes a single name and returns a set of names for which each of the following is true:
- For each name in the set,
query
has been called exactly once. - All the results from the calls to
query
are included in the set - the parameter to the routine is not included in the set
You may assume the following:
- Calls to
query
are idempotent1. - There is a finite number of values for names.
- Names are less-than-comparable value-types (i.e. you can store them in an
std::set
) and are not expensive to copy query
results never contain their argument2.
This is almost exactly the problem I had to solve recently: a tool was taking several minutes to perform a routine task that, in my opinion, should take milliseconds. Several other issues were involved as well, but this one has the bonus of being fun.
I should make this an interview question.
The way this ends up working is as follows:
- We create three sets: one for the
results
, one for the things we’vechecked
and one for the things that remainto_check
. - We insert the value we got as a parameter in the
to_check
set. -
As long as there are things left to check:
- run
query
for each value into_check
- insert the results from the query in the
results
set - After iterating over each of the values, insert the values from
to_check
into thechecked
set, - clear the
to_check
set - fill
to_check
with the set difference between theresults
and thechecked
sets
Or, in C++:
template < typename T, typename F >
set< T > foo(T t, F query)
{
set< T > results;
set< T > checked;
set< T > to_check;
to_check.insert(t);
do
{
for (typename set< T >::const_iterator check(to_check.begin()); check != to_check.end(); ++check)
{
typename F::result_type query_results(query(*check));
results.insert(query_results.begin(), query_results.end());
}
checked.insert(to_check.begin(), to_check.end());
to_check.clear();
set_difference(results.begin(), results.end(), checked.begin(), checked.end(), inserter(to_check, to_check.end()));
} while (!to_check.empty());
return results;
}
Insertion into a set is $O(\lg{n})$ so lines 43 and 45 are both $O(n\lg{n})$. Line 46 should be $O(c)$ but is probably $O(n)$. Line 47 is $O(n)$ so the whole things boils down to $O(n\lg{n})$ complexity.
In order to play with the code a bit, I put it on GitHub as a Gist, with a test case (Query fails if you call it more than once with the same value):