Run-time composed predicates and Code generation

Version 2.2 of Arachnida is due out this fall, and one of the things it will introduce is a hardened OpenSSL transport-layer-security plug-in to replace the one we’ve had for the last seven or so years. One of the new features in this plug-in (which is part of Arachnida’s “Scorpion” module) is a much more flexible configuration scheme, including the subject of today’s post: run-time composed predicates.

As the name indicates, run-time composed predicates are predicates that are composed at run-time. In this case, we use them for post-connection validations of the SSL/TLS connection: the user can plug their own post-connection validations in and combine them with the ones provided in the library using AND, OR, NOR, NAND, XOR and NOT primitives. Typically, such a composed predicate would look like this:

configuration.post_connection_verification_predicate_ = and_(
      and_(peer_provided_certificate__, fqdn_matches_peer__)
    , userProvidedPredicate);

in which userProvidedPredicate is a pointer to a user-provided predicate function whereas the other two predicates are included in the library.

The thing is that each of the following will also work:

// if the peer provided a certificate, assume everything is fine
configuration.post_connection_verification_predicate_ = peer_provided_certificate__;
// we accept this only if the FQDN in the peer-provided certificate DOES NOT match the peer's FQDN
// THIS IS STUPID - DO NOT DO THIS IN YOUR CODE!
configuration.post_connection_verification_predicate_ = not_(fqdn_matches_peer__);
// apply only the user's predicate
configuration.post_connection_verification_predicate_ = userProvidedPredicate;

The trick here is that the predicate type, PostConnectionVerificationPredicate, is a pointer-to-function type and the functions and_, or_, xor_, nand_, nor_ and not_ each return a pointer to a “newly created” function.
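To make the types concrete, here is a minimal sketch of what such a pointer-to-function type and a user-provided predicate might look like. The signature (an SSL pointer and a host name in, a bool out) is inferred from the generated functions shown later in this post. The opaque SSL stand-in, the Configuration struct, the hostIsAllowed function and the host name are hypothetical and are only there so the sketch compiles on its own.

```cpp
#include <cstring>

// Stand-in for OpenSSL's SSL type so the sketch compiles without OpenSSL.
struct SSL;

// Assumed shape of the predicate type: a pointer to a function that takes the
// connection and the host name and returns true if the connection is acceptable.
typedef bool (*PostConnectionVerificationPredicate)(SSL *ssl, char *host);

// Hypothetical user-provided predicate: accept only one specific host.
bool hostIsAllowed(SSL * /*ssl*/, char *host)
{
	return host != 0 && std::strcmp(host, "plc.example.com") == 0;
}

// Hypothetical stand-in for the plug-in's configuration object.
struct Configuration
{
	PostConnectionVerificationPredicate post_connection_verification_predicate_;
};

Configuration makeConfiguration()
{
	Configuration configuration;
	// A function name converts implicitly to a pointer to that function,
	// which is why a plain function can be assigned directly.
	configuration.post_connection_verification_predicate_ = hostIsAllowed;
	return configuration;
}
```

Because the composing functions hand back the same pointer type they accept, their results can be assigned or nested in exactly the same way as hostIsAllowed is here.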

Of course, C++ does not allow the creation of functions at run-time and, more to the point, neither does C. That matters here because the call-back is passed to OpenSSL, and OpenSSL is written in C.

As Arachnida is designed to run on industrial control systems and industrial embedded devices, we want to avoid run-time memory allocation if at all possible — and when that’s not possible, we want to control it. In this case, we avoid it by creating an array of pointers to functions, another array of “configurations” for those functions and a function for each position in the array. We do this using a Perl script (because we usually use Perl to generate code and it integrates nicely with our build system).
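Before looking at the generator, it may help to see the idea written out by hand for just two slots. This is a simplified sketch and not the generated code: it only supports AND, and the names Predicate, Slot, slot0, slot1, always and never are made up for the illustration.

```cpp
#include <new>

struct SSL; // stand-in for OpenSSL's SSL type
typedef bool (*Predicate)(SSL *, char *);

namespace {
	const unsigned int SLOT_COUNT = 2;

	// Per-slot configuration: what the function in that slot should evaluate.
	struct Slot { Predicate lhs_; Predicate rhs_; };
	Slot slots__[SLOT_COUNT];
	unsigned int next_slot__ = 0;

	// One real, statically compiled function per slot. They differ only in
	// the index of the configuration they read.
	bool slot0(SSL *ssl, char *host) { return slots__[0].lhs_(ssl, host) && slots__[0].rhs_(ssl, host); }
	bool slot1(SSL *ssl, char *host) { return slots__[1].lhs_(ssl, host) && slots__[1].rhs_(ssl, host); }

	Predicate functions__[SLOT_COUNT] = { slot0, slot1 };
}

// "Creates" a predicate by configuring the next free slot and returning the
// function that was already compiled for it. Nothing is allocated on the heap.
Predicate and_(Predicate lhs, Predicate rhs)
{
	if (next_slot__ == SLOT_COUNT) throw std::bad_alloc();
	slots__[next_slot__].lhs_ = lhs;
	slots__[next_slot__].rhs_ = rhs;
	return functions__[next_slot__++];
}

// Hypothetical leaf predicates for trying it out.
bool always(SSL *, char *) { return true; }
bool never(SSL *, char *) { return false; }
```

With two slots, and_(and_(always, always), never) uses up the whole pool and a third call throws std::bad_alloc. Writing slot0 and slot1 by hand clearly does not scale, which is where the generator comes in.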

The following chunk of code is the generation script verbatim — annotated.

First, the usual preamble: for the Perl part, this is invoking the interpreter; for the C++ code, this is including the necessary headers.

#! /usr/bin/env perl
my $name = $0;
my $max_predicate_count = 20;
 
print <<EOF
#line 7 "${name}"
#include "Scorpion/OpenSSL/Details/PostConnectionVerificationPredicate.h"
#include <new>
#include <stdexcept>

The maximum predicate count is set above, and replicated in the generated C++ source code here. To change it, we currently need to change the script. At some point (probably before version 2.2 of Arachnida is released) this will become a command-line argument to the script.

#define MAX_PREDICATE_COUNT ${max_predicate_count}
 
namespace Scorpion { namespace OpenSSL { namespace Details {
namespace {
	static unsigned int next_predicate_id__ = 0;

The following is how predicates are allocated: any call to any of the predicate construction functions (and_, or_, etc.) will call this once, and throw bad_alloc if it fails.

	unsigned int allocatePredicateID()
	{
		if (MAX_PREDICATE_COUNT == next_predicate_id__) throw std::bad_alloc();
		return next_predicate_id__++;
	}

The following structure holds the configuration of the “generated” predicate. This is all we need to know for any operator: what the left-hand-side of the expression is, what the right-hand-side is and what operator it is. One operator is unary, all the others are binary. The unary one only uses the lhs_ member of this structure.

	struct PredicateInfo
	{
		enum Type {
			  and__
			, or__
			, xor__
			, nand__
			, nor__
			, not__
		};
 
		Type type_;
		PostConnectionVerificationPredicate lhs_;
		PostConnectionVerificationPredicate rhs_;
	};

The following is an array of each of these configurations, followed by Perl code to generate each of the functions. I could have used a template to generate these rather than generated code but I find as long as I’m generating code anyway, it makes more sense to just keep generating — especially if there’s no compelling reason to do otherwise.

	PredicateInfo predicate_infos__[MAX_PREDICATE_COUNT];
EOF
;
 
for (my $i = 0; $i < $max_predicate_count; ++$i) {
	print <<EOF
#line 46 "${name}"
	bool predicate${i}(SSL *ssl, char *host)
	{
		switch (predicate_infos__[${i}].type_)
		{
		case PredicateInfo::and__ :
			return (predicate_infos__[${i}].lhs_(ssl, host) && predicate_infos__[${i}].rhs_(ssl, host));
		case PredicateInfo::or__ :
			return (predicate_infos__[${i}].lhs_(ssl, host) || predicate_infos__[${i}].rhs_(ssl, host));
		case PredicateInfo::xor__ :
		{
			long lhs_result(predicate_infos__[${i}].lhs_(ssl, host));
			long rhs_result(predicate_infos__[${i}].rhs_(ssl, host));
 
			return ((lhs_result != 0) ^ (rhs_result != 0));
		}
		case PredicateInfo::nand__ :
			return !(predicate_infos__[${i}].lhs_(ssl, host) && predicate_infos__[${i}].rhs_(ssl, host));
		case PredicateInfo::nor__ :
			return !(predicate_infos__[${i}].lhs_(ssl, host) || predicate_infos__[${i}].rhs_(ssl, host));
		case PredicateInfo::not__ :
			return !predicate_infos__[${i}].lhs_(ssl, host);
		}
		throw std::logic_error("Should not reach this code");
	}
EOF
	;
}
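For comparison, the template alternative mentioned earlier could be sketched roughly as follows. This is not what Arachnida does. It assumes a C++14 compiler (for std::index_sequence), it reduces PredicateInfo to the two operands and the function body to a plain AND to keep it short, and the Predicate, PredicateTable, makeTable, always and never names are hypothetical.

```cpp
#include <cstddef>
#include <utility>

struct SSL; // stand-in for OpenSSL's SSL type
typedef bool (*Predicate)(SSL *, char *);

const std::size_t MAX_PREDICATE_COUNT = 20;

// Simplified configuration: the real one also stores the operator type.
struct PredicateInfo { Predicate lhs_; Predicate rhs_; };
PredicateInfo predicate_infos__[MAX_PREDICATE_COUNT];

// One instantiation per slot; the compiler stamps these out instead of Perl.
// The switch on the operator type would go here in a complete version.
template <std::size_t I>
bool predicate(SSL *ssl, char *host)
{
	return predicate_infos__[I].lhs_(ssl, host) && predicate_infos__[I].rhs_(ssl, host);
}

// A plain array wrapped in a struct so it can be returned from a function.
struct PredicateTable { Predicate entries_[MAX_PREDICATE_COUNT]; };

// Expands to { &predicate<0>, &predicate<1>, ..., &predicate<N-1> }.
template <std::size_t... I>
constexpr PredicateTable makeTable(std::index_sequence<I...>)
{
	return PredicateTable{ { &predicate<I>... } };
}

const PredicateTable predicates__ = makeTable(std::make_index_sequence<MAX_PREDICATE_COUNT>());

// Hypothetical leaf predicates for trying it out.
bool always(SSL *, char *) { return true; }
bool never(SSL *, char *) { return false; }
```

The trade-off is mostly about tooling: this version needs a newer compiler than the generated code does, while the Perl version keeps the C++ itself very plain.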

We can now generate the array of function pointers that the operator/generator code will pick from:

print <<EOF
#line 77 "${name}"
	PostConnectionVerificationPredicate predicates__[] = {
EOF
;
my $first = 1;
for (my $i = 0; $i < $max_predicate_count; ++$i) {
	if ($first) {
	print <<EOF
#line 84 "${name}"
		  predicate${i}
EOF
		;
	}
	else {
	print <<EOF
#line 91 "${name}"
		, predicate${i}
EOF
		;
	}
	$first = 0;
}
print <<EOF
#line 99 "${name}"
	};
EOF
	;
 
print <<EOF
#line 105 "${name}"
}
EOF
;

We then create a function for each operator. Note that the binary operators are all the same for all intents and purposes, so we might as well generate those too.

my @keywords = qw/and or nor xor nand/;
 
foreach $keyword (@keywords)  {
	print <<EOF
#line 113 "${name}"
PostConnectionVerificationPredicate ${keyword}_(PostConnectionVerificationPredicate lhs, PostConnectionVerificationPredicate rhs)
{
	unsigned int predicate_id(allocatePredicateID());
	predicate_infos__[predicate_id].type_ = PredicateInfo::${keyword}__;
	predicate_infos__[predicate_id].lhs_ = lhs;
	predicate_infos__[predicate_id].rhs_ = rhs;
	return predicates__[predicate_id];
}
EOF
	;
}
 
print <<EOF
#line 127 "${name}"
PostConnectionVerificationPredicate not_(PostConnectionVerificationPredicate lhs)
{
	unsigned int predicate_id(allocatePredicateID());
	predicate_infos__[predicate_id].type_ = PredicateInfo::not__;
	predicate_infos__[predicate_id].lhs_ = lhs;
	predicate_infos__[predicate_id].rhs_ = 0;
	return predicates__[predicate_id];
}
 
}}}
EOF
;

A fun tidbit: the #line directives tell the compiler, and through its debug information the debugger, which file and line each piece of generated code came from, so if you step through this code you’ll be stepping into Perl!

This approach works for a whole slew of other repetitive code. Generated code, once debugged etc., usually scales pretty well: if I need a thousand of these operators for some reason, I have one constant to change and no other questions to ask (except perhaps why I could possibly need that many predicates!)

I used a very similar approach to translate a dump from the Unicode into C code to parse it: computers are very good at repeating themselves with minor variations in what they’re saying. This is an example of how you can reduce the amount of work you do by making the computer do more.

About rlc

Software Analyst in embedded systems and C++, C and VHDL developer, I specialize in security, communications protocols and time synchronization, and am interested in concurrency, generic meta-programming and functional programming and their practical applications. I take a pragmatic approach to project management, focusing on the management of risk and scope. I have over two decades of experience as a software professional and a background in science.