SOCKS 5, step 1: Serializing a Token

The first project in the C++ for the self-taught series is a SOCKS proxy server that we will call “Chausette” – which is french for sock. SOCKS is a simple proxying protocol that supports proxying for any number of other protocols (it is protocol-independent) and that is supported by a wide variety of applications. It also has a few interesting features which include user authentication – which is a problem you are likely to run into very often.

If you’ve been following the bi-weekly installments of this series, you’ll have noted that we haven’t coded anything for quite a while now, so I’ll try to make up for that by including some coding in each and every one of the following installments – including this one.

In this installment, we will see how integers are stored in memory, how to overload operators, how to extend a standard I/O stream object by attaching a flag to it, what endianness is and how to handle it, and we will touch on a few templates.

SOCKS5 is governed by three RFCs: 1928, 1929 and 1961. The first, 1928, specifies the protocol itself and, among other things, specifies that a compliant implementation must also implement RFC 1961, which specifies GSSAPI authentication for SOCKS 5. Let’s take a glance at that RFC.

Before reading on, please take the time to download RFC 1961 and read it. You will notice that it doesn’t specifiy GSS-API but rather tells us how to use it. Among other things, it tell us to create a token and send it to the server. If we follow these instructions, we run into our first piece of code: the Token structure:

1
2
3
4
5
6
7
struct Token
{
	uint8_t 	ver_;
	uint8_t 	mtyp_;
	uint16_t 	len_;
	uint8_t 	token_[];
};

Now, if you imagine that uint8_t and uint16_t are real types (we’ll get them from the boost namespace) and that we have C99’s feature of empty arrays we still have a few problems. First of all, Token, as described here, is an incomplete type, even in C99. That means that we can’t create instances of it, can’t create arrays of them, etc. On the other hand, declaring a complete type, such as

1
2
3
4
5
6
7
struct Token
{
	uint8_t 	ver_;
	uint8_t 	mtyp_;
	uint16_t 	len_;
	uint8_t 	token_[65535];
};

which would be valid C++, declare a complete type, etc. has the drawback of having to allocate all of the 64K a token might weigh every time. Although that is a potentially serious drawback, that’s what we will do anyway. The reason for this is twofold: we want our program to be portable – some compilers don’t support the empty array in a structure and newer Microsoft compilers generate a warning for them – and we will handle the possibility of bloat differently – i.e. by allocating very few tokens.

The second problem is that computers store integers in a sometimes somewhat perculiar way.

Even if you don’t have any idea of how an integer is stored in computer memory, you’ll at least have heard something about zeroes and ones. Computers are build on electronics, which are built on electric circuits and signals of which you can measure the voltage, which is usually either 0V or some standard non-zero voltage (1.8, 3.3, 5.5, 12 or 24V). That non-zero voltage depends on the context the circuit is in, but in any case is considered “up” while a 0V signal is considered “down”. That means that any electric signal can have one of two logical values: “up” (1) or “down” (0). When we represent a signal like this, we call it a “bit”.

The logic inside a computer doesn’t deal with individual bits, though: it is much more practical – and more efficient – to work with a bunch of bits at the same time. That’s where “byte”s come from: on most modern computers, a byte is a collection of eight bits that can be managed efficiently by the processor. On older computers, that was also the size of its registers and of its “word”s – which used to be the name of the integer size that could be managed most efficiently. After a while, though, 16-bit architectures were introduced and, while a “byte” remained a collection of eight bits, a word became a 16-bit entity. A while later, the size of the computer’s word was doubled again, but the name stuck to the 16-bit entity, which meant the 32-bit version got a new name: double word, or dword. I’ll let you guess how big a quadword is.

To add to the confusion, bytes weren’t eight bits in size on all computer architectures – sometimes they were only seven bits – so the eight-bit byte is also called an “octet”. When we refer to bytes, we’ll always assume eight-bit bytes. We will avoid using the names “word”, “double word”, etc.

Now, as I explained, a bit can have two values: 0 and 1. That means that a combination of n bits can have 2n values: doubled for every bit. For example, two bits can have 22=four values: 00, 01, 10 and 11, three bits can have 23=eight values: 000, 001, 010, 011, 100, 101, 110 and 111, etc. An eight-bit value can therefore have 28=256 values. Each of the bits in such a multi-bit value is numbered from the least significant bit to the most significant bit, such that the least significant bit is bit 0. By convention, this means that we number them from right to left: just like with decimals, the most significant part of the number – the digit that has the most influence on the size of the number – is on the left.

If you’re good at math, you may have noticed that the bit numbers are also related to the value each of the bits “contributes” to the integer’s value. For example, an eight-bit value 01101011 has value:

value   bit   value
27 * 0 = 0
26 * 1 = 64
25 * 1 = 32
24 * 0 = 0
23 * 1 = 8
22 * 0 = 0
21 * 1 = 2
20 * 1 = 1

Now, I’ve already told you that computer architectures progressively moved to bigger word sizes: from eight bits to 16, to 32. The funny thing is, though, that the way these values are stored in memory aren’t the same for all architectures: there is a thing called “endianness” which determines in which order the bits and bytes in an integer are stored. “Big-endian” architectures store the value in memory as one might expect it: the most significant bit first. “Little-endian” architectures, however, split the value into bytes and store the bytes in reverse order, so 0x01020304 actually becomes 0x04030201 in memory. Note that this isn’t 0x00418081, which would be the same bits but in reverse order. This can be a bit confusing, but it is also very important – and the reason why I’ve been going on about bits, bytes and words: by convention, integers are always transmitted over a network in big-endian (most significant bit first) order – which means that if we want to transmit our Token structure, we may have some work to do.

We will call the process of converting an in-memory structore to something we can send over the wire “serialization”. There are other uses for serialization (such as saving the structure to a file) but we will get to those when we need them. One way of serializing is to overload the operator that most already-existing classes use to serialize something into a stream: operator<<. Of course, the reverse process would be done by operator>>. The usual signature of these operators would look like this:

std::ostream & operator<<(std::ostream & os, const Token & token);
std::istream & operator>>(std::istream & is, const Token & token);

but these operators don’t tell us how the structure should be formatted, so the user might expect something human-readable rather than something machine-readable – or they might expect something that works for both. In order to avoid such confusion, we will add a “formatting helper” called “Bin“, which will look like this:

struct Bin
{
	Bin()
		: os_(0)
		, is_(0)
	{ /* no-op */ }
 
	std::ostream * os_;
	std::istream * is_;
};

and we will split our operators into two (each), resulting in these declarations:

Bin & operator<<(std::ostream & os, Bin & bin);
Bin & operator>>(std::istream & is, Bin & bin);
std::ostream & operator<<(Bin & bin, const Token & token);
std::istream & operator>>(Bin & bin, Token & token);

As you can see, these declarations depend on a little trick in which operator<< doesn’t, as it usually would, return its first argument, but rather wraps that argument into an object that will tell the implementation what kind of formatting we want. There are alternative approaches to doing this, which may be better in some cases, but we’ll look at this one first. Here’s what the implementation looks like:

Bin & operator<<(std::ostream & os, Bin & bin)
{
	bin.os_ = &os;
	return bin;
}
 
Bin & operator>>(std::istream & is, Bin & bin)
{
	bin.is_ = &is;
	return bin;
}
 
std::ostream & operator<<(Bin & bin, const Token & token)
{
	assert(bin.os_ != 0);
	std::ostream & os(*bin.os_);
	bin.os_ = 0;
	os.put(token.ver_);
	os.put(token.mtyp_);
	boost::uint16_t len(token.len_);
	if (need_swab__.need_swab_)
	{
		len = swab(len);
	}
	else
	{ /* no swab needed */ }
	os.write((const char*)&len, sizeof(len));
	os.write((const char*)token.token_, token.len_);
 
	return os;
}
 
std::istream & operator>>(Bin & bin, Token & token)
{
	assert(bin.is_ != 0);
	std::istream & is(*bin.is_);
	bin.is_ = 0;
	char c;
	is.get(c);
	token.ver_ = c;
	is.get(c);
	token.mtyp_ = c;
	is.read((char*)&token.len_, sizeof(token.len_));
	if (need_swab__.need_swab_)
	{
		token.len_ = swab(token.len_);
	}
	else
	{ /* no swab needed */ }
	is.read((char*)token.token_, token.len_);
 
	return is;
}

As you can see, the first two operators wrap the stream object into the Bin and return the Bin object. The latter two take the stream from the Bin object, fill it up or read it (depending on the operator) and return the stream. The way to use this would be like this:

Token token;
/* fill in the token */
std::stringstream ss;
ss << bin__ << token; // assuming bin__ is a global static of type Bin
/* elsewhere */
Token output;
ss >> bin__ >> output;

Of course, this works like a charm – especially in single-threaded environments and in environments where bin__ is thread-local. The following code, however, will not work:

ss << bin__ << token << another_token;

That is because unlike std::hex for example, Bin() does not change the state of the stream but rather puts itself in place of the stream. Here's where the alternative comes in, which is automagically thread-safe, by the way:

#include "Token.h"
#include <cassert>
#include <ios>
 
static int stream_flag_id__ = std::ios_base::xalloc();
 
/* snip */
 
std::ostream & operator<<(std::ostream & os, const Bin & bin)
{
	if (bin.unbin_)
	{
		os.iword(stream_flag_id__) &= 0xfe;
	}
	else
	{
		os.iword(stream_flag_id__) |= 1;
	}
 
	return os;
}
 
std::istream & operator>>(std::istream & is, const Bin & bin)
{
	if (bin.unbin_)
	{
		is.iword(stream_flag_id__) &= 0xfd;
	}
	else
	{
		is.iword(stream_flag_id__) |= 2;
	}
 
	return is;
}
 
std::ostream & operator<<(std::ostream & os, const Token & token)
{
	assert((os.iword(stream_flag_id__) & 1) == 1); // not implemented otherwise for the moment
	os.put(token.ver_);
	os.put(token.mtyp_);
	boost::uint16_t len(token.len_);
	if (need_swab__.need_swab_)
	{
		len = swab(len);
	}
	else
	{ /* no swab needed */ }
	os.write((const char*)&len, sizeof(len));
	os.write((const char*)token.token_, token.len_);
 
	return os;
}
 
std::istream & operator>>(std::istream & is, Token & token)
{
	assert((is.iword(stream_flag_id__) & 2) == 2); // not implemented otherwise for the moment
	char c;
	is.get(c);
	token.ver_ = c;
	is.get(c);
	token.mtyp_ = c;
	is.read((char*)&token.len_, sizeof(token.len_));
	if (need_swab__.need_swab_)
	{
		token.len_ = swab(token.len_);
	}
	else
	{ /* no swab needed */ }
	is.read((char*)token.token_, token.len_);
 
	return is;
}

Now, the "automagically thread-safe" part comes from the fact that there's no longer any global mutable object: the instance of Bin is constant. This works because during start-up, we allocate an index in an internal "extension array" in all streams, current and future, that we will use for a flag. The flag will indicate to any code that supports it that the stream is in "binary mode" and therefore expects binary serialization into it. The thread-safety is limited, by the way, in that no two threads are expected to use the flag at the same time - but it is generally not a good idea to share a stream among threads.

Of course, our operators now all return the stream they work on and the operators that work on the stream expect the stream to always have the appropriate binary flag set. This flag is independent from the mode in which the stream was opened, which is unfortunate but cannot be avoided for the moment.

You'll also have noticed that there's a chunk of code missing above (if you tried to copy-paste and compile it, you'll certainly have noticed that): we've overloaded the two operators but we haven't solved our endianness problem yet. To do that, we need to do two things: we need to detect whether we actually need to do anything, and we need to do that "anything" when we do. Let's start with the detecting part: if an integer is byte-swapped in memory, that means that if we look at the bytes directly in memory, they'll be swapped. If they aren't, they won't be. We only need to byte-swap them if the processor has done that too. Here's the code to detect that:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
struct NeedSwab
{
	BOOST_STATIC_ASSERT(sizeof(boost::uint16_t) == (2 * sizeof(boost::uint8_t)));
	union U_
	{
		boost::uint16_t u16_;
		boost::uint8_t u8_[sizeof(boost::uint16_t)];
	};
	BOOST_STATIC_ASSERT(sizeof(U_) == sizeof(boost::uint16_t));
	BOOST_STATIC_ASSERT(offsetof(U_, u16_) == 0);
	BOOST_STATIC_ASSERT(offsetof(U_, u8_[0]) == 0);
	BOOST_STATIC_ASSERT(offsetof(U_, u8_[1]) == 1);
	BOOST_STATIC_ASSERT(sizeof(char) == sizeof(boost::uint8_t));
 
	NeedSwab()
	{
		U_ u;
		u.u16_ = 0x0102;
		need_swab_ = (u.u8_[0] == 0x02);
		if (need_swab_)
		{
			assert(u.u8_[1] == 0x01);
		}
		else
		{
			assert(u.u8_[0] == 0x01);
			assert(u.u8_[1] == 0x02);
		}
	}
 
	bool need_swab_;
};
static NeedSwab need_swab__;

Let's go through this code line by line: on line 33, there's a static instance of the struct we declare in the 32 other lines. That means that "some time before main is called" this code is called, which means we effectively always know whether we need to byte-swap or not. Like I said before, we find out whether or not we need to byte-swap (called "to swab" in this code) by looking at an integer in memory. If an integer value 0x0102 is stored most-significant-bit-first in memory, this two-byte value will be stored as a 0x01 followed by a 0x02. Otherwise, it will be stored as 0x02 followed by 0x01. In order to look at that, we need to have two variables - a 16-bit integer and a two-byte array of bytes - in the same location. The union declared on lin 4 does exactly that, and the static assertions on lines 3, 9, 10, 11, 12 and 13 make sure the compiler does what we expect it to do here: on line 3, we make sure a 16-bit value is stored in two bytes; on line 9 we make sure the union is the same size as the 16-bit integer; on line 12 we make sure that the second byte is where we expect it to be; on line 13 we make sure a byte is a byte is a byte and lines 10 and 11 make sure the first byte and the start of the 16-bit integer are really the same. Of course, if any single one of these assertions would fail, most likely all of them would and the compiler would be very broken, "but ye never know until ye find out".

Now, in order to know whether we need to swab, we create an instance of the union, put a value in it and see how that value is stored in memory. Lo and behold, on my computer we need to swab!

Now let's see how swabbing is done:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
template < typename T >
union U_
{
        BOOST_STATIC_ASSERT(boost::is_pod< T >::value);
        U_(T t)
                : t_(t)
        { /* no-op */ }
 
        T t_;
        boost::uint8_t u8_[sizeof(T)];
};
 
template < typename T, unsigned long s >
struct Swabber_;
 
template < typename T >
struct Swabber_< T, 2 >
{
	static T swab(T t)
	{
		assert(need_swab__.need_swab_);
		U_< T > u(t);
		std::swap(u.u8_[0], u.u8_[1]);
		return u.t_;
	}
};
 
template < typename T >
struct Swabber_< T, 4 >
{
	static T swab(T t)
	{
		assert(need_swab__.need_swab_);
		U_< T > u(t);
		std::swap(u.u8_[0], u.u8_[3]);
		std::swap(u.u8_[1], u.u8_[2]);
		return u.t_;
	}
};
 
template < typename T >
T swab(T t)
{
	return Swabber_< T, sizeof(T) >::swab(t);
}

Again, line by line: the function itself is declared on lines 41 through 45. As you can see, it only calls the appropriate Swabber_ - which is almost dutch for mop. I've implemented two Swabber_s here: one for two-byte values and one for four-byte values. Now, the Standard guarantees that a memcopy of a plain-old-date (POD) type will result in an exact copy of that POD type, which implies that if we swab the same POD twice we will get our POD back. That also means that we can only swab PODs, because swabbing anything else would result in undefined behavior. Hence the static assertion on line 4: we can only create unions of PODs anyway, and that's how we swab, so we assert that we're swabbing a POD at compile-time. Each version of Swabber_ creates an instance of this union and swaps the bytes around with std::swap, resulting in the byte-swapped value. Now, let's take a final look at all of this put together:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
#include "Token.h"
#include <cassert>
#include <cstddef>
#include <sstream>
#include <boost/type_traits/is_pod.hpp>
 
static int stream_flag_id__ = std::ios_base::xalloc();
 
struct NeedSwab
{
	BOOST_STATIC_ASSERT(sizeof(boost::uint16_t) == (2 * sizeof(boost::uint8_t)));
	union U_
	{
		boost::uint16_t u16_;
		boost::uint8_t u8_[sizeof(boost::uint16_t)];
	};
	BOOST_STATIC_ASSERT(sizeof(U_) == sizeof(boost::uint16_t));
	BOOST_STATIC_ASSERT(offsetof(U_, u16_) == 0);
	BOOST_STATIC_ASSERT(offsetof(U_, u8_[0]) == 0);
	BOOST_STATIC_ASSERT(offsetof(U_, u8_[1]) == 1);
	BOOST_STATIC_ASSERT(sizeof(char) == sizeof(boost::uint8_t));
 
	NeedSwab()
	{
		U_ u;
		u.u16_ = 0x0102;
		need_swab_ = (u.u8_[0] == 0x02);
		if (need_swab_)
		{
			assert(u.u8_[1] == 0x01);
		}
		else
		{
			assert(u.u8_[0] == 0x01);
			assert(u.u8_[1] == 0x02);
		}
	}
 
	bool need_swab_;
};
static NeedSwab need_swab__;
 
template < typename T >
union U_
{
	BOOST_STATIC_ASSERT(boost::is_pod< T >::value);
	U_(T t)
		: t_(t)
	{ /* no-op */ }
 
	T t_;
	boost::uint8_t u8_[sizeof(T)];
};
 
template < typename T, unsigned long s >
struct Swabber_;
 
template < typename T >
struct Swabber_< T, 2 >
{
	static T swab(T t)
	{
		assert(need_swab__.need_swab_);
		U_< T > u(t);
		std::swap(u.u8_[0], u.u8_[1]);
		return u.t_;
	}
};
 
template < typename T >
struct Swabber_< T, 4 >
{
	static T swab(T t)
	{
		assert(need_swab__.need_swab_);
		U_< T > u(t);
		std::swap(u.u8_[0], u.u8_[3]);
		std::swap(u.u8_[1], u.u8_[2]);
		return u.t_;
	}
};
 
template < typename T >
T swab(T t)
{
	return Swabber_< T, sizeof(T) >::swab(t);
}
 
std::ostream & operator<<(std::ostream & os, const Bin & bin)
{
	if (bin.unbin_)
	{
		os.iword(stream_flag_id__) &= 0xfe;
	}
	else
	{
		os.iword(stream_flag_id__) |= 1;
	}
 
	return os;
}
 
std::istream & operator>>(std::istream & is, const Bin & bin)
{
	if (bin.unbin_)
	{
		is.iword(stream_flag_id__) &= 0xfd;
	}
	else
	{
		is.iword(stream_flag_id__) |= 2;
	}
 
	return is;
}
 
std::ostream & operator<<(std::ostream & os, const Token & token)
{
	assert((os.iword(stream_flag_id__) & 1) == 1); // not implemented otherwise for the moment
	os.put(token.ver_);
	os.put(token.mtyp_);
	boost::uint16_t len(token.len_);
	if (need_swab__.need_swab_)
	{
		len = swab(len);
	}
	else
	{ /* no swab needed */ }
	os.write((const char*)&len, sizeof(len));
	os.write((const char*)token.token_, token.len_);
 
	return os;
}
 
std::istream & operator>>(std::istream & is, Token & token)
{
	assert((is.iword(stream_flag_id__) & 2) == 2); // not implemented otherwise for the moment
	char c;
	is.get(c);
	token.ver_ = c;
	is.get(c);
	token.mtyp_ = c;
	is.read((char*)&token.len_, sizeof(token.len_));
	if (need_swab__.need_swab_)
	{
		token.len_ = swab(token.len_);
	}
	else
	{ /* no swab needed */ }
	is.read((char*)token.token_, token.len_);
 
	return is;
}
 
std::vector< char > serialize(const Token & token)
{
	std::stringstream ss(std::ios::out | std::ios::binary);
	ss << bin << token;
	std::string s(ss.str());
	return std::vector< char >(s.begin(), s.end());
}
 
Token deserialize(const std::vector< char > & serialized_token)
{
	std::string s(serialized_token.begin(), serialized_token.end());
	std::stringstream ss(s, std::ios::in | std::ios::binary);
	Token token;
	ss >> bin >> token;
	return token;
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
#ifndef VLINDER_CHAUSETTE_TOKEN_H
#define VLINDER_CHAUSETTE_TOKEN_H
 
#include <boost/cstdint.hpp>
#include <boost/static_assert.hpp>
#include <iostream>
#include <vector>
 
BOOST_STATIC_ASSERT(sizeof(boost::uint8_t) == 1);
BOOST_STATIC_ASSERT(sizeof(boost::uint16_t) == 2);
 
struct Token
{
	boost::uint8_t ver_;
	boost::uint8_t mtyp_;
	boost::uint16_t len_;
	boost::uint8_t token_[65535];
};
 
struct Bin
{
	Bin(bool unbin = false)
		: unbin_(unbin)
	{ /* no-op */ }
 
	bool unbin_;
};
 
static Bin bin;
static Bin unbin(true);
 
std::ostream & operator<<(std::ostream & os, const Bin & bin);
std::istream & operator>>(std::istream & is, const Bin & bin);
std::ostream & operator<<(std::ostream & os, const Token & token);
std::istream & operator>>(std::istream & is, Token & token);
std::vector< char > serialize(const Token & token);
Token deserialize(const std::vector< char > & serialized_token);
 
#endif

About rlc

Software Analyst in embedded systems and C++, C and VHDL developer, I specialize in security, communications protocols and time synchronization, and am interested in concurrency, generic meta-programming and functional programming and their practical applications. I take a pragmatic approach to project management, focusing on the management of risk and scope. I have over two decades of experience as a software professional and a background in science.
This entry was posted in C++ for the self-taught and tagged . Bookmark the permalink.