• Lawyers- you gotta (something) them

    we will need to make sure legal requirements are at least similar across the board

    Legal requirements are something engineers don’t like to think too much about: they get in the way of progress and are generally perceived as a pain in the nether regions. There are some disparaging ideas about what lawyers do on a daily basis that seem to be prevalent in the industry. From my point of view, though, they’re not that different from software developers. They just have extra difficulties software developers don’t have: they can only test their “code” by confronting a judge. In software, we get almost instant feedback by running our code through unit tests.

  • Coaching and problem solving

    I am not a teacher. According to my wife, who is a professor at law and therefore knows a thing or two about teaching, I am really, definitely not a teacher. I may have taught the occasional workshop and may try to explain things from time to time, but who am I to argue with my wife? I do find myself in the position of having to explain things a lot, though, and with today’s teleconferencing technologies, I find myself explaining to an ever-wider audience. The people on the other end of the connection are generally not novices: we share a common vocabulary and a common way of thinking about problems that makes it easier to convey whatever message I’m trying to convey. For my wife, that would be the equivalent of talking to graduate or post-graduate students. Sometimes, though, I don’t get my point across, so I decided to read up on teaching.

  • Fundamental limitations of quantum error mitigation

    People have different ways of relaxing. Some people like to watch movies, others like to listen to music, … I like to read papers, usually either about cybersecurity or quantum computing. Yesterday, I had a bit of time on my hands and decided to read up on the latter: I had found an interesting paper called “Fundamental limitations of quantum error mitigation” on arXiv, in which the authors, Ryuji Takagi, Suguru Endo, Shintaro Minagawa and Mile Gu, propose a new model for quantum error mitigation and, building on that model, find the fundamental limits.

  • Contents of the Quebec vaccine passport -- TMI?

    While driving this afternoon, my wife and I had a chat about the contents of the QR code that encodes the vaccine passport here in Quebec. Apparently there had been some questions to the premier and the minister of health about “hackers” getting to its contents, and the privacy implications of such “cracks”. I had some ideas on how I’d design it, but I didn’t know how it actually worked, and I was clueless as to what a hacker could well crack (regardless of the color of their hat). Surely the contents would be signed and there’d be no more than strictly necessary encoded in the “passport”?

  • Experimental test of local observer-independence

    In a recent paper published on arXiv, what was formerly a thought experiment has been realized (with minor tweaks) and, while some say this indicates there is no objective reality, I rather think it means something else.

  • Here’s something I don’t understand

    Every time I look at VHDL code written by (sometimes veteran, sometimes not so much) firmware engineers, the code looks similar: a bunch of signals coming in with their direction encoded in the name, and sometimes the polarity as well, but very little in the way of functionality: sometimes it’s just the datasheet pin name of the device the signal is from that made it all the way into the component I’m looking at (which, when I find that annoying, is not the top).

    This part I kinda get: it’s the same issue we’ve had in software for ages, dating back before the Hungarian warthogs of the 1990s.

  • Wow, this is weird

    The last time I wrote anything on this blog was more than 20 months ago. Back then, I had just come back from spending Christmas with friends and family in Florida. Since then, we’ve been in a global pandemic and no-one in their right mind would go to Florida.

  • Happy new year!

    Happy new year!

    2019 was an interesting year for many reasons, and I was lucky enough to finish it with family and friends in sunny Florida. 2020 also promises to be interesting, but being the nerd and nit-picker that I am, let me just rant about one minor detail…

  • Weird title in this morning's Washington Post

    I subscribe to the Washington Post, not because I read it that much (I don’t have much time for that), but because I think they do a good job of balanced journalism that warrants the few dollars the subscription costs. After all, journalists need to eat too. I strongly suggest you do the same for whatever press outlet you think does a good job.

    While flipping through its pages today, I came across a title that looked wrong: “A NASA spacecraft circling the sun stumbled upon a trail of shooting stars”. Any geek worth their salt will see what’s wrong…

  • Quantum teleportation

    A bit more fun with quantum computing…

    Quantum teleportation is one of those things that Star Trek fans (like myself) like to believe is a dream come true: if it’s possible to teleport qubits, surely it may be possible to teleport real-world things some day?

  • My first results with quantum computing experiments

    I ran two quantum circuits on two real quantum computers and one simulator. I’ll share my results and some observations.

  • Why classical computers need exponentially more time and memory to simulate quantum computers

    If you’re a bit like me, you get annoyed by the over-simplified explanations of quantum computers that have been going around since Google demonstrated quantum supremacy. One of the things that those explanations always gloss over is how it’s so much harder for a classical computer to simulate a quantum computer running what is basically linear algebra, than it is for a quantum computer to just run it. The answer to that is quantum entanglement, and in this post I will try to explain how it works.

    I should point out that this means either math or meth will be involved in understanding what I’m about to write. The second option being temporary for understanding and permanent for negative effects, I recommend the first.

  • Authentication of individual users in DNP3 Secure Authentication- TB2019-001, and more

    In February of this year, the DNP Technical Committee published TB2019-001: “Authentication of individual users is obsolete in DNP3-SA”. This technical bulletin, which was the first work item from the Tech Committee’s Secure Authentication Task Force to be published, was the fruit of two and a half years of work between the moment the Tech Committee decided to remove multi-user support and the moment the document was created, edited, reviewed, etc.

    In this post, I will take a close look at what the impact of this document is on existing implementations of DNP3: systems, devices and firmware.

  • We live in a wonderful world

    We truly live in a wonderful world that would have been impossible to imagine only a few decades ago.

    Allow me to wax eloquent for a moment.

  • When RSA dies

    The TL;DR:

    Below, I explain (as best I can):

    • why the end of RSA is nigh
    • why ephemeral Diffie-Hellman will survive
    • what we can and cannot build on top of ephemeral Diffie-Hellman
    • what this means for post-quantum PKI
    • why we need a quantum-resistant digital signature algorithm

    All of this is both complex and complicated. It is hard to write about this with any level of accuracy and still be readable for someone who hasn't spent an unreasonable amount of time steeped in articles about abstract math.

    I gloss over a lot of details trying to keep it reasonably understandable, and I hope I haven't dumbed everything down too much. I apologize in advance both for the bits that are too hard to understand, and the bits that may seem too obvious. It's hard to find a middle ground.

  • Quantum Supremacy

    A few days ago, the Financial Times reported that “Google claims to have reached quantum supremacy”. The paper in question, available here, explains how they reached this milestone, and how they proved it. It does beg the question, though: what is quantum supremacy?

  • That lunch wasn't free after all

    The Spectre and Meltdown bugs have shown that the free lunch was indeed over a decade ago. We should therefore stop attempting to exploit instruction-level parallelism with ever more complex stacks and ever more complex pipelines and branch predictors, and start exploiting the inherent parallelism of hardware. In order to do that, we need to change the way we think about software from our current imperative way of thinking to a more declarative way of thinking. At the same time, we need to change the way our computers think about software to allow them to exploit this more declarative style to use their inherent parallelism and free up die space currently used for caches and ILP.

    Read the article

  • reboot

    Readers of this blog may wonder what happened: the layout is completely different, but the thing is also a lot faster..??!

    The reason for this is simple: I stopped using Wordpress. It was giving me more trouble than it was worth, so I decided to move the site, and all of its contents, to Jekyll. This has the advantage of being able to write directly in Vim, with the only minor disadvantage of having to build the site.

    Some formatting will need to be fixed for older posts, and I will start doing that when I have a bit of time. In the meantime, I will try to just post stuff out.

    Some of what I post may be PDF files: I’ve been using LaTeX a lot lately, and have some interesting stuff that presents better as PDFs. I’ll see how that turns out as well.

    Also: please let me know about any bugs you see: the site is large enough for me to not be able to re-read everything, so there may be a few bugs I’ve missed.

  • Update

    Readers of this site may wonder what I’ve been doing for the last six months, which is an unusually long hiatus for me to not write anything. The answer is: I’ve been busy.

  • The Logging "problem"

    A recurring problem in real-time industrial devices is logging: you want to log all the information you need to diagnose a problem, but you don’t want to slow down your system and miss timing constraints, or unnecessarily burden your system when there are no problems. On the other hand, you often don’t know that there is a problem to be diagnosed (and therefore logged) before there is an actual problem, in which case you may be too late to start your logs.

    The solution seems obvious: just look a minute or so into the future and, if any trouble is brewing, start logging. But as a wise person once said: “Always in motion the future is.” In the real world, we need real solutions.

  • The Equifax data breach: what we know, what you can do, what's next

    The TL;DR:
    TL;DR mindmap
  • temporarily down

    The C++ for the self-taught site is temporarily down for “unscheduled maintenance” (i.e. a bug).

    I haven’t had time to look into fixing it yet: I just found out it was misbehaving about an hour ago, during my routine check of my websites. I’ll try to fix it tonight and update this post when I have news.

    If you want to help out: you could donate to my BitCoin address 1JE9wominCU1mw1JtD7JWu8vfYfcGQ9pKj.

    **Update (21:58 EDT):** An automatic update seems to have bugged out and left the site inoperable. According to the logs, this happened sometime during my vacation. The site looks OK now – please let me know if you see anything awry.

  • The problem with making things too easy

    The TL;DR:
  • To those of you who don't speak French and follow me on Twitter

    As you may know, France is going to the polls tomorrow to elect a new president. They have a choice between an unaffiliated centrist, Emmanuel Macron, and an unavowed fascist, Marine le Pen.

    I am not French, but my wife is, and my children have a number of citizenships among which French is one they all share. Aside from that, the stakes for the French election are much higher than they were for the Dutch elections, a few months ago, and arguably even for the American presidential election last November.

    Let me explain those assertions.

  • This guy is out of his mind (and lucky if he can still see)

    This guy has to be completely bonkers: he wrote an application in C# (would not have been my language of choice) to detect a human face in a live video feed and point a laser at it.

  • "Police hack PGP server" -- really?

    This afternoon, this headline caught my attention: “Police hack PGP server with 3.6 million messages from organized crime BlackBerrys”. When I read it, I thought: “either the journalist/title writer got it wrong, or PGP is broken”.

  • Writing unmaintainable code in five easy steps

    I have recently had to modify some code that, to say the least, was very hard to maintain – or refactor, for that matter.

    The following are a few, firmly tongue-in-cheek, steps to make sure your code is thoroughly frustrating to whoever needs to maintain it after you.

  • Meetings, meetings, and more meetings

    Recently, I spent a significant part of the day in a meeting reviewing the year’s progress on several projects, including the introduction of an agile methodology – Scrum. The approach in the meeting was simple: write on a sticky note what we did well, and on another what we should not repeat or how we should improve. The subject was “Scrum/agile”. I only wrote one sticky note: “get rid of Scrum”.

    The TL;DR:
    Scrum, in my opinion, is (moderately) useful for small teams with a single, short-term project -- something like a web application. The overhead it imposes _vastly_ outweighs the benefits for larger teams and larger projects.
  • Debugging — or: what I do for a living

    I am often asked by friends and acquaintances of various backgrounds, what I do for a living. Depending on my mood at the time, I can answer in any number of ways, but invariably my answers are met with blank stares, questions that clearly demonstrate that I have once again failed to make myself understood and an eventual change of subject.

  • Really, Twitterverse?

    Felicia, our cat, relaxing

    The Twitterverse has spoken, quietly, with a single vote – a cat it is…

  • Setting up a Xubuntu-based kiosk

    This is another “HOWTO” post – setting up a Xubuntu-based kiosk, which I did to make a new “TV” for my kids.

  • Technocracy II

    In my previous post, I described technocracy as something that is positive in project and product management, and in team organization. In this post, to supply a boundary to my previous text, I will make the case for the opposite.

  • Technocracy

    In a discussion with a “Product Owner” recently, I told him I take a more technocratic approach to project management than he did. We discussed different project management styles for the next hour or so.

    TL;DR: I believe that
    • to effectively and efficiently run a large team of developers who are collectively responsible for a product with a large code-base, that team needs to be organized as a network of smaller teams with experts leading each of those smaller teams, and
    • to successfully manage an "agile" development team and create a viable product, one has to have a vision and break it down from there.
  • Real-time thirsty

    The TL;DR:
    In this post, I show using a fictitious example why real-time systems are defined by their worst-case timing rather than their average-case timing.

    Imagine you’re running a coffee shop – not the kind you find in Amsterdam, but one where they actually serve coffee. Your customers are generally in a hurry, so they just want to get a cup of coffee, pay and leave to catch their plane, train or automobile. To attract more customers and appeal to the Geek crowd, you name your coffee shop “Real-Time Thirsty” and promise an “Average case serving within one minute!”.

    While you get many customers, you’re not getting the Geeks-in-a-hurry crowd you were expecting.

  • Setting up Cygwin for X forwarding

    The TL;DR:
    This is one of those "recipe" posts that tend to be useful if you happen to want to do exactly what I just did. The end result of this one is a Windows shortcut called "Linux terminal" on the desktop, that opens up an SSH terminal to a Linux box, with X forwarding.
  • Shutting down servers

    I used to have a server with five operating systems, running in VMs, merrily humming away compiling whatever I coded. I say “used to have” because I shut it down a few weeks ago. Now, I have those same operating systems, as well as a large number of others, running on systems I don’t need to worry about.

  • Checked output iterator

    While writing about security – which takes a great deal of my time lately, which is one of the reasons I haven’t updated my blog as often as I usually would – I came to the conclusion that, while I recommend using STL algorithms, iterators and containers for safety purposes, that doesn’t solve the problem when the standard algorithms don’t check the validity of their output ranges.

  • Schoenmaker, blijf bij je leest (Cobbler, stick to your last)

    This is an old Dutch saying, which probably has its origins in a village with a particularly opinionated cobbler.

    I am not one to stick to my last – but if I were a cobbler, I don’t think I’d be that cobbler: I like to know what I’m doing.

  • Interesting modifications to the Lamport queue, part II

    In the previous installment on this subject, I described a few modifications to the Lamport queue introduced by Nhat Minh Le et al. to relax operations on shared state as much as possible, while maintaining correctness.

    In this article, I will discuss the further optimizations to reduce the number of operations on shared state, thus eliminating the need for memory barriers completely in many cases.

  • Interesting modifications to the Lamport queue

    While researching lock-free queue algorithms, I came across a few articles that made some interesting modifications to the Lamport queue. One made it more efficient by exploiting C11’s new memory model, while another made it more efficient by using cache locality. As I found the first one to be more interesting, and the refinements more useful for general multi-threaded programming, I thought I’d explain that one in a bit more detail.

  • Progress in DNP3 security

    In July last year, I discussed why Adam Crain and Chris Sistrunk fuzzed DNP3 stacks in devices from various vendors, finding many issues along the way (see project Robus). This time, I’ll provide a bit of an overview of what has happened since.

  • CIS: "Protecting" code in stead of data

    The Windows API contains a synchronization primitive that is a mutual exclusion device, but is also a colossal misnomer. I mean, of course, the CRITICAL_SECTION.

  • CIS: Lock Leaks

    The two most popular threading APIs, the Windows API and pthreads, both have the same basic way of locking and unlocking a mutex – that is, with two separate functions. This leaves the code prone to lock leak: the thread that acquired a lock doesn’t release it because an error occurred.
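
    A minimal sketch of the leak and of the usual C++ remedy, assuming standard C++11 `std::mutex` in place of the Windows and pthreads calls. `Counter`, `increment_leaky` and `increment_guarded` are hypothetical names used only for illustration.

```cpp
#include <mutex>
#include <stdexcept>

struct Counter {                 // hypothetical shared state
    std::mutex lock;
    int value = 0;
};

// Two separate calls: if the check throws, unlock() is never reached
// and the mutex stays locked (the lock leak).
void increment_leaky(Counter& c, int limit)
{
    c.lock.lock();
    if (c.value >= limit) throw std::runtime_error("limit reached");
    ++c.value;
    c.lock.unlock();
}

// std::lock_guard releases the mutex in its destructor, so the lock is
// released on every exit path, including the exceptional one.
void increment_guarded(Counter& c, int limit)
{
    std::lock_guard<std::mutex> guard(c.lock);
    if (c.value >= limit) throw std::runtime_error("limit reached");
    ++c.value;
}
```

    With the guarded version, a caller that catches the exception can still acquire the mutex afterwards; with the leaky version, the next attempt to lock it would block forever.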

  • CIS: Unexpected Coupling

    One of the most common problems with synchronization occurs when things need each other that you didn’t expect to need each other.

  • Git demystification

    There are a few misconceptions I hear about Git that I find should be cleared up a bit, so here goes:

  • Three ideas you should steal from Continuous Integration

    I like Continuous Integration – a lot. Small incremental changes, continuous testing, continuous builds: these are Good Things. They provide statistics, things you can measure your progress with. But Continuous Integration requires an investment on the part of the development team, the testers, etc. There are, however, a few things you can adopt right now, so I decided to give you a list of things I think you should adopt.

  • Eliminating waste as a way to optimize

    I recently had a chance to work on an implementation of an Arachnida-based web server that had started using a lot of memory as new features were being added.

    Arachnida itself is pretty lean and comes with a number of tools to help build web services in industrial devices, but it is not an “app in a box”: some assembly is required and you have to make some of the parts yourself.

  • Technical documentation

    Developers tend to have a very low opinion of technical documentation: it is often wrong, partial, unclear and not worth the trouble of reading. This is, in part, a self-fulfilling prophecy: such low opinions of technical documentation result in it not being read, and not being invested in.

  • The story of "Depends"

    Today, I announced on behalf of my company, Vlinder Software, that we would no longer be supporting “Depends”, the dependency tracker. I think it may be worthwhile to tell you a bit about the history of Depends, how it became a product of Vlinder Software, and why it no longer is one.

  • Vlinder Software announces the release of Acari as an independent library

    Parsing and generating text can require a lot of memory – to the point where running the parser can be prohibitive on some devices. This is often due to sub-optimal handling of strings, bad integration with the system’s allocators, …
  • Vlinder Software ceases commercial support for Depends

    Effective immediately, Vlinder Software is ceasing commercial support for the Depends dependency tracking library. The Depends dependency tracker library was created in 2007 during some experiments being conducted for the now-defunct Jail-Ust project. It then morphed into a stand-alone project …
  • Bayes' theorem in non-functional requirements analysis -- an example

    Bayes' theorem

    I am not a mathematician, but I do like Bayes’ theorem for non-functional requirements analysis – and I’d like to present an example of its application. ((I was actually going to give a theoretical example of availability requirements, but then a real example popped up…))

  • Globe and Mail: Canada lacks law that defines, protects trade secrets

    According to the Globe and Mail (Iain Marlow, 20 May 2015) the 32-count indictment against six Chinese nationals who allegedly used their positions to obtain intellectual property from universities and businesses in the U.S. and then take that knowledge home to China, would not be possible here: “Canadian observers say the 32 count indictment, which was unsealed late on Monday, highlights the prevalence and severity of industrial espionage in North America, and underscores the need for Canada to adopt more stringent laws. Canada has no dedicated act on trade secrets and economic espionage and has not successfully prosecuted a similar case, experts say.”

  • Vlinder Software is moving

    Vlinder Software is moving to a brand new office in a brand new building. The website address, E-mail, etc. will, of course, remain the same, and the majority of our services will remain unaffected, but quotes, invoices, and other administrative …
  • Why I didn't buy a new iPad today

    Behavioural economists will tell you that the “happy high” you get from buying a new toy, a new device, a new computer, a new car or a new house usually wears off within three months. It’s called the ever-receding horizon of happiness (or something like that – something close to the ever-receding hair line) and it’s why I have a small car (just big enough for day-to-day requirements but not big enough to take the whole family on vacation), a fairly crappy laptop computer (good enough to run OpenOffice Write and an SSH client on, but not good enough to compile FPGA firmware or big chunks of software in any hurry, but that’s what the SSH client is there for) and why I’ve had the same iPad for the last five years or so.

  • Implementing time-outs (safely)

    Thyme is a herb that grows in gardens.

  • Bungee coding

    For the last few weeks, I’ve been doing what you might call bungee coding: going from high-level to low-level code and back. This week, a whole team is doing it – fun!

  • Adding SPI support to the BrainF interpreter

    While at Chicago’s O’Hare airport, waiting for my connecting flight to Reno, I had a bit of time to start coding on my BrainF interpreter again – once I had found an outlet, that is ((Apparently, power outlets at Chicago O’Hare are a rare commodity, to the point that their internal website points you to “Power stations” of which there were three in my vicinity, but all of them were fully – ehm.. – used. I finally found an outlet in the foodcourt with a gentleman standing next to it, but only using one socket, so I connected my laptop to the other socket and a small constellation of devices to the various USB ports on my laptop…)). My goal was to add something that would allow something else to communicate with the interpreter. There are a few buses I like for this kind of thing, and SPI is one of them.

  • Vlinder Software announces the first release candidate for Arachnida version 2.3

    Arachnida is an HTTP(S) server and client framework for embedded devices. It supports HTTP/1.1 and makes it easy to integrate an HTTP server into your application, on devices that may not have a file system. In this release, we’ve introduced …
  • Miss(ed) Communication

    Miss(ed) Communication
  • Radical Refactoring: Breaking Changes

    One of the most common sources of bugs is ambiguity: some too-subtle API change that’s missed in a library update and introduces a subtle bug, that finally only gets found out in the field. My answer to that problem is radical: make changes breaking changes – make sure the code just won’t compile unless fixed: the compiler is generally better at finding things you missed than you are.

  • Improving the BrainF interpreter

    As I wrote in a previous post, I wrote a BrainF interpreter in VHDL over a week-end. I decided to improve it a bit.

  • Radical Refactoring: Have the compiler do (some of) the reviewing

    One of the most common sources of bugs is ambiguity: some too-subtle API change that’s missed in a library update and introduces a subtle bug, that finally only gets found out in the field. My answer to that problem is radical: make changes breaking changes – make sure the code just won’t compile unless fixed: the compiler is generally better at finding things you missed than you are.

  • Writing a BrainF interpreter ... in VHDL

    I’ve written parsers and interpreters before, but usually in C++ or, if I was feeling like doing all of the hard work myself, in C.

  • A different take on the "optimize by puzzle" problem

    I explained the problem I presented in my previous post to my wife over dinner yesterday. She’s a professor at law and a very intelligent person, but has no notion of set theory, graph theory, or algorithms. I’m sure many of my colleagues run into similar problems, so I thought I’d share the analogies I used to explain the problem, and the solution. I didn’t get to explaining how to arrive at computational complexity, though.

  • Optimization by puzzle

    Given a query routine that takes a name and may return several, write a routine that takes a single name and returns a set of names for which each of the following is true:

    1. For each name in the set, query has been called exactly once.
    2. All the results from the calls to query are included in the set.
    3. The parameter to the routine is not included in the set.

    You may assume the following:

    1. Calls to query are idempotent ((so you really do need to call them only once)).
    2. There is a finite number of values for names.
    3. Names are less-than-comparable value-types (i.e. you can store them in an std::set) and are not expensive to copy
    4. query results never contain their argument ((i.e. for the case at hand, we’re querying a directed acyclic graph, so our first argument will never be seen in any of the query results, although any given value may appear more than once in query results))
  • Looking for bugs (in several wrong places)

    I recently went on a bug-hunt in a huge system that I knew next to nothing about. The reason I went on this bug-hunt was because, although I didn’t know the system itself, I knew what the system was supposed to do, and I can read and write all the programming languages involved in developing the system (C++, C and VHDL). I’m also very familiar with the protocol of which the implementation was buggy, so not knowing the system was a minor inconvenience.

    These are some notes I took during the bug-hunt, some of which are intentionally kept vague so as to protect the guilty.

  • Re: E-mail

    The Globe and Mail dedicated half a page of the Report on Business section to managing your inbox today. People who work with me know that

    1. if you want to get ahold of me quickly, E-mail is not the way to go
    2. if you want a thought-out, thorough response, E-mail is the way to go
  • ICS Security: Current and Future Focus

    The flurry of DNP3-related vulnerabilities reported to ICS-CERT as part of Automatak’s project Robus seems to have subsided a bit, so it may be time to take a look at where we are regarding ICS security, and where we might be going next.

    Of course, I’ll only look at communications protocol security in this context: low-tech attacks on the grid ((e.g. letting two helium-filled balloons up with a wire between them, under a high-voltage power line, in order to cause a short between the phases. [Illustration: Balloon hack – don’t do this!])) are outside the scope of this article. Instead, I will take a look at two questions: why the focus on DNP3, and what else could they, and should they, be looking at.

  • Is Open Source software security falling apart?

    There have been a number of well-publicized security flaws in open source software lately – the most well-publicized of course being the OpenSSL Heartbleed bug [1].

    Then there’s the demise of Truecrypt, recent bugs in GnuTLS and recent bugs in the Linux kernel.

    So, is there a systemic problem with Open Source software? Does proprietary software have the same problem?

    1. OpenSSL is very widely used, which makes its effect on the Internet enormous, and the effect of bugs in the protocol implementation huge. That explains why such bugs are so well-publicized. Another factor in the publicity is the name of the bug (which was very well-found). 

  • "A camel is a horse designed by a committee"

    I don’t usually use this blog to vent frustration, but I’ve been reading standards lately…

    There are four versions of the horse:

    • Pony. Horses as the Good Lord intended them. Strong and sturdy, yet soft and cuddly; obedient yet intelligent; and I’m told they’re rather tasty too!

    • Horse. All the qualities of the pony, without the esthetics.

    • Donkey. The beta version of the pony: strong and sturdy, but none of the frills and quite a few bugs in the programming. Also: they don’t taste nearly as good (or so I’m told).

    • Ass. What the beta version became when the PMO took over.

    • Cow. A forked-off project from the (then open-source) Horse project that went for taste, combined with a bigger ass for the workload (in the form of an ox – you didn’t think I misspelled ass, did you?)

    • Dromedary. When some of the committee members got tired of trying to reach a consensus, they took what they had and ran with it – even if its running was more than a bit awkward.

    • Camel. None of the looks. Some of the features. Some features you didn’t think a horse should have. Some you didn’t think a horse could have. More of the smell. Much, much more.

    When you count, that doesn’t add up to four, does it?

    That’s what design by committee is all about!

  • Vlinder Software announces the new Arachnida website

    Vlinder Software is happy to announce the new website for the Arachnida HTTP(S) webserver/client framework is now live. You can find documentation and information for the webserver/client framework at Information will be added to the site on an ongoing … Continue reading
  • What the industry should do with the upcoming Aegis release

    Automatak will be releasing the Aegis fuzzing tool publicly and for free for the first time in a few days. Like I said yesterday:

    to which Adam replied:

    I don’t think the industry is ready – and here’s why.

  • Optimizing with type lists

    In this post, I will take a brief look at how using type lists can help optimize certain applications.

  • A functional version of the KMP algorithm

    For one of the projects I’m working on, I needed a compile-time version of the KMP algorithm in C++. I started by making the algorithm functional.
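
    As an illustration of what "making the algorithm functional" can look like, here is a minimal sketch in Python rather than compile-time C++. It is not the code from the post: the names `fallback`, `failure_table` and `find_all` are hypothetical, and the only claim is that the failure table and the search are both expressed as folds over immutable tuples, with recursion instead of loops.

```python
from functools import reduce


def fallback(pattern, table, k, c):
    """Length of the longest border obtained by extending a border of length k with c."""
    if pattern[k] == c:
        return k + 1
    if k == 0:
        return 0
    # Fall back to the next-shorter border and try again.
    return fallback(pattern, table, table[k - 1], c)


def failure_table(pattern):
    """KMP failure table as an immutable tuple, built by folding over the pattern."""
    if not pattern:
        raise ValueError("pattern must not be empty")

    def step(table, i):
        return table + (fallback(pattern, table, table[i - 1], pattern[i]),)

    return reduce(step, range(1, len(pattern)), (0,))


def find_all(pattern, text):
    """Start indices of all (possibly overlapping) occurrences of pattern in text."""
    table = failure_table(pattern)
    m = len(pattern)

    def step(state, item):
        k, found = state
        i, c = item
        # After a full match, continue from the longest border of the whole pattern.
        k = fallback(pattern, table, table[k - 1] if k == m else k, c)
        return (k, found + (i - m + 1,)) if k == m else (k, found)

    return reduce(step, enumerate(text), (0, ()))[1]
```

    Because no state is mutated, each function depends only on its arguments, which is the property a compile-time version needs.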

  • ICS security and regulatory requirements

    In North America, ICS security, as regards the electricity grid, is regulated by NERC, which provides and enforces, among other things, the Critical Infrastructure Protection (CIP) standards.

    In this post, I’ll provide a quick overview of those standards, providing slightly more in-depth information than in my previous post.

  • The Crain-Sistrunk vulnerabilities

    In the two previous posts, I’ve shown that industrial control systems – ICSs – are becoming more pervasive, and that they rely on security through obscurity.

    Now, let’s make the link with current events.

  • The importance of ICS security: ICS communications

    For an ICS, having communications abilities generally means implementing some machine-to-machine communications protocol, such as DNP3 or Modbus. These protocols, which allow the device to report data to a “master” device and take its cue from that device as to what it should be doing, are generally not designed with security in mind: most of them do not require, or expect, user authentication for any commands you might send them, and don’t implement anything approaching what you’d expect from, e.g., a bank (confidentiality, integrity, authentication, authorization, non-repudiation).
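
    To make the absence of authentication concrete, here is a minimal sketch that builds a Modbus/TCP "Write Single Coil" request. The function name and the example values are hypothetical; the frame layout (MBAP header followed by function code 0x05, coil address and value) follows the public Modbus specification as I understand it. Note what is missing: there is no field for a user, a password, a signature or a session.

```python
import struct


def write_single_coil_request(transaction_id, unit_id, coil_address, on):
    """Build the bytes of a Modbus/TCP Write Single Coil (0x05) request."""
    # PDU: function code, coil address, value (0xFF00 = on, 0x0000 = off).
    pdu = struct.pack(">BHH", 0x05, coil_address, 0xFF00 if on else 0x0000)
    # MBAP header: transaction id, protocol id (0 for Modbus),
    # number of bytes that follow (unit id + PDU), unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


# Hypothetical example: switch coil 0x00AC on, on unit 17.
frame = write_single_coil_request(1, 17, 0x00AC, True)
```

    Every byte of the twelve-byte frame is accounted for by addressing and the command itself, so any host that can reach the device's port can issue it.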

  • The importance of ICS security: pervasiveness of ICSs

    Industrial Control Systems (ICSs) are becoming pervasive throughout all branches of industry and all parts of our infrastructure: they are a part of every part of the electricity grid, from the nuclear power station to your home; they’re found in the traffic lights of virtually every crossing; they regulate train traffic; they run the cookie factory that makes your favorite cookies and pack the pills your doctor prescribed.

  • Perl: Practical or Pathologically Eclectic? Both?

    There are two canonical acronyms for Perl: “Practical Extraction and Report Language” and “Pathologically Eclectic Rubbish Lister”. Arguably, Perl can be both.

  • A few thoughts on BitCoin

    Mindmap of a few thoughts on BitCoin I’d meant to turn into prose (still might)

  • Vlinder Software announces Arachnida version 2.2

    Vlinder Software is announcing the release of version 2.2 of Arachnida, our HTTP server framework for embedded devices. This version introduces two important features: a hardened, more versatile OpenSSL plug-in: we’ve scrapped the plug-in that was originally created for version … Continue reading
  • Qt to quickly write a GUI app

    Today, my wife asked me to write an app that would tell her to sit straight every 15 minutes. I know apps like that already exist and I could’ve pointed her to one, but I decided to write one myself. The result is tannez-moi (which is French for “bother me”).

  • The benefits of formal, executable specifications

    While a specification should not specify the C++ code that should be implemented for the specified feature, it should specify the feature in a verifiable manner. In some cases, formal – and even executable – specifications can be of great help.

  • Why #fixthathouse?

    Those of you who follow me on Twitter might wonder why, all of a sudden, I started tweeting assertions with the #fixthathouse hashtag. The reason is simple: CBC’s The House made me do it.

  • Run-time composed predicates and Code generation

  • Common nonsense: the charter of Quebec Values


    Four of these need not apply for a government job in Quebec if the new PQ charter of values becomes law. Can you pick the one that might still get the job?

  • Sometimes, your right hand should know what your left hand is doing

    Especially if you’re a compiler…

  • Conditional in-place merge algorithm

    Say you have a sorted sequence of objects.

    Go ahead, say: “I have a sorted sequence of objects!”

    Now say it’s fairly cheap to copy those objects, you need to be space-efficient and your sequence may have partial duplicates – i.e. objects that, under some conditions, could be merged together using some transformation.

    OK, so don’t say it. It’s true anyway. Now we need an algorithm to

    1. check for each pair of objects in the sequence whether they can be transformed into a single object

    2. apply the transformation if need be

    Let’s have a look at that algorithm.
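
    The two steps above can be sketched as follows. This is not the algorithm from the post, only a minimal illustration under two assumptions: "each pair" means each pair of neighbours in the sorted sequence, and the caller supplies the hypothetical `can_merge` predicate and `merge` transformation.

```python
def merge_in_place(items, can_merge, merge):
    """Merge mergeable neighbours of a sorted list in place; return the new length."""
    if not items:
        return 0
    last = 0  # index of the last element of the compacted prefix
    for i in range(1, len(items)):
        if can_merge(items[last], items[i]):
            # Step 2: fold the current element into the last kept one.
            items[last] = merge(items[last], items[i])
        else:
            # Not mergeable: keep it as the next element of the prefix.
            last += 1
            items[last] = items[i]
    del items[last + 1:]  # drop the now-unused tail
    return last + 1


# Example: intervals sorted by start; overlapping or touching ones are merged.
intervals = [(1, 3), (2, 5), (7, 8), (8, 10), (12, 13)]
merge_in_place(
    intervals,
    can_merge=lambda a, b: b[0] <= a[1],
    merge=lambda a, b: (a[0], max(a[1], b[1])),
)
```

    The sequence is only ever written at or before the read position, so no second buffer is needed; the cost is one copy per kept element, which is why cheap copies matter.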

  • Why I decided Vlinder Software should stop selling Funky

    If you follow the News feed from Vlinder Software’s site you know that I’ve posted an announcement saying Funky is now in its end-of-life cycle. This is our first product to enter end-of-life, but what it basically means is that we won’t actively work on improving the software anymore.

    If you’ve been following me for a while, you’ll know that I am the founder and sole proprietor of Vlinder Software, as well as the CEO and an Analyst. I don’t usually sign off as CEO, but this is one of those decisions that is mine alone to take. In this post, I will explain why.

  • Vlinder Software Announces Arachnida version 2.1

    Vlinder Software is announcing the release of version 2.1 of Arachnida. This latest version of our HTTP server framework makes it even more versatile than it was before and introduces several improvements and fixes. The single most important addition to … Continue reading
  • Structure alignment and padding

    In my previous post on the subject, I talked about using magic numbers and versions, alignment, and later added a note about endianness after a suggestion from Michel Fortin. This time, I’ll talk about padding, how the sizeof operator can be misleading and how to debug padding and alignment errors.
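
    As a quick way to see padding without a C compiler, the sketch below uses Python's `ctypes`, which lays structures out the way the platform's C compiler would. The `Sample` structure and the `layout` helper are hypothetical; the exact offsets and total size depend on the platform's ABI, which is the point.

```python
import ctypes


class Sample(ctypes.Structure):
    """Three fields whose sizes add up to six bytes."""
    _fields_ = [
        ("flag", ctypes.c_uint8),    # 1 byte
        ("value", ctypes.c_uint32),  # 4 bytes, usually aligned on a 4-byte boundary
        ("tail", ctypes.c_uint8),    # 1 byte, usually followed by trailing padding
    ]


def layout(struct_type):
    """Return (name, offset, size) for every field of a ctypes structure."""
    return [
        (name, getattr(struct_type, name).offset, getattr(struct_type, name).size)
        for name, _ in struct_type._fields_
    ]


field_bytes = sum(ctypes.sizeof(field_type) for _, field_type in Sample._fields_)
total_bytes = ctypes.sizeof(Sample)

if __name__ == "__main__":
    print("sum of field sizes:", field_bytes)
    print("sizeof(Sample):    ", total_bytes)
    for name, offset, size in layout(Sample):
        print(f"  {name}: offset {offset}, size {size}")
```

    Any difference between the two numbers is padding, and printing the offsets shows where it sits: useful when a dump written on one platform does not line up with the structure on another.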

  • Flawed ways of working: centrally managed version control

    Imagine, just for a moment (it would be painful to do this longer than just a moment) that Linus, when he decided to leave BitKeeper behind, switched to Subversion instead of developing Git and that for any commit into the master branch of that repository, you’d need his approval. While you’re imagining that, just a few microseconds more, imagine he stuck to his guns.

    Either Linux would no longer exist or Linus would have been declared mad, and Linux would have moved on without him.

    Centrally managed version control systems are fundamentally flawed and impede productivity. Any project with more than a handful of developers/programmers using a centrally managed version control system will either lose control over the quality of the product, or bring productivity to a grinding halt.

  • Minor changes in style

    I am not usually one to make much of a fuss about coding style: as long as the code is easily readable, I don’t much care whether you use tabs or spaces to indent, how you align your curly braces, etc. There are really only two things I do care about when reading new code:

    1. is it easy to read the code without being misled by it?
    2. does the new code integrate well with the rest of the code?

    I do have a few suggestions, though, but above all, I recognize it can be difficult to change habits – and therefore to change one’s coding style.

  • Even in Quebec, Winter is not the only season

    And just to remind myself and some of my colleagues, I drew this on the office whiteboard yesterday: 20121220-193947.jpg

  • Poll: New Debug Tool from Vlinder Software

    Among the most common errors in software are problems using the right parameters for functions like memcpy, memcmp, memmove, strcpy, strcat, wcscpy, _tcscpy, etc. Often, the bugs are simple off-by-one errors, character strings that don’t end with a NUL character, … Continue reading
  • What happens if structures aren't well-designed

    In my previous post, I explained how to design a structure for persisting and communicating. I didn’t say why I explained it – just that things get frustrating if these simple rules aren’t followed. In this post, I will tell you why I wrote the previous one.

  • How to design a struct for storage or communicating

    One of the most common ways of “persisting” or communicating data in an embedded device is to just dump it into persistent storage or onto the wire: rather than generating XML, JSON or some other format which would later have to be parsed and which takes a lot of resources both ways, both in terms of CPU time to generate and parse and in terms of storage overhead, dumping binary data into storage or onto the wire has only the – inevitable – overhead of accessing storage/the wire itself. There are, however, several caveats to this, some of which I run into on a more-or-less regular basis when trying to decipher some of that data, so instead of just being frustrated with hard-to-decipher data, I chose to describe how it should be done.

    Note that I am by no means advocating anything more than a few simple rules to follow when dumping data. Particularly, I am not going to advocate using XML, JSON or any other intermediary form: each of those has its place, but they should neither be considered a solution to the problems faced when trying to access binary data, nor can they replace binary data.
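
    The excerpt does not list the rules themselves, so the following is only a sketch of the kind of thing such rules usually cover: a magic number, a version, an explicit byte order and fixed-width fields with no implicit padding. The magic value, the field choice and the names `dump` and `load` are all hypothetical.

```python
import struct

MAGIC = 0x564C4452  # hypothetical magic number identifying our records
VERSION = 1

# "<" means little-endian with no implicit padding, whatever the host CPU is.
HEADER = struct.Struct("<IHH")     # magic, version, payload length
RECORD_V1 = struct.Struct("<Iih")  # timestamp, value, status


def dump(timestamp, value, status):
    """Serialize one record, prefixed with a self-describing header."""
    payload = RECORD_V1.pack(timestamp, value, status)
    return HEADER.pack(MAGIC, VERSION, len(payload)) + payload


def load(blob):
    """Parse a record written by dump(), rejecting anything unrecognized."""
    if len(blob) < HEADER.size:
        raise ValueError("too short to contain a header")
    magic, version, length = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("bad magic number: not one of our records")
    if version != VERSION:
        raise ValueError(f"unsupported version {version}")
    if length != RECORD_V1.size or len(blob) < HEADER.size + length:
        raise ValueError("truncated or malformed record")
    return RECORD_V1.unpack_from(blob, HEADER.size)
```

    The header costs eight bytes per record, but it lets a reader reject foreign or outdated data up front rather than misinterpreting it.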

  • SPF deployed on domain

    Vlinder Software has deployed SPF DNS records on the domain, tagging any E-mail not sent by one of the mail servers we normally use as a “soft failure”. In light of our project to roll out spam prevention measures … Continue reading
  • Rolling Out Spam Prevention

    It has come to our attention that a spammer, apparently of Russian origin, has been sending spam E-mails with forged From: addresses in the domain. An example spam E-mail runs as follows: From: Subject: RE: GALE - Copies … Continue reading
  • Exceptions and Embedded Devices

    Lately, I’ve had a number of discussions on this subject, in which the same questions cropped up again and again:

    1. should exceptions be used in embedded devices?

    2. should exceptions occur in “normal operation” (i.e. is every exception a bug)?

    My answers to these two questions are yes and yes (no), respectively: exceptions can and should be used (appropriately) in embedded devices and exceptions may occur during normal operation (i.e. not every exception that occurs is a bug).

  • Quick Summary: Synchronization in Next-Generation Telecom Networks

    This is a quick summary of the ComSoc webinar on Synchronization in Next-Generation Telecom Networks

    Over the last few years, communications networks have changed radically: their use has gone from predominantly voice to predominantly data and they have themselves gone from predominantly synchronous networks to predominantly packet networks.

    Time synchronization requirements, in terms of quality of time, have only gotten stricter, so new methods for clock synchronization are now required - i.e. NTP can’t do the job to the level of accuracy that’s needed.

  • On the importance of clear technical specifications

    Even when the code is working like a charm, technical specifications – and their different interpretations by different people – can lead to confusion and hours-long debugging sessions.

  • Plain and clear cases of "don't do that - fix your code in stead"

    For the last few days, a discussion (that has become heated from time to time) has been going on on the comp.lang.c usenet group. The subject is a “signal anomaly”: the OP wants to catch SIGSEGV and carry on along its merry way.

  • Announcing a new support system on the website

    We are happy to announce that we have made a simple, but effective, support ticket system available directly from the website. Getting support for one of our products has never been easier: just go to the Support page and click … Continue reading
  • New Product Announcement: µpool2

    Vlinder Software is very happy to announce a new product for embedded software development; the µpool2 memory allocator. Memory allocation and management is the most important bottleneck in embedded software development: most memory allocators cannot be configured or adapted to … Continue reading
  • A new website – and lots more to come

    Vlinder Software is pleased to announce our new website. Its look and feel has been improved, information should now be easier to find and we will be able to maintain and expand it much quicker than we have been in … Continue reading
  • When hardware foils software -- and then helps it out!

    Sometimes, an oscilloscope can come in very handy.

  • Please use my time wisely

    Just because I charge by the hour, that doesn’t mean you should be wasting my time…

    This morning, in the wee hours of the morning (time differences can keep you up at night, as can young children), I spent more than an hour and a half doing makework. Most of that work, probably all of it, could have been avoided if I’d been given a working setup rather than a huge chunk of source code and a recipe to make it work. Granted, the recipe did work, but it was still a huge waste of time.

  • Why CS shouldn't be taught before high school (and coding for kids is a bad idea)

    An introduction to computer science was part of my high school curriculum. I was about 16 years old at the time and had been coding in Basic and Pascal for a few years already - I was just getting started with C. This part of the curriculum was a complete waste of time. Not because I had books that taught me better than my teacher ever could, but because, in order to make it easier for us, the programming language we had to use was a version of Pascal … translated to Dutch.

    It made no sense to me, and honestly still doesn’t, to translate a programming language. How is code that says afdrukken any clearer than the same code that says print? I didn’t get the point and I was one of those kids that, if they didn’t see the point in learning something, refused to learn it. For that same reason, I refused to learn French and German (and had to redo my second year of high school because of it) ((The irony, of course, is in the fact that I make a living doing what I refused to learn, that my wife is French and we speak French at home (and my children speak French as well) and my dad’s girlfriend is German)).

    Still, teaching computer science before high school is, IMO, a bad idea. Children this age should focus on four things: social interaction, basic reading and writing skills (understanding for the reading, structure, spelling and grammar for the writing), math and analytical reasoning. At twelve, a child should be able to read and understand a newspaper or a book, write a letter, do basic money-based math (and perhaps a bit of algebra) and independently find the answer to a question such as “why is the sky blue?” Each of these skills is a prerequisite of solving a problem in Computer Science and the level of each of these skills is lacking (to put it mildly) in most of today’s twelve-year-olds.

    Now, don’t get me wrong: I was writing code well before the age of twelve and I don’t regret it. However, at twelve, I also possessed three of the four skills mentioned above. Exceptions will exist and some kids will learn how to code, but coding isn’t the goal in software engineering or (more generally) in computer science: it’s solving problems (or, as I like to call it, “Making Life Easier”). Coding is only a small part of that and isn’t even a skill you need to solve problems in CS or SE, though it is sometimes helpful.

    Update: is the education system ready to teach CS to kids? Read these:

  • Sometimes, use-cases just aren't what you need

    I’ve written about use-cases on this blog before (parts one, two and three of the sidebar on use-cases in my podcast come to mind) but I haven’t really talked about when to avoid them.

    When you get a new piece of hardware and a vague set of requirements, what do you do?

    1. try to get the most out of the hardware you possibly can
    2. design to meet the need, using use-cases to guide you
    3. a bit of 1, a bit of 2
    4. other… (leave a comment)
  • Robustness analysis: the fundamentals

    Up until 2008, the global economy was humming along on what seemed like smooth sailing, doing a nice twenty knots on clear waters, with only an occasional radio message saying there were icebergs up ahead. Surely none of that was anything to be worried about: this new economy was well-designed, after all. Redundant and unnecessary checks had been removed but, in order for the economy to be robust, the engineers of the economy had made sure that at least two whole compartments could be flooded before anything really nasty would happen.

    Sound familiar?

  • Robustness analysis: finding fault(s)

    When working on a large project, implementing a system that has to run 24/7 and handle significant peak loads of communication, at some point, you have to ask yourself how robust your solution really is. You have to ascertain that it meets the goals you have set out and will consistently do so. There are diverse ways of doing this. Some are more efficient than others. In this article, I will discuss some of the methods I have found useful in the past.

  • I'll be back (soon)

    Those of you who have been following this blog or the podcast may be wondering why I’ve been silent lately. The answer to that is simple: lack of sleep. My baby boy is starting to sleep nights, though, and some time should hopefully clear in my schedule to pick up the podcast where I left it, and to write some more posts on this blog. In the meantime: patience is a virtue – and sleep an under-rated commodity.

  • Changing an API in subtle, unpredictable ways

    Many seasoned Windows systems programmers will know that you can wait for the death of a thread with WaitForSingleObject and for the deaths of multiple threads with its bigger brother, WaitForMultipleObjects. Big brother changes its behavior on some platforms, though – as I just found out myself, the hard way.

  • Opening a support ticket with Microsoft (or: how not to support your customers)

    I had to open a support ticket with Microsoft today: I found a bug in the TCP/IP stack of Windows Embedded Compact 7 that I wanted them to know about (and to fix). I also wanted to know when it would be fixed – after all, the bug is critical and the company I work for is a Microsoft Gold partner, so I had a reasonably high expectation of service.

    Suffice it to say I was disappointed.

  • Winter wallpapers

    As has become my custom (at least since this summer) I’ve changed the theme a few days ago, at the start of the season. Here are the associated wallpaper images…

  • Sleep(...)

    For those of you waiting for the next installment of “C++ for the self-taught”: I’m on parental leave at the moment. The podcast (and the rest of the blog) will be back in a few weeks.

  • Radix Sort

    The Radix Sort algorithm is a stable sorting algorithm that allows you to sort a series of numerical values in linear time. What amazed me, however, is that it is also a natural approach to sorting: this is a picture of my daughter applying a radix sort to her homework (without knowing it’s a radix sort, of course, but after explaining the algorithm perfectly)!
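
    For reference, a minimal sketch of a least-significant-digit radix sort, assuming non-negative integers and a hypothetical `base` parameter: the values are distributed into one bucket per digit, collected in order, and the process repeats for the next digit.

```python
def radix_sort(values, base=10):
    """Return the non-negative integers in values, sorted, using LSD radix sort."""
    values = list(values)
    if not values:
        return values
    if min(values) < 0:
        raise ValueError("this sketch only handles non-negative integers")
    largest = max(values)
    place = 1  # 1, base, base**2, ...: the digit currently being sorted on
    while place <= largest:
        buckets = [[] for _ in range(base)]
        for value in values:
            # Appending preserves the order from earlier passes,
            # which is what makes the sort stable.
            buckets[(value // place) % base].append(value)
        values = [value for bucket in buckets for value in bucket]
        place *= base
    return values
```

    Each pass is linear in the number of values, and the number of passes depends only on the number of digits of the largest value, which is where the linear running time comes from.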

  • The underestimated legacy of Dennis Ritchie

    Dennis Ritchie is the inventor of the C programming language, which is the ancestor of a whole family of programming languages that includes C++, Java and C# – probably the three most popular programming languages today – as well as D and Objective-C, which are less popular but significant nonetheless.

    Ritchie is also one of the authors of the early UNIX kernel, which was the first significant program written in C and for which C was originally designed, and which is the ancestor of a whole family of operating systems that includes Linux, MacOS X, iOS, BSD and many others.

    C was the first programming language that allowed the programmer to structure data and code, making it relatively easy to handle very large quantities of data while also maintaining full control of how the hardware is used. Most operating systems today, including significant parts of Windows, are written in C – and most OS designs are at least partly based on UNIX.

    Everywhere you look, you can see the fruits of Dennis Ritchie’s labor – and by his fruits you shall know the man – but it seems this man is known only to those of us who either have an intimate knowledge of C and/or UNIX, or are more-than-usually interested in programming language design.

    That is a real shame: I think there is a lot we can learn from his legacy – and a lot to be gained from continuing his work.

  • Making the enabling of online copyright infringement itself an infringement of copyright

    Bill C-11 amends the Copyright Act in several different ways. One of the stated purposes of those amendments is to “make the enabling of online copyright infringement itself an infringement of copyright”. While I can understand that this adds significant new protections to copyrighted materials, I think this may quickly become either unenforceable, or introduce serious new restrictions on how communications over the Internet can legally take place. It all hinges on the definition of “enabling”, however.

  • Harper government reintroduces toughened online copyright law

    In the Vancouver Sun: bill C-32 from last session has been re-introduced (probably with some modification – I haven’t had a chance to read the bill yet) and is far more likely to pass, now that there’s a conservative majority in Parliament.

    Update Oct 8, 2008: the re-introduced Copyright Modernization Act is numbered C-11, and is available here.

  • Autumn is here - and so is the autumn banner

    OK, autumn has been here for about a week already, and the banner was ready two months ago, but I only now had both the time and the inclination to put it up…

    You might remember that the corresponding desktop wallpapers are in the Canada Day post.

  • Moving to GitHub

    I will be moving my open source projects (yes, all of them) to GitHub.

  • Eclipse: kudos

    One of the things I like about Eclipse is the way it is designed. I’m not talking about the GUI when I say that - although the GUI is arguably well-designed as well: I mean the way hundreds of pieces fit together to make Eclipse an IDE for Java, C, C++, PHP, Python, …, etc.

  • You, according to Google Analytics

    This blog uses Google Analytics, which provides a treasure-trove of information about the site’s visitors. To use that information to improve the site, it has to be parsed.

    Here’s a sketch of what a typical user may look like - and what that tells me about what I should do with the site.

  • shtrict: a very restricted shell for *nix

    I needed a restricted shell for my shell server - the one that’s available from outside my firewall, so I wrote one. You can download it under the terms of the GNU General Public License, version 3.

  • New GnuPG key

    For those who want to be able to verify .deb packages I make: I have a new GnuPG key.

    Type bits/keyID     Date       User ID
    pub  2048R/6D3CD07B 2011-07-20 Ronald Landheer-Cieslak (Software Analyst) <>
    	 Fingerprint=9DAC FA3D D7A5 001F A0B2  DA59 5E0C 4AF1 6D3C D07B 

    You can download it from

  • From #NotW in the GMT morning to #UBB in the EDT afternoon -- an example of devoted journalism

    I’ve just been catching up on my Twitter account’s updates for today, where possibly the only non-tech person I follow, a politics journalist from the CBC called Kady O’Malley (@kady and @anotherkady) is still tweeting after 15 hours.

    She started liveblogging the #NotW scandal in the UK this morning at 6 am and continued on the CRTC Usage-Based Billing hearings when those started.

    This is the same Kady O’Malley that answers the questions sent to CBC Radio’s The House in the “That’s a good question” section.

    I already knew CBC Radio provides well-informed, balanced journalism (I don’t watch TV so I don’t know about CBC TV) but now I know how they do it: this is one example the people at News International should follow – rather than hacking into people’s voicemail and giving journalism a bad name.

  • "Changer son fusil d'épaule"

    Sometimes, when all else fails, you have to change your tack.

  • Happy Canada Day

    On the occasion of Canada day, I thought I’d put up the Canada-themed autumn wallpapers I’d prepared.

  • Hardware designers, please, think of us!

    One of the most time-consuming tasks in embedded software development can be device driver debugging. Especially if that debugging has to be done in a real-time system without disturbing its real-time characteristics. This usually amounts to producing an output signal on a pin of the CPU and probing the output to see what’s going on. In order to be able to do that, the people who design the hardware have to keep in mind that the people who design the software will have some debugging to do on the final hardware – even if it’s just to make sure everything is working OK.

  • Canada Post Labor Dispute -- Resolved?

    I’ve been watching the Canada Post labor dispute from afar over Twitter and saw the back-to-work bill pass on third reading. Does that mean the dispute is over? I don’t think so…

  • Lonely Planet's Travel Top Ten

    Lonely Planet came out with a book on their top-ten places to visit recently. In light of recent events, some of their choices merit revision and as I don’t have anything better to do right now, I thought I’d do a bit of revision on my iPod…

  • The Manchester Baby is 63 years old today

    The first “modern” programmable computer, with 32 words of memory, is 63 years old today.

    Manchester's Baby

    A revised history of the Manchester Baby, in two parts, by B. Jack Copeland from the University of Canterbury in Christchurch, New Zealand, is available here and here – a really interesting read.

  • Summer is here

    Summer is here, so it’s time to update desktop backgrounds and site headers with something a bit more summery.

    This wallpaper of course has the Vlinder logo and the url of this website but, more prominently, it has a lily flower - which also figures prominently (but stylized) on the flag of the Canadian province I live in.

  • Functional Programming at Compile-Time

    In the previous installment I talked about functional programming a bit, introducing the idea of functors and lambda expressions. This time, we will look at another type of functional programming: a type that is done at compile-time.

  • From Copyright


  • Using Ranges and Functional Programming in C++

    C++ is a very versatile language. Among other things, you can do generic meta-programming and functional programming in C++, as well as the better-known facilities for procedural and object-oriented programming. In this installment, we will look at the functional programming facilities in the now-current C++ standard (C++03) as well as the upcoming C++0x standard. We will look at what a closure is and how to apply one to a range, but we will first look at some simpler uses of ranges – to warm up.

  • Starting Python - 99 bottles of beer

    After a brief discussion on the subject on StackOverflow chat, I’ve decided to try my hand at Python, using the on-line IDE at Here is my rendering of “99 bottles of beer” in Python…

  • Why I Recommend BrainF--- (and what I recommend it for)

    BrainFuck is an esoteric Turing-complete programming language that consists of only the bare minimum commands for Turing-completeness. It is exactly this bare-minimum-ness that makes it an interesting language - although at first a bit awkward to wrap your head around.

  • Shining light on bugs: testing

    Bugs like to hide in the darker corners of the code: the parts that are least exercised, less well-structured. They don’t react to light very well. Tests are like a spotlight that you shine upon specific pieces of the code. The first time you do that – especially if the code has been around a while – the bugs will come crawling out of the woodwork.

  • C++0b

    The C++ standard committee has been meeting in Madrid and has, according to the latest news, approved the new C++ standard. As Michael Wong said on his C/C++ Cafe Blog, C++0x is now C++0b – though it might be C++0c by the time ISO gets done with it.

  • Ranges

    Iterators, Ranges, Containers and Standard Algorithms. The concept of a range is one of the fundamental concepts in the design of the STL and of the C++ programming language. In this installment, we will take a close look at what a range is, and we will take a look at some parts of the design of the STL. This will help you to understand the lines of code I skipped over when we looked at the code in the previous installment. In the next installment, we will look at that code again.

    This installment is heavily based on a tutorial presentation on ranges in the STL, of which the slides are included.

  • Geek Mythology: Women and the Start of Software Engineering

    According to Geek mythology, when Charles Babbage had invented the Analytical Engine, he sat back and said: “Behold! I have created the first pocket calculator!”. Of course, he hadn’t actually built the thing yet, and lacked the practical skill to do so. When it finally was built, there wasn’t a pocket large enough on Earth to put it in. Thus was the inception of the hardware engineering discipline.

    While hardware had gotten off to a good start, software took a more practical approach: when Ada Lovelace heard of the Analytical Engine, she said to herself: “Forsooth, such a mighty machine needeth a touche feminine if ever it is to serve a purpose” and proceeded to write the first computer program. It took several decades for the hardware engineering discipline to catch up with the software engineering discipline and for the two to come together and actually do something useful.

  • The Evolution of the Software Engineering Practice Faced With The Knowledge That "Bugs Must Exist"

    Though laudable, the quest for bug-free software is doomed to failure. This should be news to no-one as the argument for this is as old as I am.

  • A bulldog approach to bugs

    The only bugs I like are butterflies - and even then, only a specific blue butterfly that happens to be a drawing. Aside from those, I spend a lot of time rooting them out.

    I advocate what you might call a bulldog approach to bugs: track them, hunt them down, kill them. Don’t let go until you’re sure they’re dead. This might seem overly aggressive, but remember we’re talking about software errors here - not actual living beings.

  • Applying the Barton-Nackman idiom

    It is amazing how much code you can cram into a single line, and how much magic happens behind the scenes when you do.

  • A "brilliant" idea (?)

    For a few days now, I’ve been carrying an idea around for a new app I could really use for my projects: something that integrates requirements management, risk management, workflow, billing, bug/issue tracking, action items, etc. with the code repositories. Wouldn’t that be fun?

  • The Art and Science of Risk Management

    I like to take a rational approach to risk management: identify risks and opportunities, their probability and their impact, maximize the impact and probability of opportunities and minimize those of risks. In this article, I explain a bit of my approach, I expound upon risk dependencies, based on a recent article by Tak Wah Kwan and Hareton K.N. Leung, and I offer some practical advice.

  • The Observer Pattern


    In this installment of C++ for the self-taught, we will be looking at the Observer pattern: we will be starting the implementation of the proxy-part of our SOCKS server by accepting connections and servicing them.

    In this installment, there will be quite a few things aside from the Observer pattern that will appear in the code, but we won’t dwell on those for now - that just means we will be mining this code for another installment or two to thoroughly understand what’s going on in it.

    The focus, in the code itself, is on code re-use, terseness and functional clarity. The focus is not on how easy it is to understand what’s going on behind the scenes at first glance.
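
    For readers who have not met the pattern before, here is a minimal sketch of the Observer pattern on its own. It is not the SOCKS server code from the installment, and the names `Subject`, `attach` and `notify` are assumptions made for this illustration.

    ```cpp
    #include <functional>
    #include <string>
    #include <utility>
    #include <vector>

    // Hypothetical subject: keeps a list of observers and calls each of them
    // whenever an event occurs. The subject does not know what observers do.
    class Subject
    {
    public:
        using Observer = std::function<void(std::string const &)>;

        // Register an observer; it will be called on every later notify().
        void attach(Observer observer)
        {
            observers_.push_back(std::move(observer));
        }

        // Tell every registered observer about the event, in attach order.
        void notify(std::string const &event) const
        {
            for (auto const &observer : observers_)
            {
                observer(event);
            }
        }

    private:
        std::vector<Observer> observers_;
    };
    ```

    In a server, the subject would typically be the thing accepting connections and the observers would be whatever services them, but that mapping is my reading of the summary above rather than a description of the actual code.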

  • A new look and a new address

    Due to some technical difficulties I was having with the previous installation of the software running this site, I decided to re-install the software from scratch and, while at it, change the address from to The look has been updated a bit and some further improvements will take place over the next few weeks.

    I’ve tried to make sure that all links to redirect correctly to, so any existing (recent) permalink will continue to work. If any links have been seriously broken due to this move, please let me know.

    Also note I will be making a few changes to some of the pages on the blog (not the posts, just the static stuff), so links to those might still break - but there were very few visitors to the static pages anyway, so I guess that’s not too much of a problem.

    Thanks for your patience,


  • Security Awareness and Embedded Software

    In a recent interview with Ivan Arce of Core Security Technologies by Gary McGraw of Cigital, Arce made the point that embedded systems are becoming a security issue. At about the same time, US Army General Keith B. Alexander, director of the US National Security Agency, said that a separate secure network needs to be created for critical civilian infrastructure. They are probably both right.

  • How error messages can backfire

    Error messages should provide enough information for the user to correct their error, but they shouldn’t provide any more than that, or malicious users could abuse them - as shown recently with the ASP.NET server.

  • Testing Lock-Free Software

    When a test has been running non-stop for over six months, beating the heck out of an algorithm, can we be confident the algorithm is OK?

  • Event-driven software, step 1: select


    In this installment, we will look at the basic networking functions and start looking at event-driven software design. Starring in this installment will be the select function.
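
    A minimal sketch of what a call to select looks like, assuming a POSIX system (on Windows the headers and the meaning of the first argument differ). The helpers `wait_readable` and `demo_select` are hypothetical and use a pipe instead of a socket so the example stays self-contained.

    ```cpp
    #include <sys/select.h>
    #include <sys/time.h>
    #include <unistd.h>

    // Hypothetical helper: returns true if fd becomes readable within
    // timeout_ms milliseconds, false on timeout or error.
    bool wait_readable(int fd, int timeout_ms)
    {
        fd_set read_fds;
        FD_ZERO(&read_fds);
        FD_SET(fd, &read_fds);

        timeval timeout;
        timeout.tv_sec = timeout_ms / 1000;
        timeout.tv_usec = (timeout_ms % 1000) * 1000;

        // The first argument is the highest descriptor in any set, plus one.
        int const ready = select(fd + 1, &read_fds, nullptr, nullptr, &timeout);
        return ready > 0 && FD_ISSET(fd, &read_fds);
    }

    // Demonstration on a pipe: there is nothing to read until a byte is written.
    bool demo_select()
    {
        int fds[2];
        if (pipe(fds) != 0)
        {
            return false;
        }
        bool const before = wait_readable(fds[0], 10);
        char const byte = 'x';
        bool const written = (write(fds[1], &byte, 1) == 1);
        bool const after = wait_readable(fds[0], 10);
        close(fds[0]);
        close(fds[1]);
        return !before && written && after;
    }
    ```

    An event-driven program would put a call like this in a loop, watching many descriptors at once and reacting only to the ones that are ready.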

  • More than the absence of problems

    Quality can be defined in many ways: ISO defines quality relative to requirements as a measure of how well the object’s characteristics meet those requirements. Those requirements can be specified (documented) or implied (customary). This has the advantage of making quality more or less measurable, but it has the disadvantage of making it harder to justify improving the product if the (minimum) requirements are met.

    In my view, quality is a measure of excellence: it is more than the absence of problems and aims towards the prevention of problems.

  • When the cup is full, carry it level

    It is both a problem and a privilege to have too much work. It is a problem because, at some point, things don’t get done and it is a privilege because it means, among other things, that people are trusting you with things to do.

    The C++ for the self-taught podcast, however, is one of the things I am not getting done this time. I will, therefore, have to revert to the original, monthly, schedule for the time being, while I get through all the work I have.

  • Annoying Script Kiddies

    I don’t host any of my sites, except for, myself: my Internet connection isn’t reliable enough, power outages are too frequent, and it’s basically too much of a hassle. So, my sites are hosted by a professional hosting service and that service is responsible for the security of those sites. How annoying is it, then, when three of those sites get cracked through the FTP server?

  • Events in SOA

    In a recent article on ZDNet, Joe McKendrick writes that Roy Schulte, the Gartner analyst who helped define the SOA space more than a decade ago, says as SOA becomes embedded into the digital enterprise realm, organizations are moving services to support event-driven interactions, versus request/reply interactions.

    This, of course, is old news…

  • Why IPv6 Matters

    Given the rapid growth of the Internet, and the number of Internet-enabled devices, we are running out of IPv4 addresses - fast. This is a problem mostly for ISPs and large businesses who allocate their own public IP addresses from pools of addresses and sell or sub-let those addresses to .. us. When they run out of addresses, as with any finite resource, the haves will once again be pitted against the have-nots and the Internet will become less egalitarian. But that is not the only reason why you should be interested in IPv6: more important than the 340 trillion, trillion, trillion addresses that the 128-bit address space of IPv6 allows (as opposed to the “mere” four billion of IPv4) are IPv6’s other features.

  • New, interesting stuff on this blog

    Those of you who have been following this blog for a while will have noticed an important change in the last few days: more content, from different sources.

    This is what has happened: I’ve linked the blog to my Google Reader account in a new way. There was already an “Interesting Stuff” column on the right-hand-side, which will disappear shortly. Instead, the interesting stuff in question is now integrated directly into the blog.

    The selection of articles is done by hand, by me. They are basically articles that I read, liked and decided to share with you. They are always on the same topics as this blog itself and may outnumber blog posts by yours truly: I do read more than I write.

    When I post articles like this on the blog, that does not automatically mean that I agree with the content - just that I find it to be interesting. Nor does it mean that I agree with everything the author says, or that I recommend his/her site.

    Comments will be possible on this blog regarding the contents of the posts - whether those posts were written by me or not. As always, all comments are moderated and that moderation is done on a basis of pertinence w.r.t. the contents of the post commented on, decency, etc.

    If you have any feed-back w.r.t. this new policy, please leave a comment on this post.

    Thanks, and enjoy :)


  • Refactoring Exceptions


    As I mentioned in the previous installment, our current way of handling exceptions leaves a few things to be desired. In this installment, we will fix that problem.

  • Negotiation: first steps


    As discussed last month, the requirement for encapsulation pushes us towards allowing the user to know that there’s a negotiation between the two peers, and does not alleviate the requirement that the user understand the errors. So in this installment, we will start using the new implementation of exceptions we worked out in the previous installment, and start on the negotiation mechanism from two installments ago.

  • Is technology making us sick?

    In my view, technology should make our lives easier - that’s what I try to work for, that’s what this blog is about and that, in general, is what at least fundamental research is aimed at. But are we going about it the wrong way? Is technology really making our lives harder, rather than easier?

  • Updated: Not-so-permanent permalinks (all permalinks changed)

    Due to the addition of an important feature on the site, all permalinks for all posts have changed. Following the old links will send you to an error page where the proposed options should include the page you’re looking for.

    Sorry for the inconvenience.

    2010-09-28: to make sure everything continues to work, I’ll be using less pretty, but more effective permalinks as per the default of the blogging engine, from now on. Only one additional permalink is broken with this change, but it does actually fix a few bugs, so I guess I’ll live with the one broken link.

    Again, sorry for the inconvenience, but if all goes well, things will get more convenient from here on.

  • Error handling in C++


    As far as error handling is concerned, C++ has all of the features and capabilities of C, but they are wholly inadequate in an object-oriented language. One very evident way in which C-style error handling is inadequate in an object-oriented language is in the implementation of a constructor of any non-trivial class, but this only becomes evident when we’ve analyzed two things: the guarantees that any method (including special methods such as constructors and destructors) can give, and the minimal guarantees that each of these special methods must give.
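
    To make the constructor problem concrete, here is a minimal sketch (the class `Account` and the helper `construction_fails` are hypothetical): a constructor has no return value to carry an error code, so the only way to refuse to create an object that would violate its invariant is to throw.

    ```cpp
    #include <stdexcept>
    #include <string>

    // Hypothetical class whose invariant is that the name is never empty.
    class Account
    {
    public:
        explicit Account(std::string const &name)
            : name_(name)
        {
            // No return value is available for an error code, so we throw:
            // the object then never comes into existence at all.
            if (name_.empty())
            {
                throw std::invalid_argument("name must not be empty");
            }
        }

        std::string const &name() const { return name_; }

    private:
        std::string name_;
    };

    // Returns true if constructing an Account with this name was refused.
    bool construction_fails(std::string const &name)
    {
        try
        {
            Account account(name);
            return false;
        }
        catch (std::invalid_argument const &)
        {
            return true;
        }
    }
    ```

    With C-style error handling, the alternative would be a half-constructed object plus a separate "is this object valid" check that every caller has to remember.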

  • Bill #c32 seems to be getting less controversial

    Copyright is an important part of my work: every time I sign a work-related contract, I have to make sure that I don’t sign away the rights of previous works to which I retain the rights, nor the rights to work that I do outside the scope of the contract I am signing at that point. I spend a significant amount of time and energy creating copyrighted material and some of that material has to remain mine. Like any copyright bill would, bill C-32 provides a framework to fall back on when cases aren’t covered by contract and now, it looks like it’s on its way to being passed.

  • Home Search, Where Art Thou?

    In my day to day life, there are few things I truly dislike doing: I’m a pretty happy person. There is one thing, though, that I really don’t like - at all - and that strikes me as a truly pointless exercise in futility: searching. Shouldn’t we have a solution for that by now?

    It strikes me we already have all of the necessary technology to come up with a viable solution: I’ve worked with most of it! Let’s have a look at what this might look like.

  • "Given the existence of A, B will create itself, therefore, C had nothing to do with it"

    Stephen Hawking’s new book promises a lot of hype. CNN already published two separate articles about it on their site even though no-one has read it yet. I’ve added it to my Amazon Science Books Wishlist, and will buy it when I come round to it unless some generous soul wants to offer it to me first. About the hype, though:

  • Opacity: Encapsulation at its best (and worst)


    One thing you may have noticed when looking at the code of our abstract factory, is that the base classes (interfaces) of each of our abstract objects don’t have much to tell their users: there are hardly any accessors or mutators to be found. This is an attribute of encapsulation called opacity and in this installment, we’ll explore its advantages and disadvantages.

  • Women in computing

    When I ran a team of R&D programmers, a while ago, at one point, we had one person from a visible minority, one person with a slight handicap, two women, two immigrants (one of whom was one of the two women, the other was me) and at least one phytopathologist (me). We beat most of the statistics with that team, because there were about ten of us at the time. One of the members of my team remarked that it was the first time he’d worked in a team with two women in it - and he had worked in larger teams before.

  • Tell me twice

    A few days ago, I explained to a colleague why certain communications protocols have a “tell me twice” policy - i.e. to allow for any command to have any effect, the same command - or a command to the same effect - has to be received twice (from the same master). In human parlance, this would be the equivalent of Jean-Luc Picard saying “ensign, I’m about to tell you to lower the shields” … “ensign, shields down!” in which the ensign (Wesley Crusher?) wouldn’t be allowed to obey the second command unless he had heard, understood and acknowledged (HUA!) the first. Now for the math..
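
    The policy itself can be sketched as a tiny state machine. This is an illustration under simplifying assumptions, not any particular protocol: the class name `TellMeTwiceReceiver` is hypothetical, masters are identified by an integer, and "a command to the same effect" is reduced to an exact string match.

    ```cpp
    #include <string>

    // Hypothetical receiver: a command only takes effect when the same master
    // sends the same command twice in a row.
    class TellMeTwiceReceiver
    {
    public:
        // Returns true if the command should take effect now.
        bool receive(int master, std::string const &command)
        {
            bool const confirmed =
                armed_ && master == master_ && command == command_;
            if (confirmed)
            {
                // The confirmation consumes the announcement: a third copy
                // of the command starts a new announce/confirm cycle.
                armed_ = false;
            }
            else
            {
                // Treat anything else as a (new) announcement.
                armed_ = true;
                master_ = master;
                command_ = command;
            }
            return confirmed;
        }

    private:
        bool armed_ = false;
        int master_ = 0;
        std::string command_;
    };
    ```

    In the Star Trek version above, the first call is "I’m about to tell you to lower the shields" and the second is "shields down": only the second one returns true.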

  • Once burned, twice shy

    “Is Good Code Possible?” John Blanco asks on his blog. He goes on to tell a harrowing story on how he had to develop an iPhone app for a big retailer (“Gorilla Mart”) in less than two weeks. Why he even accepted the contract is beyond me but then, he may not have had a choice.

    In the scenario he described, there’s really little chance of creating quality code, unless…

  • Socks 5: Credentials on Windows


    In this installment, we will continue our implementation of GSSAPI/SSPI, this time on Windows, where we’ll try to get some credentials.

    We will look at two topics this time: first, we’ll look at data encapsulation, after which we’ll look at when RAII is a bit too much, and how to handle that.

  • On the Importance of Coverage Profiling

    Coverage profiling allows you to see which parts of the code have been run and is especially useful when unit-testing. Here’s an anecdote to show just how important it can be.

  • Socks 5: Expanding the factory


    In this installment, we will expand the MechanismFactory class for SSPI. We will take a slightly closer look at the SSPI call than we would normally do, and we will also take a look at the Unicode/“ANSI” differences on Windows. Because of this, we will not have time to take a look at the GSS-API side of things, which we will therefore look into next time around.

  • TPM on your content under #c32 - handing away your rights?

    Under bill C-32 it would be illegal to remove TPM under the vast majority of circumstances. Does that mean that, if you decide to publish software you create with TPM, you’re handing away the rights of your software to the TPM manufacturer? No, it doesn’t.

  • Git server is back

    This is an all-new server, but at the same address as before. Note, though, that as it’s an all-new server, the server’s SSH key has changed. If you were using SSH for any of your access to the server, there’s a good chance that you will both have to register with the server again and that SSH will complain that the server key has changed. Remove the corresponding key from ~/.ssh/known_hosts and it will be fine.

    If anything is missing, let me know.

  • TPM and the Public Domain (#c32)

    According to The Appropriation Art Coalition, applying TPM to public domain content effectively removes that content from the public domain. Is that really true? I don’t think so, and here’s why.

  • Is TPM bad for Open Source? (#c32)

    It’s been argued that TPM and bill C-32 are bad for Free/Libre Open Source Software development. Is that true? If so, why? If not, why not? Personally, I don’t think so, and I’ll tell you why.

  • Git server off-line

    The server at has been off-line since midnight and will remain off-line for at least another day due to a hardware problem. It will be back as soon as possible.

  • Feedback on #C32: Constructive, Destructive or Pointless?

    While some of the feed-back on bill C-32 (Copyright reform) seems to be constructive, much of it has become a barrage of personal attacks on Conservative MP and Minister of Canadian Heritage and Official Languages, James Moore, who tabled the legislation with Tony Clement, Minister of Industry, on June 2. Of course, his remarks on the subject weren’t very welcome either, calling opponents of the bill “radical extremists”. So, the debate is on about what should probably be one of the more boring subjects in Ottawa: copyright legislation.

  • Bill C-32

    A few days ago, I was listening to the podcast for the CBC program Spark, in which they mentioned a new bill, bill C-32. They had a person on the show, whose name I do not remember, who said it was a very “balanced” bill. That piqued my interest, so I decided to read the bill myself.

  • Socks 5: Starting GSS-API - The Factory Pattern


    In this installment, we’ll be doing a final bit of clean-up and starting to implement a GSS-API/SSPI client program, while focusing on the Abstract Factory Pattern.
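
    As a rough sketch of the Abstract Factory Pattern in this setting: client code asks an abstract factory for an abstract product and never names the concrete classes. The interfaces below are assumptions made for this illustration, and the concrete classes are stand-ins that only return a name; they make no real GSS-API or SSPI calls.

    ```cpp
    #include <memory>
    #include <string>

    // Abstract product (interface assumed for this sketch).
    class Mechanism
    {
    public:
        virtual ~Mechanism() = default;
        virtual std::string name() const = 0;
    };

    // Abstract factory (interface assumed for this sketch).
    class MechanismFactory
    {
    public:
        virtual ~MechanismFactory() = default;
        virtual std::unique_ptr<Mechanism> create() const = 0;
    };

    // Stand-in concrete products: they only report which family they belong to.
    class GssapiMechanism : public Mechanism
    {
    public:
        std::string name() const override { return "GSS-API"; }
    };

    class SspiMechanism : public Mechanism
    {
    public:
        std::string name() const override { return "SSPI"; }
    };

    // One concrete factory per platform family.
    class GssapiMechanismFactory : public MechanismFactory
    {
    public:
        std::unique_ptr<Mechanism> create() const override
        {
            return std::unique_ptr<Mechanism>(new GssapiMechanism);
        }
    };

    class SspiMechanismFactory : public MechanismFactory
    {
    public:
        std::unique_ptr<Mechanism> create() const override
        {
            return std::unique_ptr<Mechanism>(new SspiMechanism);
        }
    };

    // Client code depends only on the two abstract interfaces.
    std::string describe(MechanismFactory const &factory)
    {
        return factory.create()->name();
    }
    ```

    Choosing the platform then comes down to choosing which factory to pass in, which can be done in one place.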

  • Binary Search

    While going through some old code, for another article I’m writing that will come up on the blog, I came across an implementation of binary search in C. While the implementation itself was certainly OK, it wasn’t exactly a general-purpose implementation, so I thought I’d write one and put it on the C++ for the self-taught side of my blog. While I was at it, I also analyzed
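
    For illustration only, here is what a general-purpose binary search might look like when written against iterators. This is not the implementation from the post; the name `binary_find` is hypothetical, and the sketch assumes the range is sorted with respect to `operator<`.

    ```cpp
    #include <iterator>

    // Returns an iterator to an element equal to value in the sorted
    // half-open range [first, last), or last if there is no such element.
    template <typename Iterator, typename T>
    Iterator binary_find(Iterator first, Iterator last, T const &value)
    {
        Iterator const end = last;
        while (first != last)
        {
            Iterator middle = first;
            std::advance(middle, std::distance(first, last) / 2);
            if (*middle < value)
            {
                first = ++middle; // value can only be after middle
            }
            else
            {
                last = middle; // value is at middle or before it
            }
        }
        // first is now the first element not less than value, if any.
        return (first != end && !(value < *first)) ? first : end;
    }
    ```

    The loop halves the range on every pass, so it needs a number of comparisons proportional to the logarithm of the range’s length.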

  • Lock-Poor Stack


    The following is the complete code of a lock-poor stack in C/C++: it’s mostly C but it uses Relacy for testing, so the atomics are implemented in C++. With a little work, you can turn this into a complete C implementation without depending on Relacy. I wrote it while writing an article that will soon appear on this blog.

    The stack is not completely lock-free because it needs a lock to make sure it doesn’t need any memory management solution for its reference to the top node during popping or reading the top node.

  • Quantum teleportation achieved over 16 km

    According to this report, a physics laboratory in China recently achieved a new distance record in quantum teleportation: 16 km. That’s quite a feat, considering that up until now, the maximum distance had been a few hundred meters.

  • Albion College decides to scrap Computer Science and Journalism majors

    I came across this article while surfing the web this afternoon: “Albion College officials defend decisions on faculty reduction and elimination of courses”. Apparently, Computer Science and Journalism (as well as a few other topics) aren’t reasonable career options in the twenty-first century.

  • Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome

    D.G. Gibson et al. reported, in Science Magazine, the “Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome”. Now, I used to be a biologist and have studied this particular type of biology for a number of years before leaving the field, mostly for financial reasons, for a career in computer science. I’m also a certifiable geek, as I think most of the readers of this blog are, so I thought I’d explain what this means, in geek terms.

  • Socks 5: Finishing sending a token


    In this installment, we will finish the implementation for sending a token from the client to the server. We will answer the remaining open questions and, in doing so, improve the code a bit more. When we’re done, we will have a client that sends a token to the server, and a server that reads the token and parses its envelope - which is a pretty good foundation to build on. We will later make that foundation a bit more solid by removing the classes involved from the test code and moving them to their final locations. First, however, let’s take a look at those questions and the answers.

  • Software development productivity

    In the latest installment of my podcast, I asserted that “all software productivity problems are project management problems”. In this post, I will explain why I believe that to be the case and how I think those problems can be resolved.

  • Podcast format survey

    I have recently made some changes to the format of the podcast: there is background music all the time now (not just a jingle at the beginning) and most installments are about 30 minutes. I’d like to know what you think about this new format.

  • Speaking different languages

    As a Dutchman living in Quebec, Canada - one of those parts of the world where francophones (French-speaking people) are surrounded by anglophones (English-speaking people) and yet thrive speaking French almost exclusively - I sometimes run into the “corner cases” of language related coding standards - e.g. the language comments are supposed to be written in.

  • Socks 5: Continuing Sending a Token - Anecdote


    Recording the latest episode of the podcast reminded me of a story that I’d like to tell you: a few years ago, I started working as a programmer on a project in which there was a policy to include the definitions of the classes used in a header - by including the headers that defined those classes - rather than what I recommended in the podcast: to use forward declarations. They also had a policy to use only the name of the file to include rather than the complete path (e.g. #include "MyClass.h" rather than #include "path/to/MyClass.h"). The reason for this was convenience: the preprocessor, when told where to look, would find the proper files and including them in the class’ header meant you didn’t have to use dynamic allocation (of which there was still a lot going on in the project) but you could use the objects directly, rather than references and pointers.

  • Did I say 5000? Make that 14000!


    What else can I say? Two weeks ago, I said “thanks for listening” because there’d been 5000 downloads of the Podcast. Apparently, you went ahead and told all your friends and colleagues, so now the podcast has been downloaded another 9000 times - getting us from 5000 to 14000 in two weeks!

    I am both honored and humbled by such a big audience. Thank you.

    Now I’ll get back to coding that socks server so I have something to show y’all in two weeks :D

  • SOCKS 5 Step 2: exchanging a token


    With a few minor adjustments to the existing Token class, we can finish the first part of our implementation of RFC 1961 for now - we will hook it into an implementation of the GSS API later. Before we do that, though, we’ll create a new directory in our project called lib/rfc1961 and move our files there: it seems more appropriate that way, as we will have a lot more code to write. We will also move our implementation into its own namespace, which will be Vlinder::Chausette::RFC1961. In the first part of this installment, we will look at the changes necessary to do that and we will discuss the importance of namespaces.

    In the second part of this installment, we will start implementing a simple program to send a GSSAPI token from a client to a server. As we will see, this isn’t as simple as it might seem at first glance. We will build upon this example in the following installments to finish the implementation of RFC 1961.

  • Thanks for listening - 5000+ downloads

    Hello everyone,

    I just checked the podcast’s statistics and noticed we passed the 5000 downloads mark this week. At the rate the podcast is now being downloaded, we’ll have doubled that in another few weeks: each episode got downloaded, on average, almost 500 times!

    I honestly never thought this podcast would be that popular. Thanks for listening - I’ll keep talking!


    NB: if you have any comments to help me improve on the podcast or the blog, please feel free to contact me or leave a comment on the blog. Off-topic comments will not be posted on the blog, but I do read all of them and I will reply to questions.

  • Git server is back

    A bit quicker than expected - but that’s always a good thing.

    We’re testing a new host OS: the Git server - just like most other servers we run - is a VM running a Linux flavor running on a system which is (now) running a different Linux flavor. The intent is to make the server a bit more responsive and to make better use of the available hardware. We’ll re-evaluate in a month or so, when we’ve gathered some statistics on this new running system. By that time, we might move to another host OS again, or move to a different physical host.

    I just have to say this, though: it only took 27 minutes to install and set up the new host OS, and we didn’t lose any data - ain’t modern software just great?

  • Git server down for maintenance

    Vlinder’s Git server, where FLOSS projects including the SDK and the project for C++ for the self-taught are hosted, is going down for maintenance as of now, for an hour or so.

    Sorry for the inconvenience.

  • Prerequisites for the project


    In this installment, we’ll get you set up to compile everything that needs compiling in our project. We’ll try to keep it short and sweet and you’ll be able to download most of what you need just by following the links on this page.

  • Use-Cases Part 3: What A Use-Case Really Is & Writing Use-Cases


    Before we start using use-cases in the description of the functional requirements we want to meet in our project, we need to understand what a use-case really is and how to go about writing one. In this installment I will attempt to answer both those questions. However, this series is called “C++ for the self-taught” for a reason: I will include references for all of the material I have cited in this installment, and I hope you will take it upon yourself to go out and look a bit yourself as well.

  • Use-Cases Part 2: What Use-Cases Are For (The history, present and future of use-cases)


    In the late 1980s and early 1990s, the “waterfall” software development model, which had been around (with that name) since the 1970s (see, for example, Boehm, B.W. Software engineering. IEEE Trans. Comput. C-25, (1976), 1226-1241) was starting to be progressively “refined”. When that happens, it usually means that there are problems with the model that need to be addressed - or the model will crumble and fall. Object-oriented programming was becoming more or less main-stream and early versions of C++ were cropping up. “Good practice” documents for programming in non-OO languages started to stress the use of OO-like APIs and soon enough, object-oriented programming would no longer be a mere buzzword.

  • The answer to the quiz in episode 7 of C++ for the self-taught


    I know you must have been aching for the response to the quiz from three weeks ago. If you haven’t thought of your own answer yet, go back to the code and have another look. Try running it through a compiler with all the warnings turned on - it might tell you what the bug is (more or less), but probably not how to solve it.

  • Use-Cases Part 1: Introduction & Ingredients


    In the “C++ for the self-taught” series, we’re about to embark on a new project. In order to describe that project and in order to figure out what we want the result of that project to be, we will be using a tool called the use-case. So, I think an intermezzo on use-cases is in order.

  • Confusing the compiler

    Sometimes it’s real fun to see how easily you can confuse the compiler. In the error below, function is a macro that takes three parameters:

    filename.c(453) : error C2220: warning treated as error - no 'object' file generated
    filename.c(453) : warning C4013: 'function' undefined; assuming extern returning int
    filename.c(466) : error C2064: term does not evaluate to a function taking 279509856 arguments

    I don’t know where it got the idea that I typed 279,509,856 parameters, but I sure didn’t take the time to do that! ;)

  • [Re-post from Greg Wilson: ] Bits of Evidence

    Bits of Evidence


  • 7- Polymorphism


    In this last installment before we start our development project (and yes, there is a development project coming) we will talk a bit about the C++ type system, how to use it, how it ties in with object-oriented programming and how it ties in with what we’ve discussed earlier. We will see what the virtual keyword is all about, and how “a duck is a bird, is an animal” and “a table and a chair are both pieces of furniture” comes into play, and is expressed in C++. Once we’ve gone through that, you’ll be sufficiently equipped for object-oriented programming in C++.
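
    A minimal sketch of the "a duck is a bird, is an animal" idea, using hypothetical classes and a hypothetical `speak` function: the `virtual` keyword makes the call go to the most derived class’s version, even when the caller only knows it has an `Animal`.

    ```cpp
    #include <string>

    class Animal
    {
    public:
        virtual ~Animal() = default;
        // virtual: derived classes may replace this behavior.
        virtual std::string sound() const { return "..."; }
    };

    // A bird is an animal.
    class Bird : public Animal
    {
    public:
        std::string sound() const override { return "tweet"; }
    };

    // A duck is a bird, and therefore also an animal.
    class Duck : public Bird
    {
    public:
        std::string sound() const override { return "quack"; }
    };

    // Only knows about Animal, yet calls the right sound() at run time.
    std::string speak(Animal const &animal)
    {
        return animal.sound();
    }
    ```

    Without `virtual` on `Animal::sound`, `speak` would always return "..." because the call would be resolved from the static type of the reference.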

  • Error handling in C

    One of the things I do as an analyst-programmer is write software - that would be the “programmer” part. I usually do that in C++ but, sometimes, when the facilities of C++ aren’t available (e.g. no exception handling and no RTTI) C becomes a more obvious choice. When that happens, RTTI is not the thing I miss the most - you can get around that using magic numbers if you need to. Exceptions, on the other hand, become a very painful absence when you’re used to using them.

  • Distributed Software Development Part 3: Tools Of The Trade

    For software development, there are a few things we need on a daily basis: our source code, our documentation, our integrated development environment (IDE) and our hardware. Without any one of these, a software developer is as useless as… well… something very useless.

  • 6- Resource Allocation and RAII


    In standard C++, there is no garbage collector: there is no built-in mechanism that will magically clean up after you if you make a mess. You do, however, have the possibility to allocate resources, such as memory or files, and work with them. You should, therefore, be able to manage them consistently so you don’t “leak” them.
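
    A minimal sketch of the RAII idea for a file, using a hypothetical `File` class: the constructor acquires the resource, the destructor releases it, so the file is closed on every path out of the scope, including when an exception is thrown.

    ```cpp
    #include <cstdio>
    #include <stdexcept>

    // Hypothetical RAII wrapper around a C file handle.
    class File
    {
    public:
        File(char const *path, char const *mode)
            : handle_(std::fopen(path, mode))
        {
            // If acquisition fails there is nothing to release, so just throw.
            if (!handle_)
            {
                throw std::runtime_error("could not open file");
            }
        }

        // Release the resource when the object goes out of scope.
        ~File() { std::fclose(handle_); }

        // Copying would lead to closing the same handle twice.
        File(File const &) = delete;
        File &operator=(File const &) = delete;

        std::FILE *get() const { return handle_; }

    private:
        std::FILE *handle_;
    };
    ```

    For memory, the standard library already provides this kind of wrapper in the form of smart pointers and containers.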

  • Distributed Software Development Part 2: Management Challenges

    Business is largely about management which, in turn, is largely about reducing costs and reducing time-to-market. However, today’s management models for human resources are largely based on two things: physical presence in the office and seniority. Performance is often only part of the equation when it comes to promotion - people tend to get promoted up to their level of incompetence - and bonuses. In the software industry, however, management models are changing towards a more participatory model in which managers have less and less to say on the “how”, the “who” and the “when” of the development process but, in return, get more say in the “what” - the customer gets to say “why”. In some forms of agile development, team members can even be “voted off the island”, which can be very disconcerting indeed for the manager.

  • Distributed Software Development Part 1: The Safe Boom

    As I said in a previous post, the new economic realities that come with peak oil and climate change will change the way we work and the way the computing industry is run. One of those changes will be limiting unnecessary costs related to moving people around - something we already do for goods.

  • 5- Objects, References and Pointers


    The difference between references and pointers, what they are w.r.t. pointers and how to handle each has often been the source of confusion, sometimes even for seasoned programmers and often for formally trained, inexperienced programmers. Very often, especially in legacy code, I find one of the ugliest constructs imaginable: a function that returns a reference that is the result of dereferencing a pointer, of which the address is subsequently taken to validate its value. Ugh!
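
    To show what that construct looks like and one way to avoid it, here is a small sketch with hypothetical names (`Node`, `find`, `value_of`). The problematic version appears only in a comment, because dereferencing a null pointer is undefined behavior and checking the address afterwards comes too late.

    ```cpp
    #include <cstddef>

    struct Node
    {
        int value;
    };

    // The construct described above, for contrast only:
    //   Node &get(Node *p) { return *p; }  // undefined behavior if p is null
    //   if (&get(p) == nullptr) { ... }    // "validation" after the damage

    // If "no object" is a possible answer, say so in the type: return a pointer.
    Node *find(Node *nodes, std::size_t count, int value)
    {
        for (std::size_t i = 0; i < count; ++i)
        {
            if (nodes[i].value == value)
            {
                return &nodes[i];
            }
        }
        return nullptr; // the caller is expected to check for this
    }

    // If there is always an object, take and return references.
    int &value_of(Node &node)
    {
        return node.value;
    }
    ```

    The rule of thumb in this sketch is that a pointer may be null and must be checked, while a reference always refers to an object.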

  • Staring into the depths of the yet unwritten

    By the end of the next decade, there will be no oil left for consumers such as myself and we’ll have reached peak oil. By the end of the decade after that, the last wild polar bear will have drowned because there will be no polar ice left for it to walk on, it will have been shot by some-one up North as it entered a home looking for food, or it will have died of starvation after eating the last of its cubs. By the end of my expected natural life-span, there will be no edible fish left in the ocean.

    These statements, which are corroborated by leading economists and, for the one about the polar ice cap, meteorologists rather than environmentalists, have a profound impact on the way we work and on the computing industry in general.

  • 4- Classes


    In any language that supports object-oriented programming, the class is a, if not the, basic building block. In this post, we’ll take a closer look at what a class is, and how that ties in with what we’ve seen in the previous post, data structures, and in the two next posts: pointers, references, objects and RAII.

  • Microsoft Team Foundation Server vs Git

    For the last few weeks, and in the coming months, I’ve had to (and will have to) work with Microsoft’s Team Foundation Server (TFS).

  • Implicit, Contextual Requirements

    We tend to forget what we know implicitly: if we’ve been working in the same domain for long enough, we tend to forget that not everybody knows the same things and has the same experience we do. For example, some-one who has been working in distribution for a long time may think it’s obvious that, even if you do switch to RFID, you will still need line-of-sight machine-readable codes (because RFID might fail and because the technology for using line-of-sight machine-readable codes is much more ubiquitous than RFID is) and when they think of line-of-sight machine-readable codes, they think of barcodes and, depending on what and where they distribute, they might think of Data Matrix codes, UPC-12 codes, or any number of other barcodes.

  • A Day In The Life Of A C++ Analyst/Programmer

    While listening to Spark, on CBC Radio, I had the idea it might be nice for non-developers (and aspiring developers) to know what a typical day might look like.

  • 3- Data Structures


    Before we delve into the realm of object-oriented programming (which we will get into in the next post), there is a notion that is so basic, and so important to any type of programming, that we have to treat it in order to make the whole notion of object-oriented-programming comprehensible.

  • The Quest For Bug-Free Software

    In recent literature from the scientific side of software engineering, there’ve been a lot of publications on producing and maintaining high-quality software. A lot of focus is being put on tools, systems, procedures and processes that aim to reduce the price-tag of quality and avoid the price-tag of failure.

  • 2-Control Structures


    In this post, we’ll take a look at a few control structures in C++. There are only a few of them, so we’ll start by listing them all and giving you some examples of each, but we’ll first take a look at what we mean by control structures.

  • Badly defined semantics

    There is probably nothing worse than badly defined semantics: functions that might (or might not) take ownership of the object you pass to them can be a serious maintenance headache.

  • Protect what's yours

    I’ve drawn up a list of my intellectual property yesterday. It’s about four pages long and contains libraries, applications, web apps, training material, etc. Only one thing that I’ve ever created and published is in the public domain - the rest has copyrights attached to it. That doesn’t necessarily mean that you can’t use it, or even that you have to pay me to be allowed to use it: it just means that it’s mine and that I decide what kind of rights you have over it.

  • 1- "Hello, world!"


    This is the first post in the “C++ for the self-taught” series - the second if you count the introduction. We will take a look at how to create your first C++ application.

  • C++ for the self-taught: Introduction

    I’ve decided to take a little time to make use of those 20000+ hours of C++ I have under my belt and make life a bit easier on those of you that are learning C++. In order to do that, I have created a new category called “C++ for the self-taught” that will basically show you how to program in C++.

  • Rapid application development in PHP

    For the last few days, I’ve been out of my usual C++ cocoon and working, instead, on a web app to help me better organize my projects and - more especially - help me better track them.

  • The importance of meaningful work

    "_Autonomy, complexity and a connection between effort and reward are the three qualities that work has to have if it is to be satisfying. It's not how much money we make that ultimately makes us happy between nine and five - it's whether our work fulfills us. (...) Work that fulfills those three criteria is meaningful. Hard work is a prison sentence only if it does not have meaning. Once it does, it becomes the kind of thing that makes you grab your wife around the waist and dance a jig. (...) If you work hard enough, and assert yourself, and use your mind and imagination, you can shape the world to your desires._" - Malcolm Gladwell, _Outliers_

    I’ve just finished reading Outliers by Malcolm Gladwell and found it a fascinating book. The passage I quote above explains what, in his view, it takes for work to be meaningful. I would tend to agree with him on the ingredients of meaningful work, but I also think he understates the importance of meaningful work: it doesn’t just make you grab your wife around the waist and dance a jig - it gives a meaning to the better part of your daily life.

    I think there is nothing so depressing as a life without meaning: I have seen people go to work every morning dreading the very job they were going to, and I have seen people enjoy their work and look forward to another day of it. The latter group is a much happier, livelier group. They enjoy life and that enjoyment is often contagious. They are usually also more productive at the work-place and more loyal to their employer. The former group drag their way to the office, are usually glum and tend to take the ambiance down with them. De-motivation is at least as contagious as motivation can be and the only thing that keeps a de-motivated worker at his job is his need for an income and the belief that he can’t find a better job. As soon as a better opportunity comes along (and despite our current economic situation, one inevitably will) the exodus commences.

    So, from a business perspective, motivation - and therefore meaningful work - is very important. However, from a personal perspective, I would argue it is even more so, and I would argue that if your work isn’t meaningful, you should quit. If you need the money (as most of us do) try to save as much as possible so you can do for a few weeks - perhaps months - without a regular income. Try downsizing your expenses for a while and start looking for a job.

  • Developer's Guidelines & High-Quality Software

    Yesterday, I was asked what I saw as the most important factors to ensure the development of quality software. What I cited was good design, good implementation following good standards, and good testing. On the testing end, I have a rule-of-thumb that says that at least 85% of the code should be covered with unit tests - and for the parts that aren’t there should be a clear reason/rationale for it not being covered. Unit tests aren’t enough, however: you also need functional tests, regression tests, etc. IMO, pre-production testing should be an important focus for any software-centric development team. But the subject of this post isn’t testing - it’s developer’s guidelines.

    Dr Dobb’s Survey

    A (more-or-less) recent survey, published on Dr Dobb’s, had some interesting things to say about using developer’s guidelines. Here are some excerpts from the article, each of which I will comment on:

    59% of respondents indicated that their organization has enterprise-wide coding conventions. (...) Of the remaining 41% of respondents, the survey found that 32% had not considered enterprise conventions and that 44% hoped to put them in place one day (everyone else wasn't sure if they had coding conventions at all).

    Let’s take a look at what that means: while a small majority has standards in place, an appalling number of businesses and developer teams still don’t - and some don’t know, which means that if there are any, they are not being followed. Let’s take another quote (from the same paragraph):

    of the respondents who indicated that they have enterprise-level coding guidelines, 17% indicated that developers were more likely to follow their own programming conventions anyway, 51% of them indicated that it was more likely for developers to follow project-specific conventions, and the remaining 32% to actually follow the enterprise conventions

    Let’s take a look at what that means: now, the part of developers that actually follow enterprise-level standards is reduced to about 18% and 52% is not using any standards (but some of those are hoping that standards might come one day). On the bright side, at least about two-thirds of developers seem to understand the importance of shared standards and are either applying them, applying them but badly managed, or hoping they’ll come one day.

    I see two major problems in this: first, I see a lack of enterprise-level comprehension of the importance of shared standards, meaning that if there are different development teams, those teams are likely to not have the same standards. In “low” times, this may not be a problem, but it reduces the mobility of your programmers between your teams (i.e. if one team needs help, a programmer from another team will have more trouble integrating with the team to help them out). This also means that higher levels of management won’t necessarily understand the reason for being for any standards that do exist, and will pressure development teams to “just hurry up”, which, in the long run, is a counter-productive pressure.

    Second, I see a lack of programmer-level comprehension of the importance of shared standards. This is evident from the results where even in those businesses that do have enterprise-wide standards, 17% of programmers will follow “their own standards” anyway. This could of course indicate a lack of quality in those enterprise-level standards, but if that were the case, the programmers would follow a superset of the standard, not a completely different one. Still, there is hope in this category: 67% of programmers, which is two-thirds, seem to understand the importance of standards, although the results don’t say whether they understand the importance of enterprise-level standards.

    Advantages of Enterprise-Level Standards

    Aside from inter-team mobility for your programmers, there are quite a few business advantages to having enterprise-wide coding standards:

    • reduced time-to-market
      new features are developed faster if they are to be integrated with software that already follows a standard: it makes the software to be integrated with more predictable, meaning the analysts and programmers spend less time trying to figure out how to integrate the new feature;
    • lower service costs
      many businesses try to make money selling service contracts in the hope that the customer won’t use them, but find out the customer does use them, and start losing money as a result - high-quality software leads to fewer service calls, and enterprise-level standards lead to high-quality software;
    • higher quality
      implementing enterprise-level standards means you can verify and validate your software against those standards and catch problems early in the product life-cycle, meaning the quality of the end product is higher;
    • lean manufacturing
      one of the “seven wastes” is Defects, of which you can reduce the impact by catching them early, and which you can eliminate using good standards and good manufacturing practices.

    One important thing to take into account, however, is the quality of your standards: a low-quality standard will have little or no positive impact on the quality of the code, and may even have a negative impact! For example, a standard that says that “at least 50% of the source code file’s text should be comments” is appealing to a lot of people, including a lot of programmers, but it really depends on what you put in your comments whether it’s of any use: comments aren’t verified by the compiler, so they may lie about the code. Documenting the code’s history in the comments is simply nonsense: there are version control (SCM) systems for that.

    Developer’s Guideline Quality

    What does it take for a developer’s guideline to be a good developer’s guideline? What are the ingredients of a good developer’s guideline? Well, before I say anything about that, I should quote Andrei Alexandrescu and Herb Sutter and say: “don’t sweat the small stuff”. Don’t try to tell people where to put their semicolons.

    Things you do need to put in your guidelines are things that will help your programmers create secure code, code that performs well, code that is thread-safe (if applicable) and code that is maintainable, stable (both in terms of API and in terms of mean time between failures) and scalable. In my opinion, these latter three should really be the focus of any developer’s guideline: the rest will follow.

    The dangers of having a bad developer’s guide are legion: you will lose productivity, developer buy-in (which is important to keep them on your team) and, in the long run, any kind of quality. You’d probably be better off without a developer’s guideline than with a bad one.


    Maintainability

    From the business point of view, you need your software to be maintainable: you need to reduce your time-to-market and you need to reduce the time between a bug being found and the fix being shipped. There are diverse ways to improve the maintainability of your code through a shared coding standard/developer’s guide. One of them is to have a shared style for all of the code, so five years from now, your programmers can check out that long-forgotten module in which you’ve just decided you have to add a new feature, open up the code, understand it at a glance, and add that feature.

    Another part of maintainability is documentation: programmers usually like neither reading nor writing documentation, so you need to keep your documentation requirements to a minimum in order for your programmers to actually use it (or create it, though that can be enforced); but you also need your documentation to be sufficiently complete so a new programmer can understand how the code is set up without actually having to read all of the code. This minimal but complete approach (which applies to other things than documentation as well) really allows you to augment the quality of your documentation and the maintainability of your code, and guidelines pertaining to this should be part of your developer’s guidelines.

    Minimal but complete is also the mantra for API design: in order for a software module to be maintainable, it has to be encapsulated as well as possible, so the API doesn’t tell the user how the functions are provided, but just tells him what functions are provided and what the pre- and postconditions of those functions are. Guidelines pertaining to API design should therefore also be part of your developer’s guidelines.
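
    To make that a bit more concrete, here is a minimal sketch of what such an API could look like in C++. The class name `BoundedCounter` and its members are hypothetical and only serve as an illustration: the point is that the comments state what each function does and what its pre- and postconditions are, while the way the count is stored stays private.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical example: a counter that never exceeds a fixed limit.
// The public part says *what* is provided, not *how* it is provided.
class BoundedCounter
{
public:
    // Precondition: limit > 0 (checked: throws std::invalid_argument otherwise).
    // Postcondition: value() == 0.
    explicit BoundedCounter(std::size_t limit)
        : limit_(limit)
        , value_(0)
    {
        if (limit == 0) throw std::invalid_argument("limit must be positive");
    }

    // Precondition: none.
    // Postcondition: either returns true and value() is one higher than
    // before, or returns false and value() is unchanged (limit reached).
    bool increment()
    {
        if (value_ == limit_) return false;
        ++value_;
        assert(value_ <= limit_); // invariant of our own code
        return true;
    }

    // Postcondition: 0 <= value() <= the limit passed to the constructor.
    std::size_t value() const { return value_; }

private:
    std::size_t limit_;
    std::size_t value_;
};
```

    A guideline could then simply require that every public function documents its conditions this way, which is something a reviewer can check at a glance.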


    Stability

    Stability really comes in two kinds: the “I won’t need to change the client code for this modification” kind and the “I can trust this application with my data, with my process, with my life” kind (and no, the last one is not an exaggeration: just think of all the software in an airplane). These two types of stability are really very different: one is a productivity concern whereas the other is a value concern.

    API Stability

    API stability reduces time-to-market: it means that adding a new feature to an existing module (library, service or otherwise) may add something to the API of that module, but won’t break anything that already uses the module, like removing a feature from the module would. This means that your APIs have to be well-designed, which means that even in agile development teams there is some thinking ahead to be done: even if according to the specific methodology you use you’re supposed to concentrate on only one user story, use-case or what-have-you at a time, you should still take those other cases into account when designing your API.

    You should not, however, push that too far: you should not go into a whole ramble of what-ifs and yes-buts. On the contrary, you should take a very conservative approach to API design: decide what functionality your API will represent to the system (in which functionality can be in the service-sense, in the object-sense or in whatever sense best applies to your context) and limit your API to that functionality. For example, if you need something to interact with PLCs, you don’t normally need functionalities to parse XML in the same API. You might use XML for something behind the scenes, but that’s not what your API is about.

    Mean Time Between Failures

    The other side of stability is the mean time between failures. This is where security concerns are involved, but it is also where you need to look for resource leaks, and bugs in general. The most common MTBF-related problems, in my experience, are also very easy to get rid of: resource leaks, which are very easy to avoid by applying a very simple coding standard; deadlocks, which are provably avoidable by correctly ordering the acquisition of your locks; “access violations” (the Windows term for “segmentation fault”, which basically means accessing a resource you haven’t allocated (anymore) or dereferencing a pointer that doesn’t point anywhere valid), which are also very easy to avoid, either by using smart (or unsmart) pointers, or by making sure pointers that don’t point anywhere are nulled and checking before accessing; and unhandled exceptions and other exception-safety issues, which can be handled by standards as well.
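
    As an illustration of the lock-ordering rule, here is a small sketch. The `Account` type and the `transfer` function are hypothetical, and ordering by address is only one possible convention (an assumption on my part, not something any particular standard prescribes): what matters is that every piece of code acquires the locks in the same order. The locks themselves are held by `std::lock_guard` objects, so they are released on every path out of the function.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical example type: an account protected by its own mutex.
struct Account
{
    std::mutex mutex_;
    long balance_ = 0;
};

// Moves amount from one account to the other. Both locks are always taken
// in the same global order (lowest address first), whichever way the money
// goes, so two concurrent transfers cannot wait for each other forever.
void transfer(Account &from, Account &to, long amount)
{
    if (&from == &to) return; // nothing to move; locking twice would deadlock
    bool const from_first = std::less<Account *>()(&from, &to);
    Account &first = from_first ? from : to;
    Account &second = from_first ? to : from;
    std::lock_guard<std::mutex> first_lock(first.mutex_);
    std::lock_guard<std::mutex> second_lock(second.mutex_);
    from.balance_ -= amount;
    to.balance_ += amount;
    assert(&first != &second);
}
```

    Since C++17, `std::scoped_lock` can take several mutexes at once and avoids the deadlock for you, but the explicit version shows what the rule in a coding standard would actually ask for.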

    The one type of error that is very hard to catch just by applying standards, but which it is possible to catch using certain static analysis tools, is the race condition: they are hard to catch at run-time, hard to find when reading the code and hard to diagnose when they pop up - there are some things guides can’t fix.


    Scalability

    Scalability is the capacity of the system to accept more input and/or generate more output without significantly changing the system, both in throughput and in format. From a software perspective, this means (among other things) that the software should not be tied to the specific electronics platform it was originally developed for: it should be portable to the extent applicable to the software in question (e.g. it should not be limited to using a single core of the CPU by design if it is at all possible for it to parallelize certain parts of its logic, but for a firmware, it may very well be acceptable for at least part of the software to have to be re-written to put it on another platform).

    This often means that your architecture needs to take your scalability requirements into account, but at the implementation-level, there are also scalability requirements that should not be lost from sight. Developer’s guidelines can help ascertain that the software is portable, is not limited to specific communications protocols, file formats, etc.

    Other Concerns For Guideline Quality

    For the guideline itself, there are three things that can greatly improve its quality: presentation, structure and enforcement.

    A guideline should be more than just a list of rules and regulations and should definitely not be a law book: it should be easily accessible, preferably something you could present as a wiki, you should allow the programmers to comment, and you should be able to give a rationale for most (if not all) of the rules you enforce.

    The structure of the guideline should be clear: if a new programmer on the team asks himself “how should I do this?” he should be able to find the answer easily, not have to go through all of the rules to find his answer. Again, presenting the guidelines in a wiki, which usually means you can search it, is a good thing here, but the structure itself should lend itself to quick searches as well - especially for printed copies.

    You should also enforce those rules that need enforcing, work on peer reviews, static analysis, etc. to make sure the rules are being followed. Guidelines that aren’t followed eventually become inapplicable to the code, thus reducing the quality of the guideline as well as of the code.

  • Security at the Design Phase - Examples & Review

    A recent report from the SEI confirms once more what I have been saying for a few years now: security is a design-time concern as much as it is a concern at any other time during the application life-cycle. The very architecture of the application should take security into account from the outset, and that concern should be followed through down to implementation and deployment.

    The cost of defects, especially security defects, is (or can be) a lot higher once the application is deployed than before deployment - defects are usually especially cheap if caught at early design phases. This is true regardless of whether the application is built using agile practices or not - being agile doesn’t mean not thinking ahead. The report acknowledges this and focuses on a few patterns, which are divided into three categories:

    Three general classes of patterns are presented in this document:

    • Architectural-level patterns. Architectural-level patterns focus on the high-level allocation of responsibilities between different components of the system and define the interaction between those high-level components. The architectural-level patterns defined in this document are
      • Distrustful Decomposition
      • PrivSep (Privilege Separation)
      • Defer to Kernel
    • Design-level patterns. Design-level patterns describe how to design and implement pieces of a high-level system component, that is, they address problems in the internal design of a single high-level component, not the definition and interaction of high-level components themselves. The design-level patterns defined in this document are
      • Secure State Machine
      • Secure Visitor
    • Implementation-level patterns. Implementation-level patterns address low-level security issues. Patterns in this class are usually applicable to the implementation of specific functions or methods in the system. Implementation-level patterns address the same problem set addressed by the CERT Secure Coding Standards and are often linked to a corresponding secure coding guideline. Implementation-level patterns defined in this document are:
      • Secure Directory
      • Pathname Canonicalization
      • Input Validation
      • Runtime Acquisition Is Initialization

    This report does not provide a complete secure design pattern catalog. In the creation of this report, some, but by no means all, best practices used in the creation of secure software were analyzed and generalized. Future work will extend the catalog of secure design patterns.

    I found the report very interesting - you should read it! I found it a bit light on the side of examples, though, so I thought I’d include a few here for each pattern.

    Architectural Patterns

    Distrustful Decomposition

    The report cites QMail and PostFix as examples, both of which are mail transport agents. MTAs lend themselves particularly well to this pattern: the act of transporting mail to either a local destination or to a remote one can be split into several distinct steps, each of which can be represented by a process that can communicate with one or more other processes. One of these, of course, can expose an SMTP server as an interface while others may be more concerned with delivery or filtering. QMail has an excellent design in this respect and is a very good example.

    There is, however, an example that might be even better for those of us that use the Internet on a daily basis: there is a new vogue in web browser design that is an instance of distrustful decomposition. It’s called tab isolation and it was first introduced by Google Chrome. Internet Explorer 8 also adopted it - among the major players, only Firefox doesn’t have it yet (I don’t know about Opera, but I believe Safari has it in their latest versions as well). It is basically the idea that whatever runs in the tab runs in its own process and therefore cannot affect what runs in other tabs. Though all tabs have basically the same task (and they do share some information, such as cookies and the cache) they are separated into processes that mutually distrust each other for exactly the reason cited in the SEI report: security.

    PrivSep (Privilege Separation)

    This is the idea on which basically the whole GNU operating system is based. No matter which server application you look at, if it runs on *NIX it is almost certain to run under its own user account on almost any GNU/Linux distribution. The “hardened” distributions push this idea as far as they can, and the one - and only - reason for this is always security.

    Defer to Kernel

    Security, in its most general form, is really about two things: authentication and authorization. The focus is usually on the former: one needs to know for certain that one’s interlocutor is really who he/she claims to be. There are various mechanisms for authentication, including the almost-universal username and password scheme that we find almost anywhere. The thing this pattern defers to the kernel is not (necessarily) authentication, but rather authorization.

    Looking from this angle, any application that uses the file system to determine whether or not a given user has access to a requested resource - once the user is authenticated - defers the authorization request to the kernel (where the filesystem code usually runs) and usually does so by changing its own effective user id before trying to access the requested resource. That means that by far most server applications do this.

    Design Patterns

    Secure State Machine

    IMHO, this pattern, as described in the report, is over-engineered: basically, they present a state machine that “wraps” another state machine and handles security separately from the “user” state machine. Generalizing this pattern just a little bit brings us to the Proxy pattern, and this brings us to a very-often-used pattern for implementing authorization in secure applications.
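
    To show what I mean by using a proxy for authorization, here is a minimal sketch. It is not the design from the report: the `Document` interface, the two classes and the `may_write` flag are all hypothetical names made up for this example. The proxy implements the same interface as the object it wraps, checks the caller’s rights and only then forwards the call.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Interface shared by the real object and its proxy (hypothetical names).
class Document
{
public:
    virtual ~Document() {}
    virtual std::string read() const = 0;
    virtual void write(const std::string &text) = 0;
};

// The real object: it knows nothing about authorization.
class PlainDocument : public Document
{
public:
    std::string read() const override { return text_; }
    void write(const std::string &text) override { text_ = text; }

private:
    std::string text_;
};

// The proxy: checks the right to write, then forwards to the wrapped object.
class AuthorizingDocument : public Document
{
public:
    AuthorizingDocument(Document &target, bool may_write)
        : target_(target)
        , may_write_(may_write)
    {
    }

    std::string read() const override { return target_.read(); }
    void write(const std::string &text) override
    {
        if (!may_write_) throw std::runtime_error("not authorized to write");
        target_.write(text);
    }

private:
    Document &target_;
    bool may_write_;
};

// Usage: two proxies on the same document, with different rights.
void example()
{
    PlainDocument document;
    AuthorizingDocument writer(document, true);
    AuthorizingDocument reader(document, false);
    writer.write("hello");
    assert(reader.read() == "hello");
}
```

    A state machine can be wrapped in exactly the same way: the proxy decides whether an event may be delivered, and the wrapped state machine only ever sees authorized events.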

    Secure Visitor

    I have a somewhat different take on the Visitor Pattern than most, because I think the usual design and implementation is far too intrusive. I have therefore designed a different Visitor Pattern that accomplishes the same task, but is far less intrusive because the object being visited knows nothing about the visitor. I have a working implementation of this revised Visitor pattern and I will expound on it later.

    If the goal is to visit a hierarchical structure and authorize the user as he accesses the structure, the one example I would have expected to find would be that of a file system - e.g. any file alteration monitor accesses nodes in the file system if it is authorized to do so. As, in the report, the task of authorizing falls on the node being visited rather than on the visitor, this is exactly what a file system does: before you can enter a directory (folder) in the file system, you have to be authorized to do so by the file system or, very often, something on top of it.

    In my opinion, however, when it is possible to put the burden on the visitor rather than on the visited, it should be done. I.e. the data structure being visited should be ignorant of the fact that it is being visited and, unless a proxy is warranted, should not have one. Otherwise, this is just another Proxy Pattern.
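
    The following is only a rough sketch of that idea, and not the revised Visitor implementation I mentioned above. The `Node` structure, the `visit` function and the rule that nodes called “private” are off-limits are all assumptions made for the example. The tree is a plain data structure, and the decision whether or not to enter a node is taken entirely by the visitor.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A plain tree: it knows nothing about visitors or authorization.
struct Node
{
    std::string name_;
    std::vector<Node> children_;
};

// Walks the tree depth-first. The visitor decides, for each node, whether
// it may enter: returning false skips that node and everything below it.
template <typename Visitor>
void visit(const Node &node, Visitor &visitor)
{
    if (!visitor(node)) return;
    for (const Node &child : node.children_) visit(child, visitor);
}

// A visitor that carries the authorization rule itself (hypothetical rule:
// nodes called "private" are off-limits) and records what it could see.
struct AuthorizingVisitor
{
    bool operator()(const Node &node)
    {
        if (node.name_ == "private") return false;
        seen_.push_back(node.name_);
        return true;
    }

    std::vector<std::string> seen_;
};

// Usage: the "private" subtree is never entered.
void example()
{
    Node root{"root", {Node{"public", {}}, Node{"private", {Node{"secret", {}}}}}};
    AuthorizingVisitor visitor;
    visit(root, visitor);
    assert(visitor.seen_.size() == 2);
}
```

    Whether this is acceptable depends on whether you can trust the visitor: if you can’t, the check has to live with the data (or in front of it), and you are back to a proxy.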

    Implementation Patterns

    Secure Directory

    Any-one who has had to set up public key authentication for SSH on a Linux server any number of times will have run into SSH’s implementation of this pattern: the ~/.ssh directory must have the right permissions to convince OpenSSH that it can safely assume that the authorized_keys file is secure. When warranted, which it certainly is in the case of OpenSSH, this is an excellent security mechanism that should not be neglected.

    Pathname Canonicalization

    This is especially important at deployment, when you need to know that the files/resources you are trying to access are really where they’re supposed to be. Examples of applications that do this include PHP, which is very helpful when deploying applications written in PHP.
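
    In C++17, a check like this can be sketched with `std::filesystem`. The function name `is_inside` is hypothetical, and this is a simplified illustration rather than the report’s implementation: the requested path is canonicalized first, and only then compared, component by component, with the (equally canonicalized) directory it is supposed to be in.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Returns true if candidate, once canonicalized, lies inside root.
// weakly_canonical resolves ".", ".." and the symbolic links that exist,
// without requiring the whole path to exist.
bool is_inside(const fs::path &root, const fs::path &candidate)
{
    fs::path const canonical_root = fs::weakly_canonical(root);
    fs::path const canonical_candidate = fs::weakly_canonical(root / candidate);
    // candidate is inside root if root's components are a prefix of its own
    auto const where = std::mismatch(
        canonical_root.begin(), canonical_root.end(),
        canonical_candidate.begin(), canonical_candidate.end());
    return where.first == canonical_root.end();
}

// Usage: a path that climbs out of the root with ".." is rejected.
void example()
{
    fs::path const root = fs::current_path() / "sandbox_root";
    assert(is_inside(root, "data/file.txt"));
    assert(!is_inside(root, "../outside.txt"));
}
```

    Comparing the raw strings instead of the canonical paths is the classic mistake here: “data/../../outside.txt” starts with “data/”, but does not end up anywhere near it.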

    Input Validation

    This, any application should do: you cannot assert on input, as assertions may not be run in some cases (e.g. if NDEBUG is defined), so input should always be sanitized, checked, etc.
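
    A small sketch of the difference, using a hypothetical `parsePort` helper: the checks on the input are ordinary code that throws, so they are still there when NDEBUG is defined, while the assertion at the end only documents an invariant of the function itself.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Parses a TCP port number from untrusted text (hypothetical helper).
// The checks are ordinary code, so they still run when NDEBUG is defined.
unsigned int parsePort(const std::string &text)
{
    if (text.empty() || text.size() > 5) throw std::invalid_argument("bad port length");
    unsigned int port = 0;
    for (char c : text)
    {
        if (c < '0' || c > '9') throw std::invalid_argument("port must be numeric");
        port = port * 10 + static_cast<unsigned int>(c - '0');
    }
    if (port == 0 || port > 65535) throw std::out_of_range("port out of range");
    // An assertion is fine here: it states an invariant of our own code,
    // not a property of the input.
    assert(port >= 1 && port <= 65535);
    return port;
}
```

    Called with “80” this returns 80; called with “65536”, “8o” or an empty string it throws, whether or not the assertions are compiled in.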

    Runtime Acquisition Is Initialization

    As many who have worked with me will know, this is my all-time favorite implementation pattern. Using RAII (which is more commonly called Resource Acquisition Is Initialization) you can create an application that is guaranteed to be leak-free, exception safe, etc. I.e., I use the rule that any resource should belong, directly or indirectly, to an object that has either automatic or static storage duration and that object should be responsible for its deallocation. This is very easy to check during code reviews and, if followed strictly, means nothing can be leaked - whether it be memory or any other type of resource.
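
    Here is a minimal sketch of that rule. The `acquireHandle` and `releaseHandle` functions are hypothetical stand-ins for any C-style resource API: all they do is count the handles that are currently open, so a leak would show up in the count. The `Handle` class owns the resource, and the `Handle` object in `useHandle` has automatic storage duration.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in for a C-style resource API (hypothetical): it only counts the
// handles that are currently open, so a leak would show up in the count.
int open_handles = 0;
int acquireHandle() { ++open_handles; return open_handles; }
void releaseHandle(int /*handle*/) { --open_handles; }

// The owner: acquires in the constructor, releases in the destructor.
class Handle
{
public:
    Handle() : handle_(acquireHandle()) {}
    ~Handle() { releaseHandle(handle_); }

    // the resource has exactly one owner, so copying is not allowed
    Handle(const Handle &) = delete;
    Handle &operator=(const Handle &) = delete;

private:
    int handle_;
};

// The handle has automatic storage duration, so it is released on every
// path out of the function, including the exceptional one.
void useHandle(bool fail)
{
    Handle handle;
    assert(open_handles > 0);
    if (fail) throw std::runtime_error("something went wrong");
}
```

    Whether `useHandle` returns normally or throws, the count of open handles is back to zero afterwards, and there is no clean-up code anywhere in the function to forget.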


    While on the architectural level and on the implementation level, the report has some interesting notes to make, on the design level, I find it rather lacking: what is presented is basically two instances of the Proxy pattern, neither of which really adds anything to the Proxy pattern itself. If the intent of the authors is to show that we can use proxies at the design level to implement state machines and visitors, that’s nice, but I think we knew that already.

    However, on the architectural level, though hardly very new for experienced analysts and programmers, the presentation of the three patterns gives a thorough look at them from an architectural perspective, which is useful. The same thing goes on the implementation level: it is always good to repeat that RAII and input validation should be applied at all times. The Secure Directory and Pathname Canonicalization patterns are a bit less universally applicable, but very important nonetheless.

  • Can Agile and CMMI Come Together?

    I just finished reading a report by the Software Engineering Institute that accomplishes something that earlier literature, including “SCRUM Meets CMMi - Agility and discipline combined” didn’t accomplish: it takes a rational step back from both methods, shows where they’re from and why they’re different, how much of that difference is real and where the perceived differences come from, and how the two can come together. So, the short answer to my title is “yes”.

    One thing I have often said about, and to, self-proclaimed agilists, is that diving into code head-first isn’t agile - it’s just plain stupid. It gives agile a bad name and it is bad for both the software and the clients that pay for the software. That doesn’t mean that everything should be documented and specified before you start coding: I agree with the Manifesto for Agile Software Development, and I’ll even quote it completely here:

    We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:

    **Individuals and interactions** over processes and tools
    **Working software** over comprehensive documentation
    **Customer collaboration** over contract negotiation
    **Responding to change** over following a plan

    That is, while there is value in the items on the right, we value the items on the left more.

    The contents of the last line are important: “while there is value in the items on the right”. I.e., processes and tools are still needed - just look at all the scrum tools that are cropping up everywhere to visualize burndown charts - and if a daily stand-up meeting and a weekly sprint aren’t a process, I don’t know what is.

    Documentation is still needed, though some documentation is more needed than other documentation, and some can be generated straight from the source code rather than trying to document first and code afterwards. Requirements, however, need to be written down somewhere, as do user stories and other such documentation tools. Sometimes you need to establish a clear standard because you need to communicate with third-party software that may not have been written yet, so you need to document your protocols first. In short, there is value to documentation.

    Contracts still need to be negotiated as well: no-one likes to work for free and many clients need non-disclosure agreements and other legalities in order to be able to do business. Perhaps the emphasis should no longer be on “what is the application going to do, exactly?” and should rather be on “under what conditions will the application be made?” but there is still a contract that needs to be negotiated, or there will simply not be a customer to collaborate with.

    Plans should not be underestimated as assets either: a well thought-out plan will take care of the big picture while allowing many, many details to go less planned. For large projects, plans should be less detailed, not more, than for small projects. If both a small and a large project can be planned in five steps, the steps for the small project will be smaller, and therefore more detailed. The whole question is how to find a balance.

    And that’s where CMMI comes in: while iterative and incremental design and development is a cornerstone for all agile methodologies I know of, it pre-dates all of them by decades. CMMI provides a number of models in which iterations have an important place (it’s not like either the authors of CMMI or the authors of the diverse agile methodologies re-invented the wheel). The explicit goal of CMMI, and of its predecessor CMM, is to increase software quality and decrease the risk of software failure, failure of software delivery and failure of software to be “up to spec”. Agile methods actually have the same goals, but may have a different approach to reaching them in that they make the approach explicitly iterative and incremental, focusing on the small increments that might (or very well might not) end up with the end goal. CMMI focuses on the end goal and CMMI users are usually less interested in the individual increments.

    In order for a process to become more mature, it has to progressively add and follow up on configuration management, process and product quality assurance, planning of all kinds, etc. (actually, that’s mostly level 2).


    So merging CMMI and Agile really comes down to one thing: doing all that, in small steps, for every iteration. That’s really all there is to it!

  • Installing Git on CentOS 5.2

    I’m pretty sure that I’m not the first one to run into this, so I thought I’d blog how this works.

    As I said earlier, one of my clients uses CentOS 5.2 in their production environment, so I need a CentOS 5.2 development server set up. I use Git for all projects I can use it on so I needed to install Git on CentOS 5.2. There is no RPM package for CentOS 5.2 for Git (the RPM package at is for Fedora) so I had to build from source. Here’s how I did it:


    • Git sources:

      $ wget

    • Build-time dependencies:

      $ sudo yum install curl-devel expat-devel

    Now you can make git by running

    $ make

    Personally, I install packages like these, which are not part of the OS, in my home directory, under opt, which you can do like this:

    $ make prefix=${HOME}/opt

    This should build without warning or error. You can now install it like this:

    $ make prefix=${HOME}/opt install

    (Don’t forget to re-specify the prefix: not doing so will force a complete re-build).

    If you want to make your own RPM of Git, you need to clone git from the git repository: the “rpm” target uses git-archive, which needs to be run in a git repository.


  • Out-of-touch techies, marketing rhetoric, and nonsense. You do the math.

    Sometimes, techies and marketers - and especially people who are both - can get very out-of-touch with the real world and start spewing out nonsense like this:

    The web has dramatically changed the software industry over the past 15 years. Today it’s hard to imagine business without the web. Nearly all businesses have or are creating a presence on the web to promote or sell their products and services, find new customers, or support existing ones. At the same time, the web has spawned a massive new ecosystem of web professionals – developers and designers who are focused on helping these businesses thrive.

    From: Somasegar’s weblog

    Anyone who has ever worked for an NGO, who has ever listened to radio documentaries (or watched TV documentaries) or has ever picked up a paper without turning to the sports section knows that this is utter hogwash. The large majority of businesses in this world do not have websites, and the vast majority of people don’t have Internet access. Think of all the micro-businesses, the small farms and the large majority of businesses in the developing countries and you’ll see what I mean.

    I agree that there’s a business in helping businesses get on the Internet and that there’s a lot of activity there, but can we tone down the rhetoric a bit?

  • A new theme

    I promised I wouldn’t talk about this blog too much on the blog, and I promise I won’t do this often, but I thought it might be a good idea to note that I’ve made a few minor changes.

    First of all, the theme has changed. I think this one is ergonomically a wee bit better and esthetically a bit more pleasing than the previous one. Pages have moved to the top (I’ll make some changes up there as well) and the color scheme is a bit different, making it easier to distinguish quotes from headers (something that was nearly impossible before). There are fewer ads, no more LinkInfo ads (they were annoying and didn’t add any real revenue) and easier access to the RSS feed for the blog - meaning it’s easier to subscribe now.

    All posts have been categorized, meaning a few new categories have popped up. I’ve also added per-category feeds: select a category and look for the “This category” link under the “Feeds” menu, at the right-hand side. I’ll try to make sure that I categorize all new posts correctly.

    I hope you enjoy these improvements and welcome your comments (either on-blog or by E-mail).

  • Running a LAMP: Debian vs. CentOS

    One of my clients uses CentOS for the production platform of their (web) application (written in PHP). They’ve asked me to take over the development and maintenance of their web application, so, naturally, I set up a new server with CentOS 5.2, rather than the Debian installation I would normally use.

    I like Debian for a lot of reasons: it is generally a stable system that is well-documented, secure and easy to handle. The “easy to handle” part is, of course, because I happen to know my way around a Debian system. When I started working professionally on Linux systems, seven years ago, I started out on the then-current RedHat distro.

    A lot has changed since then.

    Sometimes I feel like a real geezer when I say that, but having memory go back a decade or more in computer science is like having a living memory go back to the middle ages in history: “civilization” started a few thousand years ago in “real” life, while it started only a few decades ago where computers are concerned. The age when computers arrived in the household is recent enough for me to remember it.

    Anyway, back to the topic at hand. There are actually very few differences between CentOS and Debian: in many respects, they are very similar. I would argue that the CentOS installation is a bit more user-friendly in the way it sets up its interface by default, but Debian has a better package manager (apt) than CentOS does (it uses yum). On the other hand, yum uses the RPM format while apt uses its own format - and RPM is the Linux standard, at least in the Linux Standard Base.

    Debian has a lot more packages available, though - but for running a LAMP, that doesn’t change much.

    So basically, for running a LAMP, I found them pretty much equivalent - though I will continue to prefer Debian because I know my way around better. Both do the job of running a LAMP just fine, both have a “when it’s ready” approach to releasing and both are well-documented.

  • Having fun on a technical test

    I guess it’s not a secret: I’m looking for a job, either short-term or long-term, so I put my CV on a few websites. I got called by head-hunters twice this week: once for a contract to start on Monday (I’m busy until the end of September/the beginning of October so I told ‘em I couldn’t start full-time until then) and one to start a bit later.

    The second (first one to call, actually) is for a position that looks a lot like the one I have at Optel, but they wanted two references and wanted me to take a tech test on C++. That was fun.

    I had 90 minutes to do the tech test they sent me by E-mail. The test was only ten pages, 12 questions, one of which was improving a piece of some of the ugliest code I’ve ever seen - uglier even than the code I put on the tech test I wrote for Optel (and that’s hard to beat) but with less diversity in its errors (so actually easier than the code I put on my tech test): the code simply had one fundamental flaw from which most others followed - and a few minor style errors that could be dangerous if overlooked.

    There was another question on the test that was more fun, though: a description of a function much like strtok, but not allowed to change the input string and, unlike strtok, required to be re-entrant. Of course, there’s a whole slew of ways to implement that, so I picked two - one using boost::tokenizer and one not using it - and implemented them both.
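    To give an idea of what such a function can look like, here is a minimal sketch of the non-Boost flavour. It is not the code I wrote on the test: the name `tokenize` and its signature are made up for illustration, and it assumes that, like strtok, empty tokens should be skipped.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only.
// Splits `input` on any character found in `delimiters`, skipping empty
// tokens the way strtok does. The input is only read, never written, and
// all state lives in local variables, so the function is re-entrant.
std::vector<std::string> tokenize(const std::string& input,
                                  const std::string& delimiters)
{
    std::vector<std::string> tokens;
    // Skip any leading delimiters to find the start of the first token.
    std::size_t begin = input.find_first_not_of(delimiters);
    while (begin != std::string::npos) {
        // The token ends at the next delimiter, or at the end of the input.
        std::size_t end = input.find_first_of(delimiters, begin);
        if (end == std::string::npos) {
            end = input.size();
        }
        tokens.emplace_back(input, begin, end - begin);
        // Skip the run of delimiters that follows the token.
        begin = input.find_first_not_of(delimiters, end);
    }
    return tokens;
}
```

    Returning the tokens by value costs a few copies, but it keeps the caller free of any hidden state, which is exactly what strtok gets wrong.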

    Tech tests, IMO, are an excellent forum to show off. In fact, they’re one of the few forums where you should show off - the others being, um…, I’ll let you think of that. According to the description, I had ten minutes to write the code for the function. I wrote the version using Boost in two minutes, the version without Boost in four - so I wrote both within the ten minutes. If I had taken the four remaining minutes, I could easily have spit out another two versions - but there’s a limit to showing off.

    Thinking back, I think I may have over-done it a bit: I was pretty pedantic on the first few questions of the test, sometimes going into the details of the language a bit more than was asked, and I made a point of using both English and French to respond to different questions - being an immigrant, people expect me to speak English but not French, so I showed off on that too.

    Now, I’ll just have to go back to being my “humble” self again :)

  • The Importance of Proof-Of-Concepts

    Any problem is an invitation to find a solution.

    Any solution - at least in my line of work - is an amalgam of concrete implementations of abstract concepts. Each of those implementations may or may not meet the requirements just like any of those concepts may or may not be the right one for the situation at hand. You therefore need to prove two things:

    1. the solution proposed is the solution to the right problem

    2. the solution proposed solves the problem

    The solution, being a concrete implementation of an abstract concept, needs to be proven. Hence, you need a Proof of Concept.

    Why? When? What? How? Who?

    These are the five basic questions you need to have answered before being convinced of anything - at least, I need to have answers to these five basic questions before being convinced of anything and I try to have answers to each of these whenever I try to convince someone of anything. So, I will now try to answer each of these questions to convince you of the importance of proofs of concepts.


    Ultimately, a proof of concept saves you money: it saves time because you don’t spend more time than necessary on a concept that you can’t prove; it saves more time because concepts that you can prove are known to work and can be built upon as the project progresses; it instills confidence in the solution in your management and development teams and it documents two things: what the solution is meant to solve, and how the solution is meant to solve it. It therefore also establishes two things - which are of vital importance to understand for any solution: the capabilities of the solution and the limitations of the solution.

    I express functional requirements in terms of required capabilities or capacity and permitted limitations, and I believe this is a universal way of expressing functional requirements. Mind you that this does not express all types of requirements - e.g. security requirements can (and should) be expressed in terms of rights and obligations, but they are non-functional requirements (see L. Chung, B. Nixon, E. Yu, and J. Mylopoulos, Non-Functional Requirements in Software Engineering. Kluwer Academic, 2000; P. Devanbu and S. Stubblebine, “Software Engineering for Security: A Roadmap,” The Future of Software Eng., A. Finkelstein, ed., ACM Press, 2000.; and D.G. Firesmith, “Specifying Reusable Security Requirements,” J. Object Technology, vol. 3, no. 1, pp. 61-75, Jan.-Feb. 2004. and many others). A proof of concept proves beyond a reasonable doubt that the functional requirements that are to be met by the solution can be met by the solution proposed. Though it is not a complete implementation of the solution, it is enough of an implementation to prove the feasibility of the solution and that it fits the problem at hand. It should, however, be significantly less expensive to develop than the solution itself, lest it not serve its purpose as a time and money saver.


    “Before it’s too late, but no earlier”.

    It’s too late to prove a concept when your design depends on the concept to work, so you have to prove it before that. It’s too early to prove a concept if you haven’t even analysed the requirements the solution is going to have to meet yet, so you need to prove your concept after that. It’s no use to create a formal proof of concept if there are already plenty of proofs around, so you might not need to make your proof of concept at all (i.e. don’t try to prove the obvious).

    “When in doubt, prove it!”


    Some things are not concepts and should not be treated as such; a thing does not become a concept just by sticking “conceptually” before it; and negatives cannot be proven.


    Say you want to use MySQL in a project written in C++, but you want the project to be closed-source and you don’t want to pay a license to MySQL AB (or Sun Microsystems, or (soon) Oracle). You don’t mind distributing MySQL’s own source code under GPL, but you don’t want to GPL yours. Think about this for a bit.

    After thinking about it for a bit, you may have come up with the solution “we need an abstraction layer for MySQL that allows us to talk to the MySQL database without using MySQL’s own code”. You mull on that a bit, think about what that abstraction layer should be like and come up with something like this: “We need a solution in which a closed-source object, can load an open-source object at run-time; both expose the same or a similar API allowing to perform database queries and the client application can use that API, talking directly only to the closed-source part of the solution, to perform queries. The closed-source object shall not depend on the open source object in any way shape or form, but the open source object may depend on the closed source object”. Call the closed-source object “Manager” and the open-source object “Driver” and you get ODBC.
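    The shape of that arrangement can be sketched in a few lines. This is only an illustration of the dependency direction, not ODBC itself: the class names `Driver`, `Manager` and `EchoDriver` are hypothetical, and the driver is handed to the manager directly, where a real driver would be found in a shared library loaded at run-time.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Interface defined on the closed-source side. The open-source driver
// depends on this interface; the manager never depends on a concrete driver.
class Driver {
public:
    virtual ~Driver() = default;
    virtual std::vector<std::string> query(const std::string& sql) = 0;
};

// Closed-source part: the only object the client application talks to.
class Manager {
public:
    // Assumption for the sketch: the driver is passed in directly. In a real
    // deployment it would come from a shared library loaded at run-time.
    void attach(std::unique_ptr<Driver> driver) { driver_ = std::move(driver); }

    std::vector<std::string> query(const std::string& sql)
    {
        if (!driver_) {
            throw std::runtime_error("no driver attached");
        }
        return driver_->query(sql);
    }

private:
    std::unique_ptr<Driver> driver_;
};

// Stand-in for the open-source driver. It talks to no database at all and
// simply echoes the query back, so the sketch stays self-contained.
class EchoDriver : public Driver {
public:
    std::vector<std::string> query(const std::string& sql) override
    {
        return {"echo: " + sql};
    }
};
```

    The client only ever sees `Manager`, so swapping `EchoDriver` for a driver that really talks to a database changes nothing on the client side.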

    You now have the following assertions:

    • “ODBC allows us to perform SQL queries from within the C++ code on a MySQL database”

    • “ODBC meets the performance requirements for our solution”

    • “Using ODBC allows us to use the MySQL database without rendering our own source code GPL and without paying for a license”

    Each of these assertions is actually a hypothesis, and each of these hypotheses can be tested. For a hypothesis to pass its test, you need to be unable to falsify it - i.e. you need to try and fail at falsifying it - and you need to prove it practically feasible.

    Our first assertion, “ODBC allows us to perform SQL queries from within the C++ code on a MySQL database”, is one that can only be falsified by trying to prove its feasibility and failing. I’ve tried it - it’s feasible (it’s actually very straight-forward). The second assertion, “ODBC meets the performance requirements for our solution”, depends on our performance requirements. Once you have those, you can try to falsify it by building on your first proof of concept - the one that proved that you can perform a query from C++ - by performing queries that you conceive might not meet your requirements. You then either fail to do so (all queries you can think of remain within your performance requirements) or succeed and choose a course of action (optimize the proof of concept, or consider the concept a failure and go back to the drawing board). If you’ve passed this step (either by optimizing your proof of concept or failing to produce queries that do not meet your requirements; or perhaps by tailoring your requirements - it happens) you verify that in all your proof of concept code, you have not used any MySQL code that would render your implementation Free Software. You have thus proven your third assertion: “Using ODBC allows us to use the MySQL database without rendering our own source code GPL and without paying for a license”.

    While building your proof of concept, you are creating code. The code might not meet all of the requirements production-level code would meet, but it should be a very good starting point. Proof of concept code should therefore be developed using the same standards as production code and should be conserved in a working form - i.e. it should, during the development of the production code, serve as your first tests.

    So, for a more concise answer to “How?”:

    1. Analyse your requirements

    2. Produce a set of assertions that are testable as hypotheses and of which proof will be sufficient evidence that the proposed solution is valid for the problem at hand

    3. Find a solution that you think will meet the requirements

    4. Conceive of tests to test each assertion (and make sure all stakeholders agree that the tests test the assertions adequately)

    5. Develop and perform the tests, stopping as soon as one of your assertions fails.

    You should test the assertion most likely to fail first: you should Fail Fast.


    The answer to this really depends on how you manage your human resources: as a software analyst, I’d say my responsibility is to come up with solutions to your problems and that would normally include proving that the solution I propose is feasible and meets the requirements it sets out to meet - i.e. that it solves the problem. I also happen to be a programmer some of the time, so I don’t mind programming some of the time. On the other hand, there is something to be said for having a programmer - someone who will ideally be involved in developing the final solution - code the proof of concept under the analyst’s supervision: the programmer will know what to expect when the time comes to implement the final solution, and will have a far better understanding of what requirements the solution should meet if he implemented the proof of concept himself. That way, the proof of concept takes a bit longer to develop, but the production code takes less time to develop.

    So there’s a trade-off, but from a business perspective, time-to-market will usually win, as it should. Which allows for a shorter time to market depends on the complexity and risk of the concept: higher-risk or more complex concepts usually require more involvement from the analyst.


    Proofs of concepts are important: they save time, they save money and they allow you to build your products on a solid foundation, with a better understanding of both the problem and the solution. They require an investment in the (potential) solution early on, but that’s when investments have the highest return and when modifications are least costly.

  • Refreshing SQL

    I first started working with SQL several years ago: MySQL was still in the 3.x versions, so I didn’t use any stored procs, transactions, etc. Most of the business logic around the data was written in Perl. Though it was a fun time in many respects, I don’t miss the limitations of MySQL one bit.

    Since then, a lot has changed in SQL: stored procedures have evolved a lot, including, as of MySQL 5.4.14, an error signalling system (the SIGNAL statement) that is much like an exception mechanism. This brings MySQL (finally) to the level where it can be used for serious business logic, moving that logic into the database, where (if it applies to the data) it belongs.

    A well-designed database can contain a lot of data of which the structure should be hidden most of the time: INSERT statements should be rare in client code, as should SELECT and UPDATE statements, as they require intimate knowledge of the underlying schema, which you just shouldn’t have at that level of code. However, without a viable way of signalling errors, it’s simply impossible to avoid putting the business logic surrounding the data in the client application, where errors can be more easily handled. Most PHP MVC frameworks, including my favorite, symfony, abstract the data away through model classes which, in turn, take care of doing the INSERTing and the SELECTing. That still puts the data-related business logic in the client application code, though, coupling the database schema to the code.

    Anyone who knows me and the way I design software knows how much I abhor coupling, and that if I know of a viable way to avoid it, I will avoid it. Some have told me that I sometimes go too far in writing loosely-coupled code, but I have too many bad experiences maintaining spaghetti code to allow myself to write anything that might become spaghetti one day. Things like “this field used to be an INT(10) but, in such-and-such version of the schema, was an INT(11) and now it’s a VARCHAR(45) because so-and-so changed it” should have no effect on client code.



    And using well-designed stored procs allows just such decoupling!

    Talk about refreshing!

  • Binary Compatibility

    When writing library code, one of the snares to watch out for is binary compatibility. I have already talked about the dangers of breaking binary - and API - compatibility but I had neither defined what binary compatibility is, nor how to prevent breaking it. In this post, I will do both - and I will explain how, at Vlinder Software, we go about managing incompatible changes.

    What is “compatibility”?

    compatible: _Capable of orderly, efficient integration and operation with other elements in a system with no modification or conversion required_ [Free Dictionary]

    That about sums it up: a library is “backward compatible” if you can drop it in the place of an older version of the same library (or an older, but different library the new one aims to replace) without having to change anything else, and without breaking anything. Forward compatibility - the ability to gracefully accept input destined for later versions of itself - does not apply in this context and is beyond the scope of this post.

    We will distinguish two types of (backward) compatibility: API compatibility and Binary compatibility.

    Binary Compatibility

    Library _**N**_ is said to be _Binary Compatible_ with library _**O**_ if it is possible to replace an instance of library _**O**_ with an instance of library _**N**_ without making any changes to software that uses library _**O**_.

    There is one obvious restriction to binary compatibility: it only applies to shared libraries because a library that is statically linked into an executable program cannot be changed after the fact.

    API Compatibility

    Library _**N**_ is said to be _API compatible_ with library _**O**_ if it is possible to recompile the software using library _**O**_ and link it against library _**N**_ without making any other changes to that software.

    How compatibility works

    Binary compatibility and API compatibility are two different creatures: binary compatibility is really all about the library’s ABI (Application Binary Interface) which is determined not only by its API, but also by the dependencies exposed through that API. For example: if your library is written in C++ and uses std::string, and that use of std::string is exposed in the API you have exposed the dependency on std::string in your API. That means that your library will only be compatible with software that uses the same implementation of std::string as you do - or at least uses a compatible one. This may sound easier than it really is: std::string allocates memory and therefore uses a memory allocator. If you want to pass a string from one library to another, you need to make sure that you use the same allocator in both - because you can’t deallocate memory with an allocator that didn’t allocate it. So binary compatibility (in C++) is really about four things:

    1. C++ name mangling

    2. exposed dependencies on third-party libraries (including the STL, Boost, the RTL, etc.)

    3. object layout

    4. API compatibility

    API compatibility is about the names and signatures of your functions and the contents of your objects. In C, it is pretty easy to know whether two APIs are compatible:

    • was any function removed from the API?

    • was any function’s signature changed in the API?

    • was any structure’s layout changed in the API that was not a simple addition at the end of the structure?

    If the response to any of these questions is “yes”, you are no longer API compatible. (Note that the object layout question is stricter than strictly necessary for API compatibility: as long as you don’t remove anything from an exposed structure, you can change the layout of an object. We use a stricter definition to help with binary compatibility.)
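    The structure-layout question is easy to check mechanically. The sketch below uses three made-up structures (none of them come from a real library) and lets the compiler compare member offsets between an old and a new version:

```cpp
#include <cstddef>
#include <cstdint>

// Version 1 of a hypothetical exposed structure.
struct PointV1 {
    std::int32_t x;
    std::int32_t y;
};

// Appending a member at the end leaves the offsets of x and y unchanged...
struct PointV2Appended {
    std::int32_t x;
    std::int32_t y;
    std::int32_t z;
};

// ...whereas inserting one in front moves every existing member.
struct PointV2Inserted {
    std::int32_t z;
    std::int32_t x;
    std::int32_t y;
};

// These checks are evaluated at compile time.
static_assert(offsetof(PointV2Appended, x) == offsetof(PointV1, x), "x did not move");
static_assert(offsetof(PointV2Appended, y) == offsetof(PointV1, y), "y did not move");
static_assert(offsetof(PointV2Inserted, x) != offsetof(PointV1, x), "x moved");
static_assert(sizeof(PointV2Appended) != sizeof(PointV1), "the size still changes");
```

    Note the last check: even the “simple addition at the end” changes the size of the structure, so client code that allocates or copies it by value still needs to be recompiled.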

    In C++, the notion of function is a bit murky due to the addition of template functions and template classes. Adding a specialization of a C++ template class to an API may well break your API compatibility, as there is nothing that requires you to implement the same methods in that specialized class as were available in the generic version.
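    A minimal sketch of how that can happen, using a made-up class template `Box` (the names are hypothetical and only serve the illustration):

```cpp
#include <string>

// Version 1 of a hypothetical API: one generic class template.
template <typename T>
struct Box {
    T value;
    std::string describe() const { return "generic box"; }
};

// Version 2 adds this specialization. Nothing forces it to offer
// describe(), so client code that called Box<bool>::describe() against
// version 1 no longer compiles against version 2.
template <>
struct Box<bool> {
    bool value;
    bool is_set() const { return value; }
};
```

    Code using `Box<int>` is unaffected, which is what makes this kind of break easy to miss: it only shows up for clients that happen to instantiate the newly specialized type.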

    Similarly, adding a parameter to a function in an API may well be a breaking change, but might not be: C++ allows the programmer to specify default parameter values, so you can add a parameter to the end of your parameter list, supply a default value and retain API compatibility.
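    As a small illustration, take a made-up function `greet` whose second version gains a defaulted parameter:

```cpp
#include <string>

// Version 1 of a hypothetical API was:
//     std::string greet(const std::string& name);
// Version 2 appends a parameter and gives it a default value, so existing
// call sites such as greet("world") still compile unchanged.
std::string greet(const std::string& name,
                  const std::string& greeting = "Hello")
{
    return greeting + ", " + name + "!";
}
```

    This only preserves API compatibility. The parameter list is part of the function’s mangled name, so software compiled against version 1 will look for a symbol that version 2 no longer exports: binary compatibility is broken all the same.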

    So, let’s take a closer look at how things really work: C and C++ are compiled languages. A program is divided into translation units that are translated by a compiler and then linked together by a linker. This is somewhat simplified from reality, but it’s close enough for our purposes. During the compilation phase, the API comes into play: the compiler has to find a function prototype for each function called by the code, a class or structure for each object created, etc. At some point during this translation, the names of your functions are mangled so what once was your API now becomes your ABI: i.e. the human-readable instructions you wrote, which in your mind used the library you were going to link to, are translated into some intermediate form that your linker will understand. As long as that mangling is done in the same way by the compiler that compiled the library and by the compiler that is now compiling your software, the link will work.

    The compiler creates object files which, among other things, tell your linker how to create your executable application - i.e. which libraries to look for (you may have to help it on that), which functions to look for in those libraries (using their mangled names) etc. The linker then creates an executable which will contain the instructions for your computer, but also some instructions for your dynamic linker (or dynamic loader, depending on your OS) so it knows how to load and link your shared libraries (whether they be “dynamic-link libraries” or “shared objects”).

    If you can go through the compile and link steps (so if you got up to here) without changing anything, you are API compatible.

    The next step is to run your program. When you do that, your OS will load your program into memory and scan it for any dependencies - any shared libraries you depend on. It will then try to find those libraries and load them into memory, looking for their dependencies, and so on. When they are all loaded, a final linking step is performed in which the functions (mangled and all) your application was looking for are all resolved. Then, your application starts running.

    If you got to here, you might be binary compatible.

    Might be? Well, remember the remark I made about “exposed dependencies” earlier? If your application goes through all its functional and unit tests, and you’ve tested everything “comme il faut” (as you should), you are binary compatible. If you haven’t tested everything, you’re on thin ice.

    Avoiding Compatibility Pitfalls

    Removing a function or a method from the API is a sure way to break your API compatibility. Changing the type of a member might break API compatibility, but will almost certainly break binary compatibility. Changing the order of members in a structure won’t break API compatibility, but will break binary compatibility. I could go on. Routine maintenance and innocent-looking changes may break it in very subtle ways. What may seem like simply recompiling your library might, in fact, break binary compatibility: you might be using a slightly different version of the RTL than you did last time, or you might be using different optimization settings that change the alignment of your exposed structures a bit, breaking binary compatibility.

    The point is: binary compatibility is a lot more fragile than API compatibility. You should therefore be very careful about promising binary compatibility.

    Sure-fire ways to not break binary compatibility don’t exist, but there are some ways to avoid the most common problems:

    Versioning The Development Environment

    This is, without a doubt, the most radical solution, but also the most effective one: everything is built in a known environment, to which any changes are versioned and documented. The way this is done is straight-forward: the entire build environment, including the compiler, all the headers, etc. is put in a virtual machine (e.g. VMware) in which everything is compiled. This virtual machine is called an “Incubator”. It takes a known source as input, builds it in a known environment and spits out a known, compiled version.

    The way the incubator is set up, it’s a hands-off experience: all you need to tell it (through a web interface) is where to get the source (i.e. a Git URL and an SHA-1). It will check out the source and build it. Build scripts are allowed to copy files to a certain location: the staging area. Once the build is done, the staging area is wrapped into a tarball and made available. The incubator itself is also versioned: the entire hard drive image - i.e. the entire virtual machine - is put in a versioning system such that the exact version of the incubator is known by its SHA-1 checksum. Whenever a new (version of a) package is added to the most-current incubator, this changes the version of the incubator: the incubator is now “dirty” and will refuse to call anything it produces “clean” (dirty incubators make for dirty packages) until it has been “cleaned” - i.e. versioned. Of course, the incubator can’t version itself, so it can be told it’s clean even if it isn’t - that’s a question of putting protocols and procedures in place.

    Once you have an incubator in place, anything you build now comes from a known environment, so as long as you don’t do anything in the code to break compatibility, you’ll be able to produce binary-compatible packages - i.e., you’ll be able to produce exact replicas of what you built before if you need to.

    Interface design

    One of the major pitfalls in the compatibility “debate” (let’s call it a debate, shall we?) is the “exposed dependencies” problem: all dependencies you expose in your interface - whether it be the API or the ABI - become part of your ABI and, thus, become part of your compatibility problem. If you version your entire development environment, that is not really a problem because your exposed dependencies will simply not change from one (maintenance) release to another and you can assert that whatever you depended on before has remained unchanged and will therefore have no effect on your compatibility.

    Versioning your entire development environment, however, represents a (sometimes huge) investment that you may or may not be willing to make. So, an alternative is to restrict the dependencies you expose in your interfaces. This is done in various ways:

    • expose built-in equivalents

    e.g. use char* instead of std::string, etc.

    • roll your own

    i.e. implement your own string classes, your own smart pointers, etc. and expose those

    • create a C API

    i.e. implement your usual C++ API exposing anything that needs to be exposed and wrap it all in a C API in which you wrap:

    * all strings, smart pointers, etc.
    * all allocation and de-allocation
    * all deep copies, etc.
    * basically anything else - wrap it all in opaque structures not visible in C
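
    The third route can be sketched as follows. Everything in the sketch is hypothetical: detail::Greeter stands in for “your usual C++ API”, and greeter_handle, greeter_create, greeter_get and greeter_destroy are names made up for the illustration. The C-side declarations only use built-in types and an opaque structure, and allocation and de-allocation both stay on the library’s side of the fence:

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstring>
    #include <string>

    // C++ side (hypothetical): free to use std::string internally.
    namespace detail
    {
        class Greeter
        {
        public:
            explicit Greeter(const std::string & name) : greeting_("Hello, " + name) {}
            const std::string & greeting() const { return greeting_; }
        private:
            std::string greeting_;
        };
    }

    // C side: this is all a client would see in the (hypothetical) public header.
    extern "C"
    {
        struct greeter_handle; // opaque: never defined for the client

        greeter_handle * greeter_create(const char * name);
        // Copies at most buffer_size - 1 characters and a terminating NUL into
        // buffer; returns the full length of the greeting.
        std::size_t greeter_get(const greeter_handle * handle, char * buffer, std::size_t buffer_size);
        void greeter_destroy(greeter_handle * handle);
    }

    extern "C" greeter_handle * greeter_create(const char * name)
    {
        if (!name) return nullptr;
        try
        {
            return reinterpret_cast< greeter_handle * >(new detail::Greeter(name));
        }
        catch (...)
        {
            return nullptr; // exceptions must not cross the C boundary
        }
    }

    extern "C" std::size_t greeter_get(const greeter_handle * handle, char * buffer, std::size_t buffer_size)
    {
        if (!handle) return 0;
        const std::string & text = reinterpret_cast< const detail::Greeter * >(handle)->greeting();
        if (buffer && buffer_size > 0)
        {
            const std::size_t to_copy = text.size() < buffer_size - 1 ? text.size() : buffer_size - 1;
            std::memcpy(buffer, text.data(), to_copy);
            buffer[to_copy] = '\0';
        }
        return text.size();
    }

    extern "C" void greeter_destroy(greeter_handle * handle)
    {
        // the library de-allocates what the library allocated
        delete reinterpret_cast< detail::Greeter * >(handle);
    }
    ```

    The string is copied into a buffer the caller owns, so no std::string, and no memory allocated by one run-time and freed by another, ever crosses the interface.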

    I have often taken the first or the third route but never the second - though I can’t say it’s a less “honorable” one. To come back to my original example, Xerces does a pretty good job at “rolling their own” with its own XMLString class, which is used throughout the implementation and the API, its own allocation scheme, with its MemoryManager class, and its own I/O classes, in the form of InputSource and BinOutputStream.

    Personally, I tend to rely on the presence of at least two things: the STL, which is shipped in different shapes and forms with most compilers, and Boost, which is available for most compilers. If need be, I can provide a C API that doesn’t need either, but generally, I presume both are present. Other than those two, I go to great lengths on the first of my three routes, hiding dependencies behind interfaces. E.g., Arachnida goes a long way towards hiding OpenSSL by wrapping the whole thing in the Scorpion library. I definitely did not intend to implement my own SSL implementation - and didn’t do that at all - but I did intend to hide it sufficiently so the dependency is not exposed beyond the interface of the library.

    Maintenance branches

    The way Xerces-C maintains its version numbers is very good if you want to know whether the software you are downloading is theoretically going to be binary-compatible with what you’ve downloaded earlier. That is: if you’re downloading a binary distribution (something I avoid doing if I can) and the publisher took care to make sure the binary distribution was produced in the same, or an equivalent, environment as the previous version, the version number will tell you exactly what it is supposed to tell you: the two versions are binary-compatible.

    Again, Xerces-C can do this more effectively because most of its dependencies are hidden behind its interface: for the major classes used by the implementation, the implementation provides its own versions. Hence, even the STL is not exposed through its API.

    Maintenance branches, however they are numbered, come with an important caveat, though: you can add, you can fix, but you cannot modify semantics and you cannot remove. If you do either, you break compatibility either by simply crashing the program at some point (or preventing it from executing in the first place) or by changing the way the program works in unintended ways. Putting in place a strict policy of how your maintenance branches are managed, what kind of changes do or don’t get included on such branches, etc. will go a very long way toward preventing damage.


    I hope to have shed some light on the caveats of compatibility management. It is potentially a very interesting subject but usually only becomes that when things start exploding - until then, we kinda tend to take compatibility for granted. Hopefully, this is no longer the case as you reach this final paragraph. If it is, please leave a comment so I can correct the post.

  • The Danger of Breaking Changes

    Xerces-C is without a doubt one of the most popular DOM implementations in C++ (and its Java sibling undoubtedly the most popular implementation for Java). As with any project that lives under the banner of the Apache Foundation, the project is managed using a meritocracy-style project management scheme and has been, quite successfully, for the last decade.

    Due to its well-deserved popularity which can be attributed to the run-time stability of its code and its adherence to open standards as well as to good documentation and support by an active community, it has been integrated as the reference DOM implementation for free and non-free XML processing programs alike. XML processing tools such as Altova XML Spy, a very good smart XML editor with which I have no affiliation, but which I do recommend, support Xerces - as well as MSXML - to generate XML-parsing code. That means that, in order to compile code generated by such a tool, you need a version of Xerces-C that is API-compatible with the version the code was generated for.

    Xerces-C recently underwent a rather intrusive change: it now adheres more closely to the XML-1.0 recommendation than ever before, meaning some of its public API was deprecated and/or removed. The Xerces-C team encodes such changes in its public API in its version number, which is structured a bit differently than the way we structure version numbers at Vlinder Software (but then, at Vlinder Software, we have the benefit of having two independent version numbers for each library). I.e. in X.Y.Z, increasing Z means a bugfix has occurred, but the version remains compatible at a binary level with X.Y.*; X.*.* is compatible at an API level, but not necessarily binary-compatible. Changing X means the API compatibility is broken.
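
    Read as a rule, that numbering scheme could be sketched like this. The Version structure and the compare function are hypothetical names, not part of Xerces-C, and the sketch deliberately ignores the direction of the change (a newer release may have added things an older one lacks):

    ```cpp
    #include <cassert>

    // X.Y.Z, as described above.
    struct Version
    {
        unsigned int x; // changing X breaks API compatibility
        unsigned int y; // changing Y keeps the API, but may break the ABI
        unsigned int z; // changing Z is a bugfix: binary-compatible
    };

    enum Compatibility { incompatible, api_compatible, binary_compatible };

    // What the version numbers alone promise about two releases.
    inline Compatibility compare(const Version & built_against, const Version & available)
    {
        if (built_against.x != available.x) return incompatible;
        if (built_against.y != available.y) return api_compatible;
        return binary_compatible;
    }
    ```

    Under this reading, going from 2.7.0 to 2.8.0 means a recompile, while going from 2.8.0 to 3.0.0 means changing code.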

    The dangers of making breaking API changes of such widely-popular (wildly-popular?) software should be obvious: code generated from such tools will no longer compile with the latest-and-greatest version of the software. As binary compatibility is also broken, using more than one version of the software in any “client” software - i.e. any software that uses the software in question - can be very dangerous. I have had to deal with a problem like this before: I maintained a local version of Xerces-C at one of the companies I worked for. My first course of action was to move everyone from Xerces-C1 to Xerces-C2, which was an API-breaking change. By then, Xerces-C2 was at version 2.3 and Xerces-C1 was deprecated for all intents and purposes. Also, the users of Xerces-C1 were running into a few bugs that had already been repaired in Xerces-C2 but were making life more difficult for the software team. The move was a painful one, as some parts of the code had to be re-written from scratch while others needed to be modified. It paid off, however, as we saw our performance increase and at least some of our bugs go away.

    As the software had only a few releases to create every year and we knew in advance when those were going to be, keeping up with Xerces-C development was relatively easy: whenever the upstream base was updated and we were close enough to a release to bother but far enough to be bothered, I recompiled the entire base with the new Xerces-C version, ran the extensive tests on it and, if all the tests passed, packaged it up, put it on our internal package manager and sent out an E-mail. From Xerces-C 2.3 to Xerces-C 2.4 that went fine. With Xerces-C 2.5, a new memory management scheme was introduced that broke our code. After some fruitless discussions with the Xerces-C team and some internal discussions, we decided to fork, ripping out the new memory management until subsequent versions of “vanilla” Xerces-C survived the tests. I had some issues with the way the memory management code was designed, so I was not willing to put any time in it in order to fix it, and forking was less effort than repairing. I have since left the company I maintained the fork for and have been told the latest version of Xerces-C2 no longer poses the problem.

    The change from version 1.x to version 2.x was painful, but worth it: the team had been using 1.x for quite a while, but 2.x was ready and stable and, though there was some code changing involved, the result was better, faster and more stable code. The change from 2.4 to 2.5 was not intended to break anything except binary compatibility and was intended to allow for radical optimization by client code. In our experience, however, the change was painful and barely worth it as it rocked our confidence in the Xerces team and finally meant a fork, which in itself can be a costly, painful process.

    Now, version 3 is upon us. The lesson I have learned from going from version 1.x to version 2.x and from version 2.4 to version 2.5 is to take a wait-and-see approach, which is regrettably the wrong approach: the current version of Xerces-C2 is version 2.8.0, which Boris Kolpackov volunteered to maintain for exactly the right reasons:

    While there are preparations to release 3.0 soon, many existing applications won't be able to use 3.0 immediately because of the extensive API changes but would greatly benefit from a large number of bug fixes that have been committed to repository. I therefore have volunteered to be a release manager for 2.8.0.

    The effect of this is likely to be that many, many users will continue to use Xerces-C2 waiting for Xerces-C3 to mature which, because of the fact that Xerces-C2 implements a deprecated, non-standard DOM API, is a Bad Thing. It is one of the dangers of making breaking changes to the API of a very popular software package, though: deprecated versions live longer like that…

  • The Importance of Patterns

    When explaining the design of some application to some-one, I find the use of analogies is one of the best tools available to me - better than diagrams and much better than technical terms: when using technical terms, the listener often starts “glazing over” after only a few seconds - maybe a minute. It really serves no other purpose than showing off how smart you are - and that is usually a pretty stupid (and therefore self-defeating) thing to do.

    Using diagrams works well with engineers (and former engineers) because it seems to get to a part of their brain that is wired similarly to the analyst’s/architect’s brain. UML diagrams - especially sequence diagrams, I find - register very well with most people as they are very easy to understand to most, and easy to explain to most others.

    Analogies, on the other hand, seem to work with the vast majority of people. Explaining, for example, a message pump in terms of an actual pump helps give your listener an idea of what you’re talking about in physical terms - most people have an idea of how a pump might work - and, though the analogy is far from perfect, it’s a good start. The next step would be to transform the pump from something that pumps water into something closer to what a post-office distribution center might use to dispatch envelopes (but you may need to pass by grain conveyors and that kind of thing first, to go from water to a more solid substance). Once you get to the post office, you practically have your message pump already. If you start at the post office, however, for people who don’t know how the post office sorts and dispatches its letters, you may have lost them before you even started. The trick is really to make sure, at every step, that your listener is still following you.

    I recently had to explain one of my previous posts to a lay person - some-one with practically no knowledge of how computers work internally, at all. So, in order to explain what a magic number was, I had to explain what a magic number was for first, so I had to explain (without the benefit of having a debugger at hand) how debugging works, how we can look at the computer’s memory and what memory was. This went all the way down to what a bit is, what a byte is, and how you can make four bytes into a (double) word - all that to show how you can have an integer also “be” a four-letter word. The analogies I used started with the tiles on the (tiled) bar we have in our kitchen: each tile was a bit, which could either be empty (0) or have something on it (1). Then we went through the powers of two (to arrive at 256 possible values for eight bits) and we went back to the alphabet, via Morse code, assigning a number to each letter, etc.

    It took a while, but had I tried to use diagrams or (worse) only technical terms, it would have been a lot worse.

  • Naming conventions and name mangling

    In C++, any name that starts with an underscore followed by an uppercase letter and any name that contains two consecutive underscores are reserved for any use [], and any name that begins with an underscore is reserved in the global namespace. The intent for this, as explained by several people on comp.std.c++, is to allow C++ name mangling to result in valid names in C (because the two underscores restriction does not exist in C).

    A naming convention I have been using for a few years now includes a rule about scope: anything with member scope has one underscore at the end; anything with global scope (including static class members and enumerators) has two. Technically, this breaks the requirement of []. I have yet to find a standards-compliant compiler, however, that is not able to handle this correctly. If you know of one, let me know.

  • Using Four-Letter Words In Code

    When writing firmware and device drivers, it is useful, sometimes, to have human-readable integer values - i.e. integer values that, when you read them in a debugger, mean something distinctive. This is different from using integers that have a distinctive bit pattern so you can read them on a scope (e.g. 0xABABABAB, which is 10101011 repeated four times). So, when generating a new magic number, I usually use od, like this:

    $ echo -n {FOUR-LETTER-WORD} | od -t x1
    0000000 50 4f 4e 59

    which would render the magic number 0x504f4e59UL.
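
    The same packing can also be done in code. The sketch below uses a hypothetical helper, fourLetterWord, which is not an existing API; it assumes an ASCII execution character set and puts the first letter in the most significant byte, which is the order od prints the bytes in:

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Packs four characters into one 32-bit value, first character in the
    // most significant byte (assumes ASCII).
    constexpr std::uint32_t fourLetterWord(char a, char b, char c, char d)
    {
        return (static_cast< std::uint32_t >(static_cast< unsigned char >(a)) << 24) |
               (static_cast< std::uint32_t >(static_cast< unsigned char >(b)) << 16) |
               (static_cast< std::uint32_t >(static_cast< unsigned char >(c)) << 8) |
                static_cast< std::uint32_t >(static_cast< unsigned char >(d));
    }
    ```

    With this, fourLetterWord('C', 'O', 'D', 'E') yields 0x434f4445. Note that a raw memory dump on a little-endian machine shows the bytes of such a value in reverse order, so the word only reads naturally when the debugger displays the integer itself in hexadecimal.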

    Writing this in a piece of documentation often has the effect that the programmer who reads the documentation finds his imagination taking off: how many four-letter words does he know? What does 0x504f4e59UL mean? Is it R-rated or X-rated?

    Actually, it’s G-rated, as all magic numbers, and all technical documentation, should be. Try to figure it out, you’ll see.

    If you can’t figure it out, leave a comment and I’ll tell you.

  • Mail down - and back up again

    I changed my hosting provider a few days ago, which implied changing the DNS provider as well. As a result of this - and my forgetting to set the MX entry correctly, the mail service for was down. Michel was kind enough to notify me of this, so it’s been fixed this morning.

  • Name For Functionality, Not Type

    I just read a blog by Michel Fortin, where he quotes Joel On Software regarding Hungarian notation, or rather, Hungarian WartHogs. Naming a variable for its type, or a type for its location or namespace, is a mistake.

    I agree with Joel on his introduction: there are different levels of programmers and, at some point, your nose simply starts to itch when you see code that looks OK, but really isn’t. More than once (and I have witnesses to this fact) I have repaired bugs that we knew existed, but didn’t know where they were, simply by fixing a piece of code that didn’t “feel” right. For a few months that was a full-time job for me, in fact: I was to look over the shoulders of programmers debugging things and fix their bugs for them. Though I was really good at it, it’s not a great job to have to do every day.

    So, I agree that at some point, you start having an idea of what clean code should feel like, and you start trying to explain that to other people. If you’re coding in K&R C, then the original Hungarian Notation that Joel talks about may be a good path to go on. However, if you’re coding in a type-safe language, such as C99 or C++, Hungarian notation, whether it be the app-style or the system-style, is simply a mistake - and a very bad one.

    In case Joel reads this: no, I don’t think exceptions are the best invention since chocolate milkshake - and I don’t particularly like chocolate milkshake either. I don’t passionately hate Hungarian notation. I do think, however, that Hungarian notation is a mistake and that if you think you need it, there’s something you are doing wrong.

    The Example

    Joel gave us an example to get rid of cross-site scripting. I agree cross-site scripting is a problem, but it is a problem only if you don’t obey the rule that you should check what comes into your program with run-time checks - always. Anything you read from a file, a connection, a console, the command-line or any other place where a human being could possibly give you any kind of input, should be considered dirty until cleaned, and should be cleaned as soon as possible. You don’t need any special notation for this (such as us for unsafe string and ss for safe string). In fact, it is a mistake to do that because your name will lie to you. Consider the following code:

    s = Request("name")
    Write "Hello, " & Request("name")

    which Joel “corrected” into

    s = Request("name")
    Write "Hello, " & Encode(Request("name"))

    We agree on the problem of the first version of the code: it is vulnerable to cross-site scripting. We don’t agree on the solution - to encode the string when it is used. I.e., IMHO, the solution should be to make sure the string is never, or at least for as short a period as possible, in memory in an unsafe form. I.e., if there is no way to make sure that Request("name") returns an encoded (clean) string, the code should be

    s = Encode(Request("name"))

    Joel proposed this solution but rejected it because you might want to store the user’s input in a database. He’s right on that point - he’s also right to reject his second proposed solution, which is to encode anything that gets output to the HTML. His “real” solution is still wrong, however: the first proposed solution just needs a tweak.

    What you need, in this case, is a way to capture your user’s input, clean it and get it in a format that you can meaningfully store in a database and output back to the screen. IMHO, the best way to do that is to use a reversible clean-up method that puts the string in an intermediate form that you can store in the database, and from which you can convert to safely output it to HTML. The intermediate form should be easily recognizable for debugging purposes. I usually use Base64 for this. That way, if you forget to convert from your intermediate form, you are not vulnerable to XSS but you have a (clearly visible) bug. Your database isn’t vulnerable to XSS either, and you don’t need an extra way to make sure of that. Using Base64 makes the clean-up completely reversible. I concede that this is rather crude. The point is, though, that crude as it is, it precludes relying on style for the security of the application. Refining the method, wrapping it in an object type of some kind, for example, is straight-forward and comes with more advantages - and very few disadvantages.
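
    A minimal sketch of that idea follows, in C++ rather than in the scripting language of Joel’s example. The function names (toIntermediate, fromIntermediate, toHtml) are made up for the illustration, and the Base64 code is a bare-bones hand-written version rather than a library call:

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <stdexcept>
    #include <string>

    const std::string base64_alphabet(
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/");

    // Reversible clean-up: whatever the user typed, the result only contains
    // characters from the Base64 alphabet plus '=' padding.
    std::string toIntermediate(const std::string & raw)
    {
        std::string encoded;
        unsigned int buffer = 0;
        int bits = 0;
        for (std::size_t i = 0; i < raw.size(); ++i)
        {
            buffer = (buffer << 8) | static_cast< unsigned char >(raw[i]);
            bits += 8;
            while (bits >= 6)
            {
                bits -= 6;
                encoded += base64_alphabet[(buffer >> bits) & 0x3F];
            }
        }
        if (bits > 0) encoded += base64_alphabet[(buffer << (6 - bits)) & 0x3F];
        while (encoded.size() % 4 != 0) encoded += '=';
        return encoded;
    }

    // Back to the user's original text; throws if the input contains a
    // character that cannot occur in the intermediate form.
    std::string fromIntermediate(const std::string & encoded)
    {
        std::string raw;
        unsigned int buffer = 0;
        int bits = 0;
        for (std::size_t i = 0; i < encoded.size() && encoded[i] != '='; ++i)
        {
            const std::string::size_type value = base64_alphabet.find(encoded[i]);
            if (value == std::string::npos) throw std::invalid_argument("not in intermediate form");
            buffer = (buffer << 6) | static_cast< unsigned int >(value);
            bits += 6;
            if (bits >= 8)
            {
                bits -= 8;
                raw += static_cast< char >((buffer >> bits) & 0xFF);
            }
        }
        return raw;
    }

    // The only route from stored data to the page: decode, then escape.
    std::string toHtml(const std::string & encoded)
    {
        const std::string raw(fromIntermediate(encoded));
        std::string html;
        for (std::size_t i = 0; i < raw.size(); ++i)
        {
            switch (raw[i])
            {
            case '&' : html += "&amp;"; break;
            case '<' : html += "&lt;"; break;
            case '>' : html += "&gt;"; break;
            case '"' : html += "&quot;"; break;
            case '\'' : html += "&#39;"; break;
            default : html += raw[i];
            }
        }
        return html;
    }
    ```

    The input is passed through toIntermediate as soon as it is read, that form is what goes into the database, and toHtml is called at the point of output. Writing the intermediate form to the page by mistake shows Base64 gibberish, which is a bug you can see, not a script that runs.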

    The Fragility of Hungarian Notation

    Hungarian notation is fragile: you have to rely on the names of your variables to tell you something about their type. Even in the original Hungarian notation, there was no functionality-related information so Joel’s “us”, which contains an unsafe string, could be an unsafe string meaning absolutely anything. But that is not the only problem. Hungarian notation makes your code lie to you. Consider the following code:

    us = UsRequest("name")
    usName = us
    recordset("usName") = usName
    ' much later
    sName = SFromUs(recordset("usName"))
    WriteS sName

    which according to Joel is just dandy. That’s nice, until another programmer comes along and adds a second field:

    us = UsRequest("name")
    usName = us
    us = UsRequest("address")
    usAddress = us
    recordset("usName") = usName
    recordset("usAddress") = usAddress
    ' much later
    sName = SFromUs(recordset("usName"))
    sAddress = SFromUs(recordset("usAddress"))
    WriteS sName
    WriteS sAddress

    which is fine and dandy as well, but let’s say some-one introduces SRequest, which for some reason is more efficient than UsRequest and renders safe strings. The code is changed (under pressure) into this:

    us = SRequest("name")
    usName = us
    us = UsRequest("address")
    usAddress = us
    recordset("usName") = usName
    recordset("usAddress") = usAddress
    ' much later
    sName = recordset("usName")
    sAddress = recordset("usAddress")
    WriteS sName
    WriteS sAddress

    which means most of the code now lies to you.

    The code presented here is trivial and it is unlikely that this specific scenario will occur. However, scenarios like this occur every day, and more and more code is changed to lie to the reader.

    You need a style that doesn’t let your code lie to you - and Hungarian notation doesn’t qualify.

    Just one more example to drive the point home: in C and C++, the _t suffix traditionally implies that the name denotes a typedef.

    What is wchar_t?

    In C, it is a typedef.

    In C++, it is a built-in type, and the name lies about it.

    The functionality of a variable is very unlikely to change. When the code changes enough for a variable’s functionality to change, the variable is usually renamed because it doesn’t feel right to have a variable explicitly say one thing and do another - explicitly, not in some kind of code that you have to decipher. Use that and you’ll be a lot safer.

  • Hiding Complexity in C++

    C++ is a programming language that, aside from staying as close to the machine as possible (but no closer) and as close to C as possible (but no closer), allows the programmer to express abstraction in a few very elegant constructs. That is probably the one thing I like best about C++.

    This morning, while coding on a product for Vlinder Software, I had a function to write that was to handle at least ten different scenarios, which first had to be identified, and had subtle and not-so-subtle consequences, including, but not limited to, four scenarios in which the function had to recurse up the directory tree. The calling code is ignorant of these scenarios - and should be, for it doesn’t need to know about them. I didn’t want to expose the existence of these scenarios any more than strictly necessary, but I did want readable code. I.e., at the calling point, I just wanted this:

    HANDLE handle = createFCNHandle(monitor);

    This means createFCNHandle had to behave differently according to a set of flags in monitor, and the current state of the filesystem.

    I could have written one huge function with a few loops in it, or broken it up into a few functions that would live in a separate namespace and call that from createFCNHandle. That would’ve been a respectable way of implementing it (the latter, not the former). That’s not what I did, however: I decided to use two facilities that C++ offers that are underestimated most of the time, IMHO: the fact that you can construct an object in-place by calling its constructor, and the fact that you can overload the cast operator.

    Here’s what the code looks like:

    /* this is a pseudo-function: it's an object that gets created by invoking its
     * constructor and is then automatically cast to the intended return type. The
     * naming convention suggests it's a function for this purpose. The intent is to
     * be able to split the function's logic into parts without creating a whole
     * bunch of separate functions and thus putting things like enumerations in the
     * surrounding namespace. */
    struct createFCNHandle
    {
        /* When we're constructed, we may be in any one of the following situations:
         * * the monitor was asked to monitor a file that already exists
         *   in that case, we need to create a FCN handle for the file or,
         *   if the self_remove_and_subseq_create__ flag is set, for the
         *   directory above it;
         * * the monitor was asked to monitor a directory that already exists
         *   more or less the same case as above - the only difference being
         *   that we now need to pass the FILE_NOTIFY_CHANGE_DIR_NAME flag
         *   instead of the FILE_NOTIFY_CHANGE_FILE_NAME flag to
         *   FindFirstChangeNotification, but if we always pass both, there should
         *   be no problem
         * * the monitor was asked to monitor a file or directory that does not exist, in a
         *   directory that does exist
         *   in that case, we monitor the parent directory
         * * the monitor was asked to monitor a file or directory in a directory that does not exist
         *   in that case, we climb up the tree until we either hit the root, which may
         *   or may not exist (might be a non-existent drive in Windows) or we hit a
         *   directory that does exist, and monitor it. */
        enum Scenario {
            /* the enumerators used below; anything else is elided */
            monitor_existing_file_no_recurse__,
            monitor_existing_dir_no_recurse__,
            monitor_existing_file_with_recurse__,
            monitor_existing_dir_with_recurse__,
            monitor_non_existant_file_in_existing_dir__,
            monitor_existing_file_full_recurse__,
            monitor_existing_dir_full_recurse__,
            monitor_non_existant_file_in_non_existant_dir__
        };

        createFCNHandle(Monitor & monitor)
            : monitor_(monitor)
        {
            scenario_ = determineScenario(monitor.getPath());
            file_to_monitor_ = findFileToMonitor(scenario_, monitor.getPath());
            handle_ = FindFirstChangeNotification(/* ... */);
            if (handle_ != NULL)
            { /* no-op */ }
        }

        /* hands the handle over to the caller: this emulates the return value */
        operator HANDLE() const
        {
            HANDLE retval(handle_);
            handle_ = NULL;
            return retval;
        }

        Scenario determineScenario(const boost::filesystem::path & path)
        {
            /* ... */
        }

        boost::filesystem::path findFileToMonitor(Scenario scenario, const boost::filesystem::path & path)
        {
            using namespace boost::filesystem;
            switch (scenario)
            {
            case monitor_existing_file_no_recurse__ :
            case monitor_existing_dir_no_recurse__ :
                return path;
            case monitor_existing_file_with_recurse__ :
            case monitor_existing_dir_with_recurse__ :
            case monitor_non_existant_file_in_existing_dir__ :
                return path.branch_path();
            case monitor_existing_file_full_recurse__ :
            case monitor_existing_dir_full_recurse__ :
                return path.root_path();
            case monitor_non_existant_file_in_non_existant_dir__ :
                return findFileToMonitor(determineScenario(path.branch_path()), path.branch_path());
            default :
                throw std::logic_error("Un-treated case!");
            }
        }

        bool recursive() const
        {
            return (scenario_ != monitor_existing_file_no_recurse__ &&
                scenario_ != monitor_existing_dir_no_recurse__);
        }

        Monitor & monitor_;
        Scenario scenario_;
        mutable HANDLE handle_;
        boost::filesystem::path file_to_monitor_;
    };

    Though unimportant details have been removed from the code above, I think it’s pretty self-explanatory: the pseudo-function first tries to find out which scenario it is in, then finds the file it should attach the OS’s monitor to, attaches the monitor and is finished constructing. Where the pseudo-function is invoked, the object is constructed, after which the cast operator is invoked, which will emulate returning the value. Should the return value be ignored for some reason (and thus the cast operator not be invoked), the destructor will close the handle.
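
    Stripped of everything Windows-specific, the idiom itself fits in a few lines. The sketch below is a hypothetical, self-contained stand-in (describeNumber is made up for the illustration and has nothing to do with the file monitor); it leaves out the hand-over of ownership and the destructor, and only shows the constructor doing the work and the conversion operator emulating the return value:

    ```cpp
    #include <cassert>
    #include <string>

    // Hypothetical pseudo-function: constructed like a function call, then
    // converted to the "return type" by the conversion operator.
    struct describeNumber
    {
        // the enumeration stays inside the struct, out of the surrounding namespace
        enum Scenario { negative, zero, positive };

        explicit describeNumber(int value)
            : scenario_(determineScenario(value))
        { /* no-op */ }

        operator std::string() const
        {
            switch (scenario_)
            {
            case negative : return "negative";
            case zero : return "zero";
            default : return "positive";
            }
        }

    private :
        static Scenario determineScenario(int value)
        {
            return value < 0 ? negative : (value == 0 ? zero : positive);
        }

        Scenario scenario_;
    };
    ```

    At the calling point, std::string text = describeNumber(-3); reads like a function call, and the scenarios never leak out of the struct.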

  • Crime, Debugging and the Broken Window Rule

    In the late 1980s New York City was cleaned up from under the ground up: from 1984 to 1990, the New York subway was cleaned of its graffiti, then of its non-paying passengers. After that, when the chief of the New York transit police became the chief of the New York city police, the city was cleaned up in the same way, and crime rates dropped dramatically.

    The people responsible for this clean-up believed that taking care of the details - the petty crimes and broken windows - would dissuade the (potential) criminals from criminal behavior and, apparently, they were right.

    I think the same is true for code: the way the human psyche works, messy code doesn’t invite the programmer to behave him/herself but seems to rather invite him/her to let go of his/her restrictions and just add a few hooks and patches until things seem to stick together. When there’s no apparent discipline in the code as it was written, there usually won’t be any when there’s code being added or modified. When there’s no apparent structure in the existing code, there is rarely any after a necessary, but most likely unwanted, modification.

    I have seen this happen: I have seen inexperienced programmers get lost in huge piles of code and, through mere desperation, just start adding code to see if it works. I have subsequently seen the mediocre tests on those huge piles of code succeed, the software shipped and the fragile equilibrium that kept the thing together simply, but loudly, fall apart. This is how code becomes a maintenance nightmare.

    I believe that code should be readable, structured and neat. Neat meaning something like “tidy”, not like “cool” - though that’s OK too. When the code looks clean, the programmer will have a harder time messing it up, will be more inclined to do a good job and leave no visible traces of his passing - other than, perhaps, a bug that is no longer there. When the code is well-structured, even if that structure doesn’t allow for the modification the programmer would like to make, the programmer will try to retain the structure and will be less likely to hook things together that weren’t intended to be together. If the structure doesn’t allow for the thing the programmer is trying to accomplish, he will be more inclined to replace the structure with a better, more flexible one, rather than force the existing structure into obedience.

    Readable code, finally, doesn’t mean it should be littered with comments: code should definitely not try to chronicle its own history, as some “good practices” would have us do: there are tools to do that, such as Git, Subversion, CVS, MKS, SourceSafe, Bazaar, etc. Even using RCS directly is better than trying to do it in the code itself. Comments also have a tendency to lie about the code, so they should not be allowed to describe the code in any way, shape or form. No function should look like this:

    void foo(struct X x);
    void baz(struct X x);

    void bar(int m)
    {
        // create a VLA of Xs
        struct X xs[m];
        unsigned int n;
        // for each X in x, call foo
        for (n = 0; n < m; ++n)
        {
            baz(xs[n]);
        }
    }

    Not only do the comments not add anything that is actually useful, but they lie: if you look closely, the body of the loop calls baz, while the comment just above the loop said we’d call foo. In this case, the comment and the code were far enough from each other that the programmer who made the change changed the code, but not the comment.

    Comments should explain rationale, and in some cases contracts. They may be useful for generating documentation, but they aren’t useful for describing the code itself.

  • How Data Transport Should Work IMNSHO

    One of the most ubiquitous problems in software design is to get data from one place to another. When some-one starts writing code that does that, you seem to inevitably end up with spaghetti code that mixes the higher-level code, the content and the transport together in an awful mix that looks like a cheap weeks-old spaghetti that was left half-eaten and abandoned next to a couch somewhere. Now, I have never seen what that actually looks like, but I have a rather vivid imagination - and I’ll bet you have too.

    In my opinion, there is a right way to do it - and there are many, many wrong ways. The right way is trying to chop your data into messages and building a transport layer that is completely ignorant of those messages. The GS1 EPCGlobal standards, with which I have worked for the last few years, up until a few months ago, got this exactly right and since I first read their model in 2007, I have applied it on numerous occasions and have started to advocate it whenever there was a reason to do so. I have since refined a few aspects of it to better suit my purposes, so I think it’s about time I did some explaining.

    In my opinion, there is a single, universal way to get a bit of information - no matter the size or the contents - across from one location to another: you simply split the logic that you need to get it across into three layers: the Application Layer, the Message Layer and the Transport Layer.

    The Application Layer

    The Application Layer contains all the high-level logic that is basically not involved in getting data from one point to another and should not be aware that there is any other “point” that data might have to go to. All it is interested in is getting things done, no matter how it gets those things done. It reacts to events, which come in the form of messages, and it generates events, in the form of messages as well. It knows how to handle the contents of those messages and, on an API level, it knows how to extract the contents from the messages - i.e. it knows which methods to call on the Message object to get the contents out. What it does not know, and does not need to know, is where the message came from (in the case of events) or where the message is going to (in the case it generates them) - unless that has some semantic value, in which case it will get it from the message itself.

    When the application layer receives a message, the message has already been validated. That means that it doesn’t need to worry about the validity or the authenticity of the contents of the message: the only thing the Application Layer is concerned with is the semantic value of the contents of the message. I.e. if a message tells the Application Layer to do something and it is possible for it to do so, it should do so.


    From the Application Layer’s point of view, there are two ways for it to receive a message, which are semantically different from each other and, from its point of view, constitute two different channels: there is the Event Channel on which it will receive normal events that it needs to know about in order to perform specific actions, and there is the Exception Channel on which it will receive exceptional messages - such as alarms - that require immediate action. If the Application Layer is not concerned with exceptions or events, it will simply ignore the existence of these two channels altogether - i.e. if there is no need to know, it shall not know.

    A third channel, the Data Channel, is used by the Application Layer to emit queries. Those queries may or may not elicit a response and that response may or may not be delivered asynchronously.

    The Message-Transport Binder

    In order to subscribe to the event and exception channels and in order to use the data channel, the Application Layer uses an object called the Message-Transport Binder or MTB for short. This object, which is largely opaque to the Application Layer, knows a bit more about the Transport Layer and the Message Layer - i.e. it knows enough to bind them together and expose a coherent API to the Application Layer.

    The MTB, which would usually be a self-contained singleton, exposes at least the following methods:

    • send(Message): Message
    • send(Message, NoResponseTag): void
    • expect(Message, Message): void
    • attach(Channel, Observer): void
    • detach(Channel, Observer): void

    The first method sends a message and returns the resulting response; the second sends a message and doesn’t return anything; the third sends a message and expects another message as a response, and will raise an exception if the two don’t match; the fourth attaches an object as an observer to one of the two observable channels (the event channel and the exception channel) and the fifth detaches such an observer, providing a no-fail guarantee. The first four methods all provide a strong guarantee. (If implemented in a language that allows for return-type overloads, the first and second methods can be overloaded on return type rather than using a tag to distinguish them.)
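
    The five methods listed above could be sketched in C++ roughly as follows. This is only an illustration under assumptions: the Message, Observer and Channel types are hypothetical stand-ins, and LoopbackBinder (with its deliver test hook) is a toy that echoes queries instead of talking to a real transport.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for types the post leaves undefined.
struct Message { std::string contents; };
enum class Channel { Event, Exception };
struct NoResponseTag {};

struct Observer
{
    virtual ~Observer() = default;
    virtual void notify(Message const &message) = 0;
};

// The low-level API: one virtual function per method in the list.
class MessageTransportBinder
{
public:
    virtual ~MessageTransportBinder() = default;
    virtual Message send(Message const &query) = 0;
    virtual void send(Message const &message, NoResponseTag) = 0;
    virtual void expect(Message const &query, Message const &expected) = 0;
    virtual void attach(Channel channel, Observer *observer) = 0;
    virtual void detach(Channel channel, Observer *observer) noexcept = 0;
};

// Toy implementation: every query is answered with a copy of itself.
class LoopbackBinder : public MessageTransportBinder
{
public:
    Message send(Message const &query) override { return query; }
    void send(Message const &, NoResponseTag) override {}
    void expect(Message const &query, Message const &expected) override
    {
        if (send(query).contents != expected.contents)
            throw std::runtime_error("unexpected response");
    }
    void attach(Channel channel, Observer *observer) override
    {
        assert(observer != nullptr);
        observers(channel).push_back(observer);
    }
    void detach(Channel channel, Observer *observer) noexcept override
    {
        auto &list = observers(channel);
        list.erase(std::remove(list.begin(), list.end(), observer), list.end());
    }
    // Test hook (not part of the API): pretend a message arrived on a channel.
    void deliver(Channel channel, Message const &message)
    {
        for (Observer *observer : observers(channel)) observer->notify(message);
    }
private:
    std::vector<Observer *> &observers(Channel channel)
    {
        return channel == Channel::Event ? event_ : exception_;
    }
    std::vector<Observer *> event_;
    std::vector<Observer *> exception_;
};

// Observer that only counts what it receives.
struct CountingObserver : Observer
{
    int count = 0;
    void notify(Message const &) override { ++count; }
};
```

    Marking detach as noexcept is one way to express the no-fail guarantee in the type system; the strong guarantee of the other four methods has to be upheld by each implementation.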

    These five methods constitute the “low-level API” of the message-transport binder. This API is considered low-level because the Application Layer, in order to use this API, needs to know the Message Layer, because it needs to create its own messages. The MTB may also expose a higher-level API, for which the Application Layer need not know the Message Layer at all (because it would be used behind the scenes) but which would be specific to the application in question.

    The Message Layer

    Completely oblivious to the business logic of the Application Layer and as ignorant about the way the messages will be transported - which is the domain of the Transport Layer - the Message Layer is concerned with wrapping contents into a message that can be understood on both ends of the communication channels. It provides the tools to create a Message object that allows the Application Layer to extract the contents from the message and/or to wrap the contents into a message, and to serialize and deserialize a message, which allows the Message-Transport Binder to pass the message on to the Transport Layer without the Transport Layer knowing anything about messages, and vice-versa.

    The API consists of any number of overloads of a getMessage function, each of which returns an opaque Message object with the contents neatly tucked into it. Other than that, the Message Layer API consists of the Message type itself, a way to create association and validation masks for messages (for asynchronous message validation - see below) and a way to serialize/deserialize messages into a memory buffer. Hence, the Message Layer is partly aware of the higher-level communications protocol: it knows how a message is formatted (serialized), how a response is associated with a query and what query corresponds to what message. What it does not know is what those queries/messages mean, semantically (that’s what the Application Layer is about) nor how they are transported from one Application Layer to another (that’s what the Transport Layer is about).
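
    A rough sketch of such a Message Layer, under assumptions: the two getMessage overloads, the Kind tags and the wire format (one byte for the kind, one byte for the payload length, then the payload) are all made up for illustration and are not the format of any real protocol.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Hypothetical message kinds.
enum class Kind : std::uint8_t { SetId = 1, Text = 2 };

// Opaque to callers: only the Message Layer functions can construct one.
class Message
{
public:
    Kind kind() const { return kind_; }
    Buffer const &payload() const { return payload_; }
private:
    Message(Kind kind, Buffer payload) : kind_(kind), payload_(std::move(payload)) {}
    friend Message getMessage(std::uint32_t id);
    friend Message getMessage(std::string const &text);
    friend Message deserialize(Buffer const &buffer);
    Kind kind_;
    Buffer payload_;
};

// Overload 1: wrap a 32-bit ID, most significant byte first.
Message getMessage(std::uint32_t id)
{
    Buffer payload;
    for (int shift = 24; shift >= 0; shift -= 8)
        payload.push_back(static_cast<std::uint8_t>(id >> shift));
    return Message(Kind::SetId, payload);
}

// Overload 2: wrap a piece of text.
Message getMessage(std::string const &text)
{
    return Message(Kind::Text, Buffer(text.begin(), text.end()));
}

// Assumed wire format: [kind][payload length][payload bytes...].
Buffer serialize(Message const &message)
{
    assert(message.payload().size() <= 255); // limit of this toy format
    Buffer out;
    out.push_back(static_cast<std::uint8_t>(message.kind()));
    out.push_back(static_cast<std::uint8_t>(message.payload().size()));
    out.insert(out.end(), message.payload().begin(), message.payload().end());
    return out;
}

Message deserialize(Buffer const &buffer)
{
    if (buffer.size() < 2 || buffer.size() != 2u + buffer[1])
        throw std::invalid_argument("malformed message");
    return Message(static_cast<Kind>(buffer[0]), Buffer(buffer.begin() + 2, buffer.end()));
}
```

    With this split, switching to another wire format would only touch serialize and deserialize; getMessage and the Message accessors stay the same for the Application Layer.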

    The Transport Layer

    This is where we see our three channels again, though this time, the code in question doesn’t know, semantically, what those channels are about. The Transport Layer is completely oblivious to the contents, format and semantics of the messages it transports: it sees it simply as data that may be provided with a little bit of meta-data to allow it to perform some actions asynchronously - namely associating and validating response messages.

    The Transport Layer is the only part of the message transport that is concerned with things like Transport Layer Security (TLS, SSL, etc.), authentication (TLS and SSL again), TCP/IP, addressing, etc. While the Message-Transport Binder may know how to map a symbolic node name to an IP address, that is all that it would know about addressing. The Transport Layer, which may or may not be implemented as a device driver in some cases, knows how to get a message from address A to address B.

    It provides a similar API to the low-level MTB API:

    • send(Buffer, AssociationInfo): Buffer
    • send(Buffer): void
    • expect(Buffer, ValidationInfo): void
    • attach(Channel, Observer): void
    • detach(Channel, Observer): void

    The first method is provided with a buffer to send - which corresponds to the serialized message, but the Transport Layer doesn’t know that - and is given the meta-data necessary to associate an incoming message from the data channel with the message it sent. The AssociationInfo object contains three bits of data:

    1. an expected message length: any message that arrives on the data channel that is not of the required length cannot be associated with the message in question;
    2. an association mask and
    3. an association value

    Any message that arrives on the data channel (as a return message) is checked for its length after which the association mask is applied to the message. If the masked value corresponds to the association value, the message is returned as the return message for the one that was originally sent. The reason for this is that, although the Application Layer may not be interested in a response for certain messages, the other end of the communication (which receives those messages as an event) may return something (i.e. respond on their event channel, which is the data channel on our side). If those messages aren’t matched, they’ll be ignored but, in order to be able to ignore them, the Transport Layer needs to know how to associate the two.
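
    The association check described above can be sketched as a length test followed by a byte-wise mask-and-compare. The layout of AssociationInfo below is an assumption (the mask and value are taken to be exactly as long as the expected message), and isAssociated is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Assumed layout: mask and value have expectedLength bytes each.
struct AssociationInfo
{
    std::size_t expectedLength;
    Buffer mask;
    Buffer value;
};

// True if the incoming buffer can be the response to the message that was
// sent with this AssociationInfo. The contents are never interpreted.
bool isAssociated(Buffer const &incoming, AssociationInfo const &info)
{
    assert(info.mask.size() == info.expectedLength);
    assert(info.value.size() == info.expectedLength);
    if (incoming.size() != info.expectedLength)
    {
        return false; // wrong length: cannot be associated
    }
    for (std::size_t i = 0; i < incoming.size(); ++i)
    {
        if ((incoming[i] & info.mask[i]) != info.value[i])
        {
            return false; // masked bytes differ from the association value
        }
    }
    return true;
}
```

    In this sketch a mask byte of 0x00 means "don't care" for that position, so a response can be matched on, say, a sequence number in its header while its payload is left unconstrained.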

    The second method is similar to the first, but doesn’t take any association info, so the Transport Layer won’t try to get a response message and any response message that it does get will simply be ignored.

    The third method goes a step further than the first: it will not only associate the return message with the sent message but, once the association is done, will apply a second mask to the message (the validation mask) and will compare that with the validation value. If the two correspond, all is well. If not, an exception is raised. It is conceivable, in certain cases, for this exception to be delivered asynchronously - e.g. in the case where the Transport Layer is implemented in a device driver. In either case, the way the association and validation is done remains the same - and is actually done without any knowledge of what the message might mean. If the validation match isn’t successful, the raised exception will, of course, contain the received message - as long as the association was successful.
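
    A minimal sketch of that two-step check, assuming the ValidationInfo object simply carries two mask/value pairs. MaskedValue, ValidationError and checkResponse are hypothetical names, and the exception is raised synchronously here for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// A mask and the value the masked message has to be equal to.
struct MaskedValue
{
    Buffer mask;
    Buffer value;
};

// Assumed contents of ValidationInfo: association first, validation second.
struct ValidationInfo
{
    MaskedValue association;
    MaskedValue validation;
};

// Carries the received message, as the text requires.
struct ValidationError : std::runtime_error
{
    explicit ValidationError(Buffer message)
        : std::runtime_error("response failed validation")
        , received(std::move(message))
    {}
    Buffer received;
};

bool matches(Buffer const &message, MaskedValue const &expected)
{
    assert(expected.mask.size() == expected.value.size());
    if (message.size() != expected.mask.size()) return false;
    for (std::size_t i = 0; i < message.size(); ++i)
        if ((message[i] & expected.mask[i]) != expected.value[i]) return false;
    return true;
}

// Returns false if the message is not associated (and is to be ignored),
// true if it is associated and valid; throws if associated but invalid.
bool checkResponse(Buffer const &incoming, ValidationInfo const &info)
{
    if (!matches(incoming, info.association)) return false;
    if (!matches(incoming, info.validation)) throw ValidationError(incoming);
    return true;
}
```

    The exception is only thrown after the association succeeded, which matches the rule that it can only contain the received message in that case.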

    The fourth and fifth methods are exactly the same as they were for the Message-Transport Binder.


    This way of splitting the application (business) logic from the message layer and transport layer, the barrier in between being the Message-Transport Binder, allows for any type of message to be transported over any type of transport, the message having any type of meaning without any of the components being dependent on the other two: changing the transport affects only the Transport Layer and (very minimally) the Message-Transport Binder (which has to be linked to a new Transport Layer, with the same old API). Adding messages to the protocol affects the Message Layer and the little bit of code that actually uses the new message - which might be in the MTB or in the Application Layer. Changing the message format affects only the Message Layer - so going from, e.g., XML to a binary format is now a matter of hours (i.e. re-writing the serialize/deserialize functions), not days or weeks.

    As neither the Message-Transport Binder nor the Message Layer nor the Transport Layer are concerned with the application logic, they are not concerned with anything that might happen in that level either. Adding new actions for a given event, or ignoring events that were previously treated, is now an affair only of the Application Layer. The other two layers (and the MTB) are in no way concerned by any of that.

  • Google releases new dialect of Basic

    And here I thought Basic was on its way out: Microsoft has been touting the advantages of C# and .NET in general far more than they have the advantages of Visual Basic (I remember when it became “visual”: it used to be “quick” and that never said anything about run time); and Google seemed to be much more interested in Python and Java than they were in the whole Basic scene. In the circles I’ve frequented for the last several years, Basic was used only in ASP applications and then only if, for some reason, using C# was out of the question. Basic was basically legacy code that hadn’t been replaced yet.

    Now, Basic is back - and it’s Google that brought it back. As part of their Android platform, they’ve introduced Simple, a dialect of Basic that is apparently completely written in Java - and completely written by hand. I’ve taken a look at the code for the parser, written by Herbert Czymontek, who was formerly employed at Sun but now works for Google. At Sun, he worked on Semplice, a project to bring Visual Basic to the Java platform, so it only makes sense that at Google, he would continue on a similar line as he did at Sun (and before that at Borland): he already knew Visual Basic pretty well and, of course, Java as well.

    From the looks of it, Simple is bound tightly into the Android SDK, though the way the code seems to be set up, it should be possible to yank the Android out of there and make it a more general-purpose solution for the Java platform - which is what Semplice was originally meant for. According to this post, Semplice died when Mr. Czymontek left Sun - he just might have, at least partly, revived his old project but with a narrower scope - which would make it more feasible than Semplice would have been. At the very least, the scope now being limited to Android, he doesn’t have to try to support the whole Windows Forms API that most Visual Basic applications are bound to: Basic programmers (and non-programmers who want a quick and easy way to learn programming) will be able to re-use their (newly minted?) skills on their cellphones and won’t need the Windows Forms API to do anything useful with it. This could potentially open the Android platform to a whole bunch of people to whom it is currently not really accessible - like pure Windows programmers, ASP programmers, etc. who don’t know Java, might not know any C-style language and will now not have to learn one.

    Refs: Dr Dobb’s report of this item.

  • Testing QA

    During the development of the next version of Funky, version 1.4.00, I found a bug that hadn’t been picked up during the release process for 1.3.00. Though the bug was in a corner of the interpreter that was new to version 1.3.00 and didn’t cause anything too nasty - just a case where the interpreter rejects a script as invalid when it’s not - it does mean an actual bug got through QA. I hate it when that happens.

    So, now we’re testing QA to see if anything else got through and to see whether it’s “normal” that this one got through. The way we’re doing that is by launching a contest: if you find the bug - or any other bug for that matter, we’re giving away an unencumbered perpetual license to all current dialects of Funky - and if you fix the bug, we’ll pay you $50 Canadian.

    Seeing the nature of the bug, which is really very minor (a script that gets rejected as invalid by the interpreter, but is such a corner case that we might as well have argued that it is really invalid) we decided we can do that with this particular bug: we don’t need to bring out a new version of Funky that fixes the bug right away (except to registered users, who might actually want to use the missing feature at some point). So we took this as an opportunity to test QA.

    The idea is that if you can’t find the bug with an incentive to do so, but we can find the bug, though not during the right release process (after all, I found it during the development process for the next version, but we should’ve picked it up in QA) our process might need a bit of a review so the bug will get caught next time (i.e. a post-mortem review) but we don’t need a major overhaul. If bugs start popping out of the woodwork during the contest, we definitely need to take a closer look at QA.

  • Fixing mistakes

    I just finished debugging a very, very nasty problem, which took me the better part of two hours to find and, once found, only a few minutes to fix. In this case, I have no one to blame but myself, so I really shouldn’t complain too loudly, but I thought it was worth mentioning anyway, to show what can happen if you break the One Definition Rule.

    Let’s get a bit of context first: I am currently writing a piece of firmware and the simulator that goes with it - the simulator allows me, and will later allow other programmers, to test the software that talks to the firmware without having the hardware that goes with it. As many firmwares are, this one is written in a mixture of C and C++ - mostly C - and uses a bunch of structures and unions. The firmware and the simulator, though they use the same source code, do not use the same compiler: the simulator uses Microsoft Visual Studio’s compiler whereas the firmware uses the GNU Compiler Collection with a few settings that, among other things, make sure that unions work correctly. On Microsoft’s compiler, the code uses #pragma pack to make sure unions are aligned correctly. This is where the trouble begins.

    In C++, the One Definition Rule states that a structure, class, function, etc. cannot have more than one definition. Within a single translation unit, the compiler needs to diagnose a violation; across translation units, no diagnostic is required and breaking the rule simply causes undefined behavior. I made a few notes about this before. My programming style usually prevents me from breaking the ODR: I am very careful with things - including pragmas - that change the alignment or that might change their definition according to context. In template code - where this is perhaps most likely to happen if something isn’t defined in one translation unit but is in another - I usually make sure that everything that needs to be defined is defined, using a static assertion (but only if there’s a need: i.e. only if there might be something that might not be defined). This time, I made a silly mistake.

    The firmware is based on a small RTOS that comes with the hardware and that includes a file called “predef.h”. Almost all files in the firmware need to include this file, because it contains a lot of useful things from the RTOS - and most other RTOS headers include the file too. In the simulator code, there’s a set of stub headers that includes a version of predef.h. When writing the stub for predef.h, I mistakenly assumed that it would be included in all files in the firmware - not almost all files. That’s error number 1.

    Error number 2 was being lazy. Instead of putting the pragmas in the right location, where they should be and where I ended up putting them anyway - first because I forgot about error number 1, then because I had to undo error number 2 - I added a line to my predef.h:

    #pragma pack(1)

    This causes the Microsoft compiler to align everything after it encounters this pragma on a one-byte boundary, effectively packing it all together as much as possible.
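
    To make the effect concrete, here is a small sketch with two hypothetical structures that have the same members, one with default alignment and one packed. The static assertions assume a mainstream compiler (MSVC, GCC or Clang all accept this pragma syntax) on a platform where a 32-bit integer is normally aligned on more than one byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Default alignment: padding is inserted after tag so value is aligned.
struct Natural
{
    std::uint8_t tag;
    std::uint32_t value;
};

#pragma pack(push)
#pragma pack(1)
// Packed on a one-byte boundary: no padding at all.
struct Packed
{
    std::uint8_t tag;
    std::uint32_t value;
};
#pragma pack(pop)

static_assert(sizeof(Packed) == 5, "packed: 1 byte + 4 bytes, no padding");
static_assert(offsetof(Packed, value) == 1, "value directly follows tag");
static_assert(sizeof(Natural) > sizeof(Packed), "default layout adds padding");
```

    If one translation unit sees the first layout and another sees the second for what is supposed to be the same type, every member offset after the first byte differs between the two.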

    All was fine and dandy, compiled and ran merrily, until I started testing a bit more thoroughly. Then, I started pulling my hair out. For some reason, calling a very simple method on a class, which was to set the instance’s ID, didn’t seem to work at all: it said it set the ID, but when I read it back later, it hadn’t done anything: the ID was what the constructor had made it - namely -1. Adding traces to the source code didn’t help much: it said it put in the new ID (0, 1, 2, …) every time.

    Then, I noticed something odd: as I usually do when I add traces in C++, I output this - the pointer to the instance I’m working on. The odd thing was that when the IDs were first set, the value for this for the first instance was 0x004e37c0. When another method was called on the same instance a bit later, this was 0x004e39b2 - 510 bytes further away. The object in question was 9557 bytes in size, so this was moved somewhere within the object itself.

    My first guess was that I had done something wrong when declaring the object’s class, which inherited from two other classes, so I dug up some documentation, counted the base classes’ members, figured out how big they should be, added some traces, found the same size (4 and 13 bytes, resp. - nowhere near the 510 bytes I was seeing) and came to the conclusion that this could not be it. Then, inspiration struck: “wait a minute,” I thought: “I must be breaking the One-Definition Rule on something!”.

    The object in question is part of a structure located in mapped memory in the firmware, which I retrieve using a macro which, in the simulator, actually calls a function that returns a pointer to a static instance of the same thing. In that structure there are several other objects as well - some of them before my object, some of them after. I decided to test my theory by moving my object to the start of my memory-mapped structure. Lo and behold, this no longer moved. I was now sure that I had broken the One-Definition Rule, but on what type? I decided to move my object one step at a time towards the end of my structure. As soon as this started moving again, I would have found the culprit. The next test I did, putting the object right behind the head of an internal FIFO, bang! it moved again.

    The FIFO in question has nodes that look a bit like this:

    struct FIFONode
    {
    	void * reserved_;
    	uint32_t magic_;
    	union Payload_
    	{
    		PayloadType1 p1_;
    		PayloadType2 p2_;
    		PayloadType3 p3_;
    	} payload_;
    	bool allocated_;
    };

    When I saw this code, I thought “well that’s odd: I didn’t add a pragma pack to this union - but it’s been working anyway!” so I added the pragma - but saw no effect. I then added some more traces to see whether the size was consistent with what it should be - and it was. Something else was going on.

    Then, it dawned on me that I usually do this:

    #pragma pack(push)
    #pragma pack(1)
    #pragma pack(pop)

    but that I might have forgotten a pop somewhere. That’s when I did a solution-wide search for “pack”.

    Luckily, I did a solution-wide search and not just a project-wide search: the way the simulator is set up, the firmware’s code is in a library that is linked into the simulator, and is therefore in its own project in Visual Studio, as is the interface and the stubs (so three projects in total). Had I done a project-wide search, all my pushes would have had pops and no pack(1) would have been outside a push and a pop - because I would only have looked in the firmware’s code. Now, my search included the stubs, which contained the one naked line:

    #pragma pack(1)

    I removed the line, tested again and found this no longer moved. I put the object at the end of my memory-mapped structure to be sure - it still didn’t move. The mistake was fixed - the One Definition Rule obeyed once more.
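
    One possible safeguard against this kind of mistake, sketched under assumptions, is a "canary" that can be dropped into a header after the last pop: it declares a tiny probe structure and refuses to compile if packing is still in effect at that point. PACK_CANARY and WireHeader are hypothetical names, and the check assumes the build does not set a global packing option (such as a one-byte default on the command line).

```cpp
#include <cassert>
#include <cstdint>

// Declares a probe struct at the point of use. Without packing in effect,
// the probe is as strictly aligned as its 32-bit member; under a stray
// #pragma pack(1) its alignment drops to 1 and the assertion fires.
#define PACK_CANARY(name)                                         \
    struct name { std::uint8_t a; std::uint32_t b; };             \
    static_assert(alignof(name) == alignof(std::uint32_t),        \
                  "unexpected #pragma pack in effect: missing pop?")

#pragma pack(push)
#pragma pack(1)
// Deliberately packed structure, properly bracketed by push and pop.
struct WireHeader
{
    std::uint8_t kind;
    std::uint32_t length;
};
#pragma pack(pop)

// Compiles only if the pop above restored the default packing.
PACK_CANARY(CanaryAfterWireHeader);
```

    Placing such a canary at the end of a stub header like predef.h would have turned the two-hour hunt into a compile-time error, at least on compilers that apply the pragma to the probe in the same way.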

  • Working on a programming language

    Like a warm spring breeze writing is to summer's dawn as language to dusk
  • Culture and working internationally

    When autumn turns hence to where winter must come forth spring awaits summer
  • Google Chrome OS: Promising - but promising what, exactly?

    is this coming spring or is't autumn in disguise? spring doth promise much!
  • Funky, functional programming and looping

    functional combines programming summers into sheerly fun coding
  • Storing data in an optical illusion

    For the past five years now, I’ve worked on vision inspection systems for the pharmaceutical industry. In those years, I have seen many applications in which cameras were used to read data on bottles, cartons, even tablets. Barcodes can be printed almost anywhere and can be of almost any size. One application I’ve worked on - with a whole bunch of other people, of course - had Optel vision systems inspect datamatrix 2D barcodes with ten digits in them (a 12x12 ECC200 datamatrix) printed on only 3x3 mm on the neckband of a vial. The system had to be able to inspect several dozens of these a minute, using VGA resolution cameras - and they were small enough that it was hard to find them if you didn’t know where they were. Let’s just say this was one of the more challenging systems.

    Now, researchers at MIT have come up with a new way to print data matrix barcodes, called Bokodes: the barcodes are printed in a 3mm dot that looks like little more than a blot but, if looked at with a camera that’s set out of focus, contains one or more data matrix barcodes. With that, the code will not just be hard to find - it will be all but impossible to find if you don’t know it’s there.

    I’m wondering what this will be used for: storing information in almost anything is already possible, using RFID, for example. Even if Bokodes are very small and, for the naked eye, difficult to identify, it is a line-of-sight technology. Unlike RFID, which allows the Wal-Mart cashier to see that you have a bottle of blue pills in your pocket, a camera has to be in the line of sight of the Bokode, not just in proximity.

    The authors suggest using it for positioning, encoding position information in each of the data matrices encoded in the Bokode, and show that it can be very accurate for that purpose. Use in research, and perhaps off-the-shelf video games, may be obvious uses that could be developed in the short term. I wonder, however, what’s next? What would the privacy concerns for this be?

    Comments welcome.

  • Critical sections - of what?

    a glass of water may sometimes have a storm, but blizzards there are rare
  • A glimmer of hope on comp.std.c++

    Sun to early spring is like snow is to autumn: unexpected joy
  • Ah - The One Definition Rule

    In response to Scott Meyers’ question on non-inline non-template functions and the one-definition rule, Francis Glassborow replied with a very interesting example of two lexically identical functions that weren’t actually identical.

  • Recursive Locking Is Evil, or is it?

    recursive locking: winter's way of saying "yes", to summer's loud "no"
  • No Concepts in C++0x

    Sadness of winter decided this summer, when no concept survived
  • Why you shouldn't inflate your resume

    Many people inflate their resumes when they apply for a job. When I’m on the hiring side of the equation, I tend to frown upon such practices: though I usually don’t care much about references, I do check the outliers. But what I check more is expertise - and that’s something I can’t stand inflation on.

  • @msofficeus @fixoutlook - what's the big deal?

    flabbergasted I long for winter in summer? perhaps just autumn
  • Seven?

    Clarification as clarity of summer - yet not unlike spring.
  • How to become an expert

    Many summers spent on coding and on code, yet... expertise attained?
  • The IKEA Approach

    spring cleaning brings it perhaps not any cleaner - at least much leaner
  • A new blog

    the first of summer the very last of winter? the first of blog posts
