ICS Security: Current and Future Focus

The flurry of DNP3-related vulnerabilities reported to ICS-CERT as part of Automatak’s project Robus seems to have subsided a bit, so it may be time to take a look at where we are regarding ICS security, and where we might be going next.

Of course, I’ll only look at communications protocol security in this context: low-tech attacks on the grid[1] are outside the scope of this article. Instead, I will take a look at two questions: why the focus on DNP3, and what else the researchers could, and should, be looking at.

The current focus on DNP3

There are two questions we can ask about the current focus on DNP3: why should we focus on DNP3?, and why did Crain and Sistrunk focus on DNP3? But before we ask those questions, we should get a quick idea of what DNP3 is.

DNP3 in a nutshell

DNP3 is an immensely complex protocol. The IEEE standard that defines it, IEEE Std™ 1815-2012, is over 700 pages long and still leaves parts of the standard to the imagination of the implementer, which is why the DNP3 Technical Committee members write Application Notes and Technical Bulletins, which are then debated in bi-weekly telecon meetings and in lengthy e-mails that continue through holidays, weekends and well into the night, come hell or high water.

DNP3 also has a long history, dating back to “before rocks were old”[2], which is partly described on the DNP user’s group’s website.

This means that in order to implement DNP3, you need a lot of code. One partial implementation of DNP3 is OpenDNP3[3], which I downloaded from GitHub and ran cloc on for this post:


     691 text files.
     684 unique files.                                          
      40 files ignored.

http://cloc.sourceforge.net v 1.55  T=4.0 s (162.5 files/s, 22578.0 lines/s)
Language                     files          blank        comment           code
C++                            244           6039           6101          21064
XSLT                             2             57            456          12515
C/C++ Header                   258           5081           9196          11838
HTML                             1             16              0           3391
XML                              5             34            205           2341
Java                            90            660           2713           1824
m4                              13            190             15           1409
ASP.Net                          2            442              0           1133
C#                              27            265           1488           1102
make                             3             26              3            341
MSBuild scripts                  3              0             21            198
Scala                            1             24              0            114
Bourne Shell                     1              1              0              9
SUM:                           650          12835          20198          57279

In over 57,000 lines of code, there’s bound to be a bug or two. Even if only half of those lines actually implement DNP3, some of those bugs may well affect device performance.
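As a rough check on that "half" estimate, the code counts from the cloc output above can be tallied. This is only a sketch: treating the C++ and C/C++ header rows as "the DNP3 stack" is my assumption for illustration (bindings, build scripts and documentation are left out), not something the project itself states.

```python
# Code-line counts copied from the cloc output above.
code_lines = {
    "C++": 21064,
    "XSLT": 12515,
    "C/C++ Header": 11838,
    "HTML": 3391,
    "XML": 2341,
    "Java": 1824,
    "m4": 1409,
    "ASP.Net": 1133,
    "C#": 1102,
    "make": 341,
    "MSBuild scripts": 198,
    "Scala": 114,
    "Bourne Shell": 9,
}

total = sum(code_lines.values())

# Assumption: only the C++ sources and headers count as "the DNP3 stack".
stack = code_lines["C++"] + code_lines["C/C++ Header"]
share = stack / total

print(f"total code lines: {total}")
print(f"C++ and headers:  {stack} ({share:.0%} of the total)")
```

Under that assumption, a little more than half of the counted lines belong to the C++ stack, which is in line with the estimate above.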

Why should we focus on DNP3?

  1. DNP3 is very widely used in the industry
  2. DNP3 is one of the few SCADA protocols that actually have security-related features
  3. DNP3 is well-defined
  4. DNP3 has a very active, receptive and open technical committee with very well-versed members who dedicate a lot of time and effort to making the protocol better, and better understood

In North America, DNP3 is used by the vast majority of utilities and supported by a large number of devices (probably the majority as well, but the installed base is huge and very old, so I won’t venture to assert that it is a majority).

DNP3 is one of few SCADA protocols that have any security-related features — namely Secure Authentication. Most SCADA protocols work without any security at all.

DNP3 is well-defined, so if anything is found to be wrong with the protocol itself (which has not been the case so far) it can be fixed. Devices are tested for compliance using compliance tests published by the DNP3 User’s Group. Each device comes with a machine-readable Device Profile which states what the device can and cannot do, which parts of the standard it supports and to what extent, and which version of the compliance tests was used to test it.

The technical committee is security-aware. Its members know what they’re doing and are very actively seeking out issues with the protocol and with users’ understanding of the protocol. For example, as a response to the Crain-Sistrunk vulnerabilities, the Technical Committee drafted an Application Note which is publicly available and which describes how one should go about validating incoming DNP3 data.
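To give an idea of what validating incoming data can look like, here is a minimal Python sketch that checks a DNP3 link-layer header before trusting any of its fields. It is not taken from the Application Note and does not reflect any particular stack: the function names `crc16_dnp` and `parse_link_header` are hypothetical, and the sketch only covers the fixed header and the overall frame size, not the application layer.

```python
import struct

START_BYTES = b"\x05\x64"  # every link-layer frame starts with 0x05 0x64
HEADER_SIZE = 10           # start(2) + length(1) + control(1) + dest(2) + src(2) + CRC(2)
MIN_LENGTH = 5             # control + destination + source, with no user data


def crc16_dnp(data: bytes) -> int:
    """CRC-16/DNP: polynomial 0x3D65 (reflected form 0xA6BC), result inverted."""
    crc = 0x0000
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA6BC if crc & 1 else crc >> 1
    return crc ^ 0xFFFF


def parse_link_header(frame: bytes) -> dict:
    """Reject anything malformed before using a single field of the header."""
    if len(frame) < HEADER_SIZE:
        raise ValueError("frame shorter than a link-layer header")
    if frame[:2] != START_BYTES:
        raise ValueError("bad start bytes")
    length = frame[2]
    if length < MIN_LENGTH:
        raise ValueError("length field below minimum")
    (received_crc,) = struct.unpack_from("<H", frame, 8)
    if received_crc != crc16_dnp(frame[:8]):
        raise ValueError("header CRC mismatch")
    # Each block of up to 16 user-data octets is followed by its own 2-octet CRC.
    user_octets = length - MIN_LENGTH
    expected = HEADER_SIZE + user_octets + 2 * ((user_octets + 15) // 16)
    if len(frame) != expected:
        raise ValueError("frame size does not match length field")
    control, destination, source = struct.unpack_from("<BHH", frame, 3)
    return {"length": length, "control": control,
            "destination": destination, "source": source}


# Usage: build a header-only frame, then parse it.
header = bytes([0x05, 0x64, 0x05, 0xC0, 0x01, 0x00, 0x00, 0x04])
frame = header + struct.pack("<H", crc16_dnp(header))
print(parse_link_header(frame))
```

The point of the ordering is that every check happens before the corresponding value is used: the length octet is never used as an index or a size until it has been compared against both its legal minimum and the number of octets that actually arrived.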

This means that

  1. finding security issues is likely because of the large amount of code needed to implement the protocol
  2. finding security issues will have a positive impact on the safety and security of a critical part of the infrastructure throughout North America
  3. finding security issues is likely to lead to a fix of those security issues (because the user-base of the protocol is active and the technical committee is security-aware and inclined to be receptive)
  4. security issues are likely to be the result of implementation issues rather than specification issues (because the technical committee is both security-aware and technically savvy, as well as pragmatic w.r.t. what the standard should look like)

Why did Crain and Sistrunk focus on DNP3?

I see two obvious reasons why Adam Crain and Chris Sistrunk would focus their efforts on DNP3:

  1. Adam Crain wrote a large part of the OpenDNP3 stack and makes money off it.
    Part of his efforts in looking for security bugs stems from debugging OpenDNP3.
  2. Both Adam Crain and Chris Sistrunk work in the field of the Smart Grid.
    DNP3 is an important (i.e. widely-used) SCADA protocol that they both know well. Additionally, Chris Sistrunk has access to a wide variety of devices he can run tests on.

There are also a few less obvious, and perhaps less important, issues to consider:

  • DNP3 is a large and complex protocol. There are bound to be bugs in any implementation (so this is relatively low-hanging fruit).
  • Security issues have not been taken particularly seriously in the smart grid historically.
    From Adam’s tweet concerning two posts on this blog, one might surmise he is somewhat frustrated with this state of affairs.

  • Disclosing these issues has led to the Aegis Consortium, which will pay for the continued development of the fuzzing tool used to test the DNP3 stacks and devices, and will therefore pay for its developers’ livelihood for the foreseeable future.

This is by no means intended to be cynical: I think his apparent frustration with the lack of interest in security is justified, and I think making money off one’s own effort is nothing anyone should be ashamed of, or consider shameful in someone else — as long as those efforts are honorable.

In the case of Crain and Sistrunk, I believe they uphold a very high standard of “honorable” in terms of so-called “white hat hacking”. So far, they have done everything right:

  1. they have practiced “responsible disclosure”:
    • they have contacted the DNP3 Technical Committee with their findings, without specifically naming any vendors to the vendors on the committee
    • they have contacted ICS-CERT with their findings, providing the necessary information to the vendors to fix their problems free of charge
    • their smart fuzzer tool was only made available after vendors had had a chance to fix their implementations and utilities had had a chance to deploy them[4]
    • their smart fuzzer tool is available to vendors (though not for free)

    They have all the information they need to attempt to extort money from vendors, sell vulnerabilities on the black market to the highest bidder (as zero-day exploits), etc. As far as I can tell, they have done none of that and have instead taken the (rather less lucrative, but far more ethical) path of responsible disclosure all the way.

  2. they have been careful to protect the reputations of all involved:
    • they have made it very clear that the problems they have found so far are implementation problems, not protocol problems
    • they have made themselves available to the DNP3 Technical Committee, to help draft documentation to prevent future vulnerabilities, but have made their detailed findings available only to non-vendor members of the committee (so no vendor knows of the vulnerabilities of other vendors, which would cause conflicts of interest)
    • they do not publish the names of affected vendors/devices until ICS-CERT publishes the advisory, at which time the vendor will have had time to prepare a response

What’s next?

Among the protocols that are also often used in critical infrastructure, a few stand out:

  • Modbus (Modicon)
    Widely used, old, fairly simple, but often implemented ad hoc. There are bound to be security issues and robustness issues to be found with a smart fuzzer.
  • IEC 61850 and GOOSE
    Complex, widely used (especially in Europe). Excellent candidate for fuzzing.
  • IEC 60870-5-101/104
    The protocol DNP3 was originally spun off from. Fairly widely used, as complex as DNP3, similar in design to DNP3.
  • any of the large number of home-grown vendor-specific, device-specific protocols
    A “Metasploit for ICSs” would be wonderful to have.

Some of these are already on the list of protocols the Aegis people intend to look at (their choice of IEC 60870 protocols is 60870-6 rather than 60870-5). I would be surprised if they don’t find anything interesting — which means there should be vulnerabilities being discovered by this project for several years to come.
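For readers who have not seen fuzzing up close, the idea can be sketched in a few lines of Python. Everything here is hypothetical and purely illustrative: `naive_parse` is a toy parser with a deliberately planted bug, not code from any real stack, and `fuzz` simply tries every possible value for every octet of a seed frame. This is not how the Aegis tool works; a "smart" fuzzer knows the protocol's structure and is far more selective.

```python
def naive_parse(frame: bytes) -> bytes:
    """Toy parser (hypothetical): trusts the length octet it is given."""
    if frame[:2] != b"\x05\x64":
        raise ValueError("bad start bytes")  # a clean, expected rejection
    length = frame[2]
    payload = frame[3:]
    out = bytearray()
    # Planted bug: reads `length` octets without checking how many arrived.
    for i in range(length):
        out.append(payload[i])
    return bytes(out)


def fuzz(parser, seed: bytes) -> list:
    """Replace each octet of the seed with every possible value and
    record inputs that make the parser fail in an unplanned way."""
    crashes = []
    for pos in range(len(seed)):
        for value in range(256):
            mutated = seed[:pos] + bytes([value]) + seed[pos + 1:]
            try:
                parser(mutated)
            except ValueError:
                pass  # the parser rejected the input cleanly
            except Exception as exc:
                crashes.append((mutated, type(exc).__name__))
    return crashes


seed = b"\x05\x64\x03abc"  # start bytes, length 3, three payload octets
crashes = fuzz(naive_parse, seed)
print(f"{len(crashes)} crashing inputs found")
print("first example:", crashes[0])
```

In Python the planted bug surfaces as a harmless exception; in a C or C++ stack running on a device, the same mistake would be a read past the end of a buffer, which is exactly the kind of defect that can take a device down.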

In the interest of full disclosure, I should indicate (at least in this page) that I work, among others, for Eaton’s Cooper Power Systems EAS, which is the manufacturer of one of the devices and vendor of another software solution which were subject to these advisories. I was involved in the response to those advisories. I am also a (non-voting) member of the DNP3 Technical Committee.
That said, the opinions I expressed in this article are my own. I am not getting paid for writing this and, to the best of my knowledge, everything in this post is truthful.

2014-07-10: link to the DNP3 Application Note corrected — thanks to Chris Sistrunk for pointing out the broken link
2014-07-10: hand-drawn illustration of the balloon hack replaced with one made in SketchUp. Thanks to Cadyou user Anthea16 for sharing a beautiful drawing of a pylon

  1. e.g. letting two helium-filled balloons up with a wire between them, under a high-voltage power line, in order to cause a short between the phases
    Balloon hack illustration — don’t do this!

  2. this is a direct quote from one of the earliest members of the DNP3 Technical Committee, but I don’t remember which one
  3. I say “partial” because they don’t implement DNP3 SAv5 yet, and there are probably some other features missing as well; I don’t know much about this particular implementation
  4. though the time to deploy the fixes was very short: due to power consumption cycles, not every time is a good time to plan an outage to update firmware, as this chart from Statistics Canada illustrates:

    Power Generation per month in Canada


About rlc

Software Analyst in embedded systems and C++, C and VHDL developer, I specialize in security, communications protocols and time synchronization, and am interested in concurrency, generic meta-programming and functional programming and their practical applications. I take a pragmatic approach to project management, focusing on the management of risk and scope. I have over two decades of experience as a software professional and a background in science.

6 Responses to ICS Security: Current and Future Focus

  1. Adam Crain says:

    Good post. I think you pretty much hit the nail on the head. I’ll offer a little insight into motivations.

    Opendnp3 is about showing the industry that competing over “plumbing” is counter-productive, especially given the size and complexity of current standards. The project for me is a labor of love and an attempt to create a self-sustaining open source project in an industry that has very little OSS activity. We’ve had great adoption over the past year, and I’m very excited to bring the project to feature completeness. After SAv5, we’ll add the features needed for WITS. For the record, I barely make enough money off opendnp3 in a year to pay a single developer salary. Perhaps when it is more heavily used, I can sell support contracts. This would also probably make the project more attractive to larger vendors.

    This does not mean that I think that proprietary source code vendors are somehow evil. IMO it’s good for end users and vendors to have options available that don’t involve lock-in and offer a proactive approach to software security/robustness. My primary criticism of vendors has not been that they don’t open source their software. Nor has it been that their products are riddled with low-hanging defects. The only criticism I’ve had of vendors is how some have handled vulnerability reports. Some have shined in this area, whereas a small few have lashed out inappropriately. I really don’t care how people get to secure code, but with limited budgets I see a consortium / OSS approach as a more viable possibility than the two current options: write your own or use an expensive library from a company that apparently has little security posture.

    With regard to the fuzzer, it was written originally to test opendnp3, not for actively deconstructing other implementations. It was used for that later, but only via serendipity. I have actually talked to the technical committee on and off about fuzzing since 2010 when I gave them a “peach pit” for the Peach Fuzzing Platform. I repeated this in early 2013 by open sourcing what is now Aegis and talking on the technical committee conference call about it. I really don’t think anyone “got it” at that time. Once you show people defects in dozens of products, it changes things.

    60870-5-103/104 is an excellent candidate. All of the protocols you mentioned will have the same issues:

    I talked about the size of specification vs defects at some length here:

    • rlc says:

      Hi Adam,

      Thanks for your comments: they help shed some light where I was merely speculating.

      I do wonder how opendnp3 shows the industry that competing over plumbing is counter-productive: before opendnp3, since ASE stopped selling their stack, adding DNP3 support to your device means either rolling your own stack (feasible, but a lot of work) or buying Triangle’s stack (expensive, but maybe less so than rolling your own). So the way I see it, opendnp3 is all about competing over plumbing — and very effectively so as opendnp3 is basically free.

      As I mentioned in earlier posts [1, 2] the industry still has some way to go w.r.t. security awareness. I honestly don’t know how wide-spread good practices regarding security in development are yet, but there does seem to be a trend brewing: Cooper’s SMP Gateways are all Achilles certified and regularly fuzzed, GE just bought Wurldtech so I’m guessing they’ll be updating their devices as well, Triangle re-vamped their website and seem to be improving their stance on security, etc.

      With the DNP3 Tech Committee also revisiting conformance certification (not security-related), your TB on Octet String g110v0 coming out soon (security-related), my TB on g114/g115 hopefully coming out by the end of the year (more or less security-related), and EPRI setting up a multi-vendor Secure Authentication demo, the DNP3 UG is certainly playing ball.

      Hopefully, all utilities will start asking questions (security is more than authentication and robustness, after all). Them being the customers, they (and ultimately we, consumers) have to pay the bills to get where we need to go…

      • Adam Crain says:

        With regard to competing over plumbing, consider this analogy. Where would we be if every vendor rolled their own TCP stack?

        DNP3 is core plumbing in this industry, just like TCP is core plumbing. You have to implement it to sell a product. It’s incredibly hard to get right because of its complexity, and even more work to maintain.

        Why then do we have 30 implementations, most of which are non-compliant and full of bugs? This seems like a huge waste of resources to me in a sector that badly needs to be focusing limited development resources on other things.

        Instead of building or buying, there’s now a “contributing and sharing” option. Not all vendors will take it, but a growing number already have.

        This demonstrates a 3rd option: Vendors could open source their protocol implementation in the hope of building a self sustaining community to support it. This is what my ex-employer did, and for their investment others have carried the codebase along and improved it. This is why Google, Netflix, and other large technology firms open source projects that are “infrastructure”. They can tap and benefit from others with the same itch. If it’s not part of your core business model, why not? Clearly selling protocol stacks is TMW’s core business model, but how about Schneider, GE, ABB, Cooper, Siemens?

        This is a growing trend in the broader technology space, and perhaps it’s naive to assume the same thing can happen in this space, but you never know.

  2. Jake Brodsky says:

    Yes, the DNP3 protocol standard is subtle and complex because event-oriented SCADA concepts are subtle and complex. We need to remember what this protocol was designed to overcome: low bandwidth, high latency communications using memory limited, low performance, low power devices in the field.

    Computing capacity has grown exponentially at a furious rate while power requirements shrank since the DNP3 protocol was conceived in the early 1990s. Bandwidth to the remote station is also getting better at an exponential rate, though latencies are probably more inconsistent than ever before. Thus, many of the assumptions behind the original protocol design goals are no longer as true as they once were.

    For example, remote devices can maintain their own clocks with better reliability and more accurately than they could in the past. The need to convey the time through a DNP3 connection is not the requirement that it once was.

    So, yes, there are some old features still lurking in DNP3. Some of these “forgotten” features are a problem for those who are seeking protocol security. And in fact, I pointed out long before I became a member of the Technical Committee that any attempt to “bolt on” security after the fact is bound to be a less effective solution than a completely new system designed for security from the ground up.

    Despite the “bolt-on” approach, the DNP3 secure authentication effort is the best anyone has produced for a SCADA system to date. We chose the secure authentication approach because at the time it was conceived, we were concerned that the export of cryptographic software in a commercial product might subject vendors and users alike to munition export laws. The use of a Virtual Private Network in an embedded device was not nearly as commonplace ten years ago as it is today.

    The problem with secure authentication is that it will only work as well as the rest of the software in the protocol stack does. In other words, if the message parsing software goes off into the weeds due to a deliberately malformed packet, no amount of authentication can save it.

    For this reason I recommend that SCADA users take a “Belt and Suspenders” approach by using a Virtual Private Network appliance in addition to the authentication. The authentication could be used as an audit trail for who did what and when, while the VPN is used to keep the casual miscreants at bay.

    I would be lying if I said that the fuzzing exercises were an entirely pleasant experience. However, I believe we are stronger for it. I am also thankful for Adam Crain and Chris Sistrunk’s efforts to maintain high ethical standards while dealing with a few less than stellar responses. It is my hope that the community has learned from this experience and that there will be a more aggressive, reasonable, and focused response to future reports.

    Finally, for those who are smug enough to believe that their protocol stack is immune to such flaw discovery, I have this advice: Fuzz your own products and fix your problems before someone else does it for you. The former is much less expensive in the long run than the latter. What is more: Utility users like me will notice and appreciate the aggressive stance toward security and stability.

    • rlc says:

      Hi Jake,

      Thanks for your comments: it helps having the perspective of someone working for a utility.

      Perhaps I should have been clearer to this point, but I rather like the DNP3 protocol for its event-oriented design: the ability to be notified of events you’re interested in, and only those you’re interested in (think unsolicited responses and dual end-points) can be harnessed for system design well beyond what one might be able to do with some other protocols (which shall remain nameless) while still allowing event polls for events you’re less interested in but still want to know about.

      Funny you should mention clocks: I’ve been immersed in IRIG-B, C37.118.1, 1344 and 1588 for over three years now. DNP3’s ability to update system time (reasonably accurate to within a few milliseconds if done right) means you don’t have to rely on a second protocol (like 61850 does). I don’t think that is, nor should be, a forgotten feature, even if devices are getting better at keeping time.

      I think the “forgotten features” that also happen to be dangerous are g102, g110, g111, g112 and g113: octets, octet strings and virtual terminals. None of these are full-fledged objects and none of these should be useful. I can think of a number of ways virtual terminals could be abused, and without TLS, even their foreseen use is a gaping security hole.

      I agree that belts and suspenders are good, but I would personally prefer DNP3-SAv5 over TLS rather than over VPN. Then again, perhaps (likely) I don’t know enough about VPN solutions in the utility space…

      • Jake Brodsky says:

        I used the example of time synchronization because many are discovering that, between issues of variable network latency (which DNP3 does not handle well) and the inexpensive radio clocks of various flavors, they would choose to ignore this feature in a new installation.

        I could have cited many other things, but this is an easy feature to discuss because it does not require intimate knowledge of DNP3 nor does it expose a roadmap for people to hack existing installations. I do not like to air dirty laundry unnecessarily.

        As for a VPN appliance, I prefer that approach because it can be updated and re-keyed independently of the RTU. It is a completely orthogonal platform. If it were compromised, secure authentication could provide some temporary security to prevent abuse of the link. Conversely, if the SA features were compromised, the VPN would remain a barrier to keep the public at bay. It also goes without saying that key distribution for each of these devices should be separate and independent.

        This issue is one performance-related test that I look forward to seeing in the upcoming EPRI tests.
