S4x19 – ICS Detection Challenge 2.0: again, a technical overview

This challenge was performed independently of my employer, and the views/perspectives expressed here are my own; they may not represent those of my colleagues or employer. The PCAPs and the attacks cannot be released – sorry.

In continuation of the ICS Detection Challenge (read more here) started in 2018, we had big boots to fill, and several political aspects combined with the “points” controversy to overcome. The debut was the first time we had kicked off an event of this magnitude, and certainly, as with any sport/competition, it takes 100+ years of continual improvement to nail down the path forward. S4x19 was an absolute success from my perspective despite the limited number of participants (FOSS, Kaspersky ICS and Dragos). Many new discussions kicked off, and I believe onlookers, participants, asset owners and all parties had an overall win-win as an outcome.

The challenge, at a high level, is characterized by a large dataset: 3.5 hours of real-world data captured from 7 zones within a large copper mine. This resulted in 400GB of network traffic in PCAP format for the organizers to use as the basis of an asset detection challenge, but also as the foundation of a multi-stage attack. Then, using tools developed by my previous company, I deep-anonymized the captured traffic to a reasonable level given time constraints, characterized and profiled assets, mapped an attack flow with multiple stages and an “end-game” target, and injected a realistic attack customized to the network.

Once the readied attack dataset was uploaded, it was distributed to competitors along with a neat note from the “attacker”. Competitors had roughly five (5) days to review the data and fill out a reference judging sheet (scores were not calculated, but used to validate their findings).

The results were then presented at S4x19, with a recording to be released at a future date (will update). Both vendors’ videos are uploaded here and here. Points were removed NOT because of last year's flak (and Dale Peterson’s Twitter), but because a competition does not exist with two (2) vendors. In fact, there was an objective scorecard, similar to a Scantron with drop-downs, but that’s not the way the ICS detection challenge played out.

Regardless of who, what, and how competitors participated – three (3) major areas for improvement were identified at S4x18 and improved upon in S4x19:

  • Scoring & judging (done objectively, and it could not reflect bonus points for REAL NDA’d/sensitive items – no points for you)
  • Complexity of the attacks (bigger, better, harder real-world attacks with noise)
  • Validating vendor claims (again, helping the community and asset owners move forward)

Beyond the above three points, we also need to acknowledge these other facets and perspectives:

  • Competitor interests (e.g., reputation, risk/marketing, etc.)
  • Industry needs (e.g., one tool does not rule them all, and highlight that fact)
  • Asset owners looking to purchase (e.g., “if you are not in the competition, you are likely not a product they will purchase” - said a prominent OT security lead in mining)
  • Domain knowledge (S4x18 was O&G, S4x19 was mining)
  • Security is holistic (you need bodies behind desks, incident response (IR), policy, reporting, ticket integrations and more – OT should also possess many of those processes/activities as part of better engineering for the organization)

I’ll let Dale Peterson enunciate most of those latter points (and the reasoning for the change of challenge format), but from my perspective, the industry needed a neutral party to test claims made by vendors, but also to allow vendors to safely participate in new domains, to learn where potential gotchas exist (reducing risk for future deployments and asset owners), to provide an opportunity for “free” training, and, quite frankly, to give vendors free destructive product testing (which the audience won’t see). Asset owners themselves get several benefits too, such as a free audit/feedback, plus my interpretations of their product and feedback based on my experiences/industry involvement.

Besides the wins noted above, my motivation is to demonstrate for the community that custom real-world live-fire exercises are possible, and to push the boundaries of what red-team/table-top exercises can deliver while remaining safe, informal, and mutually beneficial; I’m hoping people recognise the value and applications of the approach, and the achievement of being among the first to demonstrate a nation-state-style attack in the public domain. When Dale and company asked me (Ron) to run the challenge again, I chose to accept it with intensity and integrity, but also with the deviousness an attacker would apply. It wasn’t to “outsmart” the vendors, but to make things realistic and create a pseudo-benchmark for the industry (this was a comment by one event attendee, and it’s fair/objective in my opinion).

Developing the challenge at a high-level

Without giving away the secret sauce, some of the other volunteers and I have had many years of practice in this domain. Much of this intellectual property is embedded as knowledge/expertise, but also as custom or in-house developed tools. Both of those, though, rely on the most important aspects of the challenge: context and asset profiles.

To derive the overall context of the challenge within the industry/dataset domain, I needed to research the industry, the devices deployed, the OT processes, and the organization’s history, including acquisitions and mergers. These are all key elements that allow me to devise the term list for anonymization, but also to get an adequate feel for what would be conducive to a realistic and sophisticated attack. For the S4x19 challenge, I was provided merely an XLS with asset names, sparse asset details and IP addresses; no architecture information was provided, which allowed me to model the efforts of an attacker realistically, at the expense of time/effort.
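The anonymization step described above can be thought of as consistent pseudonymization: every sensitive address or organization-specific term from the term list is replaced with a substitute derived from a keyed hash, so the same real value always maps to the same fake value and traffic relationships survive the scrubbing. The sketch below is a minimal illustration of that idea, not the actual tooling; the key, term list, and output subnet are all hypothetical.

```python
import hashlib
import hmac
import ipaddress

SECRET_KEY = b"challenge-secret"  # hypothetical key; real tooling would manage this carefully


def pseudonymize_ip(ip: str) -> str:
    """Map a real IPv4 address to a stable, fake address inside 10.0.0.0/8."""
    digest = hmac.new(SECRET_KEY, ip.encode(), hashlib.sha256).digest()
    # Keep 24 bits of the digest: deterministic per input, hard to reverse.
    suffix = int.from_bytes(digest[:3], "big")
    return str(ipaddress.IPv4Address((10 << 24) | suffix))


def scrub_terms(text: str, terms: list) -> str:
    """Replace organization-specific terms with stable placeholder tokens."""
    for term in terms:
        token = "ORG-" + hmac.new(SECRET_KEY, term.lower().encode(),
                                  hashlib.sha256).hexdigest()[:8].upper()
        text = text.replace(term, token)
    return text


# Determinism is the point: the same host keeps the same pseudonym everywhere,
# so flows in the anonymized dataset still correlate.
assert pseudonymize_ip("192.168.10.5") == pseudonymize_ip("192.168.10.5")
```

The deterministic mapping is what makes the anonymized PCAPs still usable for detection: a competitor can track one (fake) host across zones without ever learning the real address.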

Let’s look at an example. I have over 4000 devices, servers, hosts and other networked nodes on a network. To build an attack, I need to:

  1. Map the communication flows between devices, zones and network captures
  2. Understand what data is flowing between those points, and over gateways/routers
  3. Profile key-assets and derive their “classification” to be used as part of the attack flow

I believe the first two points are easy to understand, but the last point is difficult because the profiler needs to look at all available traffic and running applications, and apply intuition to generate a realistic attack launchpad or recipient. For example, a commercial printer from vendor X is not going to generate industrial network traffic such as Modbus, but a Dell Windows workstation hosting a Human Machine Interface (HMI) and programming applications would.
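The three steps above can be sketched as a simple accounting pass over decoded packets: aggregate bytes per (source, destination, port) edge to map flows, then use the industrial ports a host speaks as a crude classification signal. The packet tuples and the port-to-protocol table below are illustrative assumptions, not the actual profiler.

```python
from collections import defaultdict

# Stand-ins for decoded packets: (src_ip, dst_ip, dst_port, byte_count).
packets = [
    ("10.0.0.5", "10.0.0.9", 502, 260),    # Modbus/TCP
    ("10.0.0.5", "10.0.0.9", 502, 180),
    ("10.0.0.7", "10.0.0.9", 44818, 400),  # EtherNet/IP
    ("10.0.0.2", "10.0.0.5", 3389, 900),   # RDP: IT traffic, not ICS
]

# Steps 1 and 2: map communication flows and measure what crosses each edge.
flows = defaultdict(int)
for src, dst, dport, size in packets:
    flows[(src, dst, dport)] += size

# Step 3 (crudely): classify hosts by the industrial protocols they speak.
ICS_PORTS = {502: "modbus", 44818: "ethernet-ip", 20000: "dnp3"}  # assumed subset
profiles = defaultdict(set)
for (src, dst, dport) in flows:
    if dport in ICS_PORTS:
        profiles[src].add("ics-client:" + ICS_PORTS[dport])
        profiles[dst].add("ics-server:" + ICS_PORTS[dport])
```

In this toy dataset, 10.0.0.9 profiles as an ICS server (it answers Modbus and EtherNet/IP), while 10.0.0.2 only speaks RDP and gets no ICS role; real profiling layers application-level inspection and human intuition on top of this.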

All of these aspects have to look realistic and plausible. Some attacks might look sloppy, as in real life, and some might be absolutely meticulous, as would be executed by a true professional, including realistic threat behaviours. This includes the end-goal, which in S4x19 was the powerhouse (I’ll withhold my reasons for that target; I chose it based on research, and have been warned against discussing it in detail due to sensitivity given its critical nature). Furthermore, real networks and their various segments will often have noise, whether intentional (e.g., OT firmware changes), accidental (e.g., acceptable-use/best-practice violations), or malicious (e.g., malware infections). These too need to be considered and added to the dataset. This is hard, and part of my expertise (I’d love to be engaged to execute these types of mandates or deliver live-fire events).

Once you have all of those aspects and attack samples (which also need to be modified for relevance and matched to identified hosts), all of the data samples need to be merged (I use the term: stitched) into the master dataset(s) while maintaining the validity and integrity of the original PCAPs. You can’t merge PCAPs “butt-to-butt”; they have to be stitched with jitter, have bit-rates normalized, and be woven into the master. Unfortunately, from an I/O resource perspective, this is quite resource-intensive and time-consuming.
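Conceptually, the stitching step rebases an injected capture onto the master timeline, adds small timing jitter so inter-packet gaps look organic rather than mechanically spliced, and merges by timestamp so packet ordering stays valid. The sketch below operates on simple (timestamp, payload) tuples rather than real pcap records, and the jitter bound is an illustrative assumption.

```python
import heapq
import random


def rebase(capture, anchor, jitter_ms=2.0, seed=1):
    """Shift a capture so it starts at `anchor`, adding small timing jitter."""
    rng = random.Random(seed)
    t0 = capture[0][0]
    out = []
    last = anchor
    for ts, payload in capture:
        t = anchor + (ts - t0) + rng.uniform(0, jitter_ms / 1000.0)
        t = max(t, last)  # timestamps must never go backwards in a valid pcap
        out.append((t, payload))
        last = t
    return out


def stitch(master, injected, anchor):
    """Weave an injected (attack) capture into the master, ordered by time."""
    return list(heapq.merge(master, rebase(injected, anchor), key=lambda r: r[0]))


# Toy master traffic and a two-stage attack starting at t=100.8 on its clock.
master = [(100.0, b"noise1"), (101.0, b"noise2"), (105.0, b"noise3")]
attack = [(0.0, b"stage1"), (0.5, b"stage2")]
merged = stitch(master, attack, anchor=100.8)
```

Real stitching also has to rewrite pcap record headers, normalize bit-rates, and keep checksums and TCP sequence state consistent, which is where the I/O and compute cost comes from; this sketch only shows the timeline-weaving idea.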

Executing the event

Previously, I replayed the challenge’s network traffic (3.5GB) on an industrial switch that was capable of replaying traffic like a hub. The same format over two days was originally going to be applied, but due to changing circumstances, the event execution was altered. This year, we distributed the PCAPs via a filedrop (130GB), allowed the contestants to review the PCAPs over five (5) days, and had them present their findings using their platform in a twenty (20) minute slot.

The results were not scored, but contestants provided the videos/judging sheets as reference for myself and my colleague (Marc-Etienne) to review. This allowed us to match the events understood by competitors against the actually injected attack(s), and track their investigative progress towards the “end-game”.

Then, over a one (1) hour session on the main stage, Dale Peterson and I invited the contestants (except for the FOSS team) to illustrate their results with a video rendered the day before, and to discuss their findings within a twenty (20) minute window for each vendor. Dale and I moderated, and eventually we opened the floor for a friendly and safe discussion to close.

Closing notes and outcomes

I’ll re-iterate my belief that the S4x19 challenge was a milestone of improvement in computation, size, complexity, and positive outcomes, even though several vendors did not participate (I respect their decision, but I think they could have participated and communicated their concerns).

Dragos, Kaspersky ICS, and the FOSS team garnered experience and lessons learned. Furthermore, I learned a lot about a new industry, made new acquaintances, and enhanced my tools/approach for building a unique/boutique live-fire attack. Information was shared safely and respectfully, participants handled outcomes with grace, they were tested, mistakes were noted and passed along, and asset owners truly got to see a real-world test of an almost blind challenge. To add to the blindness, Dale only saw the attack outline and related complexity the evening before the results were shown on stage! Competitors had known the scoring card and methodology for some time as well, although I was never present for those discussions with the competitors.

Today, in the ICS/SCADA/critical-system security sphere, there is an absence of benchmarks to test products and vendor claims. Some attendees suggested watering down the difficulty of the challenge, and I would disagree – we executed this challenge neutrally to ensure the industry moves forward agnostic of business influence. I do believe that vendors should have a say in the presentation of results (after all, their brand is potentially at risk, and it is an investment), but the industry/asset owners need these kinds of events to spur progress, discussion and innovation. It was almost an open ISAC, and while whether Dale’s S4 is the right place for such an event is up for debate (as mentioned by a large vendor not participating), it is an industry achievement. Period.

On a personal note, I invested 400-500 hours of my own unpaid personal time in this event. Others invested varying amounts of time beyond their daily responsibilities too. Despite all of us having some emotional attachment to this challenge, I believe these kinds of events and industry engagement are absolutely necessary. There will always be winners, losers, naysayers and so on, but in the spirit of professionally improving overall industry safety/reliability – I take that feedback humbly, and will certainly try to integrate it back into any future events; it’s a work in progress, after all.

I thank and highly appreciate the asset owner, vendors, organizers, volunteers, supporters, and attendees for allowing us to run this event. I look forward to more in the future (but please don’t make me do it for free again – your support is required). Please follow up for more coverage, and reach out on how you can constructively and positively participate/engage us in the future.
