A Cyber Security Professional’s Quick Start Guide To Operational Technology

A cold, hard truth exists for conventional career paths in cyber security, and that is that it’s not quite as sexy as one may believe. Media makes a point to portray it largely as sophisticated hackers using their skills to break into government systems or take down nation-sponsored threats. While very exciting and technical jobs certainly exist, depictions of them conveniently leave out the endless hours spent in back-to-back meetings, mountains of documentation and grueling regulatory requirements. These characteristics can make infosec a particularly nuanced member of the greater Information Technology community. I recently found myself engaged in an even further distinguished sector of the field prominently known as “Operational Technology” cyber security, where the rules are basically the same but entirely different from more business-focused corporate-y security. This article will dive into the most important aspects of OT and serve as a reference guide for OT Cyber Security professionals.

“Operational Technology” is a broad term that refers largely to IT environments that support critical infrastructure assets and processes, such as those found in the industrial sector. Some industries encompassed by this umbrella are energy, water, manufacturing, transportation and the like. As such, it is common to hear the technologies themselves described as Industrial Control Systems (ICS). These systems as a whole enable the ability to remotely control, diagnose and maintain machinery used in the larger industrial sector through a variety of networking protocols, some of which are specific to the OT environment. Key components of ICS Operations include the following:

Sensors – ingest data from controlled processes and send it to Controllers for further action
Controllers – interpret the data provided by Sensors and use it to issue commands to targeted Actuators
Actuators – machinery such as valves, switches, motors and breakers which directly manipulate controlled processes
Operators – human workers using various interfaces to configure, monitor and maintain the system as a whole
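
To make the relationship between these components concrete, here is a minimal sketch of a single control loop in Python. Everything in it is an assumption made for illustration: the tank-level scenario, the names `Valve`, `read_sensor`, `controller` and `control_step`, and the simple proportional gain. Real controllers (PLCs, RTUs) are programmed very differently, so treat this as a mental model rather than an implementation.

```python
from dataclasses import dataclass


@dataclass
class Valve:
    """Actuator: a valve whose opening ranges from 0.0 (closed) to 1.0 (fully open)."""
    opening: float = 0.0


def read_sensor(tank_level_m: float) -> float:
    """Sensor: report the measured tank level (passed through unchanged here)."""
    return tank_level_m


def controller(level_m: float, setpoint_m: float, gain: float = 0.5) -> float:
    """Controller: proportional control, clamped to the valve's physical range."""
    command = gain * (setpoint_m - level_m)
    return max(0.0, min(1.0, command))


def control_step(tank_level_m: float, setpoint_m: float, valve: Valve) -> float:
    """One pass of the loop: sense, decide, actuate."""
    measured = read_sensor(tank_level_m)
    valve.opening = controller(measured, setpoint_m)
    return valve.opening


# The Operator's role is represented by the chosen setpoint (2.0 m, hypothetical).
valve = Valve()
print(control_step(1.0, 2.0, valve))  # level below setpoint, valve opens: 0.5
print(control_step(2.5, 2.0, valve))  # level above setpoint, valve closes: 0.0
```

Note that an attacker who can tamper with either the sensor reading or the setpoint changes what the actuator physically does, which is why integrity ranks so highly in OT.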

Previous generations of industrial control systems relied heavily on specialized equipment, though progress made in computing technologies over the last several decades has allowed for a convergence of traditional IT platforms with industrial machinery. These converged platforms, commonly known as Supervisory Control and Data Acquisition (SCADA) systems, have allowed for improved functionality while easing the budgetary requirements of companies. The caveat here, as one would guess, is that OT/IT convergence introduces all of the same requirements of a traditional IT environment to the more sensitive OT space (plus so many more, but we’ll discuss that further when we get to regulatory bodies). Patching, security, maintenance and life cycling, support, you name it; if it needs to be done in IT, it’s now conventional for OT. We, of course, are focusing primarily on the security aspect. As mentioned before, the rules on the operational side of the fence closely resemble their IT counterparts, but with differing priorities and additional precautions. The best place to start, I’d argue, is any professional’s best friend: The CIA Triad. “Traditional IT security objectives (heavily influenced by the banking and financial sectors) typically follow the priorities of confidentiality, integrity and availability. In the case of control systems, and particularly electric networks, the consequences of a security breach are very different and therefore the priorities are different. The combined importance of availability and integrity within an OT system mean that nothing must be done on the active control systems network that would interfere or disrupt the time-critical operations of the system. In the control systems environment, the security objectives of the IT world are replaced by human health and safety, availability of the system, and timeliness of integrity of the data.
Due to these differences, while the OT and IT domains often use similar or identical technology, differences in focus between the two domains drives the need for specific industry-aligned approaches appropriate to cyber security for the OT domain” (Ausgrid, 2019).

Given the scope and severity that integration brings, it’s vitally important that an organization’s level of readiness has fully matured as a result of in-depth and thoughtful planning prior to adopting IT systems in its OT environments. A few factors to consider before determining readiness include management’s understanding of such an integration, scope, and opportunities for assimilation. As with anything that determines the future state of an organization, senior management has to be knowledgeable and supportive of any changes being made to OT. They must understand the implications of a convergence with IT systems, and they have to be aware of the risks associated with not providing the resources necessary to manage them successfully. Next, the organization must decide what it considers, or will consider, OT internally. Finally, it should evaluate opportunities for assimilation with regard to technology and personnel.

Prior to exploring threats that exist to SCADA systems, it’s worth revisiting the aftermath created by exploitation. Anyone who’s ever studied for any sort of security certification has likely heard of the 2010 Stuxnet incident, a computer worm which leveraged a Windows-based vulnerability to cause physical damage to the Iranian Bushehr and Natanz nuclear facilities. In his TED Talk, cyber security researcher Ralph Langner describes Stuxnet as “a lab rat that didn’t like their cheese.” That is to say that, over the course of the three years he spent researching and reverse engineering the malware, he realized that he and his team were looking through the wrong pair of spectacles. Rather than looking for signs of data exfiltration or system control, it became apparent that Stuxnet was preying on control systems and forcing them to slowly ramp up the spin speed of nuclear centrifuges. Over time this would cause them to crack or, in extreme situations, even explode. Langner speculates that the incentive behind this complicated cyber attack was that unconfirmed rival nations wanted to prevent Iran from developing nuclear weapons of its own. Reports on the incident indicated an infection of over 200,000 devices and the physical degradation of 1,000 industrial machines.

NIST outlines four potential threat sources that may intentionally or inadvertently impact ICS. The example above encompasses one of these threat sources: Adversarial. These types of threats are the ones we generally think of when we hear about attacks or other events in the security space, and include actors such as insider threats, APTs, nation-sponsored attackers, competitors and so on. They generally aim to exploit the organization’s cyber resources and use those assets with malicious intent.

The second threat source is Accidental, incurred generally by an insider or business partner through erroneous or accidental actions. Examples may include a system administrator unintentionally taking down a critical system or a contractor incorrectly installing machinery.

Next are Structural threats such as equipment failure, resource depletion, widespread outages and so on.

Lastly, Environmental threats are those caused by external, natural or man-made forces such as earthquakes, violent storms, powerful winds, telecommunication/electric outages and acts of terrorism.

As with IT cyber threats, the above hazards to Operational Technology environments aren’t worth much in isolation – they require precise exploitation of the following vulnerabilities:

Policy and Procedure – Oh, you built your business right on the San Andreas Fault but never took the time to build out a business continuity plan? Have fun with that one. Another peculiar issue more common in OT is SCADA system owners’ inclination towards keeping systems running at all costs. Security objectives are often seen as being at odds with this effort, and so a trusting and communicative relationship must be encouraged between SCADA and security teams.
Architecture and Design – Major technical oversights such as forgoing MFA, placing your SCADA devices on the same domain as all of your corporate assets, opening RDP to the world so you can more easily manage your domain controller in your pajamas (unironically this is disturbingly common for small businesses).
Configuration and Maintenance – Avoiding Patch Tuesday, exposing that 20-year-old unsupported proprietary application that only works on Windows XP to the network. Bonus points if it controls a dam that can flood the city.
Physical – Making your bottom line look better by rescinding that project to build security perimeters around all your substations. Now they’re popular rural hot spots for locals to do all sorts of unsavory activities.
Software Development – Think all of the OWASP Top 10 no-nos. Basically your developers spending too much time on r/programmerhumor and less time implementing input validation.
Communication and Network – Anything that voids the 99.999% uptime pact you agreed to. Or, no, maybe that was your ISP.
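
Several of the items above (unsupported operating systems, remote access exposed to the internet) can be caught with a simple pass over an asset inventory. The sketch below is illustrative only: the `Asset` structure, the contents of `UNSUPPORTED_OS` and `RISKY_EXPOSED_PORTS`, and the sample hosts are all hypothetical and would need to come from your own inventory and vendor support data.

```python
from dataclasses import dataclass

# Hypothetical reference data; populate from vendor end-of-support notices.
UNSUPPORTED_OS = {"Windows XP", "Windows 7"}
# Remote administration services that should never face the internet.
RISKY_EXPOSED_PORTS = {23: "telnet", 3389: "RDP"}


@dataclass
class Asset:
    name: str
    os: str
    internet_exposed_ports: tuple = ()


def audit(asset: Asset) -> list:
    """Return a list of human-readable findings for one asset."""
    findings = []
    if asset.os in UNSUPPORTED_OS:
        findings.append(f"{asset.name}: unsupported OS ({asset.os})")
    for port in asset.internet_exposed_ports:
        if port in RISKY_EXPOSED_PORTS:
            service = RISKY_EXPOSED_PORTS[port]
            findings.append(f"{asset.name}: {service} exposed to the internet")
    return findings


# Hypothetical inventory entries.
inventory = [
    Asset("dam-hmi-01", "Windows XP"),
    Asset("dc-01", "Windows Server 2022", internet_exposed_ports=(3389,)),
    Asset("eng-ws-07", "Windows 11"),
]
for asset in inventory:
    for finding in audit(asset):
        print(finding)
```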

Prior to the exploration of secure architecture in OT it’s important to be cognizant of the methods by which attackers gain access to these systems. This section is akin to similar material preached over and over again in the industry so I’d understand if the more seasoned readers wanted to skip over this. Courtesy of CISA, here’s a quick TL;DR for anyone wanting a refresher: “A cyber attack generally follows a process allowing the attacker to perform reconnaissance or discovery of the targeted business, then develops and executes the attack, and finally uses the attacker’s command and control presence to extract data and/or achieve the attacker’s goals on the target system.

In the discovery phase, a threat agent performs reconnaissance by probing the network perimeter to characterize the system, that is, determine if there is a firewall (and what type), what types of web or other Internet facing servers are used, and whether there are any open communication ports. The goal is to find any way possible to get into the system. They may also harvest publicly available corporate information (company principal’s names and email addresses, photos that may show physical security barriers, support personnel names and numbers) to gain any advantage they can for social engineering or email-based attacks. Once they find a way in, they select an attack method and begin the actual attack.

Potential intrusion vectors can range from technical brute-force hacking using exploit tools to showing up at a site dressed as a worker. At this point, the goal is to exploit any and all vulnerable people, processes, or components to gain entry. Adversaries may have a direct target in mind or merely wish to deposit code on any available machine in order to maintain a presence on the network or system and to allow for future unauthorized access. Generally, the goal at this point is to maintain continued surveillance using a light footing, many times covering their attack tracks as they go. Once they have found their access point, intruders can accomplish their intent through network intrusion—whether it is data exfiltration, creating a denial of service, or taking over command and control of the process, system, or the entire network. Many intruders leave residual back doors, accounts, or port openings for future or continued access. Once they have compromised a system, they may access it multiple times and may also use it to access other systems” (CISA, 2016).

CISA further breaks down the Discovery phase into two additional stages: System Characterization and Vulnerability Establishment. System Characterization covers the methods by which a potential intruder gathers information about sites and devices, the most common including:

Physical Surveillance – investigation of facilities and their physical security capabilities. This can be done by a variety of means, such as street views in Google Maps, dumpster diving, drive-bys and social engineering
Public Information Aggregation – job postings, employee directories, whois data, vendor partnership announcements and so on
Scanning – the use of available tools to fingerprint systems

Vulnerability Establishment is conducted after discovering intrusion vectors and relies on all of the same misconfigurations found in IT. In 2017 the National Cybersecurity and Communications Integration Center identified the most frequent control system vulnerabilities to be boundary protection, identification and authentication of legitimate system users, and allocation of resources. They also cited susceptibility to phishing attacks, poor password and patching practices, and improper configurations heavily saturated by the use of weak cipher suites within SSL and TLS implementations. In my own experience I have also found removable media to be an area of concern. In 2019 CISA expanded on these common vulnerabilities by also adding physical access control and the principle of least functionality to the list. There are challenges unique to OT environments, however, that make remediation of these vulnerabilities difficult, and so attackers recognize these environments as more likely to be target-rich: “Because of the high or constant availability and critical response time requirements inherent in ICS, any change to the system necessitates exhaustive testing for software and security updates. Schedule all patching or update activities far in advance and permit them on a very infrequent basis. In addition, ICS components may not tolerate security software because of critical timing requirements. Control system components are often so processor-constrained that running security software itself creates unacceptably high delays in response, threatening system stability. The result is outdated OS revision levels and outdated or no malware protection software. Even if antivirus software is up to date and configured for proper execution, ICSs built on standard platforms are vulnerable to newly discovered malware threats that, once again, cannot be patched in a timely fashion” (CISA, 2016).
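
On the weak cipher suite finding specifically, here is a minimal sketch of what a stricter TLS client configuration can look like using Python's standard `ssl` module. The function name `hardened_client_context` and the chosen cipher string are my own assumptions for illustration; legacy OT devices may not support these settings at all, so any such change needs the exhaustive testing described above.

```python
import ssl


def hardened_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that refuses legacy protocol versions."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Refuse SSLv3, TLS 1.0 and TLS 1.1 outright.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Restrict TLS 1.2 suites to forward-secret AEAD ciphers (assumed policy).
    # TLS 1.3 suites are configured separately by OpenSSL and are not affected.
    ctx.set_ciphers("ECDHE+AESGCM")
    return ctx


ctx = hardened_client_context()
print(ctx.minimum_version)
```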

After thorough reconnaissance, attackers will begin to offensively target physical and logical assets to embed themselves and carry out nefarious objectives. Initial intrusion often results from a successful spearphishing attempt, brute force, exploitation of weak authentication, and physical access via insider threats or social engineering campaigns. “In addition, field devices are part of an internal and trusted domain, and thus access into these devices can provide an intruder with a vector into the control system architecture. By gaining access into a field device, the intruder can use the trusted relationship associated with the sensor network to tunnel back into the control system network. Recognizing that field devices are an extension of the control domain, intruders can add these field devices to their list of viable targets to investigate during the reconnaissance and scanning phases of the attack. Although such attacks are not considered possible across serial connections, the increasing use of standard networking protocols in remote devices requires attention” (CISA, 2016). While inside the control network, attackers can begin altering data or even changing the behavior of devices themselves. This was demonstrated on a global stage after the disclosure of Stuxnet. The specific actions an attacker may take are wholly contingent on the objectives they want to achieve. MITRE’s ATT&CK framework for ICS provides a knowledge base for tactics and techniques that could be used by threats once they’ve achieved access. As we move into a discussion on security engineering, it’s also worth noting that this framework can be useful in designing countermeasures to those same threats.

At a high level, any organization should use the below objectives as the foundation for their security program, which should assist in satisfying the broader ICS safety and reliability requirements:

While very general, these objectives protect against a number of possible incidents including an inaccurate or delayed flow of information, unauthorized changes to systems and their behavior, and interference with the operation of equipment protection and safety systems. It simply cannot be stressed enough that the possible consequences of a major incident threaten public health and endanger human life. This greatly encourages the implementation of a defense-in-depth strategy, which contains many elements:

“In order to create a layered defense, one must have a clear understanding of how all the technology fits together and where all the interconnectivity resides. Dividing common control system architectures into zones can assist organizations in creating clear boundaries in order to effectively apply multiple layers of defense. Understanding how to achieve network segmentation is vital to creating architectural zones and determining the best methodologies for segmenting networks within and around control system environments” (CISA, 2016).

VLANs and DMZs play a critical role in building out this stratification in that they isolate traffic and disallow exposure to larger untrusted networks. The principle of least privilege needs to be baked in at every zone. If systems do not need to communicate, they absolutely should not be allowed to.
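
One way to reason about least privilege between zones is as an explicit allowlist of permitted flows, with everything else denied. The sketch below assumes four hypothetical zones and a hypothetical set of permitted flows; your own zone model will differ.

```python
# Hypothetical zone names and flows, for illustration only.
# Each tuple is (source zone, destination zone).
ALLOWED_FLOWS = {
    ("corporate", "dmz"),
    ("dmz", "supervisory"),
    ("supervisory", "control"),
}


def flow_permitted(src_zone: str, dst_zone: str) -> bool:
    """Default deny: only explicitly listed zone-to-zone flows are permitted."""
    return (src_zone, dst_zone) in ALLOWED_FLOWS


print(flow_permitted("corporate", "dmz"))      # True: explicitly listed
print(flow_permitted("corporate", "control"))  # False: must not skip the DMZ
print(flow_permitted("control", "corporate"))  # False: no reverse flow listed
```

The useful property is that forgetting to list a flow fails closed: nothing communicates unless someone deliberately allowed it.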

We will begin our dissection of ICS security architecture starting at the perimeter, where various controls govern the flow of traffic into, within and out of the environment. Key objectives and methods for achieving them are as follows:

Firewalls will carry the heaviest workload in enforcing the flow of traffic across boundaries. As such, it is very, very important that rules concerning this flow always end with a “Deny Any Any” rule. Exclusions necessary for business functions can be built on top of that when needed. Administrators should be careful not to design rules that allow overly broad traffic. Object groups will be helpful in this regard. As opposed to allowing communications from subnet-to-subnet when only a handful of devices need access to a single resource on a different part of the network, those devices should be placed into an object group, and that object group should then be allowed to communicate with the respective server. Organizations may also consider geo-IP blocking. Adding IP pools from countries that have absolutely no business prodding one’s network can save a lot of headache. Firewalls should also be configured to monitor and log events, but we will discuss this further when we get to security monitoring. Aside from firewalls, diodes can be another useful tool in defending high security zones. They work by physically allowing traffic flow in one direction only and so are usually placed between zones of differing security classifications.
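
The rule ordering and object group ideas above can be sketched as a first-match evaluator in Python. This is a toy model, not any vendor's rule syntax: the `Rule` structure, the `evaluate` function, the `HISTORIAN_CLIENTS` group and every address in it are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

ANY = ipaddress.ip_network("0.0.0.0/0")


@dataclass(frozen=True)
class Rule:
    action: str                      # "allow" or "deny"
    sources: tuple                   # object group: a tuple of networks
    destination: ipaddress.IPv4Network
    port: Optional[int]              # None means any port


# Object group: only the two hosts that need the server, not their whole subnet.
HISTORIAN_CLIENTS = (
    ipaddress.ip_network("10.10.1.15/32"),
    ipaddress.ip_network("10.10.1.22/32"),
)
HISTORIAN = ipaddress.ip_network("10.20.0.5/32")

RULES = [
    Rule("allow", HISTORIAN_CLIENTS, HISTORIAN, 443),
    Rule("deny", (ANY,), ANY, None),  # the final "Deny Any Any"
]


def evaluate(src: str, dst: str, port: int, rules=RULES) -> str:
    """Return the action of the first matching rule (first match wins)."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for rule in rules:
        source_match = any(src_ip in net for net in rule.sources)
        port_match = rule.port is None or rule.port == port
        if source_match and dst_ip in rule.destination and port_match:
            return rule.action
    return "deny"  # fail closed even if the final rule is missing


print(evaluate("10.10.1.15", "10.20.0.5", 443))   # allow: in the object group
print(evaluate("10.10.1.16", "10.20.0.5", 443))   # deny: same subnet, not in group
print(evaluate("10.10.1.15", "10.20.0.5", 3389))  # deny: port not permitted
```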

While firewalls, proxies, jumphosts and diodes contribute a lot in securing the border of the OT environment, they don’t do a whole lot for the assets residing within it. Host security is an important part of any layered defense and must be implemented in any security program. While “antivirus” is a seemingly quick solution to this problem, it’s also a dirty one. For starters, gone are the days where AV is the only host-based security tool that needs to be employed. Heuristics-based tools augmented by threat intelligence, data loss prevention, hard drive encryption, local firewalls and HIDS are all options that should be considered in the modern age. Beyond that, host-based security tools are only a small piece of a larger picture. Device hardening, security policy enforcement, patch management and vulnerability management also need to be established. Hardening a device means disabling unused ports, services, protocols, applications and anything else that could serve as a potential attack vector for an intruder. “System configurations should be actively managed throughout the system life cycle. There are a number of techniques that organizations can use such as creating a secure image to configure new equipment, equipment re-builds that only contain required software, and configuration of devices with only required services and ports” (CISA, 2016). Security policies can aid in this venture by globally disabling insecure protocols, such as telnet, enforcing strict password requirements and so on. Vulnerability management should be an ongoing effort to raise the alarm on newly discovered vulnerabilities, and patch management should be used to fill those gaps. “Administrators should schedule software upgrades and patch management procedures at routine intervals.
The development of procedures to incorporate security patches promptly and current software recommendations on a regular basis can substantially limit opportunities for hostile parties to target newly discovered vulnerabilities, which receive wide and immediate publicity. Organizations should institute practices to protect themselves without granting ample time for would-be intruders to apply the new knowledge against critical systems” (CISA, 2016).
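
As a rough illustration of hardening against a known-good baseline, the sketch below compares the services a host is observed to expose with an approved list. The baseline contents, the `hardening_findings` function and the sample host are hypothetical; in practice the observed data would come from a configuration management or scanning tool.

```python
# Hypothetical baseline: the only listening services approved for this host class.
BASELINE_PORTS = {22: "ssh", 502: "modbus"}
# Protocols that policy disables everywhere, regardless of baseline.
INSECURE_SERVICES = {"telnet", "ftp"}


def hardening_findings(observed: dict) -> list:
    """Compare observed listening services (port -> name) with the baseline."""
    findings = []
    for port, service in sorted(observed.items()):
        if service in INSECURE_SERVICES:
            findings.append(f"port {port}: insecure protocol '{service}' should be disabled")
        elif port not in BASELINE_PORTS:
            findings.append(f"port {port}: '{service}' is not in the approved baseline")
    return findings


# Hypothetical scan result for one host.
observed = {22: "ssh", 23: "telnet", 502: "modbus", 8080: "http-alt"}
for finding in hardening_findings(observed):
    print(finding)
```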

Network monitoring is another indispensable factor of OT security that assists teams in determining what is considered “normal”, and alerting on things that are not. The efficiency of an organization’s security monitoring program is based wholly on how quickly teams can respond to incidents and recover affected systems. Logging, monitoring and alerting (LMA) tools provide them with the insight and capability to disrupt the cyber kill chain before irreparable damage can be done. Security Information and Event Management (SIEM) platforms are generally at the center of a solid LMA program. A SIEM collects and analyzes log data across the environment in real time, and also correlates all that information across the footprint to alert on anomalies. A SIEM, however, is not worth much on its own. It needs to ingest all of that information from firewalls, intrusion detection/prevention systems, network appliances, monitors and sensors, honeypots, endpoint security tools… basically anything that can create a log has useful information to send to this centralized location. Additional resources such as advanced threat detection tools, intrusion detection and prevention systems and network sensors provide incident response teams with additional functionality for discovering threats. These tools will be either signature-based, alerting in accordance with pre-determined rules, or heuristics-based, finding patterns that fall outside of a baseline. There are some considerations to account for prior to choosing tools to satisfy an organization’s unique needs:

Much like the tools employed for perimeter security, LMA solutions need to be thoughtfully placed within an environment to eliminate blind spots.
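
The difference between signature-based and baseline-driven detection mentioned above can be shown in a few lines of Python. This is a deliberately simplified sketch: the `SIGNATURES` strings, the log lines, the readings and the three-standard-deviation threshold are all assumptions, and commercial tools use far richer models.

```python
from statistics import mean, stdev

# Hypothetical signatures: substrings that should always raise an alert.
SIGNATURES = ("failed login", "firmware upload")


def signature_alerts(log_lines: list) -> list:
    """Signature-based: alert on any line matching a pre-determined rule."""
    return [
        line for line in log_lines
        if any(sig in line.lower() for sig in SIGNATURES)
    ]


def is_anomalous(baseline: list, value: float, threshold: float = 3.0) -> bool:
    """Baseline-driven: alert when a value sits far outside normal behavior."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


logs = [
    "operator jsmith logged in",
    "Failed login for admin from 10.10.1.99",
]
print(signature_alerts(logs))

# Hypothetical readings of some monitored value during normal operation.
normal = [1000, 1002, 998, 1001, 999]
print(is_anomalous(normal, 1001))  # False: within normal variation
print(is_anomalous(normal, 1100))  # True: far outside the baseline
```

A signature only catches what someone has already described, while a baseline can flag something new but depends entirely on how well “normal” was captured in the first place.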

The last topic that requires some attention is risk and compliance. Ironically it’s my least favorite area of work but manages to dominate most of my time on a day-to-day basis. The fact of the matter is that the industrial sector is often heavily regulated, where strict requirements are aggressively enforced through audits and, for companies that fall short, hefty fines. For example, the North American Electric Reliability Corporation (NERC) has been certified by FERC in the US as the Electric Reliability Organization, and it regulates the electric sector through its colossal Critical Infrastructure Protection (CIP) body of standards. They contain hundreds of pages of requirements that dictate how affected companies, or entities, conduct their security operations. For instance, CIP-007 Requirement 3.1 states organizations must “Deploy method(s) to deter, detect, or prevent malicious code.” While the standard doesn’t specifically state how this can be achieved, it’s reasonable to assume anti-malware solutions as discussed above would satisfy this. It requires vast resources in the form of personnel, technology and funding to run a successful risk and compliance program, and those resources also have to integrate seamlessly with the teams that will be doing the actual technical work. To best achieve compliance and avoid unnecessary violations, compliance teams have to work closely with anyone managing or altering systems that fall under any applicable regulations. Compliance must be integrated with every step of the change management process. Because I am not a compliance professional and do not have the bandwidth to do a deep dive on industrial standards, I would simply advise security professionals working in a regulated field to first understand the requirements of the body governing their environment, and secondly to seek help when needed. There are a plethora of resources available, from the regulators themselves to educational seminars to consulting firms.

The greatest fear, and a very real possibility, for any of these organizations is that a highly sophisticated, well-funded, nation-sponsored cyber threat would compromise their environment and use that foothold to claim the lives of thousands of people. The digital age has deeply interconnected everyone and everything, so much so that nothing should be considered safe from adversaries. The adoption of IT technologies in the industrial sector ultimately translates to very real and frightening physical consequences for improperly securing them. As Langner warns in his talk: “What you end up [with] is Cyber Weapons of Mass Destruction. Unfortunately, the biggest number of targets for such attacks are not in the Middle East. They are in the United States, Europe and Japan. These are the target-rich environments. We have to face the consequences, and we better start preparing right now.”

Sources and further reading:

Ausgrid: Industry Best Practice for Operational Technology Cyber Security

Department of Energy: 21 Steps to Improve Cyber Security of SCADA Networks

Gartner: IT and Operational Technology Alignment Innovation Key Initiative Overview

MITRE: ATT&CK for Industrial Control Systems

NIST: SP 1800-23 Energy Sector Asset Management

NIST: SP 800-82 R2 Guide to ICS Security

NIST: SP 1800-7 Situational Awareness For Electric Utilities

NIST: SP 1800-2 Identity and Access Management for Electric Utilities

NIST: SP 800-53 R5 Security and Privacy Controls for Information Systems and Organizations

SANS: Security for Critical Infrastructure SCADA Systems