The Department of Homeland Security wants help identifying, attributing and combatting major internet outages and disruptions — and it will pay.
Last week, at an industry day and in solicitation documents posted online, the department’s Science and Technology Directorate invited research proposals under its “Predict, Assess Risk, Identify (and Mitigate) Disruptive Internet-scale Network Events” program, or PARIDINE.
These large-scale internet outages or slowdowns can have many causes, explained PARIDINE program manager Ann Cox — from natural disasters like hurricanes or tsunamis, to accidents that can knock out physical infrastructure, through geo-political events like a country trying to cut itself off from the internet, to the mass-scale re-routing of internet traffic. Large-scale re-routing incidents can happen by accident; but they can also be caused by malicious actors using a technique called border gateway protocol, or BGP, hijacking.
On Twitter, security analyst Richard Bejtlich called BGP hijacking “Probably [the] biggest Internet weakness hardly any[one] knows/cares about.”
“Right now we don’t even have a good shared definition of these disruptive events,” Cox told CyberScoop after her industry day presentation. “Obviously an outage counts, but what if it’s just a wide-scale degradation of service? How bad do the delays and hang-ups need to get before it qualifies?” she asked.
The Broad Agency Announcement the department posted last week is the first of three planned solicitations. For the current BAA, Cox said, “The idea is to develop an operational reporting capability for these events” — with three elements: “definition, identification and attribution.”
With identification, the timing is key, Cox added. “Right now, nobody’s seeing [these events] until it’s all over … usually their answers come in retrospect,” she said.
With BGP hijacking, for example, it is generally “15 minutes at the earliest and many times hours before you know your traffic’s been rerouted,” added Douglas Maughan, director of DHS’ cybersecurity research and development effort. By contrast, the PARIDINE technology should be able to identify events “in near-real time, tens of minutes, if not minutes … that [would be] an order of magnitude better than we have at the moment,” according to Cox.
There is a lot of interest inside the government in this kind of capability, said Maughan. For instance, as part of its work regulating voice over internet protocol telephony, or VoIP, the Federal Communications Commission needs to know whether a VoIP outage affects more than 900,000 user-minutes. The FCC is a “pilot partner and a possible customer,” he added.
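To make the reporting threshold Maughan describes concrete, here is a minimal sketch of the arithmetic. It assumes that “user-minutes” means the number of affected users multiplied by the outage duration in minutes; the function names are hypothetical, and the FCC’s actual reporting rules may attach further conditions that are not modeled here.

```python
# Threshold figure as cited in the article; everything else is illustrative.
THRESHOLD_USER_MINUTES = 900_000


def user_minutes(affected_users: int, duration_minutes: int) -> int:
    """Assumed definition: affected users multiplied by outage minutes."""
    return affected_users * duration_minutes


def exceeds_threshold(affected_users: int, duration_minutes: int,
                      threshold: int = THRESHOLD_USER_MINUTES) -> bool:
    """True only when the outage affects MORE than the threshold."""
    return user_minutes(affected_users, duration_minutes) > threshold
```

Under that assumption, an outage hitting 30,000 users for 30 minutes lands exactly on 900,000 user-minutes and would not exceed the threshold, while the same outage lasting 31 minutes would. That is why timely, accurate measurement of both scope and duration matters to a regulator.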
Maughan said DHS expected to fund between five and eight research proposals, spending $12 million to $15 million over the next three years.
Attribution is another vital element of the technology PARIDINE wants to develop. Often it’s unclear exactly what’s creating the service interruption, or what the motivation of the actors involved is, said Cox.
“We’ll start with something like network attribution, can we tell even which network it’s coming from?” she said. “Or, is there a root cause analysis, did someone put a backhoe through a fiber-optic line?”
“Sometimes it’s clear what the cause is, but often it is very murky,” said Cox.
For instance, on April 27, the internet traffic of more than three dozen major companies (mainly banks, credit card providers and internet security outfits) was, for about seven minutes, all redirected through the network of Rostelecom, Russia’s state-owned telecommunications and internet provider.
The problem is, like most of the internet’s architecture, BGP was written without any security considerations and is based totally on trust. Thus, any provider, even a small one, can advertise packet-flow routes that bring traffic across their own networks — even if that represents a huge diversion — and those routes will propagate to other service providers and can eventually spread all over the world. While it’s passing through their network, the provider can listen in, record or even tamper with the traffic. It’s still unclear whether the Rostelecom re-routing was accidental or deliberate — and the same is true of a series of large-scale BGP hijackings going back almost a decade.
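A simplified sketch of why such an advertisement works, and of one basic way to spot it, follows. It is not a model of real BGP route selection, which also weighs path attributes and local policy. It shows only that routers prefer the most specific matching prefix, so a rogue, narrower announcement attracts the traffic. The prefixes and AS numbers come from ranges reserved for documentation, and the function names and the table of expected origins are assumptions made for illustration.

```python
import ipaddress

# Hypothetical data: a legitimate /24 and a more specific /25 from another AS.
ROUTES = [("203.0.113.0/24", 64500), ("203.0.113.0/25", 64511)]
# Hypothetical monitoring table: which AS is expected to originate each prefix.
EXPECTED = {"203.0.113.0/24": 64500}


def best_route(routes, destination):
    """Return (prefix, origin AS) of the most specific route covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, asn in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net, asn))
    if not matches:
        return None
    # Longest-prefix match: the route with the largest prefix length wins.
    net, asn = max(matches, key=lambda m: m[0].prefixlen)
    return (str(net), asn)


def suspicious_announcements(expected_origins, announcements):
    """Flag announcements overlapping a monitored prefix from an unexpected AS."""
    flagged = []
    for prefix, asn in announcements:
        net = ipaddress.ip_network(prefix)
        for known_prefix, known_asn in expected_origins.items():
            if net.overlaps(ipaddress.ip_network(known_prefix)) and asn != known_asn:
                flagged.append((prefix, asn, known_prefix, known_asn))
    return flagged
```

With the sample data, `best_route(ROUTES, "203.0.113.10")` selects the /25 announced by AS 64511 rather than the legitimate /24, and `suspicious_announcements(EXPECTED, ROUTES)` flags that same announcement. A check like this can say that an origin changed, but not whether the change was an accident or an attack, which is the attribution problem the program is meant to address.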
But some analysts are skeptical about the program. “For many years, we’ve been able to conduct analysis of an incident within minutes of it happening and it comes from multiple sources (traceroute, BGP, DNS),” said one private sector expert who asked for anonymity owing to the sensitivities of his employer.
However, the expert added, “Our analysis isn’t completely automated. I believe they are proposing a system that could completely automate analysis. The truth is that it is more of an art form than a science … There is a human skill involved.”
To the extent that DHS is creating tools that automate some of the drudge work of the human analyst, the expert welcomed the move. “Building capability to expedite the work of a human analyst would definitely be progress,” the expert said.
The second and third BAAs will follow in six to nine months and one to two years, respectively. The second will ask for technologies that can predict the impact of such events as they occur and offer mitigation strategies in near-real time. The third will ask for “a risk assessment and mission impact tool,” in which the impact of the events can be modeled and their effect on mission performance calculated.