Naming pollution is real.  It’s a real problem.  First anti-malware/AV malware detection names, now APT group names – and their campaigns – and their malware.  Analysts are in love with names – and marketing is in love with their names.

You see, naming is powerful.  It’s why we agonize over a child’s name.  It’s why (in the Judeo-Christian tradition) God’s name was truncated and not to be uttered.  At about 2 years old we start learning the names of things and are able to start uttering them back.  This gives us power, because when the 2-year-old is able to communicate a thing’s name – we give it to them!  It’s powerful to a 2-year-old and that same power follows us throughout life – see “name dropping” – or the honor of naming a new geographic/astronomical feature.


It’s followed us into the information security space – for both good and bad.  You see, we need names.  Names are important.  It’s part of how we organize cognitive information and make sense of our world – through abstraction.  It’s important to how we communicate.  But, like any power, it can be misused and misappropriated.  Every organization now loves to name “adversaries,” “actors,” “activity groups,” or whatever you call them.  They can blog about it, tweet about it, produce nice glossy materials and presentations.  It gives them power – because that’s what names do.

The problem isn’t names, it’s the power we attribute to them and their use in our analysis.  When ThreatToe calls something BRUCESPRINGSTEEN and CyberCoffin identifies a similar activity and names it PEARLJAM, everyone else starts updating their “Rosetta Stone” and makes the association BRUCESPRINGSTEEN = PEARLJAM.  Everyone else now starts attributing their intelligence to these two named groups.  But, nobody actually knows what the heck these things are aside from a few properties (e.g. IPs/domains/capabilities/etc).  That is not enough to understand.

I can’t tell you how many times I’ve heard: “Did you see the recent report from CyberVendor – can you believe they attributed that activity to PEARLJAM?!  That is clearly STEVIEWONDER – those guys don’t know what they’re talking about.”  The problem with that statement is that it assumes: (1) you actually know what you’re talking about (you’ve correctly correlated the activity) and (2) you understand their definition of PEARLJAM.  Within their own analytic definition the correlation could be absolutely correct.  It’s that we’ve made unfounded assumptions and assigned too much power to the names.

But, WHY CAN’T WE JUST ALL AGREE ON NAMES!!!!! (as this is usually said in an elevated tone and usually while slightly intoxicated)  Because we can’t.  That’s why.  It’s not about the names.  The names are just crutches – simple monikers for what is very complex activity and analytic associations which we still don’t know how to define properly.  To understand this, you need to understand how we’re actually defining, correlating, and classifying these into groups – read the Diamond Model section 9 for this information.

The simple answer: it’s hard enough to correlate activity consistently within a 10-person team, let alone across a variety of organizations.  The complex answer: correlation and classification are a complex analytic problem which requires us to share the same grouping function and feature vector.
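To make the "same grouping function and feature vector" point concrete, here is a minimal sketch.  Everything in it is an assumption for illustration: the events, the feature names, the `group` helper, and the "link events that share enough feature values" rule are all made up, and this is not the Diamond Model's actual method.  It only shows that two teams looking at identical events produce different groups as soon as their feature vectors differ.

```python
from itertools import combinations

# Hypothetical events; every value here is invented for illustration.
EVENTS = {
    "e1": {"ip": "198.51.100.7", "domain": "a.example", "malware": "dropperX"},
    "e2": {"ip": "198.51.100.7", "domain": "b.example", "malware": "dropperY"},
    "e3": {"ip": "203.0.113.9", "domain": "c.example", "malware": "dropperY"},
}


def group(events, features, min_shared):
    """Link two events when they share at least `min_shared` values across
    `features`, then return the connected components as sorted lists."""
    parent = {name: name for name in events}

    def find(x):
        # Walk up to the representative of x's group.
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in combinations(events, 2):
        shared = sum(events[a][f] == events[b][f] for f in features)
        if shared >= min_shared:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in events:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(g) for g in groups.values())


# Team A correlates on infrastructure only; Team B also uses malware.
team_a = group(EVENTS, features=["ip"], min_shared=1)
team_b = group(EVENTS, features=["ip", "malware"], min_shared=1)

print(team_a)
print(team_b)
```

Team A ends up with two groups and Team B with one, so whatever names they assign cannot map one-to-one, even though neither team made a mistake under its own definition.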

What we shouldn’t do is start using each other’s names – because, again, it’s not about the names.  If you begin to use the names of others you start to take on their “analytic baggage” as well, since you are now intimately associating your analysis with theirs.  This means you may also take on their errors and mis-associations.  Further, it may imply that you agree with their attribution.  It’s highly unlikely that you’ll want to intertwine your analysis with that of others whose analysis you don’t really understand.

Instead, we need to rely on definitions.  We need to openly share our correlation and classification logic and the feature vectors which we’re applying.  But to those who are now saying, “Finally! An answer!  Let’s just share this!” sorry, it’s not a silver bullet.  Because, the feature vector is highly dependent on visibility.  For instance, some organizations have excellent network visibility, some have outstanding host visibility, others may have great capability/malware visibility, etc.  It means that generally, I need the same visibility as another organization to effectively use the shared functions to produce accurate output.
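The visibility caveat can be sketched the same way.  This is a toy example under stated assumptions: the `similarity` and `observe` helpers, the feature names, the threshold, and the two events are all hypothetical.  Both organizations apply the identical shared function to the same underlying activity, but one of them never sees the malware.

```python
SHARED_FEATURES = ["ip", "malware"]  # hypothetical shared feature vector
THRESHOLD = 1                        # hypothetical: one matching feature links events


def similarity(a, b, features):
    """Count features that are present in event `a` and match in event `b`."""
    return sum(
        1 for f in features
        if a.get(f) is not None and a.get(f) == b.get(f)
    )


def observe(event, visible):
    """Return only the parts of an event an organization can actually see."""
    return {f: v for f, v in event.items() if f in visible}


# The same two underlying events (values invented for illustration).
full_1 = {"ip": "198.51.100.7", "malware": "dropperY"}
full_2 = {"ip": "203.0.113.9", "malware": "dropperY"}

# One organization has host and malware visibility, the other network only.
host_org = [observe(e, {"ip", "malware"}) for e in (full_1, full_2)]
network_org = [observe(e, {"ip"}) for e in (full_1, full_2)]

host_linked = similarity(*host_org, SHARED_FEATURES) >= THRESHOLD
network_linked = similarity(*network_org, SHARED_FEATURES) >= THRESHOLD

print(host_linked, network_linked)
```

The organization with malware visibility links the two events; the network-only organization, running exactly the same logic, does not.  Sharing the function alone does not make the outputs agree.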

So, reader, here I am, telling you about this problem that forces poor analytic practices on us on a daily basis and causes all these issues, but without a real solution in sight.  Yes, I think that sharing our definitions will go a LONG way towards improving correlation across organizations and giving those names real value – but it is by no means a silver bullet.  I’m a proponent of this approach (over pure name/Rosetta Stone work) but I know we’ll still spend hours on the phone or in a side conversation at a conference hashing all of this out anyway.  But maybe, just maybe, it will reduce some analytic errors – and if that is the case it is better than what we have today.