With a brief postscript: the new, every expanding problem – hackers purposely corrupting data
ABOVE: a little magic trick in Washington, D.C.
A guest post by:
Eric De Grasse Chief Technology Officer
A member of my Luminative Media team
14 June 2024 — If you live in D.C., you might have seen this show, or perhaps read about it in the Washington Post.
It was a magic show, and as part of the performance two magicians were “dueling”. As part of the act, one guy holds up some pieces of rope, changes their lengths in his hands, and then combines them into one single strand.
Then the other guy came on stage and tried to one-up him.
There are the usual cheesy jokes you encounter at these types of affairs (modern vaudeville at its best), cards that magically appear in participants’ pockets, a plunger on the head of an audience member for “mind” tricks, etc., etc.
But the ending performance is a corker.
The two magicians I noted at the top of this post return to the stage, joining forces, asking a volunteer to think of a number between 20 and 50. A man in the audience is randomly chosen, he picks a number, and the magicians tried to guess. They kept getting his number wrong, as they wrote down digits on a large pad on the stage (see graphic above).
But then they asked the audience to look more closely at the grid they had drawn. In every direction, the numbers added up to the man’s chosen number – 36. Thunderous applause from the audience.
I rushed the stage at the end and I asked the man selected “Were you part of the act?” He said he was not. The magicians said he was not.
So if they were all telling the truth, how did the trick work? My eyes had been searching for clues during the show. Afterward, my friend and I thought we’d know how to figure it out. Ask ChatGPT! I mean, knowing that artificial intelligence grows more capable by the day surely it would know.
So we fed it the parameters of the trick and even the image at the top of this post.
Alas, what we discovered: the AI did not know magic. Instead, it gave us a long-winded answer as a pedant would, elaborating on irrelevant minutiae without reaching any real point – you know, a “hallucination”, when large language models generate inaccurate, false, or nonsensical responses. And in truth, I like to imagine a coterie of magicians still honoring their craft, keeping their slights of hand away from the public.
But if magic tricks have not been leaked or hacked, the recipes for violence and destruction have. Earlier this year our boss, Gregory Bufithis, was invited to one of those rare military intelligence briefings (the officials speaking do so on the condition of anonymity and we are allowed to post about them, with no attribution back to the source) describing how violent extremists and traditional terrorists are using AI to solicit ways to create different types of explosives and potentially lethal chemical and biological substances. It is a huge issue. Some have gone as far as to post about their successes with AI models, easily obtainable on the Dark Web. And unsurprisingly, China has been at the forefront of prioritizing ways to steal American AI technology and data, in the most ingenious of ways. It’s complex networks targeting and collecting on U.S. companies, law firms, universities, and government research facilities.
So what has become a big issue of late is what AI should not learn – based on humans choosing to hold back information, and governemnt organizations “scrubbing” the web to get certain information out of the hands of training sets.
One such organization I spoke with (name withheld due to confidentiality) is made up of e-CIA and ex-NSA operatives. They work from an office in Crystal City, a hub for the defense industry in Arlington, Virginia which is a warren of buildings (many with no signs, nor even public access doors). As my contact said, here are the covert areas where relying on AI to recommend forces of action, including covert influence or kinetic action, “need to have a well thought out, thorough, human-in-the-loop type of guidance so that the intelligence community does not rely solely on an AI to make some of those decisions”. And both the top brass at the CIA and NSA have said humans must take primacy, that AI can only augment – not replace – humans.
Here are the areas where the intelligence community is cautious with information, and even withholding information, from generative AI, and sometimes even “scrubbing” the data from the discoverable internet:
• Commercially available data on Americans that it considers to be sensitive. Because anything on the internet can essentially get sucked up, the intelligence community is bringing algorithms on their high side (aka “classified AI systems”), experimenting in a “closed-off, sandbox type arena,” i.e. using smaller, in-house language models.
• Information that could jeopardize sources and methods. The intelligence community is working on ways to leverage cloud technology and cybersecurity to protect the digital “breadcrumbs” involved in conducting operations, while still taking advantage of AI. And as we have indicated in numerous posts, the digital “breadcrumbs” we all leave allows anybody (with the right tools) to build a detailed digital dossier on almost anybody.
• Operational information that could end up leaking if it’s used in commercially available AI. This is the gaping hold that ChatGPT has opened, and it will be very difficult to close. As an example, my contact showed me the absolute ease of acquiring information on a law firm and its client by accessing both their use of ChatGPT.
And how is the intelligence community using AI? It is largely to manage enormous datasets. That includes “computer vision,” which is used to detect objects in images, a source who works on government projects told me. In Ukraine, AI has been paired with drone footage, satellite images, and social media to provide intelligence on Russian forces, including their units, identities, movements, weapon systems, and morale.
And in Gaza. The Israeli military admitted it was only able to pull off its most daring hostage rescue since that 1976 raid at Entebbe airport in Uganda due to a team of U.S. military and intelligence personnel working out of the U.S. Embassy in Jerusalem. That team provided the key information that helped the Israeli military plan the complex operation to rescue those 4 four hostages last weekend using a complex of U.S. drone surveillance, communications intercepts, and other multiple other digital sources.
AI is not new to the intelligence community. The exact origins of their work on it probably are classified, but it goes back at least a decade. In those early days, that meant machine learning. The CIA made an investment in machine translation of foreign language material, which not only sped up processing, but has allowed the agency to be more strategic about who needs intensive language training.
The CIA’s new, unclassified generative AI tool, called Osiris, is available to all 18 intelligence agencies. They’re using it mainly for unclassified open source reporting. Osiris can summarize massive amounts of information, allowing analysts to quickly learn key points from dozens of articles, then ask follow-up questions to determine where they want to focus their time.
But the questions must be posed carefully, as U.S. adversaries want to know the queries too. Because of that, intelligence officers are trained on which queries they can use on the low side and high side. It is a balancing act. He showed me examples of “low side” and “high side” so I could see the dangers but I will not share them in this post.
Threats of foreign penetration only grow more stark in the AI age of intelligence. AI-enabled tools will allow our adversaries – and by the way, not just state actors, but non-state groups like Hezbollah who are getting quite versed at all kinds of AI-enabled tools – to use AI to uncover Western intelligence undercover officers, or discover their travel, or the travel by key, sensitive players they’d like to “tap”, as well as understanding what bank accounts they’re using to pay agents (which was shown, ironically, in the first Jason Bourne movie in 2002).
So a hell of a problem: make our intelligence community AI-proof, even as at the same time, we’re taking advantage of AI tools for offensive purposes. This is why it is such a big business area for the intelligence community. Especially the work in progress on how to deal with the ways our adversaries are using these systems against us.
My contact also gave me a translation of a Mandarin-language article published in a Chinese military journal in March 2024. It described AI’s importance in “algorithmic cognitive warfare,” what it describes as an invisible war: how AI can mine images and speeches of a target — assessing keywords, intonations, and personality traits to assess emotions, stances, interpersonal relationships, and command chain; correlate content with military operations and social events; and analyze the enemy’s algorithm to evaluate its technical level and attack range. Simply, a fascinating read.
Oh, and I know your question. Is China itself withholding information (erasing information) from generative AI? Yes, my contact said. Beijing regularly “sweeps” its internet and takes publicly available data off the internet to protect the regime. They’ve really put the clamps down on what foreign actors, non-Chinese actors, can access on the internet. And this is restricting how Western intellignce AI systems that are reliant on the internet can acquire information on what is happening in China.
But something is also happening in the reverse. China, Russia, and other U.S. adversaries are trying to pollute American AI algorithms or data sets (see next story below). They can trash our AIs to make them useless while they use it themselves.
One big issue is that the U.S. intelligence community is not on track to meet a two-year deadline set by the Special Competitiveness Studies Project for deploying generative AI at scale (“we are always behind the Chinese and the Russians”) despite assertions by the CIA’s first chief technology officer that “we think we’ve beaten the initial timeline by a big margin. Not so, said my contact:
“I do not worry not so much about it today, but if we drag our feet for too long, I worry about where we’ll be in another two years’ time, because that technology is advancing that rapidly, and by then we might never catch up”.
He said “spy games need to evolve to delivering speed to insight”. Will the U.S. be faster (and more insightful) than China and Russia? It depends, he says, but we have a weak point:
“Perhaps our intelligence community won’t be at the absolute forefront of what is technically possible. It seems to always be stuck sorting through the critical policy and operational considerations that need to be taken into account, where our adversaries never have that burden.”
We are all well aware that biased data pollutes AI algorithms and yields disastrous results. The real life examples include racism, sexism, and prejudice towards people with low socioeconomic status. Beta News adds its take to the opinions of poor data with “Poisoning The Data Well For Generative AI”. The article restates what we already know: bad large language models (LLMs) lead to bad outcomes. It’s like poisoning a well.
Beta News does bring up the new, every expanding problem to the discussion: hackers purposely corrupting data. Bad actors could alter LLMs to teach AI how to be deceptive and malicious. This leads to unreliable and harmful results. What’s horrible is that these LLMs can’t be repaired.
Bad actors are harming generative by inserting malware, phishing, disinformation installing backdoors, data manipulation, and retrieval augmented generation (RAG) in LLMs. If you’re unfamiliar with RAG:
“With RAG, a generative AI tool can retrieve data from external sources to address queries. Models that use a RAG approach are particularly vulnerable to poisoning. This is because RAG models often gather user feedback to improve response accuracy. Unless the feedback is screened, attackers can put in fake, deceptive, or potentially compromising content through the feedback mechanism.”
Unfortunately it is difficult to detect data poisoning, so it’s very important for AI security experts to be aware of current scams and how to minimize risks. There aren’t any set guidelines on how to prevent AI data breaches and the experts are writing the procedures as they go.
But the best advice is to set up a special team, and be familiar with AI projects, code, current scams, and run frequent security checks. It’s also wise to not doubt gut instincts.