Larry Tesler was right: “AI is whatever hasn’t been done yet”. But in just a few weeks, AI technology has advanced to a level where it really works, and it could have wide-ranging applications.
7 October 2022 – Over the last 10-12 years I’ve had the opportunity to read numerous books and long-form essays that question mankind’s relation to the machine – best friend or worst enemy, saving grace or engine of doom? Such questions have been with us since James Watt’s introduction of the first efficient steam engine in the same year that Thomas Jefferson wrote the Declaration of Independence. My primary authors have been Charles Arthur, Ugo Bardi, James Beniger, Wendy Chun, Jacques Ellul, Thomas Kuhn, Lewis Lapham, Donella Meadows, Marshall McLuhan, Nicholas Negroponte, Neil Postman, Maël Renouard, Matt Stoller, Alvin Toffler, Jerome Wiesner, and Simon Winchester.
And, yes, you need to read all of them. Because when it comes to technology, my age group (I’m 71) started with the people who seemed to run the world – guys like Stewart Brand and Douglas Engelbart – and then moved on to the guys who did run the world – Bill Gates, and later Jeff Bezos, Elon Musk, and The Zuck. They all read the folks I listed above. Well, not Wendy Chun, Maël Renouard, or Matt Stoller. They are “kids”, far younger than I, but they have joined the pantheon of tech pioneers. More on those “kids” later in this series.
But all this also brings up one of the things I detest about the Internet and how it has changed our thinking: the inexorable disappearance of retrospection and reminiscence from our digital lives. Our lives are increasingly lived in the present, completely detached even from the most recent past. And yet we have this enormous global information exchange at our disposal. So much of what we are experiencing now, and often consider “new”, has been addressed in detail in the past.
My study really began in earnest when I started to receive invitations to attend sessions and special events at the MIT Media Lab, a research laboratory at the Massachusetts Institute of Technology that grew out of MIT’s Architecture Machine Group in the School of Architecture. Its research is not restricted to fixed academic disciplines, but draws from art, design, media, science, and technology. The Lab was founded by Nicholas Negroponte and Jerome Wiesner.
NOTE TO READERS: long-time readers of this blog will know I tend to wax lyrical about Nicholas Negroponte. His book “Being Digital”, originally published in January 1995, provides a general history of several digital media technologies, many of which Negroponte himself was directly involved in developing. He said “humanity is inevitably headed towards a future where everything that can will be digitalized (be it newspapers, entertainment, or sex) because we’ll get better at controlling those unwieldy atoms”. He introduced the concept of a virtual daily newspaper (“customized for an individual’s tastes”), predicted the advent of web feeds, personal web portals, “mobile communications in your pocket”, and that “touch-screen technology would become a dominant interface”.
But in a sense, this is hardly surprising: the social beast that has taken over our digital lives has to be constantly fed with the most trivial of ephemera. And so we oblige, treating it to countless status updates and zettabytes of multimedia (almost 5,000 photos are uploaded to Facebook and Instagram every second), and the latest “business news” which is really just a press release or a paid ad (and rarely disclosed as such).
And Negroponte credits Philip Agre (whom I have cited numerous times in my posts), especially his pioneering 1994 article “Surveillance and Capture”, which is about the way computing systems and networks “capture” data. I love the word “capture”, with its implication of a measure of violence, as these systems force their environment (us) into the mode of data engines. Agre predicted that we’d see a permanent translation of the flux of life into machine-readable bits and bytes. And without using the word “platform”, but in every sense of that word, he said this was all due to the growing reliance on data-driven decision systems, systems “that would need ever more behavioral data as they move into cyberphysical systems. It will completely overhaul our existing information infrastructure, and the thirst for data will have no end”.
And some personal advice: do not ever be ashamed of taking baby steps or really basic beginnings to get into this stuff. Years ago I dove headfirst into natural language processing and network science, and then pursued a certificate of advanced studies at ETH Zürich, because I immediately understood that I had to learn this stuff to really know how tech worked. I was fortunate to begin with a (very) patient team at the Microsoft Research Lab in Amsterdam – a masterclass in basic machine learning, speech recognition, and natural language understanding. It provided a base for getting to more sophisticated levels of ML and AI.
Lately all of these questions about AI and machine intelligence (and even plain-vanilla search, which is upending the enterprise search and eDiscovery markets, among others) have been fortified with the not entirely fanciful notion that machine-made intelligence, now freed like a genie from its bottle, has moved beyond its birth pangs. We are no longer in the “early years” of machine learning and AI. It is growing phenomenally, assuming the role of some fiendish Rex Imperator. Many feel it will ultimately come to rule us all, for always, and with no evident existential benefit to the sorry and vulnerable weakness that is humankind.
Maybe. But now, a decade into the artificial intelligence boom, scientists in research and industry have made incredible breakthroughs.
Increases in computing power, theoretical advances and a rolling wave of capital have revolutionised domains from biology and design to transport and language analysis. In this past year alone, advances in artificial intelligence have led to small but significant scientific breakthroughs across disciplines. AI systems are helping us explore spaces that we couldn’t previously access.
In biochemistry, for example, researchers are using AI to develop neural networks that “hallucinate” proteins with new, stable structures. This breakthrough expands our capacity to understand how proteins are constructed.
In another discipline, researchers at Oxford have been working with DeepMind to develop fundamentally new techniques in mathematics. With the help of DeepMind’s machine learning, they have established a new theorem in knot theory, connecting algebraic and geometric invariants of knots.
Meanwhile, NASA scientists have brought these developments to bear on data from the Kepler spacecraft. Using a neural network called ExoMiner, they have confirmed 301 exoplanets outside our solar system. We don’t need to wait for AI to create thinking machines and artificial minds to see dramatic changes in science.
By enhancing our capacity, AI is transforming how we look at the world. But over the past several months (weeks, even) we’ve had a major ….
There are companies that have ridden the AI wave in important ways: Google claims to have refined many of its services with the help of AI, machine learning has boosted sales of Nvidia’s graphics processing units, and TikTok’s algorithm is reputedly a big part of what keeps users coming back to its short videos.
But it’s hard to find a pure AI company that has risen on the back of the technology, or to identify a big new market that has been created. That picture may be about to change, and in a big way.
NOTE TO READERS: I would argue that TikTok is actually very close to a pure AI company, as the product is inconceivable without the recommendation engine that curates content for users. Likewise, many tech products (Google Search, etc.) couldn’t exist without machine learning; we just tend to forget because we still remember the earliest iterations that could do without it.
According to several major AI pundits, something significant has happened in AI in recent weeks. Generative systems – ones that automatically produce text and images from simple text prompts – have advanced to a level where they could have wide-ranging business uses. A partner at one leading Silicon Valley venture capital firm, who describes the recent history of AI as a graveyard for start-up investors, now reports that the race is on to find breakthrough applications for this new technology.
Since the launch of OpenAI’s GPT-3 text-writing system two years ago (which I wrote about extensively here), generative models like this have been all the rage in AI. The ethical issues they raise are profound, ranging from the biases they can imbibe from the data they are trained on to the risk that they could be used to spew out misinformation. But that hasn’t prevented the hunt for practical uses.
Three things have changed to turn these systems from clever party tricks into potentially useful tools.
One is that AI systems have moved way beyond text. Admittedly, GANs (generative adversarial networks) were able to produce images or even music (yes, music) years ago.
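For readers who have never looked under the hood, here is a minimal sketch of the adversarial idea those earlier systems rely on – a generator learning to fool a discriminator. It trains on stand-in random data rather than real images or audio, and every name in it is my own illustration, not anyone’s production code:

```python
# A minimal GAN sketch in PyTorch, using random stand-in "real" data.
# A real image or music model would swap in actual training samples.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64

generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCELoss()

for step in range(1000):
    real = torch.randn(32, data_dim) + 2.0           # stand-in "real" samples
    fake = generator(torch.randn(32, latent_dim))    # generator's attempt

    # Discriminator step: learn to tell real from fake
    d_loss = loss_fn(discriminator(real), torch.ones(32, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(32, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: learn to fool the discriminator
    g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```

The two networks improve by competing: as the discriminator gets better at spotting fakes, the generator is forced to produce more convincing ones.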
Last week, Meta unveiled the first system capable of producing a video from a text or image prompt. That breakthrough had been thought to be two years or more away. Not to be outdone, Google responded with not one but two AI video systems of its own.
This year’s biggest AI breakthrough has come in image generation, thanks to systems such as OpenAI’s DALL-E 2, Google’s Imagen and the start-up Midjourney. Emad Mostaque, the former London hedge fund manager behind Stable Diffusion, the latest image-generating system to take the AI world by storm, claims pictures will be the “killer app” for this new form of AI.
NOTE TO READERS: Google is trying to become a more visual, more exploratory search engine. It is trying to blow up how you think about search. To say it’s pivoting to compete in a world where TikTok and Instagram are changing the way the internet works would be an overstatement … but not a big one. Google now exists on a more visual, more interactive internet, in which users want to be surprised and delighted as often as they just want an answer to their questions. And it highlights why the visual is becoming the dominant element of search. Yes, yes. I know. Every couple of years Google says that “search is changing completely!”, though the shift tends to be difficult to spot. But now it’s definitely here. Why? Google needs to catch up with its audience, which is increasingly using TikTok for search. Yes, search. TikTok has become a search engine. More later in this series.
The second big change comes from the rapidly falling cost of training giant AI models. Microsoft’s $1bn backing of OpenAI three years ago highlighted the prohibitive expense of this for ever-larger models.
NOTE TO READERS: OpenAI takes a “snapshot” of internet content on a regular basis to know what’s available to the public. They keep the latest/best for themselves – for national security purposes, of course, among other things. I’ll discuss that later in the series.
New techniques that make it possible to achieve high-quality results by training neural networks with fewer layers of artificial neurons are changing the picture. The computing resources used to train Stable Diffusion would have cost only about $600,000 at market prices (a figure supported by data in this tweet).
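To put that in perspective, a rough back-of-envelope calculation (using figures Mostaque has cited publicly, which I have not independently verified): roughly 150,000 hours on Nvidia A100 GPUs, at a market rate of about $4 per GPU-hour, works out to 150,000 × $4 = $600,000 – pocket change next to the budgets behind the earlier giant models.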
The third change has been the expanding availability of the technology. The underlying software is mostly open source and has been relatively freely available, and the hardware is commodity kit. Anybody can build a neural network that can produce images or music, or both for that matter. Even you, dear reader, with your crappy computer (though the use of a GPU is advised – then again, you can rent one on AWS). What had not been readily available were the data used to train the neural networks and the computing capacity.
Google and OpenAI have been wary about making their technology widely available, partly out of concern about possible misuse. By contrast, Midjourney’s image system is available to all users through a freemium pricing model. Stable Diffusion has gone further, open-sourcing its software and releasing details of how it trained its system. That makes it possible for other organisations to train an image model on their own data sets.
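To show just how available it has become: here is a minimal sketch of generating an image with the open-sourced Stable Diffusion weights via Hugging Face’s diffusers library. The prompt and output filename are mine, and you will need to accept the model licence on huggingface.co before the weights will download:

```python
# A minimal sketch, assuming the open-source "diffusers" and "torch"
# packages are installed and the Stable Diffusion licence has been
# accepted on huggingface.co (an access token may be required).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",   # the openly released weights
    torch_dtype=torch.float16,         # half precision to fit consumer GPUs
)
pipe = pipe.to("cuda")                 # a GPU is advised; a rented AWS one works

image = pipe("a teddy bear painting a portrait").images[0]  # illustrative prompt
image.save("teddy.png")
```

That is the whole thing: a dozen lines between you and an image model that cost six figures to train.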
The risks that stem from such generative systems have received much attention. They churn out fresh images or text based on the millions of examples they have learnt from, with no understanding of the underlying material. That can lead to nonsensical results, as well as deliberate misinformation.
But in a business setting, at least some of these shortcomings could be controlled. The trick will be to find ways to embed the technology in existing work processes, creating tools that can suggest new ideas or speed up creative production, with human workers filtering the output. The idea is already being used to generate computer code.
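As a concrete sketch of that “human in the loop” pattern, here is what it might look like with OpenAI’s GPT-3 completion API. The model name, prompt and helper function are my own illustrative choices, not a recommendation:

```python
# A hedged sketch of human-filtered generation with OpenAI's GPT-3 API.
# Assumes the "openai" package is installed and you hold an API key.
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: you have an OpenAI account

def draft_headlines(brief: str) -> str:
    """Ask the model for first-draft copy; a human reviews before anything ships."""
    response = openai.Completion.create(
        model="text-davinci-002",                           # illustrative model choice
        prompt=f"Write three headline options for: {brief}",
        max_tokens=150,
    )
    return response.choices[0].text

drafts = draft_headlines("a lamp that doubles as a phone charger")
print(drafts)  # the human filter: keep, edit, or discard
```

The point of the pattern is the last line: the machine suggests, the human decides.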
The big question right now? Will the existing giants of industries such as marketing, media and entertainment be the first to make use of these powerful new creative tools? Or will they be disrupted by a new generation of upstarts with their roots in AI? I am already seeing media graphics attributed to +midjourney. And, for sure, Stable Diffusion (referenced above) can be used to make proper art.
You know what? It does not matter. The approaching tsunami of addictive AI-created content and disinformation will overwhelm us.
One of the lessons I absorbed from a few decades of technology journalism is that conceiving what will happen when things scale up is really, really difficult. We can see a lone tree and grasp it; but imagining how a forest of them will change the ecosystem is incredibly hard.
For instance, the iPhone and Android made it easy to get email out of the office. But they also prompted an explosion of apps. Which created a new economy of people making apps. Which encouraged apps that weren’t restricted just to doing things on the phone, but were useful in the physical world, such as Uber. Meanwhile, the connectedness meant that photos and videos could be uploaded and even streamed – for good, for bad … and for ugly.
The point being that all the disparate bits above might look like, well, disparate parts, but they’re available now – and that’s without mentioning deepfakes and the unstoppable flood of AI-created disinformation.
And that is why I always tell my team, and everybody I meet, “Read your histories of technology. Read the pioneers”. And that includes all the folks I listed above.
Glimpses of the AI tsunami
I suspect that in the future there will be a premium on good, human-generated content and response, but that huge and growing amounts of what people watch, look at and read on content networks (“social networks” will become an outdated term) will be generated automatically – and that humans will be more and more happy about it.
In its way, it sounds like the society in “Fahrenheit 451” (that’s 233ºC for Europeans) though without the book burning. There’s no need: why read a book when there’s something fascinating you can watch instead?
Quite what effect this has on social warming is unclear. Possibly it accelerates polarisation; or, rather like with Facebook’s BlenderBot, people are just segmented into their own worlds and not shown things that will disturb them. Or, perhaps, they’re shown just enough to annoy them and re-engage them if their attention seems to be flagging. After all, if you can generate unlimited content, you can do what you want. And as we know, what the companies who do this want is your attention, all the time.
Here are just a few examples of the types of AI and tech I’ll talk about in subsequent posts:
Facebook has opened the kimono ever so slightly to show off a text-to-video system it calls Make-A-Video (not sure how much work went into the name). Give it a text prompt and it generates a short, five-second looping video:
“With just a few words or lines of text, Make-A-Video can bring imagination to life and create one-of-a-kind videos full of vivid colors and landscapes.”
Definitely true, as with the GIF Meta released, created with the simple text prompt “a teddy bear painting a portrait”.
OpenAI, those jolly people who gave us GPT-3, the AI system which can generate entirely believable text from a short prompt, are back.
GPT-3 was like – well, a freshly hired intern who is well-read, opinionated and has a poor short-term memory.
Now they bring us a speech-to-text system called Whisper. It’s hardly the first speech-to-text system (there’s one on your phone), but early reports say that it’s really good: dealing with mumbling, spotting names and capitalising them, all those desirable things.
It works best in English, but it was trained on a large multilingual corpus and can be used on dozens of other languages. We just received permission to try it for French and Italian.
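Getting started is almost embarrassingly simple. A minimal sketch, assuming OpenAI’s open-source whisper package is installed (the audio filename is my own placeholder):

```python
# A minimal sketch using OpenAI's open-source Whisper package
# (pip install -U openai-whisper). The audio file is hypothetical.
import whisper

model = whisper.load_model("base")           # small and fast; larger models are more accurate
result = model.transcribe("interview.mp3",   # hypothetical audio file
                          language="fr")     # or omit to let Whisper auto-detect
print(result["text"])
```

Swap "fr" for "it" (or leave the language off entirely) and the same three lines handle Italian, which is exactly what we plan to test.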
Darth Vader’s voice (originally by James Earl Jones, now 91 years old) is now being created by a speech-generating algorithm – from a team in Ukraine, as it happens, which insiders say is incredibly good at this kind of work. One engineer said: “For a character such as Darth Vader, who might have 50 lines on a show, we need to go back-and-forth over almost 20,000 audio files to synthesize a re-creation.”
NOTE: Jones was only paid $7,000 when he first voiced Darth Vader in 1977. Hollywood sources say he was paid $25 million for his voice.
Copyright implications of AI illustration systems? There probably aren’t any, because the training falls under fair use/fair dealing (US phrase/UK phrase), rather as happened with Google’s scanning of books for the Google Books project.
But a legal specialty is building nonetheless. As a recovering IP/digital lawyer, I’ll end this series with an analysis. I am already seeing the development of “Invention Services” that create inventions on a client’s behalf, and “Invention Studio” products that can be licensed in so companies can build their own in-house, data-driven invention laboratories. This is going to create great work for IP lawyers.