Oh, God. ChatGPT. Not ANOTHER “Betamax Battle”! Oops. Sorry. It’s an “intellectual spat”.

What would Sir Isaac Newton say?

1 February 2023 (Crete, Greece) – Here I am sitting in my remote corner of Crete, nothing but the warm sun (16°C, 60°F) and a calm sea to keep me company, thinking about ChatGPT – and the failure of education. I recall learning from Mr. DeNardo, my high school algebra teacher, this statement by Sir Isaac Newton (you remember, the apple and calculus guy): “If I have seen further, it is by standing on the shoulders of giants”. Did Sir Isaac actually say this, or was it his PR team? Well, it was in a letter he wrote to Robert Hooke in 1675. But I don’t care too much. It is the gist of the sentence that matters. Why? I just finished a stack of reading on ChatGPT and plagiarism and “theft of IP”. More on that in a moment. But first …

MUDWRESTLING!!!

The AI search fight is officially underway. True, the much-anticipated Baidu AI (with billions more parameters than GPT-3) will not be available until March 2023, but the trumpet has sounded.

NOTE TO READERS: Marc Hellenberg, my “go to data guy” on ChatGPT because he knows the details of how ChatGPT was constructed, prefers the term “doodads”. So GPT-3 has 175 billion “doodads”. GPT-4 is rumoured to have 100 trillion “doodads”. Baidu’s model is rumoured to have 200 billion “doodads”. Marc says “parameters” isn’t quite the correct nomenclature, but it has made its way into the media, and so everybody uses it. Parameters (also called “weights”) should be thought of as connections between data points made during pre-training. They can also be compared with human brain synapses, the connections between our neurons. It’s complicated. Buzz me and I’ll send you a briefing paper.
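If you want to feel the magnitude of those “doodads”, here is a back-of-the-envelope sketch. To be clear: this is my own illustration, not Marc’s briefing paper and not OpenAI’s published math. The layer arithmetic is a simplification (biases, layer norms, and positional embeddings are omitted); the dimensions are the commonly cited GPT-3 figures, and everything else is assumption.

```python
# Back-of-the-envelope "doodad" (parameter) counting for a
# GPT-style transformer. A simplification: biases, layer norms,
# and positional embeddings are omitted.

def transformer_params(n_layers, d_model, d_ff, vocab_size):
    """Rough parameter count for a decoder-only transformer."""
    attention = 4 * d_model * d_model   # Q, K, V, and output projections
    feed_forward = 2 * d_model * d_ff   # two linear maps per block
    per_layer = attention + feed_forward
    embeddings = vocab_size * d_model   # token embedding matrix
    return n_layers * per_layer + embeddings

# GPT-3-ish dimensions: 96 layers, d_model = 12288, d_ff = 4 * d_model,
# and a vocabulary of 50257 tokens.
print(f"{transformer_params(96, 12288, 4 * 12288, 50257):,}")
# -> 174,563,733,504 ... call it 175 billion "doodads"
```

Run it and you land within shouting distance of the 175 billion figure; the missing fraction lives in the bits the sketch ignores.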

And so the fighters are making their way from the changing room to the mud pit. In the stands are dozens of AI-infused applications. Each fighter flashes glimpses of its capabilities during its warm-up. The somewhat unsteady Googzilla is late. Microsoft has been in the ring waiting for what seems like a dozen or more news cycles. More spectators are showing up. “Oh! Look! Baidu is here!”

However, there is a spectator whose point of view differs from that of the verdant groves (and pizza joints) of Silicon Valley. This Merlin is a guy named Arvind Narayanan, who, according to “Decoding the Hype About AI”, once gave a lecture called “How to Recognize AI Snake Oil.” That talk is becoming a book called “AI Snake Oil”. Catchy title. Yep, snake oil: a product of no real worth. No worth. Sharp point: worth versus no worth. What’s worth, anyway?

Well, read the article, which is an interview with a person who wants to slow the clapping and stomping of the attendees. Here’s a quote from Dr. Arvind Narayanan’s interview:

Even with something as profound as the internet or search engines or smartphones, it’s turned out to be an adaptation, where we maximize the benefits and try to minimize the risks, rather than some kind of revolution. I don’t think large language models are even on that scale. There can potentially be massive shifts, benefits, and risks in many industries, but I cannot see a scenario where this is a “sky is falling” kind of issue.

Some observations:

• Google and its Code Red alert (“HELP! Get Page and Brin back in here!!”) suggest that, for the Google search brain trust, Dr. Narayanan is way off base. Maybe Facebook and its “meh” response are better?

• Microsoft’s bet on OpenAI is going with the adaptation approach. “Smart Word” may be better than “Clippy” – plus it may sell software licenses to big companies, marketers, and students who need essay-writing help (according to a leaked plan).

• If ChatGPT is snake oil … what’s the fuss, exactly? Could it be that some people who are exposed to ChatGPT perceive the smart software as new, exciting, promising, and an opportunity? That seems a reasonable statement at this time.

• The split between the believers (Microsoft, et al.) and the haters (Google, et al.) really surfaced with “that Timnit Gebru incident thing” at Google.

More intellectual warfare is likely: bias, incorrect output pretending to be correct, copyright issues, etc. Is technology exciting again? Yes. Finally.

And just some brief words on copyright …

That “standing on the shoulders of giants” thing.

Last week there seemed to be petabytes spilled on CNET’s use of ChatGPT and how it committed plagiarism, was riddled with errors, skewered babies while in its cups, etc., etc.

Hold on. How is any self-respecting, super buzzy smart software supposed to know anything without ingesting, indexing, vectorizing, and doing all that other cool math magic the developers have baked into the system? Did Filippo Brunelleschi wake up one day and do the “Eureka!” thing? Maybe he stood in the Pantheon tourist line, entered … and looked up? Maybe he found a wasp’s nest, cut it in half, and looked at what the feisty insects did to build a home? Obviously intellectual theft. Just because the dome still stands … well, when it falls, he is an untrustworthy architect-engineer. Argument nailed.
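(And since I tossed “vectorizing” around so casually: it just means turning text into numbers. Here is a toy sketch of my own invention – a bag-of-words counter, emphatically not what OpenAI actually does; real systems use learned embeddings – just to show the idea.)

```python
# Toy sketch of "vectorizing": turning text into numbers.
# Real LLMs use learned token embeddings; this bag-of-words
# counter just illustrates the idea.

def vectorize(text, vocabulary):
    """Count how often each vocabulary word appears in the text."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]

vocab = ["shoulders", "giants", "snake", "oil"]
print(vectorize("Standing on the shoulders of giants", vocab))
# -> [1, 1, 0, 0]
```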

The write-up I linked to above focuses on other ideas; namely, being incorrect and stealing content. Okay, those are interesting and possibly valid points. The write-up states:

All told, a pattern quickly emerges. Essentially, CNET’s AI seems to approach a topic by examining similar articles that have already been published and ripping sentences out of them. As it goes, it makes adjustments — sometimes minor, sometimes major — to the original sentence’s syntax, word choice, and structure. Sometimes it mashes two sentences together, or breaks one apart, or assembles chunks into new Frankensentences. Then it seems to repeat the process until it’s cooked up an entire article.

Except that somebody did some research (OK, they Googled) and found the exact same patterns in the output of 100+ media companies and corporations – taking similar articles that had already been published, ripping sentences out of them, making adjustments to syntax, word choice, structure … etc., etc., etc.
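(How would one “do some research” like that? The standard trick is near-duplicate detection, which anyone can sketch. A minimal example using word-trigram shingles and Jaccard overlap – the sample sentences below are invented, and I have no idea which tool the actual sleuths used:)

```python
# Sketch of near-duplicate detection with word shingles and
# Jaccard similarity. Sample texts are invented; the technique
# is standard, though not necessarily what the CNET sleuths used.

def shingles(text, n=3):
    """Set of overlapping n-word chunks from the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Overlap between two shingle sets: 0.0 (none) to 1.0 (identical)."""
    return len(a & b) / len(a | b)

original = "compound interest is the interest you earn on interest"
rewrite = "compound interest means the interest you earn on your interest"
print(round(jaccard(shingles(original), shingles(rewrite)), 2))
# -> 0.25, a noticeable overlap for a "rewritten" sentence
```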

But that does not mean the copyright hawks aren’t on the attack. Entrepreneurs are eyeing this “new frontier” – built on artificial intelligence that can generate coherent text, captivating images, and functional computer code. But another “new frontier”, a legal one, is taking shape and casting a looming cloud of its own.

I draw your attention to one case I think could alter the landscape. Now, this case needs a lot of discussion, so for the purposes of this short column, just some brief comments:

A class-action lawsuit filed in a federal court in California takes aim at GitHub Copilot, a powerful tool that automatically writes working code when a programmer starts typing. The coder behind the suit argues that GitHub is infringing copyright because it does not provide attribution when Copilot reproduces open-source code covered by a license requiring it. The lawsuit is at an early stage, and its prospects are unclear because the underlying technology is novel and has not faced much legal scrutiny. But legal experts say it may have a bearing on the broader trend of generative AI tools. AI programs that generate paintings, photographs, and illustrations from a prompt, as well as text for marketing copy, are all built with algorithms trained on previous work produced by humans.

Visual artists have been the first to question the legality and ethics of AI that incorporates existing work. Some people who make a living from their visual creativity are upset that AI art tools trained on their work can then produce new images in the same style. The Recording Industry Association of America, a music industry group, has signaled that AI-powered music generation and remixing could be a new area of copyright concern.

My big take-away: there’s a lot at stake here for Microsoft in particular, because it owns GitHub (and thus Copilot) and is pouring all that money into OpenAI, the maker of ChatGPT. If Microsoft loses here, people will come after ChatGPT for something or other, too. Equally, if Microsoft wins, that makes the other lawsuits against generative AI systems tougher to win.

My other initial thoughts after only a cursory look at the complaints:

• There is a problematic claim, for example, that all resulting images are necessarily derivatives of the five billion images used to train the model. I’m not sure I like the implications of that level of dilution of liability. It amounts to homeopathy copyright – any trace of a work in the training data makes the output a liable derivative. That way madness lies.

• Somewhere in every process a human must have input in choosing what goes into the image, the text, the code. A human directs the story (unless ChatGPT writes it completely) and edits it. Is that not part of the creativity?

Oh, and Sir Isaac? How does he fit into this future-leaning analysis? Oh, he’s still preoccupied with proving that the evil Gottfried Wilhelm Leibniz was tipped off about tiny rectangles and the methods thereof. Blame smart software.
