
Tuesday, May 7, 2024

Developments in AI "voice cloning"

A tiny start-up has made some of the most convincing AI voices. Are its creators ready for the chaos they’re unleashing?
By Charlie Warzel

My voice was ready. I’d been waiting, compulsively checking my inbox. I opened the email and scrolled until I saw a button that said, plainly, “Use voice.” I considered saying something aloud to mark the occasion, but that felt wrong. The computer would now speak for me.

I had thought it’d be fun, and uncanny, to clone my voice. I’d sought out the AI start-up ElevenLabs, paid $22 for a “creator” account, and uploaded some recordings of myself. A few hours later, I typed some words into a text box, hit “Enter,” and there I was: all the nasal lilts, hesitations, pauses, and mid-Atlantic-by-way-of-Ohio vowels that make my voice mine.

It was me, only more pompous. My voice clone speaks with the cadence of a pundit, no matter the subject. I type I like to eat pickles, and the voice spits it out as if I’m on Meet the Press. That’s not my voice’s fault; it is trained on just a few hours of me speaking into a microphone for various podcast appearances. The model likes to insert ums and ahs: In the recordings I gave it, I’m thinking through answers in real time and choosing my words carefully. It’s uncanny, yes, but also quite convincing—a part of my essence that’s been stripped, decoded, and reassembled by a little algorithmic model so as to no longer need my pesky brain and body…
  The Atlantic, Charlie Warzel: AI voice cloning
This stuff is getting so good, so rapidly. Beyond burgeoning anxieties with regard to job losses, we're gonna have increasing difficulties with "authentication" (the "disinfo" thing).
OK, it's been on my to-do list to record myself on my Mac reading the entire U.S. Constitution, from the Preamble through the 27th Amendment, and then post the mp3 online. I've read it aloud from start to finish several times and have studied it closely piecemeal going all the way back to graduate school in the mid-1990s. I am a fair SME on the 4th Amendment in particular; it was a central focus of my nearly 300-page Master's Thesis (pdf). My personal study of a range of legal and constitutional issues has continued ever since graduate school. Here’s a post from last year. So, no, I don’t have much patience for people who blab on about such topics without any substantive underlying knowledge.
Why bother? Well, again, it just goes to my ongoing irritation with our overpopulation of dilettante Barstool ConLaw Geniuses, most of whom have likely never read all of it, or rationally grasped its provisions (many of them elected officials, from Donald Trump on down). I would never be so arrogant as to claim ConLaw expertise (uh, for starters, IANAL). Nonetheless, in addition to my lengthy, ongoing reading-comprehension "hermeneutic"-level efforts, I have dug into and tabulated a bit of info perhaps of interest to all those "textualists" out there.
Current English-language word count: 171,476 in current use, plus 47,156 obsolete (some authorities think the current active tally is a significant undercount).

US Constitution

  7,420 words total, *
  1,065 unique words,
  497 appearing only once (47%)
  (Preamble thru 27th Amendment, *Signators’ names excluded)

(Only 0.53% of all English words are in Constitution)
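
For anyone who wants to run a tally like this themselves, here is a minimal Python sketch. It is not the tool I used for the figures above; the function name `word_stats` and the tokenizing rule (runs of letters, with internal apostrophes or hyphens kept together, digits ignored) are assumptions for illustration, and a different rule will shift the counts somewhat.

```python
import re
from collections import Counter

def word_stats(text):
    """Tally total words, unique words, and words that appear exactly once."""
    # Assumption: a "word" is a run of letters, optionally joined by an
    # internal apostrophe or hyphen. Digits are not counted as words.
    words = re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())
    counts = Counter(words)
    once = [w for w, n in counts.items() if n == 1]
    return {
        "total": len(words),      # every word occurrence
        "unique": len(counts),    # distinct words
        "once": len(once),        # distinct words used a single time
        "once_share": len(once) / len(counts) if counts else 0.0,
        "counts": counts,         # per-word frequencies, e.g. counts["vote"]
    }

# Small demonstration on the opening words of the Preamble.
sample = "We the People of the United States, in Order to form a more perfect Union"
stats = word_stats(sample)
print(stats["total"], stats["unique"], stats["once"])
```

Pointing the same function at a plain-text copy of the full document (read in with `open(...).read()`) gives the corresponding totals, and `stats["counts"]` answers the "how many times does this word appear" questions.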

Words not found: “democracy,” “privacy,” “outer,” “perimeter”

Appearing 16 times: “vote”
Appearing 14 times: “votes”
Appearing 9 times: “election”

Phrases not found:
   “Co-equal branch(es)”
   “Separation of Powers”
   “Checks and Balances”
   “outer perimeter”

153 sentences, 2 Declarative, 151 Imperative. Mostly compound / complex.

89 semicolons
11 colons
559 commas
195 periods
24 dashes
5 open/close parentheses
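
The punctuation tally can be sketched the same way. Again, this is an illustration rather than the method behind the numbers above: the name `punctuation_counts` and the label table are hypothetical, and I am assuming that only en and em dashes count as "dashes" (plain hyphens are left out) and that every period counts, including those in abbreviations.

```python
from collections import Counter

# Assumed mapping from character to label; hyphens are deliberately excluded.
PUNCTUATION = {
    ";": "semicolons",
    ":": "colons",
    ",": "commas",
    ".": "periods",
    "–": "dashes",   # en dash
    "—": "dashes",   # em dash
    "(": "open parentheses",
    ")": "close parentheses",
}

def punctuation_counts(text):
    """Count each punctuation mark of interest, character by character."""
    tally = Counter()
    for ch in text:
        label = PUNCTUATION.get(ch)
        if label is not None:
            tally[label] += 1
    return tally

# Small demonstration on a made-up line.
demo = punctuation_counts("To wit: one, two; three (four) five—six.")
for label, n in sorted(demo.items()):
    print(label, n)
```

A `Counter` returns 0 for any label that never shows up, so marks that are absent from a text need no special handling.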

Commonly listed English parts of speech are noun, verb, adjective, adverb, pronoun, preposition, conjunction, interjection, numeral, article, or determiner.

Numerals 1-10 (total 139, 1-27 inclusive):
1      32
2      29
3      15
4      18
5       8
6       6
7       5
8       3
9       3
10     2
And, while there's legal scholar consensus that the bulk of "Conservative Strict Construction Originalist Textualism" hovers at the semantic/contextual/phraseological level, SCOTUS Justice Barrett recently remarked at Orals "putting aside for the moment the meaning of the word 'and' and the placement of a comma..."
I kiddeth thee not.
Back to my mp3 idea. I apparently can now just pay ElevenLabs $22 to digitally "clone" my voice, after which I could simply post fake recitations of all manner of prose I'd never read. Naysayers might well scoff at any genuine audio V/O.
So, for the near future, one might have to go all the way to a video talking-head recording of such material using YouTube to demonstrate one's chops. But, full-on “deep-fake” video is likely not that far off either.
Charlie Warzel continues:
The uncomfortable reality is that there aren’t a lot of options to ensure bad actors don’t hijack these tools. “We need to brace the general public that the technology for this exists,” Staniszewski said. He’s right, yet my stomach sinks when I hear him say it. Mentioning media literacy, at a time when trolls on Telegram channels can flood social media with deepfakes, is a bit like showing up to an armed conflict in 2024 with only a musket.

That made me cry. Tears of joy. Back in my solo acoustic musician days, I used to sing his song “I’m Gonna Love You Forever.” He was a great CW artist. Using AI to extend his work in the wake of his severe stroke is a wonderful application of this technology.

BTW, I riffed a bit on malign AI potential back in December.

Jacob Collier is now 30. He is without any exaggeration a complete musical / music technology genius. He is also no mere studio / tech performer. Below, 2 hours of jaw-dropping live concert performance in Lisbon.

His band (half of them brilliant women) is simply fabulous.
