A friend recently recommended that I watch “The Circle” on Netflix. I won’t ask you to subject yourself to the same torture (I barely made it through the first episode) but I also couldn’t let my suffering be in vain, so I invite you to read about the show instead.
If you haven’t watched it (and I genuinely hope you have not), here’s the general plot:
- It’s a reality show where contestants compete to win $100,000
- To win the show, you must be voted as “most popular” by the other contestants
- The contestants (a handful of men and women) all live in the same apartment, but they never meet each other in person
- They can only interact via digital identities (which may not be who they actually are in real life)
Because contestants never meet in person, they must communicate using the show’s namesake: “The Circle.”
And what is “The Circle”?
Each contestant’s apartment is outfitted with TVs in every room that display a UI for the contestants to use for communication. But these are not your ordinary, run-of-the-mill touchscreens. In fact, they aren’t touchscreens at all! In what’s presented as a wild twist, these screens can only be controlled using 🥁🥁🥁 voice.
While the show never explicitly calls “The Circle” an AI, it’s implicitly described as a sort of voice-controlled operating system. Like if you had an iPhone but you had to ask Siri to do literally everything for you, from opening apps to scrolling through feeds.
Of course I know why the producers made this decision — people sitting on their couches silently tapping away their texts doesn’t translate to good TV — but have you ever listened to someone dictate their text messages? It’s like taking a cheese grater to your ear.
So because THE ONLY WAY THE CONTESTANTS CAN TALK TO EACH OTHER IS BY DICTATING THEIR MESSAGES COMMA MOST OF THE DIALOGUE ON THE SHOW SOUNDS EXACTLY LIKE THIS EXCLAMATION POINT CRYING EMOJI.
The show is supposedly about the perils of social media and digital communication (I think?), but I found the interaction between contestants and “The Circle” to be far more interesting than the person-to-person catfish drama.
The main challenge of interacting with a voice-controlled system is that you have no idea what it’s capable of doing: the edges of the technology are invisible. The only way to discover the boundaries of its capabilities is through trial and error. You tell or ask it something and then wait for a response. Each exchange is a pass/fail test of confidence.
If you’ve ever used Siri or the Google Assistant more than once, then you know this is a super frustrating way to engage with technology.
So once contestants learned they could only interact with “The Circle” via voice, they began their tests as if they were taking their first steps onto a frozen pond. Their initial requests were made as timid questions, like:
“Circle, take me…to my profile?”
Though they braced themselves for disappointment, “The Circle” responded flawlessly to each request.
Brimming with confidence and delight after a few more successful tests, contestants threw all caution to the wind and blazed fearlessly ahead with requests like:
“Circle, choose the photo of me in the black shirt.”
The sentence is so simple but it highlights exactly what makes natural language processing so difficult: the system must understand what “black” is. And what a “shirt” is. And that “shirts” can be “black.” And that “shirts” can also not be “black.” And who “me” is. And that “choose the photo” means this photo in relation to those photos.
If I type “black shirt” into my Google Photos, which is probably the most accurate general-purpose image recognition system in the world, these are the results I get:
This is all a very long way of saying a very unremarkable thing that I suspect everyone already knows — at least in the back of their minds — which is that “The Circle” is not an advanced digital assistant but a gaggle of interns crowded into a room who are glued to live feeds of each apartment, listening with bated breath for the next voice command or message that they must furiously respond to or transcribe.
There’s a research technique that I learned about while I was at Google called the “Wizard of Oz,” where participants interact with a system that they believe to be a computer but that is actually being operated by a “wizard” behind the scenes.
This works well for a research session that might be a couple of hours long, but could you imagine being a “wizard” for three straight weeks of filming? These folks, not the producers or the contestants, are the true heroes of the show.
And I have so many questions for them!
Who are they? What other jobs have they held? Will they return for Season 2? Were they assigned one contestant each, or multiple? Were they fired for typos or slow responses? Did they work in shifts? Did they go through training for consistency? Were contestants ever weirded out by their accuracy? Did they ever make mistakes on purpose? Will one of them break their NDA and give me the answers I crave?
Justin's "Bonsai" Newsletter
A personal newsletter on life, culture, and design.