Why Apple’s Vision Pro should remove the screen
Our concept for a screenless AR assistant launched nearly 10 years ago. And we still believe it could be as impactful as, if not more impactful than, Apple’s Vision Pro.
Apple’s recent announcement of its Vision Pro headset garnered a mixed bag of reviews, with even some of its most ardent supporters scratching their heads at what to make of it. Wearable mobile computers are undoubtedly a part of our future, though, and Apple’s entrance validates that at the highest levels.
This is a topic my team at argodesign has been studying for well over a decade in experiments and private partnerships—which includes three and a half years working as a Strategic Design Partner with Magic Leap on the fundamentals of Mixed Reality devices.
We have many entries into the gestalt here, from our work expanding Interactive Light interfaces to helping design one of the first comprehensive design guides for Mixed Reality. We even developed a concept for a consumer-adoptable headset we called Reality X.
But all the way back in our first year at argodesign, we developed the one concept we think is actually the best answer to some of the initial criticisms of Vision Pro. Something that could exist today. Something we called LaLaLa at the time. Something that, thinking within Apple’s own nomenclature, we perhaps should have called the VisionPod.
And the combined capabilities of Generative AI (GAI) and the market-opening power of Apple make us think maybe the time has come for a device like this.
LaLaLa was imagined to be an earpiece with onboard compute power, a mic, and a forward-facing camera. There was notably no screen. At the time, the use case we imagined was a personal assistant: it could see what you could see, so you could ask it things like "Who is that?" or, while looking at your fridge, "What can I make for dinner?" Hence the colloquial name LaLaLa: sing to it and it sings back.
The device seemed very sci-fi at the time, but nearly a decade later, the technology has largely caught up to the concept. A feature in iOS 17 called Visual Look Up could already enable that option of looking in your fridge to figure out dinner tonight. It turns out that GAI is the best friend of wearable computing.
AI REDUCES OUR NEED FOR A SCREEN
LaLaLa imagined a device you could talk to like a person, and one that could answer in a comprehensible way. Large Language Models like ChatGPT have since made this conversational interface possible.
Believe it or not, there are a lot of moments that simply don’t require screens; they can be handled with a voice whispering in your ear. Getting directions to nearby sushi, identifying what car I am looking at, finding LinkedIn details on the person in front of me, even narratives like "take me on a ghost-hunting tour of this area." All are possible with GAI. They’re no longer fan fiction in the world of computing.
For years, we have also dreamed up features that required a process called Object Identification. Imagine users asking "how much does that desk weigh?" (while at a vintage store looking at the desk), "do locals like that restaurant?" (while looking across the street), or "can I get that dress at a lower price?" (while in a department store looking at the dress). Large Image Models have taken these scenarios into the world of possibility, as they can now both identify objects and understand their context.
This advance in computer vision will also help make the world more digitally addressable. Anchoring in mixed reality is the ability to lock a URL to a physical object. What Apple calls a WorldAnchor is a new fundamental pointer, like the QR code, that will let the world itself act like a giant Google Drive, with folders containing objects, files, and links laid out like layers over the real world.
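To make that idea concrete, here is a minimal sketch of what pinning a link to a spot in a room could look like on visionOS, assuming ARKit’s WorldTrackingProvider and WorldAnchor APIs running inside an ARKitSession; the AnchorLinkStore that pairs anchors with URLs is our own hypothetical illustration, not an Apple API.

```swift
import ARKit
import simd

// Hypothetical registry pairing persistent WorldAnchor IDs with URLs, so a
// physical spot in a room can "contain" a link, like a folder in a drive.
final class AnchorLinkStore {
    private var links: [UUID: URL] = [:]

    func attach(_ url: URL, to anchor: WorldAnchor) {
        links[anchor.id] = url
    }

    func url(for anchor: WorldAnchor) -> URL? {
        links[anchor.id]
    }
}

// Pin a URL to a fixed pose in the user's space. Assumes a WorldTrackingProvider
// that is already running in an ARKitSession on visionOS.
func pinLink(_ url: URL,
             at originFromAnchorTransform: simd_float4x4,
             worldTracking: WorldTrackingProvider,
             store: AnchorLinkStore) async throws {
    // A WorldAnchor persists a fixed position and orientation in the room.
    let anchor = WorldAnchor(originFromAnchorTransform: originFromAnchorTransform)
    try await worldTracking.addAnchor(anchor)
    store.attach(url, to: anchor)
}
```

A companion earpiece or phone would then only need to resolve the anchor it is looking at back to its URL to "open the folder" hanging on that wall.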
What this addressability creates is an entirely new pattern of computing spread across our environment, what Apple and others have dubbed Spatial Computing: one filled with digital experiences that fit into and are contextualized by physical placefulness. Thanks to GAI, those experiences will manifest in more forms, like a voice talking in your ear, but also as software written and launched in real time.
THE RISE OF REALTIME APPS
It turns out Large Language Models are very good at writing code. Before long, a spoken prompt will be able to invoke a complete running piece of software as easily as you can ask Siri for the weather today. (This ability is already being worked on by startups like Builder.AI, one of our clients.)
When AI can write apps, it enables the creation of software without the need for a business behind it to pay off the costs (you could perhaps subscribe to this service the way we do ChatGPT today, or, in the future, run many of these AI models locally on your phone). This means we will see more, smaller apps customized for a specific context and user. With my LaLaLa, I can imagine looking at a bookshelf and asking it to help me organize my books. In real time, AI could spin up an application for organizing books and walk me through sorting them by value, age, author, color, or size. Apps like this don’t often exist today because there’s not a large enough market. But tomorrow, a single person could be the addressable market.
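As a thought experiment, here is a rough Swift sketch of what that bookshelf moment could reduce to once a model can extract the data and pick the logic on demand. Every name here, from the Book fields to describeBooksInView(), is hypothetical; the point is that the "app" is little more than a sort key chosen by the spoken prompt.

```swift
import Foundation

// Hypothetical record an image model might extract for each spine it can see.
struct Book {
    let title: String
    let author: String
    let year: Int
    let estimatedValue: Double
    let dominantColor: String
    let heightMM: Int
}

// The realtime "app" is essentially a sort strategy picked from the prompt.
enum SortKey: String, CaseIterable {
    case value, age, author, color, size
}

// In practice an LLM would map free-form speech to a SortKey; simple keyword
// matching stands in for that step here.
func sortKey(from prompt: String) -> SortKey {
    let lowered = prompt.lowercased()
    return SortKey.allCases.first { lowered.contains($0.rawValue) } ?? .author
}

// Order the shelf according to whichever key the prompt asked for.
func organize(_ books: [Book], by key: SortKey) -> [Book] {
    switch key {
    case .value:  return books.sorted { $0.estimatedValue > $1.estimatedValue }
    case .age:    return books.sorted { $0.year < $1.year }
    case .author: return books.sorted { $0.author < $1.author }
    case .color:  return books.sorted { $0.dominantColor < $1.dominantColor }
    case .size:   return books.sorted { $0.heightMM > $1.heightMM }
    }
}

// Usage, assuming a hypothetical describeBooksInView() image-model call:
// let ordered = organize(describeBooksInView(), by: sortKey(from: "sort these by value"))
```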
That means in a few short years, a big part of the App Store will be eviscerated as small apps are invoked in real time and simply become the output of your operating system.
WHAT COMES NEXT ALONGSIDE VISIONPOD? MAYBE THE ULTIMATE OS
Realtime apps would completely change the way our mobile devices work. Wearables like LaLaLa could offer incredible utility without putting a screen over your face, especially when working in concert with a smartphone in your pocket. And we can imagine several ways that mobile and spatial computing could merge into something that simply feels more natural to use.
- You will launch software with a prompt: "Hey Siri, help me plan a birthday party."
- You will not be tied to one type of UX: you can ask for a party-planning app in the style of Slack or PowerPoint, or as a calendar.
- A lot of experiences will start from the camera: the camera gives context, so you point it at your grinding dryer and say, "Help me fix this dryer." GAI will identify the model, classify the sound, figure out the repair, look up the part, and offer it to you next day for $143, or installed for $250. (A sketch of that pipeline follows this list.)
- Other experiences will start in the middle: when the dryer part eventually arrives, you won’t need to load an app or a YouTube video to understand installation. Just look at your dryer, hold out a bearing kit, and say, "Help me install this." No pre-planning required.
- Alerts become more important than apps: alerts will answer your questions, serve as a chatbot with your phone, and become a way for the device to respond quickly.
- You will have far fewer apps and icons: most ad-supported software will simply be gone as you invoke those apps with a prompt directed at your Super OS.
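For the dryer scenario above, here is a hedged sketch of the pipeline that camera-first flow implies: a frame and a few seconds of audio go in, a concrete part offer comes out. Every type and function here is a hypothetical stand-in for whatever image, audio, and parts-lookup services a real assistant would call.

```swift
import Foundation

// Hypothetical results from the assumed image, audio, and parts services.
struct ApplianceID { let brand: String; let model: String }
struct Diagnosis   { let fault: String; let partNumber: String }
struct PartOffer   { let price: Double; let installedPrice: Double; let etaDays: Int }

// Stand-ins for model calls; none of these are real APIs.
protocol RepairAssistant {
    func identifyAppliance(in imageData: Data) async throws -> ApplianceID
    func classifyFault(of appliance: ApplianceID, from audioData: Data) async throws -> Diagnosis
    func findPart(for diagnosis: Diagnosis) async throws -> PartOffer
}

// "Help me fix this dryer": one camera frame plus a few seconds of audio in,
// a concrete part offer out.
func fixMyDryer(image: Data, audio: Data, using assistant: RepairAssistant) async throws -> String {
    let appliance = try await assistant.identifyAppliance(in: image)
    let diagnosis = try await assistant.classifyFault(of: appliance, from: audio)
    let offer     = try await assistant.findPart(for: diagnosis)
    return "Replacement \(diagnosis.partNumber) for your \(appliance.brand) \(appliance.model): "
         + "$\(offer.price) shipped in \(offer.etaDays) day(s), or $\(offer.installedPrice) installed."
}
```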
A REASON FOR OPTIMISM
We have crossed the Uncanny Valley and come out on the Plains of Myopia, meaning computing has suddenly become undetectably human and so capable that it is hard for us to take in the vastness of the new possibilities. That leads us to scan the horizon and mostly see the creepy boogeymen of AI competing with humanity or leading us to some doom.
But as we look around and widen our aperture, we realize that is not the only path. We can simply choose to walk around those pitfalls toward a more human form of computing. Not a dystopian future of people wearing digital ski masks, but one where we can be more present with each other. One where technology lives in service of human interaction. Where it builds authenticity rather than influencers, and acts as a force for presence with each other rather than escapism to some virtual world.