Google Project Astra hands-on: Full of potential, but it’s going to be a while

The move to multi-modal AI is going to be multi-year endeavor.

Wed, May 15, 2024, 1:56 AM·5 min read

At I/O 2024, Google’s teaser for Project Astra gave us a glimpse at where AI assistants are going in the future. It’s a multi-modal feature that combines the smarts of Gemini with the kind of image recognition abilities you get in Google Lens, as well as powerful natural language responses. However, while the promo video was slick, after getting to try it out in person, it's clear there’s a long way to go before something like Astra lands on your phone. So here are three takeaways from our first experience with Google’s next-gen AI.

Sam’s take:

Currently, most people interact with digital assistants using their voice, so right away Astra’s multi-modality (i.e. using sight and sound in addition to text/speech) to communicate with an AI is relatively novel. In theory, it allows computer-based entities to work and behave more like a real assistant or agent – which was one of Google’s big buzzwords for the show – instead of something more robotic that simply responds to spoken commands.

The first project Astra demo we tried used a large touchscreen connected to a downward-facing camera. — Photo by Sam Rutherford/Engadget

In our demo, we had the option of asking Astra to tell a story based on some objects we placed in front of camera, after which it told us a lovely tale about a dinosaur and its trusty baguette trying to escape an ominous red light. It was fun and the tale was cute, and the AI worked about as well as you would expect. But at the same time, it was far from the seemingly all-knowing assistant we saw in Google's teaser. And aside from maybe entertaining a child with an original bedtime story, it didn’t feel like Astra was doing as much with the info as you might want.

Then my colleague Karissa drew a bucolic scene on a touchscreen, at which point Astra correctly identified the flower and sun she painted. But the most engaging demo was when we circled back for a second go with Astra running on a Pixel 8 Pro. This allowed us to point its cameras at a collection of objects while it tracked and remembered each one’s location. It was even smart enough to recognize my clothing and where I had stashed my sunglasses even though these objects were not originally part of the demo.

In some ways, our experience highlighted the potential highs and lows of AI. Just the ability for a digital assistant to tell you where you might have left your keys or how many apples were in your fruit bowl before you left for the grocery store could help you save some real time. But after talking to some of the researchers behind Astra, there are still a lot of hurdles to overcome.

An AI-generated story about a dinosaur and a baguette created by Google's Project Astra — Photo by Sam Rutherford/Engadget

Unlike a lot of Google’s recent AI features, Astra (which is described by Google as a “research preview”) still needs help from the cloud instead of being able to run on-device. And while it does support some level of object permanence, those “memories” only last for a single session, which currently only spans a few minutes. And even if Astra could remember things for longer, there are things like storage and latency to consider, because for every object Astra recalls, you risk slowing down the AI, resulting in a more stilted experience. So while it’s clear Astra has a lot of potential, my excitement was weighed down with the knowledge that it will be some time before we can get more full-feature functionality.

Karissa’s take:

Of all the generative AI advancements, multimodal AI has been the one I’m most intrigued by. As powerful as the latest models are, I have a hard time getting excited for iterative updates to text-based chatbots. But the idea of AI that can recognize and respond to queries about your surroundings in real-time feels like something out of a sci-fi movie. It also gives a much clearer sense of how the latest wave of AI advancements will find their way into new devices like smart glasses.

Google offered a hint of that with Project Astra, which may one day have a glasses component, but for now is mostly experimental (the glasses shown in the demo video during the I/O keynote were apparently a “research prototype.”) In person, though, Project Astra didn’t exactly feel like something out of sci-fi flick.

During a demo at Google I/O, Project Astra was able to remember the position of objects seen by a phone's camera. — Photo by Sam Rutherford/Engadget

It was able to accurately recognize objects that had been placed around the room and respond to nuanced questions about them, like “which of these toys should a 2-year-old play with.” It could recognize what was in my doodle and make up stories about different toys we showed it.

But most of Astra’s capabilities seemed on-par with what Meta has already made available with its smart glasses. Meta’s multimodal AI can also recognize your surroundings and do a bit of creative writing on your behalf. And while Meta also bills the features as experimental, they are at least broadly available.

The Astra feature that may set Google’s approach apart is the fact that it has a built-in “memory.” After scanning a bunch of objects, it could still “remember” where specific items were placed. For now, it seems Astra’s memory is limited to a relatively short window of time, but members of the research team told us that it could theoretically be expanded. That would obviously open up even more possibilities for the tech, making Astra seem more like an actual assistant. I don’t need to know where I left my glasses 30 seconds ago, but if you could remember where I left them last night, that would actually feel like sci-fi come to life.

But, like so much of generative AI, the most exciting possibilities are the ones that haven’t quite happened yet. Astra might get there eventually, but right now it feels like Google still has a lot of work to do to get there.

Catch up on all the news from Google I/O 2024 right here!