A Google team has proposed using mobile phone data, including photos and searches, to build a “bird’s-eye” view of users’ lives through artificial intelligence.
According to a copy of a presentation seen by CNBC, the plan, named “Project Ellmann” after biographer and literary critic Richard David Ellmann, would be to use LLMs like Gemini to ingest search results, spot patterns in a user’s photos, create a chatbot, and “answer previously impossible questions.” It says that Ellmann wants to be “Your Life Story Teller.”
It’s unknown whether the company intends to add these features to Google Photos or any other product. According to a company blog post, Google Photos has more than 1 billion users and 4 trillion photos and videos.
Project Ellmann is only one of many ideas Google has for using AI to develop or enhance its products. On Wednesday, Google unveiled Gemini, its newest and “most capable” AI model to date, which outperformed OpenAI’s GPT-4 in certain scenarios. The company intends to license Gemini to numerous clients via Google Cloud so they can use it in their own apps. A notable feature of Gemini is that it is multimodal, meaning it can process and comprehend information other than text, such as images, video, and audio.
At a recent internal meeting, a Google Photos product manager introduced Project Ellmann to teams working on Gemini, according to documents obtained by CNBC. The teams have spent the past few months concluding that large language models are the best technology to enable this bird’s-eye view of an individual’s life story.
According to the presentation, Ellmann could describe a user’s photos in greater detail than “just pixels with labels and metadata” by incorporating context from biographies, earlier events, and later images. It suggests being able to pinpoint a range of experiences, such as a user’s years in college, years living in the Bay Area, and years as a father.
One description features a picture of a little boy playing with a dog in the dirt and the words, “We can’t answer tough questions or tell good stories without a bird’s-eye view of your life.”
A presentation slide reads, “We search through your photos, examining their locations and tags to find a meaningful moment.” It continues, “When we take a broader view and comprehend your life as a whole, your overall narrative becomes evident.”
According to the presentation, large language models could infer events such as the birth of a user’s child: “This LLM can deduce that this is Jack’s birth and that he is James and Gemma’s only child by using knowledge from higher in the tree.”
A slide with an illustration of a user’s various life “moments” and “chapters” reads, “One of the reasons that an LLM is so powerful for this bird’s-eye approach, is that it’s able to take unstructured context from all different elevations across this tree, and use it to improve how it understands other regions of the tree.”
The presenters also gave the example of identifying that a user had just attended a class reunion. In their presentation, the team suggested it was likely a reunion because it had been exactly ten years since the user graduated and the photos contained many faces he hadn’t seen in that time.
The developers also showcased “Ellmann Chat,” described as, “Imagine opening ChatGPT but it already knows everything about your life.” What would you ask it?
It showed a sample chat in which a user asks, “Do I have a pet?” The chatbot answers that yes, the user has a dog, and provides the dog’s name along with the names of the two family members it is most often seen with. It also notes that the dog wears a red raincoat.
Elsewhere in the chat, a user asked when their siblings last visited. Another user, who was considering moving, asked it to suggest towns comparable to their current location. Ellmann answered both questions.
Additional slides showed Ellmann summarizing a user’s eating habits, noting that the user seems to be fond of Italian cuisine, alongside multiple images of pasta dishes and a picture of a pizza. It also stated that the user seems to enjoy trying new foods, because one of their images included a menu item it was unable to identify.
According to the presentation, the system also used the user’s screenshots to determine what items the user was considering buying, their interests, and their work and travel schedules. It also suggested it would be able to identify their preferred apps and websites, citing Google Docs, Reddit, and Instagram as examples.
Google Photos has long employed AI to help users search through their images and videos, and the company is enthusiastic about the potential of LLMs to unlock even more beneficial experiences, a Google representative told CNBC. The representative said this was a preliminary internal investigation and that, should the company choose to release new features in the future, it would take the necessary time to make sure they were beneficial to consumers and made with their safety and privacy as top priorities.
The rush by Big Tech to develop “memories” powered by AI
The proposed Project Ellmann could give Google an edge in the arms race among tech giants to produce more personalized life memories.
For years, Google Photos and Apple Photos have created albums and served up “memories” based on trends in users’ photos.
Google said in November that Google Photos can now use AI to group similar photos together and organize screenshots into easily accessible albums.
In June, Apple revealed that its latest software update would enable its photo app to identify people, dogs, and cats in images. The app already sorts faces and lets users search for them by name.
Apple has also revealed plans to release a Journal app that will use on-device AI to generate tailored suggestions based on recent photos, locations, workouts, and music, prompting users to write passages that capture their experiences and memories.
However, Google, Apple, and other internet behemoths are still figuring out how to properly display and identify photographs.
For example, despite reports in 2015 that Google mislabeled Black people as gorillas, Apple and Google continue to avoid labeling gorillas. A New York Times investigation this year found that Apple and Google’s Android software, which powers the majority of smartphones worldwide, have disabled the ability to visually search for primates out of concern that a person would be mistaken for an animal.
Over time, companies such as Google, Facebook, and Apple have implemented controls to reduce the appearance of unwanted memories. However, users have claimed that these memories occasionally still appear and that minimizing them requires navigating through multiple settings.