Google’s Gemini AI: A Leap Towards Understanding Multimedia

By Carina

Google is introducing Gemini, a new AI model for its Bard chatbot that is designed to natively comprehend video, audio, and photos. Google Pixel 8 phone owners will be among the first to get Gemini’s enhanced AI capabilities.

Gemini’s Multifaceted Capabilities

Gemini is meant to take AI beyond text-based interactions. Text is crucial, but our understanding of the world also draws on video, audio, and imagery. Gemini aims to bring AI closer to comprehending that rich and dynamic world.

Improving AI Abilities

Gemini’s initial release, available in English, improves text-based chat for tasks like document summarization, reasoning, and writing programming code.

However, the more significant change is still to come: the ability to understand multimedia, such as recognizing hand gestures in videos or solving puzzles. Google says that capability is set to arrive soon.

Gemini’s Three Versions

Google has developed three versions of Gemini to cater to different levels of computing power:

  • Gemini Nano: Designed for mobile phones with varying memory options, this version will power new features on Google’s Pixel 8 phones.
  • Gemini Pro: Tuned for fast responses, this version runs in Google’s data centers and enhances the Bard AI.
  • Gemini Ultra: Limited to a test group initially, it will be available in a new Bard Advanced chatbot in early 2024, likely coming at a premium cost.

Rapid Progress in Generative AI

The launch of Gemini highlights the rapid advancements in generative AI, where chatbots generate responses based on plain language input rather than complex programming instructions. Google’s competitor, OpenAI, released ChatGPT a year ago.

In the time since, Google has reached its third major AI model revision, which it is integrating into widely used products like Search, Chrome, Google Docs, and Gmail.

A Step Closer to Human Understanding

Eli Collins, a product vice president at Google’s DeepMind division, described the goal behind Gemini as AI models inspired by how humans understand and interact with the world. “Gemini brings us a step closer to that vision,” Collins said.

While multimedia capabilities promise significant changes, AI models, including Gemini, still grapple with fundamental issues.

They can generate increasingly sophisticated responses, but the accuracy and trustworthiness of those responses remain a concern.

Google’s chatbot warns users of the possibility of inaccurate information.

Gemini’s Training Approach

Gemini represents Google’s next-generation language model, trained on various data types, including text, programming code, images, audio, and video.

This approach lets a single model handle multimedia input efficiently, in contrast to the alternative of linking together separate models for each data type.
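The difference between one model that sees mixed input and separate models per data type can be pictured with a toy sketch. This is only an illustration under stated assumptions, not Gemini’s actual design: the Part class and the route_by_modality and interleave functions are hypothetical names invented here, and the strings stand in for real images and text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """One piece of a prompt (hypothetical type, for illustration only)."""
    modality: str  # "text", "image", "audio", or "video"
    content: str   # placeholder string standing in for the real payload


def route_by_modality(parts):
    """Separate-models style: split the prompt into one bucket per data type.

    Each bucket would go to its own model, so the original ordering
    between text and images is no longer visible to any single model.
    """
    buckets = defaultdict(list)
    for part in parts:
        buckets[part.modality].append(part.content)
    return dict(buckets)


def interleave(parts):
    """Single-model style: keep every part in one ordered sequence."""
    return [(part.modality, part.content) for part in parts]


prompt = [
    Part("text", "What links these two photos?"),
    Part("image", "photo_a"),
    Part("image", "photo_b"),
    Part("text", "Answer in one sentence."),
]

buckets = route_by_modality(prompt)
sequence = interleave(prompt)
print(buckets)
print(sequence)
```

In the bucketed form, the question and the photos it refers to end up in different places. In the interleaved form, they stay side by side in the order the user supplied them, which is the property a natively multimodal model can draw on.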

Examples of Gemini’s Abilities

According to a Google research paper, Gemini exhibits diverse capabilities. It correctly predicts the next shape in a sequence of shapes, identifies what links a set of photos, converts charts into tables, and even assists in solving handwritten physics problems.

Ongoing Testing and Responsiveness

While Gemini shows promise in demonstrations, it awaits further testing, including “red teaming” to identify vulnerabilities.

Handling multimedia data, which can convey different meanings when combined, poses unique challenges for AI systems.

Google CEO Sundar Pichai emphasized the company’s commitment to ambitious research while responsibly addressing risks.

Collaboration with governments and stakeholders is part of Google’s strategy as AI capabilities evolve.
