12-6-2023 (SAN FRANCISCO) Google has unveiled its latest AI project, Gemini, an effort to position the company as a global leader in artificial intelligence. Gemini is a new type of AI model that can process text, images, audio, and video, marking a significant advance in Google’s AI capabilities.
Gemini is described as a “natively multimodal” model: it was trained from the start on several types of data, including images, video, and audio, unlike earlier large language models that were trained solely on text. That multimodal training makes Gemini versatile across a wider range of tasks.
The initial release of Gemini is happening today within Google’s chatbot, Bard, a rival to OpenAI’s ChatGPT. Bard is now powered by Gemini Pro, giving it more advanced reasoning and planning capabilities. Gemini will also be made available to developers through Google Cloud’s API starting December 13. Additionally, a more compact version of the model, Gemini Nano, will run on the Pixel 8 Pro smartphone, powering suggested messaging replies from the keyboard. Google plans to bring Gemini to other products, including generative search, ads, and Chrome, in the coming months. The most powerful version, Gemini Ultra, is scheduled to debut in 2024, pending rigorous trust and safety checks.
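For developers, access will likely look something like the short sketch below. It assumes Google’s generative AI Python SDK (the google-generativeai package), a “gemini-pro” model identifier, and an API key stored in a GOOGLE_API_KEY environment variable; the Google Cloud (Vertex AI) route may expose the same model through a different client.

    # Minimal sketch of querying Gemini Pro via the google-generativeai SDK.
    # The package name, "gemini-pro" model id, and GOOGLE_API_KEY variable are
    # assumptions based on Google's public developer documentation.
    import os

    import google.generativeai as genai

    # Authenticate with an API key (assumed to be set in the environment).
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

    # Create a handle to the text model and send a single prompt.
    model = genai.GenerativeModel("gemini-pro")
    response = model.generate_content(
        "Draft three suggested replies to: 'Are we still on for lunch tomorrow?'"
    )

    print(response.text)

The same client also supports multi-turn conversations via model.start_chat(), though the exact surface available through Google Cloud’s API may differ.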
Demis Hassabis, CEO of Google DeepMind, expressed excitement about Gemini’s performance and the potential for developers to build upon it. Gemini’s multimodal abilities have significantly improved Bard’s skills, such as content summarization, brainstorming, writing, and planning, according to Sissie Hsiao, vice president at Google and general manager for Bard.
Google demonstrated Gemini’s facility with visual information through a series of demos. In one, the model responded to video of hand-drawn sketches, puzzles, and game ideas built around a world map. Gemini also showed its potential for scientific research by answering questions about a research paper containing graphs and equations.
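The multimodal calls behind demos like these plausibly resemble the sketch below, which again assumes the google-generativeai SDK, a “gemini-pro-vision” model identifier, and a hypothetical local image file standing in for the charts and drawings used in Google’s demos.

    # Hypothetical sketch of a multimodal request: an image plus a text
    # question sent to Gemini's vision-capable model. The "gemini-pro-vision"
    # id and the local file name are assumptions for illustration.
    import os

    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

    # The vision model accepts a list that mixes images and text in one prompt.
    model = genai.GenerativeModel("gemini-pro-vision")
    figure = Image.open("paper_figure.png")  # e.g. a graph from a research paper

    response = model.generate_content(
        [figure, "Explain what this graph shows and summarize the key trend."]
    )

    print(response.text)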
Gemini Pro, the model being rolled out this week, outperformed OpenAI’s GPT-3.5, the model behind the free version of ChatGPT, on six of eight commonly used benchmarks for testing the intelligence of AI software, according to Eli Collins, vice president of product for Google DeepMind. Gemini Ultra, the most powerful version scheduled for release in 2024, scored 90 percent on the Massive Multitask Language Understanding (MMLU) benchmark, surpassing other models, including OpenAI’s GPT-4.
While Google has released a technical report providing insights into Gemini’s inner workings, specific details regarding its architecture, model size, and training data remain undisclosed. Training large AI models like Gemini is a lengthy and expensive process, with estimated costs reaching hundreds of millions of dollars.
The competition in the AI landscape has intensified, with OpenAI’s ChatGPT making significant advancements and prompting Google to accelerate its AI efforts. Google aims to maintain its position as the leading provider of AI services through the cloud, a market that could yield billions or even trillions in revenue.
Google has prioritized safety and quality testing with Gemini, collaborating with external researchers to identify potential weaknesses and ensure the model’s responsible behavior. The company is determined to address concerns related to AI technology and deliver a reliable and powerful solution.
Gemini marks an important milestone for Google and Alphabet, its parent company, as they strive to leverage AI research and reestablish their dominance in the field. With Gemini’s launch, Google continues its efforts to compete with ChatGPT and maintain its market share in search.
Gemini’s development was led by Google DeepMind, a division formed by merging Google’s AI research group, Google Brain, with DeepMind. The project leveraged the expertise of researchers and engineers from across Google and utilized the latest version of Google’s custom silicon chips, known as Tensor Processing Units (TPUs), for training the AI model.
Gemini derives its name from the twinning of Google’s major AI labs and pays homage to NASA’s Project Gemini, which paved the way for the Apollo Program’s moon landings.
Experts have expressed optimism about Gemini’s multimodal approach, calling it a step in the right direction. But because Gemini and similar proprietary models remain closed, outside researchers have only limited insight into how these AI systems work.