Google I/O introduced many new features aimed at expanding the capabilities of Google Gemini. One of the most notable updates concerns Gemini Advanced: Gemini 1.5 Pro, previously available only to developers, is now available to Gemini Advanced subscribers. With Gemini 1.5 Pro, users get a context window of up to 1 million tokens, enough to process large amounts of information from sources such as Google Docs, PDFs, and Word files. This means you can load data from various sources and ask specific questions grounded in that context. For example, a million-token context window can hold roughly an hour of video, 11 hours of audio, 30,000 lines of code, or a 700,000-word document. This is a significant advance in handling diverse and complex information, and it shows how well Gemini can integrate and manage very large inputs.
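To make those equivalences concrete, here is a minimal Python sketch that estimates whether a mix of material fits in a 1-million-token window. The per-unit token rates are derived only from the approximate figures quoted above, and adding different media together linearly is a simplifying assumption. The function names are hypothetical and real token counts depend on the model's tokenizer.

```python
# Rough capacity check built only from the approximate equivalences above:
# 1,000,000 tokens ~ 700,000 words ~ 30,000 lines of code
#                  ~ 1 hour of video ~ 11 hours of audio.
CONTEXT_TOKENS = 1_000_000

# Approximate tokens consumed by one unit of each kind of content.
TOKENS_PER_UNIT = {
    "words": CONTEXT_TOKENS / 700_000,
    "code_lines": CONTEXT_TOKENS / 30_000,
    "video_hours": CONTEXT_TOKENS / 1,
    "audio_hours": CONTEXT_TOKENS / 11,
}


def estimate_tokens(**amounts: float) -> float:
    """Estimate total tokens for amounts such as words=50_000, audio_hours=2."""
    total = 0.0
    for unit, amount in amounts.items():
        if unit not in TOKENS_PER_UNIT:
            raise ValueError(f"unknown unit: {unit}")
        total += amount * TOKENS_PER_UNIT[unit]
    return total


def fits_in_context(context_tokens: int = CONTEXT_TOKENS, **amounts: float) -> bool:
    """Return True if the estimated token total fits in the context window."""
    return estimate_tokens(**amounts) <= context_tokens


if __name__ == "__main__":
    # Half the word budget plus a third of the code budget: should fit.
    print(fits_in_context(words=350_000, code_lines=10_000))
    # Half an hour of video plus eight hours of audio: should not fit.
    print(fits_in_context(video_hours=0.5, audio_hours=8))
```

Treat the result as a planning estimate only. For a real request, count tokens with the tooling provided by the platform you are using.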
Another significant announcement at Google I/O is Gemini 1.5 Flash, a low-latency multimodal model. It offers advanced reasoning capabilities and a context window of up to 1 million tokens, and it is designed to be more efficient than other Gemini models. You can try Gemini Flash right now in Google AI Studio and Vertex AI. For developers, Google offers an extended two-million-token context window, which makes it possible to store and process even larger datasets. Notably, Gemini Flash is very affordable. Input costs 35 cents per million tokens for prompts up to 128,000 tokens and 70 cents per million tokens for longer prompts; output costs 53 cents per million tokens for prompts up to 128,000 tokens and $1.05 per million tokens for longer prompts. This is significantly cheaper than OpenAI’s GPT-4o, which is currently priced at $5 per million input tokens and $15 per million output tokens, making Gemini Flash one of the most affordable capable multimodal models on the market.
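The price gap is easier to judge with a worked calculation. The sketch below assumes the prices above are read as $0.35 and $0.70 per million input tokens and $0.53 and $1.05 per million output tokens, with the higher tier applying to prompts longer than 128,000 tokens, and $5 and $15 per million tokens for GPT-4o. The `Pricing` class and `cost_usd` function are hypothetical names for illustration, and current prices may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per 1 million tokens, tiered by prompt length."""
    input_short: float   # input rate when the prompt is <= threshold
    input_long: float    # input rate when the prompt is > threshold
    output_short: float  # output rate when the prompt is <= threshold
    output_long: float   # output rate when the prompt is > threshold
    threshold: int = 128_000


# Figures as quoted in the article (assumed reading, see lead-in).
GEMINI_FLASH = Pricing(0.35, 0.70, 0.53, 1.05)
# GPT-4o has no length tiers in the article, so both tiers are equal.
GPT_4O = Pricing(5.00, 5.00, 15.00, 15.00)


def cost_usd(pricing: Pricing, prompt_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD."""
    long_prompt = prompt_tokens > pricing.threshold
    in_rate = pricing.input_long if long_prompt else pricing.input_short
    out_rate = pricing.output_long if long_prompt else pricing.output_short
    return (prompt_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    # A 100,000-token prompt with a 2,000-token answer.
    flash = cost_usd(GEMINI_FLASH, 100_000, 2_000)
    gpt4o = cost_usd(GPT_4O, 100_000, 2_000)
    print(f"Gemini Flash: ${flash:.5f}, GPT-4o: ${gpt4o:.5f}")
```

Under these assumptions, the 100,000-token request costs about $0.036 on Gemini Flash and $0.53 on GPT-4o. Note that crossing the 128,000-token threshold doubles the input rate and nearly doubles the output rate for the whole request.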
In addition, Google announced new vision features for Gemma, its family of open models built on the same research and technology as Gemini. Gemma is available on Vertex AI and other platforms, with variants such as RecurrentGemma and CodeGemma. Starting today, you can also use PaliGemma, a multimodal model with vision capabilities. Currently, Gemma comes in two small sizes, 2 billion and 7 billion parameters, but a 27-billion-parameter version will be available soon. The larger parameter count should help Gemma handle more complex tasks with higher accuracy.
Google Gemini is also beginning to roll out in Google Search, with real-time information and customization capabilities. Users can try AI Overviews, which lets Google build AI-powered, customized search results pages with information on dining, recipes, movies, hotels, shopping, and more. Combining search with AI is meant to improve the user experience and provide accurate, timely information.
Another notable product is Project Astra, a universal AI agent that continuously processes and responds to everything it sees in real time through video. Google showed a demo of Project Astra at the event, and OpenAI also recently introduced a similar capability. However, Project Astra’s features are only expected to reach the Gemini app later this year. Project Astra promises real-time video analysis and monitoring, which could improve management efficiency and enable quicker decision-making.
Another update is Imagen 3, the new version of Google’s AI image generation model. Imagen 3 promises to be more realistic, more responsive to prompts, and better at rendering text. Google also introduced Veo, a model that can generate impressive longer videos. You can sign up to test these tools in Google Labs, which hosts Google’s experimental projects. Imagen 3 and Veo will open up many new creative opportunities for developers and users.
There were many other minor announcements, but most of the new features are not immediately available. Even so, you can expect Google Gemini to be deployed widely across Google’s products before long. Google’s approach is to build automated agents that help users complete tasks faster and more efficiently. With these updates, Google Gemini promises many new and convenient experiences for users while strengthening Google’s position in artificial intelligence and multimodal technology. These advances not only meet users’ growing needs but also reshape how we interact with technology in our daily lives.
Author: Hồ Đức Duy. © Copyright is retained on all copies.