Gemini
Google’s family of foundation models. In this wiki, Gemini appears as the provider of Gemini Embedding 2, the File Search API, Gemini Flash Live, and now Gemini Robotics-ER 1.6 for embodied reasoning in robot systems.
Key facts
- Gemini Embedding 2 is Google’s first natively multimodal embedding model in the RAG cluster, handling text, images, video, audio, and documents in one vector space [src-006].
- Gemini 2.5 Flash is the default chat model used against File Search in the earlier RAG workflow [src-006].
- Gemini Robotics-ER 1.6 extends the Gemini family into robotics, specializing in spatial reasoning, task planning, success detection, instrument reading, and physical safety constraints [src-039].
- Gemini Robotics-ER 1.6 is available through the Gemini API and Google AI Studio [src-039].
- In Dwarkesh Patel’s Reiner Pope interview, Gemini is used as a pricing and infrastructure example: long-context price tiers and public traffic estimates are treated as clues about Memory Wall for Long Context, Prefill vs Decode, and model-serving scale [src-042].
- Google Cloud Next ’26 introduces Gemini 3.1 Pro for complex workflow orchestration, Gemini 3.1 Flash Image / Nano Banana 2 for UI and visual assets, and broader Gemini Enterprise integration across agents, data, Workspace, and customer experience [src-044].
- In [src-061], Gemini is treated as the main consumer-chatbot challenger to ChatGPT: Gemini 3 launched with major release momentum, and Google brings scale, TPUs, and data-center integration, but differentiation and user habit still matter against OpenAI’s incumbent position with ChatGPT.
- In [src-062], Pichai frames Gemini as the product/model layer sitting on top of long-term Google bets: TPUs, Brain/DeepMind integration, search, Android/XR, and Google’s full-stack infrastructure.
- The same source links Gemini-style multimodal intelligence to Project Astra, Android XR, AI Mode in Google Search, and Google Beam-style presence products [src-062].
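The availability note above (Gemini API and Google AI Studio [src-039]) can be sketched with the `google-genai` Python SDK. This is a hedged sketch, not a confirmed recipe: the model id `gemini-robotics-er-1.6`, the `build_success_check_prompt` helper, and the example task are assumptions for illustration; only the `genai.Client` / `models.generate_content` call pattern is the SDK's documented shape.

```python
# Hedged sketch: querying an embodied-reasoning Gemini model for robotic
# success detection through the Gemini API. The model id is an assumption;
# check Google AI Studio for the identifier actually exposed.
MODEL_ID = "gemini-robotics-er-1.6"  # hypothetical id, not confirmed by the source


def build_success_check_prompt(task: str) -> str:
    """Compose a success-detection query to pair with a camera frame."""
    return (
        "Given the attached camera frame, did the robot complete this task: "
        f"{task}? Answer yes or no, with a one-sentence justification."
    )


def main() -> None:
    # Imported lazily so the prompt helper works without the SDK installed.
    from google import genai  # pip install google-genai

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=build_success_check_prompt("place the mug on the shelf"),
    )
    print(response.text)


if __name__ == "__main__":
    main()
```

In real use the text prompt would be sent alongside an image part (the camera frame); the same client pattern also serves the embedding and File Search workflows cited in [src-006].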
Related concepts
- Embodied Reasoning
- Agentic Vision
- Robotic Success Detection
- Robotic Instrument Reading
- LLM Inference Economics
- Memory Wall for Long Context
- Prefill vs Decode
- Agentic Enterprise
- Workspace Intelligence
- Enterprise Knowledge Graph
- Model Lab Differentiation
- GPU Supply as AI Strategy
- AI Search as Context Layer
- Agentic Operating Systems
- AI Productivity Multiplier
Source references
- [src-006] Nate Herk — RAG and data ingestion cluster (5 videos)
  - Videos referenced: hem5D1uvy-w, irg-2IfAjpo
- [src-039] Laura Graesser and Peng Xu — “Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning” (2026-04-14)
- [src-042] Dwarkesh Patel — “How GPT, Claude, and Gemini are actually trained and served – Reiner Pope” (2026-04-29)
- [src-044] Thomas Kurian — “Welcome to Google Cloud Next ’26” (2026-04-22)
- [src-061] Lex Fridman — “State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490” (2026-01-31)
- [src-062] Lex Fridman — “Sundar Pichai: CEO of Google and Alphabet | Lex Fridman Podcast #471” (2025-06-05)