Learn Google Deep Mind Technologies || Latest GeminiAI #1.5 #1.0 & More – Speedlink3 Communication And Technologies


Welcome to the Gemini era

The Gemini ecosystem represents Google’s most capable AI.

Our Gemini models are built from the ground up for multimodality — reasoning seamlessly across text, images, audio, video, and code.

Gemini represents a significant leap forward in how AI can help improve our daily lives.

Introducing Gemini 1.5

Our next-generation model

Gemini 1.5 delivers dramatically enhanced performance with a more efficient architecture. The first model we’ve released for early testing, Gemini 1.5 Pro, introduces a breakthrough experimental feature in long-context understanding.


Reasoning about vast amounts of information

Gemini 1.5 Pro can analyze and summarize the 402-page transcripts from Apollo 11’s mission to the moon.

Better understanding across modalities

Gemini 1.5 Pro can perform highly sophisticated reasoning tasks across different modalities, including video such as a silent Buster Keaton movie.

https://youtube.com/watch?v=wa0MT8OwHuk

Problem-solving with longer blocks of code

Gemini 1.5 Pro can reason across 100,000 lines of code, giving helpful solutions, modifications, and explanations.

https://youtube.com/watch?v=SSnsmqIj1MI

Gemini comes in three model sizes

Ultra (1.0): Our most capable and largest model for highly complex tasks.

Pro (1.0, 1.5): Our best model for scaling across a wide range of tasks.

Nano (1.0): Our most efficient model for on-device tasks.

Meet the first version of Gemini, our most capable AI model.

MMLU scores: Gemini 1.0 Ultra 90.0% (CoT@32*), human expert 89.8%, previous SOTA (GPT-4) 86.4% (5-shot*, reported).

*Note that evaluations of previous SOTA models use different prompting techniques.

Gemini is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), one of the most popular methods to test the knowledge and problem-solving abilities of AI models.

Gemini 1.0 Ultra surpasses state-of-the-art performance on a range of benchmarks including text and coding.

TEXT

Higher is better. GPT-4 API numbers were calculated where reported numbers were missing.

| Capability | Benchmark | Description | Gemini 1.0 Ultra | GPT-4 |
|---|---|---|---|---|
| General | MMLU | Representation of questions in 57 subjects (incl. STEM, humanities, and others) | 90.0% (CoT@32*) | 86.4% (5-shot**, reported) |
| Reasoning | Big-Bench Hard | Diverse set of challenging tasks requiring multi-step reasoning | 83.6% (3-shot) | 83.1% (3-shot, API) |
| Reasoning | DROP | Reading comprehension (F1 Score) | 82.4 (variable shots) | 80.9 (3-shot, reported) |
| Reasoning | HellaSwag | Commonsense reasoning for everyday tasks | 87.8% (10-shot*) | 95.3% (10-shot*, reported) |
| Math | GSM8K | Basic arithmetic manipulations (incl. Grade School math problems) | 94.4% (maj1@32) | 92.0% (5-shot CoT, reported) |
| Math | MATH | Challenging math problems (incl. algebra, geometry, pre-calculus, and others) | 53.2% (4-shot) | 52.9% (4-shot, API) |
| Code | HumanEval | Python code generation | 74.4% (0-shot, IT*) | 67.0% (0-shot*, reported) |
| Code | Natural2Code | Python code generation. New held out dataset HumanEval-like, not leaked on the web | 74.9% (0-shot) | 73.9% (0-shot, API) |

*See the technical report for details on performance with other methodologies
**GPT-4 scores 87.29% with CoT@32—see the technical report for full comparison
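Labels such as maj1@32 in the table describe how answers were sampled and scored. A common reading of maj1@32 is that the model is sampled 32 times and the most frequent final answer is the one graded. The sketch below shows only that voting step in JavaScript, assuming the final answers have already been extracted as strings. The function name `majorityVote` and the sample data are hypothetical, and this is not the evaluation code from the technical report.

```javascript
// Hypothetical helper: pick the most frequent answer among sampled completions.
function majorityVote(answers) {
  if (answers.length === 0) {
    throw new Error("need at least one sampled answer");
  }
  const counts = new Map();
  for (const answer of answers) {
    const key = answer.trim(); // treat "42" and "42 " as the same answer
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  let best = null;
  let bestCount = 0;
  // Map iterates in insertion order, so ties go to the answer seen first.
  for (const [key, count] of counts) {
    if (count > bestCount) {
      best = key;
      bestCount = count;
    }
  }
  return { answer: best, votes: bestCount, samples: answers.length };
}

// Five made-up samples instead of 32, to keep the example short.
const sampled = ["42", "41", "42 ", "42", "40"];
console.log(majorityVote(sampled)); // { answer: '42', votes: 3, samples: 5 }
```

Breaking ties by first occurrence is an arbitrary choice made for this sketch; a real evaluation harness may resolve ties differently.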

Our Gemini 1.0 models surpass state-of-the-art performance on a range of multimodal benchmarks.

MULTIMODAL

Higher is better unless otherwise noted. The previous SOTA model is listed when the capability is not supported in GPT-4V.

| Capability | Benchmark | Description | Gemini | GPT-4V or previous SOTA |
|---|---|---|---|---|
| Image | MMMU | Multi-discipline college-level reasoning problems | 59.4% (0-shot pass@1), Gemini 1.0 Ultra (pixel only*) | 56.8% (0-shot pass@1), GPT-4V |
| Image | VQAv2 | Natural image understanding | 77.8% (0-shot), Gemini 1.0 Ultra (pixel only*) | 77.2% (0-shot), GPT-4V |
| Image | TextVQA | OCR on natural images | 82.3% (0-shot), Gemini 1.0 Ultra (pixel only*) | 78% (0-shot), GPT-4V |
| Image | DocVQA | Document understanding | 90.9% (0-shot), Gemini 1.0 Ultra (pixel only*) | 88.4% (0-shot), GPT-4V (pixel only) |
| Image | Infographic VQA | Infographic understanding | 80.3% (0-shot), Gemini 1.0 Ultra (pixel only*) | 75.1% (0-shot), GPT-4V (pixel only) |
| Image | MathVista | Mathematical reasoning in visual contexts | 53% (0-shot), Gemini 1.0 Ultra (pixel only*) | 49.9% (0-shot), GPT-4V |
| Video | VATEX | English video captioning (CIDEr) | 62.7 (4-shot), Gemini 1.0 Ultra | 56 (4-shot), DeepMind Flamingo |
| Video | Perception Test MCQA | Video question answering | 54.7% (0-shot), Gemini 1.0 Ultra | 46.3% (0-shot), SeViLA |
| Audio | CoVoST 2 (21 languages) | Automatic speech translation (BLEU score) | 40.1, Gemini 1.0 Pro | 29.1, Whisper v2 |
| Audio | FLEURS (62 languages) | Automatic speech recognition (based on word error rate, lower is better) | 7.6%, Gemini 1.0 Pro | 17.6%, Whisper v3 |

*Gemini image benchmarks are pixel only, with no assistance from OCR systems.
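The FLEURS row is scored by word error rate, where lower is better. Word error rate is usually defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. A minimal JavaScript sketch of that calculation follows. The function name `wordErrorRate` and the sentences are made up for illustration, and the text normalization the benchmark applies is not stated on this page, so the sketch only splits on whitespace.

```javascript
// Word error rate: word-level edit distance divided by reference length.
function wordErrorRate(reference, hypothesis) {
  const ref = reference.trim().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.trim().split(/\s+/).filter(Boolean);
  if (ref.length === 0) {
    throw new Error("reference must contain at least one word");
  }
  // prev[j] = edit distance between the ref words seen so far and the first j hyp words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i]; // i deletions turn i ref words into an empty hypothesis
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;
      const insertion = curr[j - 1] + 1;
      curr.push(Math.min(substitution, deletion, insertion));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// One substitution (sat -> sit) and one deletion (the) over six reference words.
const wer = wordErrorRate("the cat sat on the mat", "the cat sit on mat");
console.log(wer.toFixed(3)); // 0.333
```

Because insertions count as errors, word error rate can exceed 100% when the hypothesis is much longer than the reference.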

Anything to anything

Gemini models are natively multimodal, which gives you the potential to transform any type of input into any type of output.

Gemini models can generate code based on the different inputs you give them.


Could Gemini help make a demo based on this video?

Gemini

I see a murmuration of starlings, so I coded a flocking simulation.

    class Boid {
      constructor(x, y) {
        this.pos = new p5.Vector(x, y);
        this.vel = p5.Vector.random2D();
        this.vel.setMag(random(2, 4));
        this.acc = new p5.Vector();
        this.maxForce = 0.2;
        this.maxSpeed = 4;
      }
    }
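The snippet above shows only the constructor. As a rough illustration of what the rest of a flocking simulation typically involves, here is a self-contained JavaScript sketch of the three classic boid rules (alignment, cohesion, separation). It is not the code Gemini produced in the demo. It replaces p5.Vector with small hand-written helpers so it runs without p5.js, takes the starting velocity as explicit arguments instead of a random one, and the neighbor radius of 50 is an arbitrary assumption.

```javascript
// Minimal 2D vector helpers (stand-ins for p5.Vector, not the p5.js API).
const add = (a, b) => ({ x: a.x + b.x, y: a.y + b.y });
const sub = (a, b) => ({ x: a.x - b.x, y: a.y - b.y });
const scale = (v, s) => ({ x: v.x * s, y: v.y * s });
const mag = (v) => Math.hypot(v.x, v.y);
const setMag = (v, m) => (mag(v) === 0 ? { x: 0, y: 0 } : scale(v, m / mag(v)));
const limit = (v, max) => (mag(v) > max ? setMag(v, max) : v);

class Boid {
  constructor(x, y, vx, vy) {
    this.pos = { x, y };
    this.vel = { x: vx, y: vy };
    this.maxForce = 0.2; // same constants as the snippet above
    this.maxSpeed = 4;
  }

  // Steering force toward a desired direction, capped at maxForce.
  steer(desired) {
    return limit(sub(setMag(desired, this.maxSpeed), this.vel), this.maxForce);
  }

  // Sum of alignment, cohesion, and separation over neighbors within `radius`.
  flock(boids, radius = 50) {
    let sumVel = { x: 0, y: 0 };
    let sumPos = { x: 0, y: 0 };
    let away = { x: 0, y: 0 };
    let count = 0;
    for (const other of boids) {
      const offset = sub(this.pos, other.pos);
      const d = mag(offset);
      if (other === this || d === 0 || d > radius) continue;
      sumVel = add(sumVel, other.vel);
      sumPos = add(sumPos, other.pos);
      away = add(away, scale(offset, 1 / (d * d))); // closer neighbors push harder
      count++;
    }
    if (count === 0) return { x: 0, y: 0 };
    const alignment = this.steer(sumVel);
    const cohesion = this.steer(sub(scale(sumPos, 1 / count), this.pos));
    const separation = this.steer(away);
    return add(add(alignment, cohesion), separation);
  }

  // Apply the acceleration, cap the speed, then move.
  update(acc) {
    this.vel = limit(add(this.vel, acc), this.maxSpeed);
    this.pos = add(this.pos, this.vel);
  }
}

// Compute every force from the same snapshot first, then move all boids.
const flockMembers = [new Boid(0, 0, 1, 0), new Boid(10, 0, 0, 1), new Boid(0, 10, -1, 0)];
for (let step = 0; step < 100; step++) {
  const forces = flockMembers.map((b) => b.flock(flockMembers));
  flockMembers.forEach((b, i) => b.update(forces[i]));
}
console.log(flockMembers.map((b) => mag(b.vel).toFixed(2)));
```

Computing all forces before applying any of them keeps the result independent of the order in which boids are visited. In a p5.js sketch the same loop body would normally run once per `draw()` call, followed by rendering each boid.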


The potential of Gemini

Learn about what our Gemini models can do from some of the people who built them.


Image: two people standing at a table about to say something.

TAYLOR APPLEBAUM AND SEBASTIAN NOWOZIN

Unlocking insights in scientific literature

Image: two people standing at a table with a computer in front of a curtain.

RÉMI LEBLOND AND GABRIELA SURITA

Excelling at competitive programming

Image: a person with glasses standing in a room, smiling and ready to speak.

ADRIÀ RECASENS

Processing and understanding raw audio signal end-to-end

Image: a person with glasses sitting in front of a computer speaking.

SAM CHEUNG

Explaining reasoning in math and physics

Image: a person with a beard standing in front of an open computer, smiling and ready to speak.

PALASH NANDY

Reasoning about user intent to generate bespoke experiences

Building and deploying Gemini responsibly

We’ve built our Gemini models responsibly from the start, incorporating safeguards and working together with partners to make them safer and more inclusive.

Try Gemini Advanced with our most capable AI model

With 1.0 Ultra, Gemini Advanced is far more capable at coding, reasoning, and creative collaboration.


Build with Gemini

Integrate Gemini models into your applications with Google AI Studio and Google Cloud Vertex AI.

ai.google.dev
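As a rough idea of what an integration can look like, the JavaScript sketch below builds a request for the `generateContent` method of the Gemini REST API and reads the first text part from a response. Treat the details as assumptions to verify against the current documentation: the `v1beta` path, the `gemini-pro` model name, and the `GEMINI_API_KEY` environment variable are placeholders that may not match what is available to you, and the helper names are made up for this example. The `ask` function needs a runtime with a global `fetch`, such as Node 18 or later.

```javascript
// Build a generateContent request for the Gemini REST API (no network call here).
function buildGenerateContentRequest(model, prompt, apiKey) {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    encodeURIComponent(model) +
    ":generateContent";
  return {
    url,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
    },
  };
}

// Pull the first text part out of a response body, or null if there is none.
function firstText(responseJson) {
  const parts = responseJson?.candidates?.[0]?.content?.parts ?? [];
  const part = parts.find((p) => typeof p.text === "string");
  return part ? part.text : null;
}

// Send one prompt and return the model's text. Model name and env var are placeholders.
async function ask(prompt) {
  const { url, options } = buildGenerateContentRequest(
    "gemini-pro",
    prompt,
    process.env.GEMINI_API_KEY
  );
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Gemini API request failed: HTTP ${res.status}`);
  return firstText(await res.json());
}
```

Calling `ask("Summarize flocking behavior in one sentence.")` would return a promise for the generated text. Keeping request construction separate from the network call makes it possible to inspect or unit-test the payload without an API key.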
