Gemma-4 has no heavy vision encoder, so it cannot read multispectral Earth-observation bands on its own.
Here a frozen Gemma-4-12B-it reasons about a place using the free co-located TESSERA embedding,
decoded over the whole ~2.5 km scene (native 10 m resolution, 0 transmitted bytes) into land-cover composition and
spectral indices, plus emem observations, each carrying an ed25519-signed fact_cid. Co-registering
that embedding is what lets an encoderless model answer Earth-observation questions. Toggle grounding off to see
the same model on its prior alone (the live A/B). Honest by construction: the products are a faithful recall of the
Sentinel-2 archive that TESSERA ingests, not independent sensing; single-date appearance and cell-level change are out
of scope, and the decode reference covers only 32 sites, so exotic places generalise less well.
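
To make "land-cover composition and spectral indices" concrete, here is a minimal sketch of how such scene statistics could be computed and rendered as text for a language model. It is an illustration, not the demo's decoder: the function names, the class labels, and the sample reflectances are all hypothetical. The only non-assumed part is the standard NDVI definition, (NIR - Red) / (NIR + Red), which for Sentinel-2 is usually taken from bands B8 and B4.

```python
from collections import Counter


def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel's reflectances."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom


def composition(labels):
    """Fraction of pixels assigned to each land-cover class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def grounding_text(labels, nir, red):
    """Hypothetical helper: summarise a scene as plain text for a prompt."""
    mean_ndvi = sum(ndvi(n, r) for n, r in zip(nir, red)) / len(nir)
    parts = [f"{cls}: {frac:.0%}" for cls, frac in sorted(composition(labels).items())]
    return f"Land cover: {', '.join(parts)}. Mean NDVI: {mean_ndvi:.2f}."


# Four made-up pixels: per-pixel class label plus NIR and red reflectance.
labels = ["forest", "forest", "water", "cropland"]
nir = [0.5, 0.5, 0.1, 0.4]
red = [0.1, 0.1, 0.1, 0.2]
print(grounding_text(labels, nir, red))
```

With grounding on, a summary of this kind would accompany the question; with grounding off, the model would see the question alone.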
Each grounded answer also shows the complete bridge: the encode-on-satellite step (the image
compressed to a ~1.5 kB int4 downlink) and the spectral-index maps TESSERA recovers that the downlink
cannot. The first view of a place takes about a minute (live tile fetch + render); it is cached afterwards, and the
preset places are pre-warmed.
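
As a rough illustration of what an int4 payload of about 1.5 kB can hold, the sketch below quantizes floats to signed 4-bit integers and packs two per byte. The function names, the shared scale, and the count of 3072 values are assumptions made for the arithmetic; the demo's actual downlink layout is not described here.

```python
def quantize_int4(values, scale):
    """Map floats to signed 4-bit integers in [-8, 7] using one shared scale."""
    return [max(-8, min(7, round(v / scale))) for v in values]


def pack_int4(q):
    """Pack signed 4-bit integers two per byte, low nibble first."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count so every byte is full
    out = bytearray()
    for lo, hi in zip(q[0::2], q[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(data, count):
    """Inverse of pack_int4: recover the first `count` signed 4-bit integers."""
    vals = []
    for b in data:
        for nib in (b & 0xF, b >> 4):
            vals.append(nib - 16 if nib >= 8 else nib)
    return vals[:count]


q = quantize_int4([0.25, -2.0, 0.75, 1.75], scale=0.25)
packed = pack_int4(q)
print(q, packed.hex(), unpack_int4(packed, len(q)))

# Illustrative size check: 3072 four-bit values occupy 1536 bytes (~1.5 kB).
print(len(pack_int4([0] * 3072)))
```

At 4 bits per value the payload carries only a coarse code for the scene, which is consistent with the point above that the spectral-index maps come from TESSERA rather than from the downlink itself.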
See also the global map and the
spectral-index maps (with vs without TESSERA).