research

At Yaak, we are building spatial intelligence that is trained entirely through self-supervision and harnesses the rich information across different modalities.

And it's being built in the open.

Large multimodal models

Large language models (LLMs), image and video models, and vision-language models have yielded an important insight: pre-training large models on a diverse set of tasks, followed by fine-tuning or downstream alignment, is now an industry-wide practice.

Spatial intelligence and robotics form a challenging problem space due to the vast variation in sensor setups, modalities, and controls across embodied platforms. As of 2024, the industry-wide recipe of pre-training and fine-tuning (or alignment) has yet to prove itself for spatial intelligence.

At the forefront of spatial intelligence, Yaak's research focuses on realizing large, pre-trained, multimodal, multi-embodiment, multi-task models that can be fine-tuned for downstream tasks.
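For illustration, here is a minimal sketch of the pre-train-then-fine-tune recipe applied to a multimodal model, written in PyTorch. This is not Yaak's actual architecture: the module names, modalities, feature dimensions, and task head are all hypothetical placeholders.

```python
# A minimal sketch (not Yaak's architecture) of the pre-train / fine-tune
# recipe: a shared multimodal encoder plus a small per-task head.
import torch
import torch.nn as nn

class MultimodalEncoder(nn.Module):
    """Hypothetical shared encoder: projects each modality into a common
    token space and fuses the tokens with a small transformer."""
    def __init__(self, dims: dict[str, int], d_model: int = 256):
        super().__init__()
        # One linear projection per modality (e.g. camera, GNSS, CAN bus)
        self.proj = nn.ModuleDict({m: nn.Linear(d, d_model) for m, d in dims.items()})
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, batch: dict[str, torch.Tensor]) -> torch.Tensor:
        # Stack one token per modality: (batch, num_modalities, d_model)
        tokens = torch.stack([self.proj[m](x) for m, x in batch.items()], dim=1)
        return self.backbone(tokens).mean(dim=1)  # pooled representation

# Stage 1 would pre-train the encoder self-supervised on unlabeled logs
# (e.g. masked-modality reconstruction); stage 2 fine-tunes a small head
# on a downstream task, optionally freezing the encoder.
encoder = MultimodalEncoder({"camera": 512, "gnss": 3, "can": 16})
head = nn.Linear(256, 2)  # e.g. two control targets for one embodiment

batch = {
    "camera": torch.randn(8, 512),
    "gnss": torch.randn(8, 3),
    "can": torch.randn(8, 16),
}
prediction = head(encoder(batch))  # shape: (8, 2)
```

Swapping the head (and, if needed, the set of modality projections) is what lets one pre-trained encoder serve multiple embodiments and tasks.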

Click on a spatial intelligence topic below to learn more about our current findings.

Active topics:

Multimodal datasets

Explore

Cross-modal learning

Explore

Spatial intelligence

Explore