research
At Yaak, we are building spatial intelligence that is trained entirely through self-supervision and harnesses rich information across different modalities.
And it's being built in the open.
Large multimodal models
Large language models (LLMs), image and video models, and vision-language models all point to an important insight: pre-training large models on a diverse set of tasks, followed by fine-tuning or downstream alignment, is now an industry-wide practice.
Spatial intelligence and robotics are a challenging problem space due to the vast variation in sensor setups, modalities, and controls across embodied platforms. As of 2024, this industry-accepted recipe of pre-training and fine-tuning (or alignment) has yet to prove itself for spatial intelligence.
At the forefront of spatial intelligence, Yaak's research focuses on realizing large pre-trained models that are multimodal, multi-embodiment, and multi-task, and that can be fine-tuned for downstream tasks.
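To make the recipe concrete, the sketch below shows the two stages in PyTorch. Everything in it (the MultimodalEncoder, the masked-telemetry pre-training objective, the maneuver-classification fine-tuning task, and the random tensors standing in for data) is an illustrative assumption rather than Yaak's actual training code.

```python
# Hypothetical sketch of the pre-train / fine-tune recipe; all names and
# objectives here are illustrative assumptions, not Yaak's training code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalEncoder(nn.Module):
    """Hypothetical shared backbone fusing a camera stream and vehicle telemetry."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.camera = nn.Linear(128, dim)   # stand-in for a vision tower
        self.telemetry = nn.Linear(8, dim)  # stand-in for a GPS/IMU tower
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.fuse = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, cam: torch.Tensor, tel: torch.Tensor) -> torch.Tensor:
        tokens = torch.stack([self.camera(cam), self.telemetry(tel)], dim=1)
        return self.fuse(tokens).mean(dim=1)  # pooled multimodal embedding

encoder = MultimodalEncoder()

# Stage 1: self-supervised pre-training -- here, predicting telemetry from
# camera features alone (one of many possible cross-modal objectives).
pretrain_head = nn.Linear(64, 8)
opt = torch.optim.AdamW([*encoder.parameters(), *pretrain_head.parameters()], lr=1e-4)
for _ in range(10):  # toy loop; random tensors stand in for unlabeled batches
    cam, tel = torch.randn(32, 128), torch.randn(32, 8)
    loss = F.mse_loss(pretrain_head(encoder(cam, torch.zeros_like(tel))), tel)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: fine-tune a small task head (e.g. a maneuver classifier) on a
# labeled downstream dataset, keeping the pre-trained encoder frozen.
encoder.requires_grad_(False)
task_head = nn.Linear(64, 5)  # 5 hypothetical maneuver classes
opt = torch.optim.AdamW(task_head.parameters(), lr=1e-4)
for _ in range(10):  # toy loop; random tensors stand in for labeled batches
    cam, tel = torch.randn(32, 128), torch.randn(32, 8)
    label = torch.randint(0, 5, (32,))
    loss = F.cross_entropy(task_head(encoder(cam, tel)), label)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Freezing the backbone during fine-tuning is only one choice in this sketch; in practice the encoder can also be partially or fully unfrozen depending on the downstream task and data budget.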
Click on a spatial intelligence topic below to learn more about our current findings.
active topics:
- Multimodal datasets
- Cross-modal learning
- Spatial intelligence