The semantic similarity between textual expressions measures the distance between their latent 'meanings'. These meanings are themselves typically represented by other textual expressions. We propose a novel approach whereby the semantic similarity between textual expressions is based not on other expressions they can be rephrased as, but on the imagery they evoke. While this is not possible with humans, generative models allow us to easily visualize and compare the generated images, or their distribution, evoked by a textual prompt. We therefore characterize the semantic similarity between two textual expressions simply as the distance between the image distributions they induce, or 'conjure.' We show that, by choosing the Jeffreys divergence between the reverse-time diffusion stochastic differential equations (SDEs) induced by each textual expression, this distance can be computed directly via Monte-Carlo sampling. Our method contributes a novel perspective on semantic similarity that not only aligns with human-annotated scores, but also opens new avenues for the evaluation of text-conditioned generative models while offering better interpretability of their learnt representations.
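The Jeffreys divergence between the two reverse-time SDEs reduces to an expectation, over noisy trajectories, of the squared difference between the model's conditional noise predictions, which a Monte-Carlo average over sampled trajectories can estimate. A minimal sketch of this idea, assuming a hypothetical noise predictor `eps_model(x, t, prompt)` (the function name, Euler-Maruyama step size, and toy dimensionality are illustrative, not the paper's implementation):

```python
import numpy as np

def conjured_similarity(eps_model, prompt_a, prompt_b,
                        n_trajectories=32, n_steps=10, dim=4, seed=0):
    """Monte-Carlo estimate of the Jeffreys divergence between the
    reverse-time diffusion SDEs conditioned on two prompts.

    `eps_model(x, t, prompt)` stands in for a trained noise predictor;
    only its call signature is assumed here.
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    total, count = 0.0, 0
    # Symmetrize (Jeffreys): sample trajectories driven by each prompt in turn.
    for drive in (prompt_a, prompt_b):
        for _ in range(n_trajectories):
            x = rng.standard_normal(dim)  # Gaussian prior at t = T
            for t in np.linspace(1.0, dt, n_steps):
                # Squared difference of the two conditional noise predictions.
                diff = eps_model(x, t, prompt_a) - eps_model(x, t, prompt_b)
                total += float(diff @ diff)
                count += 1
                # Crude Euler-Maruyama denoising step under the driving prompt.
                x = x - dt * eps_model(x, t, drive) \
                    + np.sqrt(dt) * rng.standard_normal(dim)
    return total / count
```

With a toy predictor, identical prompts yield a similarity of zero, while distinct prompts yield a strictly positive score; in practice `eps_model` would be a text-conditioned diffusion model operating on images.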
Liu et al. (2024) view meanings in autoregressive LLMs via distributions over their generated continuations, or trajectories. In this work, we show that even image-generation models can capture textual meaning through distributions over their generative trajectories in image space.
Crucially, meanings are not only a function of a trained model (weights, architecture, etc.), but also a function of the generative process: different generative processes on the same model can yield completely different meaning representations for the same inputs. This key feature of our definition differentiates our work from prior art.
We illustrate the process of conjuring semantic similarity between the textual expressions “Snow Leopard” and “Bengal Tiger”. Starting from a Gaussian prior (t = T), we obtain a sequence of noisy images with each of the two expressions (top and bottom halves of the Figure), and denoise each sequence (middle row of each half) with both prompts (top and bottom rows of each half). Our method can be interpreted as taking the Euclidean distance between the resulting images in the two rows. Observing the cells highlighted in red, we see that the model converts pictures of Snow Leopards into Bengal Tigers by changing their characteristic spotted coats into stripes and adding striped textures to the animal’s face (top half of the Figure), and conversely converts Bengal Tigers into Snow Leopards by changing their characteristic stripes into spotted coats (bottom half of the Figure). This makes their semantic differences interpretable via changes in the imagery they evoke.
"Meaning Representations from Trajectories in Autoregressive Models"
"Conjuring Semantic Similarity"
@inproceedings{liu2026conjuring,
title={Conjuring Semantic Similarity},
author={Liu, Tian Yu and Soatto, Stefano},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026}
}
@inproceedings{liu2024meaning,
title={Meaning Representations from Trajectories in Autoregressive Models},
author={Liu, Tian Yu and Trager, Matthew and Achille, Alessandro and Perera, Pramuditha and Zancato, Luca and Soatto, Stefano},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024}
}