Publications
2025
- Vision models trained to estimate spatial latents learned similar ventral-stream-aligned representations
Yudi Xie, Weichen Huang, Esther Alter, Jeremy Schwartz, Joshua B. Tenenbaum, and James J. DiCarlo
In The Thirteenth International Conference on Learning Representations, 2025
Studies of the functional role of the primate ventral visual stream have traditionally focused on object categorization, often ignoring – despite much prior evidence – its role in estimating "spatial" latents such as object position and pose. Most leading ventral stream models are derived by optimizing networks for object categorization, which seems to imply that the ventral stream is also derived under such an objective. Here, we explore an alternative hypothesis: Might the ventral stream be optimized for estimating spatial latents? And a closely related question: How different – if at all – are representations learned from spatial latent estimation compared to categorization? To ask these questions, we leveraged synthetic image datasets generated by a 3D graphic engine and trained convolutional neural networks (CNNs) to estimate different combinations of spatial and category latents. We found that models trained to estimate just a few spatial latents achieve neural alignment scores comparable to those trained on hundreds of categories, and the spatial latent performance of models strongly correlates with their neural alignment. Spatial latent and category-trained models have very similar – but not identical – internal representations, especially in their early and middle layers. We provide evidence that this convergence is partly driven by non-target latent variability in the training data, which facilitates the implicit learning of representations of those non-target latents. Taken together, these results suggest that many training objectives, such as spatial latents, can lead to similar models aligned neurally with the ventral stream. Thus, one should not assume that the ventral stream is optimized for object categorization only. As a field, we need to continue to sharpen our measures of comparing models to brains to better understand the functional roles of the ventral stream.
@inproceedings{xie2025vision,
  title = {Vision models trained to estimate spatial latents learned similar ventral-stream-aligned representations},
  author = {Xie, Yudi and Huang, Weichen and Alter, Esther and Schwartz, Jeremy and Tenenbaum, Joshua B. and DiCarlo, James J.},
  booktitle = {The Thirteenth International Conference on Learning Representations},
  year = {2025},
  url = {https://openreview.net/forum?id=emMMa4q0qw},
}
2024
- Dopamine dynamics are dispensable for movement but promote reward responses
Xintong Cai, Changliang Liu, Iku Tsutsui-Kimura, Joon-Hyuk Lee, Chong Guo, Aditi Banerjee, Jinoh Lee, Ryunosuke Amo, Yudi Xie, Tommaso Patriarchi, Yulong Li, Mitsuko Watabe-Uchida, and 2 more authors
Nature, Oct 2024
Dopamine signalling modes differ in kinetics and spatial patterns of receptor activation. How these modes contribute to motor function, motivation and learning has long been debated. Here we show that action-potential-induced dopamine release is dispensable for movement initiation but supports reward-oriented behaviour. We generated mice with dopamine-neuron-specific knockout of the release site organizer protein RIM to disrupt action-potential-induced dopamine release. In these mice, rapid in vivo dopamine dynamics were strongly impaired, but baseline dopamine persisted and fully supported spontaneous movement. Conversely, reserpine-mediated dopamine depletion or blockade of dopamine receptors disrupted movement initiation. The dopamine precursor l-DOPA reversed reserpine-induced bradykinesia without restoring fast dopamine dynamics, a result that substantiated the conclusion that these dynamics are dispensable for movement initiation. In contrast to spontaneous movement, reward-oriented behaviour was impaired in dopamine-neuron-specific RIM knockout mice. In conditioned place preference and two-odour discrimination tasks, the mice effectively learned to distinguish the cues, which indicates that reward-based learning persists after RIM ablation. However, the performance vigour was reduced. During probabilistic cue-reward association, dopamine dynamics and conditioned responses assessed through anticipatory licking were disrupted. These results demonstrate that action-potential-induced dopamine release is dispensable for motor function and subsecond precision of movement initiation but promotes motivation and performance during reward-guided behaviours.
@article{Cai2024,
  title = {Dopamine dynamics are dispensable for movement but promote reward responses},
  author = {Cai, Xintong and Liu, Changliang and Tsutsui-Kimura, Iku and Lee, Joon-Hyuk and Guo, Chong and Banerjee, Aditi and Lee, Jinoh and Amo, Ryunosuke and Xie, Yudi and Patriarchi, Tommaso and Li, Yulong and Watabe-Uchida, Mitsuko and Uchida, Naoshige and Kaeser, Pascal S.},
  journal = {Nature},
  year = {2024},
  month = oct,
  day = {16},
  issn = {1476-4687},
  doi = {10.1038/s41586-024-08038-z},
  url = {https://doi.org/10.1038/s41586-024-08038-z},
}
- Learning only a handful of latent variables produces neural-aligned CNN models of the ventral stream
Yudi Xie, Esther Alter, Jeremy Schwartz, and James J DiCarlo
Computational and Systems Neuroscience (COSYNE), 2024
Image-computable modeling of primate ventral stream visual processing has made great strides via brain-mapped versions of convolutional neural networks (CNNs) that are optimized on thousands of object categories (ImageNet), the performance of which strongly predicts CNNs’ neural alignment. However, human and primate visual intelligence extends far beyond object categorization, encompassing a diverse range of tasks, such as estimating the latent variables of object position or pose in the image. The influence of task choice on neural alignment in CNNs, compared to CNN architecture, remains underexplored, partly due to the scarcity of large-scale datasets with rich known labels beyond categories. 3D graphic engines, capable of creating training images with detailed information on various latent variables, offer a solution. Here, we asked how the choice of visual tasks that are used to train CNNs (i.e., the set of latent variables to be estimated) affects their ventral stream neural alignment. We focused on the estimation of variables such as object position and pose, and we tested CNNs’ neural alignment via the Brain-Score open science platform. We found some of these CNNs had neural alignment scores that were very close to those trained on ImageNet, even though their entire training experience was on synthetic images. Additionally, we found that training models on just a handful of latent variables achieved the same level of neural alignment as models trained on a much larger number of categories, suggesting that latent variable training is more efficient than category training in driving model-neural alignment. Moreover, we found that these models’ neural alignment scores scale with the amount of synthetic data used during training, suggesting the potential of obtaining more aligned models with larger synthetic datasets. This study highlights the effectiveness of using synthetic datasets and latent variables in advancing image-computable models of the ventral visual stream.
@article{xie2024learning,
  title = {Learning only a handful of latent variables produces neural-aligned CNN models of the ventral stream},
  author = {Xie, Yudi and Alter, Esther and Schwartz, Jeremy and DiCarlo, James J},
  year = {2024},
  publisher = {Computational and Systems Neuroscience},
}
2023
- Natural constraints explain working memory capacity limitations in sensory-cognitive models
Yudi Xie, Yu Duan, Aohua Cheng, Pengcen Jiang, Christopher J Cueva, and Guangyu Robert Yang
bioRxiv, 2023
The limited capacity of the brain to retain information in working memory has been well-known and studied for decades, yet the root of this limitation remains unclear. Here we built sensory-cognitive neural network models of working memory that perform tasks using raw visual stimuli. Contrary to intuitions that working memory capacity limitation stems from memory or cognitive constraints, we found that pre-training the sensory region of our models with natural images imposes sufficient constraints on models to exhibit a wide range of human-like behaviors in visual working memory tasks designed to probe capacity. Examining the neural mechanisms in our model reveals that capacity limitation mainly arises in a bottom-up manner. Our models offer a principled and functionally grounded explanation for the working memory capacity limitation without parameter fitting to behavioral data or much hyperparameter tuning. This work highlights the importance of developing models with realistic sensory processing even when investigating memory and other high-level cognitive phenomena.
@article{xie2023natural,
  title = {Natural constraints explain working memory capacity limitations in sensory-cognitive models},
  author = {Xie, Yudi and Duan, Yu and Cheng, Aohua and Jiang, Pengcen and Cueva, Christopher J and Yang, Guangyu Robert},
  journal = {bioRxiv},
  pages = {2023--03},
  year = {2023},
  publisher = {Cold Spring Harbor Laboratory},
}
2022
- Human-like capacity limitation in multi-system models of working memory
Yudi Xie, Yu Duan, Aohua Cheng, Pengcen Jiang, Christopher Cueva, and Guangyu Robert Yang
Cognitive Computational Neuroscience, 2022
Working memory (WM) enables humans and other animals to hold information temporarily for various kinds of mental processing. WM has limited capacity and the maintenance of information in WM involves interactions between multiple brain regions. To account for such properties, we built multi-system models of WM, i.e., models that involve both sensory and cognitive systems, and their interactions. Our contributions are twofold, involving engineering and science. Engineering-wise, we built a framework to systematically construct such models to generate and test hypotheses in neuroscience research. Our models take sensory stimuli in their raw form, and reproduce diverse behavioral and neural findings across classical and recent WM experiments. Science-wise, our framework allows us to dissect the sensory and cognitive system’s contribution to WM capacity limitation. Our models reproduced behavioral findings in several WM tasks commonly used to assess capacity limitation. We found human-like capacity limitations arise in models with sensory systems pre-trained to recognize natural images, but not in models trained end-to-end on WM tasks. Our results suggest that WM capacity limitation is partly attributed to the sensory system when it is optimized for naturalistic objectives other than tasks artificially designed to probe WM.
@article{xie2022human,
  title = {Human-like capacity limitation in multi-system models of working memory},
  author = {Xie, Yudi and Duan, Yu and Cheng, Aohua and Jiang, Pengcen and Cueva, Christopher and Yang, Guangyu Robert},
  year = {2022},
  publisher = {Cognitive Computational Neuroscience},
}
- Striatal dopamine explains novelty-induced behavioral dynamics and individual variability in threat prediction
Korleki Akiti, Iku Tsutsui-Kimura, Yudi Xie, Alexander Mathis, Jeffrey E Markowitz, Rockwell Anyoha, Sandeep Robert Datta, Mackenzie Weygandt Mathis, Naoshige Uchida, and Mitsuko Watabe-Uchida
Neuron, 2022
Animals both explore and avoid novel objects in the environment, but the neural mechanisms that underlie these behaviors and their dynamics remain uncharacterized. Here, we used multi-point tracking (DeepLabCut) and behavioral segmentation (MoSeq) to characterize the behavior of mice freely interacting with a novel object. Novelty elicits a characteristic sequence of behavior, starting with investigatory approach and culminating in object engagement or avoidance. Dopamine in the tail of the striatum (TS) suppresses engagement, and dopamine responses were predictive of individual variability in behavior. Behavioral dynamics and individual variability are explained by a reinforcement-learning (RL) model of threat prediction in which behavior arises from a novelty-induced initial threat prediction (akin to “shaping bonus”) and a threat prediction that is learned through dopamine-mediated threat prediction errors. These results uncover an algorithmic similarity between reward- and threat-related dopamine sub-systems.
@article{akiti2022striatal,
  title = {Striatal dopamine explains novelty-induced behavioral dynamics and individual variability in threat prediction},
  author = {Akiti, Korleki and Tsutsui-Kimura, Iku and Xie, Yudi and Mathis, Alexander and Markowitz, Jeffrey E and Anyoha, Rockwell and Datta, Sandeep Robert and Mathis, Mackenzie Weygandt and Uchida, Naoshige and Watabe-Uchida, Mitsuko},
  journal = {Neuron},
  volume = {110},
  number = {22},
  pages = {3789--3804},
  year = {2022},
  publisher = {Elsevier},
}
2020
- Flexible motor sequence generation during stereotyped escape responses
Yuan Wang, Xiaoqian Zhang, Qi Xin, Wesley Hung, Jeremy Florman, Jing Huo, Tianqi Xu, Yudi Xie, Mark J Alkema, Mei Zhen, and Quan Wen
eLife, 2020
Complex animal behaviors arise from a flexible combination of stereotyped motor primitives. Here we use the escape responses of the nematode Caenorhabditis elegans to study how a nervous system dynamically explores the action space. The initiation of the escape responses is predictable: the animal moves away from a potential threat, a mechanical or thermal stimulus. But the motor sequence and the timing that follow are variable. We report that a feedforward excitation between neurons encoding distinct motor states underlies robust motor sequence generation, while mutual inhibition between these neurons controls the flexibility of timing in a motor sequence. Electrical synapses contribute to feedforward coupling whereas glutamatergic synapses contribute to inhibition. We conclude that C. elegans generates robust and flexible motor sequences by combining an excitatory coupling and a winner-take-all operation via mutual inhibition between motor modules.
@article{wang2020flexible,
  title = {Flexible motor sequence generation during stereotyped escape responses},
  author = {Wang, Yuan and Zhang, Xiaoqian and Xin, Qi and Hung, Wesley and Florman, Jeremy and Huo, Jing and Xu, Tianqi and Xie, Yudi and Alkema, Mark J and Zhen, Mei and Wen, Quan},
  journal = {eLife},
  volume = {9},
  pages = {e56942},
  year = {2020},
  publisher = {eLife Sciences Publications, Ltd},
}