Publications
2025
- Vision models trained to estimate spatial latents learned similar ventral-stream-aligned representations
Yudi Xie, Weichen Huang, Esther Alter, Jeremy Schwartz, Joshua B. Tenenbaum, and James J. DiCarlo
In The Thirteenth International Conference on Learning Representations, 2025
Studies of the functional role of the primate ventral visual stream have traditionally focused on object categorization, often ignoring – despite much prior evidence – its role in estimating "spatial" latents such as object position and pose. Most leading ventral stream models are derived by optimizing networks for object categorization, which seems to imply that the ventral stream is also derived under such an objective. Here, we explore an alternative hypothesis: Might the ventral stream be optimized for estimating spatial latents? And a closely related question: How different – if at all – are representations learned from spatial latent estimation compared to categorization? To ask these questions, we leveraged synthetic image datasets generated by a 3D graphic engine and trained convolutional neural networks (CNNs) to estimate different combinations of spatial and category latents. We found that models trained to estimate just a few spatial latents achieve neural alignment scores comparable to those trained on hundreds of categories, and the spatial latent performance of models strongly correlates with their neural alignment. Spatial latent and category-trained models have very similar – but not identical – internal representations, especially in their early and middle layers. We provide evidence that this convergence is partly driven by non-target latent variability in the training data, which facilitates the implicit learning of representations of those non-target latents. Taken together, these results suggest that many training objectives, such as spatial latents, can lead to similar models aligned neurally with the ventral stream. Thus, one should not assume that the ventral stream is optimized for object categorization only. As a field, we need to continue to sharpen our measures of comparing models to brains to better understand the functional roles of the ventral stream.
@inproceedings{xie2025vision,
  title = {Vision models trained to estimate spatial latents learned similar ventral-stream-aligned representations},
  author = {Xie, Yudi and Huang, Weichen and Alter, Esther and Schwartz, Jeremy and Tenenbaum, Joshua B. and DiCarlo, James J.},
  booktitle = {The Thirteenth International Conference on Learning Representations},
  year = {2025},
  url = {https://openreview.net/forum?id=emMMa4q0qw},
}
2024
- Dopamine dynamics are dispensable for movement but promote reward responses
Xintong Cai, Changliang Liu, Iku Tsutsui-Kimura, Joon-Hyuk Lee, Chong Guo, Aditi Banerjee, Jinoh Lee, Ryunosuke Amo, Yudi Xie, Tommaso Patriarchi, Yulong Li, Mitsuko Watabe-Uchida, and 2 more authors
Nature, Oct 2024
Dopamine signalling modes differ in kinetics and spatial patterns of receptor activation. How these modes contribute to motor function, motivation and learning has long been debated. Here we show that action-potential-induced dopamine release is dispensable for movement initiation but supports reward-oriented behaviour. We generated mice with dopamine-neuron-specific knockout of the release site organizer protein RIM to disrupt action-potential-induced dopamine release. In these mice, rapid in vivo dopamine dynamics were strongly impaired, but baseline dopamine persisted and fully supported spontaneous movement. Conversely, reserpine-mediated dopamine depletion or blockade of dopamine receptors disrupted movement initiation. The dopamine precursor l-DOPA reversed reserpine-induced bradykinesia without restoring fast dopamine dynamics, a result that substantiated the conclusion that these dynamics are dispensable for movement initiation. In contrast to spontaneous movement, reward-oriented behaviour was impaired in dopamine-neuron-specific RIM knockout mice. In conditioned place preference and two-odour discrimination tasks, the mice effectively learned to distinguish the cues, which indicates that reward-based learning persists after RIM ablation. However, the performance vigour was reduced. During probabilistic cue-reward association, dopamine dynamics and conditioned responses assessed through anticipatory licking were disrupted. These results demonstrate that action-potential-induced dopamine release is dispensable for motor function and subsecond precision of movement initiation but promotes motivation and performance during reward-guided behaviours.
@article{Cai2024,
  title = {Dopamine dynamics are dispensable for movement but promote reward responses},
  author = {Cai, Xintong and Liu, Changliang and Tsutsui-Kimura, Iku and Lee, Joon-Hyuk and Guo, Chong and Banerjee, Aditi and Lee, Jinoh and Amo, Ryunosuke and Xie, Yudi and Patriarchi, Tommaso and Li, Yulong and Watabe-Uchida, Mitsuko and Uchida, Naoshige and Kaeser, Pascal S.},
  journal = {Nature},
  year = {2024},
  month = oct,
  day = {16},
  issn = {1476-4687},
  doi = {10.1038/s41586-024-08038-z},
  url = {https://doi.org/10.1038/s41586-024-08038-z},
}
- Learning only a handful of latent variables produces neural-aligned CNN models of the ventral stream
Yudi Xie, Esther Alter, Jeremy Schwartz, and James J DiCarlo
Computational and Systems Neuroscience (COSYNE), 2024
Image-computable modeling of primate ventral stream visual processing has made great strides via brain-mapped versions of convolutional neural networks (CNNs) that are optimized on thousands of object categories (ImageNet), the performance of which strongly predicts CNNs’ neural alignment. However, human and primate visual intelligence extends far beyond object categorization, encompassing a diverse range of tasks, such as estimating the latent variables of object position or pose in the image. The influence of task choice on neural alignment in CNNs, compared to CNN architecture, remains underexplored, partly due to the scarcity of large-scale datasets with rich known labels beyond categories. 3D graphic engines, capable of creating training images with detailed information on various latent variables, offer a solution. Here, we asked how the choice of visual tasks that are used to train CNNs (i.e., the set of latent variables to be estimated) affects their ventral stream neural alignment. We focused on the estimation of variables such as object position and pose, and we tested CNNs’ neural alignment via the Brain-Score open science platform. We found some of these CNNs had neural alignment scores that were very close to those trained on ImageNet, even though their entire training experience was on synthetic images. Additionally, we found that training models on just a handful of latent variables achieved the same level of neural alignment as models trained on a much larger number of categories, suggesting that latent variable training is more efficient than category training in driving model-neural alignment. Moreover, we found that these models’ neural alignment scores scale with the amount of synthetic data used during training, suggesting the potential of obtaining more aligned models with larger synthetic datasets. This study highlights the effectiveness of using synthetic datasets and latent variables in advancing image-computable models of the ventral visual stream.
@article{xie2024learning,
  title = {Learning only a handful of latent variables produces neural-aligned CNN models of the ventral stream},
  author = {Xie, Yudi and Alter, Esther and Schwartz, Jeremy and DiCarlo, James J},
  year = {2024},
  publisher = {Computational and Systems Neuroscience},
}
2023
- Natural constraints explain working memory capacity limitations in sensory-cognitive models
Yudi Xie, Yu Duan, Aohua Cheng, Pengcen Jiang, Christopher J Cueva, and Guangyu Robert Yang
bioRxiv, 2023
The limited capacity of the brain to retain information in working memory has been well-known and studied for decades, yet the root of this limitation remains unclear. Here we built sensory-cognitive neural network models of working memory that perform tasks using raw visual stimuli. Contrary to intuitions that working memory capacity limitation stems from memory or cognitive constraints, we found that pre-training the sensory region of our models with natural images imposes sufficient constraints on models to exhibit a wide range of human-like behaviors in visual working memory tasks designed to probe capacity. Examining the neural mechanisms in our model reveals that capacity limitation mainly arises in a bottom-up manner. Our models offer a principled and functionally grounded explanation for the working memory capacity limitation without parameter fitting to behavioral data or much hyperparameter tuning. This work highlights the importance of developing models with realistic sensory processing even when investigating memory and other high-level cognitive phenomena.
@article{xie2023natural,
  title = {Natural constraints explain working memory capacity limitations in sensory-cognitive models},
  author = {Xie, Yudi and Duan, Yu and Cheng, Aohua and Jiang, Pengcen and Cueva, Christopher J and Yang, Guangyu Robert},
  journal = {bioRxiv},
  pages = {2023--03},
  year = {2023},
  publisher = {Cold Spring Harbor Laboratory},
}
2022
- Human-like capacity limitation in multi-system models of working memory
Yudi Xie, Yu Duan, Aohua Cheng, Pengcen Jiang, Christopher Cueva, and Guangyu Robert Yang
Cognitive Computational Neuroscience, 2022
Working memory (WM) enables humans and other animals to hold information temporarily for various kinds of mental processing. WM has limited capacity and the maintenance of information in WM involves interactions between multiple brain regions. To account for such properties, we built multi-system models of WM, i.e., models that involve both sensory and cognitive systems, and their interactions. Our contributions are twofold, involving engineering and science. Engineering-wise, we built a framework to systematically construct such models to generate and test hypotheses in neuroscience research. Our models take sensory stimuli in their raw form, and reproduce diverse behavioral and neural findings across classical and recent WM experiments. Science-wise, our framework allows us to dissect the sensory and cognitive system’s contribution to WM capacity limitation. Our models reproduced behavioral findings in several WM tasks commonly used to assess capacity limitation. We found human-like capacity limitations arise in models with sensory systems pre-trained to recognize natural images, but not in models trained end-to-end on WM tasks. Our results suggest that WM capacity limitation is partly attributed to the sensory system when it is optimized for naturalistic objectives other than tasks artificially designed to probe WM.
@article{xie2022human,
  title = {Human-like capacity limitation in multi-system models of working memory},
  author = {Xie, Yudi and Duan, Yu and Cheng, Aohua and Jiang, Pengcen and Cueva, Christopher and Yang, Guangyu Robert},
  year = {2022},
  publisher = {Cognitive Computational Neuroscience},
}
- Striatal dopamine explains novelty-induced behavioral dynamics and individual variability in threat prediction
Korleki Akiti, Iku Tsutsui-Kimura, Yudi Xie, Alexander Mathis, Jeffrey E Markowitz, Rockwell Anyoha, Sandeep Robert Datta, Mackenzie Weygandt Mathis, Naoshige Uchida, and Mitsuko Watabe-Uchida
Neuron, 2022
Animals both explore and avoid novel objects in the environment, but the neural mechanisms that underlie these behaviors and their dynamics remain uncharacterized. Here, we used multi-point tracking (DeepLabCut) and behavioral segmentation (MoSeq) to characterize the behavior of mice freely interacting with a novel object. Novelty elicits a characteristic sequence of behavior, starting with investigatory approach and culminating in object engagement or avoidance. Dopamine in the tail of the striatum (TS) suppresses engagement, and dopamine responses were predictive of individual variability in behavior. Behavioral dynamics and individual variability are explained by a reinforcement-learning (RL) model of threat prediction in which behavior arises from a novelty-induced initial threat prediction (akin to “shaping bonus”) and a threat prediction that is learned through dopamine-mediated threat prediction errors. These results uncover an algorithmic similarity between reward- and threat-related dopamine sub-systems.
@article{akiti2022striatal,
  title = {Striatal dopamine explains novelty-induced behavioral dynamics and individual variability in threat prediction},
  author = {Akiti, Korleki and Tsutsui-Kimura, Iku and Xie, Yudi and Mathis, Alexander and Markowitz, Jeffrey E and Anyoha, Rockwell and Datta, Sandeep Robert and Mathis, Mackenzie Weygandt and Uchida, Naoshige and Watabe-Uchida, Mitsuko},
  journal = {Neuron},
  volume = {110},
  number = {22},
  pages = {3789--3804},
  year = {2022},
  publisher = {Elsevier},
}
2020
- Flexible motor sequence generation during stereotyped escape responses
Yuan Wang, Xiaoqian Zhang, Qi Xin, Wesley Hung, Jeremy Florman, Jing Huo, Tianqi Xu, Yudi Xie, Mark J Alkema, Mei Zhen, and Quan Wen
eLife, 2020
Complex animal behaviors arise from a flexible combination of stereotyped motor primitives. Here we use the escape responses of the nematode Caenorhabditis elegans to study how a nervous system dynamically explores the action space. The initiation of the escape responses is predictable: the animal moves away from a potential threat, a mechanical or thermal stimulus. But the motor sequence and the timing that follow are variable. We report that a feedforward excitation between neurons encoding distinct motor states underlies robust motor sequence generation, while mutual inhibition between these neurons controls the flexibility of timing in a motor sequence. Electrical synapses contribute to feedforward coupling whereas glutamatergic synapses contribute to inhibition. We conclude that C. elegans generates robust and flexible motor sequences by combining an excitatory coupling and a winner-take-all operation via mutual inhibition between motor modules.
@article{wang2020flexible,
  title = {Flexible motor sequence generation during stereotyped escape responses},
  author = {Wang, Yuan and Zhang, Xiaoqian and Xin, Qi and Hung, Wesley and Florman, Jeremy and Huo, Jing and Xu, Tianqi and Xie, Yudi and Alkema, Mark J and Zhen, Mei and Wen, Quan},
  journal = {eLife},
  volume = {9},
  pages = {e56942},
  year = {2020},
  publisher = {eLife Sciences Publications, Ltd},
}