Publications
2024
- Vision CNNs trained to estimate spatial latents learned similar ventral-stream-aligned representations. Yudi Xie, Weichen Huang, Esther Alter, Jeremy Schwartz, Joshua B. Tenenbaum, and James J. DiCarlo. 2024
Studies of the functional role of the primate ventral visual stream have traditionally focused on object categorization, often ignoring – despite much prior evidence – its role in estimating "spatial" latents such as object position and pose. Most leading ventral stream models are derived by optimizing networks for object categorization, which seems to imply that the ventral stream is also derived under such an objective. Here, we explore an alternative hypothesis: Might the ventral stream be optimized for estimating spatial latents? And a closely related question: How different – if at all – are representations learned from spatial latent estimation compared to categorization? To ask these questions, we leveraged synthetic image datasets generated by a 3D graphic engine and trained convolutional neural networks (CNNs) to estimate different combinations of spatial and category latents. We found that models trained to estimate just a few spatial latents achieve neural alignment scores comparable to those trained on hundreds of categories, and the spatial latent performance of models strongly correlates with their neural alignment. Spatial latent and category-trained models have very similar – but not identical – internal representations, especially in their early and middle layers. We provide evidence that this convergence is partly driven by non-target latent variability in the training data, which facilitates the implicit learning of representations of those non-target latents. Taken together, these results suggest that many training objectives, such as spatial latents, can lead to similar models aligned neurally with the ventral stream. Thus, one should not assume that the ventral stream is optimized for object categorization only. As a field, we need to continue to sharpen our measures of comparing models to brains to better understand the functional roles of the ventral stream.
- Dopamine dynamics are dispensable for movement but promote reward responses. Xintong Cai, Changliang Liu, Iku Tsutsui-Kimura, Joon-Hyuk Lee, Chong Guo, Aditi Banerjee, Jinoh Lee, Ryunosuke Amo, Yudi Xie, Tommaso Patriarchi, Yulong Li, Mitsuko Watabe-Uchida, and 2 more authors. Nature, Oct 2024
Dopamine signalling modes differ in kinetics and spatial patterns of receptor activation. How these modes contribute to motor function, motivation and learning has long been debated. Here we show that action-potential-induced dopamine release is dispensable for movement initiation but supports reward-oriented behaviour. We generated mice with dopamine-neuron-specific knockout of the release site organizer protein RIM to disrupt action-potential-induced dopamine release. In these mice, rapid in vivo dopamine dynamics were strongly impaired, but baseline dopamine persisted and fully supported spontaneous movement. Conversely, reserpine-mediated dopamine depletion or blockade of dopamine receptors disrupted movement initiation. The dopamine precursor l-DOPA reversed reserpine-induced bradykinesia without restoring fast dopamine dynamics, a result that substantiated the conclusion that these dynamics are dispensable for movement initiation. In contrast to spontaneous movement, reward-oriented behaviour was impaired in dopamine-neuron-specific RIM knockout mice. In conditioned place preference and two-odour discrimination tasks, the mice effectively learned to distinguish the cues, which indicates that reward-based learning persists after RIM ablation. However, the performance vigour was reduced. During probabilistic cue-reward association, dopamine dynamics and conditioned responses assessed through anticipatory licking were disrupted. These results demonstrate that action-potential-induced dopamine release is dispensable for motor function and subsecond precision of movement initiation but promotes motivation and performance during reward-guided behaviours.
- Learning only a handful of latent variables produces neural-aligned CNN models of the ventral stream. Yudi Xie, Esther Alter, Jeremy Schwartz, and James J DiCarlo. Oct 2024
Image-computable modeling of primate ventral stream visual processing has made great strides via brain-mapped versions of convolutional neural networks (CNNs) that are optimized on thousands of object categories (ImageNet), the performance of which strongly predicts CNNs’ neural alignment. However, human and primate visual intelligence extends far beyond object categorization, encompassing a diverse range of tasks, such as estimating the latent variables of object position or pose in the image. The influence of task choice on neural alignment in CNNs, compared to CNN architecture, remains underexplored, partly due to the scarcity of large-scale datasets with rich known labels beyond categories. 3D graphic engines, capable of creating training images with detailed information on various latent variables, offer a solution. Here, we asked how the choice of visual tasks that are used to train CNNs (i.e., the set of latent variables to be estimated) affects their ventral stream neural alignment. We focused on the estimation of variables such as object position and pose, and we tested CNNs’ neural alignment via the Brain-Score open science platform. We found some of these CNNs had neural alignment scores that were very close to those trained on ImageNet, even though their entire training experience has been on synthetic images. Additionally, we found training models on just a handful of latent variables achieved the same level of neural alignment as models trained on a much larger number of categories, suggesting that latent variable training is more efficient than category training in driving model-neural alignment. Moreover, we found that these models’ neural alignment scores scale with the amount of synthetic data used during training, suggesting the potential of obtaining more aligned models with larger synthetic datasets. This study highlights the effectiveness of using synthetic datasets and latent variables in advancing image-computable models of the ventral visual stream.
2023
- Natural constraints explain working memory capacity limitations in sensory-cognitive models. Yudi Xie, Yu Duan, Aohua Cheng, Pengcen Jiang, Christopher J Cueva, and Guangyu Robert Yang. bioRxiv, Oct 2023
The limited capacity of the brain to retain information in working memory has been well-known and studied for decades, yet the root of this limitation remains unclear. Here we built sensory-cognitive neural network models of working memory that perform tasks using raw visual stimuli. Contrary to intuitions that working memory capacity limitation stems from memory or cognitive constraints, we found that pre-training the sensory region of our models with natural images imposes sufficient constraints on models to exhibit a wide range of human-like behaviors in visual working memory tasks designed to probe capacity. Examining the neural mechanisms in our model reveals that capacity limitation mainly arises in a bottom-up manner. Our models offer a principled and functionally grounded explanation for the working memory capacity limitation without parameter fitting to behavioral data or much hyperparameter tuning. This work highlights the importance of developing models with realistic sensory processing even when investigating memory and other high-level cognitive phenomena.
2022
- Human-like capacity limitation in multi-system models of working memory. Yudi Xie, Yu Duan, Aohua Cheng, Pengcen Jiang, Christopher Cueva, and Guangyu Robert Yang. Oct 2022
Working memory (WM) enables humans and other animals to hold information temporarily for various kinds of mental processing. WM has limited capacity and the maintenance of information in WM involves interactions between multiple brain regions. To account for such properties, we built multi-system models of WM, i.e., models that involve both sensory and cognitive systems, and their interactions. Our contributions are twofold, involving engineering and science. Engineering-wise, we built a framework to systematically construct such models to generate and test hypotheses in neuroscience research. Our models take sensory stimuli in their raw form, and reproduce diverse behavioral and neural findings across classical and recent WM experiments. Science-wise, our framework allows us to dissect the sensory and cognitive system’s contribution to WM capacity limitation. Our models reproduced behavioral findings in several WM tasks commonly used to assess capacity limitation. We found human-like capacity limitations arise in models with sensory systems pre-trained to recognize natural images, but not in models trained end-to-end on WM tasks. Our results suggest that WM capacity limitation is partly attributed to the sensory system when it is optimized for naturalistic objectives other than tasks artificially designed to probe WM.
- Striatal dopamine explains novelty-induced behavioral dynamics and individual variability in threat prediction. Korleki Akiti, Iku Tsutsui-Kimura, Yudi Xie, Alexander Mathis, Jeffrey E Markowitz, Rockwell Anyoha, Sandeep Robert Datta, Mackenzie Weygandt Mathis, Naoshige Uchida, and Mitsuko Watabe-Uchida. Neuron, Oct 2022
Animals both explore and avoid novel objects in the environment, but the neural mechanisms that underlie these behaviors and their dynamics remain uncharacterized. Here, we used multi-point tracking (DeepLabCut) and behavioral segmentation (MoSeq) to characterize the behavior of mice freely interacting with a novel object. Novelty elicits a characteristic sequence of behavior, starting with investigatory approach and culminating in object engagement or avoidance. Dopamine in the tail of the striatum (TS) suppresses engagement, and dopamine responses were predictive of individual variability in behavior. Behavioral dynamics and individual variability are explained by a reinforcement-learning (RL) model of threat prediction in which behavior arises from a novelty-induced initial threat prediction (akin to “shaping bonus”) and a threat prediction that is learned through dopamine-mediated threat prediction errors. These results uncover an algorithmic similarity between reward- and threat-related dopamine sub-systems.
2020
- Flexible motor sequence generation during stereotyped escape responses. Yuan Wang, Xiaoqian Zhang, Qi Xin, Wesley Hung, Jeremy Florman, Jing Huo, Tianqi Xu, Yudi Xie, Mark J Alkema, Mei Zhen, and Quan Wen. eLife, Oct 2020
Complex animal behaviors arise from a flexible combination of stereotyped motor primitives. Here we use the escape responses of the nematode Caenorhabditis elegans to study how a nervous system dynamically explores the action space. The initiation of the escape responses is predictable: the animal moves away from a potential threat, a mechanical or thermal stimulus. But the motor sequence and the timing that follow are variable. We report that a feedforward excitation between neurons encoding distinct motor states underlies robust motor sequence generation, while mutual inhibition between these neurons controls the flexibility of timing in a motor sequence. Electrical synapses contribute to feedforward coupling whereas glutamatergic synapses contribute to inhibition. We conclude that C. elegans generates robust and flexible motor sequences by combining an excitatory coupling and a winner-take-all operation via mutual inhibition between motor modules.