Xinyang Liu | Publications

2024

Advancing Graph Generation through Beta Diffusion

Yilin He*, Xinyang Liu*, Bo Chen, and Mingyuan Zhou

ArXiv 2406.09357, 2024, 2024

ABSTRACT PDF Code

Diffusion models have demonstrated effectiveness for generating natural images and have since been adapted to generate diverse types of data, including graphs. While this emerging family of diffusion-based graph generative models has shown remarkable performance gains over predecessors that rely on variational autoencoders or generative adversarial networks, it is important to note that the majority of these models utilize Gaussian or categorical-based diffusion processes, which may encounter difficulties when modeling sparse and long-tailed data distributions. In our work, we introduce Graph Beta Diffusion (GBD), a diffusion-based generative model adept at modeling diverse graph structures. Focusing on the sparse and range-bounded characteristics of graph adjacency matrices, GBD employs a beta diffusion process to ensure that the initial distribution aligns with the beta distribution, which is well-suited for modeling such data types. To enhance the realism of generated graphs further, we introduce a modulation technique that stabilizes the generation of important graph structures while maintaining flexibility for the rest. The superior performance of GBD in generating graphs, as demonstrated across three generic graph benchmarks and two biochemical graph benchmarks, underscores its effectiveness in capturing the complexities of real-world graph data.
Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models

Xinyang Liu*, Dongsheng Wang*, Bowei Fang, Miaoge Li, Zhibin Duan, Yishi Xu, Bo Chen, and Mingyuan Zhou

Proceedings of the 40th Conference on Uncertainty in Artificial Intelligence (UAI), 2024

ABSTRACT PDF

For downstream applications of vision-language pre-trained models, there has been significant interest in constructing effective prompts. Existing works on prompt engineering, which either require laborious manual designs or optimize the prompt tuning as a point estimation problem, may fail to describe diverse characteristics of categories and limit their applications. We introduce a Bayesian probabilistic resolution to prompt tuning, where the label-specific stochastic prompts are generated hierarchically by first sampling a latent vector from an underlying distribution and then employing a lightweight generative model. Importantly, we semantically regularize the tuning process by minimizing the statistical distance between the visual patches and linguistic prompts, which pushes the stochastic label representations to faithfully capture diverse visual concepts, instead of overfitting the training categories. We evaluate the effectiveness of our approach on four tasks: few-shot image recognition, base-to-new generalization, dataset transfer learning, and domain shifts. Extensive results over 15 datasets show promising transferability and generalization performance of our proposed model, both quantitatively and qualitatively.

2023

Context-guided Embedding Adaptation for Effective Topic Modeling in Low-Resource Regimes

Yishi Xu, Jianqiao Sun, Yudi Su, Xinyang Liu, Zhibin Duan, Bo Chen, and Mingyuan Zhou

The 37th Conference on Neural Information Processing Systems (NeurIPS), 2023

ABSTRACT

Embedding-based neural topic models have been shown as superior options for few-shot topic modeling. However, existing approaches treat the static word embeddings learned from source tasks as transferable knowledge which can be directly applied to the target task, ignoring the fact that word meanings can vary across tasks with different contexts, thus leading to suboptimal results when adapting to new tasks with novel contexts. To address the issue, in this paper, we propose an effective approach for topic modeling under the low-resource regime, the core of which is the adaptive generation of semantic matching word embeddings by integrating the contextual information of each task. Concretely, we introduce a variational graph autoencoder to learn task-specific word embeddings based on the dependency graph refined from the context of each task, with a learnable Gaussian mixture prior to capture the clustering structure of distributed word representations. This is naturally connected to topic modeling by regarding each component of the mixture as the representation of a topic, which facilitates the discovery of diverse topics and the fast adaptation to novel tasks. Both quantitative and qualitative experiments demonstrate the superiority of our method against established topic models.
Tuning Multi-mode Token-level Prompt Alignment across Modalities

Dongsheng Wang, Miaoge Li, Xinyang Liu, Xu Mingsheng, Bo Chen, and Zhang Hanwang

The 37th Conference on Neural Information Processing Systems (NeurIPS), 2023

ABSTRACT

Prompt tuning pre-trained vision-language models have demonstrated significant potential in improving open-world visual concept understanding. However, prior works only primarily focus on single-mode (only one prompt for each modality) and holistic level (image or sentence) semantic alignment, which fails to capture the sample diversity, leading to sub-optimal prompt discovery. To address the limitation, we propose a multi-mode token-level tuning framework that leverages the optimal transportation to learn and align a set of prompt tokens across modalities. Specifically, we rely on two essential factors: 1) multi-mode prompts discovery, which guarantees diverse semantic representations, and 2) token-level alignment, which helps explore fine-grained similarity. Thus, the similarity can be calculated as a hierarchical transportation problem between the modality-specific sets. Extensive experiments on popular image recognition benchmarks show the superior generalization and few-shot abilities of our approach. The qualitative analysis demonstrates that the learned prompt tokens have the ability to capture diverse visual concepts.
PatchCT: Aligning Patch Set and Label Set with Conditional Transport for Multi-Label Image Classification

Miaoge Li*, Dongsheng Wang*, Xinyang Liu, Zequn Zeng, Ruiying Lu, Bo Chen, and Mingyuan Zhou

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023

ABSTRACT PDF Code

Multi-label image classification is a prediction task that aims to identify more than one label from a given image. This paper considers the semantic consistency of the latent space between the visual patch and linguistic label domains and introduces the conditional transport (CT) theory to bridge the acknowledged gap. While recent cross-modal attention-based studies have attempted to align such two representations and achieved impressive performance, they required carefully-designed alignment modules and extra complex operations in the attention computation. We find that by formulating the multi-label classification as a CT problem, we can exploit the interactions between the image and label efficiently by minimizing the bidirectional CT cost. Specifically, after feeding the images and textual labels into the modality-specific encoders, we view each image as a mixture of patch embeddings and a mixture of label embeddings, which capture the local region features and the class prototypes, respectively. CT is then employed to learn and align those two semantic sets by defining the forward and backward navigators. Importantly, the defined navigators in CT distance model the similarities between patches and labels, which provides an interpretable tool to visualize the learned prototypes. Extensive experiments on three public image benchmarks show that the proposed model consistently outperforms the previous methods.
Bayesian Progressive Deep Topic Model with Knowledge Informed Textual Data Coarsening Process

Zhibin Duan*, Xinyang Liu*, Yudi Su, Yishi Xu, Bo Chen, and Mingyuan Zhou

The 40th International Conference on Machine Learning (ICML), 2023

ABSTRACT PDF Code

Deep topic models have shown an impressive ability to extract multi-layer document latent representations and discover hierarchical semantically meaningful topics. However, most deep topic models are limited to the single-step generative process, despite the fact that the progressive generative process has achieved impressive performance in modeling image data. To this end, in this paper, we propose a novel progressive deep topic model that consists of a knowledge-informed textural data coarsening process and a corresponding progressive generative model. The former is used to build multi-level observations ranging from concrete to abstract, while the latter is used to generate more concrete observations gradually. Additionally, we incorporate a graph-enhanced decoder to capture the semantic relationships among words at different levels of observation. Furthermore, we perform a theoretical analysis of the proposed model based on the principle of information theory and show how it can alleviate the wellknown “latent variable collapse” problem. Finally, extensive experiments demonstrate that our proposed model effectively improves the ability of deep topic models, resulting in higher-quality latent document representations and topics.