Wenhu Chen is an assistant professor in the David R. Cheriton School of Computer Science at the University of Waterloo. He obtained his PhD from the Department of Computer Science at the University of California, Santa Barbara in 2021, and then spent a wonderful postdoctoral year at Google Research. His main research interests include natural language processing, large language models, vision-language interaction, and image generation.
Assistant Professor, David R. Cheriton School of Computer Science, University of Waterloo
Canada CIFAR Artificial Intelligence Chair
Research Interests
- Natural Language Processing
- Multimodal Learning
- Knowledge Reasoning and Grounding
Highlights
- Canada CIFAR AI Chair in 2022
- WACV Best Student Paper Honorable Mention
- UCSB CS Outstanding Dissertation Award
- Tencent AI Gift Award
Publications
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
2022
Re-Imagen: Retrieval-Augmented Text-to-Image Generator
2022
Explanations from Large Language Models Make Small Reasoners Better
2022
Large Language Models are few(1)-shot Table Reasoners
2022
MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text
2022
Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering
2022
Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models
2022
HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data
2022
Subject-driven Text-to-Image Generation via Apprenticeship Learning
2023
DePlot: One-shot visual language reasoning by plot-to-table translation
2022
Controllable Dialogue Simulation with In-Context Learning
2022
QA Is the New KR: Question-Answer Pairs as Knowledge Bases
2022
Using meta-information in neural machine translation
2022
On the Risk of Misinformation Pollution with Large Language Models
2023
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
2023
TheoremQA: A Theorem-driven Question Answering dataset
2023
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
2023
Few-shot In-context Learning for Knowledge Base Question Answering
2023
DreamEdit: Subject-driven Image Editing
2023
MARBLE: Music Audio Representation Benchmark for Universal Evaluation
2023
MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response
2023
EDIS: Entity-Driven Image Search over Multimodal Web Content
2023
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
2023
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
2023
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark
2024
Interactive Natural Language Processing
2023
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
2023
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
2023
VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation
2023
Instruct-Imagen: Image Generation with Multi-modal Instruction
2024
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
2024
Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation
2024
Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering
2023
TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks
2023
ImagenHub: Standardizing the evaluation of conditional image generation models
2023
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
2023
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
2024
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
2024
ChatMusician: Understanding and Generating Music Intrinsically with LLM
2024
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
2024
Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation
2024
COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning
2024
AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks
2024
Reward Guided Latent Consistency Distillation
2024
DEEP-ICL: Definition-Enriched Experts for Language Model In-Context Learning
2024
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
2024
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
2024