Haonan Ge

Multimodal Intelligence · World Modeling · Action Reasoning


Haonan Ge · 葛浩南

I am an incoming CS Ph.D. student at the University of California, Santa Barbara (UCSB), advised by Prof. Yao Qin. I am also a senior undergraduate student in Electrical and Computer Engineering at Southeast University.

I currently work as a Research Intern with UC Merced and The University of Queensland, advised by Prof. Yujun Cai (UQ) and Prof. Yiwei Wang (UC Merced), and I collaborate with Prof. Kai-Wei Chang (UCLA) and Prof. Ming-Hsuan Yang (UC Merced).

Incoming CS Ph.D. @ UCSB · ECE @ Southeast University

Research Interests

Faithful Multimodal Intelligence and World Modeling

I build multimodal models that learn physical laws and world dynamics from large-scale unlabeled video, grounding decisions in perceptual evidence rather than language priors.

Scalable Multimodal Action Reasoning and Agents

I study controllable systems that integrate images, video, audio, and actions, aiming for practical agentic tools for creative workflows like filmmaking and design.

I am actively seeking a Research Intern position. Feel free to reach out.

News

Apr 05, 2026 I have accepted the invitation to serve as a Reviewer for NeurIPS 2026.
Apr 01, 2026 I have accepted the Ph.D. offer from the University of California, Santa Barbara (UCSB).
Jan 26, 2026 “SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports” has been accepted to ICLR 2026. See you in Rio de Janeiro!
Oct 02, 2025 I have been invited to serve as a Reviewer for CVPR 2026!
Sep 25, 2025 Three papers have been submitted to ICLR 2026!
Aug 21, 2025 “MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs” has been accepted to EMNLP 2025!
Jun 25, 2025 I started a summer research internship at UC Irvine and Rice University.

Selected Publications

  1. ICML
    CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning
    Hang Wu, Yujun Cai, Zehao Li, and 4 more authors
    In ICML, 2026
    Under review.
  2. ECCV
    FrameMind: Frame-Interleaved Video Reasoning via Reinforcement Learning
    Haonan Ge, Yiwei Wang, Kai-Wei Chang, and 2 more authors
    In ECCV, 2026
    Under review.
  3. ICLR
    SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports
    Haotian Xia*, Haonan Ge*, Junbo Zou*, and 16 more authors
    In ICLR, 2026
  4. ICLR
    RefineShot: Rethinking Cinematography Understanding with Foundational Skill Evaluation
    Hang Wu, Yujun Cai, Haonan Ge, and 3 more authors
    In ICLR, 2026
    Under review.
  5. EMNLP
    MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs
    Haonan Ge, Yiwei Wang, Ming-Hsuan Yang, and 1 more author
    In EMNLP, 2025

* denotes equal contribution

Academic Services

Conference Reviewer:

NeurIPS 2026, CVPR 2026, WACV 2026

Previous Research Experience

UC Irvine

Visiting Student · Research Intern

Worked with Prof. Weining Shen on multimodal learning and reasoning.

Rice University · Chili Lab

Project Co-lead

Co-led a vision-language sports benchmark and reinforcement learning project with Prof. Hanjie Chen.

Miscellaneous

Beyond my academic pursuits, I have a wide range of interests. I am passionate about photography, capturing city nights, landscapes, and skies with both digital and film cameras. Music is an indispensable part of my daily routine: I am a huge fan of Jay Chou and love singing to unwind. To recharge, I enjoy cycling around the city and immersing myself in virtual worlds through gaming. Above all, I am a devoted cat lover, and quiet moments with my feline friends always bring me joy.