Haonan Ge


Hi, I am Haonan Ge (葛浩南) 👋, a senior undergraduate student at Southeast University, majoring in Electrical & Computer Engineering. I currently work as a Research Intern with the University of California, Merced and The University of Queensland, advised by Prof. Yujun Cai (UQ) and Prof. Yiwei Wang (UC Merced), and I collaborate with Prof. Kai-Wei Chang (UCLA) and Prof. Ming-Hsuan Yang (UC Merced).

Previously, I was a visiting student at the University of California, Irvine, working with Prof. Weining Shen as a Research Intern. I also co-led a project on sports benchmarking and reinforcement learning for vision-language models with Prof. Hanjie Chen at the Chili Lab, Rice University.

Research Interests

  • Faithful Multimodal Intelligence and World Modeling
    I aim to develop general-purpose multimodal foundation models that learn physical laws and world dynamics from large-scale unlabeled video data. My research focuses on enabling faithful multimodal intelligence, where models reason about object properties, long-term dynamics, and interactions by grounding their decisions in perceptual evidence rather than language priors.

  • Scalable Multimodal Action Reasoning and Agents
    I explore interpretable, controllable, and scalable AI systems that integrate images, video, audio, and action signals. By studying non-textual chain-of-thought reasoning and extracting human and robotic action knowledge from large-scale video data, I aim to develop intelligent agentic software that enhances productivity in domains such as filmmaking, music-making, and design.

I am actively seeking a Ph.D. position beginning in Fall 2026! Feel free to email me if you are interested.

Email: gehaonan82@gmail.com

News

Jan 26, 2026 “SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports” has been accepted to ICLR 2026. See you in Rio de Janeiro!
Oct 02, 2025 I have been invited to serve as a reviewer for CVPR 2026! ✨ 😄
Sep 25, 2025 Three papers have been submitted to ICLR 2026!
Aug 21, 2025 “MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs” has been accepted to EMNLP 2025!
Jun 25, 2025 I started summer research internships at UCI and Rice University.

Selected Publications

  1. CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning
    Hang Wu, Yujun Cai, Zehao Li, and 4 more authors
    In ICML, 2026
    Under review.
  2. FrameMind: Frame-Interleaved Video Reasoning via Reinforcement Learning
    Haonan Ge, Yiwei Wang, Kai-Wei Chang, and 2 more authors
    In ICLR, 2026
    Under review.
  3. SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports
    Haotian Xia*, Haonan Ge*, Junbo Zou*, and 16 more authors
    In ICLR, 2026
  4. RefineShot: Rethinking Cinematography Understanding with Foundational Skill Evaluation
    Hang Wu, Yujun Cai, Haonan Ge, and 3 more authors
    In ICLR, 2026
    Under review.
  5. MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs
    Haonan Ge, Yiwei Wang, Ming-Hsuan Yang, and 1 more author
    In EMNLP, 2025

* denotes equal contribution