Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text
Abstract
Chain-of-Thought (CoT) improves the performance of Large Language Models (LLMs) and has been extended to Multimodal Large Language Models (MLLMs). More recent work further moves from text-based multimodal reasoning toward interleaved-modal reasoning, where intermediate steps can incorporate both tex…