Published work:
34 WOK-indexed papers;
Image Captioning via Dynamic Path Customization
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
3D-GRES: Generalized 3D Referring Expression Segmentation
Multi-branch Collaborative Learning Network for 3D Visual Grounding
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
HRSAM: Efficiently Segment Anything in High-Resolution Images
Evaluating and Analyzing Relationship Hallucinations in LVLMs
Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval
SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation
X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation
Toward Open-Set Human Object Interaction Detection
X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks
Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation
MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization