Published work:
WOK papers: 113; Chinese core journal papers: 1.
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation
Towards Omni-supervised Referring Expression Segmentation
Semi-Supervised Panoptic Narrative Grounding
PixelFace+: Towards Controllable Face Generation and Manipulation with Text Descriptions and Segmentation Masks
Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation
Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval
JM3D & JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation
Towards Language-Guided Visual Recognition via Dynamic Convolutions
Continual Face Forgery Detection via Historical Distribution Preserving
Towards General Visual-Linguistic Face Forgery Detection
Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer
End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation
Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network
Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting
Towards Local Visual Modeling for Image Captioning
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
Active Teacher for Semi-Supervised Object Detection
Towards End-to-end Semi-supervised Learning for One-stage Object Detection
Towards Efficient Visual Adaption via Structural Re-parameterization
HSM-QA: Question Answering System Based on Hierarchical Semantic Matching
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension
RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension
RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension
Clover: Towards A Unified Video-Language Alignment and Fusion Model