Published work:
WOK papers: 537; Chinese core journal papers: 11; other papers: 1; invention patents: 8
M<SUP>3</SUP>ixup: A multi-modal data augmentation approach for image captioning
An efficient blur kernel estimation method for blind image Super-Resolution
You only compress once: Towards effective and elastic BERT compression via exploit-explore stochastic nature gradient
Deep hybrid transformer network for robust modulation classification in wireless communications
Continual Face Forgery Detection via Historical Distribution Preserving
Adaptive Fuzzy Positive Learning for Annotation-Scarce Semantic Segmentation
CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection
Beyond Inter-Item Relations: Dynamic Adaptive Mixture-of-Experts for LLM-Based Sequential Recommendation
StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model
EasyInv: Toward Fast and Better DDIM Inversion
Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation
Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
Image Captioning via Dynamic Path Customization
3D-GRES: Generalized 3D Referring Expression Segmentation
Move and Act: Enhanced Object Manipulation and Background Integrity for Image Editing
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
AccDiffusion: An Accurate Method for Higher-Resolution Image Generation
ERQ: Error Reduction for Post-Training Quantization of Vision Transformers
Multi-branch Collaborative Learning Network for 3D Visual Grounding
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
AnySR: Realizing Image Super-Resolution as Any-Scale, Any-Resource
Oracle Bone Inscriptions Multi-modal Dataset
HRSAM: Efficiently Segment Anything in High-Resolution Images
Identity-Aware Variational Autoencoder for Face Swapping
Local Manifold Learning for No-Reference Image Quality Assessment
HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection
UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs
Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text
Depth-Guided Semi-Supervised Instance Segmentation
Evaluating and Analyzing Relationship Hallucinations in LVLMs
AnyTrans: Translate AnyText in the Image with Large Scale Models
VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models
Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval
SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation
Image Captioning via Dynamic Path Customization
UniPTS: A Unified Framework for Proficient Post-Training Sparsity
FocSAM: Delving Deeply into Focused Objects in Segmenting Anything
GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane
Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion
Optg: Optimizing Gradient-Driven Criteria in Network Sparsity
Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation
GraCo: Granularity-Controllable Interactive Segmentation
ObjectAdd: Adding Objects into Image via a Training-Free Diffusion Modification Fashion
Cantor: Inspiring Multimodal Chain-of-Thought of MLLM
CutDiffusion: A Simple, Fast, Cheap, and Strong Diffusion Extrapolation Method
Multi-Modal Prompt Learning on Blind Image Quality Assessment
NeRF-DetS: Enhancing Multi-View 3D Object Detection with Sampling-adaptive Network of Continuous NeRF-based Representation
ConCLVD: Controllable Chinese Landscape Video Generation via Diffusion Model
Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization
CycleTrans: Learning Neutral Yet Discriminative Features via Cycle Construction for Visible-Infrared Person Re-Identification
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
Deep Instruction Tuning for Segment Anything Model
DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis
Learning Image Demoiréing from Unpaired Real Data
Toward Open-Set Human Object Interaction Detection
Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models
AffineQuant: Affine Transformation Quantization for Large Language Models
DMAD: Dual Memory Bank for Real-World Anomaly Detection
Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers
Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization
DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
Semi-supervised Counting via Pixel-by-pixel Density Distribution Modelling
EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs
An Efficient Blur Kernel Estimation Method for Blind Image Super-Resolution
Shadow-aware dynamic convolution for shadow removal
A closer look at branch classifiers of multi-exit architectures
Unified-Width Adaptive Dynamic Network for All-In-One Image Restoration
Feature Denoising Diffusion Model for Blind Image Quality Assessment
Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation
Cross-Modality Perturbation Synergy Attack for Person Re-identification
Learning Image Demoiréing from Unpaired Real Data
Preface
Two-Stage Deep Learning Segmentation for Tiny Brain Regions
Defense Against Adversarial Attacks Using Topology Aligning Adversarial Training
Weakly-Supervised RGBD Video Object Segmentation
Training-Free Transformer Architecture Search With Zero-Cost Proxy Guided Evolution
Uncovering the Over-Smoothing Challenge in Image Super-Resolution: Entropy-Based Quantification and Contrastive Optimization
Adaptive Zone Learning for Weakly Supervised Object Localization
Functionally Similar Multi-Label Knowledge Distillation
MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization
Exploring Target Representations for Masked Autoencoders
AffineQuant: Affine Transformation Quantization for Large Language Models
Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs
GreedyAgent: Crafting Efficient Agents for Meta-learning from Learning Curves via Greedy Algorithm Selection
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization
Adaptive Feature Selection for No-Reference Image Quality Assessment using Contrastive Mitigating Semantic Noise Sensitivity
Aligning and Prompting Everything All at Once for Universal Visual Perception
Lottery Jackpots Exist in Pre-Trained Models
X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation
Code Search Debiasing: Improve Search Results beyond Overall Ranking Performance
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
Semi-Supervised Panoptic Narrative Grounding
Learning Occlusion Disentanglement with Fine-grained Localization for Occluded Person Re-identification
PixelFace+: Towards Controllable Face Generation and Manipulation with Text Descriptions and Segmentation Masks
Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation
Improving Human-Object Interaction Detection via Virtual Image Learning
Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval
Shadow Removal by High-Quality Shadow Synthesis
CAPro: Webly Supervised Learning with Cross-Modality Aligned Prototypes
JM3D & JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues
Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs
AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
Towards Unified Token Learning for Vision-Language Tracking
DLIP: Distilling Language-Image Pre-training
M3PS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization in E-commerce
EALink: An Efficient and Accurate Pre-trained Framework for Issue-Commit Link Recovery
HODN: Disentangling Human-Object Feature for HOI Detection
Towards Language-Guided Visual Recognition via Dynamic Convolutions
Continual Face Forgery Detection via Historical Distribution Preserving
Pseudo-label Alignment for Semi-supervised Instance Segmentation
Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation
Improving Human-Object Interaction Detection via Virtual Image Learning
Towards General Visual-Linguistic Face Forgery Detection
REFBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring
Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer
Approximated Prompt Tuning for Vision-Language Pre-trained Models
CF-ViT: A General Coarse-to-Fine Method for Vision Transformer
OMPQ: Orthogonal Mixed Precision Quantization
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Shadow-Aware Dynamic Convolution for Shadow Removal
Spatial Re-parameterization for N:M Sparsity
STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection
Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting
Towards local visual modeling for image captioning
CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers
RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
MultiQuant: A Novel Multi-Branch Topology Method for Arbitrary Bit-width Network Quantization
Distribution-Flexible Subset Quantization for Post-Quantizing Super-Resolution Networks
InterFormer: Real-time Interactive Image Segmentation
Latent Feature Relation Consistency for Adversarial Robustness
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
CAT: Collaborative Adversarial Training
You Only Segment Once: Towards Real-Time Panoptic Segmentation
Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective
Attention Disturbance and Dual-Path Constraint Network for Occluded Person Re-Identification
Active Teacher for Semi-Supervised Object Detection
DistilPose: Tokenized Pose Regression with Heatmap Distillation
HGNN<SUP>+</SUP>: General Hypergraph Neural Networks
Dynamic Support Network for Few-Shot Class Incremental Learning
Self-supervised Graph Representation Learning for Black Market Account Detection
Towards End-to-end Semi-supervised Learning for One-stage Object Detection
Towards Efficient Visual Adaption via Structural Re-parameterization
Bi-directional Masks for Efficient N:M Sparse Training
Towards Local Visual Modeling for Image Captioning
Real-Time Image Demoiréing on Mobile Devices
Exploring Invariant Representation for Visible-Infrared Person Re-Identification
Spectral Aware Softmax for Visible-Infrared Person Re-Identification
Unsupervised Domain Adaptation on Person Re-Identification via Dual-level Asymmetric Mutual Learning
DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning
Training Compact CNNs for Image Classification Using Dynamic-Coded Filter Fusion
Prioritized Subnet Sampling for Resource-Adaptive Supernet Training
Positive-Sample-Free Object Tracking via a Soft Constraint
Transformer Tracking via Frequency Fusion
Toward Unified Token Learning for Vision-Language Tracking
HODN: Disentangling Human-Object Feature for HOI Detection
Semantically Consistent Visual Representation for Adversarial Robustness
Bilateral Knowledge Interaction Network for Referring Image Segmentation
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension
RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension
Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective
RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension
You Only Segment Once: Towards Real-Time Panoptic Segmentation
Bi-directional Masks for Efficient N:M Sparse Training
Interactive Object Placement with Reinforcement Learning
DistilPose: Tokenized Pose Regression with Heatmap Distillation
EALink: An Efficient and Accurate Pre-trained Framework for Issue-Commit Link Recovery
Clover: Towards A Unified Video-Language Alignment and Fusion Model
STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection
Meta Architecture for Point Cloud Analysis
Discriminator-Cooperated Feature Map Distillation for GAN Compression
AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration
Category-aware Allocation Transformer for Weakly Supervised Object Localization
SMMix: Self-Motivated Image Mixing for Vision Transformers
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
InterFormer: Real-time Interactive Image Segmentation
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers
Real-Time Image Demoiréing on Mobile Devices
Automatic Network Pruning via Hilbert-Schmidt Independence Criterion Lasso under Information Bottleneck Principle
Pseudo-label Alignment for Semi-supervised Instance Segmentation
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
Improving Adversarial Robustness via Information Bottleneck Distillation
Discover and Align Taxonomic Context Priors for Open-world Semi-Supervised Learning
Discriminator-Cooperated Feature Map Distillation for GAN Compression
SMMix: Self-Motivated Image Mixing for Vision Transformers
Exploring Content Relationships for Distilling Efficient GANs
Shadow Removal by High-Quality Shadow Synthesis
Self-supervised Graph Representation Learning for Black Market Account Detection
Meta Architecture for Point Cloud Analysis
Exploiting the Partly Scratch-off Lottery Ticket for Quantization-Aware Training
Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
Learning Dynamic Prior Knowledge for Text-to-Face Pixel Synthesis
Cycle Encoding of a StyleGAN Encoder for Improved Reconstruction and Editability
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval
Dynamic Prototype Mask for Occluded Person Re-Identification
Searching Lightweight Neural Network for Image Signal Processing
Towards Open-Ended Text-to-Face Generation, Combination and Manipulation
ECO-TR: Efficient Correspondences Finding Via Coarse-to-Fine Refinement
Exploring Target Representations for Masked Autoencoders
CycleTrans: Learning Neutral yet Discriminative Features for Visible-Infrared Person Re-Identification
A Closer Look at Branch Classifiers of Multi-Exit Architectures
Cycle Encoding of a StyleGAN Encoder for Improved Reconstruction and Editability
Dynamic Prototype Mask for Occluded Person Re-Identification
Clover: Towards A Unified Video-Language Alignment and Fusion Model
Privacy-Preserving Face Recognition with Learnable Privacy Budgets in Frequency Domain
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval
Learning to Learn Transferable Attack
Dual Contrastive Learning for General Face Forgery Detection
Learning Best Combination for Efficient N:M Sparsity
Super Vision Transformer
Shadow-Aware Dynamic Convolution for Shadow Removal
Deepwalk-aware graph convolutional networks
A Closer Look at Branch Classifiers of Multi-exit Architectures
What Goes beyond Multi-modal Fusion in One-stage Referring Expression Comprehension: An Empirical Study
Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks
PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation
End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation
Towards Robust Adversarial Training via Dual-label Supervised and Geometry Constraint
SeqTR: A Simple yet Universal Network for Visual Grounding
Training-free Transformer Architecture Search
ARM: Any-Time Super-Resolution Method
Global2Local: A Joint-Hierarchical Attention for Video Captioning
Differentiated Relevances Embedding for Group-based Referring Expression Comprehension
Factored Attention and Embedding for Unstructured-view Topic-related Ultrasound Report Generation
Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks
Coarse-to-Fine Vision Transformer
Boosting Crowd Counting via Multifaceted Attention
Pruning Networks With Cross-Layer Ranking & k-Reciprocal Nearest Filters
Distilling a Powerful Student Model via Online Knowledge Distillation
Theophylline Extracted from Fu Brick Tea Affects the Metabolism of Preadipocytes and Body Fat in Mice as a Pancreatic Lipase Inhibitor
Plenty is Plague: Fine-Grained Learning for Visual Question Answering
Carrying Out CNN Channel Pruning in a White Box
Fast Monocular Depth Estimation via Side Prediction Aggregation with Continuous Spatial Refinement
Disentangling Task-Oriented Representations for Unsupervised Domain Adaptation
Learning Efficient GANs for Image Translation via Differentiable Masks and Co-Attention Distillation
Knowing What it is: Semantic-Enhanced Dual Attention Transformer
Multi-Branch Distance-Sensitive Self-Attention Network for Image Captioning
Towards Lightweight Transformer Via Group-Wise Transformation for Vision-and-Language Tasks
Knowing What to Learn: A Metric-Oriented Focal Mechanism for Image Captioning
SiamBAN: Target-Aware Tracking With Siamese Box Adaptive Network
1xN Pattern for Pruning Convolutional Neural Networks
Generating Hypergraph-Based High-Order Representations of Whole-Slide Histopathological Images for Survival Prediction
Robust Tracking via Uncertainty-Aware Semantic Consistency
SiMaN: Sign-to-Magnitude Network Binarization
Leveraging Local and Global Cues for Visual Tracking via Parallel Interaction Network
Training-free Transformer Architecture Search
IntraQ: Learning Synthetic Images with Intra-Class Heterogeneity for Zero-Shot Network Quantization
Boosting Crowd Counting via Multifaceted Attention
ECO-TR: Efficient Correspondences Finding via Coarse-to-Fine Refinement
An Information Theoretic Approach for Attention-Driven Face Forgery Detection
SeqTR: A Simple Yet Universal Network for Visual Grounding
Black-Box Dissector: Towards Erasing-Based Hard-Label Model Stealing Attack
Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks
Privacy-Preserving Face Recognition with Learnable Privacy Budgets in Frequency Domain
Fine-grained Data Distribution Alignment for Post-Training Quantization
PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation
ARM: Any-Time Super-Resolution Method
Active Teacher for Semi-Supervised Object Detection
DIFNet: Boosting Visual Information Flow for Image Captioning
Neural Architecture Search with Representation Mutual Information
Learning Best Combination for Efficient N:M Sparsity
Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
E2Net: Excitative-Expansile Learning for Weakly Supervised Object Localization
CDP: Towards Optimal Filter Pruning via Class-wise Discriminative Power
Show, Read and Reason: Table Structure Recognition with Flexible Context Aggregator
RecycleNet: An Overlapped Text Instance Recovery Approach
Joint segmentation and detection of COVID-19 via a sequential region generation network
Containing the Transmission of COVID-19: A Modeling Study in 160 Countries
Real-time semantic segmentation via sequential knowledge distillation
Cauchy loss induced block diagonal representation for robust multi-view subspace clustering
Uncovering Media Bias via Social Network Learning
Local Relation Learning for Face Forgery Detection
Domain General Face Forgery Detection by Learning to Weight
HifiFace: 3D Shape and Semantic Prior Guided High Fidelity Face Swapping
Dual Distribution Alignment Network for Generalizable Person Re-Identification
Seminar Learning for Click-Level Weakly Supervised Semantic Segmentation
Revisiting Discriminator in GAN Compression: A Generator-discriminator Cooperative Compression Scheme
Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network
Dual-Level Collaborative Transformer for Image Captioning
Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning
The Ninth Visual Object Tracking VOT2021 Challenge Results
A Dual-stream Framework for 3D Mask Face Presentation Attack Detection
Toward Efficient CNN Compression via Coordinate Descent Structure Search
Towards Compact CNNs via Collaborative Compression
Discover Cross-Modality Nuances for Visible-Infrared Person Re-Identification
Beyond Max-Margin: Class Margin Equilibrium for Few-shot Object Detection
RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words
Image-to-image Translation via Hierarchical Style Disentanglement
Spectral Clustering to Analyze the Hidden Events in Single-Molecule Break Junctions
Modulated Convolutional Networks
MIGO-NAS: Towards Fast and Generalizable Neural Architecture Search
Winning Solutions and Post-Challenge Analyses of the ChaLearn AutoDL Challenge 2019
Beyond Universal Person Re-Identification Attack
Attention-Based Neural Architecture Search for Person Re-Identification
Network Pruning Using Adaptive Exemplar Filters
A Real-Time Global Inference Network for One-Stage Referring Expression Comprehension
Filter Sketch for Network Pruning
Knowledge-Driven Generative Adversarial Network for Text-to-Image Synthesis
Detecting High Frequency Oscillations for Stereoelectroencephalography in Epilepsy via Hypergraph Learning
Towards Robustness Against Natural Language Word Substitutions
Parallel Detection-and-Segmentation Learning for Weakly Supervised Instance Segmentation
Occlude Them All: Occlusion-Aware Attention Network for Occluded Person Re-ID
ReCU: Reviving the Dead Weights in Binary Neural Networks
EC-DARTS: Inducing Equalized and Consistent Optimization into DARTS
Architecture Disentanglement for Deep Neural Networks
TRAR: Routing the Attention Spans in Transformer for Visual Question Answering
K-armed Bandit based Multi-Modal Network Architecture Search for Visual Question Answering
Cascade Grouped Attention Network for Referring Expression Segmentation
Attacking Image Captioning towards Accuracy-Preserving Target Words Removal
Exploring Language Prior for Mode-Sensitive Visual Attention Modeling
Semi-Supervised Adversarial Monocular Depth Estimation
Every node counts: Self-ensembling graph convolutional networks for semi-supervised learning
Learning task-oriented disentangled representations for unsupervised domain adaptation
Dual distribution alignment network for generalizable person re-identification
Multiple expert brainstorming for domain adaptive person re-identification
A New Transfer Function for Volume Visualization of Aortic Stent and Its Application to Virtual Endoscopy
Cogradient Descent for Bilinear Optimization
Siamese box adaptive network for visual tracking
Improving face recognition from hard samples via Distribution Distillation Loss
Toward Compact ConvNets via Structure-Sparsity Regularized Filter Pruning
Link-aware semi-supervised hypergraph
Category-Aware Spatial Constraint for Weakly Supervised Detection
Similarity-Preserving Linkage Hashing for Online Image Retrieval
Binarized Neural Architecture Search for Efficient Object Recognition
Improving Face Recognition from Hard Samples via Distribution Distillation Loss
AD-Cluster: Augmented Discriminative Clustering for Domain Adaptive Person Re-Identification
Noise-aware fully webly supervised object detection
HRank: Filter Pruning using High-Rank Feature Map
Multi-task collaborative network for joint referring expression comprehension and segmentation
Salience-Guided Cascaded Suppression Network for Person Re-identification
Cogradient descent for bilinear optimization
Siamese Box Adaptive Network for Visual Tracking
One-shot adversarial attacks on visual tracking with dual attention
Projection & Probability-Driven Black-Box Attack
Rethinking Performance Estimation in Neural Architecture Search
Filter Grafting for Deep Neural Networks
Aggregating Global and Local Visual Representation for Vehicle Re-IDentification
Fast Class-Wise Updating for Online Hashing
Fine-Grained Spatial Alignment Model for Person Re-Identification With Focal Triplet Loss
Hadamard Matrix Guided Online Hashing
Binarized neural architecture search
Revisiting image aesthetic assessment via self-supervised feature learning
Fast learning of temporal action proposal via dense boundary generator
Asymmetric co-teaching for unsupervised cross-domain person re-identification
Rotated binary neural network
UWSOD: Toward fully-supervised-level capacity weakly supervised object detection
Channel pruning via automatic structure search
Polynomial Universal Adversarial Perturbations for Person Re-Identification
Multi-scale features for weakly supervised lesion detection of cerebral hemorrhage with collaborative learning
Multi-modal multi-layer fusion network with average binary center loss for face anti-spoofing
Visual-Textual Sentiment Analysis in Product Reviews
Deep Manifold Structure Transfer for Action Recognition
Multi-scale gem pooling with N-pair center loss for fine-grained image search
Colloquial image captioning
Circulant binary convolutional networks: Enhancing the performance of 1-bit DCNNs with circulant back propagation
Towards visual feature translation
Cyclic guidance for weakly supervised joint detection and segmentation
Exploiting kernel sparsity and entropy for interpretable CNN compression
Towards optimal structured CNN pruning via generative adversarial learning
Pyramidal person re-identification via multi-loss dynamic training
DSNET: Accelerate Indoor Scene Semantic Segmentation
Towards Cross-modality Topic Modelling via Deep Topical Correlation Analysis
Learning Similarity-specific Dictionary for Zero-shot Fine-grained Recognition
Social Media Based Topic Modeling for Smart Campus: A Deep Topical Correlation Analysis Method
Generalized zero-shot vehicle detection in remote sensing imagery via coarse-to-fine framework
Hypergraph induced convolutional manifold networks
A part power set model for scale-free person retrieval
Free VQA models from knowledge inertia by pairwise inconformity learning
Towards optimal discrete online hashing with balanced similarity
Hypergraph neural networks
PVRNet: Point-view relation neural network for 3D shape recognition
Dynamic capsule attention for visual question answering
Towards optimal fine grained retrieval via decorrelated centralized loss with normalize-scale layer
FreeAnchor: Learning to match anchors for visual object detection
Variational structured semantic inference for diverse image captioning
Information competing process for learning diversified representations
Learning neural bag-of-matrix-summarization with Riemannian network
Generative Adversarial Learning Towards Fast Weakly Supervised Detection
Modulated Convolutional Networks
GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition
GroupCap: Group-Based Image Captioning with Structured Relevance and Diversity Constraints
Dense auto-encoder hashing for robust cross-modality retrieval
PVNet: A joint convolutional network of point cloud and multi-view for 3D shape recognition
Supervised online hashing via Hadamard codebook learning
Towards Compact Visual Descriptor via Deep Fisher Network with Binary Embedding
Bio-Inspired Deep Attribute Learning Towards Facial Aesthetic Prediction
Deep Neural Network Compression and Acceleration: A Review
Gamma Mixture Models for Outlier Removal
Cross-Modality Microblog Sentiment Prediction via Bi-Layer Multimodal Hypergraph Learning
Inductive Multi-Hypergraph Learning and Its Application on View-Based 3D Object Classification
A Stacked Sparse Autoencoder-based Detector for Automatic Identification of Neuromagnetic High Frequency Oscillations in Epilepsy
Weakly Supervised Object Detection via Object-Specific Pixel Gradient
Image Quality Assessment for Color Correction Based on Color Contrast Similarity and Color Value Difference
Ordinal Constraint Binary Coding for Approximate Nearest Neighbor Search
Action-Attending Graphic Neural Network
Deep-based Fisher vector for mobile visual search
Font generation based on least squares conditional generative adversarial nets
AAM Based Face Sketch Synthesis
Context-Aware Phrase Representation for Statistical Machine Translation
Topic-Guided automatical human-simulated tweeting system
Correntropy-Induced Robust Low-Rank Hypergraph
Surface Saliency Detection Based On Curvature Co-Occurrence Histograms
Holistic CNN Compression via Low-rank Decomposition with Knowledge Transfer
Face Sketch Synthesis by Multidomain Adversarial Learning
Centralized ranking loss with weakly supervised localization for fine-grained object retrieval
Robust face sketch synthesis via generative adversarial fusion of priors and parametric sigmoid
Accelerating convolutional networks via global & dynamic filter pruning
Cross-modality person re-identification with generative adversarial training
Asynchronous bidirectional decoding for neural machine translation
Toward Optimal Manifold Hashing via Discrete Locally Linear Embedding
StructCap: Structured semantic embedding for image captioning
More than an answer: Neural pivot network for visual question answering
Predicting Microblog Sentiments via Weakly Supervised Multi-Modal Deep Learning
Body Structure Aware Deep Crowd Counting
Learning-based Shadow Recognition and Removal from Monochromatic Natural Images
Output Constraint Transfer for Kernelized Correlation Filter in Tracking
Mobile social multimedia analytics in the big data era: An introduction to the special issue
Weakly supervised vehicle detection in satellite images via multi-instance discriminative learning
Exploring Coherent Motion Patterns via Structured Trajectory Learning for Crowd Mood Modeling
Continuous Probability Distribution Prediction of Image Emotions via Multitask Shared Sparse Regression
Special issue on “visual semantic analysis with weak supervision”
Face sketch aging via aging oriented principal component analysis
Ordinal constrained binary code learning for nearest neighbor search
ESPACE: Accelerating convolutional neural networks via eliminating spatial and channel redundancy
Lattice-based recurrent neural network encoders for neural machine translation
Cross-Modality Binary Code Learning via Fusion Similarity Hashing
Sensitive information detection on cyber-space
Optimization algorithm toward deep features based camera pose estimation
Masked face detection via a modified LeNet
Advanced learning for large-scale heterogeneous computing
Dynamic programming based optimized product quantization for approximate nearest neighbor search
The distributed system for inverted multi-index visual retrieval
Joint Depth and Semantic Inference from a Single Image via Elastic Conditional Random Field
Predicting personalized emotion perceptions of social images
Towards perceptual video cropping with curve fitting
Crowd video retrieval via deep attribute-embedding graph ranking
Towards building abstraction by using line segment descriptor
Learning high-dimensional multimedia data
Bounding Multiple Gaussians Uncertainty with Application to Object Tracking
Face recognition by decision fusion of two-dimensional linear discriminant analysis and local binary pattern
Survey of visual sentiment prediction for social media analysis
Image Categorization by Learning a Propagated Graphlet Path
Top rank supervised binary coding for visual search
Detection based object labeling of 3D point cloud for indoor scenes
A novel features ranking metric with application to scalable visual and bioinformatics data classification
Web video topics discovery and structuralization with social network
Supervised matrix factorization for cross-modality hashing
Towards convolutional neural networks compression via global error reconstruction
Search-Based Depth Estimation via Coupled Dictionary Learning with Large-Margin Structure Inference
Variational neural discourse relation recognizer
An effective eye states detection method based on the projection of the gray interval distribution
Understanding image structure via hierarchical shape parsing
Towards 3D object detection with bimodal deep Boltzmann machines over RGBD imagery
A cross-media sentiment analytics platform for microblog
On-Device Mobile Landmark Recognition Using Binarized Descriptor with Multifeature Fusion
Discriminative local collaborative representation for online object tracking
Multimodal hypergraph learning for microblog sentiment prediction
Forward stereo obstacle detection with Weighted Hough Transform and local temporal correlation
Interactive on-device Mobile Landmark Recognition with compact binary codes
Learning a Probabilistic Topology Discovering Model for Scene Categorization
Cross-Modality Sentiment Analysis for Social Multimedia
Rank Preserving Hashing for Rapid Image Search
Social Attribute-Aware Force Model: Exploiting Richness of Interaction for Abnormal Crowd Detection
Sparse auto-encoder based feature learning for human body detection in depth image
Low-rank similarity metric learning in high dimensions
Localizing web videos using social images
Robust infrared target tracking based on particle filter with embedded saliency detection
Multimodal learning for view-based 3D object classification
3D object retrieval with multi-feature collaboration and bipartite graph matching
Feature learning based on SAE-PCA network for human gesture recognition in RGBD images
Fast verification via statistical geometric for mobile visual search
When location meets social multimedia: A survey on vision-based recognition and mining for geo-social multimedia analytics
Spectral-spatial co-clustering of hyperspectral image data based on bipartite graph
Spatial-aware object-level saliency prediction by learning graphlet hierarchies
Probabilistic Skimlets Fusion for Summarizing Multiple Consumer Landmark Videos
Local consistent hierarchical Hough Match for image re-ranking
Learning high-level feature by deep belief networks for 3-D model retrieval and recognition
Weakly supervised visual dictionary learning by harnessing image attributes
Visual sentiment topic model based microblog image sentiment analysis
Hacking Chinese touclick CAPTCHA by multi-scale corner structure model with fast pattern matching
Toward statistical modeling of saccadic eye-movement and visual saliency
Decomposed human localization from social photo album
Learning-based bipartite graph matching for view-based 3D model retrieval
Single/cross-camera multiple-person tracking by graph matching
Online MIL tracking with instance-level semi-supervised learning
A cross-media public sentiment analysis system for microblog
Mining Compact Bag-of-Patterns for Low Bit Rate Mobile Visual Search
Hyperspectral Image Classification Through Bilayer Graph-Based Learning
Online semi-supervised compressive coding for robust visual tracking
Symbiotic tracker ensemble toward a unified tracking framework
Structured partial least squares for simultaneous object tracking and segmentation
Actively Learning Human Gaze Shifting Paths for Semantics-Aware Photo Cropping
3-D Object Retrieval With Hausdorff Distance Learning
Weakly Supervised Multi-Graph Learning for Robust Image Reranking
A Topic Clustering Approach to Finding Similar Questions from Large Question and Answer Archives
Spectral-Spatial Constraint Hyperspectral Image Classification
Visual tracking via weakly supervised learning from multiple imperfect oracles
Efficient semantic image segmentation with multi-class ranking prior
Discriminative Orthogonal Nonnegative matrix factorization with flexibility for data representation
Representative Discovery of Structure Cues for Weakly-Supervised Image Segmentation
Towards Mobile Document Image Retrieval for Digital Library
Robust tracking via patch-based appearance model and local background estimation
Large-scale geosocial multimedia
Improved and promising identification of human microRNAs by incorporating a high-quality negative set
High-capacity reversible watermarking scheme of 2D-vector data
Pursuing detector efficiency for simple scene pedestrian detection
RGBD salient object detection: A benchmark and algorithms
Phrasal Paraphrase Based Question Reformulation for Archived Question Retrieval
Remote Dynamic Three-Dimensional Scene Reconstruction
Visual reranking through weakly supervised multi-graph learning
Stereotime: A wireless 2D and 3D switchable video communication system
Semi-supervised learning with manifold fitted graphs
Query-dependent visual dictionary adaptation for image reranking
Localizing web videos from heterogeneous images
A new camera self-calibration method based on CSA
Decomposed human localization in personal photo albums
Spectral-spatial classification of hyperspectral imagery based on random forests
Seeing actions through scene context
Saliency detection by adaptive clustering
Salient object detection via Low-rank and Structured sparse Matrix Decomposition
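Professor Ji Rongrong's team at Xiamen University makes new progress in deepfake detection. Netinfo Security, 1671-1122, 2024-11-10.
A model stealing attack method based on sampling and a weighted loss function. Scientia Sinica Informationis, 1674-7267, 2023-05-16.
Introduction to the special issue on multimedia intelligence of the Journal of Image and Graphics. Journal of Image and Graphics, 1006-8961, 2022-09-16.
Geometrically constrained adversarial training with dual-label supervision. Journal of Software, 1000-9825, 2022-04-15.
An instance segmentation algorithm based on the attention mechanism. Navigation Positioning and Timing, 2095-8110, 2021-11-15.
Tone, sentiment, and market impact: Based on a financial sentiment dictionary. Journal of Management Sciences in China, 1007-9807, 2021-05-15.
A survey of deep neural network architecture search. Journal of Image and Graphics, 1006-8961, 2021-02-09.
A survey of deep neural network compression and acceleration. Journal of Computer Research and Development, 1000-1239, 2018.
Do the innate characteristics of executives as “people” matter in the IPO market? Management World, 1002-5502, 2017.
Do the innate characteristics of executives as “people” matter in the IPO market? Management World, 1002-5502, 2017.
Deep learning: The key to unlocking the big data era. Journal of Engineering Studies, 1674-4969, 2014-09-25.
Information technology: Neural network representation and model compression, Part 1: Convolutional neural network. State Administration for Market Regulation; Standardization Administration of China, 2023-03-17.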