Toward Grouping in Large Scenes With Occlusion-Aware Spatio–Temporal Transformers (English)

Zhang, Jinsong / Gu, Lingfeng / Lai, Yu-Kun / Wang, Xueyang / Li, Kun

In: IEEE Transactions on Circuits and Systems for Video Technology ; 34, 5 ; 3919-3929 ; 2024

ISSN: 1051-8215, 1558-2205

Article (Journal) / Electronic Resource

Group detection, especially in large-scale scenes, has many potential applications in public safety and smart cities. Existing methods cope poorly with the frequent occlusions that arise in large-scale scenes containing many people, and they struggle to exploit spatio-temporal information effectively. In this paper, we propose GroupTransformer, an end-to-end framework for group detection in large-scale scenes. To handle the frequent occlusions caused by multiple people, we design an occlusion encoder that detects and suppresses severely occluded person crops. To exploit spatio-temporal relationships, we propose spatio-temporal transformers that simultaneously extract trajectory information and fuse inter-person features in a hierarchical manner. Experimental results on both large-scale and small-scale scenes demonstrate that our method outperforms state-of-the-art methods. On large-scale scenes, our method improves precision and F1 score by more than 10%. On small-scale scenes, it still improves the F1 score by more than 5%. We will release the code for research purposes.

Title: Toward Grouping in Large Scenes With Occlusion-Aware Spatio–Temporal Transformers
Contributors: Zhang, Jinsong (author) / Gu, Lingfeng (author) / Lai, Yu-Kun (author) / Wang, Xueyang (author) / Li, Kun (author)
Published in: IEEE Transactions on Circuits and Systems for Video Technology ; 34, 5 ; 3919-3929
Publisher: IEEE
Publication date: 2024-05-01
Size: 2816772 bytes
ISSN: 1051-8215, 1558-2205
DOI: https://doi.org/10.1109/TCSVT.2023.3324868
Type of media: Article (Journal)
Type of material: Electronic Resource
Language: English
Source: IEEE

Table of contents – Volume 34, Issue 5

The tables of contents are generated automatically from the data records of the individual contributions indexed in the TIB portal, so the listing below may be incomplete.

3063: Guest Editorial Special Section on Recent Standardization Efforts for Learning-Based Visual Data Coding
Liu, Dong / Liu, Shan / Ascenso, Joao / Tian, Dong / Yu, Lu et al. | 2024
3067: End-to-End Learning-Based Image Compression With a Decoupled Framework
Zhang, Zhaobin / Esenlik, Semih / Wu, Yaojun / Wang, Meng / Zhang, Kai / Zhang, Li et al. | 2024
3082: Rate-Distortion Optimized Post-Training Quantization for Learned Image Compression
Shi, Junqi / Lu, Ming / Ma, Zhan et al. | 2024
3096: MPAI-EEV: Standardization Efforts of Artificial Intelligence Based End-to-End Video Coding
Jia, Chuanmin / Ye, Feng / Dong, Fanke / Lin, Kai / Chiariglione, Leonardo / Ma, Siwei / Sun, Huifang / Gao, Wen et al. | 2024
3111: Deep Reference Frame Generation Method for VVC Inter Prediction Enhancement
Jia, Jianghao / Zhang, Yuantong / Zhu, Han / Chen, Zhenzhong / Liu, Zizheng / Xu, Xiaozhong / Liu, Shan et al. | 2024
3125: Lightweight Context Model Equipped aiWave in Response to the AVS Call for Evidence on Volumetric Medical Image Coding
Xue, Dongmei / Li, Li / Liu, Dong / Li, Houqiang et al. | 2024
3138: Rate-Rendering Distortion Optimized Preprocessing for Texture Map Compression of 3D Reconstructed Scenes
Kim, Soowoong / Do, Jihoon / Kang, Jungwon / Kim, Hui Yong et al. | 2024
3156: End-to-End Learnable Multi-Scale Feature Compression for VCM
Kim, Yeongwoong / Jeong, Hyewon / Yu, Janghyun / Kim, Younhee / Lee, Jooyoung / Jeong, Se Yoon / Kim, Hui Yong et al. | 2024
3168: Incremental Learning-Based Lane Detection for Automated Rubber-Tired Gantries in a Container Terminal
Feng, Yunjian / Zhou, Kunyang / Li, Jun / Zhou, Mengchu et al. | 2024
3180: Domain-Aware Prototype Network for Generalized Zero-Shot Learning
Hu, Yongli / Feng, Lincong / Jiang, Huajie / Liu, Mengting / Yin, Baocai et al. | 2024
3192: Transformer-Based Multimodal Emotional Perception for Dynamic Facial Expression Recognition in the Wild
Zhang, Xiaoqin / Li, Min / Lin, Sheng / Xu, Hang / Xiao, Guobao et al. | 2024
3204: Transparent Embedding Space for Interpretable Image Recognition
Wang, Jiaqi / Liu, Huafeng / Jing, Liping et al. | 2024
3220: DeNKD: Decoupled Non-Target Knowledge Distillation for Complementing Transformer-Based Unsupervised Domain Adaptation
Mei, Zhen / Ye, Peng / Li, Baopu / Chen, Tao / Fan, Jiayuan / Ouyang, Wanli et al. | 2024
3232: Progressive Multi-Resolution Loss for Crowd Counting
Yan, Ziheng / Qi, Yuankai / Li, Guorong / Liu, Xinyan / Zhang, Weigang / Yang, Ming-Hsuan / Huang, Qingming et al. | 2024
3245: BC-GAN: A Generative Adversarial Network for Synthesizing a Batch of Collocated Clothing
Zhou, Dongliang / Zhang, Haijun / Ma, Jianghong / Shi, Jianyang et al. | 2024
3260: Dual-Path Transformer for 3D Human Pose Estimation
Zhou, Lu / Chen, Yingying / Wang, Jinqiao et al. | 2024
3271: Un-Gaze: A Unified Transformer for Joint Gaze-Location and Gaze-Object Detection
Tu, Danyang / Shen, Wei / Sun, Wei / Min, Xiongkuo / Zhai, Guangtao / Chen, Changwen et al. | 2024
3286: Dual-Constraint Coarse-to-Fine Network for Camouflaged Object Detection
Yue, Guanghui / Xiao, Houlu / Xie, Hai / Zhou, Tianwei / Zhou, Wei / Yan, Weiqing / Zhao, Baoquan / Wang, Tianfu / Jiang, Qiuping et al. | 2024
3299: Compressed Video Action Recognition With Dual-Stream and Dual-Modal Transformer
Mou, Yuting / Jiang, Xinghao / Xu, Ke / Sun, Tanfeng / Wang, Zepeng et al. | 2024
3313: Light Field Super-Resolution Using Decoupled Selective Matching
Liu, Yutong / Cheng, Zhen / Xiao, Zeyu / Xiong, Zhiwei et al. | 2024
3327: Co-Occurrence Matters: Learning Action Relation for Temporal Action Localization
Cao, Congqi / Wang, Yizhe / Zhang, Yueran / Lu, Yue / Zhang, Xin / Zhang, Yanning et al. | 2024
3340: Boosting Video Object Segmentation via Robust and Efficient Memory Network
Chen, Yadang / Zhang, Dingwei / Zheng, Yuhui / Yang, Zhi-Xin / Wu, Enhua / Zhao, Haixing et al. | 2024
3353: Robust Tracking via Fully Exploring Background Prior Knowledge
Wang, Zheng'ao / Zhou, Zikun / Chen, Fanglin / Xu, Jun / Pei, Wenjie / Lu, Guangming et al. | 2024
3368: OASNet: Object Affordance State Recognition Network With Joint Visual Features and Relational Semantic Embeddings
Chen, Dongpan / Kong, Dehui / Li, Jinghua / Wang, Lichun / Gao, Junna / Yin, Baocai et al. | 2024
3383: A Refinement Method for Single-Stage Object Detection Based on Progressive Decoupled Task Alignment
Tang, Xianlun / Yang, Qiao / Zhang, Xi / Deng, Wuquan / Wang, Huiming / Gao, Xinbo et al. | 2024
3395: Instance-Dictionary Learning for Open-World Object Detection in Autonomous Driving Scenarios
Ma, Zeyu / Zheng, Ziqiang / Wei, Jiwei / Yang, Yang / Shen, Heng Tao et al. | 2024
3409: A Transferable Generative Framework for Multi-Label Zero-Shot Learning
Ma, Peirong / He, Zhiquan / Ran, Wu / Lu, Hong et al. | 2024
3424: BSSNet: A Real-Time Semantic Segmentation Network for Road Scenes Inspired From AutoEncoder
Shi, Xiaoqiang / Yin, Zhenyu / Han, Guangjie / Liu, Wenzhuo / Qin, Li / Bi, Yuanguo / Li, Shurui et al. | 2024
3439: Multi-Branch GAN-Based Abnormal Events Detection via Context Learning in Surveillance Videos
Li, Daoheng / Nie, Xiushan / Gong, Rui / Lin, Ximing / Yu, Hui et al. | 2024
3451: New Insights on Relieving Task-Recency Bias for Online Class Incremental Learning
Liang, Guoqiang / Chen, Zhaojie / Chen, Zhaoqiang / Ji, Shiyu / Zhang, Yanning et al. | 2024
3465: Improving Knowledge Distillation via Head and Tail Categories
Xu, Liuchi / Ren, Jin / Huang, Zhenhua / Zheng, Weishi / Chen, Yunwen et al. | 2024
3481: Watch You Under Low-Resolution and Low-Illumination: Face Enhancement via Bi-Factor Degradation Decoupling
Ding, Xin / Wang, Zheng / Fang, Jing / Shu, Zhenyu / Hu, Ruimin / Lin, Chia-Wen et al. | 2024
3496: Revisiting Open World Object Detection
Zhao, Xiaowei / Ma, Yuqing / Wang, Duorui / Shen, Yifan / Qiao, Yixuan / Liu, Xianglong et al. | 2024
3510: Discriminative Feature Learning With Co-Occurrence Attention Network for Vehicle ReID
Sheng, Hao / Wang, Shuai / Chen, Haobo / Yang, Da / Huang, Yang / Shen, Jiahao / Ke, Wei et al. | 2024
3523: Dictionary-Based Multi-View Learning With Privileged Information
Liu, Bo / Sun, Peng / Xiao, Yanshan / Zhao, Shilei / Li, Xiaokai / Peng, Tiantian / Zheng, Zhiyu / Huang, Yongsheng et al. | 2024
3538: Boosting Robust Multi-Focus Image Fusion With Frequency Mask and Hyperdimensional Computing
Qiao, Lihong / Wu, Shixin / Xiao, Bin / Shu, Yucheng / Luan, Xiao / Lu, Sicheng / Li, Weisheng / Gao, Xinbo et al. | 2024
3551: Autofocusing for Synthetic Aperture Imaging Based on Pedestrian Trajectory Prediction
Pei, Zhao / Zhang, Jiaqing / Zhang, Wenwen / Wang, Miao / Wang, Jianing / Yang, Yee-Hong et al. | 2024
3563: TridentCap: Image-Fact-Style Trident Semantic Framework for Stylized Image Captioning
Wang, Lanxiao / Qiu, Heqian / Qiu, Benliu / Meng, Fanman / Wu, Qingbo / Li, Hongliang et al. | 2024
3576: Speed-Up DDPM for Real-Time Underwater Image Enhancement
Lu, Siqi / Guan, Fengxu / Zhang, Hanyu / Lai, Haitao et al. | 2024
3589: Learning Global-Local Correspondence With Semantic Bottleneck for Logical Anomaly Detection
Yao, Haiming / Yu, Wenyong / Luo, Wei / Qiang, Zhenfeng / Luo, Donghao / Zhang, Xiaotian et al. | 2024
3606: DaFIR: Distortion-Aware Representation Learning for Fisheye Image Rectification
Liao, Zhaokang / Zhou, Wengang / Li, Houqiang et al. | 2024
3619: UAV-Ground Visual Tracking: A Unified Dataset and Collaborative Learning Approach
Sun, Dengdi / Cheng, Leilei / Chen, Song / Li, Chenglong / Xiao, Yun / Luo, Bin et al. | 2024
3633: Learnable Spatial-Spectral Transform-Based Tensor Nuclear Norm for Multi-Dimensional Visual Data Recovery
Liu, Sheng / Leng, Jinsong / Zhao, Xi-Le / Zeng, Haijin / Wang, Yao / Yang, Jing-Hua et al. | 2024
3647: Deep Convolution Modulation for Image Super-Resolution
Huang, Yuanfei / Li, Jie / Hu, Yanting / Huang, Hua / Gao, Xinbo et al. | 2024
3663: Video Question Answering With Semantic Disentanglement and Reasoning
Liu, Jin / Wang, Guoxiang / Xie, Jialong / Zhou, Fengyu / Xu, Huijuan et al. | 2024
3674: Dual-Path Feature Aware Network for Remote Sensing Image Semantic Segmentation
Geng, Jie / Song, Shuai / Jiang, Wen et al. | 2024
3687: A Clinically Guided Graph Convolutional Network for Assessment of Parkinsonian Pronation-Supination Movements of Hands
Xie, Zheng / Guo, Rui / Zhang, Chencheng / Qian, Xiaohua et al. | 2024
3700: See SIFT in a Rain
Wu, Wei / Chang, Hao / Li, Zhu et al. | 2024
3714: Spectral-Wise Implicit Neural Representation for Hyperspectral Image Reconstruction
Chen, Huan / Zhao, Wangcai / Xu, Tingfa / Shi, Guokai / Zhou, Shiyun / Liu, Peifu / Li, Jianan et al. | 2024
3728: Graph Regularized and Feature Aware Matrix Factorization for Robust Incomplete Multi-View Clustering
Wen, Jie / Xu, Gehui / Tang, Zhanyan / Wang, Wei / Fei, Lunke / Xu, Yong et al. | 2024
3742: Motion-Oriented Hybrid Spiking Neural Networks for Event-Based Motion Deblurring
Liu, Zhaoxin / Wu, Jinjian / Shi, Guangming / Yang, Wen / Dong, Weisheng / Zhao, Qinghang et al. | 2024
3755: MC-Blur: A Comprehensive Benchmark for Image Deblurring
Zhang, Kaihao / Wang, Tao / Luo, Wenhan / Ren, Wenqi / Stenger, Bjorn / Liu, Wei / Li, Hongdong / Yang, Ming-Hsuan et al. | 2024
3768: A Novel Framework for Scene Graph Generation via Prior Knowledge
Wang, Zhenghao / Lian, Jing / Li, Linhui / Zhao, Jian et al. | 2024
3782: Detect Any Shadow: Segment Anything for Video Shadow Detection
Wang, Yonghui / Zhou, Wengang / Mao, Yunyao / Li, Houqiang et al. | 2024
3795: CTIF-Net: A CNN-Transformer Iterative Fusion Network for Salient Object Detection
Yuan, Junbin / Zhu, Aiqing / Xu, Qingzhen / Wattanachote, Kanoksak / Gong, Yongyi et al. | 2024
3806: OHD: An Online Category-Aware Framework for Learning With Noisy Labels Under Long-Tailed Distribution
Zhao, Qihao / Zhang, Fan / Hu, Wei / Feng, Songhe / Liu, Jun et al. | 2024
3819: MBSI-Net: Multimodal Balanced Self-Learning Interaction Network for Image Classification
Ma, Mengru / Ma, Wenping / Jiao, Licheng / Liu, Xu / Liu, Fang / Li, Lingling / Yang, Shuyuan et al. | 2024
3834: Deep Pyramid Network for Low-Light Endoscopic Image Enhancement
Yue, Guanghui / Gao, Jie / Cong, Runmin / Zhou, Tianwei / Li, Leida / Wang, Tianfu et al. | 2024
3846: Global Localization in Large-Scale Point Clouds via Roll-Pitch-Yaw Invariant Place Recognition and Low-Overlap Global Registration
Wang, Zhong / Zhang, Lin / Zhao, Shengjie / Zhou, Yicong et al. | 2024
3860: Think Holistically, Act Down-to-Earth: A Semantic Navigation Strategy With Continuous Environmental Representation and Multi-Step Forward Planning
Chen, Bolei / Kang, Jiaxu / Zhong, Ping / Cui, Yongzheng / Lu, Siyi / Liang, Yixiong / Wang, Jianxin et al. | 2024
3876: Dense Pixel-to-Pixel Harmonization via Continuous Image Representation
Chen, Jianqi / Zhang, Yilan / Zou, Zhengxia / Chen, Keyan / Shi, Zhenwei et al. | 2024
3891: Hierarchical Attention Network for Open-Set Fine-Grained Image Recognition
Sun, Jiayin / Wang, Hong / Dong, Qiulei et al. | 2024
3905: MPLA-Net: Multiple Pseudo Label Aggregation Network for Weakly Supervised Video Salient Object Detection
Ma, Chunjie / Du, Lina / Zhuo, Li / Li, Jiafeng et al. | 2024
3919: Toward Grouping in Large Scenes With Occlusion-Aware Spatio–Temporal Transformers
Zhang, Jinsong / Gu, Lingfeng / Lai, Yu-Kun / Wang, Xueyang / Li, Kun | 2024
3930: Modal Evaluation Network via Knowledge Distillation for No-Service Rail Surface Defect Detection
Zhou, Wujie / Hong, Jiankang / Yan, Weiqing / Jiang, Qiuping et al. | 2024
3943: Learned Two-Step Iterative Shrinkage Thresholding Algorithm for Deep Compressive Sensing
Gan, Hongping / Wang, Xiaoyang / He, Lijun / Liu, Jie et al. | 2024
3957: Learning Spatio-Temporal Sharpness Map for Video Deblurring
Zhu, Qi / Zheng, Naishan / Huang, Jie / Zhou, Man / Zhang, Jinghao / Zhao, Feng et al. | 2024
3971: Break the Bias: Delving Semantic Transform Invariance for Few-Shot Segmentation
Cao, Qinglong / Chen, Yuntian / Ma, Chao / Yang, Xiaokang et al. | 2024
3983: Correlation Filters for UAV Online Tracking Based on Complementary Appearance Model and Reversibility Reasoning
Wang, Biao / Li, Wenling / Zhang, Bin / Liu, Yang / Du, Junping et al. | 2024
3998: The Illusion of Visual Security: Reconstructing Perceptually Encrypted Images
Yang, Ying / Xiang, Tao / Lv, Xiao / Guo, Shangwei / Zeng, Tieyong et al. | 2024
4011: Toward High-Quality HDR Deghosting With Conditional Diffusion Models
Yan, Qingsen / Hu, Tao / Sun, Yuan / Tang, Hao / Zhu, Yu / Dong, Wei / Van Gool, Luc / Zhang, Yanning et al. | 2024
4027: Sparse-to-Dense: High Efficiency Rate Control for End-to-End Scale-Adaptive Video Coding
Chen, Jiancong / Wang, Meng / Zhang, Pingping / Wang, Shurun / Wang, Shiqi et al. | 2024
4040: Temporal Wavelet Transform-Based Low-Complexity Perceptual Quality Enhancement of Compressed Video
Dong, Cunhui / Ma, Haichuan / Li, Zhuoyuan / Li, Li / Liu, Dong et al. | 2024
4054: Camera Pose-Based Background Modeling for Video Coding in Moving Cameras
Fang, Zheng / Zheng, Mingkui / Chen, Pingping / Chen, Zhifeng / Oliver Wu, Dapeng et al. | 2024
4070: Joint Pixel and Frequency Feature Learning and Fusion via Channel-Wise Transformer for High-Efficiency Learned In-Loop Filter in VVC
Kathariya, Birendra / Li, Zhu / Auwera, Geert Van der et al. | 2024
4084: Quality Harmonization for Virtual Composition in Online Video Communications
Li, Binzhe / Chen, Bolin / Wang, Zhao / Chen, Baoliang / Wang, Shiqi / Ye, Yan et al. | 2024
4095: Unsupervised Deep Hashing With Fine-Grained Similarity-Preserving Contrastive Learning for Image Retrieval
Cao, Hu / Huang, Lei / Nie, Jie / Wei, Zhiqiang et al. | 2024
4109: Question-Aware Global-Local Video Understanding Network for Audio-Visual Question Answering
Chen, Zailong / Wang, Lei / Wang, Peng / Gao, Peng et al. | 2024
4120: Blind Universal Denoising for Radar Micro-Doppler Spectrograms Using Identical Dual Learning and Reciprocal Adversarial Training
Yang, Yang / Wen, Peiling / Ye, Wenbo / Li, Beichen / Lang, Yue et al. | 2024
4135: Towards Video Anomaly Detection in the Real World: A Binarization Embedded Weakly-Supervised Network
Yang, Zhen / Guo, Yuanfang / Wang, Junfu / Huang, Di / Bao, Xiuguo / Wang, Yunhong et al. | 2024
C1: Table of Contents | 2024
C2: IEEE Transactions on Circuits and Systems for Video Technology Publication Information | 2024
C3: IEEE Circuits and Systems Society Information | 2024
