Our LSTS achieves 77.2% mAP on the mainstream video object detection benchmarks, comparing favorably with other state-of-the-art methods in both accuracy and efficiency. More detailed comparisons and ablation studies are presented in our paper.
Figures (a) and (b) indicate that the distribution of the learned sampling locations is much closer to the distribution computed from the datasets.
@inproceedings{jiang2020learning,
  title     = {Learning Where to Focus for Efficient Video Object Detection},
  author    = {Jiang, Zhengkai and Liu, Yu and Yang, Ceyuan and Liu, Jihao and Gao, Peng and Zhang, Qian and Xiang, Shiming and Pan, Chunhong},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2020}
}