Online Model Distillation for Efficient Video Inference

R. Mullapudi, S. Chen, K. Zhang, D. Ramanan, and K. Fatahalian.
(2018). cite arxiv:1812.02699.

Abstract

High-quality computer vision models typically address the problem of understanding the general distribution of real-world images. However, most cameras observe only a very small fraction of this distribution. This offers the possibility of achieving more efficient inference by specializing compact, low-cost models to the specific distribution of frames observed by a single camera. In this paper, we employ the technique of model distillation (supervising a low-cost student model using the output of a high-cost teacher) to specialize accurate, low-cost semantic segmentation models to a target video stream. Rather than learn a specialized student model on offline data from the video stream, we train the student in an online fashion on the live video, intermittently running the teacher to provide a target for learning. Online model distillation yields semantic segmentation models that closely approximate their Mask R-CNN teacher with 7 to 17x lower inference runtime cost (11 to 26x in FLOPs), even when the target video's distribution is non-stationary. Our method requires no offline pretraining on the target video stream, and achieves higher accuracy and lower cost than solutions based on flow or video object segmentation. We also provide a new video dataset for evaluating the efficiency of inference over long running video streams.
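The abstract describes the training loop only at a high level: the cheap student runs on every frame, and the expensive teacher runs intermittently to supply targets for online training. The Python sketch below is a toy illustration of that loop, not the authors' method. Every name in it (`LinearStudent`, `teacher`, `online_distill`, `err_threshold`, `buffer_size`) is hypothetical. It also rests on several assumptions of ours: frames are scalars, the student is a 1-D linear model rather than a segmentation network, the "teacher" is a fixed function whose mapping changes at frame 500 to mimic a non-stationary stream, and the teacher stride adapts to the student's error while training replays a small buffer of recent teacher outputs.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class LinearStudent:
    """Hypothetical low-cost student: a 1-D linear model y = w * x + b."""
    w: float = 0.0
    b: float = 0.0

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def sgd_step(self, x: float, target: float, lr: float) -> None:
        # One gradient step on the squared error (prediction - target)^2 / 2.
        err = self.predict(x) - target
        self.w -= lr * err * x
        self.b -= lr * err


def teacher(x: float, t: int) -> float:
    """Hypothetical expensive teacher; its mapping shifts at frame 500
    to mimic a non-stationary video stream (an assumption of this sketch)."""
    return 2.0 * x + 1.0 if t < 500 else -1.0 * x + 3.0


def online_distill(num_frames=1000, min_stride=1, max_stride=32,
                   err_threshold=0.05, lr=0.1, epochs=20,
                   buffer_size=8, seed=0):
    rng = random.Random(seed)
    student = LinearStudent()
    buffer = deque(maxlen=buffer_size)  # recent (frame, teacher output) pairs
    stride = min_stride
    next_teacher_frame = 0
    teacher_calls = 0
    for t in range(num_frames):
        x = rng.uniform(-1.0, 1.0)       # stand-in for a video frame
        prediction = student.predict(x)  # cheap student inference on every frame
        if t != next_teacher_frame:
            continue
        # Intermittently run the expensive teacher to obtain a training target.
        target = teacher(x, t)
        teacher_calls += 1
        buffer.append((x, target))
        # Assumed adaptive schedule: query the teacher more often when the
        # student disagrees with it, less often when it agrees.
        if (prediction - target) ** 2 > err_threshold:
            stride = max(min_stride, stride // 2)
        else:
            stride = min(max_stride, stride * 2)
        next_teacher_frame = t + stride
        # Fine-tune the student online on the buffered teacher outputs.
        for _ in range(epochs):
            for bx, by in buffer:
                student.sgd_step(bx, by, lr)
    return student, teacher_calls


student, calls = online_distill()
print(f"teacher ran on {calls} of 1000 frames; "
      f"student w={student.w:.3f}, b={student.b:.3f}")
```

In this toy setting the student recovers the post-shift mapping (w close to -1, b close to 3) while the teacher runs on only a small fraction of frames. The bounded replay buffer lets stale pre-shift targets age out, which is one simple way to track a drifting distribution.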

BibTeX key: mullapudi2018online
entry type: misc
year: 2018
url: http://arxiv.org/abs/1812.02699
note: cite arxiv:1812.02699

Cite this publication

@misc{mullapudi2018online,
  abstract = {High-quality computer vision models typically address the problem of understanding the general distribution of real-world images. However, most cameras observe only a very small fraction of this distribution. This offers the possibility of achieving more efficient inference by specializing compact, low-cost models to the specific distribution of frames observed by a single camera. In this paper, we employ the technique of model distillation (supervising a low-cost student model using the output of a high-cost teacher) to specialize accurate, low-cost semantic segmentation models to a target video stream. Rather than learn a specialized student model on offline data from the video stream, we train the student in an online fashion on the live video, intermittently running the teacher to provide a target for learning. Online model distillation yields semantic segmentation models that closely approximate their Mask R-CNN teacher with 7 to 17x lower inference runtime cost (11 to 26x in FLOPs), even when the target video's distribution is non-stationary. Our method requires no offline pretraining on the target video stream, and achieves higher accuracy and lower cost than solutions based on flow or video object segmentation. We also provide a new video dataset for evaluating the efficiency of inference over long running video streams.},
  added-at = {2019-05-30T18:07:05.000+0200},
  author = {Mullapudi, Ravi Teja and Chen, Steven and Zhang, Keyi and Ramanan, Deva and Fatahalian, Kayvon},
  biburl = {https://www.bibsonomy.org/bibtex/2ec82369146261b9e50986252ae2c5725/ngaloppo},
  description = {Online Model Distillation for Efficient Video Inference},
  interhash = {60261804622b21195e77d4af08dea9e2},
  intrahash = {ec82369146261b9e50986252ae2c5725},
  keywords = {deeplearning edgeinference onlinelearning},
  note = {cite arxiv:1812.02699},
  timestamp = {2019-05-30T18:07:05.000+0200},
  title = {Online Model Distillation for Efficient Video Inference},
  url = {http://arxiv.org/abs/1812.02699},
  year = 2018
}