Recent studies have shown that vision transformer (ViT) models can attain better results than most state-of-the-art convolutional neural networks (CNNs) across various image recognition tasks, and can do so while using considerably fewer computational resources. This has led some researchers to propose that ViTs could replace CNNs in this field. However, despite their promising performance, ViTs are ...
Gibson’s underlying database of spaces includes 572 full buildings composed of 1447 floors covering a total area of 211k m². The database is collected from real indoor spaces using 3D scanning and reconstruction. For each space, we provide the 3D reconstruction, RGB images, depth, surface normals, and, for a fraction of the spaces, semantic object annotations. On this page you can see various visualizations for each space, including 3D dissections, exploration using a randomly controlled Husky agent, and standard point-to-point navigation episodes.
W. Hung, Y. Tsai, Y. Liou, Y. Lin, and M. Yang. (2018). arXiv:1802.07934. Comment: Accepted in BMVC 2018. Code and models available at https://github.com/hfslyc/AdvSemiSeg.
P. Wu, R. Wang, K. Kin, C. Twigg, S. Han, M. Yang, and S. Chien. Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology, pp. 365–374. New York, NY, USA, ACM, (2017)