Thursday, November 30, 2023

CVPR 2020 – The Serious Computer Vision Blog


(By Li Yang Ku)

CVPR is virtual this year for obvious reasons, and if you didn’t pay the $325 registration fee to attend this ‘prerecorded’ live event, you can now have a similar experience by watching all the recorded videos on their YouTube channel for free. Of course it’s not exactly the same since you are losing out on the virtual chat room networking experience, but honestly speaking, computer vision parties are usually awkward in person already and I can’t imagine you missing much. Before we go through my paper picks, let’s look at the trend first. The graph below shows the accepted paper counts by topic this year.

CVPR 2020 stats

And the next are the stats for CVPR 2019:

CVPR 2019 stats

These numbers can’t be directly compared since the categories are not exactly the same. For example, deep learning, which had the most submissions in 2019, is no longer a category (it isn’t going to be a very useful category when every paper is about deep learning). The distributions in these two graphs look quite similar. However, if I had to analyze them at gunpoint, I’d say the following:

  1. Recognition is still the most popular application for computer vision.
  2. The new category “Transfer/Low-shot/Semi/Unsupervised Learning” is the most popular problem to solve with deep networks.
  3. Despite being a controversial technology, more people are working on face recognition. For some countries this is probably still where most of the money goes.
  4. The new category “Efficient training and inference methods for networks” shows that there is an effort to push for practical use of neural networks.
  5. Based on this other set of statistics, it seems that the keywords ‘graph’, ‘representation’, and ‘cloud’ doubled from last year. This is consistent with my observation that people are exploring 3D data more, since the research field on 2D images is the most crowded and competitive.

Now for my random paper picks:

a) Boyang Deng, Kyle Genova, Soroosh Yazdani, Sofien Bouaziz, Geoffrey Hinton, and Andrea Tagliasacchi. “CvxNet: Learnable Convex Decomposition” (video)

This Google Research paper introduces a new representation for 3D shapes that can be learned by neural networks and used by physics engines directly. In the paper, the authors mention that there are two types of 3D representations. The first type is explicit representations such as meshes. These can be used directly in many applications such as physics simulation because they contain information about the surface, but they are hard to learn with neural networks. The second type is implicit representations such as voxel grids. Voxel grids can be learned by neural networks since the task can be treated as a classification problem that labels each voxel as empty or not. However, turning these voxel grids into a mesh is quite expensive. The authors therefore introduce a convex decomposition representation that represents a 3D shape as a union of convex parts. Since a convex shape can be represented by a set of hyperplanes that draw the boundary of the shape, learning it becomes a classification problem while retaining the benefit of having information about the shape boundary. This representation is therefore both implicit and explicit. The authors also demonstrated that a learned CvxNet is able to generate 3D shapes from 2D images with much better success than other approaches.
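To make the "union of convex parts, each bounded by hyperplanes" idea concrete, here is a minimal sketch of the hard inside/outside test such a representation implies. It is not the paper's code: the function names, the `axis_aligned_box` helper, and the L-shaped example are all made up for illustration, and a network would need a differentiable relaxation of this test, which is omitted here.

```python
from typing import List, Sequence, Tuple

# A half-space is (normal, offset); a point x lies inside it
# when dot(normal, x) + offset <= 0. (Convention chosen for this sketch.)
HalfSpace = Tuple[Sequence[float], float]


def inside_convex(point: Sequence[float], half_spaces: List[HalfSpace]) -> bool:
    """A convex part is the intersection of all its half-spaces."""
    return all(
        sum(n_i * x_i for n_i, x_i in zip(normal, point)) + offset <= 0.0
        for normal, offset in half_spaces
    )


def inside_shape(point: Sequence[float], convex_parts: List[List[HalfSpace]]) -> bool:
    """The full shape is the union of its convex parts."""
    return any(inside_convex(point, part) for part in convex_parts)


def axis_aligned_box(lo: Sequence[float], hi: Sequence[float]) -> List[HalfSpace]:
    """Hypothetical helper: the six half-spaces that bound a 3D box."""
    planes: List[HalfSpace] = []
    for axis in range(3):
        upper = [0.0, 0.0, 0.0]
        upper[axis] = 1.0
        planes.append((upper, -hi[axis]))   # encodes x[axis] <= hi[axis]
        lower = [0.0, 0.0, 0.0]
        lower[axis] = -1.0
        planes.append((lower, lo[axis]))    # encodes x[axis] >= lo[axis]
    return planes


# An L-shaped solid is not convex, but it is the union of two boxes.
l_shape = [
    axis_aligned_box((0, 0, 0), (2, 1, 1)),
    axis_aligned_box((0, 0, 0), (1, 2, 1)),
]
```

Querying `inside_shape((1.5, 0.5, 0.5), l_shape)` lands in the first box, while `(1.5, 1.5, 0.5)` falls in the missing corner of the L and is rejected by both parts. The hyperplanes double as an explicit description of the boundary, which is the property the authors want for physics engines.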

b) Tushar Nagarajan, Yanghao Li, Christoph Feichtenhofer, Kristen Grauman. “Ego-Topo: Environment Affordances From Egocentric Video” (video)

Environment Affordance

This paper on predicting an environment’s affordances is a collaboration between UT Austin’s computer vision group and Facebook AI Research. It caught my eye since my dissertation was also about affordances using a graph-like structure. If you are not familiar with the word “affordance”, it’s a controversial word made up to describe what action/function an object/environment affords a person/robot.

In this work, the authors argue that the space in which an action takes place is important for understanding first-person videos. Traditional approaches to classifying actions in videos usually just take a chunk of the video and generate a representation for classification, while SLAM (simultaneous localization and mapping) approaches that try to recover the exact 3D structure of the environment often fail when humans move too fast. Instead, this work learns a network that classifies whether two views belong to the same space. Based on this information, a graph can be created in which each node represents a space and its corresponding videos. The edges between nodes then represent the action sequences that occurred between these spaces. The videos within a node can then be used to predict what an environment affords. The authors further trained a graph convolution network that takes neighboring nodes into account to predict the next action in the video. The authors showed that taking the underlying space into account benefited both tasks.
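The graph construction described above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: `same_zone` stands in for the learned network that decides whether two views show the same space, and comparing each frame only against the first frame of every node is an assumption made to keep the example short.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple, TypeVar

Frame = TypeVar("Frame")


def build_zone_graph(
    frames: Sequence[Frame],
    same_zone: Callable[[Frame, Frame], bool],
) -> Tuple[List[List[int]], Set[Tuple[int, int]]]:
    """Group frames into zone nodes and link zones visited back to back.

    Returns (nodes, edges): nodes[k] lists the frame indices assigned to
    zone k, and edges holds directed (from_zone, to_zone) transitions.
    """
    nodes: List[List[int]] = []
    edges: Set[Tuple[int, int]] = set()
    previous: Optional[int] = None

    for idx, frame in enumerate(frames):
        # Look for an existing zone whose representative frame matches.
        match: Optional[int] = None
        for node_id, members in enumerate(nodes):
            if same_zone(frames[members[0]], frame):
                match = node_id
                break

        if match is None:
            nodes.append([idx])          # first visit: open a new zone
            match = len(nodes) - 1
        else:
            nodes[match].append(idx)     # revisit: add the frame to the zone

        # Record a transition whenever the camera wearer changes zones.
        if previous is not None and previous != match:
            edges.add((previous, match))
        previous = match

    return nodes, edges


# Toy usage: string labels stand in for frames, equality for the classifier.
toy_frames = ["sink", "sink", "stove", "sink", "fridge"]
toy_nodes, toy_edges = build_zone_graph(toy_frames, lambda a, b: a == b)
```

In this toy run the two later visits to the sink are merged into the first node instead of creating new ones, which is what lets clips from different moments in the video be pooled per space.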

c) Kiana Ehsani, Shubham Tulsiani, Saurabh Gupta, Ali Farhadi, Abhinav Gupta. “Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects” (video)


This paper would probably win the best title award for this conference if there were one. This work is about estimating the forces that humans apply to objects in a video. Arguably, if robots could estimate forces applied to objects, it would be quite useful for performing tasks and predicting human intentions. However, personally I don’t think this is how humans understand the world, and it may be solving a harder problem than needed. Having said that, this is still an interesting paper worth discussing.

Estimating force and contact points

The difficulty of this task is that the ground truth forces applied to objects cannot be easily obtained. Instead of figuring out how to obtain this data, the authors use a physics simulator to simulate the outcome of applying the predicted force, and then compare the keypoints annotated in the next frame with the keypoint locations of the simulated outcome as a signal to train the network. Contact points are also predicted by a separate network with annotated data. The figure above shows this training scheme. Note that estimating gradients through a non-differentiable physics simulator is possible by looking at how the result changes when each dimension is perturbed a little bit. The authors show that this approach is able to obtain reasonable results on a collected dataset and can be extended to novel objects.
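The "change each dimension a little bit" trick is a finite-difference gradient estimate. Below is a minimal sketch under loud assumptions: `toy_simulator` is a made-up stand-in for a real physics engine, the loss is a plain squared keypoint distance, and a central difference is used here as one common choice, which may differ from what the paper does.

```python
from typing import Callable, List, Sequence


def finite_difference_gradient(
    loss_fn: Callable[[Sequence[float]], float],
    force: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Estimate d(loss)/d(force) by nudging one dimension at a time."""
    grad: List[float] = []
    for i in range(len(force)):
        up = list(force)
        up[i] += eps
        down = list(force)
        down[i] -= eps
        # Central difference: only needs black-box evaluations of loss_fn.
        grad.append((loss_fn(up) - loss_fn(down)) / (2.0 * eps))
    return grad


def toy_simulator(force: Sequence[float]) -> List[float]:
    """Hypothetical simulator: a keypoint at the origin moves by 0.5 * force."""
    return [0.5 * f for f in force]


def keypoint_loss(force: Sequence[float], observed: Sequence[float]) -> float:
    """Squared distance between simulated and annotated keypoint positions."""
    simulated = toy_simulator(force)
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


# Where the annotated keypoint ended up in the next frame (toy values).
observed_keypoint = [1.0, 0.0, -1.0]
estimated_grad = finite_difference_gradient(
    lambda f: keypoint_loss(f, observed_keypoint),
    [0.0, 0.0, 0.0],
)
```

For this toy setup the analytic gradient at zero force is `[-1, 0, 1]`, and the estimate matches it up to floating-point error. The cost is two simulator calls per force dimension, which is fine for a low-dimensional force vector but would not scale to perturbing every network weight directly.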

d) Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song. “SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization” (video)

This is a Google Brain paper that tries to find a better architecture for object detection tasks that would benefit from more spatial information. For segmentation tasks, the typical architecture has an hourglass-shaped encoder-decoder structure that first scales the resolution down and then scales it back up to predict pixel-wise results. The authors argue that neural networks with this kind of scale-decreasing backbone may not be the best solution for tasks in which localization is also important.

Left: ResNet, Right: Permute last 10 blocks

The idea is then to permute the order of the layers of an existing network such as ResNet and see if this results in a better architecture. To avoid having to try out all combinations, the authors used Neural Architecture Search (basically another network) to learn which architecture would be better. The result is an architecture that has mixed resolutions and many skip connections that reach further (image above). The authors showed that with this architecture they were able to outperform prior state-of-the-art results, and that this same network was also able to achieve good results on datasets other than the one it was trained on.
