We consider a problem: Can a machine learn from a few labeled pixels to predict every pixel in a new image? This task is extremely challenging (see Fig. 1) as a single body part could contain visually distinctive areas (e.g. head consists of eyes, noses and mouths); different body parts might look similar and undistinguishable (e.g., upper arms v.s. lower arms). It could be even more difficult if we do not provide any precise location but only the occurrence of body parts in the image. This problem is dubbed weakly-supervised segmentation, where the goal is to classify every pixel into semantic categories using only partial / weak supervision. There are many forms of weak annotations which are cheap but not perfect, e.g. image-level tags, bounding boxes, points
This article is purposely trimmed, please visit the source to read the full article.