Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


neural network - Multiple category classification in Caffe

I thought we might be able to compile a Caffeinated description of some methods of performing multiple category classification.

By multi-category classification I mean that the input data contains representations of multiple model output categories and/or is simply classifiable under multiple model output categories.

E.g., an image containing a cat and a dog would ideally output ~1 for both the cat and dog prediction categories and ~0 for all others.

  1. Based on this paper, this stale and closed PR and this open PR, it seems Caffe is perfectly capable of accepting labels. Is this correct?

  2. Would the construction of such a network require the use of multiple neuron (inner product -> relu -> inner product) and softmax layers as in page 13 of this paper; or does Caffe's ip & softmax presently support multiple label dimensions?

  3. When I'm passing my labels to the network, which example illustrates the correct approach (if not both)?

    E.g. a cat eating an apple. Note: Python syntax, but I use the C++ source.

    Column 0 - Class is in input; Column 1 - Class is not in input

    [[1,0],  # Apple
     [0,1],  # Baseball
     [1,0],  # Cat
     [0,1]]  # Dog
    

    or

    Column 0 - Class is in input

    [[1],  # Apple
     [0],  # Baseball
     [1],  # Cat
     [0]]  # Dog
    

If anything lacks clarity please let me know and I will generate pictorial examples of the questions I'm trying to ask.


1 Reply

Nice question. I believe there is no single "canonical" answer here, and you may find several different approaches to this problem. I'll do my best to show one possible way. It differs slightly from the question you asked, so I'll restate the problem and suggest a solution.

The problem: given an input image and a set of C classes, indicate for each class if it is depicted in the image or not.

Inputs: at training time, each input is a pair consisting of an image and a C-dim binary vector indicating, for each of the C classes, whether it is present in the image.

Output: given an image, output a C-dim binary vector (same as the second form suggested in your question).
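As a minimal sketch of this label format: the class list and the helper name `multi_hot` below are hypothetical, chosen only to match the cat-eating-apple example from the question. The C-dim binary vector could be built along these lines:

```python
# Hypothetical class list, matching the example in the question.
CLASSES = ["apple", "baseball", "cat", "dog"]


def multi_hot(present, classes=CLASSES):
    """Return a C-dim binary vector: 1 if the class is in the image, else 0."""
    present = set(present)
    unknown = present - set(classes)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    return [1 if name in present else 0 for name in classes]


# An image of a cat eating an apple.
label = multi_hot(["cat", "apple"])
print(label)  # [1, 0, 1, 0]
```

This corresponds to the second (single-column) form in the question, written as one vector per image.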

Making Caffe do the job: to make this work, we need to modify the top layers of the net and use a different loss.
But first, let's understand the usual way Caffe is used and then look into the changes needed.
The way things are now: an image is fed into the net, goes through conv/pooling/... layers, and finally goes through an "InnerProduct" layer with C outputs. These C predictions go into a "Softmax" layer, which normalizes them into scores that sum to 1 and so inhibits all but the most dominant class. Once a single class is highlighted, the "SoftmaxWithLoss" layer checks that the highlighted predicted class matches the ground-truth class.
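To see why this setup does not fit the multi-label case, here is a small plain-Python sketch of the softmax function itself (not Caffe code; the raw scores are made-up values for illustration). Even when the net is equally confident about two classes, the outputs must sum to 1, so neither can approach 1:

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for [apple, baseball, cat, dog]:
# strong evidence for both "apple" and "cat".
probs = softmax([5.0, -5.0, 5.0, -5.0])
print([round(p, 3) for p in probs])  # [0.5, 0.0, 0.5, 0.0]
```

The two present classes end up splitting the probability mass instead of each scoring ~1.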

What you need: the problem with the existing approach is the "Softmax" layer, which essentially selects a single class. I suggest you replace it with a "Sigmoid" layer that maps each of the C outputs independently to an indicator of whether that specific class is present in the image. For training, you should use "SigmoidCrossEntropyLoss" instead of the "SoftmaxWithLoss" layer. Note that "SigmoidCrossEntropyLoss" applies the sigmoid internally, so during training it takes the raw "InnerProduct" outputs as its input.
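The following plain-Python sketch illustrates the idea rather than Caffe's actual implementation: a per-class sigmoid, plus a sigmoid cross-entropy computed directly from the raw scores. The scores and targets are made-up values, the function names are my own, and any normalization of the loss over a batch is left out.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_cross_entropy(scores, targets):
    """Sum over classes of the binary cross-entropy, from raw scores.

    Uses the stable form max(x, 0) - x*t + log(1 + exp(-|x|)).
    """
    return sum(
        max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
        for x, t in zip(scores, targets)
    )


# Hypothetical raw scores and ground truth for [apple, baseball, cat, dog].
scores = [5.0, -5.0, 5.0, -5.0]
targets = [1, 0, 1, 0]

probs = [sigmoid(x) for x in scores]
print([round(p, 3) for p in probs])
print(sigmoid_cross_entropy(scores, targets))
```

Because each class is scored independently, both "apple" and "cat" can be close to 1 at the same time, and the loss is small when the scores agree with the binary target vector.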

