Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

matlab - How to perform RCNN object detection on custom dataset?

I'm trying to perform object detection with R-CNN on my own dataset, following the tutorial on the MATLAB website. Based on the picture below:

[Image: screenshot of the training data table from the tutorial]

I'm supposed to put image paths in the first column and the bounding box of each object in the following columns. But in each of my images, there is more than one object of each kind. For example there are 20 vehicles in one image. How should I deal with that? Should I create a separate row for each instance of vehicle in an image?


1 Reply


The example found on the website finds the region with the largest score and draws a single bounding box around that region in the image. With multiple objects per image, that is no longer enough. There are two approaches that you can use to find multiple objects.

  1. Find all bounding boxes whose scores surpass some global threshold.
  2. Find the bounding box with the largest score, then keep those bounding boxes whose scores surpass a percentage of that largest score. This percentage is arbitrary, but from experience and from what I have seen in practice, people tend to choose between 80% and 95% of the largest score found in the image. This will of course give you false positives if the query image contains only objects that the classifier was not trained to detect, because the largest score always passes its own relative threshold. You will have to implement some more post-processing logic on your end to handle that case.

An alternative approach is to choose some value k and display the k bounding boxes with the k highest scores. This of course requires that you know the value of k beforehand and, like the second approach, it always assumes that there is an object to be found in the image.
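The top-k selection can be sketched as follows. This is a minimal illustration rather than code from the tutorial: the `bboxes` and `score` values are made up to stand in for the outputs of `detect`, and `k = 2` is an arbitrary choice.

```matlab
% Synthetic detections standing in for the outputs of detect (assumption).
bboxes = [10 10 50 40; 80 15 45 38; 150 20 48 42; 220 12 52 39];
score  = [0.62; 0.91; 0.35; 0.78];

k = 2;                                % number of boxes to keep (arbitrary)
k = min(k, numel(score));             % guard against fewer than k detections

[~, order] = sort(score, 'descend');  % indices from highest to lowest score
idx = order(1:k);                     % indices of the k best detections

topBoxes  = bboxes(idx, :);           % boxes for the k highest scores
topScores = score(idx);               % the k highest scores themselves
```

With these made-up values, `idx` picks out the second and fourth detections. The `min` guard only keeps the indexing valid when the detector returns fewer than k boxes; it does not address the case where none of the boxes is a true object.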


In addition to the above logic, the approach you describe, creating a separate row for each instance of a vehicle in the image, is correct. If you have multiple instances of an object in a single image, you introduce one row per instance while keeping the image filename the same. For example, if you had 20 vehicles in one image, you would create 20 rows in your table that all share the same filename, each with a single bounding box specification for one distinct object in that image.
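A rough sketch of building such a table is shown below. The file name, the box coordinates and the variable names `imageFilename` and `vehicle` are placeholders chosen for illustration, and the exact layout that the training function expects for the bounding box column may vary between toolbox releases, so check the documentation for your version.

```matlab
% Hypothetical file name and box coordinates, for illustration only.
filename = 'images/street01.jpg';
boxes = [ 12 40 60 35;
         110 42 58 33;
         205 38 62 36];   % one [x y width height] row per vehicle

% Repeat the file name once per box so each instance gets its own row.
numBoxes = size(boxes, 1);
imageFilename = repmat({filename}, numBoxes, 1);
vehicle = boxes;

% First column: image path. Second column: bounding box of one instance.
trainingData = table(imageFilename, vehicle);
disp(trainingData)
```

Here three vehicles in one image produce three rows that share the same file name, which is the layout described above scaled down from 20 instances.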

Once you have done this, and assuming that you have already trained the R-CNN detector and want to use it, the original code from the website for detecting objects is the following:

% Read test image
testImage = imread('stopSignTest.jpg');

% Detect stop signs
[bboxes, score, label] = detect(rcnn, testImage, 'MiniBatchSize', 128)

% Display the detection results
[score, idx] = max(score);

bbox = bboxes(idx, :);
annotation = sprintf('%s: (Confidence = %f)', label(idx), score);

outputImage = insertObjectAnnotation(testImage, 'rectangle', bbox, annotation);

figure
imshow(outputImage)

This only works for the single object with the highest score. To handle multiple objects, use the scores returned by the detect method and select the detections that satisfy either approach 1 (global threshold) or approach 2 (percentage of the largest score).

For approach 1 (the global threshold), you would modify the code to look like the following.

% Read test image
testImage = imread('stopSignTest.jpg');

% Detect stop signs
[bboxes, score, label] = detect(rcnn, testImage, 'MiniBatchSize', 128)

% New - Find those bounding boxes that surpassed a threshold
T = 0.7; % Define threshold here
idx = score >= T;

% Retrieve those scores that surpassed the threshold
s = score(idx);

% Do the same for the labels as well
lbl = label(idx);

bbox = bboxes(idx, :); % This logic doesn't change

% New - Loop through each box and print out its confidence on the image
outputImage = testImage; % Make a copy of the test image to write to
for ii = 1 : size(bbox, 1)
    annotation = sprintf('%s: (Confidence = %f)', lbl(ii), s(ii)); % Change    
    outputImage = insertObjectAnnotation(outputImage, 'rectangle', bbox(ii,:), annotation); % New - Choose the right box
end

figure
imshow(outputImage)

Note that the original bounding boxes, labels and scores stay in their original variables, while the subset that surpassed the threshold is stored in separate variables in case you want to cross-reference between the two. To accommodate approach 2 (percentage of the largest score), the code remains the same as for approach 1 except for how the threshold is defined.

The code from:

% New - Find those bounding boxes that surpassed a threshold
T = 0.7; % Define threshold here
idx = score >= T;

... would now change to:

% New - Find those bounding boxes that surpassed a threshold
perc = 0.85; % 85% of the maximum score
T = perc * max(score); % Define threshold here
idx = score >= T;

The end result will be multiple bounding boxes of the detected objects in the image - one annotation per detected object.
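To see how the two thresholding rules differ, here is a small sketch on made-up scores. The two score vectors are invented for illustration and do not come from a real detector; `T` and `perc` reuse the values from the code above.

```matlab
% Made-up scores for two hypothetical images (assumption, not detector output).
scoreA = [0.95; 0.90; 0.60; 0.75; 0.30];   % image with strong detections
scoreB = [0.40; 0.36; 0.10];               % image with only weak responses

T    = 0.7;    % global threshold (first rule)
perc = 0.85;   % fraction of the largest score (second rule)

% First rule: keep scores at or above the fixed threshold.
globalA = find(scoreA >= T);
globalB = find(scoreB >= T);

% Second rule: keep scores at or above perc times the image's largest score.
relativeA = find(scoreA >= perc * max(scoreA));
relativeB = find(scoreB >= perc * max(scoreB));
```

On the first image the global threshold keeps detections 1, 2 and 4, while the relative threshold keeps only 1 and 2. On the second image the global threshold keeps nothing, but the relative threshold still keeps detections 1 and 2 even though their scores are low. This is the false positive risk of the relative rule noted earlier.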

