As already mentioned in comments, a solution would be to essentially use a customized text extraction strategy to insert a "[ 2]" text chunk at the coordinates of the image.
Code
You can e.g. extend the LocationTextExtractionStrategy like this:
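// iText 5.x classes, all from com.itextpdf.text.pdf.parser
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.LineSegment;
import com.itextpdf.text.pdf.parser.LocationTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.PdfImageObject;
import com.itextpdf.text.pdf.parser.Vector;
import java.io.File;
import java.io.IOException;
import java.lang.reflect.Field;
import java.nio.file.Files;
import java.util.List;

class SimpleMixedExtractionStrategy extends LocationTextExtractionStrategy
{
    SimpleMixedExtractionStrategy(File outputPath, String name)
    {
        this.outputPath = outputPath;
        this.name = name;
    }

    @Override
    public void renderImage(final ImageRenderInfo renderInfo)
    {
        try
        {
            PdfImageObject image = renderInfo.getImage();
            if (image == null) return;

            // Store the image as <name>-<counter>.<type> in the output folder.
            int number = counter++;
            final String filename = String.format("%s-%s.%s", name, number, image.getFileType());
            Files.write(new File(outputPath, filename).toPath(), image.getImageAsBytes());

            // The unit line mapped through the image matrix is the bottom edge of the image on the page.
            LineSegment segment = UNIT_LINE.transformBy(renderInfo.getImageCTM());
            TextChunk location = new TextChunk("[" + filename + "]", segment.getStartPoint(), segment.getEndPoint(), 0f);

            // locationalResult is private in LocationTextExtractionStrategy, hence the reflection.
            Field field = LocationTextExtractionStrategy.class.getDeclaredField("locationalResult");
            field.setAccessible(true);
            @SuppressWarnings("unchecked")
            List<TextChunk> locationalResult = (List<TextChunk>) field.get(this);
            locationalResult.add(location);
        }
        catch (IOException | NoSuchFieldException | SecurityException | IllegalArgumentException | IllegalAccessException e)
        {
            e.printStackTrace();
        }
    }

    final File outputPath;
    final String name;
    int counter = 0;

    static final LineSegment UNIT_LINE = new LineSegment(new Vector(0, 0, 1), new Vector(1, 0, 1));
}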
(Unfortunately for this kind of work, some members of LocationTextExtractionStrategy are private. Thus, I used some Java reflection. Alternatively you can copy the whole class and change your copy accordingly.)
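To show the reflection trick in isolation, here is a minimal sketch that runs without iText. BaseCollector and MarkerCollector are hypothetical stand-ins (they are not iText classes): the base class keeps its list private, and the subclass reaches it via getDeclaredField on the superclass, much like the strategy above reaches locationalResult.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a library class that keeps its state private.
class BaseCollector {
    private final List<String> items = new ArrayList<>();

    void collect(String item) {
        items.add(item);
    }

    String result() {
        return String.join("\n", items);
    }
}

// Hypothetical subclass that has to add an entry the base class offers no hook for.
class MarkerCollector extends BaseCollector {
    @SuppressWarnings("unchecked")
    void addMarker(String marker) {
        try {
            // The field is declared on the superclass, so it is looked up there, not on getClass().
            Field field = BaseCollector.class.getDeclaredField("items");
            field.setAccessible(true);
            ((List<String>) field.get(this)).add(marker);
        } catch (ReflectiveOperationException e) {
            // Happens if the private field is renamed or removed in the base class.
            throw new IllegalStateException("Field layout of BaseCollector changed", e);
        }
    }
}

class ReflectionDemo {
    public static void main(String[] args) {
        MarkerCollector collector = new MarkerCollector();
        collector.collect("text before");
        collector.addMarker("[image-0.png]");
        collector.collect("text after");
        System.out.println(collector.result());
    }
}
```

The price of this approach is that the field name is only checked at runtime, so a library update that renames the private member breaks the code without a compile error. That is the trade-off against copying the whole class.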
Example
Using that strategy you can extract mixed contents like this:
@Test
public void testSimpleMixedExtraction() throws IOException
{
    // try-with-resources closes the stream, as the finally block did before
    try (InputStream resourceStream = getClass().getResourceAsStream("book-of-vaadin-page14.pdf"))
    {
        PdfReader reader = new PdfReader(resourceStream);
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        SimpleMixedExtractionStrategy listener = new SimpleMixedExtractionStrategy(OUTPUT_PATH, "book-of-vaadin-page14");
        parser.processContent(1, listener);
        Files.write(new File(OUTPUT_PATH, "book-of-vaadin-page14.txt").toPath(), listener.getResultantText().getBytes());
    }
}
E.g. for my test file (which contains page 14 of the Book of Vaadin) you get this text:
Getting Started with Vaadin
? A version of Book of Vaadin that you can browse in the Eclipse Help system.
You can install the plugin as follows:
1. Start Eclipse.
2. Select Help ? Software Updates....
3. Select the Available Software tab.
4. Add the Vaadin plugin update site by clicking Add Site....
[book-of-vaadin-page14-0.png]
Enter the URL of the Vaadin Update Site: http://vaadin.com/eclipse and click OK. The
Vaadin site should now appear in the Software Updates window.
5. Select all the Vaadin plugins in the tree.
[book-of-vaadin-page14-1.png]
Finally, click Install.
Detailed and up-to-date installation instructions for the Eclipse plugin can be found at http://vaad-
in.com/eclipse.
Updating the Vaadin Plugin
If you have automatic updates enabled in Eclipse (see Window ? Preferences ? Install/Update
? Automatic Updates), the Vaadin plugin will be updated automatically along with other plugins.
Otherwise, you can update the Vaadin plugin (there are actually multiple plugins) manually as
follows:
1. Select Help ? Software Updates..., the Software Updates and Add-ons window will
open.
2. Select the Installed Software tab.
14 Vaadin Plugin for Eclipse
and two images book-of-vaadin-page14-0.png and book-of-vaadin-page14-1.png in OUTPUT_PATH.
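Why the markers end up between the right lines can be pictured with a strongly simplified model. This is not the actual iText comparison logic, which is more elaborate. Chunk and assemble are hypothetical names, each chunk is reduced to its start point, and every chunk is put on a line of its own.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ChunkOrderingDemo {
    // Hypothetical, simplified chunk: a text plus the start point of its line segment.
    static final class Chunk {
        final String text;
        final double x;
        final double y;

        Chunk(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Orders chunks top to bottom, then left to right, and emits one chunk per line.
    static String assemble(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        // PDF y coordinates grow upwards, so the larger y comes first.
        sorted.sort(Comparator.comparingDouble((Chunk c) -> -c.y).thenComparingDouble(c -> c.x));
        StringBuilder out = new StringBuilder();
        for (Chunk c : sorted) {
            if (out.length() > 0) out.append('\n');
            out.append(c.text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Chunk> chunks = new ArrayList<>();
        // Coordinates are made up for illustration; the marker sits at the image's bottom edge.
        chunks.add(new Chunk("Finally, click Install.", 72, 300));
        chunks.add(new Chunk("[book-of-vaadin-page14-1.png]", 100, 320));
        chunks.add(new Chunk("5. Select all the Vaadin plugins in the tree.", 72, 500));
        System.out.println(assemble(chunks));
    }
}
```

Because the marker is positioned at the bottom edge of the image, it sorts after all text above the image and before the first line below it.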
Improvements to make
As also mentioned in the comments, this solution covers the easy situation in which the image has text above and/or below it but none to its left or right.
If there is text to the left and/or right, too, the problem is that the code above calculates LineSegment segment as the bottom line of the image, whereas the text strategy usually works with the base line of the text, which lies above the bottom line.
But in this case one first has to decide at which position on which line one wants the marker to appear in the text anyway. Having decided that, one can adapt the source above.
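As a sketch of such an adaptation, the following plain-Java helper (no iText types; segmentAt is a hypothetical name) maps the horizontal line at relative height t of the unit square through a PDF matrix [a b c d e f]. With t = 0 it gives the bottom edge used above, with t = 0.5 the vertical middle of the image. In the strategy itself the analogous change would presumably be to transform a LineSegment from new Vector(0, t, 1) to new Vector(1, t, 1) instead of UNIT_LINE. That is an assumption and untested here.

```java
import java.util.Arrays;

class MarkerSegment {
    /**
     * Maps the line from (0, t) to (1, t) through the PDF matrix [a b c d e f],
     * where x' = a*x + c*y + e and y' = b*x + d*y + f.
     * Returns {startX, startY, endX, endY} in page coordinates.
     */
    static double[] segmentAt(double t, double a, double b, double c, double d, double e, double f) {
        double startX = c * t + e;      // x = 0
        double startY = d * t + f;
        double endX = a + c * t + e;    // x = 1
        double endY = b + d * t + f;
        return new double[] { startX, startY, endX, endY };
    }

    public static void main(String[] args) {
        // Made-up image placement: 200 x 100 units, lower-left corner at (50, 400), not rotated.
        double[] bottom = segmentAt(0.0, 200, 0, 0, 100, 50, 400);
        double[] middle = segmentAt(0.5, 200, 0, 0, 100, 50, 400);
        System.out.println(Arrays.toString(bottom)); // [50.0, 400.0, 250.0, 400.0]
        System.out.println(Arrays.toString(middle)); // [50.0, 450.0, 250.0, 450.0]
    }
}
```

Which value of t fits depends on where the neighbouring text's base line sits relative to the image, so it has to be chosen per layout.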