Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


0 votes
261 views
in Technique[技术] by (71.8m points)

java - Exception while reading a very large file (> 300 MB)

My task is to open a large file in READ & WRITE mode, search for a portion of text in that file by locating its start and end points, write that extracted text to a new file, and then delete that portion from the original file.

I will repeat the above process many times, so I thought it would be easiest to load the file into memory as a CharBuffer and search it with the Matcher class. But I'm getting a heap-space exception while reading, even though I increased the heap to 900 MB by running it like this: java -Xms128m -Xmx900m readLargeFile. My code is:

FileChannel fc = new FileInputStream(fFile).getChannel();
CharBuffer chrBuff = Charset.forName("8859_1").newDecoder()
        .decode(fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size()));

For the above code, everyone has suggested that it's a bad idea to load everything into memory, and that if the file is 300 MB, it will take 600 MB of heap once decoded, because of the character set (each char is 2 bytes).

That is my task; please suggest some efficient approaches. Note that my files may be even larger, and I have to do this in Java only.

Thanks in advance...



1 Reply

0 votes
by (71.8m points)

You definitely do NOT want to load a 300MB file into a single large buffer with Java. The way you're doing things is supposed to be more efficient for large files than just using normal I/O, but when you run a Matcher against an entire file mapped into memory as you are, you can very easily exhaust memory.

First, your code memory maps the file into memory ... this will consume 300 Meg of memory in your virtual address space as the file is mmaped into it, although this is outside the heap. (Note that the 300 Meg of virtual address space is tied up until the MappedByteBuffer is garbage collected. See below for discussion. The JavaDoc for map warns you about this.) Next, you create a ByteBuffer backed by this mmaped file. This should be fine, as it's just a "view" of the mmaped file and should thus take minimal extra memory. It will be a small object in the heap with a "pointer" to a large object outside the heap. Next, you decode this into a CharBuffer, which means you make a copy of the 300 MB buffer, but you make a 600 MB copy (on the heap) because a char is 2 bytes.
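To make the decoding cost concrete, here is a sketch (mine, not from the answer) of how the 600 MB copy could be avoided: decode the mapped bytes in small, fixed-size chunks with a CharsetDecoder, reusing one CharBuffer, so no single heap object ever approaches the file size. The class and method names, and the chunk size, are illustrative assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class ChunkedDecode {
    // Decode 'bytes' one small window at a time instead of building one huge
    // CharBuffer. Returns the total chars decoded; a real caller would search
    // each chunk as it is produced.
    static long decodeInChunks(ByteBuffer bytes, int chunkChars) {
        CharsetDecoder dec = Charset.forName("ISO-8859-1").newDecoder();
        CharBuffer out = CharBuffer.allocate(chunkChars); // reused per chunk
        long total = 0;
        while (true) {
            CoderResult r = dec.decode(bytes, out, true);
            out.flip();
            total += out.remaining();   // ...process this chunk here...
            out.clear();
            if (r.isUnderflow()) break; // all input consumed
        }
        dec.flush(out);                 // drain any final state
        out.flip();
        total += out.remaining();
        return total;
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.wrap(
                "hello world".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(decodeInChunks(b, 4)); // 11
    }
}
```

Note the trade-off: a match that straddles a chunk boundary needs extra handling (e.g. overlapping the chunks), which is exactly the complexity the OP was trying to avoid by loading everything at once.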

To respond to a comment, and looking at the JDK Source code to be sure, when you call map() as the OP is, you do in fact map the entire file into memory. Looking at openJDK 6 b14 Windows native code sun.nio.ch.FileChannelImpl.c, it first calls CreateFileMapping, then calls MapViewOfFile. Looking at this source, if you ask to map the whole file into memory, this method will do exactly as you ask. To quote MSDN:

Mapping a file makes the specified portion of a file visible in the address space of the calling process.

For files that are larger than the address space, you can only map a small portion of the file data at one time. When the first view is complete, you can unmap it and map a new view.
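Following that MSDN advice, the mapping can be done in windows rather than all at once. A sketch of what that looks like in Java (the 16 MB window size and the byte-summing body are my own illustrative choices):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class WindowedMap {
    static final long WINDOW = 16L * 1024 * 1024; // 16 MB per view, not the whole file

    // Visit the file one mapped window at a time; here we just sum the bytes.
    static long sumBytes(String path) throws IOException {
        long sum = 0;
        try (FileChannel fc = new RandomAccessFile(path, "r").getChannel()) {
            long size = fc.size();
            for (long pos = 0; pos < size; pos += WINDOW) {
                long len = Math.min(WINDOW, size - pos);
                MappedByteBuffer view = fc.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (view.hasRemaining()) sum += view.get() & 0xFF;
                // 'view' becomes unreachable here; its address space is
                // reclaimed only when the buffer is garbage collected
            }
        }
        return sum;
    }
}
```

Each iteration maps at most one window's worth of virtual address space, at the cost of handling data that straddles window boundaries yourself.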

The way the OP is calling map, the "specified portion" of the file is the entire file. This won't contribute to heap exhaustion, but it can contribute to virtual address space exhaustion, which is still an OOM error. This can kill your application just as thoroughly as running out of heap.

Finally, when you make a Matcher, the Matcher potentially makes more copies of this 600 MB CharBuffer, depending on how you use it. Ouch. That's a lot of memory used by a small number of objects! Given a Matcher, every time you call toMatchResult(), you'll make a String copy of the entire CharBuffer. Also, every time you call replaceAll(), at best you will make a String copy of the entire CharBuffer. At worst you will make a StringBuffer that will slowly be expanded to the full size of the replaceAll result (applying a lot of memory pressure on the heap), and then make a String from that.

Thus, if you call replaceAll on a Matcher against a 300 MB file and your match is found, you'll first make a series of ever-larger StringBuffers until you get one that is 600 MB, and then make a String copy of that StringBuffer. This can quickly and easily lead to heap exhaustion.
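For illustration (a sketch of mine, not part of the original answer): the giant-StringBuffer problem can be sidestepped by using appendReplacement/appendTail directly and draining the buffer to a Writer after each match, so the buffer only ever holds the text between two matches. The helper name is illustrative.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StreamingReplace {
    // Like Matcher.replaceAll, but drains the StringBuffer to a Writer after
    // each match, so it never grows to the size of the whole input.
    static void replaceAllTo(CharSequence input, Pattern p, String repl, Writer out)
            throws IOException {
        Matcher m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, repl); // text since last match + replacement
            out.append(sb);                // flush the piece we have so far
            sb.setLength(0);
        }
        m.appendTail(sb);                  // text after the last match
        out.append(sb);
    }

    public static void main(String[] args) throws IOException {
        StringWriter w = new StringWriter();
        replaceAllTo("aaa bbb aaa", Pattern.compile("aaa"), "X", w);
        System.out.println(w); // X bbb X
    }
}
```

This keeps heap usage proportional to the gap between matches rather than to the file, though the input CharSequence itself must of course still be held somewhere.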

Here's the bottom line: Matchers are not optimized for working on very large buffers. You can very easily, and without planning to, make a number of very large objects. I discovered this when doing something similar enough to what you're doing and encountering memory exhaustion, then looking at the source code for Matcher.

NOTE: There is no unmap call. Once you call map, the virtual address space outside the heap tied up by the MappedByteBuffer is stuck there until the MappedByteBuffer is garbage collected. As a result, you will be unable to perform certain operations on the file (delete, rename, ...) until the MappedByteBuffer is garbage collected. If you call map enough times on different files, but don't have sufficient memory pressure in the heap to force a garbage collection, you can run out of memory outside the heap. For a discussion, see Bug 4724038.

As a result of all of the discussion above: if you will be using memory-mapped I/O to create a Matcher over large files, and you will be calling replaceAll on that Matcher, then memory-mapped I/O is probably not the way to go. It will simply create too many large objects on the heap, as well as using up a lot of your virtual address space outside the heap. Under 32-bit Windows, you have only 2 GB (or, if you have changed settings, 3 GB) of virtual address space for the JVM, and this approach will apply significant memory pressure both inside and outside the heap.
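For the OP's actual task, a plain streaming pass may be enough. Here is a minimal sketch, assuming the start and end of the region to extract can be recognized line by line (that assumption, and all the names here, are mine): one read of the original file splits it into the extracted region and the remainder, holding only one line in memory at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

public class ExtractRegion {
    // Stream the source once: lines from startMark through endMark (inclusive)
    // go to 'extracted'; everything else goes to 'remainder'.
    static void extract(Reader src, String startMark, String endMark,
                        Writer extracted, Writer remainder) throws IOException {
        BufferedReader in = new BufferedReader(src);
        boolean inside = false;
        String line;
        while ((line = in.readLine()) != null) {
            if (!inside && line.contains(startMark)) inside = true;
            Writer out = inside ? extracted : remainder;
            out.write(line);
            out.write('\n');
            if (inside && line.contains(endMark)) inside = false;
        }
    }
    // To "delete" the region from the original, write 'remainder' to a temp
    // file and rename it over the original, rather than editing in place.
}
```

Memory use is bounded by the longest line, regardless of file size, and repeating the process many times (as the OP intends) just means repeating the single pass.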

I apologize for the length of this answer, but I wanted to be thorough. If you think any part of the above is wrong, please comment and say so. I will not do retaliatory downvotes. I am very positive that all of the above is accurate, but if something is wrong, I want to know.


