The behaviour is by design.
Me: It must be the multi_pass iterator adaptor. Since there is no grammar Spirit doesn't know when it can be flushed. [...]
You: As far as I know, istream_iterator takes care of reading the input stream without having to store the whole stream into memory
Yes. But you're not using std::istream_iterator. You're using Boost Spirit, which is a parser generator. Parsers need random access for backtracking.
Spirit supports input iterators by adapting an input sequence to a random-access sequence with the multi_pass adaptor. This iterator adaptor stores a variable-size buffer [1] for backtracking purposes. Certain actions (expectation points, always-greedy operators like Kleene-*, etc.) tell the parser framework when it's safe to flush the buffer.
The Problem:
You're not parsing, just tokenizing. Nothing ever tells the iterator to flush its buffers.
The buffer is unbounded, so memory usage grows. Of course it's not a leak because as soon as the last copy of a multi-pass adapted iterator goes out of scope, the shared backtracking buffer is freed.
The Solution:
The simplest solution is to use a random access source. If you can, use a memory mapped file.
Other solutions would involve telling the multi-pass adaptor to flush. The simplest way to achieve this would be to use tokenize_and_parse. Even with a faux grammar like *(any_token) this should be enough to convince the parser framework you will not be asking it to backtrack.
Inspiration:
[1] http://www.boost.org/doc/libs/1_62_0/libs/spirit/doc/html/spirit/support/multi_pass.html: by default it stores a shared deque. You can see it after running your test for a little while using dd if=/dev/zero bs=1M | valgrind --tool=massif ./sotest
The massif output clearly shows all the memory in:
100.00% (805,385,576B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->99.99% (805,306,368B) 0x4187D5: void boost::spirit::iterator_policies::split_std_deque::unique<char>::increment<boost::spirit::multi_pass<std::istream, boost::spirit::iterator_policies::default_policy<boost::spirit::iterator_policies::ref_counted, boost::spirit::iterator_policies::no_check, boost::spirit::iterator_policies::istream, boost::spirit::iterator_policies::split_std_deque> > >(boost::spirit::multi_pass<std::istream, boost::spirit::iterator_policies::default_policy<boost::spirit::iterator_policies::ref_counted, boost::spirit::iterator_policies::no_check, boost::spirit::iterator_policies::istream, boost::spirit::iterator_policies::split_std_deque> >&) (in /home/sehe/Projects/stackoverflow/sotest)
| ->99.99% (805,306,368B) 0x404BC3: main (in /home/sehe/Projects/stackoverflow/sotest)