Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
175 views
in Technique[技术] by (71.8m points)

c++ - Are end+1 iterators for std::string allowed?

Is it valid to create an iterator to end(str)+1 for std::string?
And if it isn't, why isn't it?

This question is restricted to C++11 and later, because while pre-C++11 the data was already stored in a continuous block in any but rare POC toy-implementations, the data didn't have to be stored that way.
And I think that might make all the difference.

The significant difference between std::string and any other standard container I speculate on is that it always contains one element more than its size, the zero-terminator, to fulfill the requirements of .c_str().

21.4.7.1 basic_string accessors [string.accessors]

const charT* c_str() const noexcept;
const charT* data() const noexcept;

1 Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
2 Complexity: Constant time.
3 Requires: The program shall not alter any of the values stored in the character array.

Still, even though it should imho guarantee that said expression is valid, for consistency and interoperability with zero-terminated strings if nothing else, the only paragraph I found casts doubt on that:

21.4.1 basic_string general requirements [string.require]

4 The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

(All quotes are from C++14 final draft (n3936).)

Related: Legal to overwrite std::string's null terminator?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

TL;DR: s.end() + 1 is undefined behavior.


std::string is a strange beast, mainly for historical reasons:

  1. It attempts to bring C compatibility, where it is known that an additional character exists beyond the length reported by strlen.
  2. It was designed with an index-based interface.
  3. As an after thought, when merged in the Standard library with the rest of the STL code, an iterator-based interface was added.

This led std::string, in C++03, to number 103 member functions, and since then a few were added.

Therefore, discrepancies between the different methods should be expected.


Already in the index-based interface discrepancies appear:

§21.4.5 [string.access]

const_reference operator[](size_type pos) const;
reference operator[](size_type pos);

1/ Requires: pos <= size()

const_reference at(size_type pos) const; reference at(size_type pos);

5/ Throws: out_of_range if pos >= size()

Yes, you read this right, s[s.size()] returns a reference to a NUL character while s.at(s.size()) throws an out_of_range exception. If anyone tells you to replace all uses of operator[] by at because they are safer, beware the string trap...


So, what about iterators?

§21.4.3 [string.iterators]

iterator end() noexcept;
const_iterator end() const noexcept;
const_iterator cend() const noexcept;

2/ Returns: An iterator which is the past-the-end value.

Wonderfully bland.

So we have to refer to other paragraphs. A pointer is offered by

§21.4 [basic.string]

3/ The iterators supported by basic_string are random access iterators (24.2.7).

while §17.6 [requirements] seems devoid of anything related. Thus, strings iterators are just plain old iterators (you can probably sense where this is going... but since we came this far let's go all the way).

This leads us to:

24.2.1 [iterator.requirements.general]

5/ Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. [...]

So, *s.end() is ill-formed.

24.2.3 [input.iterators]

2/ Table 107 -- Input iterator requirements (in addition to Iterator)

List for pre-condition to ++r and r++ that r be dereferencable.

Neither the Forward iterators, Bidirectional iterators nor Random iterator lift this restriction (and all indicate they inherit the restrictions of their predecessor).

Also, for completeness, in 24.2.7 [random.access.iterators], Table 111 -- Random access iterator requirements (in addition to bidirectional iterator) lists the following operational semantics:

  • r += n is equivalent to [inc|dec]rememting r n times
  • a + n and n + a are equivalent to copying a and then applying += n to the copy

and similarly for -= n and - n.

Thus s.end() + 1 is undefined behavior.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...