Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c++ - What is the rationale for parentheses in C++11's raw string literals R"(...)"?

C++11 introduced a very convenient feature called raw string literals: strings in which backslashes are not treated as escape characters. Instead of writing this:

  regex mask("\\t[0-9]+\\.[0-9]+\\t\\\\SUB");

You can simply write this:

  regex mask(R"(\t[0-9]+\.[0-9]+\t\\SUB)");

Much more readable. However, note the extra parentheses one has to place around the string to define a raw string literal.

My question is: why do we even need these? To me they look quite ugly and illogical. Here are the cons I see:

  • Extra verbosity, when the whole point of the feature is to make literals more compact
  • It is hard to distinguish the body of the literal from the delimiting symbols

This is what I mean by hard to distinguish:

"good old usual string literal"
 ^-    body inside quotes   -^

R"(new strange raw string literal)"
   ^- body inside parentheses  -^

And here is the pro:

  • More flexibility: more characters are available in raw strings, especially when used with a delimiter: R"delim( can use "()" here )delim"

But hey, if you need more flexibility, you have good old escapable string literals. Why did the standard committee decide to pollute the content of every raw string literal with these absolutely unnecessary parentheses? What was the rationale behind that? What are the pros I didn't mention?

UPD: The answer by Kerrek is great, but unfortunately it is not an answer, since I had already explained that I understand how it works and what benefits it gives. Five years have passed since I asked this question, and there is still no answer. I am still frustrated by this decision. One could say that this is a matter of taste, but I would disagree. How many spaces you use, how you name your variables, whether it is SomeFunction() or some_function(): that is a matter of taste, and I can easily switch from one style to another.

But this? It still feels awkward and clumsy after so many years. No, this is not about taste. This is about wanting to cover all possible cases no matter what. We are doomed to write these ugly parens every time we need a Windows-specific path, a regular expression, or a multi-line string literal. And for what? For those rare cases when we actually need to put " in a string? I wish I had been at the committee meeting where they decided to do it this way; I would have strongly opposed this really bad decision. I wish. Now we are doomed.

Thank you for reading this far. Now I feel a little better.

UPD2: Here are my alternative proposals, both of which I think would be MUCH better than the existing syntax.

Proposal 1. Inspired by Python. It cannot support string literals containing triple quotes: R"""Here is a string literal with any content, except for triple quotes, which you don't actually use that often."""

Proposal 2. Inspired by common sense. It supports all possible string literals, just like the current syntax: R"delim"content of string"delim". With an empty delimiter: R""Looks better, doesn't it?"". Empty raw string: R"""". Raw string with double quotes: R"#"Here are double quotes: "", thanks"#".

Any problems with these proposals?

1 Reply

The purpose of the parentheses is to allow you to specify a custom delimiter:

R"foo(Hello World)foo"   // the string "Hello World"

In your example, and in typical use, the delimiter is simply empty, so the raw string is enclosed by the sequences R"( and )".

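Allowing arbitrary delimiters is a design decision that reflects the desire to provide a complete solution without weird limitations or edge cases. You can pick as the delimiter any sequence of up to 16 characters (excluding spaces, parentheses, backslashes, and control characters such as tabs and newlines), as long as the closing sequence )delim" does not occur in your string.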

Without this, you would be in trouble if the string itself contained " (if the syntax were simply R"...") or )" (with the empty delimiter). Both of those are perfectly common character sequences, especially in regular expressions, so it would be incredibly annoying if whether you could use a raw string at all depended on the specific content of your string.

Remember that inside a raw string there is no other escape mechanism, so otherwise the best you could do would be to concatenate pieces of string literals, which would be very impractical. With a custom delimiter, all you need to do is pick an unusual character sequence once, and perhaps change it in the rare case that a future edit introduces that sequence into the string.

But to stress once again, even the empty delimiter is already useful, since the R"(...)" syntax allows you to place naked quotation marks in your string. That by itself is quite a gain.

