Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c++ - sscanf() and locales. How does one really parse things like "3.14"?

Let's say I have to read a file containing a bunch of floating-point numbers. The numbers can look like 1e+10, 5, -0.15, etc., i.e., any generic floating-point number written with a decimal point (this is fixed!). However, my code is a plugin for another application, and I have no control over what the current locale is. It may be Russian, for example, and the LC_NUMERIC rules there call for a decimal comma. Thus, pi is expected to be written as "3,1415...", and

sscanf("3.14", "%f", &x); 

returns 1, and x contains 3.0, because sscanf stops at the '.' in the string (under this locale '.' is not a decimal point).
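To see which decimal point the current locale expects, you can ask localeconv(); the %f conversion in sscanf follows that character. The sketch below also repeats the experiment above under a named locale. It assumes a glibc-style system on which a locale called "ru_RU.UTF-8" may or may not be installed; both function names are hypothetical.

```cpp
#include <clocale>
#include <cstdio>

// Decimal-point character of the current LC_NUMERIC locale
// ('.' in "C", ',' in many European locales such as Russian).
char current_decimal_point() {
    return std::localeconv()->decimal_point[0];
}

// Hypothetical helper reproducing the experiment: scan `text` with "%f"
// while LC_NUMERIC is set to `locale_name`.
// Returns sscanf's result, or -1 if that locale is not installed.
// Note: it resets LC_NUMERIC to "C" afterwards instead of restoring the
// previous setting, which is fine for a demo but not for a plugin.
int scan_float_under(const char* locale_name, const char* text, float* out) {
    if (std::setlocale(LC_NUMERIC, locale_name) == nullptr) {
        return -1;
    }
    int converted = std::sscanf(text, "%f", out);
    std::setlocale(LC_NUMERIC, "C");
    return converted;
}
```

Under a decimal-comma locale, scan_float_under("ru_RU.UTF-8", "3.14", &x) is expected to return 1 with x equal to 3.0f, matching the behaviour described above.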

I need to ignore the locale for such number-parsing tasks.

How does one do that?

I could write my own parseFloat function, but that seems like a waste of effort.
I could also save the current locale, temporarily reset it to "C", read the file, and then restore the saved one. What are the performance implications of this? Could setlocale() be very slow on some OS/libc combination? What does it actually do under the hood?
Yet another way would be to use iostreams, but again their performance isn't stellar.
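The save/restore and iostreams options above can be sketched as follows. Both are minimal sketches with hypothetical function names. The setlocale() variant changes process-wide state, so it assumes that no other thread (for example, one belonging to the host application) depends on the locale while it runs.

```cpp
#include <clocale>
#include <cstdio>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Save/restore: temporarily switch LC_NUMERIC to "C", parse, then restore.
// setlocale() is process-wide and not thread-safe, so this is only safe
// when no other thread relies on the locale during the call.
std::vector<double> parse_with_setlocale(const char* text) {
    // Copy the current name: the string returned by setlocale() may be
    // overwritten by a later setlocale() call.
    std::string saved = std::setlocale(LC_NUMERIC, nullptr);
    std::setlocale(LC_NUMERIC, "C");

    std::vector<double> values;
    double v = 0.0;
    int consumed = 0;
    // %n stores how many characters were read, so we can step through the buffer.
    while (std::sscanf(text, "%lf%n", &v, &consumed) == 1) {
        values.push_back(v);
        text += consumed;
    }

    std::setlocale(LC_NUMERIC, saved.c_str());
    return values;
}

// iostreams: imbue the classic "C" locale on this stream only;
// the global C and C++ locales are left untouched.
std::vector<double> parse_with_iostream(const std::string& text) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    std::vector<double> values;
    double v = 0.0;
    while (in >> v) {
        values.push_back(v);
    }
    return values;
}
```

Each function parses a whole buffer per call, so the locale is switched once per file rather than once per number; how expensive setlocale() itself is depends on the C library.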

So I'm puzzled. What do you guys do in such situations?

Cheers!

1 Reply

My personal preference is to never use LC_NUMERIC, i.e. either call setlocale() only for the other categories, or, after calling setlocale() with LC_ALL, call setlocale(LC_NUMERIC, "C"); to put number handling back into the C locale. Otherwise, you're completely out of luck if you want to use the standard library to print or parse numbers in a standard form for interchange.
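In a program that controls its own startup, that preference comes down to two calls. A minimal sketch (the function name is hypothetical); a plugin, as in the question, usually cannot do this because the host application owns the locale.

```cpp
#include <clocale>

// Adopt the user's locale for everything except numbers: messages,
// collation, character classes, etc. follow the environment, while
// printf/scanf/strtod keep using '.' as the decimal point.
void init_locale_except_numeric() {
    std::setlocale(LC_ALL, "");        // user's preferences from LANG / LC_* variables
    std::setlocale(LC_NUMERIC, "C");   // standard number form for interchange
}
```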

If you're lucky enough to be on a POSIX.1-2008-conforming system, you can use uselocale() and the *_l family of functions to make the situation somewhat better. There are at least two basic approaches:

  1. Leave the default locale unset (at least the troublesome parts like LC_NUMERIC; LC_CTYPE should probably always be set), and pass a locale_t object for the user's locale to the appropriate *_l functions only when you want to present things to the user in a way that meets their own cultural expectations; otherwise use the default C locale.

  2. Keep a locale_t object for the C locale in the code that handles interchange data, and either switch to it with uselocale() whenever you work with data in the standard form, or pass it to the appropriate *_l functions (but there is no scanf_l).
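Both approaches can be sketched with POSIX.1-2008 calls, assuming a system that provides them (for example, glibc-based Linux). The function names are hypothetical, and strftime_l stands in as one example of a *_l function for user-facing output.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdio>
#include <ctime>
#include <locale.h>   // POSIX.1-2008: locale_t, newlocale, uselocale, LC_ALL_MASK
#include <string>
#include <time.h>     // POSIX.1-2008: strftime_l

// Approach 2: keep one locale_t for "C" around and switch only the calling
// thread to it while parsing. uselocale() is per-thread, so other threads
// (e.g. those of the host application) are unaffected.
bool parse_double_c(const char* text, double* out) {
    // Created once and kept for the life of the program.
    static locale_t c_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_locale == (locale_t)0) {
        return false;
    }
    locale_t previous = uselocale(c_locale);
    int converted = std::sscanf(text, "%lf", out);
    uselocale(previous);   // may be LC_GLOBAL_LOCALE, which restores the global locale
    return converted == 1;
}

// Approach 1: the process stays in "C"; the user's locale is passed
// explicitly to a *_l function only when presenting something to the user.
std::string weekday_name(const std::tm& when, locale_t user_locale) {
    char buf[64];
    std::size_t n = strftime_l(buf, sizeof buf, "%A", &when, user_locale);
    return std::string(buf, n);
}
```

For user-facing output, the caller would create the user's locale once with newlocale(LC_ALL_MASK, "", (locale_t)0) and release it with freelocale() when done.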

Note that implementing your own floating-point parser is not easy, and unless you're an expert in numerical computing it is probably not the right solution to the problem: getting it right (for example, correctly rounded results for every input) is very hard.


...