I'm using a Java StreamTokenizer to extract the various words and numbers of a String but have run into a problem where numbers which include commas are concerned, e.g. 10,567 is being read as 10.0 and ,567.
I also need to remove all non-numeric characters from numbers where they might occur, e.g. $678.00 should be 678.00 or -87 should be 87.
I believe these can be achieved via the whiteSpace and wordChars methods but does anyone have any idea how to do it?
The basic streamTokenizer code at present is:
BufferedReader br = new BufferedReader(new StringReader(text));
StreamTokenizer st = new StreamTokenizer(br);
st.parseNumbers();
st.wordChars(44, 46); // ASCII comma, - , dot.
st.wordChars(48, 57); // ASCII 0 - 9.
st.wordChars(65, 90); // ASCII upper case A - Z.
st.wordChars(97, 122); // ASCII lower case a - z.
while (st.nextToken() != StreamTokenizer.TT_EOF) {
if (st.ttype == StreamTokenizer.TT_WORD) {
System.out.println("String: " + st.sval);
}
else if (st.ttype == StreamTokenizer.TT_NUMBER) {
System.out.println("Number: " + st.nval);
}
}
br.close();
Or could someone suggest a REGEXP to achieve this? I'm not sure if REGEXP is useful here given that any parding would take place after the tokens are read from the string.
Thanks
Mr Morgan.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…