Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
315 views
in Technique[技术] by (71.8m points)

java - Removing ASCII characters in a string with encoding

I have a byte array which is filled by a serial port event and code is shown below:

private InputStream input = null; 
......
......
public void SerialEvent(SerialEvent se){
  if(se.getEventType == SerialPortEvent.DATA_AVAILABLE){
    int length = input.available();
    if(length > 0){
      byte[] array = new byte[length];
      int numBytes = input.read(array);
      String text = new String(array);
    }
  }
}

The variable text contains the below characters,

"33[K", "33[m",  "33[H2J", "33[6;1H" ,"33[?12l", "33[?25h", "33[5i", "33[4i", "33i" and similar types..

As of now, I use String.replace to remove all these characters from the string.

I have tried new String(array , 'CharSet'); //Tried with all CharSet options but I couldn't able to remove those.

Is there any way where I can remove those characters without using replace method?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I gave a unsatisfying answer, thanks to @OlegEstekhin for pointing that out. As noone else answered yet, and a solution is not a two-liner, here it goes.

Make a wrapping InputStream that throws away escape sequences. I have used a PushbackInputStream, where a partial sequence skipped, may still be pushed back for reading first. Here a FilterInputStream would suffice.

public class EscapeRemovingInputStream extends PushbackInputStream {

    public static void main(String[] args) {
        String s = "u001B[kHello u001B[H12JWorld!";
        byte[] buf = s.getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayInputStream bais = new ByteArrayInputStream(buf);
        EscapeRemovingInputStream bin = new EscapeRemovingInputStream(bais);
        try (InputStreamReader in = new InputStreamReader(bin,
                StandardCharsets.ISO_8859_1)) {
            int c;
            while ((c = in.read()) != -1) {
                System.out.print((char) c);
            }
            System.out.println();
        } catch (IOException ex) {
            Logger.getLogger(EscapeRemovingInputStream.class.getName()).log(
                Level.SEVERE, null, ex);
        }
    }

    private static final Pattern ESCAPE_PATTERN = Pattern.compile(
        "u001B\[(k|m|H\d+J|\d+:\d+H|\?\d+\w|\d*i)");
    private static final int MAX_ESCAPE_LENGTH = 20;

    private final byte[] escapeSequence = new byte[MAX_ESCAPE_LENGTH];
    private int escapeLength = 0;
    private boolean eof = false;

    public EscapeRemovingInputStream(InputStream in) {
        this(in, MAX_ESCAPE_LENGTH);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        for (int i = 0; i < len; ++i) {
            int c = read();
            if (c == -1) {
                return i == 0 ? -1 : i;
            }
            b[off + i] = (byte) c;
        }
        return len;
    }

    @Override
    public int read() throws IOException {
        int c = eof ? -1 : super.read();
        if (c == -1) { // Throw away a trailing half escape sequence.
            eof = true;
            return c;
        }
        if (escapeLength == 0 && c != 0x1B) {
            return c;
        } else {
            escapeSequence[escapeLength] = (byte) c;
            ++escapeLength;
            String esc = new String(escapeSequence, 0, escapeLength,
                    StandardCharsets.ISO_8859_1);
            if (ESCAPE_PATTERN.matcher(esc).matches()) {
                escapeLength = 0;
            } else if (escapeLength == MAX_ESCAPE_LENGTH) {
                escapeLength = 0;
                unread(escapeSequence);
                return super.read(); // No longer registering the escape
            }
            return read();
        }
    }

}
  • User calls EscapeRemovingInputStream.read
  • this read may call some read's itself to fill an byte buffer escapeSequence
  • (a push-back may be done calling unread)
  • the original read returns.

The recognition of an escape sequence seems grammatical: command letter, numerical argument(s). Hence I use a regular expression.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...