Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

java - Spark Dataset show: Unable to Capture Output Multiple Times

I am asking this question even though I already have a workaround (see the answer), to save anyone else the same pain.

I needed a way to write the contents of my Dataset to my log4j logger. I used void org.apache.spark.sql.Dataset.show(int numRows, boolean truncate), which simply prints to stdout. To capture stdout, I did the following (inspired by an answer elsewhere on Stack Overflow):

void myMethod(Dataset<Row> data){
    // Save the old System.out
    PrintStream originalPrintStream = System.out;

    // Tell Java to use your special stream
    ByteArrayOutputStream logCollection = new ByteArrayOutputStream();
    PrintStream printStreamForCollectingLogs = new PrintStream(logCollection);
    System.setOut(printStreamForCollectingLogs);

    // Print some output: goes to your special stream
    data.show(MAX_DISPLAY_ROWS, false);

    // Put things back
    System.out.flush();
    System.setOut(originalPrintStream);

    logger.info("\n" + logCollection.toString());
    logCollection.reset();
}

This works only once: subsequent calls to the same method for the same dataset fail to capture anything. I am using:

      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>2.4.5</version>
Question from: https://stackoverflow.com/questions/65901329/spark-dataset-show-unable-to-capture-output-multiple-times


1 Reply


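The cause seems to be that the stream first seen as stdout is retained and reused on later calls. A likely explanation is that Dataset.show prints through Scala's println, which writes to scala.Console.out, and Console picks up System.out only once, when it is first initialized; a different stream installed later with System.setOut is then never seen. I resolved the issue by extracting my alternative stdout stream into static class variables: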

private static ByteArrayOutputStream logCollection = new ByteArrayOutputStream();
private static PrintStream printStreamForCollectingLogs = new PrintStream(logCollection);

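myMethod now installs this shared printStreamForCollectingLogs with System.setOut instead of creating a new stream, so the same byte stream is written to on every call and read in logger.info("\n" + logCollection.toString()). This allows me to call myMethod as often as I want and capture the output each time.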
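
For reference, the workaround can be wrapped in a small helper. This is only a minimal sketch under stated assumptions: the class name StdoutCapture and its capture method are hypothetical (they are not part of Spark or of the original code), and the Runnable stands in for the call to data.show(...). It reuses one static stream as described above and restores System.out in a finally block so a failing call does not leave stdout redirected.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

/** Hypothetical helper (not part of Spark): captures what a task prints to stdout. */
final class StdoutCapture {
    // One buffer and one PrintStream for the whole JVM, reused on every call,
    // mirroring the static fields used in the workaround.
    private static final ByteArrayOutputStream BUFFER = new ByteArrayOutputStream();
    private static final PrintStream CAPTURING_STREAM = new PrintStream(BUFFER, true);

    private StdoutCapture() {
    }

    /** Runs the task with System.out redirected and returns what it printed. */
    static synchronized String capture(Runnable task) {
        PrintStream original = System.out;
        System.setOut(CAPTURING_STREAM);
        try {
            task.run(); // assumed use: () -> data.show(MAX_DISPLAY_ROWS, false)
            CAPTURING_STREAM.flush();
            return BUFFER.toString();
        } finally {
            System.setOut(original); // put stdout back even if the task throws
            BUFFER.reset();          // empty the shared buffer for the next call
        }
    }
}
```

With this in place, the body of myMethod would reduce to something like logger.info("\n" + StdoutCapture.capture(() -> data.show(MAX_DISPLAY_ROWS, false))). The method is synchronized because System.out is global to the JVM; output printed by other threads during the call would still end up in the buffer.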

