Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
711 views
in Technique[技术] by (71.8m points)

java - Add content to a very large Excel file using Apache POI (run out of alternatives...)

I have a large xlsx file which has an empty "data source sheet" and other sheets containing lots of formulas that use the data source sheet. My application should generate the data, open the file, fill the empty sheet up with that data and save it. I'm trying to do all that with Apache POI. The problem is that opening the file takes an unacceptable amount of memory and time. I've read other threads and couldn't find a solution. This is how I open the file:

pkg = OPCPackage.open(filename);
wb = new XSSFWorkbook(pkg);

Please note that using SXSSFWorkbook does not work as its constructors take a XSSFWorkbook which I'm unable to create in the first place. What I need is just to fill one empty sheet in the file, I don't need to completely load it in memory. Any Ideas??

Thank you!!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You could try working only with the OPCPackage without creating a Workbook. But then we must work at the lower level org.openxmlformats.schemas.spreadsheetml.x2006.main objects. This means we have not the support from the XSSF objects while storing string values as data (SharedStringsTable) and evaluating formulas.

The example takes a Excel workbook with at least 4 worksheets. The third worksheet is your "data source sheet". It must exist and will be overwritten with new data. The fourth worksheet is the worksheet in which formulas are referencing the "data source sheet". Since we can't use an evaluator, we must set FullCalcOnLoad true. If we would not do that, we had to press [Ctrl]+[Alt]+[Shift]+[F9] to force fully recalculation.

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;

import org.apache.poi.xssf.model.SharedStringsTable;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.OutputStream;

import org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTWorksheet;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTSheetData;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTCell;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.STCellType;

import  org.openxmlformats.schemas.officeDocument.x2006.relationships.STRelationshipId;

import org.apache.xmlbeans.XmlOptions;
import org.apache.xmlbeans.XmlException;

import javax.xml.namespace.QName;

import java.util.List;
import java.util.Map;
import java.util.HashMap;

import java.util.regex.Pattern;

class ReadAndWriteTest5 {

 public static void main(String[] args) {
  try {

   File file = new File("ReGesamt11_3Test.xlsx");
   //we only open the OPCPackage, we don't create a Workbook
   OPCPackage opcpackage = OPCPackage.open(file);

   //if there are strings in the SheetData, we need the SharedStringsTable
   PackagePart sharedstringstablepart = opcpackage.getPartsByName(Pattern.compile("/xl/sharedStrings.xml")).get(0);
   SharedStringsTable sharedstringstable = new SharedStringsTable();
   sharedstringstable.readFrom(sharedstringstablepart.getInputStream());

   //create empty WorksheetDocument for the "data source sheet"
   WorksheetDocument worksheetdocument = WorksheetDocument.Factory.newInstance();
   CTWorksheet worksheet = worksheetdocument.addNewWorksheet();
   CTSheetData sheetdata = worksheet.addNewSheetData();

   //put some data in for the "data source sheet"
   for (int i = 0; i < 10; i++) {

    CTCell ctcell= sheetdata.addNewRow().addNewC();

    CTRst ctstr = CTRst.Factory.newInstance();
    ctstr.setT("DataRow " + i);
    int sRef = sharedstringstable.addEntry(ctstr);
    ctcell.setT(STCellType.S);
    ctcell.setV(Integer.toString(sRef));

    ctcell=sheetdata.getRowArray(i).addNewC();
    ctcell.setV(""+(i*100+(i+1)*10+(i+2))+"."+((i+3)*10+(i+4)));

   }

   //write the SharedStringsTable
   OutputStream out = sharedstringstablepart.getOutputStream();
   sharedstringstable.writeTo(out);
   out.close();

   //create XmlOptions for saving the worksheet
   XmlOptions xmlOptions = new XmlOptions();
   xmlOptions.setSaveOuter();
   xmlOptions.setUseDefaultNamespace();
   xmlOptions.setSaveAggressiveNamespaces();
   xmlOptions.setCharacterEncoding("UTF-8");
   xmlOptions.setSaveSyntheticDocumentElement(new QName(CTWorksheet.type.getName().getNamespaceURI(), "worksheet"));
   Map<String, String> map = new HashMap<String, String>();
   map.put(STRelationshipId.type.getName().getNamespaceURI(), "r");
   xmlOptions.setSaveSuggestedPrefixes(map);

   //get the PackagePart of the third sheet which is the "data source sheet" 
   //this sheet must exist and will be replaced with the new content
   PackagePart sheetpart = opcpackage.getPartsByName(Pattern.compile("/xl/worksheets/sheet3.xml")).get(0);
   //save the worksheet as the third sheet which is the "data source sheet" 
   out = sheetpart.getOutputStream();
   worksheet.save(out, xmlOptions);
   out.close();

   //get the PackagePart of the fourth sheet which is the sheet on which formulas are referencing the "data source sheet"
   //since we can't use Evaluator, we must force recalculation on load for this sheet
   sheetpart = opcpackage.getPartsByName(Pattern.compile("/xl/worksheets/sheet4.xml")).get(0);
   worksheetdocument = WorksheetDocument.Factory.parse(sheetpart.getInputStream());
   worksheet = worksheetdocument.getWorksheet();
   //setFullCalcOnLoad true
   if (worksheet.getSheetCalcPr() == null) {
    worksheet.addNewSheetCalcPr().setFullCalcOnLoad(true);
   } else {
    worksheet.getSheetCalcPr().setFullCalcOnLoad(true);
   }
   out = sheetpart.getOutputStream();
   worksheet.save(out, xmlOptions);
   out.close();

   opcpackage.close();

  } catch (InvalidFormatException ifex) {
     ifex.printStackTrace();
  } catch (FileNotFoundException fnfex) {
     fnfex.printStackTrace();
  } catch (IOException ioex) {
     ioex.printStackTrace();
  } catch (XmlException xmlex) {
     xmlex.printStackTrace();
  }
 }
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...