Java Excel file parsing and large Excel file reading and writing

preface:

In the Java technology ecosystem, the mainstream technologies that can process Excel files include Apache POI, JXL, Alibaba EasyExcel, etc.
Apache POI parses based on DOM and loads files directly into memory, so it is fast and suitable for application scenarios with small amount of Excel file data. JXL only supports versions below Excel 2003, so it is not very common.
Alibaba EasyExcel adopts the parsing mode of reading row by row, and notifies the analysieventlistener of the parsing results of each row, so it is more suitable for parsing Excel files with large data volume.

1, Apache POI

Apache POI is a free, open source, cross platform Java API written in Java. Apache POI provides an API open source class library for Java programs to read and write files in Microsoft Office format.
It provides parsing of files in different formats:
● HSSF - provides the function of reading and writing files in Microsoft Excel format.
● XSSF - provides the function of reading and writing files in Microsoft Excel OOXML format.
● HWPF - provides the function of reading and writing files in Microsoft Word format.
● HSLF - provides the function of reading and writing files in Microsoft PowerPoint format.
● HDGF - provides the function of reading and writing files in Microsoft Visio format.

2, XSSF parsing Excel file

HSSF is used to parse the old version (*.xls) excel file. Because the old version of Excel file can only contain 65535 rows of data, it is not commonly used at present. Therefore, XSSF is mainly used to parse the new version (*.xlsx) Excel files at present.

Required jar package: in my resources

WorkBook (Excel file)

WorkBook is an interface, so when creating a WorkBook reference, you need to point to the subclass object. After creation, it represents an Excel file

	// Parsing an excel file

		try (// Read excel file through input stream
				FileInputStream in = new FileInputStream("D:\\1694318113\\FileRecv\\1627356552686.xlsx");
				// Input stream into Workbook
				Workbook workbook = new XSSFWorkbook(in)) {
			// Sheet: Workbook
			// Get Workbook
			int sheetNumber = workbook.getNumberOfSheets();
			System.out.println("Number of workbooks:" + sheetNumber);

			// Get workbook by name
			Sheet sheet1 = workbook.getSheet("Sheet0");
            // Get workbook by subscript
			Sheet sheet2 = workbook.getSheetAt(1);
			System.out.println("Data rows in Workbook 1" + sheet1.getLastRowNum());
			System.out.println("Data lines in Workbook 2" + sheet2.getLastRowNum());
		}

We need to use IO stream, so it is recommended to put the input stream and the code block of creating XSSFWorkbook object into the try () {} catch{} block

What types of files need to be parsed and what types of objects need to be created

● HSSF - provides the function of reading and writing files in Microsoft Excel format.
● XSSF - provides the function of reading and writing files in Microsoft Excel OOXML format.
● HWPF - provides the function of reading and writing files in Microsoft Word format.
● HSLF - provides the function of reading and writing files in Microsoft PowerPoint format.
● HDGF - provides the function of reading and writing files in Microsoft Visio format.

Sheet (Workbook)

Obtain or create Excel workbook Sheet objects through Workbook

The workbook can be obtained by subscript (the workbook subscript starts from zero) or directly getSheet (the name of the Workbook)

Sheet sheet1 = workbook.getSheet("Sheet0");
Sheet sheet2 = workbook.getSheetAt(1);

Get the number of workbooks

int num = workbook.getNumberofSheets();

Row (data row)

Get or create Row object of data Row through Sheet

Row row = sheet.createRow(0);

Get first line subscript

int first = sheet.getFirstRowNum();

Get trailing subscript

int last = sheet.getLastRowNum();

You can also get the specified row according to the subscript

Row row = sheet.getRow(0);

Traverse all rows

for(Row row : sheet) {
    System.out.println(row);
}

To traverse the specified line, you need to use subscripts, so use for (;;) {} cycle

for (int i = 1; i <= sheet.getLastRowNum(); i++) {
    Row row = sheet.getRow(i);
    System.out.println(row);
}

Cell (cell)

Use Row to obtain or create Cell objects

Cell cell0 = row.createCell(0);

Set cell value

cell0.setCellValue(UUID.randomUUID().toString());

Get cells by subscript

Cell cell = row.getCell(1);

Traverse all cells

for(Cell cell : row) {
				
}

Get cell type

CellType type = cell.getCellType();

Set cell style: use dataformat The getformat () method gets the encoding of a shor type in this format

Then store it into the CellStyle object (use the Workbook object to call the createCellStyle() method to create the CellStyle object), (use the CellStyle object to call the setDataFormat() method to store it), and finally call the setCellStyle() method to import the set CellStyle object into the Cell that you need to format

// Create cell style
DataFormat dataFormat = workbook.createDataFormat();
Short formatCode = dataFormat.getFormat("yyyy-MM-dd HH:mm:ss");
CellStyle cellStyle = workbook.createCellStyle();
cellStyle.setDataFormat(formatCode);

// ...

// Create cells for the current row
Cell cell1 = row.createCell(1);
cell1.setCellStyle(cellStyle); // Set cell style
cell1.setCellValue(new Date()); // Save the current date and time to this cell

Set cell alignment

// Create cell style
CellStyle cellStyle = workbook.createCellStyle();

//Set the horizontal alignment type of cells. Horizontal center at this time
cellStyle.setAlignment(HorizontalAlignment.CENTER);

// Set the vertical alignment type of cells. At this time, the vertical bottom edge
cellStyle.setVerticalAlignment(VerticalAlignment.BOTTOM);

Large Excel file reading and writing

1. Write using POI

Use SXSSFWorkbook to write. By setting the construction parameters of SXXFWorkbook, you can set the number of rows held in memory each time. When this value is reached, these data will be flush ed to the disk, so that there will be no lack of memory

try (Workbook workbook = new SXSSFWorkbook(1000);
				FileOutputStream out = new FileOutputStream("D:\\test\\10w.xlsx")) {
			Sheet sheet = workbook.createSheet();
			// Create a cell format object
			DataFormat dataFormat = workbook.createDataFormat();
			// Get cell format encoding value
			short dateFormatCode = dataFormat.getFormat("yyyy year MM month dd day HH:mm:ss");
			short moneyFormatCode = dataFormat.getFormat("¥#,###");
			
			// Create date format object
			CellStyle dateCellStyle = workbook.createCellStyle();
			dateCellStyle.setDataFormat(dateFormatCode);
			
			// Create currency format object
			CellStyle moneyCellStyle = workbook.createCellStyle();
			moneyCellStyle.setDataFormat(moneyFormatCode);
			Row row = sheet.createRow(0);
			Cell cell0 = row.createCell(0);
			cell0.setCellValue("Serial number");
			
			Cell cell1 = row.createCell(1);
			cell1.setCellValue("full name");
			
			Cell cell2 = row.createCell(2);
			cell2.setCellValue("Creation time");
			
			Cell cell3 = row.createCell(3);
			cell3.setCellValue("Red envelope amount");
			for(int i = 0;i<100000;i++) {
				Row rowVal = sheet.createRow(i+1);
				Cell cellVal0 = rowVal.createCell(0);
				cellVal0.setCellValue(String.valueOf(i+1));
				
				Cell cellVal1 = rowVal.createCell(1);
				cellVal1.setCellValue("A"+i);
				
				Cell cellVal2 = rowVal.createCell(2);
				cellVal2.setCellStyle(dateCellStyle);
				cellVal2.setCellValue(new Date());
				
				Cell cellVal3 = rowVal.createCell(3);
				cellVal3.setCellStyle(moneyCellStyle);
				cellVal3.setCellValue(Math.random()*10000);
			}
			workbook.write(out);
			
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}

 2. Using EasyExcel

Using this method, you need to import the jar package related to alibaba easyexcel (in the resources I uploaded)

Step 1: prepare entity classes

public class Order {
    @ExcelProperty("Order No")
	private String orderId; // Order No
	
    @ExcelProperty("Payment amount")
	@NumberFormat("¥#,###")
	private Double payment; // Payment amount
	
    @ExcelProperty(value = "Creation date",converter = LocalDateTimeConverter.class)
	private LocalDateTime creationTime; // Creation time

	public Order() {
		this.orderId = LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMddhhmmss"))
				+ UUID.randomUUID().toString().substring(0, 5);
		this.payment = Math.random() * 10000;
		this.creationTime = LocalDateTime.now();
	}

	public String getOrderId() {
		return orderId;
	}

	public void setOrderId(String orderId) {
		this.orderId = orderId;
	}

	public Double getPayment() {
		return payment;
	}

	public void setPayment(Double payment) {
		this.payment = payment;
	}

	public LocalDateTime getCreationTime() {
		return creationTime;
	}

	public void setCreationTime(LocalDateTime creationTime) {
		this.creationTime = creationTime;
	}



	@Override
	public String toString() {
		return "Order [orderId=" + orderId + ", payment=" + payment + ", creationTime=" + creationTime + "]";
	}
}

Step 2: prepare the Converter conversion class (compatible with LocalDateTime date time class). This class can set the time format, otherwise the imported jar package is incompatible and cannot be set into the time format

public class LocalDateTimeConverter implements Converter<LocalDateTime> {

	@Override
	public Class<LocalDateTime> supportJavaTypeKey() {
		return LocalDateTime.class;
	}

	@Override
	public CellDataTypeEnum supportExcelTypeKey() {
		return CellDataTypeEnum.STRING;
	}

	@Override
	public LocalDateTime convertToJavaData(CellData cellData, ExcelContentProperty contentProperty,
			GlobalConfiguration globalConfiguration) {
		return LocalDateTime.parse(cellData.getStringValue(), DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
	}

	@Override
	public CellData<String> convertToExcelData(LocalDateTime value, ExcelContentProperty contentProperty,
			GlobalConfiguration globalConfiguration) {
		return new CellData<>(value.format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")));
	}

}

Step 3: write 100w Order type data

public class Demo {
	public static void main(String[] args) {
        // Write 100w
        EasyExcel.write("c:\\test\\run\\easy.xlsx", Order.class)
                 .sheet("Order list")
                 .doWrite(data());
    }
    
    // Create 100w order data
    private static List<Order> data() {
        List<Order> list = new ArrayList<Order>();
        for (int i = 0; i < 1000000; i++) {
            list.add(new Order());
        }
        return list;
    }
}

Summary: compared with SXSSFWorkbook, easyExcel can greatly shorten the writing time and write larger data

Tags: Java poi Eclipse programming language

Posted by trygve on Tue, 19 Jul 2022 12:59:28 +0930