Html
extends BaseReader
in package
Table of Contents
Constants
- TEST_SAMPLE_SIZE = 2048
- Sample size to read to determine if it's HTML or not.
- FORMATS
- Default style arrays for the HTML tags h1 to h6, a, hr, strong, b, i, and em. The full value is listed in the Constants section.
Properties
- $allowExternalImages : bool
- Allow external images. Use with caution.
- $dataArray : array<string|int, array<string|int, mixed>>
- Data array used for testing only; the reader should write to the Spreadsheet object once tests are complete.
- $fileHandle : resource
- $ignoreRowsWithNoCells : bool
- Ignore rows with no cells? Identifies whether the Reader should ignore rows with no cells.
- $includeCharts : bool
- Read charts that are defined in the workbook? Identifies whether the Reader should read the definitions for any charts that exist in the workbook.
- $inputEncoding : string
- Input encoding.
- $loadSheetsOnly : null|array<string|int, string>
- Restrict which sheets should be loaded? This property holds an array of worksheet names to be loaded. If null, then all worksheets will be loaded.
- $nestedColumn : array<string|int, string>
- $readDataOnly : bool
- Read data only? Identifies whether the Reader should only read data values for cells, and ignore any formatting information; or whether it should read both data and formatting.
- $readEmptyCells : bool
- Read empty cells? Identifies whether the Reader should read data values for all cells, or should ignore cells containing null value or empty string.
- $readFilter : IReadFilter
- IReadFilter instance.
- $rowspan : array<string, bool>
- $securityScanner : XmlScanner|null
- $sheetIndex : int
- Sheet index to read.
- $tableLevel : int
- $valueBinder : IValueBinder|null
Methods
- __construct() : mixed
- Create a new HTML Reader instance.
- canRead() : bool
- Validate that the current file is an HTML file.
- getAllowExternalImages() : bool
- getBorderMappings() : array<string, string>
- getBorderStyle() : string|null
- Maps an HTML border style to a PhpSpreadsheet border style.
- getIgnoreRowsWithNoCells() : bool
- getIncludeCharts() : bool
- Read charts in workbook? If this is true, then the Reader will include any charts that exist in the workbook.
- getLoadSheetsOnly() : null|array<string|int, string>
- Get which sheets to load. Returns either an array of worksheet names (the list of worksheets that should be loaded), or null to indicate that all worksheets in the workbook should be loaded.
- getReadDataOnly() : bool
- Read data only? If this is true, then the Reader will only read data values for cells; it will not read any formatting or structural information (like merges).
- getReadEmptyCells() : bool
- Read empty cells? If this is true (the default), then the Reader will read data values for all cells, irrespective of value.
- getReadFilter() : IReadFilter
- Read filter.
- getSecurityScanner() : XmlScanner|null
- getSecurityScannerOrThrow() : XmlScanner
- getSheetIndex() : int
- Get sheet index.
- getStyleColor() : string
- Checks whether the value starts with #, so that a clean hex color can be returned.
- getValueBinder() : IValueBinder|null
- listWorksheetInfo() : array<int, array{worksheetName: string, lastColumnLetter: string, lastColumnIndex: int, totalRows: int, totalColumns: int, sheetState: string}>
- Return worksheet info (Name, Last Column Letter, Last Column Index, Total Rows, Total Columns).
- listWorksheetNames() : array<string|int, string>
- Returns names of the worksheets from a file, possibly without parsing the whole file to a Spreadsheet object.
- load() : Spreadsheet
- Loads Spreadsheet from file.
- loadFromString() : Spreadsheet
- Loads a Spreadsheet from HTML content supplied as a string.
- loadIntoExisting() : Spreadsheet
- Loads a file into an existing Spreadsheet instance.
- loadSpreadsheetFromFile() : Spreadsheet
- Loads Spreadsheet from file.
- setAllowExternalImages() : self
- Allow external images. Use with caution.
- setIgnoreRowsWithNoCells() : self
- setIncludeCharts() : $this
- Set read charts in workbook. Set to true to advise the Reader to include any charts that exist in the workbook.
- setLoadAllSheets() : $this
- Set all sheets to load. Tells the Reader to load all worksheets from the workbook.
- setLoadSheetsOnly() : $this
- Set which sheets to load.
- setReadDataOnly() : $this
- Set read data only. Set to true to advise the Reader to read only data values for cells, and to ignore any formatting or structural information (like merges).
- setReadEmptyCells() : $this
- Set read empty cells. Set to true (the default) to advise the Reader to read data values for all cells, irrespective of value.
- setReadFilter() : $this
- Set read filter.
- setSheetIndex() : $this
- Set sheet index.
- setValueBinder() : self
- flushCell() : void
- Flush cell.
- getTableStartColumn() : string
- newSpreadsheet() : Spreadsheet
- openFile() : void
- Open file for reading.
- processDomElement() : void
- processFlags() : void
- releaseTableStartColumn() : string
- setTableStartColumn() : string
Constants
TEST_SAMPLE_SIZE
Sample size to read to determine if it's HTML or not.
public
mixed
TEST_SAMPLE_SIZE
= 2048
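The constant caps how many bytes are sampled when deciding whether a file is HTML. The following is a minimal standalone sketch of that general idea, not the library's actual check: `looksLikeHtml()` is a hypothetical helper, and the heuristic it uses (the sample starts with a tag and contains markup) is an assumption chosen for illustration.

```php
<?php
// Hypothetical helper, not part of PhpSpreadsheet: guesses whether a file
// holds HTML by inspecting only its first $sampleSize bytes.
function looksLikeHtml(string $filename, int $sampleSize = 2048): bool
{
    $handle = @fopen($filename, 'rb');
    if ($handle === false) {
        return false; // unreadable files are treated as "not HTML"
    }
    $sample = fread($handle, $sampleSize);
    fclose($handle);
    if ($sample === false) {
        return false;
    }

    $sample = ltrim($sample);

    // Assumed heuristic: the sample begins with "<" and stripping tags
    // changes its length, i.e. it really contains markup.
    return $sample !== ''
        && $sample[0] === '<'
        && strlen(strip_tags($sample)) !== strlen($sample);
}

// Try the helper on an HTML fragment and on CSV text.
$path = tempnam(sys_get_temp_dir(), 'html');
file_put_contents($path, '<table><tr><td>Hello</td></tr></table>');
$isHtml = looksLikeHtml($path);

file_put_contents($path, "id,name\n1,Widget\n");
$isCsvHtml = looksLikeHtml($path);

unlink($path);
var_dump($isHtml, $isCsvHtml);
```

Reading a bounded sample keeps the check cheap for large files, at the cost of misjudging files whose markup only starts after the sampled region.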
FORMATS
Default style arrays applied to HTML tags.
protected
mixed
FORMATS
= [
    'h1' => ['font' => ['bold' => true, 'size' => 24]], // Bold, 24pt
    'h2' => ['font' => ['bold' => true, 'size' => 18]], // Bold, 18pt
    'h3' => ['font' => ['bold' => true, 'size' => 13.5]], // Bold, 13.5pt
    'h4' => ['font' => ['bold' => true, 'size' => 12]], // Bold, 12pt
    'h5' => ['font' => ['bold' => true, 'size' => 10]], // Bold, 10pt
    'h6' => ['font' => ['bold' => true, 'size' => 7.5]], // Bold, 7.5pt
    'a' => ['font' => ['underline' => true, 'color' => ['argb' => \PhpOffice\PhpSpreadsheet\Style\Color::COLOR_BLUE]]], // Blue underlined
    'hr' => ['borders' => ['bottom' => ['borderStyle' => \PhpOffice\PhpSpreadsheet\Style\Border::BORDER_THIN, 'color' => [\PhpOffice\PhpSpreadsheet\Style\Color::COLOR_BLACK]]]], // Bottom border
    'strong' => ['font' => ['bold' => true]], // Bold
    'b' => ['font' => ['bold' => true]], // Bold
    'i' => ['font' => ['italic' => true]], // Italic
    'em' => ['font' => ['italic' => true]], // Italic
]
Properties
$allowExternalImages
Allow external images. Use with caution.
protected
bool
$allowExternalImages
= false
Improper specification of these within a spreadsheet can subject the caller to security exploits.
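As a hedged illustration of the opt-in nature of this flag, the sketch below toggles it through the public getter and setter documented on this page. It assumes a Composer install of `phpoffice/phpspreadsheet` with the autoloader at `vendor/autoload.php`; the variable names are illustrative only.

```php
<?php
// Assumes PhpSpreadsheet was installed with Composer in the current directory.
require 'vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\Reader\Html;

$reader = new Html();

// The property defaults to false, so external images are not allowed initially.
$before = $reader->getAllowExternalImages();

// Opt in explicitly, and only for HTML that comes from a trusted source.
$reader->setAllowExternalImages(true);
$after = $reader->getAllowExternalImages();

// Switch the flag back off before handling untrusted input.
$reader->setAllowExternalImages(false);

var_dump($before, $after);
```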
$dataArray
Data array used for testing only; the reader should write to the Spreadsheet object once tests are complete.
protected
array<string|int, array<string|int, mixed>>
$dataArray
= []
$fileHandle
protected
resource
$fileHandle
$ignoreRowsWithNoCells
Ignore rows with no cells? Identifies whether the Reader should ignore rows with no cells.
protected
bool
$ignoreRowsWithNoCells
= false
Currently implemented only for Xlsx.
$includeCharts
Read charts that are defined in the workbook? Identifies whether the Reader should read the definitions for any charts that exist in the workbook.
protected
bool
$includeCharts
= false
$inputEncoding
Input encoding.
protected
string
$inputEncoding
= 'ANSI'
$loadSheetsOnly
Restrict which sheets should be loaded? This property holds an array of worksheet names to be loaded. If null, then all worksheets will be loaded.
protected
null|array<string|int, string>
$loadSheetsOnly
= null
This property is ignored for Csv, Html, and Slk.
$nestedColumn
protected
array<string|int, string>
$nestedColumn
= ['A']
$readDataOnly
Read data only? Identifies whether the Reader should only read data values for cells, and ignore any formatting information; or whether it should read both data and formatting.
protected
bool
$readDataOnly
= false
$readEmptyCells
Read empty cells? Identifies whether the Reader should read data values for all cells, or should ignore cells containing null value or empty string.
protected
bool
$readEmptyCells
= true
$readFilter
IReadFilter instance.
protected
IReadFilter
$readFilter
$rowspan
protected
array<string, bool>
$rowspan
= []
$securityScanner
protected
XmlScanner|null
$securityScanner
= null
$sheetIndex
Sheet index to read.
protected
int
$sheetIndex
= 0
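Because the sheet index selects which worksheet receives the parsed HTML, it can be used to place several HTML fragments into one workbook. The sketch below combines `setSheetIndex()` and `loadFromString()` as documented on this page; it assumes a Composer install with the autoloader at `vendor/autoload.php`, and the HTML fragments are made up for illustration.

```php
<?php
// Assumes PhpSpreadsheet was installed with Composer in the current directory.
require 'vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\Reader\Html;

$reader = new Html();

// With the default sheet index of 0, the first fragment fills the first sheet.
$spreadsheet = $reader->loadFromString(
    '<table><tr><td>First sheet</td></tr></table>'
);

// Point the reader at index 1 and pass the existing workbook as the
// optional second argument so the second fragment lands on another sheet.
$reader->setSheetIndex(1);
$spreadsheet = $reader->loadFromString(
    '<table><tr><td>Second sheet</td></tr></table>',
    $spreadsheet
);

foreach ($spreadsheet->getAllSheets() as $index => $sheet) {
    echo $index, ': ', $sheet->getCell('A1')->getValue(), PHP_EOL;
}
```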
$tableLevel
protected
int
$tableLevel
= 0
$valueBinder
protected
IValueBinder|null
$valueBinder
= null
Methods
__construct()
Create a new HTML Reader instance.
public
__construct() : mixed
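A minimal usage sketch, assuming a Composer install with the autoloader at `vendor/autoload.php`: it creates a reader, checks a file with `canRead()`, and loads it with `load()`. The temporary file and its contents are invented for the example.

```php
<?php
// Assumes PhpSpreadsheet was installed with Composer in the current directory.
require 'vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\Reader\Html;

// Write a small HTML document to a temporary file for the demonstration.
$path = tempnam(sys_get_temp_dir(), 'html');
file_put_contents(
    $path,
    '<html><body><table><tr><td>Hello</td><td>World</td></tr></table></body></html>'
);

$reader = new Html();
$spreadsheet = null;

// Only attempt to load files the reader recognises as HTML.
if ($reader->canRead($path)) {
    $spreadsheet = $reader->load($path);
    echo $spreadsheet->getActiveSheet()->getCell('A1')->getValue(), PHP_EOL;
}

unlink($path);
```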
canRead()
Validate that the current file is an HTML file.
public
canRead(string $filename) : bool
Parameters
- $filename : string
Return values
bool
getAllowExternalImages()
public
getAllowExternalImages() : bool
Return values
bool
getBorderMappings()
public
static getBorderMappings() : array<string, string>
Return values
array<string, string>
getBorderStyle()
Maps an HTML border style to a PhpSpreadsheet border style.
public
getBorderStyle(string $style) : string|null
Parameters
- $style : string
Return values
string|null
getIgnoreRowsWithNoCells()
public
getIgnoreRowsWithNoCells() : bool
Return values
bool
getIncludeCharts()
Read charts in workbook? If this is true, then the Reader will include any charts that exist in the workbook.
public
getIncludeCharts() : bool
Note that a ReadDataOnly value of true overrides this setting, and charts won't be read regardless of the IncludeCharts value. If IncludeCharts is false (the default), any charts defined in the workbook file are ignored.
Return values
bool
getLoadSheetsOnly()
Get which sheets to load. Returns either an array of worksheet names (the list of worksheets that should be loaded), or null to indicate that all worksheets in the workbook should be loaded.
public
getLoadSheetsOnly() : null|array<string|int, string>
Return values
null|array<string|int, string>
getReadDataOnly()
Read data only? If this is true, then the Reader will only read data values for cells; it will not read any formatting or structural information (like merges).
public
getReadDataOnly() : bool
If false (the default) it will read data and formatting.
Return values
bool
getReadEmptyCells()
Read empty cells? If this is true (the default), then the Reader will read data values for all cells, irrespective of value.
public
getReadEmptyCells() : bool
If false it will not read data for cells containing a null value or an empty string.
Return values
bool
getReadFilter()
Read filter.
public
getReadFilter() : IReadFilter
Return values
IReadFilter
getSecurityScanner()
public
getSecurityScanner() : XmlScanner|null
Return values
XmlScanner|null
getSecurityScannerOrThrow()
public
getSecurityScannerOrThrow() : XmlScanner
Return values
XmlScanner
getSheetIndex()
Get sheet index.
public
getSheetIndex() : int
Return values
int
getStyleColor()
Checks whether the value starts with #, so that a clean hex color can be returned.
public
getStyleColor(string|null $value) : string
Parameters
- $value : string|null
Return values
string
getValueBinder()
public
getValueBinder() : IValueBinder|null
Return values
IValueBinder|null
listWorksheetInfo()
Return worksheet info (Name, Last Column Letter, Last Column Index, Total Rows, Total Columns).
public
listWorksheetInfo(string $filename) : array<int, array{worksheetName: string, lastColumnLetter: string, lastColumnIndex: int, totalRows: int, totalColumns: int, sheetState: string}>
Parameters
- $filename : string
Return values
array<int, array{worksheetName: string, lastColumnLetter: string, lastColumnIndex: int, totalRows: int, totalColumns: int, sheetState: string}>
listWorksheetNames()
Returns names of the worksheets from a file, possibly without parsing the whole file to a Spreadsheet object.
public
listWorksheetNames(string $filename) : array<string|int, string>
Readers will often have a more efficient method with which they can override this method.
Parameters
- $filename : string
Return values
array<string|int, string>
load()
Loads Spreadsheet from file.
public
load(string $filename[, int $flags = 0 ]) : Spreadsheet
Parameters
- $filename : string
- The name of the file to load.
- $flags : int = 0
- Optional flags that identify specific elements which should be loaded but are not loaded by default, using these values: IReader::LOAD_WITH_CHARTS - Include any charts that are defined in the loaded file.
Return values
Spreadsheet
loadFromString()
Loads a Spreadsheet from HTML content supplied as a string.
public
loadFromString(string $content[, Spreadsheet|null $spreadsheet = null ]) : Spreadsheet
Parameters
- $content : string
- $spreadsheet : Spreadsheet|null = null
Return values
Spreadsheet
loadIntoExisting()
Loads a file into an existing Spreadsheet instance.
public
loadIntoExisting(string $filename, Spreadsheet $spreadsheet) : Spreadsheet
Parameters
- $filename : string
- $spreadsheet : Spreadsheet
Return values
Spreadsheet
loadSpreadsheetFromFile()
Loads Spreadsheet from file.
public
loadSpreadsheetFromFile(string $filename) : Spreadsheet
Parameters
- $filename : string
Return values
Spreadsheet
setAllowExternalImages()
Allow external images. Use with caution.
public
setAllowExternalImages(bool $allowExternalImages) : self
Improper specification of these within a spreadsheet can subject the caller to security exploits.
Parameters
- $allowExternalImages : bool
Return values
self
setIgnoreRowsWithNoCells()
public
setIgnoreRowsWithNoCells(bool $ignoreRowsWithNoCells) : self
Parameters
- $ignoreRowsWithNoCells : bool
Return values
self
setIncludeCharts()
Set read charts in workbook. Set to true to advise the Reader to include any charts that exist in the workbook.
public
setIncludeCharts(bool $includeCharts) : $this
Note that a ReadDataOnly value of true overrides this setting, and charts won't be read regardless of the IncludeCharts value. Set to false (the default) to discard charts.
Parameters
- $includeCharts : bool
Return values
$this
setLoadAllSheets()
Set all sheets to load. Tells the Reader to load all worksheets from the workbook.
public
setLoadAllSheets() : $this
Return values
$this
setLoadSheetsOnly()
Set which sheets to load.
public
setLoadSheetsOnly(null|string|array<string|int, string> $sheetList) : $this
Parameters
- $sheetList : null|string|array<string|int, string>
Return values
$this
setReadDataOnly()
Set read data only. Set to true to advise the Reader to read only data values for cells, and to ignore any formatting or structural information (like merges).
public
setReadDataOnly(bool $readCellValuesOnly) : $this
Set to false (the default) to advise the Reader to read both data and formatting for cells.
Parameters
- $readCellValuesOnly : bool
Return values
$this
setReadEmptyCells()
Set read empty cells. Set to true (the default) to advise the Reader to read data values for all cells, irrespective of value.
public
setReadEmptyCells(bool $readEmptyCells) : $this
Set to false to advise the Reader to ignore cells containing a null value or an empty string.
Parameters
- $readEmptyCells : bool
Return values
$this
setReadFilter()
Set read filter.
public
setReadFilter(IReadFilter $readFilter) : $this
Parameters
- $readFilter : IReadFilter
Return values
$this
setSheetIndex()
Set sheet index.
public
setSheetIndex(int $sheetIndex) : $this
Parameters
- $sheetIndex : int
- Sheet index.
Return values
$this
setValueBinder()
public
setValueBinder(IValueBinder|null $valueBinder) : self
Parameters
- $valueBinder : IValueBinder|null
Return values
self
flushCell()
Flush cell.
protected
flushCell(Worksheet $sheet, string $column, int|string $row, mixed &$cellContent, array<string|int, string> $attributeArray) : void
Parameters
- $sheet : Worksheet
- $column : string
- $row : int|string
- $cellContent : mixed
- $attributeArray : array<string|int, string>
getTableStartColumn()
protected
getTableStartColumn() : string
Return values
string
newSpreadsheet()
protected
newSpreadsheet() : Spreadsheet
Return values
Spreadsheet
openFile()
Open file for reading.
protected
openFile(string $filename) : void
Parameters
- $filename : string
processDomElement()
protected
processDomElement(DOMNode $element, Worksheet $sheet, int &$row, string &$column, string &$cellContent) : void
Parameters
- $element : DOMNode
- $sheet : Worksheet
- $row : int
- $column : string
- $cellContent : string
processFlags()
protected
processFlags(int $flags) : void
Parameters
- $flags : int
releaseTableStartColumn()
protected
releaseTableStartColumn() : string
Return values
string
setTableStartColumn()
protected
setTableStartColumn(string $column) : string
Parameters
- $column : string