PdfParser is a standalone PHP library that provides tools for extracting data from PDF files. It loads and parses objects and headers, extracts metadata, and extracts text from pages in order. It supports compressed PDFs, the Mac OS Roman character set, and hex and octal encoding in text sections, and it complies with PSR-0 (autoloading) and PSR-1 (basic coding standard). Secured documents are not currently supported.
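To make "hex and octal encoding in text sections" concrete, here is a minimal Python sketch of what decoding such strings involves. It is not PdfParser's code or API. The function names `decode_literal` and `decode_hex` are hypothetical, and it assumes the string delimiters have already been stripped and that the bytes are Mac OS Roman encoded.

```python
import re

# Single-character escapes allowed inside a PDF literal string "( ... )".
_SIMPLE_ESCAPES = {"n": 0x0A, "r": 0x0D, "t": 0x09, "b": 0x08, "f": 0x0C,
                   "(": 0x28, ")": 0x29, "\\": 0x5C}


def decode_literal(body: str) -> bytes:
    """Decode the inside of a literal string (parentheses already removed).

    Hypothetical helper: handles octal escapes such as \\216 and the
    single-character escapes above. Assumes every input character fits
    in one byte.
    """
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ord(ch) & 0xFF)
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(body):
            break  # a trailing backslash is ignored
        octal = re.match(r"[0-7]{1,3}", body[i:])
        if octal:
            # One to three octal digits encode a single byte.
            out.append(int(octal.group(), 8) & 0xFF)
            i += octal.end()
        elif body[i] in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[body[i]])
            i += 1
        elif body[i] == "\n":
            i += 1  # backslash-newline is a line continuation
        else:
            # Unknown escape: drop the backslash, keep the character.
            out.append(ord(body[i]) & 0xFF)
            i += 1
    return bytes(out)


def decode_hex(body: str) -> bytes:
    """Decode the inside of a hex string "< ... >" (brackets already removed)."""
    digits = "".join(body.split())  # whitespace between digits is ignored
    if len(digits) % 2:
        digits += "0"  # a missing final digit is treated as 0
    return bytes.fromhex(digits)


if __name__ == "__main__":
    # Byte 0x8E (octal 216) is "é" in Mac OS Roman.
    print(decode_literal(r"Caf\216").decode("mac_roman"))
    print(decode_hex("436166 8E").decode("mac_roman"))
```

Both calls in the usage block decode the same four bytes, once from an octal escape and once from a hex string, before mapping them through Python's built-in `mac_roman` codec.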
|Tags||PHP class libraries lib Library PDF PDF file manipulation file conversion Extract Extract Text|
|Operating Systems||Not Applicable|
For those who cannot use Composer, an archive containing everything required to use the library is available at http://pdfparser.org/pdfparser-v0.9.8.tgz
Release Notes: This release fixes XObject parsing. Note that it is necessary to update the TCPDF dependency, too.
Release Notes: This release fixes MacRoman encoding and adds support for multi-character font table translation.
Release Notes: This release supports text extraction from XObject forms (a rare situation).
Release Notes: This release fixes two bugs concerning comments in PDF structure files and a "0" char prefix on object IDs. Those bugs were rare, but unfortunately occurred on PDFs provided by a user.
Release Notes: This release fixes many bugs in text blocks and includes code cleanup and performance optimizations. The demo page has been updated to use this release (http://www.pdfparser.org/demo). Feel free to send the author any PDF files that cause problems or are not parsed as expected.