ottosmops / pdftotext by ottosmops

Extract text from PDF
Package Data
Maintainer Username: ottosmops
Maintainer Contact: kraenzle@k-r.ch (ak)
Package Create Date: 2016-11-09
Package Last Update: 2024-03-21
Language: PHP
License: MIT
Last Refreshed: 2024-04-27 03:06:39
Package Statistics
Total Downloads: 97,606
Monthly Downloads: 1,955
Daily Downloads: 73
Total Stars: 4
Total Watchers: 4
Total Forks: 1
Total Open Issues: 0

Extract text from a PDF with pdftotext


This package provides a class for extracting text from a PDF. It is essentially a PHP 5.6-compatible copy of spatie/pdf-to-text.

  \Ottosmops\Pdftotext\Extract::getText('/path/to/file.pdf'); // returns the text from the PDF

Requirements

The package uses the pdftotext command-line tool. Make sure it is installed: which pdftotext

pdftotext ships with poppler-utils; for installation instructions, see poppler-utils.

If the installed binary is not found (error: The command "which pdftotext" failed.), you can pass its full path to the constructor (see below), or extend PATH with the directory where pdftotext lives before you instantiate the Extract class, for example with putenv('PATH=' . getenv('PATH') . ':/usr/local/bin:/usr/bin').
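The PATH adjustment can be sketched as a small helper. This is a minimal sketch and not part of the package: `pathWithDirs` is a hypothetical name, and the directories are only the examples mentioned above. It appends each directory to the current PATH unless it is already listed.

```php
<?php
// Hypothetical helper (not part of ottosmops/pdftotext): returns a PATH
// string with the given directories appended, skipping ones already present.
function pathWithDirs($currentPath, array $dirs)
{
    $parts = $currentPath === '' ? [] : explode(PATH_SEPARATOR, $currentPath);
    foreach ($dirs as $dir) {
        $dir = rtrim($dir, '/'); // normalise a trailing slash
        if ($dir !== '' && !in_array($dir, $parts, true)) {
            $parts[] = $dir;
        }
    }
    return implode(PATH_SEPARATOR, $parts);
}

// Extend PATH for the current PHP process before using Extract.
// getenv() returns false when PATH is unset, so cast it to a string.
putenv('PATH=' . pathWithDirs((string) getenv('PATH'), ['/usr/local/bin', '/usr/bin']));
```

The change only affects the running PHP process and the commands it starts; it does not modify the system-wide PATH.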

Installation

composer require ottosmops/pdftotext

Usage

Extract text from a PDF:

use Ottosmops\Pdftotext\Extract;

$text = (new Extract())
    ->pdf('file.pdf')
    ->text();
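For several files, the call chain above can be wrapped in a loop. The following is a minimal sketch, not part of the package: `extractAll` is a hypothetical helper, and it assumes that a failed extraction surfaces as an exception, which this README does not confirm.

```php
<?php
// Hypothetical helper, not part of ottosmops/pdftotext.
// $extract is any callable that takes a PDF path and returns its text.
function extractAll(array $files, callable $extract)
{
    $texts = [];
    $errors = [];
    foreach ($files as $file) {
        try {
            $texts[$file] = $extract($file);
        } catch (\Exception $e) {
            // Assumption: failures are reported as exceptions.
            $errors[$file] = $e->getMessage();
        }
    }
    return ['texts' => $texts, 'errors' => $errors];
}

// Usage with the package (needs Composer's autoloader and pdftotext installed):
// $result = extractAll(glob('/path/to/pdfs/*.pdf'), function ($file) {
//     return (new \Ottosmops\Pdftotext\Extract())->pdf($file)->text();
// });
```

Passing the extractor as a callable keeps the loop independent of the package, so it can be tried out with a stub before pdftotext is installed.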

You can also set the path to the binary and pass options to pdftotext:

$text = (new Extract('/path/to/pdftotext'))
    ->pdf('path/to/file.pdf')
    ->options('-layout')
    ->text();

The default options are: -eol unix -enc UTF-8 -raw
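To see what these options mean on the command line, the sketch below builds a comparable pdftotext invocation. `buildPdftotextCommand` is a hypothetical helper, and how the package assembles its command internally is an assumption. The flags themselves are standard pdftotext options: -eol sets the end-of-line convention, -enc the output encoding, -raw keeps text in content-stream order, and -layout preserves the physical page layout.

```php
<?php
// Hypothetical helper: builds a command line comparable to the defaults.
// How ottosmops/pdftotext builds its command internally is an assumption.
function buildPdftotextCommand($binary, $options, $pdf)
{
    // "-" as the output file tells pdftotext to write the text to stdout.
    return escapeshellarg($binary) . ' ' . $options . ' ' . escapeshellarg($pdf) . ' -';
}

// Print the command for the default options; quoting depends on the platform.
echo buildPdftotextCommand('pdftotext', '-eol unix -enc UTF-8 -raw', 'file.pdf'), PHP_EOL;
```

Running the printed command in a shell is a quick way to compare option sets, such as -raw against -layout, before passing them to options().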

License

The MIT License (MIT). Please see License File for more information.