Route 326: 2003/03/26

2003/03/26

pdftotext

I found that pdftotext, which ships with xpdf, can extract text from PDF files, including Japanese text. For example:

% pdftotext -enc EUC-JP hoge.pdf
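To run the same conversion over several files at once, something like the following might work. It is only a rough sketch: the function names txt_name and convert_all are made up for illustration, and it assumes pdftotext is on the PATH and that EUC-JP is the encoding you want.

```shell
#!/bin/sh
# Hypothetical helper (not part of xpdf): derive the output file name
# for a given PDF, e.g. hoge.pdf -> hoge.txt.
txt_name() {
    printf '%s\n' "${1%.pdf}.txt"
}

# Hypothetical helper: convert every PDF passed as an argument,
# writing EUC-JP encoded text next to the original file.
# Stops at the first file that pdftotext fails on.
convert_all() {
    for pdf in "$@"; do
        pdftotext -enc EUC-JP "$pdf" "$(txt_name "$pdf")" || return 1
    done
}
```

After sourcing these definitions, `convert_all *.pdf` would convert everything in the current directory. pdftotext already names the output after the input when the second argument is omitted, so the explicit name is there mainly to make it easy to change the destination.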

Incidentally, on Debian it should be enough to have roughly these packages installed:

ii  xpdf-common    2.01-3         Portable Document Format (PDF) suite -- comm
ii  xpdf-japanese  20020401-1     Portable Document Format (PDF) suite -- Japa
ii  xpdf-utils     2.01-3         Portable Document Format (PDF) suite -- util
