mirror of https://github.com/kmein/niveum
synced 2026-03-18 19:11:08 +01:00
package .bin/ scripts as proper nix packages, delete .bin/
Packaged 14 scripts from .bin/ into packages/ with proper dependency
declarations (writers.writeDashBin/writeBashBin/writePython3Bin):

- 256color → two56color (terminal color chart)
- avesta.sed → avesta (Avestan transliteration)
- bvg.sh → bvg (Berlin transit disruptions)
- unicode → charinfo (Unicode character info)
- chunk-pdf → chunk-pdf (split PDFs by page count)
- csv2json → csv2json (CSV to JSON converter)
- fix-sd.sh → fix-sd (exFAT SD card recovery, improved output handling)
- json2csv → json2csv (JSON to CSV converter)
- mp3player-write → mp3player-write (audio conversion for MP3 players)
- mushakkil.sh → mushakkil (Arabic diacritization)
- nix-haddock-index → nix-haddock-index (GHC Haddock index generator)
- pdf-ocr.sh → pdf-ocr (OCR PDFs via tesseract)
- prospekte.sh → prospekte (German supermarket flyer browser)
- readme → readme (GitHub README as man page)

All added to overlay and packages output. .bin/ directory removed.
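As a rough illustration of what "added to overlay" could look like for one of these packages, here is a minimal sketch. The relative path and the attribute name are assumptions made for illustration; the repository's actual overlay is not shown on this page and may be organized differently.

```nix
# Hypothetical overlay fragment: the path and attribute name are assumptions,
# not taken from the repository.
final: prev: {
  # callPackage supplies the arguments the package function asks for
  # (writers, poppler_utils, tesseract, coreutils) from the package set.
  pdf-ocr = final.callPackage ./packages/pdf-ocr.nix { };
}
```

With an overlay along these lines applied, pkgs.pdf-ocr would evaluate to the wrapped script, and running pdf-ocr FILE.pdf would print the recognized text to standard output.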
29
packages/pdf-ocr.nix
Normal file
@@ -0,0 +1,29 @@
# OCR a PDF file to text using tesseract
{
  writers,
  poppler_utils,
  tesseract,
  coreutils,
}:
writers.writeDashBin "pdf-ocr" ''
  set -eu  # no -f: the globs below rely on pathname expansion

  [ "$#" -eq 1 ] && [ -f "$1" ] || {
    echo "Usage: pdf-ocr FILE.pdf" >&2
    exit 1
  }

  pdf_path="$(${coreutils}/bin/realpath "$1")"

  tmpdir="$(${coreutils}/bin/mktemp -d)"
  trap '${coreutils}/bin/rm -rf "$tmpdir"' EXIT

  cd "$tmpdir"

  ${poppler_utils}/bin/pdftoppm -png "$pdf_path" pdf-ocr  # one pdf-ocr-N.png per page
  for png in pdf-ocr*.png; do
    ${tesseract}/bin/tesseract "$png" "$png" 2>/dev/null  # writes "$png.txt"
  done

  ${coreutils}/bin/cat pdf-ocr-*.txt
''