doc2data


About doc2data

doc2data is a Python library for training deep learning models on document processing tasks.

Currently, models can be trained for four tasks:

  1. Page rotation
  2. Page cropping
  3. Document (multi-page) classification
  4. Token classification

Please note that doc2data is currently in a prototype stage.

Installation

pip install doc2data
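
To confirm that the installation succeeded, the installed version can be looked up with the Python standard library. The following is a minimal sketch that assumes the distribution name is `doc2data`, as in the pip command above; the helper `installed_version` is a hypothetical name introduced here for illustration and is not part of doc2data.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it only queries package metadata
    and does not import the package itself.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "doc2data" is assumed to be the distribution name used on PyPI.
    version = installed_version("doc2data")
    if version is None:
        print("doc2data is not installed in this environment")
    else:
        print(f"doc2data {version} is installed")
```

Because the check reads package metadata only, it works even if importing the library would fail due to a missing optional dependency.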

Documentation

The documentation can be found here.

License

doc2data is distributed under the terms of the Apache-2.0 license.

Credits

Prototypefund, Federal Ministry of Education and Research