Please use this identifier to cite or link to this item:
https://doi.org/10.21256/zhaw-25554
Publication type: | Article in scientific journal |
Type of review: | Peer review (publication) |
Title: | FormulaNet : a benchmark dataset for mathematical formula detection |
Authors: | Schmitt-Koopmann, Felix M.; Huang, Elaine M.; Hutter, Hans-Peter; Stadelmann, Thilo; Darvishy, Alireza |
et al.: | No |
DOI: | 10.1109/ACCESS.2022.3202639; 10.21256/zhaw-25554 |
Published in: | IEEE Access |
Volume(Issue): | 10 |
Page(s): | 91588 |
Pages to: | 91596 |
Issue Date: | 2022 |
Publisher / Ed. Institution: | IEEE |
ISSN: | 2169-3536 |
Language: | English |
Subjects: | Automatic annotation; Dataset; Document analysis; Deep learning; Mathematical formula detection; Page object detection |
Subject (DDC): | 005: Computer programming, programs and data |
Abstract: | One unsolved sub-task of document analysis is mathematical formula detection (MFD). Research by ourselves and others has shown that existing MFD datasets with inline and display formula labels are small and have insufficient labeling quality. There is therefore an urgent need for datasets with better quality labeling for future research in the MFD field, as they have a high impact on the performance of the models trained on them. We present an advanced labeling pipeline and a new dataset called FormulaNet in this paper. At over 45k pages, we believe that FormulaNet is the largest MFD dataset with inline formula labels. Our experiments demonstrate substantially improved labeling quality for inline and display formulae detection over existing datasets. Additionally, we provide a math formula detection baseline for FormulaNet with an mAP of 0.754. Our dataset is intended to help address the MFD task and may enable the development of new applications, such as making mathematical formulae accessible in PDFs for visually impaired screen reader users. |
URI: | https://digitalcollection.zhaw.ch/handle/11475/25554 |
Fulltext version: | Published version |
License (according to publishing contract): | CC BY 4.0: Attribution 4.0 International |
Department: | School of Engineering |
Organisational Unit: | Centre for Artificial Intelligence (CAI); Institute of Applied Information Technology (InIT) |
Appears in collections: | Publikationen School of Engineering |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
2022_SchmittKoopmann-etal_FormulaNet-Benchmark-Dataset-Mathematical-Formula-Detection.pdf | | 1.35 MB | Adobe PDF |
Schmitt-Koopmann, F. M., Huang, E. M., Hutter, H.-P., Stadelmann, T., & Darvishy, A. (2022). FormulaNet : a benchmark dataset for mathematical formula detection. IEEE Access, 10, 91588–91596. https://doi.org/10.1109/ACCESS.2022.3202639
Schmitt-Koopmann, F.M. et al. (2022) ‘FormulaNet : a benchmark dataset for mathematical formula detection’, IEEE Access, 10, pp. 91588–91596. Available at: https://doi.org/10.1109/ACCESS.2022.3202639.
F. M. Schmitt-Koopmann, E. M. Huang, H.-P. Hutter, T. Stadelmann, and A. Darvishy, “FormulaNet : a benchmark dataset for mathematical formula detection,” IEEE Access, vol. 10, pp. 91588–91596, 2022, doi: 10.1109/ACCESS.2022.3202639.
SCHMITT-KOOPMANN, Felix M., Elaine M. HUANG, Hans-Peter HUTTER, Thilo STADELMANN und Alireza DARVISHY, 2022. FormulaNet : a benchmark dataset for mathematical formula detection. IEEE Access. 2022. Bd. 10, S. 91588–91596. DOI 10.1109/ACCESS.2022.3202639
Schmitt-Koopmann, Felix M., Elaine M. Huang, Hans-Peter Hutter, Thilo Stadelmann, and Alireza Darvishy. 2022. “FormulaNet : A Benchmark Dataset for Mathematical Formula Detection.” IEEE Access 10: 91588–96. https://doi.org/10.1109/ACCESS.2022.3202639.
Schmitt-Koopmann, Felix M., et al. “FormulaNet : A Benchmark Dataset for Mathematical Formula Detection.” IEEE Access, vol. 10, 2022, pp. 91588–96, https://doi.org/10.1109/ACCESS.2022.3202639.