Publication type: Article in scientific journal
Type of review: Peer review (publication)
Title: FormulaNet: a benchmark dataset for mathematical formula detection
Authors: Schmitt-Koopmann, Felix M.
Huang, Elaine M.
Hutter, Hans-Peter
Stadelmann, Thilo
Darvishy, Alireza
et al.: No
DOI: 10.1109/ACCESS.2022.3202639
Published in: IEEE Access
Volume(Issue): 10
Page(s): 91588–91596
Issue Date: 2022
Publisher / Ed. Institution: IEEE
ISSN: 2169-3536
Language: English
Subjects: Automatic annotation; Dataset; Document analysis; Deep learning; Mathematical formula detection; Page object detection
Subject (DDC): 005: Computer programming, programs and data
Abstract: One unsolved sub-task of document analysis is mathematical formula detection (MFD). Research by ourselves and others has shown that existing MFD datasets with inline and display formula labels are small and have insufficient labeling quality. There is therefore an urgent need for datasets with better-quality labels for future MFD research, as the quality of a dataset strongly affects the performance of the models trained on it. In this paper, we present an advanced labeling pipeline and a new dataset called FormulaNet. With over 45k pages, FormulaNet is, we believe, the largest MFD dataset with inline formula labels. Our experiments demonstrate substantially improved labeling quality for inline and display formula detection over existing datasets. Additionally, we provide a math formula detection baseline for FormulaNet with an mAP of 0.754. Our dataset is intended to help address the MFD task and may enable new applications, such as making mathematical formulae in PDFs accessible to visually impaired screen reader users.
Fulltext version: Published version
License (according to publishing contract): CC BY 4.0: Attribution 4.0 International
Department: School of Engineering
Organisational Unit: Centre for Artificial Intelligence (CAI)
Institute of Applied Information Technology (InIT)
Appears in collections: Publikationen School of Engineering

Files in This Item:
2022_SchmittKoopmann-etal_FormulaNet-Benchmark-Dataset-Mathematical-Formula-Detection.pdf (1.35 MB, Adobe PDF)
Schmitt-Koopmann, F. M., Huang, E. M., Hutter, H.-P., Stadelmann, T., & Darvishy, A. (2022). FormulaNet: a benchmark dataset for mathematical formula detection. IEEE Access, 10, 91588–91596.
Schmitt-Koopmann, F.M. et al. (2022) ‘FormulaNet: a benchmark dataset for mathematical formula detection’, IEEE Access, 10, pp. 91588–91596. Available at:
F. M. Schmitt-Koopmann, E. M. Huang, H.-P. Hutter, T. Stadelmann, and A. Darvishy, “FormulaNet: a benchmark dataset for mathematical formula detection,” IEEE Access, vol. 10, pp. 91588–91596, 2022, doi: 10.1109/ACCESS.2022.3202639.
SCHMITT-KOOPMANN, Felix M., Elaine M. HUANG, Hans-Peter HUTTER, Thilo STADELMANN and Alireza DARVISHY, 2022. FormulaNet: a benchmark dataset for mathematical formula detection. IEEE Access. 2022. Vol. 10, pp. 91588–91596. DOI 10.1109/ACCESS.2022.3202639
Schmitt-Koopmann, Felix M., Elaine M. Huang, Hans-Peter Hutter, Thilo Stadelmann, and Alireza Darvishy. 2022. “FormulaNet: A Benchmark Dataset for Mathematical Formula Detection.” IEEE Access 10: 91588–96.
Schmitt-Koopmann, Felix M., et al. “FormulaNet: A Benchmark Dataset for Mathematical Formula Detection.” IEEE Access, vol. 10, 2022, pp. 91588–96,
