Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-25554
Full metadata record
DC Field | Value | Language
dc.contributor.author | Schmitt-Koopmann, Felix M. | -
dc.contributor.author | Huang, Elaine M. | -
dc.contributor.author | Hutter, Hans-Peter | -
dc.contributor.author | Stadelmann, Thilo | -
dc.contributor.author | Darvishy, Alireza | -
dc.date.accessioned | 2022-09-01T12:53:11Z | -
dc.date.available | 2022-09-01T12:53:11Z | -
dc.date.issued | 2022 | -
dc.identifier.issn | 2169-3536 | de_CH
dc.identifier.uri | https://digitalcollection.zhaw.ch/handle/11475/25554 | -
dc.description.abstract | One unsolved sub-task of document analysis is mathematical formula detection (MFD). Our own research and that of others has shown that existing MFD datasets with inline and display formula labels are small and have insufficient labeling quality. There is therefore an urgent need for datasets with better-quality labels for future MFD research, because label quality strongly affects the performance of the models trained on them. In this paper, we present an advanced labeling pipeline and a new dataset called FormulaNet. With over 45k pages, we believe that FormulaNet is the largest MFD dataset with inline formula labels. Our experiments demonstrate substantially improved labeling quality for inline and display formula detection compared with existing datasets. Additionally, we provide a math formula detection baseline for FormulaNet with an mAP of 0.754. Our dataset is intended to help address the MFD task and may enable the development of new applications, such as making mathematical formulae in PDFs accessible to visually impaired screen reader users. | de_CH
dc.language.iso | en | de_CH
dc.publisher | IEEE | de_CH
dc.relation.ispartof | IEEE Access | de_CH
dc.rights | http://creativecommons.org/licenses/by/4.0/ | de_CH
dc.subject | Automatic annotation | de_CH
dc.subject | Dataset | de_CH
dc.subject | Document analysis | de_CH
dc.subject | Deep learning | de_CH
dc.subject | Mathematical formula detection | de_CH
dc.subject | Page object detection | de_CH
dc.subject.ddc | 005: Computer programming, programs and data | de_CH
dc.title | FormulaNet: a benchmark dataset for mathematical formula detection | de_CH
dc.type | Article in scientific journal | de_CH
dcterms.type | Text | de_CH
zhaw.departement | School of Engineering | de_CH
zhaw.organisationalunit | Centre for Artificial Intelligence (CAI) | de_CH
zhaw.organisationalunit | Institut für Informatik (InIT) | de_CH
dc.identifier.doi | 10.1109/ACCESS.2022.3202639 | de_CH
dc.identifier.doi | 10.21256/zhaw-25554 | -
zhaw.funding.eu | No | de_CH
zhaw.originated.zhaw | Yes | de_CH
zhaw.pages.end | 91596 | de_CH
zhaw.pages.start | 91588 | de_CH
zhaw.publication.status | publishedVersion | de_CH
zhaw.volume | 10 | de_CH
zhaw.publication.review | Peer review (publication) | de_CH
zhaw.funding.snf | 194677 | de_CH
zhaw.webfeed | Machine Perception and Cognition | de_CH
zhaw.webfeed | Human-Centered Computing | de_CH
zhaw.author.additional | No | de_CH
zhaw.display.portrait | Yes | de_CH
Appears in collections: Publications School of Engineering

Files in This Item:
File | Description | Size | Format
2022_SchmittKoopmann-etal_FormulaNet-Benchmark-Dataset-Mathematical-Formula-Detection.pdf | | 1.35 MB | Adobe PDF
Schmitt-Koopmann, F. M., Huang, E. M., Hutter, H.-P., Stadelmann, T., & Darvishy, A. (2022). FormulaNet: a benchmark dataset for mathematical formula detection. IEEE Access, 10, 91588–91596. https://doi.org/10.1109/ACCESS.2022.3202639
Schmitt-Koopmann, F.M. et al. (2022) ‘FormulaNet: a benchmark dataset for mathematical formula detection’, IEEE Access, 10, pp. 91588–91596. Available at: https://doi.org/10.1109/ACCESS.2022.3202639.
F. M. Schmitt-Koopmann, E. M. Huang, H.-P. Hutter, T. Stadelmann, and A. Darvishy, “FormulaNet: a benchmark dataset for mathematical formula detection,” IEEE Access, vol. 10, pp. 91588–91596, 2022, doi: 10.1109/ACCESS.2022.3202639.
SCHMITT-KOOPMANN, Felix M., Elaine M. HUANG, Hans-Peter HUTTER, Thilo STADELMANN and Alireza DARVISHY, 2022. FormulaNet: a benchmark dataset for mathematical formula detection. IEEE Access. 2022. Vol. 10, pp. 91588–91596. DOI 10.1109/ACCESS.2022.3202639
Schmitt-Koopmann, Felix M., Elaine M. Huang, Hans-Peter Hutter, Thilo Stadelmann, and Alireza Darvishy. 2022. “FormulaNet: A Benchmark Dataset for Mathematical Formula Detection.” IEEE Access 10: 91588–96. https://doi.org/10.1109/ACCESS.2022.3202639.
Schmitt-Koopmann, Felix M., et al. “FormulaNet: A Benchmark Dataset for Mathematical Formula Detection.” IEEE Access, vol. 10, 2022, pp. 91588–96, https://doi.org/10.1109/ACCESS.2022.3202639.

