Layout-aware text extraction from full-text PDF of scientific articles
Abstract
The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the 'Layout-Aware PDF Text Extraction' (LA-PDFText)...
Paper Details
Title
Layout-aware text extraction from full-text PDF of scientific articles
Published Date
May 28, 2012
Volume
7
Issue
1
Citation AnalysisPro
You’ll need to upgrade your plan to Pro
Looking to understand the true influence of a researcher’s work across journals & affiliations?
- Scinapse’s Top 10 Citation Journals & Affiliations graph reveals the quality and authenticity of citations received by a paper.
- Discover whether citations have been inflated due to self-citations, or if citations include institutional bias.
Notes
History