Layout-aware text extraction from full-text PDF of scientific articles

Volume: 7, Issue: 1
Published: May 28, 2012
Abstract
The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the 'Layout-Aware PDF Text Extraction' (LA-PDFText)...