

There are various options for reading PDF files using GemBox components and each has its advantages and is suitable in different scenarios.

Since PDF is a fixed document format (the location of every text, border line, background fill, etc. is specified in page coordinates and is, potentially, transformed) and the GemBox.Document model is a flow document format (like HTML, for example), reading a PDF file into GemBox.Document requires that elements such as Paragraph and Table be recognized from PDF-positioned text and lines/paths.

The recognition of PDF logical structure in GemBox.Document is based on various heuristics that we have implemented and plan to improve and extend over time based on customer feedback. However, note that a fully correct recognition is impossible to achieve just by reading the content of PDF pages, because higher level information is required to disambiguate certain cases. For example, a PDF page with text in two columns could be a table with a single row and two cells or a section with two columns. Or, a PDF page with a single small line of text in the middle of it could be a paragraph with left alignment and left indentation, right alignment and right indentation, or some other combination.

For an example, see the extract text from PDF example.

GemBox.Document also supports reading encrypted PDF files. For an example, see the PDF encryption example.

High fidelity loading uses text frames to position the text in the same location on a page as it appeared in the PDF page. The PDF page graphics are converted to shapes or rendered into temporary images that are then inserted into a page.

Although the output of this approach looks very similar or identical to the input PDF, it has the following drawbacks:

The logical structure of the document is not available - For example, if you have a table in a PDF file and you want to extract the content of a cell in the second row and third column, that is not possible since there is no table.
Text search is limited - Since logically connected text segments might end up in different text frames, looking for a term which spans two or more text frames is not possible.
Editing is limited - Since text segments are absolutely positioned on a page using text frames, removing text or adding new text doesn't reflow the rest of the content, because the positions of all text frames are independent of each other.

For an example, see the convert PDF to DOCX example.

Loading using GemBox.Pdf

Alternatively, you can use the GemBox.Pdf component, which gives you lower level access to the PDF elements, more control over editing the document, and more precise information when extracting content and properties of PDF elements.
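
The two GemBox.Document loading modes described above can be sketched roughly as follows. This is a minimal illustration, not the official sample: the file names input.pdf and output.docx are hypothetical, and the sketch assumes that high fidelity loading is selected through a PdfLoadOptions.LoadType property set to PdfLoadType.HighFidelity, which should be verified against the API reference of the installed GemBox.Document version.

```csharp
using System;
using GemBox.Document;

class Program
{
    static void Main()
    {
        // Free-mode key; replace it with your own license key if you have one.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Hypothetical input file, used only for illustration.
        // Default loading: paragraphs and tables are recognized heuristically,
        // so the result is a flow document whose text can be read in order.
        DocumentModel recognized = DocumentModel.Load("input.pdf");
        Console.WriteLine(recognized.Content.ToString());

        // High fidelity loading (assumed option names, see the note above):
        // text is kept in absolutely positioned text frames instead of
        // being recognized as paragraphs and tables.
        var options = new PdfLoadOptions { LoadType = PdfLoadType.HighFidelity };
        DocumentModel positioned = DocumentModel.Load("input.pdf", options);

        // Hypothetical output file; the DOCX should visually resemble the PDF.
        positioned.Save("output.docx");
    }
}
```

The first load is the better fit when the text is going to be extracted or edited, while the second is intended for conversions where the visual match with the source PDF matters more than the document structure.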
GemBox.Document supports most of the Microsoft Word Document (DOCX) and Word 97-2003 Document (DOC) features through its API, but not all.

For example, GemBox.Document doesn't support macros and smart arts through its API.

Although not supporting all Microsoft Word Document (DOCX) features through its API, GemBox.Document allows you to preserve the unsupported features, so you don't lose any relevant document content when loading and saving a document to DOCX format.

For more information, see Preservation and Preservation example.

// Assign a DocumentModel instance to DocumentViewer control.
DocumentViewer.Document = document.ConvertToXpsDocument(SaveOptions.XpsDefault).GetFixedDocumentSequence();

// Assign a DocumentModel instance to Image control.
Image.Source = document.ConvertToImageSource(SaveOptions.ImageDefault);

' Assign a DocumentModel instance to DocumentViewer control.
DocumentViewer.Document = document.ConvertToXpsDocument(SaveOptions.XpsDefault).GetFixedDocumentSequence()

' Assign a DocumentModel instance to Image control.
Image.Source = document.ConvertToImageSource(SaveOptions.ImageDefault)
