
Yes, ABBYY FineReader does OCR (Optical Character Recognition). It can create PDFs that use the input images as the page images, with a layer of recognized text behind them, so that the text in the PDF is searchable. OmniPage by Nuance is another option; I think it is the main competitor to ABBYY FineReader.
Vectorize PDF
Adobe ClearScan, as noted above, can produce excellent quality output and very small file sizes, but it requires quite high-resolution images (ideally around 600 DPI) in order to work well. Lower-resolution images can still be used satisfactorily, subject to the OCR recognition accuracy achieved and the time that can be spent on proof-reading.

Another approach to producing a PDF with vector text, not mentioned above, is to use the option in ABBYY FineReader or Nuance OmniPage to output pages built from standard vector fonts, as would be used in a word processor document. That option can also produce excellent quality output and minimal file sizes, but a practical limitation is that it will generally be difficult to keep the original page layout, especially when that layout is complex. Even if fonts that nominally match the original ones are available, line breaks, for example, are likely to change, and preserving the layout of a complex page is likely to require a lot of effort. If preserving the layout of the original pages is not important, it could be a good option.

Do you know if ABBYY does this? So far I have only used ABBYY to convert from PDF to Microsoft Word.

Vectorize PDF: code

At the moment, I am not aware of an internal command or function in the latest stable release (0.48.1) that does what you ask (vectorize embedded fonts when importing a PDF). Maybe the next stable version (0.49) will include such an option (it was certainly planned, judging by the dropdown list in the PDF import dialog); maybe you'll need to wait for a later release. The feature has not been omitted from Inkscape intentionally, but the currently active core of the developer team is small and, as with other open source projects, contributions are often driven by a developer's personal interest in fixing or adding certain features.

The request is known, and it needs to be addressed by writing code based on Inkscape's current PDF import routines (which use the shared external Poppler and Cairo libraries). You are welcome to implement the missing feature in the current code base.

Netheril96 wrote: Sadly, what I want is exactly to convert text to paths based on the embedded fonts of the PDF file. There is in fact a tiny program, pdf2svg ( ), based on Cairo and Poppler that is able to do this, but it is restricted to Linux and I don't want to switch to Linux every time I convert a PDF to SVG. Maybe the developers could just incorporate the code of pdf2svg into Inkscape.

Such a command could be daisy-chained as an external script in an Inkscape input extension.

Netheril96 wrote: but it is restricted to Linux and I don't want to switch to Linux every time I convert a PDF to SVG.

AFAICS you never said which OS/platform you are working on (and need a solution for)… I was talking about Inkscape, which is multi-platform. Sorry for thinking you might be interested in interim solutions.

Edited later: apologies again; I know this phrase is not helpful to users looking for a cross-platform solution.
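As a rough illustration of the daisy-chaining idea, here is a minimal Python sketch of a wrapper that an input extension might call. It assumes pdf2svg is installed and on the PATH, and it assumes an extension contract in which Inkscape passes the input file path as the last command-line argument and reads the converted SVG from the script's standard output. The function names are hypothetical, only one page is converted, and the matching .inx descriptor file is not shown.

```python
#!/usr/bin/env python3
"""Hypothetical helper for an Inkscape input extension: convert one PDF page
to SVG with pdf2svg and write the SVG to stdout (assumed extension contract)."""
import os
import subprocess
import sys
import tempfile


def pdf2svg_command(pdf_path, svg_path, page=1):
    # pdf2svg usage: pdf2svg <in file.pdf> <out file.svg> [<page no>]
    return ["pdf2svg", pdf_path, svg_path, str(page)]


def convert_to_stdout(pdf_path, page=1):
    # pdf2svg writes to a file, so convert into a temporary directory first.
    with tempfile.TemporaryDirectory() as tmp:
        svg_path = os.path.join(tmp, "page.svg")
        subprocess.run(pdf2svg_command(pdf_path, svg_path, page), check=True)
        with open(svg_path, "rb") as fh:
            sys.stdout.buffer.write(fh.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: the input file path is the last command-line argument.
    convert_to_stdout(sys.argv[-1])
```

Running it by hand (for example `python pdf_via_pdf2svg.py input.pdf > output.svg`, script name hypothetical) is a quick way to check the conversion before wiring it into an extension. Because pdf2svg renders through Cairo's SVG output, glyphs should come out as paths rather than as text elements.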

Vectorize PDF: install
If you can install a recent development snapshot (0.48+devel), try opening the PDF file from within Inkscape as "Adobe PDF via cairo-poppler (*.pdf)" (note: this is experimental and work in progress, and might not be included in the next stable release). AFAICT it will not convert text to paths based on the fonts embedded in the PDF file, but it does use installed fonts if available: the imported text is created as paths (clones linked to glyph paths as 's in the section).

Alternatively, write a script that uses Ghostscript to convert the text in PDF files into outlines before opening them in Inkscape (a sample command line is here; a quick Google search will return many other examples).
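A minimal Python sketch of such a script, assuming a Ghostscript release that supports the pdfwrite option `-dNoOutputFonts` (added in Ghostscript 9.15, so newer than the Inkscape versions discussed here), which draws glyphs as filled paths instead of embedding fonts. The helper names are hypothetical; on Windows the console executable is usually `gswin64c` or `gswin32c` rather than `gs`.

```python
import subprocess
import sys


def outline_text_command(src_pdf, dst_pdf, gs="gs"):
    """Build a Ghostscript command that rewrites src_pdf with all text as outlines."""
    # -o sets the output file and implies -dBATCH -dNOPAUSE.
    # -dNoOutputFonts makes pdfwrite emit glyphs as vector paths, not fonts.
    return [gs, "-o", dst_pdf, "-dNoOutputFonts", "-sDEVICE=pdfwrite", src_pdf]


def outline_text(src_pdf, dst_pdf, gs="gs"):
    # Raises CalledProcessError if Ghostscript reports a failure.
    subprocess.run(outline_text_command(src_pdf, dst_pdf, gs), check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python outline_pdf.py input.pdf outlined.pdf
    outline_text(sys.argv[1], sys.argv[2])
```

The outlined PDF can then be opened with Inkscape's regular PDF import; since it no longer contains fonts, the text should arrive as paths. Ghostscript runs on Windows, macOS and Linux, so this route is not tied to one platform.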

Netheril96 wrote: I need Inkscape to automatically vectorize all text upon importing.

This is not yet supported by the current stable release (but it is a known feature request).
