Introduction:
Hello folks! Once again, this tutorial is about building special-purpose programs for image processing: OCR, barcode reading, grey-scaling, and so on. Strictly speaking, this guide does not belong in the techno-blog, but I am putting it here to spare myself another headache. It has been a while since I last debugged the exact-image libraries, so I had forgotten the how-tos needed for a quick set-up, and I did not bother taking notes back then. Recently, when I upgraded my deployed OS to a newer version, I did not expect that compiling those object modules and libraries would take me almost 3 days, which was a real headache. This time I have learned that it is bad practice to skip even small notes (patches, revisions, repositories, and so on) about the files patched into a program, especially when the software is free.
Anyway, let me share the usefulness of the ExactCode software. The software is a fast, modern and generic image processing library. It includes codecs that allow library users to implement their own data sources and destinations, such as in-memory locations or network transfers. It is known as a viable alternative to ImageMagick. The needed code was prototyped in C++, just for speed, and achieved processing times of about 1/20th of what ImageMagick consumed. It also explores several new algorithms, e.g. for de-screening, data-dependent triangulation scaling, lossless JPEG transforms and others needed for fast image processing.
Below are the instructions for installing and building exactimage, which includes programs for fast image processing. I have also attached a video showing how the OCR program (hocr2pdf) makes text searchable in a PDF viewer. You can run each program from the command line; the how-tos and instructions are in the Testing portion of this post. You may cut and paste the included examples and see for yourself whether it does its job better than expected. But hey, don't forget to jot down some notes before you forget the procedure, otherwise you will have a headache when you want to try it again in the future. Just a piece of advice, folks!
Requirements:
Linux OS: Fedora 18, 64-bit
Server, Core i7
ExactCode image processing library
Cuneiform (installed)
Tesseract (installed)
Methodology:
Download
root@localhost# wget http://exactcode.de/exact-image.0.8.x.tar.bz2
root@localhost# svn co https://exactcode.de/exact-image/trunk exact-image.8.x
Installations:
root@localhost# yum install \
gcc gcc-c++ libstdc++ \
libXrender libXrender-devel \
libaa libaa-devel \
libX11 libX11-devel \
agg agg-devel \
freetype2 freetype2-devel \
evas evas-devel \
libjpeg libjpeg-devel \
libtiff libtiff-devel \
libpng libpng-devel \
libungif libungif-devel \
jasper jasper-devel \
expat expat-devel \
openexr openexr-devel \
lcms lcms-devel \
barcode barcode-devel \
swig swig-devel \
lua lua-devel \
perl perl-devel perl-ExtUtils-Embed \
php php-devel \
python python-devel \
ruby ruby-devel
root@localhost# tar -jxvf exact-image.8.x.tar.bz2
root@localhost# cd exact-image.8.x/
root@localhost# ./configure --prefix=/usr/local/scanner
root@localhost# make && make install
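Before moving on to testing, it can help to confirm that the tools are actually reachable from the shell. The following is only a minimal sketch: it assumes the build placed its programs under the bin directory of the prefix used above (/usr/local/scanner), and the PREFIX variable and the check_tool helper are names made up for this illustration.

```shell
#!/bin/sh
# Assumption: ./configure --prefix=/usr/local/scanner was used, so the
# programs are expected under $PREFIX/bin. PREFIX is a hypothetical
# variable for this sketch; override it if you installed elsewhere.
PREFIX="${PREFIX:-/usr/local/scanner}"
PATH="$PREFIX/bin:$PATH"
export PATH

# check_tool NAME: report whether NAME can be found on PATH.
# Returns 0 when found, 1 when missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# The three programs this guide relies on.
for tool in hocr2pdf cuneiform tesseract; do
    check_tool "$tool" || true
done
```

If hocr2pdf shows up as missing, the PATH line is the first thing to check, since a non-standard prefix is not searched by default.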
Testing:
Note:
This CLI-based program can create a searchable PDF from hOCR input.
hocr2pdf is a command-line front-end for the image processing library that creates perfectly laid-out, searchable PDF files from hOCR (annotated HTML) input obtained from an OCR system.
(1) The hOCR (annotated HTML) input must be provided on STDIN, and the image data is read from the file named by the -i or --input argument. For example:
root@localhost# hocr2pdf -i scan.tiff -o test.pdf < cuneiform-out.hocr
(2) By default the text layer is hidden by the real image data. Including the image data can be disabled with the -n (--no-image) option, so that only the recognized text from the OCR is visible, e.g. for debugging or to save storage space:
root@localhost# hocr2pdf -i scan.tiff -n -o test.pdf < cuneiform-out.hocr
(3) Imprecise OCR data or justified text can leave large gaps between the letters of individual words. hocr2pdf in ExactImage includes a special mode, activated with the command-line argument -s (--sloppy-text), that groups the glyphs between whitespace into words, which can help PDF viewers produce better results when cutting and pasting text:
root@localhost# hocr2pdf -i scan.tiff -s -o test.pdf < cuneiform-out.hocr
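The three invocations above differ only in one flag, so they can be wrapped in a small helper. This is just a sketch under stated assumptions: make_pdf, its mode names (normal, text-only, sloppy) and the HOCR2PDF variable are hypothetical names invented here; only the -i, -o, -n and -s options come from the examples above.

```shell
#!/bin/sh
# HOCR2PDF is a hypothetical override (e.g. a full path to the binary);
# by default the program is looked up on PATH.
HOCR2PDF="${HOCR2PDF:-hocr2pdf}"

# usage: make_pdf MODE IMAGE HOCR OUTPUT
#   MODE: normal    - image with hidden text layer (case 1)
#         text-only - no image, recognized text only, -n (case 2)
#         sloppy    - group glyphs into words, -s (case 3)
make_pdf() {
    mode=$1
    image=$2
    hocr=$3
    output=$4
    case "$mode" in
        normal)    flag="" ;;
        text-only) flag="-n" ;;
        sloppy)    flag="-s" ;;
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
    # Fail early with a clear message instead of a confusing tool error.
    [ -r "$image" ] || { echo "cannot read image: $image" >&2; return 1; }
    [ -r "$hocr" ]  || { echo "cannot read hOCR: $hocr" >&2; return 1; }
    # $flag is intentionally unquoted: an empty flag adds no argument.
    "$HOCR2PDF" -i "$image" $flag -o "$output" < "$hocr"
}
```

For example, make_pdf sloppy scan.tiff cuneiform-out.hocr test.pdf runs the same command as case (3), with the hOCR file fed to STDIN.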
Details:
(0) Exact-Image.8.8 files
(1) Plain image (TIFF file)
(2) hOCR-generated text
(3) hocr2pdf script, which is called for every OCR run
(4) OCR-searchable text in the PDF viewer
Remarks:
(1) Troubles:
png error [1]
Shooting:
Note: the libpng12 code in ExactImage is deprecated and causes a bug during compilation, so it is better to delete "png.hh" and "png.cc":
root@localhost# cd /codecs
root@localhost# rm -rf png.*
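Since the whole point of this post is not losing track of what was patched, a reversible variant of the step above may be worth considering. The sketch below moves the two files aside instead of deleting them. It assumes the codecs directory sits inside the unpacked source tree; set_aside_png and the codecs.disabled directory are hypothetical names used only for this illustration, and I have not verified that the build behaves identically to the rm approach.

```shell
#!/bin/sh
# usage: set_aside_png SRC_DIR
# Moves png.hh and png.cc out of SRC_DIR/codecs into
# SRC_DIR/codecs.disabled (a hypothetical backup directory),
# so the change can be undone by moving the files back.
set_aside_png() {
    src=$1
    [ -d "$src/codecs" ] || { echo "no codecs directory in: $src" >&2; return 1; }
    mkdir -p "$src/codecs.disabled" || return 1
    for f in png.hh png.cc; do
        if [ -f "$src/codecs/$f" ]; then
            mv "$src/codecs/$f" "$src/codecs.disabled/$f" || return 1
            echo "moved $f"
        fi
    done
}
```

It would be called with the source directory as its argument, e.g. set_aside_png exact-image.8.x, before running make again.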
(2) Troubles
Video (OCR processing)