# PDF challenge



## xenou (Jun 24, 2010)

Hi,
I've got some co-workers using Acrobat in a long-handed way.  They are going through these steps which will be repeated about 14,000 times:

1.  Search one long pdf document for a name
2.  Print the page
3.  Repeat (14,000 times)

The search alone takes about 10 seconds on average, so with all the clicking and waiting for search results it adds up to a fair bit of time.

I'm not at all familiar with coding to Adobe's object model.  Does anyone know if it's possible to automate these steps, given a list of names? (They come in about 100 at a time in Excel, to be matched and printed.)  Naturally, the names will not always match perfectly, so we'd probably need to print every page with a matching name and sort out the duplicates afterwards (common names will likely have two or more matches).

Any ideas welcome.  Cheers,
xenou


----------



## Rekd (Jun 25, 2010)

xenou said:


> Hi,
> I've got some co-workers using Acrobat in a long-handed way. They are going through these steps which will be repeated about 14,000 times:
> 
> 1. Search one long pdf document for a name
> ...


 
I'm not sure about reading PDF data (it can be done, but I think you have to buy a program to do it), but if it's taking 10 seconds to search and you've got 14,000 searches to do, the searches alone would take about 39 hours (14,000 × 10 s = 140,000 s), or roughly 5 eight-hour working days.

Talk about job security!


----------



## xenou (Jun 26, 2010)

I scraped something together here.  One item that was mildly problematic is that the PDF search sometimes seems _not_ to go to the page where the text was found.  And if the search didn't "select" the found text, then I don't have the page number(!).  I haven't been able to nail down the cause yet, but it may be something weird like running the code in the VBE window.

Also, search results sometimes show up in a "treeview" with the document still at the top of the tree, rather than the search result being shown immediately.  I couldn't figure out why there are two ways of presenting the result (I tried the search options in the Acrobat program, but no joy).

BTW, I did find a post where the search was done "page by page" - that may be the best way to do it.  I decided to close and re-open the file each time as a precaution, since I don't really know what Acrobat does with searches and so on.

This is a stripped down version of the code.  I found two search methods, one a method of the avDoc and one a method of the JSO interface.  They seem to do the same thing - I decided on the latter.


```
Public Sub PrintCNA()

    Dim gApp As Acrobat.CAcroApp            '//Acrobat app
    Dim avDoc As Acrobat.CAcroAVDoc         '//Visible pdf document with a UI Window
    Dim pdDoc As Acrobat.CAcroPDDoc         '//Underlying pdf document
    Dim avView As Acrobat.CAcroAVPageView   '//For access to page numbers
    Dim jso As Object                       '//Javascript interface
    Dim i As Long, pgNum As Long
    Dim tStart As Date

    '//Stripped down: b (the array of search terms), sMasterFullFilePath
    '//and sMasterFileName are set up in the part of the code not shown

    Set gApp = CreateObject("AcroExch.App")
    Set avDoc = CreateObject("AcroExch.AVDoc")
    gApp.Show

    For i = 1 To UBound(b)

        Call avDoc.Open(sMasterFullFilePath, sMasterFileName)
        Set avView = avDoc.GetAVPageView()
        Set pdDoc = avDoc.GetPDDoc()
        Set jso = pdDoc.GetJSObject

        '//Search for ID Numbers
        pgNum = 0
        'Call avDoc.FindText(b(i), False, False, True)
        Call jso.Search.query(b(i), "ActiveDoc")

        '//Wait for the view to move off page 0, but give up after 30 seconds
        '//(arbitrary limit) so a number that is never found can't hang the loop.
        '//Note: a hit on the first page (page 0) is also treated as not found.
        tStart = Now
        Do While pgNum = 0 And DateDiff("s", tStart, Now) < 30
            DoEvents
            pgNum = avView.GetPageNum
        Loop
        If pgNum > 0 Then
            Call avDoc.PrintPages(pgNum, pgNum, 2, False, False)
        End If

        avDoc.Close (True) '//close w/o saving changes
        pdDoc.Close

    Next i

    gApp.Exit

End Sub
```
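
For comparison, here is a minimal sketch of the "page by page" idea mentioned above. It is untested against the real file and rests on some assumptions: Acrobat (not just Reader) is installed, the JSO methods `getPageNumWords` and `getPageNthWord` are available through `GetJSObject`, and the function name `FindPagesWithName` and its parameters are made up for illustration. It rebuilds the text of each page from its words and returns every page that contains the name, so nothing depends on where the viewer happens to jump.

```
'//Sketch only: returns the 0-based numbers of all pages whose text contains sName
Public Function FindPagesWithName(ByVal sPdfPath As String, ByVal sName As String) As Collection

    Dim pdDoc As Object     '//AcroExch.PDDoc, late bound (no UI window needed)
    Dim jso As Object       '//Javascript interface
    Dim found As New Collection
    Dim p As Long, w As Long, nWords As Long
    Dim sPage As String

    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If pdDoc.Open(sPdfPath) Then
        Set jso = pdDoc.GetJSObject
        For p = 0 To pdDoc.GetNumPages() - 1
            '//Rebuild the page text as space-separated words
            sPage = ""
            nWords = jso.getPageNumWords(p)
            For w = 0 To nWords - 1
                sPage = sPage & jso.getPageNthWord(p, w, True) & " "
            Next w
            '//Case-insensitive substring test, so multi-word names can match
            If InStr(1, sPage, sName, vbTextCompare) > 0 Then found.Add p
        Next p
        pdDoc.Close
    End If

    Set FindPagesWithName = found

End Function
```

Each page number in the returned collection could then be passed to `avDoc.PrintPages` as in the code above. Reading every word on every search would likely be slow on a long document, so with thousands of names it may be better to read each page's text once, keep it in an array, and test all the names against that.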


----------



## gg89 (May 29, 2012)

Do you have the original C# code?
What is b in UBound(b)? Is it a parameter, e.g. string[] b?
Any hint about selecting all the text, or going page by page without searching?

I looked at the Adobe Acrobat site and the object browser for Acrobat, but I just don't know how to make the leap from getting the object to getting the text.


----------



## gg89 (May 30, 2012)

For those who want to extract all words from all pages of a PDF document without searching, here is a slightly modified version of the original post in the Adobe forum by Eldrarak82:


> private static string PdDocGetText(AcroPDDoc pdDoc)
> {
> AcroPDPage page;
> int pages = pdDoc.GetNumPages();
> ...


----------

