How to use Web Magnify and GSA to categorize text, PDF, or Word documents

May 28, 2008, 10:18 PM

Hi,

Are there any guidelines, examples, or manuals from IBI on how to use Web Magnify with PDF, Word, or text files?

We want to feed these files to the GSA, and we would like to use Web Magnify to categorize these PDF documents.

So far we have been unable to find any IBI reference manual that explains how to do this.

Thank you for any pointers you can provide.

Regards & Thanks

May 29, 2008, 04:09 PM

IBAdam

Magnify can index files that are stored either in a database or in a document management system. Do you meet these criteria?

If not, we are working on a method to index files contained in a file-system directory structure.

When do you need such a feature, and can we talk to discuss this in more detail?

May 29, 2008, 09:33 PM

IBAdam

Also, if you are using the Google Search Appliance, you can look into its crawl feature to add various files to your index.

This will require working in conjunction with your Google Search Appliance Administrator.

Start by looking at the documentation located at:

Administering Crawl for Web and File Share Content: Introduction
http://code.google.com/apis/searchappliance/documentati...wl/Introduction.html

Administering Crawl for Web and File Share Content: Preparing for a Crawl
http://code.google.com/apis/searchappliance/documentati...wl/Introduction.html

May 30, 2008, 11:12 AM

MoonLightWare

Indexing a text file should be fairly easy: you could read the text file in, wrap its contents in an XML tag, and then make that part of the HTML body.
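A minimal sketch of that idea in Python follows. It is illustrative only: the wrapper tag name `document` and the function names are hypothetical placeholders, not anything Magnify or the GSA is known to require. The file contents are HTML-escaped so that characters such as `<` or `&` in the text cannot break the surrounding markup.

```python
import html
from pathlib import Path


def wrap_text_in_html(text, title, tag="document"):
    """Wrap plain text in an XML-style tag inside an HTML body.

    The default tag name "document" is a hypothetical placeholder; use
    whatever tag your Magnify/GSA setup actually expects.
    """
    # Escape <, >, & and quotes so the text cannot break the markup.
    safe_title = html.escape(title)
    safe_body = html.escape(text)
    return (
        "<html>\n"
        f"<head><title>{safe_title}</title></head>\n"
        "<body>\n"
        f"<{tag}>{safe_body}</{tag}>\n"
        "</body>\n"
        "</html>\n"
    )


def text_file_to_html(path, tag="document"):
    """Read a UTF-8 text file and return it wrapped as an HTML page."""
    p = Path(path)
    return wrap_text_in_html(p.read_text(encoding="utf-8"), p.name, tag)
```

For example, `text_file_to_html("notes.txt")` returns a page whose body holds the escaped contents of `notes.txt` inside `<document>...</document>`; the resulting HTML could then be written to a location the GSA crawls.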

May 30, 2008, 02:27 PM

MNMH

Adam,

Thank you for responding to my inquiries.

The files we referred to are in Windows Server 2003 directories; they are not in a database or any document management system.

We have already successfully set up the GSA to crawl the PDFs, Word files, etc. described above. The problem is that we are not able to group the "hits" in the category tree, so we are not really taking advantage of the Web Magnify categorization feature. Assuming we already know the classification/categories for each of our PDF files, what we are interested in is a way to prepare the HTML with the enriched metadata values for each PDF file (as shown in the Magnify installation document, except that the example there uses database records). Another scenario we are interested in is finding a way (probably using iWay) to do "automatic categorization" of a document before feeding it to the GSA; this seems difficult to achieve at this point.
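For the first scenario (categories already known per PDF), one possible approach is to generate a small HTML page per PDF that carries the categories as `<meta>` tags and links to the PDF, and have the GSA crawl those pages. This is a sketch under assumptions: it assumes the Magnify category tree can be driven by metadata exposed as HTML `<meta>` tags, and the meta names used here (`category`, `department`), the URLs, and the function names are all hypothetical; substitute the names given in the Magnify installation document.

```python
import html
from pathlib import Path


def build_metadata_page(pdf_url, title, metadata):
    """Return an HTML stub page that carries metadata for one PDF and links to it.

    `metadata` maps meta names to values; a list/tuple value produces one
    <meta> tag per entry. Meta names are hypothetical placeholders.
    """
    tags = []
    for name, value in metadata.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            tags.append(
                f'<meta name="{html.escape(name)}" '
                f'content="{html.escape(str(v))}">'
            )
    safe_title = html.escape(title)
    safe_url = html.escape(pdf_url)
    return (
        "<html>\n<head>\n"
        f"<title>{safe_title}</title>\n"
        + "\n".join(tags)
        + "\n</head>\n<body>\n"
        f'<a href="{safe_url}">{safe_title}</a>\n'
        "</body>\n</html>\n"
    )


def write_metadata_pages(catalog, out_dir):
    """Write one stub page per PDF.

    `catalog` maps a PDF URL to a (title, metadata) pair, e.g. taken from
    an existing spreadsheet or list of known classifications.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, (pdf_url, (title, metadata)) in enumerate(sorted(catalog.items())):
        page = build_metadata_page(pdf_url, title, metadata)
        (out / f"doc_{i:05d}.html").write_text(page, encoding="utf-8")
```

For example, `build_metadata_page("http://server/docs/a.pdf", "Report A", {"category": ["Finance", "2008"]})` yields a page with two `category` meta tags. One design consequence to check: the stub page and the PDF are separate documents to the crawler, so the metadata attaches to the stub page's hit rather than to the PDF's own hit.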

I hope this clarifies what we intend to use Web Magnify for in this case. Thank you, and have a very successful Summit conference.

Regards
Michael