Tuesday, February 1, 2011

Building a Web Search Engine

Introduction:

I often think about Google, Yahoo and the other web search engines, though honestly I never had a deeper idea of how one would be developed. Then, little by little, the subject happened to land in my own scope of work. Yes, I am getting the idea that if I could run a Zebra server together with its client (YAZ, to be particular), I would have a simple web search engine, though I think it isn't that obvious. Ohhhm... pretty cool; of course that requires authentication and permission on the end (peer) server, and that is how these big web search engines do it (am I correct?).

For this, we can refer to the topic "Installing Zebra (Z39.50 protocol)" that I blogged recently.

Apparently, I have found out (whew!) that we need another party to browse it via HTTP (getting data via the HTTP protocol is a new feature of YAZ 4.1.x, we'll see about it, brother!), and this is the job of the SOLR application software. SOLR is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. SOLR is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

SOLR is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. SOLR uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. SOLR's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.
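To make the "REST-like HTTP/XML and JSON APIs" part a bit more concrete, here is a minimal sketch in Python of what talking to SOLR over HTTP could look like. It only builds the query URL and reads a JSON answer; it does not contact any server. The base URL (http://localhost:8080/solr, as if SOLR were deployed under Tomcat), the field name title and the sample response are my own assumptions for illustration, not something taken from a real installation.

```python
import json
from urllib.parse import urlencode


def build_solr_select_url(base_url, query, rows=10):
    """Build a URL for SOLR's select handler, asking for JSON output."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def titles_from_response(body):
    """Pull the 'title' field out of each document in a SOLR JSON response."""
    data = json.loads(body)
    return [doc.get("title") for doc in data["response"]["docs"]]


# Hand-written sample in the shape of a SOLR JSON response (not real output).
SAMPLE_RESPONSE = (
    '{"responseHeader": {"status": 0},'
    ' "response": {"numFound": 1, "start": 0,'
    ' "docs": [{"id": "1", "title": "Zebra manual"}]}}'
)

if __name__ == "__main__":
    # Assumed base URL; adjust host, port and path to your own setup.
    print(build_solr_select_url("http://localhost:8080/solr", "title:zebra"))
    print(titles_from_response(SAMPLE_RESPONSE))
```

The URL printed by the first call can be pasted into a browser once SOLR is actually running, which is the "browse it via HTTP" part we are after.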

Now let's try whether this makes a straightforward approach to developing a simple search engine. Again, what we need is a server to host the records, such as the metadata/indexes, and a client to communicate with the server using the Z39.50 protocol.
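As a rough sketch of that server/client conversation, the small Python helper below writes out the commands one would type into yaz-client (open, find, show, quit) to search a Zebra server. The host localhost, the port 9999, the database name Default and the title query are all assumptions of mine; the helper only builds the text and does not run yaz-client itself.

```python
def yaz_session(host, port, database, pqf_query, count=1):
    """Return the yaz-client commands for one search as a single string.

    pqf_query is a query in YAZ's prefix query format, for example
    '@attr 1=4 utah' (Bib-1 use attribute 4 means title).
    """
    lines = [
        f"open tcp:{host}:{port}/{database}",  # connect to the Z39.50 server
        f"find {pqf_query}",                   # run the search
        f"show 1+{count}",                     # fetch 'count' records from hit 1
        "quit",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed values; change them to match your own Zebra configuration.
    print(yaz_session("localhost", 9999, "Default", "@attr 1=4 utah"), end="")
```

The printed lines can be typed one by one at the yaz-client prompt to check that the Zebra server answers.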

That is it for now.


Requirements:
Installed Zebra server
http://ftp.indexdata.dk/pub/zebra/idzebra-2.0.45.tar.gz
Configured Zebra server
Tested loading MARC records into the Zebra server
Installed YAZ client
http://ftp.indexdata.dk/pub/yaz/yaz-4.1.3.tar.gz
Configured YAZ client
Searched the Zebra server's indexed MARC records
Installed SOLR (new)
http://ftp.wayne.edu/apache//lucene/solr/1.4.1/apache-solr-1.4.1.tgz
Configured SOLR
Installed Java
http://apache.cyberuse.com//lucene/java/lucene-2.9.4-src.tar.gz
Installed Tomcat (new)
http://apache.cs.utah.edu/tomcat/tomcat-7/v7.0.6/bin/apache-tomcat-7.0.6.tar.gz


Methodology:

Detail(1) Testing Zebra server for HTTP
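One possible way to do this test, assuming the Zebra server has been set up to answer SRU requests (SRU is the HTTP-based search protocol that carries a CQL query in the URL), is to build a searchRetrieve URL and open it in a browser. The sketch below only builds that URL. The host localhost, port 9999, database name Default and the CQL query are my own assumptions, and the server also needs a CQL mapping configured before a query like this would work.

```python
from urllib.parse import urlencode


def build_sru_url(host, port, database, cql_query, max_records=5):
    """Build an SRU 1.1 searchRetrieve URL for the given database."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql_query,             # CQL query, e.g. 'title=zebra'
        "maximumRecords": max_records,  # how many records to return
    }
    return f"http://{host}:{port}/{database}?" + urlencode(params)


if __name__ == "__main__":
    # Assumed values; change them to match your own Zebra configuration.
    print(build_sru_url("localhost", 9999, "Default", "title=zebra"))
```

If the server is running and SRU is enabled, opening the printed URL should return an XML response rather than a connection error.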

Remarks:


Conclusions:
