<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Techno WeBlog &#187; Search Engine</title>
	<atom:link href="http://blog.codlib.com/category/search-engine/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.codlib.com</link>
	<description>Blogging about tech, the tech, and everything tech, for techno addicts!</description>
	<lastBuildDate>Thu, 04 Nov 2010 04:05:35 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>How to enable the mod_rewrite module in Apache</title>
		<link>http://blog.codlib.com/2008/05/06/how-to-enable-mod_rewrite-module-in-apache/</link>
		<comments>http://blog.codlib.com/2008/05/06/how-to-enable-mod_rewrite-module-in-apache/#comments</comments>
		<pubDate>Tue, 06 May 2008 08:18:37 +0000</pubDate>
		<dc:creator>Jans</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Search Engine]]></category>
		<category><![CDATA[Tips & Tricks]]></category>

		<guid isPermaLink="false">http://blog.codlib.com/2008/05/06/how-to-enable-mod_rewrite-module-in-apache/</guid>
		<description><![CDATA[How to check whether the mod_rewrite module is enabled
Here is a very simple technique to check whether the mod_rewrite module is enabled on your web server.
1) Type &lt;?php phpinfo(); ?&gt; in a PHP file, save it, and run that file on the server.
2) You will now see a list of information; search for the word “mod_rewrite” [...]]]></description>
			<content:encoded><![CDATA[<p><b>How to check whether the mod_rewrite module is enabled</b></p>
<p>Here is a very simple technique to check whether the mod_rewrite module is enabled on your web server.</p>
<p>1) Type &lt;?php phpinfo(); ?&gt; in a PHP file, save it, and run that file on the server.<br />
2) You will now see a list of information; search for the word “mod_rewrite” using the browser’s search menu.<br />
3) If it is found under the “Loaded Modules” section, the module is already loaded. Otherwise you need to enable the mod_rewrite module.</p>
<p><b>To enable the mod_rewrite module in Apache installed in a Windows environment:</b></p>
<p>1) Find the “httpd.conf” file in the “conf” folder inside Apache’s installation folder.<br />
2) Find the line “#LoadModule rewrite_module modules/mod_rewrite.so” in the “httpd.conf” file.<br />
3) Remove the “#” at the start of the line; the “#” marks the line as commented out.<br />
4) Now restart the Apache server.<br />
5) You should now see “mod_rewrite” in the “Loaded Modules” section of the “phpinfo()” output.</p>
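<p>The edit in steps 2 and 3 can also be scripted. What follows is a minimal Python sketch, assuming the LoadModule line looks like the one quoted above. The function names is_rewrite_enabled and enable_rewrite are hypothetical, and the sketch works on the text of the file only; reading and writing “httpd.conf” and restarting Apache are left to you.</p>

```python
import re

# Matches the mod_rewrite LoadModule line, with an optional leading "#".
# Group 1 = indentation, group 2 = the comment marker (if any), group 3 = the directive.
# The pattern is an assumption based on the line quoted in the steps above.
REWRITE_LINE = re.compile(r"^(\s*)(#\s*)?(LoadModule\s+rewrite_module\s+\S+)\s*$")


def is_rewrite_enabled(conf_text):
    """Return True if an uncommented LoadModule line for mod_rewrite exists."""
    for line in conf_text.splitlines():
        match = REWRITE_LINE.match(line)
        if match and match.group(2) is None:
            return True
    return False


def enable_rewrite(conf_text):
    """Return conf_text with the mod_rewrite LoadModule line uncommented."""
    out = []
    for line in conf_text.splitlines():
        match = REWRITE_LINE.match(line)
        if match and match.group(2) is not None:
            line = match.group(1) + match.group(3)  # drop only the "#"
        out.append(line)
    return "\n".join(out)


sample = "\n".join([
    "LoadModule alias_module modules/mod_alias.so",
    "#LoadModule rewrite_module modules/mod_rewrite.so",
])
print(is_rewrite_enabled(sample))                  # False
print(is_rewrite_enabled(enable_rewrite(sample)))  # True
```

<p>Keep a backup copy of “httpd.conf” before writing any changed text back to it.</p>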
]]></content:encoded>
			<wfw:commentRss>http://blog.codlib.com/2008/05/06/how-to-enable-mod_rewrite-module-in-apache/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lucene: A Powerful Search Engine</title>
		<link>http://blog.codlib.com/2007/10/07/lucene-a-powerfull-search-engine/</link>
		<comments>http://blog.codlib.com/2007/10/07/lucene-a-powerfull-search-engine/#comments</comments>
		<pubDate>Mon, 08 Oct 2007 06:25:01 +0000</pubDate>
		<dc:creator>Jans</dc:creator>
				<category><![CDATA[Search Engine]]></category>

		<guid isPermaLink="false">http://blog.codlib.com/2007/10/07/lucene-a-powerfull-search-engine/</guid>
		<description><![CDATA[Doug Cutting, an experienced developer of text-search and retrieval tools, created Lucene. Cutting is the primary author of the V-Twin search engine (part of Apple&#8217;s Copland operating system effort) and is currently a senior architect at Excite. He designed Lucene to make it easy to add indexing and search capability to a broad range of [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Doug Cutting, an experienced developer of text-search and retrieval tools, created Lucene. Cutting is the primary author of the V-Twin search engine (part of Apple&#8217;s Copland operating system effort) and is currently a senior architect at Excite. He designed Lucene to make it easy to add indexing and search capability to a broad range of applications, including:<o:p> </o:p></span></p>
<ul style="margin-top: 0in" type="square">
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Searchable      email</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">: An email application could let users search archived messages      and add new messages to the index as they arrive.<o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Online documentation      search</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">: A documentation reader &#8212; CD-based, Web-based, or      embedded within the application &#8212; could let users search online      documentation or archived publications.<o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Searchable      Webpages</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">: A Web browser or proxy server could build a      personal search engine to index every Webpage a user has visited, allowing      users to easily revisit pages.<o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Website search</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">: A CGI program could let users search your Website.</span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Content      search</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">: An application could let the user search saved      documents for specific content; this could be integrated into the Open      Document dialog.<o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Version      control and content management</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">: A document management system could      index documents, or document versions, so they can be easily retrieved.<o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">News and      wire service feeds</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">: A news server or relay could index articles as      they arrive.</span><strong><span style="font-size: 11pt; line-height: 150%; font-family: Verdana"><o:p><br />
</o:p></span></strong></li>
</ul>
<p class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 11pt; line-height: 150%; font-family: Verdana">How do search engines work?</span></strong></p>
<p class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Creating and maintaining an inverted index is the central problem when building an efficient keyword search engine. To index a document, you must first scan it to produce a list of postings. Postings describe occurrences of a word in a document; they generally include the word, a document ID, and possibly the location(s) or frequency of the word within the document.</span><strong><span style="font-size: 11pt; line-height: 150%; font-family: Verdana"><o:p><br />
</o:p></span></strong></p>
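<p class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">To make the idea of postings concrete, here is a small Python sketch. It is not Lucene code: the functions postings and build_index are hypothetical names, and the tokenizer (lower-cased runs of letters and digits) is an assumption chosen for brevity. Each posting holds the word, the document ID, and the positions of the word, so the frequency is simply the number of positions.</span></p>

```python
import re
from collections import defaultdict


def postings(doc_id, text):
    """Scan one document and return one (word, doc_id, positions) posting per word."""
    positions = defaultdict(list)
    for position, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        positions[word].append(position)
    return [(word, doc_id, places) for word, places in sorted(positions.items())]


def build_index(docs):
    """Merge the postings of every document into word -> {doc_id: positions}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word, _, places in postings(doc_id, text):
            index[word][doc_id] = places
    return index


docs = {0: "Lucene is a search library", 1: "A search engine needs an index"}
index = build_index(docs)
print(index["search"])     # {0: [3], 1: [1]}
print(len(index["a"][0]))  # frequency of "a" in document 0: 1
```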
<p class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 11pt; line-height: 150%; font-family: Verdana">What is Lucene?</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"><o:p></o:p></span></p>
<ul style="margin-top: 0in" type="square">
<li class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Apache Lucene is a high-performance, full-featured      text search engine library written entirely in Java. <o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">It is a technology suitable for nearly any      application that requires full-text search.<o:p></o:p></span></li>
</ul>
<p class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 11pt; line-height: 150%; font-family: Verdana">Facts<o:p></o:p></span></strong></p>
<p class="MsoNormal" style="margin-left: 0.25in; line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"><o:p> </o:p></span></strong></p>
<ul style="margin-top: 0in" type="square">
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Incremental      versus batch indexing</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">: Some search engines only support batch      indexing; once they create an index for a set of documents, adding new      documents becomes difficult without reindexing all the documents.      Incremental indexing allows easy adding of documents to an existing index.      For some applications, like those that handle live data feeds, incremental      indexing is critical. Lucene supports both types of indexing.<o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Data sources</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">: Many      search engines can only index files or Webpages. This handicaps      applications where indexed data comes from a database, or where multiple      virtual documents exist in a single file, such as a ZIP archive. Lucene      allows developers to deliver the document to the indexer through a String      or an InputStream, permitting the data source to be abstracted from the      data. However, with this approach, the developer must supply the      appropriate readers for the data.<o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Indexing      control</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">: Some search engines can automatically crawl through      a directory tree or a Website to find documents to index. While this is      convenient if your data is already stored in this manner, crawler-based      indexers often provide limited flexibility for applications that require      fine-grained control over the indexed documents. Since Lucene operates      primarily in incremental mode, it lets the application find and retrieve      documents.<o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">File formats</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">: Some      search engines can only index text or HTML documents; others support a      filter mechanism, which offers a simple alternative to indexing word      processing documents, SGML documents, and other file formats. Lucene      supports such a mechanism.<o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Content tagging</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">: Some search engines treat a document as a single stream of tokens; others allow the specification of multiple data fields within a document, such as &#8220;subject,&#8221; &#8220;abstract,&#8221; &#8220;author,&#8221; and &#8220;body.&#8221; This permits semantically richer queries like &#8220;author contains Hamilton AND body contains Constitution.&#8221; Lucene supports content tagging by treating documents as collections of fields, and supports queries that specify which field(s) to search.</span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Stop-word processing</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">: Common words, such as &#8220;a,&#8221; &#8220;and,&#8221; and &#8220;the,&#8221; add little value to a search index. But since these words are so common, cataloging them will contribute considerably to the indexing time and index size. Most search engines will not index certain words, called stop words. Some use a list of stop words, while others select stop words statistically. Lucene handles stop words with the more general Analyzer mechanism and provides the StopAnalyzer class, which eliminates stop words from the input stream.</span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Stemming</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">: Often, a user wants a query for one word to match other similar words. For example, a query for &#8220;jump&#8221; should probably also match the words &#8220;jumped,&#8221; &#8220;jumper,&#8221; or &#8220;jumps.&#8221; Reducing a word to its root form is called stemming. StopAnalyzer does not stem, but the library includes a PorterStemFilter, and you can add a stemmer through a more sophisticated Analyzer class.</span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Query features</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">: Search engines support a variety of query features. Some support full Boolean queries; others support only AND queries. Some return a &#8220;relevance&#8221; score with each hit. Some can handle adjacency or proximity queries &#8212; &#8220;search followed by engine&#8221; or &#8220;Knicks near Celtics&#8221; &#8212; others can only search on single keywords. Some can search multiple indexes at once and merge the results to give a meaningful relevance score. Lucene supports a wide range of query features, including all of those listed above. However, Lucene does not support the valuable Soundex, or &#8220;sounds like,&#8221; query.</span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Concurrency</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">: Can      multiple users search an index at the same time? Can a user search an      index while another updates it? Lucene allows users to search an index      transactionally, even if another user is simultaneously updating the      index.<o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Non-English      support</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">: Many search engines implicitly assume that English      is the target language; this is evident in areas such as stop-word lists,      stemming algorithms, and the use of proximity to match phrase queries. As      Lucene preprocesses the input stream through the Analyzer class provided      by the developer, it is possible to perform language-specific filtering.<o:p></o:p></span></li>
</ul>
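<p class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">The stop-word and stemming points above both come down to preprocessing the token stream before it reaches the index. The Python sketch below illustrates that idea only; it is not Lucene&#8217;s Analyzer API. The names analyze and naive_stem are hypothetical, the stop-word list holds just the three examples mentioned above, and the suffix stripping is a deliberately crude stand-in for a real stemming algorithm.</span></p>

```python
import re

STOP_WORDS = {"a", "and", "the"}  # only the examples named above; real lists are longer


def naive_stem(word):
    """Strip a few common English suffixes. Illustrative only, not Porter stemming."""
    for suffix in ("ed", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text, stop_words=STOP_WORDS, stem=naive_stem):
    """Tokenize, lower-case, drop stop words, then stem what is left."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(token) for token in tokens if token not in stop_words]


print(analyze("The jumper jumped and jumps"))  # ['jump', 'jump', 'jump']
```

<p class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Passing a different stop-word set or stem function is how this sketch would handle another language; the same text must go through the same steps at indexing time and at query time, or the terms will not match.</span></p>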
<p class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 11pt; line-height: 150%; font-family: Verdana">Basic Definitions</span></strong><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"><o:p></o:p></span></strong></p>
<p class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">The fundamental concepts in Lucene are:</span></p>
<ul style="margin-top: 0in" type="square">
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 8pt; line-height: 150%; font-family: Verdana">index</span></strong></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 8pt; line-height: 150%; font-family: Verdana">document</span></strong></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 8pt; line-height: 150%; font-family: Verdana">field</span></strong></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 8pt; line-height: 150%; font-family: Verdana">term</span></strong></li>
</ul>
<p class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">An index contains a sequence of documents.<br />
A document is a sequence of fields.<br />
A field is a named sequence of terms.<br />
A term is a string.</span></p>
<p class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">The same string in two different fields is considered a different term.</span></p>
<p class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 11pt; line-height: 150%; font-family: Verdana">Inverted index</span></strong></p>
<ul style="margin-top: 0in" type="square">
<li class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">The index stores statistics about terms in order to      make term-based search more efficient. <o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Lucene&#8217;s index falls into the family of indexes known      as an <strong>inverted index</strong>. <o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">This is because it can list, for a term, the      documents that contain it. <o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">This is the inverse of the natural relationship, in      which documents list terms.</span><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"><o:p><br />
</o:p></span></li>
</ul>
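<p class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">A toy Python version shows how these definitions fit together. It is a sketch under simplifying assumptions, not Lucene&#8217;s implementation: the class FieldIndex and its methods are hypothetical, and text is split on whitespace. A term is modeled as a (field name, text) pair, so the same string in two fields gives two different terms, and the index maps each term to the documents that contain it.</span></p>

```python
from collections import defaultdict


class FieldIndex:
    """Toy inverted index whose terms are (field name, text) pairs."""

    def __init__(self):
        self.docs = []                 # position in this list = document number
        self.terms = defaultdict(set)  # (field, text) -> document numbers

    def add(self, fields):
        """Add a document given as {field name: text}; return its number."""
        doc_number = len(self.docs)
        self.docs.append(fields)
        for field, text in fields.items():
            for word in text.lower().split():
                self.terms[(field, word)].add(doc_number)
        return doc_number

    def search_all(self, *terms):
        """AND query: numbers of the documents that contain every given term."""
        hits = [self.terms.get(term, set()) for term in terms]
        return sorted(set.intersection(*hits)) if hits else []


index = FieldIndex()
index.add({"author": "Hamilton", "body": "notes on the Constitution"})
index.add({"author": "Madison", "body": "Hamilton wrote about the Constitution"})
print(index.search_all(("author", "hamilton"), ("body", "constitution")))  # [0]
print(index.search_all(("body", "hamilton")))                              # [1]
```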
<p class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Lucene indexes may be composed of multiple <strong>sub-indexes, or segments</strong>. Each segment is a fully independent index, which could be searched separately.<o:p> </o:p></span></p>
<p class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Internally, Lucene refers to documents by an <strong>integer document number</strong>. The first document added to an index is numbered zero, and each subsequent document added gets a number one greater than the previous.<o:p> </o:p></span></p>
<p class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 11pt; line-height: 150%; font-family: Verdana">Overview </span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"><br />
Each segment index maintains the following:<o:p> </o:p></span></p>
<ul style="margin-top: 0in" type="square">
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Field names</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"> &#8211; This      contains the set of field names used in the index.<o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Stored Field values</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"> &#8211; This contains, for each document, a list of attribute-value pairs, where the attributes are field names. These are used to store auxiliary information about the document, such as its title, URL, or an identifier to access a database. The set of stored fields is what is returned for each hit when searching. This is keyed by document number.</span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Term      dictionary</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"> &#8211; A dictionary containing all of the terms used in      all of the indexed fields of all of the documents. The dictionary also      contains the number of documents which contain the term, and pointers to      the term&#8217;s frequency and proximity data.<o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Term      Frequency data</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"> &#8211; For each term in the dictionary, the numbers of      all the documents that contain that term, and the frequency of the term in      that document.<o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Term Proximity data</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"> &#8211; For each term in the dictionary, the positions at which the term occurs in each document.</span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Normalization factors</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"> &#8211; For each field in each document, a value is stored that is multiplied into the score for hits on that field.</span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Term Vectors</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"> &#8211; For each field in each document, the term vector (sometimes called document vector) may be stored. A term vector consists of term text and term frequency. To add term vectors to your index, see the Field constructors.</span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Deleted      documents -</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"> An optional file indicating which documents are      deleted.</span><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"><o:p><br />
</o:p></span></li>
</ul>
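<p class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">As a rough mental model, the per-segment data can be pictured as a handful of in-memory structures. The Python sketch below is an illustration only: the class Segment, its attribute names, and search_index are hypothetical, every field is treated as both stored and indexed, and normalization factors and term vectors are left out. It does show document numbers starting at zero within a segment, the term dictionary count, frequency and position data, deleted documents, and an index searched as a list of independent segments.</span></p>

```python
class Segment:
    """Toy stand-in for one segment; the attribute names are illustrative."""

    def __init__(self):
        self.stored = []      # stored field values, keyed by document number
        self.positions = {}   # (field, text) -> {doc number: [positions]}
        self.deleted = set()  # document numbers marked as deleted

    def add(self, fields):
        doc = len(self.stored)  # the first document is numbered zero
        self.stored.append(fields)
        for field, text in fields.items():
            for pos, word in enumerate(text.lower().split()):
                by_doc = self.positions.setdefault((field, word), {})
                by_doc.setdefault(doc, []).append(pos)
        return doc

    def doc_freq(self, term):
        """Term dictionary entry: number of documents containing the term."""
        return len(self.positions.get(term, {}))

    def freq(self, term, doc):
        """Term frequency data: occurrences of the term in one document."""
        return len(self.positions.get(term, {}).get(doc, []))

    def search(self, term):
        """Stored fields of every live document that contains the term."""
        docs = sorted(self.positions.get(term, {}))
        return [self.stored[d] for d in docs if d not in self.deleted]


def search_index(segments, term):
    """An index is a list of segments; each one is searched on its own."""
    return [hit for segment in segments for hit in segment.search(term)]


first, second = Segment(), Segment()
first.add({"title": "Lucene in brief", "body": "search search index"})
second.add({"title": "Other", "body": "index files"})
second.deleted.add(0)  # mark the only document of the second segment as deleted
print(first.freq(("body", "search"), 0))                 # 2
print(search_index([first, second], ("body", "index")))  # only the first segment's document
```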
<p class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 11pt; line-height: 150%; font-family: Verdana">File Naming</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"><o:p></o:p></span></p>
<ul style="margin-top: 0in" type="square">
<li class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Typically in Lucene, all segments in an index are stored in a single directory.</span></li>
<li class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">All files belonging to a segment have the same name      with varying extensions.<o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">File names are never reused.<o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Segments Info File</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"> &#8211; The active segments in the index are stored in the segment info file.</span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Lock File</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"> &#8211; This lock file ensures that only one writer is modifying the index at a time.</span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Deletable File</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"> &#8211; Contains details about files that need to be deleted.</span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Field Info File</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"> &#8211; Field names are stored in the field info file, with suffix .fnm.</span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Field Index File</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"> &#8211; This contains, for each document, a pointer to its field data. Suffix is .fdx.</span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Field Data File</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"> &#8211; This contains the stored fields of each document. Suffix is .fdt.</span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Term Dictionary Files</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"> &#8211; The term dictionary is represented as two files: the term infos, or .tis file, and the term info index, or .tii file.</span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Frequency File</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"> &#8211; The .frq file contains the lists      of documents which contain each term, along with the frequency of the term      in that document. <o:p></o:p></span></li>
<li class="MsoNormal" style="line-height: 150%"><strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Positions File</span></strong><span style="font-size: 10pt; line-height: 150%; font-family: Verdana"> &#8211; The .prx file contains the lists      of positions that each term occurs at within documents. <o:p></o:p></span></li>
</ul>
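<p class="MsoNormal" style="line-height: 150%"><span style="font-size: 10pt; line-height: 150%; font-family: Verdana">Because every file of a segment shares the segment name and differs only in its extension, a directory listing can be grouped back into segments. The Python sketch below assumes this naming scheme; the function group_by_segment and the file names in the sample listing are made up for illustration, and .fdx is used as the field index suffix of the Lucene 2.x file format.</span></p>

```python
import os
from collections import defaultdict

# Extension -> role of the file within a segment.
ROLES = {
    ".fnm": "field info",
    ".fdx": "field index",
    ".fdt": "field data",
    ".tis": "term infos",
    ".tii": "term info index",
    ".frq": "frequencies",
    ".prx": "positions",
}


def group_by_segment(file_names):
    """Group index files by segment name and label each one by its extension."""
    segments = defaultdict(dict)
    for name in file_names:
        base, ext = os.path.splitext(name)
        if ext in ROLES:  # skips files without a per-segment extension
            segments[base][ROLES[ext]] = name
    return dict(segments)


# Hypothetical directory listing; real segment names will differ.
listing = ["_1.fnm", "_1.fdt", "_1.frq", "_2.fnm", "_2.prx", "segments"]
for segment, files in sorted(group_by_segment(listing).items()):
    print(segment, sorted(files))
```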
]]></content:encoded>
			<wfw:commentRss>http://blog.codlib.com/2007/10/07/lucene-a-powerfull-search-engine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

