Line processing
At first we have to cut the line
Code:
$search = substr($search, 0, 64);
64 symbols will be enough for user to search
$search = preg_replace ("/[^\w\x7F-\xFF\s]/", " ", $search);
User can�t be allowed searching by very short words (it loads the server). So, we�ll allow searching by words that are longer then two letters:
Code:
$good = trim(preg_replace
    ("/\s(\S{1,2})\s/", " ", ereg_replace(" +", "  "," $search ")));
After bad words changing we have to compact double spaces:
Code:
$good = ereg_eplace(" +", " ", $good);
Logic
Let us suppose that user will have an opportunity to choose the search logic � to search all words or only one. Search form is as follows:
Code:
<form name="some">
    <input type=text name="line">
    <select name="logic">
        <option value="OR">search any word
        <option value="AND">search all words
    </select>
</form>
In the search script we have to check what user has entered:
Code:
if (($logic!="AND") && ($logic!="OR"))

  $logic = "OR";
Search statistics
User must be informed about the result of the search (how many lines of the table he found). That�s why we have to create the query to the base:
Code:
$query = "SELECT id FROM table WHERE field 
LIKE "%". str_replace(" ", "%" OR field LIKE "%", $good). "%"";
For statistics by single words you can do as follows:
$word = explode(" ", $search);
while (list($k, $v) = each($word)) {
if (strlen($v)>2)
$stat[]="$v:".
mysql_num_rows(mysql_query("SELECT id FROM table WHERE field LIKE "%$v%""));
else
$stat[]="$v: <font color=#cc0000>short</font>";
};
$word_stats = "Words statistics: ". implode("", $stat). "<br>";
unset($stat);
Paginal result output
If we have search layout and amount of the lines by the search result we can create paginal search without any problems. Check the variable $page (not less than 0, no more than $results_amount/$rows_in_page). In the query we have to write required fields and fields for sorting. Then we add:
Code:
if ($page==0)
      $request .= "LIMIT $rows_in_page";
else
      $request .= "LIMIT ". $page*$rows_in_page. ",". $rows_in_page;
As a result we�ll get lines that we need to output at the page. For navigation you make links to the next or previous page or navigation panefor several pages.
Code:
if ($page>0)
  print ("<a href=search.php?search=". rawurlencode($good). "&page=". ($page-1). 
  "> previous page </a>");

if ($page<$results_amount/$rows_in_page)
  print ("<a href=search.php?search=". rawurlencode($good). "&page=". ($page+1). 
  ">next page </a>");
Highlight
If you want to highlight or print in bold type initial words in the text, the only thing you have to do is to write the following code:
Code:
$highlight = str_replace(" ", "|", $good);
Spaces we can change to the vertical lines. There is no need to highlight �bad� words because we won�t search them in the base. So, write in the code as follows:
Code:
$row["text"] = ereg_replace($highlight, "<font color=#cc0000>\0</font>", $row["text"]);
If in your text there are HTML tags, you have to write:
$text = eregi_replace(">([^<]*)($words)", ">\1<font color=#cc0000>\2</font>", $text);
Result
Using these methods you can restrict user�s freedom of actions, you won�t allow him a) to learn the site structure; b) to load the server (for example he can send 1 megabyte text message). More than that paginal search and highlight are very convenient for user.
PART 2. Briefly about relevance.

For search result output you have to do as follows:

  • Required fields VARCHAR or other variants of TEXT (SMALLTEXT, MEDIUMTEXT etc.) fields should be done as FULLTEXT keys:ALTER TABLE table ADD FULLTEXT(field)
  • $query = "SELECT *, MATCH field AGAINST ('$searchwords') as relev FROM table ORDER BY relev DESC"

NOTES:

  • By default there is used a search of the words not less than 4 symbols. They can be changed by the �#define MIN_WORD_LEN 4� setting in the �ft_static.c�.
  • % symbols in the search phrase are not allowable.
  • The list of the words delimiters can be changed in the �ft_static.c.�
  • For relevance computation there should be at least ten records in the table
  • Relev field can�t be used in WHERE:
    Code:
    SELECT *, MATCH field AGAINST ("$searchwords") 
    as relev FROM table WHERE relev>0 ORDER BY relev DES
    C
    Also is possible:
    Code:
    SELECT *, MATCH field AGAINST ("$searchwords")
             as relev FROM table WHERE MATCH field AGAINST 
             ("$searchwords")>0 ORDER BY relev DESC
  • Speed is high enough; sometimes it exceeds speed of the like search.
  • Everything aforesaid works starting with the MySQL 3.23.23 version

If you want to create FULLTEXT indexes by several fields there are to variants:
Code:
CREATE TABLE table ( field1 VARCHAR (255), field2 TEXT, FULLTEXT (field1, field2) )
CREATE TABLE table ( field1 VARCHAR (255), field2 TEXT, FULLTEXT (field1), FULLTEXT (field2) )
In the first case there is possible the following query:
Code:
SELECT *, MATCH field1, field2 AGAINST ("$searchwords") 
as relev FROM table ORDER BY relev DESC
Here relevance is computed for all the fields. In the second case such query will show an error. That�s why relevance can be computed as follows:
Code:
SELECT *, MATCH field1 AGAINST ("$searchwords")+MATCH field2 
AGAINST ("$searchwords") as relev FROM table ORDER BY relev DESC
PART 3. Relevance Exercise

At first we need to add FULLTEXT index:
"
Code:
mysql> alter table articlea add fulltext(ztext);
ERROR 1073: BLOB column "ztext" can't be used in key specification with the used
table type
Text indexes can be done only in the tables like MyISAM. Texts are taken from the table and put to the index file. Relev field can�t be used in WHERE:
Code:
SELECT *, MATCH field AGAINST 
    ("$searchwords") as relev FROM table WHERE relev>0 ORDER BY relev DESC
Also is possible:
SELECT *,
Code:
 MATCH field AGAINST ("$searchwords") 
as relev FROM table WHERE MATCH field AGAINST ("$searchwords")>0 ORDER BY relev DESC
Computed field can�t be used in WHERE, it can be used in HAVING:
Code:
SELECT *, MATCH field AGAINST ("$searchwords") 
as relev FROM table HAVING relev>0 ORDER BY relev DESC
Search by means of MATCH is carried out only by the whole words. We remove condition "relev>0", leave the grading. Then cut the line and convert it to the query:
SELECT *,MATCH field AGAINST ("$searchwords")
AS relev FROM table WHERE field LIKE "%$word1%" OR field LIKE
"%$word2%" ORDER BY relev DESC, datefield DESC
In last versions of MySQL database MySQL advises using FULLTEXT indexation and MATCH field AGAINST construction. But not all servers have the last version of MySQL and not all providers want to update the software.
Here the code:
Code:
SELECT title, date_format(material_date,"%e.%c.%y") 
AS date1, IF(text like "%word1 word2 word3%", 3*10, 0) + IF(text LIKE 
"%word1%", 9, 0) + IF(text LIKE "%word2%", 9, 0) 
+ IF(text LIKE "%word3%", 9, 0) AS relevance FROM table
WHERE text LIKE "%word1%" OR text LIKE "%word2%" OR text LIKE 
"%word3%" ORDER BY relevance DESC, material_date DESC
It looks very frightfully but it works even on the old versions of MySQL. I tried to compare the speed of working with the following script:
Code:
SELECT title, date_format(material_date,"%e.%c.%y") AS date1,
MATCH text AGAINST("word1 word2 word3") AS relevance 
FROM table WHERE text LIKE "%word1%"
OR text LIKE "%word2%" OR text LIKE "%word3%" 
ORDER BY relevance DESC, material_date DESC
Speed of the universal query is smaller than speed of the query that uses new constructions. Let�s try to create the following query automatically. Cut the long line as well as all wrong symbols and short words:
Code:
$query = "SELECT title, date_format(material_date,�%e.%c.%y�) 
AS date1, IF(text like �%". $good_words. "%�, ". 
(substr_count($good_words, " ") + 1). "*10, 0)
+ IF(text LIKE �%". str_replace(" ", "%�, 9, 0) + 
IF(text LIKE �%", $good_words). "%�, 9, 0)
AS relevance FROM table WHERE text LIKE �%". str_replace
(" ", "%� OR text LIKE �%", $good_words). 
"%� ORDER BY relevance DESC, material_date DESC";