New search system for the Cave
21st September, 2018
Anth's Computer Cave is deploying AAIMI SiteSearch boxes across the site to help visitors find the articles and tutorials they need.
This program is part of the AAIMI SiteMod platform. You can try it out by clicking the Site Search button in the main menu of this page.
Why the change?
Up till now we have used an older program from the AAIMI Project, AaimiClip.
This worked okay, but it required manually compiling lists of keywords for each page, which is not practical for large sites.
Manually predicting search terms also means that visitors can only find what you anticipated. Often they will be looking for something you did not expect, and they are out of luck if your search terms are weighted towards what you think is important.
Using AAIMI SiteSearch, which is based purely on word-repetition, visitors can get more spontaneous results.
How it works
AAIMI SiteSearch will be available for download on the 25th of September, so you'll be able to embed it in your own site.
We'll feature a comprehensive setup and usage tutorial then, but for now I'll just give you a brief overview of the system's capabilities.
The crawler is easy to use: just run the program and it moves recursively through your web directories. It opens each HTML file and separates the content from the HTML tags. It then reads every word of that content and records how many times each word occurs in the file.
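To give a rough idea of what a crawl like this involves, here is a minimal sketch in Python using only the standard library. It is not the AAIMI SiteSearch source. The names TextExtractor, count_words and crawl are made up for this illustration, and skipping script and style blocks is my own assumption about what counts as content.

```python
import os
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between HTML tags, ignoring script and style blocks."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a script or style element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_words(html):
    """Return a Counter mapping each lower-cased word to its occurrences."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return Counter(re.findall(r"[a-z0-9']+", text))


def crawl(root):
    """Walk root recursively and build {file path: word counts} for HTML files."""
    index = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith((".html", ".htm")):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    index[path] = count_words(f.read())
    return index
```

Calling crawl on your web root (whatever that path is on your server) returns a dictionary with one entry per HTML file, each holding that page's word counts.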
When visitors use the search box, their search terms are sent to the AAIMI SiteSearch Python program, which finds matches and returns results to the browser as pre-formatted HTML.
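The matching and HTML steps might look something like the sketch below. This is only an illustration under my own assumptions: the index is taken to be a dictionary of page paths to word counts, and find_matches and render_results are hypothetical names, not functions from the real program. The actual markup AAIMI SiteSearch returns may be quite different.

```python
from html import escape


def find_matches(index, query):
    """Return the paths of pages that contain at least one search term."""
    terms = query.lower().split()
    return [path for path, counts in index.items()
            if any(term in counts for term in terms)]


def render_results(paths):
    """Format matching paths as an HTML fragment ready to send to the browser."""
    if not paths:
        return "<p>No results found.</p>"
    items = "\n".join(
        # Escape the path so odd characters cannot break the markup.
        '<li><a href="{0}">{0}</a></li>'.format(escape(path, quote=True))
        for path in paths
    )
    return "<ul>\n" + items + "\n</ul>"
```

Escaping each path before dropping it into the markup is a small precaution that keeps unusual file names from producing broken HTML.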
AAIMI SiteSearch will be released as a nightly build with a continuous update and upgrade cycle. There are several new options on the bench now.
The results are currently ranked by the number of matching search terms. Using more search terms will produce more results, but it also brings the most relevant results to the top, because pages that match more of your terms rank higher.
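Ranking by the number of matching terms can be sketched in a few lines of Python. Again, this is an illustration and not the program's code: rank_pages is a made-up name, the index is assumed to map each page to its word counts, and breaking ties alphabetically by path is my own choice.

```python
def rank_pages(index, query):
    """Order pages by how many distinct search terms each one contains."""
    terms = set(query.lower().split())
    scored = []
    for path, counts in index.items():
        matched = sum(1 for term in terms if term in counts)
        if matched:
            scored.append((matched, path))
    # Most matching terms first; ties are broken alphabetically by path.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for matched, path in scored]
```

With this approach a search for two terms lists pages containing both ahead of pages containing only one, which is why adding terms widens the results but still pushes the best matches to the top.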
In future builds the word-list method will be just one of the ranking criteria. The program will also look at entire sentences and how their context relates to the current search.
There will also be more exclude options to avoid indexing unwanted pages. At the moment you can exclude entire directories but not single files.
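For the curious, excluding whole directories from a recursive crawl is straightforward in Python. The sketch below is an assumption about how it could be done, not how AAIMI SiteSearch does it. The function name find_html_files and the excluded_dirs parameter are hypothetical, and excluded directories are given relative to the crawl root.

```python
import os


def find_html_files(root, excluded_dirs=()):
    """List HTML files under root, skipping any excluded directories."""
    excluded = {os.path.normpath(os.path.join(root, d)) for d in excluded_dirs}
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [
            d for d in dirnames
            if os.path.normpath(os.path.join(dirpath, d)) not in excluded
        ]
        for name in filenames:
            if name.lower().endswith((".html", ".htm")):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Because the directory is pruned before the walk enters it, everything beneath an excluded directory is skipped too.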
Stay tuned for more. Leave a comment below if you have any ideas.