The process of breaking down a large database to find the document vector (relevance) of each item by comparing it with other items and documents.
- Stemming: taking into account the various forms of a word on a page
- Local Weighting: increasing the relevance of a given document based on how frequently a term appears in that document
- Global Weighting: increasing the relevance of terms that appear in only a small number of pages, as they are more likely to be on topic than words that appear in almost all documents
- Normalization: penalizing long copy and rewarding short copy so that both get a fair share of the results. A good way of looking at this is standardizing everything to a scale of 100.
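The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method any particular search engine uses: the helper names `crude_stem`, `tokenize`, and `weighted_vectors` are hypothetical, the suffix-stripping stemmer is a stand-in for a real one, local weighting is assumed to be a log-dampened term count, global weighting is assumed to be inverse document frequency, and normalization is assumed to be scaling each vector to unit length.

```python
import math
from collections import Counter

def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase, split on whitespace, and stem each word."""
    return [crude_stem(w) for w in text.lower().split()]

def weighted_vectors(docs):
    """Return one {term: weight} dict per document."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(token_lists)

    # Global weighting: terms found in fewer documents get a larger weight.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}

    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Local weighting: dampened count of each term within this document.
        raw = {t: (1 + math.log(c)) * idf[t] for t, c in counts.items()}
        # Normalization: scale to unit length so long copy does not dominate.
        length = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: (w / length if length else 0.0) for t, w in raw.items()})
    return vectors

docs = [
    "cats chase mice",
    "the cat sleeps",
    "the dog sleeps all day while the cat is sleeping",
]
vectors = weighted_vectors(docs)
```

In this made-up corpus, "cat" appears in every document, so its global weight is zero, while a rare term such as "mice" carries most of the weight in the first document.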
Multidimensional scaling is more efficient than singular value decomposition because it requires far less computation. When combined with other ranking factors, only a rough approximation of relevance is necessary.
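To make "a rough approximation" concrete, here is a minimal sketch of a truncated singular value decomposition, assuming NumPy is available. The function name `low_rank_approximation` and the small term-by-document matrix are made up for illustration. The sketch only shows what keeping the largest `k` singular values does to the matrix; it does not measure or compare the cost of multidimensional scaling.

```python
import numpy as np

def low_rank_approximation(matrix, k):
    """Rebuild the matrix from its k largest singular values only."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]

# Rows are terms, columns are documents (made-up counts for illustration).
term_doc = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
])

approx = low_rank_approximation(term_doc, k=2)
# Frobenius norm of the difference: how much detail the approximation drops.
error = np.linalg.norm(term_doc - approx)
```

A smaller `k` gives a coarser picture of which documents resemble each other; a larger `k` moves the result back toward the original matrix.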