The process of breaking down a large database of documents to find a document vector (a measure of relevance) for each item by comparing it with the other items and documents.

Important steps:

  • Stemming: accounting for the various forms of a word on a page
  • Local Weighting: increasing the relevance of a given document based on how frequently a term appears in that document
  • Global Weighting: increasing the relevance of terms that appear in only a small number of pages, as they are more likely to be on topic than words that appear in almost all documents
  • Normalization: penalizing long copy and rewarding short copy so that both get a fair distribution in results. A good way of looking at this is as standardizing everything to a scale of 100.
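
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how any search engine implements them: `naive_stem` is a hypothetical, deliberately crude suffix stripper, local weighting is taken to be the raw term count, global weighting is taken to be the logarithm of (number of documents / number of documents containing the term), and normalization scales each vector to unit length rather than to a scale of 100.

```python
import math
from collections import Counter


def naive_stem(word):
    """Hypothetical, very crude stemmer: strips a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def document_vectors(docs):
    """Return one {term: weight} dictionary per document."""
    # Stemming: fold the different forms of a word into one term.
    tokenized = [[naive_stem(w) for w in doc.lower().split()] for doc in docs]
    n_docs = len(tokenized)

    # Count how many documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        # Local weighting (assumed here): raw count of the term in this document.
        local = Counter(tokens)

        # Global weighting (assumed here): rarer terms get a larger multiplier;
        # a term found in every document gets log(1) = 0.
        weights = {
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in local.items()
        }

        # Normalization: divide by the vector length so long and short
        # documents end up on the same scale.
        length = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append(
            {term: (w / length if length else 0.0) for term, w in weights.items()}
        )
    return vectors


docs = ["cats chasing dogs", "dogs barking loudly", "dogs sleeping"]
vectors = document_vectors(docs)
```

In this toy collection the stem "dog" occurs in every document, so its global weight is zero and it contributes nothing to any vector, while terms unique to one document carry all of the weight.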

Multidimensional scaling is more efficient than singular value decomposition because it requires far less computation. When relevance is combined with other ranking factors, only a rough approximation of it is necessary.
