
Information Retrieval Models: Foundations and Relationships

Information Retrieval (IR) models are a core component of IR research and IR systems. The past decade brought a consolidation of the family of IR models, which by 2000 consisted of relatively isolated views on TF-IDF (Term-Frequency times Inverse-Document-Frequency) as the weighting scheme in the ve...


Bibliographic Details
Main Author: Roelleke, Thomas
Format: eBook
Language: English
Published: San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool, c2013.
Series:Synthesis digital library of engineering and computer science.
Synthesis lectures on information concepts, retrieval, and services ; # 27.
Online Access: Abstract with links to full text
Table of Contents:
  • 1. Introduction
  • 1.1 Structure and contribution of this book
  • 1.2 Background: a timeline of IR models
  • 1.3 Notation
  • 1.3.1 The notation issue "term frequency"
  • 1.3.2 Notation: Zhai's book and this book
  • 2. Foundations of IR models
  • 2.1 TF-IDF
  • 2.1.1 TF variants
  • 2.1.2 TFlog: Logarithmic TF
  • 2.1.3 TFfrac: fractional (ratio-based) TF
  • 2.1.4 IDF variants
  • 2.1.5 Term weight and RSV
  • 2.1.6 Other TF variants: lifted TF and pivoted TF
  • 2.1.7 Semi-subsumed event occurrences: a semantics of the BM25-TF
  • 2.1.8 Probabilistic IDF: The probability of being informative
  • 2.1.9 Summary
  • 2.2 PRF: the probability of relevance framework
  • 2.2.1 Feature independence assumption
  • 2.2.2 Non-query term assumption
  • 2.2.3 Term frequency split
  • 2.2.4 Probability ranking principle (PRP)
  • 2.2.5 Summary
  • 2.3 BIR: binary independence retrieval
  • 2.3.1 Term weight and RSV
  • 2.3.2 Missing relevance information
  • 2.3.3 Variants of the BIR term weight
  • 2.3.4 Smooth variants of the BIR term weight
  • 2.3.5 RSJ term weight
  • 2.3.6 On theoretical arguments for 0.5 in the RSJ term weight
  • 2.3.7 Summary
  • 2.4 Poisson and 2-Poisson
  • 2.4.1 Poisson probability
  • 2.4.2 Poisson analogy: sunny days and term occurrences
  • 2.4.3 Poisson example: toy data
  • 2.4.4 Poisson example: TREC-2
  • 2.4.5 Binomial probability
  • 2.4.6 Relationship between Poisson and binomial probability
  • 2.4.7 Poisson PRF
  • 2.4.8 Term weight and RSV
  • 2.4.9 2-Poisson
  • 2.4.10 Summary
  • 2.5 BM25
  • 2.5.1 BM25-TF
  • 2.5.2 BM25-TF and pivoted TF
  • 2.5.3 BM25: literature and Wikipedia end 2012
  • 2.5.4 Term weight and RSV
  • 2.5.5 Summary
  • 2.6 LM: language modeling
  • 2.6.1 Probability mixtures
  • 2.6.2 Term weight and RSV: LM1
  • 2.6.3 Term weight and RSV: LM (normalized)
  • 2.6.4 Term weight and RSV: JM-LM
  • 2.6.5 Term weight and RSV: Dirich-LM
  • 2.6.6 Term weight and RSV: LM2
  • 2.6.7 Summary
  • 2.7 PIN's: probabilistic inference networks
  • 2.7.1 The Turtle/Croft link matrix
  • 2.7.2 Term weight and RSV
  • 2.7.3 Summary
  • 2.8 Divergence-based models and DFR
  • 2.8.1 DFR: divergence from randomness
  • 2.8.2 DFR: sampling over documents and locations
  • 2.8.3 DFR: binomial transformation step
  • 2.8.4 DFR and KL-divergence
  • 2.8.5 Poisson as a model of randomness: P(Kt > 0 | d,c): DFR-1
  • 2.8.6 Poisson as a model of randomness: P(Kt = TFd | d,c): DFR-2
  • 2.8.7 DFR: elite documents
  • 2.8.8 DFR: example
  • 2.8.9 Term weights and RSV's
  • 2.8.10 KL-divergence retrieval model
  • 2.8.11 Summary
  • 2.9 Relevance-based models
  • 2.9.1 Rocchio's relevance feedback model
  • 2.9.2 The PRF
  • 2.9.3 Lavrenko's relevance-based language models
  • 2.10 Precision and recall
  • 2.10.1 Precision and recall: conditional probabilities
  • 2.10.2 Averages: total probabilities
  • 2.11 Summary
  • 3. Relationships between IR models
  • 3.1 PRF: the probability of relevance framework
  • 3.1.1 Estimation of term probabilities
  • 3.2 P(d → q): the probability that d implies q
  • 3.3 The vector-space model (VSM)
  • 3.3.1 VSM and probabilities
  • 3.4 The generalised vector-space model (GVSM)
  • 3.4.1 GVSM and probabilities
  • 3.5 A general matrix framework
  • 3.5.1 Term-document matrix
  • 3.5.2 On the notation issue "term frequency"
  • 3.5.3 Document-document matrix
  • 3.5.4 Co-occurrence matrices
  • 3.6 A parallel derivation of probabilistic retrieval models
  • 3.7 The Poisson bridge: P_D(t|u) · avgtf(t,u) = P_L(t|u) · avgdl(u)
  • 3.8 Query term probability assumptions
  • 3.8.1 Query term mixture assumption
  • 3.8.2 Query term burstiness assumption
  • 3.8.3 Query term BIR assumption
  • 3.9 TF-IDF
  • 3.9.1 TF-IDF and BIR
  • 3.9.2 TF-IDF and Poisson
  • 3.9.3 TF-IDF and BM25
  • 3.9.4 TF-IDF and LM
  • 3.9.5 TF-IDF and LM: side-by-side
  • 3.9.6 TF-IDF and PIN's
  • 3.9.7 TF-IDF and divergence
  • 3.9.8 TF-IDF and DFR: risk times gain
  • 3.9.9 TF-IDF and DFR: gaps between term occurrences
  • 3.10 More relationships: BM25 and LM, LM and PIN's
  • 3.11 Information theory
  • 3.11.1 Entropy
  • 3.11.2 Joint entropy
  • 3.11.3 Conditional entropy
  • 3.11.4 Mutual information (MI)
  • 3.11.5 Cross entropy
  • 3.11.6 KL-divergence
  • 3.11.7 Query clarity: divergence(query || collection)
  • 3.11.8 LM = Clarity(query) - Divergence(query || doc)
  • 3.11.9 TF-IDF = Clarity(doc) - Divergence(doc || query)
  • 3.12 Summary
  • 4. Summary & research outlook
  • 4.1 Summary
  • 4.2 Research outlook
  • 4.2.1 Retrieval models
  • 4.2.2 Evaluation models
  • 4.2.3 A unified framework for retrieval and evaluation
  • 4.2.4 Model combinations and "new" models
  • 4.2.5 Dependence-aware models
  • 4.2.6 "Query-log" and other more-evidence models
  • 4.2.7 Phase-2 models: retrieval result condensation models
  • 4.2.8 A theoretical framework to predict ranking quality
  • 4.2.9 MIR: math for IR
  • 4.2.10 AIR: abstraction for IR
  • Bibliography
  • Author's biography
  • Index.
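The TF-IDF weighting named in the abstract (and analysed in Section 2.1 and Section 3.9 above) can be sketched in a few lines. This is a minimal illustration over a hypothetical toy corpus, using raw term frequency and logarithmic IDF; the book itself surveys many TF and IDF variants (logarithmic, fractional, lifted, pivoted, BM25-TF), so this sketch stands for none of them in particular.

```python
import math
from collections import Counter

# Toy corpus (hypothetical, for illustration only).
docs = [
    "information retrieval models",
    "probabilistic retrieval models",
    "language modeling for retrieval",
]
N = len(docs)
tokenised = [d.split() for d in docs]

# Document frequency df(t): number of documents containing term t.
df = Counter(t for toks in tokenised for t in set(toks))

def tfidf(term, doc_tokens):
    tf = doc_tokens.count(term)        # raw term frequency TF(t, d)
    idf = math.log(N / df[term])       # inverse document frequency IDF(t)
    return tf * idf

def rsv(query, doc_tokens):
    # Retrieval status value: sum of TF-IDF weights over query terms.
    return sum(tfidf(t, doc_tokens) for t in query.split() if t in df)

scores = {d: rsv("retrieval models", toks) for d, toks in zip(docs, tokenised)}
```

Note how "retrieval", which occurs in every document, receives IDF log(3/3) = 0 and so contributes nothing to the score; only the rarer term "models" discriminates between documents. That discrimination-by-rarity intuition is exactly what Chapters 2 and 3 ground probabilistically.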