
Information Retrieval Models: Foundations and Relationships

Information Retrieval (IR) models are a core component of IR research and IR systems. The past decade brought a consolidation of the family of IR models, which by 2000 consisted of relatively isolated views on TF-IDF (Term-Frequency times Inverse-Document-Frequency) as the weighting scheme in the ve...


Bibliographic Details
Main Author: Roelleke, Thomas
Format: eBook
Language: English
Published: San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool, c2013.
Series:Synthesis digital library of engineering and computer science.
Synthesis lectures on information concepts, retrieval, and services ; # 27.
Online Access: Abstract with links to full text
Table of Contents:
  • 1. Introduction
  • 1.1 Structure and contribution of this book
  • 1.2 Background: a timeline of IR models
  • 1.3 Notation
  • 1.3.1 The notation issue "term frequency"
  • 1.3.2 Notation: Zhai's book and this book
  • 2. Foundations of IR models
  • 2.1 TF-IDF
  • 2.1.1 TF variants
  • 2.1.2 TFlog: Logarithmic TF
  • 2.1.3 TFfrac: fractional (ratio-based) TF
  • 2.1.4 IDF variants
  • 2.1.5 Term weight and RSV
  • 2.1.6 Other TF variants: lifted TF and pivoted TF
  • 2.1.7 Semi-subsumed event occurrences: a semantics of the BM25-TF
  • 2.1.8 Probabilistic IDF: The probability of being informative
  • 2.1.9 Summary
  • 2.2 PRF: the probability of relevance framework
  • 2.2.1 Feature independence assumption
  • 2.2.2 Non-query term assumption
  • 2.2.3 Term frequency split
  • 2.2.4 Probability ranking principle (PRP)
  • 2.2.5 Summary
  • 2.3 BIR: binary independence retrieval
  • 2.3.1 Term weight and RSV
  • 2.3.2 Missing relevance information
  • 2.3.3 Variants of the BIR term weight
  • 2.3.4 Smooth variants of the BIR term weight
  • 2.3.5 RSJ term weight
  • 2.3.6 On theoretical arguments for 0.5 in the RSJ term weight
  • 2.3.7 Summary
  • 2.4 Poisson and 2-Poisson
  • 2.4.1 Poisson probability
  • 2.4.2 Poisson analogy: sunny days and term occurrences
  • 2.4.3 Poisson example: toy data
  • 2.4.4 Poisson example: TREC-2
  • 2.4.5 Binomial probability
  • 2.4.6 Relationship between Poisson and binomial probability
  • 2.4.7 Poisson PRF
  • 2.4.8 Term weight and RSV
  • 2.4.9 2-Poisson
  • 2.4.10 Summary
  • 2.5 BM25
  • 2.5.1 BM25-TF
  • 2.5.2 BM25-TF and pivoted TF
  • 2.5.3 BM25: literature and Wikipedia end 2012
  • 2.5.4 Term weight and RSV
  • 2.5.5 Summary
  • 2.6 LM: language modeling
  • 2.6.1 Probability mixtures
  • 2.6.2 Term weight and RSV: LM1
  • 2.6.3 Term weight and RSV: LM (normalized)
  • 2.6.4 Term weight and RSV: JM-LM
  • 2.6.5 Term weight and RSV: Dirich-LM
  • 2.6.6 Term weight and RSV: LM2
  • 2.6.7 Summary
  • 2.7 PIN's: probabilistic inference networks
  • 2.7.1 The Turtle/Croft link matrix
  • 2.7.2 Term weight and RSV
  • 2.7.3 Summary
  • 2.8 Divergence-based models and DFR
  • 2.8.1 DFR: divergence from randomness
  • 2.8.2 DFR: sampling over documents and locations
  • 2.8.3 DFR: binomial transformation step
  • 2.8.4 DFR and KL-divergence
  • 2.8.5 Poisson as a model of randomness: P(Kt > 0 | d,c): DFR-1
  • 2.8.6 Poisson as a model of randomness: P(Kt = TFd | d,c): DFR-2
  • 2.8.7 DFR: elite documents
  • 2.8.8 DFR: example
  • 2.8.9 Term weights and RSV's
  • 2.8.10 KL-divergence retrieval model
  • 2.8.11 Summary
  • 2.9 Relevance-based models
  • 2.9.1 Rocchio's relevance feedback model
  • 2.9.2 The PRF
  • 2.9.3 Lavrenko's relevance-based language models
  • 2.10 Precision and recall
  • 2.10.1 Precision and recall: conditional probabilities
  • 2.10.2 Averages: total probabilities
  • 2.11 Summary
  • 3. Relationships between IR models
  • 3.1 PRF: the probability of relevance framework
  • 3.1.1 Estimation of term probabilities
  • 3.2 P(d → q): the probability that d implies q
  • 3.3 The vector-space model (VSM)
  • 3.3.1 VSM and probabilities
  • 3.4 The generalised vector-space model (GVSM)
  • 3.4.1 GVSM and probabilities
  • 3.5 A general matrix framework
  • 3.5.1 Term-document matrix
  • 3.5.2 On the notation issue "term frequency"
  • 3.5.3 Document-document matrix
  • 3.5.4 Co-occurrence matrices
  • 3.6 A parallel derivation of probabilistic retrieval models
  • 3.7 The Poisson bridge: P_D(t|u) · avgtf(t,u) = P_L(t|u) · avgdl(u)
  • 3.8 Query term probability assumptions
  • 3.8.1 Query term mixture assumption
  • 3.8.2 Query term burstiness assumption
  • 3.8.3 Query term BIR assumption
  • 3.9 TF-IDF
  • 3.9.1 TF-IDF and BIR
  • 3.9.2 TF-IDF and Poisson
  • 3.9.3 TF-IDF and BM25
  • 3.9.4 TF-IDF and LM
  • 3.9.5 TF-IDF and LM: side-by-side
  • 3.9.6 TF-IDF and PIN's
  • 3.9.7 TF-IDF and divergence
  • 3.9.8 TF-IDF and DFR: risk times gain
  • 3.9.9 TF-IDF and DFR: gaps between term occurrences
  • 3.10 More relationships: BM25 and LM, LM and PIN's
  • 3.11 Information theory
  • 3.11.1 Entropy
  • 3.11.2 Joint entropy
  • 3.11.3 Conditional entropy
  • 3.11.4 Mutual information (MI)
  • 3.11.5 Cross entropy
  • 3.11.6 KL-divergence
  • 3.11.7 Query clarity: divergence(query || collection)
  • 3.11.8 LM = Clarity(query) - Divergence(query || doc)
  • 3.11.9 TF-IDF = Clarity(doc) - Divergence(doc || query)
  • 3.12 Summary
  • 4. Summary & research outlook
  • 4.1 Summary
  • 4.2 Research outlook
  • 4.2.1 Retrieval models
  • 4.2.2 Evaluation models
  • 4.2.3 A unified framework for retrieval and evaluation
  • 4.2.4 Model combinations and "new" models
  • 4.2.5 Dependence-aware models
  • 4.2.6 "Query-log" and other more-evidence models
  • 4.2.7 Phase-2 models: retrieval result condensation models
  • 4.2.8 A theoretical framework to predict ranking quality
  • 4.2.9 MIR: math for IR
  • 4.2.10 AIR: abstraction for IR
  • Bibliography
  • Author's biography
  • Index.
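The TF-IDF weighting named in the abstract (and analysed in Section 2.1 and Section 3.9 above) can be sketched in a few lines. This is a minimal illustration over a hypothetical toy corpus, using raw term frequency and logarithmic IDF; the book itself surveys many TF and IDF variants (logarithmic, fractional, lifted, pivoted, BM25-TF), so this sketch stands for none of them in particular.

```python
import math
from collections import Counter

# Toy corpus (hypothetical, for illustration only).
docs = [
    "information retrieval models",
    "probabilistic retrieval models",
    "language modeling for retrieval",
]
N = len(docs)
tokenised = [d.split() for d in docs]

# Document frequency df(t): number of documents containing term t.
df = Counter(t for toks in tokenised for t in set(toks))

def tfidf(term, doc_tokens):
    tf = doc_tokens.count(term)        # raw term frequency TF(t, d)
    idf = math.log(N / df[term])       # inverse document frequency IDF(t)
    return tf * idf

def rsv(query, doc_tokens):
    # Retrieval status value: sum of TF-IDF weights over query terms.
    return sum(tfidf(t, doc_tokens) for t in query.split() if t in df)

scores = {d: rsv("retrieval models", toks) for d, toks in zip(docs, tokenised)}
```

Note how "retrieval", which occurs in every document, receives IDF log(3/3) = 0 and so contributes nothing to the score; only the rarer term "models" discriminates between documents. That discrimination-by-rarity intuition is exactly what Chapters 2 and 3 ground probabilistically.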