Lecture 4: Boolean Retrieval Flashcards

(13 cards)

1
Q

Boolean Retrieval

A
  • queries are boolean expressions
  • search engine returns all documents that satisfy the expression
2
Q

Term-Document Incidence Matrix

A
  • rows = documents
  • columns = terms
  • entry = 1 if the term occurs in the document, 0 otherwise
  • high memory overhead (the matrix is mostly zeros)
3
Q

Incidence Vector

A
  • one binary vector per term
  • corresponds to a column of the term-document incidence matrix
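
A minimal sketch of the incidence matrix and its term vectors, following the card's convention (rows = documents, columns = terms). The toy documents and the function names `build_incidence_matrix` and `incidence_vector` are made up for illustration.

```python
# Toy collection (assumed for illustration): doc ID -> text
docs = {
    1: "brutus killed caesar",
    2: "caesar praised brutus",
    3: "calpurnia warned caesar",
}

def build_incidence_matrix(docs):
    """Return (doc_ids, terms, matrix) with rows = documents, columns = terms."""
    doc_ids = sorted(docs)
    terms = sorted({t for text in docs.values() for t in text.split()})
    matrix = [[1 if t in docs[d].split() else 0 for t in terms] for d in doc_ids]
    return doc_ids, terms, matrix

def incidence_vector(term, terms, matrix):
    """Return the column of the matrix that belongs to `term`."""
    col = terms.index(term)
    return [row[col] for row in matrix]

doc_ids, terms, matrix = build_incidence_matrix(docs)
brutus = incidence_vector("brutus", terms, matrix)
caesar = incidence_vector("caesar", terms, matrix)
calpurnia = incidence_vector("calpurnia", terms, matrix)

# Boolean query: brutus AND caesar AND NOT calpurnia, evaluated per document
hits = [d for d, b, c, p in zip(doc_ids, brutus, caesar, calpurnia)
        if b and c and not p]
print(hits)
```

Every document stores one entry per vocabulary term, so the matrix grows with documents × terms even though most entries are 0.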
4
Q

Inverted Index

A
  • for each term, a list of all documents in which the term appears is stored (posting list)
  • posting list: singly linked list, sorted by document ID
  • dictionary kept in memory (smaller)
  • postings kept in background storage (larger)
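
A minimal in-memory sketch of building such an index. It uses plain Python lists instead of singly linked lists and keeps everything in memory, both simplifications; the toy documents and the name `build_inverted_index` are assumptions.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to its posting list: doc IDs in ascending order."""
    index = defaultdict(list)
    for doc_id in sorted(docs):                   # ascending IDs keep lists sorted
        for term in set(docs[doc_id].split()):    # each doc at most once per term
            index[term].append(doc_id)
    return dict(index)

# Toy collection (assumed for illustration)
docs = {
    1: "brutus killed caesar",
    2: "caesar praised brutus",
    3: "calpurnia warned caesar",
}
index = build_inverted_index(docs)
print(index["caesar"])
```

The dictionary here is the set of keys; the document frequency of a term is the length of its posting list.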
5
Q

Simple conjunctive query

A
  • locate both terms in the dictionary
  • retrieve their posting lists from the file
  • intersect the two posting lists
  • step-by-step comparison of the two sorted lists (merge)
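
The step-by-step comparison can be sketched as a two-pointer merge over sorted posting lists. The function name `intersect` and the example lists are illustrative.

```python
def intersect(p1, p2):
    """Intersect two posting lists that are sorted by doc ID."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:       # doc contains both terms
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:      # advance the pointer at the smaller doc ID
            i += 1
        else:
            j += 1
    return answer

brutus = [1, 2, 4, 11, 31, 45, 173, 174]
calpurnia = [2, 31, 54, 101]
print(intersect(brutus, calpurnia))
```

Each step advances at least one pointer, so the cost is linear in the combined length of the two lists. This only works because both lists are sorted by document ID.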
6
Q

Query optimization

A
  • start with the shortest posting list (lowest document frequency)
  • AND two posting lists at a time, in order of increasing length
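
A sketch of this ordering heuristic for a conjunctive query with several terms. The names `intersect` and `intersect_many` and the example index are assumptions, not a fixed API.

```python
def intersect(p1, p2):
    """Two-pointer intersection of sorted posting lists."""
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer

def intersect_many(index, terms):
    """AND all terms, processing them in order of increasing document frequency."""
    if not terms:
        return []
    ordered = sorted(terms, key=lambda t: len(index.get(t, [])))
    result = list(index.get(ordered[0], []))
    for term in ordered[1:]:
        if not result:           # intermediate result empty: stop early
            break
        result = intersect(result, index.get(term, []))
    return result

index = {
    "brutus": [1, 2, 4, 11, 31, 45, 173, 174],
    "caesar": [1, 2, 4, 5, 6, 16, 57, 132],
    "calpurnia": [2, 31, 54, 101],
}
print(intersect_many(index, ["brutus", "caesar", "calpurnia"]))
```

Starting with the rarest term keeps every intermediate result no longer than the shortest list, which bounds the work of the later intersections.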
7
Q

Skip pointer

A
  • allow us to skip postings in the list
  • make intersecting lists more efficient
  • problem: where to place the pointers
  • heuristic: sqrt(P) evenly spaced pointers (P = length of the posting list)
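
A sketch of intersection with skip pointers under the sqrt(P) heuristic. Skip pointers are modeled as a mapping from list index to target index; this representation and the names `skip_targets` and `intersect_with_skips` are assumptions made for illustration.

```python
import math

def skip_targets(postings):
    """Place roughly sqrt(P) evenly spaced skip pointers: index -> target index."""
    n = len(postings)
    step = math.isqrt(n)
    if step < 2:                 # list too short for skips to help
        return {}
    return {i: i + step for i in range(0, n - step, step)}

def intersect_with_skips(p1, p2):
    """Intersect sorted posting lists, following a skip when it does not overshoot."""
    s1, s2 = skip_targets(p1), skip_targets(p2)
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # take the skip only if its target is still <= the other doc ID
            if i in s1 and p1[s1[i]] <= p2[j]:
                i = s1[i]
            else:
                i += 1
        else:
            if j in s2 and p2[s2[j]] <= p1[i]:
                j = s2[j]
            else:
                j += 1
    return answer

p1 = [2, 4, 8, 16, 19, 23, 28, 43]
p2 = [1, 2, 3, 5, 8, 41, 51, 60, 71]
print(intersect_with_skips(p1, p2))
```

More pointers mean shorter skips that succeed more often but cost more comparisons and space; fewer pointers mean longer skips that rarely apply.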
8
Q

Hierarchical organization

A
  • a posting can carry multiple skip pointers (with different skip lengths)
9
Q

Phrase queries

A
  • query for words that belong together and must appear consecutively in the document
10
Q

(Phrase queries)
Bi-Word indexes

A
  • index every consecutive pair of terms in the text as a phrase -> each pair becomes a new vocabulary term
  • higher memory consumption (larger vocabulary)
  • false positives for phrases longer than two words -> post-filtering
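
A sketch of a bi-word index with post-filtering. The toy documents are constructed so that document 2 contains every bi-word of the phrase but not the phrase itself; all function names are illustrative assumptions.

```python
from collections import defaultdict

def build_biword_index(docs):
    """Index every consecutive token pair as its own vocabulary term."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        tokens = docs[doc_id].split()
        for biword in set(zip(tokens, tokens[1:])):
            index[biword].append(doc_id)
    return dict(index)

def phrase_candidates(index, phrase):
    """Docs that contain every bi-word of the phrase (may be false positives)."""
    tokens = phrase.split()
    biwords = list(zip(tokens, tokens[1:]))
    if not biwords:
        return []
    result = set(index.get(biwords[0], []))
    for biword in biwords[1:]:
        result &= set(index.get(biword, []))
    return sorted(result)

def contains_phrase(tokens, phrase_tokens):
    """True if phrase_tokens occur contiguously in tokens."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1))

def phrase_query(index, docs, phrase):
    """Post-filter the candidates against the actual document text."""
    wanted = phrase.split()
    return [d for d in phrase_candidates(index, phrase)
            if contains_phrase(docs[d].split(), wanted)]

docs = {
    1: "stanford university palo alto",
    2: "palo alto has a stanford university and another university palo",
}
index = build_biword_index(docs)
print(phrase_candidates(index, "stanford university palo alto"))
print(phrase_query(index, docs, "stanford university palo alto"))
```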
11
Q

(Phrase queries)
Extended Bi-Words

A
  • perform part-of-speech tagging for each document
  • bucket terms into nouns N or articles/prepositions X
  • term sequences of the form N X* N are extended bi-words (the X terms in between are dropped)
  • false positives
  • index blow-up
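
A sketch of extracting N X* N pairs from text that is already tagged. The tagging itself is not shown: the input is hand-tagged, the tag "O" for all other word classes is an assumption, and `extended_biwords` is a hypothetical name.

```python
def extended_biwords(tagged):
    """Return (noun, noun) pairs for every N X* N span in a tagged token list.

    `tagged` is a list of (token, tag) pairs with tags "N" (noun),
    "X" (article/preposition) or anything else (breaks the span).
    """
    pairs = []
    last_noun = None
    for token, tag in tagged:
        if tag == "N":
            if last_noun is not None:
                pairs.append((last_noun, token))
            last_noun = token
        elif tag != "X":
            last_noun = None     # any other word class ends the span
    return pairs

# Hand-tagged example (assumed tags)
title = [("catcher", "N"), ("in", "X"), ("the", "X"), ("rye", "N")]
print(extended_biwords(title))
```

A query is tagged the same way and its extended bi-words are looked up in the index. Because the X terms are dropped, "catcher in the rye" and "catcher of rye" map to the same entry, which is one source of false positives.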
12
Q

(Phrase queries)
Position index

A
  • each posting is a doc ID and a list of positions (in that document)
  • also used for proximity search
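
A sketch of a positional index with a phrase query and a simple proximity query. Postings are modeled as nested dicts (term -> doc ID -> positions); that layout, the toy documents, and the function names are assumptions.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map term -> {doc_id: [positions]}, positions counted from 0."""
    index = defaultdict(dict)
    for doc_id in sorted(docs):
        for pos, term in enumerate(docs[doc_id].split()):
            index[term].setdefault(doc_id, []).append(pos)
    return dict(index)

def phrase_query(index, phrase):
    """Docs where the phrase terms occur at consecutive positions."""
    terms = phrase.split()
    if not terms or any(t not in index for t in terms):
        return []
    common = set(index[terms[0]])
    for t in terms[1:]:
        common &= set(index[t])
    hits = []
    for d in sorted(common):
        # term k of the phrase must sit exactly k positions after the start
        if any(all(start + k in index[t][d] for k, t in enumerate(terms))
               for start in index[terms[0]][d]):
            hits.append(d)
    return hits

def within_k(index, t1, t2, k):
    """Proximity search: docs where t1 and t2 occur at most k positions apart."""
    p1, p2 = index.get(t1, {}), index.get(t2, {})
    return [d for d in sorted(set(p1) & set(p2))
            if any(abs(a - b) <= k for a in p1[d] for b in p2[d])]

docs = {1: "to be or not to be", 2: "be happy to help", 3: "not to worry"}
index = build_positional_index(docs)
print(phrase_query(index, "to be"))
print(within_k(index, "be", "to", 2))
```

Unlike a bi-word index, the position check rules out false positives directly, at the cost of storing every occurrence of every term.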
13
Q

Combination scheme

A
  • combines bi-word indexes and positional indexes
  • include frequent bi-words as vocabulary terms
  • most useful when the individual words are common but the desired phrase is rare