Title: Using Lift as a Practical Measure of Surprise in a Document Stream

Date of Publication: March 2015
Page Numbers: 7-12
Authors: Sean Rooney
Conference Name: The Third International Conference on E-Technologies and Business on the Web (EBW2015), France


We describe how the concept of Lift can be generalized to order small documents in a corpus by their degree of similarity. This surprisal norm can be used in conjunction with other features to search over the corpus. From an information-theoretic point of view, surprisal is the combination of the mutual information of all word pairs in a document. We show how the calculation of surprisals can be performed efficiently on a document stream using sketching techniques.
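The abstract does not give the exact formulas, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes the standard definition of lift for a word pair, P(a, b) / (P(a) P(b)), estimated from document counts; it assumes the pair scores are combined as a mean of log2 lift (pointwise mutual information); and it uses a Count-Min sketch as one possible sketching technique for keeping approximate counts over a stream. The names `CountMinSketch`, `LiftStream`, `observe`, `lift`, and `score`, as well as the `width`, `depth`, and `floor` parameters, are hypothetical.

```python
import hashlib
import math
from itertools import combinations


class CountMinSketch:
    """Approximate counter: estimates never undercount but may overcount."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # Derive one deterministic hash per row by salting the key with the row id.
        data = f"{row}:{key}".encode("utf-8")
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key):
        for row in range(self.depth):
            self.rows[row][self._index(key, row)] += 1

    def estimate(self, key):
        return min(self.rows[row][self._index(key, row)] for row in range(self.depth))


class LiftStream:
    """Tracks word and word-pair document counts over a stream (hypothetical helper)."""

    def __init__(self):
        self.counts = CountMinSketch()
        self.n_docs = 0

    @staticmethod
    def _pair_key(a, b):
        # Assumes words never contain the NUL character used as separator.
        return f"{a}\x00{b}"

    def observe(self, words):
        vocab = sorted(set(words))  # count each word once per document
        self.n_docs += 1
        for w in vocab:
            self.counts.add(w)
        for a, b in combinations(vocab, 2):
            self.counts.add(self._pair_key(a, b))

    def lift(self, a, b):
        # lift = P(a, b) / (P(a) * P(b)) = n_ab * N / (n_a * n_b)
        a, b = sorted((a, b))
        n_ab = self.counts.estimate(self._pair_key(a, b))
        n_a = self.counts.estimate(a)
        n_b = self.counts.estimate(b)
        if n_ab == 0 or n_a == 0 or n_b == 0:
            return 0.0
        return (n_ab * self.n_docs) / (n_a * n_b)

    def score(self, words, floor=1e-3):
        # Assumption: combine pairs as the mean log2 lift; unseen pairs are
        # floored so the logarithm stays finite. Lower means more surprising.
        pairs = list(combinations(sorted(set(words)), 2))
        if not pairs:
            return 0.0
        total = sum(math.log2(max(self.lift(a, b), floor)) for a, b in pairs)
        return total / len(pairs)


stream = LiftStream()
for _ in range(3):
    stream.observe(["cat", "dog"])
    stream.observe(["stock", "bond"])

familiar = stream.score(["cat", "dog"])    # pair seen together in the stream
unusual = stream.score(["cat", "stock"])   # pair never seen together
```

Under these assumptions, a document whose word pairs have co-occurred often in the stream receives a higher score than one that combines words never seen together, so sorting by ascending score puts the most surprising documents first. The sketch keeps memory fixed at `width * depth` counters regardless of vocabulary size, at the cost of occasional overestimated counts.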