A Multiple Relevance Feedback Strategy with Positive and Negative Models.  
MedLine Citation:

PMID: 25137234 Owner: NLM Status: Publisher 
Abstract/OtherAbstract:

A commonly used strategy to improve search accuracy is feedback. Most existing work on feedback relies on positive information and has been extensively studied in information retrieval. However, when a query topic is difficult and the results of the first-pass retrieval are very poor, it is impossible to extract enough useful terms from the few positive documents, so a positive feedback strategy cannot improve retrieval in this situation. In contrast, there is a relatively large number of negative documents at the top of the result list, and several recent studies have confirmed that negative feedback is an important and useful way to handle this scenario. In this paper, we consider a scenario in which the search results are so poor that there are at most three relevant documents among the top twenty. We then conduct a novel study of multiple strategies for relevance feedback, using both positive and negative examples from the first-pass retrieval to improve retrieval accuracy for such difficult queries. Experimental results on TREC collections show that the proposed language-model-based multiple-model feedback method is generally more effective than both the baseline method and methods using only a positive or a negative model. 
Authors:

Yunlong Ma; Hongfei Lin 
Publication Detail:

Type: Journal Article Date: 2014-8-19 
Journal Detail:

Title: PLoS ONE Volume: 9 ISSN: 1932-6203 ISO Abbreviation: PLoS ONE Publication Date: 2014 
Date Detail:

Created Date: 2014-8-19 
Medline Journal Info:

Nlm Unique ID: 101285081 Medline TA: PLoS One Country:  
Other Details:

Languages: ENG Pagination: e104707 Citation Subset:  
Full Text  
Journal Information Journal ID (nlm-ta): PLoS One Journal ID (iso-abbrev): PLoS ONE Journal ID (publisher-id): plos Journal ID (pmc): plosone ISSN: 1932-6203 Publisher: Public Library of Science, San Francisco, USA 
Article Information Copyright: 2014 Ma, Lin Received: 20 March 2014 Accepted: 16 July 2014 Collection publication date: 2014 Electronic publication date: 19 August 2014 Volume: 9 Issue: 8 Elocation ID: e104707 PubMed Id: 25137234 PMC Id: 4138086 Publisher Id: PONE-D-14-04292 DOI: 10.1371/journal.pone.0104707 
A Multiple Relevance Feedback Strategy with Positive and Negative Models Alternate Title: Multiple Relevance Feedback Strategy with Positive and Negative Models  
Yunlong Ma^{aff1¶}  
Hongfei Lin^{aff1*}  
Rongrong Ji^{edit1} 
Role: Editor 
Information Retrieval Laboratory, School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning, China 

Xiamen University, China 

Correspondence: * Email: hflin@dlut.edu.cn Competing Interests: The authors have declared that no competing interests exist. Author Contributions: Conceived and designed the experiments: YM. Performed the experiments: YM. Analyzed the data: YM. Contributed reagents/materials/analysis tools: YM HL. Wrote the paper: YM. ¶ YM is the first author on this work. 
Given the inherent limitations of current retrieval models, it is nearly impossible for any retrieval model to return satisfactory results for every query. Indeed, a query might be so simple or ambiguous that a large number of top-ranked documents are non-relevant; such a query is usually called a difficult query. In that case, a user would have to either reformulate the query or go far down the ranked list to examine more documents, and both options decrease user satisfaction. As a result, improving the effectiveness of search results for such difficult queries would increase user satisfaction, which is the ultimate goal of search engines.
The language modeling approach to text retrieval was first introduced by Ponte and Croft in ^{[1]} and later explored in ^{[2]}–^{[4]}. The relative simplicity and effectiveness of the language modeling approach, together with the fact that it leverages statistical methods developed in speech recognition and other areas, make it an attractive framework in which to develop new text retrieval methods. Although the language modeling approach has performed well empirically, a significant amount of its performance gain is often due to feedback ^{[1]}, ^{[2]}, ^{[5]}. When a user is unable to submit an effective query (which happens often with informational queries, due to insufficient knowledge about the relevant documents), feedback can be quite beneficial: the basic idea is to extract useful terms or features from relevant (or pseudo-relevant) documents and use them to expand the original query or update the query model. Feedback techniques help not only text retrieval but also multimedia retrieval ^{[6]}, such as image search ^{[7]} and landmark search ^{[8]}. Although several kinds of feedback techniques, including relevance feedback ^{[9]}–^{[11]}, pseudo-relevance feedback ^{[12]}–^{[14]} and implicit feedback ^{[15]}, have been extensively studied in information retrieval, most existing work on feedback relies on positive information, i.e., exploiting relevant documents or documents that are assumed to be relevant. To our knowledge, both explicit feedback and pseudo-relevance feedback have individual limitations: an explicit feedback operation is harmful to the user's experience, and the hypothesis of pseudo-relevance feedback, namely that the top documents in the first-round retrieval are all relevant to a given query, is often invalid ^{[16]}, which can hurt retrieval performance.
In this paper, we focus on a real environment in which a user submits a query to a search engine and then clicks several hyperlinks in the returned list. Fortunately, these click operations are recorded in the form of search engine query logs. We therefore assume that all clicked documents are highly relevant, and that the other documents ranked above the lowest-ranked clicked document are non-relevant. According to this assumption, we use the positive information in the relevant documents to derive a new positive model that expands the original query. We choose the Relevance-Based Language Model (RM) ^{[17]}, a typical language-model-based Query Expansion (QE) approach ^{[18]}, as the implementation of the positive model estimation approach in our work.
Some previous studies concluded that when positive documents are available, they are generally more useful than negative documents ^{[19]}, so positive feedback has been studied extensively. As a result, how to exploit negative documents for feedback has been largely under-addressed, and negative feedback has only recently attracted attention. In ^{[16]}, ^{[20]}, the authors studied different methods for negative feedback using only irrelevant information and neglecting all relevant information. Intuitively, if we can learn from both positive and negative information to raise the rank of relevant documents and, concurrently, prune non-relevant documents from the original ranked list, we can improve performance further. In this paper, we tackle this challenge and estimate a negative feedback model by considering not only the negative information but also the positive model just obtained using an improved RM approach. Finally, we use the multiple relevance feedback strategy, formed by the fusion of the positive and negative relevance models, to re-rank the unseen part of the list. To evaluate the effectiveness of the proposed method, we construct a test collection containing only appropriate queries from TREC collections. Experimental results show that the proposed multiple relevance feedback strategy is effective at improving ranking accuracy and outperforms strategies using only positive or only negative feedback.
The rest of the paper is organized as follows. In the next section, we review related work. Section 3 describes our feedback framework for language models. Section 4 presents our positive and negative model estimation approaches in detail. Section 5 contains experimental results, together with a discussion of those results, and the last section concludes.
Relevance feedback has been shown to be effective with different kinds of retrieval models ^{[14]}, ^{[15]}, ^{[21]}, ^{[22]}. In the vector space model, feedback is usually done with the Rocchio algorithm, which forms a new query vector by maximizing its similarity to relevant documents and minimizing its similarity to non-relevant documents ^{[10]}. The feedback method in classical probabilistic models selects expansion terms primarily based on the Robertson/Sparck-Jones weight ^{[9]}. Unfortunately, neither can be naturally implemented in the language modeling approaches ^{[16]}. In the language modeling approaches, relevance feedback can be implemented by estimating a query language model ^{[22]} or a relevance model ^{[17]} from a set of feedback documents.
Recently, several query expansion techniques have been developed in the language modeling framework, including, e.g., the mixture-model feedback method ^{[22]} and the relevance model ^{[17]}. The basic idea is to use feedback documents to estimate a better query language model. Both the mixture model and the relevance model have been shown to be very effective, but the relevance model appears to be more robust ^{[23]}. In mixture-model feedback, the words in feedback documents are assumed to be drawn from two models: (1) a background model and (2) a topic model. Mixture-model feedback finds the topic model that best describes the feedback documents by separating it from the background model. The topic model is then interpolated with the original query model to form the expanded query. Much like mixture-model feedback, the relevance model also estimates an improved query language model. Given a query Q, a relevance model is a multinomial distribution p(w|Q) that encodes the likelihood of each term w given the query as evidence. To estimate the relevance model, the authors first compute the joint probability of observing a word together with the query words in each feedback document and then aggregate the evidence by summing over all the documents. This essentially uses the query likelihood as the weight of a document and takes a weighted average of the probability of the word under each document language model.
When no real relevance judgments are available, pseudo-relevance feedback ^{[12]}–^{[14]} may be performed instead, which simply assumes that a small number of top-ranked documents in the initial retrieval results are relevant and then applies relevance (positive) feedback. Both of the above feedback approaches in the language modeling framework are based on this assumption; our work differs in that we use real relevance judgments instead.
There is much previous work on explicit feedback, which obtains the user's judgements and leads to good retrieval performance, but unfortunately very few users will put up with an additional interactive operation. Although we also use the user's judgements in this work, a main difference from explicit feedback approaches is that no additional operation is required: we just use information extracted from search engine query logs. This idea is similar to the measures proposed in ^{[15]}, named implicit feedback, but our work considers not only positive information but also negative information, a research direction that has only recently attracted attention.
There have been some attempts to exploit non-relevant documents. Query zoning ^{[24]} appears to be the only major heuristic proposed to effectively exploit non-relevant information, for document routing tasks. It shows that using non-relevant documents that are close to the original query is more effective than using all non-relevant documents in the collection. The work in ^{[25]} exploits high-scoring documents outside the top documents (called pseudo-irrelevant documents) to improve the performance of pseudo-relevance feedback. The work in ^{[16]} and its later extension ^{[20]} exploit the top non-relevant documents to improve document ranking; they are the earliest studies of negative relevance feedback in the language modeling framework. The latter defines an important concept called the generality of a language model, and the authors propose an optimization framework based on this concept. Our feedback strategy in this paper is also based on the same concept, but we consider that the positive feedback model should be taken into account when optimizing the negative model, so as to more aggressively (but carefully) prune non-relevant documents, leading to a more effective multiple relevance feedback method.
Given a query Q and a document collection C, a retrieval system returns a ranked list of documents L = (D1, D2, ..., Dn), where Di is the i-th ranked document in L. We assume that the query is difficult enough that there are only a handful of relevant documents among the top k ranked documents (those seen so far by the user) and most documents in the top k are non-relevant. The goal of our study is to use these positive examples to first build a positive language model that describes the information need more accurately, so that the remaining unseen relevant documents are assigned higher relevance scores when re-ranking.
The second part of our feedback model is a set of negative models. We use all the negative feedback examples to build a set of negative language models, each of which corresponds to one negative example. Every negative language model is then optimized by taking account of the original query model and the positive language model, so that these improved negative language models better describe other unseen non-relevant documents and improve the ranking of relevant documents by pushing non-relevant documents down the ranked list.
More formally, given a specific query Q, a ranked list L and a set of relevance judgements consisting of relevant (positive) documents and non-relevant (negative) documents for this query, our goal is to estimate a positive language model (which is then combined with the original query language model to form a relevance topic language model) and a set of improved negative language models, where each language model consists of words along with their probabilities. All the models above can then be plugged into the final feedback strategy to improve feedback performance.
In this paper, we focus only on the positive and negative feedback problem in the language modeling framework, so we use the Language Model (LM) as the basic retrieval model in all our work. There are two main score functions in LM. The original and most basic one is the Query-Likelihood (QL) function ^{[26]}: from each document D in the collection we construct a language model θD, and the goal is to rank documents by p(D|Q), where the probability of a document is interpreted as the likelihood that it is relevant to the query. Using Bayes rule we have:
p(D|Q) = p(Q|D) p(D) / p(Q), where p(Q) is the same for all documents and so can be ignored. The prior probability p(D) of a document is often treated as uniform across all documents, so it can also be ignored; thereby, results are ranked simply by p(Q|θD), the probability of the query under the language model derived from D. The language modeling approach thus models the query generation process: documents are ranked by the probability that the query would be observed as a random sample from the respective document model. The other score function is the KL-divergence function ^{[27]}, one of the most effective score functions in the language modeling framework ^{[23]}. It is a generalization of the query-likelihood function and scores a document D with respect to a query Q by the negative Kullback-Leibler divergence between the query language model θQ and the document language model θD, which is rank-equivalent to Score(Q, D) = Σ_w p(w|θQ) log p(w|θD),
where w ranges over the words in the vocabulary. Clearly, the two main tasks are to estimate the query language model θQ and the document language model θD. The document language model is usually smoothed with Dirichlet prior smoothing, an effective smoothing method ^{[28]}.
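As a concrete illustration (not code from the paper), the Dirichlet-smoothed document model and the rank-equivalent negative-KL score can be sketched as follows; the function names and toy statistics are our own.

```python
import math
from collections import Counter

def dirichlet_doc_lm(doc_tokens, collection_tf, collection_len, mu=2000):
    """Dirichlet-prior-smoothed document language model p(w | theta_D)."""
    tf = Counter(doc_tokens)
    dlen = len(doc_tokens)

    def p(word):
        # Smooth the document estimate toward the collection model p(w | C).
        p_coll = collection_tf.get(word, 0) / collection_len
        return (tf[word] + mu * p_coll) / (dlen + mu)

    return p

def kl_score(query_model, doc_lm):
    """Rank-equivalent negative-KL score: sum_w p(w|Q) * log p(w|D).
    The query-entropy term of the KL-divergence is constant per query
    and therefore dropped."""
    return sum(pw * math.log(doc_lm(w)) for w, pw in query_model.items())
```

Smoothing guarantees p(w|D) > 0 for any word seen in the collection, which is what keeps the logarithm well-defined for query words absent from a document.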
The query model intuitively captures what the user is interested in and thus affects retrieval accuracy significantly. In the absence of feedback, the query language model θQ is often estimated as:
p(w|θQ) = c(w, Q) / |Q|, where c(w, Q) is the count of word w in query Q and |Q| is the total number of words in the query. Such a model is not very discriminative because a query is typically extremely short. When feedback information is available, it can be used to improve the estimate of the query language model.

Following previous work, we use the language model with the KL-divergence score function as the basic retrieval model throughout this paper.
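The no-feedback query model, and the usual interpolation of that model with a feedback model, can be sketched as follows (an illustrative sketch; the interpolation weight `alpha` is a generic choice, not a value from the paper):

```python
from collections import Counter

def mle_query_model(query_tokens):
    """No-feedback query model: p(w | theta_Q) = c(w, Q) / |Q|."""
    counts = Counter(query_tokens)
    return {w: c / len(query_tokens) for w, c in counts.items()}

def interpolate(query_model, feedback_model, alpha=0.5):
    """Interpolate the original query model with a feedback model;
    alpha controls the influence of the feedback information."""
    vocab = set(query_model) | set(feedback_model)
    return {w: (1 - alpha) * query_model.get(w, 0.0)
               + alpha * feedback_model.get(w, 0.0) for w in vocab}
```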
In order to describe users' information needs more effectively, we first have to estimate a positive feedback language model. The Relevance-Based Language Model (RM), a typical pseudo-relevance feedback (PRF) implementation in the language modeling framework, is chosen as the basis of the positive model estimation approach in our work.
In the RM estimation function, apart from the PRF document models, the document weight consists of two components: a document relevance score and a document prior. The former represents the initial document relevance probability, while the latter is the prior probability of selecting the corresponding document. More formally, for each given query Q, based on the corresponding PRF document set F, the RM estimates an expanded query model p(w|θR) ∝ Σ_{D∈F} p(w|θD) p(D) p(Q|θD),
where θR is the estimated relevance model. A number of terms with top probabilities in θR are used to estimate the QE model (i.e., the expanded query model). In Equation 4, p(w|θD) is the probability of term w in the language model of document D, p(D) is D's prior probability, and p(Q|θD) is the query likelihood p(Q|θD) = Π_{q∈Q} p(q|θD).
In RM, the weighting function is W(D) ∝ p(D) p(Q|θD),
where the QL relevance score and the document prior are integrated to form the document weight. The weighting function plays a key role in RM, since it distinguishes the RM from a plain mixture of document language models. To apply revised weighting functions under the RM framework, we reformulate the RM as p(w|θR) = Σ_{D∈F} W(D) p(w|θD),
where W(D) denotes any revised document-weighting function satisfying Σ_{D∈F} W(D) = 1; different choices of W(D) yield different RM implementations. According to Equation 6, W(D) in the original RM is a normalized query-likelihood score (see Equation 1) with the constant eliminated, and since the document prior is assumed to be uniform, the weighting function is simply the normalized query-likelihood probability:
The normalized query-likelihood document weights are called QL weights in the following text. From Equations 2 and 8, the QL weights cannot be computed efficiently, because they require an additional calculation beyond the first-pass ranking. Moreover, the KL-divergence is a more effective function for information retrieval tasks. Thus, we adopt the KL-divergence score as the document weight in the original RM function:
When the first-pass ranked list is returned, all the necessary KL scores are already available, so the normalized KL weights can be computed immediately.
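The relevance-model aggregation with normalized document weights can be sketched as follows (illustrative only; it assumes the per-document weights are supplied as positive scores, whereas the paper normalizes QL or KL scores directly):

```python
def relevance_model(doc_models, doc_scores):
    """Relevance model p(w | theta_R) = sum_D W(D) * p(w | theta_D),
    where W(D) is each feedback document's score normalized over the
    feedback set."""
    total = sum(doc_scores.values())
    weights = {d: s / total for d, s in doc_scores.items()}
    vocab = {w for lm in doc_models.values() for w in lm}
    return {w: sum(weights[d] * doc_models[d].get(w, 0.0)
                   for d in doc_models) for w in vocab}
```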
Nevertheless, RM is a typical pseudo-relevance feedback (PRF) approach, and its basic assumption is that a small number of top-ranked documents in the initial retrieval results are relevant. Under that assumption it is reasonable to assign document weights in descending order of relevance score. In our work, however, all the relevant documents are extracted from true judgements given by user feedback, so we consider that a document with a lower relevance score in the first-pass retrieval may deserve more attention and a higher weight during feedback: the new query most needs to improve its ability to describe the documents that the original query did not describe well. Thus, we modify the KL document weight as follows:
and the final RM function we use as the positive feedback model is obtained by plugging this weight into Equation 6.

The experimental results in the evaluation section show that our positive feedback model (named RM-KL) is more effective than RM.
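Since the exact modified-weight formula is not recoverable from this copy, the following is only a hypothetical sketch of the idea just described, namely that documents with lower first-pass scores receive higher feedback weight:

```python
def ascending_weights(first_pass_scores, eps=1e-6):
    """Hypothetical sketch of the RM-KL weighting idea: documents with
    LOWER first-pass scores get HIGHER feedback weight, since the
    original query described them poorly. Here each document's weight
    is its gap from the best score, normalized; the paper's formula
    may differ."""
    best = max(first_pass_scores.values())
    raw = {d: best - s + eps for d, s in first_pass_scores.items()}
    total = sum(raw.values())
    return {d: r / total for d, r in raw.items()}
```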
The basic idea of relevance feedback is to extract useful information from positive documents and use it to update the original query language model, as we have done above. When a query is difficult, it is often impossible to obtain enough positive documents for feedback. Therefore, the best recourse is to exploit the negative documents and perform negative feedback ^{[16]}. The idea of negative feedback is to identify distracting non-relevant documents and penalize unseen documents containing such irrelevant information.
The two negative feedback methods proposed in ^{[20]} are the SingleNeg and MultiNeg methods, which we briefly describe below.
SingleNeg adjusts the original relevance score of a document with a single negative model. Let θQ and θD be the estimated query model and document model, respectively, and let θN be a negative language model estimated from the negative feedback documents. The new score according to this model is S(Q, D) = S(θQ, D) − β S(θN, D).
In order to estimate θN, it is assumed that all non-relevant documents are generated from a mixture of a unigram language model θN and a background language model (generating common words). The log-likelihood of the sample documents is Σ_{D∈N} Σ_w c(w, D) log((1 − λ) p(w|θN) + λ p(w|C)),
where λ is a mixture parameter that controls the weight of the background model. A standard EM algorithm is used to estimate the parameters.

MultiNeg adjusts the original relevance score with multiple negative topic models. A document D is scored with respect to a query Q as follows:
S(Q, D) = S(θQ, D) − β max_i S(θNi, D), where each θNi is a negative document representation and β is a parameter that controls the influence of negative feedback. The EM algorithm is used to estimate a negative model for each individual negative document. The resulting negative models are then combined with the above formula for re-ranking. According to the experimental results and conclusions in ^{[20]}, the MultiNeg strategy performs better than SingleNeg, so we choose MultiNeg as our basic negative feedback modeling strategy. Specifically, based on the KL-divergence score function, the MultiNeg formula expands to the following form:
and we use it in our experiments.

A main goal of our study is to improve the estimation of the positive and negative document language models. An effective positive language model can be combined with the original query language model to improve the ranking of relevant documents by boosting their relevance scores directly, and it can be optimized with the EM algorithm. An effective negative document language model can be used to exploit the top non-relevant documents to improve the ranking, and it can be obtained by generalizing a basic negative document language model within an optimization framework. Three criteria have to be considered in the optimization process: (1) closeness to the original negative language model (to ensure accuracy), (2) closeness to the relevance (positive) language model (if it is far from the information need, its pruning power is weak), and (3) a generalization constraint. The reason all three components are important is illustrated in Figure 1, where (a) shows that the general negative language model is safe and effective, since it is both close to the original negative language model (ensuring that the pruned documents are non-relevant) and reasonably close to the relevance language model (so it can make a difference in the top-ranked results through pruning).
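Stepping back, the MultiNeg score adjustment described above can be sketched as follows (a minimal illustration; `beta` and the toy scores are our own, and the scores stand for any of the score functions defined earlier):

```python
def multineg_score(query_doc_score, neg_doc_scores, beta=0.5):
    """MultiNeg-style adjustment: a candidate document is penalized by
    its best match against ANY single negative model, so matching one
    distracting subtopic is enough to push a document down."""
    return query_doc_score - beta * max(neg_doc_scores)
```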
In the next section, we present an optimization framework for improving the estimate of both positive and negative document language models.
In order to build a more general negative language model, we need an optimization framework that, given an original negative model, searches the space of all language models for a set of more general negative language models and finally picks out the best one.
Therefore, we adopt the objective function definition and extend it with the positive feedback model, which is an important part of our work, as follows:
where d1 and d2 are divergence functions, and a tradeoff parameter balances closeness to the relevant topic model against closeness to the original negative model. We also retain the constraint that avoids over-generalization:
It requires that the general negative language model deviate from the original negative language model by at most a fixed amount. The generality is defined as:
where df(w) is the number of documents containing word w in the collection (document frequency) and p(w|θ) is the probability of word w given language model θ.

Next, we describe the divergence functions d1 and d2 in the optimization framework.
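One natural instantiation of this generality definition, consistent with the quantities named above, is sketched below (an assumption on our part, since the exact formula is lost in this copy):

```python
def generality(model, doc_freq, num_docs):
    """One reading of the generality of a language model: the expected
    normalized document frequency of a word drawn from the model.
    Higher values mean the model favors more common vocabulary."""
    return sum(p * doc_freq.get(w, 0) / num_docs for w, p in model.items())
```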
We define both divergences d1 and d2 in Equation 16 based on the KL-divergence. First, the divergence from the general negative model to the relevant topic model is exactly the KL value:
The KL-divergence is also called relative entropy; the first argument is regarded as the true distribution and the second as the test distribution. However, it is unreasonable to treat either the general negative model or the original negative model as the true distribution, so we use the symmetric version of the KL-divergence ^{[27]} for the divergence between them.
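The symmetric KL-divergence used here can be sketched as follows (the epsilon smoothing is our own guard against zero probabilities; in practice both models would already be smoothed):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over the union vocabulary of two term distributions."""
    return sum(pw * math.log(pw / (q.get(w, 0.0) + eps))
               for w, pw in p.items() if pw > 0)

def symmetric_kl(p, q):
    """Symmetrized KL, appropriate when neither model can be treated
    as the 'true' distribution."""
    return kl_divergence(p, q) + kl_divergence(q, p)
```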
With these instantiations, the objective function is completely defined.
In the objective function (Equation 16), the search space is infinite. In order to find an optimal solution efficiently, we make the problem tractable by searching a finite space of feasible solutions. We propose two steps for shrinking the search space, described in detail here.
As explained in Section 4.1, the goal of general negative language model optimization is 1) closeness to the original negative language model (the first part of Equation 16) and 2) closeness to the relevance topic language model (the second part). Closeness to the original negative model ensures the pruning power, but the original negative language model overlaps with the relevance topic model (they share terms with high observed frequency), which is the main reason these negative documents were returned in the top-ranked list. So we remove the terms that have high probability in the relevance topic model from the original negative language model; specifically, the top such terms are removed in our experiments.
Similar to the perturbation step in ^{[29]}, for each model in the original negative language model set, we build a more general negative language model by removing appropriate terms. In our work, we remove such terms iteratively, increasing the threshold at each iteration, until the objective function is minimized; the revised negative language model is then certainly more general than the original one. Table 1 shows the iteration of term elimination.
Note that after any term removal, the probabilities are renormalized to ensure they remain comparable.
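The removal-and-renormalization step can be sketched as follows (an illustrative sketch with our own function name):

```python
def remove_and_renormalize(model, terms_to_remove):
    """Drop the given terms from a language model and renormalize the
    surviving probabilities so they still sum to 1, keeping the pruned
    negative models comparable to one another."""
    kept = {w: p for w, p in model.items() if w not in terms_to_remove}
    total = sum(kept.values())
    return {w: p / total for w, p in kept.items()}
```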
The evaluation is done using two standard TREC (Text REtrieval Conference – http://trec.nist.gov/) collections, Robust04 and GOV2, which are representative of heterogeneous and homogeneous data sets, respectively, with details in Table 2.
Our first data set is the Robust Track of TREC 2004, which has 528,155 news articles; we use its 150 queries in our experiments. The Robust Track is a standard ad hoc retrieval task with an emphasis on the overall reliability of IR systems; it contains difficult queries and is a heterogeneous data set. This data set is called “Robust04” in the following text.
The second data set is a TREC test collection for use in the Terabyte Track which is a homogeneous data set. It contains 25,205,179 documents crawled from the “.gov” domain sites in 2004, and there are 150 queries in this set. The data set is called “GOV2” below.
For both data sets, preprocessing of documents and queries involves only stemming with the Porter stemmer and removing stopwords with the minimal English stopword list in Lucene (Apache Lucene – http://lucene.apache.org/).
Since our goal is positive and negative feedback in the language modeling framework, we construct a simulated query set to mimic users' behavior on a search engine. Because no true feedback information is available, in our experiments we treat the relevance judgements published by TREC as feedback from several truthful users. Considering the hypothesis of multiple relevance feedback, relevant and non-relevant documents have to appear concurrently in the feedback judgement, so we filter both query sets above with the following constraint: the baseline method (the language model with the KL-divergence score function and Dirichlet prior smoothing; see Section 5.2) must return at least 1 relevant document in the top 20 (user clicked) and at least 1 non-relevant document (user swept over) ranked above the lowest-ranked relevant document in the top 20. Finally, 112 and 129 queries, respectively, are available for our experiments, with more details in Table 2. In particular, we treat the topic titles as queries and ignore the description field.
In order to evaluate the effectiveness of our method, we use three methods as the baselines for comparison.
 The Language Model was implemented with the Indri (Indri Toolkit – http://www.lemurproject.org/indri.php) toolkit, in which the Dirichlet smoothing prior is set empirically to 2000 for Robust04 and 1500 for GOV2 ^{[26]}; this method is denoted LM-Dir.
 The Relevance-Based Language Model was also implemented with Indri; it is a PRF expansion approach that uses only the positive feedback model, based on the query-likelihood method of ^{[17]}, and is denoted RM-QL.
 The MultiNeg feedback method, which we implemented following the description in ^{[20]}, considers only the negative feedback information; we denote it MultiNeg (details in Subsection 3.4). All its parameters are set to their empirical values ^{[29]}.
The multiple relevance feedback strategy proposed in this paper takes account of both positive and negative feedback information. The goal of our experiments is therefore to simulate a scenario in which a user has viewed the top-ranked documents (the first page), has clicked a few hyperlinks for further viewing, and is about to view the rest of the search results (clicking the next-page button). At this point, we can naturally apply the feedback information to re-rank all the unseen documents. As shown in Section 5.1, we use the top 20 documents, which simulates this scenario of applying feedback: the relevant and non-relevant documents have been found on the first page of search results and the user is about to view the next page.
In order to set the parameters of our method, i.e., the parameter controlling the influence of negative feedback (described in Section 3.4) and the tradeoff between the two divergence functions (described in Section 4.2), we perform 5-fold cross-validation as follows: we fix the number of positive feedback terms (described in Section 4.4) and the parameter controlling the influence of positive feedback (described in ^{[17]}), then learn the two remaining parameters on the training data. The other parameters are set to the same values as in MultiNeg (described in Section 5.1).
When the user clicks the next-page button, the top-20 ranked documents have already been browsed, so they should not be returned again on the following pages. To simulate this scenario and measure performance directly, we remove the top-20 documents of the original ranked list from the reranking results.
Specifically, we denote the positive feedback strategy (described in Section 3.3.2) by RMKL, and the final multiple relevance feedback strategy proposed here by MultiFB.
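The fusion behind MultiFB, rewarding closeness to the positive model while rewarding divergence from the nearest general negative model, can be sketched with KL-divergence scoring. The paper's exact score function may differ; `beta` and the epsilon smoothing are illustrative:

```python
import math

def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) over the shared vocabulary, with an epsilon floor so
    that terms missing from q do not produce division by zero."""
    vocab = set(p) | set(q)
    return sum(p[w] * math.log((p[w] + eps) / (q.get(w, 0.0) + eps))
               for w in vocab if p.get(w, 0.0) > 0)

def multifb_score(doc_model, pos_model, neg_models, beta=0.5):
    """Combine positive and negative feedback: a small divergence from the
    positive model raises the score, and a large divergence from the
    closest negative model raises it further (beta is illustrative)."""
    pos = -kl_divergence(pos_model, doc_model)
    neg = min(kl_divergence(n, doc_model) for n in neg_models) if neg_models else 0.0
    return pos + beta * neg
```

Taking the minimum divergence over the negative models mirrors the max-penalty idea of MultiNeg: only the best-matching negative model affects a document's score.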
Before detailed testing, Figure 2 shows the results of assigning different values to this parameter on the two TREC collections, with the number of positive feedback terms set to its empirical value. As can be seen in Figure 2, the RMKL method performs best at different settings for Robust04 and for GOV2. We therefore set the parameter to its per-collection optimal value for our RMKL and MultiFB methods.
With the setup described above, the top-ranked 1000 unseen documents for all runs were compared in terms of two performance measures: Mean Average Precision (MAP) and Precision at 20 (P@20), the latter reflecting the utility for users who will not browse more than two pages of results. Note that MAP is our main measure; we report both measures for completeness.
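Both measures follow their standard definitions and can be computed as follows, independent of any toolkit:

```python
def average_precision(ranked, relevant):
    """AP for one query: the mean of the precision values at each rank
    where a relevant document appears, divided by the number of
    relevant documents. MAP is the mean of AP over all queries."""
    hits, precisions = 0, []
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(relevant) if relevant else 0.0

def precision_at_k(ranked, relevant, k=20):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k
```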
Finally, to assess the effectiveness of the proposed strategies, we compare them with the three baseline methods in ad hoc retrieval tests on the two TREC standard collections. Table 3 and Table 4 show the cross-validation results in terms of MAP and P@20, respectively, on both Robust04 and GOV2, for different values of k. Our MultiFB method achieves higher MAP than the pseudo-relevance-feedback Relevance-Based Model (RMQL), and also higher P@20, on both Robust04 and GOV2.
According to these results, MultiFB outperforms RMQL and MultiNeg in most cases, which shows that taking account of positive and negative feedback information concurrently leads to a more effective feedback language model than using either alone. We also find that our proposed RMKL method performs better than RMQL, which confirms the effectiveness of KL-divergence in information retrieval.
Because of the inherent limitations of current retrieval models, it is nearly impossible for any retrieval model to return satisfactory results for every query. Feedback is an important and useful technique for tackling this challenge, and it can be applied automatically, without extra user effort, using implicit feedback information. In this paper, we focused on a scenario in which a user submits a query to a search engine and then clicks several hyperlinks in the returned list. We addressed the problem of data sparseness in feedback, under the assumption that all clicked documents are highly relevant and that the documents ranked above the lowest-ranked clicked document are irrelevant, by proposing a multiple relevance feedback strategy in the language modeling framework with a KL-divergence score function. In this strategy, we learn a positive language model and a set of general negative language models from the feedback documents, and then fuse the positive and negative language models to rerank the unseen documents for the user. Our experiments showed that the proposed multiple relevance feedback strategy is effective for improving ranking accuracy and that it outperforms the methods using only positive or only negative feedback.
Our work has a few limitations. First, all experiments are based on simulated feedback rather than relevance feedback from real users, so one direction of future work is to test the proposed methods with real feedback data. Second, other feedback methods could later be interpolated into our multiple strategy. Finally, because of the large number of parameter combinations, we could not optimize all parameters in our experiments; we plan to address these issues in future work.
The authors are grateful for the support from Prof. Hongfei Lin of the School of Computer Science and Technology, Dalian University of Technology. The authors wish to thank the anonymous reviewer whose constructive comments were very helpful for strengthening the presentation of this paper.
References
1.  Ponte JM, Croft WB (1998) A language modeling approach to information retrieval. In: Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM, SIGIR '98, pp. 275–281. doi:10.1145/290941.291008. Available: http://doi.acm.org/10.1145/290941.291008 
2.  Miller DRH, Leek T, Schwartz RM (1999) A hidden Markov model information retrieval system. In: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM, SIGIR '99, pp. 214–221. doi:10.1145/312624.312680. Available: http://doi.acm.org/10.1145/312624.312680 
3.  Berger A, Lafferty J (1999) Information retrieval as statistical translation. In: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM, SIGIR '99, pp. 222–229. doi:10.1145/312624.312681. Available: http://doi.acm.org/10.1145/312624.312681 
4.  Song F, Croft WB (1999) A general language model for information retrieval. In: Proceedings of the eighth international conference on Information and knowledge management. New York, NY, USA: ACM, CIKM '99, pp. 316–321. doi:10.1145/319950.320022. Available: http://doi.acm.org/10.1145/319950.320022 
5.  Ng K (1999) A maximum likelihood ratio information retrieval model. 
6.  Kankanhalli M, Rui Y (2008) Application potential of multimedia information retrieval. Proceedings of the IEEE 96: 712–720 
7.  Rui Y, Huang T, Ortega M, Mehrotra S (1998) Relevance feedback: a power tool for interactive content-based image retrieval. IEEE Transactions on Circuits and Systems for Video Technology 8: 644–655 
8.  Ji R, Duan LY, Chen J, Yao H, Yuan J, et al. (2012) Location discriminative vocabulary coding for mobile landmark search. International Journal of Computer Vision 96: 290–314 
9.  Robertson S, Sparck Jones K (1976) Relevance weighting of search terms. Journal of the American Society for Information Science 27: 129–146 
10.  Rocchio J (1971) Relevance feedback in information retrieval. The SMART Retrieval System: Experiments in Automatic Document Processing 21: 313–323 
11.  Salton G, Buckley C (1990) Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science 41: 288–297 
12.  Attar R, Fraenkel AS (1977) Local feedback in full-text retrieval systems. J ACM 24: 397–417 
13.  Buckley C, Salton G, Allan J, Singhal A (1994) Automatic query expansion using SMART: TREC 3. In: TREC, pp. 69–80. Available: http://dblp.uni-trier.de/rec/bibtex/conf/trec/BuckleySAS94 
14.  Xu J, Croft WB (1996) Query expansion using local and global document analysis. In: SIGIR '96: Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, pp. 4–11. 
15.  Shen X, Tan B, Zhai C (2005) Context-sensitive information retrieval using implicit feedback. In: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM, SIGIR '05, pp. 43–50. doi:10.1145/1076034.1076045. Available: http://doi.acm.org/10.1145/1076034.1076045 
16.  Wang X, Fang H, Zhai C (2007) Improve retrieval accuracy for difficult queries using negative feedback. In: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management. New York, NY, USA: ACM, CIKM '07, pp. 991–994. doi:10.1145/1321440.1321593. Available: http://doi.acm.org/10.1145/1321440.1321593 
17.  Lavrenko V, Croft WB (2001) Relevance-based language models. In: SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, pp. 120–127. 
18.  Zhang P, Song D, Zhao X, Hou Y (2010) A study of document weight smoothness in pseudo relevance feedback. In: AIRS, pp. 527–538 
19.  Dunlop MD (1997) The effect of accessing nonmatching documents on relevance feedback. ACM Transactions on Information Systems 15: 137–153 
20.  Wang X, Fang H, Zhai C (2008) A study of methods for negative relevance feedback. In: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM, SIGIR '08, pp. 219–226. doi:10.1145/1390334.1390374. Available: http://doi.acm.org/10.1145/1390334.1390374 
21.  Karimzadehgan M, Zhai C (2010) Exploration-exploitation tradeoff in interactive relevance feedback. In: Proceedings of the 19th ACM international conference on Information and knowledge management. New York, NY, USA: ACM, CIKM '10, pp. 1397–1400. doi:10.1145/1871437.1871631. Available: http://doi.acm.org/10.1145/1871437.1871631 
22.  Zhai C, Lafferty J (2001) Model-based feedback in the language modeling approach to information retrieval. In: Proceedings of the tenth international conference on Information and knowledge management. New York, NY, USA: ACM, CIKM '01, pp. 403–410. doi:10.1145/502585.502654. Available: http://doi.acm.org/10.1145/502585.502654 
23.  Lv Y, Zhai C (2009) A comparative study of methods for estimating query language models with pseudo feedback. In: Proceedings of the 18th ACM conference on Information and knowledge management. New York, NY, USA: ACM, CIKM '09, pp. 1895–1898. doi:10.1145/1645953.1646259. Available: http://doi.acm.org/10.1145/1645953.1646259 
24.  Singhal A, Mitra M, Buckley C (1997) Learning routing queries in a query zone. In: Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM, SIGIR '97, pp. 25–32. doi:10.1145/258525.258530. Available: http://doi.acm.org/10.1145/258525.258530 
25.  Raman K, Udupa R, Bhattacharya P, Bhole A (2010) On improving pseudo-relevance feedback using pseudo-irrelevant documents. In: Proceedings of the 32nd European conference on Advances in Information Retrieval. Berlin, Heidelberg: Springer-Verlag, ECIR '10, pp. 573–576. doi:10.1007/978-3-642-12275-0. Available: http://dx.doi.org/10.1007/978-3-642-12275-0 
26.  Croft B, Metzler D, Strohman T (2009) Search Engines: Information Retrieval in Practice. Reading, Massachusetts: Addison-Wesley. 
27.  Lafferty J, Zhai C (2001) Document language models, query models, and risk minimization for information retrieval. In: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM, SIGIR '01, pp. 111–119. doi:10.1145/383952.383970. Available: http://doi.acm.org/10.1145/383952.383970 
28.  Zhai C, Lafferty J (2001) A study of smoothing methods for language models applied to ad hoc information retrieval. In: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM, SIGIR '01, pp. 334–342. doi:10.1145/383952.384019. Available: http://doi.acm.org/10.1145/383952.384019 
29.  Karimzadehgan M, Zhai C (2011) Improving retrieval accuracy of difficult queries through generalizing negative document language models. In: Proceedings of the 20th ACM international conference on Information and knowledge management. New York, NY, USA: ACM, CIKM '11, pp. 27–36. doi:10.1145/2063576.2063586. Available: http://doi.acm.org/10.1145/2063576.2063586 
Figures
Tables
Table 1 Algorithm of Term Elimination for Optimization.
Algorithm 1. Term Elimination for Optimization. 
Input: a set of negative language models 
Output: a set of general negative language models 
Parameter: the increment of the elimination threshold 
Initialize the elimination threshold 
for each negative language model do 
  repeat 
    Set the elimination threshold 
    foreach term whose probability satisfies the threshold condition 
      Remove the term from the current model 
    end foreach 
    Normalize the probabilities in the current model 
    Update the threshold 
  until the distance from the original model is higher than the limit (Eq. 16) 
  Set the resulting general negative model 
end for 
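The elimination loop above can be sketched in code. This is a hedged reconstruction: the paper's distance function (Eq. 16) and parameter symbols are elided in this copy, so the removed probability mass is used as a simple stand-in for the distance, and `step` and `max_removed_mass` are illustrative values, not the paper's settings:

```python
def generalize_negative_model(neg_model, step=0.001, max_removed_mass=0.2):
    """Repeatedly drop terms whose probability falls below a growing
    threshold and renormalize, stopping once the model has drifted too
    far from the original (measured here by removed probability mass)."""
    current, threshold = dict(neg_model), step
    while True:
        survivors = {w: p for w, p in current.items() if p >= threshold}
        removed_mass = 1.0 - sum(neg_model[w] for w in survivors)
        if not survivors or removed_mass > max_removed_mass:
            break  # drifted too far: keep the previous model
        total = sum(survivors.values())
        current = {w: p / total for w, p in survivors.items()}
        threshold += step
    return current
```

Applied to each negative language model in turn, this yields the set of general negative models used in the fusion step.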
Table 2 Data statistics of Robust04 and GOV2.
Corpus  Docs  Original Q  Final Q  First-pass P@20 
Robust04  528,155  No.301–450  112  0.163 
Gov2  25,205,179  No.701–850  129  0.229 
Table 3 MAP scores of various methods.
Meth.  k=10  k=20  k=30  k=40  k=50 
Robust04  
LMDir  0.220  0.220  0.220  0.220  0.220 
RMQL  0.241  0.249  0.253  0.254  0.254 
RMKL  0.252  0.259  0.260  0.260  0.256 
MultiNeg  0.229  0.229  0.229  0.229  0.229 
MultiFB  0.262  0.266  0.266  0.262  0.260 
Gov2  
LMDir  0.272  0.272  0.272  0.272  0.272 
RMQL  0.270  0.277  0.280  0.281  0.281 
RMKL  0.277  0.281  0.282  0.283  0.282 
MultiNeg  0.276  0.276  0.276  0.276  0.276 
MultiFB  0.280  0.282  0.286  0.285  0.285 
Table 4 P@20 scores of various methods.
Meth.  k=10  k=20  k=30  k=40  k=50 
Robust04  
LMDir  0.354  0.354  0.354  0.354  0.354 
RMQL  0.347  0.348  0.354  0.356  0.356 
RMKL  0.347  0.349  0.351  0.357  0.356 
MultiNeg  0.356  0.356  0.356  0.356  0.356 
MultiFB  0.352  0.355  0.357  0.361  0.360 
Gov2  
LMDir  0.467  0.467  0.467  0.467  0.467 
RMQL  0.467  0.464  0.469  0.473  0.466 
RMKL  0.470  0.476  0.480  0.485  0.482 
MultiNeg  0.468  0.468  0.468  0.468  0.468 
MultiFB  0.479  0.486  0.490  0.493  0.492 