Sinha, Aman (2023) Retrievability in IR. Masters thesis, Indian Institute of Science Education and Research Kolkata.
Text (MS dissertation of Aman Sinha (18MS065))
Thesis_18MS065.pdf - Submitted Version, restricted to repository staff only (4MB)
Abstract
The rapid digitization of information has produced an unprecedented accumulation of knowledge across many formats, making effective information retrieval essential for accessing and using it. In Information Retrieval (IR), search engines such as Google, Bing, and DuckDuckGo shape our daily lives through the results they present, and with that influence comes a responsibility to retrieve websites and documents without bias. Retrievability, a quantitative measure of a document’s ability to be retrieved by a retrieval model regardless of the search query, plays a crucial role in assessing retrieval effectiveness. This master’s thesis investigates critical aspects of IR systems, aiming to identify flaws, propose improvements, and uncover biases that affect their performance evaluation. The research begins by exposing a major flaw in the artificial query generation method used for Retrievability analysis. Building on this finding, it proposes an improved method that demonstrates enhanced correlation for accurately assessing document retrievability. The study then uncovers a new bias in the relevance judgement process, which favors highly retrievable documents and can distort the evaluation of IR systems. Recognizing this bias is crucial for fair assessment, and future research should focus on mitigating its impact. An examination of the RM3 technique shows that it boosts overall performance, but at the cost of making unique relevant documents less findable. Balancing retrieval effectiveness against the retrieval of unique relevant documents is therefore a vital consideration for practitioners and researchers. Finally, a slight correlation is observed between document ranks from PageRank and from Retrievability measures, suggesting a relationship between the two metrics. This finding opens avenues for exploring their interplay and mutual reinforcement in future research.
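To make the notion of retrievability in the abstract concrete, the sketch below computes a cumulative retrievability score per document: a document earns credit each time it appears within a rank cutoff for some query. This is the commonly used cumulative formulation and is an assumption here; the thesis text is restricted, so its exact measure, cutoff, and query set are not known. The function name `retrievability_scores`, the `cutoff` and `query_weights` parameters, and the toy rankings are all hypothetical.

```python
from collections import defaultdict


def retrievability_scores(ranked_lists, cutoff=10, query_weights=None):
    """Cumulative retrievability (assumed formulation).

    r(d) = sum over queries q of weight(q) * [rank of d for q <= cutoff]

    ranked_lists: dict mapping a query string to a list of document ids,
                  ordered from best to worst rank.
    query_weights: optional dict of per-query weights; defaults to 1.0 each.
    Documents never retrieved within the cutoff are absent (score 0).
    """
    scores = defaultdict(float)
    for query, ranking in ranked_lists.items():
        weight = 1.0 if query_weights is None else query_weights.get(query, 1.0)
        # Only documents ranked within the cutoff receive credit.
        for doc_id in ranking[:cutoff]:
            scores[doc_id] += weight
    return dict(scores)


# Hypothetical toy rankings for three artificial queries.
ranked_lists = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d1", "d3", "d4"],
    "q3": ["d2", "d1", "d5"],
}

scores = retrievability_scores(ranked_lists, cutoff=2)
for doc_id, score in sorted(scores.items()):
    print(doc_id, score)
```

With a cutoff of 2, `d1` is counted for all three queries, while `d4` and `d5` never reach the cutoff and receive no score. In a real analysis the rankings would come from a retrieval model run over a large set of generated queries, which is why the query generation method matters for the resulting scores.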
Item Type: | Thesis (Masters) |
---|---|
Additional Information: | Supervisor: Dr. Dwaipayan Roy; DPS Coordinator: Prof. Rangeet Bhattacharyya |
Uncontrolled Keywords: | Artificial Query Generation; IR; Information Retrieval; PageRank; Retrieval Bias |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Department of Physical Sciences |
Depositing User: | IISER Kolkata Librarian |
Date Deposited: | 27 Dec 2023 11:47 |
Last Modified: | 27 Dec 2023 11:47 |
URI: | http://eprints.iiserkol.ac.in/id/eprint/1522 |