Systematic searches of published literature are a vital component of systematic reviews. When search strings are not “sensitive,” they may miss many relevant studies, limiting, or even biasing, the range of evidence available for synthesis. Concerningly, conducting and reporting evaluations (validations) of the sensitivity of the search strings used is rare, according to our survey of published systematic reviews and protocols. Potential reasons may include a lack of familiarity with, or the inaccessibility of, complex sensitivity evaluation approaches. We first clarify the main concepts and principles of search string evaluation. We then present a simple procedure for estimating the relative recall of a search string, based on a pre-defined set of “benchmark” publications. The relative recall, that is, the sensitivity of the search string, is the retrieval overlap between the evaluated search string and a search string that captures only the benchmark publications. If there is little overlap (i.e., low recall or sensitivity), the evaluated search string should be improved to ensure that most of the relevant literature can be captured. The presented benchmarking approach can be applied to one or more online databases or search platforms. It is illustrated by five accessible, hands-on tutorials for commonly used online literature sources. Overall, our work provides an assessment of the current state of search string evaluations in published systematic reviews and protocols. It also paves the way to improved evaluation and reporting practices that make evidence synthesis more transparent and robust.
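As a minimal illustrative sketch (the function and variable names are ours, not the paper's), relative recall is simply the fraction of the pre-defined benchmark set that the evaluated search string retrieves:

```python
def relative_recall(retrieved_ids, benchmark_ids):
    """Fraction of benchmark publications captured by the evaluated search string."""
    benchmark = set(benchmark_ids)
    if not benchmark:
        raise ValueError("benchmark set must not be empty")
    return len(benchmark & set(retrieved_ids)) / len(benchmark)
```

A relative recall of, say, 0.5 would mean the evaluated search string retrieved only half of the benchmark publications, signalling that it should be broadened before being used for evidence synthesis.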
We propose an unsupervised, corpus-independent method to extract keywords from a single text. It is based on the spatial distribution of words and the response of this distribution to a random permutation of words. Our method has three advantages over existing unsupervised methods (such as YAKE). First, it is significantly more effective at extracting keywords from long texts in terms of precision and recall. Second, it allows inference of two types of keywords: local and global. Third, it extracts basic topics from texts. Additionally, our method is language-independent and applies to short texts. The results are obtained via human annotators with previous knowledge of texts from our database of classical literary works. The agreement between annotators is moderate to substantial. Our results are supported via human-independent arguments based on the average length of extracted content words and on the average number of nouns in extracted words. We discuss relations of keywords with higher-order textual features and reveal a connection between keywords and chapter divisions.
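The underlying idea can be sketched as follows (our own illustrative code, not the authors' implementation): compare the variance of the gaps between a word's occurrences in the original text with the average variance obtained after random permutations of the words. Spatially clustered content words stand out against the shuffled baseline, while function words do not:

```python
import random

def gap_variance(words, word):
    """Variance of the gaps between successive occurrences of `word`."""
    pos = [i for i, w in enumerate(words) if w == word]
    if len(pos) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    mean = sum(gaps) / len(gaps)
    return sum((g - mean) ** 2 for g in gaps) / len(gaps)

def clustering_score(words, word, trials=50, seed=0):
    """Ratio of the observed gap variance to its average under random
    permutations; values well above 1 suggest spatial clustering."""
    observed = gap_variance(words, word)
    rng = random.Random(seed)
    shuffled = list(words)
    baseline = 0.0
    for _ in range(trials):
        rng.shuffle(shuffled)
        baseline += gap_variance(shuffled, word)
    baseline /= trials
    return observed / baseline if baseline else 0.0
```

A word whose occurrences form bursts (small gaps within a burst, large gaps between bursts) has a much higher gap variance than the same occurrences scattered at random, which is the signal exploited for keyword extraction.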
Decompounding is an essential preprocessing step in text-processing tasks such as machine translation, speech recognition, and information retrieval (IR). Here, the IR issues are explored from five viewpoints. (A) Does word decompounding impact the Indian language IR? If yes, to what extent? (B) Can corpus-based decompounding models be used in the Indian language IR? If yes, how? (C) Can machine learning and deep learning-based decompounding models be applied in the Indian language IR? If yes, how? (D) Among the different decompounding models (corpus-based, hybrid machine learning-based, and deep learning-based), which provides the best effectiveness in the IR domain? (E) Among the different IR models, which provides the best effectiveness from the IR perspective? This study proposes different corpus-based, hybrid machine learning-based, and deep learning-based decompounding models in Indian languages (Marathi, Hindi, and Sanskrit). Moreover, we evaluate the effectiveness of each activity from an IR perspective only. It is observed that the different decompounding models improve IR effectiveness. The deep learning-based decompounding models outperform the corpus-based and hybrid machine learning-based models in Indian language IR. Among the different deep learning-based models, the Bi-LSTM-A model performs best and improves mean average precision (MAP) by 28.02% in Marathi. Similarly, the Bi-RNN-A model improves MAP by 18.18% and 6.1% in Hindi and Sanskrit, respectively. Among the retrieval models, the In_expC2 model outperforms others in Marathi and Hindi, and the BB2 model outperforms others in Sanskrit.
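In minimal form, a corpus-based decompounding model can be sketched as a split that is accepted only when both parts occur in the corpus vocabulary (the greedy strategy and all names here are illustrative assumptions, not the paper's exact method):

```python
def decompound(word, vocab, min_len=3):
    """Greedy corpus-based split: return the first split point where
    both halves occur in the corpus vocabulary; otherwise keep the word."""
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        if left in vocab and right in vocab:
            return [left, right]
    return [word]
```

For retrieval, the split constituents are then indexed alongside (or instead of) the compound, which is what allows a query on one constituent to match documents containing the full compound.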
The complexity of process plants and the growing demand for digitalization require efficient and accurate information retrieval throughout the lifecycle phases of a process plant. This paper discusses the concept of instantiation and introduces a method for identifying and multiplying required information in plant engineering using scalable so-called Instantiation Blocks linked to the Bill of Material. Core functionality, an ontology graph and a user interface based on Python and React are developed to demonstrate the implementation of the framework and validate its effectiveness in practice.
Taxonomies are key to creating certainty in language across the legal landscape. In this article Alice Laird and Katy Snell of Howard Kennedy examine why taxonomies are still important in the age of AI. They also discuss how they have built on their expertise on taxonomies for the benefit of their firm, and how they have become involved with a cross firm project called noslegal.
This article is based on a course on how to use Moys classification scheme,1 delivered by Helen Garner and Felicity Staveley-Taylor on behalf of the British and Irish Association of Law Librarians (BIALL) in November 2022. The article, which is also written by Helen and Felicity, provides guidance on how to use Moys classification, explaining the features that enable the scheme to be expanded to accommodate new subject areas. The article also explains some of the features which ensure the scheme remains relevant to legal libraries today. Sections from the classification scheme, as published in the fifth edition, appear in the article text. In addition, any references to page numbers are to the fifth edition.
Latent semantic analysis (LSA) and correspondence analysis (CA) are two techniques that use a singular value decomposition for dimensionality reduction. LSA has been extensively used to obtain low-dimensional representations that capture relationships among documents and terms. In this article, we present a theoretical analysis and comparison of the two techniques in the context of document-term matrices. We show that CA has some attractive properties as compared to LSA, for instance that effects of margins, that is, sums of row elements and column elements, arising from differing document lengths and term frequencies are effectively eliminated so that the CA solution is optimally suited to focus on relationships among documents and terms. A unifying framework is proposed that includes both CA and LSA as special cases. We empirically compare CA to various LSA-based methods on text categorization in English and authorship attribution on historical Dutch texts and find that CA performs significantly better. We also apply CA to a long-standing question regarding the authorship of the Dutch national anthem Wilhelmus and provide further support that it can be attributed to the author Datheen, among several contenders.
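The contrast can be sketched as follows (illustrative NumPy code under our own simplifying assumptions, not the authors' implementation): LSA applies a truncated SVD to the document-term matrix directly, whereas CA applies it to standardized residuals, which removes the row and column margin effects arising from document lengths and term frequencies:

```python
import numpy as np

def lsa(F, k):
    # LSA: truncated SVD of the raw document-term matrix.
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    return U[:, :k] * s[:k]                    # document coordinates

def ca(F, k):
    # CA: SVD of standardized residuals; subtracting the outer product
    # of the margins and rescaling removes length/frequency effects.
    P = F / F.sum()
    r = P.sum(axis=1, keepdims=True)           # row (document) margins
    c = P.sum(axis=0, keepdims=True)           # column (term) margins
    S = (P - r @ c) / np.sqrt(r @ c)
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    return (U[:, :k] * s[:k]) / np.sqrt(r)     # principal row coordinates
```

A consequence of the margin removal is that two documents with identical term *profiles* but different lengths receive identical CA coordinates, whereas their LSA coordinates differ.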
Web search is an experience that naturally lends itself to recommendations, including query suggestions and related entities. In this article, we propose to recommend specific tasks to users, based on their search queries, such as planning a holiday trip or organizing a party. Specifically, we introduce the problem of query-based task recommendation and develop methods that combine well-established term-based ranking techniques with continuous semantic representations, including sentence representations from several transformer-based models. Using a purpose-built test collection, we find that our method is able to significantly outperform a strong text-based baseline. Further, we extend our approach to using a set of queries that all share the same underlying task, referred to as search mission, as input. The study is rounded off with a detailed feature and query analysis.
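One way such a combination can be sketched (the interpolation scheme and all names here are our illustrative assumptions, not the paper's exact model) is a weighted sum of a term-based retrieval score and the cosine similarity of query and task embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_score(term_score, query_vec, task_vec, alpha=0.5):
    """Interpolate a term-based score (e.g., BM25) with semantic similarity."""
    return alpha * term_score + (1 - alpha) * cosine(query_vec, task_vec)
```

Tasks would then be ranked for a query by this hybrid score, with `alpha` tuned on held-out data to balance the lexical and semantic signals.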
The availability of large web-based corpora has led to significant advances in a wide range of technologies, including massive retrieval systems and deep neural networks. However, leveraging this data is challenging, since web content is plagued by so-called boilerplate: ads, incomplete or noisy text, and remnants of the navigation structure, such as menus or navigation bars. In this work, we present a novel and efficient approach to extract useful and well-formed content from web-scraped data. Our approach takes advantage of language models and their implicit knowledge about correctly formed text, and we demonstrate here that perplexity is a valuable artefact that can contribute in terms of effectiveness and efficiency. As a matter of fact, the removal of noisy parts leads to lighter AI or search solutions that are effective and entail important reductions in resources spent. We exemplify here the usefulness of our method with two downstream tasks, search and classification, and a cleaning task. We also provide a Python package with pre-trained models and a web demo demonstrating the capabilities of our approach.
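The idea can be sketched with a toy unigram model standing in for the real language model (the actual system presumably uses a far stronger LM; all names and the threshold here are illustrative assumptions): well-formed text receives low perplexity, while boilerplate fragments such as menu items receive high perplexity and are filtered out:

```python
import math
from collections import Counter

def perplexity(text, unigram_probs, floor=1e-6):
    """Per-word perplexity of `text` under a toy unigram language model."""
    words = text.lower().split()
    log_p = sum(math.log(unigram_probs.get(w, floor)) for w in words)
    return math.exp(-log_p / max(len(words), 1))

def filter_boilerplate(lines, unigram_probs, threshold):
    """Keep lines whose perplexity falls below the threshold; noisy
    fragments (menus, nav bars, ads) tend to score far higher."""
    return [ln for ln in lines if perplexity(ln, unigram_probs) < threshold]
```

In practice the threshold would be calibrated on held-out clean text, trading off how aggressively noisy fragments are discarded against the risk of dropping well-formed content.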
Metonymy resolution (MR) is a challenging task in the field of natural language processing. The task of MR aims to identify the metonymic usage of a word that employs an entity name to refer to another target entity. Recent BERT-based methods yield state-of-the-art performance. However, they neither make full use of the entity information nor explicitly consider syntactic structure. In contrast, in this paper, we argue that the metonymic process should be completed in a collaborative manner, relying on both lexical semantics and syntactic structure (syntax). This paper proposes a novel approach to enhancing BERT-based MR models with hard and soft syntactic constraints by using different types of convolutional neural networks to model dependency parse trees. Experimental results on benchmark datasets (e.g., ReLocaR, SemEval 2007 and WiMCor) confirm that incorporating syntactic information into fine-tuned pre-trained language models benefits MR tasks.
Customer survey data is critical to supporting customer preference modeling in engineering design. We present a framework of information retrieval and survey design to ensure the collection of quality customer survey data for analyzing customers’ preferences in their consideration-then-choice decision-making and the related social impact. The utility of our approach is demonstrated through the survey design for customers in the vacuum cleaner market. Based on the data, we performed descriptive analysis and network-based modeling to understand customers’ preferences in consideration and choice.
Digital literacy is receiving increased scholarly attention as a potential explanatory factor in the spread of misinformation and other online pathologies. As a concept, however, it remains surprisingly elusive, with little consensus on definitions or measures. We provide a digital literacy framework for political scientists and test survey items to measure it with an application to online information retrieval tasks. There exists substantial variation in levels of digital literacy in the population, which we show is correlated with age and could confound observed relationships. However, this is obscured by researchers’ reliance on online convenience samples that select for people with computer and internet skills. We discuss the implications of these measurement and sample selection considerations for effect heterogeneity in studies of online political behavior. We argue that there is no universally applicable formula for selecting a given non-probability sample or operationalization of the concept of digital literacy; instead, we conclude, researchers should make theoretically informed arguments about how they select both sample and measure.
Named entities (NEs) are among the most relevant types of information that can be used to properly index digital documents and thus easily retrieve them. It has long been observed that NEs are key to accessing the contents of digital library portals, as they are contained in most user queries. However, most digitized documents are indexed through their optical character recognition (OCR) output, which includes numerous errors. Although OCR engines have considerably improved over the last few years, OCR errors still considerably impact document access. Previous works were conducted to evaluate the impact of OCR errors on named entity recognition (NER) and named entity linking (NEL) techniques separately. In this article, we experimented with a variety of OCRed documents with different levels and types of OCR noise to assess in depth the impact of OCR on named entity processing. We provide a deep analysis of the OCR errors that impact the performance of NER and NEL. We then present the resulting exhaustive study and subsequent recommendations on the adequate documents, the OCR quality levels, and the post-OCR correction strategies required to perform reliable NER and NEL.
This paper, written by Gineke Wiggers, Suzan Verberne, Gerrit-Jan Zwenne and Wouter Van Loon, addresses the concept of ‘relevance’ in relation to legal information retrieval (IR). They investigate whether the conceptual framework of relevance in legal IR, as described by Van Opijnen and Santos in their paper published in 2017, can be confirmed in practice.1 The research is conducted with a user questionnaire in which users of a legal IR system had to choose which of two results they would like to see ranked higher for a query, and were asked to provide a reason for their choice. To avoid questions with an obvious answer and extract as much information as possible about the reasoning process, the search results were chosen to differ on relevance factors from the literature, with one result scoring high on one factor and the other on another factor. The questionnaire had eleven pairs of search results. A total of 43 legal professionals participated, consisting of 14 legal information specialists, 6 legal scholars and 23 legal practitioners. The results confirmed the existence of domain relevance as described in the theoretical framework by Van Opijnen and Santos as published in 2017.2 Based on the factors mentioned by the respondents, the authors of this paper concluded that document type, recency, level of depth, legal hierarchy, authority, usability and whether a document is annotated are factors of domain relevance that are largely independent of the task context. The authors also investigated whether different sub-groups of users of legal IR systems (legal information specialists searching on behalf of others, legal scholars, and legal practitioners) differ in terms of the factors they consider in judging the relevance of legal documents outside of a task context. Using a PERMANOVA, no significant difference was found in the factors reported by these groups.
At present, therefore, there is no reason to treat these sub-groups differently in legal IR systems.
Causation in written natural language can express a strong relationship between events and facts. Causation in the written form can be referred to as a causal relation, where a cause event entails the occurrence of an effect event. A cause and effect relationship is stronger than a correlation between events, and therefore aggregated causal relations extracted from large corpora can be used in numerous applications, such as question-answering and summarisation, to produce superior results to those of traditional approaches. Techniques like logical consequence allow causal relations to be used in niche practical applications such as event prediction, which is useful for diverse domains such as security and finance. Until recently, the use of causal relations was a relatively unpopular technique because the causal relation extraction techniques were problematic, and the relations returned were incomplete, error-prone or simplistic. The recent adoption of language models and improved relation extractors for natural language, such as Transformer-XL (Dai et al. (2019). Transformer-XL: Attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860), has seen a surge of research interest in the possibilities of using causal relations in practical applications. Until now, there has not been an extensive survey of the practical applications of causal relations; therefore, this survey is intended precisely to demonstrate the potential of causal relations. It is a comprehensive survey of the work on the extraction of causal relations and their applications, while also discussing the nature of causation and its representation in text.
Bilingual corpora are an essential resource used to cross the language barrier in multilingual natural language processing tasks. Among bilingual corpora, comparable corpora have been the subject of many studies as they are both frequent and easily available. In this paper, we propose to make use of formal concept analysis to first construct concept vectors which can be used to enhance comparable corpora through clustering techniques. We then show how one can extract bilingual lexicons of improved quality from these enhanced corpora. We finally show that the bilingual lexicons obtained can complement existing bilingual dictionaries and improve cross-language information retrieval systems.
The concept of remotely operated, unmanned, and autonomous ships is creating increasing interest in the maritime domain, promising safety, increased efficiency, and sustainability. Shore control centers (SCCs) have been proposed to operate such vessels, and some industry projects have been initiated. This paper aims to bring knowledge about what an SCC is envisioned to be. It identifies and explores challenges related to designing and developing SCCs through semi-structured interviews with the research community and industry. We discuss tasks, functions, and interactions between human and machine.
Information retrieval (IR) aims at retrieving documents that are most relevant to a query provided by a user. Traditional techniques rely mostly on syntactic methods. In some cases, however, links at a deeper semantic level must be considered. In this paper, we explore a type of IR task in which documents describe sequences of events, and queries are about the state of the world after such events. In this context, successfully matching documents and query requires considering the events’ possibly implicit uncertain effects and side effects. We begin by analyzing the problem, then propose an action language-based formalization, and finally automate the corresponding IR task using answer set programming.
Language and communication are considered relevant to artificial intelligence. Linguists are not the only scientists wishing to test theories of language functioning: so do psychologists and neurophysiologists. This chapter briefly looks at samples of important and prescient early work, and shows two contrasting, slightly later, approaches to the extraction of content, evaluation, representation, and the role of knowledge. It considers a range of systems embodying natural language processing (NLP)/computational linguistics (CL) aspects since the early seventies, and divides them by their relationships to linguistic systems and in relation to concepts normally taken as central to AI, namely logic, knowledge, and semantics. Broadly, statistical methods imply the use of only numerical, quantitatively based methods for NLP/CL, rather than methods based on representations, whether those are assigned by humans or by computers. The chapter discusses the role of annotations to texts and the interpretability of core AI representations.