A Zero-Shot-CoT-Based Method for Annotating Dialogue Value Priorities

馬志強; 劉佳; 李鑫; 王奎波; 劉義興; 葉浩然

doi:10.13374/j.issn2095-9389.2024.12.30.004

Method for annotating dialogue value priority based on Zero-Shot Chain-of-Thought

Abstract

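Summary: Value priority recognition aims to identify the value priority attributes implicit in a text and thereby judge whether the text is consistent with a specific value and its type. It is essential for detecting user language, evaluating content generated by large language models, and probing how well large language models can assess human value priorities. Because no dataset for human value recognition in dialogue settings currently exists, modeling and recognizing human value priorities in dialogue remains unexplored, so building a high-quality dataset for dialogue value priority recognition is the first task. Annotating such a dataset, however, requires annotators with specialized knowledge, which sets a high bar for annotation. This paper therefore uses a large language model to annotate existing dialogue corpora, providing an annotation case for a dialogue value priority recognition dataset and extending the application of data annotation based on large language models. Specifically, we design a Zero-Shot-CoT-based dialogue value annotation method that simulates human annotation results, and we use the proposed dialogue value priority annotation method to construct ValueCon, a large-scale dataset for dialogue value recognition. Experimental results show that, compared with manual annotation, the proposed method alleviates the inconsistency and noise introduced by manual annotation, and that the ValueCon dataset built with it can effectively train dialogue value recognition models, confirming the practical value of the proposed annotation method.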

Abstract: Value priority recognition is a task in computational linguistics that focuses on discerning and categorizing the implicit hierarchy of human values expressed in text. Its core objective is to determine whether textual content aligns with specific value types and to identify the relative precedence assigned to these values within the implicit hierarchy. This capability matters in several domains. It supports the analysis of user language patterns in behavioral profiling. It serves as a metric for evaluating the ethical alignment and value consistency of content generated by large language models. It also provides a methodological foundation for investigating the capacity of large language models to comprehend, interpret, and evaluate the hierarchies inherent in human value systems. Dialogue, a primary and natural mode of human communication, is a common vehicle for expressing value-driven judgments, preferences, and priorities. The interactive nature of conversation, which involves turn-taking, argumentation, and negotiation, frequently reveals implicit value tradeoffs and hierarchical relationships. Consequently, dialogue is a fertile domain for modeling human value prioritization. Despite this suitability, research that systematically models human value priorities in interactive conversational settings remains largely unexplored. This gap stems primarily from the absence of high-quality datasets designed for recognizing value priorities in authentic dialogue contexts. The lack of such resources substantially hinders empirical investigation and the development of effective computational models in this field.
Therefore, creating a carefully annotated large-scale dataset for dialogue value priority recognition is a foundational prerequisite for advancing understanding in this area. However, the annotation process required to construct such a dataset faces substantial and intrinsic challenges. These difficulties arise principally from the complex cognitive nature and deep subjectivity of human values. Values are deeply held, often abstract, cognitive constructs that guide decision making and behavior. Identifying and ordering them reliably in textual data requires more than superficial linguistic analysis; it demands interpretative insight into the underlying motivations, ethical frameworks, and contextual nuances. This cognitive dimension imposes rigorous requirements on human annotators, who must have substantial expertise in relevant psychological theories, in the principles of cognitive science pertaining to moral reasoning, and in sociolinguistics. Consequently, consistent, reliable, expert-level manual annotation faces a prohibitively high barrier. This challenge leads to persistent issues, including inconsistency among annotators, conceptual ambiguity in label application, and substantial noise in the resulting annotations, all of which can critically compromise dataset quality and the performance of models trained on it. To address these challenges, this study leverages the capabilities of contemporary large language models. We propose a novel annotation method for dialogue value priority recognition that applies large language models to existing textual dialogue corpora, capitalizing on the models' natural language understanding, reasoning abilities, and extensive internalized knowledge of psychology, ethics, philosophy, and the social sciences.
Based on this method, we constructed the ValueCon dataset, a large-scale, high-quality benchmark dataset designed for value priority recognition in dialogue. The experimental results demonstrate that, compared with manual annotation, the proposed annotation method alleviates the inconsistencies and noise associated with manual annotation. The ValueCon dataset constructed with this method can effectively train dialogue value recognition models, which validates the practical value of the proposed annotation method.
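
For readers unfamiliar with the Zero-Shot-CoT technique named in the title, the following is a minimal sketch of how two-stage zero-shot chain-of-thought prompting could be applied to a dialogue annotation task. It is not the authors' implementation: the abstract does not give the prompts, the label set, or the model used, so the value types, the prompt wording, the yes/no label format, and the `llm` callable below are all illustrative assumptions.

```python
from typing import Callable, List

# Hypothetical label set: the abstract does not list the value types in ValueCon.
VALUE_TYPES = ["security", "benevolence", "achievement"]

# Generic zero-shot chain-of-thought trigger phrase (assumed wording).
COT_TRIGGER = "Let's think step by step."


def build_reasoning_prompt(dialogue: List[str], value_type: str) -> str:
    """Stage 1: ask for free-form reasoning without any worked examples."""
    turns = "\n".join(f"Turn {i + 1}: {u}" for i, u in enumerate(dialogue))
    return (
        f"Dialogue:\n{turns}\n\n"
        f"Question: Does the last speaker treat the value '{value_type}' "
        f"as a priority?\n"
        f"Answer: {COT_TRIGGER}"
    )


def build_answer_prompt(reasoning_prompt: str, reasoning: str) -> str:
    """Stage 2: append the model's reasoning and ask for a yes/no label."""
    return (
        f"{reasoning_prompt}\n{reasoning}\n"
        "Therefore, the answer (yes or no) is:"
    )


def parse_label(answer: str) -> bool:
    """Map the model's short answer to a boolean label."""
    return answer.strip().lower().startswith("yes")


def annotate(
    dialogue: List[str], value_type: str, llm: Callable[[str], str]
) -> bool:
    """Run both stages with any text-in, text-out model passed as `llm`."""
    stage1 = build_reasoning_prompt(dialogue, value_type)
    reasoning = llm(stage1)
    stage2 = build_answer_prompt(stage1, reasoning)
    return parse_label(llm(stage2))
```

In this sketch, `annotate` would be called once per dialogue and per candidate value type, with `llm` wrapping whichever model is available. Keeping the model behind a plain callable makes it possible to check the prompt-building logic with a stub before spending any model calls.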
