Method of formalized procedure for synthesis and computation of features for fake news detection

Author: Шупта, Андрій (Shupta, Andrii)
Dates: 2025-10-18; 2025-10-18
Year: 2025
Citation: Шупта А. Метод формалізованої процедури синтезу та обчислення ознак для виявлення фейкових новин / А. Шупта // Herald of Khmelnytskyi National University. Technical Sciences. – 2025. – Vol. 355, No. 4. – P. 719-723.
URI: https://elar.khmnu.edu.ua/handle/123456789/19658

Abstract (translated from Ukrainian): The paper proposes a new method that formalizes a fake news detection procedure based on the ability of large language models (LLMs) to synthesize suspicious textual attributes and convert them into numerical vectors suitable for classification. The aim of the study is to refine the process of converting textual signals into numerical features, which improves the integration of linguistic signals with deep contextual feature vectors. Experiments were conducted on an English-language (FakeNewsNet) and a Ukrainian-language (Ukrainian news) dataset, where the proposed method outperformed baseline approaches, reaching accuracy of up to 89.6% for English and 88.3% for Ukrainian. The key results show that combining numerical indicators (for example, paraphrase and sentiment coefficients) with LLM generation yields higher recall in detecting misleading news articles. The proposed feature computation procedure improves detection accuracy while keeping the model's decision-making transparent. The study highlights the importance of systematically designed numerical features that complement LLM generations, offering a path toward more reliable, adaptive, and explainable fake news detection systems.

Abstract (English): The pervasive and evolving nature of digital disinformation necessitates the development of sophisticated detection systems that are accurate, transparent, and adaptable to novel deceptive strategies. 
While Large Language Models (LLMs) have demonstrated considerable prowess in discerning nuanced textual patterns, their application in fake news detection often results in “black-box” systems, limiting trust and hindering the ability to respond to emergent manipulative techniques. This paper introduces a novel method designed to bridge this gap. We present a structured procedure for systematically synthesizing suspicious textual attributes, guided by LLM-driven insights, and their subsequent transformation into a robust set of quantifiable, interpretable numerical features. These features, encompassing aspects such as paraphrase intensity, sentiment polarity, stylistic anomalies, and fact-checking congruity, are then synergistically integrated with the deep contextual embeddings generated by LLMs. Rigorous experimental validation was conducted on diverse English (FakeNewsNet) and Ukrainian (Ukrainian news) datasets. The proposed method outperformed established baseline approaches, achieving substantial accuracy improvements, with figures reaching up to 89.6% for English and 88.3% for Ukrainian language texts. Key findings reveal that explicitly incorporating these engineered numeric indicators significantly enhances recall rates for deceptive articles, a critical factor in mitigating the societal impact of misinformation. Furthermore, the method’s modularity fosters adaptability, enabling the incorporation of newly identified deceptive patterns as additional numeric features without necessitating the complete retraining of the foundational LLM. 
This study unequivocally underscores the significant value of systematically engineered, interpretable numeric features as a vital complement to the powerful, yet often opaque, embeddings of LLMs.

Language: uk
Keywords: fake news detection; large language models; feature computation procedure; natural language processing; text classification
Title (Ukrainian): Метод формалізованої процедури синтезу та обчислення ознак для виявлення фейкових новин
Title (English): Method of formalized procedure for synthesis and computation of features for fake news detection
Type: Article
UDC: 004.85:004.912:025.4.03
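
Illustrative note: the abstract describes combining interpretable numeric indicators with LLM embeddings before classification. The sketch below shows one way such a fusion step could look. It is not the author's implementation: the three indicators, the `NEGATIVE_WORDS` lexicon, and the function names `numeric_features` and `fuse` are hypothetical stand-ins for the paraphrase, sentiment, stylistic, and fact-checking features the paper names, and the embedding is a placeholder rather than real model output.

```python
from typing import Dict, List

# Hypothetical cue lexicon (assumption); the record does not publish one.
NEGATIVE_WORDS = {"shocking", "hoax", "scandal", "outrage"}


def numeric_features(text: str) -> Dict[str, float]:
    """Compute a few simple, interpretable style indicators for a text."""
    tokens = text.split()
    n_tokens = max(len(tokens), 1)  # avoid division by zero on empty input
    n_chars = max(len(text), 1)
    return {
        # Share of characters that are exclamation marks.
        "exclamation_ratio": text.count("!") / n_chars,
        # Share of tokens written entirely in upper case.
        "uppercase_token_ratio": sum(
            t.isupper() and len(t) > 1 for t in tokens
        ) / n_tokens,
        # Share of tokens found in the (assumed) negative cue lexicon.
        "negative_word_ratio": sum(
            t.strip(".,!?").lower() in NEGATIVE_WORDS for t in tokens
        ) / n_tokens,
    }


def fuse(embedding: List[float], features: Dict[str, float]) -> List[float]:
    """Concatenate a contextual embedding with the named numeric features.

    Features are appended in sorted key order so that every position in the
    fused vector keeps a fixed, human-readable meaning.
    """
    return list(embedding) + [features[name] for name in sorted(features)]


# Placeholder for an LLM embedding (assumption: real vectors are much longer).
embedding = [0.12, -0.40, 0.33]
feats = numeric_features("SHOCKING hoax exposed!")
vector = fuse(embedding, feats)  # input for any downstream classifier
```

Keeping the numeric indicators as named entries appended to the embedding means a new indicator can be added as one more dictionary key, which matches the modularity the abstract claims, although how the paper actually wires this up is not stated in the record.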