Method for Interpreting the Results of Fake News Detection by a Large Language Model

The rapid spread of fake news saturated with complex context has exposed the inability of traditional text analysis methods to counter this global threat effectively and accurately. To address the pressing task of explaining fake news detection results, this work proposes a new method for detecting fake news and interpreting the detection results of large language models, which addresses the problem of their opacity. The method is based on the synergy of local explainable artificial intelligence techniques (Integrated Gradients, SHAP), global feature projections (t-SNE, UMAP), and an interactive human-in-the-loop cycle. This approach makes decisions interpretable both at the level of individual examples and across the entire data space. The viability of the method was confirmed on the DistilBERT model. In tests on the LIAR, FakeNewsNet, and CONSTRAINT-2021 text corpora, the proposed method showed a consistent 2–4% improvement in F1 score over baseline models. The highest F1 score, 97%, was recorded on the CONSTRAINT-2021 test corpus, which confirms the reliability and reproducibility of the proposed approach.
The proliferation of sophisticated disinformation campaigns calls for not only accurate detection but also a clear, justifiable account of how and why a model reaches its conclusions. To this end, we propose a transparent and reproducible method that integrates local explainable artificial intelligence (XAI) with global feature analysis within an interactive human-in-the-loop (HITL) cycle. At the local level, the method uses two attribution techniques, Integrated Gradients and SHAP, to provide fine-grained, instance-level explanations. These tools decompose the model's prediction for a given news article, highlighting the specific words, phrases, and semantic patterns that most heavily influenced its classification as authentic or fake. Complementing this granular analysis, global feature projection methods such as t-SNE and UMAP visualize the entire data space in lower dimensions. This offers a macro-level perspective: it reveals the distinct clusters formed by fake and real news, identifies outliers, and illuminates the model's overall decision boundaries. The synergy between the local and global views, governed by the HITL cycle, lets analysts iteratively refine the model, correct misclassifications, and build robust, trustworthy systems. To validate the method, we implemented and tested a DistilBERT model on several diverse data corpora. Performance was assessed quantitatively with standard metrics, including accuracy (ACC), precision, recall, F1-score, and AUROC, and classification behavior was analyzed qualitatively through confusion matrices and ROC curves. The results are consistent with established benchmarks and with findings from open publications in the 2020–2025 period, which supports the reliability, validity, and reproducibility of the proposed interpretive approach.
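To make the local attribution step more concrete, the following is a minimal sketch of Integrated Gradients on a toy logistic scorer. It is not the authors' implementation: the scorer stands in for DistilBERT, and the token list, weights, bias, and all-zero baseline are hypothetical values chosen only for illustration. The path integral is approximated with a midpoint Riemann sum.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    # Toy "fake news" probability: logistic regression over token features.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def gradient(x, w, b):
    # Analytic gradient of the logistic output with respect to each input.
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    # Average the gradient along the straight path from baseline to x
    # (midpoint rule), then scale by the input difference.
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = gradient(point, w, b)
        totals = [t + g for t, g in zip(totals, grad)]
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, totals)]


# Hypothetical example: three tokens, each present once in the article.
tokens = ["shocking", "officials", "miracle"]
x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]   # assumed "empty input" reference
w = [1.2, -0.8, 2.0]         # assumed weights, not learned from data
b = -0.5

attributions = integrated_gradients(x, baseline, w, b)

# Completeness check: attributions should sum to F(x) - F(baseline).
completeness_gap = abs(
    sum(attributions) - (model(x, w, b) - model(baseline, w, b))
)

for token, score in zip(tokens, attributions):
    print(f"{token}: {score:+.4f}")
```

Positive scores mark tokens that push the toy prediction toward the "fake" class and negative scores mark tokens that push it away. The completeness gap gives a quick sanity check on the number of integration steps.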

Keywords

fake news, LLM, XAI, Integrated Gradients, SHAP, UMAP, t-SNE, human-in-the-loop

Bibliographic description

Vovk S. Method for Interpreting the Results of Fake News Detection by a Large Language Model / S. Vovk, P. Radiuk, T. Skrypnyk // Herald of Khmelnytskyi National University. Technical Sciences. – 2025. – Vol. 359, No. 6.2. – P. 149-156.

URI

https://elar.khmnu.edu.ua/handle/123456789/20296

Collection

Herald of KhNU. Technical Sciences - 2025
