A combined algorithm for compressing data presented in text format

Лебіга, М.М.; Пасічник, О.А.; Скрипник, Т.К.; Медведчук, В.Ю.; Lebiga, M.M.; Pasichnyk, O.A.; Skrypnyk, T.K.; Medvedchuk, V.Y.

Files

10.pdf (1.31 MB)

Date

2019

Publisher

Khmelnytskyi National University

Abstract

This work develops a combined algorithm for compressing text data in order to minimize content volume; the algorithm forms the basis of the corresponding information technology. Based on an analysis of the experimental data obtained, optimal ways of minimizing the volume of text content were selected, built on the Huffman algorithm and on the main and most common types of text data structures.
The purpose of this work is to develop a combined algorithm for text data compression. To this end, three research objectives were set: a review of existing universal data compression algorithms; an analysis of data types and text formats; and the development of a specialized compression algorithm based on the Huffman algorithm that takes the data structure into account. The two main and most common types of text data structure are the graph and the tree. To achieve the greatest compression of natural-language texts, a tree-structured dictionary with a fixed number of branches at each level was chosen, since the amount of branching decreases with increasing depth and a single fixed branch size would therefore lead to strong redundancy and high memory consumption. To achieve the highest compression of HTML and XML texts, a dictionary based on a weighted graph with an unspecified number of branches per node was chosen. A graph provides a flexible system with which any structure can be emulated, but constructing such a structure is challenging. The savings from optimization are calculated as the difference between the bits saved by recording branch information and the number of bits that must be written to inform the decoder of branch usage. If the difference is greater than zero, recording the information is considered advantageous. To record optimization information for a tree-based dictionary, all dictionary branches in use must be specified, which can be done with a recursive algorithm. The result of the work is a combined algorithm for compressing text data whose structure can be represented as a tree or a graph. The developed algorithm is based on the Huffman algorithm and serves as the basis for implementing the corresponding information system.
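The abstract names three building blocks: Huffman coding, a "record only if the difference is greater than zero" rule, and a recursive pass that records which dictionary branches are used. The Python sketch below illustrates each in its simplest textbook form. It is not the authors' implementation: the function names, the dictionary-based tree layout (`"used"`, `"children"`) and the one-bit-per-branch output format are assumptions made purely for illustration.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a classic Huffman code."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries are (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in d1.items()}
        merged.update({s: d + 1 for s, d in d2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encoded_bits(text):
    """Total payload size in bits when `text` is Huffman-coded."""
    lengths = huffman_code_lengths(text)
    return sum(count * lengths[s] for s, count in Counter(text).items())


def worth_recording(bits_saved, signalling_bits):
    """Rule from the abstract: record only if the difference is above zero."""
    return bits_saved - signalling_bits > 0


def record_used_branches(node, out):
    """Hypothetical format: pre-order walk emitting 1 for a used branch
    (followed by its own children) and 0 for an unused branch."""
    for child in node["children"]:
        if child["used"]:
            out.append(1)
            record_used_branches(child, out)
        else:
            out.append(0)
    return out
```

For example, `encoded_bits("abracadabra")` gives the Huffman payload size for that string, and `worth_recording(12, 9)` is true because recording saves three bits more than it costs to signal.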

Keywords

combined algorithm, Huffman algorithm, minimization, optimization, graph, tree, text data, dictionary

Bibliographic description

A combined algorithm for compressing data presented in text format [Text] / M. M. Lebiga, O. A. Pasichnyk, T. K. Skrypnyk, V. Y. Medvedchuk // Herald of Khmelnytskyi National University. Technical Sciences. – 2019. – No. 6. – pp. 131-133.

URI

https://elar.khmnu.edu.ua/handle/123456789/8838

Collections

Herald of KhNU. Technical Sciences - 2019
