Combined algorithm for compressing data represented in text format
Date
2019
Authors
Lebiga, M.M.
Pasichnyk, O.A.
Skrypnyk, T.K.
Medvedchuk, V.Y.
Publisher
Khmelnytskyi National University
Abstract
In this work, a combined algorithm for compressing text data is developed to minimize the volume of
content; the algorithm forms the basis of the corresponding information technology. Based on an analysis of
the experimental data obtained, optimal approaches were chosen for minimizing the volume of text content
with the Huffman algorithm, using the main and most common types of text data structure.
The purpose of the work is to develop a combined algorithm for text data compression. To achieve this goal, the following research objectives were set: a review of existing universal data compression algorithms; an analysis of data types and text formats; and the development of a specialized compression algorithm, based on the Huffman algorithm, that takes the data structure into account. The two main and most common types of text data structure are the graph and the tree. To achieve the greatest compression of natural-language texts, a tree-structured dictionary is used with the number of branches fixed for each level: branching narrows with increasing depth, so a single fixed branch size would lead to strong redundancy and high memory consumption. To achieve the highest compression of HTML and XML texts, the dictionary is based on a weighted graph with an unspecified number of branches per node. A graph is a flexible structure that can emulate any other, but constructing it is challenging. The savings from an optimization are calculated as the difference between the bits saved by recording branch information and the number of bits that must be recorded to inform the decoder of branch usage. If the difference is greater than zero, recording the information is considered advantageous. Recording the optimization information for a tree-based dictionary requires specifying all the dictionary branches used, which can be done with a recursive algorithm. The result of the work is a combined algorithm for compressing text data whose structure can be represented as a tree or a graph. The developed algorithm is based on the Huffman algorithm and serves as the basis for implementing the relevant information system.
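The two ideas in the abstract that lend themselves to a short illustration are the net-savings test and the recursive recording of used dictionary branches. The Python sketch below is only one possible reading and is not the authors' implementation: the `Node` class, the one-flag-bit-per-branch encoding, and the function names `write_usage` and `is_worth_recording` are all assumptions made for the example, as are the bit counts in the usage lines.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical dictionary branch; `used` marks branches the text touches."""
    used: bool = False
    children: list[Node] = field(default_factory=list)


def write_usage(node: Node, out: list[int]) -> list[int]:
    """Recursively record branch usage in pre-order.

    Assumed encoding: one flag bit per branch (1 = used, 0 = unused);
    the subtree of an unused branch is skipped entirely.
    """
    for child in node.children:
        out.append(1 if child.used else 0)
        if child.used:
            write_usage(child, out)
    return out


def is_worth_recording(bits_saved: int, bits_to_signal: int) -> bool:
    """Net-savings test: record the information only if the difference
    between bits saved and bits spent informing the decoder is > 0."""
    return bits_saved - bits_to_signal > 0


# Small example tree: the root has three branches A, B, C.
branch_a = Node(used=True, children=[Node(used=True), Node(used=False)])
branch_b = Node(used=False, children=[Node(used=False)])
branch_c = Node(used=True)
root = Node(used=True, children=[branch_a, branch_b, branch_c])

flags = write_usage(root, [])          # [1, 1, 0, 0, 1]
signal_cost = len(flags)               # bits needed to inform the decoder

# Hypothetical savings figures, chosen only to exercise both outcomes.
print(is_worth_recording(bits_saved=12, bits_to_signal=signal_cost))  # True
print(is_worth_recording(bits_saved=5, bits_to_signal=signal_cost))   # False
```

In this sketch the decoder is assumed to know the full dictionary shape, so the flag sequence alone is enough to tell it which branches are in use. A saving exactly equal to the signalling cost is rejected, matching the "greater than zero" condition stated above.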
Keywords
combined algorithm, Huffman algorithm, minimization, optimization, graph, tree, text data, dictionary
Bibliographic description
Combined algorithm for compressing data represented in text format [Text] / M. M. Lebiga, O. A. Pasichnyk, T. K. Skrypnyk, V. Y. Medvedchuk // Herald of Khmelnytskyi National University. Technical Sciences. – 2019. – No. 6. – pp. 131-133.