Optimizing Information Reliability Based on an Electronic Document Database and the Specifics of Knowledge Base Control Rules
- No. 3(21), 2019
Pages: 57–74
Language: Russian
Abstract
The problem of increasing information reliability in electronic document management systems is formulated, based on methods and algorithms that use typical tools for searching, recognizing, and classifying documents, as well as for generating texts, translating them from one language to another, and checking and correcting spelling errors of various multiplicity. Conceptual principles are proposed for using information redundancy of various kinds (statistical, natural, structural-technological, and semantic) to increase information reliability. Algorithms are developed that use logical, semantic, and structural-technological links between document elements, and tools are obtained for using cross-relationships between individual records or groups of records within an information frame.
Tools are proposed for checking elements of electronic documents against selected elements of the knowledge base and with the use of expert systems. Algorithms are developed for optimizing the placement of electronic documents with many elements, attributes, concepts, and fractal characteristics in databases and knowledge bases, with a variable-regulation mechanism based on genetic search algorithms for selecting an object with the required characteristics. Estimates of the proximity measure of document elements are obtained, and the reliability gain coefficient is analyzed as a function of the volume of information at various values of the probability of undetected errors for the synthesized algorithms. The efficiency of the implemented technological scheme is investigated by the criterion of the labor intensity of information processing.
This article formulates the problem of increasing information reliability in electronic document management systems using algorithms that perform typical functions: search, recognition, classification, generation, and translation of texts from one language to another, as well as checking and correcting errors of various multiplicity. The use of information redundancy of various kinds (statistical, natural, structural-technological, and semantic) is substantiated, and conceptual principles, methods, algorithms, and software are developed to increase information reliability. A principle is proposed for checking the fidelity of elements and key concepts (words, phrases, terms) by comparing the entered document with the reference document (the original) and evaluating information reliability from the discrepancy coefficient. Algorithms are developed that use logical, semantic, and structural-technological links between document elements, and tools are obtained for using cross-relationships between individual records or groups of records within an information frame. Information control tools are designed based on selected elements of the knowledge base (KB) and the use of expert systems (ES). Algorithms are proposed for optimizing the placement of electronic documents (ED) with a variety of elements, attributes, concepts, and fractal characteristics in databases (DB) and KB, using genetic search algorithms to select a document with the desired characteristics. The proximity measure of document elements is analyzed as a function of the volume of processed information and the probability of undetected errors for the synthesized algorithm, and the effectiveness of the technology is investigated in terms of the labor intensity of information processing.