Paper Reading: The REALISE Model

1. REALISE utilizes multiple encoders to obtain semantic, phonetic, and graphic information, which it uses to judge how similar Chinese characters are and to correct spelling errors.
2. It then applies a selective modality fusion module to obtain context-aware multimodal representations.
3. Finally, the output layer predicts the probabilities of error corrections.

Semantic encoder:

BERT, which provides rich contextual word representations through unsupervised pretraining on large corpora.

from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')

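A tokenizer is a text-processing tool that splits text into individual units called tokens, such as words, punctuation marks, and numbers. In natural language processing, a tokenizer typically splits a sentence into words or subword units so that the text can be analyzed or fed to a machine learning model. Common tokenizers are either rule-based or learned from data; a learned tokenizer identifies word and phrase boundaries automatically and splits the text into tokens accordingly.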
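To make the idea concrete, here is a minimal pure-Python sketch of character-level tokenization. It is not the real BertTokenizer: it only imitates the fact that BERT's Chinese models treat each Chinese character as its own token and wrap the sentence in [CLS] and [SEP]. The function names char_tokenize and build_vocab, and the toy vocabulary built on the fly, are assumptions made for illustration.

```python
from typing import Dict, List


def char_tokenize(sentence: str) -> List[str]:
    """Split a sentence into single characters and add BERT-style markers."""
    tokens = ["[CLS]"]
    for ch in sentence:
        if ch.isspace():
            continue  # whitespace is a separator, not a token
        tokens.append(ch)
    tokens.append("[SEP]")
    return tokens


def build_vocab(tokens: List[str]) -> Dict[str, int]:
    """Toy vocabulary: ids are assigned in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


tokens = char_tokenize("我爱北京")
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
print(tokens)
print(ids)
```

A real tokenizer maps tokens to ids from a fixed pretrained vocabulary instead of building one per sentence, so the ids above have no meaning outside this toy example.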

Phonetic encoder

Pinyin: initial (21) + final (39) + tone (5)
Hierarchical phonetic encoder: a character-level encoder and a sentence-level encoder
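As a rough illustration of the three pinyin parts, the sketch below splits a syllable written with a trailing tone digit (for example "zhong1") into initial, final, and tone. The helper name split_pinyin is hypothetical, and two conventions here are assumptions of this sketch rather than something stated in the notes: syllables starting with y, w, or a vowel are treated as having an empty initial, and a missing tone digit is read as tone 5 (neutral).

```python
# The 21 initials; two-letter initials come first so "zh" wins over "z".
INITIALS = [
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s",
]


def split_pinyin(syllable):
    """Return (initial, final, tone) for a syllable such as 'zhong1'."""
    if syllable and syllable[-1].isdigit():
        tone = int(syllable[-1])
        body = syllable[:-1]
    else:
        tone = 5  # assumption: no digit means neutral tone
        body = syllable
    for initial in INITIALS:
        if body.startswith(initial):
            return initial, body[len(initial):], tone
    return "", body, tone  # assumption: y-, w- and vowel onsets have no initial


print(split_pinyin("zhong1"))
print(split_pinyin("xue2"))
print(split_pinyin("ai4"))

# The same syllable as a flat letter sequence, the kind of input a
# character-level sequence encoder could read one symbol at a time.
letters = list("zhong1")
print(letters)
```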

Character-level encoder

GRU:
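A GRU (Gated Recurrent Unit) is a type of recurrent neural network (RNN). Like the LSTM (Long Short-Term Memory), it was proposed to address problems such as long-term memory and gradients in backpropagation.

GRU and LSTM often perform about the same in practice, so why use the newer GRU (proposed in 2014) rather than the more battle-tested LSTM (proposed in 1997)?
We chose GRU in our experiments because its results are similar to those of LSTM, but it is cheaper to compute.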
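The following is a minimal pure-Python sketch of one GRU step, run over the letters of a pinyin syllable so that the final hidden state serves as a character-level phonetic vector. Everything concrete here is an assumption for illustration: the tiny dimensions, the random weights and letter embeddings, the helper names, and the update convention h = (1 - z) * h_prev + z * candidate (some libraries swap the roles of z and 1 - z). It is not the paper's implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def init_gru(input_dim, hidden_dim, rng):
    """Random parameters for the update (z), reset (r) and candidate (n) parts."""
    params = {}
    for gate in ("z", "r", "n"):
        params["W" + gate] = rand_matrix(hidden_dim, input_dim, rng)
        params["U" + gate] = rand_matrix(hidden_dim, hidden_dim, rng)
        params["b" + gate] = [0.0] * hidden_dim
    return params


def gru_step(x, h, p):
    """One GRU step: two gates, versus the three gates of an LSTM."""
    z = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b + c) for a, b, c in
         zip(matvec(p["Wn"], x), matvec(p["Un"], reset_h), p["bn"])]
    return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


def encode_pinyin(letters, embeddings, params, hidden_dim):
    """Read the letters left to right; return the last hidden state."""
    h = [0.0] * hidden_dim
    for letter in letters:
        h = gru_step(embeddings[letter], h, params)
    return h


rng = random.Random(0)
EMBED_DIM, HIDDEN_DIM = 4, 3
alphabet = "abcdefghijklmnopqrstuvwxyz12345"
embeddings = {ch: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for ch in alphabet}
params = init_gru(EMBED_DIM, HIDDEN_DIM, rng)
char_vector = encode_pinyin("zhong1", embeddings, params, HIDDEN_DIM)
print(char_vector)
```

The GRU keeps a single hidden state and two gates, which is where the lower computational cost relative to an LSTM comes from.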

Sentence-level encoder: obtains the contextualized phonetic representation for each Chinese character

A 4-layer Transformer with the same hidden size as the semantic encoder.
Because the independent phonetic vectors carry no order information, a positional embedding is added to each vector. The vectors are then packed together and passed through the Transformer layers to compute the contextualized representation in the acoustic modality.

Graphic Encoder

ResNet
Three fonts correspond to the three channels of the character images, whose size is set to 32 × 32 pixels.

Selective Modality Fusion Module

Ht, Ha, Hv = textual, acoustic, and visual representations
Fuses the information in the different modalities.
Selective gate unit: selects how much information flows into the mixed multimodal representation.
Gate values: computed by a fully-connected layer followed by a sigmoid function.
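The gating idea can be sketched in a few lines of pure Python. For one character, each modality gets its own gate vector from a fully-connected layer followed by a sigmoid, and the fused vector is the gated sum of the three modality vectors. The choice of gate input (here simply the concatenation of the three vectors), the function name selective_fusion, and the toy weights are assumptions of this sketch; the notes above do not specify them.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def selective_fusion(h_t, h_a, h_v, gate_params):
    """Gated sum of the textual, acoustic and visual vectors of one character.

    gate_params holds one (W, b) pair per modality; W maps the concatenated
    input (3 * dim values) to a gate vector of dim values in (0, 1).
    """
    concat = h_t + h_a + h_v  # list concatenation, assumed gate input
    fused = [0.0] * len(h_t)
    for h, (W, b) in zip((h_t, h_a, h_v), gate_params):
        gate = [sigmoid(s + bias) for s, bias in zip(matvec(W, concat), b)]
        fused = [f + g * x for f, g, x in zip(fused, gate, h)]
    return fused


dim = 2
zero_W = [[0.0] * (3 * dim) for _ in range(dim)]
zero_b = [0.0] * dim
h_t, h_a, h_v = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]

# With zero weights every gate is sigmoid(0) = 0.5, so each modality
# contributes half of its vector.
fused = selective_fusion(h_t, h_a, h_v, [(zero_W, zero_b)] * 3)
print(fused)  # [4.5, 6.0]

# A large bias opens a gate and a large negative bias closes it: here the
# result is almost purely the textual vector.
open_b, closed_b = [20.0] * dim, [-20.0] * dim
mostly_text = selective_fusion(
    h_t, h_a, h_v, [(zero_W, open_b), (zero_W, closed_b), (zero_W, closed_b)]
)
print(mostly_text)
```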

Acoustic and Visual Pretraining

Aims to learn the acoustic-textual and visual-textual relationships.
Phonetic encoder: input method pretraining objective
Graphic encoder: OCR pretraining objective

Data: SIGHAN, converted to Simplified Chinese using the OpenCC tool

Two levels: the model is tested at the detection level and the correction level
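One common way to score the two levels is sentence-level precision, recall, and F1, sketched below. The exact definitions are an assumption of this sketch and may differ from the evaluation script behind any given paper: a sentence counts as correctly detected when the set of changed positions equals the set of true error positions, and as correctly corrected when, in addition, the output equals the gold sentence. The example sentences are made up for illustration.

```python
def prf(true_positives, n_predicted, n_gold):
    """Precision, recall and F1 with zero-division guarded."""
    p = true_positives / n_predicted if n_predicted else 0.0
    r = true_positives / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def evaluate(triples):
    """triples: (source, prediction, gold) strings of equal length."""
    det_tp = cor_tp = n_predicted = n_gold = 0
    for src, pred, gold in triples:
        gold_pos = {i for i, (s, g) in enumerate(zip(src, gold)) if s != g}
        pred_pos = {i for i, (s, p) in enumerate(zip(src, pred)) if s != p}
        if pred_pos:
            n_predicted += 1  # the model changed something in this sentence
        if gold_pos:
            n_gold += 1  # this sentence really contains errors
        if gold_pos and pred_pos == gold_pos:
            det_tp += 1  # every error located, nothing else touched
            if pred == gold:
                cor_tp += 1  # and every error fixed with the right character
    return {
        "detection": prf(det_tp, n_predicted, n_gold),
        "correction": prf(cor_tp, n_predicted, n_gold),
    }


examples = [
    ("我喜欢吃平果", "我喜欢吃苹果", "我喜欢吃苹果"),  # found and fixed
    ("他在公圆玩", "他在公元玩", "他在公园玩"),        # found, wrong fix
    ("今天天气很好", "今天天气很好", "今天天气很好"),  # no error, no change
]
scores = evaluate(examples)
print(scores)
```

In the second example the error position is located but replaced with the wrong character, so it counts toward detection and not toward correction.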

Source: https://blog.csdn.net/qq_48566899/article/details/132560529
