This post walks through the core of the Token Passing implementation (the code below is from HTK's recogniser, the HRec/HNet modules) in four main steps, each read alongside the code. The last part covers word-network expansion, which I personally find more involved than the decoding itself, so it is left for the end.
1. StartRecognition
This part does some initialisation of pri and vri; both were already initialised more thoroughly in the earlier InitVRecInfo function. It then attaches an instance to pri->net->initial. The decoding that follows is essentially a BFS, and the AttachInst function in effect adds the initial node to the doubly linked instance list threaded through the link/knil fields. The process from here on is roughly:
- Take nodes from the list, score tokens inside each node, and pass tokens between nodes
- Keep pruning the list, i.e. removing nodes that fail some threshold condition
```c
pri->net->final.inst=pri->net->initial.inst=NULL;
```
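Before going line by line, here is a rough sketch of that outer loop — purely illustrative, with placeholder names (Inst, step_internal, step_external, detach_inst are not HTK identifiers; in HRec both passes happen inside ProcessObservation):

```c
/* Hypothetical skeleton of the per-frame token-passing driver. */
typedef struct Inst {
    struct Inst *link, *knil;   /* doubly linked instance list */
    double max;                 /* best token score in the instance */
} Inst;

static void process_frame(Inst *head, Inst *tail, double beam)
{
    Inst *i, *next;
    double best = -1.0E10, thresh;
    /* pass 1: propagate tokens inside every node, track the best */
    for (i = head->link; i != tail; i = i->link) {
        /* step_internal(i); */
        if (i->max > best) best = i->max;
    }
    thresh = best - beam;
    /* pass 2: prune, then propagate tokens between nodes */
    for (i = head->link; i != tail; i = next) {
        next = i->link;
        if (i->max < thresh) { /* detach_inst(i); */ }
        else                 { /* step_external(i); */ }
    }
}
```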
Node types
A few node types need to be understood first; my reading is as follows:

```c
// n_hmm = 2: HMM model node
// n_word = 4: word node
// n_tr0 = 4: a special node type that can apparently jump straight from the entry to the exit state (tee)
// n_wd0 = 1: the last phone node of a word; for a 5-state HMM that is node 3
```
The most important kinds are word nodes and model nodes (the latter model phone HMMs); most of the operations in the token-passing implementation centre on these two kinds of nodes.
AttachInst
One important job of the AttachInst function is to add the initial node to the pri->head doubly linked list:
```c
TokenSet *cur;
NetInst *inst;
int i,n;
inst=(NetInst*) New(&pri->instHeap,0);
// node->info.hmm->numStates includes the entry and exit states
// represents an HMM node
if (node_hmm(node))
n=node->info.hmm->numStates-1;
else
n=1;
// position of the instance in the network
inst->node=node;
// tokenset in HMM non-exit status
inst->state=(TokenSet*) New(pri->stHeap+pri->psi->stHeapIdx[n],0);
// tokenset in HMM exit status
inst->exit=(TokenSet*) New(pri->stHeap+pri->psi->stHeapIdx[1],0);
// const Token null_token={LZERO, 0.0, NULL, NULL}
// the token to be propagated; initialised to the null token
inst->exit->tok=null_token;
// handle exit nodes
// pri->nToks: maximum tokens to propagate
// can be skipped when N-best output is not needed
if (pri->nToks>1) {
// set: sorted tokens
inst->exit->set=(RelToken*) New(&pri->rTokHeap,0);
// n == 0: 1-best, keep only one answer
// n >= 1: N-best, keep the N best answers
inst->exit->n=1;
inst->exit->set[0]=rmax;
}
else {
// only one token needs to be propagated: the 1-best case
inst->exit->n=0;
}
// initialise each state's token set, same as the exit state
// 1 2 3 4
for (i=1,cur=inst->state;i<=n;i++,cur++) {
cur->tok=null_token;
if (pri->nToks>1) {
cur->set=(RelToken*) New(&pri->rTokHeap,0);
cur->n=1;
cur->set[0]=rmax;
}
else {
cur->n=0;
}
}
// #define LZERO (-1.0E10) ~log(0)
// to prune the likelihood
// keep the max likelihood over this instance's states
inst->max=LZERO;
// instance: double link
// pri->tail newest pri->head oldest
// append the new instance to pri: expanding the network (really the list)
// pri->link will be accessed out of this function
inst->link=&pri->tail;
inst->knil=pri->tail.knil;
inst->link->knil=inst;
inst->knil->link=inst;
// attach instance to a node
node->inst=inst;
// Exit token reaches word node in t=0
// the last phone node
if (node_wd0(node))
// inst->wdlk: Max likelihood of t=0 path to word end node
inst->wdlk=LikeToWord(inst->node);
else
inst->wdlk=LZERO;
// num of active nodes
pri->nact++;
/* New node needs any currently alive following insts moved */
/* to be more recent than it to ensure tokens propagated in */
/* correct order. */
// out of order
inst->ooo=TRUE; /* Need keep list in propagation order */
// move node and it's subsequence into the pri->link
ReOrderList(node);
```
ReOrderList
ReOrderList is DFS logic that moves the node's successors into the pri->head list. Note that node->nlinks can be 1 or n: a model node always has exactly one link, while a word node may have n. The purpose of this function appears to be keeping inter-node token propagation in the correct order. The code logic is:
```c
NetLink *dest;
```
ReOrderList adds the successors of each existing node to the decoding network, then continues the DFS from every node that already exists.
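For intuition, the four pointer assignments around link/knil splice a new instance in just before the sentinel tail. A standalone toy showing the same splice as AttachInst plus the matching unlink done by DetachInst (Node here is a stand-in for NetInst):

```c
#include <stdio.h>

typedef struct Node {
    struct Node *link;   /* next, towards the tail sentinel */
    struct Node *knil;   /* prev, towards the head sentinel */
    int id;
} Node;

/* Splice n in just before the sentinel tail, as AttachInst does. */
static void attach(Node *tail, Node *n)
{
    n->link = tail;
    n->knil = tail->knil;
    n->link->knil = n;
    n->knil->link = n;
}

/* Unlink n, as DetachInst does. */
static void detach(Node *n)
{
    n->link->knil = n->knil;
    n->knil->link = n->link;
}

int main(void)
{
    Node head = {0}, tail = {0};
    head.link = &tail; tail.knil = &head;
    Node a = {.id = 1}, b = {.id = 2};
    attach(&tail, &a);
    attach(&tail, &b);
    detach(&a);
    for (Node *p = head.link; p != &tail; p = p->link)
        printf("%d\n", p->id);   /* prints: 2 */
    return 0;
}
```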
Initialization of net->initial
The net->initial node is set up by AddInitialFinal, called from ExpandWordNet; it initialises initial and final, the entry and exit of the decoding network. The initial node is initialised as follows (its successors are in fact the pronunciation instances of the network's root nodes):
```c
PronHolder *pInst;
NetNode *node;
LNode *thisLNode;
int ninitial = 0;
int i,type;
// for num of nodes
// a node is a word
for (i=0; i < wnet->nn; i++)
// find root node
if (wnet->lnodes[i].pred == NULL)
// each word may has multiple prons
for (pInst=(PronHolder*)wnet->lnodes[i].sublat; pInst!=NULL;pInst=pInst->next)
ninitial++;
// ninitial: get how many prons
// network init nodes
net->initial.type = n_word;
net->initial.tag = NULL;
net->initial.info.pron = NULL;
net->initial.nlinks = 0;
net->initial.links =
(NetLink *)New(net->heap,ninitial*sizeof(NetLink));
for (i=0,thisLNode=wnet->lnodes; i<wnet->nn; i++,thisLNode++) {
// find root: initial nodes
if (thisLNode->pred != NULL) continue;
// node's pron
for (pInst=(PronHolder*)thisLNode->sublat;pInst!=NULL;pInst=pInst->next) {
// Chain of initial models
// point to the starts node of each pronunciation
if (xc==0) node=pInst->starts;
else if (pInst->nphones!=0) node=pInst->lc[0];
else node=FindWordNode(NULL,pInst->pron,pInst,n_word);
// modify nlinks and point links to the node
net->initial.links[net->initial.nlinks].node = node;
net->initial.links[net->initial.nlinks++].like = 0.0;
}
}
// ...
```
Lattice and Network are indeed related: the Lattice is the top-level word topology, where LNode represents a word node and LArc a word-to-word link. Each LNode may have several pronunciations, and each pronunciation gets a PronHolder — also a list structure, so the next pronunciation is reached via next. Its model-node side consists of these three fields:
```c
NetNode *starts; /* Chain of initial models */
NetNode *ends; /* Chain of final models */
// point to status node, each node present a single phone model
NetNode *chain; /* Chain of other nodes in word */
```
These three chains hold exactly the same kind of nodes that are added to pri->head: all model nodes.
After AttachInst finishes, the node's instance continues to be initialised and joins the pri->head list. From then on the token-passing algorithm proper can run; it mostly consists of repeatedly scoring the NetNodes on that list inside and across nodes, propagating tokens, and pruning and adding nodes. The remaining code (the rest of StartRecognition after the AttachInst call) reads:
```c
// init initial node's instance
inst=pri->net->initial.inst;
inst->state->tok.like=inst->max=0.0;
inst->state->tok.lm=0.0;
inst->state->tok.path=NULL;
inst->state->n=((pri->nToks>1)?1:0);
vri->genMaxNode=vri->wordMaxNode=NULL;
vri->genMaxTok=vri->wordMaxTok=null_token;
// #define LSMALL (-0.5E10) log values < LSMALL are set to LZERO
pri->wordThresh=pri->genThresh=pri->nThresh=LSMALL;
// Most likely node/word node in the network
pri->genMaxNode=pri->wordMaxNode=NULL;
pri->genMaxTok=pri->wordMaxTok=null_token;
// init pri->head in AttachInst(&pri->net->initial)
for (inst=pri->head.link;inst!=NULL && inst->node!=NULL;inst=next)
// inst->max: likelihood for pruning of instance
// cutoff
if (inst->max<pri->genThresh) {
next=inst->link;
DetachInst(inst->node);
}
else {
pri->nxtInst=inst;
// call SetEntryState and new instance, append to the pri->head
StepInst2(inst->node);
next=pri->nxtInst->link;
}
```
2. ProcessObservation
This function receives an Observation and processes the observation sequence frame by frame. After the necessary initial assignments it performs a global pruning step based on the number of active nodes in the list; because processing is frame by frame, tokens propagate only one layer forward per frame:
```c
void ProcessObservation(VRecInfo *vri,Observation *obs,int id, AdaptXForm *xform)
{
NetInst *inst,*next;
int j;
float thresh;
// get private info
pri=vri->pri;
inXForm = xform;
/* specifies the transform to use for this observation */
pri->psi->sBuf[1].n=((pri->nToks>1)?1:0);
/* Needed every observation */
// Current frame number
pri->frame++;
// Current Observation
pri->obs=obs;
if (id<0) pri->id=(pri->prid<<20)+pri->frame;
else pri->id=id;
/* Max model pruning is done initially in a separate pass */
// vri->maxBeam: Maximum model instance beam
// pri->nact: num of active nodes in dual links
if (vri->maxBeam>0 && pri->nact>vri->maxBeam) {
// qsn: quick sort num
if (pri->nact>pri->qsn) {
if (pri->qsn>0)
Dispose(&vri->heap,pri->qsa);
pri->qsn=(pri->nact*3)/2;
pri->qsa=(LogFloat*) New(&vri->heap,pri->qsn*sizeof(LogFloat));
}
// qsa: quick sort array
// inst->max: Likelihood for pruning of instance
for (inst=pri->head.link,j=0;inst!=NULL;inst=inst->link,j++)
pri->qsa[j]=inst->max;
// num of nodes in dual links more than maxBeam: cutoff
if (j>=vri->maxBeam) {
qcksrtM(pri->qsa,0,j-1,vri->maxBeam);
thresh=pri->qsa[vri->maxBeam];
// start cutoff for the first time
if (thresh>LSMALL)
for (inst=pri->head.link;inst->link!=NULL;inst=next) {
next=inst->link;
if (inst->max<thresh)
DetachInst(inst->node);
}
}
}
// ...
}
```
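The maxBeam pass keeps only the best vri->maxBeam instances: it (partially) sorts the instance scores and uses the score at that rank as the cutoff. The same idea with libc qsort instead of HTK's qcksrtM, on toy numbers:

```c
#include <stdio.h>
#include <stdlib.h>

static int cmp_desc(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x < y) - (x > y);          /* sort descending */
}

int main(void)
{
    double scores[5] = {-10.0, -3.5, -7.2, -1.0, -5.9};
    int maxBeam = 3;
    qsort(scores, 5, sizeof(double), cmp_desc);
    double thresh = scores[maxBeam - 1];    /* maxBeam-th best */
    printf("detach instances below %.1f\n", thresh);  /* -5.9 */
    return 0;
}
```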
Intra-node propagation
Next comes the first token-passing computation, inside each node; when it completes, the max token and max node are updated:
```c
// pri->genMaxTok: Most likely token
// pri->wordMaxTok: Most likely word end token
// update in StepInst1 in StepHMM1
pri->genMaxTok=pri->wordMaxTok=null_token;
pri->genMaxNode=pri->wordMaxNode=NULL;
// nodes represent phones
for (inst=pri->head.link,j=0;inst!=NULL;inst=inst->link,j++)
if (inst->node)
// stepHMM1 stepInst1
// calcu aij + bj(Ot) for each status in the node
StepInst1(inst->node);
//...
```
StepInst1
StepInst1 dispatches on the node type, mainly targeting HMM model nodes:
```c
static void StepInst1(NetNode *node)
/* First pass of token propagation (Internal) */
{
// model node
if (node_hmm(node))
// inside a single node
// calcu max possibility on each status j[1 < j < N] inside a node
StepHMM1(node);
/* Advance tokens within HMM instance t => t-1 */
/* Entry tokens valid for t-1, do states 2..N */
else
StepWord1(node);
node->inst->pxd=FALSE;
}
```

StepHMM1
StepHMM1 computes token propagation inside the node, between its state nodes, as implemented below; the corresponding model topology is the one shown in the figure:
```c
static void StepHMM1(NetNode *node)
```
Pictured as a diagram, this is the figure above. The exit state is handled separately, with logic similar to the state handling above; at the end the exit state's token is updated:
```c
// Exit state (ignoring tee trP)
```
The above is StepHMM1: it computes, inside one node, the best possible values of states 2, 3, 4 and the exit state under the current observation. The values updated along the way are:

```c
inst->max: the best like value over the states of this node's instance
pri->wordMaxNode: the most likely word-end node
```

path and align correspond to the word-level and the state-level traceback respectively; only the former is strictly necessary, and the align parts have been cut from the code above. The path information is maintained during inter-node propagation.
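What StepHMM1 computes is one step of the Viterbi recursion inside a single HMM: for every emitting state take the best predecessor score plus the log transition probability, add the observation log likelihood, then fill the exit state. A minimal sketch of that update under those assumptions (0-based states, dense log transition matrix; HTK's actual data layout and state numbering differ):

```c
#define NSTATE 5              /* 1 entry + 3 emitting + 1 exit */
#define LZERO  (-1.0E10)      /* ~log(0), as in HTK */

/* One Viterbi step inside an HMM instance.
 * s[i]   : log score of state i at t-1 (s[0] = entry token)
 * trP    : log transition matrix a_ij
 * outP[j]: log output likelihood b_j(o_t)
 * Updates s[] in place to time t. */
static void step_hmm(double s[NSTATE],
                     const double trP[NSTATE][NSTATE],
                     const double outP[NSTATE])
{
    double ns[NSTATE];
    for (int j = 1; j < NSTATE - 1; j++) {      /* emitting states */
        double best = LZERO;
        for (int i = 0; i < NSTATE - 1; i++) {  /* predecessors */
            double v = s[i] + trP[i][j];
            if (v > best) best = v;
        }
        ns[j] = best + outP[j];
    }
    /* exit state: best emitting state plus its exit transition */
    double best = LZERO;
    for (int i = 1; i < NSTATE - 1; i++) {
        double v = ns[i] + trP[i][NSTATE - 1];
        if (v > best) best = v;
    }
    for (int j = 1; j < NSTATE - 1; j++) s[j] = ns[j];
    s[NSTATE - 1] = best;
    s[0] = LZERO;   /* the entry token has been consumed */
}
```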
Inter-node propagation
The rest of ProcessObservation performs token propagation between nodes, implemented as follows:
```c
// #define LSMALL (-0.5E10) log values < LSMALL are set to LZERO
// update in StepHMM1
// vri->wordBeam: Separte word end beam width
// pri->wordThresh: Cutoff for word end propagation
// after StepHMM1, use results to update some global beam
pri->wordThresh=pri->wordMaxTok.like-vri->wordBeam;
if (pri->wordThresh<LSMALL) pri->wordThresh=LSMALL;
// pri->genMaxTok.like update in StepHMM1
// vri->genBeam: Global beam width
// pri->genThresh: Cutoff from global beam
pri->genThresh=pri->genMaxTok.like-vri->genBeam;
if (pri->genThresh<LSMALL) pri->genThresh=LSMALL;
if (pri->nToks>1) {
// vri->nBeam: Beam width for non-best tokens
pri->nThresh=pri->genMaxTok.like-vri->nBeam;
if (pri->nThresh<LSMALL/2) pri->nThresh=LSMALL/2;
}
/* Pass 2 Performs external token propagation and pruning */
for (inst=pri->head.link,j=0;inst!=NULL && inst->node!=NULL;inst=next,j++)
if (inst->max<pri->genThresh) {
next=inst->link;
// remove from inst list
DetachInst(inst->node);
}
else {
pri->nxtInst=inst;
// call SetEntryState
// add new instance to pri->tail and reorder it
StepInst2(inst->node);
next=pri->nxtInst->link;
}
// npth: Current number of path records
// cpth: Number of path records after last collection
// nalign: Current number of align records
// calign: Number of align records after last collection
// wait for the time to collect path
if ((pri->npth-pri->cpth) > vri->pCollThresh ||
(pri->nalign-pri->calign) > vri->aCollThresh)
CollectPaths();
// pri->tact: total active nodes
pri->tact+=pri->nact;
// update
vri->frame=pri->frame;
vri->nact=pri->nact;
vri->genMaxNode=pri->genMaxNode;
vri->wordMaxNode=pri->wordMaxNode;
vri->genMaxTok=pri->genMaxTok;
vri->wordMaxTok=pri->wordMaxTok;
```
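These cutoffs are plain log-domain subtractions clamped at LSMALL. For example, with the best token at -4500 and a global beam of 200, every instance whose max falls below -4700 is detached in pass 2:

```c
#include <stdio.h>

#define LSMALL (-0.5E10)

int main(void)
{
    double genMax = -4500.0, genBeam = 200.0;
    double genThresh = genMax - genBeam;        /* -4700.0 */
    if (genThresh < LSMALL) genThresh = LSMALL;
    printf("genThresh = %.1f\n", genThresh);
    return 0;
}
```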
StepInst2
StepInst2 is where the pri->head list is extended — the BFS-like logic is implemented here — and it also carries out the token propagation between nodes:
```c
/* Second pass of token propagation (External) */
static void StepInst2(NetNode *node)
/* Must be able to survive doing this twice !! */
{
Token tok;
TokenSet xtok;
RelToken rtoks[MAX_TOKS];
NetLink *dest;
LogFloat lm;
int i,k;
// == 4
// word end node
if (node_word(node))
// P(O) = argmax P(O|W) + LMSF * logP(W) + N * logWIP
// add path
// inst->exit->tok.like
// calcu LMSF * logP(W) + N * logWIP and create path
// build a path record for this node's exit state; its prev points to the path at this node's entry state, which SetEntryState initialised from the previous node's exit-state path
StepWord2(node);
/* Merge tokens and update traceback */
// & 4
else if (node_tr0(node)) /* && node_hmm(node) */
// compute the exit state: this is a rather special node type
StepHMM2(node);
// xtok: node->inst->exit
// xtok init by node->inst->exit
// hmm nodes
// get exit node's token
// the exit state's token of the current phone node is recorded here, whatever the node type
// a token carries both path and score information
tok=node->inst->exit->tok;
xtok.tok=tok;
xtok.n=node->inst->exit->n;
// new RelToken sets
xtok.set=rtoks;
for (k=0;k<xtok.n;k++)
xtok.set[k]=node->inst->exit->set[k];
if (node_word(node))
if (tok.like<pri->wordThresh)
tok=null_token;
// ok
// tok: exit node
if (tok.like>pri->genThresh) {
// connected nodes
// a word node has many successors
// a model node has only one, as you would expect; for a model node this effectively hands the token to the following node, which can be visited next round
// pass token to next nodes[model nodes]
for(i=0,dest=node->links;i<node->nlinks;i++,dest++) {
// transition likelihood
lm=dest->like;
// pri->scale: LM (Net probs) scale factor
xtok.tok.like=tok.like+lm*pri->scale;
// tok.lm: LM likelihood of token
xtok.tok.lm=tok.lm+lm;
for (k=0;k<xtok.n;k++)
xtok.set[k].lm=node->inst->exit->set[k].lm+lm;
// pri->genThresh: Cutoff from global beam
if (xtok.tok.like>pri->genThresh) {
// call AttachInst(node)
// expand network
// pass exit token to dest->nodes => xtok
SetEntryState(dest->nodes, &xtok);
/* Transfer set of tokens to node, activating when necessary */
/* choosing N most likely after adding transition likelihood */
}
}
}
node->inst->pxd=TRUE;
}
```
StepWord2
StepWord2 handles word nodes. It updates inst->exit->tok.path with a newly built path whose prev is recorded as inst->state->tok.path — a value initialised from the previous node's inst->exit->tok.path. In StepWord2:
```c
NetInst *inst;
Path *newpth,*oldpth;
RelToken *cur;
NxtPath *rth;
int i,k;
inst=node->inst;
// info.pron == NULL ?
if (node->info.pron==NULL && node->tag==NULL) {
inst->exit->tok=inst->state->tok;
inst->exit->n=inst->state->n;
for (k=0;k<inst->exit->n;k++)
inst->exit->set[k]=inst->state->set[k];
}
else {
inst->exit->tok=inst->state->tok;
if (node->info.pron!=NULL) {
// pri->wordpen: word penalty
inst->exit->tok.like+=pri->wordpen;
// node->info.pron->prob: Log probability of pronunciation
// pri->pscale: pronunciation probability scale factor
inst->exit->tok.like+=node->info.pron->prob * pri->pscale;
// P(O) = argmax P(O|W) + LMSF * logP(W) + N * logWIP
}
// append new path to pNoRef
// path->chain = NULL
// path->used = FALSE
// new path only in word node and pass the path info in StepInst2 when extend pri->head
newpth=NewNRefPath();
// point to the node
newpth->node=node;
// ref times == 0
newpth->usage=0;
newpth->frame=pri->frame;
newpth->like=inst->exit->tok.like;
newpth->lm=inst->exit->tok.lm;
// not run here
if ((newpth->align=inst->exit->tok.align)!=NULL)
// MoveAlignYesRef(align)
RefAlign(newpth->align);
// assign to exit node and newpath could pass to next nodes by exit->token
inst->exit->tok.path=newpth;
inst->exit->tok.lm=0.0;
inst->exit->tok.align=NULL;
// assign position: in SetEntryState
// inst->state: start status
oldpth=inst->state->tok.path;
// oldpth != NULL
if ((newpth->prev=oldpth)!=NULL)
// if usage == 0 MovePathYesRef(path) => append pYesRef
// and then usage++
RefPath(oldpth);
```
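The two additions above are the word-level terms of the usual decoding objective, roughly log P(O|W) + log WIP + pscale * log P(pron). A toy calculation with made-up numbers:

```c
#include <stdio.h>

int main(void)
{
    double acoustic = -320.0;  /* token score reaching the word end */
    double wordpen  = -10.0;   /* log word-insertion penalty */
    double pronprob = -1.2;    /* log pronunciation probability */
    double pscale   = 1.0;     /* pronunciation scale factor */
    double like = acoustic + wordpen + pronprob * pscale;
    printf("exit token like = %.1f\n", like);   /* -331.2 */
    return 0;
}
```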
It is worth being clear about a few path questions, since they bear directly on the final search step of decoding. First, a new path record is created only during external token propagation, i.e. when a word node is processed, to record the traceback; the function NewNRefPath is used only in StepInst2 => StepWord2. Once StepWord2 returns, the second half of StepInst2 runs SetEntryState, and the path information enters the next node along with xtok.
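Since path records are created only at word ends, recovering the recognised word sequence is just a walk back along prev. A sketch with a simplified Path (HTK's real Path also carries like, lm, usage and more):

```c
#include <stdio.h>

typedef struct Path {
    struct Path *prev;    /* preceding word-end record */
    const char  *word;    /* stand-in for path->node->info.pron */
    int          frame;   /* frame at which the word ended */
} Path;

static void print_backtrace(const Path *p)
{
    if (p == NULL) return;
    print_backtrace(p->prev);       /* oldest word first */
    printf("%s (t=%d)\n", p->word, p->frame);
}

int main(void)
{
    Path a = {NULL, "hello", 42};
    Path b = {&a,   "world", 97};
    print_backtrace(&b);
    return 0;
}
```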
SetEntryState
SetEntryState attaches an instance to the current node and passes in the previous word's token; in this way tokens can travel through the network, each carrying the path and score of the preceding node:
```c
// attach instance to the connected nodes
// SetEntryState(dest->node,&xtok)
// node: connected next node
// src: contains token and path info, i.e. the previous node's token
static void SetEntryState(NetNode *node,TokenSet *src)
{
NetInst *inst;
TokenSet *res;
if (node->inst==NULL)
// append to pri->tail
// call ReOrderList(node)
AttachInst(node);
inst=node->inst;
res=inst->state;
// update instance's token
if (res->n==0) {
// path info assigned: inst->state->tok.path
if (src->tok.like > res->tok.like)
res->tok=src->tok;
}
else
TokSetMerge(res,&src->tok,src);
// inst->max: Likelihood for pruning of instance
if (res->tok.like>inst->max)
inst->max=res->tok.like;
if (node->type==n_word && (pri->wordMaxNode==NULL ||
pri->wordMaxNode->inst==NULL ||
res->tok.like > pri->wordMaxNode->inst->max))
// Most likely word end node in network
pri->wordMaxNode=node;
}
```
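In the 1-best case (res->n == 0) the merge is simply a max over the tokens arriving at the entry state; TokSetMerge generalises this to keeping the N best RelTokens. A sketch with a stripped-down token type:

```c
typedef struct { double like; void *path; } Tok;

/* Keep whichever token scores higher: the 1-best merge. */
static void merge(Tok *res, const Tok *src)
{
    if (src->like > res->like)
        *res = *src;
}
```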
The path logic can now be represented by the figure below.
CollectPaths
The final step is the periodic cleanup of the path records maintained above, implemented in CollectPaths; some parts that never execute here have been removed, and a few details of this part are still unclear to me:
```c
static void CollectPaths(void)
{
NetInst *inst;
TokenSet *cur;
int i,k,n;
Path *path,*plink;
Align *align,*alink;
// for each status in each inst in the pri->head
// modify path->used: in current inst link
for (inst=pri->head.link;inst!=NULL;inst=inst->link)
if (inst->node!=NULL) {
// only model nodes have multiple status
if (node_hmm(inst->node))
n=inst->node->info.hmm->numStates-1;
else
n=1;
// n: status num
// each status
for (i=1,cur=inst->state;i<=n;i++,cur++) {
path=cur->tok.path;
// path->used: struct is referenced by the current inst list
// skip already-marked paths to avoid duplicates
if (path && !path->used) {
if (path->usage!=0) MovePathYesRef(path);
path->used=TRUE;
}
}
// exit node
path=inst->exit->tok.path;
if (path && !path->used) {
if (path->usage!=0) MovePathYesRef(path);
path->used=TRUE;
}
}
// if no refer, could be deleted
for (path=pri->pNoRef.link;path->link!=NULL;path=plink) {
// not in pri->head
if (!path->used) {
// not run here
if (path->align!=NULL)
DeRefAlign(path->align);
// minus path->prev->prev->usage
// if == 0 add path->prev->prev to pNoRef
DeRefPathPrev(path);
plink=path->link;
// delete path
UnlinkPath(path);
}
// in pri->head
else {
path->used=FALSE;
plink=path->link;
}
}
for (path=pri->pYesRef.link;path->link!=NULL;path=path->link) {
if (!path->used) break;
path->used=FALSE;
}
pri->cpth=pri->npth;
pri->calign=pri->nalign;
}
```
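Structurally this is a small mark-and-sweep collector: mark every path still referenced from a live token set (used = TRUE), sweep the unreferenced list, then clear the marks for the next round. The bare pattern, reduced to an array:

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct { bool used; } Rec;

int main(void)
{
    Rec recs[4] = {{false}, {false}, {false}, {false}};
    int live[2] = {1, 3};            /* reachable from current tokens */
    for (int i = 0; i < 2; i++)      /* mark phase */
        recs[live[i]].used = true;
    for (int i = 0; i < 4; i++) {    /* sweep phase */
        if (!recs[i].used)
            printf("collect rec %d\n", i);
        else
            recs[i].used = false;    /* reset mark for next round */
    }
    return 0;
}
```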
The above are the main steps of processing the observation sequence.
3. CompleteRecognition
This part mainly returns a Lattice (by running the code below) and resets the pri structure in preparation for the subsequent transcription.

```c
CreateLattice(heap,pri->net->final.inst->exit,vri->frameDur);
```

The function is implemented as follows:
```c
static Lattice *CreateLattice(MemHeap *heap,TokenSet *res,HTime framedur)
{
Lattice *lat;
RelToken *cur;
Path path;
WordPron pron;
NxtPath rth[MAX_TOKS];
int nn,nl,ln,i;
NetNode node;
pron.word=NULL;pron.pnum=0;pron.next=NULL;
pron.outSym=NULL;pron.phones=NULL;pron.nphones=0;
pron.prob=0.0;
path.like=res->tok.like;
path.lm=res->tok.lm;
path.usage=0;
path.align=res->tok.align;
path.node=&node;
path.node->tag=NULL;
path.node->info.pron=&pron;
path.frame=pri->frame;
path.prev=res->tok.path;
path.chain=NULL;
// nn: num of nodes
// nl: num of arcs
nn=1;nl=0;ln=0;
// dfs
// modify usage, coding nn, nl
MarkPaths(&path,&nn,&nl);
// single lattice
// lat->lnodes = new LNode(nn)
lat=NewLattice(heap,nn,nl);
lat->voc=pri->net->vocab;
lat->lmscale=pri->scale;
lat->wdpenalty=pri->wordpen;
lat->prscale=pri->pscale;
lat->framedur=framedur;
// Time of word boundary at node
// Word represented by arc
// lnodes: Array of lattice nodes
lat->lnodes[0].time=0.0; lat->lnodes[0].word=NULL;
lat->lnodes[0].tag=NULL;
lat->lnodes[0].score=0.0;
LatFromPaths(&path,&ln,lat);
return(lat);
}
```

MarkPaths
MarkPaths is mainly used to number the path records (through their usage field) during traceback; the numbering is used later on. After it returns, nn and nl hold the numbers of nodes and arcs:
```c
/* Number/count nodes (in path->usage field) and count links */
// nn = 1, nl = 0;
// modify usage
static void MarkPaths(Path *path,int *nn,int *nl)
{
NxtPath *pth;
// path->usage: Times struct ref'd (by next path)
if (path->usage>=0) {
// minus
path->usage=-(*nn)++;
(*nl)++;
if (path->prev) MarkPaths(path->prev,nn,nl);
// may not run
for (pth=path->chain;pth!=NULL;pth=pth->chain) {
(*nl)++;
if (pth->prev) MarkPaths(pth->prev,nn,nl);
}
}
}
```
The central function is LatFromPaths, which folds the path information into the Lattice in preparation for the final TranscriptionFromLattice.

```c
// ln = 0;
```
4. TranscriptionFromLattice
Here an A* search runs over the Lattice topology filled in by the previous step and finds one best answer (or several; only the single best is discussed here). First, the processing before the search:
```c
// get best N ans, suppose N = 1
ans=(NBestEntry**) New(&gstack,sizeof(NBestEntry*)*N);ans--;
// through num of nodes
for (i=0,ln=lat->lnodes;i<lat->nn;i++,ln++) {
// ln->foll: following arcs of the nodes
// leaf node?
if (ln->foll==NULL)
ln->score=0.0;
// ~log(0): -oo
else
ln->score=LZERO;
// sorted order init all by -1
ln->n=-1;
}
n=0;
for (i=0,ln=lat->lnodes;i<lat->nn;i++,ln++)
// not yet numbered (visited)
if (ln->n==-1)
/*
// nn = 0; num of nodes
static void MarkBack(LNode *ln,int *nn)
{
LArc *la;
// modify flag
ln->n=-2;
// node ln's previous linked nodes
for (la=ln->pred;la!=NULL;la=la->parc)
// == 1: new node
if (la->start->n==-1)
MarkBack(la->start,nn);
ln->n=(*nn)++;
}
*/
MarkBack(ln,&n);
// n: num of nodes
// ln->n: id of nodes by DFS
order=(int*) New(&gstack, sizeof(int)*lat->nn);
// id in the net mapped to the position in the lnodes
// id => pos
for (i=0,ln=lat->lnodes;i<lat->nn;i++,ln++)
order[ln->n]=i;
// backtrack order
// note where each node's score comes from
/*
5
/ \
1 4
/ / \
0 2 3
*/
for (i=lat->nn-1;i>0;i--) {
// get the node of position of i in the net
ln=lat->lnodes+order[i];
// get max la->start->score
// ln->pred: arcs whose end is this node (incoming arcs)
for (la=ln->pred;la!=NULL;la=la->parc) {
// LArcTotLike: kinds of factor, like, scalar multiplx together in a arcs
score=ln->score+LArcTotLike(lat,la);
// update la->start->score
if (score>la->start->score)
la->start->score=score;
}
}
Dispose(&gstack,order);
// ...
```
Then the A* search extracts the best answer: maintain an ordered queue, repeatedly pop the head entry and push its qualifying successors, until the queue is empty. The queue is implemented as a doubly linked list sorted by score.
```c
// init head and tail
```
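The queue is an ordered list keyed by score: pop the best entry at the head, push successors so the order is preserved. A minimal sorted-insert sketch (singly linked for brevity; HTK's queue is doubly linked):

```c
#include <stdio.h>

typedef struct Entry {
    double score;          /* actual score + remaining estimate */
    struct Entry *next;
} Entry;

/* Insert e keeping the list sorted, best (largest) score first. */
static void push(Entry **head, Entry *e)
{
    Entry **p = head;
    while (*p != NULL && (*p)->score > e->score)
        p = &(*p)->next;
    e->next = *p;
    *p = e;
}

int main(void)
{
    Entry a = {-5.0, NULL}, b = {-2.0, NULL}, c = {-9.0, NULL};
    Entry *head = NULL;
    push(&head, &a); push(&head, &b); push(&head, &c);
    for (Entry *p = head; p != NULL; p = p->next)
        printf("%.1f\n", p->score);   /* -2.0 -5.0 -9.0 */
    return 0;
}
```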
5. Supplement: ExpandWordNet
This function returns a Network structure used by the later decoding stages; it performs the word-network expansion. It is called as:

```c
// wdNet: Lattice
net = ExpandWordNet(&netHeap,wdNet,&vocab,&hset);
```

The Lattice is read from the word-network file into memory and organised into that structure by:

```c
wdNet = ReadLattice(nf,&netHeap,&vocab,TRUE,FALSE)
```
First, the ReadLattice function: it returns the Lattice by calling ReadOneLattice:
```c
Lattice *ReadLattice(FILE *file, MemHeap *heap, Vocab *voc,
Boolean shortArc, Boolean add2Dict)
{
Lattice *lat,*list,*fLat;
Source source;
AttachSource(file,&source);
if((lat=ReadOneLattice(&source,heap,voc,shortArc,add2Dict))==NULL)
return NULL;
// has other lattice
if (lat->subLatId!=NULL) {
/* Need to preserve first lattice to return */
// fLat: first Lattice
fLat=lat; lat = (Lattice *) New(heap,sizeof(Lattice)); *lat=*fLat;
// fLat keep previous one
// lat init as previous one
do {
/* Add SUBLAT to hash table for later lookup */
// add lat into the hash table named subLatId
GetSubLat(lat->subLatId,lat);
if((lat=ReadOneLattice(&source,heap,voc,shortArc,add2Dict))==NULL) {
Dispose(heap, fLat); /*fLat points to 1st thing on heap*/
return NULL;
}
}while(lat->subLatId!=NULL);
/* Clear hash table */
GetSubLat(NULL,NULL);
/* Set all chain fields to NULL */
lat->chain=NULL;
SubLatList(lat,NULL,1);
/* Set chain fields to make linked list */
lat->chain=lat;
list=SubLatList(lat,lat,1);
/* Disconnect loop */
list->chain=NULL;
/* Copy last to first Lattices to ensure lat is first thing on stack */
*fLat=*lat; lat=fLat;
}
return(lat);
}
```
The simple logic of this function: if there are several word-network (sub-)lattices in the file, read them all; otherwise read just one.
The ReadOneLattice function packs the word-network file's information into the Lattice structure, including nodes, arcs, their connectivity, and the initialisation of some weights and scale factors. Nodes and arcs are defined by LNode and LArc, related as shown in the figures above and below.
The function initialises the Lattice arcs as follows:
```c
la->start=lat->lnodes+s;
la->end=lat->lnodes+e;
la->lmlike=lmlike;
la->farc=la->start->foll;
la->parc=la->end->pred;
la->start->foll=la;
la->end->pred=la;
```
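Each new arc is pushed onto the front of its start node's foll chain and its end node's pred chain, so a node enumerates its outgoing arcs via farc and its incoming arcs via parc. A self-contained illustration with simplified types:

```c
#include <stdio.h>

typedef struct LArc LArc;
typedef struct { LArc *foll, *pred; } LNode;
struct LArc { LNode *start, *end; LArc *farc, *parc; };

/* Mirror of the ReadOneLattice wiring for one arc. */
static void add_arc(LArc *la, LNode *s, LNode *e)
{
    la->start = s;        la->end = e;
    la->farc  = s->foll;  la->parc = e->pred;
    s->foll   = la;       e->pred  = la;
}

int main(void)
{
    LNode n[2] = {{0}, {0}};
    LArc a = {0}, b = {0};
    add_arc(&a, &n[0], &n[1]);
    add_arc(&b, &n[0], &n[1]);
    int k = 0;
    for (LArc *p = n[0].foll; p != NULL; p = p->farc) k++;
    printf("node 0 has %d outgoing arcs\n", k);   /* 2 */
    return 0;
}
```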
Next comes the word-network expansion itself. Driven by the configuration, it decides how to expand down to the phone level: the Lattice so far is only the top-level word network — nodes are words, edges are transitions — with no pronunciations or HMMs yet. All of that is needed before final decoding, so the network is fleshed out step by step.
InitPronHolders
First, to bring in pronunciation information, a PronHolder structure is introduced and initialised:
```c
/* First create context arrays and pronunciation instances */
// frcSil: global string
// net: new network
int nNull = InitPronHolders(net,lat,hci,voc,&holderHeap,frcSil);
```
!NULL must be handled first: if it is a genuinely empty word, create an empty pronunciation instance for it; otherwise skip this step:
```c
/* Reset hash table prior to processing lattice */
// wnHashTab: NetNode
for (i=0; i<WNHASHSIZE; i++)
wnHashTab[i]=NULL;
/* Determine if we have a real !NULL word */
// nullWord: Word for output when word==NULL
net->nullWord = GetWord(voc,GetLabId("!NULL", TRUE),TRUE);
// next: Next pronunciation of word
// a word can have kinds of pronunciations
for (thisPron=net->nullWord->pron;thisPron!=NULL;thisPron=thisPron->next)
// not real NULL word
if (thisPron->nphones!=0) {
net->nullWord=NULL;
break;
}
// nullword without pron
// real !NULL word
if (net->nullWord!=NULL) {
if (net->nullWord->pron==NULL)
// add a pron to a given word
NewPron(voc,net->nullWord,0,NULL,net->nullWord->wordName,1.0);
}
// frcSil: Automagically add these sil models to the end of words
// confusion: I have not figured this part out
if (frcSil!=NULL && strlen(frcSil)>0) {
for(nSil=nAdd=0,ptr=frcSil;ptr!=NULL;ptr=nxt) {
// finish
if ((nxt=ParseString(ptr,name))==NULL) break;
if (name[0]=='+' || name[0]=='-') st=name[0],p=name+1;
else st=0,p=name;
if (strlen(p)==0) labid=NULL;
else labid=GetLabId(p,TRUE);
if (st=='+' || st==0) addPhones[++nAdd]=labid;
if (st=='-' || st==0) silPhones[++nSil]=labid;
}
}
else
nSil=nAdd=0;
```
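My reading of that loop: each name in frcSil may be prefixed with '+' (a model to append at word ends, collected in addPhones) or '-' (a model to strip from word ends, collected in silPhones); a bare name goes onto both lists. A toy re-implementation of just that split:

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[] = "-sil +sp sil";
    for (char *tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
        char st = (*tok == '+' || *tok == '-') ? *tok : 0;
        const char *name = st ? tok + 1 : tok;
        if (st == '+' || st == 0) printf("add:   %s\n", name);
        if (st == '-' || st == 0) printf("strip: %s\n", name);
    }
    return 0;
}
```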
Next, the nodes' pruning information is initialised; some of this may not execute:
```c
// num of nodes
for (i=0; i<lat->nn; i++) {
float fct;
LArc *la;
// current node
thisLNode = lat->lnodes+i;
fct = 0.0;
// factor lm likelihoods throughout words
// default: false, may not run
if (factorLM && thisLNode->pred!=NULL)
for (la=thisLNode->pred,fct=LZERO;la!=NULL;la=la->parc)
if (la->lmlike>fct) fct=la->lmlike;
// max item in arcs which pointed to itself
// Field used for pruning
thisLNode->score = fct;
}
// factorLM: factor lm likelihoods throughout words
if (factorLM)
for (i=0; i<lat->na; i++) {
LArc *la;
// current arcs
la=NumbLArc(lat,i);
la->lmlike-=la->end->score;
}
// ...
```
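Numerically, factorLM moves LM scores off the arcs and into the words: a node's score becomes the best incoming lmlike, that score is subtracted back out of each incoming arc, and later every phone of a pronunciation instance receives fct = score / nphones. A toy calculation:

```c
#include <stdio.h>

int main(void)
{
    double in_arcs[3] = {-6.0, -4.0, -9.0};  /* incoming lmlike */
    int nphones = 4;
    double fct = in_arcs[0];
    for (int i = 1; i < 3; i++)
        if (in_arcs[i] > fct) fct = in_arcs[i];   /* fct = -4.0 */
    for (int i = 0; i < 3; i++)
        in_arcs[i] -= fct;               /* residual left on arcs */
    printf("per-phone share = %.2f\n", fct / nphones);  /* -1.00 */
    return 0;
}
```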
Now the most important part: create a pronunciation instance for every node in the Lattice — one word may have several pronunciations:
```c
/* Create instance for each pronunciation in lattice */
for (i=0,nNull=0,t=0; i < lat->nn; i++) {
// current node in lattice
thisLNode = lat->lnodes+i;
// the word that this lattice node represents
// init in ReadOneLattice by GetWordId
thisWord = thisLNode->word;
// replace by NULL word
if (thisWord==NULL) thisWord=voc->nullWord;
thisLNode->sublat=NULL;
// nAdd: get from frcSil, may be 0
pii=(PInstInfo *) New(&gstack,(thisWord->nprons+1)*(nAdd+1)*sizeof(PInstInfo));
pii--;
/* Scan current pronunciations and make modified ones */
// thisWord->pron init in ReadOneLattice
// for each pron in a single word
for (j=1,thisPron=thisWord->pron,npii=0; thisPron!=NULL; j++,thisPron=thisPron->next) {
if (thisPron->nphones==0) n=0;
else
// for each phones
// n inited by thisPron->nphones
for (k=1,n=thisPron->nphones;k<=nSil;k++)
// each phone end with sil?
if (thisPron->phones[thisPron->nphones-1]==silPhones[k]) {
/* Strip it */
n--;break;
}
// n: non-sil phones
if (thisPron->nphones==0 || nAdd==0 || n==0) {
/* Just need one pronunciation */
// !NULL one
if (thisPron->nphones==0) {
nNull++;
}
if (n==0)
n=thisPron->nphones;
pii[++npii].pron=thisPron; pii[npii].silId=-1;
pii[npii].n=n;pii[npii].t=n;
pii[npii].phones=thisPron->phones;
}
// sil phones
else {
/* Make one instance per silence label */
for (k=1;k<=nAdd;k++) {
pii[++npii].pron=thisPron;
pii[npii].silId=k;
pii[npii].n=pii[npii].t=n;
// after modify
if (addPhones[k]!=NULL)
pii[npii].t++;
pii[npii].phones=(LabId *) New(heap,sizeof(LabId)*pii[npii].t);
// copy phones
for(l=0;l<pii[npii].n;l++)
pii[npii].phones[l]=pii[npii].pron->phones[l];
// add sil
if (addPhones[k]!=NULL)
pii[npii].phones[pii[npii].n]=addPhones[k];
}
}
}
/* Scan new pronunciations and remove duplicates */
// npii: all the phones in a word
// default: true
if (remDupPron)
// each pron
for (j=2; j<=npii; j++) {
n=pii[j].t;
if (pii[j].pron==NULL) continue;
for (k=1; k<j; k++) {
if (pii[j].pron==NULL || pii[k].pron==NULL ||
pii[k].t!=n || pii[j].pron->prob!=pii[k].pron->prob)
continue;
// each phones
for(l=0;l<n;l++)
if (pii[j].phones[l]!=pii[k].phones[l]) break;
// equal
if (l==n) pii[j].pron=NULL,t++;
}
}
/* Now make the PronHolders */
for (j=1; j<=npii; j++) {
/* Don't add duplicates */
if (pii[j].pron==NULL) continue;
/* Build inst for each pron */
pInst=NewPronHolder(heap,hci,pii[j].pron,pii[j].t,pii[j].phones);
// ln: Node that created this instance
pInst->ln = thisLNode;
// add pInst into the head of lnode
pInst->next = (PronHolder*)thisLNode->sublat;
thisLNode->sublat = (SubLatDef*) pInst;
// pInst->fct: LM likelihood to be factored into each phone
if (pInst->nphones<=0) pInst->fct = 0.0;
else pInst->fct = thisLNode->score/pInst->nphones;
/* Fake connections from SENT_[START/END] */
// Number of cross word contexts
if (hci->xc>0) {
// start node ?
if (thisLNode->pred==NULL)
// Left contexts
pInst->lc[0]=(NetNode*)lat;
// end node ?
if (thisLNode->foll==NULL) {
if (pInst->nphones==0) lc=0;
// pInst->fc: final context
else lc = pInst->fc;
type = n_word + lc*n_lcontext; /* rc==0 */
wordNode=FindWordNode(net->heap,pInst->pron,pInst,type);
wordNode->tag=SafeCopyString(net->heap,thisLNode->tag);
wordNode->nlinks = 0;
pInst->rc[0]=wordNode;
}
}
else if (thisLNode->foll==NULL) {
wordNode = FindWordNode(net->heap,pInst->pron,pInst,n_word);
wordNode->tag=SafeCopyString(net->heap,thisLNode->tag);
wordNode->nlinks = 0;
}
}
Dispose(&gstack,++pii);
}
```
ProcessCrossWordLinks
This function is executed twice: the first time with heap != NULL, the second with heap == NULL. It walks all the arcs of the Lattice; the different prons of the two nodes at an arc's ends form a fully connected relation. The first pass merely sets tags on the start-side nodes.
```c
// through all the arcs
```
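Fully connected means: if the start word has m pronunciation instances and the end word has n, then m*n word-end-to-word-start links arise from this arc. Conceptually (hypothetical helper, simplified types):

```c
/* Sketch only: count the links induced by one lattice arc. */
typedef struct PronHolder { struct PronHolder *next; } PronHolder;

static int count_links(PronHolder *from, PronHolder *to)
{
    int n = 0;
    for (PronHolder *p = from; p != NULL; p = p->next)
        for (PronHolder *q = to; q != NULL; q = q->next)
            n++;   /* one word-end -> word-start link per pair */
    return n;
}
```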
Next, there is a block of processing before CreateWIModels that I have not figured out. Also, CreateWIModels and CreateIEModels operate on every pronunciation instance of every word corresponding to a Lattice node; a pronunciation instance is modelled with triphone or quinphone models. These two functions mainly handle the boundary phones and the interior phones:
```c
/* Build models on basis of contexts seen */
net->teeWords=FALSE;
// all the nodes
for (i=0; i < lat->nn; i++) {
thisLNode = lat->lnodes+i;
thisWord = thisLNode->word;
if (thisWord==NULL)
thisWord=voc->nullWord;
// word's pron
for(pInst=(PronHolder*)thisLNode->sublat;pInst!=NULL;pInst=pInst->next) {
/* !NULL consists only of word ends */
if (pInst->nphones==0) {
/* Flawed */
if (hci->xc==0) {
/* But we need a pointer for xc==0 cases */
wordNode = FindWordNode(NULL,pInst->pron,pInst,n_word);
// Chain of initial models
pInst->starts = wordNode;
// Number of models in starts chain
pInst->nstart = 0; /* Stops us adding node into chain twice */
}
continue;
}
/* Determine which bits of word are l and r cd */
if (hci->xc>0) {
for (p=0;p<pInst->nphones;p++)
if (GetHCIContext(hci,pInst->phones[p])>=0) break;
for (q=pInst->nphones-1;q>=0;q--)
if (GetHCIContext(hci,pInst->phones[q])>=0) break;
}
else {
p=0;
// nphones: Number of phones for this instance
q=pInst->nphones-1;
}
pInst->tee=TRUE;
/* Make wrd-int cd phones (possibly none!) */
// p = 0; q = pInst->nphones - 1;
CreateWIModels(pInst,p,q,net,hci);
if (hci->xc==0) {
/* Word internal context only */
CreateIEModels(thisWord,pInst,p,q,net,hci);
}
}
}
```
CreateWIModels
This function mainly handles the non-boundary phones; it processes one pronunciation instance at a time, with the result as pictured:
```c
NetNode *node;
```
CreateIEModels
This one targets the boundary phones; the result can be pictured as in the figure:
```c
NetNode *node,*wordNode;
```
The last part is the second run of ProcessCrossWordLinks. This time it assigns the start-side nodes' successors, pointing each of them at the start node of the end-side node's pronunciation instance, with the result pictured below. The process uses a hash table to look up the word nodes referenced by all the Lattice nodes; the code below first allocates successor space for these word nodes, after which ProcessCrossWordLinks assigns the actual node relations:
```c
// Allocate NetLinks from hash table stats: reserve successor space
// Zero counters
// all nodes is word end node
for (i=0; i<WNHASHSIZE; i++) {
/* Build links for each word end model */
for (node=wnHashTab[i];node!=NULL;node=node->chain) {
if (node->nlinks>0){
// alloc node->links memory
node->links=(NetLink*) New(net->heap,sizeof(NetLink)*node->nlinks);
}else
node->links=NULL;
nxl+=node->nlinks;
// reset to 0
node->nlinks=0;
node->aux=0;
}
}
/* Finally put in the cross word links */
// space was allocated earlier; here the actual pointer relations are assigned
ProcessCrossWordLinks(NULL,lat,hci->xc);
/* First disassemble wnHashTab and link to end nodes as necessary */
// handle init and final node
AddInitialFinal(lat,net,hci->xc);
// net->chain points to all the word nodes and model nodes in the Lattice, as a linear list
for (i=0; i<WNHASHSIZE; i++) {
// word node
// append word node list to the net's chain
AddChain(net,wnHashTab[i]);
}
/* Finally chain all nodes together */
for (i=0; i < lat->nn; i++)
for (pInst=(PronHolder*)lat->lnodes[i].sublat;
pInst!=NULL;pInst=pInst->next) {
if (pInst->nstart>0)
AddChain(net,pInst->starts);
AddChain(net,pInst->chain);
AddChain(net,pInst->ends);
}
//...
```
So, viewed as a whole, the network at this level now has the structure shown in the final figure.