A Code Walkthrough of the HTK Decoder

This post walks through the core of the Token Passing implementation, which consists of four main steps, explained alongside the code. The final part covers word network expansion; personally I find it more complex than the decoding itself, so it is left until last.

1. StartRecognition

This part does some initialisation of pri and vri (both were initialised more fully earlier, in InitVRecInfo), then attaches an instance to pri->net->initial. The decoding that follows is essentially a BFS, and AttachInst in effect adds the initial node to the doubly linked list pri->link. The subsequent process is roughly:

  • Take nodes from the list; score tokens within each node and pass tokens between nodes
  • Keep pruning the list, i.e. removing nodes that fail some threshold condition
pri->net->final.inst=pri->net->initial.inst=NULL;

// store previously calculated results
for(i=1,pre=pri->psi->sPre+1;i<=pri->psi->nsp;i++,pre++) pre->id=-1;
for(i=1,pre=pri->psi->mPre+1;i<=pri->psi->nmp;i++,pre++) pre->id=-1;

/*
tact: total active number
frame: current process frame id
*/
pri->tact=pri->nact=pri->frame=0;
// attach an instance to the network entrance
AttachInst(&pri->net->initial);

Node types

We need to be clear about what the various node types mean; my understanding is as follows:

// n_hmm = 2: HMM model node
#define node_hmm(node) ((node)->type & n_hmm)
// n_word = 4: word node
#define node_word(node) ((node)->type == n_word)
// n_tr0 = 4: a special node type whose entry token can apparently reach the exit state directly at t=0 (tee model)
#define node_tr0(node) ((node)->type & n_tr0)
// n_wd0 = 1: the last phone node of a word; for a 5-state HMM this is node 3
#define node_wd0(node) ((node)->type & n_wd0)

The most important kinds are word nodes and model nodes (the latter model phone HMMs); most of the work in the Token Passing implementation centres on these two kinds of nodes.
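Note that n_word and n_tr0 share the value 4, which is why node_word tests with `==` while the others test a single bit with `&`: a tee-model HMM node can carry type `n_hmm|n_tr0 == 6`, which satisfies node_tr0 but not node_word. A self-contained check (enum values taken from the macro comments above):

```c
#include <assert.h>

/* values as quoted in the macros above */
enum { n_wd0 = 1, n_hmm = 2, n_word = 4, n_tr0 = 4 };

typedef struct { int type; } Node;

#define node_hmm(node)  ((node)->type & n_hmm)
#define node_word(node) ((node)->type == n_word)
#define node_tr0(node)  ((node)->type & n_tr0)
#define node_wd0(node)  ((node)->type & n_wd0)
```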

AttachInst

An important job of the AttachInst function is to add the initial node to the doubly linked list pri->head:

 TokenSet *cur;
NetInst *inst;
int i,n;

inst=(NetInst*) New(&pri->instHeap,0);

// node->info.hmm->numState include entry and exit nodes
// represent the HMM nodes
if (node_hmm(node))
n=node->info.hmm->numStates-1;
else
n=1;

// position of the instance in the network
inst->node=node;
// tokenset in HMM non-exit status
inst->state=(TokenSet*) New(pri->stHeap+pri->psi->stHeapIdx[n],0);
// tokenset in HMM exit status
inst->exit=(TokenSet*) New(pri->stHeap+pri->psi->stHeapIdx[1],0);
// const Token null_token={LZERO, 0.0, NULL, NULL}
// the token to be propagated, initialised empty
inst->exit->tok=null_token;

// handle the exit state
// pri->nToks: maximum tokens to propagate
// this step can be skipped when N-best output is not required
if (pri->nToks>1) {
// set: sorted tokens
inst->exit->set=(RelToken*) New(&pri->rTokHeap,0);
// n == 0: 1-best, keep only one answer
// n >= 1: N-best, keep the N best answers
inst->exit->n=1;
inst->exit->set[0]=rmax;
}
else {
// only one token needs to be propagated: 1-best
inst->exit->n=0;
}

// initialise each state's tokenset, in the same way as the exit state
// 1 2 3 4
for (i=1,cur=inst->state;i<=n;i++,cur++) {
cur->tok=null_token;
if (pri->nToks>1) {
cur->set=(RelToken*) New(&pri->rTokHeap,0);
cur->n=1;
cur->set[0]=rmax;
}
else {
cur->n=0;
}
}

// #define LZERO (-1.0E10) ~log(0)
// to prune the likelihood
// keep the maximum likelihood over the instance's states
inst->max=LZERO;

// instance: double link
// pri->tail newest pri->head oldest
// append the new instance to pri (expanding the network, which is really a list)
// pri->link will be accessed out of this function
inst->link=&pri->tail;
inst->knil=pri->tail.knil;

inst->link->knil=inst;
inst->knil->link=inst;

// attach instance to a node
node->inst=inst;

// Exit token reaches word node in t=0
// the last phone node
if (node_wd0(node))
// inst->wdlk: Max likelihood of t=0 path to word end node
inst->wdlk=LikeToWord(inst->node);
else
inst->wdlk=LZERO;

// num of active nodes
pri->nact++;

/* New node needs any currently alive following insts moved */
/* to be more recent than it to ensure tokens propagated in */
/* correct order. */

// out of order
inst->ooo=TRUE; /* Need keep list in propagation order */
// move the node and its successors within pri->link
ReOrderList(node);
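The four pointer assignments near the end of AttachInst are a standard sentinel-based doubly-linked insertion just before pri->tail; stripped of the HTK-specific fields, the idea is:

```c
#include <assert.h>

typedef struct DInst {
    struct DInst *link;  /* towards tail (more recent) */
    struct DInst *knil;  /* towards head (older) */
} DInst;

/* Append inst just before the tail sentinel, mirroring the four
 * pointer assignments in AttachInst. */
void append_before_tail(DInst *tail, DInst *inst) {
    inst->link = tail;
    inst->knil = tail->knil;
    inst->link->knil = inst;
    inst->knil->link = inst;
}
```

With head and tail sentinels no special case is needed for an empty list.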

ReOrderList

ReOrderList is a DFS: it moves the node's successors within the pri->head list. Note that node->nlinks may be 1 or n: a model node has exactly one successor, while a word node may have many. The purpose of this function is to guarantee the order in which tokens are passed between nodes. The logic is:

NetLink *dest;
int i;

// if ordered return
if (node->inst!=NULL?!node->inst->ooo:TRUE) return;
// modify flag
node->inst->ooo=FALSE;

// node->links: connected nodes
// node->nlinks: num of connected nodes 1 or n[word nodes]
for (i=0,dest=node->links;i<node->nlinks;i++,dest++) {
// Entry token reaches exit in t = 0
if (!node_tr0(dest->node)) break;
// if not NULL, it may already be in the list
if (dest->node->inst!=NULL)
// move it towards pri->tail
MoveToRecent(dest->node->inst);
}
// DFS
for (i=0,dest=node->links;i<node->nlinks;i++,dest++) {
if (!node_tr0(dest->node)) break;
if (dest->node->inst!=NULL)
ReOrderList(dest->node);
}

ReOrderList moves the successors of existing nodes within the decoding list, and then applies the same DFS to each of those nodes in turn.

Initialisation of net->initial

The net->initial node is set up by AddInitialFinal inside ExpandWordNet, which initialises initial and final, the nodes representing the entry and exit of the decoding network. The initial node is initialised as follows (its successors are in fact the pronunciation instances of the network's root nodes):

 PronHolder *pInst;
NetNode *node;
LNode *thisLNode;
int ninitial = 0;
int i,type;

// for num of nodes
// a node is a word
for (i=0; i < wnet->nn; i++)
// find root node
if (wnet->lnodes[i].pred == NULL)
// each word may have multiple prons
for (pInst=(PronHolder*)wnet->lnodes[i].sublat; pInst!=NULL;pInst=pInst->next)
ninitial++;
// ninitial: get how many prons

// network init nodes
net->initial.type = n_word;
net->initial.tag = NULL;
net->initial.info.pron = NULL;
net->initial.nlinks = 0;
net->initial.links =
(NetLink *)New(net->heap,ninitial*sizeof(NetLink));

for (i=0,thisLNode=wnet->lnodes; i<wnet->nn; i++,thisLNode++) {
// find root: initial nodes
if (thisLNode->pred != NULL) continue;
// node's pron
for (pInst=(PronHolder*)thisLNode->sublat;pInst!=NULL;pInst=pInst->next) {
// Chain of initial models
// point to each pronunciation's starts node
if (xc==0) node=pInst->starts;
else if (pInst->nphones!=0) node=pInst->lc[0];
else node=FindWordNode(NULL,pInst->pron,pInst,n_word);
// modify nlinks and point links to the node
net->initial.links[net->initial.nlinks].node = node;
net->initial.links[net->initial.nlinks++].like = 0.0;
}
}
// ...

Lattice and Network are in fact related: the Lattice is the top-level word network topology, with LNode representing word nodes and LArc the links between words. Each LNode may have several pronunciations, and a PronHolder is built for each pronunciation; this is also a linked structure whose next field leads to the next pronunciation. The fields below are the ones relating to model nodes:
NetNode *starts; /* Chain of initial models */
NetNode *ends; /* Chain of final models */
// points to state nodes; each one represents a single phone model
NetNode *chain; /* Chain of other nodes in word */

These three chains hold the same nodes that are added to pri->head: all of them are model nodes.

After AttachInst has run, the node's instance is further initialised and joins the pri->head list. From there the token passing algorithm can proceed; it mainly consists of repeatedly scoring tokens inside and between the NetNodes in this list, propagating them, and pruning and adding nodes. The remaining code after the AttachInst call is:

// init initial node's instance
inst=pri->net->initial.inst;
inst->state->tok.like=inst->max=0.0;
inst->state->tok.lm=0.0;
inst->state->tok.path=NULL;
inst->state->n=((pri->nToks>1)?1:0);

vri->genMaxNode=vri->wordMaxNode=NULL;
vri->genMaxTok=vri->wordMaxTok=null_token;

// #define LSMALL (-0.5E10) log values < LSMALL are set to LZERO
pri->wordThresh=pri->genThresh=pri->nThresh=LSMALL;

// Most likely node/word node in the network
pri->genMaxNode=pri->wordMaxNode=NULL;
pri->genMaxTok=pri->wordMaxTok=null_token;

// init pri->head in AttachInst(&pri->net->initial)
for (inst=pri->head.link;inst!=NULL && inst->node!=NULL;inst=next)
// inst->max: likelihood for pruning of instance
// cutoff
if (inst->max<pri->genThresh) {
next=inst->link;
DetachInst(inst->node);
}
else {
pri->nxtInst=inst;
// call SetEntryState and new instance, append to the pri->head
StepInst2(inst->node);
next=pri->nxtInst->link;
}

2. ProcessObservation

This function takes an Observation and processes the observation sequence frame by frame. After the necessary initialisation it performs a global pruning step based on the number of active nodes in the list; since frames are processed one at a time, tokens only propagate one layer per frame.

void ProcessObservation(VRecInfo *vri,Observation *obs,int id, AdaptXForm *xform)
{
NetInst *inst,*next;
int j;
float thresh;
// get private info
pri=vri->pri;
inXForm = xform;
/* specifies the transform to use for this observation */
pri->psi->sBuf[1].n=((pri->nToks>1)?1:0);
/* Needed every observation */

// Current frame number
pri->frame++;
// Current Observation
pri->obs=obs;

if (id<0) pri->id=(pri->prid<<20)+pri->frame;
else pri->id=id;

/* Max model pruning is done initially in a separate pass */
// vri->maxBeam: Maximum model instance beam
// pri->nact: num of active nodes in dual links

if (vri->maxBeam>0 && pri->nact>vri->maxBeam) {
// qsn: quick sort num
if (pri->nact>pri->qsn) {
if (pri->qsn>0)
Dispose(&vri->heap,pri->qsa);
pri->qsn=(pri->nact*3)/2;
pri->qsa=(LogFloat*) New(&vri->heap,pri->qsn*sizeof(LogFloat));
}
// qsa: quick sort array
// inst->max: Likelihood for pruning of instance
for (inst=pri->head.link,j=0;inst!=NULL;inst=inst->link,j++)
pri->qsa[j]=inst->max;

// num of nodes in dual links more than maxBeam: cutoff
if (j>=vri->maxBeam) {
qcksrtM(pri->qsa,0,j-1,vri->maxBeam);
thresh=pri->qsa[vri->maxBeam];

// start cutoff for the first time
if (thresh>LSMALL)
for (inst=pri->head.link;inst->link!=NULL;inst=next) {
next=inst->link;
if (inst->max<thresh)
DetachInst(inst->node);
}
}
}
// ...
}
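The qcksrtM call above only needs the element at index maxBeam to land in its final (descending-order) position, so the value there can serve as the pruning threshold: that is quickselect, not a full sort. A sketch of the same idea (my own reconstruction, not HTK's qcksrtM):

```c
#include <assert.h>

/* Partial quicksort in the style of qcksrtM: rearrange a[l..r]
 * (descending) just enough that a[m] holds the m-th largest value,
 * which can then be used as the beam pruning threshold. */
static void qsel(double *a, int l, int r, int m) {
    while (l < r) {
        double x = a[(l + r) / 2];   /* pivot */
        int i = l, j = r;
        while (i <= j) {             /* Hoare partition, descending */
            while (a[i] > x) i++;
            while (a[j] < x) j--;
            if (i <= j) {
                double t = a[i]; a[i] = a[j]; a[j] = t;
                i++; j--;
            }
        }
        if (m <= j) r = j;           /* recurse only into the side holding m */
        else if (m >= i) l = i;
        else return;
    }
}
```

Only one side of each partition is visited, giving expected linear time rather than the O(n log n) of a full sort.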

Within-node propagation

Next comes the first token-passing computation, carried out inside each node; once it finishes, the maximum token and maximum node are updated.

// pri->genMaxTok: Most likely token
// pri->wordMaxTok: Most likely word end token

// update in StepInst1 in StepHMM1
pri->genMaxTok=pri->wordMaxTok=null_token;
pri->genMaxNode=pri->wordMaxNode=NULL;

// nodes represent phones
for (inst=pri->head.link,j=0;inst!=NULL;inst=inst->link,j++)
if (inst->node)
// stepHMM1 stepInst1
// calcu aij + bj(Ot) for each status in the node
StepInst1(inst->node);
//...

StepInst1

StepInst1 dispatches on the node type; the main case is HMM model nodes:

static void StepInst1(NetNode *node) 
/* First pass of token propagation (Internal) */
{
// model node
if (node_hmm(node))
// inside a single node
// calcu max possibility on each status j[1 < j < N] inside a node
StepHMM1(node);
/* Advance tokens within HMM instance t => t-1 */
/* Entry tokens valid for t-1, do states 2..N */
else
StepWord1(node);
node->inst->pxd=FALSE;
}

StepHMM1 computes token propagation inside the node, between its state nodes, as implemented below; the corresponding model topology was shown in a figure (omitted here):

StepHMM1

static void StepHMM1(NetNode *node)
{
inst=node->inst;
max=null_token;

hmm=node->info.hmm;
N=hmm->numStates;
// transition matrix (logs)
trP=hmm->transP;
seIndex=pri->psi->seIndexes[hmm->tIdx];

// pri->psi->sBuf: Buffer Array[2..N-1] of tokset for StepHMM1_N
// pri->psi->sBuf: public and each node use it, tmp var and copy back to current nodes
// res: tmp buffer, previous calculate results
// 2, 3, 4
// Emitting states first
for (j=2,res=pri->psi->sBuf+2;j<N;j++,res++) {
// res: tokenset of status j
i=seIndex[j][0];
endi=seIndex[j][1];

// tokenset of state i
// from i to endi
// the -1 is because inst->state[0] holds state 1
cur=inst->state+i-1;

// res <= cur
res->tok=cur->tok; res->n=cur->n;
for (k=0;k<cur->n;k++)
res->set[k]=cur->set[k];

// from status i skip to status j
// init res to get maximum
// tok.like: likelihood of the token
res->tok.like+=trP[i][j];

// following except res
// status + i => status + endi;
for (i++,cur++;i<=endi;i++,cur++) {
cmp.tok=cur->tok;
// aij
cmp.tok.like+=trP[i][j];
// res keep max: status j
// keep best one
if (res->n==0) {
if (cmp.tok.like > res->tok.like)
res->tok=cmp.tok;
}
// don't consider
else
TokSetMerge(res,&cmp.tok,cur);
}

// pri->genThresh: Cutoff from global beam
if (res->tok.like>pri->genThresh) {
// State pruning
// calcu bj(Ot)
outp=cPOutP(pri->psi,pri->obs,hmm->svec[j].info,pri->id);
res->tok.like+=outp;

// update max status token
// max: max aij + bj(Ot)
if (res->tok.like>max.like)
max=res->tok;
}
else {
res->tok=null_token;
res->n=((pri->nToks>1)?1:0);
}
}
/* Null entry state ready for external propagation */
/* And copy tokens from buffer to instance */
// copy back
for (i=1,res=pri->psi->sBuf+1,cur=inst->state; i<N; i++,res++,cur++) {
cur->n=res->n; cur->tok=res->tok;
for (k=0;k<res->n;k++)
cur->set[k]=res->set[k];
}

/* Set up pruning limits */
// max: max token in a single node: aij + bj(Ot)
// update genMaxTok and genMaxNode
if (max.like>pri->genMaxTok.like) {
pri->genMaxTok=max;
pri->genMaxNode=node;
}
// max status likelihood in a single phone node
inst->max=max.like;
}

The update can be shown diagrammatically (figure omitted).

The exit state is handled separately; the logic mirrors the emitting-state handling above, and at the end the exit state's token is updated:

// Exit state (ignoring tee trP) 
// process exit status, don't need tmp var

i=seIndex[N][0];
endi=seIndex[N][1];

// res pointed to the exit node
res=inst->exit;
cur=inst->state+i-1;

res->n=cur->n;
res->tok=cur->tok;
for (k=0;k<cur->n;k++) res->set[k]=cur->set[k];

res->tok.like+=trP[i][N];
// update exit nodes
for (i++,cur++;i<=endi;i++,cur++) {
cmp.tok=cur->tok;
cmp.tok.like+=trP[i][N];

// get maximum token from i -> endi
if (res->n==0) {
if (cmp.tok.like > res->tok.like)
res->tok=cmp.tok;
}
else
TokSetMerge(res,&cmp.tok,cur);
}
if (res->tok.like>LSMALL) {
// inst->wdlk: Max likelihood of t=0 path to word end node
tok.like=res->tok.like+inst->wdlk;

// update pri->wordMaxTok.like
if (tok.like > pri->wordMaxTok.like) {
pri->wordMaxTok=tok;
pri->wordMaxNode=node;
}
} else {
inst->exit->tok=null_token;
inst->exit->n=((pri->nToks>1)?1:0);
}

That is StepHMM1: it computes, for states 2, 3, 4 and the exit state inside one node, the maximum likelihood given the current observation. The values it updates are:

inst->max: the maximum token likelihood among the states of this node's instance
pri->wordMaxNode: the most likely word-end node

path and align correspond to the word-level and state-level traceback respectively; only the former is essential. The align parts have been removed from the code above, while the path information is maintained during between-node propagation.
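Abstracting away the TokenSet machinery, the per-frame update of StepHMM1 is a log-domain Viterbi step: each emitting state takes its best predecessor plus the log transition, then adds the output log probability, and the exit state takes the best updated emitting state with no output term. A sketch under those assumptions (fixed 5-state topology, hypothetical names, not HTK's code):

```c
#include <assert.h>

#define LZERO  (-1.0E10)  /* ~log(0) */
#define NSTATE 5          /* entry, 3 emitting states, exit */

/* One frame of within-model Viterbi: trP holds log transition
 * probabilities, outp[j] plays the role of cPOutP's b_j(O_t),
 * prev/next are the state scores before and after the frame. */
void step_hmm(const double trP[NSTATE][NSTATE],
              const double outp[NSTATE],
              const double prev[NSTATE],
              double next[NSTATE])
{
    for (int j = 1; j < NSTATE - 1; j++) {       /* emitting states */
        double best = LZERO;
        for (int i = 0; i < NSTATE - 1; i++) {   /* any non-exit predecessor */
            double v = prev[i] + trP[i][j];
            if (v > best) best = v;
        }
        next[j] = best + outp[j];
    }
    double best = LZERO;                          /* exit state: no output term */
    for (int i = 1; i < NSTATE - 1; i++) {
        double v = next[i] + trP[i][NSTATE - 1];
        if (v > best) best = v;
    }
    next[0] = LZERO;       /* entry cleared, ready for external propagation */
    next[NSTATE - 1] = best;
}
```

Because everything stays in the log domain, "multiply probabilities" becomes "add log values" and LZERO approximates probability zero.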

Between-node propagation

The next part of ProcessObservation propagates tokens between nodes, implemented as follows:

// #define LSMALL (-0.5E10) log values < LSMALL are set to LZERO
// update in StepHMM1
// vri->wordBeam: Separate word end beam width
// pri->wordThresh: Cutoff for word end propagation
// after StepHMM1, use results to update some global beam

pri->wordThresh=pri->wordMaxTok.like-vri->wordBeam;
if (pri->wordThresh<LSMALL) pri->wordThresh=LSMALL;

// pri->genMaxTok.like update in StepHMM1
// vri->genBeam: Global beam width
// pri->genThresh: Cutoff from global beam
pri->genThresh=pri->genMaxTok.like-vri->genBeam;
if (pri->genThresh<LSMALL) pri->genThresh=LSMALL;

if (pri->nToks>1) {
// vri->nBeam: Beam width for non-best tokens
pri->nThresh=pri->genMaxTok.like-vri->nBeam;
if (pri->nThresh<LSMALL/2) pri->nThresh=LSMALL/2;
}

/* Pass 2 Performs external token propagation and pruning */
for (inst=pri->head.link,j=0;inst!=NULL && inst->node!=NULL;inst=next,j++)
if (inst->max<pri->genThresh) {
next=inst->link;
// remove from inst list
DetachInst(inst->node);
}
else {
pri->nxtInst=inst;
// call SetEntryState
// add new instance to pri->tail and reorder it
StepInst2(inst->node);
next=pri->nxtInst->link;
}

// npth: Current number of path records
// cpth: Number of path records after last collection
// nalign: Current number of align records
// caligh: Number of align records after last collection
// wait for the time to collect path
if ((pri->npth-pri->cpth) > vri->pCollThresh ||
(pri->nalign-pri->calign) > vri->aCollThresh)
CollectPaths();

// pri->tact: total active nodes
pri->tact+=pri->nact;

// update
vri->frame=pri->frame;
vri->nact=pri->nact;
vri->genMaxNode=pri->genMaxNode;
vri->wordMaxNode=pri->wordMaxNode;
vri->genMaxTok=pri->genMaxTok;
vri->wordMaxTok=pri->wordMaxTok;

StepInst2

StepInst2 is where the pri->head list is extended; the BFS-like logic lives here, together with the token propagation between nodes.

/* Second pass of token propagation (External) */
static void StepInst2(NetNode *node)
/* Must be able to survive doing this twice !! */
{
Token tok;
TokenSet xtok;
RelToken rtoks[MAX_TOKS];
NetLink *dest;
LogFloat lm;
int i,k;
// == 4
// word end node
if (node_word(node))
// P(O) = argmax P(O|W) + LMSF * logP(W) + N * logWIP
// add path
// inst->exit->tok.like
// calcu LMSF * logP(W) + N * logWIP and create path
// create a path record for this node's exit state; its prev points to the path of this node's entry state, which was itself set in SetEntryState to the path record of the previous node's exit state
StepWord2(node);
/* Merge tokens and update traceback */
// & 4
else if (node_tr0(node)) /* && node_hmm(node) */
// advance the exit state: this is a very special kind of node
StepHMM2(node);

// xtok: node->inst->exit
// xtok init by node->inst->exit
// hmm nodes

// get exit node's token
// the exit state's token of the current phone node is recorded, whatever the node type
// the token carries both path and score information
tok=node->inst->exit->tok;
xtok.tok=tok;
xtok.n=node->inst->exit->n;

// new RelToken sets
xtok.set=rtoks;
for (k=0;k<xtok.n;k++)
xtok.set[k]=node->inst->exit->set[k];

if (node_word(node))
if (tok.like<pri->wordThresh)
tok=null_token;

// ok
// tok: exit node
if (tok.like>pri->genThresh) {
// connected nodes
// a word node has several successors
// a model node has only one, as expected; for a model node this simply hands the token to the following node, which will be visited next time
// pass token to next nodes[model nodes]
for(i=0,dest=node->links;i<node->nlinks;i++,dest++) {

// transition likelihood
lm=dest->like;
// pri->scale: LM (Net probs) scale factor
xtok.tok.like=tok.like+lm*pri->scale;
// tok.lm: LM likelihood of token
xtok.tok.lm=tok.lm+lm;

for (k=0;k<xtok.n;k++)
xtok.set[k].lm=node->inst->exit->set[k].lm+lm;

// pri->genThresh: Cutoff from global beam
if (xtok.tok.like>pri->genThresh) {
// call AttachInst(node)
// expand network
// pass the exit token to dest->node => xtok
SetEntryState(dest->node, &xtok);
/* Transfer set of tokens to node, activating when necessary */
/* choosing N most likely after adding transition likelihood */
}
}
}
node->inst->pxd=TRUE;
}
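The score bookkeeping when a token crosses a link is worth isolating: the link's log likelihood is added to the combined score scaled by the LM scale factor, but to the token's own lm field unscaled. A minimal sketch (hypothetical names, not HTK's types):

```c
#include <assert.h>

typedef struct { double like, lm; } Tok;

/* External propagation across one link, in the spirit of StepInst2:
 * link_like is the (log) LM probability on the link, lmscale the
 * grammar scale factor. */
Tok cross_link(Tok exit_tok, double link_like, double lmscale) {
    Tok t = exit_tok;
    t.like += link_like * lmscale;   /* scaled into the combined score */
    t.lm   += link_like;             /* unscaled into the LM component */
    return t;
}
```

Keeping the unscaled LM component separate is what later lets LatFromPaths subtract `pth->lm*pri->scale` to recover the pure acoustic likelihood of each arc.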

StepWord2

StepWord2 handles word nodes: it updates inst->exit->tok.path with a newly created path whose prev is recorded as inst->state->tok.path, that value having been initialised from the previous node's inst->exit->tok.path. Inside StepWord2:

NetInst *inst;
Path *newpth,*oldpth;
RelToken *cur;
NxtPath *rth;
int i,k;

inst=node->inst;

// info.pron == NULL ?
if (node->info.pron==NULL && node->tag==NULL) {
inst->exit->tok=inst->state->tok;
inst->exit->n=inst->state->n;
for (k=0;k<inst->exit->n;k++)
inst->exit->set[k]=inst->state->set[k];
}
else {
inst->exit->tok=inst->state->tok;
if (node->info.pron!=NULL) {
// pri->wordpen: word penalty
inst->exit->tok.like+=pri->wordpen;
// node->info.pron->prob: Log probability of pronunciation
// pri->pscale: LM (Net probs) scale factor
inst->exit->tok.like+=node->info.pron->prob * pri->pscale;
// P(O) = argmax P(O|W) + LMSF * logP(W) + N * logWIP
}
// append new path to pNoRef
// path->chain = NULL
// path->used = FALSE
// new path only in word node and pass the path info in StepInst2 when extend pri->head
newpth=NewNRefPath();
// point to the node
newpth->node=node;
// ref times == 0
newpth->usage=0;
newpth->frame=pri->frame;
newpth->like=inst->exit->tok.like;
newpth->lm=inst->exit->tok.lm;

// not run here
if ((newpth->align=inst->exit->tok.align)!=NULL)
// MoveAlignYesRef(align)
RefAlign(newpth->align);

// assign to exit node and newpath could pass to next nodes by exit->token
inst->exit->tok.path=newpth;
inst->exit->tok.lm=0.0;
inst->exit->tok.align=NULL;

// assign position: in SetEntryState
// inst->state: start status
oldpth=inst->state->tok.path;

// oldpth != NULL
if ((newpth->prev=oldpth)!=NULL)
// if usage == 0 MovePathYesRef(path) => append pYesRef
// and then usage++
RefPath(oldpth);

It is worth getting the path-related details straight, since they directly determine the final search step.
A new path record is created only during external token propagation, i.e. when a word node is processed; NewNRefPath is used only in StepInst2 => StepWord2. After StepWord2 finishes, the second half of StepInst2 calls SetEntryState, and the path information travels with xtok into the next node.
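So the traceback is simply a chain of path records linked by prev, one per word boundary. A stripped-down sketch of walking that chain (hypothetical fields standing in for the pron and likelihood info):

```c
#include <assert.h>
#include <stddef.h>

typedef struct Path {
    struct Path *prev;   /* record of the previous word boundary */
    int frame;           /* frame at which this word ended */
    int word;            /* word id, a stand-in for info.pron */
} Path;

/* Walk the prev chain, recovering the word sequence in reverse
 * (most recent word first); returns the number of words found. */
int backtrace(const Path *p, int *words, int maxw) {
    int n = 0;
    for (; p != NULL && n < maxw; p = p->prev)
        words[n++] = p->word;
    return n;
}
```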

SetEntryState

SetEntryState attaches an instance to the current node and passes in the previous node's token; in this way tokens, each carrying the previous node's path and score, propagate through the network.

// attach instance to the connected nodes
// SetEntryState(dest->node,&xtok)
// node: connected next node
// src: the previous node's token and path info
static void SetEntryState(NetNode *node,TokenSet *src)
{
NetInst *inst;
TokenSet *res;

if (node->inst==NULL)
// append to pri->tail
// call ReOrderList(node)
AttachInst(node);

inst=node->inst;
res=inst->state;

// update instance's token
if (res->n==0) {
// path info assigned: inst->state->tok.path
if (src->tok.like > res->tok.like)
res->tok=src->tok;
}
else
TokSetMerge(res,&src->tok,src);

// inst->max: Likelihood for pruning of instance
if (res->tok.like>inst->max)
inst->max=res->tok.like;
if (node->type==n_word && (pri->wordMaxNode==NULL ||
pri->wordMaxNode->inst==NULL ||
res->tok.like > pri->wordMaxNode->inst->max))
// Most likely word end node in network
pri->wordMaxNode=node;
}

The resulting path structure can be represented by a diagram (omitted here).

CollectPaths

The last step periodically tidies up the path records maintained so far, implemented in CollectPaths. Some parts of the code have been removed here, and a few details of this part remain unclear to me.

static void CollectPaths(void)
{
NetInst *inst;
TokenSet *cur;
int i,k,n;
Path *path,*plink;
Align *align,*alink;

// for each status in each inst in the pri->head
// modify path->used: in current inst link
for (inst=pri->head.link;inst!=NULL;inst=inst->link)
if (inst->node!=NULL) {
// only model nodes have multiple status
if (node_hmm(inst->node))
n=inst->node->info.hmm->numStates-1;
else
n=1;
// n: status num
// each status
for (i=1,cur=inst->state;i<=n;i++,cur++) {
path=cur->tok.path;
// path->used: struct is referenced by a current inst
// skip if already marked, to avoid duplicates
if (path && !path->used) {
if (path->usage!=0) MovePathYesRef(path);
path->used=TRUE;
}
}

// exit node
path=inst->exit->tok.path;
if (path && !path->used) {
if (path->usage!=0) MovePathYesRef(path);
path->used=TRUE;
}
}

// if no refer, could be deleted
for (path=pri->pNoRef.link;path->link!=NULL;path=plink) {
// not in pri->head
if (!path->used) {
// not run here
if (path->align!=NULL)
DeRefAlign(path->align);
// minus path->prev->prev->usage
// if == 0 add path->prev->prev to pNoRef
DeRefPathPrev(path);
plink=path->link;
// delete path
UnlinkPath(path);
}
// in pri->head
else {
path->used=FALSE;
plink=path->link;
}
}
for (path=pri->pYesRef.link;path->link!=NULL;path=path->link) {
if (!path->used) break;
path->used=FALSE;
}
pri->cpth=pri->npth;
pri->calign=pri->nalign;
}
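CollectPaths is effectively a mark-and-sweep collection over path records: the loop over pri->head marks reachable records via used, and the second loop frees the unmarked ones and clears the marks for the next collection. The sweep phase in isolation (a sketch, with an alive flag standing in for the heap bookkeeping):

```c
#include <assert.h>

typedef struct {
    int used;    /* marked reachable during this collection */
    int alive;   /* still allocated */
} PathRec;

/* Sweep in the style of CollectPaths: free every unmarked record,
 * then clear the marks so the next collection starts fresh. */
int sweep(PathRec *recs, int n) {
    int freed = 0;
    for (int i = 0; i < n; i++) {
        if (!recs[i].alive) continue;
        if (!recs[i].used) { recs[i].alive = 0; freed++; }
        else recs[i].used = 0;
    }
    return freed;
}
```

HTK additionally splits records into pNoRef/pYesRef lists so that only candidates for freeing are scanned; this sketch ignores that optimisation.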

The above are the main steps in processing the observation sequence.

3. CompleteRecognition

This part mainly returns a Lattice (by executing the line below) and resets the pri structure in preparation for the subsequent transcription.

CreateLattice(heap,pri->net->final.inst->exit,vri->frameDur);

The function is implemented as follows:
static Lattice *CreateLattice(MemHeap *heap,TokenSet *res,HTime framedur)
{
Lattice *lat;
RelToken *cur;
Path path;
WordPron pron;
NxtPath rth[MAX_TOKS];
int nn,nl,ln,i;
NetNode node;

pron.word=NULL;pron.pnum=0;pron.next=NULL;
pron.outSym=NULL;pron.phones=NULL;pron.nphones=0;
pron.prob=0.0;

path.like=res->tok.like;
path.lm=res->tok.lm;
path.usage=0;
path.align=res->tok.align;
path.node=&node;
path.node->tag=NULL;
path.node->info.pron=&pron;
path.frame=pri->frame;
path.prev=res->tok.path;
path.chain=NULL;

// nn: num of nodes
// nl: num of arcs
nn=1;nl=0;ln=0;
// DFS
// modifies usage, counting nn and nl
MarkPaths(&path,&nn,&nl);

// single lattice
// lat->lnodes = new LNode(nn)
lat=NewLattice(heap,nn,nl);
lat->voc=pri->net->vocab;
lat->lmscale=pri->scale;
lat->wdpenalty=pri->wordpen;
lat->prscale=pri->pscale;
lat->framedur=framedur;

// Time of word boundary at node
// Word represented by arc
// lnodes: Array of lattice nodes

lat->lnodes[0].time=0.0; lat->lnodes[0].word=NULL;
lat->lnodes[0].tag=NULL;
lat->lnodes[0].score=0.0;

LatFromPaths(&path,&ln,lat);
return(lat);
}

MarkPaths numbers the paths (in their usage field) during traceback; these numbers are used in the later processing. When it finishes, nn and nl hold the number of nodes and the number of arcs.
/* Number/count nodes (in path->usage field) and count links */
// nn = 1, nl = 0;
// modify usage
static void MarkPaths(Path *path,int *nn,int *nl)
{
NxtPath *pth;

// path->usage: Times struct ref'd (by next path)
if (path->usage>=0) {
// negative numbering also marks the path as visited
path->usage=-(*nn)++;
(*nl)++;
if (path->prev) MarkPaths(path->prev,nn,nl);
// may not run
for (pth=path->chain;pth!=NULL;pth=pth->chain) {
(*nl)++;
if (pth->prev) MarkPaths(pth->prev,nn,nl);
}
}
}

The key function is LatFromPaths, which folds the path information into the Lattice, ready for the final TranscriptionFromLattice.

// ln = 0;
// fill lattice from path info
static void LatFromPaths(Path *path,int *ln,Lattice *lat)
{
LNode *ne,*ns;
LArc *la;
// point
Word nullwordId;
NxtPath tmp,*pth;
Align *align,*al,*pr;
MLink ml;
LabId labid,splabid,labpr = NULL;
char buf[80];
int i,frame;
double prlk,dur,like,wp;

// get id of NULL word
nullwordId = GetWord(lat->voc,GetLabId("!NULL",FALSE),FALSE);
// SP: "sp"
splabid = GetLabId(SP,FALSE);

// tmp <= path
// path->prev: Previous word record
tmp.prev=path->prev;
tmp.like=path->like;
// path->chain: Next of NBest Paths
tmp.chain=path->chain;
tmp.lm=path->lm;

// ne: the end LNode
// usage was numbered negatively in MarkPaths, so -usage recovers the node index
// assign and initialise ne
ne=lat->lnodes-path->usage;

// path->frame: Time (frame) of boundary (end of word)
// lat->framedur: Frame duration in 100ns units
// ne->time: Time of word boundary at node
ne->time=path->frame*lat->framedur;

// path->node: Word level traceback info
if (path->node->info.pron != NULL)
ne->word=path->node->info.pron->word;
else
ne->word=nullwordId;

ne->tag=path->node->tag;

// ne->v: Pronunciation variant number
if (path->node->info.pron != NULL)
ne->v=path->node->info.pron->pnum;
else
ne->v=1;

ne->score=path->like;

align=path->align;

// pth->chain: Next of NBest Paths
// ln == 0;
for(pth=&tmp;pth!=NULL;pth=pth->chain) {
// arcs
// ln inited by 0
// modify lat->larcs
// ln++ after get la
la=lat->larcs+(*ln)++;

// ns: node start
if (pth->prev) {
ns=lat->lnodes-pth->prev->usage,prlk=pth->prev->like;
} else {
ns=lat->lnodes, prlk=0.0;
}
// la->start: Node at start of word: pointer
la->start=ns;la->end=ne;

// wp: word penalty
if (ne->word==NULL || ne->word==nullwordId)
/* no word or NULL node */
wp=0.0; /* No penalty for current word */
else
wp=pri->wordpen; /* Inc penalty for current word */

// la->aclike: Acoustic likelihood of word
la->aclike=pth->like-prlk-pth->lm*pri->scale-wp;
// la->prlike: Pronunciation likelihood of arc
if (path->node->info.pron != NULL) {
la->aclike-=path->node->info.pron->prob*pri->pscale;
la->prlike=path->node->info.pron->prob;
}
else
la->prlike=0.0;

// Language model likelihood of word
la->lmlike=pth->lm;
// Field used for pruning/sorting
la->score=pth->like;
// la->farc: Next arc following start node
// la->parc: Next arc preceding end node
// ns->foll: Linked list of arcs following node
// ne->pred: Linked list of arcs preceding node
la->farc=ns->foll;la->parc=ne->pred;

ns->foll=ne->pred=la;

// recurse backwards into the predecessor path
if (pth->prev!=NULL && ns->word==NULL)
LatFromPaths(pth->prev,ln,lat);
}
}

4. TranscriptionFromLattice

Here an A* search finds the best answer (or several) in the Lattice topology filled in by the previous step; we only discuss the single best case.
First, the setup before the search:


// get best N ans, suppose N = 1
ans=(NBestEntry**) New(&gstack,sizeof(NBestEntry*)*N);ans--;

// through num of nodes
for (i=0,ln=lat->lnodes;i<lat->nn;i++,ln++) {
// ln->foll: following arcs of the nodes
// final node (no following arcs)
if (ln->foll==NULL)
ln->score=0.0;
// ~log(0): -oo
else
ln->score=LZERO;
// sorted order init all by -1
ln->n=-1;
}
n=0;
for (i=0,ln=lat->lnodes;i<lat->nn;i++,ln++)
// not yet numbered
if (ln->n==-1)
/*
// nn = 0; num of nodes
static void MarkBack(LNode *ln,int *nn)
{
LArc *la;
// modify flag
ln->n=-2;
// node ln's previous linked nodes
for (la=ln->pred;la!=NULL;la=la->parc)
// == -1: not yet visited
if (la->start->n==-1)
MarkBack(la->start,nn);
ln->n=(*nn)++;
}
*/
MarkBack(ln,&n);

// n: num of nodes
// ln->n: id of nodes by DFS
order=(int*) New(&gstack, sizeof(int)*lat->nn);

// id in the net mapped to the position in the lnodes
// id => pos
for (i=0,ln=lat->lnodes;i<lat->nn;i++,ln++)
order[ln->n]=i;

// backtrack order
// note where each node's score comes from
/*
5
/ \
1 4
/ / \
0 2 3
*/
for (i=lat->nn-1;i>0;i--) {
// get the node of position of i in the net
ln=lat->lnodes+order[i];
// get max la->start->score
// ln->pred: arcs entering this node
for (la=ln->pred;la!=NULL;la=la->parc) {
// LArcTotLike: the arc's likelihoods and scale factors combined
score=ln->score+LArcTotLike(lat,la);
// update la->start->score
if (score>la->start->score)
la->start->score=score;
}
}
Dispose(&gstack,order);
// ...
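This backward pass gives every node the exact best score from itself to a final node, which is what makes the later search an A* with a perfect heuristic. A sketch of the same computation over an explicit arc list (my own reconstruction; node indices are assumed to be already in topological order, unlike the code above, which computes that order first):

```c
#include <assert.h>

#define LZERO (-1.0E10)

typedef struct {
    int start, end;      /* node indices, start < end in topological order */
    double like;         /* total (log) likelihood on the arc */
} Arc;

/* Propagate exact remaining-path scores backwards: afterwards,
 * score[v] is the best log likelihood from v to any final node
 * (final nodes start at 0.0, all others at LZERO). */
void backward_scores(double *score, const Arc *arcs, int narcs, int nn) {
    for (int v = nn - 1; v >= 0; v--)            /* reverse topological order */
        for (int k = 0; k < narcs; k++)
            if (arcs[k].end == v) {
                double s = score[v] + arcs[k].like;
                if (s > score[arcs[k].start])
                    score[arcs[k].start] = s;
            }
}
```

HTK avoids the inner scan by following each node's pred arc list directly; the result is the same.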

The A* search then extracts the best answer: maintain an ordered queue, repeatedly pop the head entry and push its qualifying successors, until the queue is empty. The queue is implemented as a doubly linked list kept sorted by score, best first.

// init head and tail
// the OPEN queue, kept in decreasing score order
head.link=&tail;head.knil=NULL;
tail.link=NULL;tail.knil=&head;
tail.score=head.score=LZERO;

// all the node in the lattice
for (i=0,ln=lat->lnodes;i<lat->nn;i++,ln++) {
// find root
if (ln->pred!=NULL) continue;

// arcs leaving this node
for (la=ln->foll;la!=NULL;la=la->farc) {
like=LArcTotLike(lat,la);
score=like+la->end->score;
if (score<LSMALL) continue;

// new entity
newNBE=(NBestEntry*) New(&gstack,sizeof(NBestEntry));
newNBE->score=score;
newNBE->like=like;
newNBE->lnode=la->end;
newNBE->larc=la;
newNBE->prev=NULL;

// find insertion point; list kept in decreasing score order
for (pos=head.link;score<pos->score;pos=pos->link);
// insert into OPEN link
newNBE->knil=pos->knil;newNBE->link=pos;
newNBE->knil->link=newNBE->link->knil=newNBE;
}
}

// pop the best entry each iteration
// loops until the list empties or N answers are found
for (n=0,best=head.link;n<N && best!=&tail;best=head.link) {
// search until linklist is empty
if (head.link==&tail) break;
// delete from linklist
best=head.link;
best->link->knil=best->knil;
best->knil->link=best->link;
nent--;

if (best->lnode->foll!=NULL) {
nexp++;
for (la=best->lnode->foll;la!=NULL;la=la->farc) {
// like: like on arcs
like=best->like+LArcTotLike(lat,la);
// like plus nodes score
score=like+la->end->score;
if (score<LSMALL) continue;
newNBE=(NBestEntry*) New(&gstack,sizeof(NBestEntry));

newNBE->score=score;
newNBE->like=like;
newNBE->lnode=la->end;
newNBE->larc=la;
newNBE->prev=best;
// add to the linklist
for (pos=head.link;score<pos->score;pos=pos->link);
newNBE->knil=pos->knil;newNBE->link=pos;
newNBE->knil->link=newNBE->link->knil=newNBE;
nent++;
}
continue;
}
// must be different from previous
// one time get an ans
// first time n == 0;
for (i=1;i<=n;i++)
if (WordMatch(best,ans[i])) {
best=NULL;
break;
}
if (best!=NULL) {
ans[++n]=best;
}
}
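The core of this loop is the sorted insertion into the OPEN list between head and tail sentinels, both given score LZERO so the scan always terminates. In isolation (a sketch, keeping only the list fields):

```c
#include <assert.h>
#include <stddef.h>

typedef struct Ent {
    double score;            /* exact total score (g + perfect heuristic) */
    struct Ent *link, *knil; /* doubly linked, best first */
} Ent;

/* Insert into the OPEN list kept in decreasing score order, as the
 * decoder does with head/tail sentinels; head->link is always the
 * best entry to expand next. */
void open_insert(Ent *head, Ent *e) {
    Ent *pos = head->link;
    while (e->score < pos->score)  /* tail sentinel has score LZERO */
        pos = pos->link;
    e->knil = pos->knil; e->link = pos;
    e->knil->link = e;   e->link->knil = e;
}
```

Because the heuristic (the node scores from the backward pass) is exact, the first complete path popped from this queue is guaranteed optimal.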

5. Appendix: ExpandWordNet

This function returns a Network structure used by the decoding described above; it performs the word network expansion. It is called as follows:

// wdNet: Lattice
net = ExpandWordNet(&netHeap,wdNet,&vocab,&hset);

The Lattice is read from a word network file into memory and organised into this structure:
wdNet = ReadLattice(nf,&netHeap,&vocab,TRUE,FALSE)

First, ReadLattice, which returns the Lattice by calling ReadOneLattice:
Lattice *ReadLattice(FILE *file, MemHeap *heap, Vocab *voc, 
Boolean shortArc, Boolean add2Dict)
{
Lattice *lat,*list,*fLat;
Source source;

AttachSource(file,&source);

if((lat=ReadOneLattice(&source,heap,voc,shortArc,add2Dict))==NULL)
return NULL;

// has other lattice
if (lat->subLatId!=NULL) {
/* Need to preserve first lattice to return */
// fLat: first Lattice
fLat=lat; lat = (Lattice *) New(heap,sizeof(Lattice)); *lat=*fLat;
// fLat keep previous one
// lat init as previous one
do {
/* Add SUBLAT to hash table for later lookup */
// add lat into the hash table named subLatId
GetSubLat(lat->subLatId,lat);
if((lat=ReadOneLattice(&source,heap,voc,shortArc,add2Dict))==NULL) {
Dispose(heap, fLat); /*fLat points to 1st thing on heap*/
return NULL;
}
}while(lat->subLatId!=NULL);

/* Clear hash table */
GetSubLat(NULL,NULL);
/* Set all chain fields to NULL */
lat->chain=NULL;
SubLatList(lat,NULL,1);
/* Set chain fields to make linked list */
lat->chain=lat;
list=SubLatList(lat,lat,1);
/* Disconnect loop */
list->chain=NULL;
/* Copy last to first Lattices to ensure lat is first thing on stack */
*fLat=*lat; lat=fLat;
}
return(lat);
}

The logic of this function is simple: if the file contains several (sub-)lattices, read them all; otherwise read just one.

ReadOneLattice packs the word-network file's information into a Lattice structure: the nodes, the arcs, their connectivity, and the initialisation of some weights and scale factors. Nodes and arcs are defined as LNode and LArc respectively; their relationship is illustrated in the figures.

The function initialises the Lattice as follows:

la->start=lat->lnodes+s;
la->end=lat->lnodes+e;
la->lmlike=lmlike;

la->farc=la->start->foll;
la->parc=la->end->pred;
la->start->foll=la;
la->end->pred=la;

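The four pointer assignments above implement O(1) head insertion into two adjacency lists at once: every new arc becomes the head of its start node's `foll` list and of its end node's `pred` list. A minimal sketch with hypothetical `MyNode`/`MyArc` stand-ins for `LNode`/`LArc`:

```c
#include <assert.h>
#include <stddef.h>

typedef struct MyArc MyArc;
typedef struct MyNode {
    MyArc *foll;   /* arcs leaving this node  */
    MyArc *pred;   /* arcs entering this node */
} MyNode;

struct MyArc {
    MyNode *start, *end;
    MyArc *farc;   /* next arc sharing this start node */
    MyArc *parc;   /* next arc sharing this end node   */
};

static void link_arc(MyArc *la, MyNode *s, MyNode *e)
{
    la->start = s;
    la->end   = e;
    la->farc  = s->foll;   /* old list head becomes our successor */
    la->parc  = e->pred;
    s->foll   = la;        /* head insertion: O(1) per arc        */
    e->pred   = la;
}
```

A consequence worth noting: arcs appear on `foll`/`pred` in reverse insertion order, which is why later traversals see the most recently read arc first.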
Now comes the word-network expansion itself. Driven by the configuration, it decides how to expand down to the phone level: the Lattice so far is only the top-level word network, nodes standing for words and arcs for transitions, with no pronunciations or HMM models yet. All of that is needed before decoding, so the network is fleshed out step by step.

InitPronHolders

First, to bring in pronunciation information, a PronHolder structure is introduced and initialised.

/* First create context arrays and pronunciation instances */
// frcSil: global string
// net: new network
int nNull = InitPronHolders(net,lat,hci,voc,&holderHeap,frcSil);

The !NULL word is handled first: if it is a genuine null word, an empty pronunciation instance is created for it; otherwise nothing is done.
 /* Reset hash table prior to processing lattice */
// wnHashTab: NetNode
for (i=0; i<WNHASHSIZE; i++)
wnHashTab[i]=NULL;

/* Determine if we have a real !NULL word */
// nullWord: Word for output when word==NULL

net->nullWord = GetWord(voc,GetLabId("!NULL", TRUE),TRUE);
// next: Next pronunciation of word
// a word can have kinds of pronunciations
for (thisPron=net->nullWord->pron;thisPron!=NULL;thisPron=thisPron->next)
// not real NULL word
if (thisPron->nphones!=0) {
net->nullWord=NULL;
break;
}

// nullword without pron
// real !NULL word
if (net->nullWord!=NULL) {
if (net->nullWord->pron==NULL)
// add a pron to a given word
NewPron(voc,net->nullWord,0,NULL,net->nullWord->wordName,1.0);
}

// frcSil: Automagically add these sil models to the end of words
// parse the frcSil spec: '+p' phones are only added to word ends,
// '-p' phones are only stripped, bare 'p' goes into both lists
if (frcSil!=NULL && strlen(frcSil)>0) {
for(nSil=nAdd=0,ptr=frcSil;ptr!=NULL;ptr=nxt) {
// finish
if ((nxt=ParseString(ptr,name))==NULL) break;
if (name[0]=='+' || name[0]=='-') st=name[0],p=name+1;
else st=0,p=name;
if (strlen(p)==0) labid=NULL;
else labid=GetLabId(p,TRUE);
if (st=='+' || st==0) addPhones[++nAdd]=labid;
if (st=='-' || st==0) silPhones[++nSil]=labid;
}
}
else
nSil=nAdd=0;

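My reading of the frcSil loop above: each token optionally starts with `+` or `-`; `+p` goes only into `addPhones` (appended to word ends), `-p` only into `silPhones` (stripped from word ends), and a bare `p` into both. A self-contained sketch under that assumption (0-based arrays and a simple space-separated tokenizer in place of HTK's `ParseString`, which handles more cases):

```c
#include <assert.h>
#include <string.h>

#define MAXP 8

/* Split a space-separated frcSil spec into add/strip phone lists. */
static int parse_frcsil(const char *spec,
                        char add[][16], int *nAdd,
                        char sil[][16], int *nSil)
{
    char buf[256], *p, *tok;
    strncpy(buf, spec, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    *nAdd = *nSil = 0;
    for (tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
        /* optional leading '+' or '-' selects which list(s) to fill */
        char st = (tok[0] == '+' || tok[0] == '-') ? tok[0] : 0;
        p = st ? tok + 1 : tok;
        if (st == '+' || st == 0) strcpy(add[(*nAdd)++], p);
        if (st == '-' || st == 0) strcpy(sil[(*nSil)++], p);
    }
    return *nAdd + *nSil;
}
```

So a spec like `"sil +sp -spn"` means: `sil` may be both stripped and appended, `sp` only appended, `spn` only stripped.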
Next, the nodes' pruning information is initialised; note that some of this code may not execute (factorLM defaults to FALSE).
// num of nodes
for (i=0; i<lat->nn; i++) {
float fct;
LArc *la;

// current node
thisLNode = lat->lnodes+i;
fct = 0.0;

// factor lm likelihoods throughout words
// default: false, may not run
if (factorLM && thisLNode->pred!=NULL)
for (la=thisLNode->pred,fct=LZERO;la!=NULL;la=la->parc)
if (la->lmlike>fct) fct=la->lmlike;
// max lmlike over the arcs entering this node
// Field used for pruning
thisLNode->score = fct;
}
// factorLM: factor lm likelihoods throughout words
if (factorLM)
for (i=0; i<lat->na; i++) {
LArc *la;
// current arcs
la=NumbLArc(lat,i);
la->lmlike-=la->end->score;
}
// ...

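A small numeric check of the factoring above: the node absorbs the maximum incoming `lmlike`, each arc keeps only its residual after `la->lmlike -= la->end->score`, and the absorbed part is later spread evenly over the pronunciation's phones as `pInst->fct`. Sketch with a hypothetical `node_score` helper:

```c
#include <assert.h>

#define LZERO (-1.0E10)

/* score absorbed by the node = max lmlike over its incoming arcs */
static double node_score(const double *lmlike, int n)
{
    double fct = LZERO;
    int i;
    for (i = 0; i < n; i++)
        if (lmlike[i] > fct) fct = lmlike[i];
    return fct;
}
```

For a node with incoming arcs of -4.0 and -2.0, the node absorbs -2.0, the arcs keep -2.0 and 0.0, and for a 4-phone pronunciation each phone carries -0.5 of it.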
Next is the most important part: creating the pronunciation instances for every node in the Lattice; one word can have several pronunciations.
 /* Create instance for each pronunciation in lattice */
for (i=0,nNull=0,t=0; i < lat->nn; i++) {
// current node in lattice
thisLNode = lat->lnodes+i;
// the word this lattice node represents
// init in ReadOneLattice by GetWordId
thisWord = thisLNode->word;
// replace by NULL word
if (thisWord==NULL) thisWord=voc->nullWord;
thisLNode->sublat=NULL;

// nAdd: get from frcSil, may be 0
pii=(PInstInfo *) New(&gstack,(thisWord->nprons+1)*(nAdd+1)*sizeof(PInstInfo));
pii--;
/* Scan current pronunciations and make modified ones */
// thisWord->pron init in ReadOneLattice
// for each pron in a single word
for (j=1,thisPron=thisWord->pron,npii=0; thisPron!=NULL; j++,thisPron=thisPron->next) {
if (thisPron->nphones==0) n=0;
else
// for each phones
// n inited by thisPron->nphones
for (k=1,n=thisPron->nphones;k<=nSil;k++)
// does the last phone match one of the strip-silence phones?
if (thisPron->phones[thisPron->nphones-1]==silPhones[k]) {
/* Strip it */
n--;break;
}
// n: non-sil phones
if (thisPron->nphones==0 || nAdd==0 || n==0) {
/* Just need one pronunciation */
// !NULL one
if (thisPron->nphones==0) {
nNull++;
}
if (n==0)
n=thisPron->nphones;
pii[++npii].pron=thisPron; pii[npii].silId=-1;
pii[npii].n=n;pii[npii].t=n;
pii[npii].phones=thisPron->phones;
}
// sil phones
else {
/* Make one instance per silence label */
for (k=1;k<=nAdd;k++) {
pii[++npii].pron=thisPron;
pii[npii].silId=k;
pii[npii].n=pii[npii].t=n;
// after modify
if (addPhones[k]!=NULL)
pii[npii].t++;
pii[npii].phones=(LabId *) New(heap,sizeof(LabId)*pii[npii].t);
// copy phones
for(l=0;l<pii[npii].n;l++)
pii[npii].phones[l]=pii[npii].pron->phones[l];
// add sil
if (addPhones[k]!=NULL)
pii[npii].phones[pii[npii].n]=addPhones[k];
}
}
}
/* Scan new pronunciations and remove duplicates */
// npii: number of pronunciation variants built for this word
// default: true

if (remDupPron)
// each pron
for (j=2; j<=npii; j++) {
n=pii[j].t;
if (pii[j].pron==NULL) continue;
for (k=1; k<j; k++) {
if (pii[j].pron==NULL || pii[k].pron==NULL ||
pii[k].t!=n || pii[j].pron->prob!=pii[k].pron->prob)
continue;
// each phones
for(l=0;l<n;l++)
if (pii[j].phones[l]!=pii[k].phones[l]) break;
// equal
if (l==n) pii[j].pron=NULL,t++;
}
}
/* Now make the PronHolders */
for (j=1; j<=npii; j++) {
/* Don't add duplicates */
if (pii[j].pron==NULL) continue;
/* Build inst for each pron */
pInst=NewPronHolder(heap,hci,pii[j].pron,pii[j].t,pii[j].phones);
// ln: Node that created this instance
pInst->ln = thisLNode;
// add pInst into the head of lnode
pInst->next = (PronHolder*)thisLNode->sublat;
thisLNode->sublat = (SubLatDef*) pInst;

// pInst->fct: LM likelihood to be factored into each phone
if (pInst->nphones<=0) pInst->fct = 0.0;
else pInst->fct = thisLNode->score/pInst->nphones;

/* Fake connections from SENT_[START/END] */
// Number of cross word contexts
if (hci->xc>0) {
// start node ?
if (thisLNode->pred==NULL)
// Left contexts
pInst->lc[0]=(NetNode*)lat;
// end node ?
if (thisLNode->foll==NULL) {
if (pInst->nphones==0) lc=0;
// pInst->fc: final context
else lc = pInst->fc;
type = n_word + lc*n_lcontext; /* rc==0 */
wordNode=FindWordNode(net->heap,pInst->pron,pInst,type);
wordNode->tag=SafeCopyString(net->heap,thisLNode->tag);
wordNode->nlinks = 0;
pInst->rc[0]=wordNode;
}
}
else if (thisLNode->foll==NULL) {
wordNode = FindWordNode(net->heap,pInst->pron,pInst,n_word);
wordNode->tag=SafeCopyString(net->heap,thisLNode->tag);
wordNode->nlinks = 0;
}
}
Dispose(&gstack,++pii);
}

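The duplicate-pronunciation check (`remDupPron`) in the block above boils down to: same total length, same probability, identical phone sequence. A sketch of that comparison (hypothetical `same_pron`; HTK compares interned `LabId` pointers, while this sketch compares strings):

```c
#include <assert.h>
#include <string.h>

/* Two pronunciation variants are duplicates when they have the
   same length and identical phone sequences (probability check
   omitted here for brevity). */
static int same_pron(const char **a, int na, const char **b, int nb)
{
    int l;
    if (na != nb) return 0;
    for (l = 0; l < na; l++)
        if (strcmp(a[l], b[l]) != 0) return 0;
    return 1;
}
```

In the real code a detected duplicate is marked by setting `pii[j].pron = NULL`, so the later PronHolder loop simply skips it.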
ProcessCrossWordLinks is executed twice: the first time with heap != NULL, the second with heap == NULL. It walks every arc in the Lattice; the pronunciations at the two ends of an arc are fully connected pairwise. The first pass only tags the start-side word nodes and counts the links.

// through all the arcs
for (i=0; i<lat->na; i++) {
thisLArc = NumbLArc(lat, i);

// start word's pron
for (lInst=(PronHolder*)thisLArc->start->sublat;lInst!=NULL;lInst=lInst->next)
// end word's pron
for (rInst=(PronHolder*)thisLArc->end->sublat;rInst!=NULL;rInst=rInst->next) {
// xc: Number of cross word contexts
// n_word: Node Instance represents word end
if (xc==0) {
wordNode = FindWordNode(heap,lInst->pron,lInst,n_word);
// first time
if (heap!=NULL)
wordNode->tag=SafeCopyString(heap,thisLArc->start->tag);
if (heap==NULL) {
// links: Array[0..nlinks-1] of links to connected nodes
// one to multi
// wordNode->nlinks = 0;
//
wordNode->links[wordNode->nlinks].node=rInst->starts;
wordNode->links[wordNode->nlinks].like=thisLArc->lmlike;
}
// has many linked nodes
wordNode->nlinks++;
}
}
}

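The two calls to ProcessCrossWordLinks follow a classic count / allocate / fill pattern: the first pass only increments `nlinks`, the caller then allocates `links` and resets the counters, and the second pass stores the actual links. A minimal sketch with a hypothetical `WNode`:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int nlinks;   /* pass 0: counter; pass 1: fill position */
    int *links;
} WNode;

/* pass == 0: only count; pass == 1: store into preallocated array */
static void process(WNode *w, const int *targets, int n, int pass)
{
    int i;
    for (i = 0; i < n; i++) {
        if (pass == 1)
            w->links[w->nlinks] = targets[i];
        w->nlinks++;
    }
}
```

The benefit is exact-size allocation with no reallocation: both passes must visit the arcs in the same way, which is why the same function is reused with a flag (here `pass`, in HTK the `heap` argument).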
Before CreateWIModels is executed there is a block of processing I have not figured out. Note that CreateWIModels and CreateIEModels operate on every pronunciation instance of every word node in the Lattice; a pronunciation instance is modelled with triphone (or quinphone) HMMs. These two functions handle the word-internal phones and the boundary phones respectively.

/* Build models on basis of contexts seen */
net->teeWords=FALSE;

// all the nodes
for (i=0; i < lat->nn; i++) {
thisLNode = lat->lnodes+i;
thisWord = thisLNode->word;
if (thisWord==NULL)
thisWord=voc->nullWord;

// word's pron
for(pInst=(PronHolder*)thisLNode->sublat;pInst!=NULL;pInst=pInst->next) {
/* !NULL consists only of word ends */
if (pInst->nphones==0) {
/* Flawed */
if (hci->xc==0) {
/* But we need a pointer for xc==0 cases */
wordNode = FindWordNode(NULL,pInst->pron,pInst,n_word);
// Chain of initial models
pInst->starts = wordNode;
// Number of models in starts chain
pInst->nstart = 0; /* Stops us adding node into chain twice */
}
continue;
}

/* Determine which bits of word are l and r cd */
if (hci->xc>0) {
for (p=0;p<pInst->nphones;p++)
if (GetHCIContext(hci,pInst->phones[p])>=0) break;
for (q=pInst->nphones-1;q>=0;q--)
if (GetHCIContext(hci,pInst->phones[q])>=0) break;
}
else {
p=0;
// nphones: Number of phones for this instance
q=pInst->nphones-1;
}

pInst->tee=TRUE;
/* Make wrd-int cd phones (possibly none!) */
// p = 0; q = pInst->nphones - 1;

CreateWIModels(pInst,p,q,net,hci);
if (hci->xc==0) {
/* Word internal context only */
CreateIEModels(thisWord,pInst,p,q,net,hci);
}
}
}

CreateWIModels

This function handles the non-boundary (word-internal) phones of one pronunciation instance; the result after processing is shown in the figure:

NetNode *node;
HLink hmm;
int j;

// walk the interior phones backwards
// e.g. p = 0, q = 4 visits j = 3, 2, 1
for(j=q-1;j>p;j--) {
// find HMM model for a single phone
hmm=GetHCIModel(hci,FindLContext(hci,pInst,j,0),
pInst->phones[j],
FindRContext(hci,pInst,j,0));

// a tee model can skip straight from entry to exit state;
// if any phone cannot, the whole instance is not tee
if (hmm->transP[1][hmm->numStates]<LSMALL)
pInst->tee=FALSE;

nwi++;
// new a node by hmm, each has at most one connected node
node=NewNode(net->heap,hmm,(pInst->chain==NULL?0:1));
if (pInst->chain!=NULL) {
nil++;
// head insertion while walking backwards: the node for phone j
// links to the node for phone j+1, so the finished chain runs
// forward (2->3 is built first, then 1->2)
node->links[0].node=pInst->chain;
node->links[0].like=pInst->fct;
}
// node->chain may be NULL
node->chain=pInst->chain;
pInst->chain=node;
// 1->2->3
}

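Because the loop above walks the phones backwards and head-inserts each node, the finished `chain` ends up in forward phone order. A sketch of just that chain-building step, with integer "phones" and hypothetical `CNode`/`build_chain` names:

```c
#include <assert.h>
#include <stddef.h>

typedef struct CNode {
    int phone;
    struct CNode *next;   /* stands in for links[0].node */
} CNode;

/* Build the word-internal chain for phones strictly between p and q,
   mirroring the j = q-1 .. p+1 loop in CreateWIModels. */
static CNode *build_chain(CNode *pool, int p, int q)
{
    CNode *chain = NULL;
    int j;
    for (j = q - 1; j > p; j--) {
        CNode *node = &pool[j];
        node->phone = j;
        node->next = chain;   /* the later phone is already in place */
        chain = node;         /* head insertion */
    }
    return chain;
}
```

For p = 0 and q = 4 the loop visits phones 3, 2, 1 in that order, yet the returned chain reads 1 → 2 → 3.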
CreateIEModels

This one handles the boundary phones; the result can be depicted as in the figure:

NetNode *node,*wordNode;
HLink hmm;

// p = 0, q = nphones - 1: single-phone word
if (q==p) {
/* One phone word */
hmm=GetHCIModel(hci,0,pInst->phones[0],0);
if (hmm->transP[1][hmm->numStates]<LSMALL) pInst->tee=FALSE;

// the last node is word node
wordNode = FindWordNode(NULL,pInst->pron,pInst,n_word);

nin++; nil++;
node=NewNode(net->heap,hmm,1);

// pointed to a word node
node->links[0].node=wordNode;
node->links[0].like=pInst->fct;

// Chain of initial models
pInst->starts=node;
// Number of models in starts chain
pInst->nstart=1;
}
// general case: two or more phones
else {
// final phone (the word-exit side)
hmm=GetHCIModel(hci,FindLContext(hci,pInst,q,0),
pInst->phones[q],0);

if (hmm->transP[1][hmm->numStates]<LSMALL)
pInst->tee=FALSE;

// this node is followed by a word node
wordNode = FindWordNode(NULL,pInst->pron,pInst,n_word);

nfi++; nil++;
node=NewNode(net->heap,hmm,1);
// link to the word node
node->links[0].node=wordNode;
node->links[0].like=pInst->fct;

// Chain of final models
pInst->ends=node;
// Number of models in ends chain
pInst->nend=1;

/* Start */
hmm=GetHCIModel(hci,0,pInst->phones[p],
FindRContext(hci,pInst,p,0));

// #define LSMALL (-0.5E10)
if (hmm->transP[1][hmm->numStates]<LSMALL)
pInst->tee=FALSE;

nin++; nil++;
node=NewNode(net->heap,hmm,1);
// the start node only needs its successor set
node->links[0].node=(pInst->chain?pInst->chain:pInst->ends);
node->links[0].like=pInst->fct;
pInst->starts=node;
pInst->nstart=1;

// the last node of the chain links to the final (exit) node
if (pInst->chain!=NULL) {
for (node=pInst->chain;node->chain!=NULL;node=node->chain);
// position to last node
node->nlinks=1;
nil++;
node->links=(NetLink*) New(net->heap, sizeof(NetLink));
node->links[0].node=pInst->ends;
node->links[0].like=pInst->fct;
}
}

In the last part, ProcessCrossWordLinks runs a second time. This pass assigns the successors of the start-side word nodes, pointing each at the start node of a pronunciation instance of the end-side word; the result is illustrated below:

The process above uses a hash table (wnHashTab) holding the word nodes referenced by all Lattice nodes. The code below first allocates successor storage for those word nodes, then ProcessCrossWordLinks fills in the actual node relations.

// Allocate NetLinks from hash table stats: allocate successor storage
// Zero counters
// every node in wnHashTab is a word-end node
for (i=0; i<WNHASHSIZE; i++) {
/* Build links for each word end model */
for (node=wnHashTab[i];node!=NULL;node=node->chain) {
if (node->nlinks>0){
// alloc node->links memory
node->links=(NetLink*) New(net->heap,sizeof(NetLink)*node->nlinks);
}else
node->links=NULL;

nxl+=node->nlinks;
// reset to 0 so the second pass can refill links[]
node->nlinks=0;
node->aux=0;
}
}

/* Finally put in the cross word links */
// storage was allocated above; this pass assigns the actual links
ProcessCrossWordLinks(NULL,lat,hci->xc);

/* First disassemble wnHashTab and link to end nodes as necessary */
// handle init and final node
AddInitialFinal(lat,net,hci->xc);

// net->chain links all the word nodes and model nodes into one linear list
for (i=0; i<WNHASHSIZE; i++) {
// word node
// append word node list to the net's chain
AddChain(net,wnHashTab[i]);
}

/* Finally chain all nodes together */
for (i=0; i < lat->nn; i++)
for (pInst=(PronHolder*)lat->lnodes[i].sublat;
pInst!=NULL;pInst=pInst->next) {
if (pInst->nstart>0)
AddChain(net,pInst->starts);
AddChain(net,pInst->chain);
AddChain(net,pInst->ends);
}
//...

Taking stock, the network at this level now looks like the figure below: