Plan 9 from Bell Labs’s /usr/web/sources/contrib/fernan/nhc98/tests/nofib/real/HMMS/Pronunciations.lhs

Copyright © 2021 Plan 9 Foundation.
Distributed under the MIT License.
Download the Plan 9 distribution.



        This module defines data structures for representing
pronunciation models of words and utterances.
        \begin{haskell}{Pronunciations}

> module Pronunciations(
>       module Phones,
>       module BalBinSTrees, module MaybeStateT,
>       Word, DigraphNode, Digraph, PrnNetwork(..),
>       DictionaryEntry,
>       readDictionary, readsPrnNetwork, showPrnNetwork,
>       pre_hmm
>       ) where
> import Char(isSpace,toUpper,isDigit)

\end{haskell}


        The following modules are part of a general library and are
described in later chapters in Part~\ref{part:library}.
        \begin{verbatim}

> import BalBinSTrees
> import Lists
> import MaybeStateT
> import PlainTextIO
> import StateT

\end{verbatim}


        The module \verb~Phones~ was defined in
Chapter~\ref{ch:Phones}.
        \begin{verbatim}

> import Phones

\end{verbatim}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section {Mathematical Representation}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


        The pronunciation model for a word is a {\em pronunciation
network}\index{pronunciation network}, consisting of (1) an acyclic
directed graph~\cite{Harary69} called the {\em pronunciation
digraph}\index{pronunciation digraph}, (2) a subset of nodes
designated as the {\em initial nodes}\index{initial nodes}, and (3) a
subset of nodes designated as the {\em terminal nodes}\index{terminal
nodes}. Each node of a pronunciation digraph is associated with some
data, e.g., a subword unit or a hidden Markov model of a subword unit
(Chapter~\ref{ch:HmmDigraphs}).


        As an example, the pronunciation network for ``EXIT'' is shown
in Figure~\ref{fg:dg1-exit}.  We represent the written form of a word
entirely in upper-case letters for consistency with the format used by
the evaluation software provided by the National Institute of
Standards and Technology (NIST)\index{NIST}.  The initial nodes---only
one in this case---are the destination nodes of the arcs eminating
from the smaller filled circle on the left side of the figure.  The
terminal nodes---again, only one in this case---are the starting nodes
of the arcs leading into the smaller filled circle on the right side
of the figure.  In general, a word can have multiple initial nodes,
multiple terminal nodes, or both.  The two smaller filled circles are
used only for displaying the network structure; they do not represent
nodes in the network. In Figure~\ref{fg:dg1-exit}, the node data are
data constructors of the type \verb~Phone~, which was defined in
Chapter~\ref{ch:Phones}.
        \begin{haskell}{Word}

> type Word = String

\end{haskell}
        \begin{figure}
        \begin{center}
        \input{figs/dg1-EXIT}
        \end{center}
        \caption[]{Pronunciation network for ``EXIT.''}
        \label{fg:dg1-exit}
        \end{figure}



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section {Haskell Representation}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


        We need to give a name to each node in order to represent the
pronunciation network in the computer.  We can't use the phones
associated with the nodes because the same phone may appear more than
once within the same word.  Also, the network will undergo a series of
transformations that will change the data values.  Without loss of
generality, we elect to use positive integers to name the nodes.  The
integer assigned to a node is that node's {\em index}\index{index}.
These indices can be used to construct the predecessor list for each
node, which is how we will represent the pronunciation digraph.  To
provide a uniform representation of all the words in the vocabulary
and to simplify some of the algorithms, we will index the digraph
nodes according to the following conventions:
        \begin{enumerate}
        \label{conventions}

        \item Integers are used in ascending order starting from
              some positive integer with no skips. (Within the dictionary
              file, this starting integer will be 1, but
              under more general circumstances the starting integer
              could be something different.)

        \item A node's index is always larger than the
              indices of all the nodes that precede it in the
              digraph.  A node $x$ {\em precedes\/} a node $y$ if
              there is a directed path starting at $x$ and ending at
              $y$~\cite[p.\ 198]{Harary69}.  Thus, the acyclic
              digraph is {\em topologically sorted}.

        \end{enumerate}
        Under these conventions, the highest indexed node is always a
terminal node, but not necessarily the only one (in general, words may
have more than one terminal node).  Also, the lowest indexed node is
always an initial node, but, again, not necessarily the only one.


        Following these conventions, the nodes of the pronunciation
network for ``EXIT'' can be indexed as shown in
Figure~\ref{fg:dg2-exit}.  Other numberings are possible.
        \begin{figure}
        \begin{center}
        \input{figs/dg2-EXIT}
        \end{center}
        \caption[]{Pronunciation network for ``EXIT,'' where
each node has been assigned an index according to the conventions
listed in the text.}
        \label{fg:dg2-exit}
        \end{figure}



        A single node of a pronunciation digraph is represented as a
pair.  The first component is the data stored with the node (e.g., the
phone symbol) and the second component is the predecessor list for
that node.  By using a type variable for the node data we provide
ourselves the flexibility of having different digraph types.
        \begin{haskell}{DigraphNode}

> type DigraphNode a = (a, [Int])

\end{haskell}


        There are two obvious possibilities for storing the collection
of digraph nodes; we could use either a list or an array.  For
building composite pronunciation networks we will not need to randomly
access the digraph nodes, so it will be convenient to just use lists.
        \begin{haskell}{Digraph}

> type Digraph a = [DigraphNode a]

\end{haskell}


        The digraph nodes are placed in the list in the order of their
indices.  Thus, the index for each node is implicitly encoded by its
position in the list, although it should be remembered that Haskell
lists are indexed from zero while we are indexing digraph nodes
starting with 1 in the dictionary and possibly something else in other
contexts.  Our digraph representation for ``EXIT'' is shown in
Figure~\ref{fg:dgHaskell-EXIT}.
        \begin{figure}
        \begin{verbatim}
 [(EH, []), (K, [1]), (S, [2]), (G, [1]), (Z, [4]), (IX, [3,5]), (T, [6])]
\end{verbatim}
        \caption[]{Digraph representation for ``EXIT.''}
        \label{fg:dgHaskell-EXIT}
        \end{figure}


        The \verb~Digraph~ data structure doesn't include information
about the initial and terminal nodes, the information needed to
completely specify a pronunciation network.  To include this
information, the pronunciation network is represented using an
algebraic datatype.  The data constructor \verb~PrnN~ (for
``pronunciation network'') takes four arguments: the highest node
index used in the digraph representation (this equals the number of
nodes when the lowest index is 1), the list of initial nodes, the list
of terminal nodes, and the list-based digraph representation.  We make
the data type inherit properties from the class \verb~Text~ so we can
easily read and write the representation as plain text.
        \begin{haskell}{PrnNetwork}

> data PrnNetwork a = PrnN Int [Int] [Int] (Digraph a)  deriving Show{-was:Text-}

\end{haskell}
        The Haskell representation for the pronunciation network for
``EXIT'' is shown in Figure~\ref{fg:pm-EXIT}.
        \begin{figure}
        \begin{verbatim}
     PrnN 7 [1] [7] [(EH, []), (K, [1]), (S, [2]), (G, [1]), (Z, [4]),
                     (IX, [3,5]), (T, [6])]
\end{verbatim}
        \caption[]{Pronunciation network representation for ``EXIT.''}
        \label{fg:pm-EXIT}
        \end{figure}



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section {The User-Supplied Dictionary File}
\label{sc:dictionary}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


        The word ``EXIT'' can be represented in a dictionary file as
shown in Figure~\ref{fg:dic-exit}.  First there is a line containing
the word itself.  Then there is a line with three fields: the total
number of nodes in the network, the list of initial node indices, and
the list of terminal node indices.  This is followed by a series of
lines, one for each node.  Each node-description line contains the
node index, followed by the phone symbol, followed by the predecessor
list for that node.  The first field of each node-description
line---the node index---is not really needed, but we include it in the
file as an aid to checking and modifying word pronunciation networks
by hand.
        \begin{figure}
        \begin{center}
        \begin{verbatim}
                         EXIT
                         7  [1]  [7]
                         1       EH      []
                         2       K       [1]
                         3       S       [2]
                         4       G       [1]
                         5       Z       [4]
                         6       IX      [3,5]
                         7       T       [6]
\end{verbatim}
        \end{center}
        \caption[]{Dictionary file entry for the word ``EXIT.''}
        \label{fg:dic-exit}
        \end{figure}


        An example of a dictionary file containing the words ``EXIT,''
``NOW,'' and ``PLEASE'' is shown in Figure~\ref{fg:dictionary}.  Word
representations are separated from each other by blank lines.
        \begin{figure}
        \begin{center}
        \begin{verbatim}
                         EXIT
                         7  [1]  [7]
                         1       EH      []
                         2       K       [1]
                         3       S       [2]
                         4       G       [1]
                         5       Z       [4]
                         6       IX      [3,5]
                         7       T       [6]

                         NOW
                         2  [1]  [2]
                         1       N       []
                         2       AW      [1]

                         PLEASE
                         4  [1]  [4]
                         1       P       []
                         2       L       [1]
                         3       IY      [2]
                         4       Z       [3]

\end{verbatim}
        \end{center}
        \caption[]{An example of a small dictionary file.}
        \label{fg:dictionary}
        \end{figure}


        Thus, the user-provided dictionary file is a plain text file
that contains the pronunciation networks for all the words in the
vocabulary.  The program \verb~ConvertLinearDic~
(Chapter~\ref{ch:ConvertLinearDic}) can be used to convert a simpler
form of dictionary file to this format.


        We now define the functions used to read the dictionary file.
We begin with the following type synonym.  A {\em dictionary entry\/}
is a word paired with its pronunciation network.
        \begin{haskell}{DictionaryEntry}

> type DictionaryEntry a = (Word, PrnNetwork a)

\end{haskell}


        The function \verb~readDictionary~ takes the contents of the
dictionary file and returns a list of the entries.  Later, these
entries could be stored in a data structure that provides more
efficient retrieval than is possible using a linear list.  For
example, the programs ``Transcribe'' and ``BatchTranscribe'' described
in Chapters~\ref{ch:Transcribe} and~\ref{ch:BatchTranscribe} will
convert the dictionary to a balanced binary search tree.  Because of
the tree-building algorithm used, the entries in the user-supplied
dictionary file do not have to be in alphabetical order.
        \begin{haskell}{readDictionary}

> readDictionary :: [Char] -> [DictionaryEntry Phone]
> readDictionary cs =
>       let
>         (w, cs') = break isSpace (dropWhile isSpace cs)
>       in
>         if null w
>         then  []
>         else  case  readsPrnNetwork cs'  of
>               Nothing         -> error ("readDictionary: can't read \
>                                         \the pronunciation network \
>                                         \for `" ++ w ++ "'")
>               Just (pn, cs'') -> (w, pn) : readDictionary cs''

\end{haskell}
        In the definition of \verb~readDictionary~, we get the written
form of the word via the expression \verb~break isSpace (... cs)~
instead of using the Standard Prelude function \verb~lex~ because we
want to allow some characters in our words that aren't handled the way
we want by \verb~lex~, e.g., hyphens.


        The function \verb~readsPrnNetwork~ reads the pronunciation
network information following a word.  We use a ``Maybe State
Transformer'' monad (Chapter~\ref{ch:MaybeStateT}) to structure the
code.  We also make the function polymorphic in the node data,
requiring only that it belong to the class \verb~Text~ so that
pronunciation networks containing different types of node data can be
read.  The function \verb~readsItem~ is defined in the module
\verb~PlainTextIO~ (Chapter~\ref{ch:PlainTextIO}).
        \begin{haskell}{readsPrnNetwork}
 
> readsPrnNetwork :: (Read d) => MST [Char] (PrnNetwork d)
> readsPrnNetwork = readsItem           `thenMST`  \ n  ->
>                   readsItem           `thenMST`  \ is ->
>                   readsItem           `thenMST`  \ ts ->
>                   readsDigraph n      `thenMST`  \ dg ->
>                   returnMST (PrnN n is ts dg)

\end{haskell}


        The function \verb~readsDigraph~ reads in the data for each
node in the digraph.  This function is also polymorphic in the type of
the node data.  The first argument, \verb~n~, is the number of nodes
in the digraph.
        \begin{haskell}{readsDigraph}

> readsDigraph :: (Read d) => Int -> MST [Char] (Digraph d)
> readsDigraph n = if n <= 0
>                  then returnMST []
>                  else readsNode            `thenMST` \ node ->
>                       readsDigraph (n-1)   `thenMST` \ rdg ->
>                       returnMST (node : rdg)

\end{haskell}


        The function \verb~readsNode~ reads the information for a
single node of the pronunciation digraph.  The function
\verb~readsInt~ is defined in the module \verb~PlainTextIO~
(Chapter~\ref{ch:PlainTextIO}).  We need it because the node index is
read but discarded by this function, so the type checker wouldn't know
that \verb~readsItem~ was supposed to be looking for an \verb~Int~.
Thus, we need to be explicit about the fact that the first item to be
read is an \verb~Int~.
        \begin{haskell}{readsNode}

> readsNode :: (Read d) => MST [Char] (DigraphNode d)
> readsNode = readsInt          `thenMST` \ _  ->       -- node index
>             readsItem         `thenMST` \ d  ->       -- node data
>             readsItem         `thenMST` \ preds ->    -- pred list
>             returnMST (d, preds)

\end{haskell}


        For example, applying \verb~readDictionary~ to the file
shown in Figure~\ref{fg:dictionary} yields the list
        \begin{verbatim}
     [ ( "EXIT",   PrnN 7 [1] [7] [(EH,[]), (K,[1]), (S,[2]),
                                   (G,[1]), (Z,[4]), (IX,[3,5]),
                                   (T,[6])] ),
       ( "NOW",    PrnN 2 [1] [2] [(N,[]), (AW, [1])] ),
       ( "PLEASE", PrnN 4 [1] [4] [(P,[]), (L,[1]), (IY,[2]),
                                   (Z,[3])] )
     ]
\end{verbatim}



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section {Building Pronunciation Networks for Complete Utterances}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

        We need to build pronunciation networks for complete
utterances, not just single words.  This construction is performed as
a series of steps.  In the following list, the name of the
corresponding function is shown in parentheses.
        \begin{itemize}

        \item Convert the text to a canonical form
(\verb~prepare_text~, Section~\ref{sb:preparetext}).

        \item Break the text into individual words (\verb~words~,
Standard Prelude).

        \item Look up each word in the pronunciation dictionary
(\verb~look_up_words~, Section~\ref{sb:lookup}).

        \item Place an optional between-word model
(Section~\ref{sb:betweenword}) at the beginning of the utterance and
after each word.

        \item Reindex each pronunciation network
(\verb~reindexPrnNetworks~, Section~\ref{sb:reindex}).

        \item Join the individual models into a single network model
for the entire utterance.\footnote{In the first version of this
module, the reindexing and joining was done within one function.
While simple, this approach resulted in an algorithm with quadratic
complexity.}

        \end{itemize}


        The flow of processing is summarized in
Figure~\ref{fg:overall-flow}.  As a concrete example, the models for
the words ``PLEASE,'' ``EXIT,'' and ``NOW'' and for the utterance
``PLEASE EXIT NOW'' are shown in Figure~\ref{fg:please-exit-now}.
        \begin{figure}
        \begin{center}
\begin{tabular}{cp{2.3in}}
\hline
\rule{0in}{3ex} $Text$ & \\
$\Downarrow$ & \\
$Text'$ & Convert to canonical form\\
$\Downarrow$ & \\
$[w_1,w_2,\ldots,w_L]$ & break into individual words\\
$\Downarrow$ & \\
$[P_{w_1},P_{w_2},\ldots,P_{w_L}]$ & look up the pronunciation models \\
$\Downarrow$ & \\
$[P_{bwm},P_{w_1},P_{bwm},P_{w_2},P_{bwm},\ldots,P_{bwm},P_{w_L},P_{bwm}]$
& interleave the between-word model \\
$\Downarrow$ & \\
$[P'_{bwm},P'_{w_1},P'_{bwm},P'_{w_2},P'_{bwm},\ldots,P'_{bwm},P'_{w_L},P'_{bwm}]$
& reindex each network \\
$\Downarrow$ & \\
$P'_{bwm} \oslash ( (P'_{w_1} \oplus P'_{bwm}) \otimes \ldots \otimes
                    (P'_{w_L} \oplus P'_{bwm}) )$
& reduce to a single network for the entire utterance \\
$\Downarrow$ & \\
$P_{Text}$  & \\[0.10in]
\hline
\end{tabular}
\end{center}
        \caption[]{Overall flow of processing.  $P_{bwm}$ denotes the
pronunciation network of the between-word model and $P_{Text}$ denotes
the pronunciation network for the utterance text.  The operators
$\oslash$, $\oplus$, and $\otimes$ correspond to the functions
\verb~joinNets1~, \verb~joinNets2~ and \verb~joinNets~, respectively,
(Section~\ref{sb:joining}).}
        \label{fg:overall-flow}
        \end{figure}


        \begin{figure}
        %
        \begin{center}
        \subfigure[``PLEASE'']{%
        \hbox to 4.00in{
        \input{figs/dg-PLEASE}}}
        \end{center}
        %
        \begin{center}
        \subfigure[``EXIT'']{%
        \input{figs/dg2-EXIT}}
        \end{center}
        %
        \begin{center}
        \subfigure[``NOW'']{%
        \hbox to 4.00in{
        \input{figs/dg-NOW}}}
        \end{center}
        %
        \begin{center}
        \subfigure[``PLEASE EXIT NOW.'']{%
        \input{figs/dg-PLEASEEXITNOW}}
        \end{center}
        %
        \caption[]{Pronunciation networks for the words ``PLEASE,''
``EXIT'' and ``NOW,'' and for the complete utterance ``PLEASE EXIT
NOW.''  The model for the complete utterance includes an optional
silence model at the beginning, end, and between each word.}
        \label{fg:please-exit-now}
        \end{figure}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection {Converting the Text to a Canonical Form}
\label{sb:preparetext}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


        The first task is to transform the text string into a
canonical form.  We define a function \verb~prepare_text~ to perform
this transformation.  The function implements the following steps:
        \begin{itemize}

        \item Sometimes the transcription file contains two integers
preceding the text of the utterance (e.g., the TIMIT ``\verb~.txt~''
transcription files).  These numbers represent the starting and
stopping sample numbers for the entire utterance.  We need to remove
them and any white space that preceeds the text.  Note that the text
itself cannot start with a digit or the function \verb~prepare_text~
will remove that too!

        \item Drop any word-external punctuation
characters.\footnote{Since we control the contents of the textual
transcription files, we can forbid abbreviations like `e.g.,', `c.f.,'
etc., requiring that they be spelled out.  This requirement allows us
to remove all periods without having to check to see if they are part
of an abbreviation.}  We retain some characters, like hyphens,
underscores, and tildes.

        \item Change all of the letters in the string to upper-case.
Each word is represented in the dictionary in only one form, and we
have chosen to use all upper-case letters since that is consistent
with NIST's ``snr'' format.

        \end{itemize}
        \begin{haskell}{prepare_text}

> prepare_text = map toUpper . 
>                  filter (`notElem` "!,.:;?") .
>                    dropWhile (\c -> isDigit c || isSpace c)

\end{haskell}



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection {Looking Up Pronunciation Models}
\label{sb:lookup}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

        \begin{haskell}{look_up_words}

> look_up_words :: BalBinSTree Word (PrnNetwork Phone) ->
>                  [Word] ->
>                  [PrnNetwork Phone]

> look_up_words dictionary = map (bbstLookUp dictionary)

\end{haskell}



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection {A Between-Word Model}
\label{sb:betweenword}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

        We create a between-word model to be placed before and at the
end of an utterance as well as between each word of an utterance.  For
now this is just silence, represented by the constructor
\verb~SIL~. Later, we might want to enhance the model to include
filled pauses, e.g., ``ums'' and ``ahs.''
        \begin{haskell}{between_word_model}

> between_word_model = PrnN 1 [1] [1] [(SIL,[])]

\end{haskell}
        The function \verb~interleave~ in the library module
\verb~Lists~ (Chapter~ref{ch:Lists}) can be used to interleave the
between-word model through the list of words.



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection {Reindexing Pronunciation Networks}
\label{sb:reindex}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


        The function \verb~reindexPrnNetwork~ is used to increment the
node indices by a fixed amount.  It returns the newly indexed network
and the sum of the increment and the old highest index value.  The
function uses a State Transformer monad (Chapter~\ref{ch:StateT}).
        \begin{haskell}{reindexPrnNetwork}

> reindexPrnNetwork :: (PrnNetwork a) -> ST Int (PrnNetwork a)
> reindexPrnNetwork (PrnN n is ts dg) inc = (PrnN n' is' ts' dg', n')
>       where
>       n'        = n + inc
>       increment = (inc+)
>       is'       = map increment is
>       ts'       = map increment ts
>       dg'       = mapsnd (map increment) dg

\end{haskell}


        The function \verb~reindexPrnNetworks~ is used to reindex
every pronunciation network in a list of such networks, starting with
an increment of 0.  Thus, the first network in the list will be
unchanged, the second network's indices will be incremented by the
number of nodes in the first network, the third network's indices will
be incremented by the number of nodes in the first two, etc.  The
functions \verb~maplST~ and \verb~startingFrom~ are defined in the
module \verb~StateT~ (Chapter~\ref{ch:StateT}).
        \begin{haskell}{reindexPrnNetworks}

> reindexPrnNetworks :: [PrnNetwork a] -> [PrnNetwork a]
> reindexPrnNetworks ps = maplST reindexPrnNetwork ps `startingFrom` 0

\end{haskell}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection {Joining Pronunciation Networks}
\label{sb:joining}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


        We now define three functions for joining pronunciation
networks.  Note that in all three cases, the functions assume that the
second network of the pair has been properly reindexed so that the new
combination satisfies our indexing conventions
(page~\pageref{conventions}).  The required condition is that the
lowest index of the second network equals the highest index of the
first network plus 1.



        The function \verb~joinNets~ joins two pronunciation networks
into a single pronunciation model where neither network is optional.
The initial nodes of the new network are those of the first, the
terminal nodes are those of the second, and every node that is an
initial node of the second network has its predecessor list augmented
by the terminal nodes of the first.
        \begin{haskell}{joinNets}

> joinNets :: PrnNetwork a -> PrnNetwork a -> PrnNetwork a

> joinNets (PrnN n1 is1 ts1 dg1) (PrnN n2 is2 ts2 dg2) =
>       PrnN n2 is1 ts2 dg'
>       where
>       dg' = dg1 ++ [ if k `elem` is2
>                         then  (d, ts1 ++ ps)
>                         else  (d,     ps   )
>                      | (k, (d,ps)) <- zip [n1+1..] dg2 ]

\end{haskell}


        We need another version of this function for joining two word
models when the second word of the pair is optional but the first word
is not, e.g., when the second word is an optional ``between-word''
model (Section~\ref{sb:betweenword}).  The only difference between
\verb~joinNets~ and \verb~joinNets2~ is that \verb~joinNets2~ produces
a different terminal node list; all of the terminal nodes of the first
network are also terminal nodes of the new network.
        \begin{haskell}{joinNets2}

> joinNets2 :: PrnNetwork a -> PrnNetwork a -> PrnNetwork a

> joinNets2 (PrnN n1 is1 ts1 dg1) (PrnN n2 is2 ts2 dg2) =
>       PrnN n2 is1 (ts1 ++ ts2) dg'
>       where
>       dg' = dg1 ++ [ if k `elem` is2
>                         then  (d, ts1 ++ ps)
>                         else  (d,     ps   )
>                      | (k, (d,ps)) <- zip [n1+1..] dg2 ]


\end{haskell}

        Finally, we need still another version for joining two
networks when the first network of the pair is optional but the second
is not.
        \begin{haskell}{joinNets1}

> joinNets1 :: PrnNetwork a -> PrnNetwork a -> PrnNetwork a

> joinNets1 (PrnN n1 is1 ts1 dg1) (PrnN n2 is2 ts2 dg2) =
>       PrnN n2 (is1 ++ is2) ts2 dg'
>       where
>       dg' = dg1 ++ [ if k `elem` is2
>                         then  (d, ts1 ++ ps)
>                         else  (d,     ps   )
>                      | (k, (d,ps)) <- zip [n1+1..] dg2 ]

\end{haskell}

        In the future these functions will be modified to incorporate
word-junction phonological rules\index{phonological rules}.


        In order to carry out the last transformation shown in
Figure~\ref{fg:overall-flow}, we use a two-operator
generalization of the Standard Prelude function \verb~foldr1~ called
\verb~foldr1_2op~, defined in the module \verb~Lists~
(Chapter~\ref{ch:Lists}).


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection {Modeling an Utterance}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


        We now take the tools developed so far and define the
functions that will take a string like ``PLEASE EXIT NOW'' and return
its phonetic pronunciation network.  We will also define a function
that modifies the network in preparation for the use of hidden Markov
models.  In that version of the network, the node data will be an
ordered pair.  The first component of the pair will be the phone and
the second component will be the number of successors of that node in
the pronunciation digraph.  The number of successors is used to adjust
the exit probabilities of the corresponding HMM.


        The function \verb~build_pron_network~ implements these steps:
        \begin{haskell}{build_pron_network}

> build_pron_network :: BalBinSTree Word (PrnNetwork Phone) ->
>                       String ->
>                       PrnNetwork Phone

> build_pron_network dictionary =
>         (\(n:rns) -> n `joinNets1` (foldr1_2op joinNets2 joinNets rns))
>       . reindexPrnNetworks
>       . interleave between_word_model
>       . look_up_words dictionary
>       . words

\end{haskell}


        In order to properly adjust the exit probabilities for the
phonetic HMMs, we need to know the number of successors for each node
in the pronunciation digraph.  This number is made part of the node
data; specifically, it becomes the second component of an ordered
pair.
        \begin{haskell}{add_num_succs}

> add_num_succs :: PrnNetwork a -> PrnNetwork (a, Int)

> add_num_succs  (PrnN n is ts dg)  =  PrnN n is ts dg'
>       where
>       dg' = add_num_succs' 1 dg  -- recall that the first node is
>                                  --   indexed with a `1'

> add_num_succs'  indx  ((d,preds):remnodes) =
>       ((d, nsuccs),preds) : add_num_succs' (indx + 1) remnodes
>       where
>       (_, predls) = unzip remnodes
>       nsuccs      = length (filter (indx `elem`) predls)

> add_num_succs' indx [] = []

\end{haskell}


        A flow diagram of the processing up to this point is shown in
Figure~\ref{fg:flow-upto-hmms}.
\begin{figure}
\begin{center}
        {\tt "PLEASE EXIT, NOW."}
        \[
        \Downarrow
        \]
        {\tt ["PLEASE", "EXIT,", "NOW."]}
        \[
        \Downarrow
        \]
        {\tt ["PLEASE", "EXIT", "NOW"]}
        \[
        \Downarrow
        \]
        \input{figs/word_pieces}
        \[
        \Downarrow
        \]
        \input{figs/dg-PLEASEEXITNOW}
        \vspace{-0.50in}
        \[
        \Downarrow
        \]
        \input{figs/dg2-PLEASEEXITNOW}
        \end{center}
        \caption[]{An example showing the flow of processing up to the
point at which we are ready to incorporate the HMMs.}
        \label{fg:flow-upto-hmms}
\end{figure}

        For the program ``BatchTranscribe''
(Chapter~\ref{ch:BatchTranscribe}) it will be useful to combine these
functions into a single function that performs all the steps.
        \begin{haskell}{pre_hmm}

> pre_hmm dic = add_num_succs . build_pron_network dic . prepare_text

\end{haskell}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section {Pretty-Printing a Pronunciation Network}
\label{sc:pprintNetwork}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


        It is useful to have a function to print easily readable
representations of pronunciation digraphs whenever the node data is in
the class \verb~Text~.  The function \verb~showPrnNetwork~ prints a
list-based pronunciation network in the same format as it would appear
for a word in the dictionary file (see Section~\ref{sc:dictionary}).
        \begin{haskell}{showPrnNetwork}

> showPrnNetwork  (PrnN n is ts dg) =
>       show n ++ "  " ++ show is ++ "  " ++ show ts ++ "\n"  ++ 
>         unlines (map showIndexedDigraphNode (zip [1..] dg))

\end{haskell}
\fixhaskellspacing\begin{haskell}{showIndexedDigraphNode}

> showIndexedDigraphNode :: (Show d) => (Int, DigraphNode d) -> String
> showIndexedDigraphNode (n,(d,ps)) =
>       shows n ('\t' : shows d ("  \t" ++ show ps))

\end{haskell}


        Using this pretty-printing function, the results of processing
the text ``SHOW DEPARTURES FROM ATLANTA FOR AMERICAN,'' one of the Air
Travel Information System (ATIS) sentences from the ATIS2 database
distributed by the Linguistic Data Consortium (LDC), is shown in
Figure~\ref{fg:atis2-example}.
        \begin{figure}
        \begin{verbatim}
                 40  [1, 2]  [39, 40]
                 1      (SIL, 1)        []
                 2      (SH, 1)         [1]
                 3      (OW, 2)         [2]
                 4      (SIL, 1)        [3]
                 5      (D, 1)          [3, 4]
                 6      (AX, 1)         [5]
                 7      (P, 1)          [6]
                 8      (AA, 1)         [7]
                 9      (R, 1)          [8]
                 10     (CH, 1)         [9]
                 11     (AXR, 1)        [10]
                 12     (Z, 2)          [11]
                 13     (SIL, 1)        [12]
                 14     (F, 1)          [12, 13]
                 15     (R, 1)          [14]
                 16     (AH, 1)         [15]
                 17     (M, 2)          [16]
                 18     (SIL, 1)        [17]
                 19     (AE, 1)         [17, 18]
                 20     (T, 1)          [19]
                 21     (L, 1)          [20]
                 22     (AE, 1)         [21]
                 23     (N, 1)          [22]
                 24     (T, 1)          [23]
                 25     (AX, 2)         [24]
                 26     (SIL, 1)        [25]
                 27     (F, 2)          [25, 26]
                 28     (AO, 1)         [27]
                 29     (R, 2)          [28]
                 30     (AXR, 2)        [27]
                 31     (SIL, 1)        [29, 30]
                 32     (AX, 1)         [29, 30, 31]
                 33     (M, 1)          [32]
                 34     (EH, 1)         [33]
                 35     (R, 1)          [34]
                 36     (IX, 1)         [35]
                 37     (K, 1)          [36]
                 38     (IX, 1)         [37]
                 39     (N, 1)          [38]
                 40     (SIL, 0)        [39]
        \end{verbatim}
        \caption[]{The results of applying the function {\tt pre\_hmm}
to the utterance ``SHOW DEPARTURES FROM ATLANTA FOR AMERICAN'' as
displayed using the pretty-printing function {\tt showPrnNetwork}.}
        \label{fg:atis2-example}
        \end{figure}


%%%%%%%%%%  End of Words.lhs  %%%%%%%%%%

Bell Labs OSI certified Powered by Plan 9

(Return to Plan 9 Home Page)

Copyright © 2021 Plan 9 Foundation. All Rights Reserved.
Comments to [email protected].