This details the recording and endpointing of words
for a large speech database used in the making of a
prototype speech recognizer, a machine to take the
spoken word and create text.
Sitting alone reading lists
tugged a word at a time,
down the syntactical path,
the speaker is
surprised that the list draws her.
I note the rise in inflection and the
strident mispronunciation. She says,
"Why should we speak in isolation,
when it is unnatural?"
In the sound room, I
cut away at the waveform, till the
words smart. Clicks, puffs,
frication, severed from words
not pure in sound, dissected from
the speakers' glottis and larynx.
The strippings I discard,
populated by ambient noise latent
in the speech, replaced
by the absolute sound vacuum --
the silence computers know. Now we model
the unquiet environment.
We speak -- for a machine
that can listen but not understand --
utterances, without meaning, masking silence.