Visualising the spoken word with dictographic notation

In 1996, while in my third year at Central St Martins college of design, I devised a basic notational system for visualising the spoken word. The system, which I named “dictography”, adapted aspects of standard musical notation as well as typographic conventions. Since that first version I have expanded dictography into a more complete notation system.


As we listen to someone speak, subtle variations in intonation, volume, speed and rhythm contribute more to our understanding than the words alone. Conventional typography communicates pure, refined content, stripped of most of its emotion. Apart from highlighting a word in italics, any information about someone’s tone of voice has to be added to the text as annotation. This marked difference between speech and text means that we have one voice for speaking and another for writing. Dictography tries to bridge this divide.


In dictographic notation, the conventions of typographic and musical notation are combined and augmented. The four basic properties of speech (pitch, volume, tone and speed) are divided into separate channels. These are then encoded according to a set of rules that use relative position, visual weight (boldness, and condensed or expanded letterforms), background and text colour, and word spacing.

That basic framework is further augmented with symbols relating to individual vocal characteristics. A key signature gives information on the overall tone, the speakers’ sex, nationality and accent, and the standard pitch of their voices (think bass, baritone, treble and soprano). The ends of phrases or sentences are marked with a large blue dot. Phrases ending in an exclamation or question mark are preceded by a Spanish-style inverted mark, giving readers advance warning. Any non-specific vocal sound, such as a tut or a click of the tongue, is indicated with an orange star. Likewise, a trembling voice is represented by a trill mark. Finally, en dashes are replaced with a discreet arrowhead, because they risk clashing with the stave lines.
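The phrase-level symbols above are specific enough to sketch in code. The Python below is a minimal, hypothetical illustration, not a description of how the notation is actually produced: the plain-text stand-ins (●, ★, tr~, ▸), the `Phrase` type, the `mark_phrase` function and the bracketed “[tut]” input convention are all assumptions made for this example. It also assumes the blue dot closes every phrase, questions and exclamations included. Colour, stave position and the key signature are not modelled.

```python
import re
from dataclasses import dataclass

# Hypothetical plain-text stand-ins for marks that are really drawn in colour.
END_OF_PHRASE = "●"   # the large blue dot
VOCAL_SOUND = "★"     # the orange star (tut, click of the tongue, ...)
TRILL = "tr~"         # the trill mark for a trembling voice
DASH_ARROW = "▸"      # the arrowhead that replaces an en dash
INVERTED = {"?": "¿", "!": "¡"}  # Spanish-style advance warnings


@dataclass
class Phrase:
    text: str
    terminal: str = "."      # ".", "?" or "!"
    trembling: bool = False


def mark_phrase(phrase: Phrase) -> str:
    # Swap en dashes for the arrowhead so they cannot be mistaken for stave lines.
    text = phrase.text.replace("\u2013", DASH_ARROW)
    # Assumed input convention: non-specific vocal sounds are bracketed, e.g. "[tut]".
    text = re.sub(r"\[[^\]]*\]", VOCAL_SOUND, text)
    if phrase.trembling:
        text = TRILL + text
    if phrase.terminal in INVERTED:
        # Open with the inverted mark so the reader is warned before the phrase starts.
        text = INVERTED[phrase.terminal] + text + phrase.terminal
    # Assumption: the end-of-phrase dot replaces a full stop and follows ? and !.
    return f"{text} {END_OF_PHRASE}"
```

For example, `mark_phrase(Phrase("Are you coming", "?"))` produces `¿Are you coming? ●`, and a bracketed `[tut]` at the start of a phrase becomes a star.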


At first glance the inside front cover looks entirely abstract, but it is in fact a rendering of the sample Archers episode:


Analysing a five-minute scene from BBC Radio 4’s The Archers took a huge amount of work, codifying each speaker’s pitch, speed, volume and emotional cues. It should be possible to automate this using a software tool, or perhaps one day even to transcribe recorded speech directly into dictographic notation. That would give a different look to Hansard!
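As a rough illustration of where such a tool might start, the Python sketch below estimates two of the four channels, pitch and volume, from mono audio samples in the range -1 to 1. It uses a plain autocorrelation pitch estimate and an RMS level in decibels relative to full scale. The function names, the 100 ms frame size, the 60 to 400 Hz pitch range and the -50 dB silence threshold are all assumptions chosen for the example; speed, tone and emotional cues are not attempted, and mapping the numbers onto stave position and visual weight would be a separate step.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_volume_db(frame: Sequence[float]) -> float:
    """RMS level of one frame in dB relative to full scale (samples in -1..1)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def frame_pitch_hz(frame: Sequence[float], sample_rate: int,
                   min_hz: float = 60.0, max_hz: float = 400.0,
                   silence_db: float = -50.0) -> Optional[float]:
    """Estimate the fundamental as the lag with the largest autocorrelation."""
    if frame_volume_db(frame) < silence_db:
        return None  # too quiet to treat as voiced speech
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


def analyse(samples: Sequence[float], sample_rate: int,
            frame_ms: int = 100) -> List[Tuple[float, Optional[float], float]]:
    """Return (start time in seconds, pitch in Hz or None, volume in dB) per frame."""
    size = int(sample_rate * frame_ms / 1000)
    results = []
    for start in range(0, len(samples) - size + 1, size):
        frame = samples[start:start + size]
        results.append((start / sample_rate,
                        frame_pitch_hz(frame, sample_rate),
                        frame_volume_db(frame)))
    return results
```

For example, a one-second 200 Hz test tone at half amplitude, sampled at 8 kHz, yields ten frames, each reporting about 200 Hz at roughly -9 dB. Real speech would need a more robust pitch tracker, since simple autocorrelation is prone to octave errors.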


This close-up shows the level of detail I had to go into to produce the transcript:


Obviously, this system still has serious limitations. It cannot truly portray the vast subtlety of vocal dynamics, harmonics and the barrage of other variables inherent in anything as complex as human speech.

But by adding several more layers of data to the text stream, it can offer a richer reproduction than conventional text and annotation, and I hope that might be of value to professionals whose work depends on conveying meaning through speech.

I would really value feedback and suggestions on any ways you can see that I could further improve this. I would especially love to hear from speechwriters, screen and stage writers and radio producers. Would dictography be useful to you and your colleagues?