HFVE research

I've done some presentations on the HFVE project. Click the links below for more details (PDF format) :-

Poster for Audio Mostly & International Conference on Auditory Display (AMICAD), 30th June - 4th July 2025, Coimbra, Portugal. (Updated 25th June 2025.)
Paper for 25th International Conference on Auditory Display (ICAD 2019), 23–27 June 2019, Northumbria University, UK. (Updated 7th July 2019.) *** Hyundai Motors Design Challenge Award Best Paper ***
Paper for 5th Interactive Sonification Workshop (ISon 2016), 16th December 2016, Bielefeld, Germany.
Paper for 4th Interactive Sonification Workshop (ISon 2013), 10th December 2013, Erlangen, Germany.
Paper for 3rd Interactive Sonification Workshop (ISon 2010), 7th April 2010, Stockholm, Sweden.
Paper for 4th International Workshop on Haptic and Audio Interaction Design (HAID 2009), 10th - 11th September 2009, Dresden, Germany.
Poster for SID (Sonic Interaction Design) Workshop, 2nd April 2009, York, UK. (Updated 23rd April 2009.)
Poster for 3rd International Workshop on Haptic and Audio Interaction Design (HAID 2008), 15th - 16th September 2008, Jyväskylä, Finland.
Paper for 2nd International Workshop on Interactive Sonification (ISon 2007), 3rd February 2007, York, UK.
Poster for 1st International Workshop on Haptic and Audio Interaction Design (HAID 2006), 31st August - 1st September 2006, Glasgow, Scotland.
Display poster for " " " .

HFVE software development is nearing completion, and implements many of the features described on this page. Click here for demo videos. Click here to try HFVE!

ISon 2013 & 2016 poster/demo sessions, and ICAD 2019 paper presentation

I demonstrated the HFVE system at the 4th and 5th Interactive Sonification Workshop, showing the software running on a small tablet computer (and laptop); with tactile shapes etc. presented via an adapted Microsoft Sidewinder Force Feedback 2 joystick, and via a Logitech Wingman Force Feedback Mouse. More recently I presented the system at the 25th International Conference on Auditory Display (ICAD 2019) where my paper received Hyundai Motors Design Challenge Award Best Paper.

N.B. Most of the rest of this page has not been updated since 2001 (many years ago!), except for a name change from "STVE/STeViE" (Sono-Tactile Visual Effects) to "HFVE" (Heard & Felt Vision Effects). (Previously it was called "Vuphonics".)

The basic principles remain the same, although corners/vertices are now emphasised; an additional "buzz track" is used to clarify shapes; "imprints" rapidly summarize images; and several new features have been added - see the HFVE software page for a summary and short demo videos. Click here to try HFVE!

David Dewhurst 4th November 2022

ORIGINAL CONCEPT (now somewhat superseded!)

Introduction

The "HiFiVE" (Heard & Felt Vision Effects) system is an experimental sensory-substitution system for use by blind and deafblind people, and this website describes work-in-progress.

The HiFiVE system highlights the features of visual images that are normally perceived categorically, by substituting with coded sound effects and their tactile equivalents. It simulates the instant recognition of properties and objects that occurs in visual perception, by using the near-instantaneous recognition of phoneme sounds that occurs in speech. By listening to coded phonetic sounds (and feeling corresponding tactile/braille effects), the user can instantly understand the colours, textures, distances and entities that are present in an image. The system also conveys shape, location, "fine texture" and change.

For beginners, the system can speak actual words, which directly describe the properties and entities being conveyed. The words can be moved in sound space to convey the shape of an item, for example a red circle :-

Direct-description sounds of the red circle (MP3-compressed 28KB).
.WAV MS ADPCM-compressed version of the same sounds (43 KB).
.WAV uncompressed version of the same sounds (172 KB).

The stereo sound samples (which are best heard through headphones) are in MP3 and .WAV formats. Click on the corresponding link to download them in the format that you prefer.

Volume fluctuations can be added on top of the basic sounds, to convey the "fine texture" of an area or entity :-

Direct-description sounds of the red circle, with "fine texture" effects (MP3 28 KB).
.WAV MS ADPCM-compressed version of the same sounds (43 KB).
.WAV uncompressed version of the same sounds (172 KB).

These volume-variations combine the effect of small variations in brightness, colour, and distance, to produce a texture effect.

Coded phonetics

Instead of speaking actual words, the system can also output "coded phonetics" : for example, if the consonant sound "NN" represents the colour purple, and the vowel sound "EE" represents white, then a purple and white zigzag can be represented by the sound "NNEE", repeated if necessary, and moved in "sound-space" :-

Coded phonetic sounds of the purple and white zigzag (MP3 28 KB).
.WAV MS ADPCM-compressed version of the same sounds (43 KB).
.WAV uncompressed version of the same sounds (174 KB).

Several of these moving audio tracers can be combined to convey 2-dimensional composite audio graphics :-

Purple and white composite parallelogram

Sounds of the purple and white parallelogram (alternating direction) (MP3 60 KB).
.WAV MS ADPCM-compressed version of the same sounds (87 KB).
.WAV uncompressed version of the same sounds (345 KB).

Property types

The coded phonetics used for the examples are taken from the table below, which gives the consonant (column 2) and vowel (column 3) sounds (shown in 2-letter phonetic format) used to convey properties (column 4 shows example words in which the parts in capitals sound like the corresponding consonant and vowel sounds).

The default property type is called "DInCoTex", as it combines the properties of Distance, INteger, COlour and TEXture onto two scales : the system selects one of the properties shown in column 5 (DInCoTex1) and one of the properties shown in column 6 (DInCoTex2), and presents them via their corresponding consonant and vowel sounds (it selects the two properties that best describe the area or entity being presented). Other property types are also available, for example Layout (shown in column 7 and described below).

Assignments of coded sounds to properties.
(1) Nu- mb- er	(2) Conso- nant (C)	(3) Vow- el (V)	(4) Example (Cs & Vs in caps.)	(5) DInCoTex1 (Cons./C or Vowel/V. Numerical categories are for objects.)	(6) DInCoTex2 (Vowel/V.)	(7) Layout (Dark & Light.)
0	WH	EE	WHEAt	White	White	LLLL
1	SS	IH	SIt	Red	Red	LLLD
2	FF	EH	FEd	Green	Green	LLDL
3	SH	AH	SHAck	Blue	Blue	LLDD
4	HH	EY	HAY	Orange	Orange	DLLL
5	TH	AY	THIGH	Yellow	Yellow	LDLL
6	RR	AW	ROUnd	Brown	Brown	LDDD
7	NN	OY	NOIse	Purple	Purple	DLDD
8	DH	OW	THOUGH	Grey	Grey	DLDL
9	YY	AX	YOUng	Text-like, characters and symbols	Polychrome	LDLD
10	LL	ER	LEARn	Radial or loops / Few : 0-4	Close : <70cm	LDDL
11	MM	YR	MERE	Wavy or wiry / Some : 5-16	Near:0.7-1.5m	DLLD
12	ZH	AR	JARdin	Mainly horizontal / Several : 17-64	Medium:1.5-7m	DDDL
13	VV	OR	VORtex	Mainly vertical / Many : 65-256	Far : 7m-50m>	DDLD
14	ZZ	OO	ZOO	Tessellated / Lots : >256	Remote: >50m	DDLL
15	WW	UH	WOOd	Black	Black	DDDD

By using sequencing rules, the system can convey several property types at once. For example, the coded phonetics can (optionally) also describe the layout of properties within an area or entity, as well as conveying the DInCoTex properties : the consonant sound "DH" can represent the sequence of light levels "dark-light-dark-light", and the vowel sound "OO" can represent "dark-dark-light-light" (see column 7 of the table above). The DInCoTex colours Orange and Yellow are represented by the coded phonetics "HHAY", and these sounds and the Layout sounds can be output in DInCoTex-Layout order ("HHAY-DHOO"), the sounds being formed into the shape of the entity concerned, for example a triangle :-

Coded colour and layout sounds of the orange and yellow triangle (MP3 28 KB).
.WAV MS ADPCM-compressed version of the same sounds (43 KB).
.WAV uncompressed version of the same sounds (172 KB).

Viewport containing layouts and an object

Viewport containing layouts and an object

Recognised entities

Recognised entities can be conveyed by replacing the C&V (consonant & vowel) pairs that represent layouts with more complex consonant groups, which represent the objects. The example opposite shows how a small area of an image (known as a "viewport"), containing layouts and an object, can be conveyed via coded phonetics and (in the tactile modality) via braille.

Tactile effects

The audio effects have tactile equivalents which can be produced by using : a moving powered pointer to convey the location within an image; a tactile pad (or moving powered pointer) to convey shape; and braille or other touch-based methods to convey the categorical properties.

As there are 16 basic categorical speech consonants and 16 vowels, a "C&V pair" conveys one of 256 possible combinations (16 x 16), so the combination of two DInCoTex properties can be displayed on a single 8-dot (i.e. 8-bit) braille cell, with each half of the cell (i.e. 4 dots) conveying one property. Layouts and objects can also be presented in braille format, as illustrated in the example above.

Using both modalities allows the user to spread the load of information to suit their needs and abilities, and could be used by deafblind people.

Summary

The HiFiVE system aims to simulate the way that sighted people perceive visual features, rather than conveying raw optical measurements : scanning methods can convey straightforward images effectively, but information overload may occur if too much detail is conveyed in a short period of time. The HiFiVE approach uses speech-like sounds, consisting of specific coded phonetics that can be rapidly interpreted in a categorical and linguistic way. By continuously changing the pitch and binaural positioning of the sounds, they can be made to move, whether following a systematic path or conveying a specific shape.

Visual perception is complicated, and a degree of complication is inevitable if several aspects of vision are to be substituted via audiotactile means. Although the system may initially appear to consist of several unconnected features, the features can generally operate together, with the user controlling the effect of each feature, as well as the resolution, speed of presentation etc.

The rest of this page describes the HiFiVE system in more detail.

Features of HiFiVE

Audiotactile Tracers

"Audiotactile tracers" are apparently-moving audio and tactile effects that can be in the form of "shape-tracers", which trace out the significant shapes of features and identified objects within an image, by continuously changing the pitch and binaural positioning of the sounds (the "sound space" uses a high = high-pitch / low = low-pitch convention, with a frequency range of 200 to 400 Hz and a musical scale).

Alternatively the tracers can systematically move round an area while outputting the properties of the parts that they are conveying at any moment (these are known as "area-tracers").

Examples of "area-tracer" scanning patterns

(N.B. Area-tracer scanning patterns similar to some of these have been used in systems developed by others - see links below.)

In the tactile modality, tracer location and movement can be conveyed via a moving powered pointer (see below). Moving effects are generally easier to mentally position than stationary ones.

Coded phonetics

People can easily recognise speech-like sounds and rapidly assign meanings to them. Speech is a natural and efficient method of conveying information, it is perceived in a classified/coded way, and the information content is not greatly effected by distortion. Most people are able to retain several spoken words in their short-term memory, including "nonsense" words. The use of natural-language words to describe shapes in an image has been investigated before, but the HiFiVE system uses new "words" assembled from the component sounds of English, which convey information in a coded format. These coded phonetics allow a lot of information to be conveyed in a short period of time, and can convey additional information by being modified in pitch and binaural positioning, so that they become moving tracers. The effort needed to learn the coded phonetics is low.

Visual properties are presented to the user via combinations of 16 consonant (C) and 16 vowel (V) sounds, which are assembled to produce "CV CV ..." strings, that convey the properties via a convention. The user can recognise the sounds instantaneously, in the same way as people recognise language. Certain visual properties, such as colour, tend to be perceived in a categorical way, but properties which are not naturally categorical, for example distance, can be assigned to bands of values.

See the table of "DInCoTex" property assignments above. It shows that, for example, if an area or entity is mainly green and blue, the sounds "FFAH" would be conveyed, while if the system wants to convey the texture "Wavy or wiry" and the distance band "Close" then it outputs the sounds "MMER". Certain special consonants, not shown in the table above, are used to temporarily override the default property type, and to convey more detailed or additional information, for example special colours, more precise distances and numbers, or the presence of recognised entities.

Combotex

The fine detail of an area or entity is conveyed by small, rapid fluctuations in the volume of the speech-sounds. These are referred to as "Combotex" effects, as they combine the effects of small changes in brightness, colour, and distance, to give a single volume-conveyed texture effect. This simulates the effect found in vision whereby the overall properties of an area are perceived categorically, and the minor variations in properties across it are perceived as a general texture. The user need not follow the precise detail conveyed by the Combotex effects, but gets an impression of the general level of fine change occurring in an area.

Direct-description sounds of the red circle, with "Combotex" effects added (MP3 28 KB).
.WAV MS ADPCM-compressed version of the same sounds (43 KB).
.WAV uncompressed version of the same sounds (172 KB).

Viewports

Sections of the full image can be selected via a pointer, so that only those parts are conveyed, and at a higher resolution. These sections are known as "viewports", and the user can instruct a viewport to zoom in to any level of detail, as well as zoom out to convey a low-resolution representation of the whole image.

Viewports can be rectangular, hexagonal or rounded (circular or elliptical) and several viewports can be active at any moment. Viewports can be nested so that a "child" viewport moves within a "parent" viewport. One possible configuration would be to define nested viewports to simulate an eye's macula, fovea and/or areas of focal attention, that can move within a simulated visual field :-

Rounded pseudo-macula and pseudo-fovea moving within an elliptical viewport

However viewports would usually be rectangular, as these are easier to work with, and more straightforward to implement.

There are several possible ways of positioning and moving viewports :-

The movement can be controlled via a pointer, such as a computer mouse or joystick.
The system can automatically move a viewport by systematically stepping around the area covered by its parent viewport, conveying adjacent areas with successive scans.
A viewport can be defined as being change-controlled (see below).

Conveying viewports via coded tracers and Combotex effects

The coded phonetic sound tracers (with Combotex effects added) travel in binaural stereophonic sound space, systematically covering a viewport, to sequentially represent the properties of adjacent parts of the viewport.

The methods by which coded tracers convey the properties in a viewport are described as being :-

"Averaged" (or "coded averaged"), conveying a consolidation of the properties of areas conveyed by the tracers within a viewport during a certain period of time; or as
"Layout" (or "coded arranged"), which uses a similar approach to the Averaged method, but which accurately describes the arrangement of properties in small adjacent parts of an area within a viewport (known as a "panel"). Specific categorical effects represent the patterns of light (or other properties) within a panel.

"Layout" sounds allow the layout of the image to be calculated, while Averaged sounds allow a more intuitive interpretation. Averaged properties and recognised entities can also be conveyed in the form of actual descriptive words ("Direct-description" effects).

Additionally, "Audiotactile Entities" can convey identified objects, unidentified objects, areas with common characteristics, or other features that are to be highlighted within a viewport. Combinations of shape-tracers can convey "composite graphics" :-

Half-textured purple and white composite parallelogram

Coded sounds of the half-textured purple and white parallelogram (alternating direction) (MP3 60 KB).
.WAV MS ADPCM-compressed version of the same sounds (87 KB).
.WAV uncompressed version of the same sounds (345 KB).

Viewport containing layouts and an object

Recognised entities can be highlighted by using more complex consonant groups (and corresponding braille effects), so that the user can immediately know that a particular object is being conveyed, in a similar manner to the way in which objects are instantly recognised in vision.

Coded sounds of the example viewport (MP3 60 KB).
.WAV MS ADPCM-compressed version of the same sounds (87 KB).
.WAV uncompressed version of the same sounds (345 KB).

The System Pulse

The "System Pulse" is a user-controlled period of time (typically between one and four seconds) that specifies the time allowed for conveying the contents of the viewports (the "scan time"). It can be thought of as analogous to musical bar timings. It acts as a "conductor" to maintain the timing of different viewports and keep them synchronised.

The System Pulse must be easy for the user to quickly set and change, so that they can slow down the output when the conveyed image content becomes complex, and speed it up again later. This allows the user to feel in control, and they can set the general output speed to a rate that suits them.

The System Pulse effects the "frame rate" of moving images, the resolution of the conveyed information, and the stepping rate of automatically-moved viewports.

Conveying change

Change output is user-controlled and optional. The user can set the sound level to be relatively quiet when there is little change occurring. The volume rises when the amount of change in an area increases, so drawing the user's attention to it. The volume then gently declines. Items or effects moving around an otherwise stationary viewport will cause the volume to increase in the effected areas.

A viewport can be defined as being change-controlled : the area of maximum change can be indicated by the position of a powered pointer; and when sudden change is detected in one part of the image, the system can move the viewport and centre it on the area of change, with the zoom level set to encompass the change.

Audiotactile Entities

As well as conveying general visual features, the system attempts to simulate the way in which features and objects are perceived in vision. Conveying basic properties does not do much to identify entities, separate figures from the background, or assist with the other processes that occur naturally when people see things.

The simplest features are conveyed via shape-tracers, but composite graphics can also be used. These consist of several shape-tracers which together (either simultaneously or in sequence) convey a single entity (whether recognised or unrecognised).

Coded sounds of the purple and white parallelogram (alternating direction) (MP3 60 KB).
.WAV MS ADPCM-compressed version of the same sounds (87 KB).
.WAV uncompressed version of the same sounds (345 KB).

There are three main types of Audiotactile Entities :-

Audiotactile Graphics, which highlight identified shapes and features within a scene, emphasising the shape and layout of the feature or area, without directly conveying information about it.
Audiotactile Text, which conveys short pieces of text, either text visible in the image or, for pre-processed scenes, other explanatory text.
Audiotactile Objects, which convey identified or semi-identified objects.

The Audiotactile Entity types can be joined together as Audiotactile Structures, which link up related objects and features.

Audiotactile Objects

Audiotactile Objects are items in an image that have been identified to the extent that they can be described as specific entities rather than being described in terms of their properties, shapes and features. They are signified by the presence of a complex consonant in audio mode, and by a special "Object dot" in braille mode. Standard audiotactile objects could include common everyday items, standard shapes that are otherwise difficult to convey, items commonly conveyed by signs and symbols etc. At present audiotactile objects will mainly be used within pre-processed images, but in the future their is some scope for automatic recognition of certain objects.

If the shape of an object is available, then an audiotactile shape-tracer can present the coded object description. As an option it may be better to convey the distinctive "classic" shapes of objects, rather than the outline that happens to be formed by the object at its current distance and orientation, allowing "shape constancy" to be simulated. "Shape constancy" and "size constancy" are the perceptual effects whereby the shapes and sizes of objects are often perceived in a constant way once they are recognised, despite objects changing in distance and orientation within a scene : the shape and size of the tracers can be left constant (but their positions changed) when objects move about.

Structures : links between entities

Identified entities in a scene are often related to one another, either in a hierarchical (parent-child) manner (e.g. wheels fitted to a cart) or via a more general linkage (e.g. horse and cart). With prepared material, complex structures may need to be conveyed, and the components of complex entities can be linked together, sometimes with several child sub-components linked to a single parent component.

Navigating structures

When an entity is present in a viewport, the user can select the special Structure mode, whereupon the system ceases conveying the viewport, and only conveys the entity, along with some or all of the entities to which it is linked. (If several structures were present in the viewport, then the system locks on to the entity that was being conveyed when Structure mode was selected.)

Tactile effects

Most of the audio features have tactile equivalents :-

The categorical properties, layouts and classified objects can be displayed on 8-dot (i.e. 8-bit) refreshable braille cells, in a compact format.
"Categorical textures" can be used to convey the 16 basic consonant and vowels in a simpler and more intuitive format than braille.
The position of a viewport can be controlled by the position of a moving powered pointer, which gives proprioceptive cues about the viewport's location.
Shapes (as conveyed by shape-tracers) can be conveyed on a tactile pad, or via the moving powered pointer.

A force-feedback joystick makes an effective pointing device with which to indicate areas of the image, as it can also be programmed to tend to position the viewport in one of a number of set adjacent positions, so that a "notchy" effect is experienced as the viewport is moved. A force-feedback joystick can also be moved by the system, pushing and pulling the user's hand and arm both to convey shapes (by tracing them out), and to indicate a position in space. A force-feedback joystick can convey tactile effects equivalent to the Combotex volume flutter, and can move a viewport to an area of change, so drawing the user's attention to it. Conducted sequences can be developed, where the user is lead round a prepared image or movie sequence.

A possible design for a device which combines these tactile facilities is the multi-purpose Tactile Output and Input Device ("TactOID") illustrated below :-

The device is shown in desktop form, but could be attached to a body-worn framework for mobile use. The TactOID has a multi-functional hand-set which contains control buttons, a braille display, and a tactile palm-pad.

Image pre-processing

The system could convey a prepared programme of material. Pre-processed images allow the best methods for conveying an image to be pre-selected. A sighted designer, with the help of appropriate software, could define features and areas of an image (perhaps by selecting and modifying areas indicated by edge-detection software, which is readily available), and specify the most appropriate methods of conveying them. The designer could assemble conducted sequences which specify the order in which the image features are presented.

The entity and conducted sequence information could be embedded in the image pixels, using steganography, so that the images can also be viewed normally using standard equipment and software. Images and movie sequences prepared in this way could be transmitted through currently available media, for example via compact discs, the Internet or broadcasts, enabling pre-processed sequences to be combined with otherwise standard video material. (For broadcast television, entity and conducted sequence data could be included in the lines often used for teletext data.)

Summary

If implemented, the HiFiVE system would allow a continuum of features, from basic visual properties, to fully-recognised objects, to be conveyed to blind (and deafblind) users. The system could be implemented as a dedicated portable electronic device, or in the form of hardware and software installed on a personal computer.

External links :-

There are other people researching the use of sound and touch to convey images. Peter Meijer has already developed "The vOICe", which conveys images through sound. A different approach is used by "KASPA" (developed by Prof. Leslie Kay), which uses ultrasonics to convey the location and texture of objects. Prof. Phil Picton is developing a real-time "Optophone". Dan Jacobson's "Haptic Soundscapes" site describes a project to develop a tool to allow access to spatial information without vision.

Current & future developments

I'll be making some minor changes to the conventions described on this page, but the general approach remains the same. Please email me if you want to know more.

The HiFiVE System website is maintained by David Dewhurst. Any enquiries or feedback should be sent to info@hfve.com.

Home

Software

Research

Try HFVE!

About

HFVE research

Home

Software

Research

Try HFVE!

About

ISon 2013 & 2016 poster/demo sessions, and ICAD 2019 paper presentation

ORIGINAL CONCEPT (now somewhat superseded!)

Introduction

Coded phonetics

Property types

Recognised entities

Tactile effects

Summary

Features of HiFiVE

Audiotactile Tracers

Coded phonetics

Combotex

Viewports

Conveying viewports via coded tracers and Combotex effects

The System Pulse

Conveying change

Audiotactile Entities

Audiotactile Objects

Structures : links between entities

Navigating structures

Tactile effects

Image pre-processing

Summary

External links :-

Current & future developments

Copyright © 2026 by David C. Dewhurst. All rights reserved. V7.50. Updated 21-JAN-2026.