COMPUTING PROJECT (CMPC3P2Y)

Initial Project Proposal

1 Background and Motivation

It is possible to automate the animation of a talking character directly from a speaker’s speech signal. The result, however, is somewhat unrealistic, because the character appears to lack emotion: unless an emotional state is applied manually, the character has none whilst speaking.

The ability to detect a person’s emotional state from their voice has potential uses in a number of areas. Two examples, from health and security, are given here.

First, it is known that people with autism have reduced connectivity in the default network area of the brain [Martino et al., 2009]. This area is responsible for social and emotional processing, and as a result autistic people have problems identifying and displaying emotion [Ashwin et al., 2006]. Autistic children are already taught to recognise emotion from facial expression. With the ability to detect the emotion of a voice, autistic children would also be able to see the emotional state, and the strength of that state, that other people would perceive in them as they speak. It could then be discussed with the child whether this behaviour is appropriate for the situation. This would not only improve an autistic person’s ability to recognise a speaker’s emotional state in everyday scenarios, but also extend their emotion detection ability to scenarios where face-to-face communication is not possible, such as a telephone conversation.

Secondly, there is potential for applications in the area of security. It is now well established that when a person is being deceitful, by lying or hiding information, they become stressed and exhibit signs of this through spontaneous, subconscious changes, including changes in their emotional state. It may be possible to detect these changes, and thus indicate that a person may not be entirely truthful.

2 Aims

This project aims to review and evaluate existing methods for detecting emotional state from a voice. Features will be extracted from acoustic speech signals and correlated with listener perception of emotional state, in order to develop a system that detects the emotional state of a speaker. The accuracy of this system will be compared with existing state-of-the-art approaches published in the literature.

3 Methods

Previous approaches have used various acoustic features to suggest the emotional state of the speaker. These include the amplitude variation, pitch variation, pitch contour, pitch level, intensity, tempo, envelope, number of harmonics and duration of an acoustic signal, as well as some spectral analysis of the signal [Rong et al., 2007] and [Scherer, 1995].

Volunteers will be recorded reading a sentence in a neutral state, as well as in the emotional states of happiness, anger, sadness and surprise, the same emotional states used by Cai [Wang et al., 2003]. Additional volunteers will then be played the recordings and asked to choose, from a given set, the emotion that most closely matches the perceived emotion of the recording, and to rate the strength of that emotion on a 1-5 scale. A Web-based survey will be used for this purpose, maximising the potential number of volunteers who can take part. The results of this survey will be used to produce a report for each recording, showing the most frequently perceived emotion and the mean and standard deviation of the strength of the emotion.

MATLAB will be used for signal processing and analysis. A MATLAB script will be produced to automate the process of detecting the emotional state of a speaker from a recording of his or her speech. The effectiveness of this script can be tested by comparing its output (the emotional state of an input recording as detected by the script) with the most frequently perceived emotion of that recording (as found by the earlier survey).
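
The survey report and the evaluation step described under Methods could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: it is written in Python rather than the MATLAB named above, the function names (summarise_recording, agreement) and the sample data are hypothetical, the strength statistics are assumed to be taken over the listeners who chose the most frequent emotion, and the sample (n-1) standard deviation is assumed.

```python
from collections import Counter
from statistics import mean, stdev


def summarise_recording(responses):
    """Summarise listener responses for one recording.

    responses: list of (emotion, strength) pairs, strength on the 1-5 scale.
    Returns the most frequently perceived emotion, plus the mean and sample
    standard deviation of the strengths given by listeners who chose it
    (an assumption; the proposal does not say which ratings are pooled).
    """
    counts = Counter(emotion for emotion, _ in responses)
    # Ties go to the emotion encountered first (an arbitrary choice).
    top_emotion = counts.most_common(1)[0][0]
    strengths = [s for emotion, s in responses if emotion == top_emotion]
    return {
        "emotion": top_emotion,
        "mean_strength": mean(strengths),
        # stdev needs at least two values; report 0.0 for a single rating.
        "sd_strength": stdev(strengths) if len(strengths) > 1 else 0.0,
    }


def agreement(detected, summaries):
    """Fraction of recordings whose detected emotion matches the survey."""
    matches = sum(
        1 for rec, label in detected.items()
        if summaries[rec]["emotion"] == label
    )
    return matches / len(detected)


# Made-up data for illustration only: two recordings, three listeners each.
survey = {
    "rec1": [("anger", 4), ("anger", 5), ("sadness", 2)],
    "rec2": [("happiness", 3), ("happiness", 3), ("surprise", 1)],
}
summaries = {rec: summarise_recording(r) for rec, r in survey.items()}

# Hypothetical output of the detection script for the same recordings.
detected = {"rec1": "anger", "rec2": "surprise"}
score = agreement(detected, summaries)
```

Here score is the proportion of recordings on which the detector and the listeners agree. The same per-recording summary could equally be produced in MATLAB; only the aggregation logic is the point of the sketch.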

4 Risks and Open Issues

There is a chance that distinguishing between many different emotional states will prove too difficult or unreliable. In this case, a reduced set of emotional states will be chosen, consisting of those states that have the most distinguishing features. That is, if two emotional states display similar characteristics, and so prove difficult to distinguish, one of the two may be dropped.

As with any manual process, there is always a risk of human error, whether deliberate or otherwise, and so results from the survey will be cleared of any outliers and anomalies.

Additionally, there is a risk of getting too few volunteers to take part in the evaluation. This may be rectifiable by emailing fellow university students via the Head of School to ask for their cooperation. Finding enough volunteers to record may also prove difficult. In this case, the project may have to rely on a reduced set of recordings, with the understanding that this will have a negative effect on the ability to extrapolate the findings of this project.

As each person speaks differently, it may prove difficult to determine a speaker’s emotion without a neutral-emotion speech waveform for comparison. For example, if it is found that excited speakers speak more quickly, a fast-speaking voice recording does not necessarily show that the speaker is excited; the speaker may simply always speak more quickly than average. If this is found to be the case, the project will initially focus on speaker-specific emotion detection.

References

[Ashwin et al., 2006] Ashwin, C., Chapman, E., Colle, L., and Baron-Cohen, S. (2006). Impaired recognition of negative basic emotions in autism: A test of the amygdala theory. Psychology Press.
David J. Claxton 3832481
Copyright David J. Claxton 2009. All rights reserved.