Disney is also studying artificial intelligence, and is using AI for animation
leifengwang · 2017-08-19 04:36:58
Lei Feng Network AI Science and Technology Review: As is well known, Carnegie Mellon University is among the best in computer science research, and Disney intends to bring computer science techniques into animation production. Disney's research lab, in collaboration with Carnegie Mellon University, recently published a paper, "A Deep Learning Approach for Generalized Speech Animation," which uses deep learning to generate natural-looking speech animation. The paper has been accepted at SIGGRAPH 2017.
The authors introduce a simple and effective deep learning method that automatically generates natural-looking speech animation synchronized with input speech. Using a sliding-window predictor, the method learns an arbitrary nonlinear mapping from phoneme label input sequences to mouth movements, accurately capturing natural motion and visual coarticulation effects.
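As a rough illustration of the sliding-window idea (this is not the paper's actual architecture; the phoneme inventory, window length, network size, and output dimensionality below are all made-up values), a predictor of this kind slides a fixed-length window over the phoneme label sequence and maps each window to the mouth-shape parameters of the center frame:

```python
import numpy as np

PHONEMES = ["sil", "AA", "B", "K", "M", "OW"]  # toy phoneme inventory (illustrative)
WINDOW = 5    # phoneme frames per input window, centered on the target frame
OUT_DIM = 3   # mouth-shape parameters per frame (e.g. jaw open, lip rounding, ...)

def one_hot(label):
    """Encode a phoneme label as a one-hot vector."""
    v = np.zeros(len(PHONEMES))
    v[PHONEMES.index(label)] = 1.0
    return v

def sliding_windows(labels, window=WINDOW):
    """Pad with silence and yield one fixed-length input window per frame."""
    half = window // 2
    padded = ["sil"] * half + list(labels) + ["sil"] * half
    for i in range(len(labels)):
        yield np.concatenate([one_hot(p) for p in padded[i:i + window]])

class TinyMLP:
    """Stand-in for the deep network: one hidden layer with random weights,
    just to show the input/output shapes of the regression."""
    def __init__(self, in_dim, hidden=16, out_dim=OUT_DIM, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, out_dim))

    def predict(self, x):
        return np.tanh(x @ self.w1) @ self.w2

phoneme_seq = ["sil", "B", "AA", "K", "sil"]
model = TinyMLP(in_dim=WINDOW * len(PHONEMES))
frames = np.stack([model.predict(w) for w in sliding_windows(phoneme_seq)])
print(frames.shape)  # one mouth-shape vector per input phoneme frame
```

Because each window sees a few phonemes of left and right context, the mapping can depend on neighboring sounds, which is what allows coarticulation effects to be learned rather than hand-coded.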
The method has several attractive features: it runs in real time, requires only a few parameters to be tuned, generalizes well to new input speech sequences and new voices, is easy to edit to create stylized and emotional speech, and is compatible with existing animation retargeting methods.
Disney's lab says one focus of their work is to develop a method that can efficiently produce speech animation and be integrated easily into existing production pipelines. Their paper describes this end-to-end approach, including the machine learning design decisions behind it. The paper demonstrates a wide range of speech animation results, including singing and foreign-language input, across different characters and voices in animated clips. The method can also generate speech animation on the fly from a user's speech input.
Lei Feng Network's AI Science and Technology Review compiles part of the paper as follows:
Speech animation is an important and time-consuming part of producing realistic character animation. Broadly speaking, speech animation is the task of moving the facial features of a graphics (or robotic) model so that lip movements are synchronized with the produced sound, creating the impression of speech. As humans we are experts at reading faces, and poor speech animation can be distracting, unpleasant, and confusing. For example, when the mouth shape and the sound are inconsistent, the viewer sometimes perceives a different sound entirely (McGurk and MacDonald, 1976). For believable character animation, high-fidelity speech animation is critical.
Traditional speech animation methods currently used in film and video game production tend toward two extremes. At one extreme, high-budget productions typically employ performance capture or large teams of professional animators, which is costly and hard to scale; for example, there is no cost-effective, efficient way to produce high-quality speech animation across a wide variety of languages. At the other extreme, low-cost, content-heavy productions may use a simple lip-shape library to quickly produce relatively low-quality speech animation.
Recently there has been growing interest in developing data-driven methods for automatically generating speech animation, seeking a compromise between these two extremes (De Martino et al., 2006; Edwards et al., 2016; Taylor et al., 2012). Previous work, however, requires a predefined set of a finite number of lip shapes that must be blended, and the simple blending functions limit the complexity of the visual speech dynamics that can be modeled. We therefore aim to use modern machine learning methods to learn the complex dynamics of visual speech directly from data.
In this paper, we propose a deep learning method for automatically generating speech animation, which provides a cost-effective and efficient means of producing high-fidelity speech animation at scale. For example, we use a film-effects-quality face model with more than 100 degrees of freedom to create realistic speech animation. One focus of our work is to develop an efficient speech animation method that integrates seamlessly into existing production pipelines. We use a continuous sliding-window deep learning predictor, inspired by a 2015 paper by Kim et al. The sliding-window approach means the predictor can represent the complex nonlinear regression between the input phonetic description of continuous speech and the output animation, including context and coarticulation effects. Our results show that our deep neural network improves on Kim et al.'s earlier decision-tree method.
Using overlapping sliding windows focuses learning more directly on capturing the local context and coarticulation effects, making this approach better suited to predicting speech animation than traditional sequence learning methods such as recurrent neural networks and LSTMs (Hochreiter and Schmidhuber, 1997). One of the major challenges in applying machine learning is properly defining the learning task (for example, choosing the inputs, outputs, and training set) in a way that serves the desired end goal. Our goal is to enable animators to easily integrate high-fidelity speech animation onto any rig, for any speaker, while keeping the result easy to edit and stylize.
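When a predictor emits a short window of animation frames per step, consecutive windows overlap, and the overlapping predictions must be combined into one smooth trajectory. A simple way to do this is to average the overlapping frames, as sketched below (the blending scheme, window length, and step size here are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

def blend_overlapping(windows, step):
    """Combine overlapping per-window predictions into one trajectory.

    windows: list of (win_len, dim) arrays, each starting `step` frames
    after the previous one. Frames covered by several windows are averaged.
    """
    win_len, dim = windows[0].shape
    total = step * (len(windows) - 1) + win_len
    acc = np.zeros((total, dim))    # summed predictions per output frame
    count = np.zeros((total, 1))    # how many windows cover each frame
    for k, w in enumerate(windows):
        start = k * step
        acc[start:start + win_len] += w
        count[start:start + win_len] += 1
    return acc / count

# Three overlapping constant windows; overlapping frames average their values.
wins = [np.full((4, 2), v, dtype=float) for v in (0.0, 1.0, 2.0)]
traj = blend_overlapping(wins, step=2)
print(traj.shape)  # 8 output frames, 2 parameters each
```

Averaging the overlaps smooths the transitions between windows, which is one reason overlapping-window predictors can produce continuous-looking motion without an explicit recurrent state.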