Column | Yang Ming of Horizon Robotics: Some New Trends in Deep Learning Research


jiqizhixin· 2016-04-29 05:59:18



Horizon Robotics, founded by Yu Kai, the former founder of the Baidu Institute of Deep Learning (IDL), is committed to innovation in the field of artificial intelligence. The company's vision is to define "smart things" and make life more convenient, more interesting, and safer. Yang Ming is a co-founder and vice president of software at Horizon Robotics.

2016 marks the 60th anniversary of the birth of artificial intelligence. On April 22, the 2016 Global Artificial Intelligence Technology Conference (GAITC) and the launch ceremony for the 60th anniversary of artificial intelligence were held at the National Convention Center in Beijing. About 1,600 experts, academics, and industry professionals attended.


The conference presentations were chaired by Yu Kai, deputy secretary-general of the Chinese Association for Artificial Intelligence and founder and CEO of Horizon Robotics. Dr. Yang Ming, co-founder and vice president of software at Horizon Robotics, delivered a keynote speech titled "Some New Trends in Deep Learning Research."

Other speakers included Li Deyi, president of the Chinese Association for Artificial Intelligence and academician of the Chinese Academy of Engineering; Su, research director for big data and cognitive computing at IBM Research China; Xu, distinguished scientist at the Baidu Institute of Deep Learning; Chen Tianshi, founder and CEO of Cambrian Technology; Liang Jia'en, chairman and CTO of Beijing Unisound Information Technology Co., Ltd.; Lei, director of the Toutiao (Today's Headlines) laboratory; and Jeng Wang, senior technical expert at Alibaba.

Yang Ming, co-founder and vice president of software, Horizon Robotics

Thank you, Yu Kai, for the introduction. Hello, everyone. I'm Yang Ming. It is a great honor to have this opportunity to share some thoughts on, and a summary of, the new trends in deep learning research, which we abbreviate as a single word: "MARS". This summary grew out of discussions with my colleague Dr. ...


To introduce myself briefly: I joined Horizon Robotics last summer, where I am responsible for software engineering. Before that I worked at the Facebook artificial intelligence laboratory on face recognition algorithms and back-end system development, and earlier at NEC Laboratories America, where I worked with Xu Wei and learned a great deal.



Before talking about new trends in deep learning, we should first clarify what deep learning is and where it currently stands. Fortunately, academia has a fairly clear consensus on the definition. Deep learning obtains a representation, or description, of data from the raw data through repeated learning and repeated abstraction. Put simply, deep learning learns representations from raw data (representation learning). The raw data may be images, speech, or text; the representation is simply a numerical one. The key to deep learning is how to learn this representation. It is obtained through a multi-layer nonlinear structure, which may be a neural network or some other structure. The aim is to learn the representation directly from the data through end-to-end training.


To talk about the origin of deep learning, we have to go back to 1957 and a very simple structural unit, the perceptron. Input signals are multiplied by weights and summed, and the sum is compared with a threshold to produce the output. Why call this the origin of deep learning? Because the weights are not designed in advance by rules; they are learned through training. The first perceptron was built in hardware: the connections were physical wires, and the weights were probably realized by adjusting resistances. The media at the time predicted that this was the prototype of an intelligent computer that would quickly learn to walk, talk, read pictures, and write, and even replicate itself or become self-aware. Sixty years later, we are at the middle stage, reading pictures and writing; I hope self-replication is at least another 60 years away.
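The weighted sum, the threshold, and the idea that weights are trained rather than hand-designed can be sketched in a few lines. This is a minimal illustration in software, not a reconstruction of the 1957 hardware; the function names, the learning rate, and the logical-AND training data are assumptions chosen for the example.

```python
def predict(weights, bias, x):
    """Weighted sum of the inputs compared against a threshold of zero.

    The threshold is folded into `bias`, an assumption made for brevity.
    """
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Learn the weights from labeled samples with the perceptron update rule."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            # error is -1, 0, or +1; weights change only when the output is wrong
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Hypothetical training set: the logical AND of two binary inputs.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_SAMPLES)
outputs = [predict(weights, bias, x) for x, _ in AND_SAMPLES]
print(outputs)  # [0, 0, 0, 1]
```

A single unit like this can only separate classes with a straight line, which is one of the "very simple problems" that early networks could not get past.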

The ups and downs of deep learning

Since it first appeared, deep learning has gone through roughly two ups and downs. Each time, everyone was very optimistic at first but soon found that there were some very simple problems it could not solve. Starting in 2006, driven by Professors Hinton, LeCun, Bengio, and Ng, deep learning has developed explosively, producing many breakthroughs in image recognition, speech recognition, semantic understanding, and advertising recommendation. The most recent advance was AlphaGo's Go match in March this year, which let the public feel the progress of deep learning in a very intuitive way. We hope that in another five years deep learning technology will truly be part of the daily lives of ordinary households, so that every device can run a deep learning module.


Through these ups and downs, the basic learning method and network structure of deep learning have not essentially changed: it is still a multi-layer artificial neural network. As shown in this picture, the input layer takes the raw data, which is labeled. Whatever is to be learned, as long as there is an error function that measures how wrong the network is, then given inputs and outputs, a deep neural network can learn the target as a black box. Structurally, an artificial neural network is multiple layers of neurons and the connections between them. At the start there is an input and a target; for example, you want to identify a person from a face image. At this point the network certainly cannot recognize the person, because it has never seen them. We give the network a random set of parameter values and let it predict a result; at first the output layer will almost certainly give a wrong answer. That does not matter: we propagate the error at the output layer backward, layer by layer, and modify the connection parameters between the neurons a little at a time. By correcting the network repeatedly with a large amount of data, the network comes to compute a very complex function. From the 1980s to the present, some 30 years, this basic structure and learning algorithm have not changed.
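The training loop described here (random initial parameters, a wrong first prediction, the output error passed backward, small corrections repeated over the data) can be sketched with a tiny two-layer network. Everything specific below is an assumption for illustration: the XOR toy data stands in for face images, and the layer size, learning rate, squared-error function, and seed are arbitrary choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Run one input through a hidden layer and a single output neuron."""
    w1, b1, w2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, output


def mean_squared_error(params, data):
    """The error function used to evaluate the network."""
    return sum((forward(params, x)[1] - t) ** 2 for x, t in data) / len(data)


def train(data, hidden_units=3, steps=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Start from random connection weights, as in the talk.
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden_units)]
    b1 = [0.0] * hidden_units
    w2 = [rng.uniform(-1, 1) for _ in range(hidden_units)]
    b2 = 0.0
    initial_loss = mean_squared_error((w1, b1, w2, b2), data)
    for _ in range(steps):
        for x, target in data:
            hidden, output = forward((w1, b1, w2, b2), x)
            # Error signal at the output for the loss 0.5 * (output - target)^2.
            delta_out = (output - target) * output * (1 - output)
            # Pass the error back to each hidden neuron (chain rule).
            delta_hidden = [delta_out * w2[j] * hidden[j] * (1 - hidden[j])
                            for j in range(hidden_units)]
            # Nudge every connection a little in the direction that lowers the error.
            for j in range(hidden_units):
                w2[j] -= lr * delta_out * hidden[j]
                b1[j] -= lr * delta_hidden[j]
                for i in range(n_in):
                    w1[j][i] -= lr * delta_hidden[j] * x[i]
            b2 -= lr * delta_out
    return (w1, b1, w2, b2), initial_loss


# Hypothetical toy data: XOR, which a single perceptron cannot represent.
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

params, initial_loss = train(XOR_DATA)
final_loss = mean_squared_error(params, XOR_DATA)
print(f"loss before training: {initial_loss:.3f}, after: {final_loss:.3f}")
```

The network never sees a rule for XOR; it only sees inputs, targets, and the error, which is the black-box property the paragraph describes.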


Deep learning has grown explosively since 2006 for the following reasons. The first is the use of massive amounts of data, which made the original problems of deep neural networks much less of an issue (such as sensitivity to noisy data, or performing very well on a small dataset but failing to generalize to a large one). Learning from such big data requires very high parallel computing capability. There have of course also been algorithmic improvements, such as batch normalization, residual networks, and dropout, which help avoid overfitting and vanishing gradients. But in essence the explosion of deep learning was made possible by the development of big data and computing power. People used to say that a neural network is like a black box and that there is little good guidance for designing its structure; unfortunately, that is still the case today.


Why has deep learning received so much attention in the last few years? The key reason is that its accuracy keeps increasing as the amount of data increases. With other machine learning methods, performance may rise with more data up to a certain point and then saturate. So far no such saturation has been observed for deep learning, which is probably one of its most noteworthy properties. Deep learning has already achieved many successes, for example in image classification. On a 1,000-class image classification benchmark, the error rate fell from 25% to 3.5% in less than five years, which is already better than the average person's recognition accuracy. This is the main success we have achieved so far with deep neural networks: learning how to recognize and how to classify.


New trends in deep learning research

Back to our topic: what are the new trends in deep learning research? We summarize four directions.

First, learning how to remember (memory networks);

Second, learning how to focus (attention models), concentrating on the details that matter;

Third, reinforcement learning, learning how to control;

Fourth, a new trend in the overall structure of learning tasks: sequentialization.


The first is learning how to remember. Conventional neural networks have one characteristic: the relationship between input and output is fixed. For a given image, whenever it is fed into the network, the layer-by-layer computation produces the same result, regardless of context. How can we introduce memory into a neural network? The simplest idea is to add some state to the network so that it can remember a little. Its output then depends not only on its input but also on its own state. This is the basic idea of the recurrent neural network. Because the output depends on the network's own state, the network can be unrolled into a sequential structure: the input at the current step includes not only the current input but also the output of the previous step, which forms a very deep network. This lets the network remember some of its earlier states, and the output depends on the combination of those states and the current input. But this method has a limitation: the memories do not last long and are soon washed away by later data.

A later development is long short-term memory (LSTM), which introduced the concept of a memory cell with three gates: an input gate, an output gate, and a forget gate. The input gate controls whether the input affects the content of the memory. The output gate controls whether the memory is output and affects what comes later. The forget gate decides whether the memory is kept as it updates itself. In this way the memory can be retained flexibly, and how the gates control the memory is itself learned, separately for each task. The long short-term memory unit was proposed in 1999. In recent years there have been further improvements such as the gated recurrent unit, which is reduced to only two gates, an update gate and a reset gate, that control whether the memory is retained.
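To make the three gates concrete, here is a minimal sketch of one step of a single-unit memory cell, written with the commonly used LSTM equations. The talk does not give formulas, so the exact form is an assumption, and the parameter values are set by hand (not learned) purely to show what the forget gate does.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell.

    `params` maps each gate name to (input weight, recurrent weight, bias).
    """
    def gate(name, squash):
        w_x, w_h, b = params[name]
        return squash(w_x * x + w_h * h_prev + b)

    i = gate("input", sigmoid)         # does the new input enter the memory?
    f = gate("forget", sigmoid)        # is the old memory kept?
    o = gate("output", sigmoid)        # is the memory exposed as output?
    g = gate("candidate", math.tanh)   # proposed new memory content
    c = f * c_prev + i * g             # updated memory cell
    h = o * math.tanh(c)               # output passed to the next step
    return h, c


def run(inputs, params, c0=1.0):
    h, c = 0.0, c0
    for x in inputs:
        h, c = lstm_step(x, h, c, params)
    return h, c


def biased_params(input_bias, forget_bias):
    """Hypothetical hand-set parameters: the gates ignore x and h, only the bias matters."""
    return {
        "input": (0.0, 0.0, input_bias),
        "forget": (0.0, 0.0, forget_bias),
        "output": (0.0, 0.0, 0.0),
        "candidate": (1.0, 0.0, 0.0),
    }


inputs = [0.5, -1.0, 2.0] * 10  # 30 steps of arbitrary input

# Forget gate open, input gate closed: the initial memory of 1.0 survives.
_, kept = run(inputs, biased_params(input_bias=-10.0, forget_bias=10.0))
# Forget gate closed: the memory is wiped almost immediately.
_, erased = run(inputs, biased_params(input_bias=-10.0, forget_bias=-10.0))
print(f"kept: {kept:.3f}, erased: {erased:.3f}")
```

In a trained network the gate weights are learned from the task, so the cell itself decides when to hold on to a memory and when to let it go.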


These methods do preserve memory a little longer, but it is still limited. More recent research proposed the neural Turing machine: a persistent memory module plus a control module that decides, based on the input, how to read from and write to the memory and how to turn it into output. The control module can be implemented with a neural network. Take sorting as an example: there is a sequence of numbers in random order, and we want to turn it into an ordered sequence. Previously we had to design different sorting algorithms by hand; the idea of the neural Turing machine is that we give only the inputs and outputs and let the neural network learn how to store and retrieve the numbers so that they come out sorted. In a sense, the neural network learns how to program the task itself. A similar line of work is the memory network, which learns to manage long-term memory of this kind; applied to question answering systems, it can learn some ability to reason.
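The memory module can be pictured as a table of rows that the controller reads and writes "softly", through a weighting over rows, which keeps every step differentiable. The sketch below uses the weighted read and the erase-then-add write commonly associated with the neural Turing machine; the talk does not spell these out, so treat the formulas as an assumption. The weightings are supplied by hand here, whereas in the real model the controller network produces them.

```python
def read(memory, weights):
    """Weighted sum of memory rows: r = sum_i w[i] * M[i]."""
    width = len(memory[0])
    return [sum(w * row[k] for w, row in zip(weights, memory))
            for k in range(width)]


def write(memory, weights, erase, add):
    """Erase then add, scaled per row: M[i] = M[i] * (1 - w[i] * e) + w[i] * a."""
    return [
        [cell * (1.0 - w * e) + w * a for cell, e, a in zip(row, erase, add)]
        for w, row in zip(weights, memory)
    ]


# Hypothetical 3-row, 2-column memory, initially empty.
memory = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

# Store the vector [3.0, 4.0] in row 1 by putting all write weight there.
memory = write(memory, [0.0, 1.0, 0.0], erase=[1.0, 1.0], add=[3.0, 4.0])

focused = read(memory, [0.0, 1.0, 0.0])
blended = read(memory, [0.5, 0.5, 0.0])
print(focused)  # [3.0, 4.0]
print(blended)  # [1.5, 2.0]
```

Because rows outside the weighting are left untouched, stored values persist until the controller chooses to overwrite them, which is what lets the memory outlast a recurrent state.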


The second direction is the attention model, which dynamically focuses on certain details to improve recognition performance. Take image captioning and image understanding: you can generate a sentence from a picture, but it is likely to be very general. If we introduce an attention mechanism into the recognition process and, based on what has been recognized so far, dynamically adjust step by step which image details to focus on, we can generate a more reasonable or more fine-grained description. For example, once attention has picked out a flying saucer in the image, we can adjust the region of interest to look at the flying saucer, extract its features for recognition, and obtain a more accurate description of the image.
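One common way to implement this focusing is soft attention: score each image region against the current recognition state, turn the scores into weights with a softmax, and use the weighted mix of region features for the next step. The sketch below assumes dot-product scoring, and the region names and feature vectors are made up for illustration.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, query):
    """Weight each region by how well it matches the query, then mix the features."""
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in region_features]
    weights = softmax(scores)
    context = [sum(w * feat[k] for w, feat in zip(weights, region_features))
               for k in range(len(query))]
    return weights, context


# Hypothetical features for three regions of one image.
regions = {
    "sky": [5.0, 0.0, 0.0],
    "flying saucer": [0.0, 5.0, 0.0],
    "grass": [0.0, 0.0, 5.0],
}
names = list(regions)

# A query standing in for "what the recognizer wants to look at next".
weights, context = attend(list(regions.values()), query=[0.0, 1.0, 0.0])
focus = names[weights.index(max(weights))]
print(focus)  # flying saucer
```

In a captioning model the query would come from the state of the sentence generator, so the focus shifts from region to region as each word is produced.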


The third is reinforcement learning. The reinforcement learning framework has two parts: an autonomous control unit (the agent) and the environment. The agent chooses among different strategies or actions in the hope of maximizing its long-term expected return, that is, its reward; the environment receives the action, updates its state, and feeds back the reward. Within the agent there are again two parts: how to choose actions (the policy function) and how to estimate the return those actions are likely to bring (the value function). The reinforcement learning framework itself has existed for many years. Combining it with deep learning means that the function that chooses the action and the function that evaluates the expected reward are both learned by deep neural networks, for example the policy network and the value network in AlphaGo.
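The agent and environment loop, together with the value function and the policy derived from it, can be sketched on a toy problem. The example below uses tabular Q-learning on a five-cell corridor, a deliberately simple stand-in: it is not AlphaGo's method, and the environment, reward, learning rate, and discount factor are all assumptions. A deep reinforcement learning system would replace the table with a neural network.

```python
import random

GOAL = 4            # rightmost cell of a hypothetical 5-cell corridor
ACTIONS = (-1, 1)   # step left, step right


def environment_step(state, action):
    """The environment receives an action, updates the state, and returns a reward."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def learn_values(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Estimate the value of each (state, action) pair from experience."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # explore by acting at random
            next_state, reward, done = environment_step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward plus discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def policy(q, state):
    """Policy function: pick the action with the highest estimated value."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q = learn_values()
greedy_actions = [policy(q, s) for s in range(GOAL)]
print(greedy_actions)
```

After training, the greedy policy should step right from every cell, since that is the shortest route to the only reward.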


To sum up: from a research perspective, deep learning is moving gradually from supervised learning toward interactive learning. Network structure is moving from feed-forward networks to recurrent networks that take memory and timing into account. At the same time, the input is moving from static to dynamic, and prediction is moving from a single one-shot prediction to step-by-step, sequential prediction. In other words, many things that used to be done concurrently in deep learning are now moving toward sequentialization. The biggest benefit is that much of the redundancy in a concurrent model can be removed by exploiting the associations between earlier and later steps, while the key information is preserved and used well. Judging from the developments of 2014 and 2015, a very simplified version of the current approach in deep learning is this: when facing a relatively new problem, first describe the problem so that every step of the process from the input to the final objective is differentiable, then take the part that is hardest to describe with rules, fit it with a deep neural network, and train end to end. The new trends mentioned above all still follow this idea.


Whether among the public, the media, or researchers ourselves, people may understand deep learning from different angles. Personally, I think it is a very pure computational problem within computer science: exploring how to better understand and abstract the nature and structure of data. I hope these new trends in deep learning research are of some help and reference to you. Thank you all!
