Teaching neural networks to point to improve language modeling and translation
Imagine you were a young child and wanted to ask about something. Being so young (and assuming you are not exceedingly precocious), how would you describe a new object, the name of which you have yet to learn? The intuitive answer: point to it!
Surprisingly, neural networks have the same issue. They typically use a pre-defined vocabulary on the order of hundreds of thousands of words. Even so, these pre-defined vocabularies cannot capture all the words we use in natural language! Inevitably, new and rare words arise that the networks struggle to understand and use properly. To remedy this, we introduce the pointer sentinel mixture model, which allows neural networks to 'point' to relevant, recently observed words. This can help in machine translation and language modeling because rare words (e.g. uncommon names), taken together, come up fairly often.
In the example above, an RNN is trying to guess the next word after seeing a set of sentences like "Fed Chair Janet Yellen ... raised rates. Ms. ???".
At the bottom: The neural network tries to guess using the fixed set of words it has in its vocabulary. This knowledge may be more related to the distant past, such as old references or training data involving the previous Fed Chair Ben Bernanke.
At the top: The pointer network is able to look at the story's recent history. By looking at the relevant context, it realizes that 'Janet Yellen' will likely be referred to again. Having also recognized that 'Janet' is a first name and that a last name is needed, it points to 'Yellen'.
By mixing the two information sources, first "pointing" to recent relevant words using context and then falling back on the RNN's internal memory and vocabulary when there's no good context, the mixture model is able to give a far more confident answer.
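The mixing step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation behind the results: the function name, the toy vocabulary, and all scores are made up for the example. It also assumes that the gate between the two sources is the softmax mass assigned to an extra "sentinel" slot appended to the pointer scores, as the model's name suggests.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def pointer_sentinel_mixture(vocab_logits, pointer_scores, sentinel_score, context_ids):
    """Mix a vocabulary softmax with a pointer over recent context words.

    All argument names are hypothetical and chosen for this sketch only.
    """
    p_vocab = softmax(np.asarray(vocab_logits, dtype=float))
    # One softmax over the context positions plus the sentinel (last slot).
    a = softmax(np.append(np.asarray(pointer_scores, dtype=float), sentinel_score))
    gate = a[-1]  # sentinel mass = weight given to the vocabulary model
    p_ptr = np.zeros(len(p_vocab))
    # Add up pointer mass for every context position holding the same word.
    np.add.at(p_ptr, np.asarray(context_ids), a[:-1])
    # p_ptr sums to (1 - gate), so the mixture sums to 1.
    return gate * p_vocab + p_ptr, gate

# Toy example (assumed values): the vocabulary model prefers "Bernanke",
# while the pointer strongly prefers the recently seen "Yellen".
vocab = ["<unk>", "the", "rates", "Bernanke", "Yellen", "Janet", "Ms."]
context = ["Janet", "Yellen", "rates"]
context_ids = [vocab.index(w) for w in context]

probs, gate = pointer_sentinel_mixture(
    vocab_logits=[0.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0],
    pointer_scores=[1.0, 3.0, 0.0],
    sentinel_score=1.0,
    context_ids=context_ids,
)
best_word = vocab[int(np.argmax(probs))]
print(best_word, round(float(gate), 3))
```

When the sentinel receives little mass, as in this toy case, the prediction is dominated by the pointer; when no context position scores well, the sentinel absorbs most of the mass and the model falls back on the vocabulary distribution.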
Names are the obvious use case, since they may be rarely seen or even entirely new to the neural network, but the pointer sentinel method works on a broad class of other word types too. One of the most interesting is units of measure. Although units of measure, such as [kilograms, tons] or [million, billion], are very common words, the neural network still uses the pointer mechanism heavily for them!
Results
Penn Treebank (PTB)
WikiText
= Super Mario Land = Super Mario Land is a 1989 side @-@ scrolling platform video game , the first in the Super Mario Land series , developed and published by Nintendo as a launch title for their Game Boy handheld game console . In gameplay similar to that of the 1985 Super Mario Bros. , but resized for the smaller device 's screen , the player advances Mario to the end of 12 levels by moving to the right and jumping across platforms to avoid enemies and pitfalls . Unlike other Mario games , Super Mario Land is set in Sarasaland , a new environment depicted in line art , and Mario pursues Princess Daisy . The game introduces two Gradius @-@ style shooter levels . At Nintendo CEO Hiroshi Yamauchi 's request , Game Boy creator Gunpei Yokoi 's Nintendo R & D1 developed a Mario game to sell the new console . It was the first portable version of Mario and the first to be made without Mario creator and Yokoi protégé Shigeru Miyamoto . Accordingly , the development team shrunk Mario gameplay elements for the device and used some elements inconsistently from the series . Super Mario Land was expected to showcase the console until Nintendo of America bundled Tetris with new Game Boys . The game launched alongside the Game Boy first in Japan ( April 1989 ) and later worldwide . Super Mario Land was later rereleased for the Nintendo 3DS via Virtual Console in 2011 again as a launch title , which featured some tweaks to the game 's presentation . Initial reviews were laudatory . Reviewers were satisfied with the smaller Super Mario Bros. , but noted its short length . They considered it among the best of the Game Boy launch titles . The handheld console became an immediate success and Super Mario Land ultimately sold over 18 million copies , more than that of Super Mario Bros. 3 . Both contemporaneous and retrospective reviewers praised the game 's soundtrack . 
Later reviews were critical of the compromises made in development and noted Super Mario Land 's deviance from series norms . The game begot a series of sequels , including the 1992 Super Mario Land 2 : 6 Golden Coins , 1994 Wario Land : Super Mario Land 3 , and 2011 Super Mario 3D Land , though many of the original 's mechanics were not revisited . The game was included in several top Game Boy game lists and debuted Princess Daisy as a recurring Mario series character .