In the example above, an RNN is trying to guess the next word after reading text like "Fed Chair Janet Yellen ... raised rates. Ms. ???".
At the bottom: The neural network tries to guess using only the fixed vocabulary it learned during training. That knowledge may be tied to the distant past, such as old references or training data involving the previous Fed Chair, Ben Bernanke.
At the top: The pointer network can look back over the story's recent history. From the relevant context it realizes that 'Janet Yellen' is likely to be referred to again, and, recognizing that 'Janet' is a first name and that 'Ms.' calls for a last name, it points to 'Yellen'.
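The pointer's "looking back" is just softmax attention over the positions in the recent context. Below is a minimal sketch of that idea; the hidden states and query here are hand-picked toy vectors, whereas a real model would produce both from learned RNN parameters.

```python
import numpy as np

def pointer_distribution(hidden_states, query):
    """Softmax attention over previous positions: the pointer distribution."""
    scores = hidden_states @ query
    scores = scores - scores.max()      # subtract max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

# Toy 2-d "hidden states", one per context word; in a real model these are
# RNN outputs and the query is derived from the current hidden state.
context = ["Fed", "Chair", "Janet", "Yellen", "...", "raised", "rates", "Ms."]
H = np.array([[0.1, 0.0], [0.0, 0.1], [0.2, 0.9], [0.9, 0.2],
              [0.0, 0.0], [0.1, 0.1], [0.1, 0.0], [0.3, 0.3]])
q = np.array([2.0, 0.2])                # query chosen to match "Yellen"

p_ptr = pointer_distribution(H, q)
print(context[int(np.argmax(p_ptr))])   # -> Yellen
```

Because the pointer can only name words that appear in the context window, it naturally handles rare or novel words that the vocabulary softmax would fumble.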
By mixing the two information sources, first "pointing" to recent relevant words using context and otherwise falling back on the RNN's internal memory and vocabulary when no good context exists, the mixture model arrives at a far more confident answer.
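The mixing step can be sketched in a few lines: a gate decides how much probability mass goes to the vocabulary softmax versus the pointer. The distributions and gate value below are made-up illustrations; in the actual model all three are produced by the network (the gate via the sentinel).

```python
import numpy as np

vocab = ["the", "rates", "Bernanke", "Yellen"]
# Hypothetical distributions for illustration only:
p_vocab = np.array([0.4, 0.3, 0.25, 0.05])   # RNN softmax over its vocabulary
p_ptr   = np.array([0.0, 0.05, 0.0, 0.95])   # pointer over words in context
g = 0.2   # gate: fraction of probability mass given to the vocabulary softmax

# Mixture: g * vocabulary distribution + (1 - g) * pointer distribution
p_mix = g * p_vocab + (1.0 - g) * p_ptr
print(vocab[int(np.argmax(p_mix))])          # -> Yellen
```

With a low gate value the pointer dominates and 'Yellen' wins decisively, even though the vocabulary softmax, biased by older training data, prefers other words.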
While the value of pointing is obvious for names, which may be rare or entirely unseen by the neural network, the pointer sentinel method helps with a broad class of other word types too. One of the most interesting is units of measure. Even though units of measure, such as [kilograms, tons] or [million, billion], are very common words, the network still relies on the pointer mechanism heavily!