Statistics and Analytical Sciences

Long Short-Term Memory (LSTM) units are a family of Recurrent Neural Network (RNN) architectures that have proven highly effective at learning from sequence data. They are also extremely complex, which makes them expensive to train and difficult to interpret. A recent trend towards simplification has produced the Gated Recurrent Unit (GRU) and the Minimal Gated Unit (MGU), both of which match or exceed LSTM performance on a variety of tasks. The MGU is currently among the simplest gated recurrent architectures. Our study demonstrates that, for some tasks and datasets, the MGU can be radically simplified without significant loss of performance. On the gun violence data used here, an extremely simple Forget Gate (FG) architecture, along with many other simplified architectures, performs just as well as the MGU on the given task. While more complex architectures such as the MGU, GRU, or LSTM may be needed in some situations, they are likely overkill for many real-world datasets, and their marginal performance benefit may come at a very high cost.