Why this? What is the goal?
The goal of this repository is to implement recurrent architectures from scratch in TensorFlow, for learning purposes. This is a work in progress: I plan to implement more architectures and publish the results and performance numbers for all of them.
The inspiration for this post was the last paragraph of Understanding LSTMs, where Chris Olah mentioned two papers that did an extensive study of recurrent architectures; I wanted to implement all the architectures in those two papers. A short Google search revealed that Jim Fleming had already done half the work here, so I decided to implement all the remaining architectures from Jozefowicz’s paper. (I also updated parts of Jim Fleming’s code so that all the architectures work in the newest version of TensorFlow.) Both papers are fantastic and worth a read. Feel free to send me a pull request if you spot an error and/or find other papers with recurrent architecture variants; as and when time permits, I will implement them. All the implementations are in TensorFlow (0.12).
Deep Learning Recurrent Architectures
- LSTM Network Variants: This tutorial takes a very nice approach to creating variations of LSTM networks. It is a good way to learn how to code a new network architecture and, more importantly, a methodical way to understand the gates in an LSTM. The tutorial is based on this paper.
- Empirical Exploration of Recurrent Network Architectures: This paper from Google explored quite a few recurrent architectures and came up with three variants that performed better than traditional LSTMs and GRUs on certain tasks. I have implemented all three architectures mentioned in the paper.
This is a direct fork of LSTM Network Variants, with code changes to run on the most recent version of TensorFlow (0.12.0 as of this writing). The remaining architectures are from here: Empirical Exploration of Recurrent Network Architectures.
The implementations are not optimal, in the sense that the production implementations of the LSTM, GRU and RNN cells concatenate the state and input before the multiplications to reduce the number of matrix multiplications, whereas this is a direct implementation of the LSTM network as you would see it in a textbook. [This is also going to change soon. ;)]
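To make that difference concrete, here is a small NumPy sketch (illustrative only, not code from this repo) showing that the textbook per-matrix form and the concatenated form compute the same gate pre-activation; the fused version just replaces two matrix multiplications with one per gate:

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3
x = rng.standard_normal(input_size)   # current input
h = rng.standard_normal(hidden_size)  # previous hidden state

# Textbook style: separate weight matrices for input and state (one gate shown).
W_x = rng.standard_normal((hidden_size, input_size))
W_h = rng.standard_normal((hidden_size, hidden_size))
b = rng.standard_normal(hidden_size)
gate_textbook = W_x @ x + W_h @ h + b

# Optimized style: concatenate [x, h] and fuse the weights into one matrix,
# so each gate needs a single matrix multiplication.
W = np.concatenate([W_x, W_h], axis=1)        # shape (hidden, input + hidden)
gate_fused = W @ np.concatenate([x, h]) + b

assert np.allclose(gate_textbook, gate_fused)  # identical pre-activations
```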
Recurrent Architectures Implemented
Architectures marked with a (*) were implemented in LSTM Network Variants; the rest were implemented by me based on Empirical Exploration of Recurrent Network Architectures. The architectures I implemented follow the conventions and notation of that paper.
- mut1 : Variant 1 from Empirical Exploration of Recurrent Network Architectures
- mut2 : Variant 2 from Empirical Exploration of Recurrent Network Architectures
- mut3 : Variant 3 from Empirical Exploration of Recurrent Network Architectures
- vanillaRNN : Just a vanilla RNN Network
- gru : Gated Recurrent Unit
- cifg (*) : Coupled input-forget gate
- fgr (*) : Full Gate Recurrence
- lstm (*) : Long Short-Term Memory
- nfg (*) : No forget gate
- niaf (*) : No input activation function
- nig (*) : No input gate
- noaf (*) : No output activation function
- nog (*): No output gate
- np (*): No peephole connections
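As a concrete reference for the list above, here is a minimal NumPy sketch of a single step of the `gru` entry, written against the standard update/reset-gate GRU equations; the weight names, shapes, and gating convention (new state weighted by `z`) are my own illustrative choices, not the repo's:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, params):
    """One GRU step: update gate z, reset gate r, candidate state h_tilde."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h + bz)               # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)   # candidate state
    return (1.0 - z) * h + z * h_tilde              # interpolate old/new state

rng = np.random.default_rng(1)
n_in, n_hid = 5, 4
# One (input-weight, state-weight, bias) triple per gate/candidate.
params = tuple(
    rng.standard_normal(s)
    for s in [(n_hid, n_in), (n_hid, n_hid), (n_hid,)] * 3
)
h = np.zeros(n_hid)
for t in range(3):                                  # unroll a short sequence
    h = gru_step(rng.standard_normal(n_in), h, params)
assert h.shape == (n_hid,)
assert np.all(np.abs(h) <= 1.0)  # from a zero init, the state stays in (-1, 1)
```

The same skeleton extends to the mutants: mut1–mut3 change which terms feed each gate and the candidate, while the (*) variants drop or couple individual LSTM gates.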
See the Jupyter notebook here.