RNN + LSTM Layers #3948
Conversation
/**
 * @brief An abstract class for implementing recurrent behavior inside of an
 *        unrolled network. This Layer type cannot be instantiated -- instaed,
typo: "instaed"
It doesn't work with the current net_spec.py. Specifically: 1) it will fail when using L.LSTM() or L.RNN(), since only RecurrentParameter is defined in caffe.proto; 2) it will fail when using L.Recurrent(), since RecurrentLayer is not registered (it's an abstract class). I did a simple hack by adding the following in the param_name_dict() function in net_spec.py:
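A plausible version of such a hack (a sketch, not necessarily the exact code used here; it reuses the local names that already exist in python/caffe/net_spec.py) would alias the LSTM and RNN type names to the shared recurrent parameter before the lookup dict is built:

```python
from caffe.proto import caffe_pb2  # net_spec.py imports this as `from .proto import caffe_pb2`

def param_name_dict():
    """Find out the correspondence between layer names and parameter names."""
    layer = caffe_pb2.LayerParameter()
    # Collect every *_param field of LayerParameter and its message type name.
    param_names = [f.name for f in layer.DESCRIPTOR.fields if f.name.endswith('_param')]
    param_type_names = [type(getattr(layer, s)).__name__ for s in param_names]
    # Strip the trailing '_param' / 'Parameter' to get e.g. 'recurrent' / 'Recurrent'.
    param_names = [s[:-len('_param')] for s in param_names]
    param_type_names = [s[:-len('Parameter')] for s in param_type_names]
    # Hack: RNN and LSTM layers share RecurrentParameter, so map both layer
    # type names onto 'recurrent' so that L.LSTM(...) / L.RNN(...) resolve.
    param_names += ['recurrent', 'recurrent']
    param_type_names += ['LSTM', 'RNN']
    return dict(zip(param_type_names, param_names))
```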
@weiliu89 the recurrent parameter for these layers, like the convolution parameter shared by ConvolutionLayer and DeconvolutionLayer, is a single parameter message reused across multiple layer types. Whether or not to map these shared parameter types in net_spec.py as you suggest here, or as suggested elsewhere, is probably better handled as a separate change.
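To make the distinction concrete (a sketch with made-up shapes, assuming a net_spec version that falls back to assigning *_param kwargs directly on the LayerParameter): the explicit spelling routes everything through the shared recurrent_param field, while the bare-kwarg spelling needs a 'LSTM' -> 'recurrent' mapping like the one sketched above.

```python
from caffe import layers as L

# Illustrative inputs: a T x N x D data blob and a T x N continuation blob.
data = L.Input(shape=dict(dim=[10, 4, 32]))
cont = L.Input(shape=dict(dim=[10, 4]))

# Explicit shared parameter: recurrent_param is a valid LayerParameter field,
# so this works without any special-casing of the 'LSTM' type name.
lstm_a = L.LSTM(data, cont, recurrent_param=dict(num_output=128))

# Bare kwargs: only resolves if param_name_dict() maps 'LSTM' to 'recurrent';
# otherwise net_spec has nowhere to put num_output.
lstm_b = L.LSTM(data, cont, num_output=128)
```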
const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);

/// @brief A helper function, useful for stringifying timestep indices.
virtual string int_to_str(const int t) const;
It's a little surprising to see a helper like this show up in the recurrent layer, but if there weren't any use for it elsewhere then I suppose it could live here. That said, there is already format.hpp and its format_int() function, which was added for cross-platform compatibility in b72b031. How about making use of that instead?
LGTM overall -- my only comments were about comments and naming (and that one int -> string function). @longjon are you done with your review?
Looks great. Thanks for this @jeffdonahue. We've been using a variant of this for a while and it has performed great. One thing we can additionally PR/gist (if it's useful) is a wrapper around the LSTM layer that allows for arbitrary-length (batched) forward propagation, which came in handy when doing inference on arbitrary-length sequences (relaxing the constraint around T_ while preserving memory efficiency for the forward pass by reusing activations across timesteps).
Force-pushed from bbd33d2 to 4b6c835.
@shelhamer @longjon thanks for the review! Fixed as suggested. @ajtulloch glad to hear it's been working for you guys, thanks for looking it over! I'm not sure I understand the idea of the wrapper though. I think this implementation should be able to do what you're saying -- memory-efficient forward propagation over arbitrarily long sequences -- by feeding in the sequence a chunk at a time and carrying the hidden state between forward passes (via expose_hidden).
@jeffdonahue yeah, the only contribution was around allowing variable-length input while still batching the input transformation across timesteps.
Ah -- batching the input transformation regardless of sequence length indeed makes sense. Thanks in advance for posting the code!
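For readers following along: "batching the input transformation" means computing the input-to-hidden product for all timesteps with a single matrix multiply, so only the inherently sequential recurrence runs per timestep. A rough numpy illustration (names and shapes are illustrative, not code from the PR or the promised gist):

```python
import numpy as np

T, N, D, H = 20, 8, 64, 128      # timesteps, streams, input dim, hidden dim
x = np.random.randn(T, N, D)     # time-major input, as the recurrent layers expect
W_xh = np.random.randn(D, H)     # input-to-hidden weights (illustrative)
W_hh = np.random.randn(H, H)     # hidden-to-hidden weights (illustrative)

# Batched input transformation: one (T*N, D) x (D, H) GEMM covers every
# timestep at once, independent of the sequence length T.
xh = x.reshape(T * N, D).dot(W_xh).reshape(T, N, H)

# The recurrence itself still has to step through time.
h = np.zeros((N, H))
for t in range(T):
    h = np.tanh(xh[t] + h.dot(W_hh))
```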
Dear all, I have a very big matrix (rows are IDs and columns are labels) and I was wondering how I can do the training in Caffe with just fully connected layers? Thanks a lot.
When will this be merged?
Has anyone successfully merged @jeffdonahue's caffe:recurrent-layer with BVLC's caffe:master? Why does the assertion of [...] fail?
Merge 'RNN + LSTM Layers BVLC#3948' by jeffdonahue for BVLC/caffe master.
* jeffdonahue/recurrent-layer:
  Add LSTMLayer and LSTMUnitLayer, with tests
  Add RNNLayer, with tests
  Add RecurrentLayer: an abstract superclass for other recurrent layer types
Force-pushed from 4b6c835 to 51a68f0.
Thanks again for the reviews everyone. Sorry for the delays -- wanted to do some additional testing, but I'm now comfortable enough with this to merge.
Very nice work @jeffdonahue.
@jeffdonahue Any plans for a release?
Could you post a link to a working tutorial/example on using these layers? It would be easier for new learners. I know you have it somewhere.
Great work!!! @jeffdonahue I used https://github.com/jeffdonahue/caffe/tree/recurrent-rebase-cleanup/ as the example to do [...]. Any suggestions?
@jeffdonahue I am new to Caffe. Do you have any example of how to use the RNN layer?
@jeffdonahue May I ask your help with a clarification? I can see in [...]
Hello, what makes it necessary to switch the dimension order of the bottom blob from the usual batch-major N x T x ... layout to the time-major T x N x ... layout that these layers expect?
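Roughly speaking, the recurrent layers unroll the network along the first blob axis, so the data has to be time-major: each timestep is then a single N x ... slice that the unrolled net can consume directly. If your data arrives batch-major, a transpose like the following does the job (plain numpy, illustrative shapes):

```python
import numpy as np

N, T, D = 8, 20, 64                      # batch size, timesteps, feature dim
batch_major = np.random.randn(N, T, D)   # e.g. features as a typical data layer emits them

# Reorder to (T, N, D) so that slicing along axis 0 yields one timestep at a time.
time_major = np.ascontiguousarray(batch_major.transpose(1, 0, 2))
assert time_major.shape == (T, N, D)
```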
Hi @jeffdonahue @weiliu89. Thanks in advance.
This PR includes the core functionality (with minor changes) of #2033 -- the RNNLayer and LSTMLayer implementations (as well as the parent RecurrentLayer class) -- without the COCO data downloading/processing tools or the LRCN example.

Breaking off this chunk for merge should make users who are already using these layer types on their own happy, without adding a large review/maintenance burden for the examples (which have already broken multiple times due to changes in the COCO data distribution format...). On the other hand, without any example of how to format the input data for these layers, it will be fairly difficult to get started, so I'd still like to follow up with at least a simple sequence example for official inclusion in Caffe (maybe memorizing a random integer sequence -- I think I have some code for that somewhere) soon after the core functionality is merged.
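Until such an example lands, here is a rough sketch of the expected input format (sizes and the packing scheme are illustrative, not an official example): the data blob is time-major, T x N x ..., and is paired with a T x N blob of sequence-continuation indicators, where 0 marks the first timestep of a sequence (the hidden state is reset there) and 1 marks a continuation of the previous timestep.

```python
import numpy as np

# Pack three variable-length sequences into fixed-size T x N blobs.
T, N, D = 10, 3, 32
seq_lens = [7, 10, 4]                       # hypothetical per-stream sequence lengths

data = np.zeros((T, N, D), dtype=np.float32)
cont = np.zeros((T, N), dtype=np.float32)   # sequence-continuation indicators

for n, length in enumerate(seq_lens):
    data[:length, n, :] = np.random.randn(length, D)  # stand-in for real features
    cont[1:length, n] = 1       # continue the sequence at timesteps 1..length-1
    # cont[0, n] stays 0: timestep 0 begins a new sequence.
    # Timesteps >= length are padding; their outputs should simply be ignored.

# 'data' and 'cont' are then fed as the two bottoms of the RNN/LSTM layer,
# e.g. net.blobs['data'].data[...] = data in pycaffe.
```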
There's still at least one documentation TODO: I added expose_hidden to allow direct access (via bottoms/tops) to the recurrent model's 0th timestep and Tth timestep hidden states, but didn't add anything to the list of bottoms/tops -- still need to do that. Otherwise, this should be ready for review.
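For reference, a sketch of what expose_hidden makes possible (blob names here are made up; the actual bottom/top list is exactly the documentation TODO above): with recurrent_param { expose_hidden: true }, the layer takes the initial hidden state (and, for LSTM, the cell state) as extra bottoms and exposes the final states as extra tops, so state can be carried across successive forward passes over chunks of a long sequence.

```python
import numpy as np
import caffe

# Hypothetical net whose prototxt wires the LSTM's extra bottoms/tops to
# blobs named 'h0', 'c0', 'lstm_hT', 'lstm_cT' (names are illustrative).
net = caffe.Net('lstm_deploy.prototxt', caffe.TEST)

h = np.zeros(net.blobs['h0'].data.shape, dtype=np.float32)
c = np.zeros(net.blobs['c0'].data.shape, dtype=np.float32)

for chunk, cont in stream_of_chunks():      # hypothetical generator of
    net.blobs['data'].data[...] = chunk     # (T x N x D, T x N) pairs
    net.blobs['cont'].data[...] = cont
    net.blobs['h0'].data[...] = h           # carry state in from the previous chunk
    net.blobs['c0'].data[...] = c
    net.forward()
    h = net.blobs['lstm_hT'].data.copy()    # and carry the final state out again
    c = net.blobs['lstm_cT'].data.copy()
```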