
===========
Performance
===========

Theano uses several tricks to obtain good performance:
 * common sub-expression elimination
 * [custom generated] C code for many operations
 * pre-allocation of temporary storage
 * loop fusion (which gcc normally can't do)

On my neural net experiments for my course projects, I was getting around 10x
speed improvements over basic numpy by using theano.
[More specific speed tests would be nice.]


With a little work, Theano could also implement more sophisticated
optimizations:

 * automatic ordering of matrix multiplications 
 * profile-based memory layout decisions (e.g. row-major vs. col-major)
 * gcc intrinsics to use MMX, SSE2 parallelism for faster element-wise arithmetic
 * conditional expressions
