hshindo / Merlin.jl



Merlin: deep learning framework for Julia

Merlin is a deep learning framework written in Julia.

It aims to provide a fast, flexible and compact deep learning library for machine learning.

Merlin is tested against Julia 0.6 on Linux, OS X, and Windows (x64).




Requirements

  • Julia 0.6
  • g++ (for OS X or Linux)


Installation

julia> Pkg.add("Merlin")

Quick Start


  1. Wrap your data with Var (Variable type).
  2. Apply functions to Var.
    Var memorizes a history of function calls for auto-differentiation.
  3. Compute gradients if necessary.
  4. Update the parameters with an optimizer.
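The record-then-backpropagate idea in steps 1–4 can be sketched in plain Julia. This is a toy illustration only; `ToyVar` and its fields are invented names for this sketch, not Merlin's actual types:

```julia
# Toy illustration of "a Var memorizes its history for auto-differentiation".
mutable struct ToyVar
    data::Float64
    grad::Float64
    backward!::Function   # propagates this node's grad to its inputs
end
ToyVar(x) = ToyVar(x, 0.0, () -> nothing)

function Base.:*(a::ToyVar, b::ToyVar)
    y = ToyVar(a.data * b.data)
    # Record, at call time, how to push y's gradient back to a and b.
    y.backward! = () -> begin
        a.grad += y.grad * b.data   # d(ab)/da = b
        b.grad += y.grad * a.data   # d(ab)/db = a
        a.backward!(); b.backward!()
    end
    return y
end

x = ToyVar(3.0)
w = ToyVar(2.0)
y = x * w        # forward pass records the backward closure
y.grad = 1.0
y.backward!()    # now x.grad == 2.0 and w.grad == 3.0
```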

Merlin supports both static and dynamic evaluation of neural networks. Below is an example of a three-layer network, written first in the dynamic style and then in the static style.

Dynamic Evaluation

using Merlin

T = Float32
x = zerograd(rand(T,10,5)) # instantiate Var with zero gradients
y = Linear(T,10,7)(x)
y = relu(y)
y = Linear(T,7,3)(y)

params = gradient!(y)

opt = SGD(0.01)
foreach(opt, params)

If you don't need gradients of x, use x = Var(rand(T,10,5)), in which case x.grad is set to nothing.
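The update performed by `foreach(opt, params)` with `SGD(0.01)` amounts to subtracting the learning rate times the gradient from each parameter. A minimal sketch in plain Julia, with `ToySGD` and `ToyParam` as illustrative stand-ins rather than Merlin's types:

```julia
# Toy SGD: param .-= rate .* grad for each parameter.
struct ToySGD
    rate::Float64
end

# Callable struct, so foreach(opt, params) applies the update to each param.
function (opt::ToySGD)(param)
    param.data .-= opt.rate .* param.grad
end

mutable struct ToyParam
    data::Vector{Float64}
    grad::Vector{Float64}
end

p = ToyParam([1.0, 2.0], [0.5, -0.5])
opt = ToySGD(0.1)
foreach(opt, [p])
# p.data ≈ [0.95, 2.05]
```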

Static Evaluation

For static evaluation, the process is as follows.

  1. Construct a Graph.
  2. Feed your data to the graph.

When you apply a function to a Node, the result is evaluated lazily.

using Merlin

T = Float32
n = Node(name="x")
n = Linear(T,10,7)(n)
n = relu(n)
n = Linear(T,7,3)(n)
@assert typeof(n) == Node
g = Graph(n)

x = zerograd(rand(T,10,10))
y = g("x"=>x)

params = gradient!(y)

opt = SGD(0.01)
foreach(opt, params)

When the network structure is static, this style is recommended.
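The deferred evaluation behind `Node` and `Graph` can be mimicked with closures: each node is a function of the input environment, and nothing runs until data is fed in. A rough sketch in plain Julia (the `make_*` helpers are invented for illustration, not Merlin's API):

```julia
# Toy lazy graph: a node is a function from an input dict to a value.
make_input(name) = env -> env[name]
make_scale(node, c) = env -> c .* node(env)
make_relu(node) = env -> max.(node(env), 0.0)

n = make_input("x")
n = make_scale(n, 2.0)
n = make_relu(n)                     # nothing has been computed yet

y = n(Dict("x" => [-1.0, 3.0]))      # feeding data triggers evaluation
# y == [0.0, 6.0]
```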




Here is an example of a batched LSTM, where three sequences (of lengths 3, 2, and 5) are packed into one matrix:

using Merlin

T = Float32
a = rand(T,20,3)
b = rand(T,20,2)
c = rand(T,20,5)
x = Var(cat(2,a,b,c))
lstm = LSTM(T, 20, 20) # input size: 20, output size: 20
y = lstm(x, [3,2,5])
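The vector `[3,2,5]` tells the LSTM how the 10 concatenated columns split back into the three sequences. That splitting can be sketched in plain Julia (`split_batch` is an illustrative helper, not part of Merlin):

```julia
# Toy version of recovering per-sequence slices from a batched matrix.
function split_batch(x::Matrix, sizes::Vector{Int})
    out = Matrix{eltype(x)}[]
    col = 1
    for s in sizes
        push!(out, x[:, col:col+s-1])  # columns belonging to one sequence
        col += s
    end
    return out
end

x = hcat(rand(20,3), rand(20,2), rand(20,5))   # 20×10, like cat(2,a,b,c)
parts = split_batch(x, [3,2,5])
# size.(parts) == [(20,3), (20,2), (20,5)]
```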

More examples can be found in the examples directory.