Comparisons between deep learning frameworks such as Theano, TensorFlow, Torch, MXNet and, most recently, PyTorch have always been a hot topic. Machine Heart has published a number of articles on the subject, such as “Mainstream Deep Learning Frameworks Compared: Which One Is Best for You?”, “MXNet Is the Best Choice”, “TensorFlow Is the Most Popular but Not the Best” and “From TensorFlow to Theano: A Horizontal Comparison of Seven Deep Learning Frameworks”.


But how do users actually feel about them? Reddit user cjmcmurtrie recently posted a discussion thread titled “PyTorch vs. TensorFlow” asking how the two popular frameworks compare.


[D] So… PyTorch vs. TensorFlow: what’s the verdict on how they compare? What are their individual strong points? — r/MachineLearning


The original post reads:


I have not migrated from Torch7 to TensorFlow. I played with TensorFlow, but I found Torch7 more intuitive (maybe I didn’t play with it enough?). I have also tried a bit of PyTorch, so I decided to see how that goes first.


After using PyTorch for a few weeks, I don’t think I need to migrate to TensorFlow just yet, at least for the projects I’m interested in. Writing custom modules in PyTorch is incredibly easy, and its dynamic graph construction has given me lots of new ideas for things I used to have to stay up late to implement (or put on a waiting list). I think PyTorch is a great toolset for machine learning developers. I also know that the community resources for TensorFlow are much stronger, but when it comes to developing entirely new projects (as opposed to rewriting code for an existing architecture or following tutorials), the community doesn’t necessarily help that much.
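To illustrate the point about custom modules: in a define-by-run framework the forward pass is just ordinary code, so a custom module is nothing more than a class with a forward method. Below is a pure-Python sketch of the pattern (no PyTorch required; the class names only mimic the nn.Module style and are not real framework code):

```python
class Module:
    """Minimal stand-in for an nn.Module-style container."""
    def __call__(self, x):
        return self.forward(x)

class Linear(Module):
    """A fully connected layer over plain Python lists."""
    def __init__(self, weight, bias):
        self.weight, self.bias = weight, bias
    def forward(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]

class ReLU(Module):
    def forward(self, x):
        return [max(0.0, v) for v in x]

class TwoLayerNet(Module):
    """A custom module is ordinary Python: build sub-modules in __init__,
    call them in forward, with any control flow you like in between."""
    def __init__(self):
        self.fc1 = Linear([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])
        self.act = ReLU()
        self.fc2 = Linear([[1.0, 1.0]], [0.1])
    def forward(self, x):
        h = self.act(self.fc1(x))
        return self.fc2(h)

net = TwoLayerNet()
print(net([2.0, 1.0]))   # the forward pass runs as plain Python
```

Because forward is executed anew on every call, nothing stops it from branching or looping on the data — which is exactly what static graph definitions make awkward.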


The thread drew many machine learning researchers and developers, who shared their thoughts and experiences (not just with PyTorch and TensorFlow, but with many other tools as well). Machine Heart has selected the comments we think are most valuable, in the hope that they will help you in your studies and research. They are presented in order of upvotes.


Ajmooch replies:


I haven’t worked on a TensorFlow project yet, so I can’t fairly compare Theano+Lasagne, PyTorch and TensorFlow. For the first two, though, I can offer some broad insights.


Background: I started with Theano+Lasagne about a year ago and used it in two of my papers. I switched to PyTorch last week and rebuilt two key projects that I had previously implemented in Theano.


API: Theano’s graph-building-and-compiling way of working made it hard for me to learn, but once I got the hang of it, everything worked out (that took me maybe two months, though I was still learning Python and basic neural networks at the time, so take that timeline with a grain of salt). Lasagne’s API, to me, is like an elegant queen riding into battle on a killer whale — which is to say, I love it. It is very much the library I would have written myself had I known in advance what I wanted from a Theano wrapper, and it takes a great deal of the drudgery out of the work.


PyTorch’s API, on the other hand, feels a bit rough, though there are some qualifiers to that, more on which later. If you’re just doing standard tasks (implementing a ResNet or a VGG) I don’t think you’ll have a problem, but I keep hitting friction because everything I do is a bit weird. For example, in my current project I have to resort to several hacky workarounds because strided tensor indexing isn’t implemented yet, and the current indexing techniques, flexible as they are, are much less intuitive than using NumPy-style indexing directly. The central qualifier here is that PyTorch has literally just been released, so of course not everything is implemented yet and there are still kinks to be worked out. Theano has been around long enough to mature, and I don’t run into this kind of trouble with it or with Lasagne.
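For reference, the NumPy-style strided indexing being contrasted here (which early PyTorch tensors did not yet support directly) looks like this:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

# Strided slice: every other row, odd-numbered columns -- a single
# expression in NumPy, and the kind of thing that needed hacky
# workarounds on early PyTorch tensors.
sub = x[::2, 1::2]
print(sub.tolist())   # [[1, 3], [9, 11]]
```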


Beyond that, my biggest “complaint” about PyTorch is essentially “things aren’t laid out the way I would put them together” in the neural network API. In particular, I’m a big fan of Lasagne’s “layers” paradigm — but a little critical thinking leads to the conclusion that it’s particularly inappropriate for a dynamic-graph framework. I’m thoroughly used to thinking in terms of, and optimising for, static graph definitions, so switching API styles was a minor pain point. This matters: with Theano I spent so much time thinking “well, since I can’t use ordinary flow control to write this like a regular program, how do I define it as a graph?” that the habit became deeply ingrained.


However, dynamic graphs fundamentally demand a different API from define-and-run, and while I don’t personally find it intuitive yet, just last week its define-by-run execution, as CJ said, opened my mind and gave me ideas for dozens of projects that weren’t possible before. I also imagine that if you do anything involving RNNs — for example, implementing dynamic computation without wasting work — the imperative nature of the interface will make it much easier.
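The contrast can be sketched in a few lines of plain Python (a conceptual toy, not real framework code): define-and-run fixes the sequence of operations before any data arrives, while define-by-run lets the data itself steer the control flow, which is what makes variable-length RNN-style computation natural.

```python
# Define-and-run (static graph, Theano/TensorFlow style): the op sequence
# is fixed up front, then data is fed through it.
static_graph = [lambda x: x * 2, lambda x: x + 1, lambda x: x * 2]

def run_static(graph, x):
    for op in graph:
        x = op(x)
    return x

# Define-by-run (dynamic graph, PyTorch style): ordinary control flow
# decides, per input, how much computation happens.
def run_dynamic(x):
    steps = 0
    while x < 20:          # data-dependent loop length
        x = x * 2 + 1
        steps += 1
    return x, steps

print(run_static(static_graph, 3))   # always the same 3 ops: ((3*2)+1)*2 = 14
print(run_dynamic(3))                # 3 -> 7 -> 15 -> 31 in 3 data-dependent steps
```

Expressing the second function in a static graph requires dedicated control-flow ops (tf.while_loop-style machinery) rather than the language’s own while.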


Speed: I haven’t done extensive benchmarking, but I was surprised to find that, out of the box, PyTorch trains about 100% faster on a single GPU than Theano+Lasagne on my current project. I tested on a GeForce GTX 980 and a Titan X, with networks confirmed to be identical to within a reasonable margin of error. On CIFAR-100 that means going from (in the simplest case) 5 min/epoch to 2.5 min/epoch, and down to 2 min/epoch in some cases (i.e., at least twice as fast).


This is the same template code, using the same data fetcher (the one I periodically curse at: “go to hell, fetcher — DIE, fetcher!”); everything is the same except the code that actually builds, trains and runs the network.


This surprised me, because I was under the impression that Theano’s extensive, aggressive memory optimisation (the reason it takes several minutes to compile when you start training) meant it would be very fast on a single GPU. I don’t know what caused the speedup; both setups were using the latest version of cuDNN (I double-checked to make sure), so the gains must be coming from somewhere, but I don’t know where.


Relatedly, working with Theano I never managed to get multiple GPUs or half-precision floating point working. I spent days trying to get libgpuarray to build and trying to fix things with Platoon, and came away exhausted every time (even getting things to compile at all was already a pain point). With PyTorch, data parallelism (single-node, 4-GPU) and half precision (pseudo-FP16 for convolutions, which doesn’t make them faster but does use less memory) worked immediately. Just like that.


Development team interaction: I’ve had great conversations with the core development teams of both frameworks. With Lasagne and Theano I ran into a number of difficulties and plenty of strange problems, and many times they quickly and concisely helped me figure out what was going wrong (which I usually couldn’t on my own). The PyTorch team has been just as helpful — I’ve been raising the bugs and problems I encounter and getting prompt responses, usually a fix the same day, or at least a workaround or an issue to track. I haven’t worked with Keras or TensorFlow, but I’ve looked through their issue logs and some user groups, and simply because of the sheer number of users, those frameworks don’t seem to offer the same personal attention. It’s like when I went to Cal Poly, where the professor-to-student ratio is high and you rarely see more than 20 students in a class, whereas at Berkeley you get lecture halls of 1,000 people. This isn’t to criticise Cal Poly students or to accuse Berkeley of over-expansion, but if you’re someone like me who develops non-standard neural networks (and I don’t mean Chuck Tingle weird), getting quick feedback from the people who actually built the framework is invaluable.


Misc: One particular worry of mine (and the reason I may end up picking up TensorFlow and using it as my main framework a few years from now) is that neither Theano nor PyTorch is designed for deployment, and the PyTorch team doesn’t seem to be focusing on it (though I may be wrong here — I vaguely remember reading that in a forum post). I’d like to practise putting something on a website or in an Android app (mostly for fun, but I’ve always been very research-focused, and actually getting my work running on a device seems like a genuinely useful skill), and I’m not sure these frameworks support that very well.


Similarly, PyTorch’s distributed framework is still experimental, whereas I’ve heard TensorFlow was designed with distributed execution in mind, so if you need to run really large-scale projects, TensorFlow is probably the best choice.


TL;DR: I’m not trying to recommend one framework over the other. I love Lasagne to death (probably too much), but between the flexibility of dynamic graphs, the fast and frankly inexplicable speed gains, and how quickly I got up and running after installing PyTorch last week, I’m unlikely to go back. I don’t know much about TensorFlow. Getting timely feedback from the PyTorch developers matters to me because some of my research looks a little strange, but I may still use TensorFlow for some projects in the future. This discussion thread is great, but please read it as people’s subjective experience, not as “that settles it — you’ll definitely feel the same way.”


Taion replies:


We recently switched from Theano+Lasagne to TensorFlow.


I haven’t tried any distributed architecture, but overall, coming from Theano, TensorFlow feels very familiar — if anything, better. My replies to the points you raised:


The equivalent of graph compilation is much faster: it takes us seconds instead of minutes. It’s still not fast enough to fold much of it into our CI suite, but at least we no longer wait long for training to start.


Having switched from Lasagne to TensorFlow, I like the higher-level functionality in tf.layers and tf.contrib.layers; they are functional APIs that take tensors and return tensors, so they integrate easily with “raw” TensorFlow — we can work on plain tensors without having to write a Layer.


On the models we use, TensorFlow is slightly faster than Theano (20%–30%). At first we saw roughly the same performance and thought it acceptable, but then we read the TensorFlow performance guide (www.tensorflow.org/performance), switched to NCHW and fused batch norm, and everything ran much faster. I guess Theano itself just isn’t that fast…


On the speed of developer feedback: I’ve asked a few fairly trivial questions on TF’s issue tracker, and the TF developers have usually got back to me within a day or two.


The tooling is also quite good. TensorBoard is excellent, as is the timeline/trace tool for profiling. I haven’t tried the new tfdbg yet.


TensorFlow certainly has a few drawbacks — deploying to iOS in practice, for instance, is a long story, and not a painless one — but compared to Theano, which requires a Python runtime, it’s considerable progress.


If you use TensorFlow, I highly recommend looking at tf.layers or TF-Slim; tf.layers essentially embeds the Keras API.


I wouldn’t expect any meaningful performance difference, though. The operations discussed in this thread ultimately define a static computation graph, so a wrapper like Keras adds essentially no overhead beyond a minimal cost at graph-definition time. If you’re coming from Theano, however, you’ll notice that TensorFlow’s startup is much faster (seconds rather than minutes).


Following the TensorFlow performance guide really does help. On a DenseNet (L=40, k=12) model, switching from the default NHWC layout and unfused batch norm to NCHW and fused batch norm cut our epoch times by over 30%. On a WRN-16-4 model, epoch times dropped by more than 20%.
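For reference, the two layouts differ only in where the channel axis sits; the speedup comes from GPU kernels (cuDNN) that favour NCHW, not from the transpose itself. A quick NumPy illustration of the two conventions:

```python
import numpy as np

# NHWC: (batch, height, width, channels) -- TensorFlow's historical default.
batch_nhwc = np.zeros((64, 32, 32, 3), dtype=np.float32)

# NCHW: (batch, channels, height, width) -- the layout cuDNN kernels prefer.
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))
print(batch_nchw.shape)   # (64, 3, 32, 32)
```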


Badmephisto (Andrej Karpathy) replies:


I think PyTorch is currently the closest thing to an enlightened design for a deep neural network library:


  • It is lightweight;

  • It is currently in Python;

  • It gives you explicit control over the computation: no compiler tries to be clever and “helpfully” restructure or accelerate your code — and in practice, compilers mostly cause trouble when you’re debugging;

  • It puts only a few (understandable) layers of abstraction on top of the GPU kernel calls, which is a good guarantee of high performance;

  • Maybe it’s personal preference, but I have a particular, abstraction-related OCD: I get nervous when heavy lifting is done for me, because I can already feel the inescapable, unbearable pain of the day I’ll have to look under the hood — all the more so when the thing happening under the hood is supposed to be relatively simple;

  • Debugging is easier, because a specific line of your code fails, rather than something far away inside sess.run() on a large, generated graph object. Your stack trace doesn’t fill three screens and force you to play a vertical-scrolling game of “find the error!”;

  • There is no compile time. I don’t understand how Theano users put up with it; they must be more patient than I am;

  • You can manipulate gradients directly, which obviously makes some things easier and more natural (such as gradient clipping during backpropagation, or various “broken backprop” ideas, like the recent Shake-Shake regularisation; yes, I believe you could hack together a solution with stop_gradient, but it would be uglier);

  • Its support for dynamic graphs was a top-down design principle from the start, not an afterthought bolted on later. And we are going to see more and more dynamic graphs, e.g. across NLP, or in neural module networks;

  • There is no explicit session object indenting and bloating your code;

  • It gets the layers of abstraction right: raw NumPy → Tensors (which are essentially raw NumPy on the GPU and know nothing about deep learning!) → Variables (which do understand deep learning), with Modules, optim and so on as optional niceties on top.
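The gradient-manipulation point above can be made concrete with gradient clipping by global norm — the same idea behind PyTorch’s torch.nn.utils.clip_grad_norm_. A dependency-free sketch (the function name and the flat list-of-floats representation are simplifications for illustration, not a real API):

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients jointly so their combined L2 norm is at most
    max_norm; leave them untouched if the norm is already small enough."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads)
    scale = max_norm / total
    return [g * scale for g in grads]

print(clip_by_global_norm([3.0, 4.0], 1.0))   # norm 5 -> rescaled to norm 1
```

With direct access to gradients, this is a couple of lines between the backward pass and the optimizer step, rather than a graph-surgery exercise.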


My experience so far is that TF complicates things somewhat: even something that ought to be simple, like data-dependent initialisation, requires some tricks. Of course the TF developers have their own solutions, but these often involve combinations of five TensorFlow functions you’ve never heard of. I don’t remember running into that with Torch — though maybe what I’m doing now is just more complicated.


Disclaimer: I haven’t given PyTorch a serious trial yet, so my take is based on a background of doing a lot of complex things in TensorFlow while only just getting started with PyTorch. Let’s see how it goes.


Reply by jeremyhoward:


For part 2 of our course Practical Deep Learning for Coders (18 hours of free lessons), we switched from Keras + Theano (used in part 1) to using Keras, TensorFlow and PyTorch together. In general, using PyTorch has been a pleasure, mainly because:


  • Dynamic computation makes a lot of things easier. For example, seq2seq-with-attention neural translation is hard to do with Keras + TF but easy with PyTorch;

  • It’s easier to debug, because you can just use standard Python tools;

  • PyTorch makes custom implementations easier, so you can spend more of your time on the algorithm itself, which is what tends to improve performance most;

  • It makes multi-GPU simple and easy to understand;

  • torchvision makes loading and transforming images easy.
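The attention step in that first bullet is itself tiny; what dynamic graphs buy you is the freedom to run it inside an ordinary Python loop over decoder steps. A dependency-free sketch of dot-product attention (scores, softmax weights, weighted sum of encoder states), using plain lists purely for illustration:

```python
import math

def attend(query, encoder_states):
    """Dot-product attention: score each encoder state against the query,
    softmax the scores, and return the weighted sum of states."""
    scores = [sum(q * s for q, s in zip(query, state))
              for state in encoder_states]
    peak = max(scores)                        # stabilise the softmax
    exps = [math.exp(v - peak) for v in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(len(encoder_states[0]))]
    return context, weights

ctx, w = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In a define-by-run framework this function can be called with encoder_states of any length, decided at run time, without declaring a maximum sequence length up front.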


TensorFlow’s API, by contrast, is ridiculous — it reinvents the wheel at every stage and requires developers to learn a lot of unnecessary new concepts. That said, the Developer Summit suggests things are improving — and using TensorFlow Serving and Cloud ML can boost your productivity.


Reply from Powlerbare:


I implemented some seq2seq models with TensorFlow about six months ago and came to appreciate why it’s so good: it has great built-in components that make research easy, such as teacher forcing, loss functions like noise contrastive estimation, and so on. Whether I was collecting baselines or modifying functions, everything was straightforward. I’m used to digging through codebases to figure out exactly what the functions do (the defaults of some optional arguments can quietly give bad results if you’re not careful) — and I’m fairly confident in most of the results reported from these implementations.


Here are a couple of things I love about PyTorch that TensorFlow doesn’t offer:


1) PyTorch provides a reinforce() feature that I love. It adds essentially no overhead to an implementation, and it’s nice to have built-in functions to call for RL.


2) I never used autograd much because it was quite slow at the time, but people were having fun using it for crazy things. I’m a big fan of the paradigm — I love writing networks in NumPy just for the sake of it.
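For context on the RL support mentioned in point 1: the estimator underneath is the score-function (REINFORCE) gradient, grad_theta log p(action; theta) * reward. A minimal dependency-free sketch for a one-parameter Bernoulli policy (the function and setup are illustrative, not PyTorch’s actual API):

```python
import math

def reinforce_grad(theta, action, reward):
    """Score-function gradient estimate d/dtheta[log p(action; theta)] * reward
    for a Bernoulli policy with p(action = 1) = sigmoid(theta)."""
    p = 1.0 / (1.0 + math.exp(-theta))
    # d/dtheta log p(a; theta) is (1 - p) when a == 1, and -p when a == 0
    grad_log_p = (1.0 - p) if action == 1 else -p
    return grad_log_p * reward

print(reinforce_grad(0.0, 1, 2.0))   # p = 0.5, so the estimate is 0.5 * 2.0 = 1.0
```

A built-in hook for this lets you backpropagate through stochastic sampling nodes without writing the bookkeeping yourself.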

Compiled from Reddit by Machine Heart.