Shipping a Neural Network on iOS with CoreML, PyTorch, and React Native

(attardi.org)

501 points | by ot 2694 days ago

20 comments

xedarius 2694 days ago
What I like about this write-up is that it's end-to-end. Most ML write-ups leave you with a Keras model, leaving many questions about how you turn the model into a product, especially if you have to move the model to a non-Python platform. Really good read, enjoyed it.
- djhworld 2694 days ago
  The author comes across as very humble too, he doesn't pretend to be an expert, and takes his time to explain his rationale at every step. I'm not even that interested in iOS development or deploying models to phones - but I really enjoyed the read for the journey rather than the destination.
Philomath 2694 days ago
Really enjoyed the article. I don't do ML but I'm a React Developer, so I was interested in reading the article cause it said React Native. To my surprise, I read it entirely, understood most of it, and best of all, React wasn't part of it until the very last part. I encourage you to write more posts like this, I learned a lot.
zawerf 2694 days ago
If I understand correctly, the reason a CNN is used here is because we want to find the splits that a human would visually agree "looks" the best?
So rather than a regression it's more like the "line simplification" problem in graphics: (https://bost.ocks.org/mike/simplify/ https://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93...)
Just thought this solution seems a little overkill. Surely you can pick some error metric over the splits to optimize instead?
- moconnor 2694 days ago
  Overkill is a point of view here. Training and deploying neural networks is becoming easier than ever.
  In my group at Arm there's a solid expectation that we'll see neural networks integrated into every part of a running application, and whether they execute on special NN processors or the general-purpose CPU will largely depend on where the data is needed.
  I cut a very long comment short and wrote the rest of this up here: http://yieldthought.com/post/170830096265/when-are-neural-ne...
  - zawerf 2694 days ago
    I said it was overkill because I thought I had a simple analytical solution as follows. Note: I don't know anything about segmented regression, this is just your standard CS dynamic programming to calculate splits
    DP[i][j] = min over k of (DP[i][k] + (cost of splitting at k) + (linear regression error of points from kth to jth))
    This should run in O(n^3) which will be fine for the author's requirement of ~100 points. But this isn't a complete solution since it's not obvious how to choose the cost of splitting (which is needed otherwise it will just split everything into 1 or 2 point segments).
    I think thinking about this more and explicitly trying to design this cost function is still better than labeling a bunch of data until the machine learning algorithm can reverse engineer the cost function from your head. Then you can be confident of what your code is doing and why and know that it won't randomly output potato.
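The recurrence above can be sketched in a few lines. This is a minimal illustration, not the article's method: it uses a one-dimensional variant of the recurrence (best cost of the first j points), computes each segment's regression error from prefix sums, and takes the split penalty as a hypothetical `split_cost` parameter, since the comment leaves that choice open.

```python
from itertools import accumulate
from math import inf


def fit_segments(xs, ys, split_cost):
    """Split points into line segments minimizing total squared error plus
    split_cost per extra segment. Returns half-open (start, end) index pairs."""
    n = len(xs)

    def prefix(values):
        return [0.0] + list(accumulate(values))

    px, py = prefix(xs), prefix(ys)
    pxx = prefix(x * x for x in xs)
    pxy = prefix(x * y for x, y in zip(xs, ys))
    pyy = prefix(y * y for y in ys)

    def sse(i, j):
        """Squared error of the least-squares line through points i..j-1."""
        m = j - i
        sx, sy = px[j] - px[i], py[j] - py[i]
        sxx = pxx[j] - pxx[i] - sx * sx / m
        sxy = pxy[j] - pxy[i] - sx * sy / m
        syy = pyy[j] - pyy[i] - sy * sy / m
        if sxx <= 1e-12:  # single point or identical x values
            return max(syy, 0.0)
        return max(syy - sxy * sxy / sxx, 0.0)

    best = [0.0] + [inf] * n  # best[j]: cheapest segmentation of the first j points
    prev = [0] * (n + 1)      # prev[j]: start index of the last segment in that solution
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + sse(i, j) + (split_cost if i > 0 else 0.0)
            if cost < best[j]:
                best[j], prev[j] = cost, i

    segments = []
    j = n
    while j > 0:
        segments.append((prev[j], j))
        j = prev[j]
    return segments[::-1]
```

With a small `split_cost` the function splits eagerly, and with a large one it returns a single segment, which is exactly the tuning problem the comment describes.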
    - loverofthings 2694 days ago
      If you have labeled data (meaning that you know where proper splits need to be made) it's quite easy to build a DP based classifier that minimizes the error over training set.
      For example, if you were building a model that spits out 0 for no split, and 1 for split, you can easily make a simple cost sensitive linear model that takes into account previous decisions (something like HMM).
      Viterbi algorithm would be the DP step.
      For some reason unknown to me, the CNN performs as well as, if not better than, a DP-based HMM.
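For readers unfamiliar with the Viterbi step mentioned above, here is a minimal sketch. It assumes a two-state labeling (0 for no split, 1 for split) with per-point scores and transition scores supplied by some trained model; the score tables in the usage note are made up for illustration and do not come from the article.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring state sequence.

    emissions[t][s]   : score of state s at step t
    transitions[a][b] : score of moving from state a to state b
    """
    n_states = len(transitions)
    score = list(emissions[0])
    back = []  # back[t-1][b]: best predecessor of state b at step t
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for b in range(n_states):
            a_best = max(range(n_states), key=lambda a: score[a] + transitions[a][b])
            pointers.append(a_best)
            new_score.append(score[a_best] + transitions[a_best][b] + emissions[t][b])
        score = new_score
        back.append(pointers)

    # Walk the back-pointers from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

For example, with a weak preference for a split at the second point, a strong one at the fourth, and a penalty on entering the split state, only the strong candidate survives:

```python
emissions = [[0.0, -1.0], [-0.2, 0.0], [0.0, -1.0], [-2.0, 0.0], [0.0, -1.0]]
transitions = [[0.0, -0.5], [0.0, -5.0]]
print(viterbi(emissions, transitions))  # [0, 0, 0, 1, 0]
```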
- casey 2694 days ago
  I believe this is what the author tried first in the post. He even links to this test UI where you can compare the "plain math" approach to the neural network:
  https://attardi.org/pytorch-and-coreml-test-ui/
- ianbooker 2694 days ago
  I think it is a great example for deployment and maybe even a good example for tackling a problem that is not easy to understand. Either I missed it, or the nature of the problem is never thoroughly analyzed. I am not an expert when it comes to mechanical watches, but the main question hovering over this topic is: Will a watch deviate from the perfect time in a linear fashion? Or can there be other models? Even more so: Is it possible that different watches will deviate in completely different modes? If so, the problem instantly gets three orders of magnitude harder...
  - mr_toad 2694 days ago
    I’m guessing that it’s desirable to report linear deviations to the users (seconds per day), even if the deviations are non-linear.
- mark_adams_iii 2694 days ago
  This, I believe, is a discussion about analytical vs cognitive problem solving. And specialized vs generic approaches.
  Maybe the author of the app had other uses for CNNs in mind for other features in the future.
  So if you account for that, why have individual analytical solutions when you can solve a whole bunch of problems with one cognitive approach?
casey 2694 days ago
I love the description of convolutions:
> I think of convolution as code reuse for neural networks. A typical fully-connected layer has no concept of space and time. By using convolutions, you’re telling the neural network it can reuse what it learned across certain dimensions.
The diagram is great too: https://attardi.org/pytorch-and-coreml#convolution
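The "code reuse" idea in the quote can be shown with a tiny one-dimensional example. This is a sketch for illustration only, using plain lists rather than the article's PyTorch model: the same two kernel weights are applied at every position, so a pattern learned once is detected wherever it occurs. (As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is cross-correlation.)

```python
def conv1d(signal, kernel):
    """Slide one shared kernel over the signal (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(kernel[i] * signal[t + i] for i in range(k))
        for t in range(len(signal) - k + 1)
    ]


step_detector = [-1, 1]  # responds to an upward step between neighbours

print(conv1d([0, 0, 1, 1, 1], step_detector))  # [0, 1, 0, 0]
print(conv1d([0, 0, 0, 1, 1], step_detector))  # [0, 0, 1, 0]
```

Moving the step one position to the right moves the response one position to the right, with no extra weights. A fully-connected layer would need separate weights for each position.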
flatfilefan 2694 days ago
Happy to hear it worked out for the author and a great showcase example for the technologies. Thanks for sharing with us!
I’ve been tracking the performance of my mechanical watch myself for over a year now. After some experimentation I’ve settled on taking a burst picture of the watch hands at the exact minute with my iPhone camera and reading out the EXIF data for exact timing. This solves quite a few logistical problems with the measurements.
From my point of view, spending time designing an automatic ML solution for something that is caused by the watch owner and can be easily identified is less optimal than, for instance, automating the measurements themselves as described above.
If the author is interested in moving into that direction I’d be happy to share my experience directly.
Otherwise good luck further on and keep us posted.
- steadicat 2694 days ago
  Using the camera for taking measurements is a great idea. Deciding where to split the trendlines is a separate problem though. A different way to take the measurements wouldn’t change that. Would love to chat about ways of improving both. Shoot me an email!
epanchin 2694 days ago
Great walkthrough. Purchased the app - I have been curious about the performance of my watch for a while but never got around to measuring it. The app nicely hit a niche.
What would be the challenges in using the camera to identify the time on the watch face?
- steadicat 2694 days ago
  It’s a great idea. I had it on the back burner. Wasn’t sure if it was worth the time for a relatively niche app. Now I’m thinking it might be worth doing just so I can write about it!
  - iverjo 2683 days ago
    I also had that idea, and I made a crude proof of concept: https://github.com/iver56/cnn-clock
JonasJSchreiber 2694 days ago
Dude. This is awesome stuff! Great design, topic, demeanor, problem statement, solution. I also like the way you visually offset deep(er) dive subtopics. Thanks!
ultrasounder 2694 days ago
Very nice writeup. Also, the timing couldn't be more perfect, as I am also dabbling with an idea for a React Native app which would help me target both iOS and Android. Thanks for sharing your experience.
- JonasJSchreiber 2694 days ago
  Timing couldn't be more perfect, I see what you did there.
gok 2694 days ago
Looks like this is a classification problem as opposed to a regression problem. (The NN is trying to pick one output). You very likely want to use a cross entropy loss function, not MSE.
- moconnor 2694 days ago
  I think this is correct - binary cross-entropy on the sigmoid outputs should at least make the network easier to train and may as a consequence improve test performance.
- steadicat 2694 days ago
  Good point. I tried to use PyTorch’s cross_entropy function but I got this error: "multi-target not supported".
- amelius 2694 days ago
  But the number of outputs is not fixed. What if the number of given points is 1000, or 10000?
elrhedda 2694 days ago
Great work shipping NNs on an app. However, if I understand the challenge correctly, wouldn't starting a new sequence for a new trendline every time the deviation drops (potentially with an error margin) do the trick?
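The heuristic suggested here is easy to sketch. This is one possible reading of the comment, not the article's "plain math" baseline: it assumes the deviations are listed in time order and that a new trendline should start whenever a reading falls below the previous one by more than a hypothetical `margin`.

```python
def split_on_drops(deviations, margin=0.0):
    """Return half-open (start, end) index pairs, starting a new segment
    whenever a reading drops by more than `margin` from the previous one."""
    segments = []
    start = 0
    for i in range(1, len(deviations)):
        if deviations[i] < deviations[i - 1] - margin:
            segments.append((start, i))
            start = i
    if deviations:
        segments.append((start, len(deviations)))
    return segments


print(split_on_drops([0, 2, 4, 6, 1, 3, 5], margin=1.0))  # [(0, 4), (4, 7)]
```

A rule like this only catches resets of a watch that runs fast. A watch that runs slow, or one whose rate changes without a reset, would need a different rule, which may be part of why the author found simple approaches unsatisfying.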
kevinmannix 2694 days ago
Content aside, I love the ToC to the left with clickable links. It helps prepare a reader for what information they're about to consume, and provides relevant context for each section.
SergeyHack 2693 days ago
Really great end-to-end write up.
Just a few things: in the general case it's better not to use MSE after sigmoid due to slow convergence.
And the "logits" variable does not actually hold logits, it holds probabilities. Logits are what you have before applying the sigmoid activation.
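The slow-convergence point made here and in gok's thread above can be checked with a little arithmetic. The sketch below is a standalone illustration in plain Python, not the article's training code. It compares the gradient with respect to the logit z for a single output with target y: for MSE on p = sigmoid(z) the gradient is 2(p - y)p(1 - p), while for binary cross-entropy it is p - y.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mse_grad_wrt_logit(z, y):
    """d/dz of (sigmoid(z) - y)**2."""
    p = sigmoid(z)
    return 2.0 * (p - y) * p * (1.0 - p)


def bce_grad_wrt_logit(z, y):
    """d/dz of -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(z)."""
    return sigmoid(z) - y


# A confidently wrong prediction: logit -6 (p is about 0.0025) for target 1.
print(mse_grad_wrt_logit(-6.0, 1.0))
print(bce_grad_wrt_logit(-6.0, 1.0))
```

When the network is confidently wrong, the p(1 - p) factor shrinks the MSE gradient toward zero, while the cross-entropy gradient stays close to -1. In PyTorch, `torch.nn.functional.binary_cross_entropy_with_logits` takes the pre-sigmoid values and float targets with one entry per output, which is likely the function to reach for when `cross_entropy` rejects multi-target labels.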
programmarchy 2694 days ago
I am guessing the React Native choice was for fun/learning since the author didn’t target Android. But is there a toolchain that can convert Torch or TensorFlow models to an Android-compatible ML framework?
- Marat_Dukhan 2694 days ago
  You can use the same toolchain to convert a PyTorch model to Caffe2 through ONNX. Caffe2 supports both Android and iOS. There is even a tutorial: http://pytorch.org/tutorials/advanced/super_resolution_with_...
- parmesan 2694 days ago
  Both are already compatible. (I'm using both on Android)
- gauravm 2694 days ago
  There is TensorFlow Lite, which Google just open-sourced.
shreyask 2694 days ago
Really good end-to-end article! Enjoyed reading it.
tobyhinloopen 2694 days ago
"CUDA is not available" eh... OpenCL!
- steadicat 2694 days ago
  I’ve been subscribed to this issue since forever: https://github.com/tensorflow/tensorflow/issues/22
  Doesn’t sound like using OpenCL on iOS will be realistic any time soon. Am I wrong about that?
  - programmarchy 2694 days ago
    I am pretty sure Apple has abandoned OpenCL in favor of Metal, which CoreML is built upon.
natehouk 2694 days ago
Love this.
junp0819 2694 days ago
Neural network is great.
omarforgotpwd 2694 days ago
Great walkthrough, but was a neural network really necessary for detecting how far off the time on a mechanical watch is? I didn't read the article that closely so maybe I'm missing something but just intuitively a machine learning model and neural network seem like a lot of work for something like this.
- steadicat 2694 days ago
  The neural network is not for measuring how far off the time is, it’s for guessing where different trendlines in the charts should start and end. You could say it’s overkill, but as I mention in the article I wasn’t happy with the results I got using simpler math. Also I’m pretty honest about the fact that I used a neural network because I felt like it. :)
  Check out this response for a different take on this: http://yieldthought.com/post/170830096265/when-are-neural-ne...