Michael_Berk:
Hello everyone, welcome back to another episode of Adventures in Machine Learning. I'm your host, Michael Berk, and today we are joined by Ben Wilson. Thank God he's back.
Ben_Wilson:
Hey there everybody, it's good to be back.
Michael_Berk:
And today we have a special guest, Ahmad Mustafa Anis. He is currently a machine learning engineer. He studied computer science in undergrad and has worked as a deep learning and computer vision software engineer. He's been doing some blogging, which is how we found out about him, and he's currently working at Red Buffer as a machine learning engineer. So, Ahmad, do you mind introducing yourself a little bit more and telling us potentially why you are famous?
Ahmad_Anis:
Hi everyone, my name is Ahmad Mustafa Anis and I have recently completed my bachelor's in computer science in Pakistan. I'm working as a junior machine learning engineer at Red Buffer, and I have written quite a lot of blogs, which are basically what I have learned over time. Most of them were on KDnuggets, some were published on Towards Data Science, and some are on Red Buffer's company publication on Medium. So yeah, that's pretty much it about me.
Michael_Berk:
Cool, and what do you do specifically at Red Buffer? What type of projects are you working on?
Ahmad_Anis:
So Red Buffer is basically a services-based company, and most of my work has been around computer vision, but I have other experience as well. I have also worked on creating dashboards in Python, but the rest of my work has mostly been computer vision.
Michael_Berk:
What kind of computer vision specifically?
Ahmad_Anis:
So I joined Red Buffer around five months ago, and most of my computer vision work has revolved around OCR. I was working on a system that requires a lot of OCR, and we wanted it to be quite fast. So that's where most of my work was.
Michael_Berk:
Sounds cool. And for those of you who don't know, OCR is optical character recognition: recognizing license plates, for instance, or the address when you're sending a letter. If we want to automate reading where the letter should be sent, we can use machine learning to figure out where it should go. So you mentioned latency requirements. What types of applications require low latency?
Ahmad_Anis:
So that was a project and the requirement was to make it real time. I cannot disclose a lot about the project itself, but the main target was to achieve something that can work really fast on low hardware requirements, and the main portion was OCR. So we were figuring that out.
Ben_Wilson:
What sort of hardware are we talking about?
Ahmad_Anis:
Mostly normal laptops, not GPU-specific. We didn't want it to only run fast on an NVIDIA laptop or NVIDIA desktop; we wanted it to be fast on every laptop. So essentially our goal was to optimize it for CPU, not for GPU.
Michael_Berk:
Got it. That's super cool. Can you tell us anything about the application whatsoever? Only time I'll ask. We're just curious.
Ahmad_Anis:
I cannot disclose what the application itself was, but I can definitely talk about my learnings. I learned quite a lot of things in that process: how to optimize the OCR, how to look for the things that can actually speed up your pipeline. So yeah, I can talk about that.
Michael_Berk:
Yeah, that'd be great.
Ahmad_Anis:
So there were quite a lot of factors that actually play an important role in a fast OCR. One of the important things, which I think was missing in a lot of the tutorials and online material I was referring to, was how to use multiprocessing or multithreading inside such a system. For example, I was using Tesseract. Tesseract does provide some built-in multithreading options, but that is still multithreading, and what we were focusing on was using multiple processes so that we could utilize all of our CPU cores. So I jumped into the multiprocessing portion and looked for different techniques I could use to speed up my pipeline. One of the things I tried was slicing the image into different parts and passing them to different processes, so that a single image could be processed in the same span of time across multiple processes. But one reason that didn't work very well is that even though we were reducing the OCR time for a single image, it was increasing the number of OCR calls, and each call is itself quite expensive. So slicing an image was not a great option. What I eventually did was create multiple processes, with each process handling a different input to the OCR. That was what actually gave a big speed boost to our system.
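To make that per-frame multiprocessing idea concrete, here is a minimal sketch assuming pytesseract and OpenCV; the function names and worker count are illustrative, not the project's actual code:

```python
import cv2
import pytesseract
from multiprocessing import Pool

def ocr_frame(frame):
    # Each worker process runs Tesseract on one whole frame.
    return pytesseract.image_to_string(frame)

def ocr_video(path, workers=4):
    cap = cv2.VideoCapture(path)
    frames = []
    ok, frame = cap.read()
    while ok:
        frames.append(frame)
        ok, frame = cap.read()
    cap.release()
    # Whole frames are distributed across processes (one OCR call each),
    # rather than slicing a single frame across several processes.
    with Pool(processes=workers) as pool:
        return pool.map(ocr_frame, frames)

if __name__ == "__main__":  # guard required for multiprocessing on some platforms
    texts = ocr_video("input.mp4")
```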
Ben_Wilson:
So you're parallelizing on a single system. For the listeners out there, could you break down what we're talking about? Not everybody's gonna be familiar with the difference between multiprocessing and multithreading, but could you get the thousand foot view and explain it to the audience? Like, hey, this is what we're talking about here.
Ahmad_Anis:
I can explain it at a high level; these are basically operating system concepts. Multithreading is a concept that allows a process to run multiple threads inside it, and it's mostly for I/O-bound jobs. For example, if you have a job that involves some I/O time, some waiting time, you can use multithreading to effectively hide that waiting. In multiprocessing, you have multiple processes running in parallel on multiple CPU cores, doing the same kind of job, which essentially speeds up your pipeline roughly by the number of processes you are running.
Ben_Wilson:
So in multiprocessing, we have memory isolation and thread isolation across those processes, whereas in multithreading, we have a common thread pool that tasks can be submitted to via futures, and threads can be reused when a task completes. So when we're talking about I/O-bound tasks in relation to ML, you bring up a great point with images: they're big, they take a while to load up, and a lot of that is I/O-bound. And then you go into the actual model execution itself, which is going to be, in your case, CPU-bound, or in the other scenario we were talking about before with NVIDIA chips, if you're using GPUs, you're going to be GPU-bound. So when we were talking about the multithreading concept, why does that not work so well for your use case? From a technical standpoint, what's going on in the CPU that's sort of breaking stuff?
Ahmad_Anis:
So actually, when I completed a basic pipeline, I just profiled my code. What I found was that 96% of the time is spent inside the OCR, when the OCR step is actually being performed. So my main focus was to optimize that 96% part and bring it down, and I was not focusing much on the remaining 4%, which was reading the video and the other stuff that was happening. So I was focusing mainly on that 96% task, which was CPU-bound. That's why I was focusing on multiprocessing.
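Profiling like this is straightforward with the standard library; a rough sketch, where process_video() is a hypothetical name standing in for the real pipeline entry point:

```python
import cProfile
import pstats

def process_video(path):
    ...  # stand-in for the real pipeline: read frames, pre-process, OCR

cProfile.run("process_video('input.mp4')", "pipeline.prof")
stats = pstats.Stats("pipeline.prof")
# Sort by cumulative time to see which step dominates (e.g. the OCR call).
stats.sort_stats("cumulative").print_stats(10)
```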
Ben_Wilson:
Yeah, and to translate what you just said: when you're talking about those long-running CPU tasks, if you're using multithreading, you're locking a thread for the execution duration of that task. So you're not getting the benefit of tasks switching to additional threads; because the task runs so long, you can actually exhaust your thread pool relatively easily with CPU-bound tasks, and it can create resource contention on the CPU, which slows everything down and just becomes really inefficient. Whereas with multiprocessing, which is designed for that CPU-bound work, you're locking resources and saying, hey, I'm going to do four of these things all at the same time, and each of them is going to get its own core and manage its own thread processing within that multiprocessing, you know, sort of pseudo-container, and it becomes more efficient. So if anybody listening wants to sort of process that, think about it for a moment. When you're talking about using these more advanced concepts, rather than just doing a list comprehension or a for loop within your code, and you want to get better performance, try to keep that relationship in mind: for CPU-bound tasks, multiprocessing is going to work better for you; for I/O-bound tasks, multithreading is usually going to work better for you. And then anytime you're using this sort of stuff, don't forget to read the docs. The underlying versions of the language and the libraries that implement this, and most of this is language-native, we're probably talking about Python here, those libraries change. Python 3.7 to 3.8 had a major change in multiprocessing and multithreading, a very large refactoring of the underlying code base that changed how it behaves, with massive performance improvements. But you're only going to know that if you go and read the docs and get familiar with the APIs and the examples in there, because these processes are a nightmare to troubleshoot if you don't know what you're doing.
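As a rough illustration of that rule of thumb (threads for I/O-bound work, processes for CPU-bound work), here is a small sketch with stand-in functions; the sleep and the arithmetic loop are only proxies for real I/O and real OCR:

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import time

def load_frame(path):
    time.sleep(0.1)            # I/O-bound stand-in: mostly waiting
    return path

def heavy_compute(_):
    return sum(i * i for i in range(10**6))   # CPU-bound stand-in

def run_io_bound(paths):
    # Threads help here: the GIL is released while workers block on I/O.
    with ThreadPoolExecutor(max_workers=8) as ex:
        return list(ex.map(load_frame, paths))

def run_cpu_bound(items):
    # Processes help here: each worker has its own interpreter and core,
    # so long-running work doesn't serialize behind the GIL.
    with ProcessPoolExecutor(max_workers=4) as ex:
        return list(ex.map(heavy_compute, items))

if __name__ == "__main__":
    run_io_bound(["a.jpg", "b.jpg", "c.jpg"])
    run_cpu_bound(range(4))
```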
Ahmad_Anis:
Yeah, definitely. The debugging portion is so hard. You get such strange error messages that you just don't understand, especially if you're working by copy-pasting code: it just crashes, you get an error message, and you have no idea where it leads or what it's saying.
Michael_Berk:
Yeah, 100%, it really is.
Ben_Wilson:
So you talked before about what happens when the OCR process fails. How do you handle that in multiprocessing land with an ML task like that? How do you handle a retry? When we're talking about futures in Python, how did you figure that out and implement it in a way that doesn't bottleneck the processing of your real-time system?
Ahmad_Anis:
So yeah, there was definitely a limit. I did not create a lot of processes; I left some room for other tasks as well. And there was definitely a limit when the frames are coming in in real time. I had to set a threshold, because if my system is being slow, a lot of frames can get stuck in memory, and that can take up all of your RAM and eventually crash your system. The multiprocessing module in Python actually gives you the option, for the data structures you are using, to specify a maximum length so that they do not overflow and eventually crash your system.
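The bounded-queue idea can be sketched with the standard multiprocessing module; the maxsize, timeout, and frame source here are illustrative assumptions, not the project's real values:

```python
from multiprocessing import Process, Queue
import queue

def producer(q, frames):
    for frame in frames:
        try:
            q.put(frame, timeout=1)   # back-pressure: wait briefly for space
        except queue.Full:
            continue                  # drop the frame rather than exhaust RAM
    q.put(None)                       # sentinel: no more frames

def consumer(q):
    while True:
        frame = q.get()
        if frame is None:
            break
        # ... run OCR on the frame here ...

if __name__ == "__main__":
    q = Queue(maxsize=32)             # caps how many frames sit in memory
    worker = Process(target=consumer, args=(q,))
    worker.start()
    producer(q, range(100))           # stand-in for a real frame stream
    worker.join()
```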
Ben_Wilson:
So if we're going to do this at an extreme scale, like customers of the company that Michael and I work for, when we're talking about video processing with OCR and you're taking in 10 terabytes of video feed an hour: did you have any problems at that scale of magnitude, where you're like, hey, I can't handle this on just a single CPU, I'm now thinking about multiple CPUs across multiple machines? Did you get into that world of complexity?
Ahmad_Anis:
No, it was not that world of complexity, but yeah, when we were dealing with 4K video we ran into that kind of problem. The issue was that with 4K videos the frames are big, and a 4K video was the maximum case we had to handle. By setting the threshold and doing some pre-processing up front to reduce the size a bit, we handled that. But our system was not at a very huge scale; there were not terabytes of data in this case.
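One cheap pre-processing step for 4K input is simply downscaling before OCR; a hedged sketch with OpenCV, where the target width is an illustrative guess rather than the project's actual setting:

```python
import cv2

def shrink_for_ocr(frame, max_width=1920):
    # Downscale a 4K frame so the OCR engine has fewer pixels to process.
    h, w = frame.shape[:2]
    if w <= max_width:
        return frame
    scale = max_width / w
    return cv2.resize(frame, (max_width, int(h * scale)),
                      interpolation=cv2.INTER_AREA)
```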
Michael_Berk:
Got it. So the multi-processing and multi-threading seem to not improve performance that much? Is that correct?
Ahmad_Anis:
It actually improved the performance once I created multiple processes for the main OCR step. There were four processes, or however many CPU cores we had available, and that number of instances was processing the stream, so it improved the speed quite a lot.
Michael_Berk:
Oh, okay, that's great to hear. Well, what other methods did you employ to make it run on smaller hardware?
Ahmad_Anis:
One of the tricks which we considered, and which actually worked, was that when we are talking about a video stream, we do not actually need to perform every operation on every frame. So we do have...
Michael_Berk:
Uh oh, editors: Ahmad dropped.
Ben_Wilson:
We lost our interviewee. While we're at a break, I teed that up for you, Mike. The multiprocessing versus multithreading thing, that multiprocessing concept, that's Spark. That's how executors work.
Michael_Berk:
Yeah, I know.
Ben_Wilson:
Or you could use Ray as well.
Ahmad_Anis:
Hey, can you hear me?
Michael_Berk:
Yeah. So we have editors for the podcast and they'll stitch it together. So no worries about dropping, we can just continue from where we were.
Ahmad_Anis:
Alright, that's good.
Michael_Berk:
Let's restart from this point. So it seems like you improved performance with the multithreading and multiprocessing. But what other methods did you employ to make it run on small hardware?
Ahmad_Anis:
So another step we tried, and which worked, was that we do not actually need to perform the OCR operation, or any other operation, on every frame. We have a margin where we can skip some frames and still get the desired result, and that plays an important role in getting a boost in your whole pipeline time. For example, if we have a 30 FPS video, we can perform the OCR just three times a second, say every tenth frame, and we get quite a good boost in speed while still retaining the information and not missing anything. So that frame-skipping part was actually quite helpful.
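A minimal sketch of that frame-skipping idea, assuming OpenCV for decoding; the stride and the ocr callable are placeholders:

```python
import cv2

def ocr_every_nth(path, stride=10, ocr=lambda frame: ""):
    # At 30 FPS, stride=10 means roughly three OCR calls per second.
    cap = cv2.VideoCapture(path)
    results, i = [], 0
    ok, frame = cap.read()
    while ok:
        if i % stride == 0:
            results.append(ocr(frame))
        ok, frame = cap.read()
        i += 1
    cap.release()
    return results
```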
Ben_Wilson:
So one of the things that I've seen people do for this, in order to reduce the data density going to a deep learning model that's doing classification, like multi-class classification and identification and defining bounding boxes, is dynamic differencing on the video feed. I mean, the approach that you guys used is definitely completely legitimate, and it's 100% what I would do initially: hey, how often do frames really change? What is the nature of this video? Is it something that's very fast moving? What is the rate of change of items within the frame? But for things that are somewhat static, where you might not have a change over a 30-second period, like surveillance video feeds where you're trying to do detection, in your pre-processing of the incoming data you can maintain a buffer of the images and ask: how different are the pixels from my reference image, a snapshot that's sort of moving through time? Then as new frames come in, do a quick check and say, what is the percentage difference here? If it's outside of a threshold, send that frame for classification; if not, discard it. Were you thinking of something like that to further reduce that pressure, or was that completely out of scope?
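The differencing Ben describes might look roughly like this with OpenCV and NumPy; the pixel and percentage thresholds are purely illustrative:

```python
import cv2
import numpy as np

def changed_enough(reference, frame, pixel_delta=25, fraction=0.02):
    # Compare the new frame against a rolling reference snapshot.
    ref_gray = cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY)
    cur_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(ref_gray, cur_gray)
    changed = np.count_nonzero(diff > pixel_delta) / diff.size
    return changed > fraction   # True -> send the frame on for detection/OCR
```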
Ahmad_Anis:
No, we actually tried that also. We did some frame differencing and tried a few techniques, and they were working, but they were not very generalized, and it was difficult to set a good threshold. For example, in some cases, if the frames are 96 percent similar we can just skip one, but in other cases we had to bring the threshold down to 90 percent, and sometimes we would even need a bigger threshold. So we definitely tried that. But another thing, similar to this, which we tried was using a tracker. If we have to focus on, let's say, a specific part of the video feed, then initially, when we detect that part, we localize it and get its bounding boxes. Then we can use different techniques to find out whether that part is still there or not, regardless of how much the rest of the frame is changing. If that part is static, if the background is not changing much, we can use template matching, or we can use some sort of tracker that runs very fast, and we can skip the OCR part, which was a lot more expensive than that. So that actually gave a lot of boost.
Ben_Wilson:
I think we might lose him again.
Ahmad_Anis:
I hope I'm audible. Hi, can you hear me?
Michael_Berk:
Yeah, hey.
Ben_Wilson:
So you were talking about template matching.
Ahmad_Anis:
That's it. So should I start, or will you start with the question?
Michael_Berk:
Let's restart after Ben's question, sort of at the pre-processing step. If you could start from your whole sentence, or the whole phrase, that would be easier to edit.
Ahmad_Anis:
So, yeah, we definitely tried different frame differencing techniques, which were not working very well. But we did try a technique with a similar idea, and it worked really well. What we did was that we had to focus on one part of the screen; that was our area of concern, and we had to see if that part was still in the stream or not. If that part is in the stream, we can actually skip the OCR process. We used different techniques: for example, in the first iteration we just localize that part, and then we can use something like template matching. If the background is not changing very much, template matching works well, since it works well on things that are static. We can also use some other tracker, which is very fast compared to OCR. Using this sort of technique saved us a lot of time, because we did not have to run OCR on every single frame. Instead, we only ran OCR on the frames where the tracker was failing.
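A sketch of that "track instead of re-OCR" pattern, assuming OpenCV's CSRT tracker (shipped in opencv-contrib-python; the constructor name varies slightly between OpenCV versions) and a bounding box found by an initial OCR/detection pass:

```python
import cv2

def frames_needing_ocr(frames, first_frame, bbox):
    # bbox = (x, y, w, h) for the region located by the first OCR pass.
    tracker = cv2.TrackerCSRT_create()
    tracker.init(first_frame, bbox)
    for frame in frames:
        ok, bbox = tracker.update(frame)   # much cheaper than running Tesseract
        if not ok:
            yield frame                    # tracker lost the region: re-run OCR here
```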
Ben_Wilson:
Yeah, I really like where you're going with this, and it really speaks to something. When we put aside the ML aspect of a lot of production deployments, a lot of people focus on the model and how cool it is, like, oh, this image recognition model can do these things. And yeah, that's cool. But when you're talking about creating something that's going to be useful for humans to actually do something with, these are the issues we're talking about right now. How do we get this so it isn't going to cost a fortune to run? How do we limit the amount of data that we have to feed into this and still get an acceptable result? Could it potentially be better if you analyzed every frame? Maybe. But what is that gain compared to how much it's going to cost? And that's why I really wanted to talk about how you did this and how you thought about reducing that frame count. Those creative aspects of almost pure engineering work that you have to do as an ML engineer, that's really where the magic happens in production deployment: figuring that stuff out, coming up with a bunch of hypotheses to test, testing them all, and seeing which one works. What approaches solve this problem the best? What's the cheapest? And most importantly, what's the easiest to maintain? So yeah, thanks for explaining all that.
Ahmad_Anis:
No worries.
Michael_Berk:
And also, did you use any gray scaling?
Ahmad_Anis:
Yeah, definitely, grayscaling actually helped a lot. When we started, I was just feeding in RGB images, and when we grayscaled them the results were no different: the OCR was giving the same results on RGB images as on grayscale images, and the speed was a lot better. Another thing I noticed was that at first I was doing the grayscale conversion as the last step before the OCR. When I moved that step to the start of my pipeline, before the other processing, it gave even more of a boost. What I want to explain is that if you have a step that will not change the results of your pipeline but will definitely improve the speed, move it to the top, before everything else happens, so that the time of every later step is reduced.
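Moving the grayscale conversion to the front of the pipeline is a one-liner with OpenCV; a minimal sketch, assuming pytesseract downstream:

```python
import cv2
import pytesseract

def ocr_gray(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # do this first ...
    # ... so every later step, and Tesseract itself, sees one channel, not three.
    return pytesseract.image_to_string(gray)
```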
Michael_Berk:
And then...
Ben_Wilson:
Yeah, and the only time you have to be careful about when you do that conversion is if you're doing any sort of pre-processing that relies on color, where you're like, hey, I'm changing from RGB space to YCbCr or something, in order to do some sort of color conversion or color matching or replacement. But if you're not doing that stuff, or don't need to, the earlier you can reduce the dimensionality of that data structure coming in, the better. We're talking about color versus grayscale, and in case any of the listeners don't understand what we mean by that: a color image is three-dimensional with respect to each pixel; each pixel has an R, G, and B value, and the combination of those three values gives us the color, the hue and saturation, when they're rendered by a device. In grayscale, we're talking about a single value per pixel, although there are some grayscale formats that have two elements. But if you can reduce that dimensionality, that's so much less data that has to go through your model and your pipeline. So the earlier you can do that, the better.
Michael_Berk:
What are the two dimensional grayscales?
Ben_Wilson:
There's one that has to do with luma intensity as well as hue. I can't remember what the format is. It's been years, man, since I messed with that. But there was a particular file format that had that. We were messing around with it at a previous job.
Michael_Berk:
That's super cool. Yeah, so that was one of my questions. And then another question is, this seems like it would lend well to transfer learning. Theoretically you could have a large model up in your cloud or the user's cloud, and then you could do fine-tuning on a laptop. Have you guys thought about that at all?
Ahmad_Anis:
So actually, in our portion, we were not training a model. We were using pre-trained models, pre-trained OCRs, and we did not have to do any training of our own. But one thing I would like to mention is about the weights. When you're using a pre-trained model, or any model that has some sort of weights, there are the actual weights and there are quantized versions of them. The idea of a quantized version, in simple terms, is that you reduce the data type of your weights: if the weights are in float64 or float32, you can reduce them to something like int8, a very small data type. It gives you a little bit of accuracy reduction, but in exchange you get good speed. So we were also using quantized models, and that was a good speed factor.
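As a toy illustration of that quantization trade-off (not how Tesseract stores its models internally), float32 weights can be mapped to int8 plus a scale factor:

```python
import numpy as np

weights = np.random.randn(4, 4).astype(np.float32)
scale = np.abs(weights).max() / 127.0            # one scale for the whole tensor
quantized = np.round(weights / scale).astype(np.int8)
restored = quantized.astype(np.float32) * scale  # dequantize for comparison
print("max absolute error:", np.abs(weights - restored).max())
```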
Ben_Wilson:
Yeah, when you're talking about the CPU computations involved in 64-bit operations versus int8, the memory and what the CPU is actually doing, the amount of work involved in that thread's execution, is much lower, and you don't lean on the CPU cache nearly as much while it's executing. And that adds up when you're talking about deep learning; there are a lot of calculations happening throughout that model structure, pre-trained or not, even if you're doing it yourself and building a very simple wide and shallow network. That's a really good point, and I think it's something that not too many people focus on, but they should. When you're talking about releasing something, you're like, hey, let's figure out how good this needs to be, let's get it to that, and then how can we save on costs by making things faster and cheaper and running on cheaper hardware? Excellent point.
Michael_Berk:
I'm just curious, what types of algorithms did you end up using? You mentioned Tesseract, but what else, if any? Well, that'll be an easy transition to get back into.
Ben_Wilson:
T-E-S-S-E-R-A-C-T. OCR.
Michael_Berk:
Yeah, I looked it up last night too. HP developed it in the 80s. Basically it incrementally builds a bounding structure on black-and-white images. Yeah, I've...
Ahmad_Anis:
Hi, can you hear me? We actually just moved house, so the internet is quite buggy right now.
Michael_Berk:
No worries at all. But this is great, the easiest transition so far. The question was basically what algorithms you have used. So if you could just start anywhere.
Ahmad_Anis:
Alright, so the major machine learning algorithm was only Tesseract; the OCR engine we were using was Tesseract. There was not actually much more of a machine learning portion involved, but yeah, we used different algorithms: in the frame differencing techniques, we used template matching, and we used a tracker, the CSRT tracker. I guess those are the well-known algorithms worth mentioning. The rest of the work was mostly software engineering, the logic and operations being performed on the data.
Michael_Berk:
Got it, and so sort of taking a step back, do you mind explaining a little bit about what Tesseract does and how?
Ahmad_Anis:
Alright, so Tesseract is basically one of the most famous OCR engines out there. I believe it is backed by Google, and it is developed in C/C++, so it is quite fast, and they have a lot of pre-trained models available. So if you have a general OCR task, you do not need to do any sort of training; you just pass your image to Tesseract with the appropriate pre-processing, and it will give you all the text from it. Tesseract version 3 was mostly classical computer vision and machine learning, but from Tesseract version 4 and 5 onwards they are using deep learning models such as LSTMs and RNNs. But one of the things which I found, which I did not like about Tesseract, was that if someone has an NVIDIA GPU, you cannot really speed up your computations with Tesseract, so you would have to shift to another OCR engine. If you want to use a GPU, a better choice would be Keras-OCR, EasyOCR, or PaddleOCR; those OCR engines can utilize your GPU as well. Setting up your NVIDIA GPU with Tesseract, that is, I can say, one of the hardest things you will ever do.
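Switching engines to get GPU support might look like the following: pytesseract wraps the CPU-only Tesseract binary, while EasyOCR can use a CUDA GPU through PyTorch. The package and parameter names here are the commonly documented ones; treat this as a sketch and check the current docs for your versions:

```python
import pytesseract
import easyocr

def ocr_cpu(image):
    # Tesseract via pytesseract: CPU only.
    return pytesseract.image_to_string(image)

def ocr_gpu(image_path):
    # EasyOCR: uses the GPU if PyTorch can see one, otherwise falls back to CPU.
    reader = easyocr.Reader(["en"], gpu=True)
    return reader.readtext(image_path, detail=0)   # detail=0 -> just the text
```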
Ben_Wilson:
Oh, you'd have to compile it and you'd have to get it to communicate with the GPU, right? Because it's not compiled for that.
Ahmad_Anis:
Yeah.
Ben_Wilson:
Yeah, that would be rough. I mean, any software package that's been around as long as Tesseract has, where it's been compounded upon, we're talking about something that's been worked on for over 30 years. HP Labs was the originator of that, with the old-school machine vision back in the day, and it's very widely used around the world. Yeah, it's hard to take that old code, and some of that stuff has probably been written in libraries and languages that, for all intents and purposes, seem like dead languages to the modern computer science field. Or you look at the underlying C code and you're like, wait a minute, the last time this was edited was 1986? Really? And it still runs? Wow. So yeah, you'll see that with really foundational things like that. It's interesting to me how they've ported a lot of that functionality over with version 4 and version 5, and Google said, hey, let's run this on TensorFlow and get Keras support here to sort of modernize it. Did you end up trying version 3 versus version 4 and seeing how the classic performed versus the deep learning implementation?
Ahmad_Anis:
I actually tried version 4 versus version 5. With version 4 I was using the quantized model, and the version 5 model I was using was not quantized; they haven't uploaded a quantized model for it.
Michael_Berk:
By the way, this stuff is so cool. I was reading about it last night. Shit's awesome.
Ben_Wilson:
Yeah, at my last job we were doing image recognition. We did it with OpenCV manually, and that's how we had to learn all of the underlying stuff: how do these libraries actually outline a shape in an image, and how do they determine what that is? We didn't have it for our use case, so we had to build it all ourselves and extend those libraries. We were like, wow, this is tough stuff. It's cool, though.
Michael_Berk:
Yeah, so welcome back Ahmad.
Ahmad_Anis:
Can you hear me?
Michael_Berk:
Yeah.
Ahmad_Anis:
Can you hear me? Alright. So should I start?
Michael_Berk:
Yes.
Ahmad_Anis:
So I did not try version 3 versus version 4. What I tried was version 4 versus version 5. I was using the quantized model of version 4, and version 5 was non-quantized, and to my surprise the version 5 model was giving almost the same speed; it was only a little bit slower than the version 4 quantized model. So they have brought a lot of optimization into Tesseract version 5. When I first started, I installed Tesseract on Ubuntu, and the default Tesseract that came with Ubuntu 18.04, which I believe I was using, was version 4. I was not really aware of the version differences of Tesseract at that time. So when I later tested it, I found it quite interesting that the version 4 quantized models are a little bit faster than the version 5 non-quantized models. That was pretty cool.
Ben_Wilson:
That brings up a very important point when we're doing evaluations for solutions in ML: test stuff out. Particularly if you're an ML engineer. I mean, data scientists are always testing stuff out, right? Like, oh, I'm going to try this model. No, I'm going to try this other model. Or I'm going to try this framework. But even from an ML engineering perspective, when you're getting a solution, which is the project we're talking about right now, you're effectively getting somebody else's work in the form of a packaged model that somebody else already built, and you're having to figure out the best solution for solving this problem, so even then you're testing stuff out. So, listeners out there, if you're in ML engineering and you're working with the data science team, and they punt some code over the wall and say, we need this in production, you can do exactly what Ahmad is talking about, where you test different things out. What different versions are there of this? Can we use a different operating system? Can we use different hardware? How can we solve this in the best way possible, and do validation checks on our ideas? And so long as it meets the needs, not of the data science team, but of the business, then it's a good solution.
Ahmad_Anis:
Yeah, that's a great point. So as long as it meets the business needs, it is a great solution.
Michael_Berk:
So I wanted to also transition a bit into sort of more high-level concepts about teaching yourself machine learning. I know you're more junior in your career, but you are very prolific in the blogging space and have written some really, really great posts, and you've seen a lot of success on KDnuggets. So I have a couple of quick hitters that I would like to ask both you and Ben as practitioners of teaching yourself ML. Should you go take a Udemy course and get that certificate, or should you just read blogs and generally know what's going on?
Ahmad_Anis:
I think they definitely help. Having a certificate does help a bit. It might not carry a lot of weight by itself, but it definitely helps, especially as a junior engineer. When I landed my first internship, my interview did not go very well, but my ex-CEO told me that even though the interview was not great, he could see that I was putting a lot of effort into doing courses and getting certifications, so he could see I had that motivation inside. Even though he did not check those certificates, just because I had listed them on my CV and had actually done them, he was impressed by it. So yeah, it helped me in my case.
Ben_Wilson:
My take on it is going to be a little different, simply because I've been doing this a while; it's been a long time since I was an entry-level person. But what I've seen over the years is that when somebody's trying to learn something, there are two key factors I think about. One is: what is their motivation? And with respect to motivation, are they going to be somebody who can self-teach and find information wherever they need it, and develop their own learning plan of, hey, here's what I need to do? Usually you need a mentor to do that with you if you want it to be efficient. But if you're completely on your own and you don't know where to get started, you don't even know what to search for or what to work on or what foundation you need, those courses that provide those certificates are incredibly helpful, because it's structure. It leads you down a learning path. Is it going to make you a professional data scientist or ML engineer after doing one of those courses? No, it is not. It's going to allow you to understand the breadth and scope of what you don't know, because when you're getting started, you don't know what you don't know. You don't know where to go next. It's the same thing as in university. You don't walk out of any university program knowing how to be a professional in your field. Nobody does. Internships help, but you're not going to do that stuff and then be immediately qualified to be a foundational technical member of a company. You go from being an intern, to graduating, to being this new hire who has to learn a bunch of stuff. The structure that you get from those learning courses is the same as in college: it's a structured program teaching you foundational elements, but at the same time, it's teaching you how to go and learn by yourself when you identify the gaps, or when you know where you have a gap and you need this skill: okay, where do I go to find out how to learn this, and how do I get good at it? So I think that structure is really good, and I'm always a big proponent of people doing those courses for that reason. And then the second element that's important for learning, whether it's through reading a bunch of blog posts or doing a number of different certificate programs, is: can you reduce the scope? What I mean by that: we were talking about this yesterday, Michael, when we were talking about guitar learning, how I just started picking up the guitar again after almost 20 years. The guy that I'm learning from was talking about only learning what's absolutely essential when you're at that point. You don't pick up a guitar, learn your basic chords, and then go, okay, I'm going to memorize all my scales, all my chords, I'm going to learn the C add nine, the augmented chords and the diminished minor chords for everything, then I'm going to learn my jazz chords and the blues chords. You don't do that, because your brain just can't handle that amount of information. You can't say, oh, I'm going to memorize all possible 347,000 chords on a guitar. It's useless. It's the same thing with ML or any technical thing that you're doing in your career. If you go out there just trying to learn information at random, you're not going to retain anything. You're not going to really know it.
But if you have the foundation of, okay, I have the tool set I need to do this basic job, that prepares you so that when you encounter something you've never seen before, and you're really stuck and have to learn it, you know how to go about learning that thing, and then you learn it because you need it. That's what blogs are really great for. You can go and find them and be like, oh, this explains this concept that I need, because I'm working on a project that is doing this. That's my two cents.
Michael_Berk:
Yeah, I completely agree. I think if you understand the fundamentals, you can learn anything. Especially in the internet age, there's just too much to know. You could spend a lifetime working on OCR and still not scratch... well, you'd probably scratch the surface, but you wouldn't necessarily be the leading expert. A question for both of you, though: we have this giant field of ML, right? And there are theoretically underlying principles and commonalities between different concepts. For example, if you learn how a decision tree works, you can probably learn random forests and other tree-based models. If you learn the basics of a neural network, you can learn RNNs and all these other crazy things. How do you identify what the first principles of machine learning are? I think that's one of the more interesting challenges I've run into, because if you know what to learn and where to look, then you can just go to the internet and learn those things. But knowing what to learn, and knowing what is a first principle versus a technique built on top of a first principle, that's really hard to identify. So do you guys have advice on how to think about what ML first principles are?
Ben_Wilson:
I'll let you go first.
Ahmad_Anis:
So, what happened in my case was that when I started my CS journey, it was always in my head that I wanted to learn machine learning. And I came across a podcast, I don't exactly remember its name right now, but I think that person has also appeared on the Adventures in Machine Learning podcast at some point; that podcast series is different, though. In that podcast, he first explains the overall ecosystem of the AI and machine learning community and what things are out there, and then he goes on to explain all the different basic algorithms. That really helped me a lot in developing a path for myself. He explained all the concepts in easy words and would then recommend a lot of good resources. He relied a lot on Andrew Ng's machine learning course for beginners, and that course was really good for someone who does not have a lot of theoretical knowledge in machine learning and wants to dive in; I think he explains it in the easiest way. With the help of that podcast and some other things, some blogs, listening to other professionals, and watching some YouTube tutorials, before starting to learn I created a path for myself. I knew that I had to cover a good amount of Python, a basic amount of mathematics, basic machine learning, supervised learning, unsupervised learning; these are the resources I am going to use and these are the books I can read. I kept editing that path for quite some time, and I think that really helped me: listening to what the professionals say about which courses you should take, which are the good courses, which are the good books to read. I think this helps a lot in creating a path for yourself, and having a complete path motivates you and gives you structured knowledge. As Ben just said, if you go learning in an unstructured way, you might hear a lot of things but you won't retain any of it; if you go in a structured way, you will retain a lot of it. So yeah.
Michael_Berk:
So if I were to sort of summarize your point, it would be find a predefined structure either through a lesson plan or a podcast or a bunch of YouTube videos that will outline those fundamentals and then you can go about learning deeper wherever you want to, but make sure you rely on that structure.
Ahmad_Anis:
I think that was essentially my point: if you have a basic structure, then even when you're in the middle of it, once you have started down that path, you will know whether doing something is worth it or not. I know a lot of people have created paths for themselves, and they have a lot in common. For example, Andrew Ng's machine learning course is common in a lot of beginner paths; if you search for beginner learning paths, I can say that almost all of them, if they have a courses section, have recommended it. So if you do not know which path to rely on, you can take the common things that appear in all of those paths and that are basically trusted by everyone. And definitely, when you have completed a basic path and you have some basic knowledge, you will eventually get to know what you want to do next. You will know whether you want to go into computer vision, natural language processing, or time series; you will eventually develop an interest and keep finding new things that can help you, once you go through a basic structure first.
Ben_Wilson:
My answer will be controversial, maybe. But there are core things for the two different paths in this profession. One is model-centric, focused on what we would call data scientists: they do a lot of analysis, they build models, they create a lot of prototypes and do a lot of research. And then there's the production side, which is either ML engineering or MLOps, whatever you want to call it, whatever that job title will eventually be. I think there's commonality between both of those groups in that there is a foundation that needs to be there. You need to know how to write code, and not just copy-paste stuff from the internet, put it into a notebook, and hope that it's going to work well. You don't typically have to write advanced code for data science stuff, but you should understand the fundamentals of how to write human-readable code. It doesn't have to be the most performant stuff, but on the ML engineering side there should be more of a focus on that: you should be pretty darn good with software, know that you need to keep getting better at it, and focus on optimization, testability, and the ability to write code that can be maintained and extended. But there should also be a common foundation in statistics on both sides. It's really tough to understand the concepts of how a model or an implementation in an underlying library works if you don't have at least moderately advanced statistical knowledge. And if you don't have any of that knowledge, it's really hard to even determine whether your solution is going to work. We've talked in the past, Michael, about A/B testing, and how do you do attribution analysis? Hey, this model deployed to production has run for a month; should we keep it in production, yes or no? If an executive asks you that and you don't know how to analyze it and provide the correct answer of, hey, this is objectively what this is doing, it's causing our revenue to go up or our membership to go down or whatever it may be, you're stuck. You need a pretty deep understanding of statistics to do that stuff. And then the most important thing on both sides, I think, for a foundation, is the soft skills, and two key soft skills in particular. One is: know how to talk to people, share ideas, listen to them, and test out the things they're suggesting. Be open to that, set your ego aside, and learn to work with people. It's super important in this profession; particularly the further along you get and the more years you put behind you, you'll realize how important that is. The successful people are the ones who collaborate and are just really nice to work with. The people who aren't nice to work with and want to be the lone wolf, just coming up with amazing models, usually don't stay in this profession very long, no matter how good they are. And the second aspect of the soft skills is: can you figure out problems in creative ways? When you're presented with a problem, can you think outside the box and come up with clever solutions? That's really what this job is. It's not models, it's not how good your code is. It's how well you can think. Period. End of sentence. Can you get creative, talk with other people, and come up with creative solutions?
Michael_Berk:
So let me see if I can recap this. Ahmad sort of takes an academic approach and says: leverage a pre-created path through whatever resource you want to use, so blogs, podcasts, actual courses, or even a university program. Ben takes the approach of: be a god and be really, really smart. Ideally you'll have really strong programming experience and really strong statistical experience, but more importantly, you can distill problems into their component parts and sort of shift them around to your needs. I will take a third approach, which is that the field is too big to try to learn it all, so I advocate for just doing what you like and trying to make an impact in that area. An example: for my thesis I was doing environmental science and built a forecast of coral reef health in the Caribbean Sea. I would take a bunch of very disparate time series data and run into all these really crazy time series problems, and through that process I got pretty comfortable with the state-of-the-art methods for doing time series analysis, and found a bunch of holes that were just because the data were crap, so it wasn't really possible. But this approach is really fun, and it helps you take a bunch of creative angles because you actually care about the problem. So that would be my piece of advice on the high-level process. In terms of learning the first principles of your project, one thing I found really helpful is: spend 90 minutes on the internet Googling, and then do a couple more Googling sessions where you look up every single word or phrase you don't know. On the first pass, go through the first 10 Google results of how to do time series modeling, how to do this, how to do that, and write down every word that you don't know. The next day, because this will exhaust you, go and Google every single concept you don't know until you have every concept outlined perfectly. From there you have the raw materials to understand a concept, and as you start applying those materials, they will start assembling and chunking, and the hierarchical structure in your head will start to form. So that is how I have done a lot of my learning. It works well for me, but it's also really painful. So it really depends on your style.
Ben_Wilson:
I don't know if it's painful. I think that's kind of fun. Fun fact: when we're writing new implementations in my day job right now, when we're doing something like, hey, we need to interface with this open source project or this package, and nobody on the team has any experience with it, or maybe it's new and we just haven't seen it before, that's exactly how we do it. We read through their docs, go through their getting started guide, look at the examples. Yeah, that's cool. We'll give it a couple of minutes, seeing the structure of it; if anything stands out, we're like, that's weird. Then we go straight into the API docs and start reading through the main descriptions for the main access points. And anything we don't know, we don't just write it down; you open up a new tab with a search for that term, or somewhere in that doc there's an explanation of what the thing is. Sometimes it's a link; if people are really nice in their API docs, it's a link to a Wikipedia article or a white paper that explains what this thing is. Give that a quick scan, read the abstract, and be like, oh, okay, I know what this is. Or you get that other situation where you're like, I have no idea what they're talking about here, I've never heard of this before; maybe I'm an idiot or just incredibly ignorant about this, or I thought I knew this but this person is presenting it in a different way. And then you give yourself some time to understand that, so that when you do the implementation, you know what is important and why it's important. You can write your own docs on that other package that explain, hey, this is how this is used in our toolkit by using this other toolkit, and this is why we're using it this way. So I think it's a great way to learn. It takes a little bit of effort, but you will learn it by following that procedure.
Michael_Berk:
Yeah, it's true.
Ben_Wilson:
But you brought up an interesting point there, which is that you're only learning the stuff you need on that topic. When you were learning time series stuff, you probably went and checked out a couple of time series libraries, checked out the concepts behind them, and were like, oh, that's what stationarity is; oh, that's what I see when I decompose the trend, and now I can understand what the components are. But you probably didn't go out and learn everything there is to learn about time series. Good luck with that. That's an entire career's worth of time. So you learned what you needed to learn in order to do the forecasting of coral reef health.
Michael_Berk:
Exactly, I didn't. Yeah, you're 100% right. It's just scary looking into the abyss of a new Wikipedia article and not recognizing every fourth word. But if you just have some stick-to-itiveness, it'll lead to understanding after some amount of time.
Ben_Wilson:
Yep, definitely.
Michael_Berk:
Great, so we're coming up on time, but this has been really, really fun. Just to recap what we've chatted about: we talked about OCR and specifically what Ahmad has been doing at Red Buffer. He's working on simple, small OCR pipelines that run on laptop-like devices for low-latency applications. Some of the really effective methods he used were pre-processing the data to grayscale, skipping frames, and distributing the OCR workload across multiple processes. He's been doing some really cool stuff, and I definitely suggest checking out his KDnuggets posts; I was reading them before this call, and there are lots of really practical tips for OCR optimization. So with that, Ben, do you have any closing thoughts?
Ben_Wilson:
I thought it was a good discussion and hopefully people get something out of some of the little wisdom nuggets that we dropped there about how to learn, how to get started, and how to stay focused and how people do it at different stages of their career. Hopefully everybody enjoyed that and I guess until next time.
Michael_Berk:
Yeah, real quick before we close out: Ahmad, do you have any ways that people can reach out to you, whether through social media, LinkedIn, or your blog, in case they want to get in contact?
Ahmad_Anis:
Yeah, definitely. I'm mostly active on my LinkedIn account. I'll post the link to my account here. You can definitely reach out to me on my LinkedIn.
Michael_Berk:
Beautiful. All right, well, until next time, it's been Michael Berk.
Ben_Wilson:
and Ben Wilson.
Michael_Berk:
And thank you, Ahmad, for joining us. Bye, everyone.
Ahmad_Anis:
It was great.
Ben_Wilson:
Take it easy.
Ahmad_Anis:
Bye.