AJ_O’NEAL: Welcome back to another exciting episode of JavaScript Jabber Z. On today's episode, we'll be interviewing Connell Gauld. And on our panel, we have Steve Edwards.
STEVE_EDWARDS: How you doing, AJ? Good to hear you again, and I can actually see you too. So yes, good to see you again.
AJ_O’NEAL: And oh no, the rest of the JS Jabber Z fighters are missing. Where could they be? Find out next time. All right. Yo, yo, yo, Sunny Pleasant Grove.
STEVE_EDWARDS: Oh yes, yeah, from sunny, actually sunny and beautiful Portland, Oregon area today.
AJ_O’NEAL: I have no idea if it's sunny outside. I've got blackout curtains. Anyway, so here we are to talk about building AR experiences using Universal AR. Universal AR is a brand name by Zapworks.
Your app is slow and you probably don't even know it. Maybe it's fine in most places, but then the customer loads the page up, that one page, and after a couple of seconds, their attention disappears into Twitter and never comes back. The reality is there are performance issues in your app and they're affecting your customer experience. What you need to do is hook up your app to Scout APM and let it start telling you where the slowdowns are happening. It makes it really easy. It tells you how slow things are and what the problem is, like N plus one queries or memory bloat. It's also built for developers, so it makes it really easy to identify where the fix needs to go. I've hooked it up to some of my apps and I saw what I needed to fix in a couple of minutes. Try it today for free and they'll donate $5 to the open source project of your choice. Just go to scoutapm.com slash dev chat and then deploy it to your app. Once you do that, they'll donate the five bucks. That's scoutapm.com slash dev chat.
AJ_O’NEAL: Steve had asked a question just a moment ago before we started rolling, and you started to give a really good answer. Steve, ask that question again.
STEVE_EDWARDS: If it's the question I think you're asking about, it was about the tie-in to JavaScript with augmented reality.
AJ_O’NEAL: Yeah. Let's just start with that. Don't bury the lead.
CONNELL_GAULD: Yeah. Well, so maybe just a little intro to augmented reality itself. I suppose it's clear for a lot of people these days, but there's also a ton of people that haven't encountered augmented reality. What it does is it shows you the real world around you and it adds virtual content to that world as if it's actually there with you. So for example, in movies, like in Jurassic Park, they have a virtual Tyrannosaurus rex running around in what is otherwise the real world around the characters. So that's augmented reality, reality augmented with that dinosaur.
AJ_O’NEAL: Google Glass or Microsoft HoloLens?
CONNELL_GAULD: Those technologies definitely incorporate augmented reality for sure, although I'd say that the concept is a bit broader. Another place where you might have tried augmented reality is Snapchat face filters. So if you open Snapchat and you're recording a little video, you can have funny ears or various effects that apply to your real-world face, but that are virtual effects that are then kind of baked into the video for you to send. That's another example of augmented reality. So yeah, taking the real world and adding additional content.
STEVE_EDWARDS: So if I can jump in, let's define, I think, the term that most people are probably familiar with: virtual reality. So let me see if I can explain this right. The difference between virtual reality and augmented reality, since augmented means adding, to augment something: virtual reality is you've created the entire world, what you're seeing inside of your Oculus headset or whatever your medium is, whereas augmented reality is you're taking what already exists and adding a layer or multiple layers on top of that. Is that correct?
CONNELL_GAULD: That's exactly right. Yeah. With virtual reality, you're taken somewhere else. With augmented reality, you're in your existing environment, but then we instrument that environment with 3D content or interactive content, whatever the use case calls for. And so at Zappar, we've been making augmented reality content and tools, particularly targeting mobile phones, where you can take your mobile phone out and you can either have a face filter experience that tracks your face and applies virtual content there, or perhaps it tracks the world around you and lets you place content into it to interact with. ARKit and ARCore, which you may have heard of from iOS and Android, are technologies that accomplish a similar thing. They work out the environment you're in in order for you to place content into it. So yeah, we've been doing that on mobile phones, and for a while it's been native app based. In order to do all the computation on the camera images that we get in from the camera on the device, and also the sensors like the accelerometer and the gyroscope, we have to process that super quickly in real time in order to work out the environment that the user is in. And because of the computational requirements, that has thus far been a kind of native app game, if you like. You've got to download an app in order to have those experiences. But in the last couple of years, it's now become possible to do this in the web browser. So users can visit a website that will then show them the camera and let them do a face filter experience or
AJ_O’NEAL: You keep saying that word really fast. Then have a... experience?
CONNELL_GAULD: Oh, I can't remember exactly which one I said.
AJ_O’NEAL: You said something like a face filter.
CONNELL_GAULD: Oh, sorry. A face filter experience. Yeah, exactly.
AJ_O’NEAL: Okay. You've just been saying the word face filter faster than my brain could comprehend it. And I'm like, wait, what is that word?
CONNELL_GAULD: I'm sorry. I'll try and slow down. Cause I know also my accent is quite a confused accent. So it's not always the easiest to hear.
AJ_O’NEAL: What is your accent?
CONNELL_GAULD: Well, I'm Scottish, uh, grew up in Scotland, but I've been living in London now for 10 plus years. And so I've got this sort of a weird combination of a Scottish accent and a London twang going on. So yeah, it's just as confusing for my family as it is for everybody else.
AJ_O’NEAL: OK, all right, all right.
CONNELL_GAULD: So yeah, face filters. Effectively, it's that Snapchat-type experience where you're viewing a picture of your face, or you're recording a picture of your face, and on that there's perhaps a funny hat that you can wear, a virtual hat, or it might apply a texture to your skin, a tattoo or face paint, that type of thing. And you're typically doing that to create a shareable social experience that you can pass on to your friends and have them do. But there's lots of other use cases for that type of thing. For example, we've seen one recently: how to wear a mask properly, a medical mask, to make sure that, for example, it's covering your nose and things like that. So there's lots of use cases for that specific type of technology. From our perspective, at the kind of heart of all of these, the core computation we're doing is finding the position of the user's head in space, in 3D space, and tracking its position as they look around and as their expression changes. And then the content developer who's building that content experience, those 3D assets, if you like, it's their job to take that position of the head and then apply virtual content to it so that it looks like it's in the right place.
AJ_O’NEAL: How does that not require a GPU?
CONNELL_GAULD: That's a good question, and it's been quite hard to achieve that. So we do use the GPU for 3D rendering, so WebGL in the web browser. But in terms of the computation of those camera images, it comes down to just a really optimized pipeline that tries to get camera frames out of the browser as quickly as it can. We then use technologies like Web Workers and WebAssembly so that we can perform the computation we need to do on those camera images in a separate thread in the web browser so that it's not slowing down the user's interaction with the web page.
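For listeners who want a concrete picture of that pipeline, here is a minimal sketch, not Zappar's actual implementation: it opens the camera with getUserMedia, copies the pixels out through a canvas, and hands them to a Web Worker so the heavy processing (which in practice would be a WebAssembly call) stays off the main thread. The vision-worker.js file name and the grayscale stand-in computation are made up for illustration.

```js
// main.js — a minimal sketch, not Zappar's pipeline: pull camera frames and
// hand the pixel data to a Web Worker so heavy vision code stays off the UI thread.
const video = document.createElement('video');
video.setAttribute('playsinline', ''); // keep inline playback on iOS Safari
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d', { willReadFrequently: true });
const worker = new Worker('vision-worker.js'); // hypothetical worker file

navigator.mediaDevices.getUserMedia({ video: { facingMode: 'user' } })
  .then((stream) => {
    video.srcObject = stream;
    return video.play();
  })
  .then(() => {
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    requestAnimationFrame(processFrame);
  });

function processFrame() {
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  const frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
  // Transfer the buffer instead of copying a megabyte-plus of pixels per frame.
  worker.postMessage(
    { width: frame.width, height: frame.height, pixels: frame.data.buffer },
    [frame.data.buffer]
  );
  requestAnimationFrame(processFrame);
}

worker.onmessage = (e) => {
  // e.data would carry e.g. a head pose for the renderer to consume.
  console.log('tracking result', e.data);
};

// vision-worker.js — the worker would normally call into WebAssembly here;
// this just converts to grayscale as a stand-in for the real computation.
self.onmessage = (e) => {
  const { width, height, pixels } = e.data;
  const rgba = new Uint8ClampedArray(pixels);
  const gray = new Uint8ClampedArray(width * height);
  for (let i = 0; i < width * height; i++) {
    gray[i] = (rgba[4 * i] * 77 + rgba[4 * i + 1] * 150 + rgba[4 * i + 2] * 29) >> 8;
  }
  self.postMessage({ processedPixels: gray.length });
};
```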
AJ_O’NEAL: What language, just curious, what language are you using for web assembly?
CONNELL_GAULD: We use C and C++ for WebAssembly, I'd say predominantly because our platform has been native in the past. You know, our computer vision libraries and everything are written in C and C++ and have been in the past. But, you know, I see that there's a ton of interesting work being done in Rust now for WebAssembly, and I'd like to see us use that more.
AJ_O’NEAL: And WebAssembly already gives you a sandbox. So if you have insecure C++ code, it's not going to be able to jump out of that to execute arbitrary code on the user's phone or whatever. So I guess C++ isn't quite so dangerous when you compile it to Wasm.
CONNELL_GAULD: Yeah, exactly. Absolutely. And there still are classes of danger there. So for example, from WebAssembly, you have access to things like the cookies that the web page has access to and various elements like that. But I wouldn't say that WebAssembly provides a larger attack surface area, if you like, than just JavaScript would in your browser. I think with Rust, the paradigms you use when building or coding in Rust are interesting in and of themselves, beyond the security implications. It's just a really interesting and nice way to write code, and the guarantees that Rust provides are comforting to the developer.
AJ_O’NEAL: Well, from what I've seen, it seems like everyone who actually likes C++ and uses it on purpose prefers Rust. But everybody who doesn't like C++ doesn't like Rust.
CONNELL_GAULD: Yeah. Coding in Rust, I feel like I'm coding in C++ more than I feel like I'm coding in something like JavaScript or Python. Obviously, you know, type safety is a key difference there, but I think C++ has had a bad rep, and as a language it's a lot better to write now than it had been in the past. And so I do enjoy writing it also. But Rust is a super interesting language; unfortunately, we do not use it very much yet.
AJ_O’NEAL: So are you just using one of those open source image recognition libraries to do that in Wasm, or are you building your own?
CONNELL_GAULD: It's all our own. We use various different elements, you know, like image loading and matrix multiplication, matrix math libraries, but the core computer vision is all ours. We've actually been in the game for quite a long time now. Ten years ago was when we started with AR, building content and tools for AR back in the sort of iPhone 3GS and Nexus One era of smartphones, where we had limited computational resources. That's turned out to be a big benefit when it comes to trying to put these technologies into the web browser. And so our tech stack is kind of our own from across these last 10 years, and that is what we deploy and use.
AJ_O’NEAL: So you guys are out of the VC phase and into the profit stage?
CONNELL_GAULD: Yeah, I'd say so. So at Zappar, we've been in profit every year so far. We have had investment in the past, but when you have a technology like augmented reality, it has such a wide range of uses, but there are particular uses that are very novelty-driven, and those are often the eye-catching ones, like your Snapchat-style face filter, for example. The technology has a value beyond the kind of glitzy demos that you sometimes see. And so from our perspective, it's super important that with our business, we demonstrate that it's a technology that's valuable, that delivers value to end users and also to the businesses that we work directly with. And so for that reason, we want a business that works as a business, as well as a kind of house for producing technology.
AJ_O’NEAL: So aside from Amiibo faces, what do you do or not necessarily what do you do, but what do people use the product for aside from the social animorph, I forget what they call it. But yeah.
CONNELL_GAULD: Yeah, I know what you mean. I can't remember the name either. Well, there's a wide range of them, and perhaps discussing the categories is the easiest way. So a huge amount is in learning and development, where you can take what are static educational resources, like textbooks, and you can bring digital content to them. So one of the elements of our technology is image tracking: detecting a specific page in a textbook and then being able to augment that with perhaps a video that accompanies that text, or perhaps there's a 3D visualization of the concept that's being discussed, or perhaps it's even just simply, you know, links to other resources online or what have you. So there's a ton of learning and development use cases where what we're really leveraging is the context, attaching digital content to a physical location or physical object. That goes all the way through to retail, helping users navigate your store, to giving people more information about the products that they're buying, like where they've come from, or the story of the farmer that produced this product. There's of course a ton of marketing and advertising as well, bringing posters to life with additional content for movies. It's quite broad in terms of the use cases.
STEVE_EDWARDS: I can remember when virtual reality first came out, long before augmented reality. One of the classic examples I saw was a real estate agent wanting to show a home to somebody, either that or construction, where you want to see what your house would actually look like on the inside when you've made the changes or when it's built or something like that. And then the one I mentioned earlier today, that I saw in a post on LinkedIn and can't find again, was using an Oculus tool to help somebody play piano, because it shows you the keys that you're supposed to be touching; it looks like it's right over your fingers showing you where to play. So I know those are a couple pretty functional examples that I've seen of AR.
AJ_O’NEAL: So the next question that I had on this, going back to what you were talking about with learning and development. I imagine that these scenarios take a lot of backend information. For example, in the scenario you talked about where you're doing image recognition on text in a textbook, somebody had to scan in that textbook and then tag all the images and correlate material. I imagine that there's some sort of context-specific training that has to take place. I'm assuming there's some sort of machine learning algorithms that are context specific. So two things: one, are you a provider for that solution? And two, how much of these types of scenarios can exist on the phone in a poor-internet or no-internet situation, versus how much is it really important to have a reliable internet connection for these types of experiences to work?
CONNELL_GAULD: Yeah, it's a great question. So I think the answer to the first element of that is yes, we do provide these technologies, I suppose a toolbox of different primitives, if you like, for building this type of experience or these types of use cases. And so we can separate it out into a number of different elements. At the kind of computer vision level, you have the image tracking algorithm. What the image tracking algorithm can do is take a description of an image, so that's, say, a PNG file that we've maybe done a bit of processing on, and then we can take that description and, offline on the device, so without having to send every camera frame to the server or anything like that, just processing on the device itself, we can say: I see that image in this camera frame and it's in this position. We can do that, you know, 30, 60 times a second. So there's one primitive, if you like. The next primitive is, as we were discussing, face tracking. So, again 30 or 60 times a second, being able to detect and position a face in a camera frame. We also have the ability to do that for a point in the user's environment that the user has selected, so perhaps a place on their floor that they've tapped on, if you like; we can track that location in space as they move their phone around that space. Then there are some other technologies which we provide at a server-supported level. The aforementioned ones can all be done offline on the device without a network connection. But when we have a network connection, we can do interesting things: we can be sending camera frames to a server, and that server will search for the identity of an image within a database of hundreds of thousands of images. So for example, with that service, you could upload all of the DVD covers that have ever been made, and you could go to your cabinet where you've got your DVDs, should people still have such cabinets, and you can pull out a DVD, point your phone at it, and that server will be able to respond: you're looking at this DVD. And then perhaps you have some content that's associated with that DVD. So it's a sort of lookup database, if you like.
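As a rough illustration of that server-supported lookup, a sketch like the following could capture the current camera frame and post it to a recognition endpoint. The URL and response shape here are hypothetical placeholders, not Zappar's actual service API.

```js
// A sketch of the "cloud image lookup" flow; the endpoint and response shape
// are hypothetical placeholders, not Zappar's actual service API.
async function identifyFrame(canvas) {
  // Compress the current camera frame before uploading it.
  const blob = await new Promise((resolve) => canvas.toBlob(resolve, 'image/jpeg', 0.7));
  const res = await fetch('https://example.com/image-lookup', {
    method: 'POST',
    headers: { 'Content-Type': 'image/jpeg' },
    body: blob,
  });
  // e.g. { match: 'men-in-black', confidence: 0.92 }
  return res.json();
}
```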
AJ_O’NEAL: If it was possible for me to take a picture of a DVD on my shelf and have it start playing, that is something that I realistically think I would use on a regular basis.
CONNELL_GAULD: Yeah. It's cool.
AJ_O’NEAL: Well, because the reason I have the DVDs is because then you have something that's permanent, unlike cloud-based where license terms change every freaking year and you never know what movies are gonna remain in your library, if it's on the shelf, it's gonna be there, it doesn't matter if Warner Bros. has some political kerfuffle with Netflix, you know, it's gonna stay around.
CONNELL_GAULD: Yeah, it's yours.
AJ_O’NEAL: Yeah, so I very much, I've thought about this before, I'd never thought about it from the picture perspective, I thought of it like, I don't know, maybe like RFID tag, probably because that's what was popular at the time I was thinking about this, but that's cool, I think.
CONNELL_GAULD: Yeah, and the great thing about DVD covers as well, and this class of images, is that they're all really descriptive. You know, they're all trying to look different from every other DVD cover, which is great for technology like the cloud image lookup. It's easy for the technology to be able to say, yeah, that's definitely that movie versus that movie. Whereas there are of course classes of images where they look very similar, and it's hard for humans to tell the difference, never mind an algorithm.
AJ_O’NEAL: So Steve, pull me back on this if you think I'm going too far, but I actually want to get a little more specific on this. This is something I'm personally interested in. Do you know the process, like the technical process, by which an image is broken apart in a way that it can become searchable? Like what type of processing happens? I'll frame it for our viewers slash... we don't have viewers, we only have listeners. But if you wanted to do a rudimentary type of image match on your own, do you know how to explain that process?
CONNELL_GAULD: So I can certainly do so in broad strokes, if you like. I should say that I'm CTO at Zappar, but I look after the kind of infrastructure and the tooling and what have you. Our research director, he's the proper computer vision bod in our business. But I can definitely give you a kind of rundown of how the technology broadly works as I understand it, with the caveat that I could be completely wrong. If that's...
AJ_O’NEAL: Well, I mean, I guess like bad information is better than no information. I mean, social media has shown us that time and time again. So let's, let's carry on.
CONNELL_GAULD: So I think, broadly, the technique comes down to features, which are specific little patches of image that the algorithm will find. It will take the camera frame that you're looking at, or, during that process of setting up your database, it will do the same thing on the DVD covers themselves as you're kind of training the database to know what the DVD covers are. So it'll take a given image, and typically the first thing it does is look for little descriptive parts across the image. With our systems, they're typically what we call corners; they're just little areas of the image where there's a good amount of texture. Perhaps it's the edge of various different letters, perhaps it's the head of somebody or a bit of a mountain, just little kind of high-contrast elements in the image that are quite easy to describe. So it'll just be a little patch of image. And what you do is you take a ton of those from the image. So you just look and say, find 50 or 100 different little patches of that image which are quite high contrast. And then you do some math to make sure that they're all the same way up, if you like. So you find a way of making sure that each little patch always has, for example, its lightest pixel at the top of the patch. You rotate them round so that they're all one way up. Then you just compute a hash of that patch. So you compute a way of representing that patch in code rather than as an image, and that gives you a ton of things that you can search. So you load those into your database. And then whenever you're doing the same thing on the phone, you'll say, okay, I found these little bits of image, and this one is over here relative to that one: this one is above and to the right, this one's below and to the left. You send that to the database, and the database looks through its set of all of these patches and it's like, oh, I see you see this one, I see you see that one. You know, all of these movies have very similar-looking patches in their covers, and by checking how they relate to each other in space, I can see, oh, it's most likely going to be Men in Black.
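A toy version of that idea, not Zappar's algorithm, might look like the sketch below: find high-contrast "corner" patches, normalize each patch's orientation by its dominant gradient direction, and reduce it to a small binary hash that a database can index. The thresholds, sampling pattern, and hash size are arbitrary illustration values.

```js
// A toy sketch of the matching idea described above, not Zappar's algorithm.
// Input: a grayscale image as a Uint8Array plus its width and height.

// 1. Find "corners": sample points whose local neighbourhood has high contrast.
function findCorners(gray, w, h, threshold = 40, max = 100) {
  const corners = [];
  for (let y = 8; y < h - 8; y += 4) {
    for (let x = 8; x < w - 8; x += 4) {
      const c = gray[y * w + x];
      const spread = Math.max(
        Math.abs(c - gray[y * w + x - 4]),
        Math.abs(c - gray[y * w + x + 4]),
        Math.abs(c - gray[(y - 4) * w + x]),
        Math.abs(c - gray[(y + 4) * w + x])
      );
      if (spread > threshold) corners.push({ x, y, spread });
    }
  }
  // Keep only the strongest 50-100 patches, as in the description above.
  return corners.sort((a, b) => b.spread - a.spread).slice(0, max);
}

// 2. Describe the patch around a corner, rotated so its dominant gradient
//    points "up", then squash it into a compact binary hash.
function describePatch(gray, w, x, y) {
  const dx = gray[y * w + x + 4] - gray[y * w + x - 4];
  const dy = gray[(y + 4) * w + x] - gray[(y - 4) * w + x];
  const angle = Math.atan2(dy, dx); // orientation normalisation
  let hash = 0;
  for (let i = 0; i < 16; i++) {
    const a = angle + (i / 16) * 2 * Math.PI; // sample 16 points on a ring
    const sx = Math.round(x + 6 * Math.cos(a));
    const sy = Math.round(y + 6 * Math.sin(a));
    const bit = gray[sy * w + sx] > gray[y * w + x] ? 1 : 0;
    hash = (hash << 1) | bit; // 16-bit descriptor
  }
  return hash;
}

// 3. The database side would store these hashes per cover and, at query time,
//    vote for the cover whose hashes and relative positions match best.
```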
AJ_O’NEAL: So from the way you're describing it, this sounds less like a hash technology, like what's commonly used for audio files, and more like a graph technology of relationships between objects. Am I understanding that correctly? Or does it get represented as a hash at some point to make it easier to search in a database?
CONNELL_GAULD: I think each of those little patches is represented by a hash, if you like, or some sort of concise representation. Basically, you have to try and get the volume of information really compressed, down as much as you can, so that you can perform that search.
AJ_O’NEAL: But it is likely that when the search takes place, the search is not on one item; the search is on a hundred items, or it's very iterative. This is not something that you'd be able to reduce to something that could go in a SQL database. This requires a special type of database.
CONNELL_GAULD: Yeah, exactly. So as you say, it requires an understanding of how the different patches relate to each other on given covers. It requires a problem-specific solution, for sure. But it's not an unexplored one; there's a good amount of research and papers out there about different ways of solving this problem. This is certainly the way we solve it, and I'm sure there are other ways as well. But at the end of the day, it's definitely a tractable problem. It's not one of these problems where, you know, you read an academic paper and you're like, oh my goodness, I have absolutely no idea what this is talking about or how it solves it. I think it is a...
AJ_O’NEAL: I think you overestimate my ability to comprehend these things.
CONNELL_GAULD: No worries. And in the meantime, you can definitely use our services to help solve this problem for you should you wish to.
AJ_O’NEAL: Cool beans, cool beans. And earlier, I think the word I was searching for about the framework was OpenCV. Computer vision. This is all in the realm of computer vision, all this AR stuff.
CONNELL_GAULD: Exactly. Yeah.
AJ_O’NEAL: But you don't use OpenCV. You have your own.
CONNELL_GAULD: Exactly. Yes. And lots of the things that our platform and infrastructure will do will be, you know, similar primitives to what happens in OpenCV, for sure, although ours have come from the research that we've done, from computer vision research and the various implementations we have. And I should say that OpenCV is a fantastic tool and definitely worth investigating if you're keen on hacking on computer vision in general. I think the big difference between where we were three or four years ago and now is that you can compile OpenCV for WebAssembly and be running this in the web browser. And it does take quite a lot of jumping through hoops to be able to get camera frames out from the browser fast enough, especially across the different mobile browsers. And so we've spent a lot of time making sure that we have a super smooth pipeline from camera frames into the computer vision, into the results that content developers can then use to make your augmented reality experience. That for us has been a big challenge when it comes to the browser, but the results speak for themselves, I think. Now we can do face tracking in the web browser at 30 FPS on your iPhone XR and produce what we call "I can't believe it's not native." So it feels like it's a native app that's actually running in a web browser.
Are you stuck trying to figure out how to get to the next stage of your developer career? Maybe you're just not advancing fast enough in the job you're in, or you're trying to break in in the first place, or for whatever reason you keep going to interviews and it's just not working. You want to land that dream coding job, but it just doesn't seem to be working out. Well, John Sonmez has written a book for you called The Complete Software Developer's Career Guide. He walks through each stage of the development career and all of the things that you need to do in order to move up, keep learning, keep growing, and find that next job that's going to get you where you want to go. So if you're stuck and trying to figure this stuff out, go pick up The Complete Software Developer's Career Guide. It's the number one software development book on Amazon. It's sold over 100,000 copies so far. I actually have friends of mine that reach out to me and go, hey, do you know this John Sonmez guy? Because his book is awesome. So go get the book. You can get it at devchat.tv slash complete guide. That's devchat.tv slash complete guide.
STEVE_EDWARDS: Oh, that brings back some old commercial memories from the 80s. So, you know, one of the key issues that people deal with more and more, and sometimes maybe a little too much, I think, as Sarah Drasner has pointed out lately, is performance. And with augmented reality, obviously you're dealing with very heavy content, with video and pictures, as compared to strictly audio. So knowing the number of probably lower-end devices that are out there in the world, whether it's in other parts of the world or even people in the US or London or wherever: are there device limitations from a mobile standpoint on using this type of augmented reality, in terms of maybe a minimum model of a particular phone, or minimum specifications in terms of memory and processor power?
CONNELL_GAULD: I mean, I suppose the easy answer is yes. But the kind of more general answer is that for us, making this work across a large range of devices is a very important element of what we're trying to do with this technology. The promise of the web browser particularly is that if you can get technology running in a web browser, that opens it up to such a huge potential audience: the audience that are not willing or don't wish to install an app from an app store, the audience of all of the devices that don't run, for example, Google Play Services. So of course, like Huawei, lots of devices now that we have in the Western world will not be running Google Play Services and so don't have the Google Play Store on them, and all of those devices, particularly in China, are just never going to have Google Play. Having the technology running in the web browser, all of a sudden you can support these people. But as you say, only if those devices are capable of running the computer vision and the algorithms, the computation, that you're trying to deliver to them. So with our technology, a big focus has been on making that performance run well across the board on devices. It's a very difficult question to answer in terms of what the minimum specification is, partly because there are browser differences. There are just differences in terms of the amount of memory available versus the amount of CPU available and how those influence the ability to do the computation. And then of course, as you say, in addition to doing the kind of raw computer vision calculations, you then have a whole load of content-led questions as well. So if you're going to have a really big, complex 3D model, then you're probably taxing the GPU on the device more than you're taxing the CPU. And so there are questions about how that all comes together. So, you know, as with, I suppose, any performance question, there are so many variables that are all kind of intermingling, and it's hard to give good, solid answers or predictable answers to that question. But I would say that we intend that our technology allows you to run these experiences on your sort of mid-level Android devices from the last two or three years onwards, with it being interactive for users in a way that they can be having that experience and enjoying it. But then when you get to the latest iOS and Android devices, those users are going to have perfect experiences, where the frame rate is going to be as fast as the camera frames come from the device, so it doesn't feel janky or slow or anything like that. So I've kind of rambled and waffled there, but it's a complex answer.
AJ_O’NEAL: We love waffles.
STEVE_EDWARDS: Yes, waffles are very good, especially with syrup. Yes.
AJ_O’NEAL: It's an American thing, for sure. I bet you don't have waffles in London, do you?
CONNELL_GAULD: We certainly don't have them like you guys have them. So I'm definitely a fan of waffles when I'm over there.
AJ_O’NEAL: We've got Belgian waffles. We used to have American waffles, but they got thrown out. Those are little crispy ones like the small squares. Those are the good ones. Belgian waffles? Belgians don't know what they're doing. I don't know how they got popular here.
STEVE_EDWARDS: There's always the Eggos.
AJ_O’NEAL: In regards to waffles, not in regards to chocolate or anything else.
STEVE_EDWARDS: There's always the good old Eggos you can fall back on for high quality waffles.
AJ_O’NEAL: It is stereotypically American, like consumerist American, but those are not what I would consider high quality waffles. That aside.
STEVE_EDWARDS: I was being slightly facetious there.
AJ_O’NEAL: Oh, I know. I know. But for our international listeners.
STEVE_EDWARDS: Oh, yes.
CONNELL_GAULD: I don't know what an Eggo is.
STEVE_EDWARDS: Oh, it's a packaged waffle. Years ago, there was a whole classic series of commercials for "Leggo my Eggo": a kid would pop up a waffle and somebody would try to steal it because it's obviously so good, and he'd say, "Leggo my Eggo," Eggo spelled E-G-G-O.
AJ_O’NEAL: They're not great waffles. No. I mean, they're okay, they're okay.
STEVE_EDWARDS: In a pinch.
AJ_O’NEAL: Yeah, but they're not something you could eat every morning and feel comfortable. Well, I won't say that, because some of our listeners do that. Yeah, I'm talking to you. You hear me. You know what you're doing. Cool. All right. Well, let's see. What would a hello world look like? I mean, obviously we can't show code examples, but what's the hello world for getting integrated with... do I call it Zappar or ZapWorks, or, oh, Universal AR is the specific product.
CONNELL_GAULD: Exactly. Yes.
AJ_O’NEAL: Zapworks that we're talking about. So what does that Hello World look like and can I get it without the $150 a month subscription?
CONNELL_GAULD: Yeah, absolutely. I suppose I can clarify just a wee bit on this. So our company is called Zappar. We do a ton of work in AR, including building content for people and what have you. ZapWorks is specifically our tooling and our products and services for people to build content themselves. And then Universal AR is a specific product of ours, which is our computer vision algorithms for all of these different tracking types, wrapped up into SDKs for various different tools. And so on the web particularly, if you're looking to build something with JavaScript, then we provide SDKs for, for example, Three.js, which is a really popular open source library for JavaScript for building 3D content. And so with Universal AR, the kind of hello world with Three.js takes an existing Three.js experience, which maybe has a scene, a 3D scene, a canvas on the-
AJ_O’NEAL: I don't know what Three.js is in the first place. Maybe it'd be worthwhile to describe that, because I don't know if anybody else does.
STEVE_EDWARDS: I think it's one step up from two JS.
AJ_O’NEAL: Oh, one JS, two JS. So this is like post typescript.
CONNELL_GAULD: Oh, quite. So Three.js, basically: if you're building 3D content for the web, the browser exposes WebGL as its underlying 3D API, and it feels like OpenGL, which is just another API for building 3D on desktop computers. For the web, you can use WebGL to build 3D content such that it runs in a web browser, but the API is quite difficult to use, and so there's a huge learning curve to understanding that API and then building a 3D experience with it. Three.js is a wrapper around that API, if you like. It's a library for JavaScript where you, as the developer of some 3D content, express your will, if you like, the content you want to have, in a way that's much easier to do. So instead of working out all of the maths to get the camera and the projection and all of the positioning of things with matrix maths, instead of having to worry about that, with Three.js you say: I'd like a scene, which is just a world, if you like. I'd like to have a camera here that's pointing at this location. I'd like there to be a box in 3D space here, or I'd like this 3D model to appear here. So you code that out in a way that's much easier to do with Three.js. And so with our technology, we provide some additional elements that you can use in Three.js, where you can say: here's my 3D model that I've loaded with Three.js; I want that 3D model to appear in the space in front of the user, perhaps on their coffee table, so that the user can look around that 3D model and see what it looks like. Maybe it's a vase that you might want to purchase, and so you might want to see what it looks like in your room. So you could build this experience with Three.js. Another example is A-Frame. A-Frame has been this library that lets people build VR experiences for the web, and it's very component-based. So if you're using A-Frame to build an experience, you start out with an HTML index page, if you like, like you're just going to make a webpage. And then, instead of a div tag, you use tags like an a-model tag, and that would be a 3D model of a file that you've included in your web project. And with Universal AR and our A-Frame SDK, what you can have is, like, an a-face-tracker, if you like. And so you can put your 3D model into that object, so that when the user visits that webpage, that object appears attached to their face. So perhaps it's a 3D model of a pirate's hat, for example. And so in each of these cases, the underlying library, if you like, is probably where you'll get your hello world from. So you'll start with a basic Three.js project that perhaps the Three.js community have provided to help you learn how to use that library, or there's a ton of A-Frame examples for how to build content there. And then with just a little bit of swapping out of things, you can say: instead of using just the environment that Three.js provides, I want to attach this content in AR to some space. So in most cases, the changes to your hello world project that you had from these tools are very minimal in order to get an AR experience based on that same project working.
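For a concrete flavor of the Three.js side, here is a minimal sketch of the kind of hello world described above. The plain Three.js parts are standard; the AR-specific hooks are shown only as commented-out placeholders, since the real class names come from Zappar's Universal AR SDK for Three.js and may differ from what is written here.

```js
// A plain Three.js hello world of the kind described above. The AR-specific
// hooks are commented out and purely illustrative; the real class names come
// from Zappar's Universal AR SDK and may differ.
import * as THREE from 'three';

const renderer = new THREE.WebGLRenderer();
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(
  75, window.innerWidth / window.innerHeight, 0.1, 100
);
camera.position.z = 3;

// A spinning box standing in for the 3D model (the vase in the example above).
const box = new THREE.Mesh(
  new THREE.BoxGeometry(1, 1, 1),
  new THREE.MeshNormalMaterial()
);
scene.add(box);

// Hypothetical AR swap: replace the ordinary camera with an SDK camera fed by
// the device camera, and parent the box to a tracked anchor (face or surface).
// const arCamera = new ZapparThree.Camera();        // illustrative name only
// const anchor = new ZapparThree.FaceAnchorGroup(arCamera, faceTracker);
// anchor.add(box);
// scene.add(anchor);

function animate() {
  requestAnimationFrame(animate);
  box.rotation.y += 0.01;
  renderer.render(scene, camera);
}
animate();
```

The A-Frame route is similar in spirit: the model is declared as an HTML tag, and the SDK supplies a tracker component that positions it.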
AJ_O’NEAL: So I noticed that it looks like you only support mobile browsers. Is that true?
CONNELL_GAULD: It does work on desktop browsers, for sure, where your webcam is the camera that the library will open in order to do this augmentation. And things like face tracking work well in that environment. So when developers are building content using our tools, they will often use their desktop browsers in the compile-and-reload cycle for testing and trying things out. There are some of our tracking technologies which require motion data, so the accelerometer and gyroscope data in the device, and most desktop browsers don't expose those elements. And so some tracking types won't work in the desktop browser.
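The motion-data limitation can be checked for directly. A sketch, assuming the standard devicemotion API, looks roughly like this; note that on iOS 13+ the permission call must be triggered from a user gesture.

```js
// A sketch of checking for the motion data some tracking types need. Desktop
// browsers usually don't deliver these events; iOS 13+ additionally requires
// an explicit permission request triggered by a user gesture.
async function enableMotionTracking(onMotion) {
  if (typeof DeviceMotionEvent === 'undefined') {
    console.warn('No motion sensors exposed; use image-only tracking instead.');
    return false;
  }
  if (typeof DeviceMotionEvent.requestPermission === 'function') {
    const state = await DeviceMotionEvent.requestPermission(); // iOS 13+ gate
    if (state !== 'granted') return false;
  }
  window.addEventListener('devicemotion', (e) => {
    // Accelerometer + gyroscope readings that a tracker would fuse with vision.
    onMotion(e.accelerationIncludingGravity, e.rotationRate);
  });
  return true;
}
```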
AJ_O’NEAL: I thought Mac did.
CONNELL_GAULD: Maybe.
AJ_O’NEAL: Maybe. But I did something on this like years ago, and I remember running it on my Mac and being surprised to find out that some of these features that I thought were mobile-only were actually working just fine on my Mac. Okay. But that's good to know. So full support is only on Google Chrome for Android and Safari on iOS 11.3 plus. Now, because Chrome on iOS is a web view slash whatever, does that mean that Chrome also works on iOS?
CONNELL_GAULD: Unfortunately not. And the reason is that the web view that Apple provides for developers to build their own browsers with, so, say, Chrome, doesn't allow getUserMedia. So the API we use to get camera access is not allowed there. For example, if you use Chrome on iOS, you can't use Google Hangouts. You have to come out into Safari or you have to go into the Hangouts app. Now, Apple are, I believe, working on this, and so we're hoping that in an upcoming release of iOS, we will be able to access that API from within browsers that embed a web view like WKWebView. But at the moment, we have some library functions to tell the content developer, you're running on a device that's not supported, so you can show a screen to the user asking them to open in actual Safari.
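That "open in actual Safari" fallback boils down to a getUserMedia capability check. A minimal sketch, not Zappar's library code, might look like this:

```js
// A minimal sketch, not Zappar's library code: try to open the camera, and if
// getUserMedia isn't available (e.g. third-party browsers on older iOS), show
// a message asking the user to open the page in Safari instead.
async function openCameraOrExplain(videoEl, messageEl) {
  if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) {
    messageEl.textContent =
      'This browser cannot access the camera. Please open this page in Safari.';
    return null;
  }
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      video: { facingMode: 'user' }, // front camera for face filters
      audio: false,
    });
    videoEl.srcObject = stream;
    await videoEl.play();
    return stream;
  } catch (err) {
    messageEl.textContent = 'Camera unavailable: ' + err.name;
    return null;
  }
}
```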
AJ_O’NEAL: So I would consider first-generation AR to be the Nintendo Wii, and starting with the second generation, which would be, from my point of view, historically, the Xbox Kinect, AR has always had infrared, and that's been fairly necessary in order to get depth and to know the position of things. Do you get access to the IR camera without native code? Or do you just double down on the image processing and say, ah, we'll worry about the IR, infrared, camera later?
CONNELL_GAULD: Yes. So unfortunately, we don't get access to those cameras. On the web specifically, we only get access to the normal color camera data that we would see if we were using the camera app on your phone, for example. So as you say, we double down on the computer vision elements of it and the image processing we do in order to try and infer structure from that scene and to do the computation we require. However, you know, we see really interesting cameras. The iPad Pro has like a LiDAR camera, or a depth camera if you like. I think it's LiDAR, but I could be wrong.
AJ_O’NEAL: What was the second word you said?
CONNELL_GAULD: A depth camera.
AJ_O’NEAL: Oh, depth. Okay. Yeah. Okay. I'm not familiar. I know LIDAR is used on cars, automated vehicles, but I had not heard of LIDAR being used in mobile devices or, or consumer devices yet.
CONNELL_GAULD: I could be completely wrong. It could just be an infrared-based one, as you say. But certainly, there's a depth camera on that device, and the new iPhone, I believe, the highest-end iPhone, is going to have that same camera. Now, I think if we are targeting all of those devices across the world that we've spoken about, we have to make sure that we can present these experiences to users that maybe don't have the fanciest version of the phone. And so we need to make sure that the algorithms that we use don't require those fancy sensors. But it does still convey an advantage, which is that if you're a content developer, some of these fancy sensors will help you build 3D models, or will help us in terms of developing models for training data and these sorts of things. So there's lots of interesting ways that these devices do help in the industry and in the scheme of things. But at the end of the day, if we're targeting content towards end users, we have to make sure it's at a kind of common denominator level. And I think we're quite a while off having these sensors in your mid-tier Android devices that are going to be shipping worldwide.
AJ_O’NEAL: And you are correct. The iPad Pro, the March 2020 iPad Pro, does have a LiDAR scanner, which honestly, I don't know, that struck me with a little bit of fear. I'd never considered this before, but the minute I see this press release, it's like, well, yeah, okay, so this could be used to potentially make a 3D model of my face for a mask that I want to buy. But it also scans a 3D model of my face. Like, that's a personal fingerprint. That's a little... I don't know if I want LiDAR. I mean, I'd never thought about it that way with infrared, but.
CONNELL_GAULD: We should just hope that Apple don't put it on the front of cars, you know, where you just put your iPad on the front of the car in order to make it self-driving. We definitely want to avoid that.
AJ_O’NEAL: Someone will do it.
STEVE_EDWARDS: There are people that will try anything.
CONNELL_GAULD: And I think one of the kind of underlying points in your question is that web browsers take a little while to adapt to these types of technologies. They typically take longer than the native platforms. So obviously, Apple have put this LiDAR sensor into the iPad, and in order for this to be a going concern, they have also made sure that the development tools that they provide for iOS developers let those iOS developers access the data that comes from that LiDAR camera in order to facilitate whatever they want to do. Making that camera available in the web browser is a slower and more complicated process, because Apple don't have end-to-end control of the web specifications. Safari, I imagine, has a different development team within Apple to the team that are developing the LiDAR camera, to the team that are developing the tooling for native app developers. So there's a kind of trickle effect that we see, which means that even if we did want to rely in the web browser on one of these fancy cameras, it will be a long time, I think, before that API is there in the browser for us to use. And we can hope at the same time that if Android devices have these cameras, then Google Chrome will also expose the same API or a compatible API, although there is no guarantee that they will be compatible in the first instance. So we really are dealing with a set of tools, when we're talking to the web browser, that are more limited, more constrained, that are a bit further behind the curve. And we can see that even with video conferencing tools: it's only been recently, in the last couple of years, that you can really have a Google Hangout in a web browser that works well and that provides a native-type experience. And it will similarly take time for us to get access to some of these other more exciting sensors or technologies. Even with WebAssembly, when we are performing really heavy computations, normally in a native app we would use what we'd call vector instructions. These are instructions on the CPUs of these devices that are designed to process large amounts of data at a time, and they're often very specifically used by algorithms that have to run through a lot of data, like pixels in an image, for example. There's good access to those on iOS and Android in native apps, but WebAssembly doesn't currently expose any of these vector instructions. So not only do we have to deal with the fact that we're running in a sandbox, in a virtual machine running inside the web browser, but we also don't have access to the same set of tools for computation that we would otherwise natively be able to access. The browser really is an interesting challenge, and I have to say, one that's incredibly satisfying to work with. It's really cool being able to see what you can do in the web browser these days. I'm constantly astonished by how well these devices can perform. But it's taken time, and it will take time. And it will always lag a little bit behind native, I think. That was another waffle there.
STEVE_EDWARDS: I think you stumped AJ. It's all good.
CONNELL_GAULD: That's OK. The summary is web browsers are hard.
AJ_O’NEAL: Yes. Okay. Yeah.
CONNELL_GAULD: And a bit behind native.
STEVE_EDWARDS: Not exactly groundbreaking news there. All right.
AJ_O’NEAL: Cool. Well, I feel like we've filled the show. I don't know if we need to grasp at straws to drag it out further. I think we covered good topic material and got through all the discussion points here and answered my questions. It's cool. It's cool that you've been able to get so much done on mobile browsers.
CONNELL_GAULD: Thank you. Yeah. It's a labor of love, often. But at the end of the day, there are some fantastic engineers around the world working on the web browsers themselves and on these devices. And it will take the industry a little while just to work out, I think, exactly what you can do in these environments. But it's definitely possible to be processing images, performing computer vision at high frame rates, and producing some really interesting end-user applications as a result, and to be able to deploy that on devices that are already in people's hands. And for us at Zappar, and for me personally, that's a very exciting development in terms of how you can deploy this type of application to users.
AJ_O’NEAL: All right. So if people want to find out more, how do they find you and connect with you or whoever you want them to connect with?
CONNELL_GAULD: Yeah, absolutely. Well, you can find out a ton about our tools, including our Universal AR product, at zap.works, which is the site for our content development tools. So head there. We have a fantastic forum and a fantastic support team; they're support at zappar.com, but they're all linked to from our website. So feel free to check things out there. And, well, pre-COVID I was sitting next to the support guys; we're a small team, so they can definitely put you in touch with me if you'd like to chat more about the technology or what's possible in the web browser. One of the things that's great about working in this sector is we constantly see use cases for this technology, brought to us by people using our tech, that we haven't encountered before or didn't think would be possible or just didn't envisage happening. So it's always great to hear from users about what they're trying to achieve or about what they have been able to achieve. So yeah, please do get in touch.
AJ_O’NEAL: Right. Somehow we went through a whole show about AR without even once mentioning Pokemon Go.
CONNELL_GAULD: Yeah. Indeed. With Universal AR, you could build the next Pokemon Go.
AJ_O’NEAL: Alright, that's what we need to hear. Investment's gonna start rolling in.
CONNELL_GAULD: Yeah, indeed. You just will have to speak to Nintendo about securing a very good license.
AJ_O’NEAL: If you know anything about the ROMs community, you know that's not necessary. Just go on Amazon. You can buy a whole arcade with 500 games on it. Nintendo don't care. Redact.
Hey folks, this is Charles Max Wood. And over the last few years, I've gotten to know a lot of great people within the Microsoft community, and specifically in the .NET area. One of our guests from JavaScript Jabber, Shawn Clabough, actually reached out to me and said he wanted to start a show on .NET. And there are a ton of people out there that I feel like sometimes get neglected in the .NET space. So if you're one of those folks, you've been listening to maybe one or two of the other .NET-focused or Microsoft-focused podcasts for a while and thought, well, where's the devchat.tv-style podcast for me in .NET? You can find it. It's at Adventures in .NET, and .NET is spelled out, D-O-T-N-E-T: adventuresindotnet.com. Go check it out today.
AJ_O’NEAL: Yeah. So then let's just go through picks. Uh, Steve, do you want to start us off?
STEVE_EDWARDS: Yes, I will. So I'm going to go old school, and I hope I haven't picked this before; my picks just sort of blur together in my mind after a while, but I'm going to go with Looney Tunes. There's a channel called Boomerang that's available on cable TV here in the States, and they replay a lot of Looney Tunes cartoons. My nine-year-old has gotten into them as well. His favorites, I think, are the Road Runner and Coyote cartoons. Always a classic. You know, there's all the Tweety Bird and Sylvester and Bugs Bunny and so on. Yeah, they're just a good watch. Unfortunately, I think a lot of them would be considered politically incorrect these days, whether it's the characters or being too violent or whatever, but I watched them growing up, and it's a hit watching them with my son because he gets a kick out of them too. That's my pick.
AJ_O’NEAL: Cool beans. Nice. I'm going to pick a couple of things. One, I'm going to pick ripgrep. For anybody who has not been using ripgrep, you are living below your privileges. You deserve ripgrep in your life. And if you don't know what grep is, then you're really, really, really living below the level of comfort that you could be. These are technical tools, by the way. So it's grep and ripgrep. Grep is search through files: you can search recursively through a directory for an exact string pattern or a regular expression pattern. Ripgrep is basically just a modern version of grep. It understands Git, and it understands code to some degree, so it's got all the features that you want turned on. And most importantly, it understands .gitignore and .ignore.
STEVE_EDWARDS: Hallelujah.
AJ_O’NEAL: Yeah. And the link that I've provided is to webinstall.dev slash RG, because that is the easiest way to install it. Although there are other ways that you can install it such as Brew. And I also tried to put some cheat sheet material there so that you can kind of see at a glance common things that you'd want to do. Whereas the actual ReadMe focuses entirely on things that aren't important at all, like performance rather than getting to the point of what it does. Like if I just read their ReadMe, I think, oh great, you made grep 30% faster, who cares? But that is not the point. It is awesome, not for that reason. If it were 30% slower than grep, it would be worth using, but it's not, it's like significantly faster in most cases. That's that.
CONNELL_GAULD: Yeah, well, mine is, I just finished watching Dark on Netflix. I don't know if you've seen it. It's a kind of sci-fi time travel show. It's in German, but it's got English subtitles, and I'm sure there's a dubbed version as well, but it's basically three seasons of this crazy time travel story with lots of different interwoven timelines and things happening. It's one of these things that's almost a puzzle to watch, and so it's quite engaging and entertaining. I mean, I'm a sucker for a time travel story anyway. But that's my pick. Yeah, it's on Netflix and it's just very well shot, great soundtrack, a fantastic story, the like of which I've been dying for for a while in terms of sci-fi. Yeah, I can't recommend it enough.
AJ_O’NEAL: Well, that's a wrap. Thanks so much, Connell, for coming on the show. We really enjoyed having you.
CONNELL_GAULD: Uh, it's my pleasure. I had a great time as well. Good, good chat.
AJ_O’NEAL: Yeah. So we look forward to, well, I don't know if we're going to have you on the show again, but at some point in the future, we look forward to seeing where things go in AR. Keep us abreast.
CONNELL_GAULD: Absolutely. Well, yeah. And thank you very much for having me. It's been, it's been a lovely chat. All right.
AJ_O’NEAL: Cool. Steve, anything else?
STEVE_EDWARDS: Not for me. Well, I was going to say that this interview really augmented my reality in terms of understanding the technology. So I appreciate that very much.
CONNELL_GAULD: No worries.
AJ_O’NEAL: Cool. Well then, adios.
STEVE_EDWARDS: Adios.
CONNELL_GAULD: Great. Thanks, guys.
Bandwidth for this segment is provided by CacheFly, the world's fastest CDN. To deliver your content fast with CacheFly, visit C-A-C-H-E-F-L-Y dot com to learn more.