Michael_Berk::
Hello everyone. Welcome back to another episode of Adventures in Machine Learning. It's Michael Berk and my co-host.
Ben_Wilson:
Ben Wilson.
Michael_Berk::
And today we are joined by Kevin. He is a mentor at Newchip Accelerator, which is an equity-free and completely online startup accelerator. And both equity-free and online are rare in the accelerator setting. He's invested in over 16 startups, specializes in financial regulation, and has written at least 38 articles throughout the news world. A couple might've slipped through the cracks as I was doing my research. But the list goes on. And what I wanted to kick off the episode with is one of Kevin's main projects over the past 11 years, which is a company called Univention. So Kevin, could you please explain your role as well as what the organization does?
Kevin Dominik_Korte:
Of course, and thank you for having me. At Univention, we focus on identity management. For users, or podcast hosts, it looks like this: you start your day, like most of us, by logging into your computer and then probably your emails and your scheduling software, hopefully using one email address. And if it's really good, you only log in once and all of that pops up by itself. Of course, on the backend, there's someone who has to put it in everywhere. And IT people really hate doing that stuff because it's repetitive; either you feel undervalued or your boss feels you're overpaid for it. So what we do is automate that. You only put in the name and that's it, and the system takes care of all of that.
Michael_Berk::
Got it. So what are a couple of features that Univention provides?
Kevin Dominik_Korte:
It's basically the directory. It's like your address book where you have all your employees. Then the main feature, the main selling point, is really the automation part, as in guessing what you want your emails to be. If Michael Berk comes, you might want it to be M. Berk or you might want to have it as Michael B. And making sure that that email really goes everywhere, and no one has to ask, oh, which one do I need to type in again? And that's really the big selling point where we normally come in.
Michael_Berk::
Got it. So sort of a unified management system for all people directories and things.
Kevin Dominik_Korte:
Yes, unified identity and system management.
Michael_Berk::
Nice. That's super cool. Um, and do you ever see any common security mistakes, uh, when working with customers, let's say?
Kevin Dominik_Korte:
One of the big things where we come in is if people don't follow standards, don't follow policies, because as I said, if you have six different systems where you put it in here and there, you'll have six different passwords, and the passwords get shorter and shorter the more of them you have to remember. And that's really the big one where we come in, because if you have one login, you make the password longer. You can enforce that users make it longer. And the other thing is you can enforce things like two-factor. Hopefully for your bank, you get that SMS: is it really you logging in? And that's something which comes into the corporate world more and more. And that only works if you really put it together, because otherwise it's just an SMS out of a hundred.
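To make the "enforce it once, centrally" idea concrete, here is a minimal sketch, under stated assumptions, of a password-length policy plus a TOTP second factor checked in one place for every connected system. It uses the pyotp library; the 14-character minimum, the user name, and the issuer name are illustrative, not Univention's actual defaults.

```python
# Minimal sketch: one central policy check plus a TOTP second factor.
# Assumptions: pyotp is installed; the policy values are illustrative.
import pyotp

MIN_LENGTH = 14  # one long password instead of six short ones

def password_ok(candidate: str) -> bool:
    """Length-based policy enforced once, for every connected system."""
    return len(candidate) >= MIN_LENGTH

# Enrollment: generate a per-user secret once and store it with the directory entry.
secret = pyotp.random_base32()
print("Provisioning URI for the authenticator app:",
      pyotp.TOTP(secret).provisioning_uri(name="m.berk", issuer_name="example-directory"))

# Login: both factors have to pass before any downstream system is reachable.
def login(password: str, otp_code: str) -> bool:
    return password_ok(password) and pyotp.TOTP(secret).verify(otp_code)
```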
Michael_Berk::
Got it. And so this sounds a lot like SSO or a similar technology. Um,
Kevin Dominik_Korte:
single sign on is
Michael_Berk::
yeah.
Kevin Dominik_Korte:
really the master class where you only annoy the person once with a password and
Michael_Berk::
Yeah.
Kevin Dominik_Korte:
you get around that statistic that about a third of Americans refuse to do work where they have to enter another password.
Michael_Berk::
Americans only?
Kevin Dominik_Korte:
This study was about Americans. I would guess it's the same everywhere.
Ben_Wilson:
No, I would concur with that study. Before we did unified security at Databricks a number of years ago, the dev loop associated with developing a feature branch, pushing to a PR, and then starting up the systems that you would need to validate was, I think, something like 15 different passwords and validation steps that you would have to go through as an engineer. And now it's a password to log into your computer and then a YubiKey that, when you touch that thing, validates that it's you, does the authentication, and handles everything for you. And I have a question about that. When you simplify a process with security access management and you're a service provider that automates that on behalf of organizations, because you now manage a single source of authentication to sensitive systems, does it become more of a potential headache for you as a service provider, given the different vectors of intrusion that bad actors might be trying? How do you detect stuff like that?
Kevin Dominik_Korte:
So if you look at the stack, there's really two ways in for most actors: that's either phishing emails, so getting someone to click on something, or finding stolen credentials. Stolen credentials account for around 80% of intrusions into corporate networks, phishing around 18%, and then the rest, the 2% in the end, is bad software, missed updates. We really focused on getting that first, what, 98% done. Well, actually, our software development team was hopefully focusing on making the 2% with bad updates go away as well. But yeah, so it's really questions like, okay, can we automatically verify that passwords and identities are not reused or haven't been found on the black market? I think a quarter of Americans reuse banking passwords on social media, so I don't want to know how many use a Facebook password for their corporate login; that's not their own money. So these kinds of questions probably narrow it down already. And then you come to ideas like, okay, if I have a single source of truth where I have to disable a user, I'm not forgetting anyone. If you have six or seven systems, it's, wait, is that guy in the accounting system or not? Oh, he's probably not, he's a developer now, so I don't go check in there. Versus if you have a single system, you disable the user, and oh, actually he started as an accountant, I really have to disable that too. And that way you increase the security, you increase compliance. And if you then look into logging, into monitoring, into doing plausibility checks, that's only possible if you have a single source. Otherwise I might not notice that, okay, the guy is logged in from Minnesota in the accounting system and, wait, he really shouldn't be logging in from South Korea into the HR system at the same time. And so these detections, and having the system actually make reasonable guesses about what's possible or not, I think that's something which doesn't happen if you have everything distributed.
Michael_Berk::
Got it. And so some of those guesses can come from geolocation. What other technology and data do you use to determine if it's a malicious actor?
Kevin Dominik_Korte:
Geolocation is one of the big ones. Number of logins is, I think, standard everywhere now, but it's surprising how many systems don't have that. And it's the same approach: if I try to log into the email once, HR once, accounting once, I might have three attempts at each, versus if you centralize it, one attempt at each already counts, and you have three in total and are done. What we see now also is, yeah, you're in machine learning, trying to build traces on employees, which is kind of a double-edged sword: on the one hand, you don't really want to trace your employees; on the other hand, it makes it easy to do checks. Is this a guy who checks his email first? Is this someone who really downloads a gigabyte of files first thing in the morning because he wants to work on that giant spreadsheet that's floating around somewhere? And these kinds of behaviors are something we're seeing as requests, which might open up an ethical question in the future, whether we really want to do it or not.
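A minimal sketch of the kind of plausibility check being described: flag a login when the implied travel speed from the previous login is impossible. The 900 km/h threshold, the example coordinates, and the haversine helper are illustrative assumptions.

```python
# Impossible-travel check: distance between consecutive logins divided by
# elapsed time should not exceed a plausible travel speed.
from dataclasses import dataclass
from datetime import datetime
from math import radians, sin, cos, asin, sqrt

@dataclass
class Login:
    user: str
    when: datetime
    lat: float
    lon: float

def km_between(a: Login, b: Login) -> float:
    """Great-circle distance via the haversine formula (Earth radius ~6371 km)."""
    dlat, dlon = radians(b.lat - a.lat), radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def impossible_travel(prev: Login, curr: Login, max_kmh: float = 900.0) -> bool:
    hours = (curr.when - prev.when).total_seconds() / 3600
    if hours <= 0:
        return True  # two places at the same moment
    return km_between(prev, curr) / hours > max_kmh

# Minnesota at 09:00, Seoul at 09:30 the same morning -> flagged.
a = Login("jdoe", datetime(2023, 3, 1, 9, 0), 44.98, -93.27)
b = Login("jdoe", datetime(2023, 3, 1, 9, 30), 37.57, 126.98)
print(impossible_travel(a, b))  # True
```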
Ben_Wilson:
What I'm most interested in, with what you said earlier, is sort of the product management aspect of how you determine malicious actors. Anybody who's ever been at, like, web-based companies where you're selling services, where there's money to be made or products to be delivered, knows tons of people are criminals and will try to get access to systems in order to make money or sell data or just fraudulently interact with the company and pretend to be somebody else. I'm going to change the delivery address on the shipment to this other location, then I can go and pick it up. They might want to just get into the system with access to do that and try not to get detected otherwise. As those vectors change, as criminals start behaving differently because they're trying to work around access restrictions that your company puts in place, you're like, hey, it's whack-a-mole. You took out the Pareto, we got the 80%, nobody can get in through these vectors anymore. When you start seeing intrusions happen again, what's that process like for your company? Does everybody sit down and discuss, brainstorm like, hey, this is a problem that we found, we need to come up with a solution? How do you tackle that, like from soup to nuts?
Kevin Dominik_Korte:
So most of our customers still use our product on premises. So there's kind of an air gap between what information we get and what we need to make product decisions. That's already the biggest hurdle, because if you run your own system, or if people use the cloud system, okay, you can see it, you can analyze it, you can grab as much data as you want, versus that gap in between. And especially if you deal with security or with military or with healthcare, they might say, okay, we actually can't give you data
Ben_Wilson:
Mm-hmm.
Kevin Dominik_Korte:
as much as you really want. But then it really starts with looking into, okay, that's the vector they got in, that's what they exploited. And okay, what's then a reasonable guess at what's missing here? Is it that they guessed the password, that they found new ways in, that some connections were wrong, that factors were copied? And I think SMS security is one of the big ones, where people are always like, oh, I'm getting an SMS. And the truth is, everyone with a simple radio chip can get that SMS in your network or in your cell zone. And
Ben_Wilson:
Mm-hmm.
Kevin Dominik_Korte:
then it rather depends on finding out, okay, what's the missing information here? And then of course you come to, okay, can I reproduce it easily? And once you reproduce it, you can build something against it. And then sometimes it's as much as getting everyone together to do a hackathon and someone says, you know, let me try to break into that
Ben_Wilson:
Mm-hmm.
Kevin Dominik_Korte:
and kind of spurring people's competitiveness. And then we had one hackathon where actually one of our marketing guys got in just by asking, oh, can you show me that again?
Ben_Wilson:
Ha ha ha.
Kevin Dominik_Korte:
And it's, I think, the great insider attack, which we don't normally assume. I think one of the red/blue attacks I've seen at a customer was just the delivery guy bringing something to the IT department and walking out with the admin's YubiKey, which was plugged in. It's the same attack vector as always. That's more the human part than anything technological.
Ben_Wilson:
Yeah, a company I worked at many, many years ago, which I can't disclose the name of, had somebody take some extremely valuable factory data out on, uh, basically a shoebox full of flash drives. And there was no restriction on the network. Like, if you had access and you logged in, you could just plug a flash drive in and copy files. And the security around that was, well, we know who logged into the computer, so we know who it was that downloaded it. Except on the factory floor, they had shared computers, which are there for basically tool control and maintenance, and they were open to the tool data, like the recipe data that was being used. So an enterprising individual snuck a flash drive into their clean room suit a couple of times and managed to download a couple terabytes of critical recipe data. The solution to that from the IT department was probably the biggest ban hammer I've ever seen, where stickers were placed on everybody's computer throughout the entire factory. They just said, hey, don't plug anything into these ports, you've been warned. And a couple of people tried it and they're like, this can't be that bad. Instantly it just bricks your computer. The BIOS actually shuts down and it turns the computer off. And then security comes running over to your cubicle being like, what are you doing?
Kevin Dominik_Korte:
Yeah, I've seen when I was still a student, we went to one of our partners and they had actually the USB ports welded shut
Ben_Wilson:
Mm-hmm.
Kevin Dominik_Korte:
on the back of the computer.
Ben_Wilson:
Yeah,
Kevin Dominik_Korte:
But nowadays I don't think that's possible anymore.
Ben_Wilson:
it's an access paradigm that a lot of people don't really think about. Uh, it's almost like the old school, hey, if I want to get data out of a company, there's a printer over there, I can print information out onto paper and carry that out. Not very efficient, but yeah, USB is super easy to just plug in, and some of them have massive capacities these days, so it's pretty easy to leak data out that way. So the social engineering stuff, does your company figure out that analysis and then talk to your customers and say, hey, here are some things you should think about to supplement your security, or is it purely services based?
Kevin Dominik_Korte:
We generally tell them, okay, these are the new best practices. And of course the best practices then go into the product as default settings for anyone who installs it anew. And that kind of puts everyone who's on the new system in the position of, okay, you have to make the decision if you want to be less secure because you maybe need compatibility; you have to take an active step to make it less secure. But yeah, everyone who is a current customer gets told, okay, here is what we recommend, and sometimes even, okay, here's what we don't support anymore because it's so old. And that way you kind of move them along.
Michael_Berk::
I have one question.
Kevin Dominik_Korte:
But yeah.
Michael_Berk::
So I recently concluded an engagement with a financial company, and for that financial company I had to do a bunch of security training. It was really annoying, but I did it. And my question is, why are the phishing attacks so bad? Like, I've gotten an email from "the CEO of Databricks" probably at least once a month, and it's like, xml-something-something at gmail.com: hello, please wire me $10,000 to this bank account, signed, our CEO. Why aren't there better phishing attacks? Why are there so many really poorly made ones?
Kevin Dominik_Korte:
Because you're probably not the right target.
Michael_Berk::
Oh, I'm not important enough, got it.
Kevin Dominik_Korte:
So you're not in the right position. Ask your HR people right now what they got over the last, what, 20 days, and they'll have gotten really well-worded emails with "please redirect my W2 somewhere."
Michael_Berk::
Interesting.
Kevin Dominik_Korte:
And it will have someone's current address and their new address. And that way you get the social security number and last year's income for what, maybe 20 cents spent on the black market for their current address. And they're much more likely to respond to that because it's such a legitimate point: oh my God, I need my W2, I just moved. And they might react to it even though they're not supposed to. Versus "wire me 10,000" might not be the big one, but "wire me a hundred dollars, I forgot my card, I'm sitting here at the rental car place trying to get to the conference," that's something where, okay, you have to be kind of a bit idiotic to fall for it, because really, why wouldn't they pick up the phone and call you if they're really at the rental car place? So they go for people who really just fall for that, so they don't waste their time. So they use bad spelling, they use an obviously fake email address, they don't spend the money on guessing the CEO's actual address. And that way you really find the people who are worth engaging with. And it's kind of a sales tactic. It's the same when I go out to sell identity management: I'm calling the IT guys or the CIO, I'm not going to call the factory floor worker because he's not the target.
Michael_Berk::
Got it.
Kevin Dominik_Korte:
And that way you're just in the flyby, and they hope for you to click on it and not be... What's the comedian's name? James Veitch, I think. A British comedian who engages with these really bad spammers and then says, oh yeah, I'm interested in your billion dollar diamond box, or in building a snail farm in Indonesia.
Michael_Berk::
Got it. So this financial company was right to have me go through all these trainings, even though I wasn't the target, because some people at that company are or will be targeted a lot more effectively.
Kevin Dominik_Korte:
Yes, plus there might then just be an attachment meant for you, and you get kind of sensitized if the training is done well. If the training is done poorly, on the other hand, it might also desensitize you to it, if they only focus on one attack or one vector and don't keep it broad and, like with everything, keep it gamified or competitive: who reports the most spam, who clicks on the least. And I think one of the worst things I've seen is when you click on a test phishing email and they send you to IT training and go over the same boring thing again, and they don't ask why there are so many retakes of the training. It's just, okay, you clicked on it, you have to be retrained.
Ben_Wilson:
Yeah, we've gone through that. Not since you've been at Databricks, Michael, but we did that about two years ago, where it was like white hat hacking and the IT department or security team was sending out phishing emails and then identifying people who were falling for it. They did some pretty sophisticated ones that I've never seen a scammer send to me personally, like a mockup of basically a password reset webpage that looks like an internal page, and you had to really look at it and look at the domain. You're like, yeah, the domain looks legit, but what's that special character in there? That's weird. And then you look at the layout of it and the resolution just isn't quite high enough for some of the images. I'm like, well done. But if you just looked at it really quickly, you wouldn't know the difference. And it was asking for your current login, like, please put in your username and current password so we can reset your account to a new password to cycle it. And a lot of people fell for that. And then we all had to take training and be like, hey, here are the things to look for to know if this is a legit website.
Kevin Dominik_Korte:
Yeah, but then you deal with someone who's really targeting you. And we have that quite a lot with actually military contractors.
Ben_Wilson:
Mm-hmm.
Kevin Dominik_Korte:
People are like, okay, plug in your information here to reset your password, and I wouldn't be able to detect whether that's the real thing or something different. And yeah, as you said, the URL has a semicolon somewhere and then a special character, and everyone thinks, okay, that's just the end of the URL which says that's your reset link, but it's actually continuing the URL to somewhere else.
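A small sketch of the URL trick being described: in the authority part of a URL, everything before an "@" is treated as user info, so a link that appears to point at the familiar login host can actually resolve somewhere else entirely. The domains and the simple checks below are made up for illustration.

```python
# Everything before "@" in the authority is userinfo, not the host.
from urllib.parse import urlsplit

def real_host(url: str) -> str:
    return urlsplit(url).hostname or ""

link = "https://login.example.com;reset=true@attacker.example/password-reset"
print(real_host(link))                                # attacker.example, not login.example.com
print(real_host("https://login.example.com/reset"))   # login.example.com

def looks_suspicious(url: str, expected_host: str) -> bool:
    host = real_host(url)
    has_userinfo = "@" in urlsplit(url).netloc
    return has_userinfo or host != expected_host or host.startswith("xn--")

print(looks_suspicious(link, "login.example.com"))    # True
```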
Michael_Berk::
Are there any ML specific security issues that either of you guys have seen? Cause we're talking a lot about system access and user management and that type of thing. Um, but is there anything specific to let's say ML algorithms or the data that they leverage in your experience?
Kevin Dominik_Korte:
It's building the perfect match, kind of your digital copy. People liked to talk during COVID about, okay, how do you detect the mouse wiggler? But that's kind of the stupid version of detecting someone working because the mouse is moving. Whereas if we look at it more from an access point of view, we see, okay, someone's accessing emails, someone's accessing data from somewhere else and copying it, kind of taking your profile over to a different system and then starting to copy data in the same way you work. That's something we've seen happening now, getting around all these presence detections, all the plausibility checks, and then slowly copying data. And we've seen that especially for high-value targets. So it's obviously not something you see with, okay, I'm going after secretary XYZ to copy that, but someone who develops rocket technology, that's obviously a target for someone to spoof really their whole workflow, spoof the identity, and use that to copy data out once they have access. The other thing is the initial password. We as humans are really bad at being random; we make patterns. If I tell you, think of a random string, the chance that you put the same character three times in a row is relatively low, unless you do it for every password. So if I go out with a machine learning algorithm, take the six or seven largest data breaches from last year and plug them into it, there's a chance I can get passwords which you might have thought of. And that's something we've seen, people using different data breaches as training data.
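On the defensive side of that same observation, here is a minimal sketch of checking whether a password already appears in known breach corpora, using the k-anonymity range endpoint that Have I Been Pwned documents publicly; only the first five characters of the SHA-1 hash ever leave the machine. The endpoint and response format follow that public documentation, so treat this as an illustrative sketch rather than production code.

```python
# k-anonymity breach check: send only the first 5 hex chars of the SHA-1 hash,
# then look for the remaining suffix in the returned candidate list.
import hashlib
import urllib.request

def breach_count(password: str) -> int:
    """Return how many times the password shows up in known breach data sets."""
    sha1 = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = sha1[:5], sha1[5:]
    url = f"https://api.pwnedpasswords.com/range/{prefix}"
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode("utf-8")
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip() == suffix:
            return int(count)
    return 0

if __name__ == "__main__":
    # A pattern-y password like this is very likely to show up with a large count.
    print(breach_count("Summer2023!"))
```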
Ben_Wilson:
Yeah, I think it was for memorization's sake, and I don't know where it came from, the, hey, you have to use special characters in your password. I think it's been proven over time that that's a really stupid idea, because people just replace letters with those characters that kind of look like them: replace I with an exclamation mark, replace O with zero. It's pretty easy for a brute force attack algorithm to try those and try just the single-word patterns. But I read somewhere a report from a white hat hacker who tried to figure out how complex it would be to brute force an actual sentence, with the words maybe even reversed, so that somebody who tried a dictionary attack and put a sentence of six words together... how hard is that password to crack? It's almost impossible to figure out. I've never understood why we still hold on to these password systems that are like, hey, we can have a 12-character password and it has to have these two special characters at least. It's like, really? It's easier to remember a sentence.
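A rough back-of-the-envelope comparison of the two approaches Ben mentions: a truly random character password versus a multi-word passphrase. The 95-symbol printable-ASCII alphabet and the 7,776-word Diceware-style list are assumptions made for the arithmetic, not a crack-time estimate.

```python
# Search-space comparison in bits: log2(alphabet_size) * length.
import math

charset = 95                                    # printable ASCII symbols
complex_pw_bits = 12 * math.log2(charset)       # 12 characters drawn truly at random
wordlist = 7776                                 # classic Diceware word-list size
passphrase_bits = 6 * math.log2(wordlist)       # six random common words

print(f"random 12-char password: ~{complex_pw_bits:.0f} bits")  # ~79 bits
print(f"six-word passphrase:     ~{passphrase_bits:.0f} bits")  # ~78 bits

# In practice humans don't pick 12 characters at random; they pick "P@ssw0rd12!!",
# which wordlist-plus-substitution rules cover in far fewer guesses, while six
# genuinely random words stay close to the theoretical figure above.
```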
Kevin Dominik_Korte:
Yeah, but humans are hard to change. For someone who's been using a computer all his life, they've been trained from the very beginning: okay, you need at least eight characters, and one needs to be a special character, and one needs to be a number, and a capital. So how does it look? You start with a capital, you put five small letters, you put a special character, you put a number,
Ben_Wilson:
Mm-hmm.
Kevin Dominik_Korte:
and yeah, if I then have six or seven data breaches, six or seven of your passwords, I know, okay, you're the guy who starts with the capital. And in the worst case, the lowercase letters are the website you want to log into.
Ben_Wilson:
Hmm.
Kevin Dominik_Korte:
That way, an algorithm can quickly detect, okay, that's the guy who does that. Here are the 10 most likely ones he's using.
Ben_Wilson:
Let's see if he has an account on Twitter, and then we'll exhaust our password attempts there. Now let's try LinkedIn and see if he's reusing a password at 30 different websites. Yeah, it's actually much easier, from what I understand about it, to do those sorts of attacks than it ever was in the past, just because so many people have so many online accounts and they reuse passwords, because who can remember 150 passwords that are all unique and different? Nobody can.
Kevin Dominik_Korte:
Yeah, plus algorithms are really good at spotting patterns, which
Ben_Wilson:
Hmm.
Kevin Dominik_Korte:
we might not even be aware of when we build passwords. I mean, capital then small letters, that's a really simple pattern, but even something unconscious might be more complex. And we think, oh, I'm putting in a random one every time, but it's just my little pattern in my head.
Ben_Wilson:
One thing that interests me about the application of ML and artificial intelligence to this space is the potential weaponization and proliferation of advanced language models. Not in the sense of, oh, we're going to get one to guess passwords really efficiently; no, those algorithms aren't that hard to write. You can write a password guesser in an afternoon if you know how to write an algorithm in a language. It's more of what we were talking about earlier: what if you interacted with the IT department for a trouble ticket and requested access to a system, and there's some back-and-forth response that happens, and what if you took that as a temporal data set and trained, like, a GPT-type model on it? So, here was the initiation, here was their response, here's what I responded to their questions, here was their response, and the back and forth that happens until finally, hey, I got access to this system. Well, what if you did that at a thousand different companies and maybe 20 of them were successful? What if the model learned how to do that, and then you just pointed that chat bot at the email addresses for contacting these different IT departments? What happens then?
Kevin Dominik_Korte:
I would be scared of what happens then, because I would assume that a lot of IT departments, especially if you look at consumer-facing companies, your telcos, your big financial institutions, will probably be among the first who on their side implement some chat bots for easy consumer interaction. Because
Ben_Wilson:
Mm-mm.
Kevin Dominik_Korte:
if you've been stuck in an airline waiting loop this year, you'd probably be happy to chat with a bot who can rebook your flight instead of waiting four hours for someone to pick up the phone.
Ben_Wilson:
Mm-hmm.
Kevin Dominik_Korte:
But that also, of course, opens up the other end, that someone trains a bot to speak to their bot to then rebook 500 flights, or to see how many you need to rebook till the whole system comes down again. So I think we'll see it a lot earlier in these consumer-facing interactions. Please reset my online shopping password, I'm trying to buy the Christmas present.
Ben_Wilson:
or, hey, I didn't receive any of these packages, can you resend them please to this address? And then you get an entire UPS truck full of free stuff dropped off at somebody else's house, and you run over there and pick it up. The bot versus bot thing, it's somewhat comical when we think about some of the things that people could do. But even though it is sort of funny to imagine somebody pulling off that airline situation, what if that process was rebooking everybody that was on that flight to flights that are anywhere from three to seven days in the future, and the plane's empty and nobody shows up? What if they did that to a thousand flights across the world? How much could that potentially impact global travel, impact economies? When you start talking about, hey, 250,000 people couldn't fly for three days, how big of an impact is that to the world? You could do some very powerful stuff if you don't have security controls in there.
Kevin Dominik_Korte:
Let alone the reputation impact.
Ben_Wilson:
Yeah.
Kevin Dominik_Korte:
And it was just one airline.
Michael_Berk::
Yeah. Speaking of all these sort of cutting-edge, technology-driven approaches, like chatbots fighting chatbots, have you seen any trends in cybersecurity startups, or sort of what's on the cutting edge for companies these days?
Kevin Dominik_Korte:
So one thing we've seen is verification, as in, okay, how do we leverage blockchains to verify this is our update, this is something that works with us. How do we use that to also verify identities, to verify, okay, I'm the one who's talking to you. Kind of getting blockchain beyond cryptocurrencies and weird pictures, as one of the startups I talked to tried to explain it. On the other hand, if you look into threats and threat analysis, again we have pattern recognition algorithms. There was an interesting article recently about how someone used ChatGPT to write a morphing virus. So actually
Ben_Wilson:
Mm-hmm.
Kevin Dominik_Korte:
something which is, in the computer world, like a real virus, and then used ChatGPT to detect it. And the biggest hurdle for him was that you had to use the API, because the website filters out some of the malicious requests.
Ben_Wilson:
Mm-hmm.
Kevin Dominik_Korte:
But I think that's where we see a lot of security startups: again, building patterns, detecting patterns, and reacting faster than humanly possible. Because once you have a morphing virus, it's kind of like with the human body: you can't go in there and try to remove every virus somehow with a syringe. You need something which morphs with it, so we have to send vaccines, kind of vaccinate the computer system against the particular viruses.
Michael_Berk::
That's interesting. Have you seen, uh, any companies be very successful in that realm? Cause it seems like a pretty far off concept and a difficult thing to implement.
Kevin Dominik_Korte:
I've seen some interesting approaches. Defense Arc is one which comes to mind right now, which uses a very heavily ML-based threat detection or antivirus system. It's kind of driven by the idea of Stuxnet, the industrial-military attack where someone with an unlimited budget, or virtually unlimited, tries to get into the system. And where, if everything goes well for the attacker, you don't have the numbers which most antivirus vendors rely on, that, okay, there are 10 different companies who report the virus, so that's really a threat. Versus, okay, that's a behavior which we want to detect, and even though it's changing slightly because it's mutating, it's still something which goes wrong.
Michael_Berk::
Interesting. Ben, have you ever seen any of these sort of detection algorithms implemented in your work?
Ben_Wilson:
Uh, not trying to detect, as Kevin said, the unlimited-budget people. If you're up against somebody who's attacking you, who's developing something to interrupt your uranium enrichment centrifuges, you're up against the NSA, good luck. Yeah, they don't have an unlimited budget, but effectively they do. And you're up against probably a team of 300 brilliant computer scientists, and there are probably a dozen convicted felons on that staff as well who got parole because they attacked some financial institution and stole half a billion dollars or something. So very intelligent people are working on something like that. Everything that I've worked with has been on the human side, which is like, how do we detect fraudulent behavior that somebody's doing with our revenue? So you work for a commerce-based company where people can buy things from you, and there are some very clever things that people attempt to do. There's basic stuff like chargebacks. Like, hey, it got delivered, and they report, hey, I didn't get this package, I want a refund. We're like, no, we have delivery validation and the driver took a picture of it at your house, and we can see your address in the picture. And then they'll just call the credit card company and call a stop-funds transfer on that. So yeah, you flag that; that's pretty easy to detect. What's not easy to detect is somebody returning goods, or saying that they're returning goods, and then setting it up so that it will never actually get shipped. So they'll take it to a place to ship it, like the post office or something, they'll get the stamp, they'll get the certified registration of it, and then ask the clerk a question so the clerk turns around and pays attention to something else, and they take their package back. So it's in the system saying that it's going to be on a truck, but it never arrives because it's back at home. So detecting those behavior patterns is just based on the raw data that you have about shipments and returns and cancellations and refunds, and simple algorithms can be used for that. We just use stuff like logistic regression, where you train a model on a ton of data and you get a probability that this account is fraudulent. Once it is, you shut it down. But then you find that people will restart an account with a different email address, a different mailing address. But you can start looking at behavior patterns if you have just some very simple things about the registration process on your website: ask some questions that don't really seem like they're meaningful, like, hey, what are your favorite things, what do you really like? If that's part of your onboarding process to register an account, sometimes that's something that companies or ML teams do to influence the business, to say, hey, if we just collect this data, it helps us identify these clowns that keep on doing this, so we can make it so that they have to go through a validation process before we ship anything to them. Or if they buy something over a certain monetary value, we don't let them do that until they've been a customer and made three or four purchases of small things. So you can't order a motorcycle, for instance, and we deliver a $30,000
piece of equipment to them and they say, we didn't get it. It's like, you can buy a towel first, and once you keep that towel, we know you're good, maybe. So yeah, there are lots of different things that are being checked for. And just the binary labeling of, hey, this is fraud or this is not fraud, on transactions or on users, is very limiting. But you can open it up to be sort of a multi-class classification problem where you train it to detect a bunch of things all at once, and then you leave open this bucket of, I don't know what this is, and it requires human intervention that triggers the labeling process.
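A minimal sketch of the triage approach described above: a scikit-learn logistic regression over a few behavioral features, with probability thresholds that route uncertain accounts to the "I don't know what this is" bucket for human review. The feature names, toy data, and thresholds are invented for illustration.

```python
# Fraud triage sketch: logistic regression plus an explicit human-review bucket.
import numpy as np
from sklearn.linear_model import LogisticRegression

# toy features per account: [chargebacks, failed_return_shipments, account_age_days, avg_order_value]
X = np.array([
    [0, 0, 400,   35.0],
    [3, 2,  12,  900.0],
    [0, 1, 250,   60.0],
    [4, 3,   8, 1200.0],
])
y = np.array([0, 1, 0, 1])  # 0 = legitimate, 1 = fraudulent

clf = LogisticRegression().fit(X, y)

def triage(features, lo=0.2, hi=0.8):
    """Auto-allow, auto-block, or escalate to a human depending on the probability."""
    p_fraud = clf.predict_proba([features])[0, 1]
    if p_fraud >= hi:
        return "block"
    if p_fraud <= lo:
        return "allow"
    return "human review"  # the bucket that triggers the labeling process

print(triage([1, 1, 30, 450.0]))
```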
Michael_Berk::
And typically, in your guys' experience, these sorts of ML algorithms that fall under the IT and security bucket: is that work done by ML engineers that report into IT, is it done by ML engineers that are on the ML engineering team, or is it done by contractors that IT brings in?
Kevin Dominik_Korte:
It depends very much on the size of the company.
Ben_Wilson:
Yes.
Kevin Dominik_Korte:
So for most of the ones I work directly with, it's one or two people on the team, and then once big updates come in, they bring in 20 more via a contractor who's specialized in this kind of model building and model training. And then you have the two guys who maintain it, who can work with it well enough to keep it going and to flag, okay, this is a problem with the model versus this is really a problem with the person using it, or the customer, or the data. And once you hit the next update, once someone got smart enough to fight against it with a good ML model, if you want to go bot against bot again, then you again bring in the big team, bring in the contractor who builds it up.
Ben_Wilson:
Yeah, I'd echo that exactly. At startups that I've worked at, or very small commercial companies, it's usually just one or maybe two data science teams. And for a problem like this, it's up to the executive staff, the most senior people, to say, hey, you two team leads and your senior staff, you've got a month to figure this out, put something into production so we can stop losing so much money, and you'll build it. But if it's so big, or you have other commitments, yeah, you hire a contractor to come in, pay for a specialist to build it. But if it's a bank, like an international bank, they'll have ML engineers and data scientists whose only job is to do this fraud detection and compliance machine learning work. They might have 200 or a thousand production jobs running every single day, just monitoring different aspects of their business.
Michael_Berk::
Interesting. Okay. And Kevin, do you guys at Univention provide any of those services or is that sort of not part of your stack?
Kevin Dominik_Korte:
No, normally we provide the interfaces for that, but it's the contractors who are really in that area, who are our partners but who are also partners to the customer, who then build the model, because a bank deals with different threats than a healthcare provider or the government. And so it's kind of the difference between having a project and having a product. We can make it good enough to fit most of them, but for certain security tasks, you need to really fit in there. Because if you're a bank, no one is interested in the HIPAA data, versus getting millions of dollars out is not what you go for if you rob a medical device maker.
Michael_Berk::
Yeah, that's a good point. So it's very use-case specific. Um, and then you mentioned HIPAA. Do you guys take different measures for sort of levels of security, whether you're in Europe or the US or Australia? Because with the regulations in each country, there are some similarities, but there are also a lot of differences. So do you guys have different products?
Kevin Dominik_Korte:
So we have configurations which can differ between countries, but our approach normally is that it should be secure enough to satisfy all of them. The only point where that sometimes doesn't happen is when new algorithms come in, new security, new encryption algorithms; there might be different versions for Europe and the US, because generally the US goes faster in requiring new algorithms than Europe. Maybe also because a lot of the algorithms come from the US government space.
Michael_Berk::
Yeah, that makes sense.
Ben_Wilson:
Yeah, as a former communications officer in the Navy, I can tell you that the security that is part of message traffic, like how information gets passed remotely through something that's not connected to the internet, the internet that's not the internet, I've never seen any company that I've interacted with since leaving the military come even remotely close to that level of security. There's no real mechanism to breach any of that stuff that's not continuously tested by researchers, and there's physical validation, like devices that are created to prove that a human, and the specific human, is inside this room at this time and is sitting at this computer. And if you're not, the system just locks you out. You can't do anything, you can't see anything, the screen doesn't show anything sensitive. Do you ever see in your field that people are going to start asking for, or trending towards, that? To say, hey, we really don't ever want to have a breach, or we can't lose this reputation, we want to go all in on whatever you can enable for us.
Kevin Dominik_Korte:
Not to that point. Where we have seen it was when one of our contractors was building their new threat management: they had sensors in the chair which detected if you got up, and that directly locked the computer. But I think nowhere outside the military have I seen something like that.
Ben_Wilson:
biometric validation on the computer, on the keyboard, on the door to the room, and then basically heat signature detection in the room so that if there's more than one heat signature detected that's of a human shape, the screen goes black. So there's crazy stuff, but
Kevin Dominik_Korte:
Yes.
Ben_Wilson:
so nobody's saying like, hey, we actually have data that's this secure that's not in the government. Okay.
Kevin Dominik_Korte:
No, I don't think I've seen it outside the government. I think it's also a price question. Heat signature detection, for example, is a great Arduino project, but that thing can't detect whether it's my toddler running under the desk or whether my computer is currently running Chrome.
Ben_Wilson:
Hmm.
Kevin Dominik_Korte:
Those are just two big heat signatures, versus what we've seen in the security space that can say, okay, that's a computer, that's a person, that's a person who's currently very nervous. And one of these sensors, oh, that was 10 years ago when I was in college, was around, what, 15,000 bucks, without anything wired to it yet.
Ben_Wilson:
Mm-hmm.
Kevin Dominik_Korte:
It's just beyond what a bank would pay for any office.
Michael_Berk::
Yeah, wow. So we're coming up on time. I'll do a quick wrap and then hand it over to you, Kevin, for any next steps in case people want to get in contact. So there seem to be about three ways that malicious actors can get access to your systems. The first is stolen credentials, and that's about 80% of the cases. The next is phishing; that's about 18%. And then third is bad software, and that's only about 2%; that's a lack of updates or things like that. Typically software is not the issue; the human component is. From a personal security standpoint, you can use password generators; as humans, we are not great at generating random things in general, so a password generator will actually generate a random password. Also, using multifactor authentication is really helpful. And then finally, on the algorithms that detect these security faults, geolocation and number of sign-ins are two common features that are used, but beyond that, you should get creative and figure out what applies to your use case. So Kevin, if people want to get in contact, where can they find you?
Kevin Dominik_Korte:
I think searching for me on LinkedIn as Kevin Dominik Korte is the easiest way, and the way where there's no debate about who owns it right now, as with other social media. Otherwise, via univention.com you'll definitely find my name and contact information.
Michael_Berk::
Terrific. All right. Well, this was a lot of fun. Thank you so much, Kevin, for joining. And until next time, it's been Michael Berk and my co-host.
Ben_Wilson:
Wilson.
Michael_Berk::
And have a good day, everyone.
Ben_Wilson:
Take it easy.