JSJ 477: Understanding Search Engines and SEO (for devs) - Part 2
If you're building a website or web-app, there's a good chance that you want people to find it so that they will access it. These days this mostly means that you want it to appear in the relevant search engine results pages (SERP). In this episode we are joined by Martin Splitt, DevRel at Google for the Search & Web ecosystem, who explains in detail how search engines work, and what developers and SEOs need to know and do in order to be on their good side.
Special Guests:
Martin Splitt
Show Notes
Panel
- Aimee Knight
- AJ O'Neal
- Dan Shappir
- Steve Edwards
Guest
- Martin Splitt
Sponsors
Links
Picks
- AJ- What If?: Serious Scientific Answers to Absurd Hypothetical Questions by Randall Munroe
- AJ- How To: Absurd Scientific Advice for Common Real-World Problems by Randall Munroe
- AJ- Thing Explainer: Complicated Stuff in Simple Words by Randall Munroe
- AJ- From Microsoft, Oracle, etc to NSA Data Center (Google Map)
- AJ- Square Stone Wheel (Test Institute Stone and Stone Caveman User Focus Group)
- Dan- How to Systematically Debug Your CSS Just Like You Would Your JavaScript?
- Martin- The curious tale of Tegel’s Boeing 707
- Martin- Escaped cloned female mutant crayfish take over Belgian cemetery
- Martin- Duke Graduate School Scientific Writing Resource
- Steve- In Plain Sight (TV Series 2008-2012)
Transcript
DAN_SHAPPIR: Hello everybody and welcome to another episode of JavaScript Jabber. I'm Dan Shappir coming to you from Tel Aviv and today on our panel, we have Steve Edwards.
STEVE_EDWARDS: Hello from Portland.
DAN_SHAPPIR: Aimee Knight.
AIMEE_KNIGHT: Hello from Nashville.
DAN_SHAPPIR: AJ O'Neal.
AJ_O’NEAL: Yo, yo, yo. Coming at you live from "they changed trash day to Tuesday."
DAN_SHAPPIR: And our special guest for today is Martin Splitt from Google. Hi, Martin.
MARTIN_SPLITT: Hi there, and hello from Zurich, Switzerland.
DAN_SHAPPIR: Oh, it must be very nice weather over there right now.
MARTIN_SPLITT: Yeah, it's actually sunny, warm, and blue sky, and not too bad. Yeah, surprisingly.
DAN_SHAPPIR: And what's the temperature like?
MARTIN_SPLITT: I think today we had 14 degrees centigrade. I think we'll need to maybe try to convert it for American.
AJ_O’NEAL: Nice and toasty.
MARTIN_SPLITT: Yeah, I think in American units, it's 25.3 caterpillars in a nutshell or something.
AJ_O’NEAL: Yeah. Well, what you do is you divide by two and then multiply by five ninths.
MARTIN_SPLITT: Right. Sounds, sounds legitimate.
AJ_O’NEAL: Plus 32. Don't forget the plus 32. You got to carry the 32.
MARTIN_SPLITT: I think it's in the upper fifties for you guys.
STEVE_EDWARDS: Where do the caterpillars and the nutshell fit in there? I missed that in the calculation.
MARTIN_SPLITT: Yeah, I don't know. I have no idea how the exact... like, there are coefficients there, and you have to figure out the units, I guess, but I'm not really good at this stuff, so I don't know.
AJ_O’NEAL: It's either 39 or 57 depending on whether it was supposed to be five ninths or nine fifths.
DAN_SHAPPIR: I'll tell you one thing whenever I need to convert Celsius to Fahrenheit or vice versa I just use this thing called Google search. Martin have you heard about it?
MARTIN_SPLITT: I heard about it. It's apparently like a really hot startup from the Mountain View area right now.
DAN_SHAPPIR: Yeah, for those of you who don't know, Martin is actually involved with Google Search. Can you tell us your role there?
MARTIN_SPLITT: Yes, so I'm a developer advocate on the Google Search Relations team. So my job is to both help everyone build websites that can be discovered through search engines, more specifically through Google Search, and also to basically bring back developer and SEO feedback to the relevant product teams at Google, in Google Search more specifically, to make sure that Google Search works the way it's supposed to.
DAN_SHAPPIR: And we brought Martin on our show to explain to us exactly how the Google Search algorithm works on the inside, right, Martin?
MARTIN_SPLITT: Yes, correct. That's exactly what I'm here for.
This episode is brought to you by Dexecure, a company that helps developers make websites load faster automatically. With Dexecure, you no longer need to constantly chase new compression techniques. Let them do the work for you and focus on what you love doing: building products and features. Not only is Dexecure easy to integrate, it makes your website 40% faster, increases website traffic, and better yet, your website runs faster than your competitors'. Visit dexecure.com slash JSJabber to learn more about how their products work.
STEVE_EDWARDS: Okay, so now, this being a JavaScript podcast, one of the known issues ever since JavaScript frameworks came into vogue, shall we say, starting with Angular, React, Vue, Svelte, et cetera, has always been that the Googlebot would have issues indexing stuff that was strictly in a JavaScript front end, because the HTML is not there. You make the request, go get the HTML, and by the time the bot is done, the HTML isn't there, so nothing's indexed. For instance, I was at a very, very large international enterprise company and they were redoing their whole site. They initially wanted somebody to do it in Angular on the front end, and then they started looking, saw all the SEO issues, and said, no, we're going to do it on the back end. So they did their front end in Drupal, which was a nightmare in itself, PHP on the front end. But, you know, obviously that was a standard issue. And so then you started getting frameworks like Next and Nuxt and your static site generators, Eleventy, Jekyll, so on and so forth, to try to overcome those issues. And it was interesting that, you know, I think cloaking was considered an evil thing by Google back in the day, and you can obviously correct me on the history. So I'm curious: one, is Google better able to index pure JavaScript front ends? Is it still necessary to do server-side generation to get good SEO for a JavaScript-based site? And two, as I understand it, cloaking is now not considered as evil as it was. In my current job, and we had a full discussion about this in our Views on Vue podcast that came out a couple of weeks ago, we do things where we use Vue on the front end and Laravel on the back end, but we basically have a whole separate site that we warm: we load up our cache, the Nginx cache, with all our URLs, so that when Google comes around and hits those it doesn't have to try to index a Vue page. So hopefully all that makes sense. I guess what I'm looking for you to address is where Google Search stands in terms of JavaScript front ends and whether or not you still need to do server-side rendering in order to get good site indexing.
MARTIN_SPLITT: So that's a really good question and it's an interesting one to answer. I think in the last three years we have gotten a lot better, especially in May 2019 when we announced that Googlebot is now using an evergreen Chromium to actually render pages. Because beforehand, we had been stuck on Chrome 41 for a really long time, which usually led to issues: especially if you had untranspiled ES2015 or ES6 that was not fully supported in Chrome 41, you would obviously run into the situation that certain bits of JavaScript wouldn't work and we wouldn't see the content. And that was not great. And I remember before I joined Google in 2018, I ran a few experiments just for shits and giggles, because I was basically talking to the team that I'm now part of, and they're like, yeah, you know, there are a few challenges, and I wanted to see what the challenges were. And debugging was so much harder, because all you got was: you could type a URL into fetch and render, then you hit test, and then it would come back with an image. And sometimes this image would just be a blank page. And then you're like, well, why is this a blank page? This is not a blank page when I open it in the browser. And then you would have to comment out half of your code and see if it starts to reappear. Oh, it doesn't. Okay, in that case I comment out half of that half, basically bisecting your way towards figuring out what breaks Google. That was frustrating. That's now a lot easier with the new URL inspection tool, where you have the console messages. So if something goes wrong, you see the console message, and you can figure out what's going on there. That's one side of the thing. So I think we are now in a state where we are doing reasonably well when it comes to rendering pages that are JavaScript-generated or client-side rendered. And we are pretty much rendering all the pages anyway. So that's not really something that is a concern per se. That being said, frameworks used to contribute their own wealth of issues to this. When I say that, what I mean is, for instance, Angular, I think Angular 2 at the very beginning, in a beta version or something, and I only remember because I've seen it once, was generating links where, if you said, oh, I want a router link that goes somewhere, it would generate an anchor with something like a navlink attribute. So the anchor element would not have an href with an actual URL. And it would work, because it was dealing with click handlers, which is fantastic, but Googlebot doesn't click on anything. And because Googlebot doesn't click on anything, these links were basically lost for Google. So Google was not able to discover these links and thus would probably get stuck on your homepage and then move swiftly on.
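To make the link-discovery point concrete, here is a rough sketch (not Google documentation; the product object and URLs are made up) of the difference between navigation a crawler can follow and navigation it cannot:

```javascript
// Crawlable: a real <a> element with an href. The crawler can discover the URL
// without clicking anything. The click handler is optional sugar for users.
function renderCrawlableLink(product) {
  const a = document.createElement('a');
  a.href = `/products/${product.id}`;          // real URL in the markup
  a.textContent = product.name;
  a.addEventListener('click', (event) => {
    event.preventDefault();                    // client-side routing for users
    history.pushState({}, '', a.href);
  });
  return a;
}

// Not crawlable: looks like a link to users, but there is no href, and
// Googlebot does not click, so this URL is never discovered.
function renderClickOnlyLink(product) {
  const span = document.createElement('span');
  span.textContent = product.name;
  span.addEventListener('click', () => {
    history.pushState({}, '', `/products/${product.id}`);
  });
  return span;
}
```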
DAN_SHAPPIR: Which company, by the way, created Angular?
MARTIN_SPLITT: Yeah, no, that's a fun one. Yeah. You would think that we give beneficial treatment to the people on the Angular team. We don't. No one at Google gets any special treatment in terms of SEO. They all have to go through the public support channels, and if they don't care for it, then that's what happens. And in this case, I think the Angular team very quickly realized that mistake and fixed it, and we are seeing improvements there, definitely. I have also talked to the different framework communities out there. For instance, I have talked to the maintainer of the Vue Router, which got an overhaul in version 3 that solves one of the things that I have seen in the past being a problem with Vue applications: Vue Router by default uses hash routes, which is great because you don't need to configure your server for that, and that's fantastic for local development, but it really, really is bad for search engines, because fragment links are not meant to link to different pieces of content. They are meant to link into a piece of content that is already on the page. So we disregard them, and pretty much everyone else does as well. And so it's like, oh, so yeah, there's this hash slash about, or hash slash product slash one two three four, but that's just part of the page that I already just indexed. And then again, you get stuck on the homepage. So that's something that you can easily fix by just reconfiguring it to use the History API, but you need to know that. And if you don't know that, that is a bit of a problem, because then you just published your website and then it doesn't work, and then you're like, I thought Google Search works with JavaScript, and it's like, yes, it does, it's just that your JavaScript is generating content in a way that is bad. So that's a bit of a problem, but in general, that is not a concern these days. It should pretty much work out of the box with all the frameworks in current versions, as far as I'm aware. But when people are moving from a non-JavaScript website to a JavaScript-generated website, they often make other mistakes, like, as I said, blocking things in the robots.txt, or doing something really weird in terms of caching with the way their JavaScript release process works, and so on and so forth. And that way they sometimes make mistakes that they are not aware of until it is too late, kind of. And that can be frustrating. And that's why you want an SEO to help you with these kinds of migrations, because they know where to look before you go live, and they can prevent you from accidentally dropping out of search engine result pages, out of SERPs, when you do these kinds of migrations. The other thing that you mentioned is cloaking, and I want to get into that real quick. Cloaking is still a big problem and still a no-no, but it's not what you think it is, I think. Cloaking is when a website basically tells the bots, like Googlebot: I'm a website about kitties, yay, look at the cute kitties I have on my website here, woo. And then when the user clicks on them, it's actually trying to sell drugs or weapons or porn or whatever it is that
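To put the hash-route point above into code, here is a minimal sketch assuming the Vue Router 4 API (createWebHashHistory versus createWebHistory); the imported components are hypothetical:

```javascript
import { createRouter, createWebHistory, createWebHashHistory } from 'vue-router';
import Home from './Home.vue';        // hypothetical components
import Product from './Product.vue';

const routes = [
  { path: '/', component: Home },
  { path: '/product/:id', component: Product },
];

// Hash mode: URLs look like /#/product/123. No server configuration needed,
// but search engines treat the fragment as a position inside one document,
// so these routes are not discovered as separate pages.
export const hashRouter = createRouter({
  history: createWebHashHistory(),
  routes,
});

// History mode: URLs look like /product/123. The server must fall back to
// index.html for unknown paths, but crawlers get real, indexable URLs.
export const historyRouter = createRouter({
  history: createWebHistory(),
  routes,
});
```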
STEVE_EDWARDS: Yeah, definitely not what I was asking. I guess I used the wrong term.
DAN_SHAPPIR: Or dogs.
MARTIN_SPLITT: Or dogs. Yeah. I mean, it doesn't have to be nasty stuff. It can also be just a website that says, I'm about kittens, and then the user comes and it's all about dogs. Also bad. It's not cloaking if you serve a different version of the same content. So if you do that, that's what we would call dynamic rendering, and that's actually fine. A lot of people are still doing it. I would consider it a more short-term solution, because, I mean, some people are just experiencing problems with their JavaScript for all sorts of interesting reasons. If it's a really large website, you might run into crawl budget issues. As I said, we can only do so many HTTP requests at a time, and the JavaScript files and all that stuff, all the API calls, are counting towards your JavaScript, sorry, towards your crawl budget. So they might still see issues and then decide that they want to go server-side rendered. I think that's great, especially if you also serve that server-side rendered version to users, and they just get a quicker, faster experience. But if you can't or don't want to do that, and you only want to give us a static version, us as in the search engines, that's fine, as long as it's not misleading the user. As I said, if the website says it's about kittens in the user version and it's about kittens in the bot version, that's fine. There's no problem with that.
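As a rough illustration of the dynamic rendering setup Martin describes (a sketch under assumptions, not a recommended production implementation; the prerender helper and the user-agent list are hypothetical), the idea is to serve a static snapshot of the same content to known crawlers and the normal client-side app to everyone else:

```javascript
const express = require('express');
const path = require('path');
const app = express();

// Hypothetical list of crawler user agents that should get static HTML.
const BOT_PATTERN = /googlebot|bingbot|duckduckbot/i;

// Hypothetical prerender step; in practice this might be a headless browser
// or a prerendering service producing HTML with the content already in it.
async function prerender(url) {
  return `<html><body><h1>Prerendered content for ${url}</h1></body></html>`;
}

app.get('*', async (req, res) => {
  const userAgent = req.headers['user-agent'] || '';
  if (BOT_PATTERN.test(userAgent)) {
    // Crawler: static HTML describing the same content users will see.
    res.send(await prerender(req.originalUrl));
  } else {
    // Regular user: the client-side rendered app shell.
    res.sendFile(path.join(__dirname, 'dist', 'index.html'));
  }
});

app.listen(3000);
```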
DAN_SHAPPIR: Martin, we are actually nearing the length of our usual episodes.
MARTIN_SPLITT: Oh my.
DAN_SHAPPIR: Yeah. But this is such an awesome conversation. So I'm going to pose a question to you, which is whether you want to go a little longer this time, and, you know, we could theoretically even split it into two episodes, or alternatively, we can try to find another slot in your busy schedule to do a follow-up episode, because this is just so great.
MARTIN_SPLITT: We can do that today. Yeah, sure.
DAN_SHAPPIR: So we can just keep on going for now.
MARTIN_SPLITT: Mm hmm. Yeah.
DAN_SHAPPIR: Excellent. That's awesome. So back to our regular content. You mentioned the term search budget, and I recall reading or hearing somewhere that, indeed, because the Googlebot needs to crawl the entire web, and obviously the web is kind of big and that takes some time, and you've got a finite amount of computing power, there's just so much time and resources and effort you can allocate to each and every page on the web. And you kind of mentioned that some pages are heavier than others: you know, if it's just static HTML, that's obviously the lightest thing, but on the other hand, if it's, let's say, a blank page where the entire content is rendered client side using JavaScript and Ajax calls, that can obviously be much heavier. How does that factor into when the pages get indexed? Would you index them at separate times? When would they get the resources to actually be crawled? That's the point that I'm trying to ask.
MARTIN_SPLITT: Okay, so to make that a little more transparent and clearer to grasp, the term specifically is crawl budget. So it doesn't matter if it's a just-HTML website or if it's a client-side rendered, heavy website with lots of images. That does not concern us. There's no difference in treating these two different things. Where there is a difference is: is your website a blog with a hundred articles, or is your website a large newspaper with a million pages, or let's say 10 million pages, because maybe they have different sub-silos for different regions, and then they have different sections like science and celebrities and politics and whatnot, and then easily you end up with like a hundred million pages on your site. It's that that is the problem, the 100 million pages. Because for a website that has like 100 pages, that's easy, right? Making 100 HTTP connections and just downloading the 100 articles, that's not really a problem for us. Doing 10 million, that is trickier. And then comes in scheduling. And scheduling is complicated and interesting. Basically, crawl budget is made out of two separate parts. One is the crawl rate, and the other one is the crawl demand. The thing that you as a website owner can control, more or less, is the crawl rate. The crawl rate is basically, as I said, us trying not to kill your server. If I am running my blog from a Raspberry Pi on, let's say, a dial-up modem, then maybe, just maybe, 10 users are what I can handle. Normally, I get five users at the same time, so everything's easy and nice and dandy. But then along comes Googlebot and discovers 300 blog posts on that site that I serve off of my Raspberry Pi dial-up connection. If Google were to actually do 200 connections at a time, with my fantastic server setup being able to handle 10, that's a problem, because then we would bring down the website, at least temporarily. Not great. So the way that we do it is we basically just say, okay, how many can we have? I don't know. And these numbers are not set in stone, I'm just making them up as I go along as an illustration of the scenario. We might try to make 10 connections and then we see, oh, okay, the first five connections went fast, and then numbers six, seven, eight, nine, and ten, those are the ones that are already slowing down. Then we might consider that maybe we shouldn't make this many connections; maybe next time we come around, we just fetch five. So we fetch the first five pages, we look at them, we discover links to the other articles, and then we take the next batch of five, and then the next batch of five, and then the next batch of five, which you can imagine with 200 articles might take a while. Especially if you create a new one, or if you are updating an existing one, it can take quite a bit until we actually see the update. If we are coming once a day to your site, because there's just not that much change, then it might take us a few days until we discover an update to one of your pages. And that's maybe not enough. So in that case, you need to beef up your server setup a little bit so that we can make more connections at the same time. That's one thing. You can counteract that as well with the tips and tricks that I mentioned, where it's like, okay, maybe I have a bunch of URLs pointing to the same content, but I have a canonical tag, so I can avoid some of the requests being made to begin with, or I make my server quicker.
The other thing is, if we see a server response that looks like we are about to overwhelm the server, like a 500, 501, 502, all of these basically tell us, oh, we may have bitten off more than we should have bitten off of your server. So we will also remember that and basically try to make fewer connections at the same time. We'll always try to increase it a little bit in case the server has gotten better or there's just less load on the server right now. But for large websites this can really be an important factor. The other thing is crawl demand. That's something that you don't really have control over. That's just us figuring things out from looking at all the pages that we got from you. So my website, for instance, is a blog that I don't blog on as much as I used to; I think I blog like once every couple of months. So Googlebot has figured out there's just not that much going on on this website, so it's perfectly fine to just check like once a month, probably, or maybe twice a month, just to be sure. So it doesn't really go there that often. Whereas a website that is a news website probably has lots of stuff coming up all the time. And also the difference might be that my blog is probably not as popular, because it doesn't cover any current issues or high-volume issues, whereas something that covers current affairs is probably something that gets searched a lot. So we want to make sure that we have the best information. If your website is on, let's take a current example, corona vaccinations in Berlin, and has lots of information and updates this information on an hourly basis, it probably gets searched quite a lot and actually probably provides really good search results for frequently done searches, and Berlin is probably large enough to also have regional interest on top of that. Then probably we want to give your website a little more attention in terms of crawling than my website, which is a blog that updates every couple of years and is about some random niche tech topics.
STEVE_EDWARDS: So how is that, let me interrupt for a quick second, sorry, Martin. So how is that determined, in terms of: okay, this one's updated more frequently, so we need to come back more often? Is that something that the Googlebot indexes and maintains, you know, like a diff between what was there last time versus this time? I know I've seen before, for instance, in a sitemap, and this leads me to another question I'll have, that you can say, okay, this is updated daily or weekly, I believe, or something like that. So how exactly is that determination made that this is updated more frequently, so we need to come back here more often?
MARTIN_SPLITT: That happens on a...
DAN_SHAPPIR: Sorry, and to add to that, would an RSS feed impact that?
MARTIN_SPLITT: I don't think RSS does impact this. I don't think we're using RSS specifically. We do use the sitemap. That's one thing that you can use. But the sitemap only gives us signals that we can or cannot trust. It depends on how well you have been doing this in the past. And generally, with sitemaps, we see a lot of spammy sitemaps where lots of people are basically saying, this is really important and has been updated a minute ago, and we're like, yeah, sure. We do a diff. So we do look at how often the content changes and when the last update was where we saw that the content had effectively changed. Also, when you have dates on the blog post, we do read those. So we try to figure out, okay, so the author says this has been updated yesterday, but actually we have found that this hasn't been changing in the last five days; then we might disregard that. So we basically are pulling in a lot of signals, including the date that you specify. If there's structured data for an article, for instance, you can also specify a last updated date. We look at the sitemap information. We get a bunch of bits and pieces. I'm actually not 100% sure about the sitemap; we might have stopped that a while ago. I would have to clarify that. I'm not sure about sitemaps. But we look at a bunch of different things where we get a feeling like, oh, okay, so this roughly updates on an hourly basis, whereas this other page seems to be only changing its content every couple of weeks. We will every now and then increase the crawling frequency a little bit to see if it's more frequent in terms of the changes, but if we then don't find anything, then we'll ramp it down again, basically. So that's how we do that.
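For reference, the "last updated date in structured data" that Martin mentions is typically expressed as schema.org Article markup. A minimal sketch follows (the values are made up; the same JSON-LD block can just as well be embedded directly in the server-rendered HTML):

```javascript
// Build a schema.org Article object with publish and modification dates.
const articleStructuredData = {
  '@context': 'https://schema.org',
  '@type': 'Article',
  headline: 'Example article headline',            // hypothetical values
  datePublished: '2021-02-01T08:00:00+01:00',
  dateModified: '2021-02-15T09:30:00+01:00',
};

// Inject it as a JSON-LD script tag so crawlers can pick it up after rendering.
const script = document.createElement('script');
script.type = 'application/ld+json';
script.textContent = JSON.stringify(articleStructuredData);
document.head.appendChild(script);
```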
AJ_O’NEAL: So how long does it take you to crawl the Internet?
MARTIN_SPLITT: Long. I actually don't have the numbers. I don't know. That's a good question.
AJ_O’NEAL: Like, long as in it takes more than 20 minutes, or long as in it takes more than a few microseconds?
MARTIN_SPLITT: Oh, no, no, no. It takes more than 20 minutes.
AJ_O’NEAL: Does it take more?
MARTIN_SPLITT: I don't know. I actually don't know how quickly we are going through the entire thing. And then the question is, are we measuring against our index? Like, how long does it take for us to recrawl the entire index? Or are we talking about the entire web, which is a different story?
AJ_O’NEAL: I don't know what the difference is.
MARTIN_SPLITT: It's interesting. It turns out we can't actually store everything. Surprise! I know.
STEVE_EDWARDS: What?
MARTIN_SPLITT: Yeah. I'm as surprised as everyone else is in this case.
DAN_SHAPPIR: We've got the NSA for that, I think.
MARTIN_SPLITT: I guess so. Yeah, I think they probably store everything. No, so when we see a page that has very little content or has really crappy content, then we don't index it. We can choose to not index something.
AJ_O’NEAL: So do you think that you guys have a better backup of the internet or does the Bluffdale data center have a better backup of the internet?
MARTIN_SPLITT: I don't know.
AJ_O’NEAL: Okay.
MARTIN_SPLITT: I'm assuming the Bluffdale is the NSA. Or what is that? Yeah.
DAN_SHAPPIR: Okay. But going back to this whole, uh, crawl budget thing. Again, I might have understood things wrong, or maybe I've been given wrong information, but I remember distinctly being told by someone that if I have a page whose content is generated by the server, either via SSR or it's just static content, whatever, then it will get scanned faster, or sooner, I would say, than if it's just JavaScript-generated content. And even more specifically, if I've got a page that is a mixture, that has some static content which is then augmented by JavaScript on the client side, that static part would get read earlier, potentially, than the dynamic content that's added via client-side JavaScript. Is that correct?
MARTIN_SPLITT: Yeah, that is partially true. It's also a metaphor backfiring. We used to say... so, the big challenge for us...
DAN_SHAPPIR: I'm glad I amuse you.
MARTIN_SPLITT: No, it's... I'm just laughing because this keeps happening and I understand where it's coming from. And I'm just like, ah, dang, this happened right before I joined and then continued right after I joined, and I still have to deal with the fallout of it. I wouldn't have done it differently. So the problem for us, or the challenge for us really, is to explain processes that are really hard to explain. In terms of Google Search indexing, the entire pipeline basically is a bunch of microservices that talk to each other. A bunch of things happen in parallel, and a bunch of other things happen multiple times if need be, and we have to simplify. And the way that we used to simplify the way that JavaScript worked in Google Search was to say we have two waves of indexing, where we would look at the HTML version first and then index that, and then once we have rendered the JavaScript, we would update the index. That is a gross and incorrect oversimplification, even though it's not 100% incorrect, actually. Because what happens is that once we crawl, we look at the HTML. We would be stupid not to look at the HTML, right? The best thing we can do is: we have the HTML, and even if it doesn't have all the content, it might have some bits and pieces of information that we can already use and work with. One such piece of information would be if it has a canonical tag, and this canonical tag is pointing at this being a duplicate page; then we can potentially just stop our work here. Or if it contains a noindex robots meta tag, then why would we do the rest of the steps? All of these steps cost us time and resources, and we could spend them elsewhere on pages that want to be indexed. Another important thing that we do is, the moment we have downloaded the HTML from the crawl, we look at the HTML to find links, so that we can then feed the scheduler with new links to potentially crawl. Now, if a website is completely done with JavaScript, this HTML probably does not contain anything. So we can't really kickstart the process of discovering links, or rather we do the process of discovering links but won't discover any, because there are none. So you do have an advantage there if you actually have at least some bits and pieces in the static HTML, because we can scan the HTML already and potentially discover other parts of your website that we might not know about yet, and then we can potentially index them a little quicker. Unless you have submitted a sitemap; then that's pointless, because then you already gave us a list of all the links that you care about, so it doesn't really help us that much. It still does help us, because the sitemap does not have a hierarchy, and the links on pages actually do have a hierarchy. But there, it's about understanding the link graph inside your site versus discovering new content. So it's not that time-critical, really. But if I were to have a new website today and I want all the links to be discovered as quickly as possible, and I, for some reason, don't want to or can't submit a sitemap, then having the links in the initial HTML that comes over through the crawl is probably buying me a little bit of time. With indexing, as I said, pretty much all the pages get rendered, and they get indexed based on the rendered content. And the rendering delay is a couple of minutes. So after the crawl, before we can index, it takes a few minutes, usually, until we have the rendered content, and then we can carry on.
There are scenarios where, if the JavaScript fails to download, or if there are any other issues, like if you block your JavaScript in robots.txt, then we're kind of screwed. Then the rendering won't produce the content that you care for. And again, you would have had a more robust and successful experience if you had your content in the initial HTML. So server-side rendering does still have its place. But you're not getting quicker results, and you're not getting more crawl budget, just because you are not using JavaScript. But there is a point where, as I said, very large websites with millions of pages might be conscious of their crawl budget, and again, downloading the JavaScript bundle is an additional request, so you can eliminate a request there if you are keen on that. Most people won't be. Most people don't care and don't have to care. But there is no inherent slowness or tardiness because of JavaScript in the process. I know that still gets said a lot, and that usually comes from the fact that people have over-interpreted the two waves of indexing metaphor.
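To connect the canonical and noindex remarks to something concrete: those signals are most useful when they arrive with the initially crawled response, before any rendering happens. Here is a hedged sketch of one way to do that from a Node/Express server using HTTP headers (the routes and URLs are hypothetical; meta robots and link rel="canonical" tags in the served HTML accomplish the same thing):

```javascript
const express = require('express');
const app = express();

// Hypothetical internal search results page we never want indexed:
// the X-Robots-Tag header is visible to the crawler before any JavaScript runs.
app.get('/internal-search', (req, res) => {
  res.set('X-Robots-Tag', 'noindex');
  res.send('<html><body><h1>Internal search</h1></body></html>');
});

// Hypothetical printable duplicate of a product page: a Link header points
// at the canonical URL so the duplicate can be consolidated early.
app.get('/products/:id/print', (req, res) => {
  res.set('Link', `<https://example.com/products/${req.params.id}>; rel="canonical"`);
  res.send('<html><body><h1>Printable product page</h1></body></html>');
});

app.listen(3000);
```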
I remember working my tail off to become a senior developer. I read every book I could get my hands on. I went to any conference I could and watch the videos about the things that I thought I needed to learn. And eventually I got that senior developer job. And then I realized that the rest of my career looked just like where I was now. I mean, where was the rush I got from learning? What was I supposed to do to keep growing? And then I found it. I got the chance to mentor some developers. I started a podcast and helped many more developers. I did screencasts and helped even more developers. I kind of became a dev hero. And now I want to help you become one too. And if you're looking forward to something more than doing the same thing at a different job three years from now, then join the Dev Heroes Accelerator. I'll walk you through the process of building and growing a following and finding people that you can uniquely help as you build the next stage of your career. You can learn more at devheroesaccelerator.com.
DAN_SHAPPIR: So just to clarify and put this myth to rest and say it as shortly as.
MARTIN_SPLITT: Oh, I wish we could put this to rest.
DAN_SHAPPIR: I don't... Whether the site is SSR or whether the site is CSR, server-side rendered or client-side rendered, especially given that today the bot is an evergreen Chrome, it doesn't make a difference in terms of the indexing. That's essentially what you're saying: it indexes just as well either way.
MARTIN_SPLITT: Yeah, unless obviously there are technical problems, of which JavaScript invites many, but if you do it right, then there's no problem.
DAN_SHAPPIR: Oh yeah, obviously. For example, theoretically, I could be adding content when you scroll or stuff like that, and obviously Googlebot doesn't actually scroll.
MARTIN_SPLITT: Yeah, and we've seen many iterations of that. For instance, people were like, oh, why did all these things drop out of ranking for this query? And we're like, because you don't have any content for that specific thing. And they're like, we do! It's like, only if we scroll. They didn't have that problem beforehand.
STEVE_EDWARDS: Sorry. So when you are indexing, does it, this just sort of came to my mind, and I could be totally off base here, does it pay any attention to CSS attributes? So for instance, if I had a display: none on an element because I don't want it to show yet, and then once I scroll, it comes into view, you know, just as a dumb example, is that something that impacts Google at all? Or is it just looking strictly at the content of the elements?
MARTIN_SPLITT: It does have some impact, but not as much as people sometimes think. This is a really good question that I'm always very careful about when I get it, because it's easy to misinterpret what I'm saying there. We will see the content. Oh, there we go, I immediately failed: we will consider the content that is in a CSS display: none. But because you're hiding it, you're kind of telling us you don't care as much about it as the other content that is visible.
STEVE_EDWARDS: At that point in time, you're not. I mean, you could be; they need to click on some button, oh, okay, we want to see it now. So yeah, I could see that.
MARTIN_SPLITT: Fair enough, fair enough. Yeah, but that's exactly the challenge there, where it's like, okay: if we were to say there are five things on this page that are being talked about, which of these five are the main things? And then I would argue the thing that is right at the top of the page, with a huge headline over it and lots of links in it to other good things on your site or other sites, and with images and stuff, is probably more important to you, and probably also to the user, than something that is hidden behind an interaction. Even though it is there, and it's clearly important, it's just not as important as the other things. So we might come to a slightly different conclusion than the one you intended us to come to. If the page really is about wine, and red wine specifically, but you're hiding that behind a read-more button or something, and the rest of the page talks about how beautiful the landscape is, then we might not get the hint that this is a page about wine.
DAN_SHAPPIR: It'll be interesting to see how you guys handle this new CSS property that was in fact championed by Google people working on the Chrome browser, which is content-visibility. It's intended to optimize how pages render their content as you scroll down the page.
MARTIN_SPLITT: I haven't really looked into that specific property yet in terms of how we are dealing with it in rendering, but the way I understand it, we wouldn't cause issues there, because even though we don't scroll, we make interesting things happen to make sure that all the content gets into the viewport. Let me put it that way. There are a bunch of implementation details that I don't really want to go into, because (a) these can change and (b) these are really not something actionable for webmasters. But I would assume that the CSS content-visibility property is safe, because IntersectionObserver, for instance, is also safe to use, even though we don't scroll on pages.
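As a hedged sketch of the pattern Martin calls safe (revealing content via IntersectionObserver rather than scroll events; the selectors and the loader function are hypothetical):

```javascript
// Hypothetical loader that fills a section with its real content.
function loadMoreContent(section) {
  section.textContent = 'Loaded content for ' + section.dataset.name;
}

const observer = new IntersectionObserver((entries) => {
  for (const entry of entries) {
    if (entry.isIntersecting) {
      loadMoreContent(entry.target);
      observer.unobserve(entry.target);   // load once, then stop observing
    }
  }
}, { rootMargin: '200px' });              // start slightly before it is visible

document.querySelectorAll('.lazy-section').forEach((el) => observer.observe(el));

// The content-visibility property Dan mentions can also be applied from script
// (Chromium-only at the time of this episode); the content stays in the DOM,
// rendering work is just deferred until it is needed.
document.querySelectorAll('.below-fold').forEach((el) => {
  el.style.contentVisibility = 'auto';
});
```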
STEVE_EDWARDS: Well, there's a term I haven't heard in a while: webmaster. I can still remember being my own webmaster at such and such a site.
MARTIN_SPLITT: We have only recently rebranded ourselves away from Google Webmasters because no one uses that term anymore. And here I am using exactly that term.
STEVE_EDWARDS: So, okay, switching topics. I'm going to guess this is something you might be able to speak at length on: the updates that are coming in 2021 in terms of page experience and Core Web Vitals. I just wondered if you could talk about what those are, a brief description, and then maybe why Google is implementing these requirements, or what's the term I'm looking for, characteristics or things you're looking at as you're crawling.
MARTIN_SPLITT: Signals, yeah. So as Google Search, we have figured out that, surprise, surprise, one characteristic of a good search result is also that the page is reasonably fast, because you don't want to click on a search result and then wait for, I don't know, two minutes until you can actually do whatever it is you were wanting to do in the first place, be it order a pizza or read an article about something. Doesn't matter. So speed is one important factor out of many important factors. Page speed has historically already been a ranking factor for quite a while. And we figured that we should not have our own definition of page speed, because there used to be a proprietary, non-disclosed version of us determining how fast a page is for that specific ranking purpose. And the other thing is, the challenge with page speed is that it's an ever-evolving understanding of what that is, right? At the beginning of the internet, you could just basically look at time to first byte: how long does it take from my computer to the server and back? That's what made the website faster or slower. But then JavaScript came in, and then we figured, oh, maybe we need the speed index metric. I don't even know how that was calculated; I completely forgot about it relatively quickly again. Then there was first contentful paint, and then there was first meaningful paint, and all these kinds of metrics came up to determine what a fast website is, or how to measure if a website is fast or not. And that had...
DAN_SHAPPIR: Can I interrupt you for a second? I'll just mention to our listeners that a while back, in episode 428, we had an episode titled The Alphabet Soup of Performance Measurements, where we kind of went down the list of all these different metrics, if they're interested in this. So yeah, there has been a jungle of metrics. They have continued to change, which led also to frustration because of moving goalposts. So you might go off and focus on improving your website's loading performance and speed measured by some metric A, and then you come back after a month and say, like, it's faster now. And then everyone's like, no, it's not. Look, metric B says it hasn't improved. And you're like, what? I didn't even know metric B existed. So Google proposed the Core Web Vitals, which are three metrics, Largest Contentful Paint, First Input Delay, and Cumulative Layout Shift, that are basically saying: when is the website visually complete, how stable is it visually, and how quickly can I interact with it. That's the gist of Core Web Vitals.
MARTIN_SPLITT: And in Search, we have decided that this is a good opportunity for us to change the way that we use page speed. And basically, we are bundling a few ranking signals. One is HTTPS, for instance. The other one is the Core Web Vitals, instead of the proprietary page speed variation. And a bunch of other things are being bundled together under a new ranking signal that's called the page experience signal. And this ranking signal will launch probably in May. That's what's happening there.
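For readers who want to see what the three metrics are made of, here is a rough sketch using the browser's PerformanceObserver entry types (Chromium-specific; Google's web-vitals library wraps the same APIs with the edge cases handled, so treat this as illustration rather than a production measurement setup):

```javascript
// Largest Contentful Paint: when the largest content element finished painting.
new PerformanceObserver((list) => {
  const entries = list.getEntries();
  const latest = entries[entries.length - 1];
  console.log('LCP candidate (ms):', latest.startTime);
}).observe({ type: 'largest-contentful-paint', buffered: true });

// First Input Delay: gap between the first interaction and its handler running.
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log('FID (ms):', entry.processingStart - entry.startTime);
  }
}).observe({ type: 'first-input', buffered: true });

// Cumulative Layout Shift: running sum of unexpected layout shift scores.
let clsScore = 0;
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (!entry.hadRecentInput) clsScore += entry.value;
  }
  console.log('CLS so far:', clsScore);
}).observe({ type: 'layout-shift', buffered: true });
```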
DAN_SHAPPIR: In this context, I want to mention something, and feel free to correct me; in fact, that's what you're here for, to correct me. What I'm hearing from various SEOs is that some of them are really becoming hysterical about this. I mean, you know, my day job is about performance, so obviously I think that performance is super important. In a lot of ways it's actually more important in other aspects than SEO, for example in conversion rates and stuff like that, and avoiding high bounce rates. But some SEOs are just looking at the Core Web Vitals with the perception that this is going to be the most important signal: if your website is slow, you're not going to rank at all, and if you're really fast, then you'll be number one on the search results page. What do you have to say about that?
MARTIN_SPLITT: That's a really tricky one. And we are asking ourselves how to do this differently or better daily, basically. So on one hand, we want to raise the awareness for speed being an important factor in search ranking and generally being important on the web; that is to say, it's not necessarily only for SEO, it's also conversion rate and other things. On the other hand, we don't want to overstate it either, because it is one out of hundreds of ranking factors. Page speed has already been a ranking factor beforehand, and it will continue to be one. It's just that it's now a little more transparent how we are measuring it. And the question really is, why would this be a huge, huge, huge update? If you think about it, the fastest website is probably a blank page. That's going to be fantastically fast. I'm sure the largest contentful paint will happen, like, instantly; it will instantly be interactive and it will be visually very stable, but it's not serving any purpose. I don't know which query I would have to type in, maybe besides "blank page," to actually see that as a good search result. So it will not be the only factor and it will not be the deciding factor. That being said, while many websites might not see much of a change in terms of the rankings because of the page experience update, some others will, because if there are others that are as relevant and as good as their page, but they are faster now, they might change their order or their position in ranking. So for some niches, it might be a very big impact. For other niches, it might be very small. I am a little worried about SEOs being hysterical, as you say, about this, because that's clearly overstating the impact, and that's potentially burning bridges in the future. Because if you cry wolf too many times, people will stop listening to you. And that's something that I, as an SEO, wouldn't want to do right now. I would say, hey, this is a fantastic opportunity to make our website faster. But before that, I would have a very, very realistic look at what other things are there. Are there any low-hanging fruits? Maybe performance improvements are the low-hanging fruit in this situation. Maybe they are not. Maybe your website's loading speed isn't the issue, but the fact that, I don't know, your content is badly structured, or that there are no internal links that explain the site structure to Google. Or maybe you need a sitemap, or maybe you need, I don't know, better alt text for your images, or, or, or... you have to look at it holistically, as much as I hate that word. But basically, you have to look at the whole thing, and then make a reasonable decision whether speed is really going to be the thing that could potentially break your neck, or not. And also, whether that's something that you really want to push on, or maybe not. And unless I have a very competitive niche and see that all the other pages are doing a lot better in terms of Core Web Vitals than my page is, I wouldn't worry too much about it, to be honest. That doesn't mean that I wouldn't worry about it at all, because page speed is important, but it's probably not a reason for hysteria and crying and yelling and screaming.
DAN_SHAPPIR: I don't know if you can respond to this question or not, but the data that you use for this performance measurement, is it from CrUX, the CrUX database, or some other source, or what?
MARTIN_SPLITT: As far as I'm aware, right now we'll be using CrUX.
DAN_SHAPPIR: For those of you who don't know what CrUX is, I think we mentioned it as well in a previous episode. When you install Chrome, unless you opt out, Chrome actually collects anonymous information about your browsing, including performance data. This data then goes into a database in the cloud that Google has, which is called CrUX. And we actually had Rick Viscomi on our show talking about it. I'll find the episode in a minute and link to it in the show notes. So yes, that's what it is.
STEVE_EDWARDS: Yeah. That sounds like it was the real crux of the issue there.
DAN_SHAPPIR: Yeah. CrUX stands for Chrome User Experience, I think.
STEVE_EDWARDS: So, let's say, going back to my case of a new site, and this is something I actually Googled when searching: if I spin up a new site, let's say I do my tags, I've got my meta tags, my OG tags as well for Twitter and Facebook or whatever, and I've got good content, and maybe I submit a sitemap to Google, an initial sitemap. There's probably not a fixed answer, but ballpark, how long until it shows up in Google search results?
MARTIN_SPLITT: It depends. It can be from within minutes to within days. I would probably assume, if it's a completely new page, sorry, a completely new site, and you submit it to Google Search Console, it'll probably take a few days, on the order of days. Yeah.
STEVE_EDWARDS: So if I have, and this is my last question, I know we're running long. Good second episode here, right? Yeah. So if I have content in my page and I'm wanting to do some test searching, and I want to see how well my site has been indexed. To me, it seems logical that if I have some unique string of text, you know, a sentence in one of my posts, for instance, or maybe even a sentence from one of my meta description tags, and I'm searching on that and I'm not finding it, is there a particular first place to look for something like that, where you're searching and, okay, I know it's indexed, but it's not returning even a very specific, unique search string?
MARTIN_SPLITT: So it can be, so that's a tricky one.
STEVE_EDWARDS: Sure it is.
MARTIN_SPLITT: In general, it should show up for very specific pieces of text, and oftentimes it does, even though it might not be ranking for merely related queries; the moment you go away from very specific pieces of text, you might actually find that it's not ranking anymore, or not ranking very high anymore. If it doesn't rank for that, that's usually a sign that it might actually not be there. Or if it's a thing that is very unusual, we might rewrite the query and then actually not come up with the text, because it doesn't fit the very specific thing that you have just typed into the query field. The best place to check what's going on is Search Console. If it does not show up in the indexing report, the index coverage report probably tells you why it's not indexed. And if it's indexed but not ranking, then you can at least take a look at what the performance report in Search Console says: what are the queries that I'm showing up for? And then you can try with that specific query and see if you're showing up. And the alternative is to use the site: operator. So you can go site: and then your URL and maybe some string there and see if it shows up. Yeah, but it should generally show up if it's indexed and if it can rank for that specific thing.
DAN_SHAPPIR: The last topic that I wanted to bring up: I know that various browsers now support specifying a search pattern inside of a page in the URL, and I know that Google results now often link to a particular section within a page, even if that section doesn't have its own anchor tag or its own ID. So you can literally get a link to any text within the page. Do I, as the owner of the page, have any control over that? Like, you know, indicate that a certain bit of text is higher priority, or maybe indicate that a certain bit of text should not be included in such results, or stuff like that?
MARTIN_SPLITT: I don't think you can exclude specific parts of the page from search results, unless there's a way of dealing with paywall content. If you have a certain part of the page behind the paywall, there are ways of doing that, I think, with structured data. In terms of making things more eligible for these kinds of things, it's just like the information architecture, again, the structure of your content. If you use HTML structure right, then we are probably more able to understand what you care about and what the page really is centered around topic-wise. But there's no way to specifically opt out right now, as far as I'm aware, for parts of your page.
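For reference, the "link to any text within the page" feature Dan describes is the text fragment URL syntax supported by Chromium-based browsers; here is a small sketch of what such a URL looks like (the page URL and phrase are made up):

```javascript
// Everything after #:~:text= is the phrase the browser should scroll to and highlight.
const phrase = 'crawl budget is made out of two separate parts';
const url = 'https://example.com/some-article#:~:text=' + encodeURIComponent(phrase);
console.log(url);
// https://example.com/some-article#:~:text=crawl%20budget%20is%20made%20out%20of%20two%20separate%20parts
```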
DAN_SHAPPIR: I think I've finished with my own questions, and Steve said he did as well. I don't know if AJ has any additional questions. But you, Martin, do you have anything that you want to add over the stuff that we already talked about?
MARTIN_SPLITT: No, I think I'm pretty happy with the questions. It's always interesting to see what puzzles people about search. And yeah, that has been a lot of fun.
DAN_SHAPPIR: Yeah, I think we've had some really awesome content in this episode. I'm really happy with it. Like Steve said, there's a good chance that this episode will need to be split into two because it's been running so long, which is actually great; it's a great indicator that there's just so much content to talk about. Before we go into picks, what's the best way for people to contact you on the web? I know that you're also putting out a lot of content as part of your role at Google: videos, interviews like this, podcasts, and stuff like that. So what's the best way for people to find you, to contact you, to access the information that you're putting out?
MARTIN_SPLITT: So a lot of our documentation is on developers.google.com/search. That's also full of links to other things like our blog and our YouTube channel. Definitely check out our YouTube channel at youtube.com slash Google Search Central. That's also where we are posting the office hours that we do every now and then, and that's a great way of contacting us. If you have any questions, if it's a general SEO issue, you can ask John, who does the general SEO office hours. If it's a JavaScript-specific SEO issue, I do the JavaScript SEO office hours at regular intervals. And we have the Webmaster Forum. You can use the Webmaster Forum to ask questions and get answers from other people in the community. That's really, really good. We also have a few Googlers making sure to look through them every now and then as well. You can find us on Twitter. We are Google Search C, because we thought Google Search Central is a little long for Twitter names, so Google Search C on Twitter. I am also on Twitter; if you look for Martin Splitt, there's only one person. Please don't send me direct messages with questions, because I cannot provide direct private support, so everything has to happen in the public channels.
DAN_SHAPPIR: That's a surprise. So you don't provide support on demand?
MARTIN_SPLITT: No. I know. If you knew how many direct messages I get on Twitter asking exactly that.
DAN_SHAPPIR: Yeah, somehow I'm not surprised. And you probably also get some nasty direct messages when you don't provide the answers at all.
MARTIN_SPLITT: Oh, yeah, of course. Of course. Because, you know, why wouldn't I?
DAN_SHAPPIR: Yeah. Why wouldn't you?
STEVE_EDWARDS: I would assume that podcast hosts would get cut a little slack on that, right? You might slide us in a little bit if we message you.
MARTIN_SPLITT: As long as it's happening in the public channels, everything's fine. Anything else? Sorry. I'm very sorry.
This episode is sponsored by Sentry. Sentry is the thing that I put into all of my apps first thing. I figure out how to deploy them. I get them up on the web and then I run Sentry on them. And the reason why is because I need to know what's going on in my app all the time. The other thing is, is that sometimes I miss stuff. I'll run things in development, works on my machine. We've all been there, right? And then it gets up into the cloud or up on a server and stuff happens, stuff breaks. I didn't configure it right. AWS credentials, something like that, right? And so I need to get the error reporting back. But the other thing is, and this is something that my users typically don't give me information on, is I need to know if it's performing well, right? I need to know if it's slowing down because I don't want them getting lost into the Twitterverse because my app isn't fast enough. So I put Sentry in, I get all of the information about what's going right and what's going wrong and then I can go in and I can fix the issues right away. So if you have an app that's running slow, you have an app that's having errors, you have an app that you're just getting started with, go check it out, sentry.io slash four, that's F-O-R slash JavaScript, and use the code JSJABBER for three months of their base team plan.
DAN_SHAPPIR: Okay, so that'll push us to pick. Steve, let's start with you then. Did you come up with a pick for today?
STEVE_EDWARDS: Yeah, I think I did. So I'm going to go TV show again. It's one that I watched quite a bit when it was on, and it went away for a while, and now it's come back on one of the various cable channels that I have. And I don't think I've picked it before; if I have, I'm sorry. It was initially a cable-only show, back when those first started coming around, as compared to everything being initially on the networks. It's called In Plain Sight, a show that takes place in Albuquerque, New Mexico, and it's about a couple of US federal marshals who are part of the witness protection program, which is where the government puts witnesses who have testified for them against really dangerous people, and the dangerous people want to kill them, so they're trying to protect them. It was produced and written mostly by the lead actress, Mary McCormack. Just one of those shows that I really got into. I think it has seven or eight seasons and ended six or seven years ago; again, I don't have the dates. But In Plain Sight: really, really good show, really fun to watch.
DAN_SHAPPIR: Cool. AJ, how about you? Usually, you usually have a lot of excellent picks.
AJ_O’NEAL: All right, well, today is no exception. I'm going to pick the Randall Munroe book trilogy that's not really a trilogy, because they're not related: What If?, How To, and Thing Explainer. Randall Munroe is the xkcd guy, and, you know, he's just funny. He's quippy and witty, and he's got these books, which you should get in hardcover versions so that you can keep them forever and ever in your collection and keep them prominently on display for your guests, or put them in your office waiting room. And it's just absurd, absurd explanations of things, like an overly detailed explanation of buried pirate treasure and the economics of it, and whether you can jump out of a plane with a helium tank and realistically be able to fill enough balloons with helium before you hit the ground to slow your descent, which apparently you can. And, and the thing, the Thing Explainer is...
DAN_SHAPPIR: don't forget to drop the tank. Well, once you fill all the balloons.
AJ_O’NEAL: I, yeah, I don't know exactly how it works, but I think you got to have those big, big balloons
DAN_SHAPPIR: and, you know, take it off the tank.
AJ_O’NEAL: Yeah. And you got to, yeah, of course, I'm sure that's really important. Yeah. And then Thing Explainer, using only ten hundred words, explains the world's most complex topics, such as nuclear fission. So it's basically using words that a second or third grader would know. So the, uh, international space boat, for example, stuff like that. So I'm picking that. I'm also going to pick... we were talking about the NSA, and there's this nice little route between the NSA building and the Microsoft building. It's not that much different from the Oracle building or whatever big company you want to name, the Adobe building, et cetera, because all of those buildings are right there together. So, I mean, just for fun, there's the Google Maps link to see that. I am a little bit worried about big tech invading Utah, though. I mean, it's been happening for years and years; you know, Utah has always been a tech hub since the eighties or seventies or whatever it was. But I'm a little bit concerned with how things are changing around here, and it's starting to feel a little too much like San Francisco, a lot more of the things I don't like. Anyway, it's interesting just to kind of see how close those things are together, the tech hub and the NSA data center. And then, see, no, I'm not going to pick that. Okay, the last thing I'm going to pick is: there's this nice video that's a parody of a user focus group, which isn't that far off, because you have to do focus groups, right? Otherwise you get bad answers from your respondents. But I just thought of it as the perfect explanation of traditional 1970s, run-of-the-mill public key cryptography and blockchain technology. That's what came to my mind. It's a user study of cavemen, like a one-minute clip of cavemen discovering the wheel and their feedback on what they should do with it. And I just thought it was hilarious. It's like, oh, that's like what they did to cryptography to come up with the idea of the blockchain. Excellent. So those are the things that I am going to leave you with, some links to check out. That is all.
DAN_SHAPPIR: Okay. So I only have one pick for today. It's a pick about our very own Aimee Knight, who unfortunately had to drop off before this section because of work stuff, who would have thought. So one thing that I really like that Wix Engineering, the company that I work at, does is that we organize a lot of technical meetups and share a lot of useful content, and usually the videos for these meetups go online and you can find them on YouTube on the Wix Engineering Tech Talks channel. And recently we had a meetup where our very own Aimee Knight spoke. She spoke about the technicalities of CSS, you know, how CSS actually works inside the browser, how it combines with the HTML, the DOM, to actually form the visual representation of the page, how to debug CSS, and so on and so forth. A really excellent talk, obviously, because it's Aimee. So I'm going to link to that, and that will be my pick for today. So with that, we'll go over to you, Martin. Do you have any picks for us?
MARTIN_SPLITT: Yes, I actually do. I usually fall into all sorts of weird rabbit holes on the internet, and I come up with random...
DAN_SHAPPIR: And how do you find them?
MARTIN_SPLITT: Yeah, it's wild. It's actually not Google, interestingly enough. So I'll pick an interesting article about the curious tale of Tegel's Boeing 707, as in Tegel, Berlin's old airport. They recapitulate the history of a Boeing 707 that was standing around in one of the more remote areas of Tegel Airport in Berlin, and how it got there and what happened there, and a bunch of backstory about plane hijacking and stuff. So yeah, it's quite a ride, but it's quite an interesting one. Then I'm also picking a fantastic article that covers a thing that happened in Belgium. Apparently a bunch of mutant crayfish, as in genetically manipulated female crayfish, escaped a facility in Belgium and are now proliferating and living a happy life in a Belgian cemetery, which I think is such a bizarre story. But I love that it's true and it's actually happening, and I'm like, oh my God, this is fantastic. I, for one, welcome our crayfish overlords. And last but not least, because I feel compelled to share something useful as well: I think that technical writing is tricky and important to get right, and it's a useful skill, and I haven't really seen much in terms of how to educate yourself about it. I work with fantastic tech writers; Lizzie Harvey on our team is an amazing tech writer. But I wanted to brush up my skills a little bit, and I found, or actually was pointed to it by Marie Schweitzer, I think, an article from the Duke University Graduate School on scientific writing. It explains why it's important, how you can get better at it, what makes communication effective, and basically walks you through a bunch of lessons, I think it's three lessons, on how to write better.
DAN_SHAPPIR: Excellent. So Martin, I want to thank you very, very much for coming on our show. It's a lot of amazing content that you've provided us with. I think this was really excellent, if I may say so myself. And with that, we conclude another episode of JavaScript Jabber. So thank you to our listeners. Bye-bye.
STEVE_EDWARDS: Adios.
MARTIN_SPLITT: Thanks. Thanks for having me. Bye-bye.
AJ_O’NEAL: Adios.
Bandwidth for this segment is provided by CacheFly, the world's fastest CDN. Deliver your content fast with CacheFly. Visit C-A-C-H-E-F-L-Y dot com to learn more.