JavaScript Links and Everything Related

By Maxim Kovkrak, CEO at ADINDEX, a digital marketing agency in Ukraine.

I was prompted to write this article after reading this post by Rand Fishkin, in which the author claims that the old methods, rel="nofollow" and JS, are useless for PageRank (PR) sculpting. There were many discussions and open questions in the comments, which means the issue is still pressing for many SEO marketers.

In this post, I cover the nuances, risks and downsides of using JS to manage link juice and crawl budget. I examine the challenges related to Google's handling of JS, crawl budget optimization, and the relevance of PageRank sculpting efforts.

1. Let's get straight to the point: does Google recognize JavaScript code?

The answer is ‘Yes’.


Google officially announced it on their blog back in 2015. All the top experts in the field have been talking about it. Multiple tests have confirmed this as well. For example, the experts from Merkle | RKG conducted a series of meticulous tests for JavaScript links, and published the results in their article on Search Engine Land.

The tests have shown the following:

1. JavaScript redirects

The Merkle | RKG experts chose window.location as the method for testing JavaScript redirects. Two tests were conducted: test A used an absolute URL, while test B used a relative URL.

Result:

Google easily followed the redirects, interpreting them as 301s. In the index, the URLs the redirects came from were replaced with the destination URLs.
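For reference, the tested window.location redirects were along these lines (the destination URLs here are hypothetical):

// Test A: JavaScript redirect to an absolute URL
window.location = "https://example.com/new-page/";

// Test B: JavaScript redirect to a relative URL
window.location = "/new-page/";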


2. JavaScript links

Standard JavaScript links were tested. These are the links most commonly used by SEO specialists. The links in the test were coded using the following approaches, illustrated in the sketch below:

  • Functions outside the href attribute-value pair (AVP), but within the a tag (onClick);
  • Functions inside the href AVP ("javascript:window.location");
  • Functions outside the a tag, but called from within the href AVP ("javascript:openlink()");
  • and more.
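For illustration, the variants from the list above look roughly like this (the target paths are hypothetical):

<!-- onClick handler outside the href AVP, but within the a tag -->
<a href="#" onclick="window.location='/category/'; return false;">Category</a>

<!-- Function inside the href AVP -->
<a href="javascript:window.location='/category/'">Category</a>

<!-- Function defined outside the a tag, called from within the href AVP -->
<script>
  function openlink(url) { window.location = url; }
</script>
<a href="javascript:openlink('/category/')">Category</a>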

Result:

All the links were fully crawled and followed.

3. Dynamically inserted content

The dynamically inserted content was checked in two situations (a sketch follows this list):

  • The ability of the search engine to index dynamic text when the text is in the HTML source code of the page. Something like a "Read all" button that shows the rest of the content when you click it.
  • The ability of the search engine to account for the dynamically inserted text, if the text is located in an external JavaScript file.
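A rough sketch of the two situations (the text and the file name are hypothetical):

<!-- Situation 1: the full text is already in the HTML source, hidden behind a "Read all" button -->
<div id="full-text" style="display:none">The hidden part of the article text.</div>
<button onclick="document.getElementById('full-text').style.display='block'">Read all</button>

<!-- Situation 2: the text lives in an external JavaScript file and is injected on load -->
<div id="dynamic-text"></div>
<script src="/js/content.js"></script>
<!-- where content.js contains:
     document.getElementById('dynamic-text').textContent = 'Text inserted from an external file.'; -->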

Result:

In both cases, the text was crawled, indexed and influenced the page's ranking.

4. Dynamically inserted metadata and page elements

The Merkle | RKG experts dynamically inserted into the DOM various tags that are key for SEO professionals (a sketch follows this list):

  • Title elements
  • Meta descriptions
  • Robots meta tags
  • Canonical tags
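A minimal sketch of such dynamic insertion (all values are hypothetical):

// Title element
document.title = "Dynamically set page title";

// Meta description
var description = document.createElement("meta");
description.name = "description";
description.content = "Dynamically inserted meta description.";
document.head.appendChild(description);

// Robots meta tag
var robots = document.createElement("meta");
robots.name = "robots";
robots.content = "noindex, nofollow";
document.head.appendChild(robots);

// Canonical tag
var canonical = document.createElement("link");
canonical.rel = "canonical";
canonical.href = "https://example.com/canonical-page/";
document.head.appendChild(canonical);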

Result:

In all cases, the tags were crawled and handled exactly the same way as HTML elements in the source code.

In short: Google has long been able to recognize JS code, both simple and complex. Not only does it execute various types of JavaScript events, but it also indexes dynamically generated content by reading the DOM.

However, if you block the bot from accessing the *.js file that stores the executable code and the hidden content encoded with Base64, you'll see a completely different situation. The URL won't appear in the HTML code at all; instead, there will be something like this:

<span hashstring="0lrrg9c0ljrgdc70l7qvdcw" hashtype="content"> &nbsp; </span>

In this case, browsers will be able to execute the code, but Google won’t.

Yes, it’s a working method — the technology is called SEOhide. Still, you do understand the difference between ‘handling’ and ‘accessing’, don’t you?
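To make the mechanics clearer, here is a minimal sketch of how such a script might work. The file name, the hash-to-URL mapping and the decoded link are all hypothetical; real SEOhide implementations differ.

// seohide.js (hypothetical file name), blocked from crawlers in robots.txt:
//   User-agent: *
//   Disallow: /js/seohide.js

// Hypothetical mapping from the hash in the markup to a Base64-encoded URL.
var linkMap = { "0lrrg9c0ljrgdc70l7qvdcw": "L2NhcnQv" }; // atob("L2NhcnQv") === "/cart/"

// Replace each placeholder span with a real link that the browser can follow.
document.querySelectorAll('span[hashtype="content"]').forEach(function (span) {
  var encoded = linkMap[span.getAttribute('hashstring')];
  if (!encoded) return;
  var a = document.createElement('a');
  a.href = atob(encoded);           // decode the Base64 URL
  a.textContent = 'Shopping cart';  // hypothetical anchor text
  span.replaceWith(a);
});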

2. Hiding links

Why would you need to hide links from search engines?

  1. To manage PR (link juice, in simple words).
  2. To save your crawl budget.

2.1. Managing link juice

The story begins with the launch of the Google search engine and its PageRank algorithm. The idea was that the more pages link to a particular page, the more 'important' that page is considered to be. The importance of the referring pages was also taken into account. Simply put, the innovation was to consider both internal and external link weight.

Since that time, the PageRank algorithm has been updated multiple times. It has changed and improved in many aspects (for example, now it takes into account the relevance of the link and its position on the page), but the basics remain the same.

Each page on the website has a certain value. This value is passed, with some amount of loss, to other pages through the links located on the page. The amount of value is determined by how many pages link to this page (this includes pages both on and outside of the website), and also by the level of authority these pages have.
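For reference, the original published PageRank formula looked like this (Google's current algorithm is far more complex):

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) )

Here T1…Tn are the pages linking to page A, C(Ti) is the number of outgoing links on page Ti, and d is a damping factor, usually set around 0.85. The division by C(Ti) is exactly the 'loss' mentioned above: the more links a page contains, the less value each individual link passes.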

PR sculpting means preventing link juice from flowing to useless (or non-priority) website pages and redirecting the link juice to important pages.

Useless pages are pages that don’t generate traffic: shopping cart, contacts, "About us", user agreement pages and more. Usually links to such pages are placed in the header and the footer of the website, which means they tend to collect the link juice from absolutely all site pages.

Important pages are the priority pages that are being promoted.

1) Preventing the flow of link juice

Until 2009, sculpting link juice meant using the rel="nofollow" attribute. The nofollow value was set at the link level or at the page level, preventing the search engine from crawling the link and passing PageRank.
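Both levels look like this (the cart URL is hypothetical):

<!-- Link level: this particular link is marked as nofollow -->
<a href="/cart/" rel="nofollow">Shopping cart</a>

<!-- Page level: every link on the page is treated as nofollow -->
<meta name="robots" content="nofollow">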

Initially, Google introduced this attribute to fight spam. The idea was to give webmasters a tool against links dropped by SEOs to promote their own projects. Thanks to the nofollow attribute, such blunt link building lost its point.

However, when the Google team realized that nofollow was being used simply to redistribute PR within a website, they updated the mechanism behind the tag. Matt Cutts presented the new logic of the attribute at the SMX conference. Later, he published "PageRank sculpting", an article on his own blog, explaining the logic in more detail.

Now all the link juice that flows through nofollow links simply disappears: it doesn't stay on the original page and isn't passed to the acceptor page.

(Image: the PR algorithm)

So the rel="nofollow" attribute won't help you conserve link juice. On the contrary, keep in mind that through such links your website's PR simply leaks away.

As we already know, JS won't hide links from Google, but the SEOhide method will do the job. There is, however, one drawback worth mentioning that calls the use of this technology into question: possible penalties from Google.

In July 2015, Google issued a Search Console warning to webmasters, "Googlebot Cannot Access CSS & JS on…", explaining that this problem would harm SERP rankings if not resolved.

(Image: the "Googlebot Cannot Access CSS & JS" warning)

That means that, if you block the search engine from accessing CSS and JS-files, there is a possibility that your website will lose its rank in the SERP.

This is also considered to affect rankings in mobile search results: without access to the CSS and JS files, the bot can't evaluate the responsiveness of the website, so the site will fail the Mobile Friendly test. In that case, a site using SEOhide risks failing the test and losing mobile rankings as a result.

2) Passing the link juice to the important website pages (interlinking)

To direct the link juice to more important pages, SEO specialists use a number of different interlinking methods. For example, they place links in the site menu, on the filter panel, or in separate interlinking blocks. The logic is to send the accumulated link juice to the priority categories that are being promoted at the moment.

Recently, people have been talking more and more about the weak effect of interlinking. The question often comes up: is it worth sculpting link juice, is it worth spending time on this at all? In particular, Rand Fishkin addresses this in his article 'Should SEOs Care About Internal Links?'. He claims that the classical sculpting format discussed above is rarely effective, and when there is some effect, it's quite small:

“When PageRank was the dominant algorithm inside of Google's ranking system, yeah, it was the case that PageRank sculpting could have some real effect. These days, that is dramatically reduced. It's not entirely gone because of some of these other principles that we've talked about, just having lots of links on a page for no particularly good reason is generally bad and can have harmful effects and having few carefully chosen ones has good effects. But most of the time, internal linking, optimizing internal linking beyond a certain point is not very valuable, not a great value add.”

Rand isn't talking about the impact of external links, since those still work. It's all about changes in the effectiveness of the PR algorithm, and in particular in the impact of one of its components, the internal link weight. External links carry more value for ranking and have more influence on the website's position in the search results, while internal links contribute little to nothing.

Rand's statements were confirmed by Dmitry Shakhov's tests, which he outlined at the online conference "WebPromoExperts SEO Day" in 2018, in his session "Internal Linking: Myths and Reality." Dmitry analyzed commercial websites that had long held top positions but hadn't been actively worked on for quite some time.

In his tests, Dmitry checked the following:

  • Hornet Sting (interlinking by levels)
  • Linking from relevant pages
  • Linking from random pages
  • Impact of the number of links

During the experiments, the targeted, sting-like linking didn't yield any significant effect. The results were so insignificant that they couldn't overcome the influence of external text and link factors. The effect was close to zero.

The only working option was to push a page up using a very large number of links. The trust level of the host itself plays an important role here: only websites that already rank well in search see any impact.

Conclusion: again, internal interlinking has a very small effect, and it is achieved only with a large number of links and only on trusted websites.

But are things with the interlinking really so straightforward?

The answer is ‘no’.


There was, however, one episode suggesting that other factors affect how much value a link passes.

When asked, "If there are 3 links on a page, 2 dofollow and 1 nofollow, how much PR will be passed through each dofollow link: 1/3 or 1/2?", Andrei Lipattsev, Google Search Quality Senior Strategist at the time, answered that it's neither a half nor a third.

(Image: Andrei Lipattsev's answer to the "3 links on the page" question)

What factors could these be? We can't tell exactly, but most likely it's about the usefulness and relevance of the link. We all know that the value of a backlink depends on where the link sits on the page, whether users click through it, and whether it is relevant to the content; the same is true for internal links and their effectiveness.

So it turns out, as Rand said, that in some cases internal interlinking has a small effect, and in some cases it doesn’t. Most likely, it all depends on how the interlinking was implemented.

Managing the link juice: conclusion

A word of caution here: the conclusion below is the author's personal opinion, and it may not coincide with yours. That's OK, it's SEO and it always works like this. If you have a different opinion, please share your thoughts in the comments.

Since the very beginning, the PageRank algorithm has changed significantly and become more complex. One of its components, the internal link juice, now has lower impact than before.

The updated algorithm requires a new approach to sculpting link juice.

Should you hide links?

Since the influence of internal link value is insignificant, why use the sophisticated and costly SEOhide method to plug the holes your link juice is leaking through? Most likely, this will turn out to be a waste of money, given that the algorithm doesn't work by the "divide equally among all links on the page" rule.

We don't recommend using SEOhide or similar methods to preserve PR. Above all, remember that the nofollow attribute simply drains link juice from your website.

What can you do with the interlinking?

Interlinking blocks placed at the bottom of the webpage are ineffective. Try to make all interlinking elements part of the user-facing functionality, so that they are useful and bring additional value. This way, users' clicks on the link and the link's usefulness will serve as an additional ranking signal for the acceptor page.

Classical interlinking can be used to maximize the indexing.

2.2. Saving the crawl budget with JS

The logic is that if there is no link to a page, the bot won't waste time crawling it.

Webmasters often tend to use JS, for example, in order to hide links on the filter panel.

(Image: links hidden on the filter panel)

There are often a lot of these links, and most of them are closed from indexing. The idea is that, with the unnecessary links 'removed' from the page, the bot would only have to go through pagination pages, product pages, subcategory links, and so on.
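A typical attempt to 'hide' such a filter link looks roughly like this (the filter URL is hypothetical):

<!-- A normal, crawlable filter link: -->
<a href="/category/?color=red">Red</a>

<!-- The same link replaced with a JS-driven element: -->
<span class="filter-option" onclick="window.location='/category/?color=red'">Red</span>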

As we already know, Google handles JS easily. Using JS to hide links only makes things worse and increases the crawl budget spend.

Why increase? The answer to this question is given in full detail in 'JavaScript and SEO: The Difference Between Crawling and Indexing', an article by Barry Adams.

The point is that there are two fellows at Google working closely together: the Googlebot crawler and the Caffeine indexer.

(Image: the Googlebot crawler and the Caffeine indexer)

Googlebot's task is to go through the pages of the website, find all the URLs and scan them. It has a parsing module that checks the HTML source code and retrieves any links it can find. The parser doesn't render the page; it simply parses the source code and extracts any URLs found in <a href="..."> fragments.
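In practice the difference looks like this (the pagination URL is hypothetical):

<!-- The parser extracts this URL straight from the source, no rendering needed -->
<a href="/category/page-2/">Next page</a>

<!-- This URL only appears after JavaScript runs, so it has to wait for the indexer -->
<span onclick="window.location='/category/page-2/'">Next page</span>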

When Googlebot finds new or modified URLs, it sends them to the other guy.

Caffeine is the guy who sits calmly and tries to identify the URLs it receives from the crawler, by analyzing their content and relevance. Caffeine is responsible for rendering web pages and executing the JS code.

Google's developer documentation explains in detail how its web rendering service (WRS) works.

It's the WRS inside Caffeine that executes JavaScript. The 'Fetch and Render' feature in Search Console lets you see your page the way Google's WRS sees it.

Now that we know a complicated interaction between two systems, crawling and indexing, lies behind the 'site scan' mechanism, what's next?

On a website where a large number of links aren't part of the original HTML source code, all the crawler can find on the first pass is a limited set of URLs. It then has to wait for the indexer to process these pages and extract the new URLs, which the crawler will then scan and send back to the indexer. And so it goes, round after round.

It's quite possible that Google will spend a lot of time scanning and rendering unnecessary pages and very little time processing the important ones.

And what about the SEOhide method we've already mentioned?

Yes, it will work, but there’s a much simpler way provided and recommended by Google.

Google's official documentation states:

(Image: Google's guidance on crawl prioritization)

Simply put, if you want to save your crawl budget for more important pages, you need to add the nofollow attribute to all secondary links in order to manage the scanning priority.

There is one downside to this option: it creates a conflict between PR sculpting and saving crawl budget. If you still want to manage link juice, remember that PR gets wasted through such links.

It’s up to you what’s more important, just adapt to the situation. If you face problems with website indexing, use nofollow to prevent the robot from accessing these "useless pages". If you don’t want to lose PR, it's better to just leave the links open.

In our example with links to filter pages that aren’t needed in the SERP, there are two more methods you can use.

1) Block these pages in the robots.txt file

If a page is excluded only with the meta name="robots" content="noindex, follow" tag, this won't help you save crawl budget. This tag is used by the indexer, not the crawler, which means the page will still be scanned and processed, just not added to the index.

Use both methods together, so that robots.txt also blocks the crawler from accessing the page.
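A combined setup might look like this (the filter parameter is hypothetical):

# robots.txt: keeps the crawler itself away from the filter URLs
User-agent: *
Disallow: /*?color=

<!-- On the page itself: an instruction for the indexer, not the crawler -->
<meta name="robots" content="noindex, follow">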

2) In Search Console, block pages with certain URL parameters from being crawled

Conclusions

  • Google can handle JavaScript. The only reliable way to hide content (SEOhide) is to encode it, move the executable code into a separate file, and block access to that file in robots.txt.
  • Using JS to hide links only aggravates the situation: first, the links will still be found; second, you'll spend even more of your crawl budget.
  • Using nofollow to manage link juice has the opposite effect: all the link juice flowing through such links simply disappears.
  • SEOhide is a costly method with a questionable impact.
  • The amount of PageRank passed through a link depends not only on the number of links on the page and the nofollow parameter; it's also influenced by the link's placement, usage, and "weight".
  • To bring results, interlinking meant to pass PR should be made part of the website's functionality. Interlinking blocks at the bottom of the page should only be used to maximize indexing.
  • SEOhide does help save crawl budget, but there are simpler methods.
  • You can save your crawl budget without hiding links. The most effective way is to use the nofollow attribute at the link level. For pages you don't want indexed at all, there are two more options: exclude them either in robots.txt or in the Search Console settings. The disadvantage of both approaches is that you'd lose PageRank.

Website indexing is the interaction between two systems, a crawler and an indexer, each of which follows its own rules. The crawler obeys robots.txt files, the nofollow link parameter, Search Console settings, and more. The indexer obeys the meta name="robots" content="…" tag.
