
Abnormal indexing loss: a case study

Gianluca Gabella | Guides and Documentation

In this blog post I want to share a first-hand experience we had a few months ago with a newly built site which, after a small change on the hosting side, inexplicably began to lose indexing. I will explain how we identified and solved the problem.

Foreword

The site in question (www.progettonpm.it) is a reworking (both structural and graphic) of the original site we had built in 2012 with the graphic support of Cristiano Capelli, at the time on Joomla 2.5.
The 'old' site was based on a hand-built theme designed to meet all the design requirements, and relied on the standard com_content articles component for its content, even for the most interactive part, namely the job offers section.

After eight years it was rightly time to update everything: both the platform and, of course, the graphics. For the latter we once again worked with Cristiano Capelli; for the former it was decided to rebuild everything from scratch on Joomla 3.9, to avoid the problems that typically come with a migration from 2.5 to 3.x.
Again, the template was built by hand with YOOtheme Pro to meet all modern accessibility and responsiveness requirements.

Over those eight years the site had positioned itself very well, both on generic searches (a design studio in Bologna and across Italy) and on specific keywords. It was therefore imperative, in the transition to the new site, not to lose this positioning and, if possible, to improve on it.

Moving from Joomla 2.5 to 3.x is not that complicated, but problems can arise if the server's PHP version or MySQL version is too old to support the new release. More specifically, the recommended requirements for Joomla 3.x are PHP 7.3+ and MySQL 5.5.3+, versions that were not selectable on the server where the old site was running. We therefore asked the hosting company to move us to a newer, updated server/database setup, which happened quickly and without any particular problems.

But from this moment on, the problems started.

The new site: the first errors

Once the switch to the new server was made, we were finally ready to go live with the new site. We transferred the files and database from the test environment to production without any problems. Everything had gone well: the site worked, there were no slowdowns and all features ran smoothly. But after a few days the first warning signs appeared in Google Search Console. For those unfamiliar with it, Search Console is a free Google tool for monitoring a site, both in terms of functionality and of indexing, and among other things it lets you submit the sitemap.

In the Coverage section, a batch of 5xx errors (i.e. server errors) showed up, together with a considerable drop in impressions. Cross-checking with the client's feedback confirmed that, since the new site had gone online, some positioning had indeed been lost, both generic and on many keywords:

[Image: Search Console impressions]
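To get a feel for those 5xx errors outside Search Console, a quick cross-check is to fetch every URL listed in the sitemap and log the status codes. Below is a minimal sketch of this idea in Python; the sitemap location is an assumption to adapt to your own site.

    # Sketch: fetch every URL listed in a sitemap and flag server errors.
    # The sitemap location is an assumption; adjust it to your own site.
    import urllib.request
    import urllib.error
    import xml.etree.ElementTree as ET

    SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder
    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

    with urllib.request.urlopen(SITEMAP_URL) as resp:
        tree = ET.fromstring(resp.read())

    for loc in tree.findall(".//sm:loc", NS):
        url = loc.text.strip()
        try:
            with urllib.request.urlopen(url) as page:
                status = page.status
        except urllib.error.HTTPError as err:
            status = err.code  # 4xx/5xx responses raise HTTPError
        if status >= 500:
            print(f"SERVER ERROR {status}: {url}")
        else:
            print(f"OK {status}: {url}")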

The first thing that occurred to us was a coverage problem: some pages were not being indexed, either because of sitemap issues or because individual pages could not be reached (for example due to a badly configured .htaccess or robots.txt). From Search Console, however, we could see that coverage was there for all pages, with no errors whatsoever:

[Image: coverage of individual pages]
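As a side note, a robots.txt block of this kind can also be ruled out programmatically rather than only through Search Console. A minimal sketch using Python's standard library; the URLs are placeholders.

    # Sketch: check whether Googlebot is allowed to fetch given pages
    # according to robots.txt. The URLs are placeholders.
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser("https://www.example.com/robots.txt")
    rp.read()

    for url in ["https://www.example.com/", "https://www.example.com/some-page"]:
        allowed = rp.can_fetch("Googlebot", url)
        print(f"{'ALLOWED' if allowed else 'BLOCKED'}: {url}")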

The first tests

Since all the pages were correctly indexed by Google (or so it seemed from Search Console), we immediately suspected a structural problem with the site's pages: for example unreadable text, or obvious problems in how the content had been entered (H1/H2 structure, page titles and so on) that had caused the drop in impressions. We knew this could not be the only problem, because poorly entered text cannot cause 5xx errors, but the check was still worth doing.
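These basics can also be spot-checked by hand before reaching for a full SEO suite. The sketch below pulls the title, meta description and H1 count out of a page using only the standard library; the URL is a placeholder, and real tools of course check far more than this.

    # Sketch: quick on-page check of title, meta description and H1 count.
    import urllib.request
    from html.parser import HTMLParser

    class OnPageCheck(HTMLParser):
        def __init__(self):
            super().__init__()
            self.title = ""
            self.meta_description = ""
            self.h1_count = 0
            self._in_title = False

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "title":
                self._in_title = True
            elif tag == "h1":
                self.h1_count += 1
            elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
                self.meta_description = attrs.get("content") or ""

        def handle_endtag(self, tag):
            if tag == "title":
                self._in_title = False

        def handle_data(self, data):
            if self._in_title:
                self.title += data

    with urllib.request.urlopen("https://www.example.com/") as resp:  # placeholder URL
        html = resp.read().decode("utf-8", errors="replace")

    check = OnPageCheck()
    check.feed(html)
    print("Title:", check.title.strip())
    print("Meta description:", check.meta_description.strip())
    print("H1 tags:", check.h1_count)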

Using advanced online SEO check tools (SEMrush, SEOZoom and others) we saw that the internal structure of the pages was in fact almost perfect. All the tags were correct, as were the structured data, meta tags and Open Graph tags. Even 'less important' details had been implemented and were clearly recognisable. The score awarded by these tools was indeed very high:

[Image: SEO check score]

As can be seen from the image above, the problem was not the basic structure but rather the speed of the site/server (a very low score, especially on 'mobile speed'). The slowness of a site can have many causes. With a CMS such as Joomla or WordPress, the main suspects are the third-party plugins added to extend its functionality: if they are poorly developed, they can slow a page's loading down by several seconds. However, we were confident on this front: to keep the site as light as possible we had almost exclusively used Joomla core components, which are well written and extremely fast to load.

Testing everything with Google Lighthouse (and its online counterpart, PageSpeed Insights), we saw that the slowness was not due to the site itself but to the server, which had very high initial response times.

[Image: server response times]

The response times shown above are actually on the lenient side. In many other tests we ran, the 'initial server response time' reached 5 to 6 seconds in the worst cases: a decidedly disastrous result if you are trying to rank a site well (Google loves speed; if the site is slow you can have the best content in the world, but Google will still tend to cut you off).
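The 'initial server response time' that Lighthouse measures (essentially the time to first byte) can also be approximated with a few lines of code, which is handy for logging it over a whole day and catching the worst spikes. A rough sketch, less precise than Lighthouse since it also includes DNS and TLS setup:

    # Sketch: rough time-to-first-byte measurement for a URL.
    # It includes DNS + TLS setup, so treat it as an upper bound.
    import time
    import http.client
    from urllib.parse import urlparse

    def ttfb(url):
        parts = urlparse(url)
        conn = http.client.HTTPSConnection(parts.netloc, timeout=30)
        start = time.perf_counter()
        conn.request("GET", parts.path or "/")
        resp = conn.getresponse()
        resp.read(1)  # wait for the first byte of the body
        elapsed = time.perf_counter() - start
        conn.close()
        return resp.status, elapsed

    status, seconds = ttfb("https://www.progettonpm.it/")
    print(f"HTTP {status} - first byte after {seconds:.2f}s")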

Let's start connecting the dots

Summarising what we had discovered so far:

  • The site itself was light and well structured from an SEO point of view;
  • The server had high response times, in some cases very high, far too high even for a medium/low-end shared server;
  • Search Console reported no problems with page coverage: the pages were in the index;
  • At the same time, however, Search Console was reporting various 5xx errors, which could obviously affect indexing.

Everything pointed to a server error that, for some reason, made the pages visible to us mere mortals but not to Google. Problems with .htaccess or robots.txt had already been ruled out, so what could be causing such a glitch?

There are many ways to test how Google 'reads' a page; one of the most widely used is the structured data test. Structured data is meta-information embedded in a page that lets Google catalogue its content more accurately and present it to users in richer formats than the classic search results. Structured data is used, for instance, on recipe pages to mark up cooking time, ingredients or rating (the classic 5 stars), allowing Google to display search results in a visually nicer, easier-to-read way and making site-to-site comparisons more immediate.
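On the page itself this information usually travels as a JSON-LD block inside a <script type="application/ld+json"> tag. Below is a minimal, invented sketch of what a recipe's markup might contain, using property names from the schema.org Recipe type; the values are made up.

    # Sketch: a minimal, made-up schema.org Recipe block, rendered as the
    # JSON-LD <script> tag a recipe page would embed for Google to read.
    import json

    recipe = {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": "Tagliatelle al ragu",
        "cookTime": "PT45M",  # ISO 8601 duration: 45 minutes
        "recipeIngredient": ["tagliatelle", "ragu", "Parmigiano Reggiano"],
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": "4.8",  # the classic 5-star rating
            "reviewCount": "213",
        },
    }

    markup = json.dumps(recipe, ensure_ascii=False, indent=2)
    print(f'<script type="application/ld+json">\n{markup}\n</script>')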

This is an example of how the structured data of a recipe is displayed by Google in the search results:

[Image: recipe structured data as shown in Google search results]

The tool for testing structured data is free of charge and is made available to the public by Google itself. You can find it here.

The result left us stunned:

[Image: structured data test result]

Not only was Google not reading the page's structured data correctly: it was seeing a completely different page. In fact, it wasn't seeing a web page at all!

As you can read in the results above, instead of the homepage Google was being served a document with an application/pdf content type, full of unreadable, seemingly encrypted data. So the problem was not that the pages were not being indexed (they were, as the Search Console coverage showed) but that they were being indexed with the wrong content, bogus data coming from who knows where.
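This kind of behaviour, where the server shows one thing to visitors and another to crawlers, is easy to check for once you suspect it: request the same URL with a normal browser User-Agent and with Googlebot's, and compare what comes back. A minimal sketch follows; note that if the cloaking is keyed on the client IP rather than the User-Agent, this check will come back clean even though the problem is there.

    # Sketch: compare what the server returns to a browser vs. to Googlebot.
    # Cloaking keyed on User-Agent shows up as a different status/Content-Type;
    # cloaking keyed on the client IP will NOT be caught by this check.
    import urllib.request
    import urllib.error

    URL = "https://www.progettonpm.it/"
    USER_AGENTS = {
        "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    }

    for name, ua in USER_AGENTS.items():
        req = urllib.request.Request(URL, headers={"User-Agent": ua})
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                status = resp.status
                ctype = resp.headers.get("Content-Type")
                first_bytes = resp.read(60)
        except urllib.error.HTTPError as err:
            status = err.code  # 4xx/5xx responses land here
            ctype = err.headers.get("Content-Type")
            first_bytes = err.read(60)
        print(f"{name:10} HTTP {status}  Content-Type: {ctype}")
        print(f"{'':10} first bytes: {first_bytes!r}")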

At this point we did the most natural thing in the world: we Googled the domain to see what came up. Here is the result:

[Image: Google results for the domain]

The domain was indexed with a non-Italian title and description which, when clicked, led to a foreign website with malicious content (phishing).

Don't panic

The very first thing we thought was 'our site has been hacked!': an attacker had probably managed to get into the website and tamper with it. It is not uncommon for a CMS to be hacked, especially if it hasn't been updated for a long time or uses a lot of poorly written or outdated plugins, but the new site was running the very latest version, used no third-party plugins and, above all, had only been online for a few days: it was highly unlikely that it had been compromised in such a short time.

To verify this, we copied the current site (the one now on www.progettonpm.it) and moved it (same files, same database) to a temporary hosting account. We then re-ran the structured data test: if a hacker had changed the code, the application/pdf problem would obviously have shown up on the moved copy as well.

Instead, the test was successful:

[Image: structured data on the test server]

The result, as you can see, was the structure of a normal HTML page, with all the tags in place (and all the structured data correctly configured). Fortunately the problem was not in the site's code, which left only two possible culprits:

  • The server hosting the website had been compromised;
  • The DNS records were wrong or hijacked, redirecting traffic to malicious sites (a quick check for this is sketched below).
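The DNS hypothesis, at least, is quick to check: resolve the domain and compare the answer with the IP address the hosting provider declared. A minimal sketch using only the standard library; the expected IP is a placeholder to fill in from your hosting control panel.

    # Sketch: resolve the domain and compare against the hosting provider's IP.
    # EXPECTED_IPS is a placeholder: fill in the real address(es).
    import socket

    DOMAIN = "www.progettonpm.it"
    EXPECTED_IPS = {"203.0.113.10"}  # placeholder (TEST-NET address)

    answers = socket.getaddrinfo(DOMAIN, 80, family=socket.AF_INET, proto=socket.IPPROTO_TCP)
    resolved = {info[4][0] for info in answers}
    print("Resolved:", ", ".join(sorted(resolved)))

    unexpected = resolved - EXPECTED_IPS
    if unexpected:
        print("WARNING: domain resolves to unexpected address(es):", unexpected)
    else:
        print("DNS answers match the expected hosting IP(s).")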

In both cases the problem was serious, and the responsibility lay with the hosting provider. To avoid wasting any more time, we decided to move everything (domain, hosting and email) to another provider, one more structured and more geared towards high-performance, SEO-oriented corporate sites.

Once the new hosting was chosen, the switch was completed within 24 hours; a week later Search Console had stopped reporting 5xx errors and the rankings were back to where they had been before the migration.

All's well that ends well, but finding the culprit was hard, in a surreal situation we had never encountered in 12 years of work. I hope this write-up helps other webmasters track down 'hidden' culprits that can do great damage to a website if not diagnosed and resolved quickly.


If you liked this article, please share it!
