A treasure trove of information on how Google's search engine works, whose ranking criteria a substantial part of the web economy depends on, from newspapers to e-commerce. And also details on how Google uses the browsing (and interest) data it collects through the Chrome browser.
All of this is contained in thousands of documents, apparently from Google's internal Content API Warehouse, published on GitHub by an automated bot called yoshi-code-bot. They were posted on 13 March but have only come to public attention in recent hours, and for experts they are an open window into how Google's ranking algorithm might work.
In short, a massive leak of confidential data, something that had never happened to what is by far the most used search engine in the world (although in 2023 something similar happened to Russia's Yandex).
The documents detail 2,596 modules and 14,014 attributes used to rank content. They explain that content can be demoted for reasons such as links that do not match the target site, signals of user dissatisfaction, product reviews, location, exact-match domains and pornographic content.
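To make the idea of demotion signals concrete, here is a minimal sketch. The signal names echo the categories reported above, but the multiplier values and the scoring logic are entirely invented for illustration; the leaked documents describe the attributes, not how Google actually weighs them.

```python
# Hypothetical illustration: demotion signals reducing a page's score.
# Signal names reflect the categories mentioned in the documents;
# the multipliers and the scoring formula are assumptions for this sketch.
DEMOTIONS = {
    "anchor_mismatch": 0.8,       # link text does not match the target site
    "serp_dissatisfaction": 0.7,  # signals of user dissatisfaction
    "product_review": 0.9,
    "location": 0.95,
    "exact_match_domain": 0.85,
    "porn": 0.5,
}

def demoted_score(base_score: float, active_signals: list[str]) -> float:
    """Apply each active demotion signal as a multiplicative penalty."""
    score = base_score
    for signal in active_signals:
        score *= DEMOTIONS.get(signal, 1.0)  # unknown signals are ignored
    return score
```

A page hit by several signals is penalised cumulatively, e.g. `demoted_score(1.0, ["anchor_mismatch", "porn"])` yields 0.4 under these made-up weights.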
Google also keeps a detailed change history: it retains every version of every indexed page, but uses only the last 20 changes for link analysis.
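The full-history-but-windowed-analysis pattern described above can be sketched as follows. This is not Google's code: the class, method names and the storage model are assumptions; only the figure of 20 changes comes from the documents.

```python
class PageHistory:
    """Illustrative sketch: retain every indexed version of a page,
    but expose only the most recent 20 changes for link analysis,
    as the leaked documents describe."""

    LINK_ANALYSIS_WINDOW = 20  # per the documents, only the last 20 changes count

    def __init__(self):
        self.versions = []  # the full history is kept

    def record(self, snapshot: str) -> None:
        """Store a new crawled version of the page."""
        self.versions.append(snapshot)

    def versions_for_link_analysis(self) -> list[str]:
        """Only the newest 20 versions feed link analysis."""
        return self.versions[-self.LINK_ANALYSIS_WINDOW:]
```

For example, after recording 25 versions of a page, all 25 remain stored, but `versions_for_link_analysis()` returns only the latest 20.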