A treasure trove of information on how Google's search engine works, whose ranking criteria a substantial part of the web economy depends on, from newspapers to e-commerce. And also details on how Google uses the browsing (and interest) data it collects through the Chrome browser.
All of this is contained in thousands of documents, apparently from Google's internal Content API Warehouse, published on GitHub by an automated bot called yoshi-code-bot. They were posted on 13 March but have only come to public attention in recent hours, and for experts they are an open window into how Google's ranking algorithm might work.
In short, a massive leak of confidential data, something that had never happened to what is by far the most used search engine in the world (although in 2023 something similar happened to Russia's Yandex).
The documents detail 2,596 modules and 14,014 attributes used to rank content. They explain that content can be demoted for reasons such as links that do not match the target site, signals of user dissatisfaction, product reviews, location, exact-match domains and pornographic content.
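To make the idea of demotion signals concrete, here is a minimal sketch. The signal names echo the categories reported above, but the multiplier values and the scoring logic are entirely invented for illustration; the leaked documents describe the attributes, not how Google actually weighs them.

```python
# Hypothetical illustration: demotion signals reducing a page's score.
# Signal names reflect the categories mentioned in the documents;
# the multipliers and the scoring formula are assumptions for this sketch.
DEMOTIONS = {
    "anchor_mismatch": 0.8,       # link text does not match the target site
    "serp_dissatisfaction": 0.7,  # signals of user dissatisfaction
    "product_review": 0.9,
    "location": 0.95,
    "exact_match_domain": 0.85,
    "porn": 0.5,
}

def demoted_score(base_score: float, active_signals: list[str]) -> float:
    """Apply each active demotion signal as a multiplicative penalty."""
    score = base_score
    for signal in active_signals:
        score *= DEMOTIONS.get(signal, 1.0)  # unknown signals are ignored
    return score
```

A page hit by several signals is penalised cumulatively, e.g. `demoted_score(1.0, ["anchor_mismatch", "porn"])` yields 0.4 under these made-up weights.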
Google also keeps a detailed change history: it retains every version of every indexed page, but uses only the last 20 changes for link analysis.
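The full-history-but-windowed-analysis pattern described above can be sketched as follows. This is not Google's code: the class, method names and the storage model are assumptions; only the figure of 20 changes comes from the documents.

```python
class PageHistory:
    """Illustrative sketch: retain every indexed version of a page,
    but expose only the most recent 20 changes for link analysis,
    as the leaked documents describe."""

    LINK_ANALYSIS_WINDOW = 20  # per the documents, only the last 20 changes count

    def __init__(self):
        self.versions = []  # the full history is kept

    def record(self, snapshot: str) -> None:
        """Store a new crawled version of the page."""
        self.versions.append(snapshot)

    def versions_for_link_analysis(self) -> list[str]:
        """Only the newest 20 versions feed link analysis."""
        return self.versions[-self.LINK_ANALYSIS_WINDOW:]
```

For example, after recording 25 versions of a page, all 25 remain stored, but `versions_for_link_analysis()` returns only the latest 20.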