Recently, a former Senior Software Engineer at Google went public and leaked internal Google documents to Project Veritas, a non-profit organization that seeks to expose corruption in public and private institutions.
The ‘Google Document Dump’ is a web page listing nine folders obtained by the Google insider, all of which can be downloaded and viewed. Note that all information and images presented in this blog post are sourced from Project Veritas.
It’s hard to say whether these leaked documents are authentic. There are more than 900 files, and most of them are screenshots of emails between Google employees with no context at all. Some documents are slides that seem to have been made for internal presentations.
While the news sounds juicy, there is not a lot to get excited about in these documents. For the SEO industry, much of the useful information here may already be familiar. Here are some highlights from the documents.
While the main purpose of the ‘leak’ was to expose Google for having political biases and hidden agendas, some of the documents actually make Google look good. In the ‘Fake News’ folder, there is a list of websites that are blacklisted from appearing in Google Now, a feature of the Google Search app for Android and iOS phones that was phased out in 2016. Other documents describe Google’s fight against disinformation and fake news.
This is no surprise at all. Even though most of these documents date from 2016, I wouldn’t be surprised if Google still applies most of these policies, or has since strengthened its policies on fake news. While this document is for Google Now, if it is authentic, there is a possibility that Google uses a similar list for Google Discover and Google News.
Another notable document from this folder is a diagram of the Google News Ecosystem that shows the process of becoming a verified news publication in the search engine. It indicates that Google receives 4,000 applications a month from sites wanting to become news publishers, but only 18-25% are approved (roughly 720 to 1,000 a month).
Manual Actions for Websites in Google News
Another document in the Fake News folder covers integrating WebSpam manual actions into the Google News corpus. The WebSpam team is the Google team that handles reports of spam, paid links, and malware.
According to the document, domains that receive manual actions from the WebSpam team will automatically be prevented from showing in Google News or be permanently removed. Initially, this treatment was requested for a wider range of penalties, such as “MobileAdRedirect, ClockingRedirect, SpammyUserContent, and HackedIsolated”.
However, Google News limited it to three penalties: Blackhat, Demotion, and Hack. Websites flagged with Blackhat or Demotion (bad content, scraped or duplicated content, links) are removed completely. Hacked websites are removed only temporarily and are restored once the malware has been cleaned up.
Machine Learning Fairness
As an AI company, Google relies heavily on machine learning in building its algorithms, data, and products. In these documents, most of which are dated 2017, Google appears to acknowledge that unfairness can and does occur in its algorithms, and it is investigating the problem and advocating for change.
As defined in one of the documents, “algorithmic unfairness” means unjust or prejudicial treatment of people that is related to sensitive characteristics such as race, income, sexual orientation, or gender, through algorithmic systems or algorithmically aided decision-making.
One of the documents states that the goal is to “Articulate the full range of algorithmic unfairness that can occur in products” to guide further development of Google’s tools and products.
One of the presentations shows how machine learning systems can pick up biases from data.
These ‘leaked documents’ have not been authenticated by Google, but there seems to be nothing too alarming in them. While the Google insider’s motive is to expose Google and its alleged political biases, the documents are not at all surprising, given that Google is currently facing antitrust issues. However, the documents are also a reminder that webmasters should pay attention to how Google treats specific types of content.