You’re thinking Machine Learning is science fiction, confined to a select few in tax-dodging Silicon Valley or a far-off ever-developing concept. But it is not. You’re looking at it right now: machine learning is used every time you use the web.
Let’s go back in time. It’s 1956, and a group of researchers organised by the computer scientist John McCarthy wants to study the simulation of learning, mimicking how humans learn and develop. Until then, this type of theory had been confined to the realms of cybernetics and philosophy, or was seen as a complex data-processing idea akin to the infinite monkey theorem.
John McCarthy proposed the research in a document titled “A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence.” It is widely credited as the first use of the phrase ‘Artificial Intelligence’.
The proposal stated:
“The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”
The research was, in retrospect, an excited get-together of like-minded peers. But it brought to the fore the idea of machines that learn independently of human input. Today, machine learning is an essential part of Big Data innovations in automated quantitative finance, robotics, hospitals and smart cars. More relevantly here, it plays a massive part in search engines.
According to John Giannandrea, Google’s head of Machine Learning, the code that drives the Machine Learning algorithms takes up most of Google’s computing power. So, anyone who works in SEO or Digital Marketing needs to lock themselves away and get to grips with what’s going on, because it’s happening now.
How does Machine Learning Work Today?
Machine Learning (ML) is not the same thing as Artificial Intelligence (AI). Instead, it’s a narrower concept within AI, built on the approach that software should be able to learn. In the past, machines have not been aware of their own mistakes, and computer scientists have had to tweak the code by hand to make algorithms more efficient. With ML, the software can not only find errors but also learn from them with little or no human input.
In the past, computers were fed raw data and a set of algorithms, and would then produce the output they were pre-programmed to find. ML software gains knowledge by itself. With regard to search engines, it looks at users’ journeys through the web, then feeds its insights into the software that builds the results page.
How does it affect my search?
With unsupervised ML, the process is the complete reverse of how traditional algorithms were applied to your search journey. That’s because the software builds its own models from the data fed to it, and gives the search engines insights into their data that they didn’t even know they had.
Machine learning is its own maker, distributing the algorithms it creates. That means ranking factors are becoming even more complicated than they already are. Nothing is ever going to be static when it comes to the SERP, but it’s going to become more difficult to pinpoint how and why a rank has shifted.
Ranking factors are going to multiply as more data is gathered, because the machines can handle far more categorization and indexing of sites.
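To make the idea of unsupervised learning a little more concrete, here is a minimal sketch in Python. It is not how any search engine works; it only illustrates software finding groups in data without being told what the groups are. The dwell-time figures and the function name `two_means` are invented for illustration.

```python
# Toy illustration only: the numbers and names below are invented.
def two_means(values, iterations=10):
    """Split 1-D values into two groups with a bare-bones k-means (k=2)."""
    centroids = [min(values), max(values)]  # simple deterministic start
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to the nearest centroid.
            nearest = 0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            clusters[nearest].append(v)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # nothing moved, so the grouping is stable
        centroids = new_centroids
    return clusters


# Hypothetical dwell times (in seconds) for visits to one landing page.
dwell_times = [5, 8, 12, 95, 110, 130]
quick_visits, long_visits = two_means(dwell_times)
print(quick_visits, long_visits)
```

Nobody told the code what counts as a “quick” or a “long” visit; the split comes from the data itself, which is the essence of the unsupervised approach described above.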
How do we optimise for machine learning?
The fact is, if you’re asking this question then you’re already in the wrong state of mind.
That said, some technical SEO work and research does improve usability, which brings small gains on the results page. But it all goes back to the reason search engines exist in theory: to provide exceptional and relevant content for the user.
Unlike the significant algorithm shifts of the past, machine learning hasn’t struck rankings overnight. Instead, it has been chipping away at them, robbing poor content of its coveted position. So it has indeed revolutionized the SERP, with the intent of providing optimal results for search engine users.
Some of the small gains you’re looking to achieve within landing pages might change, but some factors have always held steady when it comes to getting onto the first page.
Some of these are:
- Creating well-researched and thorough content. The content should sell itself, so you shouldn’t have to be on the prowl for links.
- A good bounce rate (below 55%) relative to clicks is your weapon against the robots, as it’s a good indicator that you’re ticking all the boxes when it comes to providing content relevant to the keyword entered in the search engine.
- Structured data, or Schema, is essential: the more information you can provide to the search engine, the better and the more broadly the landing page is going to rank.
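As a rough illustration of the structured data point, the sketch below builds schema.org “Article” markup as JSON-LD in Python. The vocabulary terms (`@context`, `@type`, `headline`, `author`, `datePublished`, `description`) are real schema.org properties, but the function name and the page details are hypothetical, and the properties worth including will depend on your own content.

```python
import json


# Hypothetical helper: the page details passed in are placeholders.
def article_json_ld(headline, author_name, date_published, description):
    """Return a <script> tag holding schema.org Article markup as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,  # ISO 8601 date string
        "description": description,
    }
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Example values are invented for illustration.
print(article_json_ld(
    headline="How Machine Learning Affects Search",
    author_name="Jane Example",
    date_published="2017-11-01",
    description="An overview of machine learning in search engines.",
))
```

The printed tag would sit in the HTML of the landing page, giving the search engine an explicit description of the content rather than leaving it to infer one.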
Content and links to your site are still the main two ranking factors within Google and its competitors. What comes at number 3? Let me introduce you to RankBrain.
What is RankBrain?
Launched in 2015, RankBrain is Google’s umbrella name for its machine learning algorithms in search, and it is with you every time you search. It exists within Google Hummingbird and is seen as one of the leading ingredients in delivering conclusive semantic search. This wasn’t always the case, but Google has become convinced of its reliability in bringing users relevant search engine results. So, what does RankBrain do? The answer is, almost everything! The main purpose of RankBrain is to deal with queries that Google has never seen before, which make up about 15% of total search queries.
One of the significant things RankBrain does is figure out the intent behind the user’s search and what is relevant to that purpose. Some of the behaviour RankBrain will look at includes:
- Dwell time vs. competitors. Does the search query need a quick answer or a more in-depth one, and does the dwell time on your page align with that of competitors?
- SERP pattern. Have users clicked on your site only to go back to the SERP and click on a link to another site? RankBrain may assume that your content is not satisfying users.
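Google has not published how RankBrain measures this behaviour, so the following is only a rough sketch of how you might estimate the “back to the SERP” pattern from your own analytics. The data format, the 10-second threshold and the function name `pogo_stick_rate` are all assumptions made for illustration.

```python
# Hypothetical data model: each search visit is (seconds_on_page, returned_to_serp).
def pogo_stick_rate(visits, quick_seconds=10):
    """Share of search visits that bounced back to the results page quickly."""
    if not visits:
        return 0.0
    quick_returns = sum(
        1 for seconds, returned in visits
        if returned and seconds < quick_seconds
    )
    return quick_returns / len(visits)


# Invented sample of five visits arriving from a results page.
visits = [
    (4, True),     # left almost immediately and went back to the SERP
    (7, True),
    (95, False),   # stayed and did not return to the results
    (240, False),
    (30, True),    # went back, but only after reading for a while
]
print(f"{pogo_stick_rate(visits):.0%}")  # 2 of the 5 visits count as quick returns
```

A rising figure on a given landing page would be a prompt to check whether the content actually answers the queries that bring people to it.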
Long-tail search terms have always been tricky for algorithms to decipher because the more words there are, the more variables there are in the user intent. Long-tail keywords often mimic natural language, and they are what SEOs have been told to pay attention to in keyword research, as these types of queries are likely to resemble voice search.
Natural Language Processing
Understanding natural language in real time is one of the primary goals for search engines (especially Google), and it will affect not only the way we interact with search engines but also how we create content. We can see this in practice in the rapid development of Voice Search, which is becoming more capable every year.
RankBrain is backed by various AI systems and libraries. One of these is TensorFlow, Google’s home-grown ML software library. The good news is that TensorFlow is open source, and it may give us an idea of how RankBrain might work when it comes to natural language processing. For example, its tutorials include a section on word vectors, which are an essential way of grouping data and a tool for helping computers understand natural language.
Word vectors are numerical (vector) representations of words, which let computers work out what we are saying and group related words together. This makes the data more manageable, and is similar to the way search engines have handled image data in the past.
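To give a feel for how words can be grouped once they are vectors, here is a small sketch that compares words by cosine similarity. The three-dimensional vectors are made up for illustration; real word vectors are learned from large amounts of text and usually have hundreds of dimensions.

```python
import math

# Made-up vectors for illustration only; real ones are learned from text.
vectors = {
    "hotel":  [0.9, 0.8, 0.1],
    "hostel": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1.0 means more alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Related words point in a similar direction, unrelated words do not.
print(cosine_similarity(vectors["hotel"], vectors["hostel"]))
print(cosine_similarity(vectors["hotel"], vectors["banana"]))
```

With these toy values, “hotel” scores much closer to “hostel” than to “banana”, which is the kind of grouping the paragraph above describes.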
Learning how software like TensorFlow works is now an essential part of any SEO role. Anyone in that role will need to look at some of the core concepts that software like TensorFlow deals with, such as:
- Stop words, the filler of our language, such as “a”, “is”, “are” and “the”. When extracting from the corpus, traditional algorithms would not have processed these words: they were assumed to carry no weight or value, because they were too complicated for the machine to contextualise. With ML this is no longer the case, as it can infer the intent of the user from stop words.
- One of the ways word vectors work within the algorithm is through a process called Syntactic Dependencies. This means that once the software has chopped up a sentence and worked out what the words mean, it looks at the context of the words in relation to each other according to the grammar.
- Latent Semantic Indexing is something SEOs should already be doing, albeit more manually: looking for synonyms of keywords as well as other words related to the content. ML software takes a denser, quantitative approach, processing massive data-sets at a faster rate. This makes the bots smarter at filtering out keyword-stuffing.
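The stop-word point in the list above can be shown with a small sketch. The stop-word list and the queries are invented for illustration, and real search engines use far more sophisticated processing; the aim is only to show how dropping filler words can erase the difference between two intents.

```python
# A deliberately tiny, hypothetical stop-word list.
STOP_WORDS = {"a", "is", "are", "the", "to", "from"}


def strip_stop_words(query):
    """Lower-case a query and drop any word found in STOP_WORDS."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


outbound = "flights to London"
inbound = "flights from London"

# With stop words removed, two different intents collapse into the same keywords.
print(strip_stop_words(outbound))  # ['flights', 'london']
print(strip_stop_words(inbound))   # ['flights', 'london']
print(strip_stop_words(outbound) == strip_stop_words(inbound))  # True
```

Keeping “to” and “from” is what lets a system tell a traveller leaving London from one arriving there, which is why throwing stop words away loses information about intent.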
TensorFlow is not only a platform for developing ML. It also supports another form of machine learning that most SEO experts will think has nothing to do with search engines: Deep Learning.
What is Deep Learning?
Neural networks are at the core of Deep Learning (DL), loosely emulating the way the human brain processes information. Google is at the forefront of DL as a breakthrough within AI and ML. In 2014, Google bought the company DeepMind, which developed AlphaGo, the program that beat a world champion at the traditional Chinese board game Go, a feat that was thought impossible due to the sheer complexity of the possible outcomes of each move. But why would Google (a search engine) be interested in emulating the human brain?
Here are a few ways it’s applied already:
- Image recognition. Google uses a DL process to better categorise and classify images, giving users better search results. In turn, this does the job of the alt tag, which may shortly become obsolete.
- Google Translate runs on a DL system called Google Neural Machine Translation, and this kind of neural language modelling is key to understanding natural language.
- Our viewing habits are under constant scrutiny on YouTube, with Google using Deep Neural Networks to provide better recommendations. Google published an open paper in 2016 describing how this works.
Before you bow down to our robot overlords, there are downsides. For instance, last week one of Google’s ML APIs graded ethnic minorities as unfavourable, proving that although the machine is meant to be a bastion of impartiality, the data it mines from the internet is somewhat dirty and biased. Concerns range from how the algorithms will parse this biased data to Google’s obsession with tailoring the web to the user. Will users see their own biases reflected in the SERP?
As already mentioned, we need to be aware that more of our daily activities are being regulated and monitored by machines, which are learning from our habits to become “more human than human”. But don’t start wearing a tin foil hat. The positive from an SEO perspective is that this reinforces what content should be about, and the reason ML has been implemented within search. Content shouldn’t be poor-quality, keyword-stuffed robot bait; it should be written in natural language with a clear audience in mind.