Google’s RankBrain and the Future of Smart Search
By Ryan Conley, Staff Contributor
In October, Bloomberg News reported on a new facet of Google’s search technology called RankBrain. RankBrain has been termed a form of artificial intelligence or machine learning with the potential to shake up the entire search engine and SEO industries.
Google is being somewhat circumspect in how much it reveals about RankBrain. After all, the company’s competitors are always looking for ways to emulate its success. Businesses and scientists are heavily investing money and time in machine learning and artificial intelligence. Microsoft has publicly stated for years that it is researching machine learning for use in its Bing search engine. But the details of how RankBrain works are less important than the broader implications for search engines, web designers, content creators and the entire computer industry.
Google’s search engine algorithms employ hundreds of “signals” to determine each web page’s rank and its relevance to a given query. The company has publicly stated that RankBrain is already the third most important signal overall. (In true Google fashion, they will not reveal what the first two are.) Thus, RankBrain is clearly very important to search, even at this early stage. But what does it actually do?
Google users enter some three billion queries into the search engine each day. The company says that about 15 percent of those queries are unique: no user has ever performed exactly that search before. RankBrain helps Google’s computers understand these unique queries. An example in Bloomberg’s report is the question, “What’s the title of the consumer at the highest level of a food chain?” While most uses of the word “consumer” probably refer to purchasers of products, it is also a scientific term for an organism that feeds on other organisms. Entering this rather awkwardly phrased query into Google does indeed return, as the first result, a Wikipedia article titled “Consumer (food chain),” which contains the answer to the query: “apex predator.” (Note that searching for the exact question by enclosing it in quotation marks returns a list of articles covering RankBrain itself.) RankBrain is presumably helping to make a connection between this once-unique query and the innumerable other variations Google has encountered and successfully answered before.
An AI that can mimic all the cognitive tasks a human being performs is called an “artificial general intelligence” (AGI) or “strong AI.” AGI remains a hypothetical concept. The ramifications of a computer capable of tasks such as reasoning, creativity and extrapolation are impossible to predict, just as no one can perfectly predict the thoughts and actions of a person with free will.
An AI that is designed to do a strictly limited set of tasks is called an “artificial narrow intelligence” (ANI) or “weak AI.” A modern supercomputer programmed to play chess can beat virtually any human player in the world, but it cannot perform other tasks for which it is not programmed, no matter how simple. This is an example of weak AI. Another is Siri, the voice-activated assistant on iPhones. Siri is capable of accomplishing many tasks, but often is stymied by user queries it fails to understand.
Machine learning is distinct from both types of artificial intelligence, although it is surely a prerequisite to strong AI. Machine learning is a system by which a computer can adjust its own instruction set, or algorithm, to more effectively accomplish its tasks. Shortly after revealing RankBrain, Google held an event for tech journalists called “Machine Learning 101.” There, as reported by the website Marketing Land, Google engineers explained that machine learning systems comprise three main parts: the model, the parameters and the learner.
The model is a process for making decisions or predictions. The initial model is provided to the machine learning system by human programmers. The parameters are the factors or signals that the model uses to make determinations. The learner is a system that analyzes the differences between the determinations actually made by the model and the determinations deemed most accurate or optimal by human designers. The learner then makes adjustments to the parameters and the model in an attempt to optimize them to produce better results. The system then runs with its new model and parameters in place, and the process is repeated. Google says that an important quality of most machine learning systems is “gradient learning,” which means that the system favors small adjustments over larger ones.
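The three parts described above can be sketched in a few lines of Python. This is a toy illustration only, not Google’s code: the straight-line model, the function names, the example data and the step size are all assumptions chosen to keep the example small. For simplicity, the learner here adjusts only the parameters and leaves the form of the model fixed, and each pass makes a small correction rather than a large one.

```python
# Toy illustration of the model / parameters / learner split.
# Every name and number here is hypothetical; this is not Google's code.

def model(params, x):
    """The model: turns an input into a prediction using the parameters."""
    weight, bias = params
    return weight * x + bias


def mean_squared_error(params, examples):
    """Average squared gap between the model's answers and the target answers."""
    return sum((model(params, x) - target) ** 2 for x, target in examples) / len(examples)


def learner(params, examples, step_size=0.05):
    """The learner: compares predictions with targets, then nudges each
    parameter a small step in the direction that reduces the error."""
    weight, bias = params
    grad_weight = 0.0
    grad_bias = 0.0
    for x, target in examples:
        error = model((weight, bias), x) - target
        grad_weight += 2 * error * x
        grad_bias += 2 * error
    n = len(examples)
    return (weight - step_size * grad_weight / n,
            bias - step_size * grad_bias / n)


# Target answers supplied by the "human designers": here, y = 2x + 1.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

params = (0.0, 0.0)  # initial parameters
initial_error = mean_squared_error(params, examples)

# Run the model, let the learner adjust the parameters, and repeat.
for _ in range(2000):
    params = learner(params, examples)

final_error = mean_squared_error(params, examples)
print(params, initial_error, final_error)
```

After enough repetitions the parameters settle close to a weight of 2 and a bias of 1, the values that generated the example data.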
Because RankBrain and machine learning are concepts very much in their infancy, Google says RankBrain’s learning occurs offline. That is, the changes that it makes are not immediately incorporated into the search engine’s algorithms. Instead, engineers analyze them to determine whether they actually improve the search process. If they do, they are integrated into a new version of RankBrain, and the new version is activated.
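The offline step can be pictured as a simple gate: a candidate version replaces the live one only if it does better on queries that people have already judged. The sketch below is an assumption about the general shape of such a check, not a description of Google’s actual process; the function names, the scoring rule and the sample data are all hypothetical.

```python
# Hypothetical offline check: promote a candidate only if it beats the live version.

def average_score(ranker, judged_queries):
    """Fraction of judged queries whose top result matches the human-preferred answer."""
    hits = sum(1 for query, preferred in judged_queries if ranker(query) == preferred)
    return hits / len(judged_queries)


def choose_version(live, candidate, judged_queries):
    """Return the candidate only if it scores strictly better offline."""
    if average_score(candidate, judged_queries) > average_score(live, judged_queries):
        return candidate
    return live


# Made-up judged queries: (query, answer a human reviewer preferred).
judged_queries = [
    ("consumer at the highest level of a food chain", "Apex predator"),
    ("capital of france", "Paris"),
]

LIVE_ANSWERS = {"capital of france": "Paris"}
CANDIDATE_ANSWERS = {
    "capital of france": "Paris",
    "consumer at the highest level of a food chain": "Apex predator",
}


def live_ranker(query):
    return LIVE_ANSWERS.get(query)


def candidate_ranker(query):
    return CANDIDATE_ANSWERS.get(query)


chosen = choose_version(live_ranker, candidate_ranker, judged_queries)
print(chosen.__name__)
```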
A Google blog post from 2013 provides another example of how computers can learn. The post explains the concept of creating “vectors,” or mathematical expressions, from words. Google fed the raw text from a large number of news articles into the system, which it calls Word2vec. Without any explicit instructions to do so, Word2vec recognized a similarity in the relationships between the words for various countries and the words for their capitals. As Google put it, “it understands that Paris and France are related the same way Berlin and Germany are… and not the same way Madrid and Italy are.” Google was careful not to say that its computers understand the words themselves or even the concepts of countries or capitals. But that is immaterial to search engine users, who can derive value from a system that is able to understand relationships between words even if it does not understand those words as humans do.
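The country-and-capital relationship can be illustrated with a few hand-written vectors. The numbers below are invented for this example and are not output from Word2vec, which learns far longer vectors from raw text. The sketch only shows the arithmetic involved: subtracting a country’s vector from its capital’s vector gives an “offset,” and similar relationships produce similar offsets.

```python
import math

# Hand-made toy vectors (hypothetical, not learned by Word2vec).
# Axes: France, Germany, Italy, Spain, "is a capital city".
VECTORS = {
    "France":  (1.0, 0.0, 0.0, 0.0, 0.0),
    "Paris":   (1.0, 0.0, 0.0, 0.0, 1.0),
    "Germany": (0.0, 1.0, 0.0, 0.0, 0.0),
    "Berlin":  (0.0, 1.0, 0.0, 0.0, 1.0),
    "Italy":   (0.0, 0.0, 1.0, 0.0, 0.0),
    "Rome":    (0.0, 0.0, 1.0, 0.0, 1.0),
    "Spain":   (0.0, 0.0, 0.0, 1.0, 0.0),
    "Madrid":  (0.0, 0.0, 0.0, 1.0, 1.0),
}


def offset(word_a, word_b):
    """Vector pointing from word_b to word_a."""
    return tuple(a - b for a, b in zip(VECTORS[word_a], VECTORS[word_b]))


def cosine(u, v):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c):
    """Find the word whose vector is closest to a - b + c, ignoring the inputs."""
    target = tuple(x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c]))
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))


same_relation = cosine(offset("Paris", "France"), offset("Berlin", "Germany"))
different_relation = cosine(offset("Paris", "France"), offset("Madrid", "Italy"))
print(same_relation, different_relation)
print(analogy("Paris", "France", "Germany"))
```

In this toy setup the Paris-France and Berlin-Germany offsets point in the same direction, while the Madrid-Italy offset does not, and the analogy lookup for “Paris is to France as what is to Germany” lands on Berlin.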
Much like natural intelligence in humans and animals, artificial intelligence and machine learning are rather abstract, ethereal concepts. In animals, and even in humans, it is difficult to say where instinct ends and intelligence begins. Likewise, computers may already appear to the layman to be capable of intelligent thought, when in fact they are merely executing huge and complex instruction sets. Few experts would argue seriously that today’s computers are literally a form of intelligence, but many think that the eventual emergence of strong artificial intelligence is likely, if not inevitable.
Some disagree, including philosopher John Searle. He created a thought experiment called the Chinese Room which illustrates the difficulty in distinguishing true intelligence from a highly complex instruction set. Searle imagined an English speaker locked in a room with a set of instructions. The occupant, when presented with a series of Chinese characters from someone outside the room, follows the instructions to correlate those characters with other Chinese characters, which he then copies and returns to the person outside the room. The instructions do not allow for translating between Chinese and English; they merely tell the occupant which characters to output in response to a given input.
Given a sufficiently sophisticated and exhaustive set of instructions, those outside the room would be convinced, mistakenly, that the occupant understood Chinese. Searle argued that a computer can never amount to more than the occupant of the Chinese Room. A computer executing a set of instructions can never achieve true understanding, no matter how convincing the result.
Search engines contain vast amounts of data. They know more about the raw content of web pages than any person ever could. But in terms of their ability to mimic higher cognitive functions, such as pattern recognition, decision making and understanding context, they hardly measure up against a toddler. Google and other search engine companies want their algorithms to emulate humans’ cognitive abilities. Only humans can effectively judge the quality of a source of information. The best possible search results for any given query would be ones manually curated by a human expert. Search engines do a fair job of approximating this ideal by following algorithms consisting of huge sets of instructions that have been painstakingly written, tested and rewritten by programmers.
The promise of machine learning and artificial intelligence is that computers may someday teach themselves. That is, having been endowed by programmers with the ability to alter their own algorithms, they will do so in ways that allow them to meet their goals — to mimic human cognition and make human-like value judgments — more effectively. If and when computers can reliably alter their programming in positive ways, the implications are profound, but quite difficult to predict. At the very least, it will reduce the burden on programmers of making innumerable fine adjustments to algorithms.
The immediate and direct implications of RankBrain for SEO are minimal. No one needs to change the way they design web pages or create content because of RankBrain. But it does serve to reinforce the notion that the old days of SEO, dominated by finding ways to game the search engines and fool them into thinking a web page is better than it really is, are long gone. People will always be the ultimate arbiters of what makes web content good, because people are the end users. The highest aspiration Google can have for its algorithms is that they will effectively mimic and anticipate a typical user’s preferences. RankBrain is a small but significant step toward that goal, one of many steps already taken since the early days of the internet.
And what are typical users’ preferences? They want accurate, well-written information that is useful and easy to understand. They want pages that load quickly and are free of clutter. Shortcuts to these qualities do not exist. They require skilled content creators and designers. The lesson RankBrain teaches is clear: do not cut corners on your website or try to game the search engines. With each passing day, they are better able to recognize true quality, which can only be achieved with time and effort.
Will Google’s machine learning systems someday make more changes to the search engine’s algorithms than the engineers themselves? It’s certainly possible, and perhaps likely. But humans will always be at the beginning and the end of the process. The systems will always be designed by engineers to accomplish certain tasks, and the goal will always be a search engine that returns results as if they were hand-picked by a human expert. As Searle’s Chinese Room suggests, a sufficiently robust instruction set can be indistinguishable, from the outside, from actual human understanding. You should therefore design your website and content as if all of Google’s search results are chosen by actual people, because someday, the difference may be merely philosophical.