Data Flow on the Internet

I lead the search and machine learning teams at My site. I think it's amazingly inspiring that people all over the world turn to search engines to ask trivial questions and incredibly important questions.

So it’s a huge responsibility to give them the best answers that we can. There are many times where we will start looking into artificial intelligence and machine learning, but we have to address how users are going to use this, because at the end of the day, we want to make an impact on society.

Let’s ask a simple question. How long does it take to travel to Mars? Where did these results come from and why was this listed before the other one? Okay, let’s dive in and see how the search engine turned your request into a result. The first thing you need to know is when you do a search, the search engine isn’t actually going out to the World Wide Web to run your search in real time. And that’s because there are over a billion websites on the internet and hundreds more are being created every single minute. So if the search engine had to look through every single site to find the one you wanted, it would just take forever.

So to make your search faster, search engines are constantly scanning the web in advance to record the information that might help with your search later. That way, when you search about travel to Mars, the search engine already has what it needs to give you an answer in real time. Here’s how it works. The internet is a web of pages connected to each other by hyperlinks. Search engines are constantly running a program called a spider that crawls through these web pages to collect information about them. Each time it finds a hyperlink, it follows it until it has visited every page it can find on the entire internet.
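
The link-following idea can be sketched in a few lines of Python. This is only an illustration, not how any real spider is built: the `TOY_WEB` dictionary and the `crawl` function are made-up names, and the "web" is a small in-memory table instead of real pages fetched over the network.

```python
from collections import deque

# Hypothetical toy web: each page maps to the pages it links to.
TOY_WEB = {
    "home": ["mars", "rockets"],
    "mars": ["rockets", "travel-time"],
    "rockets": ["home"],
    "travel-time": [],
    "island": ["home"],  # no page links here, so the spider never finds it
}

def crawl(start, links):
    """Visit every page reachable from `start` by following hyperlinks."""
    visited = []           # pages in the order the spider reached them
    seen = {start}         # pages already queued, so none is visited twice
    queue = deque([start])
    while queue:
        page = queue.popleft()
        visited.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited

print(crawl("home", TOY_WEB))
```

Note that the page called "island" is never visited, because nothing links to it. A spider can only record the pages it can reach.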

For each page the spider visits, it records any information it might need for a search by adding it to a special database called a search index. Now, let’s go back to that search from earlier and see if we can figure out how the search engine came up with the results. When you ask how long does it take to travel to Mars, the search engine looks up each of those words in the search index to immediately get a list of all the pages on the internet containing those words. But just looking for these search terms could return millions of pages, so the search engine needs to be able to determine the best matches to show you first.
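
One simple way to picture a search index is as a table from each word to the set of pages that contain it. The sketch below assumes exactly that; the page texts and the names `PAGES`, `build_index`, and `search` are invented for illustration, and a real index stores far more than bare words.

```python
from collections import defaultdict

# Hypothetical pages the spider has already collected.
PAGES = {
    "page1": "how long does it take to travel to mars",
    "page2": "mars is the fourth planet",
    "page3": "how to travel on a budget",
}

def build_index(pages):
    """Map every word to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search(index, query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())  # keep pages that also have this word
    return results

index = build_index(PAGES)
print(search(index, "travel mars"))
```

Because the table is built ahead of time, answering a query is just a handful of lookups and set intersections, with no need to reread any page.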

This is where it gets tricky because the search engine may need to guess what you’re looking for. Each search engine uses its own algorithm to rank the pages based on what it thinks you want. The search engine’s ranking algorithm might check if your search term shows up in the page title, it might check if all of the words show up next to each other, or any number of other calculations that help it better determine which pages you’ll want to see and which you won’t.
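
The two checks mentioned here, words in the title and words next to each other, can be turned into a toy scoring function. The point values below are arbitrary assumptions chosen for the example, and `score` and `rank` are hypothetical names, not any search engine's actual formula.

```python
def score(query, title, body):
    """Toy relevance score; the weights are arbitrary assumptions."""
    terms = query.lower().split()
    title_words = title.lower().split()
    body_text = " ".join(body.lower().split())  # normalize whitespace
    # One point for each query word that appears in the page title.
    points = sum(1 for term in terms if term in title_words)
    # Three bonus points if all query words appear next to each other,
    # in order, in the body. Padding with spaces avoids partial-word hits.
    phrase = " ".join(terms)
    if terms and f" {phrase} " in f" {body_text} ":
        points += 3
    return points

def rank(query, pages):
    """Sort (title, body) pairs so the highest-scoring page comes first."""
    return sorted(pages, key=lambda p: score(query, p[0], p[1]), reverse=True)

pages = [
    ("Mars facts", "mars is red"),
    ("Travel to Mars", "how long does it take to travel to mars by rocket"),
]
for title, _ in rank("travel to mars", pages):
    print(title)
```

Real ranking algorithms combine many more signals than these two, but the shape is similar: compute a number per page, then sort.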

Google invented the most famous algorithm for choosing the most relevant results for a search by taking into account how many other web pages link to a given page. The idea is that if lots of websites think that a web page is interesting, then it’s probably the one you’re looking for. This algorithm is called PageRank, not because it ranks web pages, but because it was named after its inventor, Larry Page, who’s one of the founders of Google. Because a website often makes money when you visit it, spammers are constantly trying to find ways to game the search algorithm so that their pages are listed higher in the results.
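
The link-counting idea can be sketched with the commonly published form of PageRank, where each page repeatedly passes a share of its score to the pages it links to. This is a simplified illustration and not what any production search engine runs today. The `pagerank` function, the four-page link table, and the damping value of 0.85 (a commonly cited default) are assumptions made for the example, and every linked page is assumed to appear as a key in the table.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of scores.

    `links` maps each page to the list of pages it links to; every
    target is assumed to also be a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Split this page's score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no links spreads its score over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link table: three pages link to "c", and "c" links to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))
```

In this toy table, "c" ends up with the highest score because the most pages link to it, and "a" comes second because it is linked from the highly ranked "c".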

Search engines regularly update their algorithms to prevent fake or untrustworthy sites from reaching the top. Ultimately, it’s up to you to keep an eye out for these pages that are untrustworthy by looking at the web address and making sure it’s a reliable source. Search programs are always evolving to improve the algorithms so they return better, faster results than their competitors. Today’s search engines even use information that you haven’t explicitly provided to help you narrow down your search.

So, for example, if you did a search for dog parks, many search engines would give you results for all the dog parks nearby, even though you didn’t type in your location. Modern search engines also understand not just the words on a page, but what they actually mean, in order to find the best one that matches what you’re looking for.

For example, if you search for fast pitcher, it will know you’re looking for an athlete. But if you search for large pitcher, it will find you options for your kitchen. To understand the words better, we use something called machine learning, a type of artificial intelligence. It enables search algorithms to look not just for individual letters or words on the page, but to understand the underlying meaning of the words. The internet is growing exponentially, but if the teams that design search engines do our jobs right, the information you want should always be just a few keystrokes away.

