← Go back

My Google Search History – Tutorial

image

One month ago, I posted an article with an analysis of my Google search history. People liked it. But people also asked “How did you do this?”

It’s easy!

And to demonstrate that, I will write my first tutorial. It’s an extremly detailed tutorial for absolute beginners in …well, everything. If you can code, there are better ways for you do clean up and visualise the data (eg., a way to merge the Google Search history JSONs was written down here in Ruby by Daniel Kirsch). And if can’t code, but you’ve dealt with JSONs and/or Google Refine before: this is the short version – or, let’s say, ONE short version, because of course there are many ways to visualise the data:

Here comes the long version:


1. Download your archive

Follow the instructions on this site and you end up getting such a screen:

image

Listen to the Google guys: Be careful with your data! It IS sensitive. It contains your darkest desire and highest hopes – especially if you’re as careful with what you type in Google as 99% of the population.

After clicking on “Create Archive”, you’ll need to wait. Google will send you an email as soon as your archive is ready. Then download this archive and unzip the ZIP file they give you. Congratulations, you know own your Google Search History (together with Google. Well, Google owns it more.) First step: Done!


2. Merge the JSONs

You unzipped your files? Good! Looks like it’s not a simple Excel file called “YourSearchHistory.xls”, right? The files you unzipped are JSONs, a neat file format. Google created one JSON for each quarter in which you’ve used Google Search. Which might be convenient for them, but not for us: We want to have ONE file for ALL our search queries. So we need to merge our JSONs to one mega JSON. And the least advanced but most simple way to do that is by hand.

Try to doubleclick on such a JSON file. If you’re not satisfied with the software that opens this file, you might want to download an editor like Sublime or Brackets. Your JSONs folder and your open JSON files should look like that:

Awesome gif

So what do we see? Let’s understand the structure and content of our JSONs a little bit, so that we’re merging them correctly.

Every JSON starts with an{"event": [. And because every bracket that got open in JSON needs to be closed at some point, every file ends with ]}. Between these two brackets go all our queries. All of them come with a unix timestamp that humans can’t really read, but that work perfectly for computers. We’ll convert them later in the process. Then we can see two objects called “type” and “tbm”. They basically communicate us the same: Which kind of search we have used. “isch” means image search, “blg” means blog search, “nws” means news search, etc. Find all values on this site. And at the end, we can see our search query text! In this case, I’ve searched for images about “intellectual property” at the 4th of July 2010 at 3:53pm.

   {
      "query": {
        "id": [
          {
            "timestamp_usec": "1278251637054301",
            "type": "IMAGE_QUERY", 
            "tbm": "isch"           
          }
        ],
        "query_text": "intellectual property"  
      }
    }

When we’re now merging our JSONs, we can’t just copy and paste one JSON to the other. We need to make sure that we maintain the struchture: we open only once with {"event": [ and we’re only closing once with ]}. Meaning, we need to delete all the additional {"event": [’s and ]}’s before we’re copy and pasting them. So our final JSON should look like that:

Awesome gif

There is another thing you need to change in your JSON: commas. Before the last closing brackets of the whole final JSON, there shouldn’t be a comma. But between all the other {"queries"}, there needs to be a comma:

{"event":[
	{"query": “…”}, 
	{"query": “…”}, 
	{"query": “…”}
]}


To sum up the rules for merging our JSONs by hand:


YOU DID IT! Yeah! To get that right is the hardest part of the whole tutorial – everything will be easy from now on.


3. Convert the JSONs to an Excel file

JSONs can be directly visualied with d3.js for example – but you don’t know Javascript. So we will need to convert this file to a format that Tableau Public can read (=the software we’ll use to visualise the data). For the converting part, we’ll use OpenRefine, actually a tool to clean up messy data. There are many JSON-to-CSV/TSV-converter online, and if your JSON is fairly small, I would recommend them. But because my final merged JSON file “AllSearches.JSON” was 4.6MB big, these tools had some difficulties dealing with it. OpenRefine, however, did a smooth job. Also: I’m a huge fan of this software. So go download it here and use it to clean up data on every future occasion!

For our little project, we will click on exactly four buttons. So open OpenRefine (it should open in your browser) and do the following:

Awesome gif

4. Convert the timestamps

Open the exported Excel file with Excel or LibreOffice. You should see lots and lots of rows and four columns:

image

The first one shows your query text. The second one your timestamp. The third one your query tbm and the fourth one your tbm written as type. You will see that I didn’t lie to you: “tbm” and “type” ARE communicating the same. So if you want to, you can delete one of the columns.

You may wonder: “Well, we got our Excel file, right? Tableau can read Excel, so why are we not in Tableau yet?” Patience you must have my young padawan. Look at the long number in your second column with the header “query - id - anonymous - timestamp_usec”. Can you read it? No. Neither can Tableau.

So we opened up Excel to convert that timestamp to a date that we and Tableau can read. Do the following steps:

Awesome gif

If you don’t just want to follow instructions but also want to know “Why?”, you might be curious about the formula =LEFT(B2;10)/(60*60*24)+"1/1/1970". What does it do, exactly? For that, you need to know a little bit about Unix time. It’s the attempt to display time as a simple machine-readable number instead of having these inconsistent date formats like “01/07/2015 16.50” or “2015/07/01 4:50pm”. The unix time increases with every second. And it has counted all seconds since the nice winter day of 1st of January 1970. And, last thing you need to know: The unix time has 10 characters.

Now look at our timestamp column: Whoooah, every cell has more than ten characters! What does it MEAN? It just means that Google wants to be damn exact and gives us the microseconds, too. But no problem! We just cut down the microseconds; seconds should be enough for us.

So that’s what our formula is doing: First, it only takes the first ten characters of our timestamp cell B2 and ignores the rest of it: =LEFT(B2;10). Then, it divides these numbers through 60seconds, 60minutes and 24hours: =LEFT(B2;10)/(60*60*24). And then it adds the date on which the unix time started counting: =LEFT(B2;10)/(60*60*24)+"1/1/1970". And voilà! A “normal” date is born.


5. Visualise and analyse your data with Tableau

Finally! We arrived at the part of this tutorial in which we will see a visualisation of our data. For that, we’re using Tableau Public. Why that, you wonder, and not Excel? Because Tableau Public is really good in dealing with fairly huge numbers, having an intuitve interface and therefore letting you see different perspectives of your data in quick and easy way. Try it. You will like it. If you haven’t downloaded it yet, go here and do it.

Then, open it and do the following steps:

Awesome gif

Awesome gif

The sad thing about Tableau Public? You can’t save your results on your computer, only in the Tableau Cloud. And listen to the Google guys: Your data is extremly sensitive! You might not want to save it to ANY cloud.



That’s it! You’ve made it! Congratulations! You downloaded, merged, cleaned, visualised and analysed your Google Search history. I’d love to see what you’re coming up with. If you have some interesting insights or visualisations, send me a link at lisacharlotterost@gmail.com or comment on this blog post. Also, if you know a way to analyse your data in a super simple way, please share your knowledge!