Google and Base64 (btoa, Data URIs)

This time we'll experiment with base64. First, we need to explain what it is; if you already know, feel free to skip the next section.

What is base64 and why it's used

base64 is an encoding scheme. It lets us represent any data, including binary, as plain text that can be decoded back to the original later. Why is it used on the web?
  • Store images and other binary files in DBs
  • Serve non-text resources within the main HTTP response
  • Hide things from people (obfuscation only, since anyone can decode it)
  • Keep binary data intact in text-only storage and transport (base64 is not encryption)
  • Many other things I didn't think of
Let's take a look at what base64 looks like. The sentence below is followed by its base64-encoded form:

This is a sample text that is encoded into base64 below:
VGhpcyBpcyBhIHNhbXBsZSB0ZXh0IHRoYXQgaXMgZW5jb2RlZCBpbnRvIGJhc2U2NCBiZWxvdzo=

How is it done? Well, you can use lots of online services, but you don't need them since you always have your wonderful JS console, don't you?

[Image: JS console screenshot displaying atob/btoa]

Okay, that was quite brief, but it should do. Let's start the experiment.
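To make the console step concrete, here is a minimal sketch of the round trip using the sample sentence from this post. It assumes an environment where btoa and atob are available as globals (any browser console, or a recent Node.js). The variable names are only for illustration.

```javascript
// Sample sentence from this post. btoa only accepts characters
// in the Latin-1 range, which plain ASCII text satisfies.
const plain = "This is a sample text that is encoded into base64 below:";

// btoa: binary string -> base64 text
const encoded = btoa(plain);
console.log(encoded);
// VGhpcyBpcyBhIHNhbXBsZSB0ZXh0IHRoYXQgaXMgZW5jb2RlZCBpbnRvIGJhc2U2NCBiZWxvdzo=

// atob: base64 text -> binary string
const decoded = atob(encoded);
console.log(decoded === plain); // true
```

Note that the names are easy to mix up: btoa encodes, atob decodes.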

Would Google index data URIs?

The following image doesn't have an external source. The whole image is embedded in the img tag as a base64 data URI. You can inspect it to see what that looks like.
Tardis
Okay. We'll see if Google indexes the Tardis image on this page, although the image isn't exactly unique. Anyway, why would anyone do something like this? It's a trick to make browsers receive the image within the same HTTP response as the page. Normally, after the initial request, response and parsing, the browser sends additional requests to load every resource that didn't come with the initial HTTP response. A neat server push would make some sense here, but let's leave that to the webdev professionals. We're just simple SEOs here.
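For the curious, a data URI like the one in the img tag can be sketched as follows. This is an illustration under assumptions, not the code used for the Tardis image: toDataUri is a hypothetical helper, and the eight bytes of the PNG file signature stand in for a real image file.

```javascript
// Hypothetical helper: wrap raw bytes in a base64 data URI.
function toDataUri(bytes, mimeType) {
  // btoa expects a "binary string": one character per byte.
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// Stand-in data: the 8-byte signature that every PNG file starts with.
const pngSignature = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

const uri = toDataUri(pngSignature, "image/png");
console.log(uri); // data:image/png;base64,iVBORw0KGgo=
```

With the bytes of a complete image, the resulting string goes straight into the src attribute of an img tag, so the browser needs no extra request for it. The trade-off is size: base64 output is roughly a third larger than the original bytes.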

Would Google execute and index various eval(atob())?

First of all, eval is not safe. Don't use it on the backend. Many systems have been hacked because of evals. We're gonna use it on the front end, so it's relatively fine. What does eval do? It executes an input string as code. Why would it be used in conjunction with base64 (btoa to encode the code, atob to decode it at run time)? To hide front-end JS from people who get confused by base64. Let's take a look at how it works in the console:

[Image: JS console screenshot displaying atob/btoa in eval]

Okay. Now let's write some code. Below we have inline JS and a span with our JS output.

Okay, the math-related text was injected by base64-encoded JS. The funny thing is that WP didn't let me do this with normal code. It populated my JS with its paragraph tags, destroying the syntax. However, it doesn't do that with base64, since base64 has no spaces or newlines for WP to corrupt. If you inspect the text above, you will be able to see the code. Here's how I tested it:

[Image: JS console screenshot displaying atob/btoa in eval - Large]

Looks good. We conducted two experiments here to test how Google treats base64 in two different cases. I have a feeling I missed something important. I wanted to test something else related; I'll add it later if I remember.

Let's see how fetch & render displays the page. Looks like Google sees both the base64 image and the base64 evaled JS code:

[Image: Screenshot of Google's fetch and render results to check if it uses base64 data uri images]

[Image: Screenshot of Google's fetch and render results to check if it uses base64 eval code]
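The eval-plus-base64 trick from this section can be sketched like this. The payload here is a made-up stand-in ("6 * 7"), since the actual snippet used on the page is not reproduced in the text, and the element id in the comment is hypothetical.

```javascript
// Hypothetical payload: btoa("6 * 7"), prepared ahead of time.
// The real math snippet from the page is not shown in this post.
const payload = "NiAqIDc=";

// At run time: decode the base64 text back into JS source...
const source = atob(payload);
console.log(source); // 6 * 7

// ...and execute it. eval runs whatever string it is given,
// so only ever feed it code you wrote yourself.
const result = eval(source);
console.log(result); // 42

// In a page, the result would then be written into the span, e.g.
// document.getElementById("output").textContent = result;
// ("output" is an assumed id, not the one used on this page).
```

Because the encoded payload contains no spaces or newlines, it survives editors that reformat inline scripts.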

Results

As expected, Google rendered the page.

[Image: A screenshot of Google Search results proving it executes atob/btoa and evals in JS]
