How does Google treat text with foreign characters?


Let’s test whether Google can adequately parse text that has foreign characters in it. By foreign characters I mean letters and symbols from other alphabets that look the same as Latin ones but have different Unicode code points (and therefore different UTF-8 bytes).

Let’s take a glance at some examples with Cyrillic symbols:

Latin | Cyrillic

a A | а А

c C | с С

e E | е Е

k K | к К (this one looks a bit off)

You get the idea.
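To make the difference visible, here is a small Python sketch that prints the code point, the Unicode name and the UTF-8 bytes for each pair. The `describe` helper and the `pairs` list are just names I made up for illustration; the Cyrillic letters are written as escapes so nothing gets lost in copy-paste.

```python
import unicodedata

# Latin letters paired with their Cyrillic lookalikes (written as escapes)
pairs = [("a", "\u0430"), ("c", "\u0441"), ("e", "\u0435"), ("k", "\u043a")]

def describe(ch):
    """Return the code point, Unicode name and UTF-8 bytes of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} utf8={ch.encode('utf-8').hex()}"

for latin, cyrillic in pairs:
    print(describe(latin), "|", describe(cyrillic))
```

The two columns look alike on screen, but every row prints two different code points and two different byte sequences.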

What I’m gonna do is write a few English words, replacing some letters with the corresponding lookalike letters from the Cyrillic alphabet. After that, I’ll let Google index the page. Once that’s done, I’ll search for those words, this time typed with all English letters. If this post appears on the SERP, it would mean that Google is smart enough to treat the words based on how they look. If not, it would suggest that Google works with character codes rather than with how the characters look. I expect the former.
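The substitution step can be sketched in a few lines of Python. This is only an illustration, not the way I actually typed the words: `HOMOGLYPHS` and `disguise` are hypothetical names, and the mapping covers just a handful of letters.

```python
# Latin letter -> Cyrillic lookalike (a small, illustrative mapping)
HOMOGLYPHS = {"a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e"}

def disguise(word, letters):
    """Swap the chosen Latin letters in `word` for Cyrillic lookalikes."""
    return "".join(
        HOMOGLYPHS[ch] if ch in letters and ch in HOMOGLYPHS else ch
        for ch in word
    )

fake = disguise("Petrichor", {"e", "o"})
print(fake)                 # looks like the original on screen
print(fake == "Petrichor")  # False: the strings differ
```

Anything that compares strings by their codes will see `fake` as a different word, even though a human reader can't tell the two apart.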

Oh, and maybe it would make sense to test it in Bing and Yahoo, too. You could do that on your own, really.

Here are the words:

Schw? – the most popular sound in English.

Y?nic – female version of phallic.

Zugzw?ng – a situation when any decision would make things worse.

(compromised by the images) P?trich?r – a smell that occurs with first rain drops and holds for minutes. Basically, this smell is wet dust.

Okay. these were the words. I actually tested an idea quickly. What would SERP do if we looked for a word with “wrong letters”?

This is a legit search:

SERP for legit Petrichor


As you can see, Google is able to pull the results. Now here’s what happens when we replace e and o with their Cyrillic lookalikes:

SERP for not legit Petrichor


No results. Niiice. I feel like this experiment will show that I need to add more functionality to my scripts: a check for foreign-alphabet symbols on clients’ sites 🙂
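Such a check could look roughly like the Python sketch below. It is not my actual script: `script_of` and `mixed_script_words` are hypothetical names, and guessing the script from the Unicode character name is a rough heuristic that only knows about Latin and Cyrillic. The idea is to flag any word that mixes letters from both alphabets.

```python
import re
import unicodedata

def script_of(ch):
    """Rough script guess based on the Unicode character name (heuristic)."""
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "LATIN"
    if name.startswith("CYRILLIC"):
        return "CYRILLIC"
    return None  # digits, underscores, other scripts

def mixed_script_words(text):
    """Return the words that contain both Latin and Cyrillic letters."""
    suspicious = []
    for word in re.findall(r"\w+", text):
        scripts = {script_of(ch) for ch in word} - {None}
        if len(scripts) > 1:
            suspicious.append(word)
    return suspicious

# The second "Petrichor" hides a Cyrillic e and o
print(mixed_script_words("Petrichor vs P\u0435trich\u043er"))
```

Words written entirely in Cyrillic are left alone, so a legitimately bilingual page shouldn't light up; only words that mix the two alphabets get reported.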

Can’t wait to see the results. C’mon, G, index it 🙂

Okay, G indexed it really quickly.

To my surprise, it didn’t recognize the words. Google, I expected more of you. You can check it yourself by googling the keywords, but here are screenshots demonstrating the behavior:

Google Search Results for legit keyword


And here’s what happens when I use the broken keyword:

Google Search Results for broken keyword


There are a few sites besides mine that have it spelled with the wrong characters.

PS

Okay, if you realized that you can use my article to find compromised sites by parsing tons of SERPs the way I demonstrated above, and then offer them SEO services that would be 100% beneficial… you owe me 5%. 😉