How to prepare a CMS for website translation

Ian Harris

Executive Chairman


Technical SEO

Introduction

Moving a website from a single language to multiple languages is a daunting process when approached for the first time. Doing it right takes no more effort than doing it wrong, but the penalty for doing it wrong can be immense. Translation may become difficult or impossible, the site may not be listed on foreign search engines, or future updates may be expensive and complex because the structure makes it impossible to apply cost-saving translation tools and techniques.

Preparing a site for translation is critical to success

This document contains eleven tips for moving a website from a single language into multiple languages. It is a technical guide, written for web designers and developers, and has been built from years of experience of helping companies through this process. The tips are designed for database-driven or content-managed websites, but many of the techniques discussed also apply to flat HTML sites.

1. Don’t build a user interface for translators

A mistake made by almost all CMS vendors when they embark on making their system multilingual is to build a set of screens to facilitate translation. They allow the translators to see the English (or source language) page and provide a box, or set of boxes, to type in the translation of the page. Don’t do it. It is a waste of time. Translators and translation companies can translate files, such as XML files, text files, HTML or similar, much more efficiently than they can work within any interface you can provide.

Translators use electronic glossaries and automatic translation memory tools when they translate. These are systems that suggest certain words or phrases to use based on client approvals or previous translations. Translation memory ensures that if a sentence was translated in one way for this client in the past, it will always be translated that way in future. It is essential for quality control and speed of turnaround, and it delivers major cost savings. Translators cannot use these essential tools if they are translating through your interface, so you will be adding cost to their work while degrading the quality.

The translation tools also protect any non-translatable content, such as XML and HTML codes, and expose only the translatable text to the translator.

Many translation companies have web-service based translation interfaces, so you can automatically call out for professional human translation via an API. Such interfaces make ongoing updates easy. However, consider the frequency of changes to the site before deciding whether to use such an interface. It may make development easier, and be just as effective, to simply catch up with updates every month.
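As a rough illustration of the file-based approach, the sketch below exports translatable strings to an XML file and reads the translated file back in. The element names (translation-job, unit, source, target) and the string identifiers are hypothetical; in practice a translation company may ask for an established interchange format such as XLIFF instead.

```python
import xml.etree.ElementTree as ET

def export_for_translation(strings, source_lang, target_lang):
    """Build an XML document holding one <unit> per translatable string."""
    root = ET.Element("translation-job", {"source": source_lang, "target": target_lang})
    for text_id, source_text in sorted(strings.items()):
        unit = ET.SubElement(root, "unit", {"id": text_id})
        ET.SubElement(unit, "source").text = source_text
        ET.SubElement(unit, "target").text = ""  # left empty for the translator
    return ET.tostring(root, encoding="unicode")

def import_translation(xml_text):
    """Read a returned file back into {text_id: translated_text}."""
    root = ET.fromstring(xml_text)
    return {unit.get("id"): (unit.findtext("target") or "") for unit in root.iter("unit")}
```

The CMS only needs these two routines; everything between export and import happens in the translator's own tools.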

2. Be careful using a cookie-based language selector

We have been asked by many companies to optimise their sites for search engines in other languages, only to find that the language selector relies entirely on a cookie. This seems like a good idea at the time: the user comes to the site, sees a screen that allows them to set the language in which they would like to view the site, and then they proceed through the site with the cookie set to their preferred language. Any time they revisit the site the pages are served in their chosen language.

This all works fine, as long as the URLs for the language pages are unique to that language. For example, if the English product page is www.yoursite.com/products, and the French products URL is exactly the same (but serves up French content if the user has the language cookie set), then you will never get listed on Google France. The Googlebot does not have a language cookie, so it must be able to spider your language pages without first selecting a language through your cookie system.

Another problem is that, even if Google could spider the pages, any inbound links to the French pages (essential for multilingual SEO) would only serve to boost the English ranking, because the URLs, when followed in the absence of the cookie, serve up English pages.
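One way to avoid the problem is to let the URL alone decide the language of a page, and to use the cookie only to choose where a returning visitor lands. The sketch below assumes a hypothetical scheme in which English pages have no prefix and other languages live under a two-letter directory; the language list and function names are illustrative only.

```python
PREFIXED = {"fr", "de"}   # hypothetical languages served from /fr/ and /de/
DEFAULT = "en"            # English pages carry no prefix in this sketch

def language_from_path(path):
    """Return (language, remaining_path) using only the URL, never a cookie."""
    parts = path.lstrip("/").split("/", 1)
    if parts[0] in PREFIXED:
        rest = "/" + (parts[1] if len(parts) > 1 else "")
        return parts[0], rest
    return DEFAULT, path

def landing_redirect(cookie_lang):
    """A cookie may pick the landing page, but every page keeps its own URL."""
    return f"/{cookie_lang}/" if cookie_lang in PREFIXED else "/"
```

A crawler that arrives without a cookie still reaches French content at /fr/products, and inbound links to that URL count towards the French page.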

3. Support language switching on every page

The most user-friendly multilingual sites have a language selector on every page. This means that a user can arrive at the page, even if it is a page deep within the site, and will see a language selector. If they choose to switch languages they will not be thrown back to the homepage of that language, but will instead be served the same page in the chosen language.

This feature is not essential, but it provides excellent usability. More importantly, in my opinion, building it enforces disciplines that help at every stage. If the site is built like this from the ground up, most of the other elements will fall into place.

Note that not every page will have an equivalent in all other languages – see below.
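A per-page language selector needs some record of which URL represents the same page in each language. A minimal sketch, assuming a hypothetical page registry keyed by an internal page id (the ids and paths are invented for illustration):

```python
# Hypothetical page registry: page id -> {language: URL path}
PAGES = {
    "products": {"en": "/products", "fr": "/fr/produits"},
    "uk-news": {"en": "/news"},  # no French equivalent exists
}

def switcher_links(page_id, languages=("en", "fr")):
    """Return {language: URL or None} for the selector shown on one page."""
    urls = PAGES[page_id]
    return {lang: urls.get(lang) for lang in languages}
```

A None entry tells the template that the page has no equivalent in that language, so the selector can handle that case explicitly instead of sending the user to the homepage.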

4. Consider non-translated sections

There will be sections or pages of the site that will not have an equivalent section in every language. These may be news pages where the English news is not relevant to other markets, or it may be that certain products or services are not sold in other markets.

The CMS needs to be able to deal with this situation. If a user is viewing an English product page and then swaps to French, where no French page exists, the user needs to see a message in their language explaining the situation.

If browsing through the French product pages, the user should not see menu items or links to products that do not exist in their market.

This requires good planning and structure, but considering these situations up front will avoid costly problems later.
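The two behaviours described above, filtering menus by market and explaining a missing page in the user's own language, might be sketched as follows. The catalogue entries, market sets and message wording are all hypothetical.

```python
# Hypothetical catalogue: each item lists the languages it is published in
CATALOGUE = [
    {"id": "camera-x1", "markets": {"en", "fr"}},
    {"id": "uk-warranty", "markets": {"en"}},
]

# Notice shown in the language the user switched to
NOT_AVAILABLE = {
    "en": "This page is not available in English.",
    "fr": "Cette page n'est pas disponible en français.",
}

def menu_items(lang):
    """List only the items that exist in the user's market."""
    return [item["id"] for item in CATALOGUE if lang in item["markets"]]

def resolve(page_id, lang):
    """Return ('page', id), or ('notice', message) when no equivalent exists."""
    for item in CATALOGUE:
        if item["id"] == page_id and lang in item["markets"]:
            return ("page", page_id)
    return ("notice", NOT_AVAILABLE[lang])
```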

5. Have localisable SEO-friendly URLs

SEO-friendly URLs include strategic keywords in the directory structure or filename. An example of an SEO-friendly URL is:

www.yoursite.com/digital-cameras/kodak-x1.aspx

as opposed to a non-SEO-friendly URL such as:

www.yoursite.com/products.aspx?cat=1&subcat=9&prod=3

The URLs are often forgotten in the localisation process. There are two problems with this:

  • Firstly, the French user will see immediately that this is an English site. This can be very detrimental, depending on the market.
  • Secondly, the keyword-friendly URLs will not help the SEO effort in the foreign markets.

Therefore, if SEO-friendly URLs are going to be used, make sure they are translatable along with every other piece of content. If this is not possible, or is too complex for your development, stick with non-friendly URLs. This way, you will not put off users in other countries.
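Treating the URL keyword as translatable content can be as simple as storing one slug per language against an internal id. In the sketch below the slug table, the category id and the French slug are invented for illustration.

```python
# (language, localised slug) -> internal category id; all values are made up
SLUG_TO_ID = {
    ("en", "digital-cameras"): 9,
    ("fr", "appareils-photo-numeriques"): 9,
}
# Reverse lookup used when generating links
ID_TO_SLUG = {(lang, cat_id): slug for (lang, slug), cat_id in SLUG_TO_ID.items()}

def category_url(lang, cat_id):
    """Build the localised URL for a category."""
    prefix = "" if lang == "en" else f"/{lang}"
    return f"{prefix}/{ID_TO_SLUG[(lang, cat_id)]}"

def category_from_url(lang, slug):
    """Map an incoming localised slug back to the internal id (None if unknown)."""
    return SLUG_TO_ID.get((lang, slug))
```

Because the business logic only ever sees the internal id, the slugs can be sent for translation like any other text.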

6. Make sure you can split to local domains in the future if you want to

If the language sites do well, site owners will demand more. This will mean setting up an office in France, getting French speakers on board, and will almost certainly mean presenting the site on a ‘.fr’ domain.

Make sure this does not cause a rewrite. If you initially build the site so that the French pages are served up from a /fr directory (such as www.yoursite.com/fr/page.html) then this switch will probably be fairly easy. Be ready for the switch!
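One way to stay ready is to keep the base URL of each language in configuration and build every absolute link from it, so that a later move to a local domain is a one-line change. The configuration table and the www.yoursite.fr domain below are assumptions for illustration.

```python
# Hypothetical configuration: one base URL per language
BASE_URLS = {
    "en": "https://www.yoursite.com",
    "fr": "https://www.yoursite.com/fr",
}

def absolute_url(lang, path, base_urls=BASE_URLS):
    """Join the language's base URL and a page path with exactly one slash."""
    return base_urls[lang].rstrip("/") + "/" + path.lstrip("/")

# After the move, only the configuration changes:
AFTER_MOVE = dict(BASE_URLS, fr="https://www.yoursite.fr")
```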


7. Get the Database Structure Right

The structure of the CMS database must support multiple languages. Every text item that appears on screen must be held in a multilingual structure. This can be done in a number of ways: the entire database can be replicated for each language, additional fields can be added to every table that holds text, or one of many other options can be used.

We would always recommend the separation of all text into a single table. This will mean that the language processing for the site becomes very easy and this will bring massive rewards. Ongoing updates will be easy from a programming perspective, and localisation of a single table will be easier. The structure of this is below:

Before Internationalisation:

[Figure: database structure before translation]

After Internationalisation:

[Figure: database structure after translation]

The Text Table is a new table and will contain all translatable text within the site. All text from the database is extracted out to this table. A single set of routines then calls the text (depending on the language of the user) to display on screen, and a single set of routines manages the translation process.
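A minimal sketch of such a structure, using an in-memory SQLite database. The table and column names (product, site_text, name_text_id) are hypothetical, as is the fallback to the source language when a translation is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Business tables hold ids only, never display text
    CREATE TABLE product (id INTEGER PRIMARY KEY, price REAL, name_text_id INTEGER);
    -- One row per text item per language
    CREATE TABLE site_text (
        text_id INTEGER NOT NULL,
        lang    TEXT NOT NULL,
        content TEXT NOT NULL,
        PRIMARY KEY (text_id, lang)
    );
    INSERT INTO product VALUES (1, 19.99, 100);
    INSERT INTO site_text VALUES (100, 'en', 'Trousers'), (100, 'fr', 'Pantalon');
""")

def get_text(text_id, lang, fallback="en"):
    """Fetch text in the user's language, falling back to the source language."""
    row = conn.execute(
        "SELECT content FROM site_text WHERE text_id = ? AND lang IN (?, ?) "
        "ORDER BY lang = ? DESC LIMIT 1",  # prefer the requested language
        (text_id, lang, fallback, lang),
    ).fetchone()
    return row[0] if row else None
```

With this layout, adding a language means adding rows to one table, and a translation export only ever has to read that table.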

8. Separation of Translatable Content from Business Logic

It is important that the translatable content is separated from the business logic of the content management system, since this allows a single set of code to be maintained for all languages. This aids version control and maintenance of the site.

All displayable text must appear in the database. The programs (ASP, PHP, or other pages) should contain the business logic and display mechanism only. If any text remains in pages, it must be extracted into either the database or into resource files. Note that some text (certain error messages) may not require localisation.

9. Localised Formats

Items such as dates, currencies and number separators are formatted differently in different locales. Note that Microsoft and other development system vendors provide facilities to handle the formatting of dates and numbers in the session variables. It is important to get this right because credibility is a major factor affecting conversion rates on websites. In the UK, the message ‘Buy Now Risk Free for only £7.999,00’ does not look right and so damages credibility. Wrong formatting in other languages is just as likely to create the wrong impression and deter conversion.
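To make the difference concrete, the sketch below formats the same amount and date under two sets of conventions. The convention table is hand-written and hypothetical, and no currency conversion is implied; a real site would normally use the locale facilities of its platform.

```python
from datetime import date

# Hypothetical convention table, for illustration only
CONVENTIONS = {
    "en-GB": {"thousands": ",", "decimal": ".", "price": "£{}", "date": "%d/%m/%Y"},
    "de-DE": {"thousands": ".", "decimal": ",", "price": "{} €", "date": "%d.%m.%Y"},
}

def format_price(amount, locale_code):
    """Format an amount with the separators and symbol position of the locale."""
    conv = CONVENTIONS[locale_code]
    whole, fraction = f"{amount:,.2f}".split(".")
    number = whole.replace(",", conv["thousands"]) + conv["decimal"] + fraction
    return conv["price"].format(number)

def format_date(value, locale_code):
    """Format a date using the day/month order and separator of the locale."""
    return value.strftime(CONVENTIONS[locale_code]["date"])
```

Here format_price(7999, "en-GB") gives £7,999.00, while the "de-DE" conventions give 7.999,00 €.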

10. Program Support for Alternate Character Sets

If any of the business logic of the site is looking for user input, the resulting character parsing should not assume that Latin script will be entered. The site could be deployed in Japan, so the logic must deal with Japanese input if required.

If the site contains a search mechanism, this must work in alternate character sets. Multiple word searching may require segmentation of Japanese (and other) characters so that the user gets the desired results. Third party tools are available to help with such issues.

Most systems support Unicode now, so it is always advantageous to use the search facilities of third party tools or the database server. Using home-programmed search tools may cause problems.
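The kind of Latin-only assumption to look for is shown below, next to a script-neutral alternative. Both function names are hypothetical, and the rule itself (letters and spaces only) is deliberately simplistic; real name validation would also need to allow hyphens, apostrophes and similar characters.

```python
import re
import unicodedata

def is_valid_name_latin_only(name):
    """Typical Latin-only check: rejects Japanese, accented letters and more."""
    return re.fullmatch(r"[A-Za-z ]+", name) is not None

def is_valid_name(name):
    """Script-neutral check: any Unicode letter or whitespace is accepted."""
    name = unicodedata.normalize("NFC", name)  # compose accents into single characters
    return bool(name.strip()) and all(ch.isalpha() or ch.isspace() for ch in name)
```

The first function rejects a name such as 山田太郎, while the second accepts it and still rejects input containing digits.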

11. Remove All Text-Based Logic

Within the program there may be certain software switches that are based on text items. An example of this would be:

If ProductType = "Trousers" Then
    Display("Please enter the leg measurement of your " & ProductType)
End If

This trivial example shows how a selection can be made on an item of text, and how that same item of text may then be displayed to the client. Apart from being bad programming practice, this means that translating ProductType would break the switch, because the translated value would no longer match the hard-coded English string.

These switches must be removed from the code.
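A common replacement is to switch on a language-neutral code and to look up the display text separately. The sketch below assumes hypothetical product codes and lookup tables, and the French wording is illustrative only.

```python
# Stable, language-neutral codes drive the logic
NEEDS_LEG_MEASUREMENT = {"TROUSERS"}

# Display text lives in translatable tables keyed by code and language
PRODUCT_NAMES = {
    "TROUSERS": {"en": "trousers", "fr": "pantalon"},
    "SHIRT": {"en": "shirt", "fr": "chemise"},
}
PROMPTS = {
    "en": "Please enter the leg measurement of your {name}",
    "fr": "Veuillez indiquer la longueur de jambe de votre {name}",
}

def measurement_prompt(product_code, lang):
    """Return the localised prompt, or None if no measurement is needed."""
    if product_code in NEEDS_LEG_MEASUREMENT:
        return PROMPTS[lang].format(name=PRODUCT_NAMES[product_code][lang])
    return None
```

Translating the product names or the prompt no longer affects which branch the code takes.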

About the Author

Ian Harris has worked in website internationalisation and multilingual search engine optimisation for the last ten years. Prior to that he programmed e-commerce applications. He has helped many companies take their web presence abroad, including British Airways, IBM, Novell and HSBC. He is the founder and CEO of Search Laboratory, a multilingual search engine marketing company.