Black and White Hat Website Design

Anybody with a website will have received emails from people with the mysterious title SEO. SEO stands for Search Engine Optimiser. Some of the emails will inform you that your site could be at the top of Google within days with the paid help of said SEO. There are two ways to do this. The first is to cheat or, in the parlance of the web, adopt Black Hat design techniques. Since cheating is always more fun than doing things properly we will look at Black Hat techniques first and White Hat techniques later.

Black Hat techniques are designed to persuade search engines that the subject matter on a site is something other than that displayed to the viewer and that the site is more extensive and more important than it really is. The search engines depend on a number of parts of the site to determine what the site is about, for example the text, titles, image names, internal links, links to other sites and those mysterious meta tags. The idea is that search terms, the key words and key phrases visitors use to find the site, are prevalent in each part. The search engines determine how important the website is to viewers by the number of links to it from other sites and how relevant those links are to the content of the sites they connect. They also keep a count of how many visitors the site has: the more visitors, the higher its status.

Black Hat techniques include devising search words and terms that will draw the search engines to the site but that are not relevant to the site. There are a number of ways of introducing bogus search terms into web pages so that they are not visible to the viewer. For instance if the search term text is the same colour as the background colour it will be invisible. These search terms are not relevant to the content of the page or site on which they are hidden; if they were they would be visible on the page and probably within a title or subtitle. They are however visible to the search engine robots that trawl around the net.
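As a rough illustration of how a robot might catch the same-colour trick, the sketch below flags text inside any element whose inline style gives the text and the background the same colour. It is a simplification under stated assumptions: real pages usually set colours in stylesheets, which this does not follow, and it compares colour values as written rather than resolving names such as 'white'. The names find_hidden_text and HiddenTextFinder, and the sample page, are invented for the example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are kept off the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def style_to_dict(style):
    """Turn 'color: #fff; background-color: #fff' into a dictionary."""
    pairs = (item.split(":", 1) for item in style.split(";") if ":" in item)
    return {name.strip().lower(): value.strip().lower() for name, value in pairs}


class HiddenTextFinder(HTMLParser):
    """Collects text whose inline text colour equals its inline background colour."""

    def __init__(self):
        super().__init__()
        self._stack = []        # one True/False per open element: is it hidden?
        self.hidden_text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = style_to_dict(dict(attrs).get("style") or "")
        colour = style.get("color")
        background = style.get("background-color")
        self._stack.append(colour is not None and colour == background)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text counts as hidden if any enclosing element is hidden.
        if any(self._stack) and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html):
    finder = HiddenTextFinder()
    finder.feed(html)
    return finder.hidden_text


# Hypothetical page: the second paragraph is white text on a white background.
PAGE = ('<body style="background-color: #ffffff">'
        '<p>Holiday cottages in Spain</p>'
        '<p style="color: #ffffff; background-color: #ffffff">cheap loans casino</p>'
        '</body>')
print(find_hidden_text(PAGE))
```

A viewer of the sample page would see only the first paragraph, while the check reports the second.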

Key word or key phrase stuffing is another way to persuade search engines that a site contains more than it does. Search engines expect key words and key phrases to appear in pages on a website in various places, as explained above, and, to a certain extent, words and phrases that appear a number of times are considered important. Any designer will put key words and phrases in the visible text of the pages and in the meta tags, which are not visible to the viewer. Black Hat designers go overboard: they load the meta tags with repeats of the key words and phrases and stuff the page with the same words and phrases, hidden from view.

Another Black Hat technique uses 'doorway' or 'gateway' pages. These are pages that contain irrelevant search terms together with a piece of code that redirects the viewer immediately to the target site. A viewer never sees the doorway page because they land on it and immediately jump off to another page, the target.
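The redirecting code can take several forms. One common form is a 'meta refresh' tag with a delay of zero seconds, and the sketch below looks for that form only. A redirect written in JavaScript would slip straight past it. The function name instant_redirects, the one second cut-off and the example address are assumptions made for the illustration.

```python
from html.parser import HTMLParser


class RefreshFinder(HTMLParser):
    """Records every <meta http-equiv="refresh"> tag as (delay, target)."""

    def __init__(self):
        super().__init__()
        self.redirects = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "meta" or (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/".
        delay, _, rest = (attrs.get("content") or "").partition(";")
        target = rest.strip()
        if target.lower().startswith("url="):
            target = target[4:].strip()
        try:
            self.redirects.append((float(delay), target))
        except ValueError:
            pass  # no usable delay, so ignore the tag


def instant_redirects(html, max_delay=1.0):
    """Return the targets of refresh tags that fire within max_delay seconds."""
    finder = RefreshFinder()
    finder.feed(html)
    return [url for delay, url in finder.redirects if url and delay <= max_delay]


# Hypothetical doorway page.
DOORWAY = ('<html><head>'
           '<meta http-equiv="refresh" content="0; url=http://example.com/target">'
           '</head><body>cheap loans casino</body></html>')
print(instant_redirects(DOORWAY))
```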

Another cheat is called link farming. Link farms are websites that consist primarily of links to other sites. The most famous was millionpixels.com where the single page contained thousands of minute images, each one linked to a site. Businesses paid 1 dollar per image. The sheer number of diverse businesses represented on the site meant that any search was likely to result in a page one link to millionpixels.com. The worst link farms now have fewer links but charge more for the doubtful privilege of being on them.

The trouble with cheats is that the search engine robots are becoming very clever and can identify websites that employ them. They can detect hidden text, keyword stuffing, doorway pages and link farms. What they then do about it is another matter. Enter Google Caffeine.

Google Caffeine is a new breed of search engine that is being introduced in January 2010. It has actually been around for some time in test mode, gathering information from all over the Internet. The process is complicated but all website owners need to know is that if their site uses cheats and Google Caffeine finds them they will be penalised. At best your site will drift down the rankings, from Page 1, to Page 2, to Page 3 to oblivion. At worst you will be dropped altogether.

The second way to emerge at the top of Page 1 rapidly is actually encouraged by Google because they benefit financially from it. It is called Sponsored Links. Basically the website owner hands Google large sums of money and tells Google how much they will pay every time somebody clicks on their link. Google for their part ensures the link appears at the top of Page 1 in response to all relevant searches. Until the money runs out, that is. Then you either hand over some more money or disappear from sight. Believe it or not, many businesses actually benefit from Sponsored Links because they pay Google less than they make from the extra visitors to their sites. The problem here is that the SEO may not have optimised your site at all.

The question is, how does a website owner tell whether his site is designed and optimised properly or not?

The easiest way to take a first look is to open the site in Internet Explorer, the browser used by most people to see the Internet, and examine the blue stripe at the very top of every page. It should give a brief summary of the subject of the page, and those words and phrases should appear within the page the viewer sees.
I have chosen 'Black and White Hat Website Design | Keyword Stuffing | Gateway Pages | Link Farming' as the title and the same for the description.

With any page of a website open, click on the page with your right mouse button and then click on View Source with your left mouse button.
You will see the code that goes to make the page itself together with the contents of the page.

Title Tags

Towards the top of the page of code you should see a title tag and a description tag holding the text described above.

Anything between a < and > is called a tag.

If these particular tags are not there, or are empty, or contain more than one reference to keywords and key phrases, or contain keywords and phrases that do not appear in the text on the page, then you have been the victim of, at worst, a Black Hat designer or, at best, an incompetent optimiser.
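The same inspection can be roughed out in code. The sketch below reads the title tag and the description meta tag from a page and lists any phrase in the title that does not appear in the text on the page. It assumes the title separates its phrases with a | character, as the title chosen for this article does, and it treats everything inside the body as visible text, which a real robot would not. The names check_head and HeadReader and the sample page are invented for the example.

```python
from html.parser import HTMLParser


class HeadReader(HTMLParser):
    """Picks out the title, the description meta tag and the body text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.body_text = []
        self._in_title = False
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_body:
            self.body_text.append(data)


def check_head(html):
    """Report the title, the description and title phrases missing from the page."""
    reader = HeadReader()
    reader.feed(html)
    # Lower-case the body text and collapse runs of white space.
    text = " ".join(" ".join(reader.body_text).lower().split())
    phrases = [p.strip() for p in reader.title.split("|") if p.strip()]
    missing = [p for p in phrases if p.lower() not in text]
    return {"title": reader.title.strip(),
            "description": reader.description.strip(),
            "missing_from_page": missing}


# Hypothetical page using the title chosen for this article.
SAMPLE = """<html><head>
<title>Black and White Hat Website Design | Keyword Stuffing | Gateway Pages | Link Farming</title>
<meta name="description" content="Black and White Hat Website Design | Keyword Stuffing | Gateway Pages | Link Farming">
</head><body>
<h1>Black and White Hat Website Design</h1>
<p>Keyword stuffing and gateway pages are common cheats.</p>
</body></html>"""
print(check_head(SAMPLE)["missing_from_page"])
```

In the sample the page never mentions link farming, so that phrase is reported as missing.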

With the introduction of the latest Google Caffeine search algorithms and the increasing communication between search engines (Open Directory, for example, shares its database with Google, Yahoo and others), it has become apparent that websites that have employed Black Hat design techniques are being penalised by being dropped through the search rankings, the equivalent of a slow cyber death. This month we look at essential elements of White Hat website design.

A website has to have a purpose. It may inform or educate or offer services or products. Whatever else it may be, for a business, the website is often the first glimpse of the business that a potential customer sees. It is the shop window through which customers look before deciding whether or not to enter. The website should reflect the type of business you are operating. It should look professional, be easily understood, easy to navigate and fast to load. Nobody likes waiting in a queue. Ideally it should offer something new periodically to entice customers back and it should be focused on its purpose, not drifting off into irrelevant blind alleys. If these characteristics sound uncannily like the design features looked for by a human, they are. They are also the characteristics looked for by search engine robots, small programmes that trawl the Internet visiting websites. How they identify good sites changes weekly, daily sometimes, as each search engine refines the algorithms used by its robots. Increasingly this process is automated in the sense that the robots learn from past experience. Just as a human can put a hand in a fire and decide never to do that again, so the robots can visit a website and have such a bad experience that they 'decide' never to go there again.

Although all this sounds complicated and likely to keep a web designer in permanent employment, changing the site daily to keep up with the demands of the robots, that is not the case. Since the robots are looking for exactly the same design features as humans, a site that is initially built to provide a good experience for humans will also be good for the human-emulating robots.

One of the criteria looked for by robots is the number of people who visit the site. The more visitors there are, the longer they stay on the site and the more frequently they return, the more relevant the site is judged to be. There are three stages to consider, bearing in mind that, apart from people who have been told directly that the site exists, nobody knows the site is there. The first stage is getting people to the site, the second stage is keeping them on the site and the third stage is persuading them to return.

Most people find sites through searching. To search the web they type in keywords and phrases. Those keywords and phrases are matched against the database that stores information brought back by the robots. Websites that contain those keywords and phrases are listed. The order of the list is called priority, and that priority is decided by the search engine. This is a critical stage. The robots must have found the keywords and phrases in the first place, in various parts of the site, for instance:

The page title, visible in the blue bar right at the top of a web page.
The page description, not visible to the viewer of a web page but often seen in the search results page.
In image alt tags, not seen directly but appear as a small window when the mouse cursor hovers over an image on a web page.
In image names, not seen directly but can be found by hovering the cursor over an image and clicking the right mouse button and then the word Properties.
In page names, seen in the address bar.
In internal hypertext links, text or images on a page that take you to other pages on the same site.
In external hypertext links, text or images on a page that take you to another site.
In the title of all hypertext links, not seen directly but appear as a small window when the mouse cursor hovers over a link image or text.
In paragraph headings visible on the page.
In bold text visible on the page.
In the body text of paragraphs on the page.

The robots, however, are not stupid. They look at keywords and phrases as a percentage of all the text on the page. It is essential to achieve a balance between emphasis and the technique described earlier, 'keyword stuffing'. A new site that displays White Hat design techniques can appear high on the priority list in a relatively short space of time because it will have gained priority over sites that do not.
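The percentage idea is easy to try on your own text. The sketch below counts how many of the words on a page belong to occurrences of a given phrase and expresses that as a percentage of all the words. What percentage tips a page over from emphasis into stuffing is not stated here, so no threshold is built in. The function name keyword_density and the sample sentence are invented for the example.

```python
import re


def keyword_density(text, phrase):
    """Percentage of the words in text that belong to occurrences of phrase."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    size = len(target)
    # Slide a window of the phrase's length along the page and count matches.
    hits = sum(1 for i in range(len(words) - size + 1)
               if words[i:i + size] == target)
    return 100.0 * hits * size / len(words)


# Hypothetical text: 'cheap flights' accounts for 4 of the 9 words.
SENTENCE = "Cheap flights to Spain and cheap flights to Italy"
print(round(keyword_density(SENTENCE, "cheap flights"), 1))
```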

The robots must also have found the visit 'pleasurable' as well as informative. Here the robots actually look at the design of the website pages to try to determine whether or not it will be a pleasant experience for a human visitor. Now this sounds impossible but here is a list of the design features, most of them faults, that robots actually do look for:

Using many sizes of text on the same page. Makes the page difficult to read and looks infantile to humans.
Using many font colours on a page for the same reasons.
Misspelling.
Long or unusual words.
Extraneous coding. Code that is not necessary often left behind by pages created using a free 'web design programme'.
Repetitive code. Usually the code that determines the 'style' of the page or the colour and size of text or how images appear and links work can be inserted using a technique called 'including' that reduces potentially hundreds of lines of repetitive code to a single line.
The speed at which a page loads. This is determined by the number of lines of code, the number and file size of images, and the web server on which the website resides.
Impenetrable code. Code that does not conform to the compliance rules, of which there are thousands, and, horror of horrors to many space age designers, anything built entirely in Flash, typically those intro pages that have wonderful swirling imagery that robots cannot penetrate to see what is behind.
Navigation. The ease with which the robots, and ultimately humans, can travel between pages without getting lost.

Websites that load rapidly automatically receive a higher priority than slower websites. Sites with good navigation also score higher than sites in which the robots, and humans, get lost. The robots also look at the other design criteria and allocate an 'age to read' score. Ideally the site should be easy to read by all visitors. On average a good age to read score is between 10 years and 15 years. That is tough if your site is about nuclear physics but fine for the majority. Again, even a new website can immediately achieve higher priority than its older peers by employing White Hat design.
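How the robots arrive at an 'age to read' score is not explained here. As a stand-in, the sketch below uses a published readability formula, the Automated Readability Index, which needs only the average word length and the average sentence length. The index gives an American school grade, and adding 5 to turn a grade into an age is a rough assumption of this example. The function name reading_age and both sample texts are invented.

```python
import re


def reading_age(text):
    """Estimate a reading age in years from the Automated Readability Index."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word")
    # Count letters and digits only, so apostrophes do not lengthen a word.
    characters = sum(len(word.replace("'", "")) for word in words)
    grade = (4.71 * characters / len(words)
             + 0.5 * len(words) / len(sentences)
             - 21.43)
    return grade + 5  # assumption: school grade N is roughly age N + 5


# Hypothetical samples: short plain words against long technical ones.
SIMPLE = "The cat sat on the mat. The dog ran to the park."
DENSE = ("Thermonuclear fusion necessitates extraordinarily elevated "
         "temperatures notwithstanding electromagnetic confinement considerations.")
print(reading_age(SIMPLE) < reading_age(DENSE))
```

Very short, simple samples can score below any real school age. The figure is best used to compare one draft of a page with another, not as an absolute measure.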

Having created a page that achieves lots of hits, it would be tempting to use that fact to make money by changing the content on the page to display information that you planned to display in the first place but which is, say, illegal, or a straightforward sales pitch for a product or service that is irrelevant to the initial subject of the page. This is a strict no-no and will result in that page or site being dropped very quickly.

Let us return to White Hat techniques. The search engine robots are looking for exactly the results that you, the site owner, want. You want people to stay on the page long enough to absorb whatever information is on it, a property called 'page stickiness', and you want them to revisit the page.

Think of a non-fiction book. You initially find the book in the library, bookshop or online bookstore by searching for the subject material of the book. If the contents of the book interest you then you will read it from cover to cover. You may find references or facts within it that you know will be useful in the future, so you put bookmarks between the pages to make it easy to return. If the information becomes outdated then you may well go through that process again in order to obtain the latest edition. The second search for edition 2 is much easier because, in addition to the subject matter, you also now know the author and title of the book. Is that not exactly how you want people to use your website?

Initial content then is important. It must be readable, interesting to the chosen audience and contain information that they did not previously know. It should also be new content, not regurgitated from another website. As this article was being written a newsletter arrived from a respected designers' forum. Since search engine robots note the time and date of 'first discovery' they can also detect copycats. They in effect put their own copyright on the material they find first. Sites that display the same information at a later date will be penalised. The answer is simple and a White Hat optimisation technique in any case. Refer to related data via a link from your page to the original data, exactly as happens with references in non-fiction books. Wikipedia, by the way, is recognised as the ultimate linking machine both internally and externally and that is one very good reason why it achieves very high rankings in searches.
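How the robots spot a copycat is not explained here. One textbook way of measuring how much two pieces of text overlap is to break each into overlapping runs of a few words, often called shingles, and compare the two sets. The sketch below does that and returns a score between 0 (nothing shared) and 1 (identical wording). It is an illustration of the general idea and not a description of any search engine. The function names and both sample texts are invented.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping runs of `size` words in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=3):
    """Shared shingles divided by all shingles (the Jaccard index)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical texts.
ORIGINAL = "The quick brown fox jumps over the lazy dog"
UNRELATED = "Search engines reward original content every time"
print(similarity(ORIGINAL, ORIGINAL), similarity(ORIGINAL, UNRELATED))
```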

If well written the initial content will automatically contain the key phrases that people will be searching for. It is well worth jotting these down before you start. Even better ask somebody else to write them down since you will already have preconceived ideas of what the key phrases should be. Similarly have somebody proof the entire content, not just for spelling and grammatical errors but for understandability. Do not forget you may be an expert in your subject whilst your audience is not. What makes perfect sense to you may be nonsense to others. Remember the 'age to read' score discussed last month.

Having uploaded the site or page the next step is to get people on it. If the content conforms to the rules suggested here the search engines will help enormously but there is plenty you can do yourself. You need to tell people the content is there. You can send out emails and use social sites like Facebook and Twitter. If it is new and relevant you can even consider having a précis or selected parts of the content published in trade journals, magazines and newspapers. Make sure you refer to your website for further information.

The next task is to have people return to the site or page once they have found it. This is often considered more difficult but in reality it is not. Think about the learning process. At primary school you were taught definitive facts. 'The Battle of Hastings was in 1066'. That is a fact. Once you know it you no longer need to remember how you came about that information. In secondary school you delve a little deeper into any subject. You will learn that William the Conqueror invaded England and defeated King Harold in battle so changing the course of history. Again, once you know the set of facts you have little need to remember how the information came to be in your brain in the first place. By the time you get to university however things have changed. You will be adding to your store of core knowledge by examining previously held concepts. Why did William invade England? Why did Harold lose the battle? If the result had been reversed what would the future have held? In order to answer those questions you may have to consult small parts of a large number of books. The trick now is to remember not what the books said but how to return to that particular set of facts that allowed you to assemble your arguments. That is what you are trying to replicate on your website.

You want just enough in the initial content that people put the site or page into their favourites or bookmarks rather than try to remember every word and argument. Support your facts with an argument and a conclusion. People will remember the facts but not how you reached your conclusion. If appropriate you can even generate Internet debate, and therefore an increase in use of your keywords and phrases, by making your conclusion controversial, but not so controversial that you will be considered a nutter. Finally you can leave your reader with a 'draw'. Let them know that if they return next week or month they will be able to read the next instalment.

New content should be added to existing content. Remember that the search engines will have constructed indexed links to the original content. If the original content disappears then over time so do the links. On the other hand the robots like to see new content periodically. They consider that another sign of a good, reputable, worthy site. Very good sites with regular new content often benefit from a search result that shows the main or first indexed content and, below, second references or even third. Then you know you have a White Hat site.

Article written by Nick Nutter December 2009 to February 2010

For more information on Search Engine Optimisation and WebSite Design
contact (+ 34) 951 276 283