Archive for the 'Web Development' Category

Web Developers Available and A Junior Web Vacancy

As you might expect, I have a lot of friends who work in the web industry, from programmers to graphic designers, SEO experts to pay per click wizards. In a strange and unconnected coincidence, I have three different friends all with a pretty high degree of experience looking for new opportunities in the web development arena.

So, if you’re looking for a web developer to join your team right now, drop me an email and I’ll put you in touch. All three have more than six years’ experience (some many more), and by and large they are front-end developers with some scripting skills, so they can do graphic design, layout and HTML/CSS, and implement some JavaScript, ASP, PHP etc. They have a fair bit of management and client liaison experience too, so these aren’t techies with no social skills.

I also heard from a client today looking for a junior web manager, someone who can do a bit of graphics manipulation, HTML, SEO, AdWords etc. Some of that can be provided as training; the key is that you understand the web and have some basic skills to start with. Again, if you know someone, drop me an email.

Solving Website Structural Problems With The Canonical Tag

Long time no blog! I hope you all had a good festive season. I thought I would kick off the new year with a technical post, as Google announced cross-domain support of the canonical tag last month (worth reading for the explanations of when you might want to use it and how to implement it).

You may remember from my earlier post on the canonical tag, that it is a way of telling the search engines the “master” address of a page, when multiple addresses for the same content might exist. Why would you have multiple addresses (URLs) for a page, you might wonder? Well, how about a product list on an e-commerce website with options for ordering the products alphabetically, by price or by manufacturer? It’s likely that the URL will be different in some way for each version of the list, even though its contents are actually the same. That means that a search engine will index all three versions (or possibly six if you have reverse-order options too).

Why the problem? Well, you probably want visitors to see that list in a certain order the first time they visit, let’s say ordered by price, cheapest first. If Google has all six versions of that page in its database, what’s to say it won’t link to your price: descending (i.e. most expensive first) list from its search results? That might make you look expensive and put off potential buyers.

The other issue is link juice – with multiple addresses for the same page, you might have some links to one URL, some to another, all essentially to the same page but for Google, they are different pages. That means the link juice is being split between those different versions of the page. So, using the rel=canonical tag, you can tell Google what the master version of the page is and that therefore, all link juice should be applied to that version and that’s the one that should appear in search results.

This is what it looks like:

<link rel="canonical" href="http://www.website.com/category/product-page">

It goes in the <head> section of each version of the page, so in the product list example, your page would contain the above code regardless of what version is being displayed at the time. This would probably be done automatically by your content management system, so that when a different category of products is being displayed, the canonical tag references the correct category/product list, because it’s likely the same page template is used for all categories.
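
As a rough illustration of how a page template might work out the tag automatically, here is a minimal sketch in JavaScript. The parameter names "sort" and "order" are assumptions made for the example (your own site's ordering options will be named differently), and the function names are hypothetical rather than part of any particular content management system.

```javascript
// Hypothetical query parameters that only change the ordering of the
// product list, not its contents. Adjust to match your own site.
const SORT_PARAMS = ["sort", "order"];

// Strip the ordering parameters (and any fragment) so that every
// version of the list maps to the same "master" URL.
function canonicalUrl(requestUrl) {
  const url = new URL(requestUrl);
  for (const name of SORT_PARAMS) {
    url.searchParams.delete(name);
  }
  url.hash = "";
  return url.toString();
}

// Build the tag to place in the <head> section of the page template.
function canonicalTag(requestUrl) {
  const href = canonicalUrl(requestUrl)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;");
  return `<link rel="canonical" href="${href}">`;
}
```

Whichever of the six orderings the visitor (or Googlebot) requests, the template would then output the same tag for that category.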

As far as search engines are concerned, the canonical tag works much like a 301 redirect (visitors themselves are not redirected, and it is treated as a strong hint rather than a command), but without you having to mess around with server settings. What changed in December is that you can now make cross-domain (i.e. cross-website) canonical tags, whereas before you could only use the tag within one domain. So, even those of you with problematic servers (for example, shared Windows hosting without access to IIS Admin) can now create “301”-style redirects, avoiding duplicate content issues.

As noted by Rand, there is no problem having the canonical tag in the “master” page.

On-Site Search Box Text Confuses Users

Do you have a search box on your website that contains a phrase like “Enter Search” or similar? Are you using Google Analytics to track Site Search?

I’ve noticed this on several sites for a while, so thought I would post about it. In most cases where there is text already in the search box, instructing the user what to do, that text tops the list of keywords searched for on the site. Take this example:

[Screenshot: the site’s page, with the search box in the top left]

As you can see, the search box in the top left of the page has the text “Keyword/Code Search…” inserted by default, and it disappears when you click in the box. Can you guess what the most popular keyword used to search on the site is?

[Screenshot: Site Search keywords report in Google Analytics]

Yep, “Keyword/Code Search…” by a long way! What does that tell us about this use of text in the search box on a website?

My opinion is that it isn’t sufficiently clear to the user what they are supposed to do. So they click the arrow next to the box, expecting it to take them to a full search page, but instead, it gives them the search results from the site for “Keyword/Code Search…” In this site’s case, that gives you a full list of all the products in the catalogue, but not a search page – you get the same box and text again.

My take on this is that designers need to be more instructional about what to do with/how to use the search box on a web page. Just putting “Keyword” in the box is not telling people how to use the function, but “Type what you’re looking for in this box” might just work.

New Google Webmaster Tools Labs Features

Google launched a new Labs section of Webmaster Tools today, containing two features. The first is called Fetch as Googlebot, which shows you the page that Google gets when you enter a URL from your website. Quite handy to see what Googlebot sees, particularly HTTP headers. Here’s a screenshot of the tool showing the 301 permanent redirect from the old holding page to my new homepage on the Keyword Examiner site:

[Screenshot: Fetch as Googlebot in Webmaster Tools]

The other tool reports any Malware found on your site, but I’m happy to report I can’t give you a screenshot from one of my sites for that! ;)

Google Analytics Campaign Tracking vs. ASPX

As a result of the MyDeco experience (see earlier post), we found that the site in question wasn’t recording campaign tracking (although obviously we can see referring websites). In case you’re not aware of Campaign Tracking, there’s a guide here.

MyDeco are keen for retailers to use campaign tracking to ensure more accurate results with a better quality of data. This is usually done by appending “?partner=mydeco” to the end of any link to the retailer, so that it shows up in their logfiles. If you’re using Google Analytics, this won’t do, as Google wants campaign tracking to be done in its own UTM format (see the guide linked above).

So, we tried this with the site in question, which is hosted on a Microsoft IIS server and written in ASP.NET (.aspx). This caused an error – the pages really didn’t like having a query string (everything from the “?” onwards) added to the end of the URL. So, we needed a way to get Analytics to accept an alternative character in place of the “?” and thereby stop the website from throwing errors.

The solution, after some searching, was to use the anchor signifier “#” instead of “?”, which the website is happy to accept. However, you can’t just make campaign URLs with “#” instead of “?”, because by default Analytics won’t know what it means. You need to add this line of code to your Analytics tracking code (the code inserted into every page of your website when you set up Analytics):

pageTracker._setAllowAnchor(true);
 

Find the Google Analytics code in your webpage and insert the line between the tracker setup and the page view call, like this:

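// Create the tracker for your account ID
var pageTracker = _gat._getTracker('UA-xxxxxx-x');
// Let Analytics read campaign values after "#" as well as "?"
pageTracker._setAllowAnchor(true);
pageTracker._trackPageview();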
 

I found this tip courtesy of Digital Notions, so hat-tip to them. :)
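
If you have a lot of campaign links to produce, something along these lines could build them for you. It is only a sketch: the function name is hypothetical and the example values ("mydeco", "referral", "autumn") are made up, although utm_source, utm_medium and utm_campaign are the standard Analytics parameter names.

```javascript
// Build a campaign link that carries the UTM values after "#" rather
// than "?", so the page itself never sees a query string.
function campaignUrl(pageUrl, { source, medium, campaign }) {
  const params = new URLSearchParams({
    utm_source: source,
    utm_medium: medium,
    utm_campaign: campaign,
  });
  const url = new URL(pageUrl);
  url.hash = params.toString(); // becomes the part after "#"
  return url.toString();
}

// Example with made-up values:
// campaignUrl("http://www.website.com/products.aspx",
//             { source: "mydeco", medium: "referral", campaign: "autumn" });
```

Remember that these links only get picked up if the _setAllowAnchor(true) line is in the tracking code on the landing page.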

First Click Free – the solution to Google’s “protected” content problem?

I was discussing the issues around “hidden” or “protected” content with a client yesterday, specifically the problem that as a website owner you want as much content in the search engine’s index as possible, so that your site will be found, but you don’t actually want humans to see it without registering/paying.

This is an issue that has plagued paid-for content sites for years (see Danny Sullivan’s history lesson here). The problem is that whilst there are pretty simple technical ways of letting search engine spiders into your site while keeping out the casual human browser, pretty much any of them constitutes “cloaking” in the eyes of the search engine. If you have a look at Google’s Webmaster Guidelines on the subject, you can understand why this practice is frowned upon – they don’t want users to be taken somewhere they weren’t expecting, as that could severely affect the quality of the user experience and ultimately lead to people using another search engine.

I noticed that Google had made a blog post attempting to deal with this problem while I was on holiday – they want users to be able to find “protected” content because it may be just what they’re looking for, but not at the expense of inviting spam into the index. The solution is simple – allow Googlebot to index your site and when a user finds that page via a Google search, let them see the full page. If they want to access another “protected” page, Google is quite happy for you to require registration/payment; but not for that first page/article they clicked to from the search result. They call it “First Click Free” (FCF), something that has been accepted in Google News search for some time.
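
To make the visitor side of that concrete, here is a rough sketch of the kind of check a site might run before showing a protected page. It is an assumption about how you could implement it, not Google’s specification: the function name, the isRegistered flag and the pattern used to recognise Google’s domains are all made up for the example, and it does not cover letting Googlebot itself in. Bear in mind that the Referer header is supplied by the browser, so it can be set to anything.

```javascript
// Rough pattern for google.com and country domains such as google.co.uk.
// This is an approximation, not a complete list of Google's domains.
const GOOGLE_HOST = /(^|\.)google\.(com|co\.[a-z]{2}|com\.[a-z]{2}|[a-z]{2})$/;

// Decide whether a request for a protected page is served in full.
// `referer` is the HTTP Referer header (may be undefined);
// `isRegistered` is the result of the site's own login check (hypothetical).
function canViewFullPage(referer, isRegistered) {
  if (isRegistered) return true;
  if (!referer) return false;
  let host;
  try {
    host = new URL(referer).hostname;
  } catch {
    return false; // malformed Referer header
  }
  return GOOGLE_HOST.test(host);
}
```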

Initially, that sounds like a sterling solution. But it doesn’t take long to realise the problems here – firstly, a simple site: command search on Google for the site in question will reveal every page on the site. According to Google’s rules, if you click on any of those pages in the search result, you should see the whole article for free. So, a simple run down the full list of pages provided by that site: search gives you access to every page of paid content on the site in question.

Secondly, there are some simple technologies freely available out there to make you appear to be Googlebot or to make it look like every page you view has been referred from a Google search (here’s just one). So, using these, it would be simple to browse a site conforming to Google’s FCF rules and get access to every page – you wouldn’t even need to keep going back to that site: search listing.

So, what should the webmasters of such sites do? Well, you could take the view that the vast majority of web users have no idea about the site: command, changing user agents or accessing Google’s cache (the “Cache” link that appears under each search result that shows Google’s copy of the page in its database, rather than the “live” page). In which case, the vast majority of your site’s visitors will experience the site just as Google suggests.

However, if this becomes a popular method of allowing Google access to hidden content, how long before tools are developed and widely publicised to make things like changing your user agent incredibly easy? Eventually, there will be enough users doing it to really affect your site. In that case, there are a couple of options:

  1. Create summary pages that contain “teaser” information to get the user’s attention and to work well enough in terms of SEO. In this case, your full protected pages won’t be accessible to Google or anyone else, but if the pages contain sufficient information and are optimised, they should still appear in searches and therefore do the job.
  2. Change your business model slightly. Allow everyone access to at least one page of protected content when they arrive, then request registration when they move to another page. This is like Google’s FCF model, except it is universal rather than applying only to Google users. If so desired, you could use the <meta name="robots" content="noarchive"> tag in the head of your pages to prevent search engines making copies in their cache. However, this may have a negative impact on pages’ performance in search results, as search engines like to compare copies of a page over time to assess its “trustworthiness” and topical relevancy. Remember also that this may restrict crawling of your pages, as Google will experience the site in the same way – it will be able to access one page, but then get the “registration required” message. I would be interested to know if anyone has tried this and whether an XML sitemap gets all the pages indexed anyway?
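
For the second option, the logic could be as simple as the sketch below. It assumes the site keeps a small session object per visitor; the shape of that object (registered, freePage) and the function name are hypothetical, and how you store the session (cookie, server-side etc.) is up to you.

```javascript
// Decide whether a visitor may see a protected page.
// `session` is a plain object kept per visitor (hypothetical shape);
// `path` is the protected page being requested.
// Returns the decision plus the session to store for the next request.
function checkAccess(session, path) {
  if (session.registered) {
    return { allow: true, session };
  }
  if (session.freePage === undefined) {
    // First protected page of the visit: serve it and remember which one.
    return { allow: true, session: { ...session, freePage: path } };
  }
  // Re-reading the same free page is fine; anything else needs registration.
  return { allow: session.freePage === path, session };
}
```

Because a crawler without cookies starts a fresh session on every request, how it experiences this depends entirely on how the session is stored.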

If I come across any other ideas, I’ll add to this post.

Google’s change of policy on URL re-writes

An interesting post and comments at Search Engine Roundtable regarding Google’s recent statement on re-writing URLs.

Google has somewhat changed its mind about re-writing URLs, as they now claim to be better able to understand dynamic URLs (the sort of query strings you often see in e-commerce website addresses, for instance, along with many content management systems). The reason is that they now see query strings such as “search.php?keyword=toys” as more meaningful to the page’s intention and content than “search.php/keyword/toys”, which is how many URLs are re-written. The structure of the former is now properly identified by Google as a search term, whereas previously it may have had little meaning. Conversely, the latter now looks like a page three layers deep in the site, but doesn’t necessarily represent a search query, so Google is less likely to identify the true purpose of that page.

My take on this is that if you are re-writing URLs from something meaningless such as “page.php?id=76” to something meaningful like “page.php/seo-urls-still-good”, that still helps both the search engines and users to understand the contents of the page and I would continue to use it. If you are re-writing search queries like the “toys” example above, maybe you could try a few without the re-writes – but remember that you could lose the PageRank of the originals, so be sure to 301 re-direct the old URLs to the new ones (and update your sitemap accordingly!)
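
As a sketch of that last step, the function below works out where an old re-written search URL should be 301 re-directed to. It only handles the “search.php/keyword/toys” pattern from the example above, the function name is hypothetical, and how you actually send the response depends on your server.

```javascript
// Map an old re-written search URL such as /search.php/keyword/toys to
// the query-string version /search.php?keyword=toys.
// Returns null when the path is not a re-written search URL.
function redirectFor(path) {
  const match = /^\/search\.php\/([^/]+)\/([^/]+)\/?$/.exec(path);
  if (!match) return null;
  let name, value;
  try {
    name = decodeURIComponent(match[1]);
    value = decodeURIComponent(match[2]);
  } catch {
    return null; // badly encoded URL: leave it alone
  }
  const query = new URLSearchParams([[name, value]]);
  // 301 tells search engines the move is permanent.
  return { status: 301, location: `/search.php?${query}` };
}
```

Descriptive URLs like “page.php/seo-urls-still-good” fall through untouched, which matches the advice to keep those re-writes.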

Don’t forget to update your XML sitemap!

I was shown this in a client’s Webmaster Tools earlier this week:

We had carried out a number of 301 redirects on some of their pages, as for reasons known only to the original developer, a lot of pages had been created as sub-domains, which was causing duplicate content and indexing issues with Google.

What I wasn’t aware of was that there isn’t any code in the site to auto-update the sitemap.xml file provided to Google Webmaster Tools. I hadn’t seen the error above before – clearly, Google is unhappy if too many of the URLs in your sitemap don’t match what it sees on the site. A lot of those URLs of course no longer exist (e.g. the sub-domains), so we have updated the sitemap using GSiteCrawler – it’s a bit techie, but it certainly does the job and can be scheduled to make regular updates with automatic FTP of the new sitemap.xml file.

So, if you’re making changes to your site, remember to update your sitemap.xml files!
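
If your site has no code to do this, regenerating the file from a list of live URLs is not much work. The sketch below is not how GSiteCrawler does it, just a minimal example of the sitemap format; the function names are hypothetical and it assumes you can already produce the list of current URLs from your site or database.

```javascript
// Escape the characters that must be entity-escaped inside sitemap XML.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build the contents of sitemap.xml from the URLs that currently exist.
function buildSitemap(urls) {
  const entries = urls.map((u) => `  <url><loc>${escapeXml(u)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```

Run it whenever pages are added, removed or re-directed, and only pass in the URLs you want indexed (i.e. the new addresses, not the ones being 301 re-directed).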

Is your web server’s location damaging your rankings?

I’ve been asked to do some search engine optimisation for a classical guitar shop, Kent Guitar Classics. We’ve only just begun the keyword research phase, so don’t flame me for the site’s current SEO!

What I noticed whilst conducting that research, is that even for the name of the business (usually an easy number one spot unless you have a very generic business name), the site only comes second when using the “pages from the UK” option in Google. The number one result using “pages from the UK” is a page on the Venezuelan UK embassy’s website! As you would expect, Kent Guitar Classics comes first if you just search “the web” using google.co.uk. Here are a couple of screenshots for posterity:

 

Kent Guitar Classics web search

 

Kent Guitar Classics UK search

 

A bit of investigation using a whois service like Domain Tools shows that the website is hosted in Oslo!

Kent Guitar Classics whois lookup

 

Why is this important? Well, Google’s search results are biased according to the country in which the search is being performed. This is because it knows that most searchers are looking for something local to them. Google uses lots of information to decide whether a site is in the same country as the searcher: the domain extension (e.g. .co.uk), the postal address on the site (if it can find one), the geographic-targeting setting in Webmaster Tools, links from local websites and quite possibly numerous other factors.

One other factor is the physical location of the web server, i.e. if it is hosted in the same country. Clearly, in Kent Guitar Classics’ case, it isn’t – it’s hosted in Norway. As a result, one of the big pointers Google uses to determine a site’s country of origin is way off. Naturally, I have advised Miles at Kent Guitar Classics to move server.

 

An interesting aside I noticed while researching the site’s setup is that, for some reason, the default document served for www.kentguitarclassics.com is index.html, but the actual homepage appears to be index.asp. This could be another problem for Google, as it doesn’t like “bounce”-type redirects. A quick disabling of JavaScript and meta refresh tags using the excellent Web Developer Toolbar for Firefox means that I can see this page:

Kent Guitar Classics redirect page

 

Examine the code, and there is a JavaScript redirect to index.asp – not something that Google will take particularly kindly to. This could be because the developer originally used index.html and, when the change to index.asp was made, didn’t want to break people’s bookmarks, so used a redirect to ensure everyone still got the homepage.

This is one of the problems with Windows web servers running Internet Information Services (IIS) – there isn’t an easy way to create permanent (301) redirects, because the .htaccess files used by Apache (the usual web server on Linux machines) mean nothing to IIS. Instead, you either have to code the redirect into the page using ASP, or make changes directly in IIS (or install an ISAPI filter), and on anything but a dedicated server the host won’t let you near those settings.

That’s a completely separate problem to the physical location of the server, but I thought I’d mention it whilst looking at that site! :)

Google Chrome experience

Just read this short article that I think is a good overview of Chrome from a user’s perspective, following last week’s release.