Wednesday 5 December 2012

The Perils of Duplicate Content

Currently one of the most frequent questions I am asked is: does duplicate content really matter? Just for a change I can give a definitive answer. As far as Google is concerned at the moment: yes!

As with all these questions, what is really meant is "does this matter to Google?", and we can see from Matt Cutts' blog and the Google Webmasters Blog that it's not just a myth. You will be pushed down the rankings for doing it.

Google actually identifies duplicate content down to short quotes; if you don't link it to the originator's site as a quote you *will* be marked down and the content will have a lower ranking in the Search Engine Results Page (SERP). There is a short YouTube video about it.

Overnight this has destroyed the concept of article "syndication", where a journalist or copywriter would write an article once and send it to multiple publications, being paid for each one. I have seen writers weep on stage as they recounted this. I feel for them, but that's how the internet has changed life (I was fed up with reading the same article in different motoring magazines anyway!)

The problem with all of this is that it is retrospective. All those old news articles on your high authority site? Check them, as they can suddenly be marking you down. Those datasheets that just repeated what was in the main article? The old idea that you keep saying the same thing over and over on different pages of your site just to drive up the rankings? We all knew it was 'cheating'; now you get penalised for it.

A really common mistake I am coming across, even on large sites, is page replication across domains and URLs. I thought this was obvious and people knew, but I will spell it out... Having www.mysite.com and www.mysite.com/home as the same page is duplicate content. I have tested it: I removed the duplicate page, redirected all the incorrect links and watched the site climb up the search rankings over the next few *days*. This really seems to make a difference!

The other really simple mistake is buying all the domain names that relate to you and pointing them at your site. www.mysite.com having the same content as www.mysite.co.uk *is* duplicate content! It seems really obvious, because it is! If you must have multiple domains, redirect them; better still, make them relevant to the country/business type. Best advice: if the domain doesn't relate to you, don't buy it. You may not want www.mysite.xxx as a porn site, but knowing that xxx is blocked as porn, do you want it pointing to your home page? Leave it alone!
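
As a rough illustration of the two fixes above (one URL per page, one domain per site), here is a minimal Python sketch of the redirect logic. The alias tables, the choice of www.mysite.com as the one true address and the function names are all made up for the example; on a real site this would normally live in the web server or load balancer configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical alias tables: every key serves the same content as its value.
HOST_ALIASES = {"www.mysite.co.uk": "www.mysite.com", "mysite.com": "www.mysite.com"}
PATH_ALIASES = {"/home": "/"}


def canonical_url(url):
    """Return the single address that should serve this page."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    host = HOST_ALIASES.get(host, host)
    path = parts.path or "/"
    # Look the path up without its trailing slash, so /home and /home/ both match.
    path = PATH_ALIASES.get(path.rstrip("/") or "/", path)
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


def redirect_for(url):
    """Return (301, target) if the URL is a duplicate, or None if it is already canonical."""
    target = canonical_url(url)
    return None if target == url else (301, target)
```

The 301 (moved permanently) status is the important part: it tells the search engine that the duplicate address has gone for good, rather than being a second copy of the page.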

If you are worried, get somebody who knows what they are doing to look at it. Just fixing the stupids will make a remarkable difference!

Thursday 20 September 2012

Fun with Windows 8 Preview

I've spent the last few days playing with the Windows 8 Consumer Preview. Indeed I am using it now to post this entry.

I am using an Asus EP121 as a test machine:
  • 64GB hard drive
  • Additional 64GB SD Card
  • Bluetooth Keyboard (essentially a Microsoft 6000 keyboard re-branded by Asus)
  • Microsoft Arc Touch Wireless Mouse (with RF dongle)
First impressions were very good. I had more issues backing up my existing Windows 7 installation than I did installing a fresh version (Windows backup really needs to be left on its own... Don't try and do anything whilst it's working, just be very, very patient!)

The installation itself worked flawlessly without the need for a mouse or keyboard; it picked up the touchscreen and configured itself perfectly. Several minutes later I was faced with a fresh looking 'Metro' interface, clean and uncluttered, with a nice new desktop behind it, all using what looks like a cleaner font and nice new background images that fade in and out.

However, beyond the skin things get more difficult. I needed to join my home domain; normally I do this by right-clicking the computer entry on the start menu and choosing properties. But there is no start menu in Windows 8... OK, system in the control panel; um, the control panel is on the start menu. Choosing settings from the Charms menu (slide a finger in from the right of the screen) produces no sign of the control panel. Eventually I cheated (Windows key + R -> run menu -> type control) and hey presto, the control panel appears and I could choose system to join the domain. I have found since that the Charms menu is context sensitive, so that when you are in desktop mode (traditional Windows interface) the settings button takes you to a sub menu that does have the control panel. It is somewhat less than intuitive!

Windows 8 joined the domain fine; as usual, all the homegroup stuff stopped working. Many of the sample applications seem to use aspects of homegroup and library functionality mixed in with live/Xbox sign in, much of which doesn't work properly at the time of writing.

Let's see how it progresses!

Friday 3 February 2012

What to Index

After another brainstorm today on search performance we discovered an interesting anomaly in how one of our competitors submits pages to Google. It's quite clever the way Yell submit pages: essentially, client pages are not submitted to search engines in the traditional manner; the keyword phrases that link to the clients are, leading to very high keyword density for their chosen phrases. It's actually quite cunning. I'm going to implement it over the next few weeks as an additional sitemap. If it works I'll document it properly!
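
To give a feel for what an additional keyword sitemap could look like, here is a minimal Python sketch. It is only a guess at the shape of the thing, not how Yell actually do it: the /keywords/ URL pattern, the example phrases and the function name are all hypothetical. The only fixed part is the standard sitemap namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def keyword_sitemap(base_url, phrases):
    """Build a sitemap that lists one landing page per keyword phrase."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for phrase in phrases:
        # Hypothetical URL scheme: "Hydraulic Pumps" -> /keywords/hydraulic-pumps
        slug = quote(phrase.strip().lower().replace(" ", "-"))
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{base_url}/keywords/{slug}"
    return ET.tostring(urlset, encoding="unicode")


# Example: two phrase pages instead of the individual client pages.
xml = keyword_sitemap("http://www.example.com", ["Hydraulic Pumps", "Ball Bearings"])
```

The point of the sketch is simply that the sitemap lists the phrase pages rather than the client pages; the client pages are then reached through the links on those phrase pages.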

We had a little bit of a mess up with one of our suppliers that came to a head this week. Softcat were chosen to supply some tablet PCs and laptops. To be fair I was a picky bastard, insisting on a Samsung Tablet PC, which we ordered about 3 weeks ago. It was very difficult to source 3 weeks ago; now of course it's on Ebuyer with next day delivery and all the accessories included. Needless to say, when we received the PC from Softcat we checked the contents of the box to find that there was no keyboard or stand; they were optional extras that were not available in this country. Annoying... we could have bought it with everything included and got it a week earlier from Ebuyer... hmmmm, time to return it and buy it from someone else!

Contacting Softcat, we found that as we had opened the box we could not return it, despite literally not even touching the PC! The sales rep was insistent that we could not return the PC, even though we had actually ordered 2 slate PCs plus a 17" HP laptop and needed a further 3 laptops and 2 more slate PCs; probably £8K worth of kit. Nope, tough. You open the package, you own it. Guess what we did? Sent everything except the opened box back. Sorry, that is shit customer service; I'll go to Ebuyer and use my credit card, thank you.

One problem... I managed to upset our assistant Systems Admin (Laura) in the process. It wasn't her fault; I did pester her, it all got pretty confusing and I got pretty annoyed at Softcat, and I expect it rubbed off. Damn it. So I have some crawling to do. In summary, Softcat, you are useless and your customer support is shit. You have cost me £1000 and I've upset Laura, which I really didn't want to do :(

Thursday 2 February 2012

Fanatical Day Out

I went to visit Rackspace at their UK office today, ostensibly to discuss new hosting options and provide due diligence for our current solution. It was a genuinely interesting trip; I thoroughly recommend a visit if you get the opportunity. We discussed different virtualisation options at length with Bruce, who is one of their pre-sales technical support guys. Again I ended up learning a thing or two about Oracle, VMware and SAN utilisation. It's all about the IOPS!

We were shown around the offices by Patrick Williams, whom I have dealt with at Rackspace for about 3 years now. We have sparred over pricing every time; some say haggling, some say refinement of solution architecture!

The Rackspace offices themselves are pretty cool. 700 people in one office and every one a smiling face (despite our bad taste in suits!) and Rackspace themselves are not only fanatical about their client service and support, but also about their employee support.

After visiting Rackspace we visited Titan Internet (now part of the Iomart group), who have provided hosting for part of our services for a few years now. It was unfortunate that the monitoring system went down during the day, locking itself up whilst still responding to ping requests. No real hassle, as during the day our intensive Glassfish consoles give the developers far greater granularity, but annoying nonetheless as our SMS and email alerts were not online whilst we were away from the office. I spotted it fairly early as we didn't get the normal 7:30 alert when the indexes were rebuilt, but it's a manual process that needs a sysadmin to start it. It rather highlights that we are awfully dependent on the skill of our internal personnel, something we really need to automate so we can concentrate on innovating rather than managing what we already have!

Wednesday 25 January 2012

Inexpensive iSCSI solutions

We are in the unenviable position of upgrading many servers to meet the growing requirements of an ever-increasing sales force. Basically we need a new Exchange server and, whilst we are at it, the additional data means more backup space is needed... oh, and could we have another 4 full development environments? P.S. the SAP server could use another 500GB.

Servers are easy in today's new order of virtualising everything. Indeed, given the way that we retire production web kit to development servers, CPU and memory resource is cheap. Hard disks, however, are a bit more difficult. Yes, hard disks are cheap, but hard disk enclosures fill up fast; then your only choice is to get *all* new disks rather than just adding a couple more. Once you do proper capacity planning you suddenly have some quite big bills on your hands, and actually you are no better off in a year's time :(

Yes, the solution is obvious, get a nice big SAN:
  • It makes virtualisation simpler
  • Improves your redundancy by allowing you to easily move VMs between hypervisors
  • It's fast
  • Very flexible (you can repartition on the fly)
  • High disk redundancy
A quick lesson on what a SAN is. Essentially a SAN falls under the category of external storage. It's a big dedicated disk array, highly optimised to deliver data over the network, sometimes over its own dedicated network to reduce interference from other devices. Here comes the hard part... SAN stands for Storage Area Network, as opposed to NAS or Network Attached Storage. The difference is that a NAS looks like a shared drive on the network that anybody can attach to (provided they have permission); it looks for all the world like a server in its own right.

A SAN is somewhat more esoteric; normal network users cannot even see the device. It presents itself like a raw hard drive, usually through an iSCSI interface, that you attach to and use literally like a real internal hard drive. The big thing is that normally only one server uses that iSCSI partition; most often it is the main drive for a virtual machine.

The principal difference between a SAN and a NAS is in the way that data is transferred between the network storage device and the data consumer. On a NAS, data is transferred as files in a structured manner, irrespective of how the data is represented on the disk. On a SAN, the data is transferred at a block level, mimicking the way data is stored in blocks on the disk. In theory a SAN is more efficient and faster than a NAS, as it is optimised for data transfer rather than file structure and so the data can transfer without having to be assembled into files. Very good for large unstructured data, e.g. databases. In practice, there actually isn't a lot of difference until you get to quite large devices.

The big selling point with a SAN nowadays is its compatibility with VMware and other hypervisors, giving you the ability to leave the hard disk in one place and run the actual server on any old CPU that you have spare without having to transfer terabytes of data around!

So, SAN == Good. Unfortunately it also usually == $$$ (about £20K in 2012), especially for a reasonably large (>10TB) or redundant (RAID 50 + redundant PSU) unit, often with about a month's wait for assembly and commissioning.

I WANT MY SAN TOMORROW
OK, we need to put something in place as an interim solution. It doesn't have to be that high availability; it's only for the development environments until the full SAN arrives. Step up to the mark, Netgear, with the ReadyNAS Pro. These great little NAS devices also publish themselves as an iSCSI endpoint. Put four 2TB disks in it and you have 6TB of RAID which can easily service 4 Apache + Glassfish stacks for about £1000 (2012 prices). Hell, have 2!
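
For anyone wondering where the 6TB comes from, here is the arithmetic as a small Python sketch. It assumes a single-parity layout such as RAID 5, where one disk's worth of space goes on parity; the function name and parameters are just for illustration, and real usable space will be a little lower once formatting overhead is taken off.

```python
def usable_tb(disks, disk_tb, parity_disks=1):
    """Raw usable capacity of a parity RAID set, before filesystem overhead."""
    if disks <= parity_disks:
        raise ValueError("need more disks than parity disks")
    # One disk's worth of capacity per parity level is lost to redundancy.
    return (disks - parity_disks) * disk_tb


# Four 2TB disks with single parity: (4 - 1) * 2 = 6TB.
capacity = usable_tb(4, 2)
```

With double parity (parity_disks=2) the same four disks would drop to 4TB, which is the trade you make for surviving a second disk failure.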

Little Netgear NAS devices are actually pretty good at this: they have dual GbE networks, plus built-in backup and replication. In fact we have a company that has often helped us who would like an offsite backup solution, and we will probably re-purpose this for him after we have learnt its capabilities! Alan Schofield has a huge repository of photographs that he has always backed up manually. Two NAS devices (one at his office and one in ours, 5 miles away) will give him a maintenance free solution and let him concentrate on wonderful photography rather than IT!

Sounds like win-win to me :)

The after glow of the light-bulb

We put the jQuery code live today at 8:30 am. Possibly the fastest rollout in the history of the website!

On Friday it dawned on me that animated banners on the site could be built automatically. I briefed an old friend at the weekend, code went into test on Monday and live on Wednesday (with very little change).

The change has saved about 2 man months of development, pulled a 1 month waiting list for Flash development back to zero and freed up 2 Flash developers to do the creative tasks. Going forward we will never have to build another Flash image!

We had an interesting meeting today regarding SEO packages offered by other companies and what we could offer our clients in addition to the current packages. The conclusion was that we could indeed put together a package of external SEO add-ons, mainly centred around external links from tailor made blogs and forum linking. My only concern is that it generates a lot of manual work that has to be recreated every month if it is to have any impact. It will not be cheap, nor nearly as effective as buying a hotlink on the site, which gives measurable referrals!

What an SEO package does do is give our sales team another package to offer. We all know nothing is guaranteed in SEO, after all!

Sunday 22 January 2012

Weekends, the new productivity

I received a terrific email on Friday. One of the client services team had great feedback from a customer that we were now the highest referrer to their site after Google. This is obviously great news, but not because we have changed anything other than fixing the way Google measures our referrals. Over the next few months we will be rolling out more referral drivers to the site along with some interesting search enhancements to provide direct linking.

This weekend I persuaded an old friend of mine to do a little jQuery work for me. It's been very productive, achieving about a week's worth of work in an afternoon. jQuery has some huge advantages over Flash, the principal one being that it works on mobile devices where Flash doesn't, but crucially in our usage it saves us a great deal of manual processing time. The only difficulty is finding people who can really get their heads around it!

The coming week brings a visit to the Rackspace datacentre. This will be my first visit, despite having dealt with them for the past 5 years and having hosted the best part of 100 servers with them!

We are still showing Business Magnet having downtime nearly 40% of the time in the last week; the best part of 3 full days down and it is still going on (7.5 hours out last night alone). Not good for them, but then Applegate used to get that kind of outage; however nobody saw them because they were out of hours. We have had zero unexpected downtime for the last 3 months now, despite search load increasing by a factor of 3!

Thursday 19 January 2012

Business Magnet still down

So Business Magnet are still down, 48 hours now. Currently they are showing a 500 error, which on IIS means 'oh shit the script has serious bugs'. Although there have been a few 404 errors (page not found), mostly it's been DNS resolution errors, as the main web server appears to also be the main DNS server. The secondary DNS also appears to be one of the web servers; I wouldn't be surprised if it was the database as well.

Having your web server on the same box as your DNS isn't a good idea (as I am sure Business Magnet are discovering). There are lots of reasons: DNS servers have a habit of being compromised and web servers need to be taken down sometimes. You really don't want one affecting the other, especially when DNS servers are usually given away free with your domain name registration!

48 hours is a long time; we have seen search results dropping places in Google already (a couple of places in most cases that we monitor). I am starting to feel sorry for their IT team; there must be some sweaty palms around there by now.

But even given they are having problems, somebody needs a slap. Why is there not a holding page apologising for the problems? Get a move on, it will help; it makes you look more professional and it will reduce the load on your servers, giving you breathing space to recover the application. Then put a proper load balancing firewall in place and put in a fault tolerant architecture with backup. Better still, don't use a cheap hosting solution; use a proper host with fanatical support.
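
A holding page really doesn't take much. As a minimal sketch using nothing but Python's standard library (the wording of the page, the one hour Retry-After value and the make_server helper are my own assumptions, not anything Business Magnet run), something like this on a spare box would do:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

HOLDING_HTML = (
    b"<html><body><h1>Sorry, we are having problems</h1>"
    b"<p>We are working on it and will be back as soon as we can.</p>"
    b"</body></html>"
)


class HoldingPageHandler(BaseHTTPRequestHandler):
    """Answer every GET with the same static apology."""

    def do_GET(self):
        # 503 says "temporarily unavailable", so crawlers should come back later
        # instead of indexing the apology as the real page.
        self.send_response(503)
        self.send_header("Retry-After", "3600")  # assumed value: try again in an hour
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(HOLDING_HTML)))
        self.end_headers()
        self.wfile.write(HOLDING_HTML)

    def log_message(self, format, *args):
        pass  # keep the console quiet; the real logs live elsewhere


def make_server(host="", port=8080):
    """Create (but do not start) the holding page server."""
    return HTTPServer((host, port), HoldingPageHandler)
```

Calling make_server().serve_forever() starts it. Because the page is static and tiny, it costs next to nothing to serve, which is exactly the breathing space the broken application needs.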

What happened to my statistics?

Once upon a time, when websites were just HTML, stats on a site came from counting how many lines were in the server logs. Things evolved, and analytics businesses blossomed and became too expensive for mere mortals (remember Webtrends?), but those big packages could filter out the bots and handle big page views. They could even make a good stab at the difference between pageviews, visitors and, on the big expensive packages, unique views (even if it was only a guess!)

Then along came Google... They had to have a pretty good analytics solution to make their advertising solutions work, they got even more information if you used it as well, and all you had to do was add a little bit of code to your site... Great! Almost, umm, well, pretty good... except it doesn't work for people who have JavaScript disabled, it won't measure image or asset views, some proxies seem to block it and we see from our monitoring that the script fails some of the time.

More importantly, Google Analytics doesn't measure how many search engines spider your site, which is interesting considering Google is best known as a, um, search engine. I have seen some sections of sites where bot views are 40x the number of real human views! Sites that get over a million bot searches a day to 25,000 real views; proper sites with circa 10,000,000 pages.

Here is the rub: to measure bot activity or asset usage you need log based analysis, but Google doing analytics for free seems to have put all the serious ones out of business! I spoke to Webtrends; they told me they had stopped doing log based stats and have gone to tag based tracking like DC Storm. These packages are great for tracking specific actions and really good for measuring marketing/advertising effectiveness and attributing sales/conversions... but rubbish for finding out how much load your server is getting or if people are stealing images off you!

Interestingly, one of the last good log based analytics packages is Urchin. Now owned by Google, it's also really hard to get hold of, only being available through authorised re-sellers.

So we had a demo of IBM Unica NetInsight. This is log based analytics on steroids. It reads your logs into a database (which you can query with ODBC if you like), combines your logs with tags for even better event tracking, combines the logs with your own CRM database or even GIS address data and presents it all in a web based data analysis package that seems to be actually usable... unlike many packages that present you with a uselessly documented, complex blank screen from which you need to pry your data kicking and screaming (are you listening, DC Storm?) We are learning a lot just by playing with the interface... useful things that are changing the way we view users... which is what analytics is supposed to be!
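
To show the kind of thing log based analysis gives you that tags can't, here is a minimal Python sketch that splits hits into bots and humans by user agent. It assumes the common "combined" log format, where the user agent is the last quoted field on each line, and the list of bot tokens is a crude guess of mine; a proper package does far more than this.

```python
import re
from collections import Counter

# Crude, assumed list of substrings that give a crawler away in its user agent.
BOT_TOKENS = ("bot", "spider", "crawler", "slurp")

# In the combined log format the user agent is the last quoted field on the line.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def classify(line):
    """Return 'bot' or 'human' for one access log line."""
    match = USER_AGENT_RE.search(line)
    user_agent = match.group(1).lower() if match else ""
    return "bot" if any(token in user_agent for token in BOT_TOKENS) else "human"


def count_hits(lines):
    """Count bot and human hits across an iterable of log lines."""
    return Counter(classify(line) for line in lines)
```

Feed it an open log file, for example count_hits(open("access.log")), and the ratio of the two counters is the bot-to-human figure that a JavaScript tag never sees, because bots generally don't run the tag.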

FIRST POST and competitor woes make you think.

So after many promises to start blogging but completely failing to do so I have bitten the bullet and gotten on with it... so to get it out the way... FIRST POST!

Today's main point of interest was seeing one of the competitors of Applegate Directories (who I work for at the time of writing) go down! At the time of writing Business Magnet has been down for 26 hours. Life will be interesting for their head of IT I am sure; tbh I am surprised there is not at least a holding page there. It looks suspiciously like a major infrastructure failure, given that even their DNS servers are dead...

The interesting part of a competitor failure is the way it makes you look at your own infrastructure. We currently run mirrors of all the servers at Applegate, in some cases triple or even 4 way redundancy to handle load spikes; there are currently 12 servers in the Applegate cluster. I am working with the head of IT to get competitive quotes for the next generation of servers. Virtualisation would seem a wise route, but think about it... 12 servers, each a dual quad core chassis with 24GB RAM; just how big will the hypervisors have to be??? 3 hypervisors will mean 96GB RAM on each to deal with any DR issues, 16 cores minimum! Big iron for a single website. Our discussions with Rackspace are getting interesting!

The next most interesting discussion today has been one of SEO. Yes, that old bugbear has raised its head again; I am sure I will be discussing it a lot in this blog! Applegate has a lot of very 'old school' SEO techniques throughout the site, mostly revolving around multi-level repetitive internal linking. It makes for a very complex user funnel... more sieve actually. It's very difficult to work out user flow through the site, but it does give very high ranking to some very unexpected pages; nearly all our index pages are page 1 in Google for instance, often number 1 on page 1 in Bing! Yet the advertising on these pages is very under utilised... I think that may change very soon :)

The recent changes I pushed through with hotlinks now seem to be making serious headway. Certainly sales are reporting *massive* uplift in referrals. The big issue has been with Google Analytics not reporting the referrals correctly. It appears that on Internet Explorer referrals are misattributed to the first referrer ever used, so if the site was initially found through Google, that is where all further referrals are attributed. Seems to work fine in Firefox or Chrome, funny that! We found a fix; having implemented it on the 15th of January we have seen an immediate upturn on clients' analytics. Finally some headway!

More to come in future blogs. Once I am convinced on the Google Analytics fix I will document it more; in the meantime I have reported it on their forums to see if it gets recognised!
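
The RAM side of that sizing is simple enough to sketch in Python. This assumes every one of the 12 servers becomes a guest with its full 24GB and no memory overcommit; the function and its tolerate_failures parameter are purely illustrative.

```python
import math


def ram_per_hypervisor(servers, ram_gb_each, hypervisors, tolerate_failures=0):
    """RAM each host needs to carry all guests with some hosts out of action."""
    hosts_up = hypervisors - tolerate_failures
    if hosts_up < 1:
        raise ValueError("at least one hypervisor must stay up")
    total_gb = servers * ram_gb_each
    return math.ceil(total_gb / hosts_up)


# 12 servers * 24GB = 288GB, split evenly over 3 hosts = 96GB each.
even_split = ram_per_hypervisor(12, 24, 3)
```

The 96GB figure corresponds to the even split with all three hosts up. Running the same sum with tolerate_failures=1 gives 144GB per host, which is what it would take to keep every guest at full allocation with one hypervisor down.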