Wednesday, 25 January 2012

inexpensive iSCSI solutions

We are in the unenviable position of upgrading many servers to meet the growing requirements of an ever-increasing sales force. Basically we need a new Exchange server; whilst we are at it, the additional data means more backup space is needed... oh, and could we have another 4 full development environments? P.S. the SAP server could use another 500GB.

Servers are easy in today's new order of virtualising everything. And indeed, given the way that we retire production web kit to development servers, CPU and memory resource is cheap. Hard disks, however, are a bit more difficult. Yes, hard disks are cheap, but hard disk enclosures fill up fast, and then your only choice is to get *all* new disks rather than just adding a couple more. Once you do proper capacity planning you suddenly have some quite big bills on your hands, and actually you are no better off in a year's time :(

Yes, the solution is obvious: get a nice big SAN:
  • It makes virtualisation simpler
  • Improves your redundancy by allowing you to easily move VMs between hypervisors
  • It's fast
  • Very flexible (you can repartition on the fly)
  • High disk redundancy
A quick lesson on what a SAN is. Essentially a SAN falls under the category of external storage. It's a big dedicated disk array, highly optimised to deliver data over the network, sometimes over its own dedicated network to reduce interference from other devices. Here comes the hard part... SAN stands for Storage Area Network, as opposed to NAS, or Network Attached Storage. The difference is that a NAS looks like a shared drive on the network that anybody can attach to (provided they have permission); it looks for all the world like a server in its own right.

A SAN is somewhat more esoteric; normal network users cannot even see the device. It presents itself like a raw hard drive, usually through an iSCSI interface, that you attach to and use literally like a real internal hard drive. The big thing is that normally only one server uses that iSCSI partition; most often it is the main drive for a virtual machine.

The principal difference between a SAN and a NAS is in the way that data is transferred between the network storage device and the data consumer. On a NAS, data is transferred as files in a structured manner, irrespective of how the data is represented on the disk. On a SAN, the data is transferred at a block level, mimicking the way data is stored in blocks on the disk. In theory a SAN is more efficient and faster than a NAS, as it is optimised for data transfer rather than file structure, so the data can be transferred without having to be assembled into files. Very good for large unstructured data, i.e. databases. In practice, there actually isn't a lot of difference until you get to quite large devices.
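The file-versus-block distinction can be sketched in a few lines of Python. This is only a toy illustration (the scratch file stands in for an iSCSI LUN, and the block size is an assumption): a NAS hands you whole named files via the filesystem, whereas a SAN hands you fixed-size blocks by offset, the way a local disk driver would, and any file structure is the client's problem.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # a typical disk block size (assumed for illustration)

# Create a scratch "disk" to stand in for an iSCSI LUN: two blocks of data.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as disk:
    disk.write(b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE)

# File-level access (NAS style): ask for a named file and receive its
# bytes already assembled for you by the file server.
with open(path, "rb") as f:
    whole_file = f.read()

# Block-level access (SAN style): seek to block N and read one raw block.
# The storage device knows nothing about files, only numbered blocks.
def read_block(device_path, block_number):
    with open(device_path, "rb") as dev:
        dev.seek(block_number * BLOCK_SIZE)
        return dev.read(BLOCK_SIZE)

second_block = read_block(path, 1)
print(len(whole_file), second_block[:1])  # 8192 b'B'
os.remove(path)
```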

The big selling point with a SAN nowadays is its compatibility with VMware and other hypervisors, giving you the ability to leave the hard disk in one place and run the actual server on any old CPU that you have spare, without having to transfer terabytes of data around!

So, SAN == Good. Unfortunately it also usually == $$$ (about £20K in 2012), especially for a reasonably large (>10TB) or redundant (RAID 50 + redundant PSU) unit, often with about a month's wait for assembly and commissioning.

OK, we need to put something in place as an interim solution. It doesn't have to be that highly available; it's only for the development environments until the full SAN arrives. Step up to the mark, Netgear, with the ReadyNAS Pro. These great little NAS devices also publish themselves as an iSCSI endpoint. Put four 2TB disks in one and you have 6TB of RAID which can easily service 4 Apache + Glassfish stacks for about £1000 (2012 prices); hell, have two!

Little Netgear NAS devices are actually pretty good at this: they have dual GbE networks, with built-in backup and replication as well. In fact we have a company that has often helped us who would like an offsite backup solution, and we will probably re-purpose this for him after we have learnt its capabilities! Alan Schofield has a huge repository of photographs that he has always backed up manually. Two NAS devices (one at his office and one in ours, 5 miles away) will give him a maintenance-free solution and let him concentrate on wonderful photography rather than IT!

Sounds like win-win to me :)

The afterglow of the light-bulb

We put the jQuery code live today at 8:30 am. Possibly the fastest rollout in the history of the website!

On Friday it dawned on me that animated banners on the site could be built automatically. I briefed an old friend over the weekend; code went into test Monday, live Wednesday (with very little change).

The change has saved about 2 man-months of development, pulled a 1-month waiting list for Flash development back to zero and freed up 2 Flash developers to do the creative tasks. Going forward we will never have to build another Flash image!

We had an interesting meeting today regarding SEO packages offered by other companies and what we could offer our clients in addition to the current packages. The conclusion was that we could indeed put together a package of external SEO add-ons, mainly centred around external links from tailor-made blogs and forum linking. My only concern is that it generates a lot of manual work that has to be recreated every month if it is to have any impact. It will not be cheap, or nearly as effective as buying a hotlink on the site, which gives measurable referrals!

What offering an SEO package does do is give our sales team another package to offer; we all know nothing is guaranteed in SEO, after all!

Sunday, 22 January 2012

Weekends, the new productivity

I received a terrific email on Friday. One of the client services team had great feedback from a customer: we are now the highest referrer to their site after Google. This is obviously great news, but not because we have changed anything other than fixing the way Google measures our referrals. Over the next few months we will be rolling out more referral drivers to the site, along with some interesting search enhancements to provide direct linking.
This weekend I persuaded an old friend of mine to do a little jQuery work for me. It's been very productive, achieving about a week's worth of work in an afternoon. jQuery has some huge advantages over Flash, the principal one being the way it works on mobile devices, where Flash doesn't; but crucially, in our usage it saves us a great deal of manual processing time. The only difficulty is finding people who can really get their heads around it!
The coming week brings a visit to the Rackspace datacentre; this will be my first visit despite having dealt with them for the past 5 years and having hosted the best part of 100 servers with them!

We are still seeing Business Magnet down nearly 40% of the time in the last week: the best part of 3 full days down, and it is still going on (7.5 hours out last night alone). Not good for them, but then Applegate used to get those kinds of outages; however, nobody saw them because they were out of hours. We have had zero unexpected downtime for the last 3 months now, despite search load increasing by a factor of 3!

Thursday, 19 January 2012

Business Magnet still down

So Business Magnet are still down, 48 hours now. Currently they are showing a 500 error, which on IIS means 'oh shit, the script has serious bugs'. Although there have been a few 404 errors (page not found), mostly it's been DNS resolution errors, as the main web server appears to also be the main DNS server. The secondary DNS also appears to be one of the web servers; I wouldn't be surprised if it was the database as well.
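Out of interest, those symptoms are easy to tell apart from a monitoring script. A minimal sketch in Python (the function names and categories are my own, not from any particular monitoring product): it attempts DNS resolution first and only then an HTTP request, so a dead DNS server is reported as such rather than as a generic failure.

```python
import socket
import urllib.error
import urllib.request

def classify_status(code):
    # Map an HTTP status code to the kind of outage it suggests.
    if code == 500:
        return "application error (the script itself is broken)"
    if code == 404:
        return "page not found (server up, content missing)"
    return "HTTP %d" % code

def diagnose(host):
    # Step 1: can we even resolve the name? If the web server doubles
    # as the DNS server and is down, we fail right here.
    try:
        socket.gethostbyname(host)
    except socket.gaierror:
        return "DNS resolution failure"
    # Step 2: the name resolves, so try an HTTP request.
    try:
        urllib.request.urlopen("http://%s/" % host, timeout=10)
        return "OK"
    except urllib.error.HTTPError as e:
        return classify_status(e.code)
    except OSError:
        return "unreachable (name resolves, but no response)"
```

Run `diagnose()` from cron every few minutes and you can graph exactly which failure mode a site is in, rather than just "down".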

Having your web server on the same box as your DNS isn't a good idea (as I am sure Business Magnet are discovering). There are lots of reasons: DNS servers have a habit of being compromised, and web servers need to be taken down sometimes; you really don't want one affecting the other, especially when DNS servers are usually given away free with your domain name registration!

48 hours is a long time; we have seen search results dropping places in Google already (a couple of places in most cases that we monitor). I am starting to feel sorry for their IT team; there must be some sweaty palms around there by now.

But even given they are having problems, somebody needs a slap. Why is there not a holding page apologising for the problems? Get a move on, it will help; it makes you look more professional, and it will reduce the load on your servers, giving you breathing space to recover the application. Then put a proper load-balancing firewall in place and put in a fault-tolerant architecture with backup. Better still, don't use a cheap hosting solution; use a proper host with fanatical support.

What happened to my statistics?

Once upon a time, when websites were just HTML, stats on a site came from counting how many lines were in the server logs. Things evolved, and businesses blossomed and became too expensive for mere mortals (remember WebTrends?), but those big packages could filter out the bots and handle big page views. They could even make a good stab at the difference between pageviews and visitors, and, on the big expensive packages, unique views (even if it was only a guess!)

Then along came Google... they had to have a pretty good analytics solution to make their advertising solutions work, they got even more information if you used it as well, and all you had to do was add a little bit of code to your site... Great! Almost. Umm, well, pretty good... except it doesn't work for people who have JavaScript disabled, it won't measure image or asset views, some proxies seem to block it, and we can see from our monitoring that the script fails some of the time.

More importantly, Google doesn't measure how often search engines spider your site, which is interesting considering Google is best known as a, um, search engine. I have seen some sections of sites where bot views are 40x the number of real human views! Sites that get over a million bot searches a day against 25,000 real views; so proper sites, with circa 10,000,000 pages.
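This is exactly the sort of question a few lines over the raw access log can answer where tag-based analytics cannot. A minimal sketch in Python, splitting hits into bot, human and asset traffic; the log lines and the list of bot markers are fabricated for illustration, and real logs would come from Apache or IIS with a more robust parser:

```python
import re

# A few fabricated Apache-style access log lines for illustration.
LOG_LINES = [
    '1.2.3.4 - - [19/Jan/2012] "GET /products HTTP/1.1" 200 "Mozilla/5.0"',
    '66.249.1.1 - - [19/Jan/2012] "GET /products HTTP/1.1" 200 "Googlebot/2.1"',
    '66.249.1.2 - - [19/Jan/2012] "GET /sitemap.xml HTTP/1.1" 200 "Googlebot/2.1"',
    '157.55.1.1 - - [19/Jan/2012] "GET /products HTTP/1.1" 200 "bingbot/2.0"',
    '5.6.7.8 - - [19/Jan/2012] "GET /logo.png HTTP/1.1" 200 "Mozilla/5.0"',
]

# Substrings that identify crawlers in the user-agent field (incomplete!).
BOT_MARKERS = ("googlebot", "bingbot", "slurp", "spider", "crawler")

def count_hits(lines):
    bots = humans = assets = 0
    for line in lines:
        # Crude user-agent extraction: the last quoted field on the line.
        ua = line.rsplit('"', 2)[-2].lower()
        if any(marker in ua for marker in BOT_MARKERS):
            bots += 1
        else:
            humans += 1
        # Asset requests (images, CSS, JS) never show up in page tagging.
        if re.search(r'\.(png|jpg|gif|css|js) ', line):
            assets += 1
    return bots, humans, assets

print(count_hits(LOG_LINES))  # (3, 2, 1)
```

Point the same loop at a real rotated log and you get the bot-to-human ratio per section of the site, which is precisely what tag-based packages cannot show you.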

Here is the rub: to measure bot activity or asset usage you need log-based analysis, but Google doing analytics for free seems to have put all the serious ones out of business! I spoke to WebTrends; they told me they had stopped doing log-based stats and have gone to tag-based tracking like DC Storm. These packages are great for tracking specific actions and really good for measuring marketing/advertising effectiveness and attributing sales/conversions... but rubbish for finding out how much load your server is getting, or if people are stealing images off you!

Interestingly, one of the last good log-based analytics packages is Urchin. Now owned by Google, it's also really hard to get hold of, only being available through authorised re-sellers.

So we had a demo of IBM Unica NetInsight. This is log-based analytics on steroids. It reads your logs into a database (which you can query with ODBC if you like), combines your logs with tags for even better event tracking, combines the logs with your own CRM database or even GIS address data, and presents it all in a web-based data analysis package that seems to be actually usable... unlike many packages that present you with a poorly documented, complex blank screen from which you need to pry your data kicking and screaming (are you listening, DC Storm!). We are learning a lot just by playing with the interface... useful things that are changing the way we view users... which is what analytics is supposed to be!

FIRST POST and competitor woes make you think.

So after many promises to start blogging but completely failing to do so, I have bitten the bullet and gotten on with it... so to get it out of the way... FIRST POST!

Today's main point of interest was seeing one of Applegate Directories' (who I work for at the time of writing) competitors go down! At the time of writing Business Magnet has been down for 26 hours. Life will be interesting for their head of IT, I am sure; tbh I am surprised there is not at least a holding page there. It looks suspiciously like a major infrastructure failure, given that even their DNS servers are dead...

The interesting part of a competitor failure is the way it makes you look at your own infrastructure. We currently run mirrors of all the servers at Applegate, in some cases triple, even 4-way redundancy to handle load spikes; there are currently 12 servers in the Applegate cluster. I am working with the head of IT to get competitive quotes for the next generation of servers. Virtualisation would seem a wise route, but think about it... 12 servers, 24GB RAM and a dual quad-core chassis on each; just how big will the hypervisors have to be??? 3 hypervisors will mean 96GB RAM on each to deal with any DR issues, 16 cores minimum! Big iron for a single website. Our discussions with Rackspace are getting interesting!
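The sizing arithmetic behind those figures is simple enough to script. A quick sketch using the numbers above (the function is mine; the second call, which assumes the cluster must survive one failed hypervisor with the survivors absorbing the whole load, is my own extra assumption, not a figure from the quotes):

```python
import math

def per_host_ram(vm_count, ram_per_vm_gb, hosts):
    """GB of RAM each hypervisor needs to hold every VM at once."""
    total = vm_count * ram_per_vm_gb  # total RAM demand of the cluster
    return math.ceil(total / hosts)

# The cluster above: 12 servers at 24GB each, spread across 3 hypervisors.
print(per_host_ram(12, 24, 3))      # 96

# Assumption of mine: if one hypervisor can fail, the remaining two
# must be able to absorb all 288GB between them.
print(per_host_ram(12, 24, 3 - 1))  # 144
```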

The next most interesting discussion today has been one of SEO. Yes, that old bugbear has raised its head again; I am sure I will be discussing it a lot in this blog! Applegate has a lot of very 'old school' SEO techniques throughout the site, mostly revolving around multi-level repetitive internal linking. It makes for a very complex user funnel... more sieve, actually; it's very difficult to work out user flow through the site, but it does give very high ranking to some very unexpected pages. Nearly all our index pages are page 1 in Google, for instance, often number 1 on page 1 in Bing! Yet the advertising on these pages is very under-utilised... I think that may change very soon :)

The recent changes I pushed through with hotlinks now seem to be making serious headway. Certainly sales are reporting *massive* uplift in referrals. The big issue has been with Google Analytics not reporting the referrals correctly. It appears that on Internet Explorer referrals are mis-attributed to the first referrer ever used, so if the site was initially found through Google, that is where all further referrals are attributed. It seems to work fine in Firefox or Chrome; funny, that! We found a fix, and having implemented it on the 15th of January we have seen an immediate upturn in clients' analytics. Finally, some headway!

More to come in future blogs. Once I am convinced by the Google Analytics fix I will document it more; in the meantime I have reported it on their forums to see if it gets recognised!