Ethics in Web Statistics

How important are website statistics?

Increasingly we hear about the importance of our personal information to marketers, companies and other organisations, but is that information worth more than the actions we might perform – whether it be purchasing an item or accessing a resource?

There will be cases where it is true: if you wish to target a specific group with your direct mail marketing campaign, knowing their interests is essential for financial viability. Likewise – but irrespective of the number of people you market to – knowing how to speak to them, and how to take advantage of their interests, desires and culture, is information that could make or break a campaign.

E-mail marketing can to some extent use sheer numbers to overcome a lack of information, while websites may benefit from viral campaigns, advertisements, and the stumble-upon factor – but would you sacrifice your stumble-upon traffic to gain information about the rest of your visitors?

I use a custom HOSTS file to block certain advertisement sites and web statistics servers, so that requests to them go to localhost instead. The main reason I do this is to block adverts on some social networking sites, where the various banners strewn about the page make the content tougher to read. Recently a number of sites I visit have failed to return pages when I click on their links, simply because my zealous hosts file spots that the URL points at a webstats server, with the true, requested resource tacked on the end as a redirect.
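To make the mechanism concrete, here is a minimal Python sketch of what the hosts file effectively does to a clicked link. The hostnames are made up for illustration, and the functions `blocked_hosts` and `is_blocked` are my own hypothetical helpers rather than anything the operating system exposes; the real lookup happens in the resolver, not in code like this.

```python
from urllib.parse import urlsplit

# Addresses that send a hostname back to the local machine.
LOOPBACK = {"127.0.0.1", "::1"}


def blocked_hosts(hosts_text):
    """Collect the hostnames that a hosts-format text maps to loopback."""
    blocked = set()
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        if address in LOOPBACK:
            # "localhost" is the normal entry, not a blocked site.
            blocked.update(n.lower() for n in names if n.lower() != "localhost")
    return blocked


def is_blocked(url, blocked):
    """Return True if the URL's host is one the hosts file sends to localhost."""
    host = urlsplit(url).hostname
    return host is not None and host.lower() in blocked


# Hypothetical entries; stats.example.net stands in for a webstats server.
SAMPLE_HOSTS = """\
127.0.0.1 localhost
127.0.0.1 stats.example.net ads.example.net  # statistics and adverts
"""

if __name__ == "__main__":
    blocked = blocked_hosts(SAMPLE_HOSTS)
    link = "http://stats.example.net/click?target=http%3A%2F%2Fshop.example.org%2F"
    print(is_blocked(link, blocked))
```

The point of the sketch is that the decision is made purely on the hostname of the link: the real destination is sitting in the query string, but the request never gets far enough for the redirect to happen.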

The resource provider has chosen to deny my request unless I provide some data to their third-party statistics provider. The ethics of this are interesting: the resource provider could harvest the same information from me if they handled the request themselves, but by outsourcing it by this method they empower me to refuse.

But should I still have access to the resource, despite my protestations? Who loses out the most? I do not get access to the resource, but the resource provider has failed to inform me and failed to gather my information. One of the sites I commonly visit (a technology retailer) has failed to promote a product to me, while another (a charity) has failed to inform me of their campaign.

As I’m the one who can simply copy the requested resource straight from the querystring and paste it into the address bar (or use Greasemonkey to automate the process), reasonably assured that using a third-party statistics provider probably means that my data isn’t even being stored in a useful way, I think I probably win. However, my habits, my interests and my hit don’t register with the resource provider, so my kind lose out too.
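The copy-and-paste step is easy to automate. The sketch below shows the idea in Python rather than as a Greasemonkey script; `extract_target` is a hypothetical helper of mine, and because I can't assume what any given provider calls its redirect parameter, it simply returns the first query-string value that looks like an absolute http or https URL.

```python
from urllib.parse import parse_qsl, urlsplit


def extract_target(tracking_url):
    """Return the first query-string value that is itself an http(s) URL.

    Returns None when no parameter carries an absolute URL.
    """
    query = urlsplit(tracking_url).query
    # parse_qsl percent-decodes each value for us.
    for _name, value in parse_qsl(query):
        candidate = urlsplit(value)
        if candidate.scheme in ("http", "https") and candidate.netloc:
            return value
    return None


if __name__ == "__main__":
    # Hypothetical tracking link; the parameter names are assumptions.
    link = (
        "http://stats.example.net/click?site=abc"
        "&target=http%3A%2F%2Fshop.example.org%2Fproduct%3Fid%3D7"
    )
    print(extract_target(link))
```

This assumes the destination is percent-encoded in the query string; a destination pasted in raw and containing its own `&` would be cut short, so treat it as a sketch rather than a robust unwrapper.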

I’m currently participating in a rollout of SiteStat, and their ClickIn feature uses just the method outlined above. This is not a method I would ever consider deploying – and conversations with the people behind some of the sites I visit show that when you point out the problem, they’re concerned about it too.

edit: There are a few other concerns I have about this method – its effect on otherwise RESTful URLs, for example, and the mess it may make of your internal search spidering.

For some sites, denying a resource under these circumstances would be perfectly fine. But is it wise to document and promote this method to your statistics-hungry customers, with all its pitfalls? Is a counting mechanism sound if it cannot count those who don’t want to be counted?