Are Analytics Services Sharing Your Personal Browsing History?

A huge part of social media measurement involves studying the Web analytics of your blog (or your company’s blog). There are numerous free and subscription-based services available that will tell you all kinds of things from how your traffic compares to the competition to what search terms people are using to find your blog. Compete.com is one such analytics service.

I think my issues with faulty metrics are fairly well documented. I know to take all data from these types of sites with a grain of salt. After all, samples can lead to educated guesses at best; terribly inaccurate data at worst. And I’ll give Compete some credit for getting at least some information (close to) correct about some of my sites — generally their unique visitor estimates aren’t too far off for me.

I take issue with other data the company is providing though — data that I consider to be a violation of my privacy, and data you might be unknowingly sharing as well. This post is about sharing something I recently discovered occurring on Compete.com, as well as my opinions on the ethics and defenses of it.

Unusual Referrals and What They Told Me

Periodically I run simple site comparisons through Compete’s free tool to see general trends — usually my primary blog and competitors / colleagues in the niche. It’s a good way to see if overall the niche is seeing increases in readership or how my blog is faring compared to others. And that’s fine.

But when I ran a search in July, I noticed a section in the results called “top destination sites.” These are sites people are visiting after they visit my blog. And some of them didn’t make sense. I knew for a fact that my blog was not referring traffic to some of these “top” sites shown. I checked my internal stats. And I was right. One site on the list was another one I owned. I checked the internal stats on that too. And indeed there was no traffic going from Blog A to Blog B.

Compete.com Top Destination Sites Report (free version)

OK. Then hold on. How can these sites be showing up as referred traffic? I realized something — these weren’t sites that my readers were being referred to from my blog. They were other sites I personally visited frequently. They were sites I visited directly via type-in traffic — not by clicking a link.

Note: My primary blog is set as my homepage in the browsers I use because I check the admin area frequently during the week. That page is therefore what displays when the browser first opens, before I type any other website URL into my navigation bar.

Well, hold on again. Why the heck does Compete know what I’m privately typing into my Web browser’s navigation bar, when that information should be between me and the site I’m directly visiting? Let’s just say I was “not happy” when I realized what was going on.

Essentially, here’s what we had:

  • My personal type-in browsing history was showing up for all the world to see via Compete’s free search tool (not just to paying members).
  • Anyone who knows me at all could come to the logical conclusion that these sites were my own browsing history and not that of my general readership (making it personally identifiable).
  • Sites that I might not want associated with said blog were showing as being “referred” by my blog — which implies some level of support or connection even when none existed. Even where a connection did exist (my ownership of two sites), that connection was not one of Site A referring anyone to Site B. The two sites should not have been connected in any way publicly in these stats.

I’m a big stickler for privacy. I read terms and conditions far more often than your average Joe. When companies specifically ask if they can collect information about what I’m doing on my computer (usually under the guise of reporting anonymous data back to them so they know how their software or whatever is performing), I say “no.” I’ve also yet to see any terms and conditions specifically grant a company permission to sell my type-in browsing history for public reporting in any way that could be even remotely personally identifiable.

Well, I contacted Compete before writing this article, and I had a chance to speak with Compete.com’s Director of Product Management, Eric Austrew. And it’s clear that we have very different views and definitions of what things like “referrals” and “invasion of privacy” are. That’s fine. As Austrew himself said, reasonable people can disagree. Of course we can. Then again, when it’s my personal information being shared publicly I have to be blunt and say I really don’t care how much a company wants to defend their definitions. I’d say I took it far easier on Austrew than I wanted to (if you remember my work at NakedPR.com, you know what I mean), but I do have to give him some credit and thanks for hearing me out given how passionate I still was about the issue.

That said, you might remember that I used to run a small PR firm. You might also know that I made it a point to set myself apart from the traditional sleazy PR image of hype and spin and constant “corporate speak.” I despise corporate speak — with a passion. When yes / no questions are overanalyzed or avoided (we’ll get to that), I start twitching. When someone I’m interviewing is constantly going out of their way to seemingly “stay on message,” I’m sickened by what usually comes across to me as a lack of authenticity that seems to hang in the air. I have little tolerance for it, and this was no exception. But we’ll get to that too.

What I want to do now is open up a few areas of discussion with readers here about this issue and what can be done to protect people’s privacy. Let’s get to it.

Referrals: Let’s Define

What exactly does it mean to say that you “refer” someone to something? To me it means that either you’re recommending it or you’re directly sending someone to something (in this case a website). It implies knowingly directing someone to a resource, whether that be in support of it or not (such as me linking and “referring” you to Compete’s site in this article regardless of actual support for the service).

Compete’s definition of “referral” seems quite different from the generally accepted one. Having been a webmaster for quite a few years, and having worked heavily with other webmasters through my former PR firm, I can tell you that the term “referral” does not generally include what someone happened to be looking at before they became a direct visitor via type-in traffic.

Austrew acknowledged that Compete defines “referral” differently than I do. And again, that’s fine. But I consider it less fine to use confusing terminology that can lead visitors to infer untruths, as in a site or its owner actually referring people to sites they wouldn’t dare refer. And it’s definitely confusing.

According to Austrew, even though type-in traffic is shown in the free sampling of “top destination sites,” it’s broken down further for paying members with a better explanation of what that data means. Again, fine. But what’s not fine is not breaking that down for every single visitor who has access to any of this data using Compete.com. Telling some people what something means but not others is irresponsible at best (in my opinion).

The phrase Austrew used for this traffic was “true referrals.” In other words, a true referral in Compete’s view is any site you were viewing when you left to go to another site. He used the example of BestBuy.com and CircuitCity.com. And from a corporate perspective I can see how someone might find it valuable to know a direct visitor was on a competitor’s site first, and that they then chose to visit you instead. But Compete doesn’t only report on corporate sites.
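To make the difference concrete, here is a minimal Python sketch of the two definitions. It is an illustration only: I don’t know how Compete actually computes its report, and the visit log, the `arrived_by` field, the site names, and both function names are made up for the example. The first function counts any site viewed right after the source site (the “true referral” idea as it was described to me). The second counts only visits that arrived by a link click, which is what site owners usually mean by a referral.

```python
from collections import Counter

# Hypothetical clickstream for a single panelist (all names are invented).
# "typed" = address typed in or homepage opened, "link" = clicked a link
# on the page viewed just before.
visits = [
    {"site": "myblog.example", "arrived_by": "typed"},    # browser homepage
    {"site": "dating.example", "arrived_by": "typed"},    # typed into the address bar
    {"site": "myblog.example", "arrived_by": "typed"},
    {"site": "otherblog.example", "arrived_by": "link"},  # clicked a link in a post
    {"site": "myblog.example", "arrived_by": "typed"},
    {"site": "dating.example", "arrived_by": "typed"},
]


def destinations_by_previous_site(visits, source):
    """Count every site viewed right after `source`, however the visitor got there."""
    counts = Counter()
    for prev, nxt in zip(visits, visits[1:]):
        if prev["site"] == source and nxt["site"] != source:
            counts[nxt["site"]] += 1
    return counts


def destinations_by_link_click(visits, source):
    """Count only sites reached by clicking a link while viewing `source`."""
    counts = Counter()
    for prev, nxt in zip(visits, visits[1:]):
        if (
            prev["site"] == source
            and nxt["site"] != source
            and nxt["arrived_by"] == "link"
        ):
            counts[nxt["site"]] += 1
    return counts


previous_site_counts = destinations_by_previous_site(visits, "myblog.example")
link_click_counts = destinations_by_link_click(visits, "myblog.example")
print(dict(previous_site_counts))
print(dict(link_click_counts))
```

With this sample log, the first count lists dating.example twice for myblog.example even though the blog never linked to it, while the second count lists only otherblog.example.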

Also, “true referral” is hardly an industry standard phrase, and personally I consider it highly misleading to refer to something that could be complete happenstance as a “referral” of any sort. In fact, “true referral” is a term that’s been used to refer to multiple things including:

  • Actual relationships in referral networking (as opposed to referrals from people who know you but who have no actual experience in business with you);
  • Original referrals in Web traffic (the first site that directed you somewhere and exposed you to another site as opposed to later links you might have followed there).

So let me ask you. If you were doing research on a competitor’s or colleague’s site, and you looked up their “referral analytics,” would you get the impression that this was traffic actually referred from their site to somewhere else? Now that you know that isn’t always the case, if you looked up a small blog run by someone you know, do you think you could pull out a personally identifiable browsing history by separating it from the relevant referred reader traffic? How do you feel about it?

Invasion of Privacy: What is (or Should be) Private?

I noted that Austrew and I also seem to have different views on what constitutes an invasion of privacy. While I do understand that Compete claims all traffic is opt-in, I personally do not consider it “opt-in” when it comes to personally identifiable information — no matter how rare of a case mine might be. To the best of my knowledge I’ve not once clicked an opt-in box agreeing to let anyone share potentially personally identifiable information. I’d actually consider opt-in boxes in general to be far too rare these days. Instead so much is tucked into big blanket terms & conditions or terms of use statements littered with legalese, which companies know damn well most consumers don’t read (or is it that they can’t understand it?). Companies know precisely what they’re doing in this sense, and it’s how they get you to agree to just about whatever they want, and you might not be any the wiser. (Talking about the 3rd party services here.) Even then though, there’s a reasonable expectation of complete anonymity when data is sold or otherwise given away — several sets of terms I’ve read since yesterday have blatantly said as much. And to me, a lack of complete anonymity when that data is transferred and / or published is an invasion of my privacy.

But I asked what I think was a reasonable question. I asked if Compete had a way for me to opt out. They did not. I was informed that to opt out of the data mining and selling / sharing, I would have to opt out through the 3rd party I had a relationship with. So I asked how I could find that 3rd party (I mean, seriously, how many services are we all a part of these days?). But Compete won’t reveal the identities of their partners, so there’s no easy way for me to say “Okay, I’m a member of X, Y, and Z, so let me go find out how to opt out with them or cancel the service.” I guess it’s nice that at least somebody’s privacy is protected that thoroughly. For me though, it feels like nothing short of a trap, and it’s reminiscent of Facebook previously making it incredibly difficult for people to delete their accounts. After all, the value is in keeping the information, right?

The Spam Comparison

Now take a moment and think about this in comparison to spam. Essentially you’re more protected from spammers legally than you’re protected from companies publishing this kind of information.

It starts in the same place sometimes. You sign up for a service. You give them your email address for legitimate uses. Somewhere in the terms that you have to agree to if you want to use that service it states the company is allowed to sell or rent your email address. Sometimes there’s a special check box regarding allowing “partners” to contact you, although again this seems to be far too rare.

That email address is included in a rented email list. Someone purchases that list and they email you, promoting their products and services. So far, all by the book. Here’s the thing though. They have to give you a way to opt out. And no, I don’t mean the original service provider. I mean the people who purchased the data and proceeded to email you. If they don’t give you a way to opt out, or if you opt out and they continue to contact you with those marketing messages, that’s spam.

So tell me…. Why should equally personal information like browsing history be held to lower standards? Why aren’t the 3rd party users of this data required by law to let you opt out of that data collection and publication if you do find that you’re one of the exceptions to the rule? And I mean opting out through them — not trying to track down a third party, when it’s highly unlikely the terms blatantly say “hey, we’re the ones providing your information to Compete.com.” It’s my opinion that they should be. What about you?

On a side note, after first being told there was nothing Compete could do to help me opt out or discover the third party that apparently led to an opt-in status, Austrew did offer to help in an email. I plan to take him up on that offer, and I’ll update here in the comments if / when that happens.

Where is this Information Coming From?

This is really my big question in this whole issue: who did I sign up with that led to an opt-in, and how can I now opt out, given that the information displaying in my site records is indeed personally identifiable to people who know me? Compete obviously can’t tell me who their partners are. But I had a thought: “hey, my browser tracks this information, so I wonder if it’s in the terms I agreed to when I installed it.”

Actually, I use three browsers. In Web development you don’t honestly have an option — you have to use all major browsers for testing. But the data that displayed in July would have been from June. I mostly used Firefox then, with a bit of Chrome tossed in. So I checked their privacy information. Firefox’s privacy policy is confusing beyond all hell, and that’s coming from someone who usually has a pretty fair grasp on the legalese. To their credit, the language is actually remarkably clear. It’s the information that isn’t. Early on it sounds like your information will stay within the Firefox / Mozilla community, and later on they say they can provide your personal information to certain third parties. So what is it? Is my information private, or is Firefox providing it to others? If so, who?

So hey… why not just ask Compete’s representative outright if they partner with any Web browser software companies? Well, I did.

When I spoke to Austrew on the phone, I asked this question. I mean there are only three main browsers for PC users. If one or more is selling this kind of browsing history information, we should know about it. The answer I received was somewhat vague, but basically amounted to it being an unlikely situation. Of course that’s not exactly a firm “no” either.

After that conversation I had another thought — what about our ISPs? Again, most Americans have limited options in this sense. Would we have to cancel our Internet service to “opt out” of this kind of data sharing and get around those terms of use? So I asked that question too, and re-questioned Austrew about the browser issue so there would be no chance of mis-quoting him on that.

Here are the questions I asked:

1. Does Compete partner with any ISPs? (As in, to opt out someone might have to go so far as to cancel or switch their Internet service provider to get away from those terms and conditions.)

2. Does Compete partner with any Web browser software company where the simple choice to use a browser could result in opting in to sharing data with Compete? (As in, the use of browser X means you’ve opted in to having type-in data shared — not situations where Compete might partner with the developing company in some other way, given that Google, Microsoft, etc. clearly have business beyond browsers.)

And here was his response:

To answer your questions, our panel consists of multiple sub-panels – basically, different recruiting sources – that fall into one of the following categories: proprietary panelists that we recruit, and clickstream data that we license from ISPs or desktop application partners. We follow industry best practices for ensuring that the resulting clickstream is anonymous.

So yes, your ISP might be tracking your type-in browsing history and selling off that data to Compete and similar companies. I could tell you how unhappy that makes me, but I’m sure you can already figure that out. I’m even less happy about it given that I’m not actually the person who signed up with this ISP. In other words, your spouse, roommate, or someone else in your household might have agreed to something that tracks your personal data without you even realizing it.

As for browsers? Well, I still haven’t seen a “no.” But they’re not even directly referenced in that response. And they are “desktop applications.” So I still don’t know what exactly to tell you on that front. That concerns me.

As for the comment on best practices and anonymity, that’s all well and good but….

Sometimes “Anonymous” Data Isn’t Truly Anonymous

Here’s the thing. It doesn’t matter in the slightest if your browsing history data is anonymous to Compete. Or to your ISP (or whoever else is collecting the data). It doesn’t matter if things are aggregated. What matters is whether the people researching your site can identify your browsing history through the data provided there. Actually, if even one person can do that, the information is no longer technically anonymous.

There are other potential issues too. What if the information reflects your website referring people to a site that’s completely and utterly inappropriate? What if you’re a teacher who blogs, and your blog were to show that it refers traffic to a racist hate site? You might never have gone to said site. You might never have referred to that site in any way. But a malicious user could repeatedly open your site, then directly type in that site’s address, making the public record show that your site refers traffic to them. For a huge site with a ton of traffic, they might not be able to influence anything. But what about for a smaller blog? The exact wording at Compete.com is “websites getting traffic from [insert your site here].” That statement is not technically accurate. That site would not be getting traffic from yours. It would be getting direct type-in traffic that just happens to be in the same browser window as the site previously viewed. And anyone who’s been in the Web development game for a while knows there’s a big difference.

Referral Language on Top Destination Page (free version)
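The small-site problem is simple arithmetic. The sketch below is in Python and uses made-up numbers. It assumes the repeated visits come from someone whose browsing is captured by the panel, and it says nothing about how Compete actually weights or filters its sample. `destination_share` is a hypothetical helper, not anything Compete provides.

```python
def destination_share(existing_exits, injected_exits):
    """Share of a site's recorded onward visits that point at one injected destination.

    existing_exits: onward visits already recorded for the site in the sample
    injected_exits: extra "open the site, then type another address" visits
    """
    total = existing_exits + injected_exits
    return injected_exits / total if total else 0.0


# Hypothetical sample sizes: a small blog with 40 recorded onward visits in a
# month versus a very large site with 400,000.
small_blog = destination_share(existing_exits=40, injected_exits=20)
large_site = destination_share(existing_exits=400_000, injected_exits=20)

print(f"small blog: {small_blog:.1%}")  # 20 of 60 onward visits
print(f"large site: {large_site:.4%}")  # 20 of 400,020 onward visits
```

Under those assumptions, the same twenty visits make up a third of the small blog’s onward traffic and a vanishing fraction of the large site’s, which is why a smaller blog is so much easier to skew.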

People can be malicious. If you’ve ever been under serious fire from a troll or competitor, you know how it can be (*raises hand*). You have a right, in my opinion, not to have your site falsely associated with other sites in a way the average user could read as some sort of relationship, and which could come across as defamatory if it implies any level of support.

Now is that likely to happen? I don’t believe that for a second. But I went ahead and ran a few searches on blogs in the freelance writing niche (the niche of my primary blog). I looked up information on competitors, colleagues, and friends. The simple truth is that when we’re talking about professional blogs of individuals, those three groups have a lot of overlap. You know a good deal about the people you’re researching, and it makes information that much more potentially personally identifiable.

In checking just a handful of sites, I found some interesting results, and I asked the site owners about them:

  1. One blog was tied to a black hat community. For those not familiar with the term, it has a negative connotation, meaning someone who blatantly ignores or tries to get around the rules to manipulate things like traffic stats or search engine rankings. This professional writes for online business owners, and an association with that kind of community could cause them to lose business. This wasn’t just a top destination listed; it was the only destination listed. The writer is also in a profession (in addition to writing) where they have to go out of their way to “keep their nose clean” when it comes to being associated with anything potentially questionable. A simple search showed that nowhere did their site link to this community. They do not visit that site, and knowing this person fairly well I wouldn’t assume that they do (and it’s not such an awful thing that I’d care if they did). That leaves a couple of other possibilities: (a) the individual’s partner works on the Web as well and could have visited the site frequently, or (b) traffic was coming the other way around, and “back button usage” or re-typing the URL to return to the community could explain it. I’m just hypothesizing here.
  2. In another case I was able to very easily identify a frequently-visited Web comic of a colleague. In this case fortunately there was nothing inappropriate that would have led to a negative association. But it is still none of my business what they are reading, and I should not be able to easily determine that by viewing freely available “top destination” data on Compete’s website. More importantly, this isn’t just a colleague. The individual also works for me. I’m a client. Had this research shown something highly inappropriate, there’s always the chance it could have affected the working relationship. I think the worst you’d probably find in my own browsing history would be an online dating site or two from that phase a few months back. While I’m not embarrassed by that in general, I can say I’d rather my colleagues and clients not have been able to look me up there when a profile was still active. Those things are kept separate from work for a reason. Would you want clients knowing what you frequently look at online?
  3. In another example, I looked up a writer whose site was showing that a torrent site was a heavily referred destination. They didn’t even seem to know what a torrent was up front. And while there’s nothing inherently wrong with torrents, many of us do know that they’re often used in copyright infringement scenarios, giving them a negative image — especially in an industry where copyright infringement is a particularly serious issue. There were no links from the writer’s site to this torrent sharing service. How it ended up as a top destination site is still a mystery in that case, but it’s another example of how a not-so-“true referral” risks the image of the supposedly-referring site when it’s being researched by colleagues, clients, or others in the industry.
  4. In still another example, I looked up a site run for military families. A top destination was a gaming site, again with no direct referrals from said site to the gaming site. Now you’re talking about a family-oriented site and gaming, and I’m sure you know the ultra-violent reputation a lot of military-centric games have. In this case the culprit seems to be a contextual advertising network. The site owner now knows to try to ban certain types of ads from displaying on the site. However, as anyone who’s ever run a site with contextual advertising networks can tell you, you don’t generally get to hand-pick your ads. When you see something offensive, you can usually block the advertiser. But it would be extremely difficult, if even possible, for you to know every single ad that appears on your site. It varies not only by page, but by page load. And ad inventories frequently change. And given that the JavaScript ads are designed specifically so that the traffic “referral” is not counted by search engines as a credited link for rankings, it would be a logical assumption that they’re not being tracked as referrals in general. Apparently that’s not the case when the page is viewed right after a pageview on your own site.

That’s what I found in just a few minutes of searching. Eight sites were looked up. Five showed strange results. Four of those people got back to me. And one of those four had personally identifiable information show up (just as anyone who knew me at all could have come to that conclusion about my own site’s results).

Is it happening constantly? No. It’s not. But two is two too many. (Wow, say that three times fast.) And keep in mind I’m not talking about corporate sites here. Many blog communities are fairly tight-knit like ours is. Colleagues get to know each other fairly well. And sometimes those relationships aren’t the most friendly. There is a lot of information that could be easily inferred, and shared by people who should not have it. We aren’t talking about nameless, faceless corporate sites, but ones knowingly owned by individuals where the people who need to research those sites are the very same people who likely know the owners.

There are many types of sites you might not want showing up there as referrals (when they’re indeed not referred to by your site at all). It could be a porn site. Maybe an extremist political site. Perhaps you visit a religious site and you don’t think your faith should be of concern to people visiting your blog. Or it could just be another website you own. Many people, myself included, own numerous sites that we don’t publicly associate our names with (and no, no adult-oriented or shady sites here — sorry to disappoint). The fact that you visit these sites directly is no one’s business just because they’re checking stats on an unrelated site. Of course, that’s just my opinion because I was one of the people with identifiable information being displayed.

Protecting Your Private Information Online

I’ve always considered myself fairly careful about online privacy. I’m very calculating in what I choose to share and keep to myself. There’s a difference between sharing personal information in a post with my readers and giving up something like browsing history which I would consider more “private” than just personal. But somehow I got pulled onto this panel without realizing it (and hopefully they’ll be able to help me leave it). What can you do though?

  1. Run searches for your own blogs and sites to make sure nothing potentially personal is being displayed.
  2. If you’re on the panel and you know how you signed up and you don’t want this browsing behavior shared, then opt out through the third party service you used.
  3. Don’t save one of your websites as your browser’s homepage. Yes, I think it sucks that you’d have to change your browsing preferences to maintain what I believe should have always been private or at least unidentifiable, but if it helps, it helps.
  4. Make it a point to open brand new tabs or browser windows, or at least visit relevant and / or completely “innocent” sites first if you’re starting out with your own site in your browser window. (Your destination is less likely to show up as a top destination for Yahoo! than for your own much smaller site, for example, because fewer destination sites means a greater likelihood that any one of them will show as a top destination.) I make no promises on the tab bit though. Perhaps that’s enough to circumvent the issue, and perhaps it will still be tracked if your site was the last visited while the browser was open.
  5. I suppose you could also just never visit any kind of website you wouldn’t want your friends, family, colleagues, employer, or clients to know you’ve visited. But I really don’t think that’s a solution.

There’s one last thing I want to touch on, and that is the fact that Compete does not currently allow website owners to opt out of having their sites’ statistics measured and shared. Personally, I find that unacceptable. I do understand that they make their bread and butter by selling data. And the reasoning I was given was that if Compete lets one site owner opt out they have to make the option available to everyone. To that I say, yes, they should indeed make that option available to everyone. After all, if even Google lets you opt out of being indexed and therefore having information shared (like incoming links or information searchable via site-specific searches), then there is little excuse for that option not to exist in this case.

I don’t believe for one second that Compete or anyone working for the company is intentionally sharing personally identifiable information about any blogger or website owner. But the sad truth is that even if rare, it does happen. And I don’t know about you, but I would very much like to see that change. And if it does not, then I’d like to see Compete make opting out of reporting an option for site owners, especially if they do find their own browsing history is being exposed in some way.

Thanks for bearing with me for what turned out to be more of a short e-book than a blog post here. And thank you again to Eric Austrew for taking the time to talk to me, whether or not we’ll ever fully see eye to eye.

If you want to check your own blogs or websites to see what kind of referral traffic is being displayed and associated with your site, visit Compete.com. You can also learn more about Compete’s panel if you want to know more about where their data comes from.