A few days ago freelance writer Laura Spencer (@TXWriter) tweeted a link to a Mashable post. The hyped-up headline read “Facebook Now Controls 41% of Social Media Traffic.” Before I even read the post my gut screamed “Bullshit!” It often does that. My gut is rather talented at sniffing out shady statistics. It must be that past life in PR where we all learn that statistics can say just about anything we want them to if we twist them enough (my disgust with that attitude makes me hypersensitive to them now).

Then I did read the article. What I found was baffling (okay, it wasn’t really — it was about what I expected):

  • Charts with no reference points related to the supposed trends shown
  • Assumptions about people jumping from one site to another without any real evidence to back that up (and data charts right in the post that contradicted the claim)
  • Other statistical claims that didn’t jibe with the “relative” charts shown in the post
  • Big social media sites being completely left out of the comparison
  • Whole niches of social media completely left out of the comparison
  • Sites that probably shouldn’t have been included but were
  • A huge social media site included in the first set of stats suddenly disappeared from later ones

Yikes. I bet you’re wondering why I haven’t linked you to the post yet. That’s because it seems to have gone “Poof!” Vanished into thin air it did. Because of that I won’t pull the actual charts to show you the problems (doesn’t seem right to publish their charts when they’ve pulled them — especially when it wasn’t even clear in the post if they belonged to Mashable or were Comscore charts taken somewhat out of context). However, I do want to highlight something from the cached version which illustrates my biggest problem of all:

Mashable’s Headline on Facebook Statistics (now pulled) — Interestingly, the post doesn’t even mention YouTube beyond the intro.

Do you see those nice little Twitter and Facebook counts to the left of the post title? That’s how many times this story was shared just on those two platforms before it was moved or removed. That’s terrifying. Why? Because it shows just how easily and quickly bad information can spread on the Web. Let’s look at some of the specific problems.

1. Major segments of social media (and sites) are ignored. — The first chart shown in the post supposedly illustrates traffic changes for eight social media properties from February 2009 — February 2010. Those sites are Facebook, Myspace, Gmail, Twitter, LinkedIn, Ning, YouTube, and Hulu. While I don’t claim the 41% statistic came from this data set (if you can call it that), it’s a good indication of how narrow these stats clearly are. (The chart showing how big Facebook’s piece of the pie supposedly is actually highlights only six social media sites — take YouTube and Hulu out of the previous group.)

What happened to the other pieces of the pie? (Credit: BigStockPhoto.com)

I guess they forgot two of the grandfather platforms of social media — forums and blogs. How can you claim anything controls 41% of the entire industry’s traffic (as their post title did) when you ignore huge segments of it? You can’t. The claim made was clearly too broad, and there’s no excuse for that (especially when readers trust the information you put in front of them).

I also found myself wondering where other reasonably large players were. Anyone remember Flickr, Delicious, Digg, StumbleUpon, Flixster, Classmates.com, or Reddit for example? If you want to really know how much of the social media space Facebook controls, don’t forget you have to include all of the socially-driven P2P (peer-to-peer) networks too: LimeWire, BitTorrent, and related sites. Why do we so often forget about old school social media when it doesn’t suit our statistical purposes?

What’s just as questionable as not including some resources is including sites like Hulu in social media stats when it’s little more than a glorified vlog (and that’s coming from someone who loves Hulu just for the record). It’s not so much about social content as it is content consumption. Yes, you can comment and share. But unless I missed something, you can’t really contribute to the primary content base. I’m not saying Hulu doesn’t fall within the realms of social media — only that it’s completely senseless to include it in any “relative” statistic on what’s happening in social media as a whole while ignoring blogs and other significant sources operating in similar ways.

 

2. What happened to YouTube? — If you could still view the first chart they showed, you would see a dateline from Feb. ’09 through Feb. ’10. What you would not see are any metrics showing what stats were actually being measured during that time — just a blank y-axis.

When the author was questioned about the missing reference points in the comments, the reply was “Units of measurement are relative, since they come from a panel audience of a few hundred thousand. But the data definitely reflect average consumer behavior! comScore wouldn’t lead us astray.”  Um, yeah.

This is how faulty information spreads. Even worse, this is how poor interpretations of faulty information spread. Why do I call it faulty? Because if you look at that chart (which clearly shows YouTube with higher starting and ending traffic levels than Facebook) and look at it in relation to the statistics shared in the post, they just don’t add up.

What does basic logic tell us? If you’re showing YouTube as having more traffic, then its percentage of total social media traffic must be higher than Facebook’s. That would put it at more than 41%. However, the author goes on to state that “As of March 2010, Facebook traffic made up 41% of all traffic on a list of popular social destinations. MySpace was in second place, capturing around 24% of traffic. Gmail had 15%, and Twitter had 8%.”

We’ve already talked about how the “list of popular social destinations” was faulty and didn’t justify Mashable’s claim to begin with. But here we have 88% of social media traffic (for those listed sites) accounted for. If YouTube was already shown to have more in another chart from the same source that would mean at least 42% of traffic went to them — putting us over 100%. That would tell us that YouTube wasn’t included in the list that led to the 41% statistic (or the metric-free graph from earlier in the post was complete hogwash). Neither should be acceptable to anyone reading the site or getting these statistics from anywhere.
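For anyone who wants to check that arithmetic, here's a minimal sketch. The four shares are the ones quoted from the post; the variable names are mine, and the 42% floor for YouTube is only the assumption described above (that the first chart really did show YouTube ahead of Facebook), not a figure from comScore.

```python
# Shares quoted in the pulled post (percent of traffic across its listed sites).
quoted_shares = {"Facebook": 41, "MySpace": 24, "Gmail": 15, "Twitter": 8}

accounted_for = sum(quoted_shares.values())
remaining = 100 - accounted_for

# Assumption for illustration only: if YouTube out-drew Facebook, as the
# first chart implied, its share would have to be at least one point higher.
youtube_minimum = quoted_shares["Facebook"] + 1

# Could YouTube have been part of the same list of sites?
youtube_fits = youtube_minimum <= remaining

print(f"Accounted for: {accounted_for}%")
print(f"Left for every other site: {remaining}%")
print(f"Minimum YouTube share under the assumption: {youtube_minimum}%")
print(f"YouTube fits in the same list: {youtube_fits}")
```

With only 12% left over, a 42% YouTube share can't fit, which is the whole point: either YouTube was dropped from the list behind the 41% figure, or the first chart was wrong.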

It gets better though. I can’t find anything that shows YouTube having more visitors than Facebook. So okay. We’re back to the problem of missing reference points in their first chart. Maybe they wouldn’t be at least at 42% as the first chart in this post would suggest. However, on checking some other statistical sources just for more background (Compete and Alexa — note that I don’t put much faith in either of their stats individually either), it appears that YouTube does get more traffic than Myspace, even if not Facebook. So based on the 41% statistic for Facebook and the 24% statistic for Myspace, it still wouldn’t add up unless YouTube was given the boot from the calculations. Why?

 

3. Jumping to conclusions — One of my favorite assumptions is that because Myspace’s traffic share was shown to decrease and Facebook’s was shown to increase, that meant users were jumping from one social network to another. Of course there was no actual evidence to back up the claim. Nothing in the post suggests that it’s more than supposition.

In fact, two graphs in the post show that Myspace’s traffic remained relatively stable. In other words, on one hand they were showing that we had one maturing source leveling out in traffic over that year and one rapidly growing. On the other hand, that was somehow twisted into a mass migration. Of course traffic share for other sites will decrease when another comes in and shows massive gains. Less of the pie is available to them.
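Here's a quick sketch of that effect. The traffic figures are made up purely for illustration (the post never gave real units), and the `share` helper is my own; the only thing it demonstrates is that a site's share can fall while its traffic doesn't move at all.

```python
def share(traffic, site):
    """Return one site's percentage of the total traffic in the dict."""
    return 100 * traffic[site] / sum(traffic.values())

# Hypothetical visit counts, NOT comScore data.
# Myspace holds steady while Facebook triples.
year_start = {"Myspace": 100, "Facebook": 100}
year_end = {"Myspace": 100, "Facebook": 300}

myspace_before = share(year_start, "Myspace")
myspace_after = share(year_end, "Myspace")

print(f"Myspace traffic: {year_start['Myspace']} -> {year_end['Myspace']}")
print(f"Myspace share: {myspace_before:.0f}% -> {myspace_after:.0f}%")
```

In this made-up case Myspace's share is cut in half without losing a single visitor, so a falling share by itself says nothing about people leaving.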

What would be far more interesting is to look at truly active users — with that “active” status being set uniformly by some kind of social media standard (if everyone could actually agree on one). Facebook isn’t exactly known for making it easy for people to delete their accounts. MySpace has been much easier to leave (as are other social media sites). That’s one reason I’ve always been skeptical of Facebook’s traffic numbers (at least their own claims). To some degree they’re like a black hole, sucking in social media users and not letting them back out. Okay. They apparently can delete accounts. But unless things have very recently changed, they don’t make the process easy enough to pretend we’re comparing apples to apples in most cases.

Then again, relying on “member” numbers is an inherent fault of measuring social media for this very reason. How many members are active? How many are legitimate members vs automated bot-driven “members” and other spammers? What kind of activity are we talking about anyway? Are traffic numbers high because people are really interested in more information, or are the numbers high because the social media site requires multiple page views to do simple things to inflate their overall traffic numbers?

Every site is different in what they consider an active member, how good they are at weeding out the automated spammers, and how they direct traffic on the backend of their sites. Rarely are they directly comparable, and that doesn’t bode well for these types of comparisons beyond very general trends.

 

There is No Excuse for Spreading Ignorance

 

This is just one example of why I so often cringe when I see social media statistics thrown about by a supposedly reputable source. This information spreads. Quickly. People don’t seem to analyze information before passing it along anymore (or maybe a good question is “did they ever?”). I don’t think there’s any excuse for it.

I’ll give Mashable some major credit for pulling the piece. I don’t know what their reasons were, and I won’t make assumptions. But even if it’s just an unintended benefit, I’m glad that post has stopped spreading. One of my first comments in that initial Twitter discussion about the post was that I feared it would spread. It clearly did.

The Bigger Problem of Social Media Statistics

 

This isn’t a problem with Jolie O’Dell. This isn’t even a problem with Mashable. This is a widespread problem in social media. Companies are essentially able to create their own self-fulfilling prophecies by interpreting and publishing incomplete or distorted data. Actually, it goes well beyond social media.

I’m not saying this is always intentional. Sure, sometimes it might be a case of creating linkbait headlines or trying to puff up a company’s image to bring in more traffic and members (even if just out of sheer curiosity). I don’t assume that O’Dell had any ill intentions. Her post just happened to be the latest example of stats gone awry.

Graphs without reference points tell us what exactly? (Credit: BigStockPhoto.com)

I also can’t give you any easy answers when it comes to social media measurement. In no way am I saying that Facebook doesn’t have a commanding presence in the industry. The point is that we’ll probably never know exactly what percentage of social media traffic any given site has, and it’s silly at best to pretend we have authoritative data when we don’t. That’s because we don’t know every social media site out there. New ones pop up every day. Sometimes they disappear or are sold off and merged (like recent news from AOL about Bebo). And sometimes one changes its business model, potentially fueling the growth of new social platforms (can anybody say “Ning exodus?”).

Does that mean we should stop gathering and interpreting data? Absolutely not. But I do think it means that we have a greater responsibility as publishers to help our readers form their own opinions from that data. We can do that by pointing out not only interesting possibilities but also potential flaws. Or have we just become fad-feeders, discouraging critical thinking and informed decision-making in favor of a “Join me! Follow me! Friend me!” approach of hyping up the tools we use in order to build our own audiences there?

31 COMMENTS

  1. It is obvious to me that what we have here is not statistical analysis but only media hype. Quoting a statistic is intended to increase credibility and that does work with most people because few today have any critical thinking skills. (Or maybe they never did.) Using a specific percentage or number, ideally an odd one, is a frequently taught copywriting tactic to increase response.

    The widespread availability of free analytics data is extremely misleading because most have no idea how to control for the many variables. That article could never have been accurate because their starting concept was far too broad to ever bring it into focus and accurately attempt a statistical comparison.

    There were so many logic flaws in their information that it could never be “fixed”. You pointed out some of them such as growth in one Social Network does NOT prove desertion of another. (Many people can and do use more than one regularly!)

    Another is that Facebook sells traffic and many major brands are buying that traffic and driving their own visitors to Facebook through other mediums. Facebook SHOULD be growing faster because they are being supported by both the major media and big brands.

    Most important of all though is that whatever social network (and search engine) the media promotes will have the most traffic. Anyone who has been paying attention for a long time should be able to see this. Here are some patterns looking back in what the media has promoted most:

    Yahoo, Google, Bing
    MySpace, Facebook, Twitter

    This should be very obvious, especially regarding Google and Twitter. Everything from the News to sitcoms to movies mentions Google this or Googling that. Did you ever hear any of them suggest Yahooing anything? If you are old enough though Yahoo WAS once promoted by the media and now Bing is their new focus.

    Twitter went from rarely being mentioned to being promoted by celebrities (remember the “who can get to 1 million followers first” hype?), featured on Oprah, Ellen, popular shows like CSI, and on the major News shows all on the same day. Anyone who believes that could happen by happenstance is truly gullible.

    • “Quoting a statistic is intended to increase credibility and that does work with most people because few today have any critical thinking skills.”

      I couldn’t agree with you more strongly. Critical thinking is like a foreign concept for many. I often want to beat my head against my desk (and sometimes do) at the sheer ignorance I see: people believing something just because so-and-so said so, or situations like this where false “data” becomes news worth spreading.

      I also agree about there being problems with analytical data offered on the Web. Back on NakedPR, I attacked them pretty harshly when the “top blog” lists were all the rage, ranking bloggers based on extremely unreliable data. This is another example. But unfortunately people assume something is better than nothing. If that “something” gives false impressions, gets treated by Average Joes as serious and specific information because of your otherwise credible history, or is just downright BS like the Mashable headline’s claim was, then I’d argue some incomplete or inaccurate information is actually far worse than none.

  2. Thanks for taking the time to shine some light on these shady stats. It is amazing how so many people accept stats as facts without looking at what is being measured and how it is being measured. Most people are very trusting. Great reminder to stop and think about stats before we just accept them to be true.

    • Honestly, I think Twitter is a part of the problem in this case. It’s like virtual sound bites: once people see something vaguely catchy, they spread it without bothering to know what they’re passing along. But that’s what happens when people want everything so condensed. You lose context.

  3. Ann, yet again a great post.

    I’m guilty of retweeting posts that had sensational headlines and seemed to have groundbreaking stats; then I look at my bit.ly account and see them as 404 errors. Until now I thought that it was a bit.ly error (apologies to bit.ly), but this post shows there may be a worrying trend to spam-post sensational stats, get a load of social media reaction, and then pull the post before any questions are raised.

    A basic PR strategy I suppose. Need to be less trusting in future.

    • I’m not sure that they pulled it for that reason. I didn’t get the vibe that it was an intentional PR stunt beyond the linkbait (in the worst sense of the word) headline. I’d rather see things like this pulled and showing 404s than having the incorrect info spread around further though.

  4. They use the same methods on weight loss product commercials. Mysterious charts with no reference points or source data. I knew the FB stuff was crap, but I’d like to point out that in addition to the faulty statistics, we’ve also got the herd mentality. As soon as 1 publication mentioned how influential FB was, everyone had to go create a FB fan page so we create our own self-fulfilling prophecy data. It almost becomes a situation where we feed on ourselves. Yes, I’m not hungry anymore so eating my own feet is an effective form of appetite satisfaction, but I can no longer walk… I really hope that analogy makes sense.

  5. Thanks for taking the time to write this thoughtful post, Jennifer.

    In my original post on Mashable, I noted that we’d asked comScore (who provided our data) for stats from a sampling of popular sites, not from the Internet at large. I also noted that, because of comScore’s hybridized approach to data collection and analysis, stats were presented for each site relative to the others. The reason for this isn’t shady at all: comScore used a group of around 100K average web users to gather behavioral data. To include hard numbers rather than percentages would have been slightly misleading to our audience; after all, we don’t want anyone thinking that Facebook had only 71,000 users in February 2009. =)

    However, the data we presented ended up being confusing to some of our readers, precisely because of the way that comScore approaches stats and analytics (you can read more about that here: http://www.comscore.com/About_comScore/Methodology/Media_Metrix_360_Hybrid_Measurement), so we ended up pulling the post anyhow.

    Thanks again for your excellent critique. The issues you point out in the general sphere of social media stats gathering are important for all of us to acknowledge and address; believe me, no one wrinkles their nose at bad data as much as my colleagues at Mashable!

    Hope you’re having an excellent weekend.

    • Thanks for commenting Jolie, and for being a good sport about the critique. But I’m going to have to continue.

      1. Your headline made a false and generalized claim — not that Facebook took 41% of traffic from a small sample of sites, but that “Facebook Now Commands 41% of Social Media Traffic.” comScore didn’t write that headline or claim that as fact. You did (but again, to your credit it was pulled). That was the biggest problem I had on seeing the post. That’s also the kind of linkbait crap that gets spread regardless of people actually reading and understanding the content (as I mentioned to a previous commenter, that’s as much the fault of tools like Twitter and our craving for condensation and instant gratification as it was yours or Mashable’s).

      2. The only mentions of comScore in the post were as follows: “Facebook and YouTube are displacing rivals and taking over the social web, according to data we’ve just received from comScore.” [and] “comScore is an acknowledged leader in digital analytics and intelligence. comScore’s data for this post are based on a hybrid of site analytics and audience measurement for U.S. users at home, work and school.” The first again makes a generalized claim about the whole “social Web” rather than clarifying that you’re talking about an extremely limited sample that leaves out significant social media platforms and specific sites. The second is pretty standard boilerplate material, is presented so far from the headline’s misleading claim that it’s easily missed, and still doesn’t account for why a huge sm site like YouTube was included in one sample data set (the first you mentioned I believe) but not in anything else (namely the info your post headline was derived from). There also wasn’t any mention of “relative” stats in the post as you mentioned. That was actually only thrown out in the comments when someone questioned the lack of reference points.

      3. A far more accurate way of doing things would have been to indeed include that statistical information. Let readers see the real numbers — how many were surveyed, and then show the percentages. They’d have to be completely brain dead to not grasp things when you specifically say it’s a sample and not full user statistics. It’s far easier for them to become confused when they see charts that really say nothing at all. Anything can look good (or however you want it to) when it’s just “relative” to some other comparison point — you just choose comparison points that make your info look good (for example, leaving out major sm sites like YouTube when you want to show a big statistical advantage in the industry).

      Like I mentioned, this is definitely not just a Mashable issue, and we have to give them some credit for having the collective balls to pull something that was misleading to readers as opposed to allowing it to continue. And in the future, hopefully they’ll come up with some kind of sitewide standard on how to present this kind of information so you don’t have to worry so much about audience confusion each time. 🙂

  6. Everyone knows that 67% of statistics are made up on the spot.

    I’m not sure about labelling Mashable as shady though, they just didn’t provide enough secondary information to make their charts and statistics useful.

    • The problem was that they made a very specific claim (typical linkbait headline) which the given data (and I use that term loosely) did not bear out. I’d definitely call that “shady.” That doesn’t mean shady stats are always intentional. If that were the case, they wouldn’t have likely pulled the post.

  7. Statistics are a dangerous thing all around. Everything from the way the data is collected to the techniques used to calculate the final numbers can vary widely and is, unfortunately, subject to manipulation. Statistics can be useful, but only when produced by knowledgeable, experienced statisticians who are unbiased with regard to the players.

    • I don’t think surveys and such necessarily have to be completed by statisticians. Marketers and PR folks are generally fairly well-trained (if they’re formally educated in the fields at least) when it comes to creating and analyzing them.

      However….

      I do think a part of the problem in some cases (not this one) is that these people are employed by companies that want to see very specific information (meaning they want to hear good news). If the information can be manipulated to look like what the company wants, those marketers or PR folks might feel pressured to make the reports read that way. That’s completely unethical.

      Of course it’s easy for me to say that. I’m a big fan of the truth, and I work for myself so I don’t have to worry about a boss firing me if study results don’t meet their expectations. With me it’s more “You didn’t meet your goals. Tough cookies. Try harder next time.” That said, I think any employer who would ask someone to fudge numbers is a moron at best (knowing what you need to fix is far more important in any business than putting on blinders to your problems). And if that happened to me, I’d leave. My integrity matters too much to whore it off for a buck (or even a lot).

  8. You had me at, “…statistics can say just about anything we want them to if we twist them enough…”. If I had a nickel for every time I grumbled something like that, well…I’d have a crap load of nickels! People use statistics to make themselves sound knowledgeable, but the truth is that statistics are meaningless 99% of the time.

    See how easily I made that up? Exactly. 🙂

    As others have already pointed out, the manner in which the data is collected, who collects the data, and the intent behind the collection of that data all factor into the accuracy and bias of statistical analysis of any kind. If you’re looking to illustrate a point, finding or manufacturing the data to make your case is never too hard.

    • Thanks for your thoughts Alysson. I agree with you — knowing more background about how those statistics were arrived at is always helpful. I get my incoherent statistics from politicians thanks. I don’t need them in my feeds as well. 🙂

      A lot of the work I do these days is for the freelance writing niche. We’re planning a large survey running the course of several months towards the end of this year. One of the first things I decided was that I wanted all of the details in readers’ hands so there was no question of manipulation. Not only will the end report include details like who was surveyed, how many people were surveyed, what our goals were, etc., but I’ll also be running it by quite a few authorities in our field. The idea is to get reputable people behind it not only to broaden the respondent base, but more importantly to get their feedback on the survey before anything goes out publicly. We want to make sure they don’t find any questions leading, for example (one survey last year in our niche was a complete joke: the blogger clearly had a desired result to publish and surprise surprise, she got it; if you saw the leading questions and missing choices from the questions, how she got it was no mystery).

      I hate misleading numbers. Before going into communications work, I was preparing for a career in engineering. Maybe my love affair with numbers and my insistence on their accuracy began there. 🙂 Of course that’s much harder in this game, where nearly every survey result set put out is unfortunately suspect.

      • I think you hit on an important point, Jennifer…and an important nugget of information to help people better discern whether the stats being presented are provided with the sole intent of manipulating people: who was surveyed, how many people were surveyed, what our goals were, etc.

        If those who commissioned or conducted the survey aren’t willing to publish or share that specific information, you can bet dollars to doughnuts it’s complete BS intended primarily to skew a conversation, change opinions or manipulate those naive enough to believe that numbers don’t lie.

        And, as you said, politics are a PERFECT example of how polling data and statistics can be so easily manufactured to supposedly prove a point.

  9. Titles like those are becoming a habit for success on social media these days. Whether they’re accurate is not really as big an issue as how the data is actually made or collected. At least it’s collected by comScore; the other article I was reading earlier today actually talked about making up such stats, which we know works as well 🙁

    Though when I read that the actual article was removed, I was even more disappointed. It meant that Mashable did not trust the stats themselves and felt the post was not worth keeping. This brings us to wonder why they decided to publish it in the first place, since we know it was retweeted 2k+ times and shared on Facebook 1k+ times. Such large-scale sharing of information that Mashable hardly feels is worth keeping on its blog makes for very sad reading.

    • Titles like that are nothing new. They’re the type of thing that gave linkbait a bad name (often unfairly).

      I definitely don’t agree that accuracy isn’t really important. That’s what stats are about — specificity and accuracy. Otherwise they’d just say X is bigger than Y or something of the like.

      I do agree that it’s sad it went live and was pulled rather than never being published at all. Then again it gave us an excuse to talk about the issue with a recent case study, and I think it’s always a good time to discuss thinking critically about the massive amount of information bombarding us on any given day. 🙂

  10. I constantly dissect these types of reports for my clients.
    I like to un-spin things and point out mis-information.

    Don’t know how many people really care about publishing responsibility, but at least there are people like you (us?) out there to keep them slightly honest.

    Thanks for sharing!

  11. Bravo. Whether it’s intentional or unintentional, the stats presented and conclusions made about social media are more often wrong or useless than right. It’s shameful, and I’ve seen this on Mashable and Forrester among other “reputable” sites, and other regular media outlets that rely on so-called reputable sites.

    It’s one reason that I’m Titling my new book on social media – Giving The Business To Social Media – Hype, Hope, Bust, Reality

    PS. I think I’ll make your article the first member of the good sense award on my site!
