
Sep 23 2011

Social Media – I Don’t Get It, But Spammers Do

Categories: General,Rants Dave Rathbun @ 9:27 am

SAP had some fun at the BI 4.0 launch in New York last February. For years SAP, along with other vendors, has been touting its ability to import and analyze external data from various social media sources. Two SAP presenters at the launch event took a vote via Twitter as to which tie would meet the “Scissors of Destiny” at the end of the session. (Steve Lucas made an impassioned plea to save his tie, which he said was a gift from his wife, versus Dave’s tie which he “… just bought last night.” Steve won, and his blue tie survived.) It was a fun display of technology, but is it really that important? How impressive would it have been if the “fail whale” had picked that moment to make an appearance?

I don’t usually spend a lot of time here on my blog talking about philosophical aspects of BI as I am personally more interested in technical issues and solving problems. But the apparent consensus as to the importance of social media bugs me.

Let’s Start With Twitter

For example, I’m not a big fan of Twitter. I only follow a few folks, and I don’t tweet much. Yet despite my lack of interest, Twitter was in the news the other day for setting a record number of tweets during the final of the Women’s World Cup soccer tournament. The record (which is sure to be broken) was over 7,000 tweets per second at the end of the match. 😯 That exceeded the rate during the last Super Bowl (United States professional football championship in case anyone doesn’t know…), the recent royal wedding, even the announcement of the death of Osama bin Laden.

The exciting climax drew 7,196 tweets per second, according to Twitter. Paraguay’s penalty shootout win over Brazil in a Copa America quarterfinal later the same day came close to beating it with 7,166.

The previous record of 6,939 was set just after midnight in Japan on New Year’s Day. Other spikes include bin Laden’s death (5,106 per second) and the Super Bowl in February (4,064).

Read the full article at ESPN.com for more details.

Seven thousand tweets per second. As a data point showing the growth of Twitter that’s certainly impressive. But how many of those tweets contained information that was valuable and relevant to what I really wanted or needed to know? How am I supposed to filter or otherwise process all of that information? Or put another way, how much of that data was signal, and how much was noise? During events like Tech Ed I already don’t tweet much because I would very likely be repeating something everyone else is already saying. Try this: search for the SAPTechEd hashtag, and see how many of the tweets you find are essentially the same. I think most folks would agree that quantity is not the same as quality, yet as a society we seem obsessed with bigger / better / faster so numbers impress us. Seven. Thousand. Tweets. Per. Second. Wow. 🙄

Some may argue that it’s not the individual tweets that become valuable at that point, it’s the overall sentiment displayed in the tweets that becomes interesting. That’s a valid point, and that’s something that the Text Analytics product (now integrated into the Data Services suite) was designed to do. High-level analysis of high volumes of data makes sense, but I certainly can’t use it on a personal level.

Along those same lines (and related to the “my followers list is bigger than yours” competition) did you know that you can reportedly buy Twitter followers now? (Seriously, google for “buy twitter followers” and see what you find.) Why? Because people with “presence” (and to some companies presence can be defined by the size of your follower list) can get paid for tweets. The Kardashian sisters are apparently masters of this, rumored to get paid in the neighborhood of $10,000 (some sites suggest even more) for a single tweet. I wonder how many followers I would have to buy before getting to that territory? What would the return on that investment look like? I remember getting excited – briefly – when I hit 100 followers. Then I went out and looked, and about 20% of them were spammers, including a number of “sweet young things” that wanted me to visit their web sites. Why would someone like that follow me? Especially when Jamie Oswald is obviously much more attractive? (He must be, he has three times as many followers as I do, not to mention 60 times as many tweets. Wait, what was that quality versus quantity thing again? :P)

In Which I Personally Declare 99% Of The Internet To Be Crap

The Internet is a wild place where what few rules might exist are not always followed. If there is money to be made, then someone will figure out a way to abuse the system. It’s not just the “little guys” either, as evidenced by the way retailer JC Penney took specific steps to outwit Google during the 2010 holiday shopping season. Google unwittingly initiated this years ago when they decided that their search engine would give bonus points to web sites that had more incoming links (also known as “back links” because they link back to a site) than other sites. Entire industries came into existence (SEO or Search Engine Optimization for one, link farming for another) that didn’t exist before, simply because there was money to be made. So why do spammers “follow” people on twitter? Because it’s yet another way to establish a link back to their own web site. Even if Google discounts the link, some of my other followers might be curious enough to click a link, and often that’s enough for the person on the other end to make a few pennies. Make a few pennies often enough and it can add up to real money after a while.

On BOB we are fighting this same battle and have been for years. In the early days of our community we were getting dozens and then hundreds of new members every week that signed up but never activated their accounts. Or perhaps they activated their account but never logged in to post anything. Why? For the same reason: they wanted back-links to their spam web sites. Why would anybody waste time doing this? The reality is that they don’t. No single person would sit down one morning and decide to go out and register on hundreds of different discussion boards, or post comments on hundreds of different blogs. What happens instead is that a single person will sit down and write a script that scans the Internet looking for discussion boards or blogs. Once they find them, then perhaps another script is used to go out and register and/or post on that board or blog. Last month (September, 2011) the BOB software blocked or rejected almost 20,000 such attempts. That’s over 650 attempts per day that were rejected. Numbers were even higher in previous months such as June when we blocked 23,514 similar attempts. In May it was 23,729 attempts.
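For anyone who wants to check the per-day arithmetic behind those monthly totals, here is a minimal sketch. The May and June counts are the figures quoted above; the function and variable names are just illustrative, not anything taken from the BOB software.

```python
import calendar

# Blocked or rejected attempts per month, as quoted in the post (2011).
blocked_2011 = {
    5: 23_729,  # May
    6: 23_514,  # June
}

def per_day(year, month, total):
    """Average attempts per day, using the real number of days in that month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return total / days_in_month

# Map month name -> average daily attempts, rounded to one decimal place.
daily = {
    calendar.month_name[month]: round(per_day(2011, month, total), 1)
    for month, total in blocked_2011.items()
}

for name, rate in daily.items():
    print(f"{name}: about {rate} attempts per day")
```

Both months work out to well over 750 blocked attempts per day, which is why a figure of "over 650 per day" counts as a quieter month.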

Blogs are not immune. This blog ignored or denied an average of over 6,000 comment attempts per month over the last six months that were suspected to be from spammers. It’s my script fighting against their script, and it’s the “wild, wild west” all over again. As I said earlier this is not a new problem. Back in 2008 I wrote a blog post showing that spammers could have potentially cost me in the neighborhood of $20,000 had I not come up with my own automated tools to combat them.

Even Big Guns Like Google Are Not Immune

Google’s gmail service is overrun. In the past year, Google’s gmail has been used more often by spammers (based on my own spammer logs) than the next five service providers combined! That includes some service providers that folks probably know (yahoo.com, aol.com), some that board and blog owners around the Internet have come to know and despise (mail.ru, gawab.com), and some that are just plain humorous (hamstermail.net and anotherspamdomain.org). No, I did not make that last one up. Gmail is not a recent problem either. Here are some numbers that I compiled back in 2009 for another site. Want to read more about web-bots and how they post while you’re sleeping at night? Here are links to “Are You A Zombie?” Part I and Part II, written – appropriately enough – on Halloween of 2008. This is not a new problem, and it continues to get worse.

It turns out that computers may be even better at solving visual CAPTCHA puzzles than humans are! A CAPTCHA or “Completely Automated Public Turing test to tell Computers and Humans Apart” is that annoying series of distorted letters and/or numbers that I sometimes have to decipher in order to complete a web form.

Trust No One

Given all of this background, almost nothing blows my mind more than seeing a web address at the end of a TV commercial that links not to the advertising company’s home page but to Facebook! Why? Why are companies (and big companies too, not small ones) spending millions of dollars to build someone else’s brand? (Facebook in this case.) Beyond that, why would I trust any external entity to collect and manage what could very well be critical data from my (current or potential) consumers? And how do I know that the data you’re collecting is even valid?

There are rumors that Sarah Palin got caught setting up a secondary Facebook account, just so she could “like” herself and skew the results shown on her main page.

Why Worry About All Of This?

Aside from the stuff I have already mentioned about having to deal with spam blog comments, spam twitter followers, or spamming BOB members, is there anything else to be concerned about? Wasted time is wasted time, sure, but IT folks have become quite adept at identifying and eliminating email spam, and I’m sure we’ll get better tools to manage blogs and other social web sites as well. But I don’t think the problem stops there.

What if I am the target of an organized but subtle campaign?

Suppose I have a company… one that might be familiar: the ACME Widget Company. We make the best widgets the universe has ever seen. However, I have a major competitor who is trying to impact my business, and they’re not shy about which methods they use. They found out that I was running a public opinion poll asking which popular color I should include in my next widget catalog. They hired some less-than-ethical programmers with plausible deniability and charged them with skewing the results of my public polling. With a little strategy, these hackers were able to do so carefully. Votes came from all over the world (via the already established botnets) and at all hours of the day. They spread out the votes so that statistical analysis would not reveal any odd patterns. As a result, my new widget color for 2011 is hot pink. I produced tens of thousands of hot pink widgets, only to find out that nobody really wanted them. Is this a far-fetched scenario? Perhaps, but maybe not.

Granted, an online survey should not be the only way I use to determine a new product color… but what if it was? Could we have ended up with pink M&M’s instead of purple? Would Steve’s tie still have survived if Dave had been able to whip up a Twitter-spamming script to skew the votes?
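To get a feel for how little it might take, here is a rough sketch of the ACME poll scenario. Every number in it is made up for illustration: the genuine vote counts, the colors other than hot pink, and the rate of injected votes per hour are all assumptions, not data from any real poll.

```python
from collections import Counter

def tally(votes):
    """Return (winning color, winner's share of all votes)."""
    counts = Counter(votes)
    winner, count = counts.most_common(1)[0]
    return winner, count / len(votes)

# Hypothetical genuine responses: blue is the clear favorite.
genuine = ["blue"] * 500 + ["green"] * 300 + ["hot pink"] * 200

# Hypothetical injected votes: a small, steady trickle every hour of the day,
# so that no single hour stands out in the logs.
injected_per_hour = 15
injected = ["hot pink"] * (injected_per_hour * 24)

honest_winner, honest_share = tally(genuine)
skewed_winner, skewed_share = tally(genuine + injected)

print(f"Without injected votes: {honest_winner} ({honest_share:.0%})")
print(f"With injected votes:    {skewed_winner} ({skewed_share:.0%})")
```

In this made-up case, 15 extra votes an hour for a single day is enough to move hot pink from last place to first, and the poll owner only ever sees the combined total.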

What’s My Point?

Is there a point to all of this ranting about spammers and botnets and hacktivists? In my opinion, despite SAP and others appearing to really want to make social media relevant, I find myself deciding that I’m not ready to trust it, not just yet. Hot topic for the day, certainly. Cute marketing gimmick with the ties? Sure. Critical success factor for my business? Not so much. With the anonymity of the Internet anyone can post anything anywhere they want, and with the help of some friendly spammers, as much as they want. It almost seems to me that people have forgotten how bad the email spam problem was years ago because IT folks and ISPs have become fairly adept at identifying it and filtering it out. Social media services like Twitter and Foursquare and others are all going to have to figure out the same thing or they will be overrun just like Google’s gmail service already is.

As an aside: I talked to Timo Elliott at a conference a few years back and I asked him how he dealt with what surely must be hundreds if not thousands of “link requests” on LinkedIn from folks that he didn’t know. He shrugged and said that he just ignored them. This despite the fact that LinkedIn is a relatively closed system, unlike Twitter which is wide open for everyone to see and (ab)use.

If I seem bitter or disillusioned it’s probably because I know what goes on behind the scenes on my blogs and on the boards that I manage. As a result I have become more skeptical and much less trusting of data sourced from the Internet. Don’t even get me started on privacy issues related to social media sites (Facebook, Twitter, Foursquare, even the new Google+) as that’s an entirely different rant.

That’s my opinion today. What about five years from now?

Five years in Internet time is a lifetime. I can’t guarantee what it will be like then, but I can guarantee it will be different than probably anybody expects. Will Facebook still be here, or will it pull a MySpace and disappear? Will Twitter implode under the weight of millions of tweets, or will it finally find a way to make money? More importantly, will anybody care if it does?


6 Responses to “Social Media – I Don’t Get It, But Spammers Do”

Comment by Scott Wallask, Managing Editor September 23rd, 2011 at 3:45 pm

Good post and an interesting view from the “behind the lines” of someone running a forum. I agree that the Internet is the wild west. However, one thing about Twitter that I like is that it has suddenly put an immense amount of power back into the hands of consumers, such that they can make companies change policies overnight and voice displeasure at corporate missteps (such as Target not ensuring its website could handle all of the Missoni orders). From an SAP BI perspective, our research here at WIS Pubs indicates not a lot of BI folks are on Twitter, but those who do use it would be categorized as influential people. So if you want to appear to be influential in BI, you should be on Twitter. Is that smoke and mirrors? I’m not sure yet.
Comment by Dave Rathbun September 26th, 2011 at 9:11 am

Hi, Scott, thanks for your comment. I guess I have two thoughts about a response. First, people have always had avenues to respond to companies with comments, suggestions, or complaints. What’s different now is that everyone else can see those responses. Years ago I might write a letter to a company expressing my dissatisfaction with one of their products or services. Companies would know there’s a problem if they get multiple letters, but I have no way of knowing about it. Today I can make a Facebook page and let other folks chime in, or I can tweet my dissatisfaction and see if others follow along. No company marketing department wants to find out that “xxx sucks” has become a trending topic on Twitter. 😉

With Twitter I can see if other folks are having similar issues. Does that make me feel better? Perhaps. Does it make Twitter a better resource?

Consider that Twitter is like just about everything else on the Internet with respect to anonymity. Case in point: the other day Nathan Fillion (star of “Castle” and the science-fiction series “Firefly” among others) tweeted to his followers that he was having a conversation with a friend (at least I assumed it was a friend). His friend didn’t believe that Nathan could take down his web site. Nathan tweeted that out to his followers, and within 20 minutes the target web site was offline. Classic case of a DDOS (distributed denial of service) attack. Nathan has close to a million followers, and with nothing more than a quick, casual tweet, he set some percentage of those folks in motion. Influence? Certainly, but not in a good way. 🙂
Comment by Jordan Petlev September 26th, 2011 at 3:20 pm

I agree with everything you posted here and have had discussions with friends about the so called ‘twitter’ craze that companies seem to be on. What I find even more amusing is as soon as I read this post, my first thought was “Ohh I should tweet this!” 🙂
Maybe they are winning after all.
Comment by Werner October 1st, 2011 at 2:05 pm

From a BI perspective, there are multiple aspects:

equal distribution
As with any direct democracy, it is not the number of potential voters that counts, but the number you can mobilize. That’s your example of using followers to tip the odds. The problem for analysis is that you have no baseline. If my country has 200 million voters, and I get a 90-to-10 ratio against my proposed law, it might sound like a statement until you figure out that just 1 million people voted and 199 million did not. So even if that 1 million had been representative, the uncertainty is high. But for most things you have neither the baseline nor the means to figure out whether it was representative. For a statistically sound test, 1 million would be a large representative group, but as you have no control over the group, you have no idea about either.
I have the same problem on a small scale as well. On ideas.sap.com there is an idea which got 5 times more votes than the second. Why? Because more people got motivated to vote, not because the idea was necessarily a good one.
I can easily prove that nobody likes to watch NFL. I just phone a representative group of people and ask: “Would you like to participate in a poll?” – in the middle of the NFL finals.
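
The 200 million voter example can be put into numbers with a small sketch. The figures are the hypothetical ones from the example above; the variable names and the worst-case bounds are only my way of illustrating the missing baseline, not a proper statistical test.

```python
# Hypothetical figures from the example above.
electorate = 200_000_000
voted = 1_000_000
against_votes = 900_000  # the 90-to-10 ratio against the proposed law

turnout = voted / electorate
silent = electorate - voted

# Share of the whole electorate that is known to be against the law.
known_against = against_votes / electorate

# Without a baseline the silent voters could lean either way, so the true
# opposition lies somewhere between these two extremes.
lowest_possible = against_votes / electorate
highest_possible = (against_votes + silent) / electorate

print(f"Turnout: {turnout:.2%}")
print(f"Known to be against: {known_against:.2%} of the electorate")
print(f"Possible range: {lowest_possible:.2%} to {highest_possible:.2%}")
```

A self-selected 0.5% of the electorate leaves the possible range almost completely open, which is the point about the missing baseline.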

drawing conclusions
If I do an analysis of BOB postings, I might get very disappointed. 100 people say they have a problem, 1 is saying something positive about my product (and that was probably me). So should I draw the conclusion that the product is crap? Of course not, BOB is about solving problems. So if you give the statistics of the posts to somebody inexperienced in that field, he might draw the wrong conclusion or use the numbers for questions they can’t answer. You can identify problematic areas, problematic either because many people use a feature and have minor problems or because few people use that feature and it is complete crap. Very hard to find out – again, because of the missing baseline.
Add statistics or even data mining and the number of problems you might inadvertently overlook explodes.

legal requirements
The US, at the moment at least, is pretty open. Read “open” in the sense of not-protected.
For the European Union the most important guidelines are that data shall be:
* Processed for limited purposes.
* Adequate, relevant and not excessive.
* Kept no longer than necessary.
And it makes sense. And it would strictly rule out that I suck in all BOB posts and use the sentiments for further analysis inside my Data Warehouse.
http://en.wikipedia.org/wiki/Information_privacy#Legality

And now add fraud, ill-intentions, pre-opinionated,…
Comment by Andreas (Xeradox) October 3rd, 2011 at 2:41 am

Obviously, I do enjoy Twitter, mainly for professional reasons. It gives me the chance to tweet about issues or problems I have been facing (mainly in the BI world that is), without having to take the time to write a proper blog entry or technical paper. So for me it serves its purpose.
I agree though, that companies should think twice about making facebook or any other external site THEIR CUSTOMER facing site. They should rather spend time, effort, and money on making their corporate web site easy to access, easy to use and navigate, fun to navigate, fast to respond, and up-to-date.

I believe certain big BI vendors could certainly improve quite a few things on their corporate web sites. Facebook would then merely be a link back to the corporate web site.

For companies the social media are just another channel to reach their customers or enable customers to provide feedback.
Comment by Dave Rathbun October 6th, 2011 at 10:38 am

Interesting reading tweeted by Cindi Howson related to the “quantity versus quality” debate, specifically related to Twitter:

Social Media Takes a Page From the Playground: Is Quantity or Quality of Relationships More Important?
