03.12.08

A Scraping Worse than an OBGYN’s Visit

Posted in I Need A Scalpel at 12:18 pm by Andrea

She sits down at her computer and turns it on.  The familiar sing-song bing-bong that accompanies the operating system logo chimes, and then she’s ready to go.  Soon, her fingers are flying, an amused smile trickles across her face, disappearing and emerging as she relays a story about her two sons.  She stops to ponder the phraseology of her next thoughts, thinking of how best to convey the sight and sense of the scenes she’s witnessed and is writing about today.  She wants to get it just right.  These stories are, after all, about her kids, and she wants to not only do them justice but also make the boys proud when they read back on her words later in their lives.  Her fingers whir and click her thoughts to the screen.  With a flourish, her fingers cease, the keyboard silent once more.  After a quick perusal and a spell check, she clicks the mouse pointer over the Publish button.  Soon, her fans, thousands of them all across the country and from many foreign countries even, will be laughing along to her words, words they’ve come to adore about boys they feel they know and love themselves, if only vicariously through the Web.  They tune in with anticipation to their daily Dana dose, their Mamalogues® read.

And yet, something is amiss. 

Some readers are not getting to her site when they read her.  They may not know it, and in fact probably do not.  What they also don’t know is that their click on the site where they usually do read is one more tick mark in the revenue column for a different website.  It’s an aggregator site, but the difference between this and an aggregator such as Google Reader or Bloglines is that this aggregator site has ads, ads that may not necessarily reflect Dana’s values, which are showing alongside her story about her sons.  Another difference between these sites is that a quick click on the links describing Mamalogues® won’t navigate readers directly to http://www.mamalogues.com/ as would Google Reader or Bloglines.  There is a very vague link to the original post on Dana’s site: the “…” at the end of each excerpt.  Without a description on that link to let readers know it’s the permanent link to the original entry, most readers will likely click on the post title, which only recently was changed to link back to the author’s blog instead of another page on the aggregator’s site.  It appears to the readers of this aggregator site that Dana’s directly contributing to them, supporting their posting of her content, and the site - NOT Dana - receives all the revenue from the various clicks that bring readers to her posts.  Given that Dana’s income is dependent on her content on Mamalogues®, and her blog is what has led her to receive other opportunities, including an award-winning online newspaper column, a St. Louis radio show on Sunday nights on 97.1 FM Talk, and a TV spot on the Saturday morning news, this is, in effect, stealing from her wallet.  Readers don’t get to her site very easily, and her own ad revenue suffers, as do her site stats which prove to advertisers that she has the audience to back up her request that they put their money where her mouth is.

The name of this aggregator site is Blog Net News, and I’m specifically not linking to them so that they won’t receive any revenue-generating clicks from me.

What’s happened is similar to what Bitacle.org did a couple years ago.  In an effort to build an online community presence in the St. Louis area, Blog Net News (BNN) has taken the blogroll of STLBloggers and scraped the content into BNN and is claiming Fair Use of said content.  The worst part of this is that they have sold ad space on BNN, which expressly violates most Creative Commons licenses and Copyright laws that authors of blogs have invoked to protect their writing, wherein most content is free to be used as long as there is no profit to be made.  This licensing allows other bloggers and aggregator websites to highlight their posts, the writers, and offer reviews of the content without fear of being prosecuted.  The writers of the borrowed content get more exposure and the websites build a community that many rely on to keep them reading quality content. 

Dana, becoming aware of her site’s use on BNN’s front page, immediately requested to be removed, and has since raised the question to other bloggers to get their opinions about the situation.  According to Dana, her removal request has been ignored by BNN’s owner Dave Mastio even as late as yesterday in the comments thread of the above linked post.  The comments thread of that post is very enlightening, and while I strongly urge you to click through and read the whole thing as a matter of principle being a blogger yourself and protecting bloggers’ rights as content creators in this ever-changing media of the Internet, I also realize that reading 70 plus comments takes time, so I’ll highlight what are in my opinion some of the most important points.

After a few comments on the matter by bloggers expressing that they wouldn’t like having their content used in this manner, Dave entered the discussion with a comment containing a URL for Fair Use as described on Wikipedia, his core defense for using the content he uses without blogger permission.

Dana referred to an email in which she asked Dave to remove her from BNN, citing not just a violation of her content copyright but also the use of her federally registered trademark, Mamalogues®, to generate ad revenue without her permission, to which Dave responded: “Using a trademarked name to refer to the trademark owner and link to the trademark owner’s site has never been a violation of trademark.”

Call me stupid if I’m wrong, but I thought that using a trademarked name in the pursuit of revenue without that trademarker’s permission IS a violation. 

Another commenter, rev_matt states, “If you [the blogger] aren’t publishing under a CC license (or aren’t publishing your feed under one) consider doing so.

A license that would legally prohibit aggregators from using your feed (note this is an all or nothing approach: you can’t allow stlbloggers to use it and deny blognetnews under this scheme)”, to which Dave responds, “It is perfectly acceptable to publish one license for use by the public and then privately offer a better license to anyone else on better terms at your total discretion.

However, nobody can write a site license that restricts “fair use rights.” That’s why they are called ‘rights.’”

It all gets pretty technically jargony, and rev_matt comes back with “Fair use explicitly demands that it be for criticism, comment, news reporting, scholarship, etc. Aggregating doesn’t qualify under any of these standards.

Fair use doesn’t supercede copyright law, it makes limited exception to it for very specific circumstances with the core criteria that such fair use doesn’t diminish the value of the copyrighted material.”

Several more points are raised by bloggers with questions, opinions, and general comments, and throughout the whole comment string, Dana asks the question, “Does this mean that you’ll finally respond to my email and remove my content and trademark from your site? And the emails from other bloggers requesting the same?”

She asks this question no less than three times, and after the third time in five hours, during which time Dave has come back repeatedly to debate other commenters, he finally says this:

“Dana,

I have refused to acknowledge your emails because the very first time you contacted me, you brought a lawyer into it and before I even had time take another look at your blog and consider exactly how to reply to you, I was informed by another St. Louis blogger that you were sending badmouthing emails all over the St. Louis blogosphere. As a general rule, you’ll find people much more willing to give you exactly what you want when you ask for it without the lawyers and the lectures. Indeed that’s exactly what happened to Prologos when she asked to be added AND when she asked to be removed.”

It’s become a matter of her offending him by bringing lawyers to the table up front instead of actually honoring one blogger’s request to be removed from his site. To which Dana replies:

“Dave,
Are you delusional? The “badmouthing emails” is a total crock. I caught you in one lie already, give the smear tactic a rest and walk lightly.

You took my content without my permission; you completely disregarded any courtesy to me in that respect so don’t act like you were affronted that I immediately responded to explore my legal options. What did you expect? An invitation to tea? This is the real world - and this is what happens when what you do is possible infringement.

I sent you an email asking you to take down my work. You’re refusing. As for the condescending advice, I think you’ll find that it’s easier to work with people when you’re not scraping their content in a pathetic attempt to make a buck.”

And that, people, is the whole thing in a nutshell.  Dave’s trying to make money off blogger content without their permission, and when they get pissy about it and bring up legal action to defend their own intellectual property, he ignores them until forced to say he’s ignoring them because they didn’t ask nicely.

I don’t know about you, but I’m thoroughly disgusted by this.  It makes me angry.  I want to throw things.  And dammit, my chocolate stash is gone now.

Others chiming in:
Mamalogues®
News-Bitch
Super Fun Patrol
WOBL in Training
STLProBloggers.com
The State of Discontent, Part 1, Part 2, Part 3, Part 4
Slacker Moms R Us
The Broad Brush
A Bun’s Life
Prologos
Hwy 61

34 Comments

  1. Rebecca said,

    March 12, 2008 at 12:35 pm

    not everyone wants to read our 70 comments? it probably makes for better entertainment than most would think: we continually insist that Mastio and his site are shady; Mastio continually not caring. GRRR.

  2. Dana said,

    March 12, 2008 at 12:48 pm

    That title is hardcore! A perfect analogy though.

    Thanks for discussing some of the things bloggers face in regards to their work.

    And more baby photos!!

  3. jonniker said,

    March 12, 2008 at 12:52 pm

    I’m pretty fucking disgusted right now, actually. Not cool. At all. He’s trying to get away with skirting bloggers’ rights because on some level, he thinks they aren’t smart enough and/or don’t have any knowledge of their rights. I noticed he’s not doing it to the Post-Dispatch or any “real” journalism sites. Because he KNOWS what he’s doing is wrong. He knows it.

    What a pig.

  4. This just ain’t right | St. Louis Pro Bloggers said,

    March 13, 2008 at 8:05 am

    [...] Little Bald Doctors [...]

  5. Tony Bennett Thursdays : Running Aggregated said,

    March 13, 2008 at 8:46 am

    [...] Sites Voting/Writing (will be updated all day): Little Bald Doctors WOBL in Training Superfunpatrol MidwestBlogs -St [...]

  6. Dave Mastio said,

    March 13, 2008 at 9:15 am

    This statement is outright false: “Without a description on that link to let readers know it’s the permanent link to the original entry, most readers will likely click on the post title, which only recently was changed to link back to the author’s blog instead of another page on the aggregator’s site.”

    The title of each post has linked back to the originating blog for more than 18 months.

    This statement by commenter jonniker is also false: “he thinks they aren’t smart enough and/or don’t have any knowledge of their rights. I noticed he’s not doing it to the Post-Dispatch or any “real” journalism sites.”

    Nearly every section of BNN includes feeds from blogs produced by newspapers and/or magazines and/or TV stations.

  7. Andrea said,

    March 13, 2008 at 9:35 am

    Dave, two days ago, I clicked several post titles and was redirected to another page on BNN, and NOT to the original author’s website. Maybe it was a technical glitch that was fixed. I’m not positive. But in my personal experience, I was not sent to the original author’s blog. When I wrote this yesterday, those post links DID go back to the original author’s blog. That being my experience, I wrote that it was recently changed to link back to the author’s blog because that’s how it happened for me.

  8. Sugared Harpy said,

    March 13, 2008 at 10:16 am

    Dave, I SENT YOU AN EMAIL TOO. Because you are also stealing my content.

    Please just stop.

    You didn’t notify me, you aren’t paying me, I have this text in my blog’s footer: Copyright © 2005. All rights reserved.

    What part is confusing to you…you do not have my permission. I have requested you to take down my content.

    I didn’t even get a reply from you. STOP.

  9. Melody said,

    March 13, 2008 at 10:50 am

    I am baffled at the fact that someone has enough time to find every conversation ever had about them on the internet and enough energy and balls to go to a person’s personal site and argue about it! I heard that people were too busy with elections to get things done on their own site, but I have also seen that they have enough time to be blunt, rude and condescending to others.

  10. jonniker said,

    March 13, 2008 at 11:43 am

    Dave, you’re right — at the time, I hadn’t seen the links and later, I did find the RSS links from others, and, if you were wondering, I talked to a few major media outlets about what you’re doing (former newspaper editor here). And, surprise! They aren’t happy with it and don’t see it as something that’s okay. However, at least one person told me that the only reason they haven’t done anything about it yet is because they don’t see you getting any traffic and see your business model as a failure, so why bother dealing with it?

    I, for one, tend to agree with them. Even if your site were on the up and up, which it isn’t in my book, I don’t like to read my content the way you organized it, and you’re trying to rally a culture around something that inherently goes against it. So ultimately, I suspect you will go the way of bitacle — big brouhaha followed by an even bigger failure.

  11. Dave Mastio said,

    March 13, 2008 at 11:49 am

    Andrea,

    I think you might be confused. It is possible that you clicked on the name of a blog — and that links to an index of post excerpts (all with two links to the originating blog). The headline of individual posts always leads directly to the blog in question.

  12. Craig Mayhem said,

    March 13, 2008 at 11:52 am

    Dave…

    The blogroll links internally to your site (with more ads).

    The blog title links internally to your site (with more ads).

    The only things that DON’T link internally to your site (with more ads):

    POST title and the ellipsis.

    Not exactly the first place, from a usability standpoint, that someone would expect to click to get to the website excerpt that is being scraped.

    As a web designer for OVER 12 years, I can tell you that it seems a little deceiving.

    One would expect that the blog’s title and the blogroll link would go back to the original site.

    I’m staying out of the whole liability issue because I’m no lawyer, but I know a thing or two about usability. I don’t know if you’re being deliberately underhanded in this way - but it sure seems suspect to me.

    I’ll now continue with your regularly scheduled peepee and fart jokes.

    http://www.superfunpatrol.net

  13. Andrea said,

    March 13, 2008 at 12:18 pm

    Dave,

    I am testing again the click thing to see where the confusion is. Specifically I’m checking the blog from BNN called CasaChristy.com. I’m clicking the POST title, and the little “…” thingy, and BOTH LINKS GO BACK TO BNN, despite your claim that they go to the blog author’s site. NO LIE. Granted, when they open a new window to show the link, the page comes up blank (except for the BNN logo and another ad, which I’m sure counts as a page click and a few more bucks in your pocket, so I quit trying to get to the entire post on CasaChristy’s place.) I spot checked a couple other blog links, and I was able to get to the author’s sites, but not all your links are working as you say they are.

    You’re experiencing technical difficulties, apparently, which makes you look bad. And it makes me look less like the liar you called me earlier.

    And you bitched at Dana about being insulting to you. Don’t try to make me look stupid. It WON’T work.

    Also, I see that you removed Sugared Harpy from the “blogroll” which of course links to more BNN pages (cha ching!). But you haven’t addressed Jonniker’s reply to your insinuation that she’s a liar, too.

  14. Dave Mastio said,

    March 13, 2008 at 12:20 pm

    Craig,

    There are usually three links associated within each post. One to our internal index in the name of the blog and two that go directly to the blog that originated the excerpt.

    Fairly frequently a blog posts two items in quick succession so that two posts in a row appear on BNN. In that case the title of the blog (and thus the internal link) appears only once. That means that of the three typical links, BNN gave itself the one that appears the least and the blogger in question the ones that appear slightly more often.

    Compare to Google. There the headline links back to the originating blog and then there is an internal Google link to similar pages and there is an internal Google link to the full-text cached version.

    To summarize — BNN gives web sites two of three links and doesn’t publish full text.

    Google gives one of three links and does reproduce full text.

    Draw your own conclusions.

  15. Craig Mayhem said,

    March 13, 2008 at 12:27 pm

    Google what?

    Search?

    That’s a TOTALLY different animal than an aggregator.

    Google personalized pages have ZERO ads.

    Google NEWS has ZERO ads.

    Also, you’re not telling me anything I didn’t state!

    I was addressing your design from a usability standpoint.

    Not to mention (ok I did mention it) the BLOG NAME on both the blogroll and post do not link back to the originating site.

    The BLOG NAME is the closest thing (and usually IS) to being intellectual property of the blogger.

    So using the blog name for internal links, twice on the main page, is what I’m talking about.

    Usability wise it’s just crummy.

  16. Dave Mastio said,

    March 13, 2008 at 12:55 pm

    Andrea,

    I said the statements were false. I didn’t call anyone a liar. There are lots of ways to be wrong. Over my life I have explored all of them.

    I am not here to call anyone names or to fight with them. If you are going to talk about BNN, I am going to be here to make sure the basic facts get out there correctly. You’re free to make your own judgements.

    The CasaChristy rss feed: http://casachristy.com/rss.xml does not include links to the blog posts in the titles of the blog post as is standard. I will send an email to the writer letting her know of the problem. Thanks for being specific.

    Craig,

    With all the caps it sounds like you are upset. I am not trying to make you that way, just explaining my perspective and the thinking that went into the decision we made about the way BNN works.

    I don’t think it much matters where Google has ads and where it does. Google is undeniably a commercial product.

  17. Dave Mastio said,

    March 13, 2008 at 1:01 pm

    woops — I don’t think it much matters where Google has ads and where it DOESN’T.

  18. Aggregation aggravation said,

    March 13, 2008 at 1:04 pm

    [...] in Training Super Fun Patrol News-Bitch Little Bald Doctors STL Probloggers State of [...]

  19. slackermommy said,

    March 13, 2008 at 1:07 pm

    I’m with you, Andrea. The title used to not lead back to my blog but to another page within BNN.

  20. Craig Mayhem said,

    March 13, 2008 at 1:08 pm

    Caps are for emphasis, not anger.

    Google IS undeniably a commercial product, but the logic is:

    In search - the information is requested by the user and therefore ads are served.

    In personalized pages and news - AGGREGATED CONTENT - there are no ads. Google is not making money from aggregated content - intellectual property of others.

    That’s the difference.

    More usability issues (thanks anonymous source :)):

    Your “permalink” links to an internal page (with more ads). A permalink should link to the original post. - The trackback link more specifically.

    People don’t typically click “non-text” links. So the ellipsis is a rubber bone.

    I’ll directly quote my anonymous source (a usability expert):

    “He claimed on STLBloggers he was creating a more usable interface for blog readers but the basic informational hierarchy (because it is scraped out of context) means very little to users… The post starts with the blog name, but that looks like a headline, and we all know that people name their blogs some really weird things sometimes and they are sometimes not related to the topics they write about, so really the way these stories are listed. To a first time user who doesn’t understand what the site is really doing it really looks like these stories were submitted to the site (like it is an online newspaper) but without context, a catching headline, etc readers are going to skim the headlines, thinking man these guys can’t write and move on…”

    Also, the colors used on your site wouldn’t pass usability muster for colorblind people.

  21. Dana said,

    March 13, 2008 at 1:25 pm

    False statements?

    Wait, Dave’s statement about Midwest Bloggers? Or the “badmouthing” statement? I think we’re owed an apology.

    Rhetoric pwned.

  22. Dave Mastio said,

    March 13, 2008 at 3:10 pm

    For those interested, here is the most recent federal court case about text and fair use — it says the copying of entire works in the google cache is fair use.
    http://www.lessig.org/blog/archives/google_cache.pdf

    Actual analysis by actual lawyer here:
    http://www.benedict.com/Digital/Internet/Field/Field.aspx

    I suggest everyone read the section of the court decision on fair use.

  23. Craig Mayhem said,

    March 13, 2008 at 5:49 pm

    You’re joking right?

    A: You are not Google.

    B: Google was not directly profiting by use of others’ copyrighted material.

    C: You ARE.

    You aren’t providing anything but scraped content and ads.

    The more I read the document the more it convinces me that this is an entirely different animal.

  24. Aggregation? Aggravation! « The Broad Brush said,

    March 13, 2008 at 5:51 pm

    [...] News Bitch [...]

  25. Dave Mastio said,

    March 13, 2008 at 6:42 pm

    Craig,

    A: You’re right, I am not Google, but you and I and Dana and everybody else has exactly the same rights as Google.

    B: Why does Google have the Google cache? Cause Google users find it useful. Why does Google want those users to come to its site? Cause one way or another Google shows them advertising that Google gets paid for.

    C: I am doing exactly what dozens of other companies in addition to Google are doing on the Internet. The difference is BNN’s scope is small slices of the Internet instead of the whole thing.

    You can say BNN doesn’t provide anything, but our search engine is used by real people thousands of times a day. Our widgets were loaded five million times last month. 15,000 people a day come to BNN to get an idea of what is going on in the piece of the Internet they care about. Every day, those people use BNN to visit thousands of blogs.

  26. jonniker said,

    March 13, 2008 at 8:03 pm

    Dave, I just … you’re really fucking up here, I’m sorry. For what it’s worth, the blogosphere is very likely stronger than you. I’m not trying to be an asshat, I’m telling you the truth. There is a *reason* there are ad networks, and that companies bust their asses to curry favor with bloggers. Aaaaand, you … aren’t really getting off to a good start. In fact, you’re a walking “What Not To Do” advertisement.

    Like I said, I think you’re bound for failure anyway, but this might seal your fate. I usually stay out of this sort of thing, but yeah. Feel free to put the shovel down at any time. Did you SEE what happened to Bitacle? Is this your first day on the Internet?

  27. Andrea said,

    March 14, 2008 at 7:06 am

    Dave, all debate about the legality of what you’re doing aside, I have a question.

    How can you find yourself okay with pissing off so many bloggers in the process of running a business you say is there to help bloggers? How can you possibly think that what you’re doing is okay, ethical, honest, and actually a good business model when one of the goals you’ve said yourself that you’re trying to achieve ~ give bloggers better exposure and help their audience grow ~ is done in a way that many of those very bloggers find abhorrent and don’t want any part of? It seems so backwards to me that you’re pissing people off left and right by using their content without getting their permission first but claim that it’s all in an effort to help them. I just don’t understand how you can justify that to yourself.

  28. Sugared Harpy said,

    March 14, 2008 at 8:49 am

    Just a note that I did receive an email from Dave, saying he didn’t receive my first email (which I then included, of course), but also that he has taken down my content feed.

    I also can’t find it anymore.

    Thankfully, my part is gone.

  29. Craig Mayhem said,

    March 14, 2008 at 10:35 am

    Dave,

    Again, you are oh so wrong.

    A case that involves a search engine doesn’t relate to an aggregator.

    I guess you feel that most people won’t read that PDF. It gave me a headache, but again - it has nothing to do with you and while a court case may fall in your favor in this matter, it wouldn’t be because of this precedent.

    It doesn’t relate. Just because you can search your site, you are not a search engine!

    When a user goes to Google, they are not presented with any content, cached material or ads until they request information.

    I’m pretty sure Google’s legal team provided coverage in their ‘terms of service’ that lets users know “if you use our service, be prepared for ads, etc.”

    When a user goes to your site, they are presented with pre-scraped, aggregated content and ads right off the bat.

    Content written by others, copyrighted by most, and trademarked by some.

    Ergo - you are directly profiting from the creative work of others.

    Now with net legality a relatively new field, I couldn’t tell you if a judge would say, “Tough titty, bloggers” or “Pay these people, Dave” but frankly it could go either way.

    And your site stats have nothing to do with this discussion.

    Though I would like to see the difference between visitors to your site and click-throughs to the actual blogs.

  30. Dave Mastio said,

    March 14, 2008 at 6:24 pm

    Craig,

    To answer your last question, 90% of BNN readers click through to a blog within 60 seconds of landing on a BNN site.

  31. CourtneyWatson.net » Blog Archive » Secrets and Lies said,

    March 14, 2008 at 8:54 pm

    [...] Little Bald Doctors WOBL in Training Superfunpatrol The State of Discontent Mamalogues Slacker Moms-R-Us The Broad Brush Prologos A Bun’s Life HIghway 61 MidwestBlogs -St Louis [...]

  32. STLtoday.com - Virtual St. Louis - Blog Archive - The thin line between aggregating blogs and stealing content said,

    March 16, 2008 at 2:11 pm

    [...] Little Bald Doctors WOBL in Training Superfunpatrol The State of Discontent Mamalogues Slacker Moms-R-Us The Broad Brush Prologos A Bun’s Life Highway 61 MidwestBlogs -St Louis CourtneyWatson STLbloggers The News Bitch [...]

  33. Anachroclysmic » Dave Mastio (of “Blognetnews” infamy) can suck it. said,

    March 26, 2008 at 2:37 am

    [...] link back to the author’s blog instead of another page on the aggregator’s site. [original link here; my emphasis above is [...]

  34. Mom101 said,

    March 26, 2008 at 1:29 pm

    Thanks for making this clear to me. Until now, I’m not sure I understood the big dillio. Seems like whether his site actually hurts or helps bloggers, he’s just a big wanker.
