SAFE Banner

JUNE 2011

VOLUME 3, ISSUE 6

Software Scan

The President's Column

When searching for facts, Wikipedia is easy to use. In fact, searching Google often brings up Wikipedia first. But is it reliable? Is it acceptable in court? Read more in the Scanning IP section of this month's newsletter.

Send me your comments and critiques. I'm always interested in hearing from you.

Regards,


Bob Zeidman
President, SAFE Corporation


Scanning IP

Wikipedia: Reliable Reference or Biased Blathering?

When I began writing my book on software intellectual property, I often needed definitions of terms, lawsuit citations, technical references, and historical facts. In those long-ago, pre-Internet times, this meant reserving whole days at local libraries to sort through catalogs, walk through mazes of bookshelves, run my fingers along Dewey Decimal-coded book spines, pull heavy volumes off the shelves, spread them across big wooden tables, flip back and forth between indexes and pages, and skim dense paragraphs of text. Now I just Google, and usually it's Wikipedia that comes up.

As I wrote my book, it became filled with references from Wikipedia. Some of the information I knew was correct, but I needed a formal reference, and Wikipedia seemed good enough. Other information I could verify at multiple sites, but the Wikipedia definition was always concise. I had been told by many attorneys that Wikipedia references were not considered legitimate in court, and I never use them in expert reports for litigation, but I figured it was good enough for my book. It was only when one of the reviewers pointed out that using Wikipedia would hurt the reputation of my book, especially among lawyers, that I gave it a second thought. I went back and found alternate references. Though the main concepts that I was referencing in Wikipedia were essentially correct, the details were what Wikipedia often got wrong.

And I knew this fact already. My Cornell roommate Rob Smigel had gone on to be a fairly famous comedy writer for Saturday Night Live. The Wikipedia page originally said he had graduated from Cornell. I figured this needed correction, because Rob dropped out (before nearly failing out) and transferred to NYU where his dad sat on the board. Rob's story is actually that old cliché where his dad insisted he become a dentist like himself, but Rob only wanted to be a comedian, a career that his dad strongly disapproved of. My corrections to the page were regularly removed because I couldn't document this fact with external references, yet most of the other information in the bio was unreferenced.

This points out one significant problem with Wikipedia. In the early days, people entered what they wanted with little if any fact checking required. Eventually those early pages, and there are probably millions of them, became accepted as incontrovertible fact. I have at least one friend whose Wikipedia page was created by colleagues as a joke, yet it gets quoted as true.

Later I submitted a reference to a Rolling Stone interview with my roommate Rob Smigel where he mentions not completing a degree at Cornell, but somehow a Wikipedia editor did not find even this credible enough and edited my sentence into a short phrase that has since been removed. In fact, as of today all references to Cornell have been removed from Rob's bio even though he attended for two years.

So this points out yet another significant problem with Wikipedia. There are now editors who have taken it upon themselves to be the correctness police. They go about removing other people's edits that don't conform to their own beliefs. Many of these editors boast tens of thousands of page edits. Wikipedia has set up rules for editing, but disputing an edit is a long process involving many levels of effort, and it still relies on these same biased Wikipedia editors, who do not necessarily have expertise in anything, let alone the subject under consideration. In fact, although my company's software is the most widely used copyright infringement detection software in court cases, even simple links to our website in Wikipedia have all eventually been removed by an editor who says this is self-promotion. Why is self-promotion bad if the facts are provably true?

Even Wikipedia states that the information on its site may be incorrect, as confirmed in this Wikipedia page about using Wikipedia [1]:

Wikipedia's most dramatic weaknesses are closely associated with its greatest strengths. Wikipedia's radical openness means that any given article may be, at any given moment, in a bad state: for example it could be in the middle of a large edit or it could have been recently vandalized…

Where does Wikipedia stand in the courts? Wikipedia has been cited in many court cases, but the general rule is that citing it is a bad idea. Recent studies have shown that courts are accepting Wikipedia references much less readily than in the past [2, 3, 4, 5].

So my advice is that Wikipedia is great for cocktail party banter, but don't rely on it for critical facts. The anonymity of its contributors, the poor fact-checking on the early contributions, and the bias of unqualified volunteer editors make it an increasingly inaccurate source that is losing its initial attraction for many.

Footnotes:
1. Wikipedia:Researching with Wikipedia
2. The Citation of Blogs in Judicial Opinions
3. Badasa v. Mukasey, 2008
4. Bing Shun Li v. Holder, 2010
5. Cohen v. Google, 2010

Advanced Tools to Detect Plagiarism and IP Theft

CodeSuite® & CodeSuite-LT®
Sophisticated sets of tools for analyzing software source code and object code including:

BitMatch®
Check binary object code for plagiarism.

CodeCLOC®
Measure software IP changes between versions of a program.

CodeCross®
Cross-check source code for plagiarism.

CodeDiff®
Compare source code to find differences and measure changes.

CodeMatch®
The premier tool for pinpointing copied source code.

SourceDetective®
Scour the Internet for plagiarized code.

CodeSuite-MP®
Speed up your analysis on a multiprocessor system.

CodeGrid®
Turbocharge your analysis on a supercomputer grid.

DocMatch
Find signs of copying in any document.

Get Smart

SAFE offers training at our facility, at yours, or on the Web. Contact us to make arrangements:

MCLE credit in software IP

CodeSuite certification

Your New Office

Remember that you can now have your own secure office at the SAFE facility for storing proprietary software, running CodeSuite, analyzing the results, and getting onsite support. We're located at:

20863 Stevens Creek Blvd.
Suite 456
Cupertino, CA 95014
(408) 517-1167

This newsletter is not legal advice. Views expressed herein should be checked for accuracy and current applicability.
Copyright 2011 Software Analysis & Forensic Engineering Corporation