
JUNE 2010

VOLUME 2, ISSUE 6

Software Scan

The President's Column

Welcome to another edition of Software Scan from Software Analysis and Forensic Engineering Corporation. In this month's Scanning IP section, I discuss whether whitespace patterns in source code can really be used to detect plagiarism, as a number of experts and lawyers claim. In this month's Scanning Tools section, I discuss the latest version of CodeSuite, specifically the new CodeCLOC tool for transfer pricing calculations.

Send me your comments and critiques. I'm always interested in hearing from you.

Regards,


Bob Zeidman
President, SAFE Corporation


Scanning IP

Can Whitespace Patterns Provide Clues to Plagiarism?

Over the years I've run into expert witnesses and attorneys who have told me about software copyright infringement cases where the only clues that copying occurred were patterns of spaces and tabs ("whitespace"). The idea is that if a truly ambitious thief wanted to cover his tracks, he would modify the stolen code so much that there was no longer a visible trace of copying. However, the clever software sleuth could find patterns of whitespace that the thief had missed; although virtually nothing remained, the invisible tabs and spaces could produce a conviction.

This always sounded intriguing, but I wondered whether anyone had ever tested this theory. We could find no articles or papers on the subject, except for one inconclusive paper, and I dreaded to think that some programmer was convicted based on an untested theory. I decided to have my consulting company, Zeidman Consulting, do some carefully controlled research. If the results turned out well, SAFE Corporation would add whitespace pattern algorithms to CodeSuite to further enhance its ability to detect copying.

Our results were published in a paper entitled "Measuring Whitespace Patterns as an Indication of Plagiarism" that was recently presented at the ADFSL Conference on Digital Forensics, Security and Law. Our results are summarized in the final paragraph:

This whitespace pattern matching method can be used to focus a search for evidence of similarity or copying, but this method cannot stand by itself.

What we discovered is that even very different files often have similar whitespace patterns. At Zeidman Consulting we've used whitespace patterns to confirm copying that was already detected through the use of CodeMatch to find correlated programming elements. In those cases, the whitespace patterns offered further confidence in our findings and in some cases showed which program had been developed first. For a copy of the paper, email us at info@SAFE-corp.com.

Our next research project is to look at sequences of whitespace within files. Maybe there we'll find some clues to copying. But for now our results show that whitespace patterns without any other evidence should not be used to determine that copying occurred.
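For readers who want to experiment with the idea, here is a minimal Python sketch of one way to compare whitespace patterns. It is an illustration only: it is not the method from the paper and not an algorithm in CodeSuite. The function names, and the choice to compare the runs of spaces and tabs on each non-blank line, are assumptions made for this example.

```python
import re
from difflib import SequenceMatcher

def whitespace_pattern(line):
    """Return the runs of spaces and tabs found in one line, in order."""
    return tuple(re.findall(r"[ \t]+", line))

def file_patterns(text):
    """Whitespace pattern of every non-blank line in a source file."""
    return [whitespace_pattern(l) for l in text.splitlines() if l.strip()]

def pattern_similarity(text_a, text_b):
    """Score from 0 to 1 for how well the per-line patterns line up."""
    a, b = file_patterns(text_a), file_patterns(text_b)
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

# Hypothetical inputs: a file, a copy with renamed identifiers,
# and unrelated code that happens to be formatted the same way.
original = "int x = 1;\n\tif (x)  {\n\t\treturn x;\n\t}\n"
renamed = "int total = 1;\n\tif (total)  {\n\t\treturn total;\n\t}\n"
unrelated = "char c = 2;\n\twhile (c)  {\n\t\tgoto end;\n\t}\n"

print(pattern_similarity(original, renamed))    # 1.0
print(pattern_similarity(original, unrelated))  # 1.0
```

The renamed copy scores a perfect match, but so does the unrelated code, simply because both follow the same formatting habits. That is the caution in a nutshell: a high whitespace score can point to where to look, but it needs other evidence behind it.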

Advanced Tools to Detect Software Plagiarism and IP Theft

CodeSuite®
A sophisticated set of tools for analyzing software source code and object code including:

BitMatch®
Check binary object code for plagiarism.

CodeCLOC
Measure software IP changes between versions of a program.

CodeCross®
Cross check source code for plagiarism.

CodeDiff®
Compare source code to find differences and measure changes.

CodeMatch®
The premier tool for pinpointing copying.

SourceDetective®
Scour the Internet for plagiarized code.

CodeGrid®
Turbocharge your analysis on a supercomputer grid.

Get Smart

SAFE offers training at our facility or yours or on the Web. Contact us to make arrangements:

MCLE credit in software IP

CodeSuite certification

Your New Office

Remember that you can now have your own secure office at the SAFE facility for storing proprietary software, running CodeSuite, analyzing the results, and getting onsite support. We're located at

20863 Stevens Creek Blvd.
Suite 456
Cupertino, CA 95014
(408) 517-1167

Scanning Tools

CodeCLOC™

Last month we announced CodeMeasure, our new standalone tool for measuring software growth. This month we are announcing the release of CodeSuite 4.0, which includes CodeCLOC for measuring how software evolves across versions of code. CodeCLOC uses the same algorithms that were implemented in CodeMeasure and that were developed for the landmark software transfer pricing case Symantec v. Commissioner of Internal Revenue.

You're probably wondering what the difference is between CodeMeasure and CodeCLOC. CodeMeasure is a simple, inexpensive program for generating the CLOC measurement statistics for multiple versions of a program. CodeCLOC, intended for litigation, compares only two versions of code but produces a detailed database of results that can be further filtered and analyzed using CodeSuite or your own custom tools. The results from CodeCLOC can be presented in court, and the CodeCLOC database can be given to the opposing party for verification.
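To give a feel for what a two-version comparison involves, here is a short Python sketch. It is a generic line diff built on the standard difflib module, not the CLOC algorithm and not the CodeCLOC database format. The function name and the three categories it reports are assumptions chosen for this example.

```python
from difflib import SequenceMatcher

def line_change_counts(old_text, new_text):
    """Count unchanged, removed and added non-blank lines between two versions."""
    old = [l.strip() for l in old_text.splitlines() if l.strip()]
    new = [l.strip() for l in new_text.splitlines() if l.strip()]
    counts = {"unchanged": 0, "removed": 0, "added": 0}
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            counts["unchanged"] += i2 - i1
        else:
            # "replace", "delete" and "insert" all reduce to lines
            # dropped from the old version and lines new in the new one.
            counts["removed"] += i2 - i1
            counts["added"] += j2 - j1
    return counts

# Hypothetical versions of a tiny program.
version_1 = "a = 1\nb = 2\nc = 3\n"
version_2 = "a = 1\nb = 20\nc = 3\nd = 4\n"

print(line_change_counts(version_1, version_2))
# {'unchanged': 2, 'removed': 1, 'added': 2}
```

A per-line record of this kind, rather than just the totals, is the sort of detail that lets another party rerun the comparison and check the numbers.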

CodeSuite 4.0 also has a few other nice features, including a revamped user interface. There's also a new function to generate statistics from any CodeSuite database, and the command line interface has been enhanced for integrating with other programs. CodeSuite 4.0 is available for download and can be purchased on a term license or project basis. CodeCLOC is priced at $20 per megabyte. A one-year term license for CodeSuite is $100,000.

This newsletter is not legal advice. Views expressed herein should be checked for accuracy and current applicability.
Copyright 2010 Software Analysis & Forensic Engineering Corporation