APRIL 2010

VOLUME 2, ISSUE 4

Software Scan

The President's Column

Welcome to another edition of Software Scan from Software Analysis and Forensic Engineering Corporation. In this month's Scanning IP section, I talk about the DUPE project, a way to collect plagiarized source code to create a standardized set of tests for "software plagiarism detection" tools. We need help with this important project, so I'm sending out a call for partners. In this month's Scanning Tools section, I tell you how CodeSuite can recover from a computer crash or any other interruption of an analysis and pick up right where it left off.

Send me your comments and critiques. I'm always interested in hearing from you.

Regards,


Bob Zeidman
President, SAFE Corporation


Scanning IP

DUPE: Depository of Universal Plagiarism Examples

In 2003 I created the CodeMatch program that very quickly became a de facto standard in software IP litigation. I created a test bench of purposely plagiarized code that could be used to independently and objectively compare the results produced by different plagiarism detection programs. Some in the academic community claimed that my tests were biased toward the algorithms used by CodeMatch, which explained why CodeMatch fared so well compared to the other programs. However, these same critics, despite my requests, never produced their own set of standard tests.

Although I believe that the standard tests I have used are not biased, it occurred to me that there could be a better way to eliminate even unintentional bias. The solution would be to take the source code for certain open source programs and announce a new open source project that would involve purposely plagiarizing the code. Programmers from around the world would be invited, perhaps in a competition, to change the source code while retaining the functionality. The original programs and the plagiarized versions submitted by others would be stored in a database known as the Depository of Universal Plagiarism Examples, or DUPE. Plagiarism detection programs would then be run on DUPE, and the results could be compared to determine which programs best detected copying. Important statistics about plagiarized code could also be gathered, and patterns identified, in order to improve the plagiarism detection programs.
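As a rough illustration of how such a comparison might be scored, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the Pair record, the token_jaccard toy detector, the evaluate function, and the 0.5 threshold are hypothetical and are not part of CodeMatch, CodeSuite, or any actual DUPE design. It only shows the general idea of running a detector over labeled pairs and counting hits and misses.

```python
import re
from typing import Callable, Iterable, NamedTuple


class Pair(NamedTuple):
    original: str    # source text of the original program
    candidate: str   # source text being checked against it
    is_copy: bool    # ground truth: True if candidate was derived from original


# A detector is any function returning a similarity score in [0, 1].
Detector = Callable[[str, str], float]


def token_jaccard(a: str, b: str) -> float:
    """Toy stand-in detector: Jaccard overlap of identifier-like tokens."""
    tokens_a = set(re.findall(r"[A-Za-z_]\w*", a))
    tokens_b = set(re.findall(r"[A-Za-z_]\w*", b))
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def evaluate(detector: Detector, pairs: Iterable[Pair], threshold: float = 0.5) -> dict:
    """Count how often the detector agrees with the ground-truth labels."""
    counts = {"detected": 0, "missed": 0, "false_alarms": 0, "correct_rejections": 0}
    for pair in pairs:
        flagged = detector(pair.original, pair.candidate) >= threshold
        if pair.is_copy:
            counts["detected" if flagged else "missed"] += 1
        else:
            counts["false_alarms" if flagged else "correct_rejections"] += 1
    return counts


# Tiny hand-made corpus: one renamed-variable copy and one unrelated snippet.
ORIGINAL = "int total = 0; for (int i = 0; i < n; i++) total += values[i];"
CORPUS = [
    Pair(ORIGINAL, "int sum = 0; for (int i = 0; i < n; i++) sum += values[i];", True),
    Pair(ORIGINAL, 'printf("hello");', False),
]

RESULTS = evaluate(token_jaccard, CORPUS)
```

Running the same evaluate function with several different detectors over one shared corpus would give directly comparable counts, which is the kind of side-by-side comparison the paragraph above describes.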

SAFE Corporation has begun looking into creating this database. However, we would like to work with partners in academia and industry. We believe that several key issues need to be resolved in creating DUPE. These are:

  1. Choosing appropriate open source projects.
  2. Creating a minimum definition of software plagiarism.
  3. Creating the database.
  4. Determining policies including who can access it, how it will be used, and who will maintain it.
  5. Determining how to run the tests, how to generate the results, and how to distribute the results.

Please contact me if you're interested in working on this important and groundbreaking project.

Advanced Tools to Detect Software Plagiarism and IP Theft

CodeSuite®
A sophisticated set of tools for analyzing software source code and object code including:

BitMatch®
Check binary object code for plagiarism.

CodeCross®
Cross check source code for plagiarism.

CodeDiff®
Compare source code to find differences and measure changes.

CodeMatch®
The premier tool for pinpointing copying.

SourceDetective®
Scour the Internet for plagiarized code.

CodeGrid®
Turbocharge your analysis on a supercomputer grid.

Get Smart

SAFE offers training at our facility or yours or on the Web. Contact us to make arrangements:

MCLE credit in software IP

CodeSuite certification

Your New Office

Remember that you can now have your own secure office at the SAFE facility for storing proprietary software, running CodeSuite, analyzing the results, and getting onsite support. We're located at

20863 Stevens Creek Blvd.
Suite 456
Cupertino, CA 95014
(408) 517-1167

Scanning Tools

Computer Crash? No Problem!

Well, it may be a problem for other reasons, but you can probably start your CodeSuite run right where it left off. I created this feature after running a very large CodeDiff job. After running for a few days straight, CodeSuite had created so many temporary files that they filled the entire disk and crashed the system. I was left with an unusable database fragment, and the job had to be restarted from the beginning.

This led to two changes. First, temporary files are now deleted periodically before they get to any appreciable size. Second, I designed the restart feature. This cool feature allows you to point at a partial database and run the interrupted CodeSuite function with all of the parameters set correctly. Simply click on the blue arrow in the toolbar or select Restart from the Tools menu. You then point to the interrupted CodeSuite database. The appropriate function will start up with everything filled in. You will be told how many licenses are needed to complete the job. The job then resumes where it left off, so you lose very little time and use no extra licenses.
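For readers curious about the general idea behind resuming a long comparison job, here is a minimal Python sketch of a checkpoint-and-restart pattern. It is not how CodeSuite is implemented; the file format (one JSON record per line), the function names load_done and run_job, and the stop_after parameter used to simulate a crash are all assumptions made for illustration. The sketch also assumes every record is written as a complete line, so it does not handle a record cut off in mid-write.

```python
import json
import os
from itertools import combinations


def load_done(db_path):
    """Return the set of (a, b) file pairs already recorded in the partial database."""
    done = set()
    if os.path.exists(db_path):
        with open(db_path) as db:
            for line in db:
                if line.strip():
                    record = json.loads(line)
                    done.add((record["a"], record["b"]))
    return done


def run_job(files, db_path, compare, stop_after=None):
    """Compare every pair of files, skipping pairs a previous run already finished.

    files maps a file name to its contents. stop_after is a hypothetical knob
    that ends the run early to imitate an interruption. Returns the number of
    pairs processed in this run.
    """
    done = load_done(db_path)
    processed = 0
    with open(db_path, "a") as db:
        for a, b in combinations(sorted(files), 2):
            if (a, b) in done:
                continue  # finished before the interruption
            if stop_after is not None and processed >= stop_after:
                break  # simulated crash
            record = {"a": a, "b": b, "score": compare(files[a], files[b])}
            db.write(json.dumps(record) + "\n")
            db.flush()  # push each result out so a crash loses little work
            processed += 1
    return processed
```

Calling run_job a second time with the same database path picks up only the pairs that were not recorded the first time, which mirrors the "start where it left off" behavior described above.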

This newsletter is not legal advice. Views expressed herein should be checked for accuracy and current applicability.
Copyright 2010 Software Analysis & Forensic Engineering Corporation