Inherited Risk from Online Community Posts Referencing Open Source Projects

Here is a paper I attempted to publish through ACM. It was rejected, primarily because I had to use proxies in order to gather information from Bugzilla. Secondly, the reviewers didn't understand the issues Stack Overflow raises for companies, or how this work could be used in tandem with web proxies or browsers to warn developers. Oh well, so here it is on my blog for open publishing.

This exploratory study examines the relationship between online community forum posts and the open-source projects they reference, and calls attention to the software quality inherited by implementing associated project code. To deduce the potential risk for a developer using an online resource, over five million posts were filtered from a large Stack Overflow (SO) data-mining experiment and examined for similarities, relationships, and risk related to a popular open-source project.

The study presents a unique approach that combines traditional software quality and change metrics extracted from a selected project with project association indicators to create an aggregate score for potential software quality risk. The results suggest that determining relationships when evaluating software quality risk from incomplete source code snippets is difficult, but not impossible. In general, the study identified minimal-to-moderate risk for snippets with an established relationship to the GTK+ project, and found no snippets related to high-to-extreme-risk source code.
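
To make the idea of an aggregate score a bit more concrete, here is a rough Python sketch of how quality metrics, change metrics, and an association indicator might be folded into one number. This is not the scoring formula from the paper. The field names, the equal weights, the multiplication by association strength, and the band thresholds are all assumptions picked purely for illustration; only the band names loosely echo the minimal, moderate, high, and extreme wording above.

```python
from dataclasses import dataclass


@dataclass
class SnippetEvidence:
    """Hypothetical inputs for one snippet, each normalized to [0, 1]."""
    quality_risk: float   # e.g. scaled complexity of the referenced code (assumed)
    change_risk: float    # e.g. scaled churn or bug-fix frequency (assumed)
    association: float    # how strongly the snippet is tied to the project (assumed)


def aggregate_risk(e: SnippetEvidence,
                   w_quality: float = 0.5,
                   w_change: float = 0.5) -> float:
    """Weighted project risk, scaled by association strength (illustrative only)."""
    for value in (e.quality_risk, e.change_risk, e.association):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be normalized to [0, 1]")
    total_weight = w_quality + w_change
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    project_risk = (w_quality * e.quality_risk + w_change * e.change_risk) / total_weight
    # A snippet with no established relationship inherits no project risk.
    return e.association * project_risk


def risk_band(score: float) -> str:
    """Map a score in [0, 1] to a coarse label; thresholds are arbitrary."""
    if score < 0.25:
        return "minimal"
    if score < 0.50:
        return "moderate"
    if score < 0.75:
        return "high"
    return "extreme"


if __name__ == "__main__":
    snippet = SnippetEvidence(quality_risk=0.6, change_risk=0.4, association=0.7)
    score = aggregate_risk(snippet)
    print(risk_band(score), round(score, 2))
```

Multiplying by the association value is one simple way to express that a weakly related snippet should inherit less of the project's risk; an additive or thresholded combination would be just as defensible.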