Oregon State University
Alex Groce
2017
Fault masking happens when the effect of one fault serves to mask that of another fault for particular test inputs. The coupling effect is relied upon by testing practitioners to ensure that fault masking is rare. It states that complex faults are coupled to simple faults in such a way that a test data set that detects all simple faults in a program will detect a high percentage of the complex faults. While this effect has been empirically evaluated, our theoretical understanding of the coupling effect is as yet incomplete. Wah proposed a theory of the coupling effect on finite bijective (or near bijective) functions with the same domain and co-domain, and assuming uniform distribution for candidate functions. This model, however, was criticized as being too simple to model real systems, as it did not account for differing domain and co-domain in real programs, or for syntactic neighborhood. We propose a new theory of fault coupling for general functions (with certain constraints). We show that there are two kinds of fault interactions, of which only the weak interaction can be modeled by the theory of the coupling effect. The strong interaction can produce faults that are semantically different from the original faults. These faults should hence be considered as independent atomic faults. Our analysis shows that the theory holds even when the effect of the syntactic neighborhood of the program is considered. We analyze numerous real-world programs with real faults to validate our hypothesis.
Amin Alipour
Alex Groce
2016
Although mutation analysis is considered the best way to evaluate the effectiveness of a test suite, hefty computational cost often limits its use. To address this problem, various mutation reduction strategies have been proposed, all seeking to gain efficiency by reducing the number of mutants while maintaining the representativeness of an exhaustive mutation analysis. While research has focused on the efficiency of reduction, the effectiveness of these strategies in selecting representative mutants, and the limits in doing so, have not been investigated. We investigate the practical limits to the effectiveness of mutation reduction strategies, and provide a simple theoretical framework for thinking about the absolute limits. Our results show that the limit in effectiveness over random sampling for real-world open source programs is 13.078% (mean). Interestingly, there is no limit to the improvement that can be made by addition of new mutation operators. Given that this is the maximum that can be achieved with perfect advance knowledge of mutation kills, what can be practically achieved may be much worse. We conclude that more effort should be focused on enhancing mutations than removing operators in the name of selective mutation for questionable benefit.
Ronald Metoyer
2016
As touchscreen mobile devices grow in popularity, it is inevitable that software developers will eventually want to write code on them. However, writing code on a soft (or virtual) keyboard is cumbersome due to the device size and lack of tactile feedback. We present a soft syntax-directed keyboard extension to the QWERTY keyboard for Java program input on touchscreen devices and evaluate this keyboard with Java programmers. Our results indicate that a programmer using the keyboard extension can input a Java program with fewer errors and using fewer keystrokes per character than when using a standard soft keyboard alone. In addition, programmers maintain an overall typing speed in words per minute that is equivalent to that on the standard soft keyboard alone. The keyboard extension was shown to be mentally, physically, and temporally less demanding than the standard soft keyboard alone when inputting a Java program.
Alex Groce
2016
Among the major questions that a practicing tester faces are deciding where to focus additional testing effort, and deciding when to stop testing. “Test the least-tested code, and stop when all code is well-tested” is a reasonable answer. Many measures of “testedness” have been proposed; unfortunately, we do not know whether these are truly effective. In this paper we propose a novel evaluation of two of the most important and widely-used measures of test suite quality. The first measure is statement coverage, the simplest and best-known code coverage measure. The second measure is mutation score, a supposedly more powerful, though expensive, measure. We evaluate these measures using the actual criterion of interest: if a program element is (by these measures) well tested at a given point in time, it should require fewer future bug-fixes than a “poorly tested” element. If not, then it seems likely that we are not effectively measuring testedness. Using a large number of open source Java programs from Github and Apache, we show that both statement coverage and mutation score have only a weak negative correlation with bug-fixes. Despite the lack of strong correlation, there are statistically and practically significant differences between program elements for various binary criteria. Program elements (other than classes) covered by any test case see about half as many bug-fixes as those not covered, and a similar line can be drawn for mutation score thresholds. Our results have important implications for both software engineering practice and research evaluation.
2015
Software decay is a key concern for large, long-lived software projects. Systems degrade over time as design and implementation compromises and exceptions pile up. However, there has been little research quantifying this decay, or understanding how software projects deal with this issue. While the best approach to improve the quality of a project is to spend time on both reducing software defects (bugs) and addressing design issues (refactoring), we find that design issues are frequently ignored in favor of fixing defects. We find that design issues have a higher chance of being fixed in the early stages of a project, and that efforts to correct these stall as projects mature and code bases grow, leading to a build-up of design problems. From studying a large set of open source projects, our research suggests that while core contributors tend to fix design issues more often than non-core contributors, there is no difference once the relative quantity of commits is accounted for.
Amin Alipour
Alex Groce
2015
Mutation analysis is considered the best method for measuring the adequacy of test suites. However, the number of test runs required for a full mutation analysis grows faster than project size, which makes it infeasible for real-world software projects, which often have more than a million lines of code. It is for projects of this size, however, that developers most need a method for evaluating the efficacy of a test suite. Various strategies have been proposed to deal with the explosion of mutants. However, these strategies at best reduce the number of mutants required to a fraction of overall mutants, which still grows with program size. Running, e.g., 5% of all mutants of a 2MLOC program usually requires analyzing over 100,000 mutants. Similarly, while various approaches have been proposed to tackle equivalent mutants, none completely eliminate the problem, and the fraction of equivalent mutants remaining is hard to estimate, often requiring manual analysis of equivalence. In this paper, we provide both theoretical analysis and empirical evidence that a small constant sample of mutants yields statistically similar results to running a full mutation analysis, regardless of the size of the program or similarity between mutants. We show that a similar approach, using a constant sample of inputs, can estimate the degree of stubbornness in mutants remaining to a high degree of statistical confidence, and provide a mutation analysis framework for Python that incorporates the analysis of stubbornness of mutants.
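The idea of estimating a mutation score from a fixed-size sample can be illustrated with a short sketch. This is not the framework described in the abstract: the function name `estimate_mutation_score`, the default sample size of 1000, and the use of a normal-approximation (Wald) interval under simple random sampling are all assumptions made for illustration.

```python
import math
import random


def estimate_mutation_score(killed, sample_size=1000, z=1.96, seed=0):
    """Estimate the mutation score from a fixed-size random sample of mutants.

    `killed` holds one boolean per mutant (True = killed by the test suite).
    A real tool would run only the sampled mutants; a precomputed list is
    used here (an assumption) to keep the sketch self-contained.
    """
    if not killed:
        raise ValueError("at least one mutant is required")
    rng = random.Random(seed)
    n = min(sample_size, len(killed))
    sample = rng.sample(list(killed), n)
    score = sum(sample) / n
    # Wald interval for a binomial proportion: its half-width depends on
    # the sample size and the score, not on the total number of mutants.
    half_width = z * math.sqrt(score * (1 - score) / n)
    return score, max(0.0, score - half_width), min(1.0, score + half_width)


if __name__ == "__main__":
    # Hypothetical mutant populations of different sizes, both with a
    # true mutation score of 0.7.
    for total in (10_000, 1_000_000):
        population = [i % 10 < 7 for i in range(total)]
        score, low, high = estimate_mutation_score(population)
        print(f"{total:>9} mutants: score ~ {score:.3f} "
              f"(interval {low:.3f} to {high:.3f})")
```

Under these assumptions the width of the interval is governed by the sample size alone, which is why the same 1000-mutant sample serves both the small and the large hypothetical population.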
Alex Groce
2014
One of the key challenges for developers testing code is determining a test suite's quality, that is, its ability to find faults. The most common approach is to use code coverage as a measure for test suite quality, and diminishing returns in coverage or high absolute coverage as a stopping rule. In testing research, suite quality is often evaluated by its ability to kill mutants, which are artificially seeded potential faults. Determining which criteria best predict mutation kills is critical to practical estimation of test suite quality. Previous work has only used small sets of programs, and usually compares multiple suites for a single program. Practitioners, however, seldom compare suites; they evaluate one suite. Using suites (both manual and automatically generated) from a large set of real-world open-source projects shows that evaluation results differ from those for suite-comparison: statement coverage (not block, branch, or path) predicts mutation kills best.
2014
How can we understand FOSS collaboration better? Can social issues that emerge be identified and addressed as they happen? Can the community heal itself, become more transparent and inclusive, and promote diversity? We propose a technique to address these issues by quantitative analysis and temporal visualization of social dynamics in FOSS communities. We used social network analysis metrics to identify growth patterns and unhealthy dynamics. This gives the community a heads-up when they can still take action to ensure the sustainability of the project.
2014
In this paper we describe a qualitative study of the abandonment of a social network, Facebook, and of why some people opt to terminate their use. Interviews were conducted with subjects who previously had daily use experience, and now opted for non-use. Four major themes were found as contributing to this technology abandonment. The insider stories shared by the interviewees about their technology non-use shed light on the contributing factors leading to a shift from a user to a non-user.
2014
In this work, we studied the collaboration networks of three open source projects using a combined analysis method of temporal visualization and temporal quantitative analysis. We based our study on two papers, [Robles and Gonzalez-Barahona 2012] and [Hanneman and Klamma 2013], and identified three projects that had forked in the recent past. We mined the collaboration data, formed dynamic collaboration graphs, and measured social network analysis metrics over an 18-month time window. We also produced visualizations of the dynamic graph (available online) and stacked area charts over time. The visualizations and the quantitative results showed the differences among the projects in the three forking reasons of personal differences among the developer teams, technical differences (addition of new functionality) and more community-driven development. The project representing personal differences was identifiable, as was the date it forked, to within a month. The novelty of the approach was in applying temporal analysis rather than static analysis, and in the temporal visualization of community structure. We showed that this approach sheds light on the structure of these projects and reveals information that cannot be seen otherwise.
Alex Groce
2014
Mutation analysis is often used to compare the effectiveness of different test suites or testing techniques. One of the main assumptions underlying this technique is the Competent Programmer Hypothesis, which proposes that programs are very close to a correct version, or that the difference between current and correct code for each fault is very small. Researchers have assumed on the basis of the Competent Programmer Hypothesis that the faults produced by mutation analysis are similar to real faults. While there exists some evidence that supports this assumption, these studies are based on analysis of a limited and potentially non-representative set of programs and are hence not conclusive. In this paper, we separately investigate the characteristics of bugfixes and other changes in a very large set of randomly selected projects using four different programming languages. Our analysis suggests that a typical fault involves about three to four tokens, and is seldom equivalent to any traditional mutation operator. We also find the most frequently occurring syntactical patterns, and identify the factors that affect the real bug-fix change distribution. Our analysis suggests that different languages have different distributions, which in turn suggests that operators optimal in one language may not be optimal for others. Moreover, our results suggest that mutation analysis stands in need of better empirical support of the connection between mutant detection and detection of actual program faults in a larger body of real programs.
2013
In this paper we explore the design of Leyline, a provenance-based desktop search and file management system, both on a conceptual and user interface level. We start with a comparative analysis and classification of previous provenance-based search systems, examining their underlying assumptions and focus, search coverage and flexibility, as well as features and limitations. We then describe a novel provenance-based search system based around a flexible visual sketchpad interface, and explore how this interface technique expands the flexibility of such systems within acceptable limits on complexity and search time. We conclude with a discussion of design implications and lessons learned in the development and evaluation of such a provenance-based search system.
2013
Development of interactive media and devices provides users with a wealth of new services such as online streaming (music and video), game play, web browsing, or email available on their televisions. While this increase in choice is great for the user, an inevitable consequence of many of these systems is a more demanding and interactive user experience. An easy solution to this problem is to use a keyboard and mouse, replicating the standard PC experience. However, this approach can be intimidating to novice users, and goes against the “lean back” experience often associated with TV viewing. A better approach might be to leverage novel interaction techniques used in common game controllers and tablets. This paper reports on the results of a user study in which we examined the limitations and affordances of game controllers such as the Nintendo Wiimote, Microsoft Kinect and “second display” tablets in an interactive TV (iTV) context. Task completion time, accuracy and user satisfaction measures show that the Wiimote was the most liked and performed best in almost all tasks.
2013
In this exploratory study, we map the use of free and open source software (FOSS) in the United States energy sector, especially as it relates to cyber security. Through two surveys and a set of semi-structured interviews—targeting both developers and policy makers—we identified key stakeholders, organizations, and FOSS projects, be they rooted in industry, academia, or the public policy space, that influence software and security practices in the energy sector. We explored FOSS tools, common attitudes and concerns, and challenges with regard to FOSS adoption. More than a dozen themes were identified from interviews and surveys. Of these, drivers for adoption and risks associated with FOSS were the most prevalent. More specifically, the misperceptions of FOSS, the new security challenges presented by the smart grid, and the extensive influence of vendors in this space play the largest roles in FOSS adoption in the energy sector.
2013
How can we understand FOSS collaboration better? Can social issues that emerge be identified and addressed before it is too late? Can the community heal itself, become more transparent and inclusive, and promote diversity? We propose a technique to address these issues by quantitative analysis of social dynamics in FOSS communities. We propose using social network analysis metrics to identify growth patterns and unhealthy dynamics, giving the community a heads-up when they can still take action to ensure the sustainability of the project.
ACM Creativity and Cognition
2013
Researchers often use participatory design – involving end-users in technology ideation – as this is found to lead to more useful and relevant products. Researchers have sought to involve older adults in the design of emerging technologies like smartphones, with which older adults often have little experience. Therefore, their effectiveness as co-designers could be questioned. We examine whether older adults can create novel design ideas, and whether critiquing existing applications prior to ideation helps or hinders creativity. Panelists from industry and academia evaluated design ideas generated by focus groups of older adults. Out of five groups, the most creative idea came from one with no smartphone experience or critique exposure. We found that while only some groups scored high on the novelty dimension of creativity, participants were enthusiastic about participating and adapted quickly. We found evidence that critiquing existing applications prior to ideation did more harm than good, potentially due to design fixation. We recommend continuing to involve older adults in the technology design ideation phase.
2013
How can we understand FOSS collaboration better? Can social issues that emerge be identified and addressed before it is too late? Can the community heal itself, become more transparent and inclusive, and promote diversity? We propose a technique to address these issues by quantitative analysis of social dynamics in FOSS communities. We propose using social network analysis metrics to identify growth patterns and unhealthy dynamics, giving the community a heads-up when they can still take action to ensure the sustainability of the project.
2012
An age wave is upon us, and many tech-savvy older adults are reaching retirement. To explore the barriers and benefits of engaging this population, promote an active post-working life, and foster community, we plan to involve retired programmers in the development of a free/open source software (FOSS) health and wellness application. FOSS communities are dominated by young male developers, and can be hostile to outsiders despite a shared philosophical alignment of altruistic motivations often embraced by retirees. I expect to contribute to the field by exploring the benefits and barriers of involving older adults in FOSS communities, as well as how they can benefit each other by collaborating to develop a meaningful product with and for older adults.
2012
The most effective strategy for finding files is to carefully arrange them into folders. This strategy breaks down for teams, where organizational schemes often differ between team members. It also breaks down when information is copied and reused, as it becomes harder to track versions. As storage continues to grow and costs decline, the incentives to carefully archive old versions of files diminish. It is therefore important to explore new and improved search tools. The most common approach is keyword search, though recalling effective keywords can be challenging, especially as repositories grow and information flows across projects. A less common alternative is to use provenance – information about the creation, use and sharing of documents and their context, including collaborators. This paper presents a limited user study showing that provenance data is useful and desirable in search, and that an interface based on a graphical sketchpad is not only feasible, but efficient.
2012
With the growth of free and open-source software (FOSS) and the adoption of FOSS solutions in business and everyday life, it is important that projects serve their increasingly diverse user base. The sustainability of FOSS projects relies on a constant influx of new contributors. Several large demographic surveys found that FOSS communities are very homogeneous, dominated by young men, similar to the bias existing in the rest of the IT workforce. Building on previous research, we examine mailing list subscriptions and posting statistics of female FOSS participants. New participants often experience their first interaction on a FOSS project’s mailing list. We explored six FOSS projects – Buildroot, Busybox, Jaws, Parrot, Uclibc, and Yum. We found a declining rate of female participation, from 8.27% of subscribers, to 6.63% of posters, and finally to the often-reported code contributor rate of 1.5%. We find a disproportionate attrition rate among women along every step of the FOSS joining process.
2012
Free/Open Source Software (FOSS) projects have a reputation for being grass-roots efforts driven by individual contributors volunteering their time and effort. While this may be true for a majority of smaller projects, it is not always the case for large projects. As projects grow in size, importance and complexity, many come to depend on corporations, universities, NGOs and governments for support and contributions, either financially or through seconded staff. As outside organizations get involved in projects, how does this affect their governance, transparency and direction? To study this question we gathered bug reports and commit logs for GCC and the Linux Kernel. We found that outside organizations contribute a majority of code but rarely participate in bug triaging. Therefore their code does not necessarily address the needs of others and may distort governance and direction. We conclude that projects should examine their dependence on outside organizations.
Nitin Mohan
2011
Free/Open Source Software (FOSS) communities often use open bug reporting to allow users to participate by reporting bugs. This practice can lead to more duplicate reports, as users can be less rigorous about researching existing bug reports. This paper examines how FOSS projects deal with duplicate bug reports. We examined 12 FOSS projects: 4 small, 4 medium and 4 large, where size was determined by number of code contributors. First, we found that contrary to what has been reported from studies of individual large projects like Mozilla and Eclipse, duplicate bug reports are a problem for FOSS projects, especially medium-sized ones, which struggle with a large number of submissions without the resources of large projects. Second, we found that the focus of a project does not affect the number of duplicate bug reports. Our findings indicate a need for additional scaffolding and training for bug reporters.
2011
Free/Open source software (FOSS) is an important part of the IT ecosystem. Due to the voluntary nature of participation, continual recruitment is key to the growth and sustainability of these communities. It is therefore important to understand how and why potential contributors fail in the process of transitioning from user to contributor. Most newcomers, or “newbies”, have their first interaction with a community through a mailing list. To understand how this first contact influences future interactions, we studied eight mailing lists across four FOSS projects: MediaWiki, GIMP, PostgreSQL, and Subversion. We analyzed discussions initiated by newbies to determine the effect of gender, nationality, politeness, helpfulness and timeliness of response. We found that nearly 80% of newbie posts received replies, and that receiving timely responses, especially within 48 hours, was positively correlated with future participation. We also found that while the majority of interactions were positive, 1.5% of responses were rude or hostile.
Linux Journal
2010
Through a discussion of the Open Source Wireless Adaptive Learning Device, this article explains how open source hardware and software improve computer science education.
2010
In co-located software development, diagramming practices, such as sketching ideas out with a pen and paper, support the creative process and allow designers to shape, analyze, and communicate their ideas. This study focuses on the diagramming practices used in the design of Open Source Software (OSS), where the norm is highly distributed group work. In OSS, text-based communication (e.g., mailing lists) dominates, and sketching and drawing diagrams collaboratively remains difficult due to the barriers imposed by distance and technology. Previous studies have examined these practices and barriers in the context of individual projects. To understand how contributors across OSS projects use diagrams in design-related activities, we conducted a survey of 230 contributors from 40 different OSS projects, and interviewed eight participants. Our results show that although contributors understand the advantages of using diagrams for design-related activities, diagrams are infrequently used in OSS. This motivated us to examine how and why diagramming occurs, and the factors that prevent widespread use in OSS. Finally, we propose new ideas for supporting design activities in OSS projects.

Contact Info

Carlos Jensen
School of EECS
3061 Kelley Engr. Ctr.
Oregon State University
Corvallis, Oregon 97331
Tel: +1-541-737-2555 Fax: +1-541-737-1300
Copyright © 2017 Oregon State University