Thursday, January 04, 2007

End-User Modelling - The Need for Alternatives to Standard Spreadsheets

A consideration which influences my research into End-User Programming and Modelling is that organizations should not use proprietary or closed standards for their information. Systems should be designed with the assumption that information, which represents the primary system asset, may eventually need to be migrated to another software tool or environment. Open use of information was the priority for the DATUM project. This project is explained in 'DATUM Project: Cost Estimating Environment for Support of Aerospace Design Decision Making' (Scanlan et al, 2006). Therefore a requirement of this research is that open standard semantic languages are used to represent information, to be used both as input and output of the models. These languages are based on eXtensible Markup Language (XML). These same open standard languages can be used for developing the program code of models. It is proposed that software and information represented by the software, be separated but represented in the same open standard searchable way. Software and the information it manipulates are just information that has different uses, there is no reason why software must be represented differently from other information. So XML can be used both as the information input and output by the application, and for the definition of the model itself. The model can read or write information it represents, and the information can read from or write to the model. This recursion makes 'meta-programming' possible. Meta programming is writing of programs by other programs. The purpose of this is to provide a cascading series of layers that translate a relatively easy to use visual representation of a problem to be modelled, into code that can be run by present day compilers and interpreters. This is to make it easier for computer literate non-programmers to specify instructions to a computer, without learning and writing code in computer languages. To achieve this, any layer of software or information must be able to read the code or the information represented in any other. Code and information are only separated out as a matter of design choice to aid human comprehension, they can be represented in the same way using the same kinds of open standard languages. The methods used for this representation and translation are being researched.

End User Programming and Engineering Modelling

Many large companies have outsourced the management and support of their IT systems to third parties. Very strict management processes and procedures for the acquisition and implementation of new systems have been introduced. A side-effect of this policy is a tendency for employees to make extensive use of spreadsheets and macro programming languages for information storage, analysis, and manipulation (Scanlan et al, 2006). These applications establish themselves as a legitimate part of the business processes of the organization despite the essentially uncontrolled nature of their development. This is a worrying trend as these applications are frequently undocumented, rarely fully tested or validated, and are produced by people who often have little or no formal training in good systems development practice. An alternative approach of User Driven Modelling is required, because large spreadsheets are unmaintainable in the long run.

By their nature, large spreadsheets are difficult for a third party to comprehend as their inherent flexibility for editing allows users to generate a complex web of cell references which are arduous to audit. Panko (2000), Paine, (2003), and Scanlan et al (2006) examine this problem. Worse still, there is a tendency for the spreadsheet author to misguidedly compound the problem by expending a considerable amount of effort into hiding the detail behind an elaborate and visually attractive 'front end'. Should the author of such an application leave the organization, it is commonly abandoned as colleagues are reluctant to master its complexity and often refuse to take ownership of it, as they are busy with their main work. Paine states that spreadsheets have almost no features for building applications out of parts that can be developed and tested independently. Panko (2000) suggests that “Given data from recent field audits, most large spreadsheets probably contain significant errors.” The most recent audit he cites found errors in at least 86% of spreadsheets audited. In 1997 Panko reported that 90% of the spreadsheets audited in a study carried out by Coopers and Lybrand were found to have errors. In 'Automatic Generation and Maintenance of Correct Spreadsheets?' Erwig et al (2006) cite a figure of 90% from Rajalingham et al (2001) 'Classification of Spreadsheet Errors'. 'Given the billions of spreadsheets in use, this leaves the worlds of business and finance horribly vulnerable to programming mistakes' (Scanlan et al, 2006). The studies by Paine, and Panko show that the chances of any given spreadsheet cell containing an error are somewhere between 0.3 and 3%, so that a spreadsheet of only 100 cells has about a 30% chance of having one error or more. Aragones et al (2006) state - 'Desktop spreadsheet users are very creative in their adaptations, but
distributed spreadsheets have the problem of distributed, inconsistent inputs and distributed
results. There is no easy way to aggregate the collective wisdom of user experience'.

References

Aragones, A., Bruno, J., Crapo, A., Garbiras M., 2006. An Ontology-Based Architecture for Adaptive Work-Centered User Interface Technology. Jena User Conference, 2006, Bristol, UK [online]. Available from: http://jena.hpl.hp.com/juc2006/proceedings/crapo/paper.pdf.

Erwig, M., Abraham, R., Cooperstein, I., Kollmansberger S., 2006. Automatic Generation and Maintenance of Correct Spreadsheets?. Proceedings of the 27th international conference on Software engineering, St. Louis, MO, USA pp 136-145 [online]. Available from: http://web.engr.oregonstate.edu/~erwig/papers/Gencel_ICSE05.pdf

Paine, J., 2003. Spreadsheet Structure Discovery with Logic Programming, Proceedings of European Spreadsheet Risks Interest Group EuSpRIG Greenwich, England.

Panko, R. P., 2000. Spreadsheet Errors: What We Know, What We Think We Can Do. Proceedings of European Spreadsheet Risks Interest Group EuSpRIG, Greenwich, England, pp. 7–17.

Scanlan, J., Rao, A., Bru, C., Hale, P., Marsh, R., 2006. DATUM Project: Cost Estimating Environment for Support of Aerospace Design Decision Making. Journal of Aircraft, 43(4).

2 comments:

Unknown said...

Hi Peter-- interesting information, and I'd like to introduce you to my client that develops a quantitative modeling software program (Quantrix Modeler). I don't want to use your blog for commercial purposes, and you don't list your email address, so if you are interested in learning more, please contact me at alison AT harrismediaservices DOT com

ps. I liked the Return of the King better than vol. 1!

Peter Hale said...

Alison

I will look at this I have a modelling page at http://www.cems.uwe.ac.uk/amrc/seeds/Modelling.htm where I'm collcting all the references to modelling tools and on our team's page at http://www.cems.uwe.ac.uk/amrc/seeds/#Modelling, and my home page at http://www.cems.uwe.ac.uk/~phale/#ModellingDecisionSupportandKnowledgeManagementLinksSimulation
I'll look at your information on this tool gradually and see if it's relevant to our projects.