Software mining
Software mining is an application of knowledge discovery in the area of software modernization, which involves understanding existing software artifacts. This process is related to the concept of reverse engineering. Usually the knowledge obtained from existing software is presented in the form of models to which specific queries can be made when necessary. Entity-relationship models are a frequent format for representing knowledge obtained from existing software. The Object Management Group (OMG) developed the Knowledge Discovery Metamodel (KDM) specification, which defines an ontology for software assets and their relationships for the purpose of performing knowledge discovery of existing code.
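As a minimal illustration of knowledge captured as a queryable entity-relationship model, the sketch below uses Python's standard `ast` module to extract two kinds of relationship, "defines" and "calls", from a small piece of Python source and stores them as (subject, relation, object) triples. The sample source, the `extract_model` and `query` helpers, and the relation names are hypothetical. Callee names are recorded unqualified. The sketch is not based on KDM or any other specification.

```python
import ast
from typing import List, Optional, Tuple

# A relationship in the model is a (subject, relation, object) triple.
Triple = Tuple[str, str, str]

# Hypothetical "legacy" source to mine.
SOURCE = '''
class Account:
    def deposit(self, amount):
        self.validate(amount)

    def validate(self, amount):
        pass

def audit():
    log("audit")

def log(message):
    pass
'''


def extract_model(source: str, module: str = "example") -> List[Triple]:
    """Extract 'defines' and 'calls' relationships from top-level classes and functions."""
    tree = ast.parse(source)
    triples: List[Triple] = []

    def record_calls(func: ast.FunctionDef, qualname: str) -> None:
        # Callee names are recorded unqualified (e.g. 'validate', not 'example.Account.validate').
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                if isinstance(node.func, ast.Name):
                    triples.append((qualname, "calls", node.func.id))
                elif isinstance(node.func, ast.Attribute):
                    triples.append((qualname, "calls", node.func.attr))

    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            cls_name = f"{module}.{node.name}"
            triples.append((module, "defines", cls_name))
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    method_name = f"{cls_name}.{item.name}"
                    triples.append((cls_name, "defines", method_name))
                    record_calls(item, method_name)
        elif isinstance(node, ast.FunctionDef):
            fn_name = f"{module}.{node.name}"
            triples.append((module, "defines", fn_name))
            record_calls(node, fn_name)
    return triples


def query(triples: List[Triple], subject: Optional[str] = None,
          relation: Optional[str] = None, obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that match every field that is not None."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)]


model = extract_model(SOURCE)
print(query(model, relation="calls", obj="log"))  # [('example.audit', 'calls', 'log')]
```

A real tool would resolve callee names to fully qualified entities and persist the model in a store that supports richer queries. Plain triples were chosen here only because they map directly onto entities and relationships.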
Software mining and data mining
Software mining is closely related to data mining, since existing software artifacts contain enormous business value that is key to the evolution of software systems. Knowledge discovery from software systems addresses the structure and behavior of a system as well as the data it processes. Instead of mining individual data sets, software mining focuses on metadata, such as database schemas. The OMG Knowledge Discovery Metamodel provides an integrated representation for capturing application metadata as part of a holistic metamodel of the existing system. Another OMG specification, the Common Warehouse Metamodel, focuses entirely on mining enterprise metadata.
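Mining metadata rather than data can be sketched with SQLite, whose `PRAGMA table_info` and `PRAGMA foreign_key_list` statements describe a database's columns and foreign keys. The schema below and the `extract_schema` helper are hypothetical examples. The resulting dictionary is an ad hoc format and is not the KDM or Common Warehouse Metamodel representation.

```python
import sqlite3

# Hypothetical schema of a legacy application database.
DDL = """
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE invoice (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id),
    total       REAL
);
"""


def extract_schema(conn: sqlite3.Connection) -> dict:
    """Return {table: {"columns": [(name, type)], "references": [(column, table, column)]}}."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Table names come from the database catalog, so interpolating them is acceptable here.
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(row[1], row[2]) for row in conn.execute(f'PRAGMA table_info("{table}")')]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        references = [(row[3], row[2], row[4])
                      for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')]
        schema[table] = {"columns": columns, "references": references}
    return schema


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
schema = extract_schema(conn)
print(schema["invoice"]["references"])  # [('customer_id', 'customer', 'id')]
```

No application rows are read. The relationship between invoices and customers is recovered from the schema alone, which is the kind of structural knowledge the paragraph above describes.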
Levels of software mining
Knowledge discovery in software is related to the concept of reverse engineering. Software mining addresses the structure and behavior of a software system as well as the data it processes.
Mining of software systems can take place at various levels:
- program level (individual statements and variables)
- design pattern level
- call graph level (individual procedures and their relationships)
- architectural level (subsystems and their interfaces)
- data level (individual columns and attributes of data stores)
- application level (key data items and their flow through the applications)
- business level (domain concepts, business rules and their implementation in code)
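One common question at the call graph level is which procedures can never be reached from an entry point. The sketch below makes two assumptions: the legacy code is Python, and only direct calls between top-level functions matter. It builds a call graph with the standard `ast` module and runs a breadth-first search from a hypothetical `main` function. Method calls, dynamic dispatch and calls through variables are ignored, so the output is only a list of dead-code candidates.

```python
import ast
from collections import deque
from typing import Dict, Set

# Hypothetical legacy module; legacy_export is never called from main.
SOURCE = '''
def main():
    load()
    report()

def load():
    parse()

def parse():
    pass

def report():
    pass

def legacy_export():
    parse()
'''


def call_graph(source: str) -> Dict[str, Set[str]]:
    """Map each top-level function to the top-level functions it calls directly by name."""
    tree = ast.parse(source)
    functions = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    graph: Dict[str, Set[str]] = {}
    for name, node in functions.items():
        callees = set()
        for sub in ast.walk(node):
            if (isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name)
                    and sub.func.id in functions):
                callees.add(sub.func.id)
        graph[name] = callees
    return graph


def reachable(graph: Dict[str, Set[str]], entry: str) -> Set[str]:
    """Breadth-first search: every function transitively called from entry."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        for callee in graph.get(current, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


graph = call_graph(SOURCE)
unreachable = sorted(set(graph) - reachable(graph, "main"))
print(unreachable)  # ['legacy_export']
```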
Forms of representing the results of software mining
- data model
- metadata
- metamodels
- ontology
- knowledge representation
- business rule
- Knowledge Discovery Metamodel (KDM)
- Business Process Modeling Notation (BPMN)
- intermediate representation
- Resource Description Framework (RDF)
- abstract syntax tree (AST)
- software metrics
- graphical user interfaces
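Abstract syntax trees and software metrics often go together, because a metric is usually computed by walking the AST. The sketch below computes a simplified approximation of McCabe's cyclomatic complexity for each top-level Python function. It counts one, plus the number of `if`, `for`, `while`, conditional-expression and `except` nodes, plus the extra operands of `and`/`or` expressions. This counting rule is an assumption made for illustration and will not necessarily match the figures reported by dedicated metric tools.

```python
import ast

# Node types treated as decision points (an assumption of this sketch).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)

# Hypothetical source to measure.
SOURCE = '''
def classify(n):
    if n < 2 or n > 1000:
        return "out of range"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "prime"

def identity(x):
    return x
'''


def approximate_complexity(func: ast.FunctionDef) -> int:
    """1 + decision points + extra operands of 'and'/'or' expressions.

    Nested functions are walked too, so their branches count toward the outer function.
    """
    score = 1
    for node in ast.walk(func):
        if isinstance(node, DECISION_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # 'a or b or c' has three operands and adds two branches.
            score += len(node.values) - 1
    return score


def function_metrics(source: str) -> dict:
    """Map each top-level function name to its approximate complexity."""
    tree = ast.parse(source)
    return {node.name: approximate_complexity(node)
            for node in tree.body if isinstance(node, ast.FunctionDef)}


metrics = function_metrics(SOURCE)
print(metrics)  # {'classify': 5, 'identity': 1}
```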