Knowledge discovery
Knowledge discovery is a concept in computer science that describes the process of automatically searching large volumes of data for patterns that can be considered knowledge about the data. It is often described as deriving knowledge from the input data. The topic can be categorized according to (1) what kind of data is searched and (2) the form in which the result of the search is represented. Knowledge discovery developed out of the data mining domain and is closely related to it in both methodology and terminology.
The most well-known branch of data mining is knowledge discovery, also known as Knowledge Discovery in Databases (KDD). Like many other forms of knowledge discovery, it creates abstractions of the input data. The knowledge obtained through the process may become additional data that can be used for further analysis and discovery.
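As a minimal sketch of this idea, the following Python fragment counts item pairs that frequently occur together in a small set of records. The transactions, the function name `frequent_pairs`, and the `min_support` threshold are all invented for illustration; they do not come from any particular KDD system. The resulting dictionary is an abstraction of the raw records and is itself data that could feed a further discovery step.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input data: each record is the set of items in one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]

def frequent_pairs(records, min_support):
    """Return item pairs that appear together in at least min_support records."""
    counts = Counter()
    for record in records:
        # Sort the items so that each pair has a single canonical ordering.
        for pair in combinations(sorted(record), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# The discovered patterns are a compact abstraction of the input data.
patterns = frequent_pairs(transactions, min_support=3)
for pair, n in sorted(patterns.items()):
    print(pair, n)
```

Pair counting is only one of many possible pattern definitions; it is used here because it is small enough to read in full.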
Another promising application of knowledge discovery is in software modernization, which involves understanding existing software artifacts. This process is related to the concept of reverse engineering. The knowledge obtained from existing software is usually presented in the form of models that can be queried when necessary. An entity-relationship model is a frequent format for representing knowledge obtained from existing software. The Object Management Group (OMG) developed the Knowledge Discovery Metamodel (KDM) specification, which defines an ontology for software assets and their relationships for the purpose of performing knowledge discovery on existing code. Knowledge discovery from existing software systems, also known as software mining, is closely related to data mining, since existing software artifacts contain enormous business value that is key to the evolution of software systems. Instead of mining individual data sets, software mining focuses on metadata, such as database schemas.
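The following sketch illustrates mining metadata rather than data: it reads the schema of a SQLite database and builds a small entity-relationship model that can then be queried. It is only an illustration under stated assumptions and is not an implementation of KDM. The example schema (`customer`, `invoice`) and the function name `extract_schema_model` are hypothetical; the `sqlite_master` table and the `table_info` and `foreign_key_list` pragmas are standard SQLite features.

```python
import sqlite3

def extract_schema_model(conn):
    """Build a small entity-relationship model from SQLite schema metadata."""
    entities = {}        # table name -> list of column names
    relationships = []   # (source table, source column, target table, target column)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        entities[table] = [col[1] for col in columns]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            relationships.append((table, fk[3], fk[2], fk[4]))
    return entities, relationships

# Hypothetical legacy schema, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        total REAL
    );
""")
entities, relationships = extract_schema_model(conn)

# Query the model: which entities refer to "customer"?
referrers = [src for src, _, dst, _ in relationships if dst == "customer"]
```

No rows of the database are read at any point; the model is derived entirely from the schema, which is the sense in which software mining works on metadata.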
Input data for knowledge discovery
- Databases
  - Relational data (relational data mining)
  - Database
  - Document warehouse
  - Data warehouse
- Software mining
- Text (text mining)
  - Concept mining
- Graphs
  - Molecule mining
- Sequences (sequence mining)
  - Data stream mining
  - Learning from time-varying data streams under concept drift
- Web (web mining)
Output formats for discovered knowledge
- Data model
- Metadata
- Metamodels
- Ontology
- Knowledge representation
- Knowledge tags
- Business rule
- Knowledge Discovery Metamodel (KDM)
- Business Process Modeling Notation (BPMN)
- Intermediate representation
- Resource Description Framework (RDF)
- Software metrics
See also
- Data archaeology
- Data mining
- Data mining in agriculture
- Clustering
- disKover (commercial tool created by Knowledge Now Limited)