
Design
The Science Discovery System (SDS) has two arms: the data-driven arm and the knowledge-driven arm. The central part of the SDS, which uses both arms, is called the Iterative Discovery Program (IDP).
IDP uses two information repositories: the patient data (PD) and the biomedical knowledge (BMK). Hospitals, clinics, and other health care organizations increasingly record clinical and biological data for each patient encounter or visit in digital form, frequently in clinical data repositories implemented as relational databases.
Existing human knowledge in biomedicine is recorded mostly in natural language, in forms such as textbooks, research papers, scientific reports, and guidelines. There is an increasing trend to represent this knowledge in digital form. Examples are electronic textbooks, online publishing, and searchable indexes of published papers such as NLM PubMed.
IDP uses data miner (DM) software applications to compute the strength of regularity for each subset of attributes (variables measured for a patient) of the PD; we call this the “observed-score”. For the same attribute subset, IDP runs knowledge miner (KM) software to compute the strength of support that current human biomedical knowledge gives to that particular observed regularity; we call this the “expected-score”. By comparing the observed- and expected-scores, IDP estimates the “surprise-score”.
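The scoring loop described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the surprise-score is computed, so the absolute difference used here is a placeholder, and the names `attribute_subsets`, `surprise_score`, and `rank_by_surprise` are hypothetical. The DM and KM are represented as plain callables that return a score for an attribute subset, and the `max_size` cap on subset size is likewise an assumption.

```python
from itertools import combinations


def attribute_subsets(attributes, max_size=2):
    """Yield every non-empty attribute subset with at most max_size members."""
    for size in range(1, max_size + 1):
        for subset in combinations(attributes, size):
            yield subset


def surprise_score(observed, expected):
    """Placeholder surprise measure (assumption): absolute gap between scores."""
    return abs(observed - expected)


def rank_by_surprise(attributes, data_miner, knowledge_miner, max_size=2):
    """Score each subset with both miners and sort by descending surprise.

    data_miner and knowledge_miner are hypothetical stand-ins for the DM and
    KM: each takes a tuple of attribute names and returns a numeric score.
    """
    results = []
    for subset in attribute_subsets(attributes, max_size):
        observed = data_miner(subset)        # "observed-score" from the PD
        expected = knowledge_miner(subset)   # "expected-score" from the BMK
        results.append(
            (subset, observed, expected, surprise_score(observed, expected))
        )
    return sorted(results, key=lambda row: row[3], reverse=True)
```

Any other comparison, for example a ratio or a probabilistic divergence, could be substituted for `surprise_score` without changing the surrounding loop.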
For subsets with a high surprise-score, IDP has algorithms that implement four major types of response: “rejection”, “reinterpretation”, “peripheral theory change”, and “complete theory change”. The last two constitute scientific discovery. SDS mines the BMK to build and propose a mechanism, that is, an explanation, for such a surprising finding. Algorithms for closed and open discovery are used for this purpose.
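The text does not name the discovery algorithms. One common formulation in literature-based discovery is the ABC model, in which closed discovery looks for intermediate terms B linking a known pair A and C, and open discovery starts from A alone and looks for new C terms reached through some B. The sketch below assumes that formulation. The `links` structure (a symmetric map from each term to the terms it co-occurs with in the BMK) and both function names are hypothetical.

```python
def closed_discovery(links, a_term, c_term):
    """Return the intermediate B terms that are linked to both A and C."""
    return sorted(links.get(a_term, set()) & links.get(c_term, set()))


def open_discovery(links, a_term):
    """Return C terms reachable from A through some B, with their B terms.

    Terms already linked directly to A are excluded, since they are not
    new candidates.
    """
    direct = links.get(a_term, set())
    candidates = {}
    for b_term in direct:
        for c_term in links.get(b_term, set()):
            if c_term != a_term and c_term not in direct:
                candidates.setdefault(c_term, set()).add(b_term)
    return {c: sorted(bs) for c, bs in sorted(candidates.items())}


# Toy co-occurrence map with placeholder term names (not real findings).
links = {
    "A": {"B1", "B2"},
    "B1": {"A", "C"},
    "B2": {"A", "C", "D"},
    "C": {"B1", "B2"},
    "D": {"B2"},
}
print(closed_discovery(links, "A", "C"))
print(open_discovery(links, "A"))
```

In this reading, the intermediate B terms returned by either function are the raw material for the proposed mechanism. A real system would also need to rank and filter them.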
The two main sources of information, the PD and the BMK, are usually in a format that may not be optimal for SDS computations. Software applications that we call representation modifiers (RM) take such data and translate it into a form better suited to computation. There are two types of RM, one for the PD and one for the BMK, hence RM1 and RM2. RM1 translates the PD into the Experiment Space, and RM2 converts the BMK into the Hypothesis Space.
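As a rough illustration of what the two representation modifiers might do, the sketch below makes strong assumptions that the text does not confirm. The Experiment Space is taken to be one attribute record per patient encounter, and the Hypothesis Space is taken to be counts of vocabulary terms mentioned together in the same document. The function names, the row layout `(encounter_id, attribute, value)`, and the substring-based term matching are all hypothetical simplifications.

```python
from collections import defaultdict
from itertools import combinations


def rm1_to_experiment_space(rows):
    """Pivot (encounter_id, attribute, value) rows into one record per encounter."""
    records = defaultdict(dict)
    for encounter_id, attribute, value in rows:
        records[encounter_id][attribute] = value
    return dict(records)


def rm2_to_hypothesis_space(documents, vocabulary):
    """Count the documents in which each pair of vocabulary terms co-occurs.

    Matching is a naive case-insensitive substring test (assumption); a real
    knowledge miner would need proper term recognition.
    """
    counts = defaultdict(int)
    for text in documents:
        lowered = text.lower()
        present = sorted(term for term in vocabulary if term in lowered)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return dict(counts)
```

Under these assumptions, the output of `rm1_to_experiment_space` is what a data miner would scan for regularities, and the pair counts from `rm2_to_hypothesis_space` are one simple proxy for how strongly existing knowledge links two attributes.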
Because both the PD and the BMK are updated and grow constantly over time, there are updating applications that perform regular updates, usually through network connections.