Science Discovery System / Dual-Mining Method



This is the web page for the paper:

Siadaty MS, Knaus WA. Locating previously unknown patterns in data-mining results: a dual data- and knowledge-mining method. BMC Med Inform Decis Mak. 2006 Mar 7;6(1):13

A free PDF copy of the paper is available from the publisher, BMC.

Reprint requests and correspondence
Mir S Siadaty, MD MS
University of Virginia,
School of Medicine, DPHS,
Box 800717
Charlottesville, VA 22908
Phone (434) 982 4436
Fax (434) 924 8437
MirSiadaty@virginia.edu



[Figure: components of SDS. Patents pending.]


Design

The Science Discovery System (SDS) has two arms: a data-driven arm and a knowledge-driven arm. The central part of the SDS, which uses both arms, is called the Iterative Discovery Program (IDP).

IDP uses two information repositories: the patient data (PD) and the biomedical knowledge (BMK). Hospitals, clinics, and other health care organizations increasingly record clinical and biologic data for each patient encounter or visit in digital form, frequently in clinical data repositories implemented as relational databases.

Existing human knowledge in biomedicine is recorded mostly in natural language, in forms such as textbooks, research papers, scientific reports, and guidelines. This knowledge is increasingly represented in digital form. Examples are electronic textbooks, online publishing, and searchable indexes of published papers such as NLM PubMed.

IDP uses data miner (DM) software to compute the strength of regularity for each subset of attributes (the variables measured for a patient) in PD; we call this the “observed-score”. For the same subset of attributes, IDP runs knowledge miner (KM) software to compute how strongly current biomedical knowledge supports that observed regularity; we call this the “expected-score”. By comparing the observed- and expected-scores, IDP estimates the “surprise-score”. For subsets with a high surprise-score, IDP has algorithms that implement four major types of response: “rejection”, “reinterpretation”, “peripheral theory change”, and “complete theory change”. The last two constitute scientific discovery. SDS then mines the BMK to build and propose a mechanism, that is, an explanation, for each surprising finding, using algorithms for closed and open discovery.
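The scoring step can be illustrated with a minimal sketch. This page does not give the actual formulas, so everything here is an assumption made for illustration: the observed-score is taken to be the absolute phi coefficient between two binary patient attributes, the expected-score is the fraction of documents mentioning either term that mention both, and the surprise-score is their difference. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def observed_score(records, a, b):
    """Assumed DM measure: absolute phi coefficient of two binary attributes."""
    n11 = sum(1 for r in records if r[a] and r[b])
    n10 = sum(1 for r in records if r[a] and not r[b])
    n01 = sum(1 for r in records if not r[a] and r[b])
    n00 = sum(1 for r in records if not r[a] and not r[b])
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # an attribute is constant, so no association can be measured
    return abs(n11 * n00 - n10 * n01) / denom


def expected_score(documents, a, b):
    """Assumed KM measure: share of documents naming a or b that name both."""
    both = sum(1 for d in documents if a in d and b in d)
    either = sum(1 for d in documents if a in d or b in d)
    return both / either if either else 0.0


def surprise_score(observed, expected):
    """Assumed comparison: strong in the data but weak in the literature scores high."""
    return observed - expected


def rank_pairs(records, documents, attributes):
    """Score every attribute pair and sort by surprise-score, highest first."""
    rows = []
    for a, b in combinations(attributes, 2):
        obs = observed_score(records, a, b)
        exp = expected_score(documents, a, b)
        rows.append(((a, b), obs, exp, surprise_score(obs, exp)))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Toy patient data: one boolean record per patient (hypothetical values).
records = [
    {"drug_x": True, "rash": True, "diabetes": True, "insulin": True},
    {"drug_x": True, "rash": True, "diabetes": False, "insulin": False},
    {"drug_x": False, "rash": False, "diabetes": True, "insulin": True},
    {"drug_x": False, "rash": False, "diabetes": False, "insulin": False},
]
# Toy knowledge base: each document reduced to the set of terms it mentions.
documents = [{"diabetes", "insulin"}, {"diabetes", "insulin"}, {"drug_x"}, {"rash"}]

ranking = rank_pairs(records, documents, ["drug_x", "rash", "diabetes", "insulin"])
for pair, obs, exp, surprise in ranking:
    print(pair, round(obs, 2), round(exp, 2), round(surprise, 2))
```

In this toy data the drug_x/rash pair is perfectly associated in the records but never co-mentioned in the documents, so it ranks first, while the equally strong diabetes/insulin association is fully expected and scores zero surprise.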

The two main sources of information, the PD and the BMK, are usually stored in formats that are not well suited to SDS computations. Software applications that we call representation modifiers (RM) translate such data into a form better suited to computation. There are two types of RM, one for the PD and one for the BMK: RM1 translates PD into the Experiment Space, and RM2 converts BMK into the Hypothesis Space.
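As a rough illustration of the two representation modifiers, the sketch below assumes (this page does not specify either format) that the Experiment Space is one boolean record per patient and that the Hypothesis Space is one set of vocabulary terms per text passage. The function names `rm1` and `rm2`, the row layout, and the naive substring matching are all hypothetical choices.

```python
from collections import defaultdict


def rm1(encounter_rows, attributes):
    """Assumed RM1: (patient_id, attribute) rows -> one boolean record per patient."""
    seen = defaultdict(set)
    for patient_id, attribute in encounter_rows:
        seen[patient_id].add(attribute)
    return {
        patient_id: {a: (a in present) for a in attributes}
        for patient_id, present in seen.items()
    }


def rm2(texts, vocabulary):
    """Assumed RM2: free-text passages -> set of vocabulary terms found in each.

    Matching is a naive case-insensitive substring test, kept simple on purpose.
    """
    term_sets = []
    for text in texts:
        lowered = text.lower()
        term_sets.append({term for term in vocabulary if term in lowered})
    return term_sets


# Hypothetical inputs: rows as they might come from a relational table,
# and short passages standing in for published literature.
rows = [(1, "diabetes"), (1, "insulin"), (2, "diabetes")]
passages = ["Insulin therapy in diabetes.", "An unrelated passage."]

experiment_space = rm1(rows, ["diabetes", "insulin"])
hypothesis_space = rm2(passages, ["diabetes", "insulin"])
print(experiment_space)
print(hypothesis_space)
```

A real RM2 would need proper term recognition rather than substring matching, which would also match a term embedded inside a longer word.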

Since both PD and BMK are updated and grow constantly over time, updating applications perform regular updates, usually through network connections.

A Grid operating system will provide the infrastructure for the distributed execution of the SDS.









Maintained by MirSiadaty@virginia.edu
Last Modified: 5 January 05


