Data Mining and Rasch Measurement

Data mining is finding useful relationships in large datasets. "When you mine data (by "drilling down"), you use data to improve your business by predicting and understanding behavior." (Peter Frometa, SPSS Inc., 2001)

According to a press release, "in May 1998, more than 20 key players in the data mining market met to discuss the first draft of a new process model, CRISP-DM ("CRoss-Industry Standard Process for Data Mining"). This is designed to help businesses plan and work through the complete data mining process - from problem specification to deployment of results. The core consortium consists of NCR, ISL, Daimler-Benz and OHRA. At the centre of the CRISP-DM project is a Special Interest Group (SIG) of data mining service suppliers and large-scale commercial users."

CRISP-DM describes data mining as a six-stage approach to extracting meaning from business data. This parallels Rasch-based approaches to measurement construction in the social sciences. The "Data Cleaning" table below focuses on the data-cleaning component of data mining, which stands in marked contrast to the conventional "data is inviolable" approach of social science research.

Figure: Data Mining Cycle (The Data Mining Reference Model)

The Figure shows the six phases of a data mining process. The sequence of the phases is not rigid. Moving back and forth between different phases is always required. The outcome of each phase determines which phase, or which particular task of a phase, has to be performed next. The arrows indicate the most important and frequent dependencies between phases.

The outer circle symbolizes the cyclical nature of data mining itself. Data mining is not over once a solution is deployed. The lessons learned during the process, and from the deployed solution, can trigger new, often more focused business questions. Subsequent data mining processes will benefit from the experiences of previous ones. In the following, we outline each phase briefly:

1. Business understanding
This initial phase focuses on understanding the project objectives and requirements from a business perspective, then converting this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives.

2. Data understanding
starts with an initial data collection and proceeds with activities in order to get familiar with the data, to identify data quality problems, to discover first insights into the data or to detect interesting subsets with hidden information.

3. Data preparation
constructs the final dataset from the initial raw data. Data preparation tasks are likely to be performed multiple times and not in any prescribed order. Tasks include table, record and attribute selection as well as transformation and cleaning of data for modeling tools.

4. Modeling
selects and applies modeling techniques and calibrates their parameters to optimal values. Typically, there are several techniques for the same data mining problem type. Some techniques have specific requirements on the form of data. Therefore, stepping back to the data preparation phase is often necessary.

5. Evaluation
thoroughly reviews the model and the steps executed to construct the model to be certain it properly achieves the business objectives. A key objective is to determine if there is some important business issue that has not been sufficiently considered. A decision on the use of the data mining results should be reached.

6. Deployment
organizes and presents the knowledge gained in a way that the customer can use it. It often involves applying "live" models within an organization's decision making processes, for example in real-time personalization of Web pages or repeated scoring of marketing databases. However, depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process across the enterprise. In many cases it is the customer, not the data analyst, who carries out the deployment steps. However, even if the analyst will not carry out the deployment effort it is important for the customer to understand up-front what actions need to be carried out in order to actually make use of the created models.

Excerpted from CRISP-DM 1.0 Step-by-step data mining guide (2000)
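The non-rigid sequence of phases described above can be pictured as a small transition map. The following is only a sketch: the names `Phase`, `STEP_BACKS` and `next_phases` are hypothetical, and the map records just the forward order plus the two step-backs the text itself mentions (Modeling back to Data preparation, and Deployment back to new business questions). The other arrows of the Figure are not reproduced here.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Assumption: only the step-backs named in the excerpt are listed.
STEP_BACKS = {
    Phase.MODELING: [Phase.DATA_PREPARATION],        # techniques may need differently shaped data
    Phase.DEPLOYMENT: [Phase.BUSINESS_UNDERSTANDING],  # deployed solutions trigger new questions
}


def next_phases(phase):
    """Return the phases that most often follow the given phase."""
    candidates = []
    if phase is not Phase.DEPLOYMENT:
        # The usual forward step to the next numbered phase.
        candidates.append(Phase(phase.value + 1))
    candidates.extend(STEP_BACKS.get(phase, []))
    return candidates


for phase in Phase:
    print(phase.name, "->", [p.name for p in next_phases(phase)])
```

Because the outcome of each phase decides what comes next, the function returns candidates rather than a single successor.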

"Data Cleaning"
Task: Clean data
Raise the data quality to the level required by the selected analysis techniques. This may involve selection of clean subsets of the data, the insertion of suitable defaults or more ambitious techniques such as the estimation of missing data by data modeling.
Output: Data cleaning report
Describe the decisions and actions that were taken to address the data quality problems. The report should also address what data quality issues are still outstanding and what possible effects they could have on the results.
Activities:
Reconsider how to deal with observed types of noise.
Correct, remove or ignore noise.
Decide how to deal with special values and their meaning.
Reconsider data selection criteria in light of experiences of data cleaning (i.e., one may wish to include/exclude other sets of data).
Good Idea! Remember that some fields may be irrelevant to the data mining goals and therefore noise in those fields has no significance. However, if noise is ignored for these reasons, it should be fully documented as the circumstances may change later!
Excerpted from CRISP-DM 1.0 Step-by-step data mining guide (2000)
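The cleaning task and its report might be sketched as follows. This is an illustration under stated assumptions, not part of CRISP-DM: the special-value codes (9 and -1 meaning "no response"), the valid rating range of 0 to 4, and the names `CleaningReport` and `clean_records` are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical conventions for this sketch only.
SPECIAL_VALUES = {9, -1}   # codes assumed to mean "no response"
VALID_RANGE = range(0, 5)  # valid ratings assumed to be 0..4


@dataclass
class CleaningReport:
    decisions: list = field(default_factory=list)    # what was done, and why
    outstanding: list = field(default_factory=list)  # issues still unresolved


def clean_records(records, relevant_fields):
    """Return cleaned copies of the records and a report of the actions taken."""
    report = CleaningReport()
    cleaned = []
    ignored = set()
    for row_id, record in enumerate(records):
        new_record = {}
        for name, value in record.items():
            if name not in relevant_fields:
                # Noise in irrelevant fields is ignored, but that choice is documented.
                new_record[name] = value
                ignored.add(name)
            elif value in SPECIAL_VALUES:
                new_record[name] = None
                report.decisions.append(
                    f"row {row_id}, {name}: special value {value} recoded as missing")
            elif value not in VALID_RANGE:
                new_record[name] = None
                report.decisions.append(
                    f"row {row_id}, {name}: out-of-range value {value} removed")
                report.outstanding.append(
                    f"row {row_id}, {name}: source of value {value} not yet explained")
            else:
                new_record[name] = value
        cleaned.append(new_record)
    for name in sorted(ignored):
        report.decisions.append(f"field {name}: left as-is, not relevant to the goals")
    return cleaned, report


records = [{"q1": 3, "q2": 9, "note": 77}, {"q1": 12, "q2": 4, "note": 0}]
cleaned, report = clean_records(records, relevant_fields={"q1", "q2"})
print(cleaned)
print(report.decisions)
print(report.outstanding)
```

Keeping the decisions and the outstanding issues in separate lists mirrors the two things the cleaning report is asked to cover.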

| Data Mining Stages | Rasch Measurement |
|---|---|
| 1. Business Understanding: Determine business objectives; Determine data mining goals; "You must have a clear idea of what success would be." | 1. Conceptualize the latent variable: What to measure? How to do it? What marks success? |
| 2. Data Understanding: "Do the data match your objectives?" | 2. Collect relevant data |
| 3. Data Preparation: Select data; Clean data; Reconstruct data | 3. Organize data: Select data; Orient data; Rescore data |
| 4. Modeling: Build model; Assess model | 4. Construct measures: Select measurement model; Explicable data fit? Refine model |
| 5. Evaluation: Evaluate results; Review process; "Can results be repeated and verified by someone else?" | 5. Evaluate results: Meaningful construct? Useful measures? Reproducible results? |
| 6. Deployment: Report; "Communicate! Impress! Compel!"; Activate | 6. Utilize measures: Reporting; Decision making; Knowledge building |
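On the Rasch side, the "Orient data", "Rescore data" and "Explicable data fit?" steps can be illustrated with a short sketch. It assumes the dichotomous Rasch model, with person ability and item difficulty already estimated in logits (estimation itself is not shown). The function names and the example rating scale are hypothetical, and the fit statistic shown is the usual outfit mean-square (the mean of squared standardized residuals).

```python
import math


def orient(response, top_category, reverse):
    """Flip a rating so that a higher value always means more of the latent variable."""
    return top_category - response if reverse else response


def rescore(response, mapping):
    """Collapse or renumber categories; responses not in the mapping are kept unchanged."""
    return mapping.get(response, response)


def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: probability of success, parameters in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def outfit_mean_square(observations, abilities, difficulty):
    """Mean of squared standardized residuals for one item."""
    total = 0.0
    for x, b in zip(observations, abilities):
        p = rasch_probability(b, difficulty)
        # Residual (x - p) scaled by the model variance p * (1 - p).
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(observations)


# Hypothetical negatively worded item rated 0..4, then dichotomized at 3.
raw = [0, 1, 3, 4]
oriented = [orient(r, top_category=4, reverse=True) for r in raw]
scored = [rescore(r, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}) for r in oriented]
print(oriented, scored)
print(outfit_mean_square(scored, abilities=[1.5, 0.5, -0.5, -1.5], difficulty=0.0))
```

Mean-squares near 1.0 indicate that the data depart from the model's expectations by about as much as the model predicts. Values well above 1.0 flag responses that need an explanation before the measures are used.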



Data Mining and Rasch Measurement CRISP-DM, Linacre J.M. … Rasch Measurement Transactions, 2001, 15:2 p. 826-7





The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.


The URL of this page is www.rasch.org/rmt/rmt152f.htm
