International Journal of Current Research and Review
ISSN: 2231-2196 (Print), ISSN: 0975-5241 (Online)



IJCRR - Vol 02 Issue 10, October, 2010

Pages: 09-15

Date of Publication: 30-Nov--0001



CLUSTERING OF DATA AFTER MINIMIZING DATA SIZE USING ROUGH SET THEORY

Author: Sunanda Das, Asit Kumar Das

Category: Technology

Abstract: Objective: Our approach reduces a large dataset to a smaller one that preserves the features of the full dataset, so that the computational complexity is reduced.
Method:
In this paper we present a new approach that first minimizes the data size and then clusters the reduced data. The volume of data generated for clustering is increasingly large, and how to extract useful information from such data collections is an important issue. A promising technique is rough set theory, a relatively new mathematical approach to data analysis based on grouping objects of interest into similarity classes whose members are indiscernible with respect to some features.
Result and conclusion:
This theory offers two fundamental concepts: reduct and core. In this paper, some basic ideas of rough set theory are first presented. Some experimental results are also given.

Keywords: Rough set theory, Data mining, correlation

Full Text:

INTRODUCTION

Data mining is an emerging area of computational intelligence that offers new theories, techniques, and tools for processing large volumes of data. It has gained considerable attention among practitioners and researchers, as evidenced by the number of publications, conferences, and application reports. The growing volume of data available in digital form has accelerated this interest. Data mining relates to other areas, including machine learning, cluster analysis, regression analysis, and neural networks [1-5]. Rough set theory [6] is a relatively new mathematical technique, developed by Pawlak in the 1980s, for describing uncertainty, imprecision and vagueness quantitatively. Classical set theory deals with crisp sets, and rough set theory may be considered an extension of it. The rough set approach has many advantages. An important step in the knowledge discovery process is the reduction of the dimensionality of the data. Real database systems contain many attributes and records, but in some circumstances only some of the attributes are indispensable. If the dispensable attributes can be eliminated, the complexity of analyzing the data can be greatly reduced. Our algorithm, which is based on rough set theory, consists of two parts: the first performs attribute reduction and the second performs rule extraction.
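The notion of a dispensable attribute can be illustrated with a small sketch. The code below is not the algorithm of this paper; it only shows the standard rough-set idea that an attribute is dispensable when removing it leaves the partition of objects into indiscernibility classes unchanged. The function names and the toy table are hypothetical and the values are not taken from the wine dataset.

```python
from collections import defaultdict

def indiscernibility_classes(rows, attrs):
    """Group row indices whose values agree on every attribute in attrs."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        classes[key].append(i)
    return sorted(classes.values())

def is_dispensable(rows, attrs, attr):
    """attr is dispensable if dropping it does not change the partition."""
    remaining = [a for a in attrs if a != attr]
    full = indiscernibility_classes(rows, attrs)
    reduced = indiscernibility_classes(rows, remaining)
    return full == reduced

# Toy information table (hypothetical values); c is determined by a.
rows = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 0, "c": 0},
    {"a": 1, "b": 1, "c": 0},
]
attrs = ["a", "b", "c"]

for name in attrs:
    print(name, "dispensable:", is_dispensable(rows, attrs, name))
```

In this toy table either a or c can be dropped without merging any objects, whereas dropping b merges rows that were previously distinguishable, so b belongs to every reduct.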

Dataset: In this experiment we use a wine dataset with 13 attributes and 178 data values per attribute. Using the 10-fold method, we divide this dataset into 10 pairs of training and test datasets.
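A minimal sketch of such a 10-fold partition is given below. The paper does not say whether the rows were shuffled or stratified, so the seeded shuffle and the function name ten_fold_indices are assumptions made only for illustration. The sketch works on row indices and does not load the wine data itself.

```python
import random

def ten_fold_indices(n_rows, n_folds=10, seed=0):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)  # assumption: rows are shuffled first
    # Deal the shuffled indices into n_folds nearly equal folds.
    folds = [idx[k::n_folds] for k in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((train, test))
    return splits

# 178 is the number of rows reported for the wine dataset.
splits = ten_fold_indices(178)
for k, (train, test) in enumerate(splits):
    print(f"fold {k}: {len(train)} training rows, {len(test)} test rows")
```

Each row appears in exactly one test set, and each training set holds the remaining rows.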

MATERIALS AND METHODS

Correlation:

Using the correlation function in MATLAB, we obtain 10 tables, one for each of the 10 training datasets, in which each entry gives the correlation of one attribute with another.
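The computation behind each table can be sketched in Python as follows. The paper does not name the MATLAB function used, so this is only an assumed equivalent based on the Pearson correlation coefficient. The helper names and the three small columns are hypothetical and stand in for the attribute columns of one training dataset.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length columns (non-zero variance assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Square matrix whose entry (i, j) is the correlation of columns i and j."""
    m = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(m)] for i in range(m)]

# Hypothetical columns: the second rises with the first, the third falls.
cols = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
R = correlation_matrix(cols)
for row in R:
    print([round(v, 3) for v in row])
```

For 13 attributes the result is a symmetric 13 x 13 matrix with ones on the diagonal; repeating the computation on each training dataset gives the 10 tables.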

RESULT AND DISCUSSION

Following the stepwise procedure, we obtain 10 correlation matrices. From the correlation matrix of the first training dataset, we obtained the graph given in Figure 1. The correlation coefficient is a normalized measure of the strength of the linear relationship between two variables and ranges between -1 and 1: -1 means that one column of data has a perfect negative linear relationship to another column, 0 means that there is no linear relationship between the columns, and 1 means that there is a perfect positive linear relationship between them.

Figure 1 is a grouped bar graph. The bars in the first group correspond to the first row of the matrix, those in the second group to the second row, and so on. Each group contains 13 bars, each of which represents the correlation between one pair of attributes (row and column). Negative and positive correlation values are drawn on opposite sides of the origin, so each group of bars shows the 13 attributes with their positive or negative values. Figure 1 is thus a graphical representation of the correlation matrix of the first training dataset of the wine dataset.

After obtaining the correlation values between the 13 attributes of the wine dataset, we get 8 functional dependencies using the threshold value θ = 0.5. Then, using the closure property, we get 7 different values to predict the classification of these attributes, i.e. {A,J}, {F,G,L,M,I,K}, {B,C,D,E,H}. Using the information gain formula, we find that the sets {F,G,L,M,I,K} and {B,C,D,E,H} cannot be clustered again. Then, using the cardinality formula, the cardinality values of the sets {A,J}, {F,G,L,M,I,K} and {B,C,D,E,H} are computed as percentages: 32.9%, 68.4% and 42.7% respectively. Among these values, 68.4%, for the set {F,G,L,M,I,K}, is the highest. This means that the reduct set {F,G,L,M,I,K} alone can represent the characteristics of all 13 attributes of the wine dataset.
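The paper does not spell out how the threshold θ and the closure property produce the attribute sets, so the following sketch shows only one possible reading. It assumes that two attributes are linked when the absolute value of their correlation is at least θ, and that the sets are obtained by closing these links under transitivity. The function name, the use of the absolute value and the 4 x 4 matrix are all hypothetical.

```python
def group_attributes(names, R, theta=0.5):
    """Group attributes linked by |r| >= theta, closed under transitivity."""
    n = len(names)
    parent = list(range(n))  # union-find forest over attribute indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if abs(R[i][j]) >= theta:  # assumption: sign of r is ignored
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(names[i])
    return sorted(groups.values())

# Hypothetical correlation matrix, not taken from the wine dataset.
names = ["A", "B", "C", "D"]
R = [
    [1.0, 0.7, 0.1, 0.0],
    [0.7, 1.0, -0.6, 0.2],
    [0.1, -0.6, 1.0, 0.3],
    [0.0, 0.2, 0.3, 1.0],
]
groups = group_attributes(names, R, theta=0.5)
print(groups)
```

Here A and C end up in the same set although their direct correlation is only 0.1, because both are linked to B; this is the effect of taking the closure rather than using the thresholded pairs directly.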

CONCLUSION

In this paper, the basic concepts of data mining and rough set theory were discussed. The patterns formed by the rules extracted with rough set theory differ from other patterns. The method was illustrated with a numerical example. It shows that, instead of handling a large volume of data, we can work with a small amount of data that carries the same meaningful information and characteristics as the whole dataset. The method enhances the utility of the extracted knowledge and reduces time complexity. It can be further improved by bringing the cardinality value closer to 100%.

References:

1. Andrew Kusiak. Rough Set Theory: A Data Mining Tool for Semiconductor Manufacturing. IEEE Transactions on Electronics Packaging Manufacturing 2001; 24(1): 44-50.

2. Langley P, Simon HA. Applications of machine learning and rule induction. Commun. ACM 1995; 38(11): 55-64.

3. Carbonell JG. Machine Learning: Paradigms and Methods. J. G. Carbonell, Ed. Cambridge. MA: MIT Press 1990.

4. Reiter R. A theory of diagnosis from first principles. Artif. Intell. 1987; 35: 57-95.

5. Barto A, Sutton RS. Reinforcement Learning. Cambridge. MA: MIT Press 1998.

6. Pawlak Z. Rough sets. Int. J. Inform. Comput. Sci. 1982; 11(5): 341-356.

7. Pawlak Z. Rough Sets - Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Boston, London, Dordrecht 1991:229.

8. Pawlak Z. Rough set theory and its applications to data analysis. Cybernetics and Systems 1998; 29: 661-688.

9. Jiawei Han, Micheline Kamber. Data Mining: Concepts and Techniques. Data Mining Books, Publisher: Elsevier Science Ltd. Second edition, China Machine Press: 296-303.

10. Abraham Silberschatz, Henry Korth, Sudarshan S. Database System Concepts, Database Books. McGraw-Hill.

11. Ihn-Han Bae, Hwa-Ju Lee, KyungSook Lee. Design and evaluation of a rough set-based anomaly detection scheme considering weighted feature values. International Journal of Knowledge-Based and Intelligent Engg. System 2007; 11: 201-206.

