<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2d1 20170631//EN" "JATS-journalpublishing1.dtd">
<article xlink="http://www.w3.org/1999/xlink" dtd-version="1.0" article-type="healthcare" lang="en"><front><journal-meta><journal-id journal-id-type="publisher">IJCRR</journal-id><journal-id journal-id-type="nlm-ta">I Journ Cur Res Re</journal-id><journal-title-group><journal-title>International Journal of Current Research and Review</journal-title><abbrev-journal-title abbrev-type="pubmed">I Journ Cur Res Re</abbrev-journal-title></journal-title-group><issn pub-type="ppub">2231-2196</issn><issn pub-type="opub">0975-5241</issn><publisher><publisher-name>Radiance Research Academy</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">2929</article-id><article-id pub-id-type="doi"/><article-id pub-id-type="doi-url">http://dx.doi.org/10.31782/IJCRR.2020.121924</article-id><article-categories><subj-group subj-group-type="heading"><subject>Healthcare</subject></subj-group></article-categories><title-group><article-title>Exploratory Data Analysis and ETL with SAS on Hadoop Eco-system with Cervical Cancer Dataset
</article-title></title-group><contrib-group><contrib contrib-type="author"><name><surname>Xiaotian</surname><given-names>Cheng</given-names></name></contrib><contrib contrib-type="author"><name><surname>Thiruchelvam</surname><given-names>Vinesh</given-names></name></contrib><contrib contrib-type="author"><name><surname>Vistro</surname><given-names>Daniel Mago</given-names></name></contrib></contrib-group><pub-date pub-type="ppub"><day>6</day><month>10</month><year>2020</year></pub-date><volume>9)</volume><issue/><fpage>88</fpage><lpage>104</lpage><permissions><copyright-statement>This article is copyright of Popeye Publishing, 2009</copyright-statement><copyright-year>2009</copyright-year><license license-type="open-access" href="http://creativecommons.org/licenses/by/4.0/"><license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution (CC BY 4.0) Licence. You may share and adapt the material, but must give appropriate credit to the source, provide a link to the licence, and indicate if changes were made.</license-p></license></permissions><abstract><p>Objective: The main objective of this project is to explore and analyse a secondary dataset that was collected at the &#x201C;Hospital Universitario de Caracas&#x201D; in Caracas, Venezuela. Methods: The dataset comprises information on 858 patients, covering demographic details and medical history. A large number of fields are blank, possibly because patients chose not to answer owing to privacy considerations. SAS Studio was used for data exploration and data pre-processing. Data cleaning and data transformation were carried out based on the knowledge gathered during data exploration. Afterwards, the dataset was exported from SAS Studio and uploaded to the Hortonworks Hadoop platform for analysis.
Lastly, five hypotheses were explored using the visualization tool Tableau.
</p></abstract><kwd-group><kwd>Data Management</kwd><kwd>SAS</kwd><kwd>Hadoop</kwd><kwd>Cervical cancer</kwd><kwd>Tableau</kwd><kwd>Healthcare</kwd></kwd-group></article-meta></front></article>
