<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2d1 20170631//EN" "JATS-journalpublishing1.dtd">
<article xlink="http://www.w3.org/1999/xlink" dtd-version="1.0" article-type="healthcare" lang="en"><front><journal-meta><journal-id journal-id-type="publisher">IJCRR</journal-id><journal-id journal-id-type="nlm-ta">I Journ Cur Res Re</journal-id><journal-title-group><journal-title>International Journal of Current Research and Review</journal-title><abbrev-journal-title abbrev-type="pubmed">I Journ Cur Res Re</abbrev-journal-title></journal-title-group><issn pub-type="ppub">2231-2196</issn><issn pub-type="opub">0975-5241</issn><publisher><publisher-name>Radiance Research Academy</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">2855</article-id><article-id pub-id-type="doi"/><article-id pub-id-type="doi-url">http://dx.doi.org/10.31782/IJCRR.2020.121712</article-id><article-categories><subj-group subj-group-type="heading"><subject>Healthcare</subject></subj-group></article-categories><title-group><article-title>Data Management for Healthcare with a Focus on Privacy and Security for Cancer Patients
</article-title></title-group><contrib-group><contrib contrib-type="author"><name><surname>Sandhu</surname><given-names>Hardeep Singh</given-names></name></contrib><contrib contrib-type="author"><name><surname>Vistro</surname><given-names>Daniel Mago</given-names></name></contrib></contrib-group><pub-date pub-type="ppub"><day>8</day><month>09</month><year>2020</year></pub-date><volume>7)</volume><issue/><fpage>62</fpage><lpage>70</lpage><permissions><copyright-statement>This article is copyright of Popeye Publishing, 2009</copyright-statement><copyright-year>2009</copyright-year><license license-type="open-access" href="http://creativecommons.org/licenses/by/4.0/"><license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution (CC BY 4.0) Licence. You may share and adapt the material, but must give appropriate credit to the source, provide a link to the licence, and indicate if changes were made.</license-p></license></permissions><abstract><p>Background: Despite technological advancements in medicine and patient care, the healthcare sector still lags in patient privacy and in the security of the infrastructure that stores and manages patient data in digital form. Numerous security incidents, such as ransomware attacks and gross violations of patient security, have been observed in the healthcare sector, so patient privacy must receive more attention from the medical sector. Furthermore, as the severity of an illness increases, protecting the patient’s privacy becomes paramount, as there are socio-economic impacts on the patient’s lifestyle. One such disease that is receiving greater attention and funding is cancer. With cancer killing some 8 million patients annually, further research and diagnostic measures can leverage various data management techniques to improve the accuracy of results and gain critical insights into the disease. 
Methods: The cervical cancer dataset from “Hospital Universitario de Caracas” in Caracas, Venezuela, is used to explore data cleaning techniques for filling missing values, namely global constants, proportion-based filling and measures of central tendency. Furthermore, as most data in this form of research tends to be skewed, data transformation techniques for normalising the data are also discussed. Another transformation applied extensively in this study is discretisation, which bins continuous variables into qualitative groupings that are then used for machine learning techniques. Results: As medical data can be extremely large, the Apache Hadoop framework is used to upload the dataset, and the Optimised Row Columnar (ORC) format is demonstrated to be the optimal way to store and read the data. Several hypotheses were developed and tested to gain preliminary insights into cervical cancer.
</p></abstract><kwd-group><kwd>Data management</kwd><kwd>Privacy</kwd><kwd>Security</kwd><kwd>Healthcare</kwd><kwd>Hadoop</kwd><kwd>Optimized Row Columnar (ORC)</kwd><kwd>Cervical cancer</kwd></kwd-group></article-meta></front></article>
