Data collection, entry, cleaning and management
The purpose of this section is to explain how to collect data in a way that suits the nature of your project, and how to prepare the data for statistical analysis once they are collected.
Introduction
Population:
A group of individuals with a common characteristic. This characteristic may be a geographical area (e.g. the population of Saudi Arabia), a common disease (e.g. individuals with diabetes), sex, and so on.
Sample:
The sample is a subset of the population chosen for the study. Conclusions drawn from the sample are then generalized to the whole population.
Variable:
A variable is a characteristic that is recorded for each member of the population or sample and can take different values from one member to another, e.g. the age, weight, height and sex of each member are four variables.
Types of variables
There are two main types:
Categorical (non-continuous): Variables that can be grouped into a fixed set of categories.
They are further subdivided into:
Dichotomous:
When the piece of information is one of two responses (either/or), e.g. hypertensive vs non-hypertensive, pass or fail, and so on.
Nominal:
When the piece of information can be one of three or more responses, and the responses cannot be sequenced (ordered), e.g. blood groups (A, B, O, and AB).
Ordinal:
When the piece of information can be one of three or more responses, and the responses can be sequenced, e.g. the severity of a disease (mild, moderate or severe).
Continuous: Variables that can take any value within a continuous range.
Time, for instance, is a continuous variable: when you are collecting data about time of admission, it can be 2:30 pm, 2:31 pm, 2:32 pm, etc. Other examples are age, blood pressure, height and weight.
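The difference between these types of variables can be sketched in Python. This is only an illustration: the patient records, field names and values below are made up for the example and are not from a real study.

```python
# Hypothetical records for three patients (illustrative values only).
patients = [
    {"hypertensive": True,  "blood_group": "A",  "severity": "mild",     "age": 34.5},
    {"hypertensive": False, "blood_group": "O",  "severity": "severe",   "age": 61.0},
    {"hypertensive": True,  "blood_group": "AB", "severity": "moderate", "age": 47.2},
]

# Ordinal responses have a meaningful order, so they can be ranked.
SEVERITY_ORDER = ["mild", "moderate", "severe"]


def severity_rank(level):
    """Return the position of an ordinal severity level (0 = mildest)."""
    return SEVERITY_ORDER.index(level)


# Sorting by an ordinal variable is meaningful.
by_severity = sorted(patients, key=lambda p: severity_rank(p["severity"]))

# A continuous variable can be averaged.
mean_age = sum(p["age"] for p in patients) / len(patients)

# A nominal variable such as blood group can only be counted per category.
blood_group_counts = {}
for p in patients:
    group = p["blood_group"]
    blood_group_counts[group] = blood_group_counts.get(group, 0) + 1
```

Here "hypertensive" is dichotomous, "blood_group" is nominal, "severity" is ordinal and "age" is continuous.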
Data
The values of a variable, e.g. if the heights of 5 students in a class are 155, 160, 153, 158 and 166 cm, then height is the variable and these values are its data.
How can data be collected?
Many methods are used to collect data, depending on the nature and needs of the study, including questionnaires, interviews, observation, records and files. Basically, any paper or electronic form used to collect data is acceptable as long as it contains the data of interest.
Database Structure
What is a database?
It is all the data collected and organized but not yet interpreted. It can be easily accessed and updated. Databases can be built in many computer programs such as Microsoft Excel, Microsoft Access and SPSS.
Each of these can be used for data entry, archiving and analysis. However, researchers usually enter and archive their database in Microsoft Excel until it is complete, and then transfer it to SPSS for the statistical analysis.
Data Entry
It is the process of typing the collected data into a computer program. It can be single (entered into one computer) or double (entered into two computers; the two databases are then compared by computer and any discrepancies are resolved). Double data entry is used to avoid or reduce data entry errors.
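The comparison step of double data entry can be sketched in Python as follows. This is a minimal illustration, not a description of any particular program: the function name find_discrepancies, the record IDs and the field names are all hypothetical.

```python
def find_discrepancies(first_entry, second_entry):
    """Compare two entries of the same data, keyed by record ID.

    Returns a list of (record_id, field, first_value, second_value)
    tuples for every field whose values differ between the two entries.
    """
    discrepancies = []
    for record_id in sorted(set(first_entry) | set(second_entry)):
        row_a = first_entry.get(record_id, {})
        row_b = second_entry.get(record_id, {})
        for field in sorted(set(row_a) | set(row_b)):
            value_a = row_a.get(field)
            value_b = row_b.get(field)
            if value_a != value_b:
                discrepancies.append((record_id, field, value_a, value_b))
    return discrepancies


# Hypothetical data: the same two records typed in on two computers.
entry_1 = {1: {"age": 45, "weight_kg": 70}, 2: {"age": 52, "weight_kg": 81}}
entry_2 = {1: {"age": 45, "weight_kg": 70}, 2: {"age": 25, "weight_kg": 81}}

mismatches = find_discrepancies(entry_1, entry_2)
for record_id, field, value_a, value_b in mismatches:
    print(f"Record {record_id}: {field} is {value_a} in one entry and {value_b} in the other")
```

Each flagged value would then be checked against the original paper or electronic form to decide which entry is correct.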
Data Cleaning
Data cleaning is the process of checking the database for entry errors and extreme values. It is a very important step, as errors may occur no matter how cautious the person entering the data is.
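One simple way to look for entry errors and extreme values is a range check, sketched below in Python. The plausible ranges, field names and records are assumptions chosen for illustration; real limits depend on the study population.

```python
# Assumed plausible ranges (illustrative only, not clinical standards).
PLAUSIBLE_RANGES = {"age": (0, 120), "height_cm": (40, 250)}


def flag_suspect_values(records, ranges=PLAUSIBLE_RANGES):
    """Return (row_index, field, value) for missing or out-of-range values."""
    flagged = []
    for index, record in enumerate(records):
        for field, (low, high) in ranges.items():
            value = record.get(field)
            if value is None or not (low <= value <= high):
                flagged.append((index, field, value))
    return flagged


# Hypothetical database with one typing error and one missing value.
records = [
    {"age": 34, "height_cm": 160},
    {"age": 340, "height_cm": 158},  # likely a typing error for 34
    {"age": 51, "height_cm": None},  # missing value
]

suspect = flag_suspect_values(records)
```

A flagged value is not necessarily wrong; it is a value to verify against the source before analysis.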
Data Management
It is a crucial step just before data analysis. The goal is to create new variables based on the available ones.
There are 3 types of data management:
Recoding:
Creating a new variable from an already coded variable. For example, suppose smoking status was collected as (0 = non-smoker), (1 = ex-smoker) and (2 = smoker). If you want a new variable of smoker vs non-smoker only, you can merge non-smokers and ex-smokers into one group (non-smoker = 0 in the new variable), while smokers are kept as a separate group with the new code 1.
Categorization:
When one continuous variable is divided into categories, e.g. systolic BP is divided into hypotensive, normotensive and hypertensive.
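Categorization can be sketched in Python as below. The cut-off values of 90 and 140 mmHg are assumptions used only for illustration; the cut-offs for a real study should come from its protocol or from the relevant clinical guideline.

```python
def categorize_systolic_bp(systolic_mmhg, low_cutoff=90, high_cutoff=140):
    """Assign a systolic BP reading to one of three ordered categories.

    The default cut-offs are illustrative assumptions, not a clinical standard.
    """
    if systolic_mmhg < low_cutoff:
        return "hypotensive"
    if systolic_mmhg < high_cutoff:
        return "normotensive"
    return "hypertensive"


systolic_readings = [85, 118, 139, 140, 162]  # hypothetical data
bp_category = [categorize_systolic_bp(v) for v in systolic_readings]
```

The resulting variable is ordinal, because the three categories have a natural order.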
Computation:
Creating a new variable by applying a mathematical equation to more than one variable, e.g. calculating BMI from weight and height.
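The BMI example can be sketched in Python using the standard formula, weight in kilograms divided by the square of height in metres. The function name and the assumption that height was recorded in centimetres are choices made for this illustration.

```python
def compute_bmi(weight_kg, height_cm):
    """Compute BMI = weight (kg) / height (m) squared.

    Height is assumed to be recorded in centimetres and is converted to metres.
    """
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


# Hypothetical participant: 70 kg, 175 cm.
bmi = compute_bmi(70, 175)
print(round(bmi, 1))
```

The computed BMI is itself a continuous variable, so it can later be categorized if the analysis requires groups.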
By: Abdullah Alharbi