Sampling is the process of selecting a subset of individuals or data points from a larger population to study. The goal is to choose a sample that represents the population, so that findings from the sample can be generalized back to it. There are two main categories of sampling methods: probability and non-probability.
Probability vs. Non-Probability Sampling

Feature        | Probability Sampling                                  | Non-Probability Sampling
Selection      | Randomly selected                                     | Non-randomly selected
Chance         | Every member of the population has a known chance of being selected | The chance of selection for each member is unknown
Bias           | Reduces sampling bias                                 | More susceptible to sampling bias
Generalization | Allows for generalization to the population           | Limited generalizability
Use            | Often used in quantitative research                   | Often used in qualitative research, or when probability sampling is not feasible
Probability Sampling Methods
Simple Random Sampling: Every member of the population has an equal chance of being selected. This is like drawing names out of a hat. It's the most basic form of probability sampling.
Stratified Sampling: The population is divided into subgroups (strata) based on shared characteristics (e.g., age, gender, income). A random sample is then drawn from each stratum. This ensures representation from all subgroups.
Systematic Sampling: Every nth member of the population is selected after a random starting point. For example, you might select every 10th person on a list. This method is efficient but can be biased if there's a pattern in the population list.
Cluster Sampling: The population is divided into clusters (e.g., geographic areas). A random sample of clusters is selected, and all members within the chosen clusters are included in the sample. This is useful when the population is geographically dispersed.
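As an illustrative sketch (the population, the "age group" stratum, and the sample sizes below are all made up for the example), the first three probability methods can be expressed in a few lines of Python:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100 people, each tagged with an age group (our stratum)
population = [{"id": i, "age_group": "young" if i % 2 == 0 else "old"}
              for i in range(100)]

# Simple random sampling: every member has an equal chance of selection
simple_sample = random.sample(population, k=10)

# Systematic sampling: every nth member after a random starting point
n = 10
start = random.randrange(n)
systematic_sample = population[start::n]

# Stratified sampling: group by stratum, then draw randomly from each stratum
strata = {}
for person in population:
    strata.setdefault(person["age_group"], []).append(person)
stratified_sample = [p for members in strata.values()
                     for p in random.sample(members, k=5)]

print(len(simple_sample), len(systematic_sample), len(stratified_sample))  # 10 10 10
```

Note how the stratified draw guarantees five members from each age group, whereas the simple random draw could, by chance, over-represent one group.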
Non-Probability Sampling Methods
Purposive Sampling: The researcher selects participants based on specific criteria or characteristics relevant to the research question. The researcher chooses participants who they believe will provide the most valuable information.
Quota Sampling: Similar to stratified sampling, the population is divided into subgroups. However, instead of randomly sampling from each subgroup, the researcher sets quotas for the number of participants to be selected from each subgroup. The researcher then uses convenience or judgment sampling to fill the quotas.
Snowball Sampling: Participants are recruited by asking existing participants to refer other potential participants. This method is useful when studying hard-to-reach populations, such as those involved in illegal activities or with rare conditions. It's like building a snowball – it gets bigger as it rolls.
Data Collection Methods
Data collection is the process of gathering information relevant to the research question. The specific methods used depend on the research design and the type of data needed. Here are some common data collection methods:
Interviews
Interviews involve direct interaction between the researcher and the participant. The researcher asks questions, and the participant provides answers. Interviews can be structured (with pre-determined questions), semi-structured (with some flexibility in the questions), or unstructured (conversational). They are useful for gathering in-depth information about experiences, opinions, and perspectives.
Advantages: Rich, detailed data; allows for probing and clarification; can explore complex issues.
Disadvantages: Time-consuming; can be expensive; potential for interviewer bias; requires skilled interviewers.
Questionnaires
Questionnaires are sets of pre-designed questions that participants answer, typically on paper or online. They are a cost-effective way to collect data from a large number of people. Questionnaires can include closed-ended questions (e.g., multiple-choice, rating scales) or open-ended questions (allowing for free-form answers).
Advantages: Efficient for large samples; cost-effective; easy to analyze quantitative data; minimizes interviewer bias.
Disadvantages: Limited depth of information; potential for response bias; difficult to ensure high response rates; may not be suitable for complex issues.
Observation
Observation involves systematically watching and recording behavior or events. The researcher may observe participants in a natural setting (e.g., classroom, workplace) or in a controlled setting (e.g., laboratory). Observation can be participant observation (researcher is part of the group being observed) or non-participant observation (researcher observes from a distance).
Advantages: Provides direct information about behavior; can study real-world situations; useful for exploring complex social interactions.
Disadvantages: Time-consuming; potential for observer bias; ethical considerations (e.g., privacy); behavior may change if people know they are being observed.
Document Analysis
Document analysis involves reviewing existing documents to gather information. Documents can include written materials (e.g., reports, letters, articles), visual materials (e.g., photographs, maps), or audio-visual materials (e.g., recordings, videos). Document analysis is useful for studying historical trends, organizational processes, or cultural phenomena.
Advantages: Cost-effective; readily available data; can provide historical context; unobtrusive (no direct interaction with participants).
Disadvantages: Limited to the information available in the documents; potential for bias in document creation or selection; may not be relevant to all research questions; interpretation can be subjective.
Data Analysis: Measures of Central Tendency and Dispersion, and Chart Construction
This section demonstrates how to calculate measures of central tendency and dispersion, and how to construct and interpret common charts.
Measures of Central Tendency
Central tendency describes the "center" or typical value of a dataset.
Mean: The average of all values. Sum all values and divide by the number of values.
Formula: Mean (μ or x̄) = Σx / n
Example: Data = {2, 4, 6, 8, 10}. Mean = (2+4+6+8+10) / 5 = 6
Median: The middle value when the data is ordered from least to greatest. If there are two middle values, the median is their average.
Example: Data = {2, 4, 6, 8, 10}. Median = 6. Data = {2, 4, 6, 8, 10, 12}. Median = (6+8)/2 = 7
Mode: The value that appears most frequently. A dataset can have multiple modes or no mode.
Example: Data = {2, 4, 6, 6, 8, 10}. Mode = 6
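The three measures above can be computed directly with Python's standard `statistics` module, using the same example datasets:

```python
import statistics

data = [2, 4, 6, 8, 10]
print(statistics.mean(data))    # 6
print(statistics.median(data))  # 6 (middle value of the ordered data)

even_data = [2, 4, 6, 8, 10, 12]
print(statistics.median(even_data))  # 7.0, the average of the two middle values

mode_data = [2, 4, 6, 6, 8, 10]
print(statistics.mode(mode_data))    # 6, the most frequent value
```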
Measures of Dispersion
Dispersion describes how spread out the data is.
Range: The difference between the maximum and minimum values.
Example: Data = {2, 4, 6, 8, 10}. Range = 10 - 2 = 8
Variance: The average of the squared differences from the mean. It measures how much the data points typically vary from the average.
Formula: Variance (σ² or s²) = Σ(x - μ)² / n (for population) or Σ(x - x̄)² / (n-1) (for sample)
Standard Deviation: The square root of the variance. It provides a more interpretable measure of spread because it's in the same units as the original data.
Formula: Standard Deviation (σ or s) = √Variance
Example: Data = {2, 4, 6, 8, 10}. Population variance = 8 (the sample formula, dividing by n-1, gives 10). Standard Deviation = √8 ≈ 2.83
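The `statistics` module provides both the population (divide by n) and sample (divide by n-1) versions of these measures; the range is just max minus min:

```python
import statistics

data = [2, 4, 6, 8, 10]

# Range: difference between the maximum and minimum values
data_range = max(data) - min(data)
print(data_range)  # 8

# Population variance and standard deviation (divide by n)
print(statistics.pvariance(data))  # 8
print(statistics.pstdev(data))     # √8 ≈ 2.83

# Sample variance and standard deviation (divide by n - 1)
print(statistics.variance(data))   # 10
print(statistics.stdev(data))      # √10 ≈ 3.16
```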
Chart Construction and Interpretation
Pie Chart: Used to show the parts of a whole. Each slice represents a category, and the size of the slice is proportional to the percentage of the whole that the category represents.
Interpretation: Easy to visualize proportions. Best for a small number of categories.
Example: Showing the percentage of sales for different product categories.
Bar Chart: Used to compare values across different categories. The height of each bar represents the value for that category.
Interpretation: Easy to compare values between categories. Can be used for many categories.
Example: Showing the sales figures for different regions.
Histogram: Similar to a bar chart, but used for continuous data. The bars represent the frequency of data within specific intervals (bins).
Interpretation: Shows the distribution of the data. Helps identify patterns like skewness or normality.
Example: Showing the distribution of student test scores.
Frequency Polygon: A line graph that connects the midpoints of the bars in a histogram. It also shows the distribution of the data.
Interpretation: Similar to a histogram, but can be easier to compare multiple distributions on the same graph.
Example: Comparing the distribution of test scores for two different classes.
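Rendering the charts themselves depends on your tooling, but the binning step behind a histogram, and the bin midpoints behind a frequency polygon, can be sketched in plain Python. The test scores below are made up for illustration:

```python
# Hypothetical test scores (made up for illustration)
scores = [55, 62, 68, 71, 74, 75, 78, 81, 83, 85, 88, 92, 95]

# Build bins of width 10 covering 50-100 and count the scores in each
bin_width = 10
bins = [(lo, lo + bin_width) for lo in range(50, 100, bin_width)]
frequencies = [sum(1 for s in scores if lo <= s < hi) for lo, hi in bins]

# Histogram: one bar (frequency) per bin
for (lo, hi), freq in zip(bins, frequencies):
    print(f"{lo}-{hi}: {'#' * freq} ({freq})")

# Frequency polygon: a line through (bin midpoint, frequency) points
midpoints = [(lo + hi) / 2 for lo, hi in bins]
polygon = list(zip(midpoints, frequencies))
```

Plotting the `polygon` points for two classes on the same axes gives the side-by-side comparison the frequency polygon is suited for.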