Tuesday, November 29, 2011

Use of Snowball Sampling to Study Hard-to-Frame Populations

Some populations that we may be interested in studying are hard to frame, that is, no list (sampling frame) of their members is available. Examples include HIV-positive individuals and individuals or institutions involved in illegal activities such as theft, burglary, prostitution, use of banned drugs or sex-selective abortion. Snowball sampling is a non-probability sampling technique that can be used to gain access to part of such a population, up to a required number of units, and then to report findings about the group on the basis of the units selected.
To draw such a sample from a hard-to-frame population, there are two steps: i) identify, or gain access to, one or more sample units in the desired population; and ii) use these units to find further similar units, and so on, until the required sample size is obtained.
Suppose the population we are interested in is the students of a university who take banned drugs. Each such student is a sample unit, and collectively all students of the university who use banned drugs make up our population. We, however, examine only a sample of these students.
First, we need to find one or more such students at the university concerned. Finding even a small number of individuals willing to identify themselves and take part in a research study on banned drug use may be quite difficult, so the aim is to start with just one or two students.
Due to the sensitivity of the study, the researcher should ask the initial students who agreed to take part to help identify further students who also use banned drugs. The process continues until enough students of the university have been identified to meet the desired sample size. Individuals who are not students of the university at that time need not be considered.
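The referral process described above can be sketched in a few lines of Python. This is only an illustration: the referral network, the student labels and the function name snowball_sample are hypothetical, and a real study would obtain referrals through interviews rather than from a ready-made table.

```python
from collections import deque

def snowball_sample(referrals, seeds, target_size):
    """Follow referrals from the seed units until target_size units are recruited."""
    sample = []
    seen = set()
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        if person in seen:
            continue  # already recruited through another referral
        seen.add(person)
        sample.append(person)
        # Each recruited participant names further members of the population.
        for contact in referrals.get(person, []):
            if contact not in seen:
                queue.append(contact)
    return sample

# Hypothetical referral network: who each student is willing to name.
referrals = {
    "S1": ["S2", "S3"],
    "S2": ["S4"],
    "S3": ["S4", "S5"],
    "S4": ["S6"],
    "S5": [],
    "S6": ["S1"],
}
sample = snowball_sample(referrals, seeds=["S1"], target_size=4)
print(sample)
```

The sample stops growing either when the target size is met or when no new referrals remain, which is why a snowball sample can fall short of the planned size.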
Snowball sampling is a useful choice of sampling strategy when the population is hard-to-frame because:
• It is difficult to identify individuals/units to include in your sample, perhaps because there is no obvious list/frame of the population you are interested in.
• There may be no other way of accessing your sample, making snowball sampling the only viable choice of sampling strategy.
• Members of such populations are more reluctant to come forward and take part in a survey. However, since snowball sampling involves individuals who know each other, common characteristics and other social ties between them may help to break down some of the barriers that would prevent them from taking part outside their association.
• The unknown nature of some groups may also make it difficult to identify the parts of the population that warrant investigation. In the case of banned drug users, strata such as gender, type of banned drug used and frequency of use may be obvious. However, the characteristics of the population that you want to examine need to be settled at the start of the survey, and they may not be known in their entirety. Snowball sampling may help in finding such unknown characteristics of interest before you settle on your sampling criteria.

Snowball sampling is not a very useful choice of sampling strategy for a hard-to-frame population if we need to determine the sampling error and generalize from the sample to the population, since snowball sampling does not select sample units randomly, as probability sampling techniques do. As such, snowball samples should not be considered truly representative of the population being studied.

Monday, November 28, 2011


Using inverse sampling for the estimation of a rare population parameter has a long history. For example, Haldane (1945) used inverse sampling to estimate the frequency of a rare event. Recently, Christman & Lan (2001) applied inverse sampling designs to populations in which the variable of interest tends to be at or near zero for many of the population units, and distinctly different from zero for a few population units. More recently, Abha Aggarwal & Arvind Pandey (2010) applied inverse sampling to study the disease burden of leprosy. Their findings showed that inverse sampling was advantageous over conventional sampling and could be adopted for large scale surveys at the national level.

A number of survey methodologies are available for estimation of the Maternal Mortality Ratio (MMR). In 2010-11, the Registrar General of India (RGI) conducted the Annual Health Survey (AHS) in the eight EAG States and Assam in order to have district level estimates of Crude Birth and Death Rates, Infant Mortality Rate (IMR) and Maternal Mortality Ratio (MMR). Over 40 lakh households were surveyed, spread over these nine states of India. But the estimates of MMR could not be presented at the district level in the recently released Bulletins of the nine AHS states. Instead, they have been published at the commissionary level, i.e. for a group of districts.

Sample Design used for Annual Health Survey:

The AHS adopted a uni-stage stratified simple random sampling (SRS) design without replacement, except in the case of large villages in rural areas (population of 2000 or more as per the 2001 Census), where two-stage stratified sampling was applied. In urban areas the sample units are Census Enumeration Blocks (CEBs), and in rural areas they are villages.
In the rural areas, the villages were divided into two strata. Stratum I comprises all villages with a population of less than 2000, and stratum II contains the villages with a population of 2000 or more. Smaller villages with a population of less than 200 were excluded from the sampling frame in such a manner that the total population of the villages so excluded did not exceed 2% of the total population of the district. In stratum I, the entire village is the sample unit. In stratum II, the village was divided into mutually exclusive (non-overlapping) and geographically contiguous units called segments, of more or less equal size and with a population not exceeding 2000 in any case. One segment was then selected at random from the frame of segments thus prepared, to represent the selected village at the second stage of sampling.
The number of sample villages in each district was allocated between the two strata in proportion to their population. The villages in each size stratum were further ordered by female literacy rate, based again on the 2001 Census data, and three disjoint substrata of equal size were established. The sample villages within each substratum were selected by SRS without replacement. In urban areas, the CEBs within a district were likewise ordered by female literacy rate from the 2001 Census data, three disjoint substrata of equal size were established, and the sample CEBs within each substratum were selected by SRS without replacement.
This process of selection ensured equal representation across the three substrata in both rural and urban areas of a district, besides rendering the sample design self-weighting.
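The substratum step of this design can be sketched as follows. This is a minimal illustration, not the AHS procedure itself: the frame of 30 villages, their literacy rates and the function name select_units are hypothetical, and the allocation of the sample between the two size strata is left out.

```python
import random

def select_units(units, n_sample, seed=0):
    """Pick n_sample unit ids, spread equally over three literacy substrata.

    units: list of (unit_id, female_literacy_rate) pairs for one stratum.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    ordered = sorted(units, key=lambda unit: unit[1])  # order by female literacy
    size = len(ordered) // 3
    # Three disjoint substrata; any remainder falls into the last one.
    substrata = [ordered[:size], ordered[size:2 * size], ordered[2 * size:]]
    per_substratum = n_sample // 3
    selected = []
    for substratum in substrata:
        # random.sample draws without replacement, i.e. SRS within the substratum.
        chosen = rng.sample(substratum, per_substratum)
        selected.extend(unit_id for unit_id, _ in chosen)
    return selected

# Hypothetical frame: 30 villages with made-up female literacy rates (%).
villages = [(f"V{i:02d}", 30 + i) for i in range(30)]
sample_villages = select_units(villages, n_sample=6)
print(sample_villages)
```

Because each substratum has the same size and receives the same number of draws, every village in the stratum has the same chance of selection, which is the sense in which the design is self-weighting.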

Sample Size: For the AHS districts, the sample size has been determined on the basis of the Infant Mortality Rate (IMR) indicator, with the permissible level of error taken as a 10% relative standard error at the district level. It was assumed that the sample size so worked out would also enable estimation of rarer indicators like TFR and MMR (at least for a group of districts) with good precision. In the absence of district level estimates from any other reliable source, the district level estimates of IMR based on pooled data of the Sample Registration System (SRS) have been used to work out the sample size for each district.
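The formula actually used in the AHS is not given here. As a rough sketch, assuming simple random sampling of live births and a binomial approximation, the relative standard error (RSE) of an estimated rate p from n births is sqrt((1-p)/(n*p)), so that n = (1-p)/(p*RSE^2). In the code below the IMR of 50 per 1000 live births is a hypothetical value, and deff is a hypothetical design-effect multiplier left at 1.

```python
import math

def births_needed(rate, rse, deff=1.0):
    """Live births needed to estimate `rate` with the given relative standard error."""
    return math.ceil(deff * (1 - rate) / (rate * rse ** 2))

imr = 50 / 1000       # hypothetical IMR: 50 infant deaths per 1000 live births
mmr = 190 / 100000    # an MMR of 190 per 100,000 live births

print("births needed for IMR at 10% RSE:", births_needed(imr, 0.10))
print("births needed for MMR at 10% RSE:", births_needed(mmr, 0.10))
```

Under these assumptions a sample sized for IMR covers roughly 1900 births, while the same precision for MMR would need more than 50,000, which is why a sample designed around IMR struggles with rarer indicators.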

A need is felt to evolve a suitable survey methodology to estimate the MMR at the district level. Conventional sampling techniques for estimation of MMR are difficult to apply due to the large sample size requirement. In addition, when conventional sampling is used to detect rare events, there is a likelihood of not getting even a single event after covering a large sample. In such situations, sampling techniques for rare events are more appropriate for estimation. Inverse sampling is one such technique: it detects a predetermined number of cases in the study population. It is said to be appropriate for the survey of rare events, where the number of rare events is fixed in advance and sampling is continued till that predetermined number of rare events is obtained in the population. This sampling methodology also serves as a solution for quota sampling.
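The risk of a conventional sample turning up no event at all can be put in numbers. As a sketch, assuming independent births with a common event probability p, the chance of observing zero events in n births is (1-p)^n. The rate of 190 per 100,000 and the sample sizes below are illustrative values only.

```python
def prob_no_events(p, n):
    """Probability that n independent births contain no event of probability p."""
    return (1 - p) ** n

p = 190 / 100000  # illustrative rate: 190 events per 100,000 live births
for n in (500, 1000, 5000):
    print(n, round(prob_no_events(p, n), 4))
```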
This note presents the feasibility and suitability of adopting inverse sampling, as against the conventional sampling procedure, for estimation of MMR at the district level.
Under inverse sampling, the number of rare events is fixed in advance, say 'm' (new cases of maternal death), and sampling is continued till the desired number of such rare events appears in the sample. Consequently, the required sample size 'n' is a random variable. This is contrary to conventional cluster sampling, where the sample size 'n' is fixed in advance and the rare events are counted after attaining the sample size 'n'; the MMR 'P' is then estimated as m/n, where m is the number of events observed in the sample.
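A small simulation makes the random sample size visible. This is a sketch under simplifying assumptions: births are treated as independent draws with a fixed event probability (an effectively infinite population), and the rate, the seed and the function name inverse_sample_size are illustrative choices.

```python
import random

def inverse_sample_size(p_true, m, rng):
    """Count how many births are observed before the m-th event turns up."""
    n = 0
    events = 0
    while events < m:
        n += 1
        if rng.random() < p_true:  # this birth is an event with probability p_true
            events += 1
    return n

rng = random.Random(2011)   # arbitrary seed, for repeatability only
p_true = 190 / 100000       # illustrative rate: 190 events per 100,000 live births
m = 2                       # predetermined number of events
sizes = [inverse_sample_size(p_true, m, rng) for _ in range(2000)]
print(min(sizes), max(sizes), sum(sizes) / len(sizes))
```

Across repeated runs the sample size varies widely around m/p, which is the price paid for guaranteeing that m events are observed.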
Under inverse sampling, if 'n' is the sample size at which the mth case appears, an unbiased estimate of P is given by p = (m-1)/(n-1). The unbiased variance estimator of p is given by
v(p) = p*(1-p)*(N-n+1)/{N*(n-2)}
where N is the total number of live births in the study area. Hence, the coefficient of variation (CV) is given by √v(p)/p * 100 = √[(N-n+1)*(1-p)/{N*p*(n-2)}] * 100.
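These estimators can be written as a short function. The variance is taken here as the one implied by the CV formula, v(p) = p(1-p)(N-n+1)/{N(n-2)}, so that CV = 100 * sqrt(v(p))/p. The input values m = 5, n = 1001 and N = 20000 are hypothetical and serve only to exercise the function.

```python
import math

def inverse_sampling_estimate(m, n, N):
    """Return (p, variance, CV in percent) when the m-th case appears at unit n."""
    p = (m - 1) / (n - 1)                                 # unbiased estimate of P
    variance = p * (1 - p) * (N - n + 1) / (N * (n - 2))  # variance estimator of p
    cv_percent = math.sqrt((N - n + 1) * (1 - p) / (N * p * (n - 2))) * 100
    return p, variance, cv_percent

# Hypothetical inputs: 5th case found at the 1001st birth, 20000 births in the area.
p, variance, cv = inverse_sampling_estimate(m=5, n=1001, N=20000)
print(p, variance, cv)
```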
A sample of m new cases of maternal mortality is assumed. The total female population giving live births to be covered at this stage is unknown (a random variable). Hence, sampling is to be continued until m new cases of maternal mortality are found.
The sample design for selection of villages can be assumed to be the same as in the AHS. The first village is to be selected randomly from the list of villages. The first household in the selected village is to be taken from a prominent point to start the data collection. Complete enumeration of the village is to be done to get the m new cases of maternal mortality. If m new cases are not found, then the next village is to be selected from the list/frame. In a similar way, sampling is to be continued till m new cases of maternal mortality are traced. All pregnant mothers who gave birth are to be interviewed for tracing maternal mortality.
The case of an AHS District: Chamoli
As per the AHS results in public domain and 2011 Census, we have

Actual population of the district: 391114
Sample population covered in AHS (2010-11): 153000
Crude Birth Rate: 17.7 per 1000 population
MMR: 190 per 100,000 live births (MOSPI site)

Women giving birth in the whole district: N = 0.0177 * 391114 = 6923 (estimate)
Women giving birth in the AHS sample from the district: n = 0.0177 * 153000 = 2708
Sample proportion used in AHS for MMR estimation: 2708 / 6923 = 39.12%
Number of maternal deaths observed during AHS: 0.00190 * 2708 = 5 in a year (estimate)
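The arithmetic above can be reproduced directly. The variable names are illustrative; the figures are the ones quoted for Chamoli.

```python
population = 391114         # population of the district
sample_population = 153000  # population covered in the AHS sample
cbr = 17.7                  # crude birth rate per 1000 population
mmr = 190                   # maternal deaths per 100,000 live births

N = round(cbr / 1000 * population)                 # births in the district
n = round(cbr / 1000 * sample_population)          # births in the AHS sample
coverage = sample_population / population * 100    # sample proportion, in percent
expected_deaths = mmr / 100000 * n                 # maternal deaths expected in the sample

print(N, n, round(coverage, 2), round(expected_deaths, 1))
```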

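If we assume m = 2 and wish to arrive at the same MMR of 190, the sample size requirement is to trace 527 women giving live birth to a child, since p = (2-1)/(527-1) ≈ 0.0019. This is a sample proportion of only 7.6% (527 out of 6923), against a sample coverage of over 39% in the AHS. The variance estimator gives a standard error of about 0.0018 for p, i.e. 0.18 when multiplied by 100. This figure is the standard error and not the CV: the CV formula above gives about 96% for m = 2, because with so few cases the standard error is almost as large as p itself. A larger m is therefore needed before the CV can be compared with that of the conventional AHS design, for which the variation in MMR at the district level is much more than 10% even with a sample coverage of over 39%.

Thus, inverse sampling guarantees the predetermined number of cases at a very low sample size requirement, with the precision governed by the choice of m.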
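The Chamoli figures can be checked against the CV formula given earlier. This is a sketch with illustrative function names. Evaluated at m = 2 and n = 527, the formula returns a CV of about 96%, while 100 times the standard error comes to about 0.18. The loop shows how the required n and the CV move as m is raised.

```python
import math

N = 6923                  # estimated births in the district
target_p = 190 / 100000   # MMR of 190 per 100,000 live births

def required_n(m, p):
    """Sample size n at which (m-1)/(n-1) equals the target rate p."""
    return round((m - 1) / p) + 1

def cv_percent(m, n, N):
    """CV (in percent) of p = (m-1)/(n-1) from the formula in the text."""
    p = (m - 1) / (n - 1)
    return math.sqrt((N - n + 1) * (1 - p) / (N * p * (n - 2))) * 100

for m in (2, 3, 5):
    n = required_n(m, target_p)
    p = (m - 1) / (n - 1)
    cv = cv_percent(m, n, N)
    se_times_100 = cv / 100 * p * 100   # standard error of p, multiplied by 100
    print(m, n, round(n / N * 100, 1), round(cv, 1), round(se_times_100, 2))
```

Raising m increases the sample size roughly in proportion and lowers the CV, so the choice of m is a trade-off between field cost and precision.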