randsampleFS

randsampleFS generates a random sample of k elements from the integers 1 to n (k<=n)

Syntax

Description

y = randsampleFS(n, k) draws the sample with default options.

y = randsampleFS(n, k, method) draws the sample with the optional argument method specified (for example, method = 2).

Examples

  • randsampleFS with default options.
  • default method (1) is used.

    randsampleFS(1000,10)
    ans =
    
       772    20   141   730   541   714   229   674    12   529
    
    

  • randsampleFS with optional argument set to method (2).
    method = 2;
    randsampleFS(100,10,method)
    ans =
    
        17    27    37    47    57    67    77    87    97     7
    
    

    Related Examples

  • randsampleFS with optional arguments set to method (3).
    method = 3;
    % Here k is large relative to n, so repetitions are likely.
    randsampleFS(100,10,method)

  • randsampleFS Weighted Sampling Without Replacement.
  • Extract k=10 numbers in [-1000, -900] with gamma-distributed weights.

    population = -1000:1:-900;
    n = numel(population);
    wgts = sort(random('gamma',0.3,2,n,1),'descend');
    k=10;
    y = randsampleFS(n,k,wgts);
    sample  = population(y);
    plot(wgts,'.r')
    hold on;
    text(y,wgts(y),'X');
    title('Weight distribution with the extracted numbers superimposed')

    Input Arguments

    n — Size of the population: the sample is selected from the integers 1 to n. Scalar, a positive integer.

    Data Types: single|double

    k — The number of elements to be selected. Non-negative integer.

    Data Types: single|double

    Optional Arguments

    method — Sampling methods. Scalar or vector.

    Method used to extract the subsets. See the More About section for details.

    Default is method = 0.

    - Scalar from 0 to 3 determining the method used to extract (without replacement) the random sample.

    - Vector of weights: in such a case, a weighted sampling without replacement algorithm is applied using that vector of weights.

    Example: randsampleFS(100,10,2)

    Data Types: single|double

    Output Arguments

    y — A column vector of k values sampled at random from the integers 1:n. For methods 0, 1, 2 and weighted sampling, the elements extracted are unique; for method 3 (included for historical reasons), there is no guarantee that the elements extracted are unique.

    Data Types: single|double

    More About

    Additional Details

    Method=0 uses the MATLAB function randperm. In old MATLAB releases, randperm was slower than the FSDA function shuffling, which is used in method 1 (for example, in R2009a - MATLAB 7.8 - randperm was at least 50 slower).

    If method=1, the approach depends on the population and sample sizes:

    - if $n < 1000$ and $k < n/(10 + 0.007n)$, that is, if the population is relatively small and the desired sample is small compared to the population, we repeatedly sample with replacement until there are k unique values;

    - otherwise, we do a random permutation of the population and return the first k elements.

    The threshold $k < n/(10 + 0.007n)$ was determined by simulation under MATLAB R2016b. Previously, the threshold was $n < 4k$.
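The two-branch rule for method=1 can be sketched as follows. This is a minimal illustration written from the description above, not the FSDA source code: the values of n and k and the variable names candidates and p are assumptions chosen for the example.

```matlab
% Sketch of the method=1 decision rule (illustrative, not the FSDA source).
n = 500;   % population size (assumed value for the example)
k = 5;     % sample size (assumed value for the example)

if n < 1000 && k < n/(10 + 0.007*n)
    % Small population and small sample: draw with replacement and keep
    % the distinct values until k of them have been collected.
    y = [];
    while numel(y) < k
        candidates = randi(n, 1, k);            % k draws with replacement
        y = unique([y candidates], 'stable');   % drop duplicates, keep draw order
    end
    y = y(1:k);
else
    % Otherwise: random permutation of 1:n, keep the first k elements.
    p = randperm(n);
    y = p(1:k);
end
```

With n = 500 the threshold is 500/13.5, about 37, so k = 5 takes the first branch; setting k = 50 would take the permutation branch instead.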

    If method=2, systematic sampling is used, with a random starting point and a random step.
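A systematic sample with random start and random step might look like the sketch below. The way the step is drawn and the wrap-around past n are assumptions made for this illustration; the actual FSDA implementation may choose them differently.

```matlab
% Sketch of systematic sampling with random start and step (illustrative).
n = 100;   % population size (assumed value for the example)
k = 10;    % sample size (assumed value, must satisfy k <= n)

start = randi(n);             % random starting point in 1..n
step  = randi(floor(n/k));    % random step; (k-1)*step < n, so no index repeats

% Walk k positions from the start, wrapping around after n.
y = mod(start - 1 + (0:k-1)*step, n) + 1;
```

Because the largest offset (k-1)*step stays below n, the k indices are distinct even after wrapping.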

    If method=3, random sampling is based on the old but well-known Linear Congruential Generator (LCG) method. In this case there is no guarantee of obtaining unique numbers. The method is included for historical reasons.
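To show why an LCG gives no uniqueness guarantee, here is a minimal sketch of the recurrence $x_{i+1} = (a x_i) \bmod m$ mapped onto 1..n. The multiplier and modulus are the Park-Miller "minimal standard" values, and the seed is arbitrary; all three are assumptions for the example and are not taken from the FSDA code.

```matlab
% Sketch of LCG-based sampling (illustrative parameters, not the FSDA source).
n = 100;   % population size (assumed value for the example)
k = 10;    % sample size (assumed value for the example)

a = 16807;        % multiplier (Park-Miller, assumed)
m = 2^31 - 1;     % prime modulus (Park-Miller, assumed)
x = 12345;        % seed: any integer in 1..m-1 (assumed)

y = zeros(1, k);
for i = 1:k
    x = mod(a*x, m);            % LCG recurrence with zero increment
    y(i) = floor(n*x/m) + 1;    % map the state to an integer in 1..n
end
```

Each element of y is drawn independently of the values already selected, so the same integer can appear more than once.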

    If method is a vector of n weights, then Weighted Sampling Without Replacement is applied. Our implementation follows Efraimidis and Spirakis (2006). The MATLAB function datasample follows Wong and Easton (1980), which is also quite fast; note, however, that datasample may be very slow if applied repeatedly, because of the large amount of time spent on option checking.
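The key idea of Efraimidis and Spirakis (2006) is to give item i the random key $u_i^{1/w_i}$, with $u_i$ uniform on (0,1), and to keep the k items with the largest keys. The sketch below applies that idea in a single pass over all n items; the weights and variable names are assumptions for the example, and the FSDA implementation may differ in its details.

```matlab
% Sketch of weighted sampling without replacement via random keys
% (illustrative, not the FSDA source).
n = 101;   % population size (assumed value for the example)
k = 10;    % sample size (assumed value for the example)

wgts = rand(n, 1) + 0.1;             % strictly positive weights (assumed)
keys = rand(n, 1) .^ (1 ./ wgts);    % key u_i^(1/w_i) for each item

[~, order] = sort(keys, 'descend');  % largest keys first
y = order(1:k);                      % indices of the k selected items
```

Items with larger weights tend to receive keys closer to 1, so they are more likely to be selected, and each index appears at most once.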

    Remark on computational performance. Method=2 (systematic sampling) is by far the fastest for any practical population size $n$. For example, for $n \approx 10^6$, method=2 is two orders of magnitude faster than method=1. With recent MATLAB releases (after R2011b), method=0 (which uses the compiled MATLAB function randperm) has comparable performance, at least for reasonably small $k$. In releases before R2012a, randperm was considerably slower.

    References

    Fisher, R.A. and Yates, F. (1948), "Statistical tables for biological, agricultural and medical research (3rd ed.)", Oliver & Boyd, pp. 26-27. [For Method 1]

    Cochran, W.G. (1977), "Sampling techniques (Third ed.)", Wiley. [For Method 2]

    Knuth, D.E. (1997), "The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Third Edition" Addison-Wesley, pp. 10-26. [For Method 3. For details see: Section 3.2.1: The Linear Congruential Method]

    Efraimidis, P.S. and Spirakis, P.G. (2006), Weighted random sampling with a reservoir, "Information Processing Letters", Vol. 97, pp. 181-185. [For Weighted Sampling Without Replacement]

    Wong, C.K. and Easton, M.C. (1980), An Efficient Method for Weighted Sampling Without Replacement, "SIAM Journal on Computing", Vol. 9, pp. 111-113.

    This page has been automatically generated by our routine publishFS