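Get a random sample from a list while maintaining the order of items?
I have a sorted list. (It doesn't really contain numbers; it is a list of objects that are sorted with a complicated, time-consuming algorithm.)
mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Is there a Python function that will give me N of the items, but keep the order?
Example: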
randomList = getRandom(mylist, 4)
# randomList = [3, 6, 7, 9]
randomList = getRandom(mylist, 4)
# randomList = [1, 2, 4, 8]
etc...
The following code will generate a random sample of size 4:
import random
sample_size = 4
sorted_sample = [
    mylist[i] for i in sorted(random.sample(range(len(mylist)), sample_size))
]
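(Note: with Python 2, it is better to use xrange instead of range.)
Explanation
random.sample(range(len(mylist)), sample_size) generates a random sample of the indices of the original list.
These indices are then sorted to preserve the order of elements in the original list.
Finally, the list comprehension pulls the actual elements out of the original list, given the sampled indices.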
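The three steps in the explanation can be wrapped into a helper along the lines of the getRandom the question asks for. This is only a sketch: the name get_random_ordered and its parameters are made up for illustration and are not part of the standard library.

```python
import random

def get_random_ordered(items, sample_size):
    """Return sample_size items from items, in their original relative order."""
    # Step 1: sample distinct positions (without replacement).
    indices = random.sample(range(len(items)), sample_size)
    # Step 2: sort the positions so the output follows the list's own order.
    indices.sort()
    # Step 3: look up the actual elements at those positions.
    return [items[i] for i in indices]

mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(get_random_ordered(mylist, 4))  # output varies from run to run
```

Because only the indices are sorted, the elements themselves never need to be comparable, which matters when the list holds arbitrary objects rather than numbers.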
Simple-to-code O(N + K*log(K)) way
Take a random sample of the indices without replacement, sort the indices, and take the elements at those indices from the original list.
indices = random.sample(range(len(myList)), K)
[myList[i] for i in sorted(indices)]
Or more concisely:
# list(...) is needed on Python 3, where random.sample requires a sequence
[x[1] for x in sorted(random.sample(list(enumerate(myList)), K))]
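Optimized O(N)-time, O(1)-auxiliary-space way
Alternatively, you can use a math trick and iterate through myList from left to right, picking each element with the dynamically changing probability (K-numbersPicked)/(total-numbersVisited), where K is the sample size and total is len(myList). The advantage of this approach is that it is an O(N) algorithm, since it doesn't involve sorting!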
from __future__ import division
import random

def orderedSampleWithoutReplacement(seq, k):
    if not 0 <= k <= len(seq):
        raise ValueError('Required that 0 <= sample_size <= population_size')

    numbersPicked = 0
    for i, number in enumerate(seq):
        # chance of taking this item = picks still needed / items still left
        prob = (k - numbersPicked) / (len(seq) - i)
        if random.random() < prob:
            yield number
            numbersPicked += 1
Proof of concept and test that the probabilities are correct:
Simulated with 1 billion pseudorandom samples over the course of 5 hours:
>>> from collections import Counter
>>> Counter(
        tuple(orderedSampleWithoutReplacement([0,1,2,3], 2))
        for _ in range(10**9)
    )
Counter({
    (0, 3): 166680161,
    (1, 2): 166672608,
    (0, 2): 166669915,
    (2, 3): 166667390,
    (1, 3): 166660630,
    (0, 1): 166649296
})
Probabilities diverge from true probabilities by less than a factor of 1.0001. Running this test again resulted in a different order, meaning it isn't biased towards one ordering. Running the test with fewer samples for [0,1,2,3,4], k=3 and [0,1,2,3,4,5], k=4 had similar results.
edit: Not sure why people are voting up wrong comments or afraid to upvote... NO, there is nothing wrong with this method. =)
(Also a useful note from user tegan in the comments: If this is python2, you will want to use xrange, as usual, if you really care about extra space.)
edit: Proof: Considering the uniform distribution (without replacement) of picking a subset of k out of a population seq of size len(seq), we can consider a partition at an arbitrary point i into 'left' (0,1,...,i-1) and 'right' (i,i+1,...,len(seq)-1). Given that we picked numbersPicked from the left known subset, the remaining must come from the same uniform distribution on the right unknown subset, though the parameters are now different. In particular, the probability that seq[i] contains a chosen element is #remainingToChoose/#remainingToChooseFrom, or (k-numbersPicked)/(len(seq)-i), so we simulate that and recurse on the result. (This must terminate since if #remainingToChoose == #remainingToChooseFrom, then all remaining probabilities are 1.) This is similar to a probability tree that happens to be dynamically generated. Basically you can simulate a uniform probability distribution by conditioning on prior choices (as you grow the probability tree, you pick the probability of the current branch such that it is a posteriori the same as prior leaves, i.e. conditioned on prior choices; this works because each element's overall probability of being chosen is exactly k/N, where N = len(seq)).
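The argument in the proof can also be checked exactly rather than by simulation. The sketch below multiplies the per-step probabilities along one path of the probability tree, using the same rule as the generator, and confirms that every k-subset comes out at 1/C(n, k). The helper name subset_probability is made up for this illustration, and math.comb assumes Python 3.8 or later.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def subset_probability(n, k, chosen):
    """Exact probability that the sequential rule selects exactly `chosen`.

    `chosen` is a set of k indices into a population of size n.
    """
    prob = Fraction(1)
    picked = 0
    for i in range(n):
        # Same rule as the generator: picks still needed / items still left.
        p_take = Fraction(k - picked, n - i)
        if i in chosen:
            prob *= p_take
            picked += 1
        else:
            prob *= 1 - p_take
    return prob

n, k = 6, 3
probs = {c: subset_probability(n, k, set(c)) for c in combinations(range(n), k)}
# Every subset should be equally likely: 1 / C(6, 3) = 1/20.
assert all(p == Fraction(1, comb(n, k)) for p in probs.values())
print(len(probs), "subsets, each with probability", Fraction(1, comb(n, k)))
```

Using Fraction instead of floats keeps the comparison exact, so the assertion is a real equality check and not an approximate one.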
edit: Timothy Shields mentions Reservoir Sampling, which is the generalization of this method when len(seq) is unknown (such as with a generator expression). Specifically the one noted as "algorithm R" is O(N) time and O(1) auxiliary space if done in-place; it involves taking the first k elements (k being the sample size) and slowly replacing them (a hint at an inductive proof is also given). There are also useful distributed variants and miscellaneous variants of reservoir sampling to be found on the Wikipedia page.
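For reference, here is a rough sketch of how Algorithm R might be adapted to this question. Plain reservoir sampling does not keep the original order, so this version stores (position, item) pairs and sorts the k survivors at the end, which adds an O(k log k) step. The function name ordered_reservoir_sample is hypothetical, and the position-tracking is an adaptation for this question, not part of Algorithm R itself.

```python
import random

def ordered_reservoir_sample(iterable, k):
    """Pick k items uniformly from an iterable of unknown length, keeping order."""
    reservoir = []  # holds (position, item) pairs
    for position, item in enumerate(iterable):
        if position < k:
            # Fill the reservoir with the first k items.
            reservoir.append((position, item))
        else:
            # randint is inclusive on both ends: j is uniform over 0..position.
            j = random.randint(0, position)
            if j < k:
                reservoir[j] = (position, item)  # replace a random slot
    if len(reservoir) < k:
        raise ValueError('sample size exceeds population size')
    reservoir.sort(key=lambda pair: pair[0])  # restore the original order
    return [item for _, item in reservoir]

# Works on a generator, where len() is not available.
print(ordered_reservoir_sample((x * x for x in range(1000)), 5))
```

When len(seq) is known, the generator-based orderedSampleWithoutReplacement is simpler and needs no final sort; this variant is only worth it for streams.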
edit: Here's another way to code it below in a more semantically obvious manner.
from __future__ import division
import random

def orderedSampleWithoutReplacement(seq, sampleSize):
    totalElems = len(seq)
    if not 0 <= sampleSize <= totalElems:
        raise ValueError('Required that 0 <= sample_size <= population_size')

    picksRemaining = sampleSize
    for elemsSeen, element in enumerate(seq):
        elemsRemaining = totalElems - elemsSeen
        prob = picksRemaining / elemsRemaining
        if random.random() < prob:
            yield element
            picksRemaining -= 1

from collections import Counter
Counter(
    tuple(orderedSampleWithoutReplacement([0,1,2,3], 2))
    for _ in range(10**5)
)
Maybe you can just generate the sample of indices and then collect the items from your list.
randIndex = random.sample(range(len(mylist)), sample_size)
randIndex.sort()
rand = [mylist[i] for i in randIndex]
Apparently random.sample was introduced in Python 2.3, so for versions older than that, we can use shuffle (example for 4 items):
from random import shuffle
myRange = list(range(len(mylist)))  # list() so that shuffle also works on Python 3
shuffle(myRange)
coupons = [mylist[i] for i in sorted(myRange[:4])]
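random.sample implements sampling without replacement, but the result comes back in selection order, not in the original order: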
>>> random.sample([1, 2, 3, 4, 5], 3) # Three samples without replacement
[4, 1, 5]