Programing

numpy를 사용하여 두 배열의 모든 조합으로 구성된 배열 만들기

lottogame 2020. 7. 3. 18:41
반응형

numpy를 사용하여 두 배열의 모든 조합으로 구성된 배열 만들기


6 매개 변수 함수의 매개 변수 공간을 실행하여 복잡한 동작을 시도하기 전에 숫자 동작을 연구하려고하므로 효율적인 방법을 찾고 있습니다.

내 함수는 6-dim numpy 배열이 주어진 float 값을 입력으로 사용합니다. 처음에 시도한 것은 다음과 같습니다.

먼저 2 개의 배열을 가져 와서 두 배열의 모든 값 조합으로 배열을 생성하는 함수를 만들었습니다.

from numpy import *
def comb(a,b):
    c = []
    for i in a:
        for j in b:
            c.append(r_[i,j])
    return c

그런 다음 reduce()동일한 배열의 m 복사본에 적용했습니다.

def combs(a,m):
    return reduce(comb,[a]*m)

그런 다음 내 기능을 다음과 같이 평가합니다.

values = combs(np.arange(0,1,0.1),6)
for val in values:
    print F(val)

이것은 작동하지만 너무 느립니다. 나는 매개 변수의 공간이 크다는 것을 알고 있지만 그렇게 느려서는 안됩니다. 이 예제에서는 10 6 (백만) 포인트 만 샘플링 했으며 배열을 만드는 데 15 초 이상이 걸렸습니다 values.

numpy 로이 작업을보다 효율적으로 수행하는 방법을 알고 있습니까?

F필요한 경우 함수 가 인수를 취하는 방식을 수정할 수 있습니다 .


최신 버전 numpy(> 1.8.x) numpy.meshgrid()에서는 훨씬 빠른 구현을 제공합니다.

@pv의 솔루션

In [113]:

%timeit cartesian(([1, 2, 3], [4, 5], [6, 7]))
10000 loops, best of 3: 135 µs per loop
In [114]:

cartesian(([1, 2, 3], [4, 5], [6, 7]))

Out[114]:
array([[1, 4, 6],
       [1, 4, 7],
       [1, 5, 6],
       [1, 5, 7],
       [2, 4, 6],
       [2, 4, 7],
       [2, 5, 6],
       [2, 5, 7],
       [3, 4, 6],
       [3, 4, 7],
       [3, 5, 6],
       [3, 5, 7]])

numpy.meshgrid()2D로만 사용하면 이제 ND가 가능합니다. 이 경우 3D :

In [115]:

%timeit np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)
10000 loops, best of 3: 74.1 µs per loop
In [116]:

np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)

Out[116]:
array([[1, 4, 6],
       [1, 5, 6],
       [2, 4, 6],
       [2, 5, 6],
       [3, 4, 6],
       [3, 5, 6],
       [1, 4, 7],
       [1, 5, 7],
       [2, 4, 7],
       [2, 5, 7],
       [3, 4, 7],
       [3, 5, 7]])

최종 결과의 순서는 약간 다릅니다.


다음은 순수한 숫자 구현입니다. ca. itertools를 사용하는 것보다 5 배 빠릅니다.


import numpy as np

def cartesian(arrays, out=None):
    """
    Generate a cartesian product of input arrays.

    Parameters
    ----------
    arrays : list of array-like
        1-D arrays to form the cartesian product of.
    out : ndarray
        Array to place the cartesian product in.

    Returns
    -------
    out : ndarray
        2-D array of shape (M, len(arrays)) containing cartesian products
        formed of input arrays.

    Examples
    --------
    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
    array([[1, 4, 6],
           [1, 4, 7],
           [1, 5, 6],
           [1, 5, 7],
           [2, 4, 6],
           [2, 4, 7],
           [2, 5, 6],
           [2, 5, 7],
           [3, 4, 6],
           [3, 4, 7],
           [3, 5, 6],
           [3, 5, 7]])

    """

    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    m = n / arrays[0].size
    out[:,0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian(arrays[1:], out=out[0:m,1:])
        for j in xrange(1, arrays[0].size):
            out[j*m:(j+1)*m,1:] = out[0:m,1:]
    return out

itertools.combinations is in general the fastest way to get combinations from a Python container (if you do in fact want combinations, i.e., arrangements WITHOUT repetitions and independent of order; that's not what your code appears to be doing, but I can't tell whether that's because your code is buggy or because you're using the wrong terminology).

If you want something different than combinations perhaps other iterators in itertools, product or permutations, might serve you better. For example, it looks like your code is roughly the same as:

for val in itertools.product(np.arange(0, 1, 0.1), repeat=6):
    print F(val)

All of these iterators yield tuples, not lists or numpy arrays, so if your F is picky about getting specifically a numpy array you'll have to accept the extra overhead of constructing or clearing and re-filling one at each step.


The following numpy implementation should be approx. 2x the speed of the given answer:

def cartesian2(arrays):
    arrays = [np.asarray(a) for a in arrays]
    shape = (len(x) for x in arrays)

    ix = np.indices(shape, dtype=int)
    ix = ix.reshape(len(arrays), -1).T

    for n, arr in enumerate(arrays):
        ix[:, n] = arrays[n][ix[:, n]]

    return ix

It looks like you want a grid to evaluate your function, in which case you can use numpy.ogrid (open) or numpy.mgrid (fleshed out):

import numpy
my_grid = numpy.mgrid[[slice(0,1,0.1)]*6]

You can do something like this

import numpy as np

def cartesian_coord(*arrays):
    grid = np.meshgrid(*arrays)        
    coord_list = [entry.ravel() for entry in grid]
    points = np.vstack(coord_list).T
    return points

a = np.arange(4)  # fake data
print(cartesian_coord(*6*[a])

which gives

array([[0, 0, 0, 0, 0, 0],
   [0, 0, 0, 0, 0, 1],
   [0, 0, 0, 0, 0, 2],
   ..., 
   [3, 3, 3, 3, 3, 1],
   [3, 3, 3, 3, 3, 2],
   [3, 3, 3, 3, 3, 3]])

you can use np.array(itertools.product(a, b))


Here's yet another way, using pure NumPy, no recursion, no list comprehension, and no explicit for loops. It's about 20% slower than the original answer, and it's based on np.meshgrid.

def cartesian(*arrays):
    mesh = np.meshgrid(*arrays)  # standard numpy meshgrid
    dim = len(mesh)  # number of dimensions
    elements = mesh[0].size  # number of elements, any index will do
    flat = np.concatenate(mesh).ravel()  # flatten the whole meshgrid
    reshape = np.reshape(flat, (dim, elements)).T  # reshape and transpose
    return reshape

For example,

x = np.arange(3)
a = cartesian(x, x, x, x, x)
print(a)

gives

[[0 0 0 0 0]
 [0 0 0 0 1]
 [0 0 0 0 2]
 ..., 
 [2 2 2 2 0]
 [2 2 2 2 1]
 [2 2 2 2 2]]

For a pure numpy implementation of Cartesian product of 1D arrays (or flat python lists), just use meshgrid(), roll the axes with transpose(), and reshape to the desired ouput:

 def cartprod(*arrays):
     N = len(arrays)
     return transpose(meshgrid(*arrays, indexing='ij'), 
                      roll(arange(N + 1), -1)).reshape(-1, N)

Note this has the convention of last axis changing fastest ("C style" or "row-major").

In [88]: cartprod([1,2,3], [4,8], [100, 200, 300, 400], [-5, -4])
Out[88]: 
array([[  1,   4, 100,  -5],
       [  1,   4, 100,  -4],
       [  1,   4, 200,  -5],
       [  1,   4, 200,  -4],
       [  1,   4, 300,  -5],
       [  1,   4, 300,  -4],
       [  1,   4, 400,  -5],
       [  1,   4, 400,  -4],
       [  1,   8, 100,  -5],
       [  1,   8, 100,  -4],
       [  1,   8, 200,  -5],
       [  1,   8, 200,  -4],
       [  1,   8, 300,  -5],
       [  1,   8, 300,  -4],
       [  1,   8, 400,  -5],
       [  1,   8, 400,  -4],
       [  2,   4, 100,  -5],
       [  2,   4, 100,  -4],
       [  2,   4, 200,  -5],
       [  2,   4, 200,  -4],
       [  2,   4, 300,  -5],
       [  2,   4, 300,  -4],
       [  2,   4, 400,  -5],
       [  2,   4, 400,  -4],
       [  2,   8, 100,  -5],
       [  2,   8, 100,  -4],
       [  2,   8, 200,  -5],
       [  2,   8, 200,  -4],
       [  2,   8, 300,  -5],
       [  2,   8, 300,  -4],
       [  2,   8, 400,  -5],
       [  2,   8, 400,  -4],
       [  3,   4, 100,  -5],
       [  3,   4, 100,  -4],
       [  3,   4, 200,  -5],
       [  3,   4, 200,  -4],
       [  3,   4, 300,  -5],
       [  3,   4, 300,  -4],
       [  3,   4, 400,  -5],
       [  3,   4, 400,  -4],
       [  3,   8, 100,  -5],
       [  3,   8, 100,  -4],
       [  3,   8, 200,  -5],
       [  3,   8, 200,  -4],
       [  3,   8, 300,  -5],
       [  3,   8, 300,  -4],
       [  3,   8, 400,  -5],
       [  3,   8, 400,  -4]])

If you want to change the first axis fastest ("FORTRAN style" or "column-major"), just change the order parameter of reshape() like this: reshape((-1, N), order='F')

참고URL : https://stackoverflow.com/questions/1208118/using-numpy-to-build-an-array-of-all-combinations-of-two-arrays

반응형