November 26, 2019
Examples of how to randomly select rows of an array in Python with NumPy:
Let's create the following array:
>>> import numpy as np
>>> data = np.arange(80).reshape((8, 10))
>>> data
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79]])
\begin{equation}
data = \left( \begin{array}{cccccccccc}
0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\
10 & 11 & 12 & 13 & 14 & 15 & 16 & 17 & 18 & 19 \\
20 & 21 & 22 & 23 & 24 & 25 & 26 & 27 & 28 & 29 \\
30 & 31 & 32 & 33 & 34 & 35 & 36 & 37 & 38 & 39 \\
40 & 41 & 42 & 43 & 44 & 45 & 46 & 47 & 48 & 49 \\
50 & 51 & 52 & 53 & 54 & 55 & 56 & 57 & 58 & 59 \\
60 & 61 & 62 & 63 & 64 & 65 & 66 & 67 & 68 & 69 \\
70 & 71 & 72 & 73 & 74 & 75 & 76 & 77 & 78 & 79
\end{array}\right)
\end{equation}
To randomly select rows of the array, one solution is to first shuffle its rows with np.random.shuffle(), which modifies the array in place:
>>> np.random.shuffle(data)
>>> data
array([[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79]])
and then slice the first n rows, for example with n = 4:
>>> data = data[:4,:]
>>> data
array([[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69]])
\begin{equation}
data = \left( \begin{array}{cccccccccc}
50 & 51 & 52 & 53 & 54 & 55 & 56 & 57 & 58 & 59 \\
0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\
40 & 41 & 42 & 43 & 44 & 45 & 46 & 47 & 48 & 49 \\
60 & 61 & 62 & 63 & 64 & 65 & 66 & 67 & 68 & 69
\end{array}\right)
\end{equation}
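Since np.random.shuffle() reorders the array in place, the original row order is lost. If the original array needs to stay untouched, a possible variant is to shuffle a copy instead. The sketch below assumes the newer Generator API (np.random.default_rng) is available; the seed value and the names rng and sample are only illustrative choices:

```python
import numpy as np

# Seeded generator so the sketch is reproducible; the seed value is arbitrary.
rng = np.random.default_rng(42)

data = np.arange(80).reshape((8, 10))
n = 4

# permutation() returns a shuffled copy along the first axis,
# so `data` itself keeps its original row order.
sample = rng.permutation(data)[:n]

print(sample.shape)  # (4, 10)
```

Each run with a different seed gives a different set of n distinct rows, while data can still be reused afterwards.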
Note: to remove rows where a condition is true (here, rows whose value in column 3 is greater than 50), we can do:
>>> data = data[~(data[:,3] > 50)]
>>> data
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
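To make the filtering step easier to follow, here is a small sketch that builds the boolean mask separately before applying it. The 4-column array is a reduced toy version of the sliced array above, and the names mask and kept are only illustrative:

```python
import numpy as np

data = np.array([[50, 51, 52, 53],
                 [ 0,  1,  2,  3],
                 [40, 41, 42, 43],
                 [60, 61, 62, 63]])

# True for every row whose value in column 3 is greater than 50.
mask = data[:, 3] > 50
print(mask)  # [ True False False  True]

# ~mask inverts the condition, so only rows that fail it are kept.
kept = data[~mask]
print(kept)
```

The rows starting with 50 and 60 are dropped because their values in column 3 (53 and 63) are greater than 50.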
Another option is to create a list of random row indices:
>>> data = np.arange(80).reshape((8, 10))
>>> data
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79]])
>>> import random
>>> rows_id = random.sample(range(data.shape[0]), 4)
>>> rows_id
[4, 6, 2, 3]
>>> data = data[rows_id,:]
>>> data
array([[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])
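The same idea can be written with NumPy only, without the random module. This is a minimal sketch, assuming the Generator API (np.random.default_rng) is available; the seed and the name sample are illustrative. Passing replace=False ensures that no row index is drawn twice:

```python
import numpy as np

# Seeded generator so the sketch is reproducible; the seed value is arbitrary.
rng = np.random.default_rng(0)

data = np.arange(80).reshape((8, 10))

# Draw 4 distinct row indices from 0 .. data.shape[0] - 1.
rows_id = rng.choice(data.shape[0], size=4, replace=False)

# Fancy indexing returns a new array with the selected rows.
sample = data[rows_id, :]

print(sample.shape)  # (4, 10)
```

Because indexing with an array of indices creates a copy, data is left unchanged and can be sampled again.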