pyspark.pandas.DataFrame.iloc

property DataFrame.iloc

Purely integer-location based indexing for selection by position.

.iloc[] is primarily integer-position based (from 0 to length - 1 of the axis), but it may also be used with a conditional boolean Series.

Allowed inputs are:

  • An integer for column selection, e.g. 5.

  • A list or array of integers with distinct values for row selection, e.g. [3, 4, 0].

  • A list or array of integers for column selection, e.g. [4, 3, 0].

  • A boolean array for column selection.

  • A slice object with ints for row and column selection, e.g. 1:7.
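The rules above can be modeled for a single axis in plain Python. This is a sketch, not pandas-on-Spark code: `resolve_positions` is a hypothetical helper that maps an integer, a list of integers, a boolean mask, or a slice to the 0-based positions it selects. It assumes a mask must have exactly one entry per position and that scalar positions must fall in 0 to length - 1; it does not enforce the row-only restrictions described below.

```python
from typing import List, Sequence, Union

# Hypothetical helper modeling the allowed .iloc keys for ONE axis.
# It illustrates the rules listed above; it is not pandas-on-Spark code.
Key = Union[int, slice, Sequence[int], Sequence[bool]]


def resolve_positions(key: Key, length: int) -> List[int]:
    """Return the 0-based positions that `key` selects on an axis of `length` items."""
    if isinstance(key, bool):
        # bool is a subclass of int, so reject a lone True/False explicitly.
        raise TypeError("a single boolean is not a positional indexer")
    if isinstance(key, int):
        # Assumption: only 0 .. length - 1 is accepted (no negative positions).
        if not 0 <= key < length:
            raise IndexError(f"position {key} is out of bounds for length {length}")
        return [key]
    if isinstance(key, slice):
        # slice.indices() clips start/stop to the axis, so slices never raise.
        return list(range(*key.indices(length)))
    items = list(key)
    if items and all(isinstance(k, bool) for k in items):
        # A boolean mask needs exactly one entry per position on the axis.
        if len(items) != length:
            raise IndexError(f"mask has {len(items)} entries, expected {length}")
        return [i for i, keep in enumerate(items) if keep]
    # Otherwise: a list of integer positions, each checked like a scalar.
    return [pos for k in items for pos in resolve_positions(k, length)]
```

For example, `resolve_positions(slice(None, 2), 3)` returns `[0, 1]`, which is what `:2` keeps on a three-item axis.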

Inputs that pandas allows but pandas-on-Spark does not:

  • A list or array of integers with duplicated positions for row selection, e.g. [4, 4, 0].

  • A boolean array for row selection.

  • A callable function with one argument (the calling Series or DataFrame) that returns valid output for indexing (one of the above). In pandas, this is useful in method chains when you don’t have a reference to the calling object but would like to base your selection on some value.
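A rough way to check a row key against these restrictions is sketched below. `check_spark_row_key` is a hypothetical helper: it only inspects Python lists and tuples, and the exception types it raises are chosen for illustration, so the errors pandas-on-Spark actually raises may differ.

```python
from typing import Any


def check_spark_row_key(key: Any) -> None:
    """Raise if `key` is a row indexer that pandas accepts but pandas-on-Spark does not.

    Hypothetical helper: the exception types are illustrative only.
    """
    if callable(key):
        raise TypeError("callable row indexers are not supported")
    if isinstance(key, (list, tuple)):
        if key and all(isinstance(k, bool) for k in key):
            raise TypeError("a boolean array cannot select rows")
        if len(set(key)) != len(key):
            raise ValueError("row positions must be distinct")


check_spark_row_key([3, 4, 0])    # accepted: distinct positions
check_spark_row_key(slice(1, 7))  # accepted: slices are not restricted here
```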

.iloc raises IndexError if a requested indexer is out of bounds, except for slice indexers, which allow out-of-bounds indexing (this conforms with Python/NumPy slice semantics).
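Plain Python lists follow the same bounds rules, so the distinction can be tried without Spark: an out-of-bounds scalar position raises `IndexError`, while an out-of-bounds slice is clipped to the valid range.

```python
# Three rows, addressed by positions 0..2.
row_positions = [0, 1, 2]

# An out-of-bounds scalar position raises IndexError ...
try:
    row_positions[5]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range

# ... while out-of-bounds slices are clipped to what exists.
print(row_positions[1:10])  # [1, 2]
print(row_positions[5:10])  # []
```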

See also

DataFrame.loc

Purely label-location based indexer for selection by label.

Series.iloc

Purely integer-location based indexing for selection by position.

Examples

>>> import pyspark.pandas as ps
>>> mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
...           {'a': 100, 'b': 200, 'c': 300, 'd': 400},
...           {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}]
>>> df = ps.DataFrame(mydict, columns=['a', 'b', 'c', 'd'])
>>> df
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000

Indexing just the rows

A scalar integer selects a single row and returns a Series; a list of integers returns a DataFrame.

>>> df.iloc[1]
a    100
b    200
c    300
d    400
Name: 1, dtype: int64
>>> df.iloc[[0]]
   a  b  c  d
0  1  2  3  4

With a slice object.

>>> df.iloc[:3]
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000

Indexing both axes

You can mix the indexer types for the index and columns. Use : to select the entire axis.

With a slice for the rows and a scalar integer for the column.

>>> df.iloc[:1, 1]
0    2
Name: b, dtype: int64

With lists of integers.

>>> df.iloc[:2, [1, 3]]
     b    d
0    2    4
1  200  400

With slice objects.

>>> df.iloc[:2, 0:3]
     a    b    c
0    1    2    3
1  100  200  300

With a boolean array whose length matches the columns.

>>> df.iloc[:, [True, False, True, False]]
      a     c
0     1     3
1   100   300
2  1000  3000
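Because the row key and the column key are resolved independently, mixed indexing behaves like two separate position lookups. The sketch below models this on the example data stored as plain lists. `positions` and `select` are hypothetical helpers that accept slices, integer lists, and boolean masks only (scalar keys are left out because they change the shape of the result), and they skip the bounds checks that .iloc performs.

```python
from typing import List, Sequence, Tuple, Union

# The example DataFrame as plain rows, with column names kept separately.
columns = ["a", "b", "c", "d"]
rows = [[1, 2, 3, 4], [100, 200, 300, 400], [1000, 2000, 3000, 4000]]

AxisKey = Union[slice, Sequence[int], Sequence[bool]]


def positions(key: AxisKey, length: int) -> List[int]:
    """Resolve a slice, integer list, or boolean mask to positions (hypothetical helper)."""
    if isinstance(key, slice):
        return list(range(*key.indices(length)))
    items = list(key)
    if items and all(isinstance(k, bool) for k in items):
        return [i for i, keep in enumerate(items) if keep]
    return items


def select(row_key: AxisKey, col_key: AxisKey) -> Tuple[List[str], List[List[int]]]:
    """Apply the row key and the column key independently, like df.iloc[row_key, col_key]."""
    r = positions(row_key, len(rows))
    c = positions(col_key, len(columns))
    return [columns[j] for j in c], [[rows[i][j] for j in c] for i in r]


print(select(slice(None, 2), [1, 3]))
# (['b', 'd'], [[2, 4], [200, 400]])
print(select(slice(None), [True, False, True, False]))
# (['a', 'c'], [[1, 3], [100, 300], [1000, 3000]])
```

The two printed results correspond to `df.iloc[:2, [1, 3]]` and `df.iloc[:, [True, False, True, False]]` in the examples.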

Setting values

Set a value for all items at the given row and column positions.

>>> df.iloc[[1, 2], [1]] = 50
>>> df
      a   b     c     d
0     1   2     3     4
1   100  50   300   400
2  1000  50  3000  4000
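Assignment by position can be modeled the same way: every cell at the chosen row and column positions receives the value. `set_positions` is a hypothetical helper working on a list of lists; it says nothing about how pandas-on-Spark stores or updates data.

```python
from typing import List, Sequence


def set_positions(rows: List[List[int]], row_pos: Sequence[int],
                  col_pos: Sequence[int], value: int) -> None:
    """Assign `value` to every cell at (row_pos x col_pos), in place (hypothetical helper)."""
    for i in row_pos:
        for j in col_pos:
            rows[i][j] = value


rows = [[1, 2, 3, 4], [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
set_positions(rows, [1, 2], [1], 50)  # mirrors df.iloc[[1, 2], [1]] = 50
print(rows)
# [[1, 2, 3, 4], [100, 50, 300, 400], [1000, 50, 3000, 4000]]
```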

Set a value for an entire row.

>>> df.iloc[0] = 10
>>> df
      a   b     c     d
0    10  10    10    10
1   100  50   300   400
2  1000  50  3000  4000

Set a value for an entire column.

>>> df.iloc[:, 2] = 30
>>> df
      a   b   c     d
0    10  10  30    10
1   100  50  30   400
2  1000  50  30  4000

Set a value for a list of columns.

>>> df.iloc[:, [2, 3]] = 100
>>> df
      a   b    c    d
0    10  10  100  100
1   100  50  100  100
2  1000  50  100  100

Set values from a Series.

>>> df.iloc[:, 3] = df.iloc[:, 3] * 2
>>> df
      a   b    c    d
0    10  10  100  200
1   100  50  100  200
2  1000  50  100  200