pyspark.pandas.DataFrame.iloc
- property DataFrame.iloc
Purely integer-location based indexing for selection by position.
.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a conditional boolean Series.

Allowed inputs are:
- An integer for column selection, e.g. 5.
- A list or array of integers for row selection with distinct index values, e.g. [3, 4, 0].
- A list or array of integers for column selection, e.g. [4, 3, 0].
- A boolean array for column selection.
- A slice object with ints for row and column selection, e.g. 1:7.
Not allowed inputs which pandas allows are:
- A list or array of integers for row selection with duplicated indexes, e.g. [4, 4, 0].
- A boolean array for row selection.
- A callable function with one argument (the calling Series, DataFrame or Panel) and that returns valid output for indexing (one of the above). This is useful in method chains when you don't have a reference to the calling object but would like to base your selection on some value.
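Because duplicated row positions are rejected, one option is to drop repeats before indexing. The following is a minimal sketch in plain Python; `dedupe_positions` is a hypothetical helper, not part of pyspark.pandas, and it assumes repeated rows are not actually needed in the result.

```python
def dedupe_positions(positions):
    """Return the positions with repeats removed, keeping first-seen order.

    Hypothetical helper for illustration; not a pyspark.pandas API.
    """
    # dict preserves insertion order, so fromkeys() keeps the first
    # occurrence of each position and drops later repeats.
    return list(dict.fromkeys(positions))


rows = dedupe_positions([4, 4, 0])
print(rows)  # [4, 0]
```

The resulting list contains only distinct integers, so it could then be used as a row indexer, e.g. df.iloc[rows].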
.iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing (this conforms with Python/NumPy slice semantics).

See also
DataFrame.loc
Purely label-location based indexer for selection by label.
Series.iloc
Purely integer-location based indexing for selection by position.
Examples
>>> import pyspark.pandas as ps
>>> mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
...           {'a': 100, 'b': 200, 'c': 300, 'd': 400},
...           {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000 }]
>>> df = ps.DataFrame(mydict, columns=['a', 'b', 'c', 'd'])
>>> df
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000
Indexing just the rows
A scalar integer for row selection.
>>> df.iloc[1]
a    100
b    200
c    300
d    400
Name: 1, dtype: int64

>>> df.iloc[[0]]
   a  b  c  d
0  1  2  3  4
With a slice object.
>>> df.iloc[:3]
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000
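The out-of-bounds rule for slice indexers mirrors ordinary Python sequence slicing. The sketch below uses a plain list rather than a DataFrame (an assumption made so the snippet runs without Spark) to show that a slice past the end is clipped, while a scalar position past the end raises IndexError.

```python
rows = ["row0", "row1", "row2"]

# A slice that runs past the end is silently clipped to the available items.
clipped = rows[1:7]
print(clipped)  # ['row1', 'row2']

# A scalar position past the end is an error instead.
try:
    rows[7]
except IndexError:
    scalar_raised = True
else:
    scalar_raised = False
print(scalar_raised)  # True
```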
Indexing both axes
You can mix the indexer types for the index and columns. Use : to select the entire axis.

With scalar integers.
>>> df.iloc[:1, 1]
0    2
Name: b, dtype: int64
With lists of integers.
>>> df.iloc[:2, [1, 3]]
     b    d
0    2    4
1  200  400
With slice objects.
>>> df.iloc[:2, 0:3]
     a    b    c
0    1    2    3
1  100  200  300
With a boolean array whose length matches the columns.
>>> df.iloc[:, [True, False, True, False]]
      a     c
0     1     3
1   100   300
2  1000  3000
Setting values
Setting value for all items matching the given lists of row and column positions.
>>> df.iloc[[1, 2], [1]] = 50
>>> df
      a   b     c     d
0     1   2     3     4
1   100  50   300   400
2  1000  50  3000  4000
Setting value for an entire row.
>>> df.iloc[0] = 10
>>> df
      a   b     c     d
0    10  10    10    10
1   100  50   300   400
2  1000  50  3000  4000

Setting value for an entire column.
>>> df.iloc[:, 2] = 30
>>> df
      a   b   c     d
0    10  10  30    10
1   100  50  30   400
2  1000  50  30  4000

Setting value for an entire list of columns.
>>> df.iloc[:, [2, 3]] = 100
>>> df
      a   b    c    d
0    10  10  100  100
1   100  50  100  100
2  1000  50  100  100

Setting value with a Series.
>>> df.iloc[:, 3] = df.iloc[:, 3] * 2
>>> df
      a   b    c    d
0    10  10  100  200
1   100  50  100  200
2  1000  50  100  200