pyspark.pandas.DataFrame.dropna

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Remove missing values.

Parameters
axis : {0 or ‘index’, 1 or ‘columns’}, default 0

Determine whether rows or columns that contain missing values are removed.

  • 0, or ‘index’ : Drop rows which contain missing values.

  • 1, or ‘columns’ : Drop columns which contain missing values.

how : {‘any’, ‘all’}, default ‘any’

Determine whether a row or column is removed when it contains at least one NA value or only when all of its values are NA.

  • ‘any’ : If any NA values are present, drop that row or column.

  • ‘all’ : If all values are NA, drop that row or column.

thresh : int, optional

Minimum number of non-NA values a row or column must contain to be kept.
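The how and thresh rules can be pictured with a small plain-Python sketch. This is only an illustration of the documented semantics, not the pandas-on-Spark implementation: the helper names (is_missing, dropped_by_how, kept_by_thresh) are hypothetical, and treating None and float NaN as the missing values is an assumption made for the sketch.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def dropped_by_how(row, how="any"):
    """Return True if a row (a list of values) would be dropped under `how`."""
    missing = [is_missing(v) for v in row]
    if how == "any":
        return any(missing)   # one missing value is enough to drop the row
    if how == "all":
        return all(missing)   # drop only if every value is missing
    raise ValueError(f"invalid how option: {how}")


def kept_by_thresh(row, thresh):
    """Return True if the row has at least `thresh` non-missing values."""
    non_missing = sum(1 for v in row if not is_missing(v))
    return non_missing >= thresh


# Hypothetical rows in (name, toy, born) order.
rows = [
    ["Alfred", None, None],
    ["Batman", "Batmobile", "1940-04-25"],
    ["Catwoman", "Bullwhip", None],
]

how_any = [r[0] for r in rows if not dropped_by_how(r, "any")]
how_all = [r[0] for r in rows if not dropped_by_how(r, "all")]
thresh_2 = [r[0] for r in rows if kept_by_thresh(r, 2)]
```

In this sketch, how_any keeps only the fully populated row, how_all keeps every row because none is entirely missing, and thresh_2 keeps the rows with two or more non-missing values.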

subset : array-like, optional

Labels along the other axis to consider. For example, when dropping rows, this is the list of columns to check for missing values.
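To make the role of subset concrete, here is a minimal plain-Python sketch of dropping rows while checking only selected columns. It models the default how='any' behavior, uses a hypothetical helper name (rows_kept_with_subset), represents rows as dictionaries, and treats only None as missing. All of these are assumptions for illustration, not the library's internals.

```python
def rows_kept_with_subset(records, subset):
    """Keep records whose values in the `subset` columns are all non-None.

    Columns outside `subset` are ignored, even if they hold None.
    """
    return [rec for rec in records if all(rec[col] is not None for col in subset)]


# Hypothetical records used only for this sketch.
records = [
    {"name": "Alfred", "toy": None, "born": None},
    {"name": "Batman", "toy": "Batmobile", "born": "1940-04-25"},
    {"name": "Catwoman", "toy": "Bullwhip", "born": None},
]

# Only 'name' and 'toy' are inspected, so a missing 'born' does not matter.
kept_name_toy = [r["name"] for r in rows_kept_with_subset(records, ["name", "toy"])]

# Inspecting 'born' as well removes every record that lacks a birth date.
kept_name_born = [r["name"] for r in rows_kept_with_subset(records, ["name", "born"])]
```

Narrowing subset to the columns that matter for a given analysis avoids discarding rows because of gaps in unrelated columns.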

inplace : bool, default False

If True, perform the operation in place and return None.

Returns
DataFrame or None

DataFrame with NA entries dropped from it, or None if inplace=True.

See also

DataFrame.drop

Drop specified labels from columns.

DataFrame.isnull

Indicate missing values.

DataFrame.notnull

Indicate existing (non-missing) values.

Examples

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [None, 'Batmobile', 'Bullwhip'],
...                    "born": [None, "1940-04-25", None]},
...                   columns=['name', 'toy', 'born'])
>>> df
       name        toy        born
0    Alfred       None        None
1    Batman  Batmobile  1940-04-25
2  Catwoman   Bullwhip        None

Drop the rows where at least one element is missing.

>>> df.dropna()
     name        toy        born
1  Batman  Batmobile  1940-04-25

Drop the columns where at least one element is missing.

>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman

Drop the rows where all elements are missing.

>>> df.dropna(how='all')
       name        toy        born
0    Alfred       None        None
1    Batman  Batmobile  1940-04-25
2  Catwoman   Bullwhip        None

Keep only the rows with at least 2 non-NA values.

>>> df.dropna(thresh=2)
       name        toy        born
1    Batman  Batmobile  1940-04-25
2  Catwoman   Bullwhip        None

Specify which columns to check for missing values.

>>> df.dropna(subset=['name', 'born'])
     name        toy        born
1  Batman  Batmobile  1940-04-25

Modify the DataFrame in place so that the same variable holds only the rows with no missing values.

>>> df.dropna(inplace=True)
>>> df
     name        toy        born
1  Batman  Batmobile  1940-04-25