pyspark.pandas.DataFrame.dropna#
- DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)[source]#
Remove missing values.
- Parameters
- axis{0 or ‘index’}, default 0
Determine if rows or columns which contain missing values are removed.
0, or ‘index’ : Drop rows which contain missing values.
- how{‘any’, ‘all’}, default ‘any’
Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.
‘any’ : If any NA values are present, drop that row or column.
‘all’ : If all values are NA, drop that row or column.
- threshint, optional
Require that many non-NA values.
- subsetarray-like, optional
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.
- inplacebool, default False
If True, do operation inplace and return None.
- Returns
- DataFrame
DataFrame with NA entries dropped from it.
See also
DataFrame.drop
Drop specified labels from columns.
DataFrame.isnull
Indicate missing values.
DataFrame.notnull
Indicate existing (non-missing) values.
Examples
>>> df = ps.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'], ... "toy": [None, 'Batmobile', 'Bullwhip'], ... "born": [None, "1940-04-25", None]}, ... columns=['name', 'toy', 'born']) >>> df name toy born 0 Alfred None None 1 Batman Batmobile 1940-04-25 2 Catwoman Bullwhip None
Drop the rows where at least one element is missing.
>>> df.dropna() name toy born 1 Batman Batmobile 1940-04-25
Drop the columns where at least one element is missing.
>>> df.dropna(axis='columns') name 0 Alfred 1 Batman 2 Catwoman
Drop the rows where all elements are missing.
>>> df.dropna(how='all') name toy born 0 Alfred None None 1 Batman Batmobile 1940-04-25 2 Catwoman Bullwhip None
Keep only the rows with at least 2 non-NA values.
>>> df.dropna(thresh=2) name toy born 1 Batman Batmobile 1940-04-25 2 Catwoman Bullwhip None
Define in which columns to look for missing values.
>>> df.dropna(subset=['name', 'born']) name toy born 1 Batman Batmobile 1940-04-25
Keep the DataFrame with valid entries in the same variable.
>>> df.dropna(inplace=True) >>> df name toy born 1 Batman Batmobile 1940-04-25