pyspark.sql.streaming.DataStreamReader.csv

DataStreamReader.csv(path, schema=None, sep=None, encoding=None, quote=None, escape=None, comment=None, header=None, inferSchema=None, ignoreLeadingWhiteSpace=None, ignoreTrailingWhiteSpace=None, nullValue=None, nanValue=None, positiveInf=None, negativeInf=None, dateFormat=None, timestampFormat=None, maxColumns=None, maxCharsPerColumn=None, maxMalformedLogPerPartition=None, mode=None, columnNameOfCorruptRecord=None, multiLine=None, charToEscapeQuoteEscaping=None, enforceSchema=None, emptyValue=None, locale=None, lineSep=None, pathGlobFilter=None, recursiveFileLookup=None, unescapedQuoteHandling=None)

Loads a CSV file stream and returns the result as a DataFrame.

If inferSchema is enabled, this function makes one pass over the input to determine the input schema. To avoid that extra pass, disable the inferSchema option or specify the schema explicitly using schema.

Parameters
path : str

string for input path.

schema : pyspark.sql.types.StructType or str, optional

an optional pyspark.sql.types.StructType for the input schema, or a DDL-formatted string (for example, col0 INT, col1 DOUBLE).

New in version 2.0.0.

Changed in version 3.5.0: Supports Spark Connect.

Other Parameters
Extra options

For the extra options, refer to Data Source Option in the version you use.

Notes

This API is evolving.

Examples

Load a data stream from a temporary CSV file.

>>> import tempfile
>>> import time
>>> with tempfile.TemporaryDirectory(prefix="csv") as d:
...     # Write a temporary CSV file to read.
...     spark.createDataFrame([(1, "2"),]).write.mode("overwrite").format("csv").save(d)
...
...     # Start a streaming query to read the CSV file.
...     q = spark.readStream.schema(
...         "col0 INT, col1 STRING"
...     ).format("csv").load(d).writeStream.format("console").start()
...     time.sleep(3)
...     q.stop()