pyspark.sql.streaming.DataStreamReader.csv
- DataStreamReader.csv(path, schema=None, sep=None, encoding=None, quote=None, escape=None, comment=None, header=None, inferSchema=None, ignoreLeadingWhiteSpace=None, ignoreTrailingWhiteSpace=None, nullValue=None, nanValue=None, positiveInf=None, negativeInf=None, dateFormat=None, timestampFormat=None, maxColumns=None, maxCharsPerColumn=None, maxMalformedLogPerPartition=None, mode=None, columnNameOfCorruptRecord=None, multiLine=None, charToEscapeQuoteEscaping=None, enforceSchema=None, emptyValue=None, locale=None, lineSep=None, pathGlobFilter=None, recursiveFileLookup=None, unescapedQuoteHandling=None)
Loads a CSV file stream and returns the result as a DataFrame.
This function will go through the input once to determine the input schema if inferSchema is enabled. To avoid this extra pass over the data, disable the inferSchema option or specify the schema explicitly using schema.
- Parameters
- path : str
String for the input path.
- schema : pyspark.sql.types.StructType or str, optional
An optional pyspark.sql.types.StructType for the input schema or a DDL-formatted string (for example, col0 INT, col1 DOUBLE).
New in version 2.0.0.
Changed in version 3.5.0: Supports Spark Connect.
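Because an explicit schema avoids the inference pass described above, a common pattern is to hand it straight to csv(). The sketch below assumes an active SparkSession named spark; the helper name csv_stream_with_schema and the DDL_SCHEMA constant are hypothetical, and the DDL string simply reuses the example from the schema parameter description.

```python
from typing import Any

# Hypothetical DDL-formatted schema, reusing the "col0 INT, col1 DOUBLE"
# example from the schema parameter description.
DDL_SCHEMA = "col0 INT, col1 DOUBLE"


def csv_stream_with_schema(spark: Any, path: str, schema: Any = DDL_SCHEMA) -> Any:
    """Return a streaming DataFrame over the CSV files under `path`.

    `spark` is assumed to be an active SparkSession. `schema` may be a DDL
    string or a pyspark.sql.types.StructType; because it is supplied
    explicitly, Spark does not have to infer column types from the input.
    """
    return spark.readStream.csv(path, schema=schema)
```

With PySpark installed, StructType([StructField("col0", IntegerType()), StructField("col1", DoubleType())]) from pyspark.sql.types describes the same columns and could be passed as schema instead of the DDL string.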
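- Other Parameters
- Extra options
The remaining keyword arguments in the signature (sep, header, mode, and so on) are CSV data source options. For their meanings and defaults, refer to Data Source Option in the version you use.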
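These options can be passed as keyword arguments to csv() itself. The sketch below assumes an active SparkSession named spark; the helper name, the column names, the ";" delimiter, and the _corrupt column are hypothetical choices. In PERMISSIVE mode, malformed rows are kept rather than failing the query, and their raw text is stored in the column named by columnNameOfCorruptRecord only if that column is declared as a STRING field in the schema.

```python
from typing import Any, Dict

# Hypothetical schema: two data columns plus a STRING column that receives
# the raw text of malformed rows (see columnNameOfCorruptRecord below).
CSV_SCHEMA = "id INT, name STRING, _corrupt STRING"

# Keyword options taken from the csv() signature; the values are illustrative.
CSV_OPTIONS: Dict[str, Any] = {
    "sep": ";",                              # field delimiter
    "header": True,                          # first line of each file holds column names
    "mode": "PERMISSIVE",                    # keep malformed rows instead of failing
    "columnNameOfCorruptRecord": "_corrupt", # where malformed row text is stored
}


def csv_stream_with_options(spark: Any, path: str) -> Any:
    """Open a streaming CSV source using CSV_SCHEMA and CSV_OPTIONS.

    `spark` is assumed to be an active SparkSession.
    """
    return spark.readStream.csv(path, schema=CSV_SCHEMA, **CSV_OPTIONS)
```

The same dictionary also works with the generic reader, as in spark.readStream.format("csv").schema(CSV_SCHEMA).options(**CSV_OPTIONS).load(path), which is the format()/load() style used in the Examples section.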
Notes
This API is evolving.
Examples
Load a data stream from a temporary CSV file.
>>> import tempfile
>>> import time
>>> with tempfile.TemporaryDirectory(prefix="csv") as d:
...     # Write a temporary text file to read it.
...     spark.createDataFrame([(1, "2"),]).write.mode("overwrite").format("csv").save(d)
...
...     # Start a streaming query to read the CSV file.
...     q = spark.readStream.schema(
...         "col0 INT, col1 STRING"
...     ).format("csv").load(d).writeStream.format("console").start()
...     time.sleep(3)
...     q.stop()