pyspark.sql.functions.substr

pyspark.sql.functions.substr(str, pos, len=None)

Returns the substring of str that starts at pos and has length len, or the slice of a byte array that starts at pos and has length len.

New in version 3.5.0.

Parameters
str : Column or column name

A column of string.

pos : Column or column name

A column holding the position at which the substring of str starts. Positions are 1-based: in the examples, position 5 of "Spark SQL" is "k".

len : Column or column name, optional

A column holding the length of the substring of str. If omitted, the substring runs from pos to the end of str.

Returns
Column

The substring of the given value.

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("Spark SQL", 5, 1,)], ["a", "b", "c"])
>>> df.select("*", sf.substr("a", "b", "c")).show()
+---------+---+---+---------------+
|        a|  b|  c|substr(a, b, c)|
+---------+---+---+---------------+
|Spark SQL|  5|  1|              k|
+---------+---+---+---------------+
>>> df.select("*", sf.substr(df.a, df.b)).show()
+---------+---+---+------------------------+
|        a|  b|  c|substr(a, b, 2147483647)|
+---------+---+---+------------------------+
|Spark SQL|  5|  1|                   k SQL|
+---------+---+---+------------------------+
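The 1-based indexing shown in the examples can be modeled in plain Python. This is only an illustrative sketch, not Spark's implementation: `substr_model` is a hypothetical helper, it assumes pos is at least 1, and it treats a negative length as an empty result.

```python
def substr_model(s, pos, length=None):
    """Hypothetical pure-Python model of substr for pos >= 1 (not Spark's code)."""
    if pos < 1:
        # Zero or negative positions are outside the scope of this sketch.
        raise ValueError("this sketch only models pos >= 1")
    start = pos - 1  # convert the 1-based position to a 0-based index
    if length is None:
        # No length given: take everything from pos to the end of the string.
        return s[start:]
    # Assumption: a negative length yields an empty string.
    return s[start:start + max(length, 0)]


print(substr_model("Spark SQL", 5, 1))  # same inputs as the first example
print(substr_model("Spark SQL", 5))     # same inputs as the second example
```

Under these assumptions the two calls reproduce the values in the tables above, "k" and "k SQL".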