pyspark.sql.functions.substring_index

pyspark.sql.functions.substring_index(str, delim, count)

Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. substring_index performs a case-sensitive match when searching for delim.

New in version 1.5.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters

str (Column or column name): target column to work on.
delim (literal string): delimiter of values.
count (int): number of occurrences.

Returns

Column: substring of given value.

See also

pyspark.sql.functions.instr()
pyspark.sql.functions.locate()
pyspark.sql.functions.substr()
pyspark.sql.functions.substring()
pyspark.sql.Column.substr()

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([('a.b.c.d',)], ['s'])
>>> df.select('*', sf.substring_index(df.s, '.', 2)).show()
+-------+------------------------+
|      s|substring_index(s, ., 2)|
+-------+------------------------+
|a.b.c.d|                     a.b|
+-------+------------------------+

>>> df.select('*', sf.substring_index('s', '.', -3)).show()
+-------+-------------------------+
|      s|substring_index(s, ., -3)|
+-------+-------------------------+
|a.b.c.d|                    b.c.d|
+-------+-------------------------+
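
The counting rule can be easier to follow outside Spark. The following is a minimal plain-Python sketch of the semantics described above, not PySpark's implementation. The helper name `substring_index_py` is hypothetical. The handling of `count == 0` and of an empty delimiter (both return an empty string here) is an assumption, and results may differ from Spark for multi-character delimiters that overlap themselves.

```python
def substring_index_py(s: str, delim: str, count: int) -> str:
    """Hypothetical helper mimicking substring_index on a single Python string."""
    # Assumption: an empty delimiter or a zero count yields an empty string.
    if not delim or count == 0:
        return ""
    parts = s.split(delim)  # case-sensitive, non-overlapping split
    if count > 0:
        # Keep everything left of the count-th delimiter, counting from the left.
        return delim.join(parts[:count])
    # Keep everything right of the |count|-th delimiter, counting from the right.
    return delim.join(parts[count:])


print(substring_index_py("a.b.c.d", ".", 2))    # a.b
print(substring_index_py("a.b.c.d", ".", -3))   # b.c.d
print(substring_index_py("a.b.c.d", ".", 10))   # a.b.c.d (fewer delimiters than count)

# Nesting a positive and a negative count isolates a single field (here the 2nd).
print(substring_index_py(substring_index_py("a.b.c.d", ".", 2), ".", -1))  # b
```

The same nesting idiom should carry over to the Spark function, for example `sf.substring_index(sf.substring_index('s', '.', 2), '.', -1)`, though that expression is shown here only as an untested illustration.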