pyspark.sql.functions.array_agg

pyspark.sql.functions.array_agg(col)
Aggregate function: returns a list of objects with duplicates.
New in version 3.5.0.
Parameters
    col : Column or column name
        target column to compute on.

Returns
    Column
        list of objects with duplicates.
Examples
Example 1: Using array_agg function on an int column
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([[1],[1],[2]], ["c"])
>>> df.agg(sf.sort_array(sf.array_agg('c')).alias('sorted_list')).show()
+-----------+
|sorted_list|
+-----------+
|  [1, 1, 2]|
+-----------+
Example 2: Using array_agg function on a string column
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([["apple"],["apple"],["banana"]], ["c"])
>>> df.agg(sf.sort_array(sf.array_agg('c')).alias('sorted_list')).show(truncate=False)
+----------------------+
|sorted_list           |
+----------------------+
|[apple, apple, banana]|
+----------------------+
Example 3: Using array_agg function on a column with null values
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([[1],[None],[2]], ["c"])
>>> df.agg(sf.sort_array(sf.array_agg('c')).alias('sorted_list')).show()
+-----------+
|sorted_list|
+-----------+
|     [1, 2]|
+-----------+
Example 4: Using array_agg function on a column with different data types
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([[1],["apple"],[2]], ["c"])
>>> df.agg(sf.sort_array(sf.array_agg('c')).alias('sorted_list')).show()
+-------------+
|  sorted_list|
+-------------+
|[1, 2, apple]|
+-------------+
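The collection semantics shown in the examples above (duplicates are kept, nulls are dropped, as Example 3 illustrates) can be sketched in plain Python without a Spark session. The helper below is purely illustrative and is not part of the PySpark API:

```python
def array_agg(values):
    """Illustrative sketch of array_agg semantics:
    collect non-null values into a list, preserving duplicates."""
    return [v for v in values if v is not None]

# Mirrors Example 1 combined with Example 3: duplicates survive, None does not.
print(sorted(array_agg([1, None, 1, 2])))  # [1, 1, 2]
```

Note that, as in Spark, the result order is not guaranteed by the aggregation itself, which is why the examples above wrap the call in sort_array for deterministic output.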