EVAL1_SPC1 - Details for Query 629

Submitted Time: 2026/02/20 12:34:44
Duration: 26 s
Succeeded Jobs: 761 764

digraph G {
  0 [labelType="html" label=" AdaptiveSparkPlan "];
  subgraph cluster1 {
    isCluster="true";
    label="WholeStageCodegen (4)\n \nduration: 0 ms";
    2 [labelType="html" label="HashAggregate time in aggregation build: 0 ms number of output rows: 1"];
  }
  3 [labelType="html" label="Exchange shuffle records written: 3 local merged chunks fetched: 0 shuffle write time total (min, med, max (stageId: taskId)) 1 ms (0 ms, 0 ms, 0 ms (stage 967.0: task 931)) remote merged bytes read: 0.0 B local merged blocks fetched: 0 corrupt merged block chunks: 0 remote merged reqs duration: 0 ms remote merged blocks fetched: 0 records read: 3 local bytes read: 171.0 B fetch wait time: 0 ms remote bytes read: 0.0 B merged fetch fallback count: 0 local blocks read: 3 remote merged chunks fetched: 0 remote blocks read: 0 data size total (min, med, max (stageId: taskId)) 48.0 B (16.0 B, 16.0 B, 16.0 B (stage 967.0: task 931)) local merged bytes read: 0.0 B number of partitions: 1 remote reqs duration: 0 ms remote bytes read to disk: 0.0 B shuffle bytes written total (min, med, max (stageId: taskId)) 171.0 B (56.0 B, 56.0 B, 59.0 B (stage 967.0: task 932))"];
  subgraph cluster4 {
    isCluster="true";
    label="WholeStageCodegen (3)\n \nduration: total (min, med, max (stageId: taskId))\n25.8 s (4 ms, 4 ms, 25.8 s (stage 967.0: task 932))";
    5 [labelType="html" label="HashAggregate time in aggregation build total (min, med, max (stageId: taskId)) 25.8 s (3 ms, 3 ms, 25.8 s (stage 967.0: task 932)) number of output rows: 3"];
    6 [labelType="html" label=" Project "];
  }
  7 [labelType="html" label="Filter number of output rows: 1"];
  subgraph cluster8 {
    isCluster="true";
    label="WholeStageCodegen (2)\n \nduration: total (min, med, max (stageId: taskId))\n25.8 s (18 ms, 18 ms, 25.8 s (stage 967.0: task 932))";
    9 [labelType="html" label="Generate number of output rows: 737"];
  }
  10 [labelType="html" label=" Project "];
  11 [labelType="html" label="Filter number of output rows: 1"];
  subgraph cluster12 {
    isCluster="true";
    label="WholeStageCodegen (1)\n \nduration: total (min, med, max (stageId: taskId))\n25.9 s (28 ms, 28 ms, 25.8 s (stage 967.0: task 932))";
    13 [labelType="html" label="ColumnarToRow number of output rows: 9,522 number of input batches: 3"];
  }
  14 [labelType="html" label="Scan parquet number of files read: 1 scan time total (min, med, max (stageId: taskId)) 206 ms (3 ms, 3 ms, 200 ms (stage 967.0: task 932)) metadata time: 0 ms size of files read: 9.9 MiB number of output rows: 9,522"];
  2->0;
  3->2;
  5->3;
  6->5;
  7->6;
  9->7;
  10->9;
  11->10;
  13->11;
  14->13;
}
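
Several node labels in the graph above report metrics in the form "total (min, med, max (stageId: taskId))". The following is a minimal sketch, not part of the Spark UI, of how such a summary string could be parsed to see how far the slowest task sits from the median. The names METRIC_RE, UNIT_TO_MS, to_ms and parse_metric are hypothetical helpers introduced for illustration, and the unit table only covers the two time units that appear on this page.

```python
import re

# Matches "<total> (<min>, <med>, <max> (stage <id>: task <id>))"
# as printed in the node labels above. Hypothetical helper, not a Spark API.
METRIC_RE = re.compile(
    r"(?P<total>[\d.]+ \w+) \((?P<min>[\d.]+ \w+), (?P<med>[\d.]+ \w+), "
    r"(?P<max>[\d.]+ \w+) \(stage (?P<stage>[\d.]+): task (?P<task>\d+)\)\)"
)

# Assumption: only "ms" and "s" need handling for the durations on this page.
UNIT_TO_MS = {"ms": 1.0, "s": 1000.0}


def to_ms(text):
    """Convert a string such as '25.8 s' or '4 ms' to milliseconds."""
    value, unit = text.split()
    return float(value) * UNIT_TO_MS[unit]


def parse_metric(summary):
    """Parse one duration summary into milliseconds plus stage and task IDs."""
    match = METRIC_RE.search(summary)
    if match is None:
        raise ValueError(f"unrecognized metric summary: {summary!r}")
    return {
        "total_ms": to_ms(match["total"]),
        "min_ms": to_ms(match["min"]),
        "med_ms": to_ms(match["med"]),
        "max_ms": to_ms(match["max"]),
        "stage": match["stage"],
        "task": int(match["task"]),
    }


# Duration reported for WholeStageCodegen (3) in the graph above.
summary = "25.8 s (4 ms, 4 ms, 25.8 s (stage 967.0: task 932))"
metric = parse_metric(summary)
skew = metric["max_ms"] / metric["med_ms"]  # ratio of slowest task to median
print(metric["stage"], metric["task"], skew)
```

A large max-to-median ratio attributed to a single task, as with task 932 here, indicates that one task accounts for nearly all of the reported duration.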

AdaptiveSparkPlan isFinalPlan=true

HashAggregate(keys=[], functions=[count(1)])

WholeStageCodegen (4)

Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=14050]

HashAggregate(keys=[], functions=[partial_count(1)])

Project

WholeStageCodegen (3)

Filter (get_json_object(COL_9F4C7B82_8EA5_42B7_8724_EDC3D750C2D3#89374, $.term_number_in_text) <=> 1)

Generate explode(COL_9E6B0BF3_343E_49F2_87DB_EE022165520A#89354), false, [COL_9F4C7B82_8EA5_42B7_8724_EDC3D750C2D3#89374]

WholeStageCodegen (2)

Project [from_json(ArrayType(StringType,false), to_json(str_to_words(str_replace_regex(str_replace_regex(BODY_3253#89006, <br\s*\/?>, ), <[^<>]+>, )), Some(Etc/UTC)), Some(Etc/UTC)) AS COL_9E6B0BF3_343E_49F2_87DB_EE022165520A#89354]

Filter (((DOCUMENT_ID_3241#89009 <=> 8BE75A8015FDF0D67EA8C8C6A4008D75E25BAEEF) AND (size(from_json(ArrayType(StringType,false), to_json(str_to_words(str_replace_regex(str_replace_regex(BODY_3253#89006, <br\s*\/?>, ), <[^<>]+>, )), Some(Etc/UTC)), Some(Etc/UTC)), true) > 0)) AND isnotnull(from_json(ArrayType(StringType,false), to_json(str_to_words(str_replace_regex(str_replace_regex(BODY_3253#89006, <br\s*\/?>, ), <[^<>]+>, )), Some(Etc/UTC)), Some(Etc/UTC))))

ColumnarToRow

WholeStageCodegen (1)

FileScan parquet [BODY_3253#89006,DOCUMENT_ID_3241#89009] Batched: true, DataFilters: [(DOCUMENT_ID_3241#89009 <=> 8BE75A8015FDF0D67EA8C8C6A4008D75E25BAEEF), (size(from_json(ArrayType..., Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/data/output/cache/parquet/uet/DOCUMENT_3240], PartitionFilters: [], PushedFilters: [EqualNullSafe(DOCUMENT_ID_3241,8BE75A8015FDF0D67EA8C8C6A4008D75E25BAEEF)], ReadSchema: struct<BODY_3253:string,DOCUMENT_ID_3241:string>

Details

== Physical Plan ==
AdaptiveSparkPlan (20)
+- == Final Plan ==
   * HashAggregate (11)
   +- ShuffleQueryStage (10), Statistics(sizeInBytes=48.0 B, rowCount=3)
      +- Exchange (9)
         +- * HashAggregate (8)
            +- * Project (7)
               +- Filter (6)
                  +- * Generate (5)
                     +- Project (4)
                        +- Filter (3)
                           +- * ColumnarToRow (2)
                              +- Scan parquet  (1)
+- == Initial Plan ==
   HashAggregate (19)
   +- Exchange (18)
      +- HashAggregate (17)
         +- Project (16)
            +- Filter (15)
               +- Generate (14)
                  +- Project (13)
                     +- Filter (12)
                        +- Scan parquet  (1)


(1) Scan parquet 
Output [2]: [BODY_3253#89006, DOCUMENT_ID_3241#89009]
Batched: true
Location: InMemoryFileIndex [file:/data/output/cache/parquet/uet/DOCUMENT_3240]
PushedFilters: [EqualNullSafe(DOCUMENT_ID_3241,8BE75A8015FDF0D67EA8C8C6A4008D75E25BAEEF)]
ReadSchema: struct<BODY_3253:string,DOCUMENT_ID_3241:string>

(2) ColumnarToRow [codegen id : 1]
Input [2]: [BODY_3253#89006, DOCUMENT_ID_3241#89009]

(3) Filter
Input [2]: [BODY_3253#89006, DOCUMENT_ID_3241#89009]
Condition : (((DOCUMENT_ID_3241#89009 <=> 8BE75A8015FDF0D67EA8C8C6A4008D75E25BAEEF) AND (size(from_json(ArrayType(StringType,false), to_json(str_to_words(str_replace_regex(str_replace_regex(BODY_3253#89006, <br\s*\/?>, ), <[^<>]+>, )), Some(Etc/UTC)), Some(Etc/UTC)), true) > 0)) AND isnotnull(from_json(ArrayType(StringType,false), to_json(str_to_words(str_replace_regex(str_replace_regex(BODY_3253#89006, <br\s*\/?>, ), <[^<>]+>, )), Some(Etc/UTC)), Some(Etc/UTC))))

(4) Project
Output [1]: [from_json(ArrayType(StringType,false), to_json(str_to_words(str_replace_regex(str_replace_regex(BODY_3253#89006, <br\s*\/?>, ), <[^<>]+>, )), Some(Etc/UTC)), Some(Etc/UTC)) AS COL_9E6B0BF3_343E_49F2_87DB_EE022165520A#89354]
Input [2]: [BODY_3253#89006, DOCUMENT_ID_3241#89009]

(5) Generate [codegen id : 2]
Input [1]: [COL_9E6B0BF3_343E_49F2_87DB_EE022165520A#89354]
Arguments: explode(COL_9E6B0BF3_343E_49F2_87DB_EE022165520A#89354), false, [COL_9F4C7B82_8EA5_42B7_8724_EDC3D750C2D3#89374]

(6) Filter
Input [1]: [COL_9F4C7B82_8EA5_42B7_8724_EDC3D750C2D3#89374]
Condition : (get_json_object(COL_9F4C7B82_8EA5_42B7_8724_EDC3D750C2D3#89374, $.term_number_in_text) <=> 1)

(7) Project [codegen id : 3]
Output: []
Input [1]: [COL_9F4C7B82_8EA5_42B7_8724_EDC3D750C2D3#89374]

(8) HashAggregate [codegen id : 3]
Input: []
Keys: []
Functions [1]: [partial_count(1)]
Aggregate Attributes [1]: [count#89875L]
Results [1]: [count#89876L]

(9) Exchange
Input [1]: [count#89876L]
Arguments: SinglePartition, ENSURE_REQUIREMENTS, [plan_id=14050]

(10) ShuffleQueryStage
Output [1]: [count#89876L]
Arguments: 0

(11) HashAggregate [codegen id : 4]
Input [1]: [count#89876L]
Keys: []
Functions [1]: [count(1)]
Aggregate Attributes [1]: [count(1)#89872L]
Results [1]: [count(1)#89872L AS count#89873L]

(12) Filter
Input [2]: [BODY_3253#89006, DOCUMENT_ID_3241#89009]
Condition : (((DOCUMENT_ID_3241#89009 <=> 8BE75A8015FDF0D67EA8C8C6A4008D75E25BAEEF) AND (size(from_json(ArrayType(StringType,false), to_json(str_to_words(str_replace_regex(str_replace_regex(BODY_3253#89006, <br\s*\/?>, ), <[^<>]+>, )), Some(Etc/UTC)), Some(Etc/UTC)), true) > 0)) AND isnotnull(from_json(ArrayType(StringType,false), to_json(str_to_words(str_replace_regex(str_replace_regex(BODY_3253#89006, <br\s*\/?>, ), <[^<>]+>, )), Some(Etc/UTC)), Some(Etc/UTC))))

(13) Project
Output [1]: [from_json(ArrayType(StringType,false), to_json(str_to_words(str_replace_regex(str_replace_regex(BODY_3253#89006, <br\s*\/?>, ), <[^<>]+>, )), Some(Etc/UTC)), Some(Etc/UTC)) AS COL_9E6B0BF3_343E_49F2_87DB_EE022165520A#89354]
Input [2]: [BODY_3253#89006, DOCUMENT_ID_3241#89009]

(14) Generate
Input [1]: [COL_9E6B0BF3_343E_49F2_87DB_EE022165520A#89354]
Arguments: explode(COL_9E6B0BF3_343E_49F2_87DB_EE022165520A#89354), false, [COL_9F4C7B82_8EA5_42B7_8724_EDC3D750C2D3#89374]

(15) Filter
Input [1]: [COL_9F4C7B82_8EA5_42B7_8724_EDC3D750C2D3#89374]
Condition : (get_json_object(COL_9F4C7B82_8EA5_42B7_8724_EDC3D750C2D3#89374, $.term_number_in_text) <=> 1)

(16) Project
Output: []
Input [1]: [COL_9F4C7B82_8EA5_42B7_8724_EDC3D750C2D3#89374]

(17) HashAggregate
Input: []
Keys: []
Functions [1]: [partial_count(1)]
Aggregate Attributes [1]: [count#89875L]
Results [1]: [count#89876L]

(18) Exchange
Input [1]: [count#89876L]
Arguments: SinglePartition, ENSURE_REQUIREMENTS, [plan_id=14014]

(19) HashAggregate
Input [1]: [count#89876L]
Keys: []
Functions [1]: [count(1)]
Aggregate Attributes [1]: [count(1)#89872L]
Results [1]: [count(1)#89872L AS count#89873L]

(20) AdaptiveSparkPlan
Output [1]: [count#89873L]
Arguments: isFinalPlan=true
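
The filter conditions in the plan above use the `<=>` operator, which also appears as `EqualNullSafe` in PushedFilters. As a minimal sketch of its semantics in plain Python (with `None` standing in for SQL NULL, and the hypothetical helper names `null_safe_eq` and `sql_eq`), compared with ordinary `=`:

```python
def null_safe_eq(a, b):
    """Emulate SQL `a <=> b`: always returns True or False, never NULL."""
    if a is None or b is None:
        # Two NULLs compare as equal; NULL versus a value compares as not equal.
        return a is None and b is None
    return a == b


def sql_eq(a, b):
    """Emulate SQL `a = b`: the result is NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


print(null_safe_eq(None, None))  # both sides NULL
print(null_safe_eq(None, "1"))   # one side NULL
print(sql_eq(None, "1"))         # ordinary equality propagates NULL
```

Because `<=>` never yields NULL, a row whose compared value is NULL is rejected by the filter with a definite false rather than an unknown result.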

SQL / DataFrame Properties

Name	Value
spark.sql.optimizer.nestedPredicatePushdown.supportedFileSources	parquet,orc,geoparquet