SchemaMapping.map_column_statistics produces mismatched column statistics #19096

@yjerry-fortinet

Describe the bug

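The test case below is slightly modified from "test_schema_mapping_map_statistics_basic": the table schema contains only column a, while the file schema contains (b, a). The mapped statistics for table column a come back with the null count of file column b instead of that of file column a.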

To Reproduce

use arrow::datatypes::{DataType, Field};
use arrow_schema::Schema;
use datafusion::datasource::schema_adapter::DefaultSchemaAdapterFactory;
use datafusion_common::{ColumnStatistics, Statistics, stats::Precision};
use std::sync::Arc;

#[test]
fn test_schema_mapping_map_statistics_error_case() {
    // Create table schema (a)
    let table_schema = Arc::new(Schema::new(vec![Field::new("a", DataType::Int32, true)]));

    // Create file schema (b, a)
    let file_schema = Schema::new(vec![
        Field::new("b", DataType::Utf8, true),
        Field::new("a", DataType::Int32, true),
    ]);

    // Statistics for column b (index 0 in file)
    let b_stats = ColumnStatistics {
        null_count: Precision::Exact(5),
        ..Default::default()
    };

    // Statistics for column a (index 1 in file)
    let a_stats = ColumnStatistics {
        null_count: Precision::Exact(10),
        ..Default::default()
    };

    // Create default SchemaAdapter
    let adapter = DefaultSchemaAdapterFactory::from_schema(table_schema);
    // Get mapper and projection
    let (mapper, projection) = adapter.map_schema(&file_schema).unwrap();
    // Should project only column 1 (a) from the file
    assert_eq!(projection, vec![1]);

    // Create file statistics
    let mut file_stats = Statistics::default();
    file_stats.column_statistics = vec![b_stats, a_stats];
    // Map statistics
    let table_col_stats = mapper
        .map_column_statistics(&file_stats.column_statistics)
        .unwrap();

    let expect_col_a = Precision::Exact(10); // a from file idx 1

    assert_eq!(table_col_stats[0].null_count, expect_col_a);
}

Expected behavior

The test case should pass: table column a should receive the statistics of file column a (null_count = Exact(10)).

Actual Behavior

running 1 test
test test_schema_mapping_map_statistics_error_case ... FAILED

failures:

---- test_schema_mapping_map_statistics_error_case stdout ----

thread 'test_schema_mapping_map_statistics_error_case' (193595) panicked at src/main.rs:56:5:
assertion `left == right` failed
  left: Exact(5)
 right: Exact(10)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    test_schema_mapping_map_statistics_error_case

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

error: test failed, to rerun pass `--bin demo`
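
The `left: Exact(5)` in the output is the null count of file column b, which sits at index 0 of the file statistics. One possible reading, not verified against the DataFusion source, is that the mapper stores each table column's position within the projected columns (0 for a) and then uses that position directly as an index into the unprojected file statistics. The sketch below reproduces that index arithmetic with plain vectors. It does not use any DataFusion types, and the names `build_mapping`, `lookup_by_projected_position`, and `lookup_by_file_index` are hypothetical helpers written for illustration only.

```rust
/// Builds, for file columns in file order, the projection (file indices to read)
/// and, per table column, the position of its match within that projection.
/// Hypothetical helper; assumed to mirror what `map_schema` returns conceptually.
fn build_mapping(table_cols: &[&str], file_cols: &[&str]) -> (Vec<Option<usize>>, Vec<usize>) {
    let mut projection = Vec::new();
    let mut field_mappings = vec![None; table_cols.len()];
    for (file_idx, file_col) in file_cols.iter().enumerate() {
        if let Some(table_idx) = table_cols.iter().position(|t| t == file_col) {
            // Position within the projected columns, not within the file schema.
            field_mappings[table_idx] = Some(projection.len());
            projection.push(file_idx);
        }
    }
    (field_mappings, projection)
}

/// Uses the projected position directly as an index into the full file statistics.
/// This yields the wrong column whenever the projection skips earlier file columns.
fn lookup_by_projected_position(
    field_mappings: &[Option<usize>],
    file_null_counts: &[usize],
) -> Vec<Option<usize>> {
    field_mappings
        .iter()
        .map(|m| m.and_then(|pos| file_null_counts.get(pos).copied()))
        .collect()
}

/// Translates the projected position back to a file index before the lookup.
fn lookup_by_file_index(
    field_mappings: &[Option<usize>],
    projection: &[usize],
    file_null_counts: &[usize],
) -> Vec<Option<usize>> {
    field_mappings
        .iter()
        .map(|m| {
            m.and_then(|pos| projection.get(pos))
                .and_then(|&file_idx| file_null_counts.get(file_idx).copied())
        })
        .collect()
}
```

With table columns `["a"]`, file columns `["b", "a"]`, and file null counts `[5, 10]`, the first lookup returns the value for b and the second returns the value for a, which matches the `Exact(5)` versus `Exact(10)` mismatch in the failing assertion. If this reading is right, either the mapper would need to go through the projection, or callers would need to pass statistics that are already projected.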
