
select is not consistent with dplyr::select when used on data.frame with duplicate column names #92

@TimTeaFan


I was playing around with data.frames with duplicate column names and stumbled upon this inconsistency with {dplyr}:

library(dplyr)
dat <- data.frame(a = 1, b = 2, a = 3, check.names = FALSE) 

dat %>% poorman::select(a)
#>   a
#> 1 1

dat %>% dplyr::select(a)
#> Error: Names must be unique.
#> x These names are duplicated:
#>   * "a" at locations 1 and 2.

Created on 2021-05-24 by the reprex package (v0.3.0)

The question is: is {poorman} supposed to be 100% consistent with {dplyr}?

If so, poorman::select should throw an error as well.

On the other hand, {poorman}, unlike {dplyr}, might not be bound in the same way to the concept of tidy data, and it would be nice to have a go-to package for working with untidy data.frames. In that case, both a columns should be selected.
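The two possible behaviors could look roughly like this in base R. This is only a minimal sketch: select_all_named() and select_strict() are hypothetical helpers, not part of {poorman} or {dplyr}, and each handles a single column name. Note that base `[` on a data.frame may make duplicated names unique, so the first helper restores the original names explicitly.

```r
# Hypothetical helpers illustrating the two options;
# neither function exists in {poorman} or {dplyr}.
dat <- data.frame(a = 1, b = 2, a = 3, check.names = FALSE)

# Option 1: keep every column called `name`, duplicates included.
select_all_named <- function(.data, name) {
  keep <- names(.data) == name
  out <- .data[keep]                # `[` may make duplicated names unique
  names(out) <- names(.data)[keep]  # so restore the original names
  out
}

# Option 2: refuse data frames with duplicated names, like {dplyr}.
select_strict <- function(.data, name) {
  dup <- anyDuplicated(names(.data))  # index of the first repeat, or 0
  if (dup > 0) {
    stop("Names must be unique: \"", names(.data)[dup], "\" is duplicated.")
  }
  .data[name]
}

names(select_all_named(dat, "a"))  # "a" "a"
try(select_strict(dat, "a"))       # signals an error: "a" is duplicated
```

Option 1 fits the "untidy data" use case; option 2 matches {dplyr}. Which one {poorman} should follow is exactly the open question.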

The behavior of mutate differs as well:

dat %>% poorman::mutate(c = 4)
#>   a b a.1 c
#> 1 1 2   3 4

dat %>% dplyr::mutate(c = 4)
#> Error: Can't transform a data frame with duplicate names.

Created on 2021-05-24 by the reprex package (v0.3.0)

It seems like mutate automatically uses check.names = TRUE and silently renames the duplicated column. An error might be preferable here (or, alternatively, the column names could be left untouched).
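The silent rename matches what base R's data.frame() does with its default check.names = TRUE. Whether {poorman} actually goes through data.frame() in mutate is an assumption; the sketch below only contrasts base R calls that rename the duplicate with calls that leave the names untouched.

```r
# Assumption: this mirrors the observed mutate output; it is not
# taken from {poorman}'s source code.
dat <- data.frame(a = 1, b = 2, a = 3, check.names = FALSE)

# data.frame() defaults to check.names = TRUE, which applies
# make.names(unique = TRUE) and turns the second "a" into "a.1".
renamed <- data.frame(dat, c = 4)
names(renamed)  # "a" "b" "a.1" "c"

# Turning the check off keeps the duplicated names as they are.
unchecked <- data.frame(dat, c = 4, check.names = FALSE)
names(unchecked)  # "a" "b" "a" "c"

# Assigning a new column with `$<-` also leaves existing names alone.
kept <- dat
kept$c <- 4
names(kept)  # "a" "b" "a" "c"
```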

I didn't consider this to be a "bug", so I opened a blank issue.
