In tests it could be ok to have copy/paste code but I believe that in common estimator checks we could remove the boilerplate of datasets generation for each test by creating a common fixture. Apart from that we could reduce the number of instances of the dataset(s) in order to speed up the execution of the commont tests. The later could be measured, though.