Strange behavior with select in dplyr

I'm running into strange behavior with dplyr's select function. It does not drop a variable from the data frame.

Here is the original data:

orig <- structure(list(park = structure(c(4L, 4L, 4L, 4L, 4L), .Label = c("miss", 
"piro", "sacn", "slbe"), class = "factor"), year = c(2006L, 2009L, 
2006L, 2008L, 2009L), agent = structure(c(5L, 5L, 5L, 7L, 5L), .Label = c("agriculture", 
"beaver", "development", "flooding", "forest_pathogen", "harvest_00_20", 
"harvest_30_60", "harvest_70_90", "none"), class = "factor"), 
    ha = c(4.32, 1.17, 3.51, 2.07, 9.18), loc_01 = structure(c(9L, 
    5L, 9L, 5L, 5L), .Label = c("miss", "non_miss", "non_piro", 
    "non_sacn", "non_slbe", "none", "piro", "sacn", "slbe"), class = "factor"), 
    loc_02 = structure(c(5L, 1L, 5L, 1L, 1L), .Label = c("none", 
    "piro_core", "piro_ibz", "slbe_mainland", "slbe_southmanitou" 
    ), class = "factor"), loc_03 = structure(c(1L, 1L, 1L, 1L, 
    1L), .Label = "none", class = "factor"), cross_valid = c(1L, 
    1L, 1L, 1L, 1L)), .Names = c("park", "year", "agent", "ha", 
"loc_01", "loc_02", "loc_03", "cross_valid"), row.names = c(NA, 
5L), class = "data.frame")

It looks like this:

> orig 
    park year   agent ha loc_01   loc_02 loc_03 cross_valid 
1 slbe 2006 forest_pathogen 4.32  slbe slbe_southmanitou none   1 
2 slbe 2009 forest_pathogen 1.17 non_slbe    none none   1 
3 slbe 2006 forest_pathogen 3.51  slbe slbe_southmanitou none   1 
4 slbe 2008 harvest_30_60 2.07 non_slbe    none none   1 
5 slbe 2009 forest_pathogen 9.18 non_slbe    none none   1 
> str(orig) 
'data.frame': 5 obs. of 8 variables: 
$ park  : Factor w/ 4 levels "miss","piro",..: 4 4 4 4 4 
$ year  : int 2006 2009 2006 2008 2009 
$ agent  : Factor w/ 9 levels "agriculture",..: 5 5 5 7 5 
$ ha   : num 4.32 1.17 3.51 2.07 9.18 
$ loc_01  : Factor w/ 9 levels "miss","non_miss",..: 9 5 9 5 5 
$ loc_02  : Factor w/ 5 levels "none","piro_core",..: 5 1 5 1 1 
$ loc_03  : Factor w/ 1 level "none": 1 1 1 1 1 
$ cross_valid: int 1 1 1 1 1

Then I do a bit of summarizing ...

    library(dplyr) 
    summ <- orig %>% 
        group_by(park, cross_valid, agent) %>% 
        summarise(ha_dist = sum(ha)) 
    summ 
    Source: local data frame [2 x 4] 
    Groups: park, cross_valid 

     park cross_valid   agent ha_dist 
    1 slbe   1 forest_pathogen 18.18 
    2 slbe   1 harvest_30_60 2.07 
    str(summ) 
    Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 4 variables: 
    $ park  : Factor w/ 4 levels "miss","piro",..: 4 4 
    $ cross_valid: int 1 1 
    $ agent  : Factor w/ 9 levels "agriculture",..: 5 7 
    $ ha_dist : num 18.18 2.07 
    - attr(*, "vars")=List of 2 
     ..$ : symbol park 
     ..$ : symbol cross_valid 
    - attr(*, "drop")= logi TRUE

Then I try to drop 'cross_valid' ...

sel <- select (summ,-cross_valid) 
summ 
Source: local data frame [2 x 4] 
Groups: park, cross_valid 

    park cross_valid   agent ha_dist 
1 slbe   1 forest_pathogen 18.18 
2 slbe   1 harvest_30_60 2.07 
str(summ) 
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 4 variables: 
$ park  : Factor w/ 4 levels "miss","piro",..: 4 4 
$ cross_valid: int 1 1 
$ agent  : Factor w/ 9 levels "agriculture",..: 5 7 
$ ha_dist : num 18.18 2.07 
- attr(*, "vars")=List of 2 
    ..$ : symbol park 
    ..$ : symbol cross_valid 
- attr(*, "drop")= logi TRUE 
- attr(*, "indices")=List of 1 
    ..$ : int 0 1 
- attr(*, "group_sizes")= int 2 
- attr(*, "biggest_group_size")= int 2 
- attr(*, "labels")='data.frame': 1 obs. of 2 variables: 
    ..$ park  : Factor w/ 4 levels "miss","piro",..: 4 
    ..$ cross_valid: int 1 
    ..- attr(*, "vars")=List of 2 
    .. ..$ : symbol park 
    .. ..$ : symbol cross_valid

And it won't drop summ$cross_valid.

If I use base R to drop cross_valid, it works ...

base.sel <- summ[-2] 
base.sel 
Source: local data frame [2 x 3] 
Groups: 

    park   agent ha_dist 
1 slbe forest_pathogen 18.18 
2 slbe harvest_30_60 2.07

I can also drop orig$cross_valid using select ...

drop.orig <- select (orig,-cross_valid) 
drop.orig 
    park year   agent ha loc_01   loc_02 loc_03 
1 slbe 2006 forest_pathogen 4.32  slbe slbe_southmanitou none 
2 slbe 2009 forest_pathogen 1.17 non_slbe    none none 
3 slbe 2006 forest_pathogen 3.51  slbe slbe_southmanitou none 
4 slbe 2008 harvest_30_60 2.07 non_slbe    none none 
5 slbe 2009 forest_pathogen 9.18 non_slbe    none none

Since I can drop the variable with base R, this isn't a big deal, but I thought there might be a glitch in dplyr. It may have something to do with the structure of the variable, but I can't tell what that would be.

Thanks.

-cherrytree

Source

2014-09-08 cherrytree

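Try ungroup(). summ is still grouped by park and cross_valid (see groups(summ) below), and select() does not drop grouping variables: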

summ%>% 
ungroup() %>% 
select(-cross_valid) 
# park   agent ha_dist 
#1 slbe forest_pathogen 18.18 
#2 slbe harvest_30_60 2.07 



groups(summ) 
#[[1]] 
#park 

#[[2]] 
#cross_valid
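
A minimal sketch of why this works, under some assumptions: `toy`, `summ_toy`, `kept`, and `dropped` are made-up names on a small stand-in data frame (not the asker's data), and `group_vars()` assumes a reasonably recent dplyr. summarise() peels off only the last grouping variable (agent), so the result is still grouped by park and cross_valid, and select() keeps grouping columns.

```r
library(dplyr)

# Hypothetical toy data standing in for `orig`
toy <- data.frame(
  park = c("slbe", "slbe", "slbe"),
  cross_valid = c(1L, 1L, 1L),
  agent = c("forest_pathogen", "harvest_30_60", "forest_pathogen"),
  ha = c(4.32, 2.07, 1.17)
)

summ_toy <- toy %>%
  group_by(park, cross_valid, agent) %>%
  summarise(ha_dist = sum(ha))

# Only the last grouping level (agent) was dropped by summarise()
group_vars(summ_toy)

# select() on grouped data keeps the grouping columns,
# so cross_valid is still present here
kept <- select(summ_toy, -cross_valid)
names(kept)

# After ungroup(), the column can be dropped as expected
dropped <- summ_toy %>%
  ungroup() %>%
  select(-cross_valid)
names(dropped)
```

Comparing `names(kept)` with `names(dropped)` should make the difference visible: the first still contains cross_valid, the second does not.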

Source

2014-09-08 16:08:04 akrun

Yep, you can't '-select' a grouping variable. – Ajar

Thanks @akrun. I didn't know you can't drop a grouping variable ... very interesting and good to know. – cherrytree

@cherrytree No problem. Glad it helped. – akrun
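
As a possible alternative to calling ungroup() afterwards, here is a hedged sketch that assumes dplyr 1.0.0 or later, where summarise() accepts a `.groups` argument (this did not exist when the question was asked). The data frame `toy2` and the result `flat` are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical toy data, not the asker's `orig`
toy2 <- data.frame(
  park = c("slbe", "slbe"),
  cross_valid = c(1L, 1L),
  agent = c("forest_pathogen", "harvest_30_60"),
  ha = c(18.18, 2.07)
)

# .groups = "drop" returns an ungrouped result, so no grouping
# variable is left over to block select()
flat <- toy2 %>%
  group_by(park, cross_valid, agent) %>%
  summarise(ha_dist = sum(ha), .groups = "drop") %>%
  select(-cross_valid)

names(flat)
```

On older dplyr versions without `.groups`, the ungroup() approach from the answer remains the way to go.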
