Returning a DataFrame from pandas.Series.apply

Decision tree hyperparameters

grid_search_scores.py

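# Note: sklearn.cross_validation, sklearn.grid_search and grid_scores_ were
# removed in scikit-learn 0.20, so this code needs an older release.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.cross_validation import KFold
from sklearn.grid_search import GridSearchCV


# X (features) and y (labels) are assumed to be loaded already
clf = DecisionTreeClassifier(criterion='entropy', max_depth=2, min_samples_leaf=2)

# Search every combination of tree depth and minimum leaf size
param_grid = {'max_depth': [2, 3, 4, 5], 'min_samples_leaf': [2, 3, 4, 5]}
cv = KFold(len(y), 5, shuffle=True, random_state=0)

grid_search = GridSearchCV(clf, param_grid, cv=cv, n_jobs=-1, verbose=1)
grid_search.fit(X, y)

grid_search.best_score_, grid_search.best_params_
grid_search.grid_scores_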

[mean: 0.76880, std: 0.02381, params: {'max_depth': 2, 'min_samples_leaf': 2},
 mean: 0.76880, std: 0.02381, params: {'max_depth': 2, 'min_samples_leaf': 3},
 mean: 0.76880, std: 0.02381, params: {'max_depth': 2, 'min_samples_leaf': 4},
 mean: 0.76880, std: 0.02381, params: {'max_depth': 2, 'min_samples_leaf': 5},
 mean: 0.80471, std: 0.01474, params: {'max_depth': 3, 'min_samples_leaf': 2},
...
]

Returning a DataFrame from pandas.Series.apply

Whenever I see a list of dicts, I want to wrap it in a DataFrame to make it easier to read, so I tried it.

Problem

The data is not flat: the parameters column holds dicts, which makes it hard to read.
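
To make the problem concrete, here is a minimal sketch that does not need scikit-learn. The `records` list is a hypothetical stand-in for `grid_search.grid_scores_`, reusing two rows from the output above; only the nested-dict shape matters.

```python
import pandas as pd

# Hypothetical stand-in for grid_search.grid_scores_:
# every record carries its parameters as a nested dict.
records = [
    {"mean_validation_score": 0.76880,
     "parameters": {"max_depth": 2, "min_samples_leaf": 2}},
    {"mean_validation_score": 0.80471,
     "parameters": {"max_depth": 3, "min_samples_leaf": 2}},
]

df = pd.DataFrame(records)

# Each cell of the "parameters" column is still a whole dict,
# so the individual parameters cannot be sorted or filtered as columns.
first_cell = df.loc[0, "parameters"]
print(type(first_cell))
print(df)
```

What I want instead is one column per parameter name.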

How I got to the solution

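  • The Series.apply documentation did not say anything in particular about this.
  • The obvious Series-to-DataFrame case is pandas.Series.str.split("sep", expand=True). I tried reading its implementation, but my grasp of the pandas core was too weak and I gave up.
  • A pandas committer's blog said that returning a Series from the function passed to apply works well.
  • Reading the code for apply, I found a place where, when the function returns a Series, the results are wrapped in a DataFrame and returned.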
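
The two behaviours from the list above can be checked on toy data. This is only a sketch: the Series contents and the variable names `params`, `expanded`, `s` and `parts` are made up for illustration.

```python
import pandas as pd

# A Series whose elements are dicts, like the parameters column
params = pd.Series([
    {"max_depth": 2, "min_samples_leaf": 2},
    {"max_depth": 3, "min_samples_leaf": 4},
])

# When the applied function returns a Series, apply builds a DataFrame:
# the index of each returned Series becomes the column labels.
expanded = params.apply(lambda p: pd.Series(p))
print(expanded)

# The other Series-to-DataFrame case mentioned above:
# str.split with expand=True spreads the pieces over columns.
s = pd.Series(["a,b", "c,d"])
parts = s.str.split(",", expand=True)
print(parts)
```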
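
# Wrap the grid_scores_ entries in a DataFrame
gs_df = pd.DataFrame(grid_search.grid_scores_)
#pd.DataFrame(gs_df.parameters)
# Returning a Series from the function makes apply produce a DataFrame
# with one column per parameter name
params_df = gs_df.parameters.apply(
    lambda p: pd.Series(list(p.values()), p.keys()))
# Without list(), the values come back as a tuple
# Replace the dict column with the flattened parameter columns
pd.concat([
    gs_df.drop(["parameters", "cv_validation_scores"], axis=1),
    params_df
], axis=1).head()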

   mean_validation_score  max_depth  min_samples_leaf
0               0.768799          2                 2
1               0.768799          2                 3
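
For anyone who wants to try the flattening step without an old scikit-learn, here is a rough self-contained sketch. The `Score` namedtuple and the `cv_validation_scores` values are placeholders I made up to mimic the shape of `grid_scores_`; the field names are taken from the columns used above. It also compares the result with an alternative that is not part of the original approach: building the DataFrame directly from the list of dicts.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for the entries of grid_search.grid_scores_
Score = namedtuple(
    "Score", ["parameters", "mean_validation_score", "cv_validation_scores"])

grid_scores = [
    # cv_validation_scores values are placeholders, not real results
    Score({"max_depth": 2, "min_samples_leaf": 2}, 0.768799, [0.0, 0.0]),
    Score({"max_depth": 2, "min_samples_leaf": 3}, 0.768799, [0.0, 0.0]),
]

gs_df = pd.DataFrame(grid_scores, columns=Score._fields)

# Same idea as above: return a Series so that apply yields a DataFrame
params_df = gs_df["parameters"].apply(
    lambda p: pd.Series(list(p.values()), index=list(p.keys())))

flat = pd.concat([
    gs_df.drop(["parameters", "cv_validation_scores"], axis=1),
    params_df,
], axis=1)
print(flat)

# Alternative (an assumption on my part, not from the original):
# construct the parameter columns straight from the list of dicts.
alt_params_df = pd.DataFrame(gs_df["parameters"].tolist(), index=gs_df.index)
```

On this toy data both approaches give the same parameter columns. The apply version stays closer to what the pandas source does internally, while the `tolist()` version avoids creating one Series per row.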

References

  • https://github.com/pydata/pandas/blob/e1aa2d94b416ee31da705b186facc707710671e6/pandas/core/strings.py#L1365
  • https://github.com/pydata/pandas/blob/5e11243a11cf09007f774b3605e32ee8aa3f9592/pandas/core/series.py#L2197
  • http://sinhrks.hatenablog.com/entry/2015/06/18/221747