1 Reply Latest reply on Dec 19, 2017 8:17 PM by Ajit Kumar Pookalangara

# Beginner: How to evaluate the values returned by the CV function?

For example, I use this function:

cross_val = cross_val_score(KNN, train_x, train_y, cv=4, scoring='neg_mean_squared_error')

It then returns 4 values:

-0.19076923, -0.17384615, -0.14153846, -0.15846154

How do I know whether these values are good enough, or whether I still need to improve the model?

Does it depend on cross_val.mean() / Y_data?

###### 1. Re: Beginner: How to evaluate the values returned by the CV function?

CV estimates how well a model generalizes to unseen data, so it is typically used for tuning the hyperparameters of the algorithm being used (e.g. K in KNN) or for model selection. We choose the parameters that generalize best, which for KNN means the best value of K. You can use CV to find this best K.

Run KNN for a range of values of K (say 1 to 10) and compare the validation scores.
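For the scores in the question: with scoring='neg_mean_squared_error', scikit-learn reports the MSE with its sign flipped so that a higher score is always better, which is why all four values are negative. Below is a minimal sketch of turning them into an RMSE, which is in the same units as train_y and can therefore be compared against the scale or spread of the targets. The variable names are only illustrative.

```python
import math

# The four fold scores quoted in the question (negated MSE, one per fold).
fold_scores = [-0.19076923, -0.17384615, -0.14153846, -0.15846154]

# Flip the sign to recover ordinary, non-negative MSE values.
mse_per_fold = [-s for s in fold_scores]

# Average over the folds, then take the square root to get an RMSE
# expressed in the same units as the target (train_y).
mean_mse = sum(mse_per_fold) / len(mse_per_fold)
rmse = math.sqrt(mean_mse)

print(f"mean MSE: {mean_mse:.4f}")
print(f"RMSE:     {rmse:.4f}")
```

Whether the resulting RMSE is acceptable depends on the range of train_y. There is no universal threshold, so the number is mainly useful for comparing models or parameter settings on the same data.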

Model Selection:

Compare how different models perform by calculating the mean of their CV scores:

print(cross_val_score(knn, X, y, cv=10, scoring='accuracy').mean())

say 0.97

print(cross_val_score(logreg, X, y, cv=10, scoring='accuracy').mean())

say 0.93

Since 0.97 > 0.93, we can conclude that KNN performs better than logistic regression on this data.
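Here is a rough, self-contained sketch of the comparison above. The iris dataset, n_neighbors=5 and max_iter=1000 are assumptions chosen only to make the example runnable; they are not from the original post. It also prints the standard deviation across folds, since a small gap between two means can be within fold-to-fold noise.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): replace with your own X and y.
X, y = load_iris(return_X_y=True)

# Candidate models; the hyperparameter values are illustrative only.
models = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "logreg": LogisticRegression(max_iter=1000),
}

mean_scores = {}
for name, model in models.items():
    # 10-fold cross-validated accuracy for this model.
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")

# Pick the model with the highest mean accuracy.
best_model = max(mean_scores, key=mean_scores.get)
print("best by mean accuracy:", best_model)
```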

Parameter Selection: finding the best value of K when using KNN

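from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

k_range = range(1, 11)  # candidate K values: 1 to 10
k_scores = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    # 10-fold cross-validated accuracy for this K
    scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')
    k_scores.append(scores.mean())  # use average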

You can plot the K values against the scores and choose the K that gives the best accuracy, or simply print the scores and inspect them.
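Below is a minimal sketch of the "print and inspect" route, again assuming the iris dataset as stand-in data (not from the original post). It runs the loop for K from 1 to 10, prints each mean score, and picks the first K that reaches the best mean accuracy.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): replace with your own X and y.
X, y = load_iris(return_X_y=True)

k_range = range(1, 11)  # K = 1, ..., 10
k_scores = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean 10-fold cross-validated accuracy for this K.
    scores = cross_val_score(knn, X, y, cv=10, scoring="accuracy")
    k_scores.append(scores.mean())

# Print the scores so they can be inspected side by side.
for k, score in zip(k_range, k_scores):
    print(f"K={k:2d}  mean accuracy={score:.3f}")

# Choose the first K that achieves the highest mean accuracy.
best_score = max(k_scores)
best_k = k_range[k_scores.index(best_score)]
print(f"best K: {best_k}")
```

If matplotlib is available, passing list(k_range) and k_scores to matplotlib.pyplot.plot gives the plot of K against accuracy. When several K values tie, the choice among them is a judgment call.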