1 Reply Latest reply on Dec 19, 2017 8:17 PM by Ajit Kumar Pookalangara

    Beginner: How to evaluate the values returned by a CV function?


      For example, I use this function:


      cross_val = cross_val_score(KNN, train_x, train_y, cv=4, scoring='neg_mean_squared_error')


      and it returns 4 values:


      -0.19076923, -0.17384615, -0.14153846, -0.15846154


      How do I know whether these values are good enough, or whether I still need to improve the model?


      Does it depend on cross_val.mean() / Y_data?
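A note on reading the numbers in the question: scikit-learn scorers follow a "greater is better" convention, so 'neg_mean_squared_error' returns the mean squared error with its sign flipped, and values closer to 0 are better. There is no absolute threshold for "good enough"; one common check is to compare the error against a trivial baseline and against the spread of the target, which is close to the cross_val.mean() / Y_data idea. The sketch below assumes a regression setting: the synthetic data from make_regression, KNeighborsRegressor, the DummyRegressor baseline and the helper name cv_rmse are illustrative stand-ins, not the asker's actual setup.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for train_x / train_y (an assumption; use your own data)
train_x, train_y = make_regression(n_samples=200, n_features=2, noise=10.0, random_state=0)


def cv_rmse(model, X, y, cv=4):
    """Hypothetical helper: per-fold RMSE from scikit-learn's negated-MSE scores."""
    neg_mse = cross_val_score(model, X, y, cv=cv, scoring='neg_mean_squared_error')
    # Flip the sign back to get ordinary MSE, then take the square root
    # so the error is in the same units as y
    return np.sqrt(-neg_mse)


knn_rmse = cv_rmse(KNeighborsRegressor(n_neighbors=5), train_x, train_y)
# Baseline that always predicts the mean of y in the training folds
baseline_rmse = cv_rmse(DummyRegressor(strategy='mean'), train_x, train_y)

print("KNN RMSE per fold:     ", knn_rmse)
print("Baseline RMSE per fold:", baseline_rmse)
# Scale-free view: RMSE relative to the standard deviation of the target
print("KNN mean RMSE / std(y):", knn_rmse.mean() / train_y.std())
```

If the model's RMSE is not clearly below the baseline's, or the ratio to std(y) is close to 1, the model is doing little better than predicting the mean.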

        • 1. Re: Beginner: How to evaluate the values returned by a CV function?
          Ajit Kumar Pookalangara

          The goal of CV here is to tune the hyperparameters of the algorithm being used (e.g. KNN) or to select between models. CV estimates how well a model generalizes to unseen data, so we choose the parameters, such as the value of K, that give the best cross-validated score. You can use CV to find this best K.

          Run KNN for a range of K values (say 1 to 10) and compare the validation scores.
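To make "validation scores" concrete: each value returned by cross_val_score is the score on one held-out fold, with the model trained on the remaining folds. This is a minimal sketch, assuming the iris data set and K=5 purely for illustration, that reproduces cross_val_score(..., cv=4) by hand. For a classifier and an integer cv, scikit-learn splits with an unshuffled StratifiedKFold, so the manual loop uses the same splitter.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in data set
knn = KNeighborsClassifier(n_neighbors=5)

folds = StratifiedKFold(n_splits=4)
manual_scores = []
for train_idx, val_idx in folds.split(X, y):
    knn.fit(X[train_idx], y[train_idx])      # train on 3 of the 4 folds
    preds = knn.predict(X[val_idx])          # predict the held-out fold
    manual_scores.append(accuracy_score(y[val_idx], preds))
manual_scores = np.array(manual_scores)

auto_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=4, scoring='accuracy')
print(manual_scores)
print(auto_scores)
```

Both arrays hold one score per fold, which is why the question's call with cv=4 returned four numbers.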


          Model Selection:


          Compare how different models perform by calculating the mean of their cross-validation scores:

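          from sklearn.linear_model import LogisticRegression
          from sklearn.model_selection import cross_val_score
          from sklearn.neighbors import KNeighborsClassifier

          # X, y: your feature matrix and target labels
          knn = KNeighborsClassifier(n_neighbors=5)
          logreg = LogisticRegression()

          print(cross_val_score(knn, X, y, cv=10, scoring='accuracy').mean())

          Say this prints 0.97.

          print(cross_val_score(logreg, X, y, cv=10, scoring='accuracy').mean())

          Say this prints 0.93.

          We can conclude that KNN performs better than logistic regression on this data.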
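A gap such as 0.97 vs. 0.93 is only convincing if it is larger than the fold-to-fold variation. The sketch below assumes the iris data set and these particular hyperparameters (n_neighbors=5, max_iter=1000) as stand-ins; it evaluates both models on the same shuffled, stratified folds and prints the standard deviation alongside the mean.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in data set

# Identical folds for every model, so the comparison is like-for-like
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

models = {
    'knn': KNeighborsClassifier(n_neighbors=5),
    'logreg': LogisticRegression(max_iter=1000),  # higher max_iter avoids convergence warnings
}

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')
    results[name] = scores
    # The spread across folds hints at how much of a gap could be noise
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

If the standard deviations are large relative to the difference in means, the two models may be practically tied.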


          Parameter Selection: Finding the best value of K when using KNN


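          from sklearn.model_selection import cross_val_score
          from sklearn.neighbors import KNeighborsClassifier

          k_range = range(1, 11)  # K = 1, 2, ..., 10 (the end of range() is exclusive)

          k_scores = []


          for k in k_range:

             knn = KNeighborsClassifier(n_neighbors=k)

             scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')

             k_scores.append(scores.mean())  # average accuracy over the 10 folds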


          You can plot the K values against the scores and choose the K that gives the best accuracy, or simply print the scores and compare them.
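A sketch of picking K from the scores, assuming the iris data set as a stand-in for X, y. It takes the argmax of the mean scores, then runs the same search with scikit-learn's GridSearchCV, which automates this loop and stores the best setting; with the same cv=10 the two should normally agree. To plot instead, matplotlib's plt.plot(list(k_range), k_scores) works on the same lists.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in data set

# Manual search over K = 1..10, as in the loop above
k_range = range(1, 11)
k_scores = [
    cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=10, scoring='accuracy').mean()
    for k in k_range
]
# np.argmax returns the first index of the maximum, i.e. the smallest K among ties
best_k = k_range[int(np.argmax(k_scores))]
print("best K (manual):", best_k, "accuracy:", max(k_scores))

# The same search with GridSearchCV
grid = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': list(k_range)}, cv=10, scoring='accuracy')
grid.fit(X, y)
print("best K (GridSearchCV):", grid.best_params_['n_neighbors'], "accuracy:", grid.best_score_)
```

Preferring the smallest K among ties is one choice; some people prefer a larger K among near-equal scores, since it gives a smoother decision boundary.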