How to load datasets in sklearn?

Sklearn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection, model evaluation, and many other utilities.

The library also gives you access to several ready-made datasets, which you can use for practice or for your own experiments.

These are some of the datasets available from sklearn:

  • California housing dataset

  • Covertype dataset

  • Kddcup99 dataset

  • Olivetti faces dataset from AT&T

  • Species distribution dataset from Phillips

  • Iris dataset

To see more datasets, you can go to this page.
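As a rough way to see which dataset helpers your installed version provides, you can list the public names in sklearn.datasets. By convention, most load_* functions read small datasets bundled with the library (a few, such as load_svmlight_file, read files you supply), while fetch_* functions download larger datasets such as California housing or Covertype on first use. The variable names below are illustrative only.

```python
import sklearn.datasets as datasets

# Collect the names of the loader and fetcher functions in the module.
loaders = sorted(name for name in dir(datasets) if name.startswith("load_"))
fetchers = sorted(name for name in dir(datasets) if name.startswith("fetch_"))

print("load_* functions:", loaders)
print("fetch_* functions:", fetchers)
```

The exact lists depend on the scikit-learn version you have installed.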

Now, if you want to use the Iris dataset, you first have to load it:

from sklearn.datasets import load_iris
data = load_iris()

What does load_iris() return? Let's see.

If you print data, you will see a dictionary-like object. This object is called a Bunch. If you want to know more about Bunch, execute the following code in IPython or a Jupyter notebook (the ?Bunch syntax does not work in a plain Python interpreter):

from sklearn.utils import Bunch
?Bunch

Output

Init signature: Bunch(**kwargs)
Docstring:     
Container object exposing keys as attributes.

Bunch objects are sometimes used as an output for functions and methods.
They extend dictionaries by enabling values to be accessed by key,
`bunch["value_key"]`, or by an attribute, `bunch.value_key`.

Examples
--------
>>> from sklearn.utils import Bunch
>>> b = Bunch(a=1, b=2)
>>> b['b']
2
>>> b.b
2
>>> b.a = 3
>>> b['a']
3
>>> b.c = 6
>>> b['c']
6
File:           /opt/conda/lib/python3.10/site-packages/sklearn/utils/_bunch.py
Type:           type
Subclasses:

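Because the object returned by load_iris() is a Bunch, its contents can be accessed by key or by attribute. The main keys are "data": the feature matrix, "target": the labels, "frame": a pandas DataFrame (only filled in when as_frame=True), "target_names": the names of the target classes, "feature_names": the names of the features, and "DESCR": a description of the dataset.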
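A quick way to check these keys yourself is to inspect the returned object directly. The sketch below uses only load_iris() and keys that the returned Bunch provides; the variable name iris is arbitrary.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# A Bunch behaves like a dictionary, so its keys can be listed.
print(list(iris.keys()))

# Values are available by key or by attribute.
print(iris["data"].shape)   # feature matrix: (150, 4)
print(iris.target.shape)    # labels: (150,)
print(iris.target_names)    # names of the target classes
print(iris.feature_names)   # names of the feature columns
print(iris.frame)           # None unless as_frame=True
```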

load_iris() accepts two parameters, return_X_y and as_frame. Both default to False.

If we pass return_X_y=True, we get the feature matrix and the label array as a tuple instead of a Bunch:

feature_matrix, label_matrix = load_iris(return_X_y=True)
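To see what the tuple contains, the following sketch checks the shapes of both arrays. Note that the second element is a one-dimensional array of class labels. The names X and y are a common convention, not something the library requires.

```python
from sklearn.datasets import load_iris

# With return_X_y=True the function returns a (data, target) tuple.
X, y = load_iris(return_X_y=True)

print(type(X).__name__, X.shape)  # NumPy array with 150 rows and 4 columns
print(type(y).__name__, y.shape)  # NumPy array with 150 labels
print(sorted(set(y.tolist())))    # the distinct class labels
```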

If we pass as_frame=True, the data is returned as pandas objects, and the frame attribute holds a pandas DataFrame:

data = load_iris(as_frame=True)
data.frame

Output: a pandas DataFrame with 150 rows and 5 columns (the four features plus a target column).
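To inspect the DataFrame without relying on a notebook display, a short sketch like the following prints its shape, column names, and first rows. It assumes pandas is installed, which as_frame=True requires; the names iris and df are arbitrary.

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)  # requires pandas to be installed
df = iris.frame

print(df.shape)          # (150, 5): four features plus the target
print(list(df.columns))  # feature names followed by "target"
print(df.head())         # first five rows

# With as_frame=True, data and target are pandas objects as well.
print(type(iris.data).__name__, type(iris.target).__name__)
```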

To learn more about datasets, go to this page: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets

Thank you so much for reading.