Scikit-learn (imported as sklearn) is an open-source machine learning library for Python that supports supervised and unsupervised learning. It also provides tools for model fitting, data preprocessing, model selection, model evaluation, and many other utilities.
The library also ships with a number of ready-made datasets that you can use for your own purposes, such as practicing or testing a model.
Some of the datasets available through sklearn are:
California housing dataset
Covertype dataset
Kddcup99 dataset
Olivetti faces dataset from AT&T
Species distribution dataset from Phillips
Iris dataset
To see more datasets, you can go to this page.
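As a rough way to see which dataset loaders your installed version provides, you can list the public names in sklearn.datasets that start with load_ or fetch_. This is only a sketch: the exact list depends on your scikit-learn version, and the variable name loaders is my own.

```python
import sklearn.datasets as datasets

# Names starting with "load_" mostly read small datasets bundled with the
# library (a few, such as load_svmlight_file, read user-supplied files).
# Names starting with "fetch_" download larger datasets on first use.
loaders = sorted(
    name for name in dir(datasets)
    if name.startswith(("load_", "fetch_"))
)

for name in loaders:
    print(name)
```

Calling a fetch_ function needs an internet connection the first time, while load_iris() works offline.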
To use the Iris dataset, you first have to load it:
from sklearn.datasets import load_iris
data = load_iris()
What does load_iris() return? Let's see.
If you print data, you will see something that looks like a dictionary. This dictionary-like object is called a Bunch. If you want to know more about the Bunch object, execute the following code in an IPython or Jupyter session:
from sklearn.utils import Bunch
?Bunch
Output
Init signature: Bunch(**kwargs)
Docstring:
Container object exposing keys as attributes.
Bunch objects are sometimes used as an output for functions and methods.
They extend dictionaries by enabling values to be accessed by key,
`bunch["value_key"]`, or by an attribute, `bunch.value_key`.
Examples
--------
>>> from sklearn.utils import Bunch
>>> b = Bunch(a=1, b=2)
>>> b['b']
2
>>> b.b
2
>>> b.a = 3
>>> b['a']
3
>>> b.c = 6
>>> b['c']
6
File: /opt/conda/lib/python3.10/site-packages/sklearn/utils/_bunch.py
Type: type
Subclasses:
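In a plain Python script, where the ? syntax is not available, a rough equivalent is to print the class docstring (or call help(Bunch)). The sketch below also checks that a Bunch is a dict subclass; the variable name b is just for illustration.

```python
from sklearn.utils import Bunch

# Print the docstring without relying on IPython's "?" syntax
print(Bunch.__doc__)

# A Bunch is a dict subclass, so key access and attribute access both work
b = Bunch(a=1, b=2)
print(isinstance(b, dict))
print(b["a"], b.a)
print(list(b.keys()))
```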
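Because load_iris() returns a Bunch, its contents can be accessed by key or by attribute. The main keys are "data": the feature matrix, "target": the labels, "frame": a pandas DataFrame (only filled in when as_frame=True, otherwise None), "target_names": the names of the target classes, "feature_names": the names of the features, and "DESCR": a description of the dataset.
load_iris() accepts two keyword parameters: return_X_y (default False) and as_frame (default False).
If we pass return_X_y=True, we get the feature matrix and the label array as a tuple instead of a Bunch: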
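To make these keys concrete, here is a short sketch that inspects the object returned by load_iris(). The shapes in the comments are those of the Iris dataset (150 samples, 4 features, 3 classes).

```python
from sklearn.datasets import load_iris

data = load_iris()

print(list(data.keys()))    # available keys (may vary slightly by version)
print(data.data.shape)      # (150, 4)
print(data.target.shape)    # (150,)
print(data.feature_names)   # names of the 4 feature columns
print(data.target_names)    # names of the 3 classes
print(data.frame)           # None, because as_frame defaults to False
print(data.DESCR[:200])     # beginning of the dataset description
```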
feature_matrix, label_matrix = load_iris(return_X_y=True)
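A quick check of what comes back in this case may help. This is a minimal sketch, with X and y as my own variable names; note that the labels form a one-dimensional array rather than a matrix.

```python
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

print(type(X).__name__, X.shape)   # ndarray (150, 4)
print(type(y).__name__, y.shape)   # ndarray (150,)
print(sorted(set(y.tolist())))     # [0, 1, 2]
```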
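If we pass as_frame=True, the data is returned as pandas objects, and the frame attribute holds a DataFrame containing both the features and the target (pandas must be installed):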
data = load_iris(as_frame=True)
data.frame
Output: a DataFrame with 150 rows and 5 columns (the four feature columns plus a target column).
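Once the data is in a DataFrame, the integer labels can be turned into readable class names. The following is a sketch under my own naming (df and the species column are not part of sklearn), assuming pandas is installed:

```python
from sklearn.datasets import load_iris

data = load_iris(as_frame=True)
df = data.frame.copy()

# Map the integer labels (0, 1, 2) to the class names from target_names
label_to_name = dict(enumerate(data.target_names.tolist()))
df["species"] = df["target"].map(label_to_name)

print(df.shape)
print(df["species"].value_counts())
```

Each of the three classes has 50 samples, so the counts printed at the end should be equal.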
To learn more about datasets, go to this page: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets
Thank you so much for reading.