Kaggle Datasets Download Guide

Kaggle is one of the websites every data scientist should know about, simply because it gives free access to one of the largest collections of datasets out there.

It’s a meeting ground for beginners and experts alike. You can also learn from the notebooks that users publish for everyone to see. I think it’s one of the greatest social media outlets, since all the content posted there is for science.

There are a number of ways to download any given dataset from Kaggle. The first, and probably the most obvious, is to download it directly from the website, where you can also inspect each dataset before you download it.

I use Kaggle datasets for developing machine learning models and for experimenting with all sorts of dataset formats. However, I prefer not to download them manually from the website, because fetching them from code makes my projects more portable.

Download Kaggle datasets using Python

Another way of downloading a dataset is through the Kaggle API client, and I’m going to demonstrate how to do it with Python here. But first, you need to install the official Kaggle library with pip and set up your API token.

First, install the Kaggle API client by running “pip install kaggle” in your terminal. Next, get the API token from the website: on your account page, find the API section and click Create New API Token, which downloads a kaggle.json file.

Kaggle API token section

Your browser saves this file to its default downloads folder, so move it to the C:/Users/your-username/.kaggle folder (~/.kaggle on Linux and macOS), which is where the client looks for it when we authenticate in our Python script. So let’s get to this step next.

from kaggle.api.kaggle_api_extended import KaggleApi

# Create the client and load the credentials from kaggle.json
api = KaggleApi()
api.authenticate()
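If authentication fails, the usual culprit is a kaggle.json file that is not where the client expects it. The snippet below is a minimal sketch of a pre-flight check, assuming the default .kaggle folder in your home directory described above. The function name find_kaggle_token and its config_dir parameter are my own, not part of the Kaggle library, and some client versions may also look in other locations.

```python
from pathlib import Path


def find_kaggle_token(config_dir=None):
    """Return the path to kaggle.json if it exists, otherwise None.

    Hypothetical helper: config_dir defaults to the .kaggle folder in the
    user's home directory, which is an assumption about where the token lives.
    """
    base = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    token = base / "kaggle.json"
    return token if token.is_file() else None


token_path = find_kaggle_token()
if token_path is None:
    print("kaggle.json not found, move it into your .kaggle folder first")
else:
    print(f"Found API token at {token_path}")
```

Running this before api.authenticate() gives a clearer message than the error the client raises on its own.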

It’s as simple as that. Now all we have to do is call another method of the KaggleApi class to download our desired dataset. To point to the dataset of your choosing, copy the owner/dataset-name part of its URL on the Kaggle website and pass it along with the name of the file you want.

Kaggle dataset URL
api.dataset_download_file(
    'brendan45774/test-file',
    file_name='tested.csv'
)
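If you would rather paste the whole page URL than pick out the right part by hand, a small helper can do the extraction. This is only a sketch: dataset_slug_from_url is a hypothetical name, and it assumes dataset pages look like https://www.kaggle.com/datasets/owner/name, or the older form without the "datasets" segment.

```python
from urllib.parse import urlparse


def dataset_slug_from_url(url):
    """Extract 'owner/dataset-name' from a Kaggle dataset page URL."""
    # Split the URL path into its non-empty segments
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Assumption: newer URLs start with a "datasets" segment, so drop it
    if parts and parts[0] == "datasets":
        parts = parts[1:]
    if len(parts) < 2:
        raise ValueError(f"No dataset slug found in {url!r}")
    # Ignore anything after the dataset name, such as a trailing tab name
    return "/".join(parts[:2])


slug = dataset_slug_from_url(
    "https://www.kaggle.com/datasets/brendan45774/test-file"
)
print(slug)  # brendan45774/test-file
```

The resulting string is what goes in as the first argument of dataset_download_file.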

Keep in mind that this will download the file into the current working directory, which is usually the root folder of your project. If you want to keep things neat, you can download it into a separate folder instead. All you need to do is pass the “path” argument to the method above and point it to that folder.

api.dataset_download_file(
    'brendan45774/test-file',
    file_name='tested.csv',
    path='path/to/desired/folder'
)
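To keep the folder handling in one place, the call above can be wrapped in a small function. This is a sketch under a few assumptions: download_to_folder is a hypothetical helper, api is an already authenticated KaggleApi instance, and the file is expected to land under its original name (some client versions may save larger files as a zip archive instead).

```python
from pathlib import Path


def download_to_folder(api, dataset, file_name, folder):
    """Download one file of a dataset into `folder` and return its expected path.

    `api` is assumed to be an authenticated KaggleApi instance.
    """
    target = Path(folder)
    # Create the folder (and any missing parents) before downloading
    target.mkdir(parents=True, exist_ok=True)
    api.dataset_download_file(dataset, file_name=file_name, path=str(target))
    # Assumption: the file keeps its original name inside the target folder
    return target / file_name


# Usage, with the authenticated client from earlier:
# csv_path = download_to_folder(
#     api, "brendan45774/test-file", "tested.csv", "data/raw"
# )
```

Returning the path makes it easy to hand the result straight to whatever loads the file next.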

Third way of downloading datasets

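Lastly, the third way of downloading Kaggle datasets is to use the API client directly, by running the “kaggle” command in your terminal. It relies on the same kaggle.json token as the Python client, and the command below downloads the whole dataset as a zip archive.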

kaggle datasets download brendan45774/test-file
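If you want to trigger the same command from a Python script, for example as part of a setup step, one option is to build it as an argument list and run it with subprocess. The sketch below assumes the “kaggle” command is on your PATH and uses the -p (target folder) and --unzip options of the command line client. The two function names are hypothetical.

```python
import subprocess


def build_download_command(dataset, folder=None, unzip=False):
    """Build the argument list for the `kaggle datasets download` command."""
    cmd = ["kaggle", "datasets", "download", dataset]
    if folder:
        # -p sets the folder the archive is downloaded into
        cmd += ["-p", str(folder)]
    if unzip:
        # --unzip extracts the downloaded archive
        cmd.append("--unzip")
    return cmd


def download_with_cli(dataset, folder=None, unzip=False):
    """Run the command; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_download_command(dataset, folder, unzip), check=True)


print(build_download_command("brendan45774/test-file", folder="data", unzip=True))
```

Passing the command as a list rather than a single string avoids shell quoting problems with folder names that contain spaces.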

Conclusion

I hope this short guide sheds some light on how to use Kaggle within your projects. It truly is a marvelous tool for anyone who aspires to delve into data science.
