How to fine-tune DistilBERT for binary text classification via the Hugging Face API for TensorFlow.
In this tutorial, you will see an implementation of binary text classification using transfer learning. For this purpose, we will use DistilBERT, a pre-trained model from the Hugging Face Transformers library, and its API for TensorFlow.
This time I will explain (with full code examples) how to create a web scraper in eight steps using the Selenium Python framework.
I will use the recipe site https://www.simplyrecipes.com/. The subject of this post, data collection, is a basic part of any data science project.
I chose this website because it contains just the data I need for my NLP objective. Additionally, Step 3, Step 5, and Step 7 of this tutorial cover some specific issues (Selenium exceptions) that can arise during web crawling. …
Data pre-processing is a fundamental part of a data scientist’s work. Apart from data collection, it is one of the principal stages. The quality and accuracy of our future model depend on it. The better we clean/prepare the data:
So what is pre-processing in our current case?
In simple words, it is the process of transforming text: you have to make the text usable for analysis and for predictions that serve your business goal.
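As a rough illustration of what such text transformations can look like, here is a minimal sketch using only the Python standard library. The function name `preprocess`, the tiny `STOP_WORDS` set, and the chosen steps (lowercasing, stripping HTML tags, removing punctuation, dropping stop words) are assumptions made for this example, not the exact pipeline from the tutorial.

```python
import re
import string

# Tiny illustrative stop-word set (an assumption; real projects use larger lists).
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "in", "with"}


def preprocess(text):
    """Turn raw text into a list of cleaned, lowercase tokens."""
    text = text.lower()
    # Replace anything that looks like an HTML tag with a space.
    text = re.sub(r"<[^>]+>", " ", text)
    # Remove ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace and drop stop words.
    return [token for token in text.split() if token not in STOP_WORDS]


tokens = preprocess("Mix the flour, sugar and 2 eggs in a bowl!")
print(tokens)
```

Which steps are worth keeping depends on the task: for a model such as DistilBERT, for instance, aggressive stop-word removal may not be needed at all.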
This is my first tutorial about web scraping. I will explain (with full code examples) how to create a web scraper using the BeautifulSoup and Grequests Python libraries.
Assume you have an NLP task: collect text data from a recipe website and perform a binary classification of ingredients versus instructions. Let’s scrape the data from the recipe site https://www.loveandlemons.com/. For this purpose, we will use two popular, beginner-friendly libraries: BeautifulSoup and Grequests.
BeautifulSoup is an open-source, completely free library that makes it easy to scrape information from web pages. It sits on top of an HTML or XML parser…
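To give a feel for how BeautifulSoup works on top of a parser, here is a minimal sketch that parses an inline HTML string with Python's built-in `html.parser` backend. The HTML fragment, the class names `ingredients` and `instructions`, and the helper `extract_items` are hypothetical and made up for this example; they are not the real markup of the site mentioned above.

```python
from bs4 import BeautifulSoup

# Hypothetical recipe-page fragment, invented for illustration only.
HTML = """
<div class="recipe">
  <h2>Lemon Pasta</h2>
  <ul class="ingredients">
    <li>200 g spaghetti</li>
    <li>1 lemon</li>
  </ul>
  <ol class="instructions">
    <li>Boil the spaghetti.</li>
    <li>Toss with lemon juice.</li>
  </ol>
</div>
"""


def extract_items(html, css_class):
    """Return the text of every <li> inside the first element with css_class."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(class_=css_class)
    if container is None:
        return []
    return [li.get_text(strip=True) for li in container.find_all("li")]


ingredients = extract_items(HTML, "ingredients")
instructions = extract_items(HTML, "instructions")
print(ingredients)
print(instructions)
```

The two lists map directly onto the two classes of the binary classification task, so each extracted line could be stored together with its label.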
Let’s start with the background of this race for the truth.
In the first week of our boot camp training (winter 2020), we got our first team task: to give a presentation “within 20 minutes” on one of 12 topics. No one attached much importance to it: there were still four weeks left before the day of the presentation. But two days later the conditions changed: my teammate Yoav Vollansky and I had to speak to the audience about the differences between Python 2 and Python 3 in just 3 days. A …