How to Build a Data Science Web App in Python (Penguin Classifier) – Part 3

Chanin Nantasenamat

October 19, 2020 Big Data, Cloud & DevOps

Part 3: ML-Powered Web App in a Little Over 100 Lines of Code

In Part 3, I will show you how to build a machine learning-powered data science web app in Python using the Streamlit library in a little over 100 lines of code.

The web app that we will be building today is the Penguins Classifier. A demo of the finished web app is available at http://dp-penguins.herokuapp.com/.

Previously, in Part 1 of this Streamlit tutorial series, I showed you how to build your first data science web app in Python, one that fetches stock price data from Yahoo! Finance and displays it as a simple line chart. In Part 2, I showed you how to build a machine learning web app using the Iris dataset.

As explained in previous articles of this Streamlit Tutorial Series, model deployment is an essential and final component of the data science life cycle that brings the power of data-driven insights into the hands of end users, whether they are business stakeholders, managers or customers.

This article is based on a video that I made on the same topic for the Data Professor YouTube channel (How to Build a Penguin Classification Web App in Python), which you can watch alongside reading this article.

Overview of the Penguin Classification Web App

In this article, we will be building a Penguin Classifier web app that predicts the class label of a penguin's species (Adelie, Chinstrap or Gentoo) as a function of 4 quantitative variables and 2 qualitative variables.

Penguins dataset

The data used in this machine learning-powered web app is the Palmer Penguins dataset, which was released as an R package by Allison Horst. Specifically, the data is derived from the published work of Dr. Kristen Gorman and colleagues entitled Ecological Sexual Dimorphism and Environmental Variability within a Community of Antarctic Penguins (Genus Pygoscelis).

The data set is comprised of 4 quantitative variables:

Bill length (mm)
Bill depth (mm)
Flipper length (mm)
Body mass (g)

And 2 qualitative variables:

Sex (male/female)
Island (Biscoe/Dream/Torgersen)

Let’s take a look at the Penguins dataset (shown below is a truncated version that shows only the first 3 row entries for each of the 3 Penguin species):

(Note: The full version of the Penguins dataset is available on the Data Professor GitHub)

Components of the Penguins Classifier web app

The Penguins Classifier web app consists of a front-end and a back-end:

Front-end — This is what we see upon loading the web app. The front-end can be further broken down into the Side Panel and the Main Panel. Screenshot of the web app is shown below.

The Side Panel is found on the left and carries the header title “User Input Features”. Here the user can either upload a CSV file containing the input features (2 qualitative and 4 quantitative variables) or enter the values manually. For the 4 quantitative variables, users set the input values by adjusting the slider bars. As for the 2 qualitative variables, users select input values via the drop-down menus.

These user input features serve as input to the machine learning model that will be discussed in the back-end. Once a prediction is made, the resulting class label (the Penguins species) along with the Prediction Probability values are sent back to the front-end for display on the Main Panel.

Back-end — The user input features will be converted into a dataframe and sent to the machine learning model for predictions to be made. Herein, we will be using a pre-trained model that was previously saved as a pickle object called penguins_clf.pkl that can be quickly loaded in by the web app (without the need to build a machine learning model each time the web app is loaded by the user).

For this tutorial, we will be using 5 Python libraries: streamlit, pandas, numpy, scikit-learn and pickle. The first 4 will have to be installed if they are not already present on your computer, while the last one, pickle, comes built into Python.

To install the libraries, you can use the pip install command as follows:

pip install streamlit

Then, repeat the above command, replacing streamlit with the name of each of the other libraries in turn, such as pandas, so that it becomes pip install pandas, and so forth.

Or, you can install them all at once with this one-liner:

pip install streamlit pandas numpy scikit-learn

Code of the web app

Now, let’s look under the hood of the web app. You will see that the web app is made up of 2 files: penguins-model-building.py and penguins-app.py.

The first file (penguins-model-building.py) is used to build the machine learning model and save it as a pickle file, penguins_clf.pkl.

Subsequently, the second file (penguins-app.py) will apply the trained model (penguins_clf.pkl) to predict the class label (the Penguin’s species as being Adelie, Chinstrap or Gentoo) by using input parameters from the sidebar panel of the web app’s front-end.

Line-by-line explanation of the code

penguins-model-building.py

Let’s start with the first file, which pre-builds a trained machine learning model before the web app is run. Why are we doing that? It saves computational resources in the long run: we build the model once and then apply it to make any number of predictions (at least until we re-train the model) on the input parameters that users set in the sidebar panel of the web app.

Line 1
Import the pandas library with alias of pd
Line 2
Reads the cleaned penguins dataset from a CSV file and assigns it to the penguins variable
Lines 4–19
Performs ordinal feature encoding on the 3 qualitative variables, which comprise the target Y variable (species) and the 2 X variables (sex and island).

Lines 21–23
Separates the df dataframe into the X and Y matrices.
Lines 25–28
Trains a random forest model
Lines 30–32
Saves the trained random forest model to a pickled file called penguins_clf.pkl.
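
The steps above can be sketched roughly as follows. This is a minimal illustration and not the original script: the column names (species, sex, island), the use of pandas.get_dummies to encode sex and island, the integer codes assigned to the species, and the helper names encode_features and train_and_save are all assumptions made for the example.

```python
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Assumed integer codes for the three target classes.
TARGET_MAPPER = {"Adelie": 0, "Chinstrap": 1, "Gentoo": 2}


def encode_features(penguins):
    """Return (X, Y) with the qualitative columns turned into numbers."""
    df = penguins.copy()
    # Assumption: sex and island become 0/1 indicator columns.
    df = pd.get_dummies(df, columns=["sex", "island"], dtype=int)
    # The target species is mapped to an integer code.
    df["species"] = df["species"].map(TARGET_MAPPER)
    X = df.drop(columns="species")
    Y = df["species"]
    return X, Y


def train_and_save(penguins, model_path="penguins_clf.pkl"):
    """Fit a random forest once and pickle it for the web app to load."""
    X, Y = encode_features(penguins)
    clf = RandomForestClassifier(random_state=0)  # fixed seed for repeatability
    clf.fit(X, Y)
    with open(model_path, "wb") as f:
        pickle.dump(clf, f)
    return clf
```

Calling train_and_save once on the dataframe read from your copy of the cleaned CSV file would write penguins_clf.pkl, so the web app only has to load the file instead of retraining the model.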

penguins-app.py

This second file serves the web app, which makes predictions using the machine learning model loaded from the pickled file. As mentioned above, the web app accepts input values from 2 sources:

Feature values from the slider bars.
Feature values from the uploaded CSV file.
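
The choice between these two sources can be sketched as a small function. This is an illustration only: build_input_df, the feature names and the example values are hypothetical. In a Streamlit app, uploaded_file would typically come from st.sidebar.file_uploader, and the manual values from st.sidebar.slider and st.sidebar.selectbox.

```python
import io

import pandas as pd


def build_input_df(uploaded_file, manual_values):
    """Return the input features as a dataframe.

    uploaded_file: a file-like object holding CSV text, or None.
    manual_values: dict of feature name -> value read from the widgets.
    """
    if uploaded_file is not None:
        # Source 2: an uploaded CSV, which may describe several penguins.
        return pd.read_csv(uploaded_file)
    # Source 1: a single penguin described by the sliders and menus.
    return pd.DataFrame(manual_values, index=[0])


# Made-up example values, for illustration only.
manual = {
    "island": "Biscoe",
    "sex": "male",
    "bill_length_mm": 43.9,
    "bill_depth_mm": 17.2,
    "flipper_length_mm": 201.0,
    "body_mass_g": 4207.0,
}
csv_text = (
    "island,sex,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g\n"
    "Dream,female,46.5,17.9,192,3500\n"
    "Torgersen,male,39.1,18.7,181,3750\n"
)

from_widgets = build_input_df(None, manual)
from_csv = build_input_df(io.StringIO(csv_text), manual)
print(from_widgets.shape, from_csv.shape)
```

Either way, the rest of the app receives a dataframe with the same columns, so the encoding and prediction steps do not need to know which source was used.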


Lines 1–5
Imports the streamlit, pandas and numpy libraries with the aliases st, pd and np, respectively. Next, imports the pickle library and, finally, the RandomForestClassifier class from sklearn.ensemble.
Lines 7–13
Writes the web app title and intro text.

Sidebar Panel

Line 15
Header title of the sidebar panel.
Lines 17–19
Link to download an example CSV file.
Lines 21–41
Collects feature values and puts them into a dataframe. Conditional if and else statements determine whether the user has uploaded a CSV file (if so, the CSV file is read and converted into a dataframe) or has entered feature values using the slider bars and drop-down menus, in which case those values are converted into a dataframe as well.
Lines 43–47
Combines the user input features (either from the CSV file or from the slider bars) with the entire penguins dataset. This ensures that every qualitative variable contains all of its possible values. For instance, if the user input contains data for only 1 penguin, the ordinal feature encoding will not work, because the code will detect only 1 possible value for each qualitative variable. For ordinal feature encoding to work, each qualitative variable needs to have all of its possible values.

Situation A

In this first scenario, the qualitative variable island has only 1 possible value which is Biscoe.

The above input feature will produce the following ordinal features after encoding.

Situation B

The above input features will produce the following ordinal features.
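
The difference between encoding a single row on its own and encoding it together with the full dataset can be reproduced with a small sketch. The rows here are toy values, not taken from the real dataset, and the use of pandas.get_dummies as the encoder is an assumption. Treating Situation B as the case where every island value is present is also an assumption.

```python
import pandas as pd

# Toy stand-in for the full dataset: every island and sex value occurs.
reference = pd.DataFrame({
    "island": ["Biscoe", "Dream", "Torgersen"],
    "sex": ["male", "female", "male"],
})
# A single penguin entered by the user.
user_input = pd.DataFrame({"island": ["Biscoe"], "sex": ["male"]})


def encode(df):
    """Turn each qualitative column into 0/1 indicator columns."""
    return pd.get_dummies(df, columns=["island", "sex"], dtype=int)


# Encoding the lone row: only the values it contains get a column.
alone = encode(user_input)

# Stack the input on top of the reference rows, encode everything,
# then keep only the first row (the user's penguin).
combined = pd.concat([user_input, reference], axis=0, ignore_index=True)
together = encode(combined).iloc[:1]

print(list(alone.columns))
print(list(together.columns))
```

The lone row yields just 2 columns, whereas the combined approach yields one column per possible value (3 islands and 2 sexes), which is the column layout a model trained on the full dataset would expect.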

Lines 49–56
Performs ordinal feature encoding in a similar fashion as explained above in the model building phase (penguins-model-building.py).
Lines 58–65
Displays the dataframe of the user input features. Conditional statements allow the code to determine automatically whether to display the dataframe built from the CSV file or the one built from the slider bars.
Lines 67–68
Loads the predictive model from the pickled file, penguins_clf.pkl.
Lines 70–72
Applies the loaded model to make predictions on the df variable, which corresponds to input from the CSV file or from the slider bars.
Lines 74–76
The predicted class label of the penguin species is displayed here.
Lines 78–79
Prediction probability values for each of the 3 penguin species are shown here.
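
To make the last two steps concrete, here is a minimal sketch of turning model output into readable labels. The helper label_prediction is hypothetical, and the order of the species codes (0 = Adelie, 1 = Chinstrap, 2 = Gentoo) is an assumption. The hard-coded arrays stand in for what a scikit-learn classifier's predict and predict_proba methods would return, and the numbers are made up.

```python
import numpy as np
import pandas as pd

# Assumed order of the integer class codes.
SPECIES = np.array(["Adelie", "Chinstrap", "Gentoo"])


def label_prediction(prediction, prediction_proba):
    """Map class codes to species names and label the probability columns."""
    labels = SPECIES[np.asarray(prediction)]
    proba = pd.DataFrame(prediction_proba, columns=SPECIES)
    return labels, proba


# Made-up model output for two penguins, for illustration only.
prediction = [2, 0]
prediction_proba = [[0.1, 0.2, 0.7], [0.9, 0.05, 0.05]]

labels, proba = label_prediction(prediction, prediction_proba)
print(labels)
print(proba)
```

In the web app, both results could then be passed to st.write for display on the Main Panel.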

Running the web app

Now that we have finished coding the web app, let’s launch it. Open your command prompt (terminal window) and type the following command:

streamlit run penguins-app.py

The following message should then be displayed in the command prompt:

> streamlit run penguins-app.py

You can now view your Streamlit app in your browser.

Local URL: http://localhost:8501
Network URL: http://10.0.0.11:8501

A screenshot of the penguins classifier web app is shown below:

Deploying and showcasing the web app

Great job! You have now created a machine learning-powered web app. Let’s deploy the web app to the internet so that you can share it with your friends and family.
