Data Science Basics SS2 Digital Technologies Lesson Note


Topic: Data Science Basics

What is Data Science?

Imagine you are the owner of a small shop. If you know that every Friday, students buy more “Zobo” than “Coke,” you will make sure to prepare more Zobo on Friday mornings.

That is Data Science! It means studying information (data) from the past to make better decisions for the future. Before any analysis can happen, the data passes through three main stages: collecting it, storing it safely, and cleaning it so that it is correct.
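The shop example can be sketched in a few lines of Python. This is only an illustration: the sales figures and the function name `best_seller` are made up for this note, not real data.

```python
from collections import defaultdict

# Hypothetical past sales records: (day, drink, bottles sold).
past_sales = [
    ("Friday", "Zobo", 30),
    ("Friday", "Coke", 12),
    ("Monday", "Zobo", 10),
    ("Monday", "Coke", 15),
    ("Friday", "Zobo", 28),
    ("Friday", "Coke", 14),
]

def best_seller(sales, day):
    """Return the drink with the highest total sales on the given day."""
    totals = defaultdict(int)
    for sale_day, drink, bottles in sales:
        if sale_day == day:
            totals[drink] += bottles
    if not totals:
        return None  # no records for that day
    return max(totals, key=totals.get)

print(best_seller(past_sales, "Friday"))  # Zobo
```

The program does what the shop owner does in their head: it adds up past sales for each drink and picks the larger total.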

 

Data Collection: Gathering the Facts

Data collection is the process of gathering information from different sources. If you don’t have good data, you can’t make good decisions.

Methods of Collection:

  • Surveys/Questionnaires: Asking people questions directly (e.g., Google Forms).
  • Observation: Counting things as they happen (e.g., counting how many cars pass the school gate).
  • Sensors: Devices that record data automatically (e.g., a digital thermometer recording the temperature every hour).
  • Web Scraping: Using a computer to “copy” information from websites automatically.

 

Data Storage: Where does it go?

Once you have collected your data, you need a place to put it. We don’t just throw it in one big pile; we organize it.

Types of Storage:

  1. Spreadsheets (Excel/Google Sheets): Great for small amounts of data that look like a table.
  2. Databases (queried with a language called SQL): Used for very large amounts of data (like all the bank accounts in Nigeria). A database is like a very fast, digital filing cabinet.
  3. The Cloud: Storing data on the internet (like Google Drive) so you can access it from any computer in the world.
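As a rough sketch, the first two storage types can be tried with Python's built-in `csv` and `sqlite3` modules. The names and snacks below are invented, and the table name `survey` is only an assumption for this example. The CSV text is kept in memory here; in practice you would write it to a file.

```python
import csv
import io
import sqlite3

# Hypothetical survey answers used only for illustration.
rows = [("Ada", "Chin-chin"), ("Tunde", "Puff-puff"), ("Bisi", "Chin-chin")]

# 1. Spreadsheet-style storage: CSV text that Excel or Google Sheets can open.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "snack"])  # header row
writer.writerows(rows)
csv_text = buffer.getvalue()

# 2. Database storage: an in-memory SQLite database queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (name TEXT, snack TEXT)")
conn.executemany("INSERT INTO survey VALUES (?, ?)", rows)
conn.commit()

chin_chin_fans = conn.execute(
    "SELECT COUNT(*) FROM survey WHERE snack = ?", ("Chin-chin",)
).fetchone()[0]
print(chin_chin_fans)  # 2
conn.close()
```

A spreadsheet is enough for a small table like this one. The database version becomes useful when there are too many rows to scroll through and you need to ask questions with a query.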

Teacher’s Tip: Always have a backup! Data storage can fail (hard drives crash), so professional Data Scientists always keep copies in different places.

 

Data Cleaning: The Most Important Step

Real-world data is usually “dirty.” This doesn’t mean it has sand on it! It means it has mistakes, missing parts, or duplicates. If you analyze dirty data, you will get the wrong answers.

Common “Dirty” Data Problems:

  • Missing Values: A student forgot to write their age on a form.
  • Duplicates: The same person signed the attendance sheet twice.
  • Inconsistent Formatting: One person wrote “June 1st,” another wrote “01/06/2026,” and another wrote “1-Jun.”
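  • Outliers: Values far outside the normal range. They are often mistakes (e.g., a student’s age recorded as 200 years old), but check before deleting, because an unusual value can sometimes be real.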
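A minimal sketch of how a program might spot three of these problems (missing values, duplicates, and outliers) is given here. The records, the function name `find_problems`, and the age limits of 5 and 25 are all assumptions chosen for a school setting.

```python
# Hypothetical form entries; None marks a blank answer.
records = [
    {"name": "Ada", "age": 15},
    {"name": "Tunde", "age": None},  # missing value
    {"name": "Ada", "age": 15},      # duplicate of the first row
    {"name": "Obi", "age": 200},     # outlier
]

def find_problems(rows, min_age=5, max_age=25):
    """Return the row numbers (starting at 0) of each kind of problem."""
    problems = {"missing": [], "duplicate": [], "outlier": []}
    seen = set()
    for index, row in enumerate(rows):
        key = (row["name"], row["age"])
        if key in seen:
            problems["duplicate"].append(index)
        seen.add(key)
        if row["age"] is None:
            problems["missing"].append(index)
        elif not min_age <= row["age"] <= max_age:
            problems["outlier"].append(index)
    return problems

print(find_problems(records))
# {'missing': [1], 'duplicate': [2], 'outlier': [3]}
```

The function only reports where the problems are. Deciding what to do with each one is a separate step.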

How we “Clean” it:

  1. Removing Duplicates: Deleting the extra copies.
  2. Handling Missing Data: Either deleting the row or filling the gap with a reasonable value, such as the average of the other entries.
  3. Standardizing: Making sure all dates and names follow the same format.
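The three cleaning steps can be sketched in Python as follows. Everything here is illustrative: the records are invented, the list `DATE_FORMATS` covers only the three date styles used in this example, and month abbreviations such as "Jun" assume an English locale. Formats are standardized first so that duplicates written in different styles can still be matched.

```python
from datetime import datetime
from statistics import mean

# Hypothetical records with messy names, a missing age, and mixed date styles.
records = [
    {"name": "ada", "age": 15, "date": "01/06/2026"},
    {"name": "Tunde ", "age": None, "date": "1-Jun-2026"},
    {"name": "Ada", "age": 15, "date": "2026-06-01"},
    {"name": "Obi", "age": 17, "date": "2026-06-01"},
]

# Assumed input styles: day/month/year, day-Mon-year, year-month-day.
DATE_FORMATS = ["%d/%m/%Y", "%d-%b-%Y", "%Y-%m-%d"]

def standardize_date(text):
    """Convert a date in any known style to YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return text  # unknown style: leave unchanged for a person to check

def clean(rows):
    # Standardizing: tidy names and dates so matching rows look identical.
    tidy = [
        {
            "name": row["name"].strip().title(),
            "age": row["age"],
            "date": standardize_date(row["date"]),
        }
        for row in rows
    ]
    # Removing duplicates: keep only the first copy of each row.
    unique, seen = [], set()
    for row in tidy:
        key = (row["name"], row["age"], row["date"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    # Handling missing data: fill blank ages with the average known age.
    known = [row["age"] for row in unique if row["age"] is not None]
    average = round(mean(known)) if known else None
    for row in unique:
        if row["age"] is None:
            row["age"] = average
    return unique

cleaned = clean(records)
for row in cleaned:
    print(row)
```

After cleaning, three rows remain (Ada, Tunde, Obi), every date reads 2026-06-01, and Tunde's missing age is filled with 16, the average of 15 and 17.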

 

Summary: The Data Life Cycle

Stage      | What we do        | Analogy
Collection | Gathering info.   | Harvesting crops from a farm.
Storage    | Keeping info safe. | Putting the crops in a barn.
Cleaning   | Fixing mistakes.  | Sorting the good crops from the rotten ones.
Analysis   | Finding patterns. | Cooking the crops to make a meal.

 

Ethics in Data Science (A Note on Privacy)

When we collect data, we are often collecting information about people.

  • Consent: Always ask permission before taking someone’s data.
  • Security: Keep personal data (like home addresses) locked so bad people can’t find it.
  • Anonymity: When showing results, don’t show names. Say “50% of students failed,” not “Bisi and Obi failed.”
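The anonymity rule can be shown with a short sketch. The results list and the function name `summarize` are made up for this example. Notice that the function reads the names but never prints them.

```python
# Hypothetical test results: (name, outcome).
results = [("Bisi", "fail"), ("Obi", "fail"), ("Ada", "pass"), ("Tunde", "pass")]

def summarize(rows):
    """Report the share of students who failed without revealing any names."""
    total = len(rows)
    if total == 0:
        return "No results to report"
    failed = sum(1 for _name, outcome in rows if outcome == "fail")
    percent = round(100 * failed / total)
    return f"{percent}% of students failed"

print(summarize(results))  # 50% of students failed
```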

 

Class Activity: The “Snack Survey”

  1. Collection: Each student should ask 5 classmates what their favorite snack is. Write it down on a piece of paper.
  2. Storage: Create a small table on your paper with two columns: “Name” and “Snack.”
  3. Cleaning: Look at your list. Did anyone spell “Biscuits” wrong? Did anyone leave the snack column blank? Fix those errors now.
  4. Analysis: Which snack is the most popular in your group?
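If you want to check your paper result on a computer, the cleaning and analysis steps of the activity could look like the sketch below. The answers and the spelling table `SPELLING_FIXES` are invented for illustration. The code also assumes blank answers are simply left out, whereas in class you would go back and ask the classmate again.

```python
from collections import Counter

# Hypothetical answers copied from paper, including typos and one blank.
answers = ["Biscuits", "biscuit", "Chin-chin", "", "Biskit", "Puff-puff", "chin-chin "]

# Assumed list of known misspellings and their correct form.
SPELLING_FIXES = {"biscuit": "Biscuits", "biskit": "Biscuits"}

def clean_answer(text):
    """Trim spaces, fix known misspellings, and return None for blanks."""
    text = text.strip()
    if not text:
        return None
    return SPELLING_FIXES.get(text.lower(), text.capitalize())

# Cleaning: fix each answer and drop the blanks.
cleaned = [snack for snack in map(clean_answer, answers) if snack is not None]

# Analysis: count each snack and pick the most popular one.
counts = Counter(cleaned)
top_snack, votes = counts.most_common(1)[0]
print(top_snack, votes)  # Biscuits 3
```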
