Data Science Basics SS2 Digital Technologies Lesson Note
Topic: Data Science Basics
What is Data Science?
Imagine you are the owner of a small shop. If you know that every Friday, students buy more “Zobo” than “Coke,” you will make sure to prepare more Zobo on Friday mornings.
That is Data Science! It is the art of looking at information (data) from the past to make better decisions for the future. Before any analysis can happen, it involves three main stages: gathering the data, keeping it safe, and making sure it is correct.
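Here is a minimal sketch of the shop example in Python. The sales figures and names such as `past_sales` and `average_sales` are made up for illustration; a real shop would use its own records.

```python
# Hypothetical past sales records: (day, drink, bottles sold). Figures are made up.
past_sales = [
    ("Friday", "Zobo", 30), ("Friday", "Coke", 12),
    ("Friday", "Zobo", 34), ("Friday", "Coke", 10),
    ("Monday", "Zobo", 8), ("Monday", "Coke", 15),
]

def average_sales(records, day, drink):
    """Return the average bottles sold for one drink on one day of the week."""
    amounts = [sold for d, name, sold in records if d == day and name == drink]
    return sum(amounts) / len(amounts) if amounts else 0.0

zobo_friday = average_sales(past_sales, "Friday", "Zobo")
coke_friday = average_sales(past_sales, "Friday", "Coke")

# Use the past to decide for the future: prepare more of the better seller.
best_friday_drink = "Zobo" if zobo_friday > coke_friday else "Coke"
print(f"Friday averages: Zobo {zobo_friday}, Coke {coke_friday}")
print(f"Prepare more {best_friday_drink} on Friday mornings.")
```

The decision comes from the numbers, not from guessing: change the figures and the recommendation changes with them.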
Data Collection: Gathering the Facts
Data collection is the process of gathering information from different sources. If you don’t have good data, you can’t make good decisions.
Methods of Collection:
- Surveys/Questionnaires: Asking people questions directly (e.g., Google Forms).
- Observation: Counting things as they happen (e.g., counting how many cars pass the school gate).
- Sensors: Devices that record data automatically (e.g., a thermometer recording the temperature every hour).
- Web Scraping: Using a computer to “copy” information from websites automatically.
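To see what collected data can look like inside a computer, here is a small Python sketch of the observation method. It assumes someone wrote down the hour each time a car passed the school gate; the `observations` list and the `tally` helper are hypothetical names, and the entries are made up.

```python
# Hypothetical observation log: the hour in which each car was seen passing the gate.
observations = ["07:00", "07:00", "08:00", "07:00", "08:00", "09:00"]

def tally(items):
    """Count how many times each item appears in the list."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts

cars_per_hour = tally(observations)

# Print the hours in order with the number of cars seen in each.
for hour, count in sorted(cars_per_hour.items()):
    print(hour, count)
```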
Data Storage: Where does it go?
Once you have collected your data, you need a place to put it. We don’t just throw it in one big pile; we organize it.
Types of Storage:
- Spreadsheets (Excel/Google Sheets): Great for small amounts of data that look like a table.
- Databases (SQL): Used for giant amounts of data (like all the bank accounts in Nigeria). It’s like a very fast, digital filing cabinet.
- The Cloud: Storing data on the internet (like Google Drive) so you can access it from any computer in the world.
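The first two storage types can be sketched with Python's built-in `csv` and `sqlite3` modules. This is only an illustration: the rows are made up, the table name `survey` is an assumption, and everything is kept in memory so no files are created.

```python
import csv
import io
import sqlite3

# Hypothetical records to store: (name, snack).
rows = [("Bisi", "Biscuits"), ("Obi", "Chin Chin"), ("Ada", "Biscuits")]

# 1) Spreadsheet-style storage: CSV text that Excel or Google Sheets can open.
#    io.StringIO keeps it in memory; use open("snacks.csv", "w", newline="") for a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Name", "Snack"])
writer.writerows(rows)
csv_text = buffer.getvalue()

# 2) Database storage: an in-memory SQLite database that we can question with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (name TEXT, snack TEXT)")
conn.executemany("INSERT INTO survey VALUES (?, ?)", rows)
biscuit_fans = conn.execute(
    "SELECT COUNT(*) FROM survey WHERE snack = ?", ("Biscuits",)
).fetchone()[0]
conn.close()

print(csv_text)
print("Biscuit fans:", biscuit_fans)
```

The CSV version is easy to open and read by eye, while the database version can answer questions (like "how many people chose Biscuits?") without us reading every row ourselves.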
Teacher’s Tip: Always have a backup! Data storage can fail (hard drives crash), so professional Data Scientists always keep copies in different places.
Data Cleaning: The Most Important Step
Real-world data is usually “dirty.” This doesn’t mean it has sand on it! It means it has mistakes, missing parts, or duplicates. If you analyze dirty data, you will get the wrong answers.
Common “Dirty” Data Problems:
- Missing Values: A student forgot to write their age on a form.
- Duplicates: The same person signed the attendance sheet twice.
- Inconsistent Formatting: One person wrote “June 1st,” another wrote “01/06/2026,” and another wrote “1-Jun.”
- Outliers: Values far outside the expected range, often caused by recording mistakes (e.g., a student’s age recorded as 200 years old).
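A computer can help spot some of these problems. The Python sketch below checks a made-up list of records for missing values, duplicates, and outliers (it does not check formatting). The `find_problems` helper and the allowed age range of 5 to 25 are assumptions chosen for this example.

```python
# Hypothetical attendance records; None marks a missing age.
records = [
    {"name": "Bisi", "age": 15},
    {"name": "Obi", "age": None},   # missing value
    {"name": "Bisi", "age": 15},    # duplicate of the first row
    {"name": "Ada", "age": 200},    # outlier
]

def find_problems(rows, min_age=5, max_age=25):
    """Return the row positions with missing ages, repeated rows, and out-of-range ages."""
    missing, duplicates, outliers = [], [], []
    seen = set()
    for i, row in enumerate(rows):
        key = (row["name"], row["age"])
        if key in seen:
            duplicates.append(i)
        seen.add(key)
        if row["age"] is None:
            missing.append(i)
        elif not min_age <= row["age"] <= max_age:
            outliers.append(i)
    return {"missing": missing, "duplicates": duplicates, "outliers": outliers}

problems = find_problems(records)
print(problems)
```

Positions are counted from 0, so position 1 is the second row in the list.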
How we “Clean” it:
- Removing Duplicates: Deleting the extra copies.
- Handling Missing Data: Either deleting the row or filling the gap with a sensible value, such as the average of the other values.
- Standardizing: Making sure all dates and names follow the same format.
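The three cleaning steps above can be sketched in Python as follows. The records are made up, and the list of date formats in `DATE_FORMATS` is an assumption about what appears in the data; real data may need more formats or a human check.

```python
from datetime import datetime

# Hypothetical dirty records: a duplicate, a missing age, and mixed name and date formats.
dirty = [
    {"name": " bisi ", "age": 15, "date": "01/06/2026"},
    {"name": "Obi", "age": None, "date": "2026-06-01"},
    {"name": " bisi ", "age": 15, "date": "01/06/2026"},
    {"name": "ADA", "age": 17, "date": "01-06-2026"},
]

DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d-%m-%Y")  # formats we assume appear in the data

def standardize_date(text):
    """Rewrite a date as YYYY-MM-DD, trying each known format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return text  # leave unrecognised dates unchanged for a human to check

def clean(rows):
    # Standardize names and dates first so that duplicates look identical.
    tidy = [
        {"name": r["name"].strip().title(), "age": r["age"],
         "date": standardize_date(r["date"])}
        for r in rows
    ]
    # Remove duplicates, keeping the first copy of each row.
    unique, seen = [], set()
    for r in tidy:
        key = (r["name"], r["age"], r["date"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # Fill missing ages with the average of the known ages (if there are any).
    known = [r["age"] for r in unique if r["age"] is not None]
    average = sum(known) / len(known) if known else None
    for r in unique:
        if r["age"] is None:
            r["age"] = average
    return unique

cleaned = clean(dirty)
for row in cleaned:
    print(row)
```

Standardizing comes first in this sketch on purpose: " bisi " and "Bisi" only count as the same person once the names follow one format.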
Summary: The Data Life Cycle
| Stage | What we do | Analogy |
|---|---|---|
| Collection | Gathering info. | Harvesting crops from a farm. |
| Storage | Keeping info safe. | Putting the crops in a barn. |
| Cleaning | Fixing mistakes. | Sorting the good crops from the rotten ones. |
| Analysis | Finding patterns. | Cooking the crops to make a meal. |
Ethics in Data Science (A Note on Privacy)
When we collect data, we are often collecting information about people.
- Consent: Always ask permission before taking someone’s data.
- Security: Keep personal data (like home addresses) locked so bad people can’t find it.
- Anonymity: When showing results, don’t show names. Say “50% of students,” not “Bisi and Obi failed.”
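As a small illustration of anonymity, the Python sketch below turns individual results into a single percentage, so the report never mentions a name. The `results` list and the `pass_rate` helper are hypothetical, and the data is made up.

```python
# Hypothetical test results. The names stay in private storage and are never printed.
results = [
    {"name": "Student A", "passed": True},
    {"name": "Student B", "passed": False},
    {"name": "Student C", "passed": True},
    {"name": "Student D", "passed": False},
]

def pass_rate(rows):
    """Return the percentage of students who passed, with no names attached."""
    if not rows:
        return 0.0
    passed = sum(1 for r in rows if r["passed"])
    return 100 * passed / len(rows)

# Only the summary is shared, not the individual rows.
summary = f"{pass_rate(results):.0f}% of students passed"
print(summary)
```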
Class Activity: The “Snack Survey”
- Collection: Each student should ask 5 classmates what their favorite snack is. Write it down on a piece of paper.
- Storage: Create a small table on your paper with two columns: “Name” and “Snack.”
- Cleaning: Look at your list. Did anyone spell “Biscuits” wrong? Did anyone leave the snack column blank? Fix those errors now.
- Analysis: Which snack is the most popular in your group?
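If a computer is available, the same four steps of the activity can be tried in Python. This is only a sketch: the answers are made up, and the `CORRECTIONS` table lists the one misspelling we assume was found while checking the list by eye.

```python
from collections import Counter

# Collection and storage: hypothetical answers as written on paper,
# kept as (Name, Snack) pairs. They include a misspelling and a blank.
raw_answers = [
    ("Bisi", "Biscuits"), ("Obi", "biscits"), ("Ada", "Chin Chin"),
    ("Tunde", ""), ("Ngozi", "Biscuits"),
]

# Cleaning: fix known misspellings and drop blank answers.
CORRECTIONS = {"biscits": "Biscuits"}  # assumed misspellings found in the list

def clean_snack(snack):
    """Trim spaces and replace a known misspelling with the correct name."""
    snack = snack.strip()
    return CORRECTIONS.get(snack.lower(), snack)

cleaned = [(name, clean_snack(snack)) for name, snack in raw_answers]
cleaned = [(name, snack) for name, snack in cleaned if snack]

# Analysis: count the votes and pick the most popular snack.
counts = Counter(snack for _, snack in cleaned)
most_popular, votes = counts.most_common(1)[0]
print(f"Most popular snack: {most_popular} ({votes} votes)")
```

Without the cleaning step, "biscits" would be counted as a separate snack and Biscuits would get only 2 votes instead of 3.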