Pandas - 4

Pandas Lesson – Part 3: Grouping, Merging, Exporting

Pandas Lesson – Part 3

Grouping, Merging, and Exporting Data

Welcome back! In this final part, we’ll explore:

  • Grouping – Summarizing data by category
  • Merging – Combining multiple DataFrames
  • Exporting – Saving results to CSV/Excel

1. Grouping Data

Grouping lets us find totals/averages for categories. Example with students' marks:

import pandas as pd

data = {
    'Name': ['Amit', 'Neha', 'Raj', 'Priya', 'Karan', 'Divya'],
    'Subject': ['Math', 'Math', 'Science', 'Science', 'Math', 'Science'],
    'Marks': [85, 90, 78, 88, 92, 81]
}
df = pd.DataFrame(data)

print("Grouped by Subject:")
print(df.groupby('Subject')['Marks'].mean())
    

Output:

Subject
Math       89.0
Science    82.3
Name: Marks, dtype: float64
    

2. Merging DataFrames

Merging helps combine data from two tables.

students = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Amit', 'Neha', 'Raj']
})

scores = pd.DataFrame({
    'ID': [1, 2, 3],
    'Marks': [85, 90, 78]
})

merged = pd.merge(students, scores, on='ID')
print(merged)
    

Output:

   ID   Name  Marks
0   1   Amit     85
1   2   Neha     90
2   3    Raj     78
    

3. Exporting to CSV/Excel

We can save results into files.

# Save to CSV
merged.to_csv("students.csv", index=False)

# Save to Excel
merged.to_excel("students.xlsx", index=False)
    

This will create students.csv and students.xlsx in your working folder.

Practice Problems

  1. Group the following DataFrame by 'City' and find average salary:
    data = {
        'Name': ['Amit', 'Neha', 'Raj', 'Priya'],
        'City': ['Delhi', 'Delhi', 'Mumbai', 'Mumbai'],
        'Salary': [50000, 60000, 55000, 65000]
    }
    df = pd.DataFrame(data)
            
    print(df.groupby('City')['Salary'].mean())
    # Delhi     55000
    # Mumbai    60000
              
  2. Merge two DataFrames: one with ID, Name, another with ID, Age. Show combined output.
    students = pd.DataFrame({
        'ID': [1, 2],
        'Name': ['Amit', 'Neha']
    })
    
    ages = pd.DataFrame({
        'ID': [1, 2],
        'Age': [20, 21]
    })
    
    print(pd.merge(students, ages, on='ID'))
    #   ID   Name  Age
    # 0  1   Amit   20
    # 1  2   Neha   21
              
  3. Save any DataFrame you created into a CSV and open it in Excel.
    df.to_csv("output.csv", index=False)
    # Open output.csv in Excel or any spreadsheet software
              

Post a Comment

0 Comments

Me