```python
import pandas as pd
# Sample DataFrame
data = {
'id': [1, 2, 3, 4, 5, 6],
'name': ['Amit', 'Priya', 'Vikram', 'Anjali', 'Rahul', 'Neha'],
'age': [24, 27, 22, 32, 29, 35],
'city': ['Mumbai', 'Delhi', 'Bangalore', 'Chennai', 'Hyderabad', 'Kolkata'],
'score': [88, 92, 85, 70, 90, 95]
}
df = pd.DataFrame(data)
print(df)
```
This DataFrame contains the following columns:
- `id`: A unique identifier for each person.
- `name`: The name of the person.
- `age`: The age of the person.
- `city`: The city where the person lives.
- `score`: A score assigned to the person.
### Sample Queries
Here are some sample queries you can run on this DataFrame:
1. **Select all rows where age is greater than 25:**
```python
result = df[df['age'] > 25]
print(result)
```
2. **Select names and scores for people who scored more than 85:**
```python
result = df.query('score > 85')[['name', 'score']]
print(result)
```
3. **Select rows where the city is either 'Bangalore' or 'Chennai':**
```python
result = df[df['city'].isin(['Bangalore', 'Chennai'])]
print(result)
```
4. **Select all rows and sort by score in descending order:**
```python
result = df.sort_values(by='score', ascending=False)
print(result)
```
5. **Calculate the average score of all individuals:**
```python
average_score = df['score'].mean()
print("Average Score:", average_score)
```
6. **Group by 'city' and calculate the mean age for each city:**
```python
result = df.groupby('city')['age'].mean()
print(result)
```
7. **Filter rows where the name contains the letter 'a':**
```python
result = df[df['name'].str.contains('a', case=False)]
print(result)
```
8. **Select the top 3 rows with the highest scores:**
```python
result = df.nlargest(3, 'score')
print(result)
```
Sure! Here are some basic querying examples using Pandas, covering different ways to filter and manipulate DataFrame data.
### 1. Boolean Indexing
**Example:** Select all rows where the age is greater than 25.
```python
import pandas as pd
# Sample DataFrame
data = {
'id': [1, 2, 3, 4, 5, 6],
'name': ['Amit', 'Priya', 'Vikram', 'Anjali', 'Rahul', 'Neha'],
'age': [24, 27, 22, 32, 29, 35],
'city': ['Mumbai', 'Delhi', 'Bangalore', 'Chennai', 'Hyderabad', 'Kolkata'],
'score': [88, 92, 85, 70, 90, 95]
}
df = pd.DataFrame(data)
# Query
result = df[df['age'] > 25]
print(result)
```
### 2. Using `query` Method
**Example:** Select names and scores for people who scored more than 85.
```python
# Query
result = df.query('score > 85')[['name', 'score']]
print(result)
```
### 3. `loc` and `iloc`
- **`loc`:** Label-based indexing to select specific rows and columns.
- **`iloc`:** Integer-based indexing to select specific rows and columns.
**Example:** Select rows with index labels 1 and 3 and columns 'name' and 'score'.
```python
# Using loc
result = df.loc[[1, 3], ['name', 'score']]
print(result)
# Using iloc
result = df.iloc[[1, 3], [1, 4]] # Selecting same rows by integer index
print(result)
```
### 4. Combining Multiple Conditions
**Example:** Select all rows where age is greater than 25 and score is less than 90.
```python
# Query
result = df[(df['age'] > 25) & (df['score'] < 90)]
print(result)
```
### 5. Using String Functions
**Example:** Select rows where the name contains the letter 'a'.
```python
# Query
result = df[df['name'].str.contains('a', case=False)]
print(result)
```
### 6. Using `isin` Method
**Example:** Select rows where the city is either 'Bangalore' or 'Chennai'.
```python
# Query
result = df[df['city'].isin(['Bangalore', 'Chennai'])]
print(result)
```
### 7. Sorting
**Example:** Select all rows and sort by score in descending order.
```python
# Query
result = df.sort_values(by='score', ascending=False)
print(result)
```
### 8. Aggregations
**Example:** Calculate the average score of all individuals.
```python
# Query
average_score = df['score'].mean()
print("Average Score:", average_score)
```
### 9. Grouping
**Example:** Group by 'city' and calculate the mean age for each city.
```python
# Query
result = df.groupby('city')['age'].mean()
print(result)
```
### 10. Selecting Top N Rows
**Example:** Select the top 3 rows with the highest scores.
```python
# Query
result = df.nlargest(3, 'score')
print(result)
```
These examples demonstrate various ways to query and manipulate data in a Pandas DataFrame:
- **Boolean Indexing:** For simple conditions.
- **`query` Method:** For SQL-like syntax.
- **`loc` and `iloc`:** For label-based and integer-based selection.
- **Combining Conditions:** For complex filtering.
- **String Functions:** For filtering based on string patterns.
- **`isin` Method:** For checking membership in a list.
- **Sorting:** For ordering data.
- **Aggregations:** For calculating summary statistics.
- **Grouping:** For aggregating data based on groups.
- **Top N Rows:** For selecting the highest or lowest values.
0 Comments