Pandas Example



```python
import pandas as pd

# Sample DataFrame
data = {
    'id': [1, 2, 3, 4, 5, 6],
    'name': ['Amit', 'Priya', 'Vikram', 'Anjali', 'Rahul', 'Neha'],
    'age': [24, 27, 22, 32, 29, 35],
    'city': ['Mumbai', 'Delhi', 'Bangalore', 'Chennai', 'Hyderabad', 'Kolkata'],
    'score': [88, 92, 85, 70, 90, 95]
}

df = pd.DataFrame(data)
print(df)
```

This DataFrame contains the following columns:

- `id`: A unique identifier for each person.
- `name`: The name of the person.
- `age`: The age of the person.
- `city`: The city where the person lives.
- `score`: A score assigned to the person.

### Sample Queries

Here are some sample queries you can run on this DataFrame:

1. **Select all rows where age is greater than 25:**
    ```python
    result = df[df['age'] > 25]
    print(result)
    ```

2. **Select names and scores for people who scored more than 85:**
    ```python
    result = df.query('score > 85')[['name', 'score']]
    print(result)
    ```

3. **Select rows where the city is either 'Bangalore' or 'Chennai':**
    ```python
    result = df[df['city'].isin(['Bangalore', 'Chennai'])]
    print(result)
    ```

4. **Select all rows and sort by score in descending order:**
    ```python
    result = df.sort_values(by='score', ascending=False)
    print(result)
    ```

5. **Calculate the average score of all individuals:**
    ```python
    average_score = df['score'].mean()
    print("Average Score:", average_score)
    ```

6. **Group by 'city' and calculate the mean age for each city:**
    ```python
    result = df.groupby('city')['age'].mean()
    print(result)
    ```

7. **Filter rows where the name contains the letter 'a':**
    ```python
    result = df[df['name'].str.contains('a', case=False)]
    print(result)
    ```

8. **Select the top 3 rows with the highest scores:**
    ```python
    result = df.nlargest(3, 'score')
    print(result)
    ```



Sure! Here are some basic querying examples using Pandas, covering different ways to filter and manipulate DataFrame data.

### 1. Boolean Indexing

**Example:** Select all rows where the age is greater than 25.

```python
import pandas as pd

# Sample DataFrame
data = {
    'id': [1, 2, 3, 4, 5, 6],
    'name': ['Amit', 'Priya', 'Vikram', 'Anjali', 'Rahul', 'Neha'],
    'age': [24, 27, 22, 32, 29, 35],
    'city': ['Mumbai', 'Delhi', 'Bangalore', 'Chennai', 'Hyderabad', 'Kolkata'],
    'score': [88, 92, 85, 70, 90, 95]
}

df = pd.DataFrame(data)

# Query
result = df[df['age'] > 25]
print(result)
```

### 2. Using `query` Method

**Example:** Select names and scores for people who scored more than 85.

```python
# Query
result = df.query('score > 85')[['name', 'score']]
print(result)
```

### 3. `loc` and `iloc`

- **`loc`:** Label-based indexing to select specific rows and columns.
- **`iloc`:** Integer-based indexing to select specific rows and columns.

**Example:** Select rows with index labels 1 and 3 and columns 'name' and 'score'.

```python
# Using loc
result = df.loc[[1, 3], ['name', 'score']]
print(result)

# Using iloc
result = df.iloc[[1, 3], [1, 4]]  # Selecting same rows by integer index
print(result)
```

### 4. Combining Multiple Conditions

**Example:** Select all rows where age is greater than 25 and score is less than 90.

```python
# Query
result = df[(df['age'] > 25) & (df['score'] < 90)]
print(result)
```

### 5. Using String Functions

**Example:** Select rows where the name contains the letter 'a'.

```python
# Query
result = df[df['name'].str.contains('a', case=False)]
print(result)
```

### 6. Using `isin` Method

**Example:** Select rows where the city is either 'Bangalore' or 'Chennai'.

```python
# Query
result = df[df['city'].isin(['Bangalore', 'Chennai'])]
print(result)
```

### 7. Sorting

**Example:** Select all rows and sort by score in descending order.

```python
# Query
result = df.sort_values(by='score', ascending=False)
print(result)
```

### 8. Aggregations

**Example:** Calculate the average score of all individuals.

```python
# Query
average_score = df['score'].mean()
print("Average Score:", average_score)
```

### 9. Grouping

**Example:** Group by 'city' and calculate the mean age for each city.

```python
# Query
result = df.groupby('city')['age'].mean()
print(result)
```

### 10. Selecting Top N Rows

**Example:** Select the top 3 rows with the highest scores.

```python
# Query
result = df.nlargest(3, 'score')
print(result)
```


These examples demonstrate various ways to query and manipulate data in a Pandas DataFrame:

- **Boolean Indexing:** For simple conditions.
- **`query` Method:** For SQL-like syntax.
- **`loc` and `iloc`:** For label-based and integer-based selection.
- **Combining Conditions:** For complex filtering.
- **String Functions:** For filtering based on string patterns.
- **`isin` Method:** For checking membership in a list.
- **Sorting:** For ordering data.
- **Aggregations:** For calculating summary statistics.
- **Grouping:** For aggregating data based on groups.
- **Top N Rows:** For selecting the highest or lowest values.



Contact us for software training, education or development










 

Post a Comment

0 Comments

Me