The `pivot_table` method creates a spreadsheet-style pivot table, allowing you to summarize and aggregate data based on specified index, columns, and values, with support for various aggregation functions. By default it aggregates with the mean; other functions can be supplied via the `aggfunc` parameter.
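A minimal sketch with a toy sales DataFrame (the column names and values are illustrative):

```python
import pandas as pd

# Hypothetical sales data for illustration.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "product": ["A", "B", "A", "B"],
    "sales": [100, 150, 200, 250],
})

# Rows indexed by region, columns by product, cells hold summed sales.
table = df.pivot_table(index="region", columns="product",
                       values="sales", aggfunc="sum")
```

Here `table.loc["East", "A"]` holds the total East-region sales of product A.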
Multi-indexes allow pandas DataFrames to have multiple levels of indexing on rows and/or columns. They enable more complex data structures and facilitate hierarchical data organization, making it easier to perform operations like grouping, reshaping, and selecting subsets of data based on multiple keys.
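A small sketch of a two-level row index, using made-up (year, quarter) keys:

```python
import pandas as pd

# Hypothetical hierarchical index: (year, quarter) pairs.
idx = pd.MultiIndex.from_tuples(
    [(2023, "Q1"), (2023, "Q2"), (2024, "Q1"), (2024, "Q2")],
    names=["year", "quarter"],
)
df = pd.DataFrame({"revenue": [10, 20, 30, 40]}, index=idx)

# Select all quarters of one year by the outer key.
year_2023 = df.loc[2023]

# Aggregate over one index level.
totals = df.groupby(level="year").sum()
```

Selecting by the outer key returns the inner level as the remaining index, which is what makes hierarchical slicing convenient.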
`loc` is label-based and used for selecting rows and columns by their labels or boolean arrays. `iloc` is integer position-based and selects by integer indices. For example, `df.loc[2, 'A']` selects the value in row labeled 2 and column 'A', while `df.iloc[2, 0]` selects the value at the third row and first column by position.
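The distinction shows up most clearly when row labels do not match positions; a sketch with deliberately shuffled labels:

```python
import pandas as pd

# Row labels [2, 1, 0] differ from positions [0, 1, 2] on purpose.
df = pd.DataFrame({"A": [10, 20, 30], "B": [40, 50, 60]},
                  index=[2, 1, 0])

by_label = df.loc[2, "A"]    # row *labelled* 2 -- the first row here
by_position = df.iloc[2, 0]  # third row, first column by *position*
```

`by_label` and `by_position` pick different cells, because label 2 sits at position 0.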
You can merge two DataFrames using the `pd.merge()` function, specifying the keys to join on and the type of join (e.g., inner, outer, left, right). Alternatively, `df1.join(df2)` can be used for joining on indexes, and `pd.concat([df1, df2])` can concatenate along a particular axis.
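A sketch comparing an inner and a left join on two toy frames (the `id`/`name`/`score` columns are invented for the example):

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Ben", "Cam"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [85, 90, 75]})

# Inner join keeps only ids present in both frames.
inner = pd.merge(left, right, on="id", how="inner")

# Left join keeps every row of `left`; unmatched scores become NaN.
left_join = pd.merge(left, right, on="id", how="left")
```

The inner result has two rows (ids 2 and 3), while the left join keeps all three rows of `left` with a missing score for id 1.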
Pandas provides functions like `pd.read_csv()`, `pd.read_excel()`, `pd.read_json()`, `pd.read_sql()`, and `pd.read_html()` to read various file formats. Similarly, you can write DataFrames using methods like `df.to_csv()`, `df.to_excel()`, `df.to_json()`, and `df.to_sql()` to export data to different formats.
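A self-contained round-trip sketch using an in-memory buffer in place of a file path (in practice you would pass a filename to `to_csv`/`read_csv`):

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Ben"], "age": [30, 25]})

# Write to CSV text, then read it back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
```

`index=False` keeps the row index out of the file, so the restored frame matches the original.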
Data filtering and selection can be done using boolean indexing, the `query()` method, `loc` and `iloc` for label or position-based selection, and conditions applied to DataFrame columns. For example, `df[df['age'] > 30]` filters rows where the 'age' column is greater than 30.
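The boolean-indexing and `query()` styles express the same filter; a sketch with a made-up `age` column:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Ben", "Cam"],
                   "age": [35, 28, 42]})

# Boolean mask and query string select the same rows.
over_30_mask = df[df["age"] > 30]
over_30_query = df.query("age > 30")
```

`query()` can be easier to read for compound conditions, while boolean indexing composes naturally with `&`, `|`, and `~`.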
Performance can be optimized by using efficient data types (e.g., categorical data), avoiding unnecessary copies, leveraging vectorized operations instead of loops, using built-in pandas functions, applying chunk processing for large datasets, indexing appropriately, and utilizing parallel processing or libraries like Dask for handling very large DataFrames.
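One of these techniques, the categorical dtype, is easy to demonstrate: for a low-cardinality string column, category storage is dramatically smaller than object storage. A sketch with synthetic data:

```python
import pandas as pd

# A three-value string column repeated many times.
s = pd.Series(["red", "green", "blue"] * 10_000)

as_object = s.memory_usage(deep=True)
as_category = s.astype("category").memory_usage(deep=True)

# Categories store each distinct string once plus small integer codes,
# so as_category is far smaller than as_object here.
```

The same idea applies to any repetitive label column (status codes, country names, and so on).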
Missing data can be handled using methods like `df.dropna()` to remove missing values, `df.fillna()` to fill them with specified values or strategies (e.g., mean, median), and `df.isnull()` or `df.notnull()` to detect missing values. Additionally, interpolation methods can estimate missing data.
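A sketch of the three approaches on a toy column containing NaNs:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, 3.0, np.nan]})

dropped = df.dropna()               # remove rows containing NaN
filled = df.fillna(df["x"].mean())  # replace NaN with the column mean (2.0)
interpolated = df.interpolate()     # linear interpolation between known values
```

Which strategy is appropriate depends on the data: dropping rows loses information, while filling or interpolating introduces estimated values.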
The `groupby` function is used to split a DataFrame into groups based on one or more keys, apply a function to each group independently, and then combine the results. It is commonly used for aggregation, transformation, and filtration operations, such as calculating group-wise statistics.
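The split-apply-combine pattern in a minimal sketch (the `team`/`points` columns are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"team": ["A", "A", "B", "B"],
                   "points": [10, 20, 5, 15]})

# Split by team, aggregate each group, combine into one Series per stat.
totals = df.groupby("team")["points"].sum()
means = df.groupby("team")["points"].mean()
```

The result is indexed by the group keys, so `totals["A"]` is team A's combined points.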
You can apply functions using the `df.apply()` method, specifying `axis=0` for columns or `axis=1` for rows. For element-wise operations, `df.applymap()` (renamed to `df.map()` in pandas 2.1) is available, though vectorized operations are generally faster than either.
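A sketch of both axes of `apply`, plus a vectorized element-wise alternative:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

col_sums = df.apply(lambda col: col.sum(), axis=0)  # one value per column
row_sums = df.apply(lambda row: row.sum(), axis=1)  # one value per row

# Element-wise work is usually better done vectorized than via
# applymap/map -- the whole frame is doubled in one operation.
doubled = df * 2
```

Prefer the vectorized form whenever the operation can be expressed on the whole frame; `apply` pays Python-level overhead per row or column.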