facebook
blue and green pie chart

Pandas

Pandas is an open-source Python library providing high-performance data manipulation and analysis tools using its powerful data structures. The name Pandas is derived from the word Panel Data – an Econometrics from Multidimensional data.

Prior to Pandas, Python was majorly used for data munging and preparation. It had very little contribution to data analysis. Pandas solved this problem. Using Pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of the origin of data load, prepare, manipulate, model, and analyze.

Python with Pandas is used in a wide range of fields including academic and commercial domains including finance, economics, statistics, analytics, etc.

pandas

Introduction to Data Structure with pandas:

Pandas deals with the following three data structures:

  • Series
  • Data Frame
  • Panel

These data structures are built on top of a Numpy array, which means they are fast.

Dimenson & Description:

The best way to think of these data structures is that the higher dimensional data structure is a container of its lower-dimensional data structure. For example, DataFrame is a container of Series, Panel is a container of DataFrame.

Data StructureDimensionsDescription
Series11D labelled immutable.
homogeneous array, size.
Data Frames2General 2D labelled, size-mutable tabular
structure with potentially
heterogeneously typed columns.
Panel3General 3D labelled, size-mutable array.

Building and handling two or more dimensional arrays is a tedious task, a burden is placed. on the user to consider the orientation of the data set when writing functions. But using Pandas data structures, the mental effort of the user is reduced.
For example, with tabular data (DataFrame) it is more semantically helpful to think of the index (the rows) and the columns rather than axis 0 and axis 1.

Mutability

All Pandas data structures are valued mutable (can be changed) and except Series, all are size mutable. Series is size immutable.

Note: DataFrame is widely used and one of the most important data structures. The Panel is very less used.

Series

Series is a one-dimensional array-like structure with homogeneous data. For example, the following series is a collection of integers 10, 23, 56,…

10235617526173902672

Key Points

  • Homogeneous data
  • Size Immutable.
  • Values of Data Mutable

Data Frame

DataFrame is a two-dimensional array with heterogeneous data. For example, The table represents the data of a sales team of an organization with their overall performance rating. The data is represented in rows and columns. Each column represents an attribute and each row represents a person.

NameAgeGenderRating
Steve32Male3.45
Lia28Female4.6
Vin45Male3.9
Katie38Female2.78

Data Type of Columns

The data types of the four columns are as follows:

ColumnTypes
NameString
AgeInteger
GenderString
RatingFloat

Panel

The panel is a three-dimensional data structure with heterogeneous data. It is hard to represent the panel in graphical representation. But a panel can be illustrated as a container of DataFrame.

Leave a Comment

Your email address will not be published. Required fields are marked *

Analogicx

FREE
VIEW