Last Updated: 2023-12-11
Python excels at interactive computational analysis, enabling extensive data exploration and manipulation. Jupyter Notebook seamlessly complements this capability, offering an interactive environment where users can write code, visualize results, and document their workflow. Through its integrated interface, Jupyter provides versatile cells for code execution, rendering outputs, and adding explanatory markdown annotations, making it a powerful tool for data analysis and scientific computing.
At its most basic, Python works as a calculator that handles a wide range of mathematical operations.
Addition
2+2
## 4
Subtraction
3-2
## 1
Multiplication
3*3
## 9
Division
4/2
## 2.0
Square root
To take the square root, we first need to import the math module.
import math
math.sqrt(9)
## 3.0
Log10
math.log10(100)
## 2.0
In Python, especially at the outset, much of your work revolves around applying functions to data. In the preceding section, we encountered several functions, including the mathematical functions sqrt() and log10(). Functions take data (or values) as input, process them, and return an output, which is typically displayed in your console by default.
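As a small sketch of this input-to-output flow, the snippet below feeds the output of one function directly into another. It only uses the math functions already shown, plus the built-in round(); the specific numbers are illustrative.

```python
import math

# The inner call runs first: math.sqrt(10000) returns 100.0.
# That output then becomes the input of math.log10().
print(math.log10(math.sqrt(10000)))  # 2.0

# round() takes two inputs: a value and the number of decimals to keep.
print(round(math.sqrt(2), 3))  # 1.414
```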
NumPy, a foundational Python library for numerical computations, introduces a crucial data type: the NumPy array (ndarray). Creating NumPy arrays becomes indispensable when working with other Python libraries that rely heavily on them, such as SciPy, Pandas, Matplotlib, and scikit-learn. NumPy stands out for array manipulation thanks to its wealth of built-in functions, its performance optimizations, and the concise code it allows.
To harness NumPy’s capabilities, you’ll first need to import the NumPy library.
import numpy as np
Here, “np” serves as a conventional short alias for the NumPy library. With this in place, we can proceed to generate sequential data. NumPy provides the “arange()” function for creating arrays that contain sequences of numbers.
np.arange(1,11)
## array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
In the arange function above, pay special attention to the range we entered. The end number we entered is not included in the output.
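To make the excluded end point concrete, here is a short sketch that also uses the optional third argument of arange(), the step size. The particular ranges are illustrative.

```python
import numpy as np

# The stop value (11) is excluded, so the sequence ends at 10.
print(np.arange(1, 11))     # [ 1  2  3  4  5  6  7  8  9 10]

# An optional third argument sets the step between values.
print(np.arange(1, 11, 2))  # [1 3 5 7 9]

# With a single argument, counting starts at 0 and stops before that value.
print(np.arange(5))         # [0 1 2 3 4]
```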
While it’s convenient to view outputs in the console during interactive analysis, there are scenarios where we must retain data and outputs for future reference. This is accomplished through variable assignment, where we use the equal sign to link the values (objects) on the right with the names (variables) on the left.
my_variable = np.arange(1,11)
We can then pass the variable to functions or use it in computations.
my_variable * 2
## array([ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
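One detail worth seeing in code: a computation on a variable produces a new value and leaves the original untouched unless the result is assigned again. The sketch below illustrates this; the name doubled is introduced only for this example.

```python
import numpy as np

my_variable = np.arange(1, 11)

# The expression creates a new array; my_variable itself is unchanged.
doubled = my_variable * 2

print(my_variable.sum())  # 55
print(doubled.sum())      # 110

# Assigning the result back to the same name replaces the stored value.
my_variable = my_variable * 2
print(my_variable.sum())  # 110
```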
When naming variables in Python, keep in mind the following:
- Names should be descriptive and indicate the variable’s purpose or the kind of value it holds.
- Use lowercase letters.
- If a variable name consists of multiple words, separate them with underscores (e.g., variable_name).
- Do not use Python keywords or functions as variable names to prevent confusion and errors.
- Names must start with a letter or an underscore.
- Aside from underscores, avoid using special characters in variable names.
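The naming rules above can be checked with tools from the standard library. This is a minimal sketch: keyword.iskeyword() and str.isidentifier() are real standard-library calls, while the variable names are made up for illustration.

```python
import keyword

# Descriptive, lowercase, underscore-separated names.
student_count = 25
average_score = 87.5

# keyword.iskeyword() reports whether a name is reserved by Python.
print(keyword.iskeyword("class"))          # True
print(keyword.iskeyword("student_count"))  # False

# str.isidentifier() checks the character rules for a name.
print("variable_name".isidentifier())  # True
print("2nd_value".isidentifier())      # False (starts with a digit)
print("total-cost".isidentifier())     # False (contains a hyphen)
```

Note that isidentifier() only checks characters; a keyword such as class passes that check but still cannot be used as a variable name.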
Data types are fundamental constructs in Python, each with distinct characteristics. The three primary data types are numeric (int and float), character (str), and boolean (bool). Numeric data allows for mathematical operations such as addition and division. Character data is stored in strings, which hold a single character or a sequence of characters. Boolean data handles dichotomous (True/False) values. Each data type comes with inherent properties that support specific operations and manipulations.
Main Data Types:
The function type() will tell you what data type you have:
type(2)
## <class 'int'>
type(2.2)
## <class 'float'>
type("a")
## <class 'str'>
type(True)
## <class 'bool'>
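To illustrate how each type supports its own operations, here is a brief sketch using only built-in Python behavior. The values are arbitrary examples.

```python
# Numeric values support arithmetic.
print(7 / 2)   # 3.5 (true division always returns a float)
print(7 // 2)  # 3   (floor division)

# Strings support concatenation and repetition.
print("data" + " " + "analysis")  # data analysis
print("ab" * 3)                   # ababab

# Booleans result from comparisons and combine with and/or/not.
print(3 > 2)                # True
print((3 > 2) and (1 > 2))  # False

# int() and float() convert between types where the value allows it.
print(int("42") + 1)  # 43
print(float(3))       # 3.0
```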
Python offers various data structures to aggregate and organize multiple data elements effectively, especially in data analysis tasks. Below are some of the most frequently used data structures:
NumPy Arrays: These structures, provided by the NumPy library, are efficient containers that hold homogeneous data (usually numbers), allowing for vectorized operations and efficient data manipulation.
Pandas Series: The Pandas library offers the Series data structure, a one-dimensional labeled array that can hold any data type, making it versatile for data analysis tasks.
Pandas DataFrame: Also within the Pandas library is the DataFrame, a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It’s especially suitable for handling and analyzing structured data.
Each of these structures comes with a suite of methods and attributes designed to facilitate data manipulation and analysis effectively.
We have previously provided a brief introduction to NumPy arrays. In this section, we will delve deeper into three prominent data structures, offering a more detailed exploration and understanding of each.
numpy library
import numpy as np
my_array = np.array([1, 2, 3, 4, 5])
my_array
## array([1, 2, 3, 4, 5])
my_array*2
## array([ 2, 4, 6, 8, 10])
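The two properties mentioned earlier, homogeneous data and vectorized operations, can be sketched as follows. The array named mixed is introduced only for this illustration.

```python
import numpy as np

my_array = np.array([1, 2, 3, 4, 5])

# All elements share one data type; mixing ints and floats
# makes NumPy store everything as floats.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Vectorized operations act on every element without an explicit loop.
print(my_array + 10)    # [11 12 13 14 15]
print(my_array.mean())  # 3.0

# A comparison yields a boolean array that can be used to filter values.
print(my_array[my_array > 2])  # [3 4 5]
```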
pandas library
import pandas as pd
my_series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
my_series
## a    1
## b    2
## c    3
## d    4
## e    5
## dtype: int64
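Because a Series is labeled, its values can be retrieved by index label as well as filtered by condition. A short sketch, reusing the same Series definition:

```python
import pandas as pd

my_series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Look up a single value by its label.
print(my_series['c'])      # 3
print(my_series.loc['b'])  # 2

# Filter with a boolean condition; the labels travel with the values.
print(my_series[my_series > 3].index.tolist())  # ['d', 'e']

# Arithmetic and summaries work as they do on NumPy arrays.
print(my_series.sum())  # 15
```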
pandas library
# Define data in a dictionary
data = {
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, 22, 35, 32],
'City': ['New York', 'Paris', 'Berlin', 'Tokyo']
}
# Create DataFrame
df = pd.DataFrame(data)
# Print DataFrame
print(df)
##     Name  Age      City
## 0   John   28  New York
## 1   Anna   22     Paris
## 2  Peter   35    Berlin
## 3  Linda   32     Tokyo
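Once a DataFrame exists, a few common operations are selecting a column, summarizing it, and filtering rows. The sketch below rebuilds the same DataFrame so it runs on its own; the name older is introduced only for this example.

```python
import pandas as pd

data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 22, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'Tokyo']
}
df = pd.DataFrame(data)

# shape gives (number of rows, number of columns).
print(df.shape)  # (4, 3)

# Selecting one column returns a Series, which can be summarized.
print(df['Age'].mean())  # 29.25

# A boolean condition on a column filters the rows.
older = df[df['Age'] > 30]
print(older['Name'].tolist())  # ['Peter', 'Linda']
```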
Invoke the help() function for information on objects, classes, functions, etc.
help(object_name)
Append or prepend a ? to an object’s name for quick reference.
object_name?
?object_name
For more extensive help, use double question marks ??
object_name??
??object_name
?print
For quick access to a function’s docstring, place the cursor inside the function’s parentheses and press Shift + Tab.
list()
Use the “Help” menu in Jupyter Notebook’s toolbar. It provides links to documentation for Jupyter, IPython, NumPy, Pandas, Matplotlib, and other libraries.
Search for help and documentation online by opening a new browser tab.
To get information on the Pandas DataFrame class, use:
help(pd.DataFrame)
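The ? and ?? shortcuts are IPython features, so they work in Jupyter but not in a plain Python script. As a rough sketch of what is available everywhere, the standard-library inspect module and the built-in dir() give similar information; the choice of len and list as examples is arbitrary.

```python
import inspect

# inspect.getdoc() returns the cleaned-up docstring of an object,
# the same text that help() builds its output from.
doc = inspect.getdoc(len)
print(doc)

# dir() lists the attributes and methods available on an object.
# Names starting with an underscore are skipped here for readability.
list_methods = [name for name in dir(list) if not name.startswith("_")]
print(list_methods)
```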
Before using any package or library, it is advisable to consult the official documentation to understand its functionalities, capabilities, and usage conventions. The official documentation is the most reliable and up-to-date source of information for Python and its libraries.