
Last Updated: 2023-12-11

Python is well suited to interactive computational analysis, making it easy to explore and manipulate data. Jupyter Notebook complements this with an interactive environment where you can write code, view the results, and document your workflow in one place. A notebook is organized into cells: code cells run Python and display their output directly below, while Markdown cells hold explanatory text. This combination makes Jupyter a powerful tool for data analysis and scientific computing.

Simple Math

At its most basic, Python works like a calculator, handling a wide range of mathematical operations with ease.

Addition

2+2
## 4

Subtraction

3-2
## 1

Multiplication

3*3
## 9

Division

4/2
## 2.0
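Note that / always performs true division and returns a float, even when the result is a whole number (4/2 gives 2.0). Python also has dedicated operators for floor division, remainders, and powers. The short sketch below shows them; the variable names are only illustrative:

```python
# True division always returns a float
true_div = 7 / 2      # 3.5
# Floor division rounds down to the nearest whole number
floor_div = 7 // 2    # 3
# Rounding down means toward negative infinity, even for negative numbers
neg_floor = -7 // 2   # -4
# Modulo returns the remainder of the division
remainder = 7 % 2     # 1
# Exponentiation uses ** (^ is bitwise XOR in Python, not a power operator)
power = 2 ** 3        # 8

print(true_div, floor_div, neg_floor, remainder, power)
# prints: 3.5 3 -4 1 8
```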

Square root

To take a square root, we first need to import Python's built-in math module, which provides the sqrt() function.

import math
math.sqrt(9)
## 3.0

Log10

math.log10(100)
## 2.0
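The math module offers several related logarithm functions. By default, math.log() computes the natural logarithm, and it accepts an optional base as a second argument. For their own bases, the dedicated math.log10() and math.log2() functions are usually more accurate than math.log() with a base. A brief sketch:

```python
import math

# Natural logarithm (base e) is the default for math.log()
ln_e = math.log(math.e)          # 1.0, up to floating-point rounding
# math.log() also accepts a base as its second argument
log_base_10 = math.log(100, 10)  # approximately 2.0
# Dedicated functions for common bases are usually more accurate
log10_value = math.log10(1000)   # 3.0
log2_value = math.log2(8)        # 3.0
# The inverse of the natural logarithm is math.exp()
exp_value = math.exp(0)          # 1.0

print(ln_e, log_base_10, log10_value, log2_value, exp_value)
```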

Functions

In Python, especially at the outset, much of your work revolves around applying functions to data. In the preceding section, we used several functions, including the mathematical functions sqrt() and log10(). A function takes data (or values) as input, processes them, and returns an output. In Jupyter, the value produced by the last line of a cell is displayed below the cell by default.
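The following sketch uses a few of Python's built-in functions to illustrate this input, process, output pattern. Each one takes one or more inputs (arguments) in parentheses and returns a value that can be displayed or stored in a variable:

```python
# round() takes a number and the number of decimal places to keep
rounded = round(3.14159, 2)   # 3.14
# max() takes several values and returns the largest
largest = max(3, 7, 5)        # 7
# abs() returns the absolute value of a number
absolute = abs(-4)            # 4
# len() returns the number of characters in a string
length = len("Python")        # 6

print(rounded, largest, absolute, length)
# prints: 3.14 7 4 6
```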

NumPy, a foundational Python library for numerical computing, introduces a crucial data type: the n-dimensional array (ndarray), usually just called a NumPy array. Creating NumPy arrays becomes indispensable when working with other Python libraries that rely heavily on them, such as SciPy, Pandas, Matplotlib, and scikit-learn. NumPy stands out for array manipulation thanks to its wealth of built-in functions, its performance optimizations, and the concise code it makes possible.
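NumPy's array type is numpy.ndarray. The small sketch below shows why the distinction matters by comparing arithmetic on a plain Python list with the same operation on an array. Multiplying the list repeats it, while multiplying the array applies the operation to each element. The names my_list and my_arr are only illustrative:

```python
import numpy as np

my_list = [1, 2, 3]
my_arr = np.array(my_list)

# Multiplying a list repeats its contents
print(my_list * 2)     # [1, 2, 3, 1, 2, 3]
# Multiplying an array multiplies every element
print(my_arr * 2)      # [2 4 6]
# The array's type is numpy.ndarray
print(type(my_arr))    # <class 'numpy.ndarray'>
```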

To harness NumPy’s capabilities, you’ll first need to import the NumPy library.

import numpy as np

Here, “np” is a short alias for the NumPy library, defined by “as np” in the import statement. This alias is a widely used convention rather than a requirement. With NumPy imported, we can generate sequential data: the “arange()” function creates an array containing a sequence of numbers.

np.arange(1,11)
## array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In the arange() call above, pay special attention to the range we entered. The start value (1) is included, but the end value (11) is not, so the array runs from 1 to 10.
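np.arange() follows the same start, stop, step convention as Python's built-in range(). The stop value is excluded, the start defaults to 0, and an optional third argument sets the step size. A brief sketch:

```python
import numpy as np

# With a single argument, the sequence starts at 0 and stops before that value
print(np.arange(5))          # [0 1 2 3 4]
# A third argument sets the step size
print(np.arange(0, 10, 2))   # [0 2 4 6 8]
# A negative step counts down; the stop value is still excluded
print(np.arange(10, 0, -3))  # [10  7  4  1]
```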

Variables

While it’s convenient to view outputs in the console during interactive analysis, there are scenarios where we must retain data and outputs for future reference. This is accomplished through variable assignment, where we use the equal sign to link the values (objects) on the right with the names (variables) on the left.

my_variable = np.arange(1,11)

We can then use the variable in functions and computations, for example multiplying every element by 2:

my_variable * 2
## array([ 2,  4,  6,  8, 10, 12, 14, 16, 18, 20])
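An operation such as my_variable * 2 returns a new array. The original variable is unchanged unless you assign the result to a name. The short sketch below stores the result in a new variable; the name doubled is only illustrative:

```python
import numpy as np

my_variable = np.arange(1, 11)

# Store the result of a computation in a new variable
doubled = my_variable * 2

# The original array is unchanged
print(my_variable[:3])   # [1 2 3]
print(doubled[:3])       # [2 4 6]

# Stored variables can be reused in later computations
print(doubled.sum())     # 110
```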

When naming variables in Python, keep in mind the following:
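Python's own rules for names are strict:

- A name must start with a letter or an underscore.
- It may contain only letters, digits, and underscores.
- It is case-sensitive.
- It cannot be a reserved keyword such as for or class.

By convention (PEP 8), variable names are written in lowercase with underscores, as in my_variable. As a minimal sketch, the built-in str.isidentifier() method and the standard-library keyword module can check a candidate name. The helper is_valid_name and the tested names are hypothetical examples:

```python
import keyword

def is_valid_name(name):
    """Return True if name can be used as a Python variable name."""
    # isidentifier() checks the character rules; keywords are excluded separately
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["my_variable", "_total", "data2", "2data", "my-variable", "class"]:
    print(candidate, is_valid_name(candidate))

# Names are case-sensitive: age and Age are two different variables
age = 30
Age = 40
print(age == Age)  # False
```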

Data Types & Structures

Data types are fundamental constructs in Python, each with distinct characteristics. Three of the most common are numeric, string, and Boolean:

- Numeric data (integers and floating-point numbers) supports mathematical operations such as addition and division.
- String data holds text. Python has no separate character type, so a single character is simply a string of length 1.
- Boolean data represents dichotomous values: True or False.

Each data type comes with inherent properties that determine which operations and manipulations it supports.

Main Data Types:

The type() function tells you the data type of a value. Note that numbers come in two types: int for integers and float for numbers with a decimal point.

type(2)
## <class 'int'>
type(2.2)
## <class 'float'>
type("a")
## <class 'str'>
type(True)
## <class 'bool'>
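Because each type supports different operations, the same operator can behave differently depending on the types involved. The short sketch below shows this, along with conversion between types using int(), float(), and str(). The variable names are only illustrative:

```python
# + adds numbers but concatenates strings
number_sum = 2 + 2            # 4
text_join = "2" + "2"         # '22'
# Mixing the two, as in "2" + 2, raises a TypeError,
# so convert the string to a number first
converted_sum = int("2") + 2  # 4
scaled = float("2.5") * 2     # 5.0
# Convert a number to a string to join it with text
label = "Age: " + str(28)     # 'Age: 28'
# Comparisons produce Boolean values
is_greater = 3 > 2            # True

print(number_sum, text_join, converted_sum, scaled, label, is_greater)
print(type(is_greater))       # <class 'bool'>
```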

Data Structures

Python and its data analysis libraries offer various data structures for aggregating and organizing multiple data elements. Below are some of the most frequently used:

NumPy Arrays: These structures, provided by the NumPy library, are efficient containers for homogeneous data (usually numbers of a single type). They allow vectorized operations and efficient data manipulation.

Pandas Series: The Pandas library offers the Series, a one-dimensional labeled array that can hold any data type (integers, floats, strings, Python objects, and so on). This makes it versatile for data analysis tasks.

Pandas DataFrame: Also within the Pandas library is the DataFrame, a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It’s especially suitable for handling and analyzing structured data.

Each of these structures comes with a suite of methods and attributes designed to facilitate data manipulation and analysis effectively.

We briefly introduced NumPy arrays earlier. In this section, we look at each of these three data structures in turn, with a short example of each.

NumPy Arrays

import numpy as np
my_array = np.array([1, 2, 3, 4, 5])
my_array
## array([1, 2, 3, 4, 5])
my_array*2
## array([ 2,  4,  6,  8, 10])
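Beyond element-wise arithmetic, arrays expose attributes and methods for inspecting and summarizing data, and they support indexing and Boolean filtering. A brief sketch using the same my_array:

```python
import numpy as np

my_array = np.array([1, 2, 3, 4, 5])

# Shape and number of dimensions
print(my_array.shape)   # (5,)
print(my_array.ndim)    # 1
# Element type, typically int64 (platform-dependent)
print(my_array.dtype)
# Indexing is zero-based; slicing excludes the end index
print(my_array[0])      # 1
print(my_array[1:3])    # [2 3]
# Summary methods
print(my_array.mean())  # 3.0
# Boolean filtering keeps the elements where the condition is True
print(my_array[my_array > 2])  # [3 4 5]
```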

Pandas Series

import pandas as pd
my_series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']) 
my_series
## a    1
## b    2
## c    3
## d    4
## e    5
## dtype: int64
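Because a Series carries an index, its elements can be accessed by label as well as by position. A short sketch with the same my_series:

```python
import pandas as pd

my_series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Access by label with .loc
print(my_series.loc['c'])   # 3
# Access by integer position with .iloc
print(my_series.iloc[0])    # 1
# Vectorized arithmetic keeps the labels
print(my_series * 10)
# Boolean filtering works as with NumPy arrays
print(my_series[my_series > 3])
```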

Pandas DataFrame

# Define data in a dictionary
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 22, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'Tokyo']
}

# Create DataFrame
df = pd.DataFrame(data)

# Print DataFrame
print(df)
##     Name  Age      City
## 0   John   28  New York
## 1   Anna   22     Paris
## 2  Peter   35    Berlin
## 3  Linda   32     Tokyo
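Each column of a DataFrame is a Series, so selecting one column returns a Series that supports the operations shown earlier. The minimal sketch below runs through some common inspection and selection steps on the same df. The intermediate names ages and over_30 are only illustrative:

```python
import pandas as pd

data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 22, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'Tokyo']
}
df = pd.DataFrame(data)

# Number of rows and columns
print(df.shape)             # (4, 3)
# Select a single column (returns a Series)
ages = df['Age']
print(ages.mean())          # 29.25
# Keep only the rows where the condition is True
over_30 = df[df['Age'] > 30]
print(over_30)
# Select a single value by row label and column name
print(df.loc[0, 'City'])    # New York
```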

Getting Help within Jupyter Notebook

Using Python’s Built-in Help System

Invoke the help() function for information on objects, classes, functions, etc.

help(object_name)

Using Question Mark ?

Append or prepend a ? to an object’s name for quick reference.

object_name?
?object_name

For more extensive help, including the source code where it is available, use double question marks ??

object_name??
??object_name

For example, to view the documentation for the built-in print() function:

?print

Accessing Documentation Strings (docstrings)

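For quick access to a function’s docstring, place the cursor inside the function’s parentheses and press Shift + Tab. For example, type the following in a code cell, place the cursor between the parentheses, and press Shift + Tab:

list()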
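The same docstring is also stored on the object itself. The small sketch below reads it through the __doc__ attribute and through inspect.getdoc() from the standard library, which cleans up the indentation:

```python
import inspect

# The raw docstring of the built-in len() function
print(len.__doc__)

# inspect.getdoc() returns the docstring with indentation cleaned up
doc = inspect.getdoc(len)
print(doc)
```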

Help Menu in the Menu Bar

Use the “Help” menu in Jupyter Notebook’s menu bar. It provides links to documentation for Jupyter, IPython, NumPy, Pandas, Matplotlib, and other libraries.

Online Help and Searching

Search for help and documentation online by opening a new browser tab.

To get information on the Pandas DataFrame class from within the notebook, use:

help(pd.DataFrame)


Before using any package or library, it is advisable to consult the official documentation to understand its functionalities, capabilities, and usage conventions. The official documentation is the most reliable and up-to-date source of information for Python and its libraries.