💐Preliminaries💐
[TOC]
online reading
Preliminaries
Data Manipulation
data manipulation vs. data preprocessing
Data Preprocessing: A comprehensive process that includes data manipulation to prepare raw data for analysis and modeling.
Data Manipulation: A subset of preprocessing tasks focused on transforming and organizing data.
data preprocessing includes data manipulation
Data Manipulation
Definition: Data manipulation refers to the process of changing data to make it more organized and easier to analyze. This includes various operations to transform the data.
Common Tasks (see the pandas sketch after this list):
- Merging and Joining: Combining data from multiple sources or tables.
- Sorting: Arranging data in a specific order.
- Filtering: Selecting a subset of data based on conditions.
- Aggregation: Summarizing data, such as calculating averages or sums.
- Reshaping: Changing the structure or format of data, such as pivoting tables.
- Indexing: Selecting specific rows or columns of data.
- Data Cleaning: Correcting or removing incorrect, corrupted, or duplicate data.
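To make these tasks concrete, here is a minimal pandas sketch; the DataFrame, column names, and values below are made up for illustration:

```python
import pandas as pd

# a tiny made-up table of loan applicants
df = pd.DataFrame({
    "applicant": ["a", "b", "c", "c"],
    "income": [52000, 48000, 61000, 61000],
    "defaults": [0, 2, 1, 1],
})

df = df.drop_duplicates()                 # data cleaning: drop the duplicated row
df = df.sort_values("income")             # sorting
high_income = df[df["income"] > 50000]    # filtering
avg_defaults = df["defaults"].mean()      # aggregation
print(high_income)
print(avg_defaults)
```

Merging (`pd.merge`), reshaping (`pivot_table`), and indexing (`loc`/`iloc`) follow the same DataFrame-centric pattern.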
Saving Memory
When we run `Y = Y + X`, Python first evaluates `Y + X`, allocating new memory for the result, and then points `Y` to this new location in memory.
```python
import torch

X = torch.arange(3.0)   # example tensors
Y = torch.ones(3)
before = id(Y)
Y = Y + X               # allocates a new tensor and rebinds the name Y
id(Y) == before
```

```
False
```
We can assign the result of an operation to a previously allocated array `Y` by using slice notation: `Y[:] = <expression>`. To illustrate this concept, we overwrite the values of tensor `Z`, after initializing it with `zeros_like` to have the same shape as `Y`.
```python
Z = torch.zeros_like(Y)
print('id(Z):', id(Z))
Z[:] = X + Y             # writes the result into Z's existing memory
print('id(Z):', id(Z))
```

```
id(Z): 140381179266448
id(Z): 140381179266448
```
If the value of `X` is not reused in subsequent computations, we can also use `X[:] = X + Y` or `X += Y` to reduce the memory overhead of the operation.
```python
before = id(X)
X += Y             # in-place addition reuses X's memory
id(X) == before
```

```
True
```
Broadcasting
Broadcasting is a feature in numerical libraries like NumPy and PyTorch that lets you perform operations on arrays of different shapes by automatically expanding smaller arrays to match the shape of larger ones. This process is efficient and avoids unnecessary memory usage.
```python
# same shape: elementwise addition, no broadcasting needed
u = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([10.0, 20.0, 30.0])
u + v                                 # tensor([11., 22., 33.])

# different shapes: (3, 1) and (1, 2) broadcast to (3, 2)
a = torch.arange(3).reshape(3, 1)
b = torch.arange(2).reshape(1, 2)
a + b
```
Conversion to Other Python Objects
Converting to a NumPy array (`ndarray`), or vice versa, is easy. The torch tensor and NumPy array will share their underlying memory, and changing one through an in-place operation will also change the other.
NumPy Array ⇄ PyTorch Tensor
```python
A = X.numpy()            # torch.Tensor -> numpy.ndarray
B = torch.from_numpy(A)  # numpy.ndarray -> torch.Tensor
type(A), type(B)
```
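Because the memory is shared, an in-place change on one side is visible on the other; a quick check, assuming `X`, `A`, and `B` from the snippet above:

```python
A[0] = 100.0    # modify the NumPy array in place
X[0], B[0]      # both tensors reflect the change
```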
To convert a size-1 tensor to a Python scalar, we can invoke the `item` function or Python's built-in functions.
```python
a = torch.tensor([3.5])
a, a.item(), float(a), int(a)
```
Data Processing
Comma-separated values (CSV) files are ubiquitous for storing tabular (spreadsheet-like) data.
CSV files and Excel sheets both store tabular data, but they have key differences:
CSV Files
- Format: Plain text format where data is separated by commas.
- File Extension: `.csv`
- Content: Only contains data, no formatting, formulas, or multimedia.
- Compatibility: Can be opened by any text editor, spreadsheet software, or program that supports CSV.
- Size: Typically smaller because it lacks additional features.
Excel Sheets
- Format: Binary or XML-based format for Microsoft Excel.
- File Extensions: `.xls` (older format), `.xlsx` (newer format)
- Content: Contains data, but also supports complex formatting, formulas, charts, and multimedia.
- Compatibility: Best opened with Excel or similar spreadsheet software (like Google Sheets or LibreOffice Calc).
- Features: Supports advanced features like pivot tables, macros, and data validation.
Summary
CSV files are simple and lightweight for storing plain data, while Excel sheets offer rich features for data manipulation and presentation.
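As a small, self-contained sketch of the CSV workflow in Python (the file name, columns, and values are made up for illustration):

```python
import os
import pandas as pd

# write a tiny CSV file
os.makedirs('data', exist_ok=True)
with open('data/house_tiny.csv', 'w') as f:
    f.write('NumRooms,RoofType,Price\n')
    f.write('NA,NA,127500\n')
    f.write('2,NA,106000\n')
    f.write('4,Slate,178100\n')

# read it back into a pandas DataFrame; 'NA' entries are parsed as missing values
data = pd.read_csv('data/house_tiny.csv')
print(data)
```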
Linear Algebra
Scalars
Strictly speaking, a quantity that contains just one numerical value is called a scalar.
We denote scalars by ordinary lower-cased letters.
Scalars are implemented as tensors that contain only one element.
```python
x = torch.tensor(3.0)   # a scalar: a tensor containing a single element
y = torch.tensor(2.0)
x + y, x * y, x / y, x ** y
```
Vectors
you can think of a vector as a fixed-length array of scalars.
For example, if we were training a model to predict the risk of a loan defaulting, we might associate each applicant with a vector whose components correspond to quantities like their income, length of employment, or number of previous defaults.
```python
x = torch.arange(3)   # a vector: a fixed-length array of scalars
x
```
Matrices
```python
A = torch.arange(6).reshape(3, 2)   # a 3x2 matrix
A
```
Tensors
While you can go far in your machine learning journey with only scalars, vectors, and matrices, eventually you may need to work with higher-order tensors.
Tensors give us a generic way of describing extensions to nth-order arrays.
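For example, a third-order tensor can be built by reshaping a range of numbers (a minimal illustration):

```python
T = torch.arange(24).reshape(2, 3, 4)   # a third-order tensor with shape (2, 3, 4)
T.shape
```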
Basic Properties of Tensor Arithmetic
The elementwise product of two matrices is called their Hadamard product (denoted ⊙)
Matrix Product vs. Hadamard Product
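The distinction is easiest to see side by side; a minimal sketch with small example matrices (`M` and `N` are arbitrary choices, not from the text above):

```python
import torch

M = torch.arange(4.0).reshape(2, 2)   # [[0., 1.], [2., 3.]]
N = torch.ones(2, 2)

M * N    # Hadamard product: elementwise, result[i, j] = M[i, j] * N[i, j]
M @ N    # matrix product: result[i, j] = sum over k of M[i, k] * N[k, j]
```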
Reduction (sum)
```python
import numpy as np

A = np.arange(20, dtype=np.float32).reshape(5, 4)   # A[0] = [0., 1., 2., 3.]
A_sum_axis0 = A.sum(axis=0)   # axis 0 runs down the rows, so this leaves one sum per column
A_sum_axis0, A_sum_axis0.shape
```
```python
A_sum_axis1 = A.sum(axis=1)   # axis 1 runs across the columns, so this leaves one sum per row
A_sum_axis1, A_sum_axis1.shape
```
Non-Reduction Sum
```python
sum_A = A.sum(axis=1, keepdims=True)   # keeps axis 1 as a dimension of size 1
sum_A, sum_A.shape
```
When `keepdims=True` is used with NumPy's `sum()` function along a specified axis, the summed axis is kept in the result as a dimension of size 1 instead of being dropped, so the output has the same number of dimensions as the input. Here's how it works:
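A minimal illustration of the shape difference, assuming the matrix `A` defined above:

```python
print(A.sum(axis=1).shape)                  # (5,)   -> axis 1 is dropped
print(A.sum(axis=1, keepdims=True).shape)   # (5, 1) -> axis 1 is kept with size 1

# keeping the axis lets the result broadcast against A,
# e.g. to normalize each row so that it sums to 1:
print(A / A.sum(axis=1, keepdims=True))
```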
Cumulative Sum
```python
A.cumsum(axis=0)   # running sum down axis 0; each row accumulates the rows above it
```
Dot Product
```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # example vectors
y = np.array([4.0, 5.0, 6.0])
np.dot(x, y)                    # 1*4 + 2*5 + 3*6 = 32.0
```
Norms
Informally, the norm of a vector tells us how big it is.
```python
import numpy as np

x = np.array([3, -4, 5])
l1_norm = np.linalg.norm(x, ord=1)   # |3| + |-4| + |5| = 12
l2_norm = np.linalg.norm(x)          # sqrt(3**2 + (-4)**2 + 5**2) = sqrt(50)
print("Vector x:", x)
print("L1 norm:", l1_norm)
print("L2 norm:", l2_norm)
```

Output:

```
Vector x: [ 3 -4  5]
L1 norm: 12.0
L2 norm: 7.0710678118654755
```
L1 Norm (Manhattan Norm): Computes the sum of the absolute values of the vector elements. It measures the distance a taxi would travel in a city grid system.
L2 Norm (Euclidean Norm): Computes the square root of the sum of the squares of the vector elements. It measures the straight-line distance between two points in Euclidean space.
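In symbols, for a vector $\mathbf{x} = (x_1, \dots, x_n)$:

$$\|\mathbf{x}\|_1 = \sum_{i=1}^{n} |x_i|, \qquad \|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$$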
Automatic Differentiation
Create tensor `x`:
```python
x = torch.arange(4.0)
x.requires_grad_(True)   # track operations on x so that gradients can be computed
x
```
Output:
```
tensor([0., 1., 2., 3.], requires_grad=True)
```
`x` is a tensor with values `[0., 1., 2., 3.]`.
Dot product of `x` with itself:
```python
torch.dot(x, x)   # 0*0 + 1*1 + 2*2 + 3*3 = 14
```
The dot product of a vector with itself is calculated as follows:
torch.dot(x, x) = x[0]·x[0] + x[1]·x[1] + x[2]·x[2] + x[3]·x[3]
Plugging in the values from `x`:
torch.dot(x, x) = 0·0 + 1·1 + 2·2 + 3·3 = 0 + 1 + 4 + 9 = 14
Multiply by 2:
```python
y = 2 * torch.dot(x, x)
```
We then multiply the result of the dot product by 2:
y = 2 · 14 = 28
Result:
```python
print(y)
```
Output:
```
tensor(28., grad_fn=<MulBackward0>)
```
We can now take the gradient of `y` with respect to `x` by calling its `backward` method. Next, we can access the gradient via `x`'s `grad` attribute.
```python
y.backward()   # compute the gradient of y with respect to x
x.grad         # dy/dx = 4x for y = 2 * (x dot x)
```
```
tensor([ 0.,  4.,  8., 12.])
```
Now let's calculate another function of `x` and take its gradient. Note that PyTorch does not automatically reset the gradient buffer when we record a new gradient. Instead, the new gradient is added to the already-stored gradient. This behavior comes in handy when we want to optimize the sum of multiple objective functions. To reset the gradient buffer, we can call `x.grad.zero_()` as follows:
```python
x.grad.zero_()   # Reset the gradient
y = x.sum()
y.backward()
x.grad
```
```
tensor([1., 1., 1., 1.])
```
`y.backward()`: This function call computes gradients using the chain rule of calculus. It calculates ∂y/∂xi for each element xi in `x` and stores these gradients in `x.grad`.
`x.grad`: After calling `y.backward()`, `x.grad` will be `[1.0, 1.0, 1.0, 1.0]`, because ∂y/∂xi = 1 for each element of `x` when `y = sum(x)`.
```python
x.grad.zero_()
y = x * x
y.sum().backward()   # equivalent to y.backward(gradient=torch.ones(len(y)))
x.grad
```
```
tensor([0., 2., 4., 6.])
```
Detaching Computation
https://www.geeksforgeeks.org/tensor-detach-method-in-python-pytorch/
```python
x.grad.zero_()
y = x * x
u = y.detach()       # u has the same values as y but is cut out of the graph
z = u * x

z.sum().backward()   # gradient of z = u * x, with u treated as a constant
x.grad == u
```
```
tensor([True, True, True, True])
```
Therefore, although `u = y.detach()` detaches from `y` so that `u` carries no gradient information, `u` is still a tensor. It shares the same data as `y` but is no longer attached to the computational graph, so no gradient is propagated through `u` during backpropagation.
In short, `u` is a copy of `y`'s values that is detached from the computational graph, so it is not tracked for gradients.
Suppose `x` initially holds the values `[1.0, 2.0, 3.0]`. Then:
- `y = [1.0^2, 2.0^2, 3.0^2] = [1.0, 4.0, 9.0]`
- `u = [1.0, 4.0, 9.0]` (because `u` is a copy of `y`)
- `z = u * x = [1.0 * 1.0, 4.0 * 2.0, 9.0 * 3.0] = [1.0, 8.0, 27.0]`
```python
x.grad.zero_()
y.sum().backward()   # now differentiate y = x * x itself, not the detached u
x.grad == 2 * x
```
```
tensor([True, True, True, True])
```
```python
x.grad.zero_()       # clear any previously accumulated gradient
y = x * x
u = y.detach()       # u is treated as a constant from here on
z = u * x
z.sum().backward()
x.grad               # equals u, i.e. the (constant) values of x * x
```
`u` is a tensor obtained from `y` via the `detach()` method. This means `u` holds the same values as `y`, but it is no longer associated with the computational graph and no longer tracks gradients, so `u` can be treated as a constant tensor.
- When `z.sum().backward()` is executed, PyTorch computes the gradient of `z.sum()` with respect to `x`. Because `z = u * x`, the gradient of `z` with respect to `x` is `u`.
- The gradient of `z.sum()` with respect to `x[i]` is `u[i]`, because `u` is treated as a constant in the computational graph, not as a variable.
Therefore `x.grad` ends up holding the values of `u`; for the hypothetical `x = [1.0, 2.0, 3.0]` above, that is `[1.0, 4.0, 9.0]`.
Dynamic Control Flow
Computing gradients through control flow
```python
def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c
```
Let us compute the gradient.
```python
a = torch.randn(size=(), requires_grad=True)
d = f(a)
d.backward()
```
We can now analyze the `f` function defined above. Note that it is piecewise linear in its input `a`. In other words, for any `a` there exists some constant scalar `k` such that `f(a) = k * a`, where the value of `k` depends on the input `a`. We can therefore verify that the gradient is correct by checking it against `d / a`.
```python
a.grad == d / a   # f(a) = k * a for some constant k, so the gradient equals k = d / a
```
Dynamic control flow in the context of deep learning frameworks like PyTorch means that the computation and flow of operations can depend on the input data or conditions encountered during runtime. Unlike static control flow where the computational graph is predetermined and fixed before execution, dynamic control flow allows for flexibility in how operations are executed based on the actual data being processed.
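Because the graph is built on the fly, calling `f` with different inputs can trace different numbers of loop iterations and different branches. A minimal sketch, reusing `f` from above (the input values are arbitrary examples):

```python
for value in (0.5, -0.5):
    a = torch.tensor(value, requires_grad=True)
    d = f(a)                      # the loop count and the chosen branch depend on a
    d.backward()
    print(value, a.grad, d / a)   # a.grad equals the effective slope k = d / a
```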