Creating a DataFrame in Python

Posted on 2020-04-30, updated on 2025-08-23. Categories: Python, Pandas

Several ways to create a DataFrame.

class pandas.DataFrame(data=None, index: Optional[Collection] = None, columns: Optional[Collection] = None, dtype: Union[str, numpy.dtype, ExtensionDtype, None] = None, copy: bool = False)

The data parameter can be an ndarray (structured or homogeneous), an Iterable, a dict, or another DataFrame.
A dict can contain Series, arrays, constants, or list-like objects.
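As a minimal sketch of the dict case, the example below mixes a list, a NumPy array, a Series, and a scalar constant in one dict. The column names are made up for illustration. The constant is broadcast to every row.

```python
import numpy as np
import pandas as pd

# Each value is a different kind of input accepted inside a dict.
mixed = pd.DataFrame({
    "from_list": [1, 2],
    "from_array": np.array([3, 4]),
    "from_series": pd.Series([5, 6]),
    "from_constant": 7,  # scalar, repeated for every row
})
print(mixed)
```

Note that a dict made only of scalars has no length to infer, so in that case an explicit index has to be passed as well.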

Creating a DataFrame from a dict of arrays/lists

import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(d)
df

	col1	col2
0	1	3
1	2	4

Check the dtypes of the columns in the DataFrame just created.

df.dtypes

col1    int64
col2    int64
dtype: object

To use a different type, pass the dtype parameter:

import numpy as np
df2 = pd.DataFrame(d, dtype=np.int8)
df2.dtypes

col1    int8
col2    int8
dtype: object
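The dtype argument of the constructor applies a single type to all columns. If only some columns should change, one option is DataFrame.astype with a column-to-dtype mapping. A minimal sketch, reusing the same d dict (the name df_mixed is just for illustration):

```python
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(d)

# Convert only col1; col2 keeps the dtype pandas inferred for it.
# astype returns a new DataFrame and leaves df unchanged.
df_mixed = df.astype({'col1': 'int8'})
print(df_mixed.dtypes)
```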

Creating a DataFrame from a dict of Series

df3 = pd.DataFrame({'col1': pd.Series([1, 2]), 'col2': pd.Series([3, 4])})
df3

	col1	col2
0	1	3
1	2	4
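When the Series carry different indexes, the constructor aligns them on the union of the index labels and fills the gaps with NaN. A small sketch with made-up labels:

```python
import pandas as pd

s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([3, 4], index=['b', 'c'])

# Rows are the union of both indexes; labels missing from a Series become NaN.
aligned = pd.DataFrame({'col1': s1, 'col2': s2})
print(aligned)
```

Because NaN is a float, both columns end up with a float dtype here even though the inputs were integers.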

Creating a DataFrame from a list of dicts

df4 = pd.DataFrame([{'col1': 1, 'col2': 3}, {'col1': 2, 'col2': 4}])
df4

	col1	col2
0	1	3
1	2	4
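Each dict in the list becomes one row, and the dicts do not need to share the same keys. A short sketch of what happens when a key is missing (the records are made up for illustration):

```python
import pandas as pd

# The second record has no 'col2' key.
records = [{'col1': 1, 'col2': 3}, {'col1': 2}]
df_missing = pd.DataFrame(records)
print(df_missing)
```

The missing cell is filled with NaN rather than raising an error.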

Creating a DataFrame from a dict of dicts

The keys of the outer dict become the columns, and the keys of the inner dicts become the index.

df4 = pd.DataFrame({'col1': {'idx1': 1, 'idx2': 2}, 'col2': {'idx1': 3, 'idx2': 4}})
df4

	col1	col2
idx1	1	3
idx2	2	4
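If the nested dict is organized the other way round, with the outer keys naming rows, DataFrame.from_dict with orient='index' avoids a manual transpose. A minimal sketch (the variable names are illustrative):

```python
import pandas as pd

# Outer keys are row labels here, inner keys are column names.
rows = {'idx1': {'col1': 1, 'col2': 3}, 'idx2': {'col1': 2, 'col2': 4}}
df_rows = pd.DataFrame.from_dict(rows, orient='index')
print(df_rows)
```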

Creating a DataFrame from a two-dimensional array

df5 = pd.DataFrame([[1, 3], [2, 4]])
df5

	0	1
0	1	3
1	2	4

Of course, you can also specify a custom index and columns:

df6 = pd.DataFrame([[1, 3], [2, 4]], index=['a', 'b'], columns=['c1', 'c2'])
df6

	c1	c2
a	1	3
b	2	4
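The same call works with a NumPy ndarray instead of a nested list, and the custom labels can then be used for lookups. A brief sketch (the names arr and df_arr are illustrative):

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 3], [2, 4]])
df_arr = pd.DataFrame(arr, index=['a', 'b'], columns=['c1', 'c2'])

# Label-based access: row 'b', column 'c1'.
value = df_arr.loc['b', 'c1']
# Select a whole column by its label.
column = df_arr['c2'].tolist()
print(value, column)
```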

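Generating a GitHub SSH public key on Windows 10

Posted on 2020-04-29, updated on 2025-08-23. Categories: Tools, Git

On both Windows and Linux, to use Git over an SSH remote without entering a username and password, you only need to copy the id_rsa.pub public key into GitHub.

Install Git

Open Git Bash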

$ ssh-keygen -t rsa -C "simon@finolo.gy"

Generating public/private rsa key pair.
Enter file in which to save the key (/c/Users/simon/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /c/Users/simon/.ssh/id_rsa.
Your public key has been saved in /c/Users/simon/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:epRQbzd91E6aLp3aM+ym2FEZMgLNCkA108nb5lsk7CM simon@finolo.gy
The key's randomart image is:
+---[RSA 3072]----+
|   .oo+oo+      o|
|      .++oo  . .o|
|      .. =+ = o=.|
|       .oo=o.+o+.|
|        S+ o oo. |
|       oE + o.+  |
|      . .. +.=   |
|       .  .o..*  |
|          . o+.o |
+----[SHA256]-----+

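The email address passed to -C is only a comment attached to the key, so any value works; it does not need to be your GitHub.com login account.

Copy the contents of id_rsa.pub to GitHub

Settings -> SSH and GPG keys -> New SSH key

Paste the contents of the /c/Users/simon/.ssh/id_rsa.pub file generated above, and that's it.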

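Rounding float data in a DataFrame with Python Pandas

Posted on 2020-04-28, updated on 2025-08-23. Categories: Python

The raw data is shown in the figure.

We want to round the float data and convert the unit to 亿元 (hundreds of millions of yuan, i.e. divide by 100,000,000).

Method 1

Use a lambda expression.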

cap_list = industry_df['MKT_CAP_ARD'].apply(lambda x: round(x / 100000000, 0)).astype(int)
cap_list

Method 2

cap_list = round(industry_df['MKT_CAP_ARD'] / 100000000, 0).astype(int)
cap_list

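Finally, astype is needed to convert the floats to int. Otherwise the values remain floats and are still displayed with one decimal place, even when that digit is 0.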
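Since industry_df is not defined in the post, the sketch below uses a small made-up stand-in with a MKT_CAP_ARD column (values in yuan) to show that both methods give the same result. The sample numbers and the names UNIT, method1, method2, and as_float are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for industry_df; market caps in yuan.
industry_df = pd.DataFrame(
    {'MKT_CAP_ARD': [123456789012.0, 98765432100.0, 4321000000.0]}
)

UNIT = 100000000  # one 亿 (hundred million)

# Method 1: element-wise rounding with a lambda.
method1 = industry_df['MKT_CAP_ARD'].apply(lambda x: round(x / UNIT, 0)).astype(int)

# Method 2: vectorized rounding on the whole Series.
method2 = round(industry_df['MKT_CAP_ARD'] / UNIT, 0).astype(int)

# Without astype(int) the result stays a float column.
as_float = (industry_df['MKT_CAP_ARD'] / UNIT).round(0)

print(method1.tolist())
print(method2.tolist())
print(as_float.dtype)
```

Method 2 operates on the whole column at once, so it is usually the better fit for large frames, while Method 1 is handy when the per-element logic is more involved.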