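Version control for Jupyter notebooks on GitHub

Jupyter Notebook works well for data research, but version control is a pain point. I eventually settled on a practice that works well.

Each time an .ipynb file is saved, automatically convert it to a .py file, and commit only the .py file to GitHub.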
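To make the idea concrete, here is a minimal sketch of what such a conversion boils down to, using only the standard library. It is not how nbconvert is implemented. The function name `notebook_to_script` and the sample notebook are made up for illustration, and the sketch assumes the nbformat 4 JSON layout, where each cell carries a `cell_type` and a `source`.

```python
import json


def notebook_to_script(nb_json: str) -> str:
    """Return the code cells of a notebook (nbformat 4 JSON) as plain Python text."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        chunks.append("# In[ ]:\n\n" + source.rstrip("\n") + "\n")
    return "\n\n".join(chunks)


# Hypothetical two-cell notebook used only for this illustration
sample = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 3,
         "outputs": [], "source": ["x = 1\n", "print(x)"]},
    ],
})

script = notebook_to_script(sample)
print(script)
```

The resulting text carries no cell outputs and no execution counts, which is what keeps the diffs in Git small and readable.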

Generate the Jupyter Notebook configuration file

jupyter notebook --generate-config

Running this command creates the file ~/.jupyter/jupyter_notebook_config.py.
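If the file does not show up where expected, the following sketch computes the likely location. It assumes the default config directory `~/.jupyter` and that the `JUPYTER_CONFIG_DIR` environment variable, when set, overrides it. The helper name `notebook_config_path` is hypothetical.

```python
import os
from pathlib import Path


def notebook_config_path(environ=None) -> Path:
    """Return the expected location of jupyter_notebook_config.py."""
    environ = os.environ if environ is None else environ
    # JUPYTER_CONFIG_DIR, when set, overrides the default ~/.jupyter directory
    config_dir = environ.get("JUPYTER_CONFIG_DIR")
    base = Path(config_dir) if config_dir else Path.home() / ".jupyter"
    return base / "jupyter_notebook_config.py"


path = notebook_config_path()
print(path, "exists" if path.exists() else "not found")
```

Running `jupyter --config-dir` on the command line should report the same directory.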

Edit the configuration file

Add the following:

### If you want to auto-save .html and .py versions of your notebook:
# modified from: https://github.com/ipython/ipython/issues/8009
# Solution2: https://jupyter-notebook.readthedocs.io/en/stable/extending/savehooks.html
import os
from subprocess import check_call
import re

def clear_prompt(dir_path, nb_fname, log_func):
    """remove the number in '# In[ ]:'"""
    name, ext = os.path.splitext(nb_fname)
    pattern = re.compile(r'^# In\[\d+\]:')

    # nbconvert writes .py for Python notebooks and .txt as a fallback
    for n_ext in ['.py', '.txt']:
        script_name = os.path.join(dir_path, name+n_ext)
        if os.path.exists(script_name):
            new_lines = []
            with open(script_name, 'rt', encoding='utf-8') as f:
                lines = f.readlines()
                for line in lines:
                    new_line = re.sub(pattern, '# In[ ]:', line)
                    new_lines.append(new_line)
            with open(script_name, 'wt', encoding='utf-8') as f:
                f.writelines(new_lines)
            log_func('Remove number in "# In[ ]:"! File Name: %s' % script_name)
            break

def post_save(model, os_path, contents_manager):
    """post-save hook for converting notebooks to .py scripts"""
    if model['type'] != 'notebook':
        return # only do this for notebooks
    d, fname = os.path.split(os_path)
    check_call(['jupyter', 'nbconvert', '--to', 'script', fname], cwd=d) # '--no-prompt',
    log = contents_manager.log
    # log.info('Filename:%s'%fname)
    clear_prompt(d, fname, log.info)
    # check_call(['ipython', 'nbconvert', '--to', 'html', fname], cwd=d)

c.FileContentsManager.post_save_hook = post_save
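The prompt-stripping step is presumably there because execution counts change on every run and would otherwise show up as noise in each diff. As a small standalone sketch, the same regular expression can be tried on a few sample lines. The helper name `strip_prompt_numbers` and the sample lines are hypothetical.

```python
import re

# Same pattern as in clear_prompt: a numbered prompt at the start of a line
PROMPT = re.compile(r'^# In\[\d+\]:')


def strip_prompt_numbers(lines):
    """Replace numbered prompts such as '# In[12]:' with '# In[ ]:'."""
    return [PROMPT.sub('# In[ ]:', line) for line in lines]


before = ['# In[12]:\n', 'x = 1\n', '# In[ ]:\n', 'y = "# In[3]:"\n']
after = strip_prompt_numbers(before)
print(after)
```

Because the pattern is anchored with `^`, only prompt comments at the start of a line are rewritten; the string literal in the last sample line is left alone.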

Restart Jupyter Notebook for the configuration to take effect.
From then on, saving an .ipynb file automatically generates the matching .py file.

Configure the .gitignore file

*.ipynb

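After adding this rule, you may find that it has no effect. This happens when the notebooks are already tracked, because .gitignore only applies to untracked files. To clear the index, run the following command in the project root, then re-add the files and commit: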

git rm -r --cached .
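To check whether any notebooks are still tracked, a rough sketch along these lines may help. It assumes `git` is on the PATH and approximates the slash-free pattern `*.ipynb` by matching on the file name only. The helper names `matching_ignored` and `tracked_files` are hypothetical.

```python
import fnmatch
import posixpath
import subprocess


def matching_ignored(paths, pattern="*.ipynb"):
    """Return the paths whose file name matches a slash-free ignore pattern."""
    return [p for p in paths if fnmatch.fnmatch(posixpath.basename(p), pattern)]


def tracked_files(repo_dir="."):
    """List files currently in the Git index (requires git on PATH)."""
    out = subprocess.run(["git", "ls-files"], cwd=repo_dir,
                         capture_output=True, text=True, check=True).stdout
    return out.splitlines()


# Hypothetical index contents used only for this illustration
example = ["analysis.ipynb", "analysis.py", "notebooks/eda.ipynb", "README.md"]
print(matching_ignored(example))
```

Inside a repository, `matching_ignored(tracked_files())` returning a non-empty list suggests that some notebooks are still in the index and the cleanup has not been committed yet.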