- Article info
- Author: kaiwu
Questionnaire
http://kaiwu.city/openfiles/tourist_satisfaction_questionnaire_cn.pdf
http://kaiwu.city/openfiles/tourist.csv
http://kaiwu.city/openfiles/tourist_CN_python1.ipynb
1. Python basics
Introduction to Python
http://kaiwu.city/index.php/python
Recommended Python books
http://kaiwu.city/index.php/python-book
2. Installing Python
http://kaiwu.city/index.php/python-vscode
3. Controlling Excel with Python
http://kaiwu.city/openfiles/hotel50python.xlsx
http://kaiwu.city/openfiles/python_excel50hotel.ipynb
4. A simple analysis of Douban movies
https://od.lk/d/165592124_vkZkQ/webscraping_douban_top150.ipynb
https://od.lk/d/165592122_b5z4S/analysis_douban_top150.ipynb
http://kaiwu.city/openfiles/analysis_douban_top150.ipynb
1. Preparation
1.1 Import libraries
```python
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import matplotlib
# Preset the font (SimSun can display Chinese characters) and pass it to matplotlib.rc
font = {'family': 'SimSun', "size": 16}
# Apply the font
matplotlib.rc('font', **font)
# Folder where the data files are stored
datafolder='D:/tdata/'
# Load the data
# Option 1: load from a cloud-drive direct link
#df = pd.read_csv('https://od.lk/d/179075968_VxeGV/movie150clean.csv', index_col=0)
# Option 2: load from the local disk
df = pd.read_csv(datafolder+'movie150clean.csv', index_col=0)
df.tail(5)
```
n | movie_weblink | photo_weblink | cn_name | fr_name | rating | numbers | movie_sentence | directors | actors | ryear | country | theme |
---|---|---|---|---|---|---|---|---|---|---|---|---|
146 | https://movie.douban.com/subject/1307315/ | https://img9.doubanio.com/view/photo/s_ratio_p... | 哪吒闹海 | NaN | 9.1 | 208858 | 想你时你在闹海。 | 严定宪 Dingxian Yan / 王树忱 Shuchen Wang | 梁正晖 Zhenghui | 1979 | 中国大陆 | 冒险 动画 奇幻 |
147 | https://movie.douban.com/subject/26628357/ | https://img9.doubanio.com/view/photo/s_ratio_p... | 一个叫欧维的男人决定去死 | En man som heter Ove | 8.9 | 346098 | 惠及一生的美丽。 | 汉内斯·赫尔姆 Hannes Holm | 罗夫·拉斯加德 Rolf Lassgård | 2015 | 瑞典 | 剧情 |
148 | https://movie.douban.com/subject/6307447/ | https://img9.doubanio.com/view/photo/s_ratio_p... | 被解救的姜戈 | Django Unchained | 8.8 | 499185 | 热血沸腾,那个低俗、性感的无耻混蛋又来了。 | 昆汀·塔伦蒂诺 Quentin Tarantino | 杰米·福克斯 Jamie Foxx | 2012 | 美国 | 剧情 动作 西部 冒险 |
149 | https://movie.douban.com/subject/1295399/ | https://img9.doubanio.com/view/photo/s_ratio_p... | 七武士 | 七人の侍 | 9.3 | 158687 | 时代悲歌。 | 黑泽明 Akira Kurosawa | 三船敏郎 Toshirô Mifune | 1954 | 日本 | 动作 冒险 剧情 |
150 | https://movie.douban.com/subject/1395091/ | https://img9.doubanio.com/view/photo/s_ratio_p... | 未麻的部屋 | Perfect Blue | 9.0 | 242480 | 好的剧本是,就算你猜到了结局也猜不到全部。 | 今敏 Satoshi Kon | 岩男润子 Junko Iwao / 松本梨香 Rica Matsu | 1997 | 日本 | 动画 奇幻 惊悚 |
2. Frequency analysis
2.1 Frequency analysis: release year
```python
# Convert the ryear column of df to integers (int)
df['ryear'] = df['ryear'].astype(int)
df['ryear'].dtype
```
dtype('int32')
df['ryear']: frequency of release years, shown as a bar chart
```python
df["ryear"].value_counts().sort_index().plot(kind="bar")
```
df['ryear']: the same bar chart with the figure size set to (15,7), i.e. 15 inches wide and 7 inches tall, and the bar color set to green. For a list of colors see https://html-color.codes/
```python
df["ryear"].value_counts().sort_index().plot(kind="bar",figsize=(15,7),color="green")
```
df['ryear']: the same bar chart with each bar labelled with its count
```python
plot=df["ryear"].value_counts().sort_index().plot(kind="bar",figsize=(15,7),color="orange")
# write each bar's height (its count) just above the bar
for p in plot.patches:
    plot.annotate(str(p.get_height()), (p.get_x() * 1.005, p.get_height() * 1.005))
```
```python
# Frequency table of release years with columns year and freq
# (rename_axis + reset_index(name=...) gives the same columns in pandas 1.x and 2.x)
df_year = df["ryear"].value_counts().rename_axis('year').reset_index(name='freq')
df_year['year'] = df_year['year'].astype(int)
print(df_year.columns)
df_year.sort_values(by='year')
```
Index(['year', 'freq'], dtype='object')
index | year | freq |
---|---|---|
33 | 1936 | 1 |
28 | 1939 | 1 |
41 | 1953 | 1 |
44 | 1954 | 1 |
24 | 1957 | 2 |
40 | 1961 | 1 |
36 | 1965 | 1 |
35 | 1972 | 1 |
39 | 1974 | 1 |
37 | 1975 | 1 |
30 | 1979 | 1 |
31 | 1983 | 1 |
29 | 1984 | 1 |
43 | 1986 | 1 |
23 | 1987 | 2 |
27 | 1988 | 2 |
38 | 1989 | 1 |
26 | 1990 | 2 |
34 | 1991 | 1 |
42 | 1992 | 1 |
13 | 1993 | 5 |
0 | 1994 | 9 |
3 | 1995 | 7 |
32 | 1996 | 1 |
4 | 1997 | 7 |
17 | 1998 | 4 |
9 | 1999 | 5 |
20 | 2000 | 3 |
5 | 2001 | 6 |
11 | 2002 | 5 |
16 | 2003 | 5 |
2 | 2004 | 8 |
25 | 2005 | 2 |
10 | 2006 | 5 |
14 | 2008 | 5 |
6 | 2009 | 6 |
1 | 2010 | 9 |
7 | 2011 | 6 |
18 | 2012 | 4 |
8 | 2013 | 5 |
15 | 2014 | 5 |
19 | 2015 | 3 |
12 | 2016 | 5 |
22 | 2017 | 3 |
21 | 2018 | 3 |
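df['ryear']: frequency of release years as a bar chart, with the figure size set to (15,7) (15 inches wide, 7 inches tall) and the bar color set to orange. For a list of colors see https://html-color.codes/
Save the figure to the local disk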
```python
xs =df_year["year"]
ys =df_year["freq"]
plt.figure(figsize=(15,7))
plt.bar(xs, ys, color='orange')
# html color codes https://html-color.codes/
for x,y in zip(xs,ys):
    label = "{:.0f}".format(y)
    plt.annotate(label, # this is the text
                 (x,y), # these are the coordinates to position the label
                 textcoords="offset points", # how to position the text
                 xytext=(0,2), # offset of the text from the point (x,y), in points
                 ha='center') # horizontal alignment can be left, right or center
plt.savefig(datafolder+"movie_yearly.jpg",dpi=600) # save the figure to the local disk
plt.show()
```
2.2 Frequency analysis: production country or region
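```python
# country is a string listing one or more countries/regions separated by spaces
# (a multiple-response variable); split it into one column per country
df_country=df['country'].str.split(' ',expand=True)
df_country.head(5)
```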
n | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
1 | 美国 | None | None | None | None | None |
2 | 中国大陆 | 中国香港 | None | None | None | None |
3 | 美国 | None | None | None | None | None |
4 | 法国 | 美国 | None | None | None | None |
5 | 美国 | 墨西哥 | 澳大利亚 | 加拿大 | None | None |
Frequency count of the multiple-response variable
```python
# stack() turns the country columns into one long Series and value_counts() counts
# each country (the empty None cells are not counted)
df_country1 = df_country.stack().value_counts().rename_axis('country').reset_index(name='freq')
print(df_country1.columns)
df_country1.sort_values(by='freq',ascending=False)
```
Index(['country', 'freq'], dtype='object')
index | country | freq |
---|---|---|
0 | 美国 | 84 |
1 | 日本 | 22 |
2 | 英国 | 19 |
3 | 中国香港 | 15 |
4 | 中国大陆 | 14 |
5 | 法国 | 12 |
6 | 德国 | 9 |
7 | 韩国 | 9 |
8 | 意大利 | 8 |
9 | 加拿大 | 6 |
12 | 新西兰 | 3 |
13 | 中国台湾 | 3 |
11 | 澳大利亚 | 3 |
10 | 瑞士 | 3 |
14 | 印度 | 2 |
15 | 瑞典 | 2 |
16 | 伊朗 | 1 |
17 | 荷兰 | 1 |
18 | 巴西 | 1 |
19 | 丹麦 | 1 |
20 | 卡塔尔 | 1 |
21 | 西班牙 | 1 |
22 | 波兰 | 1 |
23 | 塞浦路斯 | 1 |
24 | 黎巴嫩 | 1 |
25 | 墨西哥 | 1 |
26 | 奥地利 | 1 |
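The `stack()` + `value_counts()` step above counts every country that a movie lists. An equivalent formulation, sketched here on the assumption that the column is space-separated as in this dataset, uses `Series.str.split` followed by `Series.explode`, which skips building the wide `df_country` table. The helper name `multi_response_counts` is hypothetical, and the three sample rows are invented for illustration. On the real data you would call `multi_response_counts(df['country'])` or `multi_response_counts(df['theme'])`.

```python
import pandas as pd

def multi_response_counts(s: pd.Series, sep: str = " ") -> pd.DataFrame:
    """Count how often each option appears in a separator-delimited multiple-response column."""
    return (
        s.str.split(sep)              # "法国 美国" -> ["法国", "美国"]
         .explode()                   # one row per listed option
         .value_counts()              # frequency of each option
         .rename_axis("country")
         .reset_index(name="freq")
    )

# Hypothetical sample rows, not taken from movie150clean.csv
sample = pd.Series(["美国", "中国大陆 中国香港", "法国 美国"])
print(multi_response_counts(sample))
```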
```python
# Write the frequency table to a CSV file
df_country1.to_csv(datafolder+'movie_country.csv')
```
```python
xs =df_country1["country"]
ys = df_country1["freq"]
width1 =0.4
plt.figure(figsize=(20,7))
plt.bar(xs, ys, width=width1,color='#ffb01f')
# html color codes https://html-color.codes/
plt.xticks(rotation=45)
plt.show()
```
```python
xs =df_country1["country"]
ys = df_country1["freq"]
width1 =0.4
plt.figure(figsize=(20,7))
plt.bar(xs, ys, width=width1,color='#ffb01f')
# html color codes https://html-color.codes/
plt.xticks(rotation=45)
for x,y in zip(xs,ys):
    label = "{:.0f}".format(y)
    plt.annotate(label, # this is the text
                 (x,y), # these are the coordinates to position the label
                 textcoords="offset points", # how to position the text
                 xytext=(0,2), # offset of the text from the point (x,y), in points
                 ha='center') # horizontal alignment can be left, right or center
plt.savefig(datafolder+"movie_region1.png",dpi=600,format="png")
plt.show()
```
```python
# horizontal bar chart: countries on the y-axis, counts on the x-axis
plt.figure(figsize=(12,10))
plt.barh(xs,ys,color='#b85cff')
for x,y in zip(xs,ys):
    label = "{:.0f}".format(y)
    plt.annotate(label, # this is the text
                 (y,x), # the bar end: count on the x-axis, country on the y-axis
                 textcoords="offset points", # how to position the text
                 xytext=(10,-5), # offset of the text from the bar end, in points
                 ha='center') # horizontal alignment can be left, right or center
plt.savefig(datafolder+"movie_region2.jpg",dpi=600) # save the figure to the local disk
plt.show()
```
2.3 Frequency analysis: genre
```python
# theme is also a space-separated multiple-response variable; one column per genre
df_theme=df['theme'].str.split(' ',expand=True)
df_theme.head(5)
```
n | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
1 | 犯罪 | 剧情 | None | None | None |
2 | 剧情 | 爱情 | 同性 | None | None |
3 | 剧情 | 爱情 | None | None | None |
4 | 剧情 | 动作 | 犯罪 | None | None |
5 | 剧情 | 爱情 | 灾难 | None | None |
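```python
# Count each genre; the genre names become the index and the counts the column freq
df_theme1 = df_theme.stack().value_counts().to_frame(name='freq')
# Keep a copy of the genre names as an ordinary column
df_theme1['theme'] = df_theme1.index
print(df_theme1.columns)
df_theme1
```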
Index(['freq', 'theme'], dtype='object')
index | freq | theme |
---|---|---|
剧情 | 114 | 剧情 |
喜剧 | 34 | 喜剧 |
爱情 | 34 | 爱情 |
奇幻 | 33 | 奇幻 |
冒险 | 32 | 冒险 |
动画 | 24 | 动画 |
犯罪 | 21 | 犯罪 |
动作 | 18 | 动作 |
惊悚 | 17 | 惊悚 |
悬疑 | 16 | 悬疑 |
科幻 | 11 | 科幻 |
传记 | 10 | 传记 |
家庭 | 10 | 家庭 |
战争 | 9 | 战争 |
历史 | 6 | 历史 |
音乐 | 5 | 音乐 |
古装 | 5 | 古装 |
歌舞 | 4 | 歌舞 |
同性 | 4 | 同性 |
灾难 | 2 | 灾难 |
西部 | 2 | 西部 |
儿童 | 2 | 儿童 |
纪录片 | 2 | 纪录片 |
武侠 | 2 | 武侠 |
运动 | 1 | 运动 |
情色 | 1 | 情色 |
```python
type(df_theme1)
```
pandas.core.frame.DataFrame
```python
# Save the frequency table
df_theme1.to_csv(datafolder+'df_theme1.csv')
```
```python
xs =df_theme1["theme"]
ys = df_theme1["freq"]
plt.figure(figsize=(18,7))
plt.bar(xs, ys, color='#ffb01f')
# html color codes https://html-color.codes/
#plt.xticks(rotation=45)
for x,y in zip(xs,ys):
    label = "{:.0f}".format(y)
    plt.annotate(label, # this is the text
                 (x,y), # these are the coordinates to position the label
                 textcoords="offset points", # how to position the text
                 xytext=(0,3), # offset of the text from the point (x,y), in points
                 ha='center') # horizontal alignment can be left, right or center
plt.savefig(datafolder+"movie_theme1.jpg",dpi=600,format="jpg") # save the figure to the local disk
plt.show()
```
```python
plt.figure(figsize=(12,10))
plt.barh(xs, ys,color='#b85cff')
for x,y in zip(xs,ys):
    label = "{:.0f}".format(y)
    plt.annotate(label, # this is the text
                 (y,x), # the bar end: count on the x-axis, genre on the y-axis
                 textcoords="offset points", # how to position the text
                 xytext=(15,-5), # offset of the text from the bar end, in points
                 ha='center') # horizontal alignment can be left, right or center
plt.savefig(datafolder+"movie_theme2.jpg",dpi=600) # save the figure to the local disk
plt.show()
```
3. Mean movie ratings
3.1 Mean rating by release year
```python
# Mean rating for each release year
agg_year=df.groupby("ryear")[['rating']].agg('mean')
agg_year['year']=agg_year.index
print(agg_year.columns)
agg_year.head(5)
```
Index(['rating', 'year'], dtype='object')
ryear | rating | year |
---|---|---|
1936 | 9.3 | 1936 |
1939 | 9.3 | 1939 |
1953 | 9.0 | 1953 |
1954 | 9.3 | 1954 |
1957 | 9.5 | 1957 |
```python
plt.figure(figsize=(15,7))
xs=agg_year['year']
ys=agg_year['rating']
plt.plot(xs, ys,color='red', marker='o')
plt.grid(True)
plt.ylim(8, 10)
plt.show()
```
```python
# Set the font size of all elements to 12
plt.rc('font', size=12)
plt.figure(figsize=(15,7))
xs=agg_year['year']
ys=agg_year['rating']
plt.plot(xs, ys,color='red', marker='o')
plt.grid(True)
plt.ylim(8, 10)
plt.xticks(rotation=45)
for x,y in zip(xs,ys):
    label = "{:.1f}".format(y)
    plt.annotate(label, # this is the text
                 (x,y), # these are the coordinates to position the label
                 textcoords="offset points", # how to position the text
                 xytext=(0,10), # offset of the text from the point (x,y), in points
                 ha='center', # horizontal alignment can be left, right or center
                 fontsize=12)
plt.savefig(datafolder+"movie_rating_yearly.jpg",dpi=600) # save the figure to the local disk
plt.show()
```
3.2 Mean rating by production country or region
```python
df_country1
```
index | country | freq |
---|---|---|
0 | 美国 | 84 |
1 | 日本 | 22 |
2 | 英国 | 19 |
3 | 中国香港 | 15 |
4 | 中国大陆 | 14 |
5 | 法国 | 12 |
6 | 德国 | 9 |
7 | 韩国 | 9 |
8 | 意大利 | 8 |
9 | 加拿大 | 6 |
10 | 瑞士 | 3 |
11 | 澳大利亚 | 3 |
12 | 新西兰 | 3 |
13 | 中国台湾 | 3 |
14 | 印度 | 2 |
15 | 瑞典 | 2 |
16 | 伊朗 | 1 |
17 | 荷兰 | 1 |
18 | 巴西 | 1 |
19 | 丹麦 | 1 |
20 | 卡塔尔 | 1 |
21 | 西班牙 | 1 |
22 | 波兰 | 1 |
23 | 塞浦路斯 | 1 |
24 | 黎巴嫩 | 1 |
25 | 墨西哥 | 1 |
26 | 奥地利 | 1 |
```python
df_country1['rating'] = 0.0
for i in df_country1.index:
    temp = 0.0
    for j in df.index:
        # a movie counts towards a country if that country appears in its space-separated list
        if df_country1.loc[i, 'country'] in df.loc[j, 'country'].split(' '):
            temp = temp + df.loc[j, 'rating']
    # .loc assignment avoids the SettingWithCopyWarning raised by df_country1.rating[i] = ...
    df_country1.loc[i, 'rating'] = round(temp / df_country1.loc[i, 'freq'], 3)
```
```python
df_country1
```
index | country | freq | rating |
---|---|---|---|
0 | 美国 | 84 | 9.029 |
1 | 日本 | 22 | 9.018 |
2 | 英国 | 19 | 9.037 |
3 | 中国香港 | 15 | 8.953 |
4 | 中国大陆 | 14 | 9.121 |
5 | 法国 | 12 | 9.050 |
6 | 德国 | 9 | 9.011 |
7 | 韩国 | 9 | 8.956 |
8 | 意大利 | 8 | 9.175 |
9 | 加拿大 | 6 | 9.083 |
10 | 瑞士 | 3 | 9.100 |
11 | 澳大利亚 | 3 | 9.000 |
12 | 新西兰 | 3 | 9.167 |
13 | 中国台湾 | 3 | 9.100 |
14 | 印度 | 2 | 9.100 |
15 | 瑞典 | 2 | 9.000 |
16 | 伊朗 | 1 | 9.200 |
17 | 荷兰 | 1 | 8.900 |
18 | 巴西 | 1 | 8.900 |
19 | 丹麦 | 1 | 9.100 |
20 | 卡塔尔 | 1 | 9.100 |
21 | 西班牙 | 1 | 8.800 |
22 | 波兰 | 1 | 9.200 |
23 | 塞浦路斯 | 1 | 9.100 |
24 | 黎巴嫩 | 1 | 9.100 |
25 | 墨西哥 | 1 | 9.400 |
26 | 奥地利 | 1 | 8.800 |
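The nested loop above visits every (country, movie) pair. A shorter pandas sketch of the same idea, assuming the column is space-separated and `rating` is numeric, explodes each movie into one row per listed option and then uses `groupby` to count and average. `mean_rating_by_option` is a hypothetical helper, and the three movies below are invented sample data.

```python
import pandas as pd

def mean_rating_by_option(data: pd.DataFrame, col: str, sep: str = " ") -> pd.DataFrame:
    """Count and average the 'rating' of every movie that lists each option in `col`."""
    # one row per (movie, option) pair, e.g. "法国 美国" becomes two rows
    exploded = data.assign(**{col: data[col].str.split(sep)}).explode(col)
    return (
        exploded.groupby(col)["rating"]
        .agg(freq="count", rating="mean")   # number of movies and their mean rating
        .round(3)
        .sort_values("freq", ascending=False)
    )

# Hypothetical three-movie example, not the real Douban data
movies = pd.DataFrame({
    "country": ["美国", "中国大陆 中国香港", "法国 美国"],
    "rating": [9.0, 9.2, 8.8],
})
print(mean_rating_by_option(movies, "country"))
```

With the real data, `mean_rating_by_option(df, "country")` or `mean_rating_by_option(df, "theme")` should reproduce the loop's freq and rating values (row order aside), provided no movie lists the same country or genre twice.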
```python
df_country2 = df_country1.sort_values('rating', ascending=False)
xs =df_country2["country"]
ys = df_country2["rating"]
width1 =0.4
plt.figure(figsize=(20,7))
plt.bar(xs, ys, width=width1,color='#ffb01f')
# html color codes https://html-color.codes/
plt.xticks(rotation=45)
plt.ylim(8, 10)
for x,y in zip(xs,ys):
    label = "{:.2f}".format(y)
    plt.annotate(label, # this is the text
                 (x,y), # these are the coordinates to position the label
                 textcoords="offset points", # how to position the text
                 xytext=(0,2), # offset of the text from the point (x,y), in points
                 ha='center') # horizontal alignment can be left, right or center
plt.savefig(datafolder+"movie_rating_region1.jpg",dpi=600) # save the figure to the local disk
plt.show()
```
3.3 Mean rating by genre
```python
df_theme1.shape
```
(26, 2)
```python
df_theme1['rating'] = 0.0
for i in df_theme1.index:          # the index holds the genre names
    temp = 0.0
    for j in df.index:
        # a movie counts towards a genre if the genre appears in its space-separated list
        if df_theme1.loc[i, 'theme'] in df.loc[j, 'theme'].split(' '):
            temp = temp + df.loc[j, 'rating']
    # label-based .loc avoids positional fallback and the SettingWithCopyWarning
    df_theme1.loc[i, 'rating'] = round(temp / df_theme1.loc[i, 'freq'], 3)
```
```python
df_theme1
```
index | freq | theme | rating |
---|---|---|---|
剧情 | 114 | 剧情 | 9.058 |
喜剧 | 34 | 喜剧 | 8.965 |
爱情 | 34 | 爱情 | 8.997 |
奇幻 | 33 | 奇幻 | 8.967 |
冒险 | 32 | 冒险 | 8.991 |
动画 | 24 | 动画 | 9.008 |
犯罪 | 21 | 犯罪 | 9.067 |
动作 | 18 | 动作 | 8.967 |
惊悚 | 17 | 惊悚 | 8.894 |
悬疑 | 16 | 悬疑 | 8.931 |
科幻 | 11 | 科幻 | 9.055 |
传记 | 10 | 传记 | 9.030 |
家庭 | 10 | 家庭 | 9.050 |
战争 | 9 | 战争 | 9.133 |
历史 | 6 | 历史 | 9.167 |
音乐 | 5 | 音乐 | 9.160 |
古装 | 5 | 古装 | 8.860 |
歌舞 | 4 | 歌舞 | 9.075 |
同性 | 4 | 同性 | 9.050 |
灾难 | 2 | 灾难 | 9.000 |
西部 | 2 | 西部 | 8.850 |
儿童 | 2 | 儿童 | 8.950 |
纪录片 | 2 | 纪录片 | 9.400 |
武侠 | 2 | 武侠 | 8.700 |
运动 | 1 | 运动 | 9.000 |
情色 | 1 | 情色 | 8.900 |
```python
df_theme2 = df_theme1.sort_values('rating', ascending=False)
xs =df_theme2["theme"]
ys = df_theme2["rating"]
plt.figure(figsize=(20,7))
plt.bar(xs, ys, color='#ffb01f')
plt.ylim(8, 10)
# html color codes https://html-color.codes/
#plt.xticks(rotation=45)
for x,y in zip(xs,ys):
    label = "{:.2f}".format(y)
    plt.annotate(label, # this is the text
                 (x,y), # these are the coordinates to position the label
                 textcoords="offset points", # how to position the text
                 xytext=(0,3), # offset of the text from the point (x,y), in points
                 ha='center') # horizontal alignment can be left, right or center
plt.savefig(datafolder+"movie_rating_theme.jpg",dpi=600) # save the figure to the local disk
plt.show()
```
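Python can be installed through Anaconda (http://kaiwu.city/index.php/python), which is very easy to do. However, Anaconda is a large program, and the Python version and extension libraries it bundles are not always the latest.
A good alternative is to install Python directly and use Microsoft's open-source Visual Studio Code (https://code.visualstudio.com/) as the IDE.
Microsoft provides a detailed installation guide: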
https://docs.microsoft.com/en-us/learn/modules/python-install-vscode/
Get started with Python in Visual Studio
Get started with learning Python by installing and configuring the tools you'll need to build real applications.
Learning objectives
By the end of this module, you'll be able to:
- Install Python 3, if needed.
- Install and configure Visual Studio Code and extensions on your computer.
- Create a Python file.
- Write and run Python code in Visual Studio Code.
Prerequisites
- Ability to install programs locally.
- Basic familiarity with programming concepts.
1. Install Python
Download the installer from the official Python website:
https://www.python.org/downloads/
2. Install VS Code
https://code.visualstudio.com/
3. Install the VS Code extensions
(1)code runner
(2)python
(3)jupyter notebook
4. Use VS Code as the IDE to create, edit and run Python programs
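Once the extensions are installed, you can confirm that VS Code is running the interpreter you installed by saving a short script (for example `hello.py`; the file name is arbitrary) and running it with the Run button or with Code Runner. A minimal sketch:

```python
import sys

def describe_interpreter() -> str:
    """Return a short line identifying the Python interpreter that runs this script."""
    major, minor, micro = sys.version_info[:3]
    return f"Hello from Python {major}.{minor}.{micro} at {sys.executable}"

print(describe_interpreter())
```

If the printed path points to an unexpected interpreter (for example an older Anaconda install), pick the intended one with the Python extension's "Python: Select Interpreter" command.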
1. Ollama
https://github.com/ollama/ollama
https://ollama.com/blog/llama3
![](/images/ollama.png)
Get up and running with large language models locally.
1.1 Download and install Ollama
- macOS
- Windows preview
- Linux: curl -fsSL https://ollama.com/install.sh | sh
- Docker: https://hub.docker.com/r/ollama/ollama
1.2 Large language models that Ollama can load
Model | Parameters | Size | Download |
---|---|---|---|
Llama 3 | 8B | 4.7GB | ollama run llama3 |
Llama 3 | 70B | 40GB | ollama run llama3:70b |
Phi-3 | 3.8B | 2.3GB | ollama run phi3 |
Mistral | 7B | 4.1GB | ollama run mistral |
Neural Chat | 7B | 4.1GB | ollama run neural-chat |
Starling | 7B | 4.1GB | ollama run starling-lm |
Code Llama | 7B | 3.8GB | ollama run codellama |
Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
LLaVA | 7B | 4.5GB | ollama run llava |
Gemma | 2B | 1.4GB | ollama run gemma:2b |
Gemma | 7B | 4.8GB | ollama run gemma:7b |
Solar | 10.7B | 6.1GB | ollama run solar |
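Besides the interactive `ollama run` command, a running Ollama instance serves a local HTTP API. The sketch below uses only the Python standard library and assumes the default address (`localhost`, port 11434) and the `/api/generate` endpoint with streaming switched off; check the Ollama API documentation for the version you have installed. The helper names are our own.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(model: str, prompt: str, timeout: float = 300) -> str:
    """Send one prompt to the local Ollama server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]   # with stream=False the whole answer is in "response"
```

For example, once `ollama run llama3` has downloaded the model and the server is running (the desktop app starts it, or run `ollama serve`), `print(generate("llama3", "Why is the sky blue?"))` prints the model's answer.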
Models for use with Ollama:
2. Models with Chinese-language support
2.1 llama2-chinese
https://ollama.com/library/llama2-chinese
Llama 2 based model fine tuned to improve Chinese dialogue ability.
ollama run llama2-chinese
2.2 qwen
https://ollama.com/library/qwen
Qwen 1.5 is a series of large language models by Alibaba Cloud spanning from 0.5B to 110B parameters.
ollama run qwen
2.3 yi
Yi is a series of large language models trained on a high-quality corpus of 3 trillion tokens that support both the English and Chinese languages.
https://ollama.com/library/yi:34b
https://huggingface.co/01-ai/Yi-34B
ollama run yi
ollama run yi:9b
ollama run yi:34b
The Yi-34B-Chat model landed in second place (following GPT-4 Turbo), outperforming other LLMs (such as GPT-4, Mixtral, Claude) on the AlpacaEval Leaderboard (based on data available up to January 2024).
Yi-34B model ranked first among all existing open-source models (such as Falcon-180B, Llama-70B, Claude) in both English and Chinese on various benchmarks, including Hugging Face Open LLM Leaderboard (pre-trained) and C-Eval (based on data available up to November 2023).
2024-05-15
1.1 Questionnaire
http://kaiwu.city/openfiles/tourist_satisfaction_questionnaire_cn.pdf
http://kaiwu.city/openfiles/data_tourist_cn.sav
http://kaiwu.city/openfiles/analysis_tourist_cn.sps
4. Relationship between two variables
Types of variables (level of measurement)
https://statistics.laerd.com/statistical-guides/types-of-variable.php
https://www.thoughtco.com/independent-and-dependent-variable-examples-606828
https://datatab.net/tutorial/level-of-measurement
4.1 Chi-square test
2023-11-21
http://kaiwu.city/openfiles/tourist_CN.zip
http://kaiwu.city/index.php/analysis-using-excel
VBA
http://kaiwu.city/openfiles/Excel_VBA_example.xlsm
http://kaiwu.city/index.php/spss-and-pspp
How to choose statistical procedures:
http://kaiwu.city/index.php/statistical-map
Books on SPSS
Douban list of SPSS books: https://www.douban.com/doulist/45508075/
- Collier, J. (2009). Using SPSS Syntax: A Beginner’s Guide. SAGE.
- Coolican, H. (2019). Research Methods and Statistics in Psychology (7th). Routledge.
- Dancey, C. P., & Reidy, J. (2020). Statistics Without Maths for Psychology (8th). Pearson.
- Field, A. (2009). Discovering Statistics Using SPSS (3rd). SAGE.
- Field, A. (2017). Discovering Statistics Using IBM SPSS Statistics (5th). SAGE.
- George, D., & Mallery, P. (2022). IBM SPSS Statistics 27 Step by Step: A Simple Guide and Reference (17th). Routledge.
- Harrison, V., Kemp, R., Brace, N., & Snelgar, R. (2022). SPSS for Psychologists (7th). Bloomsbury Academic.
- Howitt, D., & Cramer, D. (2017). Understanding Statistics in Psychology with SPSS (7th). Pearson.
- Mesquita, J. M. C. de, & Kostelijk, E. (2022). Marketing Analytics: Statistical Tools for Marketing and Consumer Behaviour Using SPSS. Routledge.
- Otieno Okello, G. (2023). Simplified Business Statistics Using SPSS. CRC Press.
- Roni, S. M., & Djajadikerta, H. G. (2021). Data Analysis with SPSS for Survey-based Research. Springer.
- Sarstedt, M., & Mooi, E. (2019). A Concise Guide to Market Research: The Process, Data, and Methods Using IBM SPSS Statistics (3rd). Springer.
- Singh Kaurav, R. P., Gursoy, D., & Chowdhary, N. (2021). An SPSS Guide for Tourism, Hospitality and Events Researchers. Routledge.
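- Verma, J. P. (2016). Sports Research with Analytical Solution using SPSS. Wiley.
- 郭志刚 [Guo Zhigang]. (2015). 社会统计分析方法: SPSS软件应用 [Methods of social statistical analysis: Applications of SPSS]. 中国人民大学出版社 [China Renmin University Press].
- 吴明隆 [Wu Minglong]. (2010). 问卷统计分析实务: SPSS操作与应用 [Questionnaire statistical analysis in practice: SPSS operation and application]. 重庆大学出版社 [Chongqing University Press].
- 张文彤 [Zhang Wentong]. (2002a). 世界优秀统计工具SPSS11.0统计分析教程(基础篇) [SPSS 11.0 statistical analysis tutorial (basic volume)]. 北京希望电子出版社 [Beijing Hope Electronic Press].
- 张文彤 [Zhang Wentong]. (2002b). 世界优秀统计工具SPSS11.0统计分析教程(高级篇) [SPSS 11.0 statistical analysis tutorial (advanced volume)]. 北京希望电子出版社 [Beijing Hope Electronic Press].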
http://kaiwu.city/index.php/jamovi
The jamovi project (2021). jamovi (Version 1.6) [Computer Software]. Retrieved from https://www.jamovi.org
Official website: https://www.jamovi.org/
GitHub: https://github.com/jamovi
Softpedia: https://www.softpedia.com/get/Science-CAD/jamovi.shtml
jamovi is a free, open-source data analysis application that bridges the gap between the freedom and power of R and the accessibility of SPSS.
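jamovi is free, open-source and easy-to-use statistical software. It is compatible with R and can replace SPSS and similar packages.
The following introduction to jamovi comes from http://www.obhrm.net/index.php/Jamovi
jamovi is free, open-source statistical software built on R, and its interface and operation are very similar to SPSS. Its main features are:
1. Open source. Anyone in the world can download and use jamovi, so analysing data with it means using genuine software at no cost.
2. Lightweight. The Windows version of jamovi is under 250 MB in total, which makes it an exceptionally light statistics program.
3. Simple and convenient. The interface resembles SPSS and is very easy to use.
4. The analysis steps are saved together with the results. Anyone who receives the results can see how they were produced and reproduce every analysis.
5. A wide range of analyses. jamovi itself includes basic analyses such as t-tests, ANOVA, correlation and regression, and factor analysis (both exploratory and confirmatory), which makes it an excellent tool for undergraduate and master's students learning statistics.
6. Extensible. jamovi can load modules that add further analyses, including meta-analysis, power analysis, mediation and moderation models, and Bayesian methods.
7. Live reports. Results are displayed and updated dynamically while you set up an analysis.
8. Tables are generated automatically in the three-line format used in academic papers. They can be copied straight into Word or Excel and, with light editing, used in a report.
9. Seamless integration with R. With the Rj module loaded you can run R analyses inside jamovi. Conversely, jamovi's R package jmv lets you run all of jamovi's analyses in R. jmv package page: https://cran.r-project.org/web/packages/jmv/index.html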
1. About the example dataset (tourist satisfaction)
1.1 Questionnaire
http://kaiwu.city/openfiles/tourist_satisfaction_questionnaire_cn.pdf
1.2 Generic data formats (without variable labels or value labels)
http://kaiwu.city/openfiles/tourist.csv
or
http://kaiwu.city/openfiles/data_tourist_satisfaction_cn.xlsx
2. Data preparation (data cleaning or data wrangling)
2.1 Importing data
https://libguides.library.kent.edu/SPSS/ImportData
http://kaiwu.city/openfiles/data_tourist_cn.sav
2.2 Data management
https://libguides.library.kent.edu/SPSS/data-management
Computing Variables: The "Compute Variable" command allows you to create new variables from existing variables by applying formulas. This tutorial shows how the "Compute Variable" command can compute a variable using an equation, a built-in function, or conditional logic.
Recoding (Transforming) Variables: Recoding a variable can be used to transform an existing variable into a different form based on certain criteria. This tutorial covers the "Recode into Different Variable" and "Recode into Same Variable" commands.
Syntax files: add variable labels and value labels, compute variables, and recode variables.
Syntax: Chinese labels
http://kaiwu.city/openfiles/labels_tourist_CN.sps
Syntax: English labels
http://kaiwu.city/openfiles/labels_tourist_satisfaction_en.sps
With these two SPSS syntax files you can quickly switch the language of the variable labels and value labels between English and Chinese.
http://kaiwu.city/openfiles/data_tourist_cn.sav
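For readers following the Python parts of this site, the Compute and Recode operations described above have rough pandas counterparts. Computing a variable is column arithmetic, "Recode into Same Variable" overwrites a column, and "Recode into Different Variable" can be done with `pd.cut`. This is a sketch only: the column names (`q1` to `q3`, `age`), the scores and the age bands are hypothetical, not the variables of the tourist dataset.

```python
import pandas as pd

# Hypothetical survey responses; column names are illustrative only
survey = pd.DataFrame({
    "q1": [4, 5, 3],
    "q2": [5, 4, 2],
    "q3": [3, 5, 4],   # suppose q3 is negatively worded (hypothetical)
    "age": [19, 35, 62],
})

# "Recode into Same Variable": reverse-score the 1-5 item q3 in place
survey["q3"] = 6 - survey["q3"]

# "Compute Variable": the mean of the three items as a new variable
survey["satisfaction"] = survey[["q1", "q2", "q3"]].mean(axis=1)

# "Recode into Different Variable": age in years -> age group
survey["age_group"] = pd.cut(
    survey["age"],
    bins=[0, 29, 49, 120],          # intervals (0, 29], (29, 49], (49, 120]
    labels=["under 30", "30-49", "50+"],
)
print(survey)
```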
3. Basic analysis
http://kaiwu.city/openfiles/analysis_tourist_cn.sps
http://kaiwu.city/openfiles/academic_report_SPSS_CN.docx
Kent State University Libraries. (2017, May 15). SPSS tutorials: Independent samples t test. Retrieved May 17, 2017, from http://libguides.library.kent.edu/SPSS/IndependentTTest
3.1 Frequency analysis (categorical variables)
https://libguides.library.kent.edu/SPSS/FrequenciesCategorical
https://www.spss-tutorials.com/spss-frequencies-command/
https://datatab.net/tutorial/frequency-table
Chinese-language reference
https://zhuanlan.zhihu.com/p/108860781
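An SPSS-style frequency table (Frequency, Percent, Cumulative Percent) can be sketched in pandas as follows. Percentages are computed over non-missing values only, so they correspond to SPSS's "Valid Percent" column. The helper name and the coded sample answers are hypothetical.

```python
import pandas as pd

def frequency_table(s: pd.Series) -> pd.DataFrame:
    """Frequency, percent and cumulative percent of a categorical variable (missing values dropped)."""
    freq = s.value_counts().sort_index()
    pct = 100 * freq / freq.sum()
    return pd.DataFrame({
        "Frequency": freq,
        "Percent": pct.round(1),
        "Cumulative Percent": pct.cumsum().round(1),   # accumulate before rounding
    })

# Hypothetical coded answers, e.g. 1 = female, 2 = male
gender = pd.Series([1, 2, 1, 1, 2])
print(frequency_table(gender))
```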
3.2 Cross-tabulation (categorical variables)
https://libguides.library.kent.edu/SPSS/Crosstabs
Chinese-language reference
https://zhuanlan.zhihu.com/p/634975678
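In pandas, `pd.crosstab` produces a similar two-way table. `margins=True` adds row and column totals, and `normalize="index"` gives row percentages. The respondents below are invented for illustration.

```python
import pandas as pd

# Hypothetical respondents: gender and whether they would revisit
data = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F", "M"],
    "revisit": ["yes", "yes", "no", "no", "yes", "no"],
})

# Counts with row/column totals, similar to SPSS Crosstabs
counts = pd.crosstab(data["gender"], data["revisit"], margins=True, margins_name="Total")

# Row percentages: each gender's answers sum to 100
row_pct = pd.crosstab(data["gender"], data["revisit"], normalize="index").mul(100).round(1)

print(counts)
print(row_pct)
```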
3.3 Descriptive statistics (numeric variables)
https://libguides.library.kent.edu/SPSS/Descriptives
https://www.spss-tutorials.com/spss-descriptives-command/
Chinese-language reference
https://blog.csdn.net/qq_42278015/article/details/119696576
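The default statistics of the SPSS Descriptives procedure (N, minimum, maximum, mean, standard deviation) can be reproduced roughly with `DataFrame.agg`. pandas' `std` uses the sample (n − 1) denominator by default. The helper name and sample values are hypothetical.

```python
import pandas as pd

def descriptives(data: pd.DataFrame) -> pd.DataFrame:
    """N, minimum, maximum, mean and sample standard deviation of each numeric column."""
    stats = data.select_dtypes("number").agg(["count", "min", "max", "mean", "std"]).T
    stats.columns = ["N", "Minimum", "Maximum", "Mean", "Std. Deviation"]
    return stats

# Hypothetical numeric variables
sample = pd.DataFrame({"age": [20, 30, 40], "spend": [100.0, 200.0, 300.0]})
print(descriptives(sample))
```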
3.4 Custom tables (highly recommended)
https://www.ibm.com/docs/en/spss-statistics/saas?topic=statistics-custom-tables
Chinese-language reference (this site)
http://kaiwu.city/index.php/spss-custom-table-cn
4. Relationship between two variables
Types of variables (level of measurement)
https://statistics.laerd.com/statistical-guides/types-of-variable.php
https://www.thoughtco.com/independent-and-dependent-variable-examples-606828
https://datatab.net/tutorial/level-of-measurement
4.1 Chi-square test
https://libguides.library.kent.edu/SPSS/ChiSquare
https://datatab.net/tutorial/chi-square-test
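The Pearson chi-square statistic compares the observed counts O with the counts expected under independence, E = row total × column total / n. It is computed as Σ(O − E)²/E with (r − 1)(c − 1) degrees of freedom. Below is a pure-Python sketch with a made-up 2 × 2 table. The p-value helper covers only the 1-degree-of-freedom case. For other tables, `scipy.stats.chi2_contingency` returns the p-value, but it applies Yates' continuity correction to 2 × 2 tables by default, whereas SPSS lists the uncorrected Pearson chi-square and the continuity correction on separate rows.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # expected count under independence
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical 2 x 2 table: rows = gender, columns = revisit yes / no
observed = [[20, 30],
            [30, 20]]
chi2, dof = chi_square_independence(observed)
print(chi2, dof, round(p_value_df1(chi2), 4))
```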
4.2 Independent-samples t-test
https://libguides.library.kent.edu/SPSS/IndependentTTest
https://datatab.net/tutorial/unpaired-t-test
Chinese-language reference
https://blog.csdn.net/qq_51843109/article/details/123612791
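SPSS reports the independent-samples t-test on two rows: "Equal variances assumed" (pooled variance) and "Equal variances not assumed" (Welch). The sketch below computes both t statistics with the standard library. The two groups are hypothetical scores. The p-values need the t distribution and are left to a library such as `scipy.stats.ttest_ind(a, b, equal_var=...)`.

```python
import math
from statistics import mean, variance

def independent_t(a, b):
    """t statistics for two independent samples: pooled (equal variances) and Welch."""
    n1, n2 = len(a), len(b)
    m1, m2 = mean(a), mean(b)
    v1, v2 = variance(a), variance(b)      # sample variances (n - 1 denominator)
    # Equal variances assumed: pooled variance, df = n1 + n2 - 2
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Equal variances not assumed: Welch's t with Welch-Satterthwaite df
    se1, se2 = v1 / n1, v2 / n2
    t_welch = (m1 - m2) / math.sqrt(se1 + se2)
    df_welch = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return {"t_pooled": t_pooled, "df_pooled": n1 + n2 - 2,
            "t_welch": t_welch, "df_welch": df_welch}

# Hypothetical satisfaction scores of two groups
print(independent_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))
```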
4.3 One-way ANOVA
https://libguides.library.kent.edu/SPSS/OneWayANOVA
https://datatab.net/tutorial/anova
Chinese-language reference
https://zhuanlan.zhihu.com/p/448983174
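One-way ANOVA splits the total variation into between-group and within-group sums of squares and compares their mean squares: F = (SS_between / (k − 1)) / (SS_within / (N − k)), for k groups and N observations. Here is a sketch with three invented groups. `scipy.stats.f_oneway` would also give the p-value.

```python
from statistics import mean

def one_way_anova(*groups):
    """F statistic and (between, within) degrees of freedom for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores of three visitor groups
f, df_b, df_w = one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f, df_b, df_w)
```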
4.4 Correlation analysis
https://libguides.library.kent.edu/SPSS/PearsonCorr
https://datatab.net/tutorial/correlation
Chinese-language reference
https://blog.csdn.net/nekonekoboom/article/details/116708114
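Pearson's r is the covariance of two variables divided by the product of their standard deviations. Below is a minimal sketch. Python 3.10+ also offers `statistics.correlation`, and `scipy.stats.pearsonr` adds a p-value. The sample values are made up.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired scores
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))
```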
4.5 Logistic regression
https://www.spss-tutorials.com/logistic-regression/
https://datatab.net/tutorial/logistic-regression
Chinese-language reference
https://zhuanlan.zhihu.com/p/340480145
5. Analysis of scales (measurement)
5.1 Reliability analysis
https://www.spss-tutorials.com/cronbachs-alpha-in-spss/
https://www.spss-tutorials.com/spss-split-half-reliability/
https://datatab.net/tutorial/cronbachs-alpha
Chinese-language reference
https://www.zhihu.com/tardis/zm/art/270005975
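Cronbach's alpha for k items is α = k/(k − 1) × (1 − Σ sᵢ² / s_T²), where sᵢ² is the variance of item i and s_T² is the variance of the respondents' total scores. Below is a pure-Python sketch that assumes complete data (no missing answers). The item scores are hypothetical.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of item columns, each a list of scores per respondent."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]   # each respondent's scale total
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# Hypothetical answers of four respondents to two items
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))
```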
5.2 Exploratory factor analysis (EFA), used for validity analysis
https://www.spss-tutorials.com/spss-factor-analysis-tutorial/
https://www.spss-tutorials.com/spss-factor-analysis-intermediate-tutorial/
https://www.spss-tutorials.com/apa-reporting-factor-analysis/
https://datatab.net/tutorial/exploratory-factor-analysis
Chinese-language reference
https://www.zhihu.com/tardis/zm/art/270005975
6. Exporting results
https://www.spss-tutorials.com/spss-output/
https://www.spss-tutorials.com/spss-apa-format-descriptives-tables/
Chinese-language reference
https://spss.mairuan.com/jiqiao/spss-wuxja.html
1. News from the Chinese Academy of Sciences (in Chinese)
https://www.cas.cn/syky/202404/t20240419_5012028.shtml
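On April 21, the world's first high-precision geologic atlas set of the Moon was officially released. It consists of the "1:2,500,000 Geologic Atlas of the Whole Moon" and the "1:2,500,000 Quadrangle Geologic Atlas of the Moon" (Chinese and English editions, with explanatory notes). The whole-Moon atlas contains the "1:2,500,000 Geologic Map of the Whole Moon", the "1:2,500,000 Lunar Rock Type Distribution Map" and the "1:2,500,000 Lunar Tectonic Outline Map". The quadrangle atlas contains 30 standard lunar quadrangle geologic maps.
Since the US Apollo program began in the 1960s, lunar exploration and lunar science have advanced considerably, both internationally and in China. However, lunar geologic research still relies on the geologic maps produced in the Apollo era. As research has deepened, those maps can no longer meet the needs of future research and lunar exploration.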
https://gyig.cas.cn/gzdt_/kydt_/202206/t20220608_6458724.html
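On April 21, the world's first high-precision geologic atlas set of the Moon was officially released. It includes the "1:2,500,000 Geologic Atlas of the Whole Moon" and the "1:2,500,000 Quadrangle Geologic Atlas of the Moon" (Chinese and English editions, with explanatory notes).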
On April 21st, China released a geologic atlas set of the global moon with a scale of 1:2.5 million, which is the first complete high-definition lunar geologic atlas in the world. This geologic atlas set, available in both Chinese and English, includes the Geologic Atlas of the Lunar Globe and the Map Quadrangles of the Geologic Atlas of the Moon.
2. News from the Chinese Academy of Sciences
https://english.cas.cn/newsroom/cas_media/202404/t20240422_660730.shtml
China on Sunday released a geologic atlas set of the global moon with a scale of 1:2.5 million, which is the first complete high-definition lunar geologic atlas in the world, providing basic map data for future lunar research and exploration.
This geologic atlas set, available in both Chinese and English, includes the Geologic Atlas of the Lunar Globe and the Map Quadrangles of the Geologic Atlas of the Moon, according to the Institute of Geochemistry of the Chinese Academy of Sciences (CAS).
3. News from China Daily
https://cn.chinadaily.com.cn/a/202404/28/WS662dc06ca3109f7860ddb633.html
References
- Xin, L. (2024). China’s Moon Atlas Is the Most Detailed Ever Made. Nature. https://doi.org/10.1038/d41586-024-01223-0
- L. X., Nature. (2024, April 29). China’s Moon Atlas Is the Most Detailed Ever Made. Scientific American. https://www.scientificamerican.com/article/chinas-moon-atlas-is-the-most-detailed-ever-made/
- Fortezzo, C. M., Spudis, P. D., & Harrel, S. L. (2020). Release of the Digital Unified Global Geologic Map of the Moon At 1:5,000,000-Scale. Paper presented at the 51st Lunar and Planetary Science Conference, Lunar and Planetary Institute, Houston, TX. https://www.hou.usra.edu/meetings/lpsc2020/pdf/2760.pdf