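This article walks through four ways to download text data from the network into local memory with Python. They are shared here for reference; the details are as follows: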
import urllib.request
from io import StringIO

import numpy as np
import pandas as pd
import requests

'''
Download a file from the network and load the CSV data as a NumPy matrix
'''

# URL of the remote data file
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"

# Method 1
# ========================================================
# Download the file
#r = urllib.request.urlopen(url)
# Load the CSV file as a NumPy matrix
#dataset = np.loadtxt(r, delimiter=",")

# Method 2
# ========================================================
# Download the file
#r = requests.get(url)
# Load the CSV file as a NumPy matrix
#dataset = np.loadtxt(StringIO(r.text), delimiter=",")
# Note the use of StringIO here: it wraps the downloaded text in a file-like object.

# Method 3
# ========================================================
# Use genfromtxt to download the file directly and load the CSV as a NumPy matrix. Very convenient!
#dataset = np.genfromtxt(url, delimiter=",")

# Method 4
# ========================================================
# Use pandas.read_csv to download the file directly and load the CSV as a pandas.DataFrame.
# dataset = pd.read_csv('http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv', index_col=0)
# header=None: this file has no header row, so the first line must be read as data.
df = pd.read_csv(url, header=None)
# Convert the DataFrame to a NumPy array so the slicing below works for all four methods.
dataset = df.to_numpy()

# ========================================================
# separate the data from the target attributes
# Columns 0-7 are the eight features; column 8 is the class label.
X = dataset[:, 0:8]
y = dataset[:, 8]
print(X)
#print(y)
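Methods 2 and 4 both rely on parsing CSV text that is already in memory. The following is a minimal sketch of that step that runs without network access. It assumes a few illustrative rows in the same 9-column layout (eight features followed by a 0/1 label); the `SAMPLE_CSV` constant and the `split_features_target` helper are hypothetical names introduced only for this example.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Illustrative rows only (an assumption for this sketch): eight feature
# columns followed by a 0/1 class label, with no header row.
SAMPLE_CSV = (
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
    "8,183,64,0,0,23.3,0.672,32,1\n"
)


def split_features_target(text):
    """Parse in-memory CSV text and split it into features X and target y."""
    # StringIO wraps the string in a file-like object that np.loadtxt can read.
    dataset = np.loadtxt(StringIO(text), delimiter=",")
    X = dataset[:, :-1]  # every column except the last
    y = dataset[:, -1]   # the last column is the label
    return X, y


X, y = split_features_target(SAMPLE_CSV)
print(X.shape, y.shape)

# The same text can be read by pandas; header=None keeps the first row as data.
df = pd.read_csv(StringIO(SAMPLE_CSV), header=None)
print(df.shape)
```

In a real download, the text passed to the helper would be `r.text` from `requests.get(url)`. Slicing with `:-1` and `-1` avoids hard-coding the number of columns.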