Creating Visual Maps with 10 Lines of Python Code - 36大數據
Author: renwofei423
```python
import vincent

# path: output file for the Vega JSON spec (not defined in the original snippet)
world_countries = r'world-countries.json'
world = vincent.Map(width=1200, height=1000)
world.geo_data(projection='winkel3', scale=200, world=world_countries)
world.to_json(path)
```
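When I started building Vincent, one of my goals was to make building maps as streamlined as possible. There are some great Python mapping libraries, such as Basemap and Kartograph, that let you make far more elaborate maps. I highly recommend both; they are easy to use and very powerful. What I wanted was a simpler tool that leans on the power of Vega and offers a simple syntax: point to a geoJSON file, specify a projection and a size/scale, and output the map.
For example, you can layer map data to build more complex maps: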
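To make "point to a geoJSON file, specify a projection and scale, output the map" concrete, here is a minimal sketch of the kind of Vega document such a call could produce, built as a plain Python dict. The structure (a `geopath` transform feeding a `path` mark) is our recollection of the Vega 1.x format that Vincent targeted at the time; the function name `geo_map_spec`, the colors, and the centered `translate` are illustrative assumptions, not Vincent's actual output.

```python
import json


def geo_map_spec(geo_url, projection="winkel3", scale=200,
                 width=1200, height=1000):
    """Hypothetical helper: build a minimal Vega-1.x-style map spec.

    Field names follow our recollection of Vega 1.x's "geopath"
    transform; treat them as assumptions, not a reference.
    """
    return {
        "width": width,
        "height": height,
        "data": [{
            "name": "geo",
            "url": geo_url,
            # A geoJSON FeatureCollection: iterate over its "features" array
            "format": {"type": "json", "property": "features"},
            # Project each feature's geometry into an SVG path string
            "transform": [{
                "type": "geopath",
                "value": "data",
                "projection": projection,
                "scale": scale,
                "translate": [width / 2, height / 2],  # assumed: center the map
            }],
        }],
        "marks": [{
            "type": "path",
            "from": {"data": "geo"},
            "properties": {
                "enter": {
                    "path": {"field": "path"},     # output of the geopath transform
                    "stroke": {"value": "#000000"},
                    "fill": {"value": "#4682b4"},  # illustrative fill color
                }
            },
        }],
    }


spec = geo_map_spec("world-countries.json")
spec_json = json.dumps(spec, indent=2)
```

Writing `spec_json` to a file gives a standalone spec for a Vega 1.x renderer; newer Vega releases use a different schema.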
```python
# county_geo, state_geo: paths to the county and state geoJSON files
# path: output file for the Vega JSON spec
vis = vincent.Map(width=1000, height=800)

# Add the US county data and a new line color
vis.geo_data(projection='albersUsa', scale=1000, counties=county_geo)
vis + ('#2B4ECF', 'marks', 0, 'properties', 'enter', 'stroke', 'value')  # hex colors need a leading '#'

# Add the state data, remove the fill, write Vega spec output to JSON
vis.geo_data(states=state_geo)
vis - ('fill', 'marks', 1, 'properties', 'enter')
vis.to_json(path)
```
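The `vis + (...)` and `vis - (...)` lines read as key paths into the Vega spec: for `+`, the first element is a new value and the rest is the path where it goes; for `-`, the first element is the key to delete and the rest is the path to its parent. That reading is our interpretation of the syntax, and `set_in` and `del_in` below are hypothetical stand-ins operating on a plain dict, not Vincent functions.

```python
def set_in(spec, value, *path):
    """Hypothetical helper: spec[p0][p1]...[pn] = value, creating dicts as needed."""
    node = spec
    for key in path[:-1]:
        if isinstance(node, dict):
            node = node.setdefault(key, {})
        else:  # list index, e.g. the 0 in ('marks', 0, ...)
            node = node[key]
    node[path[-1]] = value
    return spec


def del_in(spec, key, *path):
    """Hypothetical helper: walk to spec[p0]...[pn] and delete `key` there."""
    node = spec
    for step in path:
        node = node[step]
    node.pop(key, None)  # ignore keys that are already absent
    return spec


# A toy two-layer spec: mark 0 = counties, mark 1 = states
spec = {"marks": [
    {"properties": {"enter": {"stroke": {"value": "#000000"}}}},
    {"properties": {"enter": {"fill": {"value": "#4682b4"},
                              "stroke": {"value": "#000000"}}}},
]}

# Same tuple shapes as the Vincent calls above
set_in(spec, *('#2B4ECF', 'marks', 0, 'properties', 'enter', 'stroke', 'value'))
del_in(spec, *('fill', 'marks', 1, 'properties', 'enter'))
```

Removing the fill from the top (state) layer is what lets the county layer underneath show through, leaving only the state outlines on top.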
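In addition, a choropleth map needs to be bound to Pandas data, with a data column mapped directly onto the map features. Assuming a 1:1 mapping from the geoJSON to the column data, the syntax is very simple: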
```python
# 'merged' is the Pandas DataFrame
vis = vincent.Map(width=1000, height=800)
vis.tabular_data(merged, columns=['FIPS_Code', 'Unemployment_rate_2011'])
vis.geo_data(projection='albersUsa', scale=1000, bind_data='data.id',
             counties=county_geo)
vis + (["#f5f5f5", "#000045"], 'scales', 0, 'range')
vis.to_json(path)
```
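The `vis + ([...], 'scales', 0, 'range')` line sets the two endpoint colors of the choropleth's color scale. As a sketch of what a linear color scale does with that range, the code below maps the lowest value in the domain to `#f5f5f5`, the highest to `#000045`, and blends linearly in RGB in between. RGB interpolation and clamping are assumptions for illustration; Vega's own scale may interpolate differently. The sample rates are the `Unemployment_rate_2011` values from the `merged.head()` output later in this post.

```python
def hex_to_rgb(color):
    """'#f5f5f5' -> (245, 245, 245)"""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """(245, 245, 245) -> '#f5f5f5'"""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def linear_color_scale(domain, color_range):
    """Map a number in [lo, hi] to a color interpolated in RGB space (assumed)."""
    lo, hi = domain
    start, end = hex_to_rgb(color_range[0]), hex_to_rgb(color_range[1])

    def scale(x):
        t = (x - lo) / (hi - lo) if hi != lo else 0.0
        t = min(max(t, 0.0), 1.0)  # clamp out-of-domain values
        return rgb_to_hex(tuple(round(a + (b - a) * t) for a, b in zip(start, end)))

    return scale


rates = [8.0, 8.1, 11.4, 9.9, 8.3]  # Unemployment_rate_2011 from merged.head()
scale = linear_color_scale((min(rates), max(rates)), ["#f5f5f5", "#000045"])
colors = [scale(r) for r in rates]
```

Swapping the range (as the state map at the end of the post does with `['#c9cedb', '#0b0d11']`) changes only the endpoints; the mapping logic stays the same.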
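Our data does need some munging first: the user has to ensure a 1:1 mapping between the geoJSON keys and the pandas DataFrame. Here is the brief munging needed for the previous example. Our county information is a CSV file with FIPS codes, county names, and economic data (column names omitted):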
```
00000,US,United States,154505871,140674478,13831393,9,50502,100
01000,AL,Alabama,2190519,1993977,196542,9,41427,100
01001,AL,Autauga County,25930,23854,2076,8,48863,117.9
01003,AL,Baldwin County,85407,78491,6916,8.1,50144,121
01005,AL,Barbour County,9761,8651,1110,11.4,30117,72.7
```
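To see why the FIPS codes need attention, here is a stdlib-only sketch that reads the sample rows above. The column names are assumed from the `merged.head()` output later in the post. The CSV codes are zero-padded (`01001`) while the geoJSON ids are not (`1001`); converting through `int` strips the padding, which is effectively what reading the column as numbers and casting back to `str` does in the pandas code below. The helper `normalize_fips` is hypothetical.

```python
import csv
import io

# The five sample rows shown above
SAMPLE = """00000,US,United States,154505871,140674478,13831393,9,50502,100
01000,AL,Alabama,2190519,1993977,196542,9,41427,100
01001,AL,Autauga County,25930,23854,2076,8,48863,117.9
01003,AL,Baldwin County,85407,78491,6916,8.1,50144,121
01005,AL,Barbour County,9761,8651,1110,11.4,30117,72.7
"""

# Assumed column names, borrowed from the merged.head() output
COLUMNS = ["FIPS_Code", "State", "Area_name", "Civilian_labor_force_2011",
           "Employed_2011", "Unemployed_2011", "Unemployment_rate_2011",
           "Median_Household_Income_2011",
           "Med_HH_Income_Percent_of_StateTotal_2011"]


def normalize_fips(code):
    """'01001' -> '1001': drop zero padding so codes match the geoJSON ids."""
    return str(int(code))


rows = [dict(zip(COLUMNS, r)) for r in csv.reader(io.StringIO(SAMPLE))]
for row in rows:
    row["FIPS_Code"] = normalize_fips(row["FIPS_Code"])
```

The national (`00000`) and state (`01000`) summary rows become `0` and `1000`, which match no county shape, so the inner join later drops them.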
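In the geoJSON, our county shapes use the FIPS code as their id (thanks to the fork from Trifacta). For brevity, the actual shapes have been left out here; the full dataset is in the example data: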
{ "type" : "FeatureCollection" , "features" :[ { "type" : "Feature" , "id" : "1001" , "properties" :{ "name" : "Autauga" } { "type" : "Feature" , "id" : "1003" , "properties" :{ "name" : "Baldwin" } { "type" : "Feature" , "id" : "1005" , "properties" :{ "name" : "Barbour" } { "type" : "Feature" , "id" : "1007" , "properties" :{ "name" : "Bibb" } { "type" : "Feature" , "id" : "1009" , "properties" :{ "name" : "Blount" } { "type" : "Feature" , "id" : "1011" , "properties" :{ "name" : "Bullock" } { "type" : "Feature" , "id" : "1013" , "properties" :{ "name" : "Butler" } { "type" : "Feature" , "id" : "1015" , "properties" :{ "name" : "Calhoun" } { "type" : "Feature" , "id" : "1017" , "properties" :{ "name" : "Chambers" } { "type" : "Feature" , "id" : "1019" , "properties" :{ "name" : "Cherokee" }
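Before binding, it helps to verify the 1:1 mapping explicitly. This sketch, using a trimmed FeatureCollection shaped like the excerpt above, reports duplicate keys on either side, shapes with no table row, and table rows with no shape. The function `check_one_to_one` is a hypothetical helper, and the table keys are the sample CSV codes with their zero padding stripped.

```python
import json
from collections import Counter

# A trimmed FeatureCollection in the same shape as the excerpt above
GEO = json.loads("""{"type": "FeatureCollection", "features": [
  {"type": "Feature", "id": "1001", "properties": {"name": "Autauga"}},
  {"type": "Feature", "id": "1003", "properties": {"name": "Baldwin"}},
  {"type": "Feature", "id": "1005", "properties": {"name": "Barbour"}},
  {"type": "Feature", "id": "1007", "properties": {"name": "Bibb"}}
]}""")


def duplicates(keys):
    """Keys that occur more than once, sorted."""
    return sorted(k for k, n in Counter(keys).items() if n > 1)


def check_one_to_one(geo, table_keys):
    """Compare geoJSON feature ids with table keys (both compared as strings)."""
    geo_ids = [str(f["id"]) for f in geo["features"]]
    keys = [str(k) for k in table_keys]
    return {
        "duplicate_geo_ids": duplicates(geo_ids),
        "duplicate_table_keys": duplicates(keys),
        "missing_in_table": sorted(set(geo_ids) - set(keys)),
        "unused_table_keys": sorted(set(keys) - set(geo_ids)),
    }


report = check_one_to_one(GEO, ["0", "1000", "1001", "1003", "1005"])
```

Here `report["missing_in_table"]` flags Bibb (`1007`), a shape with no data row, and `report["unused_table_keys"]` lists the two summary rows; both lists should be empty (or deliberately ignored) before binding.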
We need to match up the FIPS codes and make sure the matches are correct; otherwise Vega will not zip (join) the data correctly:
```python
import json

import pandas as pd

# county_geo: path to the county geoJSON; county_data: path to the county CSV

# Map the county codes we have in our geometry to those in the
# county_data file, which contains additional rows we don't need
with open(county_geo, 'r') as f:
    get_id = json.load(f)

# Grab the FIPS codes and load them into a dataframe
county_codes = [x['id'] for x in get_id['features']]
county_df = pd.DataFrame({'FIPS_Code': county_codes}, dtype=str)

# Read into a DataFrame; FIPS codes are parsed as integers (dropping the
# zero padding), then cast to string to match the geoJSON ids
df = pd.read_csv(county_data, na_values=[' '])
df['FIPS_Code'] = df['FIPS_Code'].astype(str)

# Perform an inner join, then forward-fill NAs from the previous row
# (the preceding county in file order)
merged = pd.merge(df, county_df, on='FIPS_Code', how='inner')
merged = merged.ffill()  # fillna(method='pad') is deprecated in recent pandas
```

```
>>> merged.head()
  FIPS_Code State       Area_name  Civilian_labor_force_2011  Employed_2011  \
0      1001    AL  Autauga County                      25930          23854
1      1003    AL  Baldwin County                      85407          78491
2      1005    AL  Barbour County                       9761           8651
3      1007    AL     Bibb County                       9216           8303
4      1009    AL   Blount County                      26347          24156

   Unemployed_2011  Unemployment_rate_2011  Median_Household_Income_2011  \
0             2076                     8.0                         48863
1             6916                     8.1                         50144
2             1110                    11.4                         30117
3              913                     9.9                         37347
4             2191                     8.3                         41940

   Med_HH_Income_Percent_of_StateTotal_2011
0                                     117.9
1                                     121.0
2                                      72.7
3                                      90.2
4                                     101.2
```
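For readers less familiar with pandas, here is roughly what the last two lines of that code do, sketched with plain lists of dicts: `how='inner'` keeps only rows whose key appears on both sides (pandas keeps the left frame's row order), and forward filling copies the last non-missing value down the column. Note that "nearest county" really means the previous row in file order, not the geographically closest county. The helpers `inner_join` and `forward_fill` and the sample rows (including the `9999` code and the missing value) are hypothetical.

```python
def inner_join(rows, keys, on):
    """Keep copies of rows whose `on` value appears in `keys`, in row order."""
    wanted = set(keys)
    return [dict(r) for r in rows if r[on] in wanted]


def forward_fill(rows, columns):
    """Replace None with the last non-missing value seen above it in the column."""
    last = {}
    for row in rows:
        for col in columns:
            if row[col] is None:
                row[col] = last.get(col)  # stays None if nothing above it
            else:
                last[col] = row[col]
    return rows


rows = [
    {"FIPS_Code": "1001", "Unemployment_rate_2011": 8.0},
    {"FIPS_Code": "1003", "Unemployment_rate_2011": None},  # hypothetical gap
    {"FIPS_Code": "9999", "Unemployment_rate_2011": 5.0},   # hypothetical code with no shape
]
merged = forward_fill(inner_join(rows, ["1001", "1003"], "FIPS_Code"),
                      ["Unemployment_rate_2011"])
```

After this, `merged` holds two rows, and the gap in `1003` has been filled with `1001`'s rate; the original `rows` list is left untouched because `inner_join` copies each row.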
Now we can quickly generate different choropleths:
```python
vis.tabular_data(merged, columns=['FIPS_Code', 'Civilian_labor_force_2011'])
vis.to_json(path)
```
That only tells us that LA and King counties are very large and densely populated. Let's look at median household income instead:
```python
vis.tabular_data(merged, columns=['FIPS_Code', 'Median_Household_Income_2011'])
vis.to_json(path)
```
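Clearly, many of the high-income areas are on the East Coast or in other densely populated regions. I bet this would be even more interesting at the city level, but that will have to wait for a future release. Let's quickly reset the map and look at unemployment by state:
```python
# Swap county data for state data, reset map
# state_unemployment: path to the state unemployment CSV
state_data = pd.read_csv(state_unemployment)
vis.tabular_data(state_data, columns=['State', 'Unemployment'])
vis.geo_data(bind_data='data.id', reset=True, states=state_geo)
vis.update_map(scale=1000, projection='albersUsa')
vis + (['#c9cedb', '#0b0d11'], 'scales', 0, 'range')
vis.to_json(path)
```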
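Maps are my passion, and I hope Vincent will grow more capable, including the ability to easily add points, markers, and more. If you have any mapping feature requests, feel free to open an issue on GitHub.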
When reposting, please credit 36大數據 (36dsj.com): 36大數據 - Creating Visual Maps with 10 Lines of Python Code