[training@localhost ~]$ hdfs dfs -cat people.json
{"name":"Alice","pcode":"94304"}
{"name":"Brayden","age":30,"pcode":"94304"}
{"name":"Carla","age":19,"pcoe":"10036"}
{"name":"Diana","age":46}
{"name":"Etienne","pcode":"94104"}
[training@localhost ~]$ hdfs dfs -cat pcodes.json
{"pcode":"10036","city":"New York","state":"NY"}
{"pcode:"87501","city":"Santa Fe","state":"NM"}
{"pcode":"94304","city":"Palo Alto","state":"CA"}
{"pcode":"94104","city":"San Francisco","state":"CA"}

from pyspark.sql import HiveContext

# Create the SQL entry point from the existing SparkContext (sc)
sqlContext = HiveContext(sc)

# Load each JSON file (one record per line) into a DataFrame
peopleDF = sqlContext.read.json("people.json")
pcodesDF = sqlContext.read.json("pcodes.json")

# Inner join on the shared "pcode" column
mydf001 = peopleDF.join(pcodesDF, "pcode")
mydf001.limit(5).show()
+-----+----+-------+----+---------------+-------------+-----+
|pcode| age|   name|pcoe|_corrupt_record|         city|state|
+-----+----+-------+----+---------------+-------------+-----+
|94304|null|  Alice|null|           null|    Palo Alto|   CA|
|94304|  30|Brayden|null|           null|    Palo Alto|   CA|
|94104|null|Etienne|null|           null|San Francisco|   CA|
+-----+----+-------+----+---------------+-------------+-----+

This article is reposted from the cnblogs blog 健哥的数据花园. Original link: http://www.cnblogs.com/gaojian/p/7630003.html. To republish it, please contact the original author.
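
The join output above has only three rows and carries two unexpected columns, pcoe and _corrupt_record. A plausible reading: Carla's record spells the key as "pcoe", so her pcode is null; Diana has no pcode at all; and the 87501 line in pcodes.json is not valid JSON, so Spark's default permissive JSON reader keeps its raw text in _corrupt_record. An inner join only pairs rows whose pcode is present and equal on both sides. The plain-Python sketch below mimics that behavior on the same sample data. It is an illustration only, not Spark's implementation, and the helper names parse_lines and inner_join are made up for this example.

```python
import json

# Sample records copied from the listings above, including the misspelled
# "pcoe" key and the malformed 87501 line.
people_lines = [
    '{"name":"Alice","pcode":"94304"}',
    '{"name":"Brayden","age":30,"pcode":"94304"}',
    '{"name":"Carla","age":19,"pcoe":"10036"}',
    '{"name":"Diana","age":46}',
    '{"name":"Etienne","pcode":"94104"}',
]
pcode_lines = [
    '{"pcode":"10036","city":"New York","state":"NY"}',
    '{"pcode:"87501","city":"Santa Fe","state":"NM"}',
    '{"pcode":"94304","city":"Palo Alto","state":"CA"}',
    '{"pcode":"94104","city":"San Francisco","state":"CA"}',
]


def parse_lines(lines):
    """Parse one JSON object per line; keep unparseable lines as _corrupt_record."""
    rows = []
    for line in lines:
        try:
            rows.append(json.loads(line))
        except ValueError:
            # Hypothetical stand-in for a permissive reader: keep the raw text.
            rows.append({"_corrupt_record": line})
    return rows


def inner_join(left, right, key):
    """Pair rows whose key is present and equal on both sides."""
    joined = []
    for left_row in left:
        value = left_row.get(key)
        if value is None:
            continue  # a missing key never matches in an equi-join
        for right_row in right:
            if right_row.get(key) == value:
                merged = dict(right_row)
                merged.update(left_row)
                joined.append(merged)
    return joined


people = parse_lines(people_lines)
pcodes = parse_lines(pcode_lines)
result = inner_join(people, pcodes, "pcode")

for row in result:
    print(row["pcode"], row["name"], row["city"], row["state"])
```

In this sketch only Alice, Brayden and Etienne survive the join, which matches the three names shown by show(). To keep Carla and Diana in the result with null city and state, an outer join would be needed instead of the default inner join.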