博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
[Spark][Python]Spark Join 小例子
阅读量:6452 次
发布时间:2019-06-23

本文共 1156 字,大约阅读时间需要 3 分钟。

[training@localhost ~]$ hdfs dfs -cat people.json

{"name":"Alice","pcode":"94304"}

{"name":"Brayden","age":30,"pcode":"94304"}
{"name":"Carla","age":19,"pcoe":"10036"}
{"name":"Diana","age":46}
{"name":"Etienne","pcode":"94104"}
[training@localhost ~]$

hdfs dfs -cat pcodes.json

{"pcode":"10036","city":"New York","state":"NY"}

{"pcode:"87501","city":"Santa Fe","state":"NM"}
{"pcode":"94304","city":"Palo Alto","state":"CA"}
{"pcode":"94104","city":"San Francisco","state":"CA"}

sqlContext = HiveContext(sc)

peopleDF = sqlContext.read.json("people.json")

sqlContext = HiveContext(sc)

pcodesDF = sqlContext.read.json("pcodes.json")

mydf001=peopleDF.join(pcodesDF,"pcode")

mydf001.limit(5).show()

+-----+----+-------+----+---------------+-------------+-----+

|pcode| age| name|pcoe|_corrupt_record| city|state|
+-----+----+-------+----+---------------+-------------+-----+
|94304|null| Alice|null| null| Palo Alto| CA|
|94304| 30|Brayden|null| null| Palo Alto| CA|
|94104|null|Etienne|null| null|San Francisco| CA|
+-----+----+-------+----+---------------+-------------+-----+

本文转自健哥的数据花园博客园博客,原文链接:http://www.cnblogs.com/gaojian/p/7630003.html,如需转载请自行联系原作者

你可能感兴趣的文章
20165320 第三周学习总结
查看>>
Struts2和Spring MVC的区别
查看>>
angular-bootstrap ui-date组件问题总结
查看>>
理解Javascript参数中的arguments对象
查看>>
day1
查看>>
(salesforce相关)AngularJs实现表格的增删改查
查看>>
p2:千行代码入门python
查看>>
bzoj1106[POI2007]立方体大作战tet*
查看>>
解决:Java调用.net的webService产生“服务器未能识别 HTTP 标头 SOAPAction 的值”错误...
查看>>
spring boot configuration annotation processor not found in classpath问题解决
查看>>
【转】正则基础之——神奇的转义
查看>>
团队项目测试报告与用户反馈
查看>>
MyBatis(1)——快速入门
查看>>
对软件工程课程的期望
查看>>
CPU高问题排查
查看>>
Mysql中文字符串提取datetime
查看>>
CentOS访问Windows共享文件夹的方法
查看>>
IOS 与ANDROID框架及应用开发模式对比一
查看>>
由中序遍历和后序遍历求前序遍历
查看>>
JQUERY Uploadify 3.1 C#使用案例
查看>>