35 million Google Profiles downloaded to a local database = 1 month

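Posted 2011-5-25 22:58 by dalaohu
This post was written or reposted by dalaohu and does not represent the position or views of this site. Copyright belongs to oursteps.com.au and the author dalaohu. Reposts must credit the author and the source, include this notice, and keep the content intact.
Network security problems have been in the spotlight lately: the Sony Ericsson online store has now been hacked, following the PlayStation Network breach. Quite a few articles are questioning the security of the cloud.
On top of that, this fellow, entirely legally, spent one month downloading Google's user profiles into a local database. This uninterrupted stream of requests from a single IP address was not limited in any way by Google's security systems and triggered no automatic protection.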

1 Database Containing 35.000.000 Google Profiles. Implications?


This is a follow-up to my previous blogpost on this topic.

In February 2011 it proved trivial to create a database containing ALL ~35.000.000 Google Profiles without Google throttling, blocking, CAPTCHAing or otherwise hindering mass-downloading attempts. It took only 1 month to retrieve the data, convert it to SQL using spidermonkey and some custom JavaScript code, and import it into a database. The database contains Twitter conversations (also stored in the OZ_initData variable), person names, aliases/nicknames, multiple past educations (institute, study, start/end date), multiple past work experiences (employer, function, start/end date), links to Picasa photo albums, ... and, in ~15.000.000 cases, also the username and therefore the @gmail.com address. In summary: 1 month + 1 connection = 1 database containing 35.000.000 Google Profiles.
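To put "1 month + 1 connection" into perspective, here is a rough back-of-the-envelope sketch of the average request rate that figure implies. It assumes exactly one request per profile, spread evenly over the whole period; neither assumption is stated in the post, and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_request_rate(total_requests: int, days: float) -> float:
    """Average requests per second needed to finish within the given number of days."""
    return total_requests / (days * SECONDS_PER_DAY)


# ~35 million profiles (figure from the post), fetched over a 30-day month.
rate_30_days = average_request_rate(35_000_000, 30)

# The same total over the 28 days of February 2011.
rate_28_days = average_request_rate(35_000_000, 28)

print(f"30 days: {rate_30_days:.1f} requests/second")
print(f"28 days: {rate_28_days:.1f} requests/second")
```

Under these assumptions the sustained rate works out to somewhere between 13 and 15 requests per second from a single address, every second, for a month.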

My activities are directed at inciting, or poking up, debate about privacy -- NOT to create DISTRUST but to achieve REALISTIC trust -- and the meaning of "informed consent". Which, when signing up for online services like Google Profile, amounts to checking a box. How can a user possibly be considered to be "informed" when they're not made aware 1) about the fact that it does not seem to bother Google that profiles can be mass-downloaded (Dutch) and 2) about misuse value -or hopefully the lack of it- of their social data to criminals and certain types of marketeers? Does this enable mass spear phishing attacks and other types of social engineering, or is that risk negligible, e.g. because criminals use other methods of attack and/or have other, better sources of personal data? Absence of ANY protection against mass-downloading is the status quo at Google Profile. Strictly speaking I did not even violate Google policy in retrieving the profiles, because http://www.google.com/robots.txt explicitly ALLOWS indexing of Google Profiles and my code is part of a personal experimental search engine project. I.e. at the time of this writing, that robots.txt file contains:

Allow: /profiles
Allow: /s2/profiles
Allow: /s2/photos
Allow: /s2/static
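As a minimal sketch of how a crawler would read those rules, the snippet below feeds the four Allow lines to Python's standard urllib.robotparser. The "User-agent: *" line and the trailing "Disallow: /" are assumptions added only so the example has something to contrast against; the post does not quote the rest of Google's robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The four Allow lines are quoted from the post. The User-agent line and the
# final Disallow line are ASSUMPTIONS added for this illustration only.
RULES = """User-agent: *
Allow: /profiles
Allow: /s2/profiles
Allow: /s2/photos
Allow: /s2/static
Disallow: /
"""


def build_parser(rules_text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


parser = build_parser(RULES)

# A profile URL falls under "Allow: /profiles".
profile_allowed = parser.can_fetch("*", "http://www.google.com/profiles/mrkoot")

# Any other path only hits the hypothetical catch-all Disallow rule.
other_allowed = parser.can_fetch("*", "http://www.google.com/some/other/path")

print(profile_allowed, other_allowed)
```

Python's parser applies the first matching rule in file order, which is why the Allow lines win over the catch-all here. Other crawlers may resolve conflicts differently.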

I'm curious about whether there are any implications to the fact that it is completely trivial for a single individual to do this -- possibly there aren't. That's something worth knowing too. I'm curious whether Google will apply some measures to protect against mass downloading of profile data, or whether this is a non-issue for them too. In my opinion the misuse value of personal data on social networks ought to be elicited before publishing it under a false perception of informed consent. One possible outcome

My activities were performed as part of my research on anonymity/privacy at the University of Amsterdam. I'm writing a research paper about the above. Repeating from my previous post: this blog runs at Google Blogger. I sincerely hope my account "mrkoot" and blog.cyberwar.nl will not be blocked or banned - I did NOT publish the database and did NOT violate any Google policy.

Contact me by e-mail(*): kootNO_SPAM_PLEASE@uva.nl  (remove "NO_SPAM_PLEASE")
Contact me on Twitter: http://twitter.com/mrkoot.

(*)I prefer insults to be sent to mrkoot@gmail.com, as gmail has superior filters.


Posted 2011-5-25 23:00 by dalaohu

Google Profiles Exposes Millions of Usernames, Gmails

This is another of his blog posts, which goes into some of the details.


UPDATE 2011-05-23 #1: I'm currently writing a scientific paper about the topic discussed below. The activities are performed as part of my research on anonymity/privacy in the System & Network Engineering research group at the University of Amsterdam. A tweet on May 20th 2011 by https://twitter.com/#!/tomokas as described here urged me to post a bit prematurely. Google has been informed.

UPDATE 2011-05-23 #2: here is code that can convert most of the data in your Google Profile into a single SQL statement: http://cyberwar.nl/GProfile2SQL.js . When accessing a profile in a browser, the profile data (names, profession, education, ...) is stored in a single multidimensional JavaScript array named OZ_initData[][][...]. Install spidermonkey for its C-based JavaScript engine js, download your own profile and save it as e.g. mrkoot.html. Then execute something like
sed -n '/var OZ_initData = /,/^;window/{ s/.*var OZ_initData = /var OZ_initData = /g; s/^;window.*//g; p; }'  mrkoot.html | tee tmpjs | js -f tmpjs -e 'print(OZ_initData[5]);' | js -f tmpjs -f GProfile2SQL.js
to get an INSERT statement. Optimizations are left as an exercise to the reader; you can figure out the table structure from the JavaScript code and extend everything as you wish.
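For readers who find the sed range expression hard to follow, here is a rough Python sketch of just the extraction step: it keeps everything from "var OZ_initData = " up to, but not including, the first later line that starts with ";window". The page layout is inferred from the sed patterns alone, the sample HTML is invented, and the function name is hypothetical. It does not evaluate the JavaScript or produce SQL.

```python
START_MARKER = "var OZ_initData = "
END_MARKER = ";window"


def extract_oz_init_data(html: str) -> str:
    """Return the JavaScript assignment of OZ_initData found in a saved profile page.

    Mirrors the sed range: start at the line containing START_MARKER (dropping
    anything before it) and stop at the first later line beginning with END_MARKER.
    Returns an empty string if the start marker never appears.
    """
    collected = []
    inside = False
    for line in html.splitlines():
        if not inside:
            start = line.find(START_MARKER)
            if start == -1:
                continue
            inside = True
            line = line[start:]  # strip markup preceding the assignment
        elif line.startswith(END_MARKER):
            break  # end of range; sed blanks this line, we simply drop it
        collected.append(line)
    return "\n".join(collected)


# Invented sample page, shaped only to match what the sed patterns expect.
SAMPLE_HTML = (
    '<html><head><script>var OZ_initData = {"5":["Jane Doe"]}\n'
    ";window.example();</script></head></html>"
)

print(extract_oz_init_data(SAMPLE_HTML))
```

The extracted text would still need a JavaScript engine (spidermonkey's js in the original pipeline) to be evaluated before GProfile2SQL.js can turn it into an INSERT statement.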

====== START OF ORIGINAL BLOGPOST FROM 2011-05-24 ======
The existence of Google's profiles-sitemap.xml has been known outside Google since at least 2008. The XML file, last updated March 16th 2011, points to 7000+ sitemap-NNN(N).txt files that each contain 5000 hyperlinks to Google profiles; 35M links in total. Snippet from sitemap-000.txt:

https://profiles.google.com/117135902571938793602
https://profiles.google.com/112006952710949332145
https://profiles.google.com/105382462492606983441
https://profiles.google.com/109299750146769054739
https://profiles.google.com/104555562341640123846
https://profiles.google.com/112956845518767535694
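The sitemap arithmetic and the shape of those links can be checked with a short sketch. It works only on the six URLs quoted above and on the file counts given in the post; the function names are invented for illustration and nothing is fetched.

```python
from urllib.parse import urlparse

# The six links quoted from sitemap-000.txt in the post.
SNIPPET = """https://profiles.google.com/117135902571938793602
https://profiles.google.com/112006952710949332145
https://profiles.google.com/105382462492606983441
https://profiles.google.com/109299750146769054739
https://profiles.google.com/104555562341640123846
https://profiles.google.com/112956845518767535694
"""


def profile_ids(sitemap_text: str) -> list:
    """Extract the numeric profile identifiers from a sitemap text file."""
    ids = []
    for line in sitemap_text.splitlines():
        line = line.strip()
        if not line:
            continue
        candidate = urlparse(line).path.strip("/")
        if candidate.isdigit():
            ids.append(candidate)
    return ids


def estimated_total_links(sitemap_files: int, links_per_file: int) -> int:
    """Total number of profile links across all sitemap files."""
    return sitemap_files * links_per_file


ids = profile_ids(SNIPPET)
total = estimated_total_links(7000, 5000)

print(len(ids), "identifiers in the snippet")
print(total, "links in total")
```

With 7000 files of 5000 links each, the total comes to the 35M links the post mentions. The post says "7000+", so this is a lower bound.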

Google Profile allows users to choose whether they want to use their username in the Google Profile URL to make it easier to find and remember:



The text explicitly warns the user about possible exposure (bold emphasis added):
"To make it easier for people to find your profile, you can customize your URL with your Google email username. (Note this can make your Google email address publicly discoverable.)"
Selecting the second option gives a URL like https://profiles.google.com/USERNAME. Accessing profiles using the identifiers found in the sitemaps indeed reveals the Google username, and therefore the @gmail.com address. For example, for me, with the username "mrkoot":

irbaboon:be monkey$ curl -i -X HEAD http://www.google.com/profiles/115572197788225218471
HTTP/1.1 301 Moved Permanently
Location: /profiles/mrkoot
Content-Type: text/html; charset=UTF-8
Date: Mon, 23 May 2011 14:00:31 GMT
Expires: Mon, 23 May 2011 14:00:31 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Server: GSE
Transfer-Encoding: chunked


Note that the HTTP 301 redirect discloses the username before any HTML is requested. During February 2011 I checked ALL 35 MILLION LINKS (my connection did NOT get blocked after any number of connections) and found that ~40% of the Google Profiles expose their owner's username, and hence @gmail.com address, in this way. That totals ~15 MILLION exposed usernames / @gmail.com addresses(*). With no apparent download restriction in place for connections to https://profiles.google.com, and with Google users disclosing their profession, employer, education, location, and links to their Twitter accounts, Picasa photo albums, LinkedIn accounts et cetera, this seems like a large-scale spear phishing attack waiting to happen?(**) But hey, the users HAVE been warned.
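The redirect check described above can be sketched offline by parsing a saved response, such as the curl output shown earlier, rather than contacting any server. This is an illustration only: the function name is hypothetical, and treating an all-digit final path segment as "no username exposed" is an assumption, since the post does not show what the response looks like for profiles without a custom URL.

```python
from typing import Optional

# Abbreviated copy of the response headers shown in the post.
SAMPLE_RESPONSE = """HTTP/1.1 301 Moved Permanently
Location: /profiles/mrkoot
Content-Type: text/html; charset=UTF-8
Server: GSE
"""


def username_from_redirect(raw_response: str) -> Optional[str]:
    """Return the username revealed by a 301 redirect, or None if there is none."""
    lines = raw_response.strip().splitlines()
    if not lines:
        return None

    # Status line looks like "HTTP/1.1 301 Moved Permanently".
    status_parts = lines[0].split()
    if len(status_parts) < 2 or status_parts[1] != "301":
        return None

    for line in lines[1:]:
        name, separator, value = line.partition(":")
        if separator and name.strip().lower() == "location":
            segment = value.strip().rstrip("/").rsplit("/", 1)[-1]
            # ASSUMPTION: an all-digit segment is a numeric profile ID, not a username.
            if not segment or segment.isdigit():
                return None
            return segment
    return None


print(username_from_redirect(SAMPLE_RESPONSE))
```

Because only the headers are inspected, no profile HTML needs to be downloaded to learn the username, which is the point the post makes about the redirect.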

This blog runs at Google Blogger. I sincerely hope my account "mrkoot" and blog.cyberwar.nl will not be blocked or banned - I do NOT publish any usernames or other profile data and did not violate any policy I am aware of.

Posted 2011-5-25 23:01 by dalaohu

Original article

Posted 2011-5-26 09:06 by greed
These days everyone is used to hyping concepts. An application needs the test of time before it matures.

Posted 2011-5-26 14:06 by 花蕾般的钟声
I really didn't understand this.
