The personal details of millions of people around the world have been swept up in a database compiled by a Chinese tech company with reported links to the country’s military and intelligence networks, according to a trove of leaked data.
About 2.4 million people are included in the database, which was assembled mostly from public open-source data such as social media profiles, analysts said. It was compiled by Zhenhua Data, based in the south-eastern Chinese city of Shenzhen.
Internet 2.0, a cybersecurity consultancy based in Canberra whose customers include the US and Australian governments, said it had been able to recover the records of about 250,000 people from the leaked dataset, including about 52,000 Americans, 35,000 Australians and nearly 10,000 Britons. They include politicians, such as prime ministers Boris Johnson and Scott Morrison and their relatives, the royal family, celebrities and military figures.
When contacted by the Guardian for comment, a representative of Zhenhua said: “The report is seriously untrue.”
“Our data are all public data on the internet. We do not collect data. This is just a data integration. Our business model and partners are our trade secrets. There is no database of 2 million people,” said the representative, surnamed Sun, who identified herself as head of business.
“We are a private company,” she said, denying any links to the Chinese government or military. “Our customers are research organisations and business groups.”
The database was leaked to American academic Christopher Balding, who was previously based in Shenzhen but has returned to the US because of security concerns. He shared the data with Internet 2.0 for recovery and analysis. The findings were first published on Monday by a consortium of media outlets including the Australian Financial Review and the Daily Telegraph in the UK.
Balding described the breadth of the data as “staggering”. In a statement, Balding said the individual who provided the data had put themselves at risk but had “done an enormous service and is proof that many inside China are concerned about CCP [Chinese Communist party] authoritarianism and surveillance”.
Balding said the database was built from a variety of sources and was “technically complex using very advanced language, targeting, and classification tools”. He said the information targeted influential individuals and institutions across a variety of industries.
“From politics to organised crime or technology and academia just to name a few, the database flows from sectors the Chinese state and linked enterprises are known to target,” Balding said.
The database compiles information on everyone from prominent public figures to low-level staff within an institution, in a way Balding believes could be used to monitor targets and understand how to exert influence over them.
The database also reportedly includes profiles of 793 New Zealanders.
Sun of Zhenhua said that such a database, the Overseas Key Information Database (OKIDB), does exist but that it merely connects individuals to the social media they use. “OKIDB exists but it is not as magical as they say,” she said, referring to the foreign media reports. “It is research. There are many overseas platforms like this,” she said.
The CCP and China’s Ministry of State Security have long compiled country-by-country information about foreign economic and political elites, and foreigners who have lived in China for any period, said Anne-Marie Brady, a veteran China researcher and professor at the University of Canterbury in Christchurch, New Zealand.
“I’ve seen whole books outlining the careers and political views of US China experts,” Brady added. “But what is unusual about this discovery is the use of big data and outsourcing to a private company.”
Robert Potter, co-founder of the Canberra-based firm Internet 2.0, told the Guardian the database was “ambitious” in its scope. He said the compilation of public open-source material could be “hugely valuable” to an intelligence organisation.
Potter said the sources of the data included Twitter, Facebook, Crunchbase and LinkedIn.
“Open source doesn’t necessarily mean people want it to be public,” Potter said in an interview. “The reason Cambridge Analytica was scandalous wasn’t because they were accessing information on people’s private messages on Facebook. It was because they were misusing the permissions that were given by users to those platforms.”
Some analysts said it was not surprising that a private company was amassing detailed data sets on notable individuals in government, industry, finance and academia.
“The line between public and private surveillance in the digital age is blurry. Under authoritarian government it is non-existent,” said Dr Zac Rogers of Flinders University in South Australia.
Rogers, who is research leader at the Jeff Bleich Centre for the US Alliance in Digital Technology, Security and Governance, said the likely primary purpose of the data collection was “to provide grist for CCP information operations”.
Rogers said deeply personal and granular information about individuals was scattered freely across the internet.
“When agglomerated, this data opens up myriad opportunities to conduct targeted influence activities should the need arise … This can include dis and mis-information, inauthentic simulation (deep fakes), straight-up bribery, and general muddying of the information environment in which democracy operates.”
Samantha Hoffman, an analyst from the Australian Strategic Policy Institute’s Cyber Centre, said: “What is happening is that the PRC [People’s Republic of China] and PRC-based companies are engaging in global bulk data collection to assist the Chinese party state in various objectives whether it is military, propaganda or security.”
Hoffman said the insecurity of these databases was another point of concern. “There are many companies that are doing similar things. One thing that stands out is just how insecure many similar databases and this one were. That has its own implications in terms of privacy protection as well as how exploitable the data is.”
Hoffman said it was not clear what the data was used for. “A lot of data is being collected now and not all of it is usable, but later it could be. The mass collection of data will assist the objectives in the long term.”
She said: “What they’re doing isn’t so unique. It’s why they are doing it. Lots of Western tech companies collect a lot of data and that should be uncomfortable for a lot of people but at the end of the day there’s a difference between what they are doing and what Chinese companies who claim to be directly contributing to state security are doing.”
The ABC reported that Zhenhua had also closely profiled Gilmour Space Technologies, a Queensland-based firm involved in the space industry, with every board member included in the database.
Gilmour Space Technologies said it was aware of the reports. “It is not an ideal situation, of course, but it is not unusual in our industry,” a spokesperson told the Guardian.
Australia’s energy minister, Angus Taylor, said the reports would be concerning if true, but he argued the government was already boosting spending on cybersecurity to ensure “that we are secure against cyber intrusion”.
Labor’s home affairs spokesperson, Kristina Keneally, told the ABC the case highlighted “that the threat of foreign interference in the capacity to amass big datasets on a population is real – and we’ve got to take that threat very seriously”.
The office of New Zealand’s prime minister, Jacinda Ardern, did not respond to a request for comment.