Data Mining

The Lead

Arabidopsis thaliana, a model flowering plant studied by biologists, has climate-sensitive genes whose expression was found to evolve. Courtesy of Penn State

Needle in a Haystack: Finding the Right Genes in Tens of Thousands

January 28, 2015 2:45 pm | by TACC | News | Comments

Scientists using supercomputers found that genes sensitive to cold and drought in a plant help it survive climate change. The computational challenges were daunting, involving thousands of individual strains of the plant with hundreds of thousands of markers across the genome and testing for a dozen environmental variables. Their findings increase basic understanding of plant adaptation and can be applied to improve crops.

AI: Computers Know the Real You Better than Friends, Family

January 13, 2015 10:01 am | by University of Cambridge | News | Comments

Researchers have found that, based on enough Facebook Likes, computers can judge your...

Computer Equal To or Better Than Humans at Cataloging Science

December 2, 2014 2:53 pm | by David Tenenbaum, University of Wisconsin-Madison | News | Comments

In 1997, IBM’s Deep Blue computer beat chess wizard Garry Kasparov. This year, a computer...

Supercomputers Link Proteins to Adverse Drug Reactions

October 21, 2014 10:40 am | by Kenneth K Ma, Lawrence Livermore National Laboratory | News | Comments

The drug creation process often misses many side effects that kill at least 100,000 patients a...

NeuroSolutions Infinity

September 11, 2014 3:58 pm | Neurodimension, Inc. | Product Releases | Comments

NeuroSolutions Infinity predictive data analytics and modeling software is designed to streamline data mining by automatically taking care of the entire data modeling process. It includes everything from accessing, cleaning and arranging data, to intelligently trying potential inputs, preprocessing and neural network architectures, to selecting the best neural network and verifying the results.

Robo Brain Teaches Robots Everything from the Internet

August 28, 2014 11:52 am | by Cornell University | News | Comments

Robo Brain — a large-scale computational system that learns from publicly available Internet resources — is currently downloading and processing about 1 billion images, 120,000 YouTube videos, and 100 million how-to documents and appliance manuals. The information is being translated and stored in a robot-friendly format that robots will be able to draw on when they need it.

North Korea (the dark area) and South Korea at night. Courtesy of NASA

Citizen Science: Images of Earth at Night Crowdsourced for Science

August 19, 2014 2:59 pm | by NASA | News | Comments

A wealth of images of Earth at night taken by astronauts on the International Space Station (ISS) could help save energy, contribute to better human health and safety, and improve our understanding of atmospheric chemistry. But scientists need your help to make that happen.

HPC Innovation Excellence Award: University of Wisconsin-Madison

June 23, 2014 4:33 pm | Award Winners

University of Wisconsin researchers used HPC resources in combination with multiple advanced protein structure prediction algorithms and deep sequence data mining to construct a highly plausible capsid model for Rhinovirus-C (~600,000 atoms). The simulation model helps researchers explain why existing pharmaceuticals don’t work on this virus.

Data Mining Software Version Histories

June 23, 2014 6:30 am | by AlphaGalileo | News | Comments

Making changes within a complex software system is often error-prone – even the smallest mistake can endanger the entire system. Ten years ago, computer scientists in Saarbrücken working with Professor Andreas Zeller developed a technique that automatically issues suggestions on how to manage changes...

Projects that allow the general public to collaborate with scientists are becoming useful sources of knowledge on a large scale.

New Data Collection, Analysis and Sharing Tools Help Protect Threatened Species

May 29, 2014 9:41 pm | by Science Newsline | News | Comments

Athens, Ga. – New tools to collect and share information could help stem the loss of the world's threatened species, according to a paper published today in the journal Science. The study—by an international team of scientists that included John L. Gittleman, dean of the University of Georgia Odum...

Huynh Phung Huynh, Scientist and Capability Group Manager, A*STAR Institute of High Performance Computing

Huynh Phung Huynh

April 16, 2014 8:45 am | Biographies

Huynh Phung Huynh's research interests include high performance computing (HPC), particularly compiler optimization for GPUs, many-core processors and other accelerators; parallel computing, particularly frameworks for parallel programming or scheduling; and HPC for data mining and machine learning algorithms.

Data Mining Disaster

March 28, 2014 4:33 pm | News | Comments

Computer technology that can mine data from social media during natural or other disasters could provide invaluable insights for rescue workers and decision makers. Advances in information technology have had a profound impact on disaster management.


Mathematics for Safer Medicine: Calculating Uncertainties within Technical Systems

January 7, 2014 6:20 am | by Heidelberg Institute for Theoretical Studies | News | Comments

The new HITS research group “Data Mining and Uncertainty Quantification” analyzes large amounts of data and calculates uncertainties in technical systems. Led by Prof. Vincent Heuveline, the group of mathematicians and computer scientists focuses especially on increasing the safety of technology in operating rooms.

Text Mining: The Next Data Frontier

January 6, 2014 2:04 pm | by Mark A. Anawis | Blogs | Comments

Josiah Stamp said: “The individual source of the statistics may easily be the weakest link.” Nowhere is this more true than in the new field of text mining, given the wide variety of textual information. By some estimates, 80 percent of the information available occurs as free-form text which, prior to the development of text mining, needed to be read in its entirety in order for information to be obtained from it.

'Approximate Computing' Improves Efficiency, Saves Energy

December 18, 2013 4:03 pm | by Emil Venere, Purdue University | News | Comments

Researchers are developing computers capable of "approximate computing" to perform calculations good enough for certain tasks that don't require perfect accuracy, potentially doubling efficiency and reducing energy consumption.

Meet HPC Innovator Taghrid Samak

December 3, 2013 4:03 pm | by Jon Bashor, Berkeley Lab Computational Research Division | Articles | Comments

Everything leading up to the actual coding, figuring out how to make it work, is what Samak enjoys most. One of the problems she is working on with the Department of Energy’s Joint Genome Institute (JGI) is a data mining method to automatically identify errors in genome assembly, replacing the current approach of manually inspecting the assembly.

Harnessing Collective Wisdom from Social Networks

November 7, 2013 12:49 pm | by National Science Foundation | News | Comments

In his 1937 book, "Think and Grow Rich," author Napoleon Hill identified 13 steps to success, one of which was the power of the mastermind. "No two minds ever come together without thereby creating a third, invisible, intangible force, which may be likened to a third mind," Hill wrote.


Hardware for Big Data, Graphs and Large-scale Computation

September 9, 2013 9:58 am | by Rob Farber | Articles | Comments

Recent announcements by Intel and NVIDIA indicate that massively parallel computing with GPUs and Intel Xeon Phi will no longer require passing data via the PCIe bus. The bad news is that these standalone devices are still in the design phase and are not yet available for purchase.

IBM Narrows Big Data Skills Gap, Partnering with More than 1,000 Global Universities

August 15, 2013 10:49 am | by IBM | News | Comments

IBM announced on August 24, 2013, that it has added nine new academic collaborations to its more than 1,000 partnerships with universities across the globe, focusing on Big Data and analytics, all of which are designed to prepare students for the 4.4 million jobs that will be created worldwide to support Big Data by 2015. The company also announced more than $100,000 in awards for Big Data curricula.

HPC Architectures Begin Long-Term Shift Away from Compute Centrism

August 15, 2013 8:43 am | by Steve Conway, IDC | Articles | Comments

The HPC market is entering a kind of perfect storm. For years, HPC architectures have tilted farther and farther away from optimal balance between processor speed, memory access and I/O speed. As successive generations of HPC systems have upped peak processor performance without corresponding advances in per-core memory capacity and speed, the systems have become increasingly compute centric

StatSoft Receives Top Ratings in KDnuggets Poll

June 11, 2013 2:36 pm | by StatSoft | News | Comments

The 14th annual KDnuggets Software Poll, conducted in May 2013, attracted record participation of 1,880 internet voters, more than doubling the previous year's numbers. KDnuggets is a data mining portal and newsletter publisher for the data mining community with more than 12,000 subscribers.

New Algorithm Cluster Improves Health Record Data Mining

May 14, 2013 9:18 pm | by New Jersey Institute of Technology | News | Comments

The time may be fast approaching for researchers to take better advantage of the vast amount of valuable patient information available from U.S. electronic health records. Lian Duan, an NJIT computer scientist with expertise in data mining, has done just that with the recent publication of "Adverse Drug Effect Detection," IEEE Journal of Biomedical and Health Informatics (March 2013).

Pathway Studio for Web

April 5, 2013 10:38 am | Elsevier, Inc. | Product Releases | Comments

Pathway Studio, a research solution for biologists, is now available in a Web-based version. The integrated data mining and visualization software features comprehensive knowledge bases produced by applying MedScan, Elsevier’s proprietary text-mining technology, to a large corpus of biological literature.

NSF-Funded Superhero Supercomputer Helps Battle Autism

March 26, 2013 7:45 pm | News | Comments

When it officially came online at the San Diego Supercomputer Center (SDSC) in early January 2012, Gordon was instantly impressive. In one demonstration, it sustained more than 35 million input/output operations per second, a world record at the time.

i3D Enterprise Service

March 22, 2013 3:06 pm | Shimadzu Scientific Instruments | Product Releases | Comments

i3D Enterprise Service integrates storage, processing and data mining in an enterprise-level private cloud. Laboratory data can be automatically and securely uploaded from instruments to a private cloud and processed on the cloud, enabling workflow execution and data mining in a fraction of the time.

SampleManager 11

March 22, 2013 2:51 pm | Thermo Fisher Scientific | Product Releases | Comments

SampleManager 11 laboratory information management system (LIMS) features advanced tools that are designed to improve laboratory process mapping, management and automation. Users can build workflows to reflect their individual laboratory processes and take ownership of workflow management.

Big Data, Big Science, Big Collaboration: Delivering Connected R&D for Better Value

March 15, 2013 3:26 pm | by Yike Guo, Imperial College | Articles | Comments

Today, we are more connected than ever. We live in an always-on world whose digital economy has made data a new form of resource that fundamentally changes our lives. But has this revolution really occurred across R&D domains? At a time when global R&D investment is over $1.5 trillion, leading voices still bemoan a lack of open access to decision-making data and an innovation deficit syndrome.

Breakthrough Prize in Life Sciences Announced

February 21, 2013 5:38 am | News | Comments

Art Levinson, Sergey Brin, Anne Wojcicki, Mark Zuckerberg, Priscilla Chan and Yuri Milner announced the launch of the Breakthrough Prize in Life Sciences, recognizing excellence in research aimed at curing intractable diseases and extending human life.

SDSC Invites Researchers to Apply for Access to Gordon Supercomputer

September 28, 2012 11:13 am | News | Comments

The San Diego Supercomputer Center (SDSC) at the University of California, San Diego, is seeking innovative applications for the next round of user allocations on its data-intensive Gordon supercomputer, which went into operation earlier this year.
