Research Areas
Cybersecurity, Data Mining, Machine Learning, and Health Intelligence.
Through long-term, strong collaboration with industry partners, I have advanced AI-driven innovations for cybersecurity, especially in malware detection and adversarial machine learning. The proposed algorithms and developed systems have been incorporated into popular commercial cybersecurity products, including Comodo Internet Security and Kingsoft Antivirus, which protect millions of users worldwide against evolving cyberattacks. In addition, I have been awarded three patents in the area of malware detection and categorization. After joining academia, in collaboration with my colleagues in health sciences, I have further expanded my research on health intelligence, particularly harnessing large-scale, multi-source, multi-modality data and developing advanced AI technologies to combat the opioid crisis and infectious disease outbreaks. Together with my team, I have published over 120 papers in my fields (e.g., ACM CSUR, IEEE TNNLS, IEEE TKDE, IEEE TSMC, SIGKDD, ICDM, CIKM, WWW, NeurIPS, ICLR, AAAI, IJCAI, USENIX Security, ACSAC), with honors including the AAAI-DCAA 2023 Best Paper Runner-Up Award, the SIGKDD 2022 Best Paper Award Shortlist (Research Track), the ACM CIKM 2021 Best Paper Award (Full Paper Track), the ACM CIKM 2021 Best Paper Runner-Up Award (Applied Paper Track), the WWW 2021 Best Paper Award Shortlist, the AICS 2019 Challenge Problem Winner, and the SIGKDD 2017 Best Paper Award and SIGKDD 2017 Best Student Paper Award (Applied Data Science Track). Our work has been supported by research funding from multiple federal agencies.
Current Research Projects
- Intelligent Malware Detection and Adversarial Machine Learning
- Securing Cyberspace: Gaining Deep Insights into the Online Underground Ecosystem
- Mining Large-scale and Dynamic Heterogeneous Networks to Combat Opioid Crisis and Reduce Opioid Overdose Risks
- AI-driven Techniques to Combat COVID-19 Pandemic and Future Natural or Health-related Disasters
Malware (short for malicious software) is a generic term for all kinds of unwanted software (e.g., viruses, trojans, worms, bots, ransomware, and cryptojacking). Cybercriminals use it as a major weapon to launch a wide range of attacks that cause serious damage and significant financial losses to users in cyberspace. To protect legitimate users against these attacks, the most significant line of defense is anti-malware software, which has traditionally used signature-based methods to recognize threats. However, driven by considerable economic benefits, malware attackers use automated malware development toolkits to quickly write and modify malicious code that can evade detection by anti-malware products. To remain effective, the anti-malware industry needs much more powerful methods that can protect users against new threats and are more difficult to evade. To combat evolving malware attacks, systems applying AI and machine learning techniques have been successfully deployed and offer unparalleled flexibility in automatic malware detection. In these systems, various kinds of classifiers are built on different feature representations to detect malware. Unfortunately, as classifiers become more widely deployed, the incentive for defeating them increases. Through long-term, strong collaboration with our industrial partners, this project will design and develop intelligent and resilient solutions against malware attacks at both the feature and model levels. Furthermore, the proposed techniques will be designed to be arms-race capable and applicable to other cybersecurity domains, such as anti-spam and fraud detection.
The importance of cybersecurity can hardly be overstated, especially during the global pandemic we are facing. As many social activities have moved online, society's overwhelming reliance on the complex cyberspace makes its security more important than ever. Unfortunately, exploiting both fear and financial incentives, cyber threat actors across the spectrum of sophistication are using COVID-19 or coronavirus as a lure to spread malware and profit from the pandemic. To better protect users in cyberspace, we continue our efforts to develop innovative links between AI and security, designing and developing an intelligent framework for COVID-19 themed malware detection to help mitigate its negative effects on public health, society, and the economy.
Sponsors:
NSF SaTC-RAPID/TWC/SaTC/CICI Awards (PI)
Press Coverage:
Prof. Ye received NSF RAPID Award to develop AI-driven innovations for COVID-19 themed malware detection (07/01/2020)
WVU researcher named Challenge Problem Winner in AI for Cyber Security workshop (12/05/2018)
WVU researcher awarded grant to develop techniques to enhance machine learning security (09/25/2018)
WVU researchers awarded grant to develop techniques to enhance cyberinfrastructure security (09/04/2018)
WVU researcher awarded grant to develop new techniques to prevent against cyber attacks (09/06/2016)
Cybercrime has become increasingly dependent on the online underground ecosystem, which has evolved into a complex and increasingly decentralized system with an incentive to prevent infiltration. This forces cybersecurity researchers and industry practitioners to reconsider fire-fighting behavior. Building on our prior work and strong collaboration with industry partners, we aim to design and develop an integrated framework (algorithms, scalable techniques) for in-depth investigation of the online underground ecosystem, and thus to help secure cyberspace by producing data-driven interventions against cybercrime. We have developed our own web crawlers to collect data from underground markets emerging in various online media (e.g., underground forums, the dark web). We have crawled data from different underground forums (e.g., Blackhat, Hack Forums, Nulled), including 508,876 threads with 8,232,550 posts corresponding to 725,449 users; we have also manually annotated 62,512 threads posted by 5,312 users in Hack Forums as the ground truth for automatic detection of cybercrime-suspected threads. We are continuing our efforts to collect and analyze data related to crimeware products and crimeware-as-a-service (CaaS) to understand the dynamics of illicit activities on active underground markets in the darknet.
Sponsors:
NSF CAREER Award (PI)
Press Coverage:
WVU researcher awarded NSF CAREER to develop new techniques to secure cyberspace (03/25/2019)
As opioid overdose deaths have continued to increase across the country over the past decade, it is critical to understand the drugs involved in those deaths and the potential role of polypharmacy (i.e., the concurrent use of multiple medications). However, due to the formidable complexity of drug-drug interactions (DDIs) arising from polypharmacy, it is challenging, if not impossible, to enumerate them all manually. Therefore, there is an urgent need for novel computational methodologies and models for early detection of risky DDI patterns when opioids are combined with other drugs (e.g., sedatives, muscle relaxants, anti-anxiety medications). Since relying on a single data source for biomedical knowledge discovery often results in unsatisfactory performance, the goal of this project is to design and develop a novel, integrated framework (algorithms, models, and techniques) that constructs a heterogeneous network from multiple data sources and extracts useful information from that network to reduce the risk of opioid overdoses resulting from polypharmacy. In addition, based on the large-scale data generated from social media and the darknet, we aim to advance the capabilities of artificial intelligence (AI) to detect, disrupt, and dismantle online trafficking networks and thus help combat the opioid epidemic. Only through the collective and persistent effort of society can we build a drug-free world, one community at a time. This work is supported by the National Science Foundation (NSF) and the National Institute of Justice (NIJ).
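As a rough illustration of the heterogeneous-network idea described above, the sketch below merges two toy data sources (prescription records and a drug-interaction table) into one typed graph and flags patients whose co-prescribed drugs share an interaction edge. The class `HeteroGraph`, the relation names, and all patient and drug names are hypothetical placeholders invented for this example; it does not represent the project's actual framework or data.

```python
from collections import defaultdict


class HeteroGraph:
    """Tiny typed graph: nodes are (type, name) tuples, edges carry a relation label."""

    def __init__(self):
        self.adj = defaultdict(set)

    def add_edge(self, src, relation, dst):
        # Relations are treated as undirected in this sketch.
        self.adj[src].add((relation, dst))
        self.adj[dst].add((relation, src))

    def neighbors(self, node, relation):
        return {dst for rel, dst in self.adj[node] if rel == relation}


def risky_coprescriptions(graph, patients):
    """Return (patient, drug_a, drug_b) for co-prescribed drugs linked by an interaction edge."""
    hits = []
    for patient in patients:
        drugs = sorted(graph.neighbors(patient, "prescribed"))
        for i, a in enumerate(drugs):
            for b in drugs[i + 1:]:
                if b in graph.neighbors(a, "interacts_with"):
                    hits.append((patient[1], a[1], b[1]))
    return hits


graph = HeteroGraph()

# Hypothetical source 1: prescription records (patient -> drug).
for patient, drug in [("p1", "opioid_A"), ("p1", "sedative_B"),
                      ("p2", "opioid_A"), ("p2", "drug_C")]:
    graph.add_edge(("patient", patient), "prescribed", ("drug", drug))

# Hypothetical source 2: a drug-drug interaction table.
graph.add_edge(("drug", "opioid_A"), "interacts_with", ("drug", "sedative_B"))

patients = [("patient", "p1"), ("patient", "p2")]
flagged = risky_coprescriptions(graph, patients)
print(flagged)
```

Keeping node types and relation labels explicit is what lets a single structure hold several sources at once; a realistic system would add more node types and learn risk patterns rather than match single edges.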
Sponsors:
DoJ/NIJ Award (PI)
NSF IIS/D-ISN Award (PI)
Press Coverage:
Combating online opioid trafficking with advanced AI techniques (09/16/2022)
CDS Prof. Ye awarded $500K grant to develop AI technologies to reduce opioid overdose risks (11/05/2019)
It is believed that the novel virus that causes COVID-19 emerged from an animal source, but it has been spreading rapidly from person to person through various forms of contact. According to the Centers for Disease Control and Prevention (CDC), the coronavirus seems to be spreading easily and sustainably in the community (i.e., community transmission, which means that people have been infected with the virus in an area, including some who are not sure how or where they became infected). With the fast evolution of the novel virus, and before a vaccine or drug becomes widely available, community mitigation, a set of actions that persons and communities can take to help slow the spread of respiratory virus infections, is the most readily available intervention to help slow transmission of the virus in communities. Even while practicing community mitigation, people still need groceries, medical supplies, etc., which requires travel and visits to local establishments. In doing so, we all have the opportunity to choose where we go and which establishments we visit to meet our daily needs. To assist with making informed decisions, early in the pandemic (March 2020) we proposed and developed a system (named alpha-Satellite) to provide users with up-to-date community-level risk assessment in the United States. The system advances the capabilities of artificial intelligence (AI) to estimate risk indices associated with a given area by leveraging large-scale, real-time data obtained from multiple sources, including disease-related data from official public health organizations and digital media, demographic and mobility data, and social media data. The system automatically analyzes and combines the available data to provide users with actionable information, by local area, for assessing the potential risk of traveling to a specific area. Since we launched the system for public testing, it had reached 500,321 users as of June 29, 2022.
The large number of users indicates the high public demand for effective computational tools that assist people with actionable strategies. The system has received a lot of positive feedback from the media (e.g., WKYC, Ideastream, NPR, Fox8, WTAM, GovTech) and from users on its ease of use as well as the utility of its dynamic risk estimations. The developed system, papers, and generated benchmark datasets have been made publicly accessible through our website. We are continuing our efforts to expand the data collection and enhance the system to help combat the fast-evolving COVID-19 pandemic and future natural or health-related disasters.
The COVID-19 pandemic has also exposed a critical set of vulnerabilities that have impacted community resilience in responding to escalating societal, economic, and behavioral issues. Unfortunately, there are no established solutions or proven models to depend on in tackling these complex challenges, which involve significant uncertainties and unknowns. To help address the devastating effects of COVID-19, we will advance AI innovations and expand our efforts to develop an AI-driven paradigm for collective and collaborative community resilience in response to a variety of crises and exposed vulnerabilities in the COVID-19 era and beyond.
Sponsors:
NSF IIS/IIS-RAPID Awards (PI)
Google Maps Platform
Press Coverage:
Data and Artificial Intelligence Technologies Take On Infectious Disease Outbreaks (August 2021)
Real-Time Risk Assessment Tool Could Aid Reopening Measures (MetroLab Innovation of the Month, May 2020)
Researchers at Case Western Reserve University testing map that assess COVID-19 risk in real time (2020)
CWRU Researchers Create Real-Time Tool To Map COVID-19 Risk (2020)
CWRU researchers create map used to evaluate risk of COVID-19 transmission (2020)
CWRU has a coronavirus hot spot map and mobile app (2020)
CDS Prof. Ye and ECSE Prof. Loparo received new NSF Award to develop AI-driven Paradigm for Collective and Collaborative Community Resilience in the COVID-19 Era and Beyond (2020)
CDS Prof. Ye and ECSE Prof. Loparo received NSF RAPID Award to develop AI technologies for real-time COVID-19 risk assessment (2020)
Former Research Projects
- Phishing Fraud Detection
- Smart Devices for Children's Safety
Phishing is a form of online fraud in which perpetrators use social engineering schemes, such as emails, instant messages, or online advertising, to lure users to phishing websites that impersonate trustworthy websites, tricking individuals into revealing sensitive information (e.g., financial accounts, passwords, and personal identification numbers) that can then be used for profit. To defend against phishing websites, security software products generally use blacklisting to filter out known phishing websites. However, there is always a delay between website reporting and blacklist updating. Indeed, as the lifetimes of phishing websites shrink from days to hours, this method can be ineffective. In our study, based on webpage content and its related information, we propose a principled cluster ensemble framework that integrates different clustering solutions for phishing fraud detection.
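To make the notion of integrating different clustering solutions concrete, here is a minimal sketch of one common cluster-ensemble scheme, co-association (evidence accumulation) consensus. It is an illustrative assumption, not necessarily the framework proposed in the study: the function names, the 0.5 threshold, and the three toy base clusterings of five webpages are all hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """For each pair of items, the fraction of base clusterings that group them together."""
    n = len(labelings[0])
    return {
        (i, j): sum(labels[i] == labels[j] for labels in labelings) / len(labelings)
        for i, j in combinations(range(n), 2)
    }


def consensus_clusters(labelings, threshold=0.5):
    """Merge items whose co-association exceeds the threshold, using union-find."""
    n = len(labelings[0])
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), score in co_association(labelings).items():
        if score > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for item in range(n):
        groups.setdefault(find(item), []).append(item)
    return sorted(groups.values())


# Three hypothetical base clusterings of five webpages (cluster ids are arbitrary).
base = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
clusters = consensus_clusters(base)
print(clusters)
```

Because the consensus depends only on whether two pages share a cluster, the arbitrary cluster ids used by each base clustering do not need to be aligned beforehand.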
In recent years, crimes against children and cases of missing children have increased rapidly. Therefore, there is an urgent need for safety support systems that help prevent crimes against children and keep them from going missing, especially when parents are not with their children, such as when children are on their way to and from school. In this project, in collaboration with our industrial partner, we use the location histories reported by the smart devices that children wear to explore their life patterns, which capture their general lifestyles and regularities, and apply big data analytics to learn the children's safe regions and safe routes. When children are in potential danger, their parents or guardians receive automatic notifications. We also explore an energy-efficient positioning scheme for the smart devices that maintains location-tracking accuracy while keeping energy overhead low.
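The safe-region idea above can be illustrated with a deliberately simple sketch: divide the map into grid cells, treat a cell as safe if the child has been there on enough distinct days, and raise an alert for locations outside those cells. The grid size, the `min_days` threshold, the function names, and the coordinates are all hypothetical choices for this example; the project's actual analytics are not described in enough detail here to reproduce.

```python
import math
from collections import defaultdict


def cell_of(lat, lon, cell_deg=0.001):
    """Map a coordinate to a grid cell (0.001 degrees is an assumed cell size)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def learn_safe_cells(history, min_days=3):
    """history: iterable of (day, lat, lon). A cell is safe if visited on >= min_days distinct days."""
    days_seen = defaultdict(set)
    for day, lat, lon in history:
        days_seen[cell_of(lat, lon)].add(day)
    return {cell for cell, days in days_seen.items() if len(days) >= min_days}


def needs_alert(safe_cells, lat, lon):
    """True when the reported location falls outside every learned safe cell."""
    return cell_of(lat, lon) not in safe_cells


# Hypothetical location history: school and home on three days, one unusual place once.
history = [
    (1, 41.5012, -81.6075), (2, 41.50125, -81.6076), (3, 41.5013, -81.6074),  # school
    (1, 41.5103, -81.6021), (2, 41.5103, -81.6021), (3, 41.5103, -81.6021),   # home
    (2, 41.5555, -81.7004),                                                   # one-off visit
]
safe = learn_safe_cells(history)
print(needs_alert(safe, 41.5012, -81.6075))   # location inside a learned cell
print(needs_alert(safe, 41.5555, -81.7004))   # location visited on only one day
```

Counting distinct days rather than raw location reports keeps a single long stay in an unfamiliar place from being learned as safe.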