Research Areas

Cybersecurity, Data Mining, Machine Learning, and Health Intelligence.

My research group and I work in the areas of Cybersecurity, Data Mining, Machine Learning, and Health Intelligence. I enjoy finding novel yet elegant solutions to high-impact problems driven by real-world needs. Through long-term, strong collaboration with industry partners, I have proposed and developed cloud-based solutions for mining big data in cybersecurity, especially for malware detection and adversarial machine learning. I have published over sixty papers in my fields (e.g., ACM CSUR, IEEE TNNLS, IEEE TSMC, KAIS, JCV, SIGKDD, WWW, AAAI, IJCAI, ACSAC). The proposed algorithms and developed systems have been incorporated into popular commercial cybersecurity products, including Comodo Internet Security and Kingsoft Antivirus, which protect millions of users worldwide. In addition, I have been awarded three patents in the area of malware detection and categorization. Since joining WVU, in collaboration with colleagues in the Health Sciences Center (HSC) at WVU, I have been expanding my research on health intelligence, particularly leveraging social media analytics and AI technologies to combat the opioid addiction epidemic.



Current Research Projects

  • Intelligent Malware Detection and Adversarial Machine Learning
  • Malware (short for malicious software) is a generic term for all kinds of unwanted software (e.g., viruses, trojans, worms, bots, ransomware, and cryptojacking). Cybercriminals use it as a major weapon to launch a wide range of attacks that cause serious damage and significant financial losses to many Internet users. The most significant line of defense protecting legitimate users from these attacks is anti-malware software, which traditionally used signature-based methods to recognize threats. However, driven by considerable economic benefits, malware attackers are using automated malware development toolkits to quickly write and modify malicious code that can evade detection by anti-malware products. To remain effective, the anti-malware industry calls for much more powerful methods that can protect users against new threats and are more difficult to evade. To combat evolving malware attacks, systems applying machine learning techniques have been successfully deployed and offer unparalleled flexibility in automatic malware detection. In these systems, various kinds of classifiers are constructed on different feature representations to detect malware. Unfortunately, as classifiers become more widely deployed, the incentive for defeating them increases. Through long-term, strong collaboration with our industrial partners, this project will design and develop intelligent and resilient solutions against malware attacks, at both the feature and model levels. Furthermore, the proposed techniques will be designed to be arms-race capable, so that they can be used in other cybersecurity domains, such as anti-spam, fraud detection, and counter-terrorism.

    Sponsors:

    NSF SaTC/CICI Awards (PI)

    WV HEPC Award (Co-PI)

    WVU Senate Grants for Research and Scholarship (PI)

    Press Coverage:

    WVU researcher named Challenge Problem Winner in AI for Cyber Security workshop (12/05/2018)

    WVU researcher awarded grant to develop techniques to enhance machine learning security (09/25/2018)

    WVU researchers awarded grant to develop techniques to enhance cyberinfrastructure security (09/04/2018)

    WVU researcher awarded grant to develop new techniques to prevent against cyber attacks (09/06/2016)

  • Securing Cyberspace: Gaining Deep Insights into the Online Underground Ecosystem
  • Cybercrime has become more and more dependent on the online underground ecosystem, which has evolved into a complex and increasingly decentralized system that has an incentive to prevent infiltration. This forces cybersecurity researchers and industry practitioners to reconsider their fire-fighting approach. Building on our prior work and strong collaborations with industry partners, we aim to design and develop an integrated framework (algorithms, scalable techniques) for in-depth investigation of the online underground ecosystem, and thus to help secure cyberspace by producing data-driven interventions against cybercrime. We have developed our own web crawlers to collect data from underground markets emerging in various online media (e.g., underground forums and the dark web). By July 2018, we had crawled data from four underground forums (e.g., Blackhat, Hack Forums, and Nulled), including 508,876 threads with 8,232,550 posts corresponding to 725,449 users; we had also successfully collected data from the dark web (e.g., Dream Market), including crimeware and crimeware-as-a-service (CaaS) products. We have also manually annotated 62,512 threads posted by 5,312 users in Hack Forums as the ground truth for automatic detection of cybercrime-suspected threads.

    Sponsors:

    NSF CAREER Award (PI)

    Press Coverage:

    WVU researcher awarded NSF CAREER to develop new techniques to secure cyberspace (03/25/2019)

  • Using Social Media to Study Opioid Addiction: Perception, Pattern and Acceptance
  • Addiction to opioids (e.g., heroin and morphine) has become one of the largest and deadliest epidemics in the United States. Opioid addiction is a chronic mental illness that requires long-term treatment and care. It is a psychiatric challenge because of its high relapse rate. Medication-Assisted Treatment (MAT) using methadone or buprenorphine has proven to be effective and beneficial because of favorable outcomes such as reduced illicit drug use and social and occupational improvement. However, many people misconceive it as substituting one addictive drug for another. This lack of knowledge and stereotyping can be a barrier to people seeking treatment. Inadequate support from peers and family who hold this stereotype of MAT can also affect opioid addiction recovery. People may decide to quit MAT due to lack of support and/or pressure from peers and family. Therefore, there is a critical need to assess and break these barriers related to MAT. The role of social media in biomedical knowledge mining has become increasingly significant in recent years. The goal of this project is to discover, through social media data mining, the ultimate solution to remove the barriers to acceptance of MAT.

    Sponsors:

    DoJ/NIJ Award (PI)

    WVU NT-NS Pilot Grant (Co-PI)


Former Research Projects

  • Phishing Fraud Detection
  • Phishing is a form of online fraud in which perpetrators use social engineering schemes, sending emails, instant messages, or online advertising to lure users to phishing websites that impersonate trustworthy websites. The goal is to trick individuals into revealing sensitive information (e.g., financial accounts, passwords, and personal identification numbers), which can then be used for profit. To defend against phishing websites, security software products generally use blacklisting to filter out known phishing websites. However, there is always a delay between website reporting and blacklist updating. Indeed, as the lifetimes of phishing websites shrink from days to hours, this method might be ineffective. In our study, based on the webpage content and its related information, we propose a principled cluster ensemble framework to integrate different clustering solutions for phishing fraud detection.
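
To illustrate the general idea of a cluster ensemble, the following is a minimal sketch of one common consensus technique: a co-association matrix with threshold-based merging. It is a generic illustration and not the framework proposed in the study; the function names, the 0.5 threshold, and the toy base clusterings are all assumptions made for the example.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of base clusterings that place each pair of items together."""
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_clusters(partitions, threshold=0.5):
    """Merge items whose co-association exceeds the threshold (union-find)."""
    n = len(partitions[0])
    co = co_association(partitions)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel roots as consecutive cluster ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Hypothetical base clusterings of six webpages, e.g. produced from
# different feature views; the label values themselves are arbitrary.
base = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
labels = consensus_clusters(base)
print(labels)
```

Because the consensus depends only on which items share a cluster, the base clusterings may use different label values and different numbers of clusters.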


  • Smart Devices for Children's Safety
  • In recent years, crimes against children and cases of missing children have increased at a high rate. Therefore, there’s an urgent need for safety support systems that help prevent crimes against children and keep them from getting lost, especially when parents are not with their children, such as on the way to and from school. In this project, in collaboration with our industrial partner, we use the children’s location histories reported by the smart devices they wear to explore their life patterns, which capture their general lifestyles and regularities, and apply big data analytic techniques to learn the children’s safe regions and safe routes. When the children are in potential danger, their parents or guardians will receive automatic notifications. We also explore an effective energy-efficient positioning scheme for the smart devices that maintains location tracking accuracy while keeping energy overhead low.
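
As a rough illustration of learning safe regions from location histories, the sketch below marks frequently visited grid cells as safe and flags locations outside them. It is a deliberately simplified stand-in, not the technique used in the project; the grid size, the visit threshold, the function names, and the sample coordinates are all assumptions.

```python
import math
from collections import Counter


def cell_of(lat, lon, cell_deg=0.001):
    """Map a coordinate to a grid cell of roughly cell_deg degrees per side."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def learn_safe_cells(history, min_visits=3, cell_deg=0.001):
    """Treat cells visited at least min_visits times as safe regions."""
    counts = Counter(cell_of(lat, lon, cell_deg) for lat, lon in history)
    return {cell for cell, n in counts.items() if n >= min_visits}


def needs_alert(lat, lon, safe_cells, cell_deg=0.001):
    """Return True when the current location lies outside every safe cell."""
    return cell_of(lat, lon, cell_deg) not in safe_cells


# Hypothetical location history: three fixes near one spot, one elsewhere.
history = [
    (39.6295, -79.9555),
    (39.6296, -79.9556),
    (39.6294, -79.9554),
    (39.7005, -80.1005),
]
safe_cells = learn_safe_cells(history)
print(needs_alert(39.6295, -79.9555, safe_cells))
print(needs_alert(39.7005, -80.1005, safe_cells))
```

A fixed grid keeps the example short. Learning safe routes, as opposed to regions, would need the order and timing of the location fixes, which this sketch ignores.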