Gary Ang Ming, PhD, SMU
Research on deep learning on networks (or graphs), specifically dynamic and/or temporal networks with multimodal attributes. 
Research Updates:
Dec. 2022: "Investment and Risk Management with Online News and Heterogeneous Networks" accepted by the ACM Transactions on the Web
Nov. 2022: "Learning and Understanding User Interface Semantics from Heterogeneous Networks with Multimodal and Positional Attributes" accepted by the ACM Transactions on Interactive Intelligent Systems
Oct. 2022: "Learning Dynamic Multimodal Implicit and Explicit Networks for Multiple Financial Tasks" accepted by the 2022 IEEE International Conference on Big Data
Apr. 2022: "Learning Semantically Rich Network-Based Multi-Modal Mobile User Interface Embeddings" accepted by the ACM Transactions on Interactive Intelligent Systems
Feb. 2022: "Learning User Interface Semantics from Heterogeneous Networks with Multimodal and Positional Attributes" received an Honorable Mention award at IUI 2022
Feb. 2022: "Guided Attention Multimodal Multitask Financial Forecasting with Inter-Company Relationships and Global and Local News" accepted by the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022)
Dec. 2021: "Learning User Interface Semantics from Heterogeneous Networks with Multimodal and Positional Attributes" accepted by the 27th ACM International Conference on Intelligent User Interfaces (IUI 2022)
Oct. 2021: "Learning Knowledge-Enriched Company Embeddings for Investment Management" accepted by the 2nd ACM International Conference on AI in Finance (ICAIF 2021)
Dec. 2020: "Learning Network-Based Multi-Modal Mobile User Interface Embeddings" accepted by the 26th ACM International Conference on Intelligent User Interfaces (IUI 2021)
My Research
Relationships between entities or objects play an important role in many domains, e.g., inter-company relationships in the finance domain, or relationships between user interface (UI) objects in the design domain. Such relationships can naturally be represented as graphs or networks. Capturing and modelling structural network information can be useful for different tasks, e.g., stock price forecasting in the finance domain, or predicting UI object types to annotate UIs for accessibility in the design domain.
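As a minimal illustration of how such typed relationships can be represented as a network, the sketch below stores (source, relation, target) triples as adjacency lists in plain Python. The company names and relation types are hypothetical, not drawn from any dataset used in my papers:

```python
from collections import defaultdict

def build_network(edges):
    """Build an undirected heterogeneous network from (source, relation, target) triples."""
    graph = defaultdict(list)
    for src, rel, dst in edges:
        graph[src].append((rel, dst))
        graph[dst].append((rel, src))  # treat each relationship as undirected
    return dict(graph)

# Illustrative inter-company relationships of different types.
network = build_network([
    ("CompanyA", "same_sector", "CompanyB"),
    ("CompanyA", "news_co_occurrence", "CompanyC"),
])

# Each node's neighbourhood carries the relation type alongside the neighbour,
# which is the structural information heterogeneous network models consume.
neighbours = network["CompanyA"]
```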
Real-world networks frequently possess one or more of the following characteristics: i) attributes from multiple modalities (multimodal network attributes), ii) multimodal attributes that evolve across time (dynamic multimodal network attributes), and/or iii) network structures that evolve across time (dynamic networks).
While there has been significant progress in deep learning for networks/graphs, e.g., network/graph embedding methods and graph neural networks, most existing work focuses on static networks with static unimodal attributes. Given the dynamic nature of such networks and their multimodal attributes, an orthogonal but relevant field is time-series modelling. Classical time-series methods such as ARIMA, however, are not designed to capture information from multiple modalities or from network structure. Deep learning models, which have been increasingly applied to time-series modelling, also typically focus on numerical information rather than multimodal and/or network information.
Hence, my research focuses on modelling dynamic multimodal networks. My work to date has addressed different aspects of dynamic multimodal networks in two real-world domains:

Multimodal Network Attributes: In the design domain, different design objects (e.g., UI screens and elements) may be connected to each other via different types of relationships at different levels (e.g., UI view structures where a UI screen can be linked to different UI classes and elements at different hierarchical levels) and be associated with attributes from multiple modalities (e.g., numerical UI element positional coordinates, categorical UI screen topics, visual UI screen images, textual UI code). Being able to effectively capture network structural and multimodal information can potentially improve model performance on a range of design tasks, e.g., predicting links between UI screens and elements for UI design layout assistance, missing UI attribute inference to improve UI data quality, UI screen retrieval, and UI element annotation for accessibility.
Learning Network-Based Multi-Modal Mobile User Interface Embeddings, published at the 26th ACM International Conference on Intelligent User Interfaces (IUI 2021), proposes the Multi-modal Attention-based Attributed Network Embedding (MAAN) model, a novel self-supervised model that learns semantically rich representations from networks of UI design objects with attributes from multiple modalities: text, code, images, and categorical and numerical data. Paper
Learning User Interface Semantics from Heterogeneous Networks with Multimodal and Positional Attributes, published at the 27th ACM International Conference on Intelligent User Interfaces (IUI 2022), proposes the Heterogeneous Attention-based Multimodal Positional (HAMP) graph neural network model, which combines graph neural networks with the scaled dot-product attention used in transformers to learn the embeddings of heterogeneous nodes and their associated multimodal and positional attributes in a unified manner. This work received an honorable mention award at IUI 2022. Paper
Learning Semantically Rich Network-Based Multi-Modal Mobile User Interface Embeddings, accepted by the ACM Transactions on Interactive Intelligent Systems (ACM TiiS), extends the earlier IUI 2021 paper on the MAAN model. The number of linkages between UI objects provides further information on the roles different UI objects play in UI designs. Hence, in this extended paper we propose the EMAAN model, which generalizes MAAN to capture edge attributes and learn even richer UI embeddings. Paper
Learning and Understanding User Interface Semantics from Heterogeneous Networks with Multimodal and Positional Attributes, accepted by the ACM Transactions on Interactive Intelligent Systems (ACM TiiS), extends the earlier IUI 2022 paper on the HAMP model. To provide interpretations of how heterogeneous network information contributes to predictions, and of the relationships between UI structure and prediction tasks, we propose Adaptive HAMP (AHAMP), which adaptively learns the importance of the different edges linking UI objects. Paper
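For readers unfamiliar with the scaled dot-product attention primitive that models such as HAMP build on, the following is a minimal pure-Python sketch of that mechanism in isolation. It is an illustration only, not the papers' actual code, and the vectors are made up:

```python
import math

def scaled_dot_product_attention(queries, keys, values):
    """Minimal scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Score q against each key: dot product, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        # Numerically stable softmax turns scores into attention weights.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# With identical keys the weights are uniform, so the output is the mean of the values.
out = scaled_dot_product_attention([[1.0, 0.0]],
                                   [[0.0, 0.0], [0.0, 0.0]],
                                   [[2.0, 0.0], [4.0, 0.0]])
# out == [[3.0, 0.0]]
```

In graph neural networks, the same mechanism can weight a node's neighbours when aggregating their (multimodal) attribute embeddings.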

Dynamic Multimodal Network Attributes and Dynamic Networks: In the financial domain, financial entities (e.g., companies) may be connected to each other via different financial relationships (e.g., common sector, common management, news co-occurrences). Networks in the financial domain are generally dynamic, i.e., they vary across time (e.g., evolving relationships between financial entities). Financial entities are also associated with rich and dynamic information from multiple modalities (e.g., numerical stock prices, textual news, categorical events). Being able to effectively capture dynamic multimodal networks can potentially improve model performance on a range of important financial tasks, e.g., forecasting tasks for trading, investment, and risk management.
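One simple way to represent such a dynamic network is as a sequence of time-stamped edge-list snapshots, which a sequence model can then consume period by period. A minimal sketch, with made-up companies, dates, and relation types:

```python
# A dynamic network as time-indexed edge lists (snapshot representation).
# Companies, dates, and relation types below are illustrative only.
snapshots = {
    "2022-01": [("CompanyA", "news_co_occurrence", "CompanyB")],
    "2022-02": [("CompanyA", "news_co_occurrence", "CompanyB"),
                ("CompanyB", "same_sector", "CompanyC")],
}

def edges_at(snapshots, t):
    """Return the edge list of the network snapshot at time t (empty if absent)."""
    return snapshots.get(t, [])

# The edge set evolves between January and February 2022:
# a sector relationship appears alongside the news co-occurrence.
feb_edges = edges_at(snapshots, "2022-02")
```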
Learning Knowledge-Enriched Company Embeddings for Investment Management, published at the 2nd ACM International Conference on AI in Finance (ICAIF 2021), proposes the Knowledge-Enriched Company Embedding (KECE) model, a novel multi-stage attention-based dynamic network embedding model that combines multimodal time-series information about companies with knowledge from Wikipedia and knowledge graph relationships from Wikidata to generate company entity embeddings for a variety of downstream investment management tasks. Paper
Guided Attention Multimodal Multitask Financial Forecasting with Inter-Company Relationships and Global and Local News, accepted by the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), proposes the Guided Attention Multimodal Multitask Network (GAME) model. GAME uses novel attention modules to guide learning with global information (e.g., news on the economy in general) and local information (e.g., company-specific stock prices) from different modalities, together with dynamic inter-company relationship networks, for forecasting tasks and applications in investment and risk management. Paper
Investment and Risk Management with Online News and Heterogeneous Networks, accepted by the ACM Transactions on the Web (ACM TWEB), proposes the Guided Global-Local Attention-based Multimodal Heterogeneous Network (GLAM) model, which comprises novel attention-based mechanisms for multimodal sequential and graph encoding, a guided learning strategy, and a multitask training objective. GLAM uses multimodal information and heterogeneous relationships between companies, and leverages the significant local responses of individual stock prices to online news, to extract useful information from diverse global online news relevant to individual stocks for multiple forecasting tasks. Paper
Learning Dynamic Multimodal Implicit and Explicit Networks for Multiple Financial Tasks, accepted by the 2022 IEEE International Conference on Big Data (BigData), proposes the Dynamic Multimodal Multitask Implicit Explicit (DynMIX) network model, which pairs explicit and implicit networks across multiple modalities in a novel dynamic self-supervised learning approach to improve performance across multiple financial tasks. Paper