Gary Ang Ming, PhD Candidate, SMU
Hi! I'm Gary Ang, a rather atypical (or relatively old) PhD candidate in Computer Science at Singapore Management University. Prior to this mid-life break, I headed the investment risk management function at the Monetary Authority of Singapore. I am interested in research on deep learning for networks (or graphs), specifically dynamic and/or temporal networks with multimodal attributes. My interest in multimodal dynamic networks stems from my parallel interests in the creative and financial domains, where such networks are common.
Research Updates:
Apr. 2022: "Learning Semantically Rich Network-Based Multi-Modal Mobile User Interface Embeddings" accepted by the ACM Transactions on Interactive Intelligent Systems.
Feb. 2022: "Learning User Interface Semantics from Heterogeneous Networks with Multimodal and Positional Attributes" received an Honorable Mention award at IUI 2022.
Feb. 2022: "Guided Attention Multimodal Multitask Financial Forecasting with Inter-Company Relationships and Global and Local News" accepted by the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022).
Dec. 2021: "Learning User Interface Semantics from Heterogeneous Networks with Multimodal and Positional Attributes" accepted by the 27th ACM International Conference on Intelligent User Interfaces (IUI 2022).
Oct. 2021: "Learning Knowledge-Enriched Company Embeddings for Investment Management" accepted by the 2nd ACM International Conference on AI in Finance (ICAIF 2021).
Dec. 2020: "Learning Network-Based Multi-Modal Mobile User Interface Embeddings" accepted by the 26th ACM International Conference on Intelligent User Interfaces (IUI 2021).
My Research
Relationships between entities or objects play an important role in many domains, e.g., inter-company relationships in the finance domain, or relationships between user interface (UI) objects in the design domain. Such relationships can naturally be represented as graphs or networks. Capturing and modelling structural network information can be useful for different tasks, e.g., stock price forecasting in the finance domain, or predicting UI object types to annotate UIs for accessibility in the design domain.
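As a toy illustration (not drawn from any of the papers below; the company names and relationship types are made up), such relationships can be encoded as a typed adjacency structure, with each node carrying attributes from multiple modalities:

```python
# Hypothetical example: inter-company relationships as a graph whose nodes
# carry attributes from multiple modalities (categorical sector, numerical
# price series), and whose edges carry a relationship type.
companies = {
    "AlphaCorp": {"sector": "tech", "price_series": [101.2, 102.5, 99.8]},
    "BetaBank": {"sector": "finance", "price_series": [55.0, 54.1, 56.3]},
    "GammaLtd": {"sector": "tech", "price_series": [12.4, 12.9, 13.1]},
}

# Edges capture different relationship types (e.g., common sector,
# co-occurrence in news articles).
edges = [
    ("AlphaCorp", "GammaLtd", "common_sector"),
    ("AlphaCorp", "BetaBank", "news_co_occurrence"),
]

# Build an undirected adjacency list that keeps the edge type.
adjacency = {}
for src, dst, rel in edges:
    adjacency.setdefault(src, []).append((dst, rel))
    adjacency.setdefault(dst, []).append((src, rel))

print(adjacency["AlphaCorp"])
```

Graph learning models consume exactly this kind of structure: the node attributes become input features, and the typed edges determine how information flows between entities.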
Real-world networks frequently possess one or more of the following characteristics: i) attributes from multiple modalities (multimodal network attributes), ii) multimodal attributes that evolve across time (dynamic multimodal network attributes), and/or iii) network structures that evolve across time (dynamic networks).
While there has been significant progress in deep learning for networks/graphs (e.g., network/graph embedding methods and graph neural networks), most existing work focuses on static networks with unimodal, static attributes. Because such networks and their multimodal attributes are dynamic, time-series modelling is an orthogonal but relevant field. Classical time-series methods such as ARIMA, however, are not designed to capture information from multiple modalities or network structure. Deep learning models that have increasingly been applied to time-series modelling also typically focus on numerical information rather than multimodal and/or network information.
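To make the limitation concrete, here is a deliberately minimal autoregressive forecast (a zero-mean AR(1)-style sketch, not ARIMA proper, and not from any paper here): its input is a single numeric series, so there is no natural slot for text, images, or inter-entity network structure.

```python
# Toy sketch: a classical one-step-ahead autoregressive forecast.
# It consumes only a single numeric series; multimodal attributes
# (news text, UI images) and network structure have no place in it.
def ar1_forecast(series, phi=0.9):
    """Forecast x_{t+1} = phi * x_t for a zero-mean AR(1) process.

    phi is the autoregressive coefficient (here fixed for illustration;
    in practice it would be estimated from the series).
    """
    return phi * series[-1]

prices = [1.0, 1.2, 0.9, 1.1]
next_price = ar1_forecast(prices)
```

Extending such a forecaster to ingest news articles about related companies, or an evolving relationship graph, requires a fundamentally different model class, which is what motivates the work below.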
Hence, my research focuses on modelling dynamic multimodal networks. My work to date has addressed different aspects of dynamic multimodal networks in two real-world domains:
Multimodal Network Attributes: In the design domain, different design objects (e.g., UI screens and elements) may be connected to each other via different types of relationships at different levels (e.g., UI view structures where a UI screen can be linked to different UI classes and elements at different hierarchical levels) and be associated with attributes from multiple modalities (e.g., numerical UI element positional coordinates, categorical UI screen topics, visual UI screen images, textual UI code). Being able to effectively capture network structural and multimodal information can potentially improve model performance on a range of design tasks, e.g., predicting links between UI screens and elements for UI design layout assistance, missing UI attribute inference to improve UI data quality, UI screen retrieval, and UI element annotation for accessibility.
Learning Network-Based Multi-Modal Mobile User Interface Embeddings, published at the 26th ACM International Conference on Intelligent User Interfaces (IUI 2021), proposes a novel self-supervised model, the Multi-modal Attention-based Attributed Network Embedding (MAAN) model, which learns semantically rich information from networks of UI design objects with attributes from multiple modalities: text, code, images, and categorical and numerical data. Paper
Learning User Interface Semantics from Heterogeneous Networks with Multimodal and Positional Attributes, published at the 27th ACM International Conference on Intelligent User Interfaces (IUI 2022), proposes a novel Heterogeneous Attention-based Multimodal Positional (HAMP) graph neural network model that combines graph neural networks with the scaled dot-product attention used in transformers to learn the embeddings of heterogeneous nodes and their associated multimodal and positional attributes in a unified manner. This work received an Honorable Mention award at IUI 2022. Paper
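For readers unfamiliar with the attention mechanism referenced above, the sketch below shows plain scaled dot-product attention aggregating a node's neighbour embeddings. This is the generic transformer-style operation only; the shapes, names, and single-head setup are illustrative assumptions, not the HAMP implementation.

```python
import numpy as np

def scaled_dot_product_attention(query, keys, values):
    """Minimal single-head scaled dot-product attention.

    query:  (d,)   embedding of the target node
    keys:   (n, d) embeddings of its n neighbours
    values: (n, d) neighbour features to aggregate
    Returns a (d,) attention-weighted combination of the values.
    """
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)       # (n,) similarity scores
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights = weights / weights.sum()
    return weights @ values                  # (d,) weighted aggregation

rng = np.random.default_rng(0)
q = rng.normal(size=4)       # hypothetical target-node embedding
K = rng.normal(size=(3, 4))  # three hypothetical neighbours
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(q, K, V)
```

In a graph neural network setting, each node's updated representation is such a weighted aggregation over its neighbours, with the attention weights learned rather than fixed, so more relevant neighbours contribute more.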
Learning Semantically Rich Network-Based Multi-Modal Mobile User Interface Embeddings, accepted by the ACM Transactions on Interactive Intelligent Systems (ACM TiiS), extends the earlier paper on the MAAN model published at IUI 2021. The number of linkages between UI objects provides further information on the role each object plays in a UI design. Hence, this extended paper proposes the EMAAN model, which extends and generalizes MAAN to capture edge attributes and learn even richer UI embeddings. Paper
Dynamic Multimodal Network Attributes and Dynamic Networks: In the financial domain, financial entities (e.g., companies) may be connected to each other via different financial relationships (e.g., common sector, common management, news co-occurrences). Networks in the financial domain are generally dynamic, i.e., they vary across time, e.g., through evolving relationships between financial entities. Financial entities are also associated with rich and dynamic information from multiple modalities (e.g., numerical stock prices, textual news, categorical events). Being able to effectively capture dynamic multimodal networks can potentially improve model performance on a range of important financial tasks, e.g., forecasting tasks for trading, investment, and risk management.
Learning Knowledge-Enriched Company Embeddings for Investment Management, published at the 2nd ACM International Conference on AI in Finance (ICAIF 2021), proposes the Knowledge-Enriched Company Embedding (KECE) model, a novel multi-stage attention-based dynamic network embedding model that combines multimodal time-series information of companies with knowledge from Wikipedia and knowledge graph relationships from Wikidata to generate company entity embeddings that can be applied to a variety of downstream investment management tasks. Paper
Guided Attention Multimodal Multitask Financial Forecasting with Inter-Company Relationships and Global and Local News, accepted by the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), proposes the Guided Attention Multimodal Multitask Network (GAME) model. GAME uses novel attention modules to guide learning with global information (e.g., news on the economy in general) and local information (e.g., company-specific stock prices) from different modalities, as well as dynamic inter-company relationship networks, for investment and risk management-related forecasting tasks and applications. Paper