Abstract: The holy grail we strive for is, given a query, to identify an algorithm that answers it over general databases with optimal time guarantees for the specific query. In this tutorial, we focus on what can be seen as ideal time guarantees: linear preprocessing (needed to read the input) and constant time per answer (needed to print the output). We seek to understand which queries can be solved with these (or almost these) time guarantees and how.
We start with the basic building blocks of database queries, namely joins, and gradually increase the expressivity by introducing projections and unions until we cover positive relational algebra. We first consider the task of enumerating all query answers and then discuss related, more demanding tasks such as ordered enumeration and direct access to query answers. We investigate the challenges in answering such queries and provide algorithms and conditional lower bounds.
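To make "linear preprocessing and constant time per answer" concrete, here is a minimal Python sketch for the simplest case, a two-relation join R(a, b) ⋈ S(b, c). It assumes that hash lookups take constant time, and the names `preprocess` and `enumerate_join` are hypothetical, not taken from the tutorial. Preprocessing indexes S by b and drops every R-tuple that has no join partner (a semi-join reduction), so the enumeration phase never does work that fails to produce an answer.

```python
from typing import Dict, Iterator, List, Tuple

Pair = Tuple[str, str]


def preprocess(R: List[Pair], S: List[Pair]) -> Tuple[List[Pair], Dict[str, List[str]]]:
    """Linear-time preprocessing for the join R(a, b) ⋈ S(b, c)."""
    # One pass over S: group the c-values of S by their join value b.
    index: Dict[str, List[str]] = {}
    for b, c in S:
        index.setdefault(b, []).append(c)
    # One pass over R: keep only tuples whose b-value occurs in S
    # (semi-join reduction), so every kept tuple yields at least one answer.
    reduced_R = [(a, b) for a, b in R if b in index]
    return reduced_R, index


def enumerate_join(reduced_R: List[Pair], index: Dict[str, List[str]]) -> Iterator[Tuple[str, str, str]]:
    """Yields the join answers; only O(1) work happens between two consecutive answers."""
    for a, b in reduced_R:
        for c in index[b]:
            yield (a, b, c)
```

For example, with R = [("a1", "b1"), ("a2", "b2"), ("a3", "b9")] and S = [("b1", "c1"), ("b1", "c2"), ("b2", "c3")], the dangling tuple ("a3", "b9") is removed during preprocessing, and the generator yields ("a1", "b1", "c1"), ("a1", "b1", "c2"), and ("a2", "b2", "c3"). Once a projection is added (say, keeping only a and c), this direct scheme can emit duplicates, which is one reason the expressivity has to be increased step by step.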
Bio: Nofar Carmeli is currently a postdoctoral researcher in the Valda team at École Normale Supérieure, Paris. Her research focuses on theoretical aspects of database query optimization. She completed her Ph.D. at the Technion Data & Knowledge Laboratory in Israel.
Session Chair: Dan Olteanu (University of Zurich, Switzerland)
(#38)On the Hardness of Category Tree Construction, Shay Gershtein (Tel Aviv University, Israel), Uri Avron (Tel Aviv University, Israel), Ido Guy (eBay Research, Israel), Tova Milo (Tel Aviv University, Israel) and Slava Novgorodov (eBay Research, Israel)
(#49)Linear Programs with Conjunctive Queries, Florent Capelli (Université de Lille, France), Nicolas Crosetti (Université de Lille, France), Joachim Niehren (INRIA, France) and Jan Ramon (INRIA, France)
Session Co-Chairs: Block A: Cristian Riveros (Pontificia Universidad Católica de Chile, Chile) and Block B: Mahmoud Abo Khamis (RelationalAI, Inc, USA)
(#36)Improved Approximation and Scalability for Fair Max-Min Diversification, Raghavendra Addanki (University of Massachusetts Amherst, US), Andrew McGregor (University of Massachusetts Amherst, US), Alexandra Meliou (University of Massachusetts Amherst, US) and Zafeiria Moumoulidou (University of Massachusetts Amherst, US)
(#15)Streaming Enumeration on Nested Documents, Martín Muñoz (Pontificia Universidad Católica de Chile, Chile) and Cristian Riveros (Pontificia Universidad Católica de Chile, Chile)
Description: Half-day workshop
Since big data have imposed a paradigm change in the way data are stored, managed, and queried, information systems have evolved into complex data platforms or data ecosystems supporting data-intensive storage, computation, and analysis of data with heterogeneous structures. Yet, smart and comprehensive support for data scientists and architects to govern data throughout its whole life cycle is still lacking.
Data management support in data platforms requires collecting a wide set of metadata that captures the distinguishing features of the data; this enables advanced functionalities ranging from search and data profiling to provenance control, orchestration of data transformation pipelines, incremental data integration, and efficient querying. The challenges begin with the management of the metadata itself, in terms of modeling effort, storage, complexity of retrieval activities, and effective exploitation. Besides addressing the Vs of big data, the enabled functionalities must cope with the heterogeneity of storage and computation engines (including DBMSs supporting multiple data models and cloud storage systems with limited control and predictability), while remaining suitable for less-skilled users.
This workshop calls for researchers and practitioners to propose innovative solutions to the aforementioned challenges, and welcomes papers that advance data platforms by optimizing and simplifying the different aspects of data and metadata management and use.
Organizers: Matteo Francia (University of Bologna, Italy), Enrico Gallinucci (University of Bologna, Italy), Patrick Marcel (Université de Tours, France) and Stefano Rizzi (University of Bologna, Italy)
Description: Full-day workshop
In the last few years, the use of Information and Communication Technologies has made a huge amount of heterogeneous data available in various real application domains. For example, in the urban scenario, Internet of Things (IoT) systems capture massive data collections describing the overall urban environment and citizens' use and perception of available services. In health care systems, electronic health records store various information about patients, such as adopted treatments and monitored physiological conditions. At the same time, the Internet of Medical Things (IoMT) ensures the availability and processing of healthcare data through smart medical devices and the web. Moreover, in most domains, individuals play a crucial role: they generate data, drive a user- and context-aware analysis process, and finally demand easily accessible and understandable knowledge at the end of the process.
Digging deep into these data collections can unearth a rich spectrum of knowledge about the targeted domain, valuable for characterising user behaviours, identifying weaknesses and strengths, improving the quality of provided services, or even devising new ones. However, data analytics on these data collections is still a daunting task because they are generally too big and heterogeneous to be processed with available data analysis techniques. Consequently, various data science challenges arise, concerning the creation, storage, search, sharing, modelling, analysis, and visualisation of data, information, and knowledge.
Suitable data fusion techniques and data representation paradigms should be devised to integrate the heterogeneous collected data into a unified representation describing all facets of the targeted domain. Moreover, a massive volume of data demands novel data analytics strategies that exploit recent analysis paradigms and cloud-based platforms such as Hadoop and Spark. Proper strategies can also be devised for data and knowledge visualisation, possibly involving interactive user interfaces.
The workshop aims to allow academics and practitioners from various research areas to share their experiences in designing cutting-edge analytics solutions for real-life applications. Researchers are encouraged to submit work-in-progress research describing innovative methodologies, algorithms, and platforms that address all facets of a data analytics process that provides interesting and useful services.
Industrial implementations of data analytics applications and design and deployment experience reports on issues arising in data analytics projects are particularly welcome. We call for research and experience papers and demonstration proposals covering any aspect of data analytics solutions for real-life applications.
Organizers: Tania Cerquitelli (Politecnico di Torino, Italy), Silvia Chiusano (Politecnico di Torino, Italy) and Genoveva Vargas-Solar (CNRS, LIRIS, France)
Description: Full-day workshop
Information Visualization is nowadays one of the cornerstones of Data Science, turning the abundance of Big Data produced by modern systems into actionable knowledge. Indeed, the Big Data era has made available voluminous datasets that are dynamic, noisy, and heterogeneous in nature. Transforming a data-curious user into someone who can access and analyze that data is now even more burdensome for the many users with little or no support and expertise in data processing. Thus, data visualization, visual exploration, and analysis have gained great attention recently, calling for joint action from the HCI, Computer Graphics, and Data Management and Mining communities.
In this respect, several traditional problems from these communities are being revisited, such as efficient data storage, querying, and indexing for enabling visual analytics, new ways of visually presenting massive data, and efficient interaction and personalization techniques that fit different user needs. Modern exploration and visualization systems should offer scalable techniques to efficiently handle datasets of billions of objects, keeping the visual response within a few milliseconds, along with mechanisms for information abstraction, sampling, and summarization to address visual information overplotting. Further, they must encourage user comprehension by offering customization capabilities for different user-defined exploration scenarios and preferences, according to the analysis needs. Overall, the challenge is to offer self-service visual analytics, i.e., to enable data scientists and business analysts to visually gain value and insights from the data as rapidly as possible, minimizing the role of IT experts in the loop.
The Big Data Visual Exploration and Analytics workshop (BigVis) aims at addressing the above challenges and issues by providing a forum for researchers and practitioners to discuss, exchange, and disseminate their work. BigVis attempts to attract attention from the research areas of: Data Management & Mining, Information Visualization, Human-Computer Interaction, Machine Learning, and Computer Graphics, and highlight novel works that bridge together these communities.
Organizers: Nikos Bikakis (ATHENA Research Center, Greece), Hanna Hauptmann (Utrecht University, Netherlands), George Papastefanatos (ATHENA Research Center, Greece) and Michael Sedlmair (University of Stuttgart, Germany)
Description: Half-day workshop
Knowledge Graphs are a recent and promising incarnation of database methodologies and technology, which is attracting increasing use within domains characterized by the presence of many interconnected entities, interacting via complex dynamics.
While a consolidated definition has not been reached yet, the different notions of KGs share the use of graph-based data models and systems in complex domains, where the need to handle and operationalize specific and frequently complex domain knowledge calls for smart Knowledge Representation and Reasoning (KRR) paradigms and solutions, including logic-based reasoning, graph embeddings, graph neural networks, probabilistic reasoning, handling of uncertainty, modeling of temporal graphs, and many more.
Among the broad variety of fields where KGs are finding use and adoption, their impact on the economic and financial sector will undoubtedly be long-lasting, due to a close fit between technology and business, as witnessed by: i) the presence of large or extreme-scale stores of economic and financial data with inherent network structure; ii) the natural emergence of complex economic, financial, and more generally societal network dynamics to be modeled and captured; iii) an articulated regulatory body that defines the interactions between the involved entities (e.g., the Basel III regulation, the European Central Bank legal frameworks covering many fields such as prudential supervision of credit institutions, the Investment Firm Directive, the MiFID/MiFIR and PSD2 directives, etc.).
EcoFinKG aims to reduce the distance between the database and economics/finance communities, sustaining new research-backed economic and financial applications that consciously use and demystify state-of-the-art data technology.
Organizers: Georg Gottlob (University of Oxford, UK), Eleonora Laurenza (Financial Intelligence Unit for Italy, Italy and Bank of Italy, Italy), Emanuel Sallinger (TU Wien, Austria and University of Oxford, UK) and Luigi Bellomarini (Bank of Italy, Italy)
Description: Full-day workshop
Twenty-three DOLAP workshops have been held in the past with great success. Over these years, DOLAP has become one of the reference venues for researchers to publish their work in the broad area of data decision support systems. DOLAP maintains a high quality of accepted papers, as attested by its ranking as "very good event" in the last edition of the GII-GRIN-SCIE Conference Ranking. It features a two-round reviewing process, is associated with special issues of highly reputed international journals (like IS or DKE), invites keynotes from reputed speakers, has given a best paper award since 2020, and has favored open proceedings since 2017.
Research in data warehousing and OLAP has produced important technologies for the design, management, and use of information systems for decision support. Nowadays, due to the advent of Big Data, Decision Support Systems (DSS) embrace a wider range of systems, in which novel solutions combine advanced data management and data analytics, (semi-)automating the data lifecycle from ingestion to visualization. Yet, the DSS principles remain the same: these systems acknowledge the importance of managing data efficiently (by means of data modeling and optimized data processing) to serve innovative data analysis that brings added value to organizations.
DSS of the future will consequently be significantly different from what the current state of the practice supports. The trend is to move from current systems that are "data presenting" to more dynamic systems that allow the semi-automation of the decision-making process (including both data management and data analysis tasks). This means that systems partially guide their users towards data discovery, management, and system-aided decision making via intelligent techniques (beyond OLAP) and visualization. Behind the scenes, the big data era requires new methods, models, techniques, and architectures to cope with the increasing demand for capacity, data type diversity, schema and data variability, and responsiveness. Of course, this does not necessarily mean reinventing the wheel, but rather complementing the wealth of research in DSS with other approaches. We envision DOLAP 2022 as a forum to discuss, foster, and nurture novel ideas around these new landscapes of decision support systems in the era of big data, in order to produce exciting new results within a strong, vibrant community around these areas.
Like the previous DOLAP workshops, DOLAP 2022 aims at synergistically connecting the research community and industry practitioners and provides an international forum where both researchers and practitioners can share their findings in theoretical foundations, current methodologies, and practical experiences, and where industry technology developers can describe technical details about their products and companies exploiting BI and Big Data technology can discuss case studies and experiences.
Special Theme: Responsible Data Science. Data Science promises to bring significant improvements in people’s lives, accelerating knowledge discovery and innovation. However, lately, there has been an increasing concern regarding the lack of diversity (leading to exclusion), fairness (leading to discrimination), and transparency (leading to opacity) when making critical decisions. This motivates the need for methods, tools, and systems to ensure that data are used responsibly, especially in applications such as healthcare, education, and public policy. To promote novel solutions to this urgent problem, DOLAP 2022 will devote a special session to Responsible Data Science. Relevant topics include, but are not limited to: Explainable and interpretable analytics, Bias in big data and how to mitigate it, Data quality and data cleaning, FAIRness (Findability, Accessibility, Interoperability and Reusability) in OLAP.
Organizers: Kostas Stefanidis (Tampere University, Finland) and Lukasz Golab (University of Waterloo, Canada)
Abstract: Graph data management and analytics has been one of the hottest topics at database conferences, and continues to be popular. Many graph database systems, originally managing RDF data, but increasingly those catering for the Property Graph model, have appeared and captured a growing niche of the data management market. After a decade of this trend, one would hope for mature technology, but this keynote will point out a number of weaknesses across the board. But there is also reason for optimism, in terms of convergence of functionality and technical and algorithmic improvements. Spurring this maturation of graph technology is one of the main goals of LDBC (ldbcouncil.org), an academic and industry non-profit organisation that our community has produced.
This keynote will describe the current landscape of graph database systems, including its challenges and opportunities. It will provide an update on multi-pronged efforts to standardise graph database functionality, in which LDBC collaborates with ISO, and outline the current understanding of the main technical ingredients of future graph data systems architecture.
Bio: Peter Boncz holds appointments as tenured researcher at CWI and professor at VU University Amsterdam. His academic background is in database architecture, with the early column store MonetDB being the outcome of his PhD. MonetDB later won the 2016 ACM SIGMOD Systems Award. He has a track record in bridging the gap between academia and commercial application. In 2008 he co-founded Vectorwise around the analytical database system of the same name, which pioneered vectorized query execution. He is co-recipient of the 2009 VLDB 10 Years Best Paper Award, and in 2013 became a Fellow at TU Munich on receiving the Humboldt Research Award for his research on database architecture. In the past decade, his work in graph data management has focused on the Linked Data Benchmark Council (LDBC), which he co-founded in 2013 and still chairs. He also co-founded the SIGMOD workshops DAMON and GRADES, the latter themed on graph data management. His most recent activity is advising the new CWI startup around DuckDB (duckdb.org).
(#4)Rewriting with Acyclic Queries: Mind your Head, Gaetano Geck (TU Dortmund, Germany), Jens Keppeler (TU Dortmund, Germany), Thomas Schwentick (TU Dortmund, Germany) and Christopher Spinrath (TU Dortmund, Germany)
Session Co-Chairs: Block A: Tilmann Rabl (HPI, Germany) and Block B: Ken Ross (Columbia University)
(#9)Bandwidth-optimal Relational Joins on FPGAs, Robert Lasch (TU Ilmenau, Germany & SAP SE, Germany), Mehdi Moghaddamfar (TU Dresden, Germany & SAP SE, Germany), Norman May (SAP SE, Germany), Suleyman Demirsoy (Intel Corporation, UK), Christian Faerber (Intel Corporation, UK) and Kai-Uwe Sattler (TU Ilmenau, Germany)
(#43)GPU-FAST-PROCLUS: A Fast GPU-parallelized Approach to Projected Clustering, Jakob Rødsgaard Jørgensen (Aarhus University, Denmark), Katrine Scheel (Aarhus University, Denmark), Ira Assent (Aarhus University, Denmark), Anne C. Elster (Norwegian University of Science and Technology, Norway) and Ajeet Ram Pathak (Vishwakarma Institute of Technology, India)
Abstract: Algorithmic systems that exploit large datasets are increasingly being used to assist or even replace human decision making, a fact that has raised various concerns about the trustworthiness of such systems. Algorithmic fairness addresses such concerns. In this talk, we focus on fairness for graphs and specifically on PageRank fairness. PageRank assigns a score to each node in the graph that signifies the importance of the node. Given that nodes belong to groups, we study fairness in terms of the distribution of the scores assigned to each group. We first present fairness-aware PageRank algorithms that achieve fairness with minimal loss with respect to the original PageRank. Then, we take a different approach where, instead of modifying the PageRank algorithm, we ask how to improve the graph. Concretely, we derive analytical formulas for the contribution of each edge to fairness and use them to design link recommendation algorithms for maximizing fairness. We present experimental results using various real and synthetic graphs. We conclude the talk with a list of open problems and promising research directions for algorithmic fairness for both graph and relational data.
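As an illustration of the group-level fairness notion above, the following Python sketch computes PageRank by power iteration and the fraction of the total score that lands on one group of nodes. It is a sketch under stated assumptions, not the speaker's algorithm: the damping factor 0.85, the uniform redistribution of score from nodes without out-edges, and the helper names `pagerank` and `group_share` are choices made here for illustration. A fairness criterion could then compare `group_share` with a target proportion, such as the group's share of nodes.

```python
from typing import Dict, List, Set


def pagerank(adj: Dict[int, List[int]], damping: float = 0.85, iters: int = 100) -> Dict[int, float]:
    """Power iteration; adj maps every node to its list of out-neighbours."""
    nodes = list(adj)
    n = len(nodes)
    pr = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Teleportation: every node receives (1 - damping) / n.
        nxt = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            out = adj[u]
            if out:
                # Split u's score evenly across its out-edges.
                share = damping * pr[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Dangling node (assumption): spread its score uniformly.
                for v in nodes:
                    nxt[v] += damping * pr[u] / n
        pr = nxt
    return pr


def group_share(pr: Dict[int, float], group: Set[int]) -> float:
    """Fraction of the total PageRank mass assigned to the nodes in `group`."""
    return sum(pr[u] for u in group) / sum(pr.values())
```

On a directed 4-cycle, every node gets a score of 0.25 (up to floating-point rounding), so the group {0, 1} receives half of the mass. If instead nodes 1, 2, and 3 all point to node 0 and node 0 points to node 1, node 0 alone receives roughly 0.48. Adding or removing edges shifts these shares, which is the lever that link recommendation works with.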
Bio: Evaggelia Pitoura is a Professor at the University of Ioannina, Greece. She received a BSc degree from the University of Patras, Greece, and MS and PhD degrees from Purdue University, USA. Her research interests are in data management, with a recent emphasis on data exploration, graphs, and responsible data management. Her publications include more than 150 articles in top-tier international journals and conferences. Her research has been funded by the EC and national sources. She has served or serves on the editorial boards of ACM TODS, VLDBJ, TKDE, and DAPD, and as a group leader, senior PC member, or co-chair of many international conferences (including PC chair of EDBT 2016 and co-chair of ICDE 2012). She has received three best paper awards (ICDE 1999, DBSocial 2013, PVLDB 2013), a Marie Curie Fellowship, and two Recognition of Service Awards from ACM. She is an ACM senior member, chair of the Greek ACM-W event steering committee, and member of the sectorial scientific council of Greece's National Council for Research, Technology and Innovation.
(#73)RingBFT: Resilient Consensus over Sharded Ring Topology, Sajjad Rahnama (University of California, Davis, US), Suyash Gupta (University of California, Berkeley, US), Rohan Sogani (University of California, Davis, US), Dhruv Krishnan (University of California, Davis, US) and Mohammad Sadoghi (University of California, Davis, US)
Session Co-Chairs: Block A: Thomas Schwentick (Technical University Dortmund, Germany) and Block B: Yufei Tao (Chinese University of Hong Kong, Hong Kong)
(#1)Robustness Against Read Committed for Transaction Templates with Functional Constraints, Brecht Vandevoort (Hasselt University, Belgium and Transnational University of Limburg, Belgium), Bas Ketsman (Vrije Universiteit Brussel, Belgium), Christoph Koch (École Polytechnique Fédérale de Lausanne, Switzerland) and Frank Neven (Hasselt University, Belgium and Transnational University of Limburg, Belgium)
(#21)A Dyadic Simulation Approach to Efficient Range-Summability, Jingfan Meng (Georgia Institute of Technology, US), Huayi Wang (Georgia Institute of Technology, US), Jun Xu (Georgia Institute of Technology, US) and Mitsunori Ogihara (University of Miami, US)
(#7)SAHARA: Memory Footprint Reduction of Cloud Databases with Automated Table Partitioning, Michael Brendle (University of Konstanz, Germany), Nick Weber (Celonis SE, Germany), Mahammad Valiyev (Technical University of Munich, Germany), Norman May (SAP SE, Germany), Robert Schulze (-), Alexander Boehm (SAP SE, Germany), Guido Moerkotte (University of Mannheim, Germany) and Michael Grossniklaus (University of Konstanz, Germany)
(#127)Columnar Storage Optimization and Caching for Data Lakes, Guodong Jin (Renmin University of China, China), Haoqiong Bian (EPFL, Switzerland), Yueguo Chen (Renmin University of China, China) and Xiaoyong Du (Renmin University of China, China)
Session Co-Chairs: Block A: Dan Olteanu (University of Zurich, Switzerland) and Block B: Ke Yi (Hong Kong University of Science and Technology, Hong Kong)
Abstract: Cardinality estimation is among the most important problems in query optimization. It is well known that, when query plans go haywire, in most cases one can trace the root cause to the cardinality estimator being far off. In particular, traditional cardinality estimation based on selectivity estimation may sometimes underestimate cardinalities by orders of magnitude, because the independence or uniformity assumptions typically do not hold. This talk outlines an approach to cardinality estimation that is "model-free" from a statistical standpoint. Being model-free means that the approach tries to avoid making any distributional assumptions, as much as possible. The approach is information-theoretic and generalizes recent results on worst-case output size bounds of queries, allowing the estimator to take into account histogram information from the input relations. The estimator turns out to be the objective of a maximization problem subject to concave constraints, over an exponential number of variables. We then explain how the estimator can be computed in polynomial time for a fragment of these constraints. Overall, the talk introduces a new direction to address the classic problem of cardinality estimation that is designed to circumvent some of the pitfalls of selectivity-based estimation. We will also present connections to information inequalities. This talk is based on (published and unpublished) joint works with Mahmoud Abo Khamis, Sungjin Im, Hossein Keshavarz, Phokion Kolaitis, Ben Moseley, XuanLong Nguyen, Kirk Pruhs, and Dan Suciu.
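The failure mode named above can be reproduced on a toy table. The Python sketch below is an illustration only, not the model-free estimator of the talk: it multiplies per-predicate selectivities, as an independence-based estimator does, and compares the result with the true count on two perfectly correlated columns. The table, the column names `make` and `model`, and the helper functions are made up for this example.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def selectivity(rows: List[Row], pred: Predicate) -> float:
    """Fraction of rows satisfying a single predicate."""
    return sum(1 for r in rows if pred(r)) / len(rows)


def independence_estimate(rows: List[Row], preds: List[Predicate]) -> float:
    """Estimate under independence: |T| times the product of per-predicate selectivities."""
    estimate = float(len(rows))
    for pred in preds:
        estimate *= selectivity(rows, pred)
    return estimate


def true_count(rows: List[Row], preds: List[Predicate]) -> int:
    """Exact number of rows satisfying all predicates."""
    return sum(1 for r in rows if all(p(r) for p in preds))


# Toy table in which `model` fully determines `make` (strong correlation).
rows = [{"make": "Honda", "model": "Civic"}] * 10 + [{"make": "Ford", "model": "F150"}] * 90
preds = [lambda r: r["make"] == "Honda", lambda r: r["model"] == "Civic"]
print(independence_estimate(rows, preds), true_count(rows, preds))
```

Each predicate has selectivity 0.1, so the independence estimate is about 1 row, whereas 10 rows actually qualify: a tenfold underestimate from just two predicates, and such errors compound as more correlated predicates or joins are added.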
Bio: Hung Ngo is the VP of Research at RelationalAI Inc., where he also leads the query optimization group. Before RelationalAI, he worked at LogicBlox Inc from 2015 to 2017, and was a professor of Computer Science and Engineering at the State University of New York at Buffalo from 2001 to 2015. His current research focus is the design and analysis of query evaluation and optimization algorithms for Rel, a Turing-complete declarative programming language developed at RelationalAI. Ngo received best paper awards at PODS 2012, PODS 2016, and ICDT 2019.
Session Co-Chairs: Block A: Emanuel Sallinger (TU Vienna, Austria) and Xiaofang Zhou (HKUST, Hong Kong) and Block B: Alin Deutsch (University of California San Diego, USA) and Bill Howe (University of Washington, USA)
Session Co-Chairs: Block A: Egor Kostylev (University of Oslo, Norway) and Block B: Markus Krötzsch (TU Dresden, Germany)
(#14)Inference of Shape Graphs for Graph Databases, Sławek Staworko (INRIA, France), Benoit Groz (Paris Sud University, France), Aurélien Lemay (INRIA, France) and Piotr Wieczorek (University of Wroclaw, Poland)
(#42)Expressiveness of SHACL Features, Bart Bogaerts (Vrije Universiteit Brussel, Belgium), Maxime Jakubowski (Hasselt University, Belgium) and Jan Van den Bussche (Hasselt University, Belgium)
(#27)TransER: Homogeneous Transfer Learning for Entity Resolution, Nishadi Kirielle (The Australian National University, Australia), Peter Christen (The Australian National University, Australia) and Thilina N Ranbaduge (The Australian National University, Australia)
(#86)Unsupervised Graph-based Entity Resolution for Accurate and Efficient Family Pedigree Search, Nishadi Kirielle (The Australian National University, Australia), Charini V Nanayakkara (The Australian National University, Australia), Peter Christen (The Australian National University, Australia), Chris Dibben (University of Edinburgh, UK), Lee Williamson (University of Edinburgh, UK), Eilidh Garrett (University of Edinburgh, UK) and Clair Manson (Public Health Scotland, UK)
(#122)Spatially Combined Keyword Searches, Artur Titkov (Johannes Gutenberg University of Mainz, Germany) and Panagiotis Bouros (Johannes Gutenberg University of Mainz, Germany)
Abstract: In this talk, we consider the problem of counting the solutions to a query. Our first motivating scenario is the use of regular expressions to extract paths from a graph database. More specifically, given a graph database D, a regular expression r and a natural number n, consider the problem of counting the number of paths p in D such that p conforms to r and the length of p is n. This problem is known to be hard, namely #P-complete. In this talk, we show that this problem admits a fully polynomial-time randomized approximation scheme (FPRAS). Remarkably, the key idea to prove this result is to show that the fundamental problem #NFA admits an FPRAS, where #NFA is the problem of counting the number of strings of length n accepted by a non-deterministic finite automaton (NFA). While this problem is known to be #P-complete and, more precisely, SpanL-complete, it was open whether this problem admits an FPRAS. In this work, we solve this open problem and obtain as a welcome corollary that every function in SpanL admits an FPRAS.
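The gap between counting runs and counting strings is what makes #NFA hard. The Python sketch below is only an exact baseline for intuition, not the FPRAS from the talk: `count_runs` counts accepting runs of length n with a polynomial-time dynamic program, which overcounts every string that has several accepting runs, while `count_strings` counts distinct strings exactly by determinizing on the fly (subset construction), at a cost that can grow exponentially with the number of states. Encoding the NFA as a dictionary from (state, symbol) pairs to sets of states is an assumption made for this example.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical encoding: delta[(state, symbol)] is the set of successor states.
Delta = Dict[Tuple[int, str], Set[int]]


def count_runs(delta: Delta, alphabet: Iterable[str], start: int, finals: Set[int], n: int) -> int:
    """Number of accepting runs of length n (polynomial DP; overcounts ambiguous strings)."""
    ways = Counter({start: 1})
    for _ in range(n):
        nxt = Counter()
        for q, w in ways.items():
            for a in alphabet:
                for p in delta.get((q, a), ()):
                    nxt[p] += w
        ways = nxt
    return sum(w for q, w in ways.items() if q in finals)


def count_strings(delta: Delta, alphabet: Iterable[str], start: int, finals: Set[int], n: int) -> int:
    """Number of distinct accepted strings of length n (exact, via on-the-fly subset construction)."""
    # Each string read so far leads to exactly one subset of states.
    ways = Counter({frozenset([start]): 1})
    for _ in range(n):
        nxt = Counter()
        for subset, w in ways.items():
            for a in alphabet:
                succ = frozenset(p for q in subset for p in delta.get((q, a), ()))
                if succ:  # strings reaching the empty set can never be accepted
                    nxt[succ] += w
        ways = nxt
    return sum(w for subset, w in ways.items() if subset & finals)


# Ambiguous NFA accepting strings that start with "a": after reading "a",
# state 0 may move to state 1 or state 2, and both then loop on every symbol.
delta = {(0, "a"): {1, 2}, (1, "a"): {1}, (1, "b"): {1}, (2, "a"): {2}, (2, "b"): {2}}
alphabet = ["a", "b"]
print(count_runs(delta, alphabet, 0, {1, 2}, 3), count_strings(delta, alphabet, 0, {1, 2}, 3))
```

For this NFA there are 2^(n-1) accepted strings of length n but 2^n accepting runs, because every accepted string can pass through either state 1 or state 2; for n = 3 the two functions return 8 and 4.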
As a second motivating scenario, we consider the widely used class of conjunctive queries over relational databases. More specifically, for every class C of conjunctive queries with bounded treewidth, we introduce the first FPRAS for counting the answers to a query in C. In fact, our FPRAS is more general, and also applies to conjunctive queries with bounded hypertree width, as well as unions of such queries. As for the case of graph databases, the key ingredient in our proof is the resolution of a fundamental counting problem from automata theory. Specifically, we show that the problem #TA admits an FPRAS, where #TA is the problem of counting the number of trees of size n accepted by a tree automaton (TA).
This is joint work with Luis Alberto Croquevielle, Rajesh Jayaram and Cristian Riveros.
Bio: Marcelo Arenas is a Professor at the Department of Computer Science and the Institute for Mathematical and Computational Engineering, at the Pontificia Universidad Católica de Chile. He is the director of the Millennium Institute for Foundational Research on Data and the former director of the Center for Semantic Web Research. He received a Ph.D. from the University of Toronto in 2005. His research interests are in the areas of data management, applications of logic in computer science and Semantic Web. He has received an IBM Ph.D. Fellowship (2004), a SIGMOD Jim Gray Doctoral Dissertation Award Honorable Mention in 2006 for his Ph.D. dissertation "Design Principles for XML Data", the 2016 Semantic Web Science Association (SWSA) Ten-Year Award for the article "Semantics and Complexity of SPARQL" and nine best paper awards (PODS 2003, PODS 2005, ISWC 2006, ICDT 2010, ESWC 2011, PODS 2011, WWW 2012, ISWC 2014 and PODS 2019). He has served on multiple program committees and editorial boards, and he has chaired the program committees of ICDT 2015, ISWC 2015 and PODS 2018.
Learned Query Optimizer: At the Forefront of AI-Driven Databases, Rong Zhu (Alibaba Group, China), Ziniu Wu (MIT, US), Chengliang Chai (Tsinghua University, China), Andreas Pfadler (Alibaba Group, China), Bolin Ding (Alibaba Group, China), Guoliang Li (Tsinghua University, China) and Jingren Zhou (Alibaba Group, China)
Abstract: Applying ML-based techniques to optimize traditional databases, or AI4DB, has become a hot research topic in recent years. Learned techniques for the query optimizer (QO) are at the forefront of AI4DB. QO provides the most suitable testbed for ML techniques, and learned QO has shown its superiority with ample evidence. In this tutorial, we aim to provide a broad and deep review and analysis of learned QO, covering algorithm design, real-world applications, and system deployment. For algorithms, we introduce the advances in learning each individual component of QO, as well as the whole QO module. For systems, we analyze the challenges of, and some attempts at, deploying ML-based QO into actual DBMSs. Based on these, we summarize some design principles and point out several future directions. We hope this tutorial will inspire and guide researchers and engineers working on learned QO, as well as on other areas of AI4DB.
Bio: Rong Zhu is a research scientist in the Data Analytics and Intelligence Lab, Alibaba Damo Academy. He obtained his Ph.D. and B.S. degrees from Harbin Institute of Technology in 2019 and 2013, respectively. His research interests lie in intelligent databases (AI4DB), graph data mining, and graph processing systems. He was nominated for the China Computer Federation (CCF) Outstanding Doctoral Dissertation Award in 2020.
Ziniu Wu is a PhD candidate in CSAIL, MIT. He received a Master's degree in Computer Science from Oxford University in 2020. His research focuses on databases and ML for systems.
Chengliang Chai is a postdoc in the Department of Computer Science, Tsinghua University. He received his PhD degree in Computer Science from Tsinghua University in 2020. His research interests lie in crowdsourcing data management, data preparation, and AI & DB co-optimization. He is a member of the 2021 Forbes China 30 Under 30 list.
Andreas Pfadler is a Senior Algorithm Engineer at the Data Analytics and Intelligence Lab of Alibaba’s Damo Academy. His research focuses on the intersection of ML, systems, databases, and graphs. Before that, he served as Head of the Perception Computing Lab at Talking Data, Beijing. Earlier in his career, he worked as a consultant and managing consultant specializing in risk management, front office technology, and quantitative development for financial institutions. He holds a PhD and a Diploma in mathematics (TUM and TU Berlin) and an MSc in financial mathematics (Oxford).
Bolin Ding is a research scientist in the Data Analytics and Intelligence Lab, Alibaba Damo Academy. Prior to joining Alibaba, he was a researcher in the DMX group at Microsoft Research. He completed his PhD in Computer Science at the University of Illinois at Urbana-Champaign in 2012. His research centers on large-scale data management and analytics, with a focus on data privacy for databases, AI for systems, and query optimization.
Guoliang Li is a professor in the Department of Computer Science, Tsinghua University. He received his PhD degree in Computer Science from Tsinghua University in 2009. His research interests mainly include database systems, data cleaning and integration, crowdsourcing, and AI & DB co-optimization. He received the VLDB 2017 Early Research Contribution Award, the TCDE 2014 Early Career Award, the CIKM 2017 Best Paper Award, and Best Paper recognitions at KDD 2018 and ICDE 2018.
Jingren Zhou is Senior Vice President at Alibaba and Deputy Director of Alibaba DAMO Academy. He has managed several core technical divisions at Alibaba to drive data intelligence infrastructure and applications in e-commerce and cloud business, including big data and AI infrastructure, search & recommendation, and advertising platforms. His research interests include cloud computing, databases, and large-scale machine learning. He received his PhD in Computer Science from Columbia University. He is an IEEE Fellow.
(#22)Workload-Aware Materialization of Junction Trees, Martino Ciaperoni (Aalto University, Finland), Cigdem Aslay (Aarhus University, Denmark), Aristides Gionis (KTH Royal Institute of Technology, Sweden) and Michael Mathioudakis (University of Helsinki, Finland)
(#199)Distributed Training of Knowledge Graph Embedding Models using Ray, Nasrullah Sheikh (IBM Research - Almaden Lab, US), Xiao Qin (IBM Research - Almaden Lab, US), Yaniv Gur (IBM Research - Almaden Lab, US) and Berthold Reinwald (IBM Research - Almaden Lab, US)
Abstract: The standard approach to algorithm development is to focus on a specific problem and develop a specific algorithm for it. Codd's introduction of the relational model in 1970 included two fundamental ideas: (1) relations provide a universal data representation formalism, and (2) relational databases can be queried using first-order logic. Realizing these ideas required the development of a meta-algorithm, which takes a declarative query and executes it with respect to a database. In this talk, I will describe this approach, which I call Logical Algorithmics, in detail and explore its profound ramifications.
Bio: Moshe Y. Vardi is University Professor and the George Distinguished Service Professor in Computational Engineering at Rice University. He is the recipient of several awards, including the ACM SIGACT Goedel Prize, the ACM Kanellakis Award, the ACM SIGMOD Codd Award, the Knuth Prize, the IEEE Computer Society Goode Award, and the EATCS Distinguished Achievements Award. He is the author or co-author of over 750 papers, as well as two books. He is a Guggenheim Fellow, a fellow of several societies, and a member of several academies, including the US National Academy of Engineering and the National Academy of Sciences. He holds seven honorary doctorates. He is a Senior Editor of the Communications of the ACM, the premier publication in computing.
Abstract: Training AI models can be viewed as an interaction between human intelligence and model intelligence. From this view, current norms of model training constrain this interaction to crowdsourcing label annotations for a closed set. Such constrained interactions may explain why models learn to solve datasets instead of pursuing true learning goals. This talk discusses our recent work, showing how data intelligence research is relevant to enriching interactions between human and model intelligence, for robust training of NLP models that generalize well. More details can be found at http://seungwonh.github.io.
Bio: Seung-won Hwang is a Professor of Computer Science and Engineering at Seoul National University. Prior to joining SNU, she was a faculty member at POSTECH and Yonsei University, after receiving her PhD from UIUC. Her research interests concern the interaction between data and language intelligence. Her work has been published at top-tier AI, DB/DM, and IR/NLP venues, including ACL, AAAI, IJCAI, NAACL, SIGIR, SIGMOD, VLDB, and ICDE. Her contributions have been recognized by awards from WSDM and Microsoft Research.
Abstract: Genomics is an extremely complex domain, in terms of concepts, their relations, and their representation in data. The proposed tutorial will introduce the use of conceptual models and databases in the context of genomic systems: these are of great help for simplifying the domain and making it actionable. We will focus on two relevant and current fields: 1) Next-Generation Sequencing (NGS) human genomes and 2) SARS-CoV-2 viral sequences collected during the COVID-19 pandemic.
Genomic experiments and sequences are described by various metadata, specifying information on the sampled organism, on the technology used, and on the organizational process behind the experiment. Challenges in representation and computation also concern region data, i.e., the actual portions of the genome that have been read by sequencing technologies and encoded into a machine-readable representation.
For both domains, we review successful models presented in the literature for representing biologically relevant entities and for grounding them in databases (both for region data and metadata). Then, we use the proposed models to design database structures, provisioning pipelines, and required optimizations. Finally, we present a number of search systems, visualizers, and analysis environments built upon such databases.
Both NGS human genomics and SARS-CoV-2 genomics offer several use cases and applications of broad public interest. The tutorial demonstrates the practical use of conceptual modeling principles within an interdisciplinary domain, laying the groundwork for collaboration with a wider audience (possibly including all life science researchers).
Bio: Anna Bernasconi obtained her PhD cum laude at Politecnico di Milano, within the “Data-driven Genomic Computing” ERC-awarded project (2016-2021), under the supervision of Professor Stefano Ceri. She is now a postdoctoral researcher in the Department of Electronics, Information, and Bioengineering at Politecnico di Milano and a visiting researcher at Universitat Politècnica de València. Her research focuses on conceptual modeling, data integration, semantic web, and biological data analysis. Since the COVID-19 pandemic, her research has focused on viral genomics, by building models, databases, and Web search systems for viral sequences and their variants. She is active in the conceptual modeling community, with several paper presentations and the organization of two workshops on conceptual models and web applications for life sciences (ER and ICWE conferences).
Pietro Pinoli works as a Research Fellow and lecturer at the Department of Electronics, Information and Bioengineering at the Politecnico di Milano (Italy). He received his PhD cum laude in 2017, with a thesis titled “Modeling and Querying Genomic Data”, in which he proposed and benchmarked data structures and algorithms to manage, search, and process huge collections of genomic datasets by means of cloud and distributed technologies. He was a visiting PhD student at Harvard University (Cambridge, MA, US). His research interests include bioinformatics and computational biology, databases and data management, big data technology and algorithms, machine learning and natural language processing, and drug repurposing. He participated in the Italian PRIN GenData, ERC GeCo, and EIT VirusLab projects.
Session Co-Chairs: Block A: Maya Ramanath (IIT Delhi, India) and Block B: Romila Pradhan (Purdue University, USA)
(#32)SURAGH: Syntactic Pattern Matching to Identify Ill-Formed Records, Mazhar Hameed (Hasso Plattner Institute, Germany), Gerardo Vitagliano (Hasso Plattner Institute, Germany), Lan Jiang (Hasso Plattner Institute, Germany) and Felix Naumann (Hasso Plattner Institute, Germany)
(#117)Fine-Tuning Dependencies with Parameters, Alireza Vezvaei (University of Waterloo, Canada), Lukasz Golab (University of Waterloo, Canada), Mehdi Kargar (Ryerson University, Canada), Divesh Srivastava (AT&T Chief Data Office, US), Jaroslaw Szlichta (Ontario Tech University, Canada) and Morteza Zihayat (Ryerson University, Canada)
Session Co-Chairs: Block A: Ke Yang (University of Massachusetts, Amherst) and Block B: Amir Gilad (Duke University, USA)
(#150)DP-Shield: Face Obfuscation with Differential Privacy, Muhammad Usama Saleem (University of North Carolina at Charlotte, US), Dominick Reilly (University of North Carolina at Charlotte, US) and Liyue Fan (University of North Carolina at Charlotte, US)
(#143)An Extensive and Secure Personal Data Management System using SGX, Robin Carpentier (INRIA & University of Versailles Saint-Quentin, France), Floris Thiant (INRIA, France), Iulian Sandu Popa (INRIA & University of Versailles Saint-Quentin, France), Nicolas Anciaux (INRIA, France) and Luc Bouganim (INRIA, France)
(#49)PReVer: Towards Private Regulated Verified Data [Vision Paper], Mohammad Javad Amiri (University of Pennsylvania, US), Tristan Allard (Univ Rennes, CNRS, IRISA, France), Divy Agrawal (University of California, Santa Barbara, US) and Amr El Abbadi (University of California, Santa Barbara, US)
(#92)Masked Language Models as Stereotype Detectors?, Yacine Gaci (Université Claude Bernard Lyon 1, France), Boualem Benatallah (UNSW Sydney, Australia), Fabio Casati (University of Trento, Italy) and Khalid Benabdeslem (Université Claude Bernard Lyon 1, France)
Session Co-Chairs: Genoveva Vargas-Solar (CNRS, France) and Panos Chrysanthis (University of Pittsburgh, USA)
Abdesalam Soudi (University of Pittsburgh, USA), Renata Borovica-Gajic (University of Melbourne, Australia), Sourav Bhowmick (Nanyang Technological University, Singapore), Tania Cerquitelli (Politecnico di Torino, Italy), Marta Mattoso (COPPE/UFRJ, Brazil), and Paola Ricaurte Quijano (ITESM, Mexico)
Abstract: The EDBT/ICDT community believes that diversity and a culture of support encourage the retention and attraction of talent, promote diversity of thought and perspective, and help make the scientific community more flexible and responsive in times of change. However, our conferences still lack diverse voices, and we want to identify the critical barriers to building an inclusive database community.
Therefore, the EDBT/ICDT 2022 panel on Diversity and Inclusion (D&I) is opening its borders to explore the meaning and perception of D&I in scientific contexts across different regions. The EDBT/ICDT D&I panel seeks to bring other voices to the table to gain a broader perspective, address challenges, and see solutions from various angles (gender, race, age, experience level, industry, seniority, etc.). The key guiding question is:
What does D&I mean in Europe, Latin America, the Middle East, the north and south of Africa, Asia, Oceania, etc.?
The panel will gather voices from scientists who lead actions and integrate D&I policies into their environments.
Bio:
Dr. Abdesalam Soudi earned his PhD in sociolinguistics from the University of Pittsburgh. His research focuses on the human-computer interface in doctors’ consultations. In 2014, he completed the Speaking in the Disciplines Fellowship. He won the inaugural 2017 Diversity in the Curriculum Award for his success in creating a diverse and inclusive learning environment for his students. In 2018, Dr. Soudi won a first-ever Pitt Seed award for a proposal to build an engagement platform connecting his program to the community and the tech industry. Dr. Soudi holds a full-time faculty appointment with the Department of Linguistics at the University of Pittsburgh. He is also a faculty affiliate with the Global Studies and African Studies Centers, and a Faculty Fellow with the Pitt Honors College. Dr. Soudi co-directs a master's-level cultural competency course, and he serves as the program director for the Linguistic Internship at Pitt. Dr. Soudi spearheads the Humanities in Health (HinH) initiative at Pitt and has chaired five conferences on this topic. Dr. Soudi has published his research in several journals and magazines. Recently, he co-edited a volume on Diversity Across the Disciplines. The book investigates diversity across disciplines with attention to people, processes, policies, and paradigms.
Renata Borovica-Gajic holds the position of Senior Lecturer in Data Analytics in the School of Computing and Information Systems at The University of Melbourne. Dr Borovica-Gajic received her Ph.D. degree in Computer Science from the Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland. Renata's research focuses on solving data management problems when storing, accessing, and processing massive data sets, enabling faster, more predictable, and cheaper data analysis. Her work has repeatedly appeared in the premier data management conferences, including SIGMOD, VLDB, and ICDE. In addition to doing research, Renata serves as Diversity and Inclusion Lead for the School. In this role, she develops a strategic program and leads numerous initiatives to increase the sense of belonging of staff and students.
Sourav S Bhowmick is an Associate Professor at the Nanyang Technological University, Singapore. His core research interests lie in data management, human-data interaction, and data analytics. Recently, he has become interested in data-driven solutions to detect and manage conflicts of interest and bias in peer-review systems. He is the inventor of CLOSET, a comprehensive software tool for the detection and management of COI that is currently being used in multiple conferences. He is a co-recipient of Best Paper Awards at ACM CIKM 2004, ACM BCB 2011, and VLDB 2021 for work on mining the structural evolution of tree-structured data, generating functional summaries, and scalable attributed network embedding, respectively. Sourav is serving as a member of the SIGMOD Executive Committee, a regular member of the PVLDB advisory board, and a co-lead of the committee for Diversity and Inclusion in Database Conference Venues. He is a co-recipient of the 2018 VLDB Service Award from the VLDB Endowment. He was inducted into Distinguished Members of the ACM in 2020. He likes to waste his time painting and sketching.
Tania Cerquitelli is an associate professor at the Department of Control and Computer Engineering at Politecnico di Torino, Italy. Her research interests include novel and effective techniques for explaining black-box models, algorithms for democratizing data science methods, self-learning methods, and innovative data science algorithms for industrial applications. Tania is Chair of the Gender Equality, Non-Discrimination and Anti-Harassment Committee and a member of the Gender Observatory at Politecnico di Torino. Tania is an associate editor of Computer Networks, Future Generation Computer Systems, Knowledge and Information Systems, and Expert Systems with Applications.
Marta Mattoso is a Full Professor at COPPE - Federal University of Rio de Janeiro. Her topics of interest in Data Science include aspects of large-scale data management, among them provenance data to support human analysis during the parallel execution of many computational tasks in high-performance environments. She has supervised more than 80 graduate students. She is a CNPq research productivity fellow (level 1B). Her research is applied to real problems, addressing scientific experiments in workflows in the area of Computational Science, including deep machine learning. She has coordinated research projects funded by CNPq, CAPES, and Faperj, as well as international collaboration projects with INRIA, France, since 2001. She is currently a Mercator Fellow at DFG, Germany, and a member of the board of experts of the WorkflowsRI project in the U.S. She is a member of ACM and IEEE and a founding member of the Brazilian Computer Society. She serves on program committees of international conferences and is a member of the editorial boards of several international journals.
Mirella M. Moro is an associate professor at the Computer Science department at UFMG (Belo Horizonte, Brazil). She holds a Ph.D. in Computer Science (University of California Riverside - UCR, 2007), as well as MSc and BSc degrees in Computer Science (UFRGS, Brazil, 2001, 1999). She is a member of the ACM SIGMOD, ACM SIGCSE, ACM-W, IEEE, IEEE WIE, SBC, and MentorNet. She was part of the ACM Education Council (2011-2019), Education Director of SBC (Brazilian Computer Society, 2009-2015), and the editor-in-chief of the electronic magazine SBC Horizontes (2008-2012). Mirella has been conducting research in Computer Science in the area of Databases since 1997. Her research interests include gender diversity, social networks, scientometrics, and query optimization.
Paola Ricaurte is an associate professor in the Department of Media and Digital Culture at Tecnológico de Monterrey and a faculty associate at the Berkman Klein Center for Internet & Society at Harvard University. With Nick Couldry and Ulises Mejías, she co-founded Tierra Común, a network of scholars, practitioners, and activists interested in decolonizing data.
Panos K. Chrysanthis is a Professor of Computer Science and a founder and director of the Advanced Data Management Technologies Laboratory at the University of Pittsburgh. He is also an adjunct Professor at Carnegie-Mellon University and at the University of Cyprus. He received his BS degree (Physics with a concentration in Computer Science, 1982) from the University of Athens, Greece. He earned his MS and PhD degrees (Computer and Information Sciences, 1986 and 1991) from the University of Massachusetts at Amherst. His research interests lie within the areas of data management (Big Data, Databases, Data Streams & Sensor/IoT networks), data analytics, exploration & visualization, distributed & mobile computing, operating systems, and real-time systems. In 1995, he was a recipient of the U.S. National Science Foundation CAREER Award for his investigation on the management of data for mobile and wireless computing. His editorial service includes VLDB J (2001-2007), IEEE TKDE (2012-2017), and DAPD (2011-present), and his recent conference service includes PC Chair of IEEE ICDE (2018), IEEE MDM (2021), and ACM DEBS (2022). He is involved in the Diversity & Inclusion DB initiative, serving as the D&I Chair for EDBT 2021-2023, MDM 2021-2022, and DEBS 2022. Chrysanthis is an ACM Distinguished Scientist, a Senior Member of IEEE, and an IEEE Computer Society Distinguished Contributor. He has been honored with seven teaching awards; among them, in 2015 he received the University of Pittsburgh's Provost Award for Excellence in Mentoring (doctoral students), and in 2019 the Outstanding Achievement in Education award from the School of Information and Computer Science, University of Massachusetts, Amherst.
Genoveva Vargas-Solar (http://www.vargas-solar.com) is a principal scientist of the French Council of Scientific Research (CNRS) and a member of the DataBase group of the Laboratory of Informatics in Image and Information Systems (LIRIS). She is a regular member of the Mexican Academy of Computing. She obtained her Habilitation à Diriger des Recherches (HDR - tenure) from the University of Grenoble. She obtained her first PhD degree in Computer Science at University Joseph Fourier and her second PhD degree in Literature at University Stendhal. She obtained her first master’s degree in computer science at University Joseph Fourier and her second master’s degree in Comparative Literature at University Stendhal. She did her undergraduate studies in Computer Systems Engineering at Universidad de las Américas in Puebla. She contributes to the construction of service-based database/data science management systems. She proposes query evaluation methodologies, algorithms, and tools for composing, deploying, and executing data science functions on just-in-time architectures (disaggregated data centres). Her research interests in Literature concern medieval literature, myth criticism, and myth analysis applied to different myths of origins. She promotes gender equality and diversity and inclusion (D&I) actions. She is a member of the gender equality committee at LIRIS, and she represents EDBT in the inter-conference group D&I databases. She leads the SINFONIA and JOWDISAI projects on women's work in AI and DS. She actively promotes scientific cooperation in Computer Science between Latin America and Europe, particularly between France and Mexico.
Session Co-Chairs: Block A: Varun Pandey (TU Berlin, Germany) and Block B: Raoni Lourenço (Accern, USA)
(#23)Mining Change Rules, Daniel Lindner (Hasso Plattner Institute, Germany), Franziska Schumann (Hasso Plattner Institute, Germany), Nicolas Alder (Hasso Plattner Institute, Germany), Tobias Bleifuß (Hasso Plattner Institute, Germany), Leon Bornemann (Hasso Plattner Institute, Germany) and Felix Naumann (Hasso Plattner Institute, Germany)
(#135)A Neural Approach to Forming Coherent Teams in Collaboration Networks, Radin Hamidi Rad (Ryerson University, Canada), Shirin Seyedsalehi (Ryerson University, Canada), Mehdi Kargar (Ryerson University, Canada), Morteza Zihayat (Ryerson University, Canada) and Ebrahim Bagheri (Ryerson University, Canada)
(#63)Evaluation of Algorithms for Interaction-Sparse Recommendations: Neural Networks don't Always Win [Experiments & Analysis], Yasamin Klingler (ZHAW Zurich University of Applied Sciences, Switzerland), Claude Lehmann (ZHAW Zurich University of Applied Sciences, Switzerland), João Pedro Monteiro (Veezoo, Switzerland), Carlo Saladin (Veezoo, Switzerland), Abraham Bernstein (University of Zurich, Switzerland) and Kurt Stockinger (ZHAW Zurich University of Applied Sciences, Switzerland)
(#59)Evaluating In-Memory Hash-Joins on Persistent Memory, Tobias Maltenberger (Hasso Plattner Institute, Germany), Till Lehmann (Hasso Plattner Institute, Germany), Lawrence Benson (Hasso Plattner Institute, Germany) and Tilmann Rabl (Hasso Plattner Institute, Germany)
(#87)Integrating the Orca Optimizer into MySQL, Arunprasad P Marathe (Huawei Technologies Canada, Canada), Shu Lin (Huawei Technologies Canada, Canada), Weidong Yu (Huawei Technologies Canada, Canada), Kareem El Gebaly (Huawei Technologies Canada, Canada), Paul Larson (Huawei Technologies Canada, Canada), and Calvin Sun (Huawei Technologies Canada, Canada)
(#10)ArrayQL Integration into Code-Generating Database Systems, Maximilian E Schüle (Technical University of Munich, Germany), Tobias Götz (Technical University of Munich, Germany), Alfons Kemper (Technical University of Munich, Germany) and Thomas Neumann (Technical University of Munich, Germany)
Session Chair: Amelie Marian (Rutgers University, USA)
From Cloud to Serverless: MOO in the new Cloud epoch, Michail Georgoulakis Misegiannis (National Technical University of Athens, Greece), Verena Kantere (National Technical University of Athens, Greece) and Laurent d'Orazio (Univ Rennes, CNRS, France)
Abstract: During the last 10 years, the volume of global data has risen more than tenfold. The commercial rise of cloud computing eased the process of storing, processing, and managing big data. Recently, the cloud evolved with the emergence of serverless computing platforms that offer an even more abstracted service model. The elasticity of cloud computing creates significant optimization problems, which can be tackled either as single-objective or as multi-objective optimization (MOO) problems. When it comes to data management, the two main MOO problems in a cloud computing environment are query optimization and task scheduling. Some of the techniques used for solving MOO problems in the cloud are the weighted sum model, evolutionary algorithms, and machine learning techniques. We propose a tutorial that highlights the main MOO problems of a cloud computing environment with regard to data management and evaluates the use of serverless computing for such problems. The tutorial will offer the audience a better understanding of current MOO challenges and applications in the cloud, while also giving them an overview of different solutions to such problems and the techniques used.
Bio: Michail Georgoulakis Misegiannis is a graduate of the School of Electrical and Computer Engineering at the National Technical University of Athens (NTUA). His research interests include query processing, query optimization and cost modeling, which formed the basis of his diploma thesis. He is also interested in trends and open problems in cloud computing, as well as in data management systems for modern hardware.
Verena Kantere is an Assistant Professor at the School of ECE of NTUA. She has previously been an Associate Professor at the School of Electrical Engineering and Computer Science at the University of Ottawa and a Maître d’Enseignement et de Recherche at the Centre Universitaire d’Informatique of the University of Geneva. She has also held a tenure-track junior assistant professor position at the Department of Electrical Engineering and Information Technology of the Cyprus University of Technology. She received a Diploma and a Ph.D. from NTUA and an M.Sc. from the Department of Computer Science at the University of Toronto. Dr. Kantere has been working on the provision of data management and services, as well as on query processing and optimization in large-scale systems, including cloud computing systems, distributed systems and hybrid systems. Her work focuses on properties of Big Data, the performance of Big Data analytics and multi-objective optimization.
Laurent d'Orazio has been a Professor at Univ Rennes, CNRS, IRISA since 2016. He received his PhD in computer science from Grenoble National Polytechnic Institute in 2007. He was an Associate Professor at Blaise Pascal University and LIMOS CNRS, Clermont-Ferrand, from 2008 to 2016. His research interests include (big) data algorithms and architectures, as well as distributed and parallel databases. He has published papers in Information Systems, SIGMOD Record, Concurrency and Computation: Practice and Experience, and EDBT. He has served on the program committees of BPM, workshops affiliated with VLDB, EDBT, and others. He has also served on the reviewing committees of Distributed and Parallel Databases, Transactions on Parallel and Distributed Systems, and Concurrency and Computation: Practice and Experience. He is or has been involved (sometimes as coordinator) in research projects such as the NSF MOCCAD project (since 2013), the ANR SYSEO project (797,000 euros in funding, 2010-2015) and the STIC ASIA GOD project (30,000 euros in funding, 2013-2015).
Session Co-Chairs: Block A: Ioana Manolescu (INRIA, France) and Block B: Julia Stoyanovich (New York University, USA)
(#44)Aggregation Detection in CSV Files, Lan Jiang (Hasso Plattner Institute, Germany), Gerardo Vitagliano (Hasso Plattner Institute, Germany), Mazhar Hameed (Hasso Plattner Institute, Germany) and Felix Naumann (Hasso Plattner Institute, Germany)
(#103)Voyager: Data Discovery and Integration for Data Science, Alex Bogatu (University of Manchester, UK), Norman Paton (University of Manchester, UK), Mark Douthwaite (Peak AI, UK) and André Freitas (University of Manchester, UK)
(#142)MM-infer: A Tool for Inference of Multi-Model Schemas, Pavel Koupil (Charles University, Czech Republic), Sebastián Hricko (Charles University, Czech Republic) and Irena Holubova (Charles University, Czech Republic)
(#124)Similarity-driven Schema Transformation for Test Data Generation, Fabian Panse (Universität Hamburg, Germany), Meike Klettke (Universität Rostock, Germany), Johannes Schildgen (Regensburg University of Applied Sciences, Germany) and Wolfram Wingerath (Baqend, Germany)
(#118)RoleSim+: A Fast Algorithm for RoleSim Similarity Search, Weiren Yu (University of Warwick, UK), Sima Iranmanesh (University of Warwick, UK), Xuming Hong (Nanjing University of Science and Technology, China) and Jianxun Xu (Nanjing University of Science and Technology, China)
(#57)A Supervised Skyline-Based Algorithm for Spatial Entity Linkage, Suela Isaj (Aalborg University, Denmark), Vassilis Kaffes (University of the Peloponnese, Greece), Torben Bach Pedersen (Aalborg University, Denmark) and Giorgos Giannopoulos (Athena Research Center, Greece)