Session Co-Chairs: Yongluan Zhou (University of Copenhagen), Panos K. Chrysanthis (University of Pittsburgh), Vincenzo Gulisano (Chalmers University of Technology) and Sihem Amer-Yahia (CNRS, Univ. Grenoble Alpes)
Abstract: Rising temperatures on Earth have alerted communities around the globe to the need for immediate solutions to help curb the severe effects of climate change, which is attributed to greenhouse gas emissions caused by human activities. Even though the Computing field is a source of emissions in its own right, it also has the potential to increase the efficiency of human workflows in all sectors, such as transportation, buildings, energy and heat production, industry, agriculture and livestock. In this panel discussion, we start with a general overview of terminology, factors, metrics, and objectives related to climate change, and then survey: (i) Green Conferences; (ii) Green Mobility; (iii) Green Cities; and (iv) Green Smart Spaces. The participants are expected to bring into the discussion their own perspectives from the academic, governmental, and industrial sectors, reporting on how they perceive the Computing field in a future shaped by climate change, and how we can all help achieve the goals of the Paris Agreement.
Bio: Antoine Amarilli is Associate Professor in Computer Science at Télécom Paris in the DIG team. His research focuses on data management and theoretical computer science. He is a maintainer of the TCS4F initiative on the climate crisis and of the 'No free view? No review!' manifesto on open access to scientific publications.
Christophe Claramunt is a professor in computer science at the French Naval Academy. His research addresses theoretical, pluridisciplinary and practical aspects of geographical information science (GIS). His main research interests lie in environmental, maritime and urban GIS. He has long contributed to the development of computing and GIS systems in developing countries and actively orients his research towards the development of green and environmentally friendly computing applications.
Demetrios Zeinalipour-Yazti is an Associate Professor of Computer Science at the University of Cyprus. His primary research interests include Data Management in Computer Systems and Networks. He actively engages in activities to help curb the climate crisis, including research and practice on green planning systems for the self-consumption of renewable energy and the development of virtual conference and tourism platforms to tackle the climate, COVID-19 and energy crises.
Abstract: Complex Event Recognition (CER) refers to the activity of detecting patterns in streams of continuously arriving “event” data over (geographically) distributed sources. CER is a key ingredient of many contemporary Big Data applications that require the processing of such event streams in order to obtain timely insights and implement reactive and proactive measures. Examples of such applications include the recognition of attacks on computer network nodes, human activities in video content, emerging stories and trends on the Social Web, traffic and transport incidents in smart cities, error conditions in smart energy grids, violations of maritime regulations, cardiac arrhythmias and epidemic spread. In each application, CER allows us to make sense of streaming data, react accordingly, and prepare counter-measures. In this tutorial, we will present the formal methods for CER, as they have been developed in the artificial intelligence community. To illustrate the reviewed approaches, we will use the domain of maritime situational awareness.
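To make the flavor of CER concrete, the short sketch below detects a toy sequence pattern over a maritime-style event stream: a vessel reporting low speed followed by an entry into a restricted area within ten minutes. This is a hand-rolled illustration only; the event schema, pattern, and threshold are hypothetical and do not reflect the formal CER languages covered in the tutorial.

```python
# Illustrative complex event recognition over a simple event stream:
# match the sequence (low_speed -> area_entry) for the same vessel
# within a 10-minute window. Event fields are hypothetical.
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)

def detect_suspicious_stops(events):
    """events: iterable of dicts with 'vessel', 'type', 'time' keys,
    ordered by time. Yields (vessel, stop_time, entry_time) matches."""
    pending = {}  # vessel -> time of its last low-speed report
    for e in events:
        if e["type"] == "low_speed":
            pending[e["vessel"]] = e["time"]
        elif e["type"] == "area_entry" and e["vessel"] in pending:
            start = pending.pop(e["vessel"])
            if e["time"] - start <= WINDOW:
                yield (e["vessel"], start, e["time"])

if __name__ == "__main__":
    stream = [
        {"vessel": "V1", "type": "low_speed", "time": datetime(2022, 6, 27, 10, 0)},
        {"vessel": "V1", "type": "area_entry", "time": datetime(2022, 6, 27, 10, 7)},
    ]
    for match in detect_suspicious_stops(stream):
        print("complex event recognized:", match)
```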
Bio: Alexander Artikis is an Associate Professor at the University of Piraeus (GR) and a Research Associate at NCSR Demokritos, leading the complex event recognition group. He holds a PhD from Imperial College London on Multi-Agent Systems, and his research interests lie in the area of Artificial Intelligence. He has published over 100 papers in related journals and conferences. Alexander has been developing complex event processing techniques in the context of several EU-funded Big Data projects, and he was the scientific coordinator of some of them. He has given tutorials on complex event processing at IJCAI, KR, VLDB and ECAI. In 2020, he co-organised the Dagstuhl seminar on the “Foundations of Composite Event Recognition”.
(#14)Optimizing Complex Event Forecasting, Vasileios Stavropoulos (National Centre of Scientific Research "Demokritos"); Elias Alevizos (NCSR'D)*; Nikos Giatrakos (Technical University of Crete); Alexander Artikis (University of Piraeus)
Abstract: In our increasingly connected world, where people are used to managing their lives via digital services, it has become mandatory for a successful company to build applications that can scale with the popularity of the company’s services. Scalability is not the only requirement: modern applications must also be highly available and fast, because users are not willing to wait in our ever faster-moving world. Due to this, we have seen a shift from the classic monolith towards microservice architectures, which promise to be more easily scalable. The emergence of serverless functions has further strengthened this trend more recently. By implementing a microservice architecture, application developers are all of a sudden exposed to the realm of distributed applications, with its seemingly limitless scalability but also its pitfalls nobody tells you about upfront. So instead of solving business domain problems, developers find themselves fighting with race conditions, distributed failures, inconsistencies and, in general, a drastically increased complexity. In order to solve some of these problems, people introduce endless retries, timeouts, sagas and distributed transactions. These band-aids can quickly result in a not-so-scalable system that is brittle and hard to maintain. The underlying problem is that developers are responsible for ensuring reliable communication and consistent state changes. Having a system that takes care of these aspects could drastically reduce the complexity of developing scalable distributed applications. By inverting the traditional control flow from application-to-database to database-to-application, we can put the database in charge of ensuring reliable communication and consistent state changes and thus free the developer from having to think about them. In this keynote, I want to explore the idea of putting the database in charge of driving the application logic using the example of Stateful Functions, a library built on top of Apache Flink that follows this idea. I will explain how Stateful Functions achieves scalability and consistency, but also what its limitations are. Based on these results, I would like to sketch the requirements for a runtime that can truly realise the full potential of Stateful Functions and discuss with you ideas on how it could be implemented.
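The toy sketch below illustrates the inversion of control described above: a runtime standing in for the database owns durable state and message delivery and invokes user-defined stateful functions, so the functions themselves never coordinate. This is a conceptual sketch only and does not use the actual Apache Flink Stateful Functions API; all names and structure are illustrative.

```python
# Conceptual sketch of database-to-application control flow: the runtime
# (standing in for the database) delivers messages and persists state;
# user code is a plain function invoked with its state and a message.
# Not the Stateful Functions API -- purely illustrative.
from collections import defaultdict, deque

class Runtime:
    def __init__(self):
        self.functions = {}                 # function name -> handler
        self.state = defaultdict(dict)      # (function, key) -> state dict
        self.mailbox = deque()              # pending (function, key, message)

    def register(self, name, handler):
        self.functions[name] = handler

    def send(self, name, key, message):
        self.mailbox.append((name, key, message))

    def run(self):
        # The runtime drives the application: it delivers each message
        # (in this toy, in order) and keeps the state the handler mutated.
        while self.mailbox:
            name, key, message = self.mailbox.popleft()
            state = self.state[(name, key)]
            self.functions[name](self, state, key, message)

def greeter(runtime, state, key, message):
    state["seen"] = state.get("seen", 0) + 1
    print(f"Hello {key}, message #{state['seen']}: {message}")

if __name__ == "__main__":
    rt = Runtime()
    rt.register("greeter", greeter)
    rt.send("greeter", "alice", "hi")
    rt.send("greeter", "alice", "hi again")
    rt.run()
```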
Bio: Till is a PMC member of Apache Flink and a co-founder of Ververica. During his time at Ververica, his work focused on enhancing Flink’s scalability and high availability. He also bootstrapped Flink’s CEP and initial ML libraries. Nowadays, Till focuses on making the development of distributed applications easier by applying stream processing techniques to this space.
Abstract: Freedom of the press is under threat worldwide, and the quality of information that people have access to is dangerously degraded, under the joint threats of non-democratic governments and fake information propagation. The press as an industry needs powerful data management tools to help them interpret the complex reality surrounding us. Since 2018, I have been cooperating with journalists from Le Monde, France’s leading newspaper, to devise tools for analyzing the large and heterogeneous data sources that they are interested in. This research has been embodied in ConnectionLens, a graph ETL tool capable of ingesting heterogeneous data sources into a graph, enriched (with the help of ML methods) with entities extracted from data of any type. On such integrated graphs, we devised novel algorithms for keyword search and, in more recent research, combined them with structured querying. The talk describes the architecture and main algorithmic challenges in building and exploiting ConnectionLens graphs, illustrated in particular by an application in which we study conflicts of interest in the biomedical domain. This is joint work with A. Anadiotis, O. Balalau, H. Galhardas and many others. ConnectionLens Web site (papers+code): https://team.inria.fr/cedar/connectionlens/. This research has been funded by Agence Nationale de la Recherche AI Chair SourcesSay (https://sourcessay.inria.fr).
Abstract: Meet AI - It’s a very strange time in its life. We have a tremendous amount of data at our disposal and a tremendous potential to do good, immense interest to build and widely deploy these systems and a very real impact (including irrevocable harm) associated with this technology. The landscape is rife with problems – incentive structures and gold-rush mentality in scholarship, celebrity culture and media hype, unhealthy extremes of techno-bashing and techno-optimism and the false dichotomy between “social problems” and “engineering problems”. Nuance and critical thinking are the most valuable, yet scarce commodities! A possible first step at self-correction could be for us – as practitioners and designers of these systems – to stop taking *ourselves* so seriously and instead direct this gravitas onto the *consequences* of our work (beyond citation counts and academic accolades). How, you ask? Using the marvelous world of comics! In this talk, I’ll present my thoughts (in comic form) on some of the pressing problems in the AI landscape (broadly defined) and attempt to motivate artistic interventions as a possible solution to catechisms of the scientific landscape that we take too seriously and perhaps need to rethink. A running theme through the talk will be the need to include diverse voices and methodologies, and to define the scope of impact more broadly — what difference does any of our work make if we can’t communicate it to the people that matter?
Bio: Falaah is a first-year Data Science PhD student at NYU, working with Prof. Julia Stoyanovich on the ‘fairness’ and ‘robustness’ of algorithmic systems. An engineer by training and an artist by nature, Falaah creates scientific comic books to bridge scholarship from different disciplines and to disseminate the nuances of her research in a way that is more accessible to the general public. She runs the ‘Data, Responsibly’ and ‘We are AI’ comic series with Prof. Julia Stoyanovich at NYU’s Center for Responsible AI, and the ‘Superheroes of Deep Learning’ comic series with Prof. Zack Lipton (CMU). Falaah holds an undergraduate degree in Electronics and Communication Engineering (with a minor in Mathematics) from Shiv Nadar University, India, and has industry experience in building machine learning models for access management and security at Dell EMC.
Abstract: Feature stores for machine learning are a new category of systems software that centralizes the management of data for AI, both for training models and for serving data to them. They solve problems related to ensuring consistent transformations of data between training and serving, the reuse of pre-engineered features across different models, preventing future data leakage in training data through point-in-time correct joins, and enabling collaboration between the different personas putting AI in production, including data engineers, data scientists, and ML engineers. In this tutorial, we will present the historical evolution of feature stores and the needs that drove their development. We will deep-dive into the first open-source feature store, Hopsworks, and show how you can use Hopsworks to build both an analytical ML application and an operational ML application. This will involve showing an end-to-end system that includes feature engineering, the feature store, model training, and pipeline orchestration. You will need experience in programming in Python; some knowledge of Pandas will be helpful, but no prior experience of machine learning is needed, just enthusiasm for building real-world machine learning systems.
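As a small illustration of one of the problems mentioned above, the snippet below performs a point-in-time correct join with plain pandas: each training label is matched only with the latest feature values observed at or before the label’s timestamp, so no future information leaks into the training set. It deliberately avoids the Hopsworks API; the column names and data are hypothetical.

```python
# A point-in-time correct join with pandas: for every label row, pick the
# latest feature row whose timestamp is <= the label timestamp, per entity.
import pandas as pd

labels = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "label_time": pd.to_datetime(["2022-06-01", "2022-06-10", "2022-06-05"]),
    "churned": [0, 1, 0],
}).sort_values("label_time")

features = pd.DataFrame({
    "customer_id": [1, 1, 2, 2],
    "feature_time": pd.to_datetime(["2022-05-28", "2022-06-08",
                                    "2022-06-02", "2022-06-09"]),
    "purchases_30d": [3, 1, 7, 2],
}).sort_values("feature_time")

# direction="backward" ensures only past (or same-time) feature values
# are joined to each label, preventing future data leakage.
training_set = pd.merge_asof(
    labels,
    features,
    left_on="label_time",
    right_on="feature_time",
    by="customer_id",
    direction="backward",
)
print(training_set)
```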
Bio: Jim Dowling is CEO of Hopsworks and an Associate Professor at KTH Royal Institute of Technology. He is one of the main developers of the open-source Hopsworks platform, a horizontally scalable data platform for machine learning that includes the industry’s first Feature Store. His research interests are in the areas of distributed file systems, decentralized systems, and systems support for real-time machine learning. Jim is a former Marie Curie Scholar, and has won awards for his research including the IEEE Scale Prize, awarded by CCGrid, for his work on the HopsFS file system. Jim is a regular speaker at industry conferences on data and AI and is currently writing a book on feature stores for Manning.
Abstract: Modern applications handle increasingly large volumes of data, generated at an unprecedented and constantly growing pace. They introduce challenges that are transforming all research fields that gravitate around data management and processing, resulting in a proliferation of distributed data-intensive systems. Each data-intensive system comes with its specific assumptions, data and processing model, design choices, implementation strategies, and guarantees. Yet, the problems data-intensive systems face and the solutions they propose frequently overlap. This tutorial presents a unifying model for data-intensive systems that dissects them into core building blocks, enabling a precise and unambiguous description and a detailed comparison. We show a list of classification criteria that derive from the model and use them to build a taxonomy of state-of-the-art systems. The tutorial aims to offer a global view of the vast research field of data-intensive systems, highlight interesting observations on the current state of the field, and suggest promising research directions.
Bio: Alessandro Margara is an associate professor at Politecnico di Milano. He obtained his PhD from Politecnico di Milano and worked as a postdoctoral researcher at the Vrije Universiteit (VU) Amsterdam and the Università della Svizzera italiana (USI). Alessandro’s research interests are in the area of software engineering and distributed systems. His research focuses on defining abstractions and building systems that simplify the design, development, and operation of complex distributed applications. Alessandro is a long-term member of the DEBS community and a regular member of the DEBS Program Committee. His DEBS 2010 paper received the DEBS 2020 Test of Time award, and his DEBS 2014 paper received the Best Paper award. Alessandro was DEBS 2021 General Co-Chair. He previously presented tutorials at DEBS 2011 and DEBS 2016; both had a similar goal and format to the proposed one, presenting a model and classification of heterogeneous software systems.
Session Chairs: Sebastian Frischbier (Infront Financial Technology GmbH), Arne Hormann (Infront Quant AG), Ruben Mayer (TU Munich), Jawad Tahir (TU Munich), and Christoph Doblander (TU Munich)
(#54)The DEBS 2022 Grand Challenge: Detecting Trading Trends in Financial Tick Data, Sebastian Frischbier (Infront Financial Technology GmbH); Jawad Tahir (Technical University of Munich)*; Christoph Doblander (Technical University of Munich); Arne Hormann (Infront Quant AG); Ruben Mayer (Technical University of Munich); Hans-Arno Jacobsen (University of Toronto)
(#26)A High-Performance Stream Processing System Implementation for Monitoring Stock Market Data Stream, Kevin A Li (The University of Texas at Austin); Daniel Fernandez (The University of Texas at Austin); David Klingler (The University of Texas at Austin); Yuhan Gao (The University of Texas at Austin); Jacob Rivera (The University of Texas at Austin); Kia Teymourian (The University of Texas at Austin)*
Abstract: In my 44 years building software, technology trends have dramatically changed what's easy and what's hard. In 1978, CPU, storage, and memory were precious and expensive, but coordinating across work was effectively free. Running on a single server, networking was infinitely expensive as we had none. Now, there's an abundance of computation, memory, storage, and network, with even more on the way! The only challenge is coordination. Year after year, the cost of coordination gets larger in terms of instruction opportunities lost while waiting. The first half of the talk explains these changes and their impact on our systems. In response, there are many approaches to avoiding or minimizing the pain of coordination. We taxonomize these solutions and discuss how our systems are evolving and likely to evolve as the world changes around us. I am, indeed, a person who's uncoordinated and very likely to drop and/or break stuff. I've adapted to that in my personal life and spend a great deal of my professional life looking for ways our systems can avoid the need to coordinate.
Bio: Pat Helland has been building distributed systems, database systems, high-performance messaging systems, and multiprocessors since 1978, shortly after dropping out of UC Irvine without a bachelor's degree. That hasn't stopped him from having a passion for academics and publication. From 1982 to 1990, Pat was the chief architect for TMF (Transaction Monitoring Facility), the transaction logging and recovery system for NonStop SQL, a message-based fault-tolerant system providing high-availability solutions for business-critical applications. In 1991, he moved to HaL Computers where he was chief architect for the Mercury Interconnect Architecture, a cache-coherent non-uniform memory architecture multiprocessor. In 1994, Pat moved to Microsoft to help the company develop a business providing enterprise software solutions. He was chief architect for MTS (Microsoft Transaction Server) and DTC (Distributed Transaction Coordinator). Starting in 2000, Pat began the SQL Service Broker project, a high-performance transactional exactly-once in-order message processing and app execution engine built deeply into Microsoft SQL Server 2005. From 2005 to 2007, he worked at Amazon on scalable enterprise solutions, scale-out user-facing services, integrating product catalog feeds from millions of sellers, and highly available, eventually consistent storage. From 2007 to 2011, Pat was back at Microsoft working on a number of projects including Structured Streams in Cosmos. Structured streams kept metadata within the "big data" streams that were typically tens of terabytes in size. This metadata allowed affinitized placement within the cluster as well as efficient joins across multiple streams. On launch, this doubled the work performed within the 250PB store. Pat also did the initial design for Baja, the distributed transaction support for a distributed event-processing engine implemented as an LSM atop structured streams, providing transactional updates targeting the ingestion of "the entire web in one table" with changes visible in seconds. Since 2012, Pat has worked at Salesforce on database technology running within cloud environments. His current interests include latency bounding of online enterprise-grade transaction systems in the face of jitter, the management of metastability in complex environments, and zero-downtime upgrades to databases and stateful applications. In his spare time, Pat regularly writes for ACM Queue, Communications of the ACM, and various conferences. He has been deeply involved in the organization of the HPTS (High Performance Transaction Systems - www.hpts.ws) workshop since 1985. His blog is at pathelland.substack.com and he parsimoniously tweets with the handle @pathelland.
Abstract: We are living in a data deluge era, where data is being generated by a large number of sources. This has only been exacerbated by the emergence of the Internet of Things (IoT). Nowadays, a large number of different devices generate data at an unprecedented scale: smartphones, smartwatches, embedded sensors in cars, smart homes, wearable technology, just to mention a few. We are simply surrounded by data without even noticing it. This represents a great opportunity to improve our everyday lives by applying recent advances in AI, a combination also called AIoT. Connecting IoT with data storage and AI technology is gaining more and more attention. Yet, performing AIoT in an efficient and scalable manner is a cumbersome task. Today, users have to implement different ad hoc solutions to move data from the IoT to “stable” storage on which they can perform AI (typically on the Cloud). In this tutorial, we will discuss and learn how Apache Wayang (Incubating) frees users from this burden. In particular, we explain how Wayang enables users to seamlessly run their AI tasks on the Fog and Cloud via its cross-platform optimizer.
Bio: Jorge Quiané is the head of the Big Data Systems research group at the Berlin Institute for the Foundations of Learning and Data (BIFOLD) and a Principal Researcher at DIMA (TU Berlin). He also acts as the Scientific Coordinator of the IAM group at the German Research Center for Artificial Intelligence (DFKI). His current research is in the broad area of big data: mainly in federated data analytics, scalable data infrastructures, and distributed query processing. He has published numerous research papers on data management and novel system architectures. He has recently been honoured with the 2022 ACM SIGMOD Research Highlight Award, the Best Demo Award at ICDE 2022, and the Best Paper Award at ICDE 2021 for his work on “Efficient Control Flow in Dataflow Systems”. He holds five patents in core database areas and on machine learning. Earlier in his career, he was a Senior Scientist at the Qatar Computing Research Institute (QCRI) and a Postdoctoral Researcher at Saarland University. He obtained his PhD in computer science from INRIA (Nantes University).
Abstract: In this talk, I will present RisingWave, a distributed SQL streaming database designed for the cloud. RisingWave provides standard SQL as the interactive interface. It speaks the PostgreSQL dialect and can be seamlessly integrated with the PostgreSQL ecosystem with no code change. RisingWave treats streams as tables and allows users to compose complex queries over streaming and historical data declaratively. RisingWave is designed for the cloud: its cloud-native architecture enables it to scale compute and storage resources separately and infinitely based on users’ demands. We have open-sourced the RisingWave kernel under the Apache License 2.0. Together with the open-source community, we are on a mission to democratize stream processing: to make stream processing simple, affordable, and accessible to everyone.
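Because RisingWave exposes a PostgreSQL-compatible interface, an ordinary PostgreSQL driver can talk to it. The sketch below, using psycopg2, assumes a stream has already been registered as a table named trades and declares a continuously maintained materialized view over it; the connection settings, schema, and view are illustrative assumptions rather than material from the talk.

```python
# Hypothetical sketch of using a SQL streaming database through its
# PostgreSQL-compatible interface with a stock PostgreSQL driver.
# Table name, schema, and connection settings are illustrative only.
import psycopg2

conn = psycopg2.connect(host="localhost", port=4566, user="root", dbname="dev")
conn.autocommit = True
cur = conn.cursor()

# Treat the stream as a table and declare the computation declaratively:
# a continuously maintained per-symbol aggregate over a 'trades' stream.
cur.execute("""
    CREATE MATERIALIZED VIEW trade_stats AS
    SELECT symbol, COUNT(*) AS num_trades, AVG(price) AS avg_price
    FROM trades
    GROUP BY symbol
""")

# Query the maintained view like any ordinary PostgreSQL table.
cur.execute("SELECT symbol, num_trades, avg_price FROM trade_stats ORDER BY symbol")
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()
```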
Bio: Yingjun Wu is the founder and CEO of Singularity Data, a startup building next-generation cloud-native database systems. Before starting his adventure, Yingjun was a software engineer on the Redshift team at Amazon Web Services, and a researcher in the Database group at IBM Almaden Research Center. Yingjun received his PhD from the National University of Singapore, where he was affiliated with the Database Group (advisor: Kian-Lee Tan). He was also a visiting PhD student in the Database Group at Carnegie Mellon University (advisor: Andrew Pavlo). Yingjun is passionate about integrating research into real-world system products. During his time at AWS, Yingjun was responsible for boosting Amazon Redshift performance using advanced vectorization and compression techniques. Before that, he participated in the development of IBM Db2 Event Store’s indexing structure and transaction processing mechanism. Yingjun was an early contributor to Stratosphere, which is now widely known as Apache Flink. Yingjun is also active in academia, serving as a Program Committee member for several top-tier database conferences, such as SIGMOD, VLDB, and ICDE.
Abstract: Large network data evolving over time have become ubiquitous across most industries, ranging from automotive and pharma to e-commerce and banking. Despite recent efforts, using temporal graph neural networks on continuously changing data in an effective and scalable way remains a challenge. This talk presents one of the research tracks studied at Euranova. We provide an overview of relevant continual learning methods that are directly applicable to real-world use cases. As explainability has become a central ingredient of trustworthy AI, we also introduce the landscape of state-of-the-art methods designed for explaining node-, link- or graph-level predictions.
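As a generic illustration of explaining a node-level prediction, the sketch below uses no graph library: it runs a one-layer graph convolution in plain PyTorch and computes a gradient-based saliency map for one node’s prediction. It is not one of the methods surveyed in the talk; the graph, features, and model are hypothetical.

```python
# Gradient-based saliency for a node-level prediction on a tiny,
# untrained graph neural network written in plain PyTorch.
# Purely illustrative; not a method from the talk.
import torch

torch.manual_seed(0)

# Toy graph: 4 nodes, symmetric adjacency with self-loops, 3 features each.
adj = torch.tensor([[1., 1., 0., 0.],
                    [1., 1., 1., 0.],
                    [0., 1., 1., 1.],
                    [0., 0., 1., 1.]])
norm_adj = adj / adj.sum(dim=1, keepdim=True)   # simple row normalization
x = torch.randn(4, 3, requires_grad=True)       # node features

# One graph-convolution layer followed by a linear classifier (2 classes).
w1 = torch.randn(3, 8, requires_grad=True)
w2 = torch.randn(8, 2, requires_grad=True)

h = torch.relu(norm_adj @ x @ w1)               # aggregate neighbors, transform
logits = h @ w2                                 # per-node class scores

# Explain the prediction for node 2: the gradient of its top logit with
# respect to all input features gives a crude node/feature saliency map.
logits[2].max().backward()
saliency = x.grad.abs()
print("feature saliency per node for node 2's prediction:\n", saliency)
```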
Bio: Madalina Ciortan is the head of the data science department at Euranova. After earning an engineering degree, a master’s in computer science and a postmaster’s in bioinformatics, she completed a doctorate in data science. She has over 15 years of experience in roles spanning development, architecture, team leadership, coaching and research. She has worked on topics including computer vision, NLP, time-series analysis, unsupervised and self-supervised learning, as well as the analysis of high-dimensional and noisy data in industry.
Abstract: Across our business and personal lives, we’re used to getting information in real time. How many stops until my parcel arrives? Where’s my Uber? Is it faster for me to walk or take the bus? Are there any new videos to watch? We all carry super-computers in our pockets with instant access to all the streaming data we want, customised to our personal preferences. Naturally, financial services professionals in brokerage, trading, and wealth management expect their market data to be real-time as well, allowing them to make split-second decisions to buy and sell. But real-time market data is hard to provide with sufficient Quality of Information (QoI) and Quality of Service (QoS): it is massive, constant, and complex. And most days, real-time is much more than people need, even if it is what they ask for. We’ll look at what the real business drivers are for going real-time, when delayed or daily data is better, and how to balance market data needs against wants.
Bio: Anna Almén, CTO, joined Infront, a leading European provider of information and technology solutions, in March 2022. Anna has extensive experience in integrating companies using a data-driven approach, leading to more efficient organisations. She has held various positions in the Swedish financial technology sector and has worked with startup organisations going through exponential growth. Most recently, she was CTO of eCommerce at Worldline following the merger with the startup Bambora, where she played a key role in technology leadership. She has also worked for many years at companies in the trading segment, including Nasdaq OMX. Anna earned her Master’s degree in Computer Science at KTH Royal Institute of Technology.
Abstract: Materialize is a system that presents itself to users as SQL over continually changing data. It transforms inbound streams of *change data capture* events into streams that exactly correspond to the transformed data, and maintains indexed representations of the results for efficient access and operation. SQL over changing data is surprisingly (for me) expressive: Materialize can operate on unbounded data, implement data-driven windows, and perform event-based queries, all with ANSI-standard SQL. We will discuss what an event-based SQL system looks like, the SQL idioms that give rise to traditionally stream-exclusive behavior, and how one architects such a system to scale across multiple dimensions.
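To make the “change events in, maintained results out” idea concrete, here is a tiny, generic sketch that incrementally maintains a grouped count from a stream of insert/delete change records and emits its own output as changes. It is purely illustrative and does not reflect Materialize’s internals, which build on timely and differential dataflow.

```python
# Generic sketch of incrementally maintaining the result of
#   SELECT key, COUNT(*) FROM input GROUP BY key
# from change-data-capture events tagged +1 (insert) or -1 (delete).
# Illustrative only; not Materialize's actual architecture.
from collections import defaultdict

class MaintainedCount:
    def __init__(self):
        self.counts = defaultdict(int)   # indexed representation of the view

    def apply(self, key, diff):
        """Apply one change event and return the resulting output changes."""
        old = self.counts[key]
        new = old + diff
        if new == 0:
            del self.counts[key]
        else:
            self.counts[key] = new
        # The output is itself a change stream: retract the old row, add the new.
        out = []
        if old != 0:
            out.append((key, old, -1))
        if new != 0:
            out.append((key, new, +1))
        return out

if __name__ == "__main__":
    view = MaintainedCount()
    for key, diff in [("a", +1), ("a", +1), ("b", +1), ("a", -1)]:
        print(view.apply(key, diff))
    print(dict(view.counts))   # {'a': 1, 'b': 1}
```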
Bio: Frank McSherry is Chief Scientist at Materialize. He was initially a graduate student at the University of Washington, where he worked with Anna Karlin on spectral graph theory, then a researcher at Microsoft Research SVC, where he co-developed differential privacy and led the Naiad research project, and later a visiting researcher at ETH Zürich, where he honed the timely and differential dataflow systems.