PHD POSITION IN ARTIFICIAL INTELLIGENCE FOR DATA LAKES MANAGEMENT

Job description

Context
The global data lake market is projected to triple between 2019 and 2024, reaching $20.1 billion [1], and the European share is about one quarter of this amount [2]. Although data lakes are more than a promising approach, today’s solutions do not fully unleash the potential of data analysis, especially at large cross-organization scale, for several reasons. Firstly, data lakes are usually deployed and managed by a single party, and a centralized approach can lead to failure due to the complexity of the diverse data sources [3]. Secondly, the computing continuum (i.e., the resources located at the edge, fog, and cloud) is not fully exploited. To minimize the impact of data transfer, data should be processed where they are generated, but this raises security, privacy, and governance concerns. Thirdly, data sovereignty [4] must be preserved: personal or business-sensitive data cannot leave the boundaries of the organization unless a proper data transformation, compliant with the organization’s policies and general norms, is performed, which often limits data sharing. Finally, data lake implementations are not sustainable: because of the illusion created by low-cost storage devices and the assumption that all data have huge value for companies [5], operators do not check whether a piece of data is already stored or whether it is useful, resulting in data duplication and the storage of unused data.
Stretched Data Lakes aim to leverage Data Mesh and Data Fabric [6] concepts to address these challenges by enabling trusted, verifiable, and energy-efficient data flows across the edge-cloud continuum. They are based on a shared but decentralized approach for defining, enforcing, and tracking data governance requirements with specific emphasis on privacy/confidentiality. Moreover, by applying the principles of circular economy to data governance, i.e., to reuse data, application, and computation resources, Stretched Data Lakes will enable the creation of platforms for more energy-efficient and sustainable data analytics.
Topic
The overall aim of this thesis is to propose, implement, and evaluate novel AI-based strategies for trustworthy, energy-efficient management of data flows within Stretched Data Lakes. The elaborated approach will enable the definition of gravity- and friction-aware data privacy requirements that will drive data governance throughout the data lake. The enforcement of these policies will be explored by leveraging state-of-the-art AI and distributed and federated learning techniques to implement specific data operations that can be seamlessly applied when data are accessed or moved. Finally, the approach will also optimize the energy footprint of the data flows by exploiting predictive and optimization AI models.
Keywords
Stretched Data Lakes, Data Mesh, Data Fabric, Cloud-Edge Continuum, Trustworthiness, Privacy-aware Data Management, Energy-efficient Data Operations
References
[1] MarketsAndMarkets, BigData Market - Global Forecast to 2025, 2020.
[2] Data Intelligence, Global Data Lakes Market 2019-2026, 2019.
[3] Z. Dehghani, How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh, https://martinfowler.com/articles/data-monolith-to-mesh.html, accessed in Aug. 2022.
[4] L. Nagel, D. Lycklama, Design Principles for Data Spaces Position Paper, v1.0, DOI: https://doi.org/10.5281/zenodo.5105744, 2021.
[5] F. Lucivero, Big Data, Big Waste? A Reflection on the Environmental Sustainability of Big Data Initiatives, Science and Engineering Ethics. 26, 2020, DOI: https://doi.org/10.1007/s11948-019-00171-7.
[6] A. Woodie, Data Mesh Vs. Data Fabric: Understanding the Differences, https://www.datanami.com/2021/10/25/data-mesh-vs-data-fabric-understanding-the-differences/, accessed in Aug. 2022.
Responsibilities
Conduct a state-of-the-art study on AI-based privacy awareness and energy efficiency in data operations in the edge-cloud continuum.
Identify, document, and prioritize a set of scientific and technical challenges affecting the viability of privacy-aware, energy-efficient data operations in the continuum.
Propose and implement innovative approaches to overcome the identified challenges.
Develop a validation prototype and conduct an evaluation of the proposed approaches.
Participate in the TEADAL project tasks (meetings, deliverables, integration, etc.).
Environment
The selected candidate will be jointly mentored for three years by a professor from the Universitat Politècnica de Catalunya (UPC) and a senior researcher from the Distributed AI department of i2CAT. After establishing the state of the art, the selected candidate will conceive and implement different algorithmic and architectural strategies for trustworthy and energy-efficient data flow management within stretched data lakes. These strategies will be evaluated through analysis and prototyping of pilot use cases in at least two verticals (Health, Mobility, Agriculture, Industry 4.0 and Energy).
During the PhD program, the candidate must publish their work in scientific conferences and journals, and may contribute to patents. The successful candidate will also contribute to the TEADAL project tasks, meetings, presentations and deliverables.
The PhD position is fully funded by the TEADAL project, which is funded by the European Union’s Horizon Europe research and innovation programme under Grant Agreement No. xxx (TBD).
Application
A curriculum vitae, highlighting research experience and education.
The official Transcripts of Records of all undergraduate and graduate studies, if possible with ranking.
Optionally, one or two recommendation letters.
Who we are:
The i2CAT Foundation is a non-profit research and innovation center that promotes mission-driven R&D activities on advanced Internet architectures, applications, and services. More than 15 years of international research define our expertise in the fields of 5G, IoT, VR, and Immersive Technologies, Cybersecurity, Blockchain, AI, and Digital Social Innovation. The center partners with companies, public administration, academia, and end-users to leverage this knowledge in order to meet real social and business challenges.
The greatest value of i2CAT is the talent of the people who make up our team. Our team includes people of more than 13 nationalities, and we work every day to create and foster a work environment where we all feel comfortable creating, innovating and growing.
Want to know more? Visit our webpage! www.i2cat.net
What do we offer?
Work from our offices or from home, whichever works best fo

Job details

Company
  • Fundació i2cat
Location
Address
  • Not specified - Not specified
Contract type
  • Not specified
Publication date
  • 20/09/2022
Expiry date
  • 19/12/2022