Research Data Management and Digital Curation as a Library Activity

In the age of information technologies, it is important to recall that research data have become more significant than scholarly publications that lack data sets. Data sets are treated as texts: they may have a DOI, and they can be cited in any bibliographic style. Research data management is a multilevel and complex procedure, which can be carried out in cooperation with the library or with digital curators in the laboratories of research institutions. For today's researchers, it is essential to build a network of people who help disseminate their research results, and for this reason to involve the library and data experts.

Introduction
Today, we observe a strong connection between librarianship and information science around the world, which is affecting library practices. As an example, the Library of Glasgow University (Scotland) pays considerable attention to the technical aspects of working with information: creating online profiles of scholars, depositing theses and dissertations, tracking information-seeking behavior on the library website, managing research data in accordance with the principles of academic integrity, automating library processes, archiving and digitizing online, etc. The library and the IT Center at Glasgow University are combined into a single department, which means that library and IT services are provided by one large team of information specialists. Glasgow University is not the only institution that takes research data management (RDM) and data curation into consideration, but it has one of the best practices for conducting such processes in the library. Institutions concerned with this topic in Europe and North America include Edinburgh University 1, University of Glasgow 2, DataOne 3, Florida State University 4, North Carolina Chapel Hill University 5, and Ottawa University 6. Of course, RDM is not just a library practice, and arguably it is more of a scholarly activity; but since research data curation principles are similar to library activities, it is logical for RDM to be provided in libraries and by librarians as well as by research institutions and scholars.

Research data management: An overview
Research data management is now gaining popularity in the world of librarianship and information science. This practice is already widely used by some leading research centers and organizations and is mandatory in some institutions that receive grants for their research projects. The university libraries in Edinburgh and Glasgow, Scotland, are actively working with their lecturers and academics to provide qualified data management, as university funding depends directly on the quality of research activity.
The RDM training provided by the Library of Edinburgh University consists of online and classroom-based activities 7. Online activities include a MOOC developed jointly with North Carolina University (Research Data Management and Sharing) and the free non-credit course MANTRA (Management Training), which offers video and text materials explaining the main RDM principles and the approaches of different scholars.
Classroom-based training activities include workshops aimed at developing and improving RDM skills and scholars' profiles. Classroom workshops are dedicated to the following topics: benefits of RDM for scholars, RDM planning activities and tools, ethical issues of involving vulnerable and personal data, use of the OpenRefine tool for cleaning data, SPSS tools for data, how to assess the risk of data disclosure, quality and quantity assurance, effective use of data in research, and data visualization tools and techniques. Library users may request additional training activities on specific topics concerning their research.
The second example of library-provided RDM activities is Glasgow University Library. The library provides research support and helps deposit data in the University repository Enlighten. RDM activities are performed according to the University's Research Data Management Policy, adopted in 2015 8. RDM support services at Glasgow University offer researchers a variety of activities, including general RDM training, consultations on and review of RDM plans, templates for RDM plans, support in compiling grant applications, and assistance with depositing data.
In order to understand why such successful research and education institutions are concerned with RDM, we should examine what RDM is and why it matters to modern-day researchers. The reputation of a research institution is closely connected to its effectiveness. To show that they use their resources effectively and produce new knowledge, researchers should manage their data. Research data management is one of the best practices for building a good public image of the research institution. To begin with, we need to indicate what research data is. To answer this question, we turn to the American researcher Christine Borgman. According to her, data are the main research unit: they represent factual, interpreted information in the form of observations, calculations, records, experimental data, and digital data, and exist in such types as text, numbers, multimedia, software, discipline-specific data, and tool-specific data 9.
Data also exist in different formats and in numerous versions, which affects naming conventions during data curation. For this purpose, librarians use various open-source or subscription software tools to arrange these elements into a proper data set. Using naming conventions helps in organizing the data and provides better data preservation and access.
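A naming convention of the kind mentioned above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the pattern project_description_date_version and the function name build_file_name are hypothetical and are not prescribed by any of the tools or institutions discussed here.

```python
import re
from datetime import date


def build_file_name(project, description, created, version, extension):
    """Compose a file name as project_description_YYYYMMDD_vNN.ext.

    The pattern itself is an assumed, illustrative convention.
    """
    def slug(text):
        # Lowercase the text and collapse every run of characters that is
        # not a letter or digit into a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    ext = extension.lstrip(".").lower()
    return f"{slug(project)}_{slug(description)}_{created:%Y%m%d}_v{version:02d}.{ext}"


# Hypothetical example values.
print(build_file_name("Soil Survey", "pH readings (raw)", date(2020, 6, 4), 3, ".CSV"))
# prints: soil-survey_ph-readings-raw_20200604_v03.csv
```

Placing the date in year-month-day order and zero-padding the version number keeps the files sorted chronologically in an ordinary directory listing.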
Research data pass through several lifecycle stages, which may vary from one research institution to another. To understand the basic principles of the research data lifecycle, we consider a simple example developed by the Library of Ottawa University 10. This example contains the main elements of the research data lifecycle and explains what each stage involves before the research proceeds to the next one.
Downloaded from the journal Folia Bibliologica http:/foliabibliologica.umcs.pl Date: 04/06/2020 21:42:44 UMCS
Every research project starts with planning, especially when it is part of a grant project. As Joyce M. Ray 11 states in the Introduction to Research Data Management, sharing research data yields a return on the investment made in science, especially when the research is crucial to public health and wellbeing. Ray also indicates that it is necessary to avoid duplicating research data, which saves time for further research 12.
To prepare an RDM plan, researchers may use online instruments such as DMPonline or DMPTool and compile a 2-page document, which can be altered in the course of the research. Planning also includes a review of sources and obtaining informed consent from the participants of medical, educational, sociological, and similar research. After planning, the researcher creates data sets while gathering raw data during experiments, interviews, surveys, etc. At the creation stage, it is worth assigning metadata to the packages. The raw data must then undergo some processing: cleansing, renaming, versioning, anonymizing, validation, etc.
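Two of the processing steps named above, cleansing and anonymizing, can be illustrated with a minimal Python sketch. It rests on several assumptions of ours: the field names (name, email, answer), the function names, and the choice of a keyed hash are hypothetical, and strictly speaking the technique shown is pseudonymization rather than full anonymization, since whoever holds the key can re-link the records.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Map an identifier to a short keyed hash (HMAC-SHA256).

    The same input and key always give the same code, so records of one
    participant stay linkable without exposing who the participant is.
    """
    normalized = identifier.strip().lower().encode("utf-8")
    # Truncated to 12 hex characters for readability; a very large data
    # set would need a longer code to keep collisions unlikely.
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()[:12]


def process_record(record, secret_key, direct_identifiers=("name", "email")):
    """Drop direct identifiers, trim text values, add a pseudonymous ID."""
    cleaned = {}
    for field, value in record.items():
        if field in direct_identifiers:
            continue  # direct identifiers never reach the shared data set
        cleaned[field] = value.strip() if isinstance(value, str) else value
    cleaned["participant_id"] = pseudonymize(record["email"], secret_key)
    return cleaned


# Hypothetical survey record; the key must be stored apart from the data.
raw = {"name": "Jane Doe", "email": " Jane.Doe@example.org ", "answer": " yes ", "age": 34}
print(process_record(raw, b"keep-this-key-outside-the-data-set"))
```

Whether such a step is sufficient depends on the consent given and on the indirect identifiers that remain in the data, which is a question for the project's ethics review rather than for the code.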
When the data are ready for analysis, data science tools may be used to extract the information that supports the research hypothesis. At this stage, articles may be produced, and it is important to cite the data, as they serve as the basis for the research results. For data to be citable, they must be kept appropriately: preserved on the research institution's servers, in institutional repositories or, in the absence of the latter, in open data repositories that hold a seal of approval. Data sets and packages are both stored in repositories and shared through them. The accessibility of the data also enables their reuse by the researcher him/herself or by other researchers interested in similar topics.
As one can observe, the lifecycle is simple and easy to understand. The library can help researchers at several stages of research data processing: with data curation, renaming, versioning, anonymization, etc. The library may help in sharing, citing, and preserving data in the same way as it helps with the institution's text repositories. To provide data curation for researchers, librarians need to know how to use certain tools to curate the data and to advise researchers on how to manage their research data sets.

Planning tools
The most popular tools for building an RDM plan that librarians can suggest to researchers in order to comply with the requirements of grant organizations are DMPTool and DMPonline. To create an RDM plan, the researcher registers an account on one of these platforms and follows the system's prompts while creating the document. DMPTool has a dashboard, which opens after an account is registered 13. The researcher can choose a template that suits the requirements of the grant provider. When a plan is created, the system asks the curator or researcher about the project, the research organization, and the granting institution. After these points are selected and filled in, the system opens a template, which the researcher must complete. The template is composed of several elements in which the researcher describes project details such as its title, funding, and abstract. After this first stage, a plan overview helps the researcher check whether the plan fits the standards of the selected funding organization; after reading the instructions, the researcher fills in the plan itself by answering questions on collecting and preserving data, ethics, metadata, and other important aspects. When the document is ready, it can be shared and downloaded. The DMPonline platform 14 follows similar principles: to create an RDM plan, the researcher must register an account and go through a similar procedure of filling in the research project details and funding information.
The role of the librarian at this stage is either to advise the researcher on how to write the plan or to write the part that concerns metadata creation, preservation, and sharing, thus contributing to the research as a curator. In the research data management plan, the researcher indicates who is responsible for data curation: either the librarian or the researcher him/herself.

Creation
As mentioned before, the researcher creates data sets while performing surveys and carrying out experiments and observations. The library has little impact on this process, as it is the researcher's responsibility to choose what data will be necessary to test the main hypothesis or theory. The only help the librarian can offer the researcher at this stage is to assign metadata that describe the selected data sets for further preservation, sharing, and reuse.

Process
After the data are received, it is necessary to process them. During processing, the researcher cleans, renames, and anonymizes the data and creates data packages. One of the tools that the librarian or digital curator can suggest for this purpose is Data Package Creator from Frictionless Data 15. This tool comes with an easy-to-understand guide that supports better data package creation. It describes in detail how to assign metadata to a data package, which is very important in RDM and for the long-term preservation of the data.
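To make the idea of a data package with attached metadata more concrete, the following Python sketch builds a minimal descriptor by hand and serializes it as JSON. It does not use or reproduce Data Package Creator itself. The key names (name, title, licenses, resources, path, schema, fields) follow the Frictionless Data Package descriptor format as we understand it, while the function make_descriptor and all values are hypothetical.

```python
import json


def make_descriptor(name, title, csv_path, fields, license_name):
    """Return a minimal data package descriptor as a plain dictionary.

    `fields` is a list of (column name, type) pairs describing the CSV.
    """
    return {
        "name": name,
        "title": title,
        "licenses": [{"name": license_name}],
        "resources": [
            {
                "name": name + "-data",
                "path": csv_path,
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }


# Hypothetical survey data set.
descriptor = make_descriptor(
    name="reading-habits-survey",
    title="Reading habits survey, raw responses",
    csv_path="data/responses.csv",
    fields=[("respondent_id", "integer"), ("age_group", "string"), ("books_per_year", "integer")],
    license_name="CC-BY-4.0",
)

# By convention the descriptor is saved as datapackage.json next to the data.
print(json.dumps(descriptor, indent=2))
```

Because the descriptor is plain JSON, it can be read by people and by software alike, which is what makes it useful for long-term preservation.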

Analysis
Different types of data can be analyzed, depending on the field in which they are used. Several tools may be applied in this process: Jupyter Notebook, Zeppelin Notebook, Watson Studio, and other tools specific to data science 16. At this stage, data packages become the evidence on which hypotheses and theories are based; thus, data serve as the basis for numerous publications and must be cited like any other source of information. Here the librarian can suggest the best ways of citing data by means of bibliographic description.

Preservation and sharing (data repositories)
At this stage, the researcher should attend to ethics, in particular academic integrity and research ethics, and to security (where to keep and store the data: on a personal USB drive, on the research institution's server, in a repository, etc.). To keep research data packages secure, it is always better to deposit them in thematic or institutional data repositories. Kowalczyk indicates that after each stage of the lifecycle, i.e. creation, distribution, library and archival collection, and long-term access, there is a stage of preservation 17.
According to Kowalczyk, traditional preservation can be presented as a linear, one-directional model that moves forward and can be depicted as a timeline: the action of preservation takes place once, after the collection is compiled and is being prepared for long-term access. In the model of digital preservation, the preservation stage occurs more frequently than in the traditional model. The traditional model is used for preserving physical objects, while the digital model is designed for digital information resources. As digital information changes rapidly, it is important to preserve these objects frequently, which is reflected in the model.
17 Kowalczyk, S. (2018). Digital Curation for . Santa Barbara: Libraries Unlimited.
If the institution does not have its own repository, the researcher can upload the materials to a thematic data repository. To find one, the tool re3data.org 18 can be useful. When selecting a repository, the researcher will encounter 27 filters and numerous subfilters, which make it easier to find a repository suited to the research data of a particular study. The main filters are the following: 1. Subjects, 2. Content types, 3. Countries. 19 The variety of subjects and subtopics is represented by 4 main sectors:
• Humanities and Social Sciences,
• Life Sciences,
• Natural Sciences,
• Engineering Sciences.
Each of these sectors is divided into a variety of topics and subtopics, and each subtopic offers a number of repositories for research data. This tool is very useful for researchers, and for librarians and data curators in their training and consultations for researchers.

Reuse (Data citation principles)
As stated before, data act as a basis for the research output. To point readers to these data packages, each researcher needs to cite the data properly in articles and conference proceedings.
Data are no exception: like any other material, they must be cited. Data citation is governed by certain principles: 1. Importance, 2. Credit and Attribution, 3. Evidence, 4. Unique Identification, 5. Access, 6. Persistence, 7. Specificity and Verifiability, 8. Interoperability and Flexibility. The importance principle means that data should be considered as important as any other research output (e.g. a research paper). Data must be cited because they serve as evidence that supports the theory or hypothesis. For better data citation, unique identifiers such as DOIs serve best. Data citations and identifiers provide access to the data packages and research results, so that the community and the scientific world can test the hypothesis or benefit from the research output. Identifiers also provide persistence, which means that the data remain findable and reusable for as long as they exist; the data are thus flexible and interoperable, because other scholars may use them.
These data citation principles act as rules or a code of conduct that supports academic integrity in the research world; thus, researchers need not be afraid to share their data for fear of misconduct or lack of attribution.
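How a unique identifier turns into a citable reference can be shown with a short Python sketch. The layout used here (creators, year, title, optional version, publisher, DOI link) is one commonly seen pattern and is our assumption, not a format required by the principles above; the function name and the DOI in the example are hypothetical.

```python
def format_data_citation(creators, year, title, publisher, doi, version=None):
    """Build a citation string: Creators (Year). Title. [Version.] Publisher. DOI link."""
    authors = "; ".join(creators)
    parts = [f"{authors} ({year}). {title}."]
    if version:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    # Writing the DOI as a resolvable link gives the reader direct access.
    parts.append(f"https://doi.org/{doi}")
    return " ".join(parts)


# Hypothetical data set; the DOI below is a placeholder, not a real record.
print(format_data_citation(
    ["Doe, J.", "Roe, R."], 2020, "Example survey responses",
    "Example University Repository", "10.1234/example.5678", version="1.2",
))
# prints: Doe, J.; Roe, R. (2020). Example survey responses. Version 1.2. Example University Repository. https://doi.org/10.1234/example.5678
```

In practice the librarian would adapt the element order and punctuation to the bibliographic style required by the journal.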

FAIR principles
In an open knowledge society, it is important to take care of the quality of the data we use in research, as data are the main research unit and serve as evidence for the research output. FAIR is an acronym that stands for Findable, Accessible, Interoperable, and Reusable 21. According to LIBER (Association of European Research Libraries), these principles may be explained in the following way:
• The findable principle implies the existence of a DOI or other links or identifiers, and of the metadata descriptions necessary for finding data via search mechanisms or data repositories.
• The accessible principle means that data are kept securely in a data repository with the seal of approval and can be read and understood both by humans and machines.
• The reusable principle provides necessary licenses that are clear and easy to comply with in order to use these data packages.
• The principle of interoperability means that the language of metadata is understandable and widespread across the scholarly community 22 .
As we can see, metadata (data about data, that is, description) plays a major role across the whole range of principles. The role of librarians in supporting FAIR principles lies in promotion, consultation, curation, and guidance on depositing data in accordance with FAIR principles.
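Since each principle above ultimately depends on metadata, a data curator could run a simple completeness check on a metadata record before deposit. The Python sketch below is only a rough illustration: the field names and their assignment to the four principles are our own simplification, and a real FAIR assessment is considerably richer than a check for empty fields.

```python
def fair_gaps(record):
    """Return, per FAIR principle, the metadata fields that are missing or empty.

    The field-to-principle mapping is an assumed, simplified checklist.
    """
    checks = {
        "findable": ["identifier", "title", "keywords"],
        "accessible": ["repository_url"],
        "interoperable": ["metadata_standard", "file_format"],
        "reusable": ["license"],
    }
    gaps = {}
    for principle, required in checks.items():
        missing = [field for field in required if not record.get(field)]
        if missing:
            gaps[principle] = missing
    return gaps


# Hypothetical record with no keywords and no license.
record = {
    "identifier": "10.1234/example.5678",
    "title": "Example survey responses",
    "keywords": [],
    "repository_url": "https://repository.example.org/record/1",
    "metadata_standard": "Dublin Core",
    "file_format": "CSV",
}
print(fair_gaps(record))
# prints: {'findable': ['keywords'], 'reusable': ['license']}
```

An empty result means only that the listed fields are filled in; judging whether the descriptions are accurate and useful remains the curator's task.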
On the GO FAIR platform, there is a detailed guide on how to adjust data packages in accordance with FAIR principles. The "FAIRification of the data" 23 involves several stages: 1. Receive non-FAIR data; 2. Analyze it in terms of content type and structure; 3. Define the semantic model (a description of the links between entities in the data set); 4. Make the data linkable via Linked Data technologies, using the semantic model; 5. Assign a license for effective reuse of the data; 6. Define rich metadata for the data set, readable by humans and machines; 7. Deploy (publish) the FAIR data resource.
As with any other information resource, data sets should be made accessible and easy to understand. Just as we may find a record in a library's e-catalog, we can nowadays find a data set in a data repository. This became possible with the implementation of FAIR principles and data citation principles, the development of data repositories, and effective data curation practices developed for new library professionals: data stewards or data curators.
Robin Rice and John Southall, in their work The Data Librarian's Handbook, state that data librarianship is like IT data support and is closely connected to the cultural norms of research funding in academia. This kind of library practice started with technical support but now relates to scholarly communication principles in general. 24 These authors also pay attention to the role of librarians as advisors on consent agreements when dealing with sensitive data. They consider assistance in creating consent forms an opportunity for the librarian to be involved in the research process and thus to provide feedback. Consent forms come in different types, so it is better for the librarian to collect widely used templates, as with RDM plans; these may be of assistance in further consultations. A consent agreement is an integral part of a research data package when sensitive data are involved: it is the document that will be kept along with the data set in the repository, and the librarian is obliged to check it for compliance with the standards for this kind of research. 25

Conclusion
Research data management is a relatively new practice for modern-day librarianship. It requires strong competencies in IT, research, librarianship, information literacy, academic integrity, and Open Access principles. With the development of data and of the technologies for processing them, there is a need for qualified specialists to manage and curate these data sets. Thus, data curators and data librarians have become new and in-demand professions in the information society, especially as various research grant programs have requirements connected to research data management plans.
The task of a data librarian is like that of a traditional librarian: to curate information resources and to advise users (researchers) on how to use them. The information age implies a huge amount of data, which we may call "Big Data"; such information can be analyzed only by means of data science and special software, but data librarians can advise on how to curate the data sets that researchers select as necessary for their projects and that need long-term preservation.
As the scholarly world develops, so do its information resources. There is a need for new professionals who would curate these resources, that is, data sets. Data librarians help researchers comply with FAIR principles by making data sets findable, accessible, interoperable, and reusable. These principles pay considerable attention to metadata, which means that the librarian's help is necessary. As data sets must be findable and accessible, it is very important to involve information professionals, namely librarians, in describing data packages and sets, because library specialists will assign keywords and fill in metadata fields in the data repository more professionally than ordinary repository users. In summary, the role of the librarian in research data management is crucial for effective scholarly activity. Librarians act as consultants when they provide training on RDM principles, on special tools for the different stages of the research data lifecycle, and on RDM planning. Librarians act as data curators when they are involved in the RDM process and provide data services by depositing data in a repository or by cleaning and visualizing data sets. Research support is an integral part of library work in modern-day research libraries and institutions, and RDM is a process that is, in its essence, like the traditional library processing of information resources. The amount of data is growing, which means that its management must be performed as effectively and efficiently as possible. Amid this variety of data, it is important to select the most valuable information necessary for the research. If finding the data for a project is the task of the researcher, then keeping the data and helping to provide access to them is the task of the library professional.