Recommendation Component

From agINFRA

The recommendation component is a tool that runs on the agINFRA Grid and processes user annotations such as ratings, tags and reviews. It executes content-based and collaborative filtering algorithms over this input and creates various lists of recommended content resources (items), individualized per user. The computing power of the agINFRA Grid is what allows these annotations to be analysed quickly and accurately.
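As an illustration of the collaborative filtering mentioned above, the following is a minimal sketch of a user-based recommender with cosine similarity. It is not the component's implementation (which is built on Apache Mahout); the function names, the item names and the sample ratings are hypothetical, and the `how_many` and `neighborhood_size` arguments are only assumed to correspond to the `--howMany` and `--neighborhoodSize` options that appear in the API examples on this page.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two users' ratings, computed over co-rated items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings, user, how_many=5, neighborhood_size=5):
    """Return up to how_many (item, predicted rating) pairs for the user."""
    target = ratings[user]
    # Keep the most similar other users as the neighbourhood.
    neighbours = sorted(
        ((cosine_similarity(target, other), name)
         for name, other in ratings.items() if name != user),
        reverse=True,
    )[:neighborhood_size]
    totals, weights = {}, {}
    for similarity, name in neighbours:
        if similarity <= 0:
            continue
        for item, rating in ratings[name].items():
            if item in target:
                continue  # only suggest items the user has not rated yet
            totals[item] = totals.get(item, 0.0) + similarity * rating
            weights[item] = weights.get(item, 0.0) + similarity
    # Predicted rating = similarity-weighted average of the neighbours' ratings.
    scored = [(item, totals[item] / weights[item]) for item in totals]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:how_many]


# Hypothetical ratings, for illustration only.
RATINGS = {
    "sofia": {"soil-guide": 5, "compost-101": 4},
    "george": {"soil-guide": 5, "compost-101": 4, "crop-rotation": 5},
    "david": {"soil-guide": 1, "pest-control": 2},
}
```

Calling `recommend(RATINGS, "sofia")` ranks the items Sofia has not rated yet by their predicted rating. Note that cosine similarity over a single co-rated item is always 1, which is one reason real evaluations need datasets far larger than this toy example.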

The component is used by other tools inside agINFRA, such as the social navigation component, and by external tools such as CollaFis, a graphical user interface for running different analyses with different algorithms, either on a provided dataset or on the user’s own datasets.

Responsible body(ies):

  • Agro-Know

Usage and deployment (if publicly accessible)

The recommender service is deployed on the agINFRA Grid and is accessible to other tools through an API.

Web address of the deployment: The component API is documented and can be found in the agINFRA RESTful gateway at http://agro.ipb.ac.rs/.

Example Usage Scenario

George is in charge of developing a recommendation service that will be used on a portal for organic resources. The service aims to increase the popularity of the portal by recommending resources that users may find interesting. George must decide which algorithm is most suitable for the specific dataset of the portal. Choosing the wrong algorithm can reduce the accuracy of the recommendation service, frustrate the users and decrease the portal’s popularity, so it is important that George chooses a suitable one.

David is the administrator responsible for the portal maintained by George. David needs to ensure that the portal provides services of high quality throughout its lifetime. One of his tasks is to fine-tune the recommendation system that George built. The recommendation system takes as input the user activity generated in the portal and processes it to find resources that the users will consider useful. As the number of users and their habits change, David knows that he has to fine-tune the parameters of the recommendation algorithm to keep up with these changes. Failure to do so will result in poor suggestions to the users. David would like to periodically re-evaluate the recommendation algorithm to ensure it is correctly configured.

Sofia is a consultant at an agricultural consulting agency. To provide her consulting services, she regularly searches the web for resources on different subjects in the field of agriculture and has to keep up with new publications. Sofia has learned that the portal maintained by George could offer recommended resources and reduce the time she spends searching, so she decides to try the service.

Before agINFRA

George decides to use the CollaFis tool to help him choose the recommendation algorithm most appropriate for the dataset of David’s portal. In order to install CollaFis, George needs to meet specific requirements and follow a number of steps.

  1. Select an appropriate server machine with adequate hardware resources and a fresh Linux installation (for example, Debian).
  2. Download the latest version of CollaFis from BitBucket (a file of approximately 90 MB).
  3. Download the accompanying “Collafis Installation Note.docx” file.
  4. Acquire shell access to the server and install the required software components, including the Apache Tomcat web server (version 6.0 or higher), the MySQL database server (version 5.0 or higher) and the SpringSource Java framework. On Debian, he would need to issue shell commands like:
       > apt-get update; apt-get install sun-java6-jdk apache2 tomcat6 libmysql-java
       > apt-get install mysql-server
  5. Configure apache2, Tomcat and the MySQL server.
  6. Edit the configuration files to set the parameters specific to the new installation, including the version, the connection URL and credentials for the MySQL database, email parameters, etc.
  7. Initialize the application by visiting the following URL: http://localhost:8080/collafis/init
  8. Visit the URL http://localhost:8080/collafis/experiment and start using the tool.

After preparing the CollaFis installation, George uploads a sample from the portal’s dataset. Because of the limited resources, George can only run the evaluation on a small sample of the initial dataset, which makes the results less accurate. Moreover, each evaluation takes a significant amount of time, making the process tiring and ineffective.

David decides to introduce the recommendation service in the portal and asks George to undertake its installation and integration. Moreover, David decides to test his dataset and reconfigure the recommendation engine on the portal weekly. To do so he can use the CollaFis tool. CollaFis provides a mode called “serial execution”. In this mode the user specifies a dataset and a recommendation algorithm; CollaFis executes the algorithm multiple times with different parameters in order to find the optimal ones for the specified dataset.
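The idea behind serial execution can be sketched as a plain parameter sweep. The sketch below is only an assumption about the general technique, not CollaFis code: `serial_execution`, `toy_evaluate` and the parameter grid are hypothetical names, and the toy evaluation function stands in for a real run that would train and score a recommender on the dataset.

```python
from itertools import product


def serial_execution(evaluate, parameter_grid):
    """Evaluate every parameter combination in turn and keep the best one.

    evaluate(params) must return an error score, where lower is better.
    Returns (best_params, best_score, all_results).
    """
    names = sorted(parameter_grid)
    results = []
    for values in product(*(parameter_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((evaluate(params), params))
    best_score, best_params = min(results, key=lambda result: result[0])
    return best_params, best_score, results


def toy_evaluate(params):
    """Hypothetical stand-in for one evaluation run on a dataset."""
    penalty = 0.0 if params["similarity"] == "cosine" else 0.1
    return abs(params["neighborhood_size"] - 10) / 10 + penalty


# Hypothetical parameter grid: 3 x 2 = 6 evaluations, run one after another.
GRID = {"neighborhood_size": [5, 10, 20], "similarity": ["cosine", "pearson"]}
best_params, best_score, results = serial_execution(toy_evaluate, GRID)
```

The number of evaluations is the product of the sizes of all value lists, which is why this mode quickly becomes expensive on a single host.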

David faces the same challenges as George: due to the limited physical memory of the host server, David can only execute CollaFis against a subset of the dataset, which makes the results less accurate. David also has to face an extra challenge: the serial execution mode is much more demanding in terms of resources, since it has to run the evaluation multiple times. He must either limit the number of evaluations, risking a suboptimal result, or wait for a long period of time. Either way David must compromise.

Sofia uses the service to search for recommendations on a specific subject. After using the service she realizes that the recommendations are not very accurate and the data pool is very small. She has already found most of the recommended resources herself, and she decides that the tool is not that useful to her.

agINFRA powered version

George learns that the agINFRA infrastructure supports the execution of a Recommendation module (CollaFis) and decides to give it a try. He is presented with a simple form where he can select that he wishes to install a new CollaFis instance, operating on Cloud resources. He provides some options, including the approximate size of the dataset, a short name and basic personal details and he presses the “Install” button. George is surprised that he can access CollaFis without the hassle of configuring anything and, furthermore, he does not have to struggle to allocate resources to run the module. When George is asked to choose the dataset for the evaluation, he is also given the option to upload his own dataset.

George decides to upload a sample from David’s portal. Since all data transfer goes through the Cloud, this step is also quick and straightforward. After a few simple steps the dataset is ready for use. He chooses which evaluation algorithm to use and configures its parameters. At the end of the process, he is presented with the evaluation of the algorithm for the given dataset. Finally, he can repeat the process with different algorithms. George finds CollaFis to be much faster and more scalable than on any host server he would be able to use. This is because CollaFis takes advantage of the agINFRA Grid infrastructure.

George is curious about the “dataset enhancement” feature that CollaFis provides, which he could not manage to use on a more resource-constrained infrastructure. CollaFis uses random simulations to produce synthetic data, which it incorporates into the existing dataset. The new augmented dataset is an evolution of the existing dataset with extra synthetic data. George can evaluate the recommendation algorithms using both the existing and the augmented dataset, gaining some insight into how the algorithm will perform as the dataset evolves.
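The page does not describe how the random simulation works, so the following is only one simple way such an augmentation could be sketched: each user receives a few extra ratings on items they have not rated, with values drawn from the ratings already observed. The function name, the arguments and the sample data are all hypothetical.

```python
import random


def augment_ratings(ratings, extra_per_user, rng):
    """Return a copy of ratings with up to extra_per_user synthetic ratings per user.

    Synthetic values are drawn from the observed ratings, so the augmented
    dataset keeps roughly the same rating distribution as the original.
    """
    items = sorted({item for user_ratings in ratings.values() for item in user_ratings})
    observed = [value for user_ratings in ratings.values() for value in user_ratings.values()]
    augmented = {user: dict(user_ratings) for user, user_ratings in ratings.items()}
    for user in sorted(augmented):
        unrated = [item for item in items if item not in augmented[user]]
        count = min(extra_per_user, len(unrated))
        for item in rng.sample(unrated, count):
            augmented[user][item] = rng.choice(observed)
    return augmented


# Hypothetical dataset; a seeded generator keeps the simulation repeatable.
EXISTING = {"u1": {"a": 5, "b": 3}, "u2": {"b": 4, "c": 2}}
AUGMENTED = augment_ratings(EXISTING, extra_per_user=1, rng=random.Random(0))
```

Evaluating the same algorithm on `EXISTING` and on `AUGMENTED` then gives two results that can be compared, which is the kind of insight the scenario describes.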

David learns from George that there is also a version of CollaFis powered by the agINFRA Grid infrastructure. He decides to take a look; he visits the URL and discovers the same familiar interface. He quickly uploads the entire dataset from his portal and starts a “serial execution”. CollaFis takes advantage of the multiple processors of the Grid and performs multiple evaluations in parallel, reducing the total execution time to a fraction of what it would take on a single host. David is very impressed with the performance of CollaFis on the agINFRA Grid.
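Because each evaluation is independent of the others, the sweep parallelizes naturally. The sketch below shows the pattern on a single machine with a thread pool from the Python standard library; it is an illustrative stand-in, not the Grid job submission that CollaFis actually performs, and `evaluate` and `CANDIDATES` are hypothetical. For CPU-bound evaluations in Python, a process pool or separate Grid jobs would be needed to obtain a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(params):
    """Hypothetical stand-in for one evaluation run; lower scores are better."""
    return abs(params["neighborhood_size"] - 10) / 10


def parallel_execution(evaluate, candidates, max_workers=4):
    """Evaluate all candidate parameter sets concurrently and return the best."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so scores[i] belongs to candidates[i].
        scores = list(pool.map(evaluate, candidates))
    best_index = min(range(len(scores)), key=scores.__getitem__)
    return candidates[best_index], scores[best_index]


CANDIDATES = [{"neighborhood_size": n} for n in (5, 10, 20, 40)]
best_params, best_score = parallel_execution(evaluate, CANDIDATES)
```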

Sofia uses the new service in the portal. This time the recommendations are very accurate, and Sofia discovers a number of new publications and some new sites to visit. Sofia is very impressed by this service and tells her colleagues at the consulting agency about the portal.

APIs

The recommender API endpoints and documentation are published at http://agro.ipb.ac.rs/userhow/index.html.

The recommender component was developed using Java and the Apache Mahout machine learning library (http://mahout.apache.org/). The recommender API follows the REST architecture and uses JSON as the format for exchanging data. CollaFiS was developed using Spring Roo (http://projects.spring.io/spring-roo/), an MVC Java framework.

Example 1. Retrieving recommender output produced from particular input


Parameter: recommend.process_parameter_id
Value: The ID of the particular recommender input
Example Request: curl http://agro.ipb.ac.rs/agcouchdb/_design/recommends/_list/search/list?recommend.process_parameter_id=f7248d1de52816a98a2e96a7c58ab249
Example Response:

{
  "offset": 0,
  "total_rows": 1,
  "rows": [
    {
      "id": "f7248d1de52816a98a2e96a7c58b2311",
      "key": "f7248d1de52816a98a2e96a7c58b2311",
      "value": {
        "_id": "f7248d1de52816a98a2e96a7c58b2311",
        "_rev": "2-002d4d187dedd1a11f8c0079df85ed78",
        "document_type": "recommend",
        "recommend": {
          "type": "rcm_out",
          "process": "agrecommender",
          "process_parameter": "recommend",
          "process_parameter_id": "f7248d1de52816a98a2e96a7c58ab249",
          "utc_timestamp": 1367072517,
          "utc_datetime": "Sat Apr 27 14:21:57 2013",
          "recommend": {...},
          "agrecommender": {...},
          "number_of_files": 1,
          "size_of_files": "68K",
          "lfn_dataset_location": "lfn:/grid/vo.aginfra.eu/datasets/f7248d1de52816a98a2e96a7c58b2311.tgz",
          "lfn_log_location": "lfn:/grid/vo.aginfra.eu/logs/f7248d1de52816a98a2e96a7c58b2311.tgz",
          "http_dataset_location": "http://agro.ipb.ac.rs/datasets/f7248d1de52816a98a2e96a7c58b2311.tgz",
          "http_log_location": "http://agro.ipb.ac.rs/logs/f7248d1de52816a98a2e96a7c58b2311.tgz"
        },
        "document_publisher": {...}
      }
    }
  ]
}
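A client would typically build the request URL from the input ID and then pull the download locations out of the JSON response. The following is a minimal sketch of that, using only the Python standard library. The helper names are hypothetical, and `SAMPLE_RESPONSE` is an abbreviated copy of the response above that leaves out the elided fields, so the sketch does not contact the service.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the example request above.
SEARCH_URL = "http://agro.ipb.ac.rs/agcouchdb/_design/recommends/_list/search/list"


def output_url(process_parameter_id):
    """Build the URL that lists recommender output for one recommender input."""
    query = urlencode({"recommend.process_parameter_id": process_parameter_id})
    return SEARCH_URL + "?" + query


def dataset_locations(response_text):
    """Return the HTTP download location of every output dataset in a response."""
    body = json.loads(response_text)
    return [row["value"]["recommend"]["http_dataset_location"] for row in body["rows"]]


# Abbreviated copy of the example response; elided fields are omitted.
SAMPLE_RESPONSE = """
{
  "offset": 0,
  "total_rows": 1,
  "rows": [
    {
      "id": "f7248d1de52816a98a2e96a7c58b2311",
      "value": {
        "document_type": "recommend",
        "recommend": {
          "type": "rcm_out",
          "process_parameter_id": "f7248d1de52816a98a2e96a7c58ab249",
          "http_dataset_location": "http://agro.ipb.ac.rs/datasets/f7248d1de52816a98a2e96a7c58b2311.tgz"
        }
      }
    }
  ]
}
"""

locations = dataset_locations(SAMPLE_RESPONSE)
```

In a live client, the text passed to `dataset_locations` would come from an HTTP GET on `output_url(...)` instead of the embedded sample.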

Example 2. Retrieving the list of registered recommender inputs


Parameter: key
Value: rcm_in
Example Request: curl http://agro.ipb.ac.rs/agcouchdb/_design/recommends/_view/list_by_type?key=\"rcm_in\"
Example Response:

{
  "total_rows": 12,
  "offset": 0,
  "rows": [
    {
      "id": "ee7c34512d362f8316d6268385ffd96a",
      "key": "rcm_in",
      "value": {
        "_id": "ee7c34512d362f8316d6268385ffd96a",
        "_rev": "3-63f2f422630a9558dc7aea141531d417",
        "document_type": "recommend",
        "recommend": {
          "type": "rcm_in",
          "process": "agupload",
          "ext_dataset_location": "http://dusan.ipb.ac.rs/ratings.tgz",
          "rcm_pars": [
            "--similarity cosine recommend",
            "--similarity cosine recommend 42",
            "--howMany 5 --neighborhoodSize 5 --similarity cosine recommend",
            "--howMany 5 --neighborhoodSize 5 --similarity cosine evaluate 0.8 0.2"
          ],
          "number_of_files": 5,
          "size_of_files": "9.5M",
          "http_dataset_location": "http://agro.ipb.ac.rs/datasets/ee7c34512d362f8316d6268385ffd96a.tgz",
          "lfn_dataset_location": "lfn:/grid/vo.aginfra.eu/datasets/ee7c34512d362f8316d6268385ffd96a.tgz"
        },
        "document_publisher": {...}
      }
    },
    {...},
    ...
  ]
}
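Two details of this example are easy to get wrong in a client: the view `key` must be sent as a JSON string (hence the escaped double quotes in the curl request), and each `rcm_pars` entry is a command-line style string. The sketch below builds the query URL and splits one `rcm_pars` entry into options and positional arguments. The helper names are hypothetical, and the parser assumes that every `--option` is followed by exactly one value; the meaning of the positional arguments (for example `evaluate 0.8 0.2`) is not documented on this page and is left uninterpreted.

```python
import json
import shlex
from urllib.parse import urlencode

# Endpoint taken from the example request above.
VIEW_URL = "http://agro.ipb.ac.rs/agcouchdb/_design/recommends/_view/list_by_type"


def view_url(document_type):
    """Build the view URL; the key is JSON-encoded, so it keeps its double quotes."""
    return VIEW_URL + "?" + urlencode({"key": json.dumps(document_type)})


def parse_rcm_pars(line):
    """Split one rcm_pars entry into (options, positional arguments).

    Assumes each --option takes exactly one value.
    """
    tokens = shlex.split(line)
    options, positional = {}, []
    index = 0
    while index < len(tokens):
        token = tokens[index]
        if token.startswith("--") and index + 1 < len(tokens):
            options[token[2:]] = tokens[index + 1]
            index += 2
        else:
            positional.append(token)
            index += 1
    return options, positional


options, positional = parse_rcm_pars(
    "--howMany 5 --neighborhoodSize 5 --similarity cosine evaluate 0.8 0.2"
)
```

Option values are kept as strings here; a real client would convert them once the expected types are known from the API documentation.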
