
Get started with Java and Spring

Part II. Data

You can find all the sources for this chapter here

Get started with Java and Spring series

This article series is aimed mostly at software engineers without a Java/Spring background who want to get started with the technology quickly and smoothly.

Though it doesn’t cover the most valuable principles of development, migration and deployment, it can serve as a good introductory overview of the technology.

I won’t explain every step of this guide in depth, but I will try to point you to useful links where you can find much more information.

What you will learn here:

Throughout our journey we’ll build a simple Tube application that crawls some videos from YouTube and presents them on our website.

Introduction

Spring Data JPA is a convenient way to bootstrap database access in your application.

JPA itself lets you describe entities, their relations and much more, while Spring Data JPA makes it easy to work with those entities through repositories.

Action

Defining entities

Let’s start by defining the Video entity.

Initially it will store the basic information about our videos: title, thumbnail and duration.

We also need to store the external video ID so that we don’t fetch the same video a second time.

./library/src/main/java/com/company/library/domain/Video.java

package com.company.library.domain;

import org.hibernate.annotations.CreationTimestamp;
import org.hibernate.annotations.UpdateTimestamp;

import javax.persistence.*;
import java.util.Date;

@Entity
public class Video {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Integer id;
    @Column(nullable = false)
    private String title;
    @Column(unique = true, nullable = false)
    private String externalId;
    @Column(nullable = false)
    private String imageUri;
    @Column(nullable = false)
    private String duration;

    @CreationTimestamp
    @Column(updatable = false)
    private Date createdDate;
    @UpdateTimestamp
    @Column
    private Date modifiedDate;

    public Integer getId() {
        return id;
    }

    public String getTitle() {
        return title;
    }
    public void setTitle(String title) {
        this.title = title;
    }

    public String getExternalId() {
        return externalId;
    }
    public void setExternalId(String externalId) {
        this.externalId = externalId;
    }

    public String getImageUri() {
        return imageUri;
    }
    public void setImageUri(String imageUri) {
        this.imageUri = imageUri;
    }

    public String getDuration() {
        return duration;
    }
    public void setDuration(String duration) {
        this.duration = duration;
    }

    public Date getCreatedDate() {
        return createdDate;
    }
    public void setCreatedDate(Date createdDate) {
        this.createdDate = createdDate;
    }

    public Date getModifiedDate() {
        return modifiedDate;
    }
    public void setModifiedDate(Date modifiedDate) {
        this.modifiedDate = modifiedDate;
    }
}

Here we have defined some key features of our main entity, such as a generated ID and creation/update timestamps.

We also have a unique external ID that we will use to avoid duplicates in our data storage.

Configuring the Scheduler

First of all, we have to fetch some videos to store in our database.

We need a periodic task that runs at one-minute intervals and watches for updates on the external website.

We’ll start by defining a Spring Boot application that finds any scheduled tasks and runs them as configured.

./scheduler/src/main/java/com/company/scheduler/Scheduler.java

package com.company.scheduler;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class Scheduler {
    public static void main(String[] args) {
        SpringApplication.run(Scheduler.class, args);
    }
}

Here we have defined our Spring Boot application, which can be started through the Scheduler class.

But first we have to define our database connection in the application.properties file, the main Spring Boot configuration file.

BTW, you can find more about possible properties here.

./scheduler/src/main/resources/application.properties

spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQL5InnoDBDialect

spring.datasource.url=jdbc:mysql://localhost:3306/project?autoReconnect=true&useSSL=false
spring.datasource.username=project
spring.datasource.password=project

Here we have defined our DSN with automatic reconnection enabled and SSL disabled, the latter just to suppress the warning about a non-secured connection.

We also specified the SQL dialect used by the Hibernate data framework.

That’s almost all, but we need a running database, so we can kick-start one using Docker with Docker Compose like this.

If you are a macOS user you will most likely want to use Docker for Mac, especially its Edge version.

./docker-compose.yml

version: "3"

services:
  db:
    image: mysql:latest
    ports:
      - "3306:3306"
    environment:
      MYSQL_ROOT_PASSWORD: toor
      MYSQL_DATABASE: project
      MYSQL_USER: project
      MYSQL_PASSWORD: project

Start the database container by running the docker-compose up command from the directory where docker-compose.yml is located, and we’ll have our shiny new database server running and listening on port 3306 of our host.

Of course, we need to include the MySQL connector dependency to be able to interact with the MySQL database.

We can do it in our main POM file to share the connector between modules.

./pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <!-- ... -->

    <dependencies>
        <!-- ... -->

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
        </dependency>
    </dependencies>
</project>

And now we are ready to run our Scheduler application.

Preparations

Right now our application doesn’t do anything useful. It starts and exits without achieving any goal.

So we have to schedule our operations.

Let’s write a basic crawler interface and a service that implements the crawling method.

Then we just call this service from our scheduled task and actually write data to the database.

./scheduler/src/main/java/com/company/scheduler/crawler/Crawler.java

package com.company.scheduler.crawler;

public interface Crawler {
    void crawl();
}

Now we need our crawlers to be able to do just one thing - crawl. No remorse.

./scheduler/src/main/java/com/company/scheduler/crawler/VideoCrawler.java

package com.company.scheduler.crawler;

import org.springframework.stereotype.Component;

@Component
public class VideoCrawler implements Crawler {
    @Override
    public void crawl() {
        System.out.println("New videos crawling started");

        System.out.println("New videos crawling ended");
    }
}

This is the place where all the hard work will be done.

Our crawler is a Component that will be found by Spring Boot’s component scan and can be autowired into other components.

./scheduler/src/main/java/com/company/scheduler/schedule/CrawlSchedules.java

package com.company.scheduler.schedule;

import com.company.scheduler.crawler.VideoCrawler;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
@EnableScheduling
public class CrawlSchedules {
    private final VideoCrawler videoCrawler;

    @Autowired
    public CrawlSchedules(VideoCrawler videoCrawler) {
        this.videoCrawler = videoCrawler;
    }

    @Scheduled(fixedDelay = 1 * 60 * 1000)
    public void scheduleNewVideosCrawl() {
        videoCrawler.crawl();
    }
}

And this is the schedule that calls our almighty Video Crawler: once at the start of our application, and then again each time 60 seconds have passed since the previous call finished.
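
To get a feel for what a fixed delay means, here is a minimal plain-JDK sketch of the same idea. It is not how Spring implements @Scheduled, just an illustration with ScheduledExecutorService; the class FixedDelaySketch and the method awaitRuns are hypothetical names made up for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of fixed-delay scheduling with the plain JDK.
final class FixedDelaySketch {
    // Starts `task` immediately, then re-runs it `delayMillis` after each run ends.
    // Returns true once the task has completed at least `runs` times,
    // or false if `timeoutMillis` elapses (or the thread is interrupted) first.
    static boolean awaitRuns(Runnable task, int runs, long delayMillis, long timeoutMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch remaining = new CountDownLatch(runs);
        try {
            // Initial delay 0: the first run happens right away.
            // The delay is measured from the END of one run to the START of the next.
            executor.scheduleWithFixedDelay(() -> {
                task.run();
                remaining.countDown();
            }, 0, delayMillis, TimeUnit.MILLISECONDS);
            return remaining.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Stop scheduling further runs once we are done waiting.
            executor.shutdownNow();
        }
    }
}
```

Because the delay starts counting only when a run ends, a slow crawl never overlaps with the next one; with a fixed rate instead, runs would be started on a fixed clock regardless of how long each one takes.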

Now we can run our Scheduler application and see that our Video Crawler is running.

Fetching data

Now we can start to actually crawl videos from an external source and store them in our object layer.

Since we are starting with YouTube, we will need an API key to be able to get video updates.

Once we have obtained an API key we can start implementing our fetcher, but we need to store the key somewhere outside the source code.

So we will put it in our application.properties file.

./scheduler/src/main/resources/application.properties

spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQL5InnoDBDialect

spring.datasource.url=jdbc:mysql://localhost:3306/project?autoReconnect=true&useSSL=false
spring.datasource.username=project
spring.datasource.password=project

com.company.scheduler.you-tube-api-key=YOUR_GOOGLE_YOUTUBE_API_KEY_WITHOUT_QUOTES

Now we need to forward this property to our application layer. We will make this happen by defining a Configuration Properties component.

./scheduler/src/main/java/com/company/scheduler/properties/SchedulerProperties.java

package com.company.scheduler.properties;

import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Configuration;

@Configuration
@ConfigurationProperties("com.company.scheduler")
public class SchedulerProperties {
    private String youTubeApiKey = "";

    public String getYouTubeApiKey() {
        return youTubeApiKey;
    }
    public void setYouTubeApiKey(String youTubeApiKey) {
        this.youTubeApiKey = youTubeApiKey;
    }
}

Now our YouTube API key will be available from inside our Scheduler Properties Configuration Component instance.
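
Note that the key in application.properties is written in kebab-case (you-tube-api-key) while the Java property is camelCase (youTubeApiKey); Spring Boot's relaxed binding matches the two. The real binding rules are more elaborate than this, so treat the following only as a rough sketch of the naming idea. RelaxedNameSketch and toCamelCase are hypothetical names, not part of Spring.

```java
// Hypothetical illustration of how a kebab-case key lines up with a bean property name.
final class RelaxedNameSketch {
    // Converts a kebab-case segment such as "you-tube-api-key"
    // into the camelCase form "youTubeApiKey".
    static String toCamelCase(String kebab) {
        StringBuilder out = new StringBuilder();
        boolean upperNext = false;
        for (char c : kebab.toCharArray()) {
            if (c == '-') {
                // Drop the dash and capitalise the character that follows it.
                upperNext = true;
            } else if (upperNext) {
                out.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Calling RelaxedNameSketch.toCamelCase("you-tube-api-key") gives "youTubeApiKey", which is the property behind getYouTubeApiKey and setYouTubeApiKey.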

We will fetch our videos with a self-written YouTube API interaction provider.

./scheduler/src/main/java/com/company/scheduler/provider/YouTubeVideoProvider.java

package com.company.scheduler.provider;

import com.company.scheduler.properties.SchedulerProperties;
import com.google.api.client.http.HttpRequest;
import com.google.api.client.http.HttpRequestInitializer;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.youtube.YouTube;
import com.google.api.services.youtube.YouTubeRequestInitializer;
import com.google.api.services.youtube.model.Video;
import com.google.api.services.youtube.model.VideoListResponse;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

@Component
public class YouTubeVideoProvider {
    private YouTube youTube;

    @Autowired
    public YouTubeVideoProvider(SchedulerProperties schedulerProperties) {
        youTube = new YouTube.Builder(
                new NetHttpTransport(),
                new JacksonFactory(),
                new HttpRequestInitializer() {
                    @Override
                    public void initialize(HttpRequest httpRequest) throws IOException {

                    }
                }
        ).setYouTubeRequestInitializer(new YouTubeRequestInitializer(schedulerProperties.getYouTubeApiKey())).build();
    }

    public List<Video> getRecentTrendingVideoList() {
        try {
            YouTube.Videos.List youTubeVideosList = youTube.videos().list("contentDetails,snippet");
            youTubeVideosList.setChart("mostPopular");

            VideoListResponse youtubeVideosListResponse = youTubeVideosList.execute();

            return youtubeVideosListResponse.getItems();
        } catch (IOException e) {
            e.printStackTrace();

            return new ArrayList<>();
        }
    }
}

Here we have defined our YouTube video provider, which creates a YouTube connector instance and implements a method for loading videos.

We have to create a Video converter that converts a YouTube video into our Video entity object.

./scheduler/src/main/java/com/company/scheduler/converter/VideoConverter.java

package com.company.scheduler.converter;

import com.company.library.domain.Video;
import com.google.api.services.youtube.model.Thumbnail;
import com.google.api.services.youtube.model.ThumbnailDetails;
import com.google.api.services.youtube.model.VideoContentDetails;
import com.google.api.services.youtube.model.VideoSnippet;
import org.springframework.stereotype.Component;

@Component
public class VideoConverter {
    public static class ThumbnailUrlResolver {
        static String resolveThumbnailUrl(ThumbnailDetails thumbnailDetails) {
            Thumbnail thumbnail = thumbnailDetails.getMaxres();
            if (null != thumbnail) {
                return thumbnail.getUrl();
            }

            // "standard" (640x480) is larger than "high" (480x360), so try it first
            thumbnail = thumbnailDetails.getStandard();
            if (null != thumbnail) {
                return thumbnail.getUrl();
            }

            thumbnail = thumbnailDetails.getHigh();
            if (null != thumbnail) {
                return thumbnail.getUrl();
            }

            thumbnail = thumbnailDetails.getMedium();
            if (null != thumbnail) {
                return thumbnail.getUrl();
            }

            thumbnail = thumbnailDetails.getDefault();
            if (null != thumbnail) {
                return thumbnail.getUrl();
            }

            return null;
        }
    }

    public Video createFromYoutubeVideo(com.google.api.services.youtube.model.Video youtubeVideo) {
        VideoSnippet youtubeVideoSnippet = youtubeVideo.getSnippet();
        VideoContentDetails youtubeVideoContentDetails = youtubeVideo.getContentDetails();

        Video video = new Video();
        video.setExternalId(youtubeVideo.getId());

        if (null != youtubeVideoSnippet) {
            video.setTitle(youtubeVideoSnippet.getTitle());

            ThumbnailDetails youtubeVideoThumbnailDetails = youtubeVideoSnippet.getThumbnails();
            if (null != youtubeVideoThumbnailDetails && youtubeVideoThumbnailDetails.size() > 0) {
                video.setImageUri(ThumbnailUrlResolver.resolveThumbnailUrl(youtubeVideoThumbnailDetails));
            }
        }

        if (null != youtubeVideoContentDetails) {
            video.setDuration(youtubeVideoContentDetails.getDuration());
        }

        return video;
    }
}
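
The converter copies the duration string as is. The YouTube Data API reports contentDetails.duration as an ISO 8601 duration such as PT4M13S, so if we later want to show it as a clock value, the JDK can parse it for us. Here is a minimal sketch under that assumption; DurationSketch and its methods are hypothetical helpers, not part of the project.

```java
import java.time.Duration;

// Hypothetical helper for turning an ISO 8601 duration into something displayable.
final class DurationSketch {
    // Parses an ISO 8601 duration such as "PT4M13S" into whole seconds.
    static long toSeconds(String iso8601) {
        return Duration.parse(iso8601).getSeconds();
    }

    // Formats the duration as m:ss, or h:mm:ss when it is an hour or longer.
    static String toClock(String iso8601) {
        long total = toSeconds(iso8601);
        long hours = total / 3600;
        long minutes = (total % 3600) / 60;
        long seconds = total % 60;
        return hours > 0
                ? String.format("%d:%02d:%02d", hours, minutes, seconds)
                : String.format("%d:%02d", minutes, seconds);
    }
}
```

For example, DurationSketch.toClock("PT4M13S") returns "4:13". Keeping the raw string in the entity and formatting it only for display means we lose no information from the API response.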

And now we can assemble all the written components and reach our primary goal - crawl the videos.

./scheduler/src/main/java/com/company/scheduler/crawler/VideoCrawler.java

package com.company.scheduler.crawler;

import com.company.library.domain.Video;
import com.company.library.domain.VideoRepository;
import com.company.scheduler.converter.VideoConverter;
import com.company.scheduler.provider.YouTubeVideoProvider;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import java.util.List;

@Component
public class VideoCrawler implements Crawler {
    private final VideoConverter videoConverter;
    private final YouTubeVideoProvider youTubeVideoProvider;
    private final VideoRepository videoRepository;

    @Autowired
    public VideoCrawler(
            VideoConverter videoConverter,
            YouTubeVideoProvider youTubeVideoProvider,
            VideoRepository videoRepository
    ) {
        this.videoConverter = videoConverter;
        this.youTubeVideoProvider = youTubeVideoProvider;
        this.videoRepository = videoRepository;
    }

    @Override
    public void crawl() {
        System.out.println("New videos crawling started");

        List<com.google.api.services.youtube.model.Video> recentTrendingVideoList =
                youTubeVideoProvider.getRecentTrendingVideoList();

        for (com.google.api.services.youtube.model.Video youtubeVideo : recentTrendingVideoList) {
            if (null != videoRepository.findByExternalId(youtubeVideo.getId())) {
                // Already stored: skip this video but keep checking the rest of the list
                continue;
            }

            Video video = videoConverter.createFromYoutubeVideo(youtubeVideo);

            videoRepository.save(video);

            System.out.println(String.format("Video %s (%s) saved", video.getTitle(), video.getExternalId()));
        }

        System.out.println("New videos crawling ended");
    }
}
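
The VideoRepository used by the crawler is not listed in this chapter; presumably it is a Spring Data repository in the library module that exposes findByExternalId. To show the duplicate check in isolation, here is a minimal sketch that swaps the repository for an in-memory map. DedupSketch, saveNew and the String[] pairs are hypothetical stand-ins, not the project's real types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for "look up by external id, save if missing".
final class DedupSketch {
    // Plays the role of the videos table: external id -> title.
    private final Map<String, String> stored = new LinkedHashMap<>();

    // Each fetched item is an {externalId, title} pair.
    // Saves the items that are not stored yet and returns their external ids.
    List<String> saveNew(List<String[]> fetched) {
        List<String> saved = new ArrayList<>();
        for (String[] item : fetched) {
            String externalId = item[0];
            if (stored.containsKey(externalId)) {
                // Known video: skip it, but keep looking at the remaining items.
                continue;
            }
            stored.put(externalId, item[1]);
            saved.add(externalId);
        }
        return saved;
    }

    int size() {
        return stored.size();
    }
}
```

Skipping a known item rather than stopping at it matters when the fetched list is not ordered by age: a new video can appear after one we already have.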

Everything is ready to start crawling except the database schema.

Hibernate has a very handy feature for development environments: automatic DDL update.

./scheduler/src/main/resources/application.properties

spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQL5InnoDBDialect
spring.jpa.hibernate.ddl-auto=update

spring.datasource.url=jdbc:mysql://localhost:3306/project?autoReconnect=true&useSSL=false
spring.datasource.username=project
spring.datasource.password=project

com.company.scheduler.you-tube-api-key=YOUR_GOOGLE_YOUTUBE_API_KEY_WITHOUT_QUOTES

Finally we can start our Scheduler and fetch our first five videos.

In the next chapter we will learn how to use Spring with Jedis to store and retrieve fast data.
