Big data in practice using Spark (EN/NL/FR)

Duration
2 days

Location
On-site, Online


ABIS
Provider rating: 8 out of 10 (based on 1 review)


Start dates and locations
Leuven (BE)
5 May 2026 to 6 May 2026
Online: Zoom, Teams
5 May 2026 to 6 May 2026
Description



Learn to work with Spark, the ideal framework for data analytics in the cloud, during this two-day ABIS training!

Nowadays everybody seems to be working with AI, Data Science, and "big data". No doubt you, too, would like to interrogate your voluminous data sources (click streams, social media, relational data, cloud data, sensor data, ...) and are experiencing the shortcomings of traditional data analytics tools. Maybe you want the processing power of a cluster, with its parallel processing capabilities, to interrogate your distributed data stores.

If fast prototyping and processing speed are a priority, Spark will most likely be the platform of your choice. Apache Spark is an open-source processing engine focusing on low latency, ease of use, flexibility, and analytics. It is an alternative to the MapReduce approach delivered by Hadoop with Hive (cf. our course Big data in practice using Hadoop). Spark has complemented, and in practice superseded, Hadoop, thanks to the higher abstraction of Spark's APIs and its faster, in-memory processing.

More specifically, Spark lets you easily interrogate data sources on HDFS, in a NoSQL database (e.g. Cassandra or HBase), in a relational database, in the cloud (e.g. AWS S3), or in local files. Independently of this, a Spark job can run either on your local machine (i.e. in development mode), on a Hadoop cluster (with YARN), in a Mesos environment, on Kubernetes, or in the cloud. And all of this through a simple Spark script, through a more complex (Java or Python) program, or through a web-based notebook (e.g. Zeppelin or Databricks).
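As a small taste of what this looks like, here is a minimal PySpark sketch (not taken from the course material; the file name and application name are hypothetical). It reads a local CSV file into a DataFrame while running Spark in local mode; pointing the same reader at an s3a:// or hdfs:// URI, with the appropriate connectors configured, is essentially all that changes for cloud or HDFS data:

    from pyspark.sql import SparkSession

    # Run locally on all available cores; the same code can be submitted
    # unchanged to YARN, Mesos, Kubernetes, or a cloud cluster.
    spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()

    # Local file source (hypothetical file name); an s3a:// or hdfs:// path
    # would use the same DataFrame reader API.
    df = spark.read.csv("clicks.csv", header=True, inferSchema=True)
    df.printSchema()
    df.show(5)

    spark.stop()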

This course builds on the context set forth in the Big data architecture and infrastructure overview course.

  • You will get hands-on practice (on Linux) with Spark and its libraries.
  • You will learn how to implement robust data processing (in Python, Scala, or R) with an SQL-style interface.
  • After successful completion of the course, you will have sufficient basic expertise to set up a Spark or Databricks development environment, and use it to interrogate your data.
  • You will be able to write simple Spark scripts and programs (with the Python-based PySpark or the Scala-based spark-shell) based on DataFrames and RDDs, and optionally also use the MLlib, GraphX, or Streaming libraries (see the sketch after this list).
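To illustrate the kind of scripts meant above, here is a minimal PySpark sketch (an illustration only, not one of the course exercises; all data and names are made up) contrasting the DataFrame API with the lower-level RDD API:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("basics").getOrCreate()

    # DataFrame API: schema-aware, SQL-like operations
    df = spark.createDataFrame(
        [("web", 120), ("mobile", 340), ("web", 80)],
        ["channel", "clicks"])
    df.groupBy("channel").sum("clicks").show()

    # RDD API: lower-level, functional transformations on raw collections
    rdd = spark.sparkContext.parallelize([1, 2, 3, 4])
    print(rdd.map(lambda x: x * x).reduce(lambda a, b: a + b))  # prints 30

    spark.stop()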

Intended for

Anyone who wants to start working with Spark: developers, data architects, and anyone else who needs to work with data science technology.

Background

Familiarity with the concepts of data clusters and distributed processing is necessary; see our course Big data architecture and infrastructure. Additionally, minimal knowledge of SQL and Linux is useful. Minimal experience with at least one programming language (e.g. Java, Python, Scala, Perl, JavaScript, PHP, C++, C#, ...) is a must.

Main topics

  • Motivation for Spark & base concepts
    • The Apache Spark project and its components
    • Spark and Databricks
    • Getting to know the Spark architecture and programming model
    • The principles of Data Analytics
  • Data sources
    • Learn how to access data residing in Hadoop HDFS, Cassandra, AWS S3, or a relational database
  • Interfaces
    • Working with the various programming interfaces (specifically the spark-shell and PySpark)
    • Writing and debugging programs for simple data analytic problems
  • Data Frames and RDDs
  • A short introduction to the use of the Spark libraries
    • SparkSQL (see the sketch after this list)
    • Machine learning (MLlib)
    • Streaming (i.e., processing "volatile" data)
    • Parallel computations in trees and graphs (GraphX)
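As an indication of the SparkSQL topic above, the following minimal sketch (illustrative only; the view, column names, and data are made up) registers a DataFrame as a temporary view and queries it with plain SQL:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("sql-demo").getOrCreate()

    df = spark.createDataFrame(
        [("2026-05-05", "sensor-1", 21.5), ("2026-05-05", "sensor-2", 19.8)],
        ["day", "device", "temperature"])

    # Register the DataFrame as a temporary view and query it with SQL
    df.createOrReplaceTempView("readings")
    spark.sql(
        "SELECT device, AVG(temperature) AS avg_temp "
        "FROM readings GROUP BY device").show()

    spark.stop()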

Training method

Classroom instruction, supported by practical examples and extensive practical exercises.

Delivered as a live, interactive training, available in person, online, or in a hybrid format. The training can be delivered in English, Dutch, or French.

Certificate

At the end of the session, the participant receives a "Certificate of Completion".

Duration
2 days.


