New online classes: Declarative Information Extraction with SystemT

Event Notification Type: Other
State: California
Country: United States
City: San Jose
Contact: Laura Chiticariu

We are happy to announce the release of two free courses on Declarative Information Extraction (IE) with SystemT, available at: https://bigdatauniversity.com/learn/text_analytics

SystemT, a declarative IE system, has been designed and developed to address the requirements of modern enterprise applications: accuracy, productivity, scalability, expressivity, transparency, and customizability. SystemT is based on the basic principle underlying relational database technology: complete separation of specification from execution. It consists of AQL, a state-of-the-art language for expressing NLP algorithms; an optimizer and runtime engine for execution at scale; and an easy-to-use visual interface. SystemT makes Information Extraction orders of magnitude more scalable, and easy to use, maintain and customize.

Highlights of the courses:
• Learn to write information extraction programs, hands-on, in just hours!
• No prerequisites, and nothing to install!
• It’s free!

For questions, email chiti {at} us {.} ibm {.} com
SystemT publications and videos at: http://researcher.watson.ibm.com/researcher/view_group.php?id=1264

Best regards,
The SystemT team, in IBM Research – Almaden
---

Course 1: Text Analytics – Getting Results with SystemT

Information Extraction (IE) is the problem of distilling structured information from unstructured text. Example IE tasks range from finding mentions of named entities such as people and places, relationships between entities, and opinions about products, to deep semantic understanding of a sentence. IE has emerged as an essential building block for applications that leverage unstructured text, including social media analytics, healthcare analytics, financial risk analysis, semantic search, regulatory compliance, legal discovery and many others.
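As a rough illustration of what "distilling structured information from unstructured text" can look like, here is a minimal Python sketch that turns a sentence into typed records. It is not SystemT or AQL; the `RULES` table, the `Mention` record, and the `extract` function are hypothetical names chosen for this example, and the two regular expressions are deliberately simplistic.

```python
import re
from typing import NamedTuple


class Mention(NamedTuple):
    """One structured record extracted from the text (hypothetical schema)."""
    kind: str   # entity type, e.g. "email"
    text: str   # the matched surface string
    start: int  # character offset where the match begins
    end: int    # character offset just past the match


# Toy rules (assumptions for illustration only): entity type -> pattern.
RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def extract(text: str) -> list[Mention]:
    """Apply every rule to the text and return mentions in document order."""
    mentions = [
        Mention(kind, m.group(), m.start(), m.end())
        for kind, pattern in RULES.items()
        for m in pattern.finditer(text)
    ]
    return sorted(mentions, key=lambda m: m.start)


doc = "Contact Ana at ana@example.com or call 408-555-0100 before Friday."
for mention in extract(doc):
    print(mention)
```

Each `Mention` carries character offsets back into the source text, so `doc[m.start:m.end]` recovers the matched string. Keeping the rules in a table separate from the code that runs them loosely echoes the declarative idea the courses cover, though a real system involves far more than this.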
The course starts with an overview of IE, including common tasks, rule-based and machine-learning-based techniques, and quality evaluation for IE systems. The rest of the course focuses on the new paradigm of “Declarative Information Extraction”, which has recently emerged as a powerful approach to building high-performance IE systems. One particular system, SystemT, developed at IBM Research and available as IBM InfoSphere BigInsights Text Analytics, will be discussed in detail, including:
• Developing extractors using the visual interface in SystemT’s web tool (suitable for data scientists)
• Developing extractors by writing programs in AQL, the rule language of SystemT (suitable for NLP engineers)
At the end of the class, participants will have an understanding of Information Extraction techniques, and hands-on experience writing IE programs using SystemT.

Syllabus:
• Module 1 - Getting to know Information Extraction (IE)
• Module 2 - Getting to know SystemT
• Module 3 - IE with AQL
• Module 4 - AQL Basics
• Module 5 - Advanced AQL

---

Course 2: Advanced Text Analytics – Getting Results with SystemT

A continuation of the first class, this course discusses early rule-based IE systems built on a standard formalism of cascading grammars, and shows through detailed examples how this formalism suffers from fundamental limitations in both expressivity and runtime performance. These limitations lead to scalability, accuracy and usability issues in the resulting information extraction programs. The class then describes the declarative principles behind the SystemT Information Extraction system and the SystemT Optimizer, and shows how this approach addresses these limitations, leading to extractors that are scalable, accurate, and easy to maintain and enhance for a new domain.

Syllabus:
• Module 1 - Limitations in previous approaches to rule-based IE
• Module 2 - Declarative IE and the SystemT Optimizer