gromgull | BibSonomy

bookmarks (574)

[8] Pattern | CLiPS
Pattern is a web mining module for the Python programming language. It bundles tools for data retrieval (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics), clustering and classification (k-means, KNN, SVM), and data visualization (graph networks).
13 years ago by @gromgull
tags: machine-learning, web-mining, nlp, python, library
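To illustrate the "tf-idf + cosine similarity" metric listed above, here is a minimal pure-Python sketch. It deliberately does not use Pattern's own API. The function names `tfidf_vectors` and `cosine`, the naive whitespace tokenizer, and the unsmoothed `tf * log(N / df)` weighting are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer (assumption)
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        vectors.append({t: (c / total) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_a * norm_b)

docs = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "stock markets fell sharply today",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # positive: the documents share "the" and "cat"
print(cosine(vecs[0], vecs[2]))  # 0.0: no shared terms
```

Terms that occur in every document get an idf of `log(1) = 0` and therefore drop out of the similarity, which is the intended effect of the idf factor.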
[1] Atom Interface (DERI) - YouTube
Atom Interface is a novel interactive visualization of single/multiple tree structures. It is based on the metaphor of electrons, atoms and molecules. For mo...
13 years ago by @gromgull
tags: metaphor, visualization, deri, rdf
[1] IBM - NoSQL Graph Store
DB2 Graph Store is an optimized way to store graph triples inside a DB2 database. It supports the SPARQL query language, popular RDF Java APIs such as JENA, and an HTTP SPARQL endpoint via JOSEKI.
13 years ago by @gromgull
tags: rdf-store, ibm, db2, graph-store, sparql
[1] Large-scale Incremental Processing Using Distributed Transactions and Notifications
Updating an index of the web as documents are crawled requires continuously transforming a large repository of existing documents as new documents arrive. This task is one example of a class of data processing tasks that transform a large repository of data via small, independent mutations. These tasks lie in a gap between the capabilities of existing infrastructure. Databases do not meet the storage or throughput requirements of these tasks: Google's indexing system stores tens of petabytes of data and processes billions of updates per day on thousands of machines. MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efficiency. We have built Percolator, a system for incrementally processing updates to a large data set, and deployed it to create the Google web search index. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%.
13 years ago by @gromgull
tags: paper, google, big-data, update
[2] Giraph
Giraph builds upon the graph-oriented nature of Pregel but additionally adds fault-tolerance to the coordinator process with the use of ZooKeeper as its centralized coordination service. Giraph follows the bulk-synchronous parallel model relative to graphs where vertices can send messages to other vertices during a given superstep. Checkpoints are initiated by the Giraph infrastructure at user-defined intervals and are used for automatic application restarts when any worker in the application fails. Any worker in the application can act as the application coordinator and one will automatically take over if the current application coordinator fails.
13 years ago by @gromgull
tags: big-data, graph-processing, hadoop, map-reduce
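The bulk-synchronous parallel model described above can be sketched in a single Python process: in each superstep every vertex reads the messages sent to it in the previous superstep, updates its value, and sends new messages, with a barrier between supersteps. This is not Giraph's Java API. The function `max_value_bsp`, the maximum-value propagation task (a common introductory Pregel-style example), and the rule that a vertex stays halted until it receives a message are simplifying assumptions; there is no fault tolerance, ZooKeeper coordination, or checkpointing.

```python
def max_value_bsp(values, edges):
    """Propagate the maximum vertex value through a graph in BSP supersteps.

    values: {vertex: initial value}; edges: {vertex: [out-neighbours]}.
    Returns (final values, number of supersteps executed).
    """
    values = dict(values)
    # Superstep 0: every vertex is active and sends its value to its out-neighbours.
    inbox = {v: [] for v in values}
    for v in values:
        for u in edges.get(v, []):
            inbox[u].append(values[v])
    supersteps = 1
    # Keep running supersteps while any vertex has pending messages.
    while any(inbox.values()):
        outbox = {v: [] for v in values}
        for v, messages in inbox.items():
            if not messages:
                continue  # vertex stays halted: nothing arrived for it
            best = max(messages)
            if best > values[v]:
                values[v] = best
                # Only vertices whose value changed send messages onward.
                for u in edges.get(v, []):
                    outbox[u].append(best)
        # Barrier: messages sent in this superstep are delivered in the next one.
        inbox = outbox
        supersteps += 1
    return values, supersteps

graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result, steps = max_value_bsp({"a": 3, "b": 6, "c": 2, "d": 1}, graph)
print(result, steps)
```

The computation ends when no vertex has incoming messages, which mirrors the "all vertices have voted to halt and no messages are in transit" termination condition of the BSP graph model in simplified form.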
[2] SparQLed - Assisted SPARQL Editor
SparQLed is an interactive SPARQL editor that provides context-aware recommendations, helping users formulate complex SPARQL queries across multiple heterogeneous data sources.
13 years ago by @gromgull
tags: sparql, autocomplete, data-summary
[1] Tell Me Something I Don't Already know - Acunu Reflex
A simple analytics engine for Cassandra. For big data, individual data points may be less interesting than simple aggregates.
13 years ago by @gromgull
tags: analytics, nosql, hive, cloud-computing
[2] clearspring/stream-lib
Stream summarizer and cardinality estimator.
13 years ago by @gromgull
tags: data-mining, stream-processing
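One way a streaming cardinality estimator can work is linear counting: hash every item into a fixed bitmap of `m` bits and estimate the number of distinct items as `n ≈ -m · ln(V)`, where `V` is the fraction of bits still unset. The Python sketch below is only an illustration of that technique, not stream-lib's Java API; the class name `LinearCounter`, the default `m = 16384`, and the SHA-1-based hash are assumptions.

```python
import hashlib
import math

class LinearCounter:
    """Linear counting: estimate distinct items from the fraction of unset bits."""

    def __init__(self, m=1 << 14):
        self.m = m  # bitmap size in bits (assumed default)
        self.bits = bytearray(m)  # one byte per bit, for clarity rather than compactness

    def add(self, item):
        # Hash the item and map it to one bitmap position; duplicates hit the same bit.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % self.m
        self.bits[index] = 1

    def estimate(self):
        zeros = self.m - sum(self.bits)
        if zeros == 0:
            return float("inf")  # bitmap saturated: m is too small for this stream
        return -self.m * math.log(zeros / self.m)

counter = LinearCounter()
for i in range(100_000):
    counter.add(i % 5_000)  # 5,000 distinct values, each seen 20 times
print(round(counter.estimate()))
```

Memory stays fixed at `m` positions no matter how long the stream is, which is the point of such estimators; the trade-off is that accuracy degrades as the number of distinct items approaches or exceeds `m`.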
[3] Data Visualization Software | Tulip
Tulip is an information visualization framework dedicated to the analysis and visualization of relational data. Tulip aims to provide the developer with a complete library, supporting the design of interactive information visualization applications for relational data that can be tailored to the problems he or she is addressing.
13 years ago by @gromgull
tags: visualization
[1] CEI SOFTWARE | EnSight Post-processing and Visualization for Scientific Data
EnSight Post-processing and Visualization for Scientific Data
13 years ago by @gromgull
tags: visualization
[2] VisIt Visualization Tool
Developed by the Lawrence Livermore National Laboratory, VisIt contains a rich set of methods for visualizing scientific data, such as contour plots, pseudocolor plots, volume plots, vector plots, and boundary plots. VisIt can provide quantitative as well as qualitative information about a scientific data set.
13 years ago by @gromgull
tags: visualization
[10] VTK - The Visualization Toolkit
The Visualization ToolKit (VTK) is an open source, freely available software system for 3D computer graphics, image processing, and visualization used by thousands of researchers and developers around the world.
13 years ago by @gromgull
tags: visualization
[1] ParaView - Open Source Scientific Visualization
ParaView is an open-source, multi-platform application designed to visualize data sets ranging in size from small to very large.
13 years ago by @gromgull
tags: visualization, data
[1] Apache Stanbol - Entityhub
http://incubator.apache.org/stanbol/docs/trunk/entityhub/
13 years ago by @gromgull
tags: entity-matching, google-refine
[2] Select2 2.0
http://ivaynberg.github.com/select2/
13 years ago by @gromgull
tags: javascript, jquery, select
[1] Probabilistic Data Structures for Web Analytics and Data Mining « Highly Scalable Blog
http://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/
13 years ago by @gromgull
tags: big-data, machine-learning, approximation, heuristics, data-structures, online-learning
[1] Silk, the Semantic Web for the rest of us
New Dutch cloud service Silk, which is launching today, wants to fulfill the promise of the Semantic Web and make your documents, web pages and files more powerful -- and with a few fixes, it could get there.
13 years ago by @gromgull
tags: semantic-web, commercialisation
[2] Prior Knowledge Home
Another cloud-computing service for prediction, this time based on a massive joint probability model across all variables.
13 years ago by @gromgull
tags: machine-learning, web-service, cloud-computing
[2] Machine Learning in Python Has Never Been Easier!
BigML is a web service for running distributed machine learning, with 64 GB of free upload and an API in Python.
13 years ago by @gromgull
tags: machine-learning, python, api, cloud-computing
[1] Simple federated queries with RDF - bobdc.blog
A nice example of integrating data sources without rewriting the data.
13 years ago by @gromgull
tags: RDF, SPARQL, example, mapping
