Nomenklatura de-duplicates and integrates different names for entities (people, organisations or public bodies) to help you clean up messy data and to find links between different datasets. The service will create references for all entities mentioned in a source dataset. It then helps you to define which of these entities are duplicates and what the canonical name for a given entity should be. This information is available in data cleaning tools like OpenRefine or in custom data processing scripts, so that you can automatically apply existing mappings in the future. The focus of nomenklatura is on data integration; it does not provide further functionality with regard to the people and organisations that it helps to keep track of.

Nomenklatura is a simple service that makes it easy to maintain a canonical list of entities such as persons, companies or even streets, and to match messy input, such as their names, against that canonical list. For example, it matches "Acme Widgets", "Acme Widgets Inc" and "Acme Widgets Incorporated" to the canonical "Acme Widgets". With Nomenklatura it is a matter of minutes to set up your own set of master data to match against, and it provides a simple user interface and an API which you can then use to do matching (the API is compatible with OpenRefine's reconciliation function). Nomenklatura not only stores the master set of entities you want to match against but also learns and records the various aliases that a given entity, such as a person, organisation or place, may have in various datasets.

There's too much friction working with data, such as friction getting data. This friction stops people doing stuff: it stops them creating, sharing, collaborating, and using data, especially amongst more distributed communities. It kills the cycles of find, improve, share that would make for a dynamic, productive and attractive (open) data ecosystem.

We need to make an ecosystem that, like open-source for software, is useful and attractive to those without any principled interest, the vast majority who simply want the best tool for the job, the easiest route to their goal.

We think that by getting a few key pieces in place we can reduce friction enough to revolutionize how the (open) data ecosystem operates, with massively improved data quality, utilization and sharing.

Standards: A small set of lightweight 'data package' standards and patterns providing a base structure on which tooling and integration can build.
Tooling and Integration: Making it easy to use and publish data packages from your existing apps and workflows, whether that's Excel, R, or Hadoop!
Outreach and Community: Engaging and evangelizing around the concepts, standards and tooling, and building a community of users and contributors.
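To give a feel for how lightweight a 'data package' can be, here is a minimal sketch in Python. It assumes the descriptor layout from the published Data Package specification (a datapackage.json file holding a name and a list of resources, each with a path and an optional schema). The function name, the package name "city-budget", the file "budget.csv" and the field names are all hypothetical, chosen only for illustration.

```python
import json


def build_descriptor(name, csv_path, fields):
    """Build a minimal data package descriptor as a plain dict.

    `fields` is a list of (field_name, field_type) pairs.
    All concrete names passed in below are hypothetical examples.
    """
    return {
        "name": name,
        "resources": [
            {
                "name": name,
                "path": csv_path,
                "schema": {
                    "fields": [
                        {"name": field_name, "type": field_type}
                        for field_name, field_type in fields
                    ]
                },
            }
        ],
    }


# Describe one hypothetical CSV file with two columns.
descriptor = build_descriptor(
    "city-budget",
    "budget.csv",
    [("department", "string"), ("amount", "number")],
)

# The text that would typically be saved as datapackage.json next to the CSV.
descriptor_json = json.dumps(descriptor, indent=2)
```

Keeping the descriptor a plain JSON document is the point of the sketch: any tool that can read JSON, whether a spreadsheet plugin or a script, can discover the files and their columns without special libraries.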
Our biggest upcoming code project is the customization and implementation of a web crawler in order to accelerate the growth of our database. The crawler will seek and retrieve pages related to police use of force. In addition to adding more puppycide records, we will use the resulting information to study how news organizations report on issues of lethal force. This upcoming project could really use the input and advice of skilled developers and admins. Our biggest challenge is how to most effectively use the (very small) amount of resources available to us.
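As a rough illustration of what such a crawler involves, here is a minimal sketch in Python using only the standard library. It is not the project's actual crawler: the keyword list, the function names and the idea of passing in a `fetch` function are all assumptions made for this example. The sketch follows links breadth-first and keeps the URLs of pages whose text mentions one of the keywords.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

# Hypothetical keyword list; a real project would tune this carefully.
KEYWORDS = ("police", "use of force", "officer shot")


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute URLs (fragments removed) linked from the page."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def is_relevant(html, keywords=KEYWORDS):
    """Crude relevance check: does any keyword occur in the page?"""
    text = html.lower()
    return any(keyword in text for keyword in keywords)


def crawl(seed_urls, fetch, max_pages=50):
    """Breadth-first crawl; `fetch(url)` returns HTML or None on failure."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    relevant = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        if is_relevant(html):
            relevant.append(url)
        # Queue every link we have not visited or queued before.
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return relevant
```

Passing `fetch` in as a parameter keeps the crawl logic separate from networking, so it can be tried out against a small in-memory dictionary of pages before pointing it at real sites. A production crawler would also need polite rate limiting and robots.txt handling, which this sketch leaves out.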